Skip to content

Instantly share code, notes, and snippets.

@mikelove
Created June 15, 2023 06:17
Show Gist options
  • Select an option

  • Save mikelove/09cdb828392bfa7ee88d34fa13d24ae3 to your computer and use it in GitHub Desktop.

Select an option

Save mikelove/09cdb828392bfa7ee88d34fa13d24ae3 to your computer and use it in GitHub Desktop.
Drawing DESeq2 fitted spline curves on top of scaled counts
library(splines)
library(DESeq2)
# make some demo data
dds <- makeExampleDESeqDataSet(n=100, m=40)
dds$condition <- sort(runif(40))
# make one gene where expression has a curve shape (just for demo)
s_shape <- round(500 * sin(dds$condition*2*pi) + 1000 + rnorm(40,0,50))
mode(s_shape) <- "integer"
counts(dds)[1,] <- s_shape
# just for visualization
# next line just for demo, use standard size factors for real data
sizeFactors(dds) <- rep(1,ncol(dds))
library(ggplot2)
plotCounts(dds, 1, returnData=TRUE) %>%
ggplot(aes(condition, count)) +
geom_point()
# how to fit curves in DESeq2
design(dds) <- ~ns(condition, df=5)
dds <- DESeq(dds, test="LRT", reduced=~1)
# expected results?
res <- results(dds)
hist(res$pvalue)
plot(res$pvalue, log="y")
# build the needed matrices for fitted spline
coef_mat <- coef(dds)
design_mat <- model.matrix(design(dds), colData(dds))
# build the fitted splines and log counts
library(ggplot2)
library(dplyr)
dat <- plotCounts(dds, 1, returnData=TRUE) %>%
mutate(logmu = design_mat %*% coef_mat[1,],
logcount = log2(count))
# plot it all together
dat %>%
ggplot(aes(condition, logcount)) +
geom_point() +
geom_line(aes(condition, logmu), col="dodgerblue")
@apugalde
Copy link

Thanks for your reply. Then, assuming a new model.matrix X, are the weights correctly calculated in this code?

# Coefficients
   beta_hat <- coef(dds)[gene_id, ]
# Fitted values (log scale)
   mu_log <- as.vector(X %*% beta_hat)
# Weigth matrix
   disp <- dispersions(dds[gene_id])
   W <- diag(as.vector( 1 / (1/mu_log + disp) ))
# VCov matrix
   VAR <- solve(t(X) %*% W %*% X)
# SE (log scale)
   se_log <- sqrt(diag(X %*% VAR %*% t(X)))

You can obtain SE for the fitted values with

x' Cov(beta-hat) x

You can compute Cov manually using X (note, which has orthogonal columns with splines) and the GLM weights (see paper or code). Or more simply you can use glm.nb() with fixed dispersions(dds) and the log of size factor as offset.

@mikelove
Copy link
Author

This is correct for W. But then you need to use the formula from paper for Cov(beta hat)

IMG_3319

I think glm.nb() approach may be easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment