Created
June 15, 2023 06:17
-
-
Save mikelove/09cdb828392bfa7ee88d34fa13d24ae3 to your computer and use it in GitHub Desktop.
Drawing DESeq2 fitted spline curves on top of scaled counts
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| library(splines) | |
| library(DESeq2) | |
| # make some demo data | |
| dds <- makeExampleDESeqDataSet(n=100, m=40) | |
| dds$condition <- sort(runif(40)) | |
| # make one gene where expression has a curve shape (just for demo) | |
| s_shape <- round(500 * sin(dds$condition*2*pi) + 1000 + rnorm(40,0,50)) | |
| mode(s_shape) <- "integer" | |
| counts(dds)[1,] <- s_shape | |
| # just for visualization | |
| # next line just for demo, use standard size factors for real data | |
| sizeFactors(dds) <- rep(1,ncol(dds)) | |
| library(ggplot2) | |
| plotCounts(dds, 1, returnData=TRUE) %>% | |
| ggplot(aes(condition, count)) + | |
| geom_point() | |
| # how to fit curves in DESeq2 | |
| design(dds) <- ~ns(condition, df=5) | |
| dds <- DESeq(dds, test="LRT", reduced=~1) | |
| # expected results? | |
| res <- results(dds) | |
| hist(res$pvalue) | |
| plot(res$pvalue, log="y") | |
| # build the needed matrices for fitted spline | |
| coef_mat <- coef(dds) | |
| design_mat <- model.matrix(design(dds), colData(dds)) | |
| # build the fitted splines and log counts | |
| library(ggplot2) | |
| library(dplyr) | |
| dat <- plotCounts(dds, 1, returnData=TRUE) %>% | |
| mutate(logmu = design_mat %*% coef_mat[1,], | |
| logcount = log2(count)) | |
| # plot it all together | |
| dat %>% | |
| ggplot(aes(condition, logcount)) + | |
| geom_point() + | |
| geom_line(aes(condition, logmu), col="dodgerblue") |
Author
Thanks for your reply. Then, assuming a new model.matrix X, are the weights correctly calculated in this code?
# Coefficients
beta_hat <- coef(dds)[gene_id, ]
# Fitted values (log scale)
mu_log <- as.vector(X %*% beta_hat)
# Weigth matrix
disp <- dispersions(dds[gene_id])
W <- diag(as.vector( 1 / (1/mu_log + disp) ))
# VCov matrix
VAR <- solve(t(X) %*% W %*% X)
# SE (log scale)
se_log <- sqrt(diag(X %*% VAR %*% t(X)))You can obtain SE for the fitted values with
x' Cov(beta-hat) x
You can compute Cov manually using X (note, which has orthogonal columns with splines) and the GLM weights (see paper or code). Or more simply you can use glm.nb() with fixed dispersions(dds) and the log of size factor as offset.
Author
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment

You can obtain SE for the fitted values with
x' Cov(beta-hat) x
You can compute Cov manually using X (note, which has orthogonal columns with splines) and the GLM weights (see paper or code). Or more simply you can use glm.nb() with fixed dispersions(dds) and the log of size factor as offset.