Gaussian derivatives applied to Smemo’s data

Last updated: 2017-12-21

Code version: 6e42447

Introduction

Re-analyze Smemo et al 2014’s mouse heart RNA-seq data after discussion with Matthew.

counts.mat = read.table("../data/smemo.txt", header = T, row.name = 1)
counts.mat = counts.mat[, -5]

Gene selection

Only use genes with total counts of \(4\) samples \(\geq 5\).

counts = counts.mat[rowSums(counts.mat) >= 5, ]
design = model.matrix(~c(0, 0, 1, 1))

Number of selected genes: 17191

Summary statistics

source("../code/count_to_summary.R")
summary <- count_to_summary(counts, design)
betahat <- summary$betahat
sebetahat <- summary$sebetahat
z <- summary$z

Fitting \(z\) with Gaussian derivatives

With stretch GD can fit \(z\) scores, but it seems there should be signals.

GD Coefficients:

0 : 1 ; 1 : 0.0112765123515518 ; 2 : 1.60751487400088 ; 3 : 0.361378958092878 ; 4 : 1.6578925761474 ; 5 : 0.670189379060511 ; 6 : 0.759975038796516 ; 7 : 0.557659272292095 ; 8 : -0.0586994517222567 ; 9 : 0.175073181131895 ; 10 : -0.132350826272878 ;

Discovered by `BH` and `ASH`

Feeding summary statistics to BH and ASH, both give thousands of discoveries.

fit.BH = p.adjust((1 - pnorm(abs(z))) * 2, method = "BH")
## Number of discoveries by BH
sum(fit.BH <= 0.05)

[1] 2541

fit.ash = ashr::ash(betahat, sebetahat, method = "fdr")
## Number of discoveries by ASH
sum(get_svalue(fit.ash) <= 0.05)

[1] 6440

Fitting `ASH` first or Gaussian derivatives first

Using default setting \(L = 10\), \(\lambda = 10\), \(\rho = 0.5\), compare the GD-ASH results by fitting ASH first vs fitting GD first. They indeed arrive at different local minima.

fit.gdash.ASH <- gdash(betahat, sebetahat,
                       gd.priority = FALSE)
## Regularized log-likelihood by fitting ASH first
fit.gdash.ASH$loglik

[1] -12483.86

fit.gdash.GD <- gdash(betahat, sebetahat)
## Regularized log-likelihood by fitting GD first
fit.gdash.GD$loglik

[1] -22136.92

`GD-ASH` with larger penalties on \(w\)

Using \(\lambda = 50\), \(\rho = 0.1\), fitting ASH first and GD first give the same result, and produce 1400+ discoveries with \(q\) values \(\leq 0.05\), all of which are discovered by BH.

L = 10
lambda = 50
rho = 0.1
fit.gdash.ASH <- gdash(betahat, sebetahat,
                       gd.ord = L, w.lambda = lambda, w.rho = rho,
                       gd.priority = FALSE)
## Regularized log-likelihood by fitting ASH first
fit.gdash.ASH$loglik

[1] -13651.59

## Number of discoveries
sum(fit.gdash.ASH$qvalue <= 0.05)

[1] 1431

fit.gdash.GD <- gdash(betahat, sebetahat,
                      gd.ord = L, w.lambda = lambda, w.rho = rho,
                      gd.priority = TRUE)
## Regularized log-likelihood by fitting GD first
fit.gdash.GD$loglik

[1] -13651.59

## Number of discoveries
sum(fit.gdash.GD$qvalue <= 0.05)

[1] 1431

GD Coefficients:

0 : 1 ; 1 : -0.0475544308510135 ; 2 : 0.707888470469342 ; 3 : 0.149489828947119 ; 4 : -8.97499076623316e-14 ; 5 : 0.109281416075664 ; 6 : -3.00530934822662e-13 ; 7 : 0.0783545592042359 ; 8 : -2.99572304462426e-13 ; 9 : 0.0911488252640105 ; 10 : -2.99578347875936e-13 ;

Session information

sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ashr_2.2-2        Rmosek_8.0.69     PolynomF_1.0-1    CVXR_0.94-4      
[5] REBayes_1.2       Matrix_1.2-12     SQUAREM_2017.10-1 EQL_1.0-0        
[9] ttutils_1.0-1    

loaded via a namespace (and not attached):
 [1] gmp_0.5-13.1      Rcpp_0.12.14      compiler_3.4.3   
 [4] git2r_0.20.0      R.methodsS3_1.7.1 R.utils_2.6.0    
 [7] iterators_1.0.9   tools_3.4.3       digest_0.6.13    
[10] bit_1.1-12        evaluate_0.10.1   lattice_0.20-35  
[13] foreach_1.4.4     yaml_2.1.16       parallel_3.4.3   
[16] Rmpfr_0.6-1       ECOSolveR_0.3-2   stringr_1.2.0    
[19] knitr_1.17        locfit_1.5-9.1    rprojroot_1.3-1  
[22] bit64_0.9-7       grid_3.4.3        R6_2.2.2         
[25] rmarkdown_1.8     limma_3.34.4      edgeR_3.20.2     
[28] magrittr_1.5      MASS_7.3-47       backports_1.1.2  
[31] codetools_0.2-15  htmltools_0.3.6   scs_1.1-1        
[34] assertthat_0.2.0  stringi_1.1.6     pscl_1.5.2       
[37] doParallel_1.0.11 truncnorm_1.0-7   R.oo_1.21.0

This R Markdown site was created with workflowr

Gaussian derivatives applied to Smemo’s data

Lei Sun

2017-06-17

Introduction

Gene selection

Summary statistics

Fitting \(z\) with Gaussian derivatives

Discovered by `BH` and `ASH`

Fitting `ASH` first or Gaussian derivatives first

`GD-ASH` with larger penalties on \(w\)

Session information

Gaussian derivatives applied to Smemo’s data

Lei Sun

2017-06-17

Introduction

Gene selection

Summary statistics

Fitting \(z\) with Gaussian derivatives

Discovered by BH and ASH

Fitting ASH first or Gaussian derivatives first

GD-ASH with larger penalties on \(w\)

Session information

Discovered by `BH` and `ASH`

Fitting `ASH` first or Gaussian derivatives first

`GD-ASH` with larger penalties on \(w\)