5 SIP metagenomics and identification of isotope incorporators

The isotope incorporators could be identified using the below methods:

AFE mathematical model by Hungate et al. (2015) or
- Delta buoyant density \(AFE_{ΔBD}\) = \(\frac{W_{Lab}- W_{Light}}{I_{max}}\).
  Where Imax is the maximum linear shift in DNA BD (upon 100% labeling), as discussed by Birnie and Rickwood (1978)
- MW-HR-SIP Youngblut et al.

While the first two methods gives a quantitative estimate of isotopic enrichment in a taxon to determine if it is an incorporator, the third method relies on differential abundance analysis over multiple overlapping windows. Please refer to Youngblut et al. and HTS-SIP R package for more details.

5.1 Comparison of the methods

5.1.1 AFE method

Isotope incorporators from the introductory vignette were identified using the AFE method Reproducing the code here, we have

atomX = SIPmg::qSIP_atom_excess_MAGs(phylo.qSIP,
                               control_expr='Isotope=="12C"',
                               treatment_rep='Replicate',
                               Gi = GC_content)
#Bootstrap confidence intervals
df_atomX_boot = SIPmg::qSIP_bootstrap_fcr(atomX, n_boot=10)
CI_threshold = 0
df_atomX_boot = df_atomX_boot %>%
  dplyr::mutate(Incorporator_qSIP = A_CI_fcr_low > CI_threshold,
                Incorporator_delbd = A_delbd - A_delbd_sd > 0,
         OTU = stats::reorder(OTU, -A))

df_atomX_boot = df_atomX_boot %>%
  dplyr::inner_join(taxonomy_tibble %>% 
               dplyr::select(user_genome, classification) %>%
               dplyr::rename(OTU = user_genome))

5.1.2 MW-HR-SIP

Check incorporators in the overlapping windows of 1.71 - 1.74; 1.72 - 1.75; 1.73 - 1.76. The windows must be chosen in a more judicious manner that fits the hypothesis of the study.

windows = data.frame(density_min=c(1.71,1.72, 1.73), 
                     density_max=c(1.74,1.75,1.76))

padj_cutoff = 0.05
#ncores = 6
#doParallel::registerDoParallel(ncores)

mw.hr.sip = SIPmg::HRSIP(physeq = phylo.qSIP, design = ~Isotope,
                                    density_windows = windows,
                                    sparsity_threshold = seq(0, 0.3, 0.05), 
                                    padj_cutoff = padj_cutoff)

mw.hr.sip = mw.hr.sip %>%
  dplyr::mutate(incorp = padj < padj_cutoff)

Compare number of incorporators

#Get incorporator info
qSIP_incorp = df_atomX_boot %>%
  dplyr::select(OTU, classification, A, A_sd, Incorporator_qSIP) %>%
  dplyr::filter(Incorporator_qSIP == TRUE) %>%
  dplyr::select(-classification)
n_qSIP_incorp = nrow(qSIP_incorp)

delbd_incorp = df_atomX_boot %>%
  dplyr::select(OTU, classification, A_delbd, A_delbd_sd, Incorporator_delbd) %>%
  dplyr::filter(Incorporator_delbd == TRUE) %>%
  dplyr::select(-classification)
n_delbd_incorp = nrow(delbd_incorp)

mw.hr.sip_incorp = mw.hr.sip %>%
  dplyr::select(OTU, taxa, incorp) %>%
  dplyr::filter(incorp == TRUE) %>%
  dplyr::rename("Incorporator_mw_hr.sip" = "incorp") %>%
  dplyr::select(-taxa)
n_mw.hr.sip_incorp = nrow(mw.hr.sip_incorp)

all_incorp_tibble = dplyr::full_join(qSIP_incorp, dplyr::full_join(delbd_incorp, mw.hr.sip_incorp, by = "OTU"), by = "OTU")

#Print incorporator information
cat('Number of incorporators detected by qSIP:', n_qSIP_incorp, '\n')

## Number of incorporators detected by qSIP: 25

cat('Number of incorporators detected by ΔBD:', n_delbd_incorp, '\n')

## Number of incorporators detected by ΔBD: 25

cat('Number of incorporators detected by MW-HR-SIP:', n_mw.hr.sip_incorp, '\n')

## Number of incorporators detected by MW-HR-SIP: 6

rmarkdown::paged_table(all_incorp_tibble)

It is evident here that the AFE estimates using qSIP model or the ΔBD method are not the same, however, both provide the same incorporator information. Although the incorporators may not be the same everytime, we found an overlap between qSIP and ΔBD methods in our metagenome datasets. We also found that qSIP model provided more accurate estimates of AFE compared to ΔBD method. It also appears that MW-HR-SIP outputs lower number of incorporators. In our study and by Youngblut et al., MW-HR-SIP was found to have better sensitivity compared to qSIP or ΔBD methods. It is interesting that the AFE based methods and differential abundance based methods do not provide the same incorporator identity, which is perhaps due to comparistive analysis in only the multiple overlapping windows in the MW-HR-SIP method, while the AFE methods examine the complete BD gradient. Due to the higher sensitivity of the MW-HR-SIP method, we attempted a sequential analysis with first MW-HR-SIP to eliminate likely false positives, and then perform qSIP to estimate AFE. Such a method reduces multiple comparisons and increases confidence in detecting true incorporators due to higher statistical power. We are merely providing the different methods for the user to decide which method to apply for detecting incorporators in their study. The choice of the method is to the user’s discretion that fits best their study. ## Session information

sessionInfo()

## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8   
## [3] LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C                   
## [5] LC_TIME=English_Canada.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] SIPmg_1.4           ggpubr_0.5.0        HTSSIP_1.4.1       
##  [4] phyloseq_1.42.0     forcats_0.5.2       stringr_1.5.0      
##  [7] dplyr_1.0.10        purrr_0.3.5         readr_2.1.3        
## [10] tidyr_1.2.1         tibble_3.1.8        ggplot2_3.4.0      
## [13] tidyverse_1.3.2     BiocManager_1.30.19
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.4.1                backports_1.4.1            
##   [3] plyr_1.8.8                  igraph_1.3.5               
##   [5] lazyeval_0.2.2              splines_4.2.2              
##   [7] BiocParallel_1.32.4         GenomeInfoDb_1.34.4        
##   [9] digest_0.6.30               foreach_1.5.2              
##  [11] htmltools_0.5.4             fansi_1.0.3                
##  [13] magrittr_2.0.3              memoise_2.0.1              
##  [15] googlesheets4_1.0.1         cluster_2.1.4              
##  [17] tzdb_0.3.0                  Biostrings_2.66.0          
##  [19] annotate_1.76.0             modelr_0.1.10              
##  [21] matrixStats_0.63.0          vroom_1.6.0                
##  [23] timechange_0.1.1            colorspace_2.0-3           
##  [25] blob_1.2.3                  rvest_1.0.3                
##  [27] haven_2.5.1                 xfun_0.35                  
##  [29] crayon_1.5.2                RCurl_1.98-1.9             
##  [31] jsonlite_1.8.4              survival_3.4-0             
##  [33] iterators_1.0.14            ape_5.6-2                  
##  [35] glue_1.6.2                  gtable_0.3.1               
##  [37] gargle_1.2.1                zlibbioc_1.44.0            
##  [39] XVector_0.38.0              DelayedArray_0.24.0        
##  [41] car_3.1-1                   Rhdf5lib_1.20.0            
##  [43] BiocGenerics_0.44.0         abind_1.4-5                
##  [45] scales_1.2.1                DBI_1.1.3                  
##  [47] rstatix_0.7.1               Rcpp_1.0.9                 
##  [49] xtable_1.8-4                bit_4.0.5                  
##  [51] stats4_4.2.2                httr_1.4.4                 
##  [53] RColorBrewer_1.1-3          ellipsis_0.3.2             
##  [55] farver_2.1.1                pkgconfig_2.0.3            
##  [57] XML_3.99-0.13               sass_0.4.4                 
##  [59] dbplyr_2.2.1                locfit_1.5-9.6             
##  [61] utf8_1.2.2                  labeling_0.4.2             
##  [63] polynom_1.4-1               tidyselect_1.2.0           
##  [65] rlang_1.0.6                 reshape2_1.4.4             
##  [67] AnnotationDbi_1.60.0        munsell_0.5.0              
##  [69] cellranger_1.1.0            tools_4.2.2                
##  [71] cachem_1.0.6                cli_3.4.1                  
##  [73] generics_0.1.3              RSQLite_2.2.19             
##  [75] ade4_1.7-20                 broom_1.0.1                
##  [77] evaluate_0.19               biomformat_1.26.0          
##  [79] fastmap_1.1.0               yaml_2.3.6                 
##  [81] knitr_1.41                  bit64_4.0.5                
##  [83] fs_1.5.2                    KEGGREST_1.38.0            
##  [85] nlme_3.1-160                xml2_1.3.3                 
##  [87] compiler_4.2.2              rstudioapi_0.14            
##  [89] png_0.1-8                   ggsignif_0.6.4             
##  [91] reprex_2.0.2                geneplotter_1.76.0         
##  [93] bslib_0.4.2                 stringi_1.7.8              
##  [95] highr_0.10                  lattice_0.20-45            
##  [97] Matrix_1.5-3                vegan_2.6-4                
##  [99] permute_0.9-7               multtest_2.54.0            
## [101] vctrs_0.5.1                 pillar_1.8.1               
## [103] lifecycle_1.0.3             rhdf5filters_1.10.0        
## [105] jquerylib_0.1.4             data.table_1.14.6          
## [107] bitops_1.0-7                GenomicRanges_1.50.1       
## [109] R6_2.5.1                    bookdown_0.31              
## [111] IRanges_2.32.0              codetools_0.2-18           
## [113] MASS_7.3-58.1               assertthat_0.2.1           
## [115] rhdf5_2.42.0                SummarizedExperiment_1.28.0
## [117] DESeq2_1.38.1               withr_2.5.0                
## [119] S4Vectors_0.36.1            GenomeInfoDbData_1.2.9     
## [121] mgcv_1.8-41                 parallel_4.2.2             
## [123] hms_1.1.2                   grid_4.2.2                 
## [125] rmarkdown_2.19              MatrixGenerics_1.10.0      
## [127] carData_3.0-5               googledrive_2.0.0          
## [129] Biobase_2.58.0              lubridate_1.9.0