5 SIP metagenomics and identification of isotope incorporators
The isotope incorporators could be identified using the below methods:
- AFE mathematical model by Hungate et al. (2015) or
- Delta buoyant density \(AFE_{ΔBD}\) = \(\frac{W_{Lab}- W_{Light}}{I_{max}}\).
Where Imax is the maximum linear shift in DNA BD (upon 100% labeling), as discussed by Birnie and Rickwood (1978) - MW-HR-SIP Youngblut et al.
- Delta buoyant density \(AFE_{ΔBD}\) = \(\frac{W_{Lab}- W_{Light}}{I_{max}}\).
While the first two methods gives a quantitative estimate of isotopic enrichment in a taxon to determine if it is an incorporator, the third method relies on differential abundance analysis over multiple overlapping windows. Please refer to Youngblut et al. and HTS-SIP R package for more details.
5.1 Comparison of the methods
5.1.1 AFE method
Isotope incorporators from the introductory vignette were identified using the AFE method Reproducing the code here, we have
= SIPmg::qSIP_atom_excess_MAGs(phylo.qSIP,
atomX control_expr='Isotope=="12C"',
treatment_rep='Replicate',
Gi = GC_content)
#Bootstrap confidence intervals
= SIPmg::qSIP_bootstrap_fcr(atomX, n_boot=10)
df_atomX_boot = 0
CI_threshold = df_atomX_boot %>%
df_atomX_boot ::mutate(Incorporator_qSIP = A_CI_fcr_low > CI_threshold,
dplyrIncorporator_delbd = A_delbd - A_delbd_sd > 0,
OTU = stats::reorder(OTU, -A))
= df_atomX_boot %>%
df_atomX_boot ::inner_join(taxonomy_tibble %>%
dplyr::select(user_genome, classification) %>%
dplyr::rename(OTU = user_genome)) dplyr
5.1.2 MW-HR-SIP
Check incorporators in the overlapping windows of 1.71 - 1.74; 1.72 - 1.75; 1.73 - 1.76. The windows must be chosen in a more judicious manner that fits the hypothesis of the study.
= data.frame(density_min=c(1.71,1.72, 1.73),
windows density_max=c(1.74,1.75,1.76))
= 0.05
padj_cutoff #ncores = 6
#doParallel::registerDoParallel(ncores)
= SIPmg::HRSIP(physeq = phylo.qSIP, design = ~Isotope,
mw.hr.sip density_windows = windows,
sparsity_threshold = seq(0, 0.3, 0.05),
padj_cutoff = padj_cutoff)
= mw.hr.sip %>%
mw.hr.sip ::mutate(incorp = padj < padj_cutoff) dplyr
Compare number of incorporators
#Get incorporator info
= df_atomX_boot %>%
qSIP_incorp ::select(OTU, classification, A, A_sd, Incorporator_qSIP) %>%
dplyr::filter(Incorporator_qSIP == TRUE) %>%
dplyr::select(-classification)
dplyr= nrow(qSIP_incorp)
n_qSIP_incorp
= df_atomX_boot %>%
delbd_incorp ::select(OTU, classification, A_delbd, A_delbd_sd, Incorporator_delbd) %>%
dplyr::filter(Incorporator_delbd == TRUE) %>%
dplyr::select(-classification)
dplyr= nrow(delbd_incorp)
n_delbd_incorp
= mw.hr.sip %>%
mw.hr.sip_incorp ::select(OTU, taxa, incorp) %>%
dplyr::filter(incorp == TRUE) %>%
dplyr::rename("Incorporator_mw_hr.sip" = "incorp") %>%
dplyr::select(-taxa)
dplyr= nrow(mw.hr.sip_incorp)
n_mw.hr.sip_incorp
= dplyr::full_join(qSIP_incorp, dplyr::full_join(delbd_incorp, mw.hr.sip_incorp, by = "OTU"), by = "OTU")
all_incorp_tibble
#Print incorporator information
cat('Number of incorporators detected by qSIP:', n_qSIP_incorp, '\n')
## Number of incorporators detected by qSIP: 25
cat('Number of incorporators detected by ΔBD:', n_delbd_incorp, '\n')
## Number of incorporators detected by ΔBD: 25
cat('Number of incorporators detected by MW-HR-SIP:', n_mw.hr.sip_incorp, '\n')
## Number of incorporators detected by MW-HR-SIP: 6
::paged_table(all_incorp_tibble) rmarkdown
It is evident here that the AFE estimates using qSIP model or the ΔBD method are not the same, however, both provide the same incorporator information. Although the incorporators may not be the same everytime, we found an overlap between qSIP and ΔBD methods in our metagenome datasets. We also found that qSIP model provided more accurate estimates of AFE compared to ΔBD method. It also appears that MW-HR-SIP outputs lower number of incorporators. In our study and by Youngblut et al., MW-HR-SIP was found to have better sensitivity compared to qSIP or ΔBD methods. It is interesting that the AFE based methods and differential abundance based methods do not provide the same incorporator identity, which is perhaps due to comparistive analysis in only the multiple overlapping windows in the MW-HR-SIP method, while the AFE methods examine the complete BD gradient. Due to the higher sensitivity of the MW-HR-SIP method, we attempted a sequential analysis with first MW-HR-SIP to eliminate likely false positives, and then perform qSIP to estimate AFE. Such a method reduces multiple comparisons and increases confidence in detecting true incorporators due to higher statistical power. We are merely providing the different methods for the user to decide which method to apply for detecting incorporators in their study. The choice of the method is to the user’s discretion that fits best their study. ## Session information
sessionInfo()
## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_Canada.utf8 LC_CTYPE=English_Canada.utf8
## [3] LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C
## [5] LC_TIME=English_Canada.utf8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] SIPmg_1.4 ggpubr_0.5.0 HTSSIP_1.4.1
## [4] phyloseq_1.42.0 forcats_0.5.2 stringr_1.5.0
## [7] dplyr_1.0.10 purrr_0.3.5 readr_2.1.3
## [10] tidyr_1.2.1 tibble_3.1.8 ggplot2_3.4.0
## [13] tidyverse_1.3.2 BiocManager_1.30.19
##
## loaded via a namespace (and not attached):
## [1] readxl_1.4.1 backports_1.4.1
## [3] plyr_1.8.8 igraph_1.3.5
## [5] lazyeval_0.2.2 splines_4.2.2
## [7] BiocParallel_1.32.4 GenomeInfoDb_1.34.4
## [9] digest_0.6.30 foreach_1.5.2
## [11] htmltools_0.5.4 fansi_1.0.3
## [13] magrittr_2.0.3 memoise_2.0.1
## [15] googlesheets4_1.0.1 cluster_2.1.4
## [17] tzdb_0.3.0 Biostrings_2.66.0
## [19] annotate_1.76.0 modelr_0.1.10
## [21] matrixStats_0.63.0 vroom_1.6.0
## [23] timechange_0.1.1 colorspace_2.0-3
## [25] blob_1.2.3 rvest_1.0.3
## [27] haven_2.5.1 xfun_0.35
## [29] crayon_1.5.2 RCurl_1.98-1.9
## [31] jsonlite_1.8.4 survival_3.4-0
## [33] iterators_1.0.14 ape_5.6-2
## [35] glue_1.6.2 gtable_0.3.1
## [37] gargle_1.2.1 zlibbioc_1.44.0
## [39] XVector_0.38.0 DelayedArray_0.24.0
## [41] car_3.1-1 Rhdf5lib_1.20.0
## [43] BiocGenerics_0.44.0 abind_1.4-5
## [45] scales_1.2.1 DBI_1.1.3
## [47] rstatix_0.7.1 Rcpp_1.0.9
## [49] xtable_1.8-4 bit_4.0.5
## [51] stats4_4.2.2 httr_1.4.4
## [53] RColorBrewer_1.1-3 ellipsis_0.3.2
## [55] farver_2.1.1 pkgconfig_2.0.3
## [57] XML_3.99-0.13 sass_0.4.4
## [59] dbplyr_2.2.1 locfit_1.5-9.6
## [61] utf8_1.2.2 labeling_0.4.2
## [63] polynom_1.4-1 tidyselect_1.2.0
## [65] rlang_1.0.6 reshape2_1.4.4
## [67] AnnotationDbi_1.60.0 munsell_0.5.0
## [69] cellranger_1.1.0 tools_4.2.2
## [71] cachem_1.0.6 cli_3.4.1
## [73] generics_0.1.3 RSQLite_2.2.19
## [75] ade4_1.7-20 broom_1.0.1
## [77] evaluate_0.19 biomformat_1.26.0
## [79] fastmap_1.1.0 yaml_2.3.6
## [81] knitr_1.41 bit64_4.0.5
## [83] fs_1.5.2 KEGGREST_1.38.0
## [85] nlme_3.1-160 xml2_1.3.3
## [87] compiler_4.2.2 rstudioapi_0.14
## [89] png_0.1-8 ggsignif_0.6.4
## [91] reprex_2.0.2 geneplotter_1.76.0
## [93] bslib_0.4.2 stringi_1.7.8
## [95] highr_0.10 lattice_0.20-45
## [97] Matrix_1.5-3 vegan_2.6-4
## [99] permute_0.9-7 multtest_2.54.0
## [101] vctrs_0.5.1 pillar_1.8.1
## [103] lifecycle_1.0.3 rhdf5filters_1.10.0
## [105] jquerylib_0.1.4 data.table_1.14.6
## [107] bitops_1.0-7 GenomicRanges_1.50.1
## [109] R6_2.5.1 bookdown_0.31
## [111] IRanges_2.32.0 codetools_0.2-18
## [113] MASS_7.3-58.1 assertthat_0.2.1
## [115] rhdf5_2.42.0 SummarizedExperiment_1.28.0
## [117] DESeq2_1.38.1 withr_2.5.0
## [119] S4Vectors_0.36.1 GenomeInfoDbData_1.2.9
## [121] mgcv_1.8-41 parallel_4.2.2
## [123] hms_1.1.2 grid_4.2.2
## [125] rmarkdown_2.19 MatrixGenerics_1.10.0
## [127] carData_3.0-5 googledrive_2.0.0
## [129] Biobase_2.58.0 lubridate_1.9.0