Scientific Papers

Assessing and mitigating batch effects in large-scale omics studies | Genome Biology


  • Goh WWB, Yong CH, Wong L. Are batch effects still relevant in the age of big data? Trends Biotechnol. 2022;40:1029–40.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Cuklina J, Lee CH, Williams EG, Sajic T, Collins BC, Rodriguez Martinez M, et al. Diagnostics and correction of batch effects in large-scale proteomic studies: a tutorial. Mol Syst Biol. 2021;17:e10240.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Goh WWB, Wang W, Wong L. Why batch effects matter in omics data, and how to avoid them. Trends Biotechnol. 2017;35:498–507.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Lazar C, Meganck S, Taminau J, Steenhoff D, Coletta A, Molter C, et al. Batch effect removal methods for microarray gene expression data integration: a survey. Brief Bioinform. 2013;14:469–90.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Maceda I, Lao O. Analysis of the batch effect due to sequencing center in population statistics quantifying rare events in the 1000 genomes project. Genes (Basel). 2021;13:44.

    Article 
    PubMed 

    Google Scholar
     

  • Wickland DP, Ren Y, Sinnwell JP, Reddy JS, Pottier C, Sarangi V, et al. Impact of variant-level batch effects on identification of genetic risk factors in large sequencing studies. PLoS ONE. 2021;16:e0249305.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Anderson-Trocme L, Farouni R, Bourgey M, Kamatani Y, Higasa K, Seo JS, et al. Legacy data confound genomics studies. Mol Biol Evol. 2020;37:2–10.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Rasnic R, Brandes N, Zuk O, Linial M. Substantial batch effects in TCGA exome sequences undermine pan-cancer analysis of germline variants. BMC Cancer. 2019;19:783.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Mars RAT, Yang Y, Ward T, Houtti M, Priya S, Lekatz HR, et al. Longitudinal multi-omics reveals subset-specific mechanisms underlying irritable Bowel syndrome. Cell. 2020;183:1137–40.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Banchereau R, Hong S, Cantarel B, Baldwin N, Baisch J, Edens M, et al. Personalized immunomonitoring uncovers molecular networks that stratify lupus patients. Cell. 2016;165:1548–50.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Han W, Li L. Evaluating and minimizing batch effects in metabolomics. Mass Spectrom Rev. 2020;41:421–42.

    Article 
    PubMed 

    Google Scholar
     

  • Ugidos M, Nueda MJ, Prats-Montalban JM, Ferrer A, Conesa A, Tarazona S. MultiBaC: an R package to remove batch effects in multi-omic experiments. Bioinformatics. 2022;38:2657–8.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Zheng Y, Liu Y, Yang J, Dong L, Zhang R, Tian S, et al. Multi-omics data integration using ratio-based quantitative profiling with Quartet reference materials. Nat Biotechnol. 2023. https://doi.org/10.1038/s41587-41023-01934-41581.

  • Chen W, Zhao Y, Chen X, Yang Z, Xu X, Bi Y, et al. A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples. Nat Biotechnol. 2021;39:1103–14.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Tran HTN, Ang KS, Chevrier M, Zhang X, Lee NYS, Goh M, et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 2020;21:12.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Freedman LP, Inglese J. The increasing urgency for standards in basic biologic research. Cancer Res. 2014;74:4024–9.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Hao Y, Hao S, Andersen-Nissen E, Mauck WM 3rd, Zheng S, Butler A, et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184:3573–3587 e3529.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Eddy S, Mariani LH, Kretzler M. Integrated multi-omics approaches to improve classification of chronic kidney disease. Nat Rev Nephrol. 2020;16:657–68.

    Article 
    PubMed 

    Google Scholar
     

  • Hasin Y, Seldin M, Lusis A. Multi-omics approaches to disease. Genome Biol. 2017;18:83.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Rosellini M, Marchetti A, Mollica V, Rizzo A, Santoni M, Massari F. Prognostic and predictive biomarkers for immunotherapy in advanced renal cell carcinoma. Nat Rev Urol. 2023;20:133–57.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Hassan M, Awan FM, Naz A, deAndres-Galiana EJ, Alvarez O, Cernea A, et al. Innovations in genomics and big data analytics for personalized medicine and health care: a review. Int J Mol Sci. 2022;23:4645.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Jiang P, Sinha S, Aldape K, Hannenhalli S, Sahinalp C, Ruppin E. Big data in basic and translational cancer research. Nat Rev Cancer. 2022;22:625–39.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Montaner J, Ramiro L, Simats A, Tiedt S, Makris K, Jickling GC, et al. Multilevel omics for the discovery of biomarkers and therapeutic targets for stroke. Nat Rev Neurol. 2020;16:247–64.

    Article 
    PubMed 

    Google Scholar
     

  • Li Y, Ma Y, Wang K, Zhang M, Wang Y, Liu X, et al. Using composite phenotypes to reveal hidden physiological heterogeneity in high-altitude acclimatization in a Chinese Han longitudinal cohort. Phenomics. 2021;1:3–14.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Xia Q, Thompson JA, Koestler DC. pwrBRIDGE: a user-friendly web application for power and sample size estimation in batch-confounded microarray studies with dependent samples. Stat Appl Genet Mol Biol. 2022;21:20220003.

    Article 
    PubMed 

    Google Scholar
     

  • Chen G, Ning B, Shi T. Single-cell RNA-seq technologies and related computational data analysis. Front Genet. 2019;10:317.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Yip SH, Sham PC, Wang J. Evaluation of tools for highly variable gene discovery from single-cell RNA-seq data. Brief Bioinform. 2019;20:1583–9.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Phua SX, Lim KP, Goh WW. Perspectives for better batch effect correction in mass-spectrometry-based proteomics. Comput Struct Biotechnol J. 2022;20:4369–75.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, Johnson WE, et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet. 2010;11:733–9.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Gagnon-Bartsch JA, Speed TP. Using control genes to correct for unwanted variation in microarray data. Biostatistics. 2012;13:539–52.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8:118–27.

    Article 
    PubMed 

    Google Scholar
     

  • Yu Y, Zhang N, Mai Y, Chen Q, Cao Z, Chen Q, et al. Correcting batch effects in large-scale multiomic studies using a reference-material-based ratio method. Genome Biol. 2023;24:201.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Zhou W, Koudijs KKM, Bohringer S. Influence of batch effect correction methods on drug induced differential gene expression profiles. BMC Bioinformatics. 2019;20:437.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Li S, Labaj PP, Zumbo P, Sykacek P, Shi W, Shi L, et al. Detecting and correcting systematic variation in large-scale RNA sequencing data. Nat Biotechnol. 2014;32:888–95.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Li S, Tighe SW, Nicolet CM, Grove D, Levy S, Farmerie W, et al. Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study. Nat Biotechnol. 2014;32:915–25.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Luo J, Schumacher M, Scherer A, Sanoudou D, Megherbi D, Davison T, et al. A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data. Pharmacogenomics J. 2010;10:278–91.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Cardoso F, van’t Veer LJ, Bogaerts J, Slaets L, Viale G, Delaloge S, et al. 70-gene signature as an aid to treatment decisions in early-stage breast cancer. N Engl J Med. 2016;375:717–29.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Lin S, Lin Y, Nery JR, Urich MA, Breschi A, Davis CA, et al. Comparison of the transcriptional landscapes between human and mouse tissues. Proc Natl Acad Sci U S A. 2014;111:17224–9.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Gilad Y, MizrahiMan O. A reanalysis of mouse ENCODE comparative gene expression data. F1000Res. 2015;4:121.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Mullard A. Half of top cancer studies fail high-profile reproducibility effort. Nature. 2021;600:368–9.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Errington TM, Mathur M, Soderberg CK, Denis A, Perfito N, Iorns E, et al. Investigating the replicability of preclinical cancer biology. Elife. 2021;10:e71601.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Baker M. 1,500 scientists lift the lid on reproducibility. Nature. 2016;533:452–4.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Freedman LP, Cockburn IM, Simcoe TS. The economics of reproducibility in preclinical research. PLoS Biol. 2015;13:e1002165.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Zhang S, Li X, Zhao S, Drobizhev M, Ai HW. Retraction note: a fast, high-affinity fluorescent serotonin biosensor engineered from a tick lipocalin. Nat Methods. 2021;18:575.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Yano Y, Mitoma N, Matsushima K, Wang F, Matsui K, Takakura A, et al. Retraction note: living annulative pi-extension polymerization for graphene nanoribbon synthesis. Nature. 2020;588:180.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Zhang S, Li X, Zhao S, Drobizhev M, Ai HW. A fast, high-affinity fluorescent serotonin biosensor engineered from a tick lipocalin. Nat Methods. 2021;18:258–61.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Errington TM, Denis A, Perfito N, Iorns E, Nosek BA. Challenges for assessing replicability in preclinical cancer biology. Elife. 2021;10:10.


    Google Scholar
     

  • Foox J, Tighe SW, Nicolet CM, Zook JM, Byrska-Bishop M, Clarke WE, et al. Performance assessment of DNA sequencing platforms in the ABRF next-generation sequencing study. Nat Biotechnol. 2021;39:1129–40.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Molania R, Foroutan M, Gagnon-Bartsch JA, Gandolfo LC, Jain A, Sinha A, et al. Removing unwanted variation from large-scale RNA sequencing data with PRPS. Nat Biotechnol. 2023;41:82–95.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Freedman LP, Venugopalan G, Wisman R. Reproducibility 2020: progress and priorities. F1000Res. 2017;6:604.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Wang C, Gong B, Bushel PR, Thierry-Mieg J, Thierry-Mieg D, Xu J, et al. The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance. Nat Biotechnol. 2014;32:926–32.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Lippi G, Chance JJ, Church S, Dazzi P, Fontana R, Giavarina D, et al. Preanalytical quality improvement: from dream to reality. Clin Chem Lab Med. 2011;49:1113–26.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Su Y, Chen D, Yuan D, Lausted C, Choi J, Dai CL, et al. Multi-omics resolves a sharp disease-state shift between mild and moderate COVID-19. Cell. 2020;183(1479–1495):e1420.


    Google Scholar
     

  • Geyer PE, Holdt LM, Teupser D, Mann M. Revisiting biomarker discovery by plasma proteomics. Mol Syst Biol. 2017;13:942.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Halvey P, Farutin V, Koppes L, Gunay NS, Pappas DA, Manning AM, et al. Variable blood processing procedures contribute to plasma proteomic variability. Clin Proteomics. 2021;18:5.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Abraham RA, Agrawal PK, Acharya R, Sarna A, Ramesh S, Johnston R, et al. Effect of temperature and time delay in centrifugation on stability of select biomarkers of nutrition and non-communicable diseases in blood samples. Biochem Med (Zagreb). 2019;29:020708.

    Article 
    PubMed 

    Google Scholar
     

  • Jonasdottir HS, Brouwers H, Toes REM, Ioan-Facsinay A, Giera M. Effects of anticoagulants and storage conditions on clinical oxylipid levels in human plasma. Biochim Biophys Acta Mol Cell Biol Lipids. 2018;1863:1511–22.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Oddoze C, Lombard E, Portugal H. Stability study of 81 analytes in human whole blood, in serum and in plasma. Clin Biochem. 2012;45:464–9.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Xue VW, Ng SSM, Leung WW, Ma BBY, Cho WCS, Au TCC, et al. The effect of centrifugal force in quantification of colorectal cancer-related mRNA in plasma using targeted sequencing. Front Genet. 2018;9:165.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Wong SC, Ma BB, Lai PB, Ng SS, Lee JF, Hui EP, et al. The effect of centrifugation on circulating mRNA quantitation opens up a new scenario in expression profiling from patients with metastatic colorectal cancer. Clin Biochem. 2007;40:1277–84.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Zimmermann M, Traxler D, Simader E, Bekos C, Dieplinger B, Lainscak M, et al. In vitro stability of heat shock protein 27 in serum and plasma under different pre-analytical conditions: implications for large-scale clinical studies. Ann Lab Med. 2016;36:353–7.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Lippi G, Lima-Oliveira G, Brocco G, Bassi A, Salvagno GL. Estimating the intra- and inter-individual imprecision of manual pipetting. Clin Chem Lab Med. 2017;55:962–6.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Bobryk S, Goossen L. Variation in pipetting may lead to the decreased detection of antibodies in manual gel testing. Clin Lab Sci. 2011;24:161–6.

    Article 
    PubMed 

    Google Scholar
     

  • Pandya K, Ray CA, Brunner L, Wang J, Lee JW, DeSilva B. Strategies to minimize variability and bias associated with manual pipetting in ligand binding assays to assure data quality of protein therapeutic quantification. J Pharm Biomed Anal. 2010;53:623–30.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Ambardar S, Gupta R, Trakroo D, Lal R, Vakhlu J. High throughput sequencing: an overview of sequencing chemistry. Indian J Microbiol. 2016;56:394–404.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Xiao W, Ren L, Chen Z, Fang LT, Zhao Y, Lack J, et al. Toward best practice in cancer mutation detection with whole-genome and whole-exome sequencing. Nat Biotechnol. 2021;39:1141–50.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Stark R, Grzelak M, Hadfield J. RNA sequencing: the teenage years. Nat Rev Genet. 2019;20:631–56.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, et al. A survey of best practices for RNA-seq data analysis. Genome Biol. 2016;17:13.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Yu Y, Hou W, Wang H, Dong L, Liu Y, Sun S, et al. Quartet RNA reference materials improve the quality of transcriptomic data through ratio-based profiling. Nat Biotechnol. 2023. https://doi.org/10.1038/s41587-41023-01867-41589.

  • Mereu E, Lafzi A, Moutinho C, Ziegenhain C, McCarthy DJ, Alvarez-Varela A, et al. Benchmarking single-cell RNA-sequencing protocols for cell atlas projects. Nat Biotechnol. 2020;38:747–55.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Dal Molin A, Di Camillo B. How to design a single-cell RNA-sequencing experiment: pitfalls, challenges and perspectives. Brief Bioinform. 2019;20:1384–94.

    Article 
    CAS 

    Google Scholar
     

  • Kolodziejczyk AA, Kim JK, Svensson V, Marioni JC, Teichmann SA. The technology and biology of single-cell RNA sequencing. Mol Cell. 2015;58:610–20.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Su Z, Łabaj PP, Li S, Thierry-Mieg J, Thierry-Mieg D, Shi W, et al. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the sequencing quality control consortium. Nat Biotechnol. 2014;32:903–14.

    Article 
    CAS 

    Google Scholar
     

  • Sprang M, Andrade-Navarro MA, Fontaine JF. Batch effect detection and correction in RNA-seq data using machine-learning-based automated assessment of quality. BMC Bioinformatics. 2022;23:279.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Li X, Zhang P, Wang H, Yu Y. Genes expressed at low levels raise false discovery rates in RNA samples contaminated with genomic DNA. BMC Genomics. 2022;23:554.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Sanchez-Illana A, Pineiro-Ramos JD, Sanjuan-Herraez JD, Vento M, Quintas G, Kuligowski J. Evaluation of batch effect elimination using quality control replicates in LC-MS metabolite profiling. Anal Chim Acta. 2018;1019:38–48.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Kuligowski J, Perez-Guaita D, Lliso I, Escobar J, Leon Z, Gombau L, et al. Detection of batch effects in liquid chromatography-mass spectrometry metabolomic data using guided principal component analysis. Talanta. 2014;130:442–8.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Poulos RC, Hains PG, Shah R, Lucas N, Xavier D, Manda SS, et al. Strategies to enable large-scale proteomics for reproducible research. Nat Commun. 2020;11:3793.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Jiang F, Liu Q, Li Q, Zhang S, Qu X, Zhu J, et al. Signal drift in liquid chromatography tandem mass spectrometry and its internal standard calibration strategy for quantitative analysis. Anal Chem. 2020;92:7690–8.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Brenes A, Hukelmann J, Bensaddek D, Lamond AI. Multibatch TMT reveals false positives, batch effects and missing values. Mol Cell Proteomics. 2019;18:1967–80.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Bell AW, Deutsch EW, Au CE, Kearney RE, Beavis R, Sechi S, et al. A HUPO test sample study reveals common problems in mass spectrometry-based proteomics. Nat Methods. 2009;6:423–30.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Tian S, Zhan D, Yu Y, Liu M, Wang Y, Song L, et al. Quartet protein reference materials and datasets for multi-platform assessment of label-free proteomics. Genome Biol. 2023;24:202.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Zhang N, Chen Q, Zhang P, Zhou K, Liu Y, Wang H, et al. Quartet metabolite reference materials for inter-laboratory proficiency test and data integration of metabolomics profiling. Genome Biol. 2024;25:34.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Siskos AP, Jain P, Romisch-Margl W, Bennett M, Achaintre D, Asad Y, et al. Interlaboratory reproducibility of a targeted metabolomics platform for analysis of human serum and plasma. Anal Chem. 2017;89:656–65.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Goh WWB, Wong L. Advanced bioinformatics methods for practical applications in proteomics. Brief Bioinform. 2019;20:347–55.

    Article 
    PubMed 

    Google Scholar
     

  • Ren L, Duan X, Dong L, Zhang R, Yang J, Gao Y, et al. Quartet DNA reference materials and datasets for comprehensively evaluating germline variant calling performance. Genome Biol. 2023;24:270.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Pan B, Ren L, Onuchic V, Guan M, Kusko R, Bruinsma S, et al. Assessing reproducibility of inherited variants detected with short-read whole genome sequencing. Genome Biol. 2022;23:2.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Sahraeian SME, Mohiyuddin M, Sebra R, Tilgner H, Afshar PT, Au KF, et al. Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis. Nat Commun. 2017;8:59.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Frohlich K, Brombacher E, Fahrner M, Vogele D, Kook L, Pinter N, et al. Benchmarking of analysis strategies for data-independent acquisition proteomics using a large-scale dataset comprising inter-patient heterogeneity. Nat Commun. 2022;13:2622.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • O’Rawe J, Jiang T, Sun G, Wu Y, Wang W, Hu J, et al. Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med. 2013;5:28.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Ioannidis JP, Greenland S, Hlatky MA, Khoury MJ, Macleod MR, Moher D, et al. Increasing value and reducing waste in research design, conduct, and analysis. Lancet. 2014;383:166–75.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Ren L, Shi L, Zheng Y. Reference materials for improving reliability of multiomics profiling. Phenomics. 2024. https://doi.org/10.1007/s43657-023-00153-7. in press.

    Article 
    PubMed 

    Google Scholar
     

  • Sheng Q, Vickers K, Zhao S, Wang J, Samuels DC, Koues O, et al. Multi-perspective quality control of illumina RNA sequencing data analysis. Brief Funct Genomics. 2017;16:194–204.

    CAS 
    PubMed 

    Google Scholar
     

  • Manimaran S, Selby HM, Okrah K, Ruberman C, Leek JT, Quackenbush J, et al. BatchQC: interactive software for evaluating sample and batch effects in genomic data. Bioinformatics. 2016;32:3836–8.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Liu X, Li N, Liu S, Wang J, Zhang N, Zheng X, et al. Normalization methods for the analysis of unbalanced transcriptome data: a review. Front Bioeng Biotechnol. 2019;7:358.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Wu Y, Li L. Sample normalization methods in quantitative metabolomics. J Chromatogr A. 2016;1430:80–95.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Quartet Project Team. Quartet RNA reference materials improve the quality of transcriptomic data through ratio-based profiling. Dataset. Open archive for miscellaneous data (OMIX). 2023. https://ngdc.cncb.ac.cn/omix/release/OMIX002254.

  • Quartet Project Team. Visualization of diagnsitics of batch effects. 2023. GitHub. https://doi.org/10.5281/zenodo.8101796.

  • van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learning Res. 2008;9:2579–605.


    Google Scholar
     

  • Diaz-Papkovich A, Anderson-Trocme L, Gravel S. A review of UMAP in population genetics. J Hum Genet. 2021;66:85–91.

    Article 
    PubMed 

    Google Scholar
     

  • Becht E, McInnes L, Healy J, Dutertre CA, Kwok IWH, Ng LG, et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol. 2019;37:38–44.

    Article 
    CAS 

    Google Scholar
     

  • Kobak D, Berens P. The art of using t-SNE for single-cell transcriptomics. Nat Commun. 2019;10:5416.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Gandolfo LC, Speed TP. RLE plots: visualizing unwanted variation in high dimensional data. PLoS ONE. 2018;13:e0191629.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Bushel P. Principal variance component analysis. 2021. https://www.niehs.nih.gov/research/resources/software/biostatistics/pvca/index.cfm.

  • Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018;36:411–20.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Varma S. Blind estimation and correction of microarray batch effect. PLoS ONE. 2020;15:e0231446.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Reese SE, Archer KJ, Therneau TM, Atkinson EJ, Vachon CM, de Andrade M, et al. A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal component analysis. Bioinformatics. 2013;29:2877–83.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Buttner M, Miao Z, Wolf FA, Teichmann SA, Theis FJ. A test metric for assessing single-cell RNA-seq batch correction. Nat Methods. 2019;16:43–9.

    Article 
    PubMed 

    Google Scholar
     

  • Korsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods. 2019;16:1289–96.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Azizi E, Carr AJ, Plitas G, Cornish AE, Konopacki C, Prabhakaran S, et al. Single-cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell. 2018;174(1293–1308):e1236.


    Google Scholar
     

  • Hubert L, Arabie P. Comparing partitions. J Classif. 1985;2:193–218.

    Article 

    Google Scholar
     

  • Batool F, Hennig C. Clustering with the average Silhouette width. Comput Stat Data Anal. 2021;158:107190.

    Article 

    Google Scholar
     

  • Albrecht S, Sprang M, Andrade-Navarro MA, Fontaine JF. seqQscorer: automated quality control of next-generation sequencing data using machine learning. Genome Biol. 2021;22:75.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Parker HS, Leek JT. The practical effect of batch on genomic prediction. Stat Appl Genet Mol Biol. 2012;11:Article 10.

    Article 
    PubMed 

    Google Scholar
     

  • Shi L, Campbell G, Jones WD, Campagne F, Wen Z, Walker SJ, et al. The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol. 2010;28:827–38.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Handelman GS, Kok HK, Chandra RV, Razavi AH, Huang S, Brooks M, et al. Peering into the black box of artificial intelligence: evaluation metrics of machine learning methods. AJR Am J Roentgenol. 2019;212:38–43.

    Article 
    PubMed 

    Google Scholar
     

  • Zhang Y, Jenkins DF, Manimaran S, Johnson WE. Alternative empirical Bayes models for adjusting for batch effects in genomic studies. BMC Bioinformatics. 2018;19:262.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Ni Z, Sun P, Zheng J, Wu M, Yang C, Cheng M, et al. JNK signaling promotes bladder cancer immune escape by regulating METTL3-mediated m6A modification of PD-L1 mRNA. Cancer Res. 2022;82:1789–802.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • He YY, Xie XM, Zhang HD, Ye J, Gencer S, van der Vorst EPC, et al. Identification of hypoxia induced metabolism associated genes in pulmonary hypertension. Front Pharmacol. 2021;12:753727.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Nakayama M, Marchi H, Dmitrieva AM, Chakraborty A, Merl-Pham J, Hennen E, et al. Quantitative proteomics of differentiated primary bronchial epithelial cells from chronic obstructive pulmonary disease and control identifies potential novel host factors post-influenza A virus infection. Front Microbiol. 2022;13:957830.

    Article 
    PubMed 

    Google Scholar
     

  • Acharjee A, Hazeldine J, Bazarova A, Deenadayalu L, Zhang J, Bentley C, et al. Integration of metabolomic and clinical data improves the prediction of intensive care unit length of stay following major traumatic injury. Metabolites. 2021;12:12.

    Article 

    Google Scholar
     

  • Stein CK, Qu P, Epstein J, Buros A, Rosenthal A, Crowley J, et al. Removing batch effects from purified plasma cell gene expression microarrays with modified ComBat. BMC Bioinformatics. 2015;16:63.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Adamer MF, Bruningk SC, Tejada-Arranz A, Estermann F, Basler M, Borgwardt K. reComBat: batch-effect removal in large-scale multi-source gene-expression data integration. Bioinform Adv. 2022;2:vbac071.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Zhang Y, Parmigiani G, Johnson WE. ComBat-seq: batch effect adjustment for RNA-seq count data.NAR. Genom Bioinform. 2020;2:lqaa078.

    Article 

    Google Scholar
     

  • Leek JT, Storey JD. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007;3:1724–35.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28:882–3.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Lee S, Sun W, Wright FA, Zou F. An improved and explicit surrogate variable analysis procedure by coefficient adjustment. Biometrika. 2017;104:303–16.

    Article 
    PubMed 

    Google Scholar
     

  • Parker HS, Leek JT, Favorov AV, Considine M, Xia X, Chavan S, et al. Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction. Bioinformatics. 2014;30:2757–63.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Chakraborty S, Datta S, Datta S. svapls: an R package to correct for hidden factors of variability in gene expression studies. BMC Bioinformatics. 2013;14:236.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Leek JT. svaseq: removing batch effects and other unwanted noise from sequencing data. Nucleic Acids Res. 2014;42:e161.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Karpievitch YV, Taverner T, Adkins JN, Callister SJ, Anderson GA, Smith RD, et al. Normalization of peak intensities in bottom-up MS-based proteomics using singular value decomposition. Bioinformatics. 2009;25:2573–80.

    Article 
    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  • Crowell AM, Greene CS, Loros JJ, Dunlap JC. Learning and imputation for mass-spec bias reduction (LIMBR). Bioinformatics. 2019;35:1518–26.

    Article 
    CAS 
    PubMed 

    Google Scholar
     

  • Karpievitch YV, Nikolic SB, Wilson R, Sharman JE, Edwards LM. Metabolomics data normalization with EigenMS. PLoS ONE. 2014;9:e116221.

  • Risso D, Ngai J, Speed TP, Dudoit S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat Biotechnol. 2014;32:896–902.

  • Molania R, Gagnon-Bartsch JA, Dobrovic A, Speed TP. A new normalization for Nanostring nCounter gene expression data. Nucleic Acids Res. 2019;47:6073–83.

  • Salim A, Molania R, Wang J, De Livera A, Thijssen R, Speed TP. RUV-III-NB: normalization of single cell RNA-seq data. Nucleic Acids Res. 2022;50:e96.

  • De Livera AM, Sysi-Aho M, Jacob L, Gagnon-Bartsch JA, Castillo S, Simpson JA, et al. Statistical methods for handling unwanted variation in metabolomics data. Anal Chem. 2015;87:3606–15.

  • Haghverdi L, Lun ATL, Morgan MD, Marioni JC. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat Biotechnol. 2018;36:421–7.

  • Zou B, Zhang T, Zhou R, Jiang X, Yang H, Jin X, et al. deepMNN: deep learning-based single-cell RNA sequencing data batch correction using mutual nearest neighbors. Front Genet. 2021;12:708981.

  • Wu Y, Zhang K. Tools for the analysis of high-dimensional single-cell RNA sequencing data. Nat Rev Nephrol. 2020;16:408–21.

  • Li H, Brouwer CR, Luo W. A universal deep neural network for in-depth cleaning of single-cell RNA-Seq data. Nat Commun. 2022;13:1901.

  • Li X, Wang K, Lyu Y, Pan H, Zhang J, Stambolian D, et al. Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis. Nat Commun. 2020;11:2338.

  • Lotfollahi M, Wolf FA, Theis FJ. scGen predicts single-cell perturbation responses. Nat Methods. 2019;16:715–21.

  • Lopez R, Regier J, Cole MB, Jordan MI, Yosef N. Deep generative modeling for single-cell transcriptomics. Nat Methods. 2018;15:1053–8.

  • Editorial Board. Sequencing benchmarked. Nat Biotechnol. 2021;39:1027.

  • Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, et al. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol. 2006;24:1151–61.

  • Mercer TR, Xu J, Mason CE, Tong W. The Sequencing Quality Control 2 study: establishing community standards for sequencing in precision medicine. Genome Biol. 2021;22:306.

  • Reis ALM, Deveson IW, Madala BS, Wong T, Barker C, Xu J, et al. Using synthetic chromosome controls to evaluate the sequencing of difficult regions within the human genome. Genome Biol. 2022;23:19.

  • Fang LT, Zhu B, Zhao Y, Chen W, Yang Z, Kerrigan L, et al. Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing. Nat Biotechnol. 2021;39:1151–60.

  • Gong B, Li D, Kusko R, Novoradovskaya N, Zhang Y, Wang S, et al. Cross-oncopanel study reveals high sensitivity and accuracy with overall analytical performance depending on genomic regions. Genome Biol. 2021;22:109.

  • Deveson IW, Gong B, Lai K, LoCoco JS, Richmond TA, Schageman J, et al. Evaluating the analytical validity of circulating tumor DNA sequencing assays for precision oncology. Nat Biotechnol. 2021;39:1115–28.

  • Foox J, Nordlund J, Lalancette C, Gong T, Lacey M, Lent S, et al. The SEQC2 epigenomics quality control (EpiQC) study. Genome Biol. 2021;22:332.

  • Wang YW, Lê Cao KA. Managing batch effects in microbiome data. Brief Bioinform. 2020;21:1954–70.

  • Fachrul M, Méric G, Inouye M, Pamp SJ, Salim A. Assessing and removing the effect of unwanted technical variations in microbiome data. Sci Rep. 2022;12:22236.

  • Ling WD, Lu JY, Zhao N, Lulla A, Plantinga AM, Fu WJ, et al. Batch effects removal for microbiome data via conditional quantile regression. Nat Commun. 2022;13:5418.

  • Wang YW, Lê Cao KA. PLSDA-batch: a multivariate framework to correct for batch effects in microbiome data. Brief Bioinform. 2023;24(2):bbac622.


