Integrative Analysis of Transcriptome and Methylation Data in Human Non-Small Cell Lung Cancer

Xiang AO

Department of Computer Science, City University of Hong Kong, Hong Kong.

Human lung cancer is the most prevalent cancer worldwide and consists of two main subtypes: non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). NSCLC accounts for over 80% of lung cancer cases, and its treatment is mostly guided by tumor stage, although distinctive molecular characteristics of its two major subtypes, lung adenocarcinoma (LUAD) and squamous cell lung carcinoma (LUSC), have been increasingly identified. In this study, we integrated gene expression data and methylation data to investigate the genetic differences between LUAD and LUSC. We then applied the Boruta package to select key features from LUAD and LUSC tumor samples and to build predictive models of tumor stage. We finally obtained 6 key gene expression features and 4 key methylation features that can be reliably used to predict LUAD and LUSC stage.
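The feature selection step can be illustrated with a minimal sketch of the shadow-feature idea that underlies Boruta. This is not the study's pipeline: the study used the Boruta R package, which relies on random forest importance and a statistical test, whereas this sketch swaps in absolute Pearson correlation and a simple hit-rate threshold so that it runs with the Python standard library alone. The function names, the column names, the thresholds, and the synthetic data are all hypothetical.

```python
import random
import statistics


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return abs(sxy) / (sxx * syy) ** 0.5


def shadow_select(columns, target, n_rounds=50, min_hit_rate=0.9, seed=0):
    """Keep features that beat the best shuffled ("shadow") copy in most rounds.

    columns: dict mapping a feature name to a list of values.
    Simplification (assumption): importance is |correlation| with the target,
    not random forest importance as in the Boruta package.
    """
    rng = random.Random(seed)
    real_scores = {name: abs_corr(vals, target) for name, vals in columns.items()}
    hits = {name: 0 for name in columns}
    for _ in range(n_rounds):
        # Shuffling a column breaks its link to the target, so the best
        # shadow score estimates what pure chance can achieve.
        best_shadow = 0.0
        for vals in columns.values():
            shadow = list(vals)
            rng.shuffle(shadow)
            best_shadow = max(best_shadow, abs_corr(shadow, target))
        for name, score in real_scores.items():
            if score > best_shadow:
                hits[name] += 1
    return sorted(n for n, h in hits.items() if h / n_rounds >= min_hit_rate)


# Synthetic stand-in data (hypothetical): 200 samples with a binary stage label.
n = 200
noise = random.Random(1)
stage = [i % 2 for i in range(n)]
columns = {
    # Shifted by stage, so these should be confirmed.
    "expr_informative": [2.0 * s + noise.gauss(0, 1) for s in stage],
    "meth_informative": [-1.5 * s + noise.gauss(0, 1) for s in stage],
    # Pattern 0,0,1,1,... is constructed to be uncorrelated with stage.
    "expr_uninformative": [(i // 2) % 2 for i in range(n)],
}
selected = shadow_select(columns, stage)
print(selected)
```

In a real analysis the selected features would then be passed to a classifier for stage prediction; that step is omitted here.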

Keywords: Transcriptome; Methylation Data; Lung Cancer

How to cite this article:
Xiang AO. Integrative Analysis of Transcriptome and Methylation Data in Human Non-Small Cell Lung Cancer. Scientific Research and Reviews, 2021; 14:124. DOI: 10.28933/srr-2021-04-1105


This work and its PDF file(s) are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.