Jing Zhang, Haohua Yao, Chunliu Lai, Xue Sun, Xiujuan Yang, Shurong Li, Yubiao Guo, Junhang Luo, Zhihua Wen, Kejing Tang
1Division of Pulmonary and Critical Care Medicine, the First Affiliated Hospital of Sun Yat-sen University, Guangzhou 510080, China; 2Department of Urology, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China; 3Department of Respiratory and Critical Care Medicine, the Fourth People's Hospital of Shenyang, Shenyang 110031, China; 4Department of Pharmacy, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China; 5Department of Radiology, the First Affiliated Hospital of Sun Yat-sen University, Guangzhou 510080, China; 6Department of Urology, the First Affiliated Hospital of Sun Yat-sen University, Guangzhou 510080, China; 7Department of Pharmacy, the First Affiliated Hospital of Sun Yat-sen University, Guangzhou 510080, China
Abstract Objective: DNA methylation alterations are early events in carcinogenesis and immune signalling in lung cancer. This study aimed to develop a model based on short stature homeobox 2 gene (SHOX2)/prostaglandin E receptor 4 gene (PTGER4) DNA methylation in plasma, the appearance subtype of pulmonary nodules (PNs) and low-dose computed tomography (LDCT) images to distinguish early-stage lung cancers from benign nodules. Methods: We developed a multimodal prediction model with a training set of 257 individuals. The performance of the model was further validated in an independent validation set of 42 subjects. In addition, we explored the association between SHOX2/PTGER4 DNA methylation and driver gene mutations in lung cancer based on data from The Cancer Genome Atlas (TCGA) portal. Results: Methylation levels differed significantly between the early-stage lung cancer and benign groups. The areas under the receiver operating characteristic curve (AUCs) of SHOX2 in patients with solid nodules, mixed ground-glass opacity nodules and pure ground-glass opacity nodules were 0.693, 0.497 and 0.864, respectively, while the AUCs of PTGER4 were 0.559, 0.739 and 0.619, respectively. With the highest AUC of 0.894, the novel multimodal prediction model outperformed the Mayo Clinic model (0.519) and the LDCT-based deep learning model (0.842) in the independent validation set. Database analysis demonstrated that patients with SHOX2/PTGER4 DNA hypermethylation were enriched in TP53 mutations. Conclusions: The present multimodal prediction model could distinguish early-stage lung cancer from benign PNs more efficiently. A prognostic index based on DNA methylation and lung cancer driver gene alterations may separate patients into groups with good or poor prognosis.
Keywords: Lung neoplasms; short stature homeobox 2; prostaglandin E receptor 4; deep learning; early diagnosis
Lung cancer is the leading cause of cancer-related mortality worldwide; in the United States alone, it caused approximately 350 deaths per day in 2022 (1). The mortality of lung cancer remains high mainly because of late diagnosis, delayed treatment and unsatisfactory therapeutic effects (2). The prognosis of lung cancer is highly correlated with the stage of the disease at diagnosis, with the 5-year survival rate decreasing dramatically from 81%-85% for stage IA to 6% for stage IV (3). Thus, there is an urgent need for a rapid, safe and cost-effective method for the accurate early diagnosis of lung cancer.
In recent years, much progress has been made in research on lung cancer occurrence and progression. The National Lung Screening Trial (NLST) reported that low-dose computed tomography (LDCT) screening reduced lung cancer mortality by 20% in individuals at high risk (4,5), which led to the wide acceptance of LDCT as a reliable tool for early lung cancer detection. However, because of the high false-positive rate and overdiagnosis associated with LDCT, distinguishing the small percentage of malignant nodules from the majority of detected pulmonary nodules (PNs) remains challenging (6,7). Current guidelines, which rely on patients' demographic characteristics and the radiological features of PNs on LDCT images, provide inconsistent recommendations on how to evaluate the malignancy risk of PNs. The Mayo Clinic model is recommended by most guidelines, such as the American College of Chest Physicians (ACCP) guideline (8), the National Comprehensive Cancer Network (NCCN) guideline (www.nccn.org/patients) and the Fleischner Society guideline (9), whereas the Brock calculator is recommended by the British Thoracic Society (BTS) guideline (10). However, with an area under the curve (AUC) of only 0.59 in external validation (11), the models proposed by these guidelines are not always satisfactory. In the era of big data, both the volume of available data and computing power have greatly increased. Artificial intelligence (AI) technologies, typified by deep learning (DL), have promising potential in cancer diagnosis and treatment (12) and could partly compensate for the drawbacks of LDCT.
Based on the analysis of tumor-derived circulating nucleic acids, circulating tumor cells and exosomes, liquid biopsy has been considered an easier, safer and less invasive method for cancer diagnosis and therapeutic response monitoring (13,14). DNA methylation plays a crucial role in the regulation of gene expression, epigenetic change and the maintenance of cellular identity in tumorigenesis (15). Several studies have reported the staging and diagnostic significance of short stature homeobox 2 gene (SHOX2)/prostaglandin E receptor 4 gene (PTGER4) DNA methylation in tissue specimens from lung cancer patients (16,17). More recently, SHOX2/PTGER4 DNA methylation analysis of liquid biopsy samples, despite lower methylation levels and imperfect concordance with tissue specimens, has been identified as a promising non-invasive biomarker for the early diagnosis of lung cancer (11,18-20). However, those studies included a large proportion of patients with locally advanced or advanced lung cancer, which would greatly reduce the accuracy of early-stage diagnosis. In addition, driver gene alterations, such as mutations in the EGFR, ROS1 and TP53 genes, play a critical role in the treatment of lung cancer, especially for patients who receive radical surgery or radical radiotherapy. Yet little is known about how SHOX2/PTGER4 DNA methylation differs between lung cancer patients with different driver gene alteration statuses.
In this study, we examined the diagnostic performance of SHOX2/PTGER4 DNA methylation and aimed to develop a multimodal prediction model based on SHOX2/PTGER4 DNA methylation, PN subtype and an LDCT-based DL model to distinguish early-stage lung cancers from benign nodules. Finally, we aimed to identify driver gene mutations associated with SHOX2/PTGER4 DNA methylation levels.
From November 2019 to October 2021, patients with malignant or benign PNs were enrolled at the First Affiliated Hospital of Sun Yat-sen University (Guangzhou, China). The study was approved by the Medical Ethics Committee of the First Affiliated Hospital of Sun Yat-sen University, and the requirement for written informed consent was waived for this retrospective analysis.
Patients were included if they: 1) had PNs diagnosed as malignant or suspicious for malignancy on LDCT by radiologists, with nodule sizes between 5 and 30 mm, including solid nodules (SNs), pure ground-glass opacity nodules (pGGNs) and mixed ground-glass opacity nodules (mGGNs); 2) were older than 18 years and younger than 80 years; 3) had complete clinical information; and 4) underwent surgical resection alone or both tissue biopsy and surgical resection. The exclusion criteria were as follows: 1) pregnancy or lactation; 2) a history of any cancer or of therapy for any cancer; 3) current pulmonary infection; or 4) a diagnosis of stage III or stage IV lung cancer. For the selected patients, surgical pathological staging was determined according to the 8th edition of the lung cancer TNM classification and clinical staging system (21,22). The histopathologic classification was determined according to the World Health Organization classification (23).
Peripheral blood was drawn into 10-mL EDTA anticoagulant tubes (BD Biosciences, San Jose, CA) before surgery or other invasive procedures. Plasma was separated within 2 h of collection. Circulating DNA was extracted from plasma using the Nucleic Acid Extraction Reagent (Excellent Medical Technology Co., Ltd., Shenzhen, China) according to the manufacturer's instructions. The concentration of purified DNA was determined with the Qubit dsDNA HS Assay Kit (Life Technologies, Carlsbad, CA). Real-time polymerase chain reaction (7500 Real-Time PCR System, Applied Biosystems) was performed using a commercial SHOX2/PTGER4 DNA Methylation Detection Kit (SINOMD, Beijing, China). DNA samples were normalized to an internal reference standard [the threshold cycle (Ct) value of the GAPDH gene (CTG)]. The 2^-ΔCt value of each methylation detection replicate relative to the mean CTG was calculated, and the average value of the target gene over its triplicates was divided by the average value of the CTG triplicates. For plasma sample replicates with extremely low levels of DNA methylation, a Ct of 45 was used, yielding a near-zero 2^-ΔCt. Receiver operating characteristic (ROC) curves were used to evaluate the diagnostic value of SHOX2/PTGER4 DNA methylation, with the area under the ROC curve (AUC), sensitivity and specificity as assessment indices.
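The normalization step can be sketched in Python. This is a minimal sketch under one reading of the procedure: each target replicate's ΔCt is taken against the mean GAPDH Ct, converted to 2^-ΔCt, and the replicate values are averaged, with undetermined replicates assigned Ct=45 as stated above. The function name `relative_methylation` is hypothetical, and the commercial kit's exact arithmetic may differ.

```python
from statistics import mean

UNDETERMINED_CT = 45.0  # Ct assigned to replicates with extremely low methylation (from the text)


def relative_methylation(target_cts, gapdh_cts):
    """Average 2^-ΔCt of one methylation target in one plasma sample.

    target_cts: Ct values of the target-gene replicates; None marks an undetermined replicate.
    gapdh_cts:  Ct values of the GAPDH reference replicates (CTG).
    """
    mean_ctg = mean(gapdh_cts)  # reference level shared by all target replicates
    per_replicate = []
    for ct in target_cts:
        if ct is None:
            ct = UNDETERMINED_CT  # yields a near-zero 2^-ΔCt
        delta_ct = ct - mean_ctg
        per_replicate.append(2.0 ** (-delta_ct))
    return mean(per_replicate)
```

For instance, triplicate target Cts of 30 against GAPDH Cts of 28 give ΔCt=2 and a relative value of 0.25.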
All patients underwent LDCT using a multi-slice spiral CT scanner (Aquilion One 320, Toshiba) at the First Affiliated Hospital of Sun Yat-sen University. The CT acquisition parameters were as follows: tube voltage, 120 kV; effective mAs, 100-150; rotation time, 0.5 s; matrix, 512×512; and reconstruction slice thickness, 1.0 mm. The unenhanced CT slices were downloaded in Digital Imaging and Communications in Medicine (DICOM) format and then converted to Joint Photographic Experts Group (JPG) images using a lung window. CT images without target nodules were excluded.
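A sketch of the DICOM-to-8-bit conversion at a lung window follows. It assumes a typical lung window (level -600 HU, width 1500 HU); the study does not report its window settings, and the helper name is hypothetical. In practice the stored pixel array and rescale tags could be read with pydicom (`pydicom.dcmread(path).pixel_array`, `RescaleSlope`, `RescaleIntercept`) and the result saved as JPEG with Pillow (`Image.fromarray(img).save(...)`).

```python
import numpy as np


def to_lung_window_uint8(stored_values, slope=1.0, intercept=-1024.0,
                         level=-600.0, width=1500.0):
    """Map stored CT pixel values to an 8-bit grey-scale image under a lung window.

    slope/intercept correspond to the DICOM RescaleSlope/RescaleIntercept tags;
    the window level/width defaults are assumed typical lung settings, not study values.
    """
    hu = np.asarray(stored_values, dtype=np.float64) * slope + intercept  # Hounsfield units
    low, high = level - width / 2.0, level + width / 2.0                 # window bounds in HU
    scaled = (np.clip(hu, low, high) - low) / (high - low)               # 0..1 inside the window
    return np.round(scaled * 255.0).astype(np.uint8)
```

Values below the window map to black and values above it to white, so soft tissue and bone saturate while lung parenchyma and nodules keep their contrast.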
As shown in Figure 1, the CT slices at the first quartile (Q1), second quartile (Q2) and third quartile (Q3) of each PN were selected. Two experienced chest radiologists used a rectangular box, termed the region of interest (ROI), to crop the PN in each CT image. Disagreements about the boundaries of the box were resolved by a third trained chest radiologist. Once the box boundary was determined, the ROI was cut out and labelled with its subtype (SN, pGGN or mGGN) and ground truth (benign or malignant).
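Selecting the three quartile slices and cropping the radiologist-drawn rectangle might look like the sketch below. The text does not specify how quartile positions map onto slice indices, so the floor(q·(n-1)) rule and the box convention used here are assumptions, and both function names are hypothetical.

```python
import numpy as np


def quartile_slices(slice_indices):
    """Return the slices at the Q1, Q2 and Q3 positions of a nodule's axial extent.

    slice_indices: indices of every CT slice on which the nodule is visible.
    Positions are taken as floor(q * (n - 1)); this mapping is an assumption.
    """
    ordered = sorted(slice_indices)
    n = len(ordered)
    return [ordered[int(q * (n - 1))] for q in (0.25, 0.5, 0.75)]


def crop_roi(image, box):
    """Cut the rectangular ROI out of a 2-D slice.

    box = (row_min, col_min, row_max, col_max) in pixels, max bounds exclusive
    (a hypothetical convention for the radiologists' rectangle).
    """
    r0, c0, r1, c1 = box
    return np.asarray(image)[r0:r1, c0:c1]
```

A nodule visible on slices 10 to 18 would contribute slices 12, 14 and 16 under this rule.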
DL models with different convolutional neural network (CNN) architectures were trained on CT images to classify benign and malignant PNs. The CNN architectures used in this study were VGG16 (24), ResNet34 (25), Xception (26) and MobileNet (27). We downscaled the input resolution of each CNN appropriately to reduce the computational load but kept the number of input channels at 3, so that the three single-channel CT images of a nodule could be merged as the input to the initial convolutional layer. After the CNN, a softmax function produced a probability distribution over the two classes, and the class with the higher probability was selected as the output. Thus, the predictions of the LDCT-based DL model are output as 0 or 1, where 0 denotes a benign nodule and 1 a malignant nodule.
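The input assembly and output rule can be sketched with NumPy: three equally sized ROIs are stacked as the three input channels, and the network's two logits are converted with softmax and reduced to a 0/1 label. The CNN itself is omitted; resizing the crops to a common size beforehand, and the function names, are assumptions for illustration.

```python
import numpy as np


def stack_quartile_rois(roi_q1, roi_q2, roi_q3):
    """Merge three single-channel ROIs (already resized to the same H x W) into one input.

    Channels-first layout (3, H, W), as PyTorch convolution layers expect.
    """
    return np.stack([roi_q1, roi_q2, roi_q3], axis=0)


def softmax(logits):
    """Turn raw network outputs into a probability distribution."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max()  # subtracting the max avoids overflow without changing the result
    e = np.exp(z)
    return e / e.sum()


def predicted_label(logits):
    """0 = benign, 1 = malignant: the class with the higher softmax probability."""
    return int(np.argmax(softmax(logits)))
```

Because softmax preserves the ordering of the logits, the 0/1 output is the same as taking the argmax of the raw logits; the probabilities matter only if a threshold other than 0.5 is wanted.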
For training, we used the PyTorch framework and an NVIDIA 2080Ti GPU. All models were trained with the Adam optimizer for 120 epochs, with a base learning rate of 0.01 and a batch size of 128. CT images of benign nodules were dynamically oversampled at the image level to balance benign and malignant samples in each training batch.
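Class-balanced oversampling of the kind described can be sketched by weighting each image inversely to its class frequency and drawing batches with replacement. This is an illustrative NumPy version, not the authors' code, and it assumes both classes are present; in PyTorch a similar effect is commonly obtained with `torch.utils.data.WeightedRandomSampler`, combined with `torch.optim.Adam(model.parameters(), lr=0.01)` for the stated optimizer settings.

```python
import numpy as np


def balanced_batch_indices(labels, batch_size, rng):
    """Draw one batch of image indices in which benign (0) and malignant (1) are equally likely.

    Each image gets a sampling weight inversely proportional to its class frequency,
    so the minority (benign) class is oversampled with replacement.
    """
    labels = np.asarray(labels)
    counts = np.bincount(labels, minlength=2)  # images per class
    weights = 1.0 / counts[labels]             # rarer class -> larger weight
    weights /= weights.sum()                   # probabilities must sum to 1
    return rng.choice(len(labels), size=batch_size, replace=True, p=weights)
```

With `batch_size=128` and `rng = np.random.default_rng(seed)`, roughly half of each batch would then be benign images, repeated as needed.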
Fivefold cross-validation was applied to train and validate the DL models. Patients were split into five partitions before training, keeping the proportions of benign and malignant labels balanced across partitions. In each iteration of cross-validation, four partitions were used to fit the model and the remaining partition was used to validate it. Models were evaluated at the patient level using the AUC, sensitivity and specificity in each validation fold, and the mean AUC over all five folds represented the overall performance of each model.
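Patient-level stratified partitioning can be sketched as follows: patients (not individual images) are shuffled within each class and dealt round-robin into five folds, so all slices of a nodule stay in the same fold and the benign/malignant ratio is similar across folds. The shuffling scheme, seed and function name are assumptions.

```python
import random


def stratified_patient_folds(patient_labels, n_folds=5, seed=0):
    """Assign each patient to one of n_folds partitions with similar class ratios.

    patient_labels: dict mapping patient ID -> 0 (benign) or 1 (malignant).
    Returns a dict mapping patient ID -> fold index.
    """
    rng = random.Random(seed)
    folds = {}
    for label in (0, 1):
        ids = sorted(pid for pid, y in patient_labels.items() if y == label)
        rng.shuffle(ids)
        for i, pid in enumerate(ids):
            folds[pid] = i % n_folds  # deal round-robin so each fold gets its share of the class
    return folds
```

In iteration k, patients with `folds[pid] == k` form the validation fold and all others the training folds.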
Patient-level DL predictions for the entire dataset were obtained by taking, for each patient, the prediction of the LDCT-based DL model whose validation fold contained that patient (i.e., out-of-fold predictions from cross-validation). A logistic regression model, called the multimodal prediction model, was then constructed using the DL prediction, the SHOX2/PTGER4 DNA methylation levels and the nodule subtype as predictors of benign versus malignant PNs (Figure 1). The multimodal prediction model outputs a predicted probability. It was evaluated using fivefold cross-validation, keeping the same patient partitions as in the development of the LDCT-based DL model.
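The stacking step can be sketched as a logistic regression on out-of-fold DL labels, the two methylation levels and a one-hot nodule subtype (SN as reference). The encoding, feature order and plain gradient-descent fit are assumptions for illustration; the authors report using scikit-learn, where `sklearn.linear_model.LogisticRegression().fit(X, y)` followed by `predict_proba` would play the same role (scikit-learn applies L2 regularization by default, unlike this sketch).

```python
import numpy as np

SUBTYPES = ("SN", "mGGN", "pGGN")  # SN is the reference category in this encoding


def build_features(dl_pred, shox2, ptger4, subtype):
    """One row of predictors: out-of-fold DL label, two methylation levels, one-hot subtype."""
    one_hot = [1.0 if subtype == s else 0.0 for s in SUBTYPES[1:]]
    return [float(dl_pred), float(shox2), float(ptger4), *one_hot]


def fit_logistic(X, y, lr=0.5, n_iter=5000):
    """Unregularized logistic regression by batch gradient descent; returns (coef, intercept)."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # current predicted probabilities
        grad = p - y                             # gradient of the log-loss w.r.t. the logit
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of malignancy for each row of X."""
    X = np.asarray(X, dtype=np.float64)
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))
```

In cross-validation the model would be fitted on four folds' rows and `predict_proba` applied to the fifth, mirroring the DL partitions.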
The above evaluation of the prediction model was based on fivefold cross-validation within the training set. In fivefold cross-validation, each model is trained on only four folds and validated on the remaining fold, so none of these models is fitted to all of the training data. To obtain a final model, the entire training set is usually used to retrain the model for subsequent validation. We therefore used an independent validation set to further evaluate our prediction model. First, keeping the training parameters the same as in the fivefold cross-validation, we retrained the DL model using the LDCT images of all 257 patients in the training set and generated a prediction for each patient in the independent validation set. We then trained the multimodal prediction model using the DL predictions, SHOX2/PTGER4 DNA methylation levels and nodule subtypes of all patients in the training set and evaluated it on the independent validation set.
The Mayo Clinic model calculates the probability of malignancy as a function of three clinical and three radiographic variables (28):

$$P(\text{malignancy}) = \frac{e^{x}}{1 + e^{x}}$$

$$x = -6.8272 + 0.0391 \times \text{age} + 0.7917 \times \text{smoking} + 1.3388 \times \text{cancer} + 0.1274 \times \text{nodule diameter} + 1.0407 \times \text{spiculation} + 0.7838 \times \text{upper lobe}$$

where $e$ is the base of the natural logarithm.
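The Mayo Clinic formula translates directly into code. The coefficients are those given above; the variable coding (binary indicators as 0/1, diameter in millimetres, smoking as current or former smoker, cancer as extrathoracic cancer diagnosed more than 5 years earlier) reflects our understanding of the original model definition (28) and should be checked against it. The function name is hypothetical.

```python
import math


def mayo_malignancy_probability(age, smoker, prior_cancer, diameter_mm, spiculated, upper_lobe):
    """Mayo Clinic model: P = e^x / (1 + e^x) with the coefficients quoted in the text.

    Binary inputs are 0/1. Variable coding is assumed (see lead-in), not taken from the study.
    """
    x = (-6.8272
         + 0.0391 * age
         + 0.7917 * smoker
         + 1.3388 * prior_cancer
         + 0.1274 * diameter_mm
         + 1.0407 * spiculated
         + 0.7838 * upper_lobe)
    return 1.0 / (1.0 + math.exp(-x))  # algebraically equal to e^x / (1 + e^x)
```

For example, a 60-year-old former smoker with no prior cancer and a 10-mm non-spiculated upper-lobe nodule gets x = -1.6317 and a predicted probability of about 0.164.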
Targeted therapies directed at driver gene alterations, such as osimertinib and alectinib, have significantly prolonged survival and improved the quality of life of patients with both early-stage and advanced-stage lung cancer. Therefore, investigating the relationship between SHOX2/PTGER4 DNA methylation and lung cancer driver gene alterations is important for identifying candidates for adjuvant (postoperative) targeted therapy and for assessing prognosis.
The DNA methylation profiles of 439 patients with lung adenocarcinoma (LUAD), generated on the Illumina Infinium HumanMethylation450 BeadChip platform, were retrieved from The Cancer Genome Atlas (TCGA) data portal (https://tcga-data.nci.nih.gov/tcga/). Single nucleotide variation (SNV) data processed by MuTect2 were downloaded from the GDC portal (https://portal.gdc.cancer.gov/). SNV data for EGFR, ALK, KRAS, ERBB2, RET, MET, ROS1 and TP53 were screened, and synonymous mutations were filtered out to classify samples as wild-type (WT) or mutant (Mut). The ChAMP R package was used to identify differentially methylated positions (DMPs) between WT and Mut samples for each of these genes and to further analyze the correlation between DMPs in SHOX2 and PTGER4 and gene mutations in LUAD patients.
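Deriving WT/Mut labels from a mutation annotation table can be sketched as follows. The sketch assumes GDC MAF-style columns (`Hugo_Symbol`, `Variant_Classification`, `Tumor_Sample_Barcode`) and treats only `Silent` variants as synonymous; matching MAF barcodes to methylation sample IDs is left out. The function name is hypothetical.

```python
from collections import defaultdict

DRIVER_GENES = ("EGFR", "ALK", "KRAS", "ERBB2", "RET", "MET", "ROS1", "TP53")
SYNONYMOUS = {"Silent"}  # MAF Variant_Classification used for synonymous coding variants


def mutation_status(maf_records, samples, genes=DRIVER_GENES):
    """Return {gene: {sample: "Mut" or "WT"}} after dropping synonymous variants.

    maf_records: iterable of dicts, one per variant, with MAF-style keys.
    samples: all sample IDs to label; samples with no retained variant are WT.
    """
    mutated = defaultdict(set)
    for rec in maf_records:
        if rec["Hugo_Symbol"] in genes and rec["Variant_Classification"] not in SYNONYMOUS:
            mutated[rec["Hugo_Symbol"]].add(rec["Tumor_Sample_Barcode"])
    return {g: {s: ("Mut" if s in mutated[g] else "WT") for s in samples} for g in genes}
```

The resulting WT/Mut labels per gene are what a DMP analysis such as ChAMP's would take as the phenotype for each comparison.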
R software (version 3.5; R Foundation for Statistical Computing, Vienna, Austria) and MedCalc Statistical Software (version 19.6.4; MedCalc Software Ltd, Ostend, Belgium) were used for all analyses, including the subgroup analyses stratified by nodule subtype. One-way analysis of variance was used to compare the concentrations of serum samples, and the two-sample t-test was used to compare pairwise mean values between groups. Other variables were evaluated by the chi-square test, Fisher's exact test or Mann-Whitney U test, as appropriate. ROC curves and AUCs were used to evaluate diagnostic value. The ROC curves of the multimodal prediction model and the Mayo Clinic model were calculated from their predicted probabilities. The ROC curve of the LDCT-based DL model was determined from its dichotomous (benign or malignant) output; thus, in ROC space, its AUC is the area under the two line segments linking the model's single operating point to (0,0) and (1,1). A two-tailed P value of less than 0.05 was considered statistically significant. The logistic regression was built using scikit-learn, a Python module for machine learning. The coefficients and intercept of the logistic regression are shown in Supplementary Table S1.
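Because the DL model outputs only 0 or 1, its ROC "curve" is a single operating point joined to (0,0) and (1,1), and the area under these two segments reduces to (sensitivity + specificity)/2. The short Python check below illustrates this; with the independent-validation counts reported for the DL model in the Results (28 of 30 malignant and 9 of 12 benign nodules correctly classified), it gives 0.842, matching the reported AUC. The function name is hypothetical.

```python
def binary_classifier_auc(tp, fn, tn, fp):
    """AUC of a classifier that outputs only 0/1.

    The ROC polyline is (0,0) -> (FPR, TPR) -> (1,1). The area is a triangle
    under the first segment plus a trapezoid under the second, which simplifies
    to (TPR + 1 - FPR) / 2, i.e. the mean of sensitivity and specificity.
    """
    tpr = tp / (tp + fn)  # sensitivity
    fpr = fp / (fp + tn)  # 1 - specificity
    triangle = fpr * tpr / 2.0
    trapezoid = (1.0 - fpr) * (tpr + 1.0) / 2.0
    return triangle + trapezoid
```

This also explains why a dichotomous model's AUC cannot be improved by threshold tuning: there is only one operating point to draw.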
A total of 363 CT-positive patients were enrolled at the First Affiliated Hospital of Sun Yat-sen University in China. We excluded 39 patients because of a history of any cancer or of therapy for any cancer, and 13 patients with current pulmonary infections, such as bronchiectasis and chronic bronchial infection. Of the remaining 311 patients, a further 12 who were finally diagnosed with stage III or stage IV lung cancer were excluded. Finally, a total of 299 patients (226 with malignant nodules and 73 with benign nodules) were included in our analysis.
The subjects were divided by enrollment time into a training set (November 2019 to May 2021) and an independent validation set (June 2021 to October 2021). All demographic and clinical characteristics are shown in Table 1. Altogether, we recruited 155 SN subjects, 53 pGGN subjects and 91 mGGN subjects. There were no significant differences in age, sex, smoking status, stage, histopathology or nodule size between the training set and the independent validation set. Of the 226 lung cancer patients, 12 had stage 0, 195 stage I and 19 stage II disease. The histological types were adenocarcinoma (n=218), squamous cell carcinoma (n=3), small cell lung cancer (n=2) and unclassified lung cancer (n=3).
The methylation levels of SHOX2 (0.72±0.06 vs. 0.76±0.07, P<0.001) (Figure 2A) and PTGER4 (0.68±0.03 vs. 0.70±0.05, P<0.05) (Figure 2E) differed significantly between the benign lung disease group and the lung cancer group. The plasma methylation level of each biomarker was also compared among different PN subtypes. As shown in Figure 2B,C, SHOX2 methylation levels were higher in the plasma of patients with malignant SNs or pGGNs than in patients with benign SNs (0.76±0.07 vs. 0.72±0.05, P<0.001) or benign pGGNs (0.75±0.06 vs. 0.67±0.02, P<0.05). PTGER4 methylation levels were higher in the plasma of patients with malignant mGGNs than in those with benign mGGNs (0.69±0.03 vs. 0.67±0.01, P<0.05) (Figure 2H). Thus, SHOX2/PTGER4 DNA methylation could serve as a potential plasma biomarker for identifying malignant PNs.
Table 1 Clinical characteristics of all study participants (N=299)
Figure 2 Comparison of methylation levels of SHOX2 and PTGER4 in different subtypes of PNs. (A-D) Comparison of methylation levels of SHOX2 between patients with benign PN and malignant PN (A); between patients with benign SN and malignant SN (B); between patients with benign pGGN and malignant pGGN (C); between patients with benign mGGN and malignant mGGN (D); (E-H) comparison of methylation levels of PTGER4 between patients with benign PN and malignant PN (E); between patients with benign SN and malignant SN (F); between patients with benign pGGN and malignant pGGN (G); between patients with benign mGGN and malignant mGGN (H). PN, pulmonary nodule; SN, solid nodule; mGGN, mixed ground-glass opacity nodule; pGGN, pure ground-glass opacity nodule.
To investigate whether clinical variables (sex, smoking history and age) affect SHOX2/PTGER4 methylation levels, we examined potential sex and smoking-history differences in SHOX2/PTGER4 methylation levels (Supplementary Figure S1A-D). There were no significant differences between males and females in SHOX2 (0.74±0.07 vs. 0.75±0.09, P>0.05) or PTGER4 (0.69±0.04 vs. 0.70±0.05, P>0.05) DNA methylation levels (Supplementary Figure S1A,B), and similar results were found across smoking statuses (Supplementary Figure S1C,D). Correlation analysis revealed only a weak correlation between age and SHOX2 DNA methylation (Spearman r=0.249; Supplementary Figure S1E), and no significant correlation between age and PTGER4 DNA methylation (Supplementary Figure S1F).
To determine the diagnostic value of the SHOX2/PTGER4 DNA methylation biomarkers, we performed ROC analysis to evaluate their ability to discriminate patients with malignant PNs from patients with benign PNs. As shown in Figure 3, the AUC of SHOX2 was 0.616 [95% confidence interval (95% CI): 0.541, 0.691], while the AUC of PTGER4 was 0.570 (95% CI: 0.478, 0.662). The AUCs of SHOX2 and PTGER4 were further analyzed in different PN subtypes. The AUCs of SHOX2 in patients with SNs, mGGNs or pGGNs were 0.693 (95% CI: 0.603, 0.783), 0.497 (95% CI: 0.301, 0.696) and 0.864 (95% CI: 0.734, 0.994), respectively. The AUCs of PTGER4 in patients with SNs, mGGNs or pGGNs were 0.559 (95% CI: 0.456, 0.662), 0.739 (95% CI: 0.614, 0.863) and 0.619 (95% CI: 0.372, 0.866), respectively. Thus, the diagnostic value of SHOX2 and PTGER4 DNA methylation varied with nodule subtype: SHOX2 showed better diagnostic performance in pGGNs, while PTGER4 showed better performance in mGGNs.
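The empirical AUC can be read through its Mann-Whitney interpretation: the probability that a randomly chosen malignant case has a higher methylation value than a randomly chosen benign case, with ties counted as one half. The sketch below computes it by direct pairwise comparison; it omits confidence intervals, which MedCalc computes by its own method, and the function name is hypothetical. An AUC near 0.5, such as that of SHOX2 in mGGNs, corresponds to no discrimination.

```python
def auc_mann_whitney(malignant_scores, benign_scores):
    """Empirical AUC: P(malignant score > benign score), counting ties as 1/2."""
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))
```

The pairwise loop is quadratic, which is fine for a few hundred patients; a rank-based formula gives the same value more efficiently for larger cohorts.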
Table 2 shows the AUCs of the DL models based on different CNN architectures at each fold of the fivefold cross-validation. Regardless of architecture, performance in Folds 2 and 3 was slightly worse than in Folds 1, 4 and 5. Among all models, the ResNet34-based model performed best (mean AUC=0.830) and the Xception-based model performed worst (mean AUC=0.786). These results suggest that ResNet34 is more suitable than VGG16, MobileNet and Xception for differentiating benign from malignant pulmonary nodules on LDCT images. ResNet34 was therefore selected as the CNN architecture for the LDCT-based DL model in this study.
Figure 3 ROC curves showing the diagnostic value of SHOX2 (A) and PTGER4 (B) in different subtypes of PNs. ROC, receiver operating characteristic; PN, pulmonary nodule; SN, solid nodule; mGGN, mixed ground-glass opacity nodule; pGGN, pure ground-glass opacity nodule; AUC, area under the receiver operating characteristic curve.
Table 2 Cross-validation of models based on different CNN architectures
The performance of the multimodal prediction model was compared across different sets of predictors (Table 3). When only the DNA methylation data (SHOX2 and PTGER4 methylation values) were used as predictors, the model had the lowest mean AUC in cross-validation (0.655). Adding the PN appearance subtype as a predictor increased the mean AUC to 0.740. The LDCT-based DL model alone reached a mean AUC of 0.830 in predicting benign and malignant PNs. Using DNA methylation and the DL prediction together as predictors increased the mean AUC of the logistic regression model to 0.841, and further adding the PN subtype gave the best mean AUC of 0.848. Therefore, DNA methylation data and LDCT image information of PNs can complement each other and improve the accuracy of distinguishing benign from malignant PNs.
In independent validation (Figure 4A), the LDCT-based DL model obtained an AUC of 0.842 (95% CI: 0.685, 0.998), similar to its performance in fivefold cross-validation (mean AUC: 0.830). The multimodal prediction model based on DNA methylation, subtype and DL prediction achieved an AUC of 0.894 (95% CI: 0.775, 1.000), higher than in fivefold cross-validation (mean AUC: 0.848). The better performance of the two models in independent validation was probably due to the larger number of subjects used to train the final models. By contrast, the Mayo Clinic model achieved a much lower AUC of 0.519 (95% CI: 0.334, 0.705) in the independent validation set. Figure 4B-D show the confusion matrices of the 42 PNs in the independent validation set classified by each method. The multimodal prediction model misclassified 2 of 30 malignant nodules and 1 of 12 benign nodules; the LDCT-based DL model misclassified 2 of 30 malignant nodules and 3 of 12 benign nodules; and the Mayo Clinic model misclassified 22 of 30 malignant nodules and 1 of 12 benign nodules.
Table 4 presents a more comprehensive set of evaluation metrics, including accuracy, positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity, F1 score, AUC and DeLong's test P value. Overall, both the multimodal model and the LDCT-based DL model were significantly superior to the ACCP-recommended Mayo Clinic model.
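The threshold-based metrics in Table 4 follow standard confusion-matrix definitions, sketched below. Applied to the counts reported for the multimodal model in independent validation (28 of 30 malignant and 11 of 12 benign nodules correct), they give an accuracy of 39/42 and an F1 score of 56/59; these are illustrative calculations, and Table 4 remains the reference. The function name is hypothetical.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Standard metrics from a 2 x 2 confusion matrix (malignant = positive class)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)  # positive predictive value (precision)
    npv = tn / (tn + fn)  # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PPV": ppv,
        "NPV": npv,
        "F1": 2 * ppv * sensitivity / (ppv + sensitivity),  # harmonic mean of PPV and sensitivity
    }
```

The same function applied to the DL model's counts (tp=28, fn=2, tn=9, fp=3) or the Mayo Clinic model's counts (tp=8, fn=22, tn=11, fp=1) gives the corresponding comparisons.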
Figure 4 Performance of different models in independent validation. (A) ROC curves of different methods in independent validation; (B-D) confusion matrices of 42 PNs (12 benign PNs and 30 malignant PNs) in the independent validation set diagnosed using different methods. ROC, receiver operating characteristic; PN, pulmonary nodule; LDCT, low-dose computed tomography; DL, deep learning; AUC, area under the receiver operating characteristic curve.
In the DMP analysis, no methylation sites in SHOX2 or PTGER4 were significantly associated with EGFR, ALK, KRAS, ERBB2, RET or MET mutations (P>0.05). However, three methylation sites in SHOX2 (cg21503297, cg26129769 and cg25694447) and one in PTGER4 (cg11821200) were significant for ROS1 mutations, and one site in SHOX2 (cg21552242) was significant for TP53 mutations (P<0.05). In addition, based on the genome-wide density distribution of methylation-site log2FC values (Figure 5A), DMPs for ROS1 and TP53 were further filtered using |log2FC|>0.1 and P<0.05. As shown in Figure 5B, the methylation levels of all four DMPs were lower in the ROS1 Mut group than in the WT group, whereas the methylation level of the DMP was higher in the TP53 Mut group than in the WT group. Samples were divided into high- and low-methylation groups based on the median β value of each methylation site, and a confusion matrix between methylation level and gene mutation was generated (Figure 5C). Low methylation at the three SHOX2 sites (cg21503297, cg26129769 and cg25694447) and the PTGER4 site (cg11821200) was associated with ROS1 mutations, and hypermethylation at the SHOX2 site cg21552242 was associated with TP53 mutations. Finally, the performance of the β values of these DMPs in predicting ROS1 and TP53 mutations was analyzed with ROC curves (Figure 5D). The best predictor of ROS1 mutations was a SHOX2 site (cg21503297), with an AUC of 0.758 (95% CI: 0.675, 0.841). The AUCs of the other two SHOX2 sites (cg26129769 and cg25694447) were 0.716 (95% CI: 0.613, 0.819) and 0.722 (95% CI: 0.614, 0.829), respectively, whereas the AUC of the PTGER4 site (cg11821200) was only 0.648 (95% CI: 0.501, 0.795). For predicting TP53 mutations, the AUC of the SHOX2 site cg21552242 was 0.682 (95% CI: 0.630, 0.734). Thus, SHOX2 and PTGER4 methylation levels were helpful in predicting ROS1 and TP53 mutations, but the predictive value of SHOX2 was higher than that of PTGER4.
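The DMP filtering and median-split cross-tabulation can be sketched as below. The thresholds are those stated in the text; the dictionary keys, the handling of samples exactly at the median and the odds-ratio orientation (odds of mutation for high versus low methylation) are assumptions, not details from the study.

```python
from statistics import median


def filter_dmps(dmps, min_abs_log2fc=0.1, alpha=0.05):
    """Keep CpG sites with |log2FC| > 0.1 and P < 0.05 (thresholds from the text).

    dmps: list of dicts with hypothetical keys "probe", "log2FC" and "P".
    """
    return [d for d in dmps if abs(d["log2FC"]) > min_abs_log2fc and d["P"] < alpha]


def median_split_table(beta_values, mutated):
    """2 x 2 table of methylation group (split at the median β) versus mutation status.

    Samples exactly at the median go to the low group (an assumption).
    Returns ((high_mut, high_wt), (low_mut, low_wt)) and the odds ratio of mutation,
    high vs low methylation, or None when the ratio is undefined.
    """
    cut = median(beta_values)
    counts = {(True, True): 0, (True, False): 0, (False, True): 0, (False, False): 0}
    for beta, is_mut in zip(beta_values, mutated):
        counts[(beta > cut, bool(is_mut))] += 1  # key: (high methylation?, mutant?)
    high_mut, high_wt = counts[(True, True)], counts[(True, False)]
    low_mut, low_wt = counts[(False, True)], counts[(False, False)]
    denom = high_wt * low_mut
    odds_ratio = (high_mut * low_wt) / denom if denom else None
    return ((high_mut, high_wt), (low_mut, low_wt)), odds_ratio
```

An odds ratio above 1 would correspond to the TP53 pattern described above (mutations enriched among highly methylated samples), and below 1 to the ROS1 pattern.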
Table 4 Evaluation metrics of different methods in independent validation
In the diagnosis of early lung cancer, LDCT has low specificity, whereas serum biomarkers such as CEA, CA125 and CYFRA21-1 have low sensitivity (17,18,29). In the present study, we analyzed SHOX2/PTGER4 DNA methylation status in 299 patients with pulmonary nodules and are the first to develop a multimodal prediction model based on SHOX2/PTGER4 DNA methylation, nodule subtype and LDCT images to rapidly and cost-effectively distinguish early-stage lung cancers from benign nodules. In addition, to the best of our knowledge, this is also the first study to examine the relationship between lung cancer driver gene mutations and SHOX2/PTGER4 DNA methylation.
Aberrant DNA methylation in plasma is an emerging liquid biopsy biomarker. Among such biomarkers, SHOX2 and PTGER4 methylation have shown high potential in the diagnosis and prognosis of lung cancer. Several groups have performed DNA methylation analysis of the SHOX2 and PTGER4 genes in blood, showing that SHOX2/PTGER4 DNA methylation levels were much higher in patients with malignant PNs and achieved high AUCs in distinguishing malignant from benign PNs (18,19,30,31). However, the very large number of patients with advanced and locally advanced lung cancer in these studies dramatically reduced the early diagnostic accuracy of SHOX2/PTGER4 DNA methylation. Liang and colleagues reported a model named PulmoSeek that discriminated between patients with early-stage lung cancers and benign diseases with an AUC of 0.76 by sequencing a panel of 12,899 lung cancer-specific methylation regions (17). Unfortunately, the high cost of large-panel sequencing has greatly limited its adoption. In the present study, we analyzed SHOX2/PTGER4 DNA methylation status in the 257 pulmonary nodule patients of the training set. The AUC values of SHOX2/PTGER4 differed substantially between nodule subtypes, indicating the need for a model combining SHOX2/PTGER4 DNA methylation, subtype and radiographic features. We therefore developed an LDCT-based DL model, which achieved an AUC of 0.842, and then, by integrating SHOX2/PTGER4 DNA methylation, subtype and the DL prediction, developed a novel multimodal prediction model with a higher AUC than either SHOX2/PTGER4 DNA methylation or DL alone. Compared with the Mayo Clinic model, the multimodal model developed in this study was an effective tool for identifying early-stage lung cancer. Although the ROC curves of the multimodal model and the LDCT-based DL model did not differ significantly in DeLong's test, the multimodal model had higher accuracy, PPV, NPV, specificity and F1 score. This may be because, for nested models, DeLong's test is not a suitable method for testing the effect of additional predictors on the AUC (32). In addition, larger and more balanced validation datasets may need to be collected in the future to provide stronger statistical evidence for the superiority of the multimodal model (33).
Figure 5 Association between SHOX2/PTGER4 DNA methylation and ROS1 and TP53 mutations. (A) Genome-wide methylation-site log2FC density distribution in ROS1 and TP53; (B) comparison of methylation levels of DMPs between the Mut and WT groups of ROS1 and TP53; (C) confusion matrix between methylation level and gene mutation; (D) ROC curves for predicting ROS1 and TP53 mutations based on the β values of DMPs. DMP, differentially methylated position; Mut, mutation; WT, wild type; ROC, receiver operating characteristic; AUC, area under the receiver operating characteristic curve; OR, odds ratio; 95% CI, 95% confidence interval.
The discovery of treatable genetic alterations has brought revolutionary new treatments for lung cancer patients (34). The proto-oncogene ROS1 drives a diverse range of cancers, including non-small cell lung cancer (NSCLC). It is located on chromosome 6q22 and encodes a receptor tyrosine kinase. The overall prevalence of ROS1 fusions is reported to be 2% in NSCLC and up to 3.3% in LUAD (35,36). With targeted therapy, studies have reported a median progression-free survival of 19.2 months and a disease control rate of 90%, demonstrating a better prognosis than with other driver gene alterations (37). TP53 is a tumor suppressor gene that plays a critical role in regulating cell proliferation, tumor growth, cancer spread and drug resistance (38-40). TP53 mutation is one of the most frequent concomitant mutations in NSCLC and has been associated with poor prognosis. This study found that SHOX2/PTGER4 DNA hypermethylation, especially SHOX2 DNA hypermethylation, was associated with a higher rate of TP53 mutations and a lower rate of ROS1 mutations, suggesting poorer survival outcomes.
However, our study also has several limitations. First, the number of samples, especially in the benign lung disease group and the pGGN subgroup, was limited; thus, the results may not be generalizable. Second, we cannot rule out that the background of the pulmonary nodule LDCT images affected model performance. Third, all 299 pulmonary nodule patients were recruited from the First Affiliated Hospital of Sun Yat-sen University; subjects from other hospitals should be enrolled to further validate the model. Fourth, other methylation markers, such as RASSF1A and APC, have also been reported to be correlated with lung cancer diagnosis, but these data were difficult to collect because commercial detection kits were unavailable at the First Affiliated Hospital of Sun Yat-sen University. Fifth, the relationship between SHOX2/PTGER4 DNA methylation and TP53 gene mutations was analyzed using databases rather than data from the 299 pulmonary nodule patients, and therefore might not represent the real-world situation.
We developed a multimodal prediction model based on SHOX2/PTGER4 DNA methylation, nodule subtype and LDCT images for the rapid and cost-effective discrimination of early-stage lung cancers from benign nodules. Patients with SHOX2/PTGER4 DNA hypermethylation were enriched in TP53 mutations, indicating poor prognosis.
This study was supported by the National Natural Science Foundation of China (No. 81600065 and No. 82073805).
Conflicts of Interest: The authors have no conflicts of interest to declare.