
Evaluation of different variants of the Cox model for the prognosis of cancer patients from public sequencing and clinical data

Abstract : Cancer has been the leading cause of premature mortality (death before the age of 65) in France since 2004. For the same organ, each cancer is unique, and personalized prognosis is therefore an important aspect of patient management and follow-up. The decrease in sequencing costs over the last decade has made it possible to measure the molecular profiles of many tumors on a large scale. Thus, the TCGA database provides RNA-seq data of tumors, clinical data (age, sex, grade, stage, etc.), and follow-up data for the associated patients over several years (including patient survival, possible recurrence, etc.). New discoveries are thus made possible in terms of biomarkers built from transcriptomic data, with individualized prognoses. These advances require the development of large-scale data analysis methods that jointly take into account survival data (right-censored), clinical characteristics, and molecular profiles of patients. In this context, the main goal of the thesis is to compare and adapt methodologies to construct prognostic risk scores for survival or recurrence of patients with cancer from sequencing and clinical data. The Cox model (semi-parametric) is widely used to model these survival data, and allows linking them to explanatory variables. The RNA-seq data from TCGA contain more than 20,000 genes for only a few hundred patients. The number p of variables then exceeds the number n of patients, and parameter estimation is subject to the “curse of dimensionality”. The two main strategies to overcome this issue are penalization methods and gene pre-filtering. Thus, the first objective of this thesis is to compare the classical penalization methods for the Cox model (i.e. ridge, lasso, elastic net, adaptive elastic net). To this end, we use both real data and simulated data, the latter allowing us to control the amount of information contained in the transcriptomic data.
Then, the second issue addressed concerns the univariate pre-filtering of genes before using a multivariate Cox model. We propose a methodology to increase the stability of the genes selected, and to choose the filtering thresholds by optimizing the predictions. Finally, although the cost of sequencing (RNA-seq) has decreased drastically over the last decade, it remains too high for routine use in practice. In a final section, we show that the sequencing depth of miRNAs can be reduced without degrading the quality of predictions for some TCGA cancers, but not for others.
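As an illustration of the penalized Cox models compared in the abstract, the following minimal sketch evaluates an elastic-net-penalized negative log partial likelihood for right-censored data. It is not taken from the thesis: the function names, the Breslow handling of tied event times, and the scaling of the penalty (`lam`, `alpha`) are assumptions chosen for illustration. With `alpha=1` the penalty reduces to the lasso and with `alpha=0` to ridge.

```python
import math


def neg_log_partial_likelihood(times, events, X, beta):
    """Negative Cox log partial likelihood (Breslow form for tied times).

    times  : follow-up time of each patient
    events : 1 if the event was observed, 0 if right-censored
    X      : list of covariate rows (one row per patient)
    beta   : regression coefficients (one per covariate)
    """
    # Linear predictor (risk score) of each patient.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored patients only appear in risk sets
        # Risk set: patients still under observation at time t_i.
        risk = [eta[j] for j, t_j in enumerate(times) if t_j >= t_i]
        # Numerically stable log-sum-exp over the risk set.
        m = max(risk)
        log_sum = m + math.log(sum(math.exp(r - m) for r in risk))
        total += eta[i] - log_sum
    return -total


def elastic_net_penalty(beta, lam, alpha):
    """Elastic net penalty; the scaling used here is an assumed convention."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


def penalized_objective(times, events, X, beta, lam, alpha):
    """Objective to minimize over beta in a penalized Cox model."""
    return (neg_log_partial_likelihood(times, events, X, beta)
            + elastic_net_penalty(beta, lam, alpha))
```

In the p > n setting described above, the unpenalized objective has no unique minimizer, which is why a penalty term of this kind (or a pre-filtering step) is needed. An adaptive elastic net would additionally weight each coefficient in the L1 term; that variant is not shown here.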

Contributor : Abes Star
Submitted on : Thursday, April 1, 2021 - 4:16:08 PM
Last modification on : Tuesday, March 29, 2022 - 3:09:31 AM
Long-term archiving on : Friday, July 2, 2021 - 7:06:34 PM


Version validated by the jury (STAR)


  • HAL Id : tel-03188077, version 1



Rémy Jardillier. Evaluation de différentes variantes du modèle de Cox pour le pronostic de patients atteints de cancer à partir de données publiques de séquençage et cliniques. Ingénierie de l'environnement. Université Grenoble Alpes [2020-..], 2020. Français. ⟨NNT : 2020GRALS008⟩. ⟨tel-03188077⟩


