Identifying and optimizing human endometrial gene expression signatures for endometrial dating.
Diaz-Gimeno P., Sebastian-Leon P., Sanchez-Reyes JM., Spath K., Aleman A., Vidal C., Devesa-Peiro A., Labarta E., Sánchez-Ribas I., Ferrando M., Kohls G., García-Velasco JA., Seli E., Wells D., Pellicer A.
STUDY QUESTION: What are the key considerations for developing an enhanced transcriptomic method for dating secretory-phase endometrial tissue?
SUMMARY ANSWER: Multiple gene expression signature combinations can serve as biomarkers for endometrial dating, but their predictive performance varies and depends on the number and identity of the genes included in the prediction model, the dataset characteristics and the technology used to measure gene expression.
WHAT IS KNOWN ALREADY: The new generation of transcriptomic endometrial dating (TED) tools developed in the last decade varies in the technology used to measure gene expression, in gene makeup and in prediction model design. A detailed study comparing prediction performance across signatures, to understand signature behaviour and the discrepancies in gene content between them, is lacking.
STUDY DESIGN, SIZE, DURATION: A multicentre prospective study was performed between July 2018 and October 2020 at five centres belonging to the same group of clinics in Spain. The study recruited 281 patients, of whom 225 Caucasian patients undergoing IVF treatment were finally included in the gene expression analysis. After preprocessing and batch effect filtering, gene expression measurements from 217 patients were analysed with artificial intelligence algorithms (support vector machine, random forest and k-nearest neighbours) to evaluate different prediction models. In addition, secretory-phase endometrial transcriptomes from Gene Expression Omnibus (GEO) datasets were analysed for 137 women to study the endometrial dating capacity of genes, both individually and grouped by signature. This provided data on the consistency of prediction across different gene expression technologies and datasets.
PARTICIPANTS/MATERIALS, SETTING, METHODS: Endometrial biopsies were analysed using a targeted TruSeq (Illumina) custom RNA expression panel called the endometrial dating panel (ED panel).
This panel included 301 genes previously considered relevant for endometrial dating, as well as new genes selected for their anticipated value in detecting the secretory phase. The final samples (n = 217) were divided into a training set for signature discovery and an independent test set for evaluating the predictive performance of the new signature. In addition, secretory-phase endometrial transcriptomes from GEO were analysed for 137 women to study the endometrial dating capacity of genes, both individually and grouped by signature. The predictive performance of these signatures was compared according to gene set size.
MAIN RESULTS AND THE ROLE OF CHANCE: Testing of the ED panel allowed the development of a model based on a new 73-gene signature, which we termed 'TED'. This model provides an enhanced tool for consistently dating secretory-phase progression, especially in the mid-secretory endometrium (3-8 days after progesterone (P) administration, i.e. P + 3 to P + 8, in a hormone replacement therapy cycle). In an independent test set, this new model showed the best predictive capacity for staging secretory-phase endometrial tissue, especially within the expected window of implantation (mean 114.5 ± 7.2 h of progesterone administration; range in our patient population 82-172 h). Published gene sets currently used for endometrial dating and the new TED genes were evaluated in parallel in whole-transcriptome datasets and in the ED panel dataset. TED signature performance was consistently excellent across all datasets assessed, frequently outperforming previously published gene sets with a smaller number of genes for dating the secretory-phase endometrium. This optimized set therefore showed consistent predictions across datasets.
LARGE SCALE DATA: The data used in this study are partially available in the GEO database (identifiers GSE4888, GSE29981, GSE58144 and GSE98386).
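The training/test evaluation described above can be illustrated with a minimal, stdlib-only Python sketch. It is not the authors' pipeline: it uses only one of the three named algorithms (k-nearest neighbours), synthetic toy expression values rather than study data, and hypothetical gene names (G1-G4, N1-N20) and stage labels. It shows how a dating model's accuracy on a held-out test set depends on which genes a signature contains, not simply on how many.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, genes, k=3):
    """Predict a stage label for `query` by majority vote among its k nearest
    training samples, using Euclidean distance over the selected genes only."""
    dists = []
    for profile, label in train:
        d = math.sqrt(sum((profile[g] - query[g]) ** 2 for g in genes))
        dists.append((d, label))
    dists.sort(key=lambda t: t[0])
    # Counter keeps insertion order, so vote ties go to the nearer neighbour.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, test, genes, k=3):
    """Fraction of held-out samples whose stage is predicted correctly."""
    correct = sum(knn_predict(train, p, genes, k) == label for p, label in test)
    return correct / len(test)


def make_toy_data(n_per_stage, stages, informative, noise, rng):
    """Synthetic profiles: informative genes shift with stage, others are noise."""
    samples = []
    for stage_index, stage in enumerate(stages):
        for _ in range(n_per_stage):
            profile = {g: stage_index + rng.gauss(0, 0.3) for g in informative}
            profile.update({g: rng.gauss(0, 1.0) for g in noise})
            samples.append((profile, stage))
    return samples


rng = random.Random(0)
stages = ["P+3", "P+5", "P+7"]  # hypothetical stage labels
informative = ["G1", "G2", "G3", "G4"]  # hypothetical stage-dependent genes
noise = [f"N{i}" for i in range(1, 21)]  # hypothetical uninformative genes

data = make_toy_data(20, stages, informative, noise, rng)
rng.shuffle(data)
split = int(0.7 * len(data))  # 70% training, 30% independent test
train, test = data[:split], data[split:]

# Hypothetical signatures of different size and gene identity.
signatures = {
    "small_informative": informative,
    "large_mixed": informative + noise,
    "noise_only": noise[:4],
}
for name, genes in signatures.items():
    print(name, len(genes), round(accuracy(train, test, genes), 2))
```

In this toy setting, the small informative signature is expected to date samples far better than the noise-only one, while the large mixed signature is typically diluted by its uninformative genes. Support vector machine and random forest models could be swapped in with a library such as scikit-learn under the same train/test split.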
LIMITATIONS, REASONS FOR CAUTION: Although dating the endometrial biopsy is crucial for investigating endometrial progression and the receptivity process, further studies are needed to confirm whether endometrial dating methods in general are clinically useful and to guide the specific use of TED in the clinical setting.
WIDER IMPLICATIONS OF THE FINDINGS: Multiple gene signature combinations provide adequate endometrial dating, but their predictive performance depends on the identity of the genes included, the gene expression platform, the algorithms used and the dataset characteristics. TED is a next-generation, gene expression-based endometrial assessment tool for accurately dating endometrial progression, especially during the mid-secretory phase.
STUDY FUNDING/COMPETING INTEREST(S): Research funded by the IVI Foundation (1810-FIVI-066-PD). P.D.-G.'s visiting scientist fellowship at Oxford University (BEFPI/2010/032) and Josefa Maria Sanchez-Reyes' predoctoral fellowship (ACIF/2018/072) were supported by a programme of the Generalitat Valenciana funded by the Spanish government. A.D.-P. is supported by the FPU/15/01398 predoctoral fellowship from the Ministry of Science, Innovation and Universities (Spanish Government). D.W. received support from the NIHR Oxford Biomedical Research Centre. The authors have no competing interests to declare.