A Combined Approach for Biomarker Discovery and the Use of Deep Learning Models to Predict the Origin of Cancer from DNA Methylation Data

Students who carried out the science project

ภคนันท์ ทัศนาภิรมย์, กษิดิ์เดช อิ้วศรีสกุล

Science project advisors

กอบชัย ดวงรัตนเลิศ, พิษณุ จันทรเสวต

School supervising the science project

Triam Udom Suksa School (โรงเรียนเตรียมอุดมศึกษา)

Year of the science project

B.E. 2565 (2022 CE)

Science project abstract

Treatment of cancer is generally determined by the tissue of origin. However, up to 5% of all cancer cases are carcinoma of unknown primary (CUP). This type of metastatic cancer poses additional challenges to identifying the primary site and to successful treatment. Carcinogenesis is associated with extensive DNA methylation abnormalities at 5'-cytosine-phosphate-guanine-3' (CpG) islands across the genome. Several of these epigenetic modifications take place early in carcinogenesis and are widespread across tumor types. This makes DNA methylation a candidate cancer biomarker for early diagnosis, prediction of the cancer's tissue of origin, and potential optimization of treatment. We propose a Combined Approach comprising 1) biomarker discovery, 2) prediction of the origin of cancer from various data sets, with identification of biomarkers for each cancer origin by sensitivity analysis, and 3) development of an open-source application that lets general users without extensive machine learning knowledge analyze their data efficiently and with little effort. Methylation beta values were discretized to emulate the state-like progression of cancer and to accommodate missing values. Feature selection with strong L1 regularization (lambda = 1) produced 3 solution sets whose intersection contains only 286 of the initial 7,846 CpG sites. The solution intersection performed comparably to the full set and to the individual solutions, and significantly better than randomly selected sets, across all cancer stages and demographic groups. Additionally, SHAP was used to calculate the feature contribution value of each CpG site for each primary site. Biomarkers whose absolute contribution values were upper outliers were selected and reviewed further to confirm their potential association with the respective primary sites; several CpG sites may serve as biomarkers and could be biologically associated with their respective cancers.
Further research is needed to elucidate the relationship between biomarkers and primary sites, to improve model performance for a few primary sites, and to extend applicability to real medical samples.
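
The three data-handling steps named in the abstract (discretizing beta values, intersecting feature-selection solutions, and keeping upper-outlier contributions) can be illustrated with a minimal Python sketch. It is not the project's implementation. The cut-offs 0.3 and 0.7, the state codes, the Tukey fence (Q3 + 1.5 × IQR) used as the outlier rule, and all function and site names are assumptions made for illustration only; the contribution values would come from SHAP in the project, whereas here they are hand-written numbers.

```python
from statistics import quantiles

MISSING = 0  # hypothetical state code reserved for missing beta values


def discretize_beta(beta, low=0.3, high=0.7):
    """Map one methylation beta value in [0, 1] to a discrete state.

    The cut-offs `low` and `high` are illustrative assumptions,
    not values taken from the project.
    """
    if beta is None:
        return MISSING  # missing values get their own state
    if beta < low:
        return 1  # mostly unmethylated
    if beta <= high:
        return 2  # intermediate
    return 3  # mostly methylated


def intersect_solutions(solutions):
    """Return the CpG sites present in every feature-selection solution."""
    return set.intersection(*(set(s) for s in solutions))


def upper_outliers(contributions, k=1.5):
    """Keep sites whose absolute contribution exceeds Q3 + k * IQR.

    The Tukey fence is an assumed outlier rule; the abstract only says
    "upper outlier" without defining it.
    """
    magnitudes = {site: abs(value) for site, value in contributions.items()}
    q1, _, q3 = quantiles(magnitudes.values(), n=4, method="inclusive")
    fence = q3 + k * (q3 - q1)
    return {site for site, m in magnitudes.items() if m > fence}


# Usage with made-up site names and numbers
states = [discretize_beta(b) for b in (0.05, None, 0.5, 0.92)]
shared = intersect_solutions([
    ["site_a", "site_b", "site_c"],
    ["site_b", "site_c"],
    ["site_c", "site_b", "site_d"],
])
top = upper_outliers(
    {"site_a": 0.01, "site_b": -0.02, "site_c": 0.03, "site_d": 0.02, "site_e": -0.9}
)
```

Giving missing values their own state keeps a sample usable without imputing a beta value, which is one plausible reading of "to accommodate missing values".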