M. Abboud (Samer) Alkarkoukly (M.D., M.Sc.)

Data Steward, Medical Data

Publications from M. Abboud (Samer) Alkarkoukly

Linking international registries to FHIR and Phenopackets with RareLink: a scalable REDCap-based framework for rare disease data interoperability

2025 - Open Access
Adam S.L. Graefe, Filip Rehburg, Samer Alkarkoukly, Daniel Danis, Ana Grönke, Miriam R. Hübner, Alexander Bartschke, Thomas Debertshäuser, Sophie A.I. Klopfenstein, Julian Saß, Julia Fleck, Mirko Rehberg, Jana Zschüntzsch, Elisabeth F. Nyoungui, Tatiana Kalashnikova, Luis Murguía-Favela, Beata Derfalvi, Nicola A.M. Wright, Shahida Moosa, Soichi Ogishima, Oliver Semler, Susanna Wiegand, Peter Kuehnen, Christopher J. Mungall, Melissa A. Haendel, Peter N. Robinson, Sylvia Thun, Oya Beyan

While Research Electronic Data Capture (REDCap) has been widely adopted in rare disease research, its unconstrained data format often leads to implementations that lack native interoperability with global health data standards, limiting secondary data use. To address this, we developed and validated RareLink, an open-source framework implementing our previously-published ontology-based rare disease common data model, enabling standardised data exchange between REDCap, international registries, and downstream analysis tools. Its preconfigured pipelines interact with the local REDCap application programming interface and enable semi-automatic import or export of data to the Global Alliance for Genomics and Health (GA4GH) Phenopackets and Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) instances, conforming to the HL7 International Patient Summary and Genomics Reporting profiles. The framework was developed in three iterative phases using retrospective and prospective clinical data from patients with various rare metabolic and neuromuscular disorders, as well as inborn errors of immunity. Phase one involved deployment across four German university hospitals for registry and data analysis purposes. Phase two integrated RareLink with the Canadian Inborn Errors of Immunity National Registry, enhancing extensibility. Phase three focuses on international implementation in South Africa and Japan to assess global scalability. Implementation feedback was continuously incorporated to validate outputs and improve usability. For evaluation purposes, we defined a simulated Kabuki syndrome cohort based on published cases and demonstrated data export to both Phenopackets and FHIR instances. RareLink can enhance the clinical utility of REDCap by enabling structured data analysis and interoperability. Its global applicability and open-source nature can support equitable rare disease research with the ultimate goal to improve patient care. 
Broader adoption and coordination with entities such as HL7 and the European Reference Networks are thus essential to realise its full potential. The framework and its documentation are freely available through GitHub and Read the Docs, respectively.

An ontology-based rare disease common data model harmonising international registries, FHIR, and Phenopackets

2025 - Open Access
Adam S. L. Graefe, Miriam R. Hübner, Filip Rehburg, Steffen Sander, Sophie A. I. Klopfenstein, Samer Alkarkoukly, Ana Grönke, Annic Weyersberg, Daniel Danis, Jana Zschüntzsch, Elisabeth F. Nyoungui, Susanna Wiegand, Peter Kühnen, Peter N. Robinson, Oya Beyan & Sylvia Thun

Although rare diseases (RDs) affect over 260 million individuals worldwide, low data quality and scarcity challenge effective care and research. This work aims to harmonise the Common Data Set by European Rare Disease Registry Infrastructure, Health Level 7 Fast Healthcare Interoperability Base Resources, and the Global Alliance for Genomics and Health Phenopacket Schema into a novel rare disease common data model (RD-CDM), laying the foundation for developing international RD-CDMs aligned with these data standards. We developed a modular-based GitHub repository and documentation to account for flexibility, extensions and further development. Recommendations on the model’s cardinalities are given, inviting further refinement and international collaboration. An ontology-based approach was selected to find a common denominator between the semantic and syntactic data standards. Our RD-CDM version 2.0.0 comprises 78 data elements, extending the ERDRI-CDS by 62 elements with previous versions implemented in four German university hospitals capturing real world data for development and evaluation. We identified three categories for evaluation: Medical Data Granularity, Clinical Reasoning and Medical Relevance, and Interoperability and Harmonisation.

Seeing the primary tumor because of all the trees: Cancer type prediction on low-dimensional data

2024 - Open Access
Julia Gehrmann, Devina Johanna Soenarto, Hidayat Kevin, Maria Beyer, Lars Quakulinski, Samer Alkarkoukly, Scarlett Berressem, Anna Gundert, Michael Butler, Ana Grönke, Simon Lennartz, Thorsten Persigehl, Thomas Zander, Oya Beyan

The Cancer of Unknown Primary (CUP) syndrome is characterized by identifiable metastases while the primary tumor remains hidden. In recent years, various data-driven approaches have been suggested to predict the location of the primary tumor (LOP) in CUP patients promising improved diagnosis and outcome. These LOP prediction approaches use high-dimensional input data like images or genetic data. However, leveraging such data is challenging, resource-intensive and therefore a potential translational barrier. Instead of using high-dimensional data, we analyzed the LOP prediction performance of low-dimensional data from routine medical care. With our findings, we show that such low-dimensional routine clinical information suffices as input data for tree-based LOP prediction models. The best model reached a mean Accuracy of 94% and a mean Matthews correlation coefficient (MCC) score of 0.92 in 10-fold nested cross-validation (NCV) when distinguishing four types of cancer. When considering eight types of cancer, this model achieved a mean Accuracy of 85% and a mean MCC score of 0.81. This is comparable to the performance achieved by approaches using high-dimensional input data. Additionally, the distribution pattern of metastases appears to be important information in predicting the LOP.

Research Projects