Julia Gehrmann (M.Sc.)

Doctoral Candidate and Research Assistant
ORCID: 0000-0002-4101-5458

Biography

Julia Gehrmann studied Computer Science with a minor in Biology at RWTH Aachen University from 2016 to 2022, focusing on Data Science and Machine Learning. As a student research assistant, she gained practical experience at the Institute for Computational Genomics Aachen (2018 to 2021), Fraunhofer FIT (2021) and the Institute for Biomedical Informatics Köln (BI-K) (2022). In her master's thesis at BI-K, she developed a data access and integration workflow for medical data science. Since July 2022, Julia has been a research assistant and PhD candidate at BI-K, where she focuses on multimodal data integration for medical AI applications and on promoting the use of real-world medical data in research.

Areas of Expertise

Research Focus

  • Multimodal Data Integration
  • Medical Real-World Data

Current Teachings

AI in Medicine Series

Bachelor Studies, Clinical Semesters, Clinicians, Doctoral Studies, Master Studies, PostDoc, Preclinical Semesters; WiSe + SoSe & SoSe WiSe

Artificial intelligence is already fundamentally changing medicine, but how do the underlying methods work, and what opportunities and challenges do they present? In this series of seminars, each session covers a new, practical topic, including the basics of selected AI methods, ethical challenges and possible solutions. The lectures are held in German or English, depending on the speaker. They are thematically linked but self-contained.

WissPro - Literature Research

Preclinical Semesters; SoSe

This course is offered to medical students interested in WissPro 1 and 2, involving literature research. It includes an introductory lecture on literature search strategies and best practices, as well as presentations on the topics offered by different members of our institute. The work is organised according to a schedule with several checkpoints and concludes with on-site presentations by the participating students.

Medical AI - From Data Chaos to the Right Cancer Therapy - Data Preparation for AI in Oncology

Clinical Semesters, Clinicians, Doctoral Studies, PostDoc, Preclinical Semesters; WiSe + SoSe & SoSe WiSe

Everyday clinical practice generates large amounts of data that can yield valuable insights for research, particularly for improving diagnoses and therapies. Before these data can be used in artificial intelligence (AI), they must be carefully prepared. This course introduces participants to data quality and data preprocessing for AI. They work with a synthetic dataset modeled on real clinical data from oncology and experience the challenges of data preparation first-hand.

The course begins with the fundamentals of data quality and data preparation for AI models. Using a current oncological research project as an example, it then explains typical challenges in preparing routine medical data for AI-based analyses. In the practical part, participants apply what they have learned by working with Python and analyzing a dataset on their own. They identify problems in the raw data, correct erroneous or incomplete entries, and prepare the data so that it can be used for an AI-supported analysis.

The course is divided into three parts. A two-hour introductory session, held at the institute, covers the theoretical foundations and explains the assignment. Participants then have one week to examine the data quality independently as homework and to prepare the dataset for the AI analysis. In a concluding three-hour exercise session, which can be attended either on site or online, the results are discussed together and challenges are reviewed. Students who attend both sessions receive a certificate of participation on request.
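To give a rough idea of the kind of data-quality check and cleanup the course describes, here is a minimal Python sketch. It is not course material: the records, field names (`patient_id`, `age`, `tumor_stage`), the plausibility range for ages and the median imputation are all assumptions chosen for illustration.

```python
from statistics import median

# Hypothetical synthetic records; field names and values are invented for illustration.
records = [
    {"patient_id": "P001", "age": "64", "tumor_stage": "II"},
    {"patient_id": "P002", "age": "", "tumor_stage": "iii"},      # missing age, lowercase stage
    {"patient_id": "P003", "age": "-5", "tumor_stage": "IV"},     # implausible age
    {"patient_id": "P001", "age": "64", "tumor_stage": "II"},     # exact duplicate
    {"patient_id": "P004", "age": "71", "tumor_stage": "unknown"},  # invalid stage
]

VALID_STAGES = {"I", "II", "III", "IV"}  # assumed coding scheme


def parse_age(raw):
    """Return the age as int, or None if it is missing or implausible."""
    try:
        age = int(raw)
    except (TypeError, ValueError):
        return None
    return age if 0 <= age <= 120 else None


def clean(rows):
    """Drop exact duplicates, normalise stages, and impute missing ages with the median."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # skip rows that repeat an earlier row exactly
        seen.add(key)
        stage = row["tumor_stage"].strip().upper()
        cleaned.append({
            "patient_id": row["patient_id"],
            "age": parse_age(row["age"]),
            "tumor_stage": stage if stage in VALID_STAGES else None,
        })
    known_ages = [r["age"] for r in cleaned if r["age"] is not None]
    fallback = median(known_ages) if known_ages else None
    for r in cleaned:
        r["age_imputed"] = r["age"] is None  # keep a flag so imputation stays visible
        if r["age"] is None:
            r["age"] = fallback
    return cleaned


cleaned_records = clean(records)
for r in cleaned_records:
    print(r)
```

Keeping the `age_imputed` flag is a design choice in this sketch: it lets a later analysis distinguish measured values from filled-in ones.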

Publications from Julia Gehrmann

What prevents us from reusing medical real-world data in research

2023 - Open Access

Medical real-world data stored in clinical systems represents a valuable knowledge source for medical research, but its usage is still challenged by various technical and cultural aspects. Analyzing these challenges and suggesting measures for future improvement are crucial to improve the situation. This comment paper represents such an analysis from the perspective of research.

RGT: a toolbox for the integrative analysis of high throughput regulatory genomics data

2023 - Open Access -
Zhijian Li, Chao-Chung Kuo, Fabio Ticconi, Mina Shaigan, Julia Gehrmann, Eduardo Gade Gusmao, Manuel Allhoff, Martin Manolov, Martin Zenke, Ivan G Costa

Background

Massive amounts of data are produced by combining next-generation sequencing with complex biochemistry techniques to characterize regulatory genomics profiles, such as protein–DNA interaction and chromatin accessibility. Interpretation of such high-throughput data typically requires different computation methods. However, existing tools are usually developed for a specific task, which makes it challenging to analyze the data in an integrative manner.

Results

We here describe the Regulatory Genomics Toolbox (RGT), a computational library for the integrative analysis of regulatory genomics data. RGT provides different functionalities to handle genomic signals and regions. Based on that, we developed several tools to perform distinct downstream analyses, including the prediction of transcription factor binding sites using ATAC-seq data, identification of differential peaks from ChIP-seq data, and detection of triple helix mediated RNA and DNA interactions, visualization, and finding an association between distinct regulatory factors.

Conclusion

We present RGT, a framework that facilitates the customization of computational methods for analyzing genomic data in specific regulatory genomics problems. RGT is a comprehensive and flexible Python package for analyzing high-throughput regulatory genomics data and is available at: https://github.com/CostaLab/reg-gen. The documentation is available at: https://reg-gen.readthedocs.io

Early Multimodal Data Integration for Data-Driven Medical Research - A Scoping Review

2024 - Open Access

Introduction: Data-driven medical research (DDMR) needs multimodal data (MMD) to sufficiently capture the complexity of clinical cases. Methods for early multimodal data integration (MMDI), i.e. integration of the data before performing a data analysis, vary from basic concatenation to applying Deep Learning, each with distinct characteristics and challenges. Besides early MMDI, there exists late MMDI which performs modality-specific data analyses and then combines the analysis results.

Methods: We conducted a scoping review, following PRISMA guidelines, to find and analyze 21 reviews on methods for early MMDI published between 2019 and 2024.

Results: Our analysis categorized these methods into four groups and summarized group-specific characteristics that are relevant for choosing the optimal method combination for MMDI pipelines in DDMR projects. Moreover, we found that early MMDI is often performed by executing several methods sequentially in a pipeline. This early MMDI pipeline is usually subject to manual optimization.

Discussion: Our focus was on structural integration in DDMR. The choice of MMDI method depends on the research setting, complexity, and the research team's expertise. Future research could focus on comparing early and late MMDI approaches as well as automating the optimization of MMDI pipelines to integrate vast amounts of real-world medical data effectively, facilitating holistic DDMR.
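The distinction the abstract draws between early and late MMDI can be illustrated with a small Python sketch. It is not taken from the review: the two toy modalities, the function names and the stand-in `score` function (a simple mean in place of a real model) are assumptions made for illustration, and only the most basic early method, concatenation, is shown.

```python
# Toy data for two modalities of the same three patients; all values are invented.
clinical = [[0.2, 1.0], [0.9, 0.0], [0.5, 1.0]]
imaging = [[0.7], [0.1], [0.4]]


def early_integration(*modalities):
    """Concatenate each patient's feature vectors across modalities before any analysis."""
    return [
        sum((list(features) for features in patient), [])
        for patient in zip(*modalities)
    ]


def score(features):
    """Stand-in for a trained model: the mean of the feature vector (illustrative only)."""
    return sum(features) / len(features)


def late_integration(*modalities):
    """Analyse each modality separately, then average the per-modality results."""
    per_modality = [[score(f) for f in modality] for modality in modalities]
    return [sum(scores) / len(scores) for scores in zip(*per_modality)]


fused = early_integration(clinical, imaging)   # one joint vector per patient
early_scores = [score(f) for f in fused]       # a single analysis on the joint vectors
late_scores = late_integration(clinical, imaging)

print(fused)
print(early_scores)
print(late_scores)
```

In the early variant a single analysis sees all features at once, whereas the late variant only combines finished per-modality results, which is why the two lists of scores generally differ.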

Seeing the primary tumor because of all the trees: Cancer type prediction on low-dimensional data

2024 - Open Access -
Julia Gehrmann, Devina Johanna Soenarto, Hidayat Kevin, Maria Beyer, Lars Quakulinski, Samer Alkarkoukly, Scarlett Berressem, Anna Gundert, Michael Butler, Ana Grönke, Simon Lennartz, Thorsten Persigehl, Thomas Zander, Oya Beyan

The Cancer of Unknown Primary (CUP) syndrome is characterized by identifiable metastases while the primary tumor remains hidden. In recent years, various data-driven approaches have been suggested to predict the location of the primary tumor (LOP) in CUP patients promising improved diagnosis and outcome. These LOP prediction approaches use high-dimensional input data like images or genetic data. However, leveraging such data is challenging, resource-intensive and therefore a potential translational barrier. Instead of using high-dimensional data, we analyzed the LOP prediction performance of low-dimensional data from routine medical care. With our findings, we show that such low-dimensional routine clinical information suffices as input data for tree-based LOP prediction models. The best model reached a mean Accuracy of 94% and a mean Matthews correlation coefficient (MCC) score of 0.92 in 10-fold nested cross-validation (NCV) when distinguishing four types of cancer. When considering eight types of cancer, this model achieved a mean Accuracy of 85% and a mean MCC score of 0.81. This is comparable to the performance achieved by approaches using high-dimensional input data. Additionally, the distribution pattern of metastases appears to be important information in predicting the LOP.
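For readers unfamiliar with the two metrics reported above, the following Python sketch computes accuracy and the multiclass generalisation of the Matthews correlation coefficient from lists of labels. It is an illustration only, not the authors' evaluation code: the cancer-type labels are invented, and the nested cross-validation used in the paper is not reproduced here.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multiclass MCC computed from label counts.

    With s samples, c correct predictions, t_k true occurrences and
    p_k predictions of class k:
        MCC = (c*s - sum(t_k*p_k)) / sqrt((s^2 - sum(p_k^2)) * (s^2 - sum(t_k^2)))
    """
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_counts = Counter(y_true)
    p_counts = Counter(y_pred)
    labels = set(t_counts) | set(p_counts)
    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    spread_true = s * s - sum(t_counts[k] ** 2 for k in labels)
    spread_pred = s * s - sum(p_counts[k] ** 2 for k in labels)
    denominator = sqrt(spread_true * spread_pred)
    # By convention, return 0 when the denominator vanishes (e.g. one class only).
    return numerator / denominator if denominator else 0.0


# Hypothetical labels for six patients and three cancer types.
y_true = ["lung", "lung", "breast", "breast", "colon", "colon"]
y_pred = ["lung", "lung", "breast", "colon", "colon", "colon"]

acc = accuracy(y_true, y_pred)
mcc = multiclass_mcc(y_true, y_pred)
print(acc, mcc)
```

Unlike accuracy, the MCC takes the distribution of both true and predicted classes into account, so it is less easily inflated when some cancer types are much more frequent than others.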

Large language models for literature reviews - an exemplary comparison of LLM-based approaches with manual methods

2025 - Open Access

Large Language Models (LLMs) and LLM-based tools are increasingly popular for various tasks, including literature reviews. This trend holds significant potential in fields like healthcare and medical informatics, where timely updates on new research findings can have life-saving implications. However, the sensitive nature of these fields demands high reliability and trustworthiness. In this study, we assess the suitability of widely used LLM-based tools for conducting literature reviews in healthcare and medical informatics across two scenarios. First, we evaluated the tools’ performance and reliability in executing a systematic, scientific literature review by replicating the exact methodology of a recently accepted review we conducted. Second, we explored the tools’ effectiveness in quickly retrieving relevant information by testing their responses to differently phrased queries, focusing on the neutrality and balance of the information provided. Our findings indicate that while LLM-based tools can offer a useful initial overview of an unfamiliar topic, they are less effective for in-depth literature reviews. Furthermore, the choice of the specific tool is critical, as significant differences were observed in both the generated text and the references provided across tools. Additionally, our results suggest that prompts crafted in a scientific style with a negative connotation towards the research hypothesis tend to result in more balanced discussions compared to those framed in everyday language with a positive connotation towards the research hypothesis.