Large language models for literature reviews – an exemplary comparison of LLM-based approaches with manual methods
Large Language Models (LLMs) and LLM-based tools are increasingly popular for a variety of tasks, including literature reviews. This trend holds significant potential in fields such as healthcare and medical informatics, where timely awareness of new research findings can have life-saving implications. However, the sensitive nature of these fields demands high reliability and trustworthiness. In this study, we assessed the suitability of widely used LLM-based tools for conducting literature reviews in healthcare and medical informatics across two scenarios. First, we evaluated the tools’ performance and reliability in executing a systematic, scientific literature review by replicating the exact methodology of a recently accepted review we had conducted. Second, we explored the tools’ effectiveness in quickly retrieving relevant information by testing their responses to differently phrased queries, focusing on the neutrality and balance of the information provided. Our findings indicate that while LLM-based tools can offer a useful initial overview of an unfamiliar topic, they are less suited to in-depth literature reviews. Moreover, the choice of tool is critical: we observed significant differences across tools in both the generated text and the references provided. Finally, our results suggest that prompts crafted in a scientific style with a negative connotation towards the research hypothesis tend to yield more balanced discussions than prompts framed in everyday language with a positive connotation towards the hypothesis.