ELSA and the Data Scientist: A Qualitative Approach
Data Science (DS) and Artificial Intelligence (AI) are transforming research, industry and society at an unprecedented pace, enabling advances in areas such as healthcare, finance, e-commerce and beyond. Despite their potential, the rapid development and widespread use of DS and AI raise issues, some of them novel, of reliability, accuracy, copyright and data protection, and bias and discrimination, among others; see, for instance, [1], [2], [3], [4]. It is therefore vital for data scientists to acknowledge the Ethical, Legal and Societal Aspects (ELSA) encountered in DS and AI projects, as this can promote critical thinking and reflection and thereby help ensure that data-driven systems and their underlying technologies are developed, deployed and used responsibly. In the framework of NFDI4DataScience, specifically in the Community and Training task area, we aim to develop ELSA guidelines for data scientists [5]. To achieve this objective, we surveyed the landscape by interviewing researchers and practitioners in the field, aiming to identify and analyse the most common ELSA challenges encountered in DS/AI projects and the ways of coping with them. The interviews were semi-structured, as this form is well suited to our purpose of collecting experiences, reflections and opinions from the participants [6]. A total of 30 interviews were conducted between November 2022 and February 2024. The participants came mainly from academia, but industry was also well represented. The application domains included, among others, healthcare, finance, engineering and digital humanities. To systematically interpret the material for manifest and latent meanings, we used qualitative content analysis [7]. To this end, we developed a categorisation of the material that provides the basis for this interpretation [8, p. 33]. Initial categories were developed deductively from the interview guide, which was itself based on existing theory and research.
Subcategories were created inductively from the interviews, following initial coding with the main categories. The categories reflect the key ELSA challenges faced by data scientists, including data protection and, more specifically, fairness, transparency, consent and intellectual property, as well as data scientists' knowledge of and attitudes towards ELSA challenges and how these influence their decision-making in a project. Our analysis reveals that data scientists are generally aware of ELSA issues, though of some more acutely than of others. For example, legal issues, especially data protection, are more prominent, particularly in application domains such as healthcare; bias is considered more during data collection and less in connection with the model used or the system deployment; and issues of transparency and explainability are also crucial, although not prevalent. The interviews also provided insight into interdisciplinary cooperation, institutionalised ELSA support and project documentation. Additionally, we recorded critical assessments of the practices followed, ranging from issues with the application of laws to the responsibility and accountability of practitioners during a project life cycle. Finally, our findings emphasise the necessity of enhancing ELSA literacy and of providing data scientists with a strong foundational understanding of ethical and legal principles. Developing recommendations and best practices for data scientists was regarded as a positive first step towards this goal.