Anna-Lena’s expertise: Software Technology

Anna-Lena Lamprecht

Anna-Lena Lamprecht is an assistant professor in Software Technology at Utrecht University, within the Department of Information and Computing Sciences. She describes herself as a Research Software Engineering (RSE) academic, focusing on software development processes in research and on developing methods and technology to improve them. UBC asks Anna-Lena about her work in bioinformatics.

Bioinformatician at heart

To get acquainted, Anna-Lena introduces herself: “Although I am scientifically grounded in computer science and look at the world through the eyes of a software engineer, I have always had a strong interest in applications in the different sciences. I started to combine these interests very early: I studied Applied Computer Science at the University of Göttingen in Germany, a new program at that time. During both my Bachelor and Master studies, I took several biology and bioinformatics courses, including lab practicals in genetics and microbiology. In addition, I did a two-month internship with a bioinformatics research group at Beijing University, China.”

So you’re a bioinformatician at heart? “Of course my personal background and interests had a strong influence on how I shaped my career. And bioinformatics is a domain where software has played a key role for a long time. The community has created a rich infrastructure of data, ontologies, tools, services and workflows, and is, like no other field, ready and suited to challenge new RSE methods. This is illustrated by the strong life science presence in the international RSE and Semantic Web research communities, and by the impressive informatics components in networks like BioSB and ELIXIR.”

Automated workflow composition

When asked to give an example of her work, Anna-Lena answers: “Let’s look at a recent collaborative project on workflow exploration in proteomics. This was a joint effort whose outcomes we published in Bioinformatics last year.

Our goal in the study was to automatically compose workflows for data analysis in mass spectrometry-based proteomics. We used workflow synthesis technology from my previous work, the EDAM data and methods ontology, and a selection of EDAM-annotated tools from the bio.tools registry. This combination allowed us to describe the intended workflows at an abstract, conceptual level, and let the system automatically explore workflows that meet the specification. For example, we specified a workflow as taking mass spectra in ThermoRAW format as input, using peptide identification and retention time prediction along the way, and finally producing an amino acid index (in any format). This request resulted in several possible workflows, of which we selected a few for further analysis and benchmarking. With the study we demonstrated that the latest developments in life science infrastructure enable the (partial) automation of the often tedious workflow composition process.”
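To make the idea of specification-driven composition more concrete, here is a minimal sketch in Python. It is not the synthesis technology used in the study: it is a plain breadth-first search over a toy registry, and every tool name, data type and operation label in it is hypothetical rather than taken from EDAM or bio.tools. The specification has the same three parts as in the example above: the data you start with, the result you want, and the operations that must occur along the way.

```python
from collections import deque
from typing import NamedTuple


class Tool(NamedTuple):
    name: str       # hypothetical tool name
    operation: str  # operation the tool performs
    consumes: str   # data type required as input
    produces: str   # data type produced as output


# Toy registry. All entries are illustrative assumptions, not real annotations.
TOOLS = [
    Tool("converter", "format conversion", "raw spectra", "open spectra"),
    Tool("identifier", "peptide identification", "open spectra", "peptide list"),
    Tool("rt_predictor", "retention time prediction", "peptide list", "retention times"),
    Tool("indexer", "index generation", "retention times", "amino acid index"),
    Tool("quick_indexer", "index generation", "peptide list", "amino acid index"),
]


def compose(tools, start, goal, required_ops, max_len=5):
    """Return every linear tool chain that turns `start` into `goal`
    and uses all operations in `required_ops` (shortest chains first)."""
    found = []
    queue = deque([(start, [])])  # (current data type, chain so far)
    while queue:
        data, chain = queue.popleft()
        used_ops = {tool.operation for tool in chain}
        if data == goal and required_ops <= used_ops:
            found.append([tool.name for tool in chain])
            continue
        if len(chain) == max_len:
            continue  # bound the search depth
        for tool in tools:
            # A tool fits if it accepts the current data type; using each
            # tool at most once per chain keeps the search finite.
            if tool.consumes == data and tool not in chain:
                queue.append((tool.produces, chain + [tool]))
    return found


if __name__ == "__main__":
    spec = {"peptide identification", "retention time prediction"}
    for workflow in compose(TOOLS, "raw spectra", "amino acid index", spec):
        print(" -> ".join(workflow))
```

With both operations required, the shorter chain through the hypothetical quick_indexer is rejected because it skips retention time prediction. Dropping that constraint makes both chains valid candidates, which is the kind of choice the user would then compare and benchmark.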

Roadmap and research agenda

In addition, Anna-Lena shares enthusiastically: “Meanwhile we have produced further results and expanded our collaboration on this topic. One recent highlight is a Lorentz Center workshop on ‘Automated Workflow Composition in the Life Sciences’ that we ran in March 2020, just before the Covid-19 countermeasures were implemented. With about 40 researchers from various fields of eScience and the life sciences, including several high-profile academics, we discussed the state of the art, latest developments and open challenges of automatically composing scientific workflows. The workshop resulted in a jointly created roadmap and research agenda for the time to come.”

Reliable as route planning

When asked for her ultimate goal in bioinformatics, Anna-Lena uses a metaphor: “With regard to the work I just mentioned, my vision is a workflow exploration interface that is as easy to use and as reliable as route planning with Google Maps or similar. Instead of entering start, destination and preferred means of transport, users would enter the data they have, the kind of results they want, and possibly additional constraints for the workflow. The ‘route planner’ would then show different possible workflows, along with information that helps to compare the different options, so that the user can select the one(s) they want to execute.

Generally, I think that scientific quality and correctness of research software is an issue that needs to receive more attention in the future. This is a complex problem. At the moment we do not even have a clear definition of ‘correctness’ in this context. This is one of the challenges that I want to work on in the coming years. Obviously these goals are not only relevant for bioinformatics, but again I think that this domain has the greatest potential to pioneer the developments.”

Cooperating community

Within the UBC, Anna-Lena also meets interesting colleagues: “Several people work with workflows, and some joined our recent Lorentz Center workshop on automated workflow composition. From that we have a good basis for future cooperation and new application case studies. Another area of cooperation is in teaching. I teach the Programming in Python course that is taken by many life science students. The course was initiated by UBC members in 2018, and coincidentally my department had started a Computational Thinking course with almost identical learning goals. Since 2019 I have taught both courses together under the name Computational Thinking and Programming in Python. Within the UBC we frequently discuss experiences and share material with each other.”

Seminars, symposia and sharing coffee

“For my work it is essential to observe how researchers work with software, especially how they use workflows, and what the current trends and problems are. The UBC community has a lot to offer here. I learn about this from talks at the UBC seminar and the symposium, but also, and maybe even more, from talking to people during poster sessions and coffee breaks. The younger members of my group do not have a background in bioinformatics, so for them it is also a great opportunity to learn about the field and to practice effective communication with people from other domains.”

FAIR software

“We design our technologies to be domain-independent, so they are in principle applicable to all computational science disciplines, which nowadays is almost all of them. Next to bioinformatics we have some really interesting applications in the geosciences. For example, with the Department of Human Geography and Spatial Planning, we work on automated workflow composition for question-based analysis of geographic information. Another relevant research topic to name here is probably the work on FAIR software that I have been involved in for a while now. Last year we published the first comprehensive paper on FAIR principles for research software, which has received a lot of attention and is one of the foundations of the eScience Center’s Five Recommendations for FAIR Software. These principles are again relevant to all scientific areas that rely on research software.”

Credit for your software

Since Anna-Lena works together with a lot of different people, she probably has some suggestions for those who are new to her field of expertise. Here’s her advice: “Bioinformaticians: Please do not make software only to get results for a paper, but care about good software. Acquire decent RSE skills, follow best practices, and make your software open and FAIR. It will help other people to reuse and credit your software development efforts. When you register your tool in a registry like bio.tools, along with rich metadata, we can furthermore include it in our workflow exploration engine, so that it can become part of automatically created workflows.”

“Software technologists in RSE should of course follow the same best practices. RSE researchers aim to understand the problems and needs more broadly, so they should not limit themselves to applying current technology, but think forward and develop new software technology for the future.”