Bioinformatics is now a mature and thriving research field, indispensable in addressing the challenges of life science research. Paulien Hogeweg and Ben Hesper coined the term bioinformatics as a work concept. In the 50 years since, the field has become more and more relevant. Will bioinformatics disappear and become an integral part of the life sciences? Or is bioinformatics the biology of the future?
Bioinformatics: the origin of a term and a concept
Many mathematical biologists cared mainly about birds and ecosystems. Two bioinformaticians avant la lettre, Paulien Hogeweg and Ben Hesper, were also interested in molecules and pioneered RNA folding as a tool to understand information accumulation in biological systems. As Paulien Hogeweg writes in The Roots of Bioinformatics in Theoretical Biology (PLOS Computational Biology): ‘At a minimum, we felt that information processing could serve as a useful metaphor for understanding living systems. We therefore thought that in addition to biophysics and biochemistry, it was useful to distinguish bioinformatics as a research field (or what we termed a work concept).’ Later on, as protein structure and sequence data accumulated, the field that studied these data adopted the name bioinformatics. Interestingly, with the accumulation of data in modern-day life sciences, the original idea of information processing is becoming more and more relevant again.
The current bioinformatics community in Utrecht is clustered in the Utrecht Bioinformatics Center (UBC). Two of its members share their experiences of bioinformatics.
Pioneering in the ’90s
Berend Snel, professor of bioinformatics, describes how he found his way into the field: “I was triggered by watching ‘Artificial Life’ on a Sunday evening in the ’90s. I discovered a new way of looking at life. My decision to study biology in Utrecht was greatly influenced by the work and vision of Paulien Hogeweg. During my studies, sequence analysis was starting to become important, which made the field of bioinformatics even more interesting.
The European gold mine of data was available first at the European Molecular Biology Laboratory (EMBL), where the computational biologists were all pioneers: among others Jaap Heringa (Elixir), Martijn Huynen (Radboud University) and UU honorary doctor Peer Bork (EMBL). It was magic to do my internship there in the nineties. The freedom to play with data and find new knowledge was great! New patterns, new questions, even more new data, new ways to look at the same data… The current term ‘data recycling’ did not exist yet, but there and then bioinformatics became an essential ingredient of the life sciences.”
First databases for medical doctors
Medicine was far too knowledge-driven a field of science for Adrien Melquiond, senior researcher in Computational Structural Bioinformatics. “I got annoyed with studying textbooks and wanted to experience a more data-driven way of learning. Alongside my studies I did a long-term internship of three years in a biomedical imaging research group. At the weekends, I went to hospitals to build the first database of ~5000 reference cases of foetal malformations and to train a decision support system for ultrasonography. In the early 2000s, I was developing machine learning software to help practitioners in real time when they suspected a foetal malformation. I really loved it! So I kept working with data and software development, and later got hooked on structural bioinformatics.”
Why did bioinformatics explode?
What do you think is the biggest breakthrough in the field of bioinformatics? According to Berend Snel: “The big transformation was the possibility to sequence genomes. Having a genome or not having a genome is key to revolutionary developments such as personalised medicine and targeted breeding, e.g. to realise a new plant variety within a few generations. Perhaps even more important on a technological level: all the other, more recent data explosions in the life sciences, such as genotyping, transcriptomics and proteomics, were only possible because of the availability of genomes. These new techniques can help us unravel what we cannot see in the cellular system. We can thus gather new data and work on even more innovative techniques. The basic principle stays the same, but data are tightly linked to techniques, which succeed one another and are rapidly replaced by new ones. I do sometimes feel old already.”
FAIR data sharing
Adrien Melquiond adds: “More recently, developments in both machine learning and deep learning have been playing an important role in our field. As Berend rightly emphasized, this is only possible because of the sequencing revolution and the wealth of data that came along with it. The first bioinformatics breakthrough came from the vision of Margaret Dayhoff, back in the fifties, at a time when data sharing was a hassle. She created the first ‘online’ database system of protein and nucleic acid sequences, developed tools to interrogate this database, and reduced file size with the one-letter code for amino acids that is still in use. This was the first example of a systematic, smart and well-documented way of storing, sharing and querying data! Because sharing data ‘equally’ is essential, the FAIR data principles were conceived by pioneering bioinformaticians in the true tradition of Margaret Dayhoff.”
Bioinformatics in the future
In metagenomics, bioinformaticians speak of 70% dark matter. That is the most intriguing aspect of the field of bioinformatics: there is so much we do not know yet! The more techniques are developed, the more data will become available. “This is endless and shows that we must change the way we work. The breadth of applicability is enormous, and the more people do creative things, the more new results will become available. Where will this end? Will bioinformatics disappear and become an integral part of biology, or is bioinformatics the biology of the future?”
Citation:
Hogeweg P (2011) The Roots of Bioinformatics in Theoretical Biology. PLoS Comput Biol 7(3): e1002021.