Why systems biology needs empiricism
So this post is mainly a response to this excellent piece by Dr. Arjun Raj. This isn’t exactly a rebuttal (despite what the title says), but just some thoughts I wanted to develop, especially since the original post has been rattling around in my head for a while now. Plus, my current research project is something I consider systems biology, but it is almost purely ‘empirical’ (I prefer experimental), and so at least somewhat in contrast to the vision of systems biology described by Raj.
First, a couple of caveats. One, I’m barely a systems biologist. Yes, I have 1/2 a Master’s degree in the subject (the other 1/2 is synthetic biology) but I skipped to another field for my PhD and I’ve only just (~2mos) started proper research in this space. Two, I work on plants, which are very different from the usual models of choice for systems biology, especially the quantitative kind.
Dr. Raj’s main arguments (as I read them) are:
i. We need a deeper understanding of biology than empiricism provides
Raj uses a pretty simple example to show the difference between a genuine understanding of mechanism (a “model” that works) and an empirical explanation of a phenomenon.
Take a lever, he says. Essentially the model for how a lever works is described by Archimedes’ law relating the force input and output of a lever with the distances from the fulcrum. Once we have this basic understanding of how a lever works, we know how any lever works. Thus, a functional model is generalisable, and makes accurate predictions about how something works. And in this case at least, it’s built on a solid understanding of the physics involved.
In contrast, according to Raj, a modern empirical approach would be to test all possible variations of force and distance to generate a relationship between force and distance. The end result, however, may be the same: we have a model that explains how a lever works.
The difference between the two is that the first approach explains a mechanism in a way that leads to, and signifies, a deeper understanding of things beyond levers. That is, a theoretically derived model suggests a better grasp of the wider subject (the motion of objects) than a narrower, experimentally determined solution which only explains a specific phenomenon.
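To make the lever comparison concrete, here is a toy sketch of the two routes. It is only an illustration: the function names, the simulated "measurements" and the 2% noise level are all assumptions made up for this example, not anything from Raj's post. The first function encodes the law of the lever directly; the second recovers a relationship from noisy data by fitting a power law, without knowing the law in advance.

```python
import math
import random


def lever_output_force(input_force, input_arm, output_arm):
    """Theory-first route: law of the lever, F_in * d_in = F_out * d_out."""
    return input_force * input_arm / output_arm


def fit_power_law(xs, ys):
    """Empirical route: least-squares fit of log(y) = slope * log(x) + intercept.

    Returns (slope, exp(intercept)), i.e. the exponent and prefactor of y = c * x**k.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y)) / sum(
        (a - mean_x) ** 2 for a in log_x
    )
    intercept = mean_y - slope * mean_x
    return slope, math.exp(intercept)


# Simulated experiment (hypothetical data): vary the ratio of the two arm
# lengths and "measure" the mechanical advantage with 2% multiplicative noise.
random.seed(0)
arm_ratios = [random.uniform(0.25, 4.0) for _ in range(200)]
measured_advantage = [
    lever_output_force(1.0, ratio, 1.0) * random.gauss(1.0, 0.02)
    for ratio in arm_ratios
]

exponent, prefactor = fit_power_law(arm_ratios, measured_advantage)
print(f"fitted: advantage ~ {prefactor:.3f} * ratio^{exponent:.3f}")
```

Both routes end up with roughly the same relationship for this lever. The fit, however, only covers the range of ratios that was sampled and says nothing about why the exponent comes out where it does, which is the distinction the lever example is drawing.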
And I agree, systems biology is currently still in the second, empirical mode.
Where I disagree is that I think this dependence on empiricism in systems biology is inevitable, necessary, and not something we need to rue.
Here’s why:
We don’t know what biology looks like.
Admittedly, that’s a very glib statement. But what I mean is that while in the previous example we know that we’re dealing with a lever, in most of biology we still don’t know what we’re dealing with. That’s biology today. It may be a lever, or it may be a pulley. Or something else entirely.
It’s (relatively) easy to make working theoretical models of things we can see, things we can intuitively grasp. It’s much harder when we don’t know what we’re looking at every day. So I think we desperately need more empiricism to just describe our systems. This may seem odd given how much we’ve learnt about biology in the last 100 years, but not really. Every discovery in biology has led us to revise earlier models to such an extent that they’re almost unrecognisable. And this is true even for the most successful models such as the Central Dogma (which is the layer of abstraction I work in).
Here’s the central dogma of information flow in biology as described by Crick in 1957/58: (yes, yes, I have ignored retro-viruses.)
Here’s what it looks like today:
This picture, of course, is incomplete too. For one, it assumes that the DNA here is static (i.e. no alleles, no transposition etc.). And some of those arrows should be fainter than others. And we can barely even guess at what the next technological breakthrough will add to this picture.
The point of this isn’t to invalidate Crick’s simple linear model. On the contrary, it shows how amazing his feat of intuition was. Squint at the second image and you can still see the underlying scheme that Crick came up with. But the new, networked scheme is undoubtedly more accurate. (A similar update was recently made to models of plant immunity by Wu et al., 2018, Science.) Another example is what single-cell sequencing is doing to textbook definitions of cell types.
This is how biological research works. People make some observations from experiment; eventually enough observations accrue to enable a model to be built; someone finds something that shows the model is false or, more commonly, limited; we make a new model, and so on. Empiricism is a crucial aspect of this cycle. And I think biology is fundamentally a more empirical science than, say, physics. We already have our grand unifying theory, for one.
I do agree that models in biology need to be multi-factorial (as evident from the pictures above). An outstanding question is whether we can even hold and compute multi-factorial models in our brains. There’s a reason linear models, however imprecise, hold so much appeal.
(To be clear, by “model”, I mean both mathematical and non-mathematical descriptions)
ii. Experiments in genetics force us into adopting linear models
(Raj doesn’t say this exactly but I think it’s a straightforward inference.)
I largely agree here. One of my major frustrations since entering the world of plant biology is the excessive reliance on forward genetics, suppressor screens and “genetic rescue” experiments. In my opinion, these deal with surface-level abstractions (aka phenotypes), wilfully ignoring any real mechanistic understanding of the system. They do provide useful signposts but I think their time is up, to be honest.
However, I still do see a lot of value in reverse genetics, i.e. knockout/knockdown/mutation experiments, especially when expanded to the systems level. They dig a little deeper and, if supported by appropriate biochemistry (yes, this rarely happens, I know), they are the best way we have to draw more of those all-important network connections. They definitely have more value in my eyes than half-hearted mathematical descriptions of processes that are derived by ignoring much of the rich detail that experimentation provides.
Raj says “most of what we’re doing in biology right now is probably best called engineering. Trying to make cells divide faster or turn into this cell or kill that other cell. And it’s true that look, whatever, if I can fix your heart, who cares if I have a theory of heart?”.
On the contrary, I think if you can fix a heart, you already have a pretty good idea of how a heart works. I’m not going to repeat that tired old Feynman quote about this, but here.
iii. Models/mechanisms need to be more quantitative
Raj’s third argument, I think, is that biological models are too imprecise: they aren’t quantitative enough. He calls for models that come up with predictions like “X explains 40% of the change in Y” rather than “X affects Y”. The problem still is that we don’t know what 100% of Y looks like, making it impossible to state that something explains 40% of it. And so, I think that if people started making quantitative claims like this we’d be in a far worse mess than we are in now. I’m imagining conversations like:
“A: Thus we show that X process regulates 40% of Y”
“B: A is wrong because we show that Z process regulates 72% of Y”
(I’m not even going to go into what would happen with p-hacking in this case.)
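One way such a conversation can arise, even with honest analyses, is sketched below. This is a toy simulation under loud assumptions: "explains N% of Y" is read as the R² of a single-predictor regression (which may not be what Raj has in mind), the data are synthetic, and the variables `x`, `z` and `y` are hypothetical stand-ins for two processes and a phenotype. Because `x` and `z` share an upstream driver, each one on its own appears to explain most of `y`, and the two "percentages" add up to more than 100%.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5


random.seed(1)
n_samples = 5000

# Hypothetical shared upstream driver that feeds into both processes.
shared = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
x = [s + random.gauss(0.0, 0.5) for s in shared]
z = [s + random.gauss(0.0, 0.5) for s in shared]

# The phenotype depends on both processes plus unexplained noise.
y = [xi + zi + random.gauss(0.0, 0.5) for xi, zi in zip(x, z)]

# For a one-predictor linear regression, R^2 equals the squared correlation.
r2_x = pearson_r(x, y) ** 2
r2_z = pearson_r(z, y) ** 2

print(f"X alone 'explains' {r2_x:.0%} of Y")
print(f"Z alone 'explains' {r2_z:.0%} of Y")
print(f"Sum of the two claims: {r2_x + r2_z:.0%}")
```

Lab A measuring only `x` and lab B measuring only `z` would both be reporting correct numbers here, yet the claims cannot simply be added up or ranked against each other without knowing how the two processes are connected.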
I think far more than any other science, biology traffics in unknowns, and biologists are usually aware of the limits of their knowledge (this excludes biomedical researchers who, at least based on their paper titles, seem to think the planet is populated by humans alone). So when we read statements like “X regulates Y”, most of us know the assumptions and qualifications that tail that statement. (Just like we all know that every paper should include more “maybes” than an editor/reviewer today would allow.)
Secondly, we have to live with the fact that it may be impossible to explain 99.9% of any biological phenomenon because I don’t think any biological phenomenon can be isolated to 99.9%.
Where does this leave systems biology?
Scrolling up I realise that, quite by accident, I seem to have dismissed much of what sets systems biology apart from other fields. I think that systems biologists are far too married to the idea of describing all of biology (even beyond metabolism) quantitatively and mathematically (at least for the present). To me the difference between systems biology and regular biology (for lack of a better term) is more philosophical. And as anodyne as this sounds, there is value in a systems-level perspective of biology.
This includes the trillions of reads of seemingly pointless RNAseq data produced so far, or the idea of high-throughput experiments, or of just using seemingly random observations in highly detailed experiments to come up with a generic model for a process. Systems biology just takes generalisation and inference a lot more seriously than other approaches to biology. There is also value in holding on to the assumption that there’s a way of understanding biology that can take into account everything from the regulation of a gene in a cell, to the chemical modification of a protein, to an allele in a population, to the behaviour of a dolphin in its ecosystem.
And in my view, it takes a systems perspective to describe a genetic network as an electrical circuit, just as it does to realise that genetic information proceeds in a linear, almost unidirectional chain, or to intuit that all genes could contribute to complex traits, OR to just collect a wide enough dataset to expand on either of these hypotheses. This is a far more expansive, and more diffuse, definition of systems biology than most textbooks use, but it’s the one I prefer.