This write-up is based on a talk that I gave at the Broad Institute. You can find the slides from that talk here.
In spite of their impressive performance on a wide range of learning tasks, Neural Networks still largely exist as computational black boxes: it remains very challenging to understand why and how they perform so well. Within Biology, we have developed a rich set of experimental ideas, methods, and tools in our quest to understand and interpret biological black boxes (i.e., organisms and the cells that comprise them). Is it possible to use these ideas, tools, and techniques to develop a general framework for understanding and interpreting trained Neural Networks? In this note, we explore this question and present preliminary experiments and data that point to this possibility.
In particular, we explore preliminary experimental designs and analysis methods geared towards answering the following core questions:
- Do the weights of neural networks, that is, the networks' learned representations, contain functional modules?
- If these functional modules exist, are they associated, in some fashion, with the traits exhibited by the trained networks?
- If these functional modules exist, can we isolate them and confirm their association with particular traits through Gain- or Loss-of-function experiments? (A minimal sketch of such an experiment follows this list.)
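To make the last of these questions concrete, here is a minimal sketch of what a Loss-of-function experiment on a trained network might look like. Everything in it is a stand-in: the weights are random rather than trained, the evaluation data is synthetic, and `module_units` is a hypothetical candidate module. The intent is only to illustrate the knock-out-and-measure pattern, not to prescribe an implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for a trained 2-layer MLP's weights; in a real experiment these
# would be loaded from a trained model checkpoint.
W1 = rng.normal(size=(64, 32))   # input -> hidden
W2 = rng.normal(size=(32, 10))   # hidden -> classes

def forward(X, W1, W2):
    """Forward pass: ReLU hidden layer followed by a linear readout."""
    h = np.maximum(X @ W1, 0.0)
    return h @ W2

def per_class_accuracy(X, y, W1, W2, n_classes=10):
    """Accuracy broken out per class -- a simple proxy for per-trait performance."""
    preds = forward(X, W1, W2).argmax(axis=1)
    return np.array([(preds[y == c] == c).mean() for c in range(n_classes)])

# Stand-in evaluation data; a real experiment would use a held-out test set.
X = rng.normal(size=(1000, 64))
y = rng.integers(0, 10, size=1000)

# Hypothetical "module": a set of hidden units suspected of supporting one trait.
module_units = [3, 7, 12, 19]

baseline = per_class_accuracy(X, y, W1, W2)

# Loss of function: silence the candidate module by zeroing its incoming and
# outgoing weights, then re-measure each trait.
W1_ko, W2_ko = W1.copy(), W2.copy()
W1_ko[:, module_units] = 0.0
W2_ko[module_units, :] = 0.0
knockout = per_class_accuracy(X, y, W1_ko, W2_ko)

# A trait whose accuracy drops sharply under knockout, while other traits are
# largely spared, is evidence that the module is associated with that trait.
print("per-class accuracy drop:", baseline - knockout)
```

A Gain-of-function experiment would follow the same pattern in reverse: rather than silencing the candidate module, one would transplant or amplify it and look for a trait that newly appears or strengthens.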
We hypothesize that, if all of the questions above can be answered in the affirmative, then the methods, concepts, and ideas the life sciences have developed to understand and interpret complex biological black boxes can be refined and repurposed to help us understand and interpret complex computational black boxes such as trained Neural Networks.