Professor Bane Vasić of the University of Arizona, co-author of 'Barcodes for DNA sequencing with guaranteed error correction capability', talks about his interest in theoretical biology.
My interests lie primarily in information theory and its applications to fields as varied as magnetic storage, optical communications, compressed sensing and theoretical biology. Over the past decade, my main focus has been the design and theoretical analysis of signal processing algorithms for communications: primarily error-correcting codes, decoders and detectors.
In theoretical biology, my interest lies primarily in an information-theoretic approach to designing mathematical models of gene regulatory networks (GRNs). The DNA of living organisms is highly resilient to damage, which strongly suggests that powerful error-correction mechanisms underlie DNA repair. Any mathematical model (such as a GRN model) that aims to describe how gene expression in living organisms evolves over time must therefore incorporate this error-correcting capability as well. For me, as an information theorist, it is natural to view the process of maintaining the integrity of DNA as an error-correction coding problem.
An ultimate goal of computational biology is to obtain a formal, logical and causal description of the interactions among genes: a genetic wiring diagram, which may be viewed as the digital logic circuit of the cell's error-control system. Our understanding of this system is still in its infancy. Fortunately, error correction has long been a cornerstone of research in information theory, which makes it a highly promising field to bring together with theoretical biology.
This project is funded by the National Science Foundation, and its goal is to develop mathematical models that help explain the time behaviour of gene-level interactions in living organisms. Studies of gene expression suggest that GRNs are very sparsely connected. This observation underpins our current research focus in this area: using low-density parity-check (LDPC) codes, which are defined on highly sparse graphs, to build robust synthetic Boolean GRNs. Because LDPC codes are both sparse and robust, carefully choosing the codes and the update rules for gene profiles makes it possible to model Boolean GRNs that share these characteristics. Cyclicity can likewise be achieved through careful code design.
Interestingly, this analogy between coding theory and Boolean GRNs also draws on concepts from the design of fault-tolerant systems, which aim to achieve reliable computation and storage using computing and storage elements that are themselves prone to errors. This goal aligns closely with what is observed in DNA regulation.
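The interview does not spell out how an LDPC code becomes a Boolean GRN, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' model. It treats each bit of a sparse parity-check code as a binary "gene" and uses Gallager-style parallel bit flipping (a gene flips when more than half of its parity checks fail) as the network's synchronous update rule. As an assumption for brevity, a 4×4 row/column parity array stands in for a real LDPC code: every gene takes part in exactly two checks (its row and its column). The names `M`, `syndromes`, `is_codeword`, `update` and `run` are hypothetical.

```python
"""Toy Boolean 'gene regulatory network' whose synchronous update rule is
Gallager-style parallel bit flipping on a sparse parity-check code.

Assumption (illustrative only): a 4x4 row/column parity array stands in for
an LDPC code. Each of the 16 binary 'genes' takes part in exactly two parity
checks (its row and its column), and each check involves 4 genes.
"""
from typing import List, Tuple

M = 4  # hypothetical array side; the network has M * M binary genes
State = List[List[int]]


def syndromes(x: State) -> Tuple[List[int], List[int]]:
    """Return (row_fail, col_fail): 1 wherever a row/column parity check fails."""
    row_fail = [sum(row) % 2 for row in x]
    col_fail = [sum(x[r][c] for r in range(M)) % 2 for c in range(M)]
    return row_fail, col_fail


def is_codeword(x: State) -> bool:
    """A state is a codeword when every parity check is satisfied."""
    row_fail, col_fail = syndromes(x)
    return not any(row_fail) and not any(col_fail)


def update(x: State) -> State:
    """One synchronous step of the Boolean network.

    Gene (r, c) flips only when more than half of its checks fail; with two
    checks per gene that means both, so the Boolean update function is
    x[r][c] XOR (row_fail[r] AND col_fail[c]).
    """
    row_fail, col_fail = syndromes(x)
    return [[x[r][c] ^ (row_fail[r] & col_fail[c]) for c in range(M)]
            for r in range(M)]


def run(x: State, max_steps: int = 10) -> State:
    """Iterate the update until a fixed point is reached (or max_steps)."""
    for _ in range(max_steps):
        nxt = update(x)
        if nxt == x:
            break
        x = nxt
    return x


if __name__ == "__main__":
    # A valid gene profile: every row and column has even parity.
    profile = [[1, 1, 0, 0],
               [1, 1, 0, 0],
               [0, 0, 1, 1],
               [0, 0, 1, 1]]
    damaged = [row[:] for row in profile]
    damaged[2][1] ^= 1  # corrupt a single gene
    repaired = run(damaged)
    print(is_codeword(damaged), is_codeword(repaired), repaired == profile)
    # prints: False True True
```

In this toy network every codeword is a fixed point, and any single corrupted gene is repaired in one step, because only that gene sees both of its checks fail. The sketch is deliberately weak: two errors in the same row leave the row check satisfied, so no gene flips and the state stays at a non-codeword fixed point. It also does not attempt to show the cyclicity mentioned above.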
In our work, distinct DNA barcodes (short DNA fragments) are added to samples, which are subsequently sequenced together (multiplexing). These barcodes then serve to identify the individual samples. However, multiplexing carries the risk of sample misclassification as a consequence of sequencing errors. With suitably designed DNA barcodes, such misclassifications can be compensated for: the barcodes are designed to be as distinguishable from each other as possible, in the manner of error-correcting codes. Other barcode designs based on error-correcting codes exist, but they are limited either in error-correction capability or by their complexity. We have introduced a new class of barcodes that are resilient to misclassifications, highly scalable and easy to implement. My collaborators Anantha Krishnan, David Galbraith, Megan Sweeney and Jelena Vasic deserve special mention here; this paper showcases their valuable and innovative contributions.
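The interview does not describe the barcode construction itself, so the following is only a minimal sketch of the general principle, not the authors' design. It assumes equal-length barcodes over {A, C, G, T} and substitution-only sequencing errors, so Hamming distance is the metric; real reads can also contain insertions and deletions, which this sketch ignores. The barcode set and the names `BARCODES`, `hamming`, `min_distance`, `correctable_errors` and `classify` are hypothetical. The key fact it relies on is standard: a set with minimum distance d can correct up to ⌊(d − 1)/2⌋ substitutions.

```python
"""Nearest-barcode demultiplexing with a guaranteed error-correction radius.

Assumptions (illustrative only): barcodes are equal-length strings over
{A, C, G, T} and sequencing errors are substitutions, so Hamming distance is
the metric. Real reads can also contain insertions and deletions.
"""
from itertools import combinations
from typing import List, Optional

# Hypothetical barcode set: at every position the four barcodes use four
# different bases, so any two of them differ in all 6 positions.
BARCODES = ["ACGTAC", "CATGCA", "GTACGT", "TGCATG"]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(barcodes: List[str]) -> int:
    """Minimum pairwise Hamming distance d of the barcode set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def correctable_errors(barcodes: List[str]) -> int:
    """A set with minimum distance d corrects up to floor((d - 1) / 2) substitutions."""
    return (min_distance(barcodes) - 1) // 2


def classify(read_barcode: str, barcodes: List[str]) -> Optional[str]:
    """Assign a read to the nearest barcode, or return None (reject) when the
    read lies outside every barcode's correction radius, rather than risk a
    misclassification."""
    t = correctable_errors(barcodes)
    best = min(barcodes, key=lambda b: hamming(read_barcode, b))
    return best if hamming(read_barcode, best) <= t else None


if __name__ == "__main__":
    print(min_distance(BARCODES), correctable_errors(BARCODES))  # 6 2
    print(classify("TCGTAA", BARCODES))  # two substitutions -> ACGTAC
    print(classify("TGGTAA", BARCODES))  # three substitutions -> None
```

Rejecting reads that fall outside the correction radius costs some reads. In exchange, no read with at most t substitutions can be assigned to the wrong sample, because the radius-t neighbourhoods of distinct barcodes do not overlap.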
In the context of barcodes, I would like to see the emergence of longer and more powerful DNA barcodes that will enable even greater multiplexing than is currently possible. From another perspective, I would also like to see research into quaternary codes yield better, more scalable and practical designs; at present, that research is restricted to deriving bounds on error-correction capability and code rate. More broadly, I am confident that the next few years will see information theorists become involved in solving fundamental theoretical problems in biology.
The Letter presenting the results on which this interview is based can be found on the IET Digital Library.