The Yeast Regulatory Network
Network analysis and graph theory have become essential research tools at many levels of the life sciences. Gene regulation at the transcriptional level is one such case. In this post we briefly explain what gene transcription is and how it works, and we use BeGraph to study a specific example. Other biological networks of great current interest are protein interaction networks, metabolic networks, signal transduction networks, phylogenetic networks and trophic food webs.
Let us focus on gene regulation. We all know that genetic information is stored in the DNA inside the cells of a living individual. We also know that DNA contains a huge number of genes, the functional units of DNA. Each gene, in most cases, carries the information required to synthesize one individual protein. We say that it encodes that particular protein. And it happens that virtually all the tasks and processes essential for life that take place in the cell are performed by proteins. Thus, the workflow DNA → Protein → life processes ensures the full unfolding of the code embedded in the DNA into the final living form.
All cells of the same individual share the same DNA, but a single living individual has many types of cells with very different functions. How, then, can each cell produce the particular proteins it needs to fulfill its specific tasks? This is possible because the cell is able to regulate which pieces of its DNA are active, and can thus be used to synthesize proteins, in response to internal or external stimuli.
Roughly speaking, these stimuli are, in fact, regulatory molecules, able to turn on (inducers) or turn off (repressors) the ability of a gene to produce (express) its encoded protein. We say that proteins are expressed by their coding genes. A gene does this by first copying its code to an intermediate RNA molecule (a process called transcription). These RNA copies are later used as information templates during the final synthesis of the expressed protein. The DNA acts as a kind of master copy whose information is fully preserved, while the RNA copy can be used again and again and is eventually destroyed in the process.
Quite frequently, these expressed proteins are, in turn, regulatory elements too, able to turn the transcription of other genes on or off. In such a case, we refer to them as transcription factors. Sometimes these regulatory cascades are rather complex and involve dozens or even hundreds of target genes linked in what we call a transcription regulatory network, or simply a transcription network. These networks are the core of the complex regulatory workflows that eventually make the cell grow, divide or perform any other action. So the basic idea is that some genes in the DNA regulate the activation of other genes during the life cycle of the cell.
Network science comes into play by representing the induction/repression relationships among genes across the whole genome as a directed network. Genes are the nodes, and there are two kinds of possible links between them: activation and repression. Each link points from a regulating gene to the gene it regulates. Additionally, some genes can have both functions, and self-links are allowed.
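As a rough illustration of this representation, here is a minimal Python sketch of a directed network with labeled links. The gene names and the edge list are hypothetical placeholders, not data from the yeast network, and the dictionary layout is just one possible choice, not the way BeGraph stores networks.

```python
from collections import defaultdict

# Hypothetical toy edges: (regulator, target, effect).
# Gene names are placeholders, not taken from the yeast dataset.
EDGES = [
    ("geneA", "geneB", "activator"),
    ("geneA", "geneC", "repressor"),  # geneA activates one gene and represses another
    ("geneB", "geneB", "activator"),  # self-link: geneB regulates itself
    ("geneC", "geneD", "repressor"),
]


def build_network(edges):
    """Return {regulator: {target: effect}} for a directed, labeled network."""
    network = defaultdict(dict)
    for regulator, target, effect in edges:
        network[regulator][target] = effect
    return dict(network)


network = build_network(EDGES)

# Genes with a self-link (they regulate their own transcription).
self_regulating = [gene for gene, targets in network.items() if gene in targets]

# Genes that have both functions: they activate some targets and repress others.
both_functions = [
    gene
    for gene, targets in network.items()
    if {"activator", "repressor"} <= set(targets.values())
]
```

In this toy example, geneA plays both roles and geneB carries a self-link, the two special cases mentioned above.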
We have chosen the gene transcription network of yeast (Saccharomyces cerevisiae) to illustrate this class of network. It is a small, well-studied dataset, composed of 690 nodes and 1094 links. Check out the following visualizations of the network:
The default visual configuration shows the network highlighting the regulatory function of each link (activator, repressor and one link with dual regulation). This visualization is suitable for navigating through the genes and examining specific regulatory paths in detail.
Genes can be ranked by importance according to their OutDegree (the number of genes a gene regulates) or their InDegree (the number of genes that regulate it). In those visualizations, the warmer the node color, the higher the In/Out Degree. The network is disconnected: it consists of 11 components, with a dominant component containing 96% of the nodes. Also, dividing the network into communities separates the genes into groups that have more internal links than links to other groups. Together with the low link density, this information can be used to make preliminary diagnoses, elaborate draft profiles or propose starting hypotheses about, let's say, robustness or weakness against random and directed perturbations, or promising therapeutic targets. Finally, link prediction techniques can lead to important breakthroughs in this field.
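To make these measures concrete, the following Python sketch computes InDegree, OutDegree, connected components (ignoring link direction) and link density on a small toy edge list. The gene names are hypothetical, the density formula assumes that every ordered pair of nodes, including self-links, counts as a possible link, and none of this is meant to reproduce how BeGraph computes its metrics.

```python
from collections import Counter, defaultdict

# Hypothetical toy edges (regulator, target); not the yeast dataset.
EDGES = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneB", "geneB"),  # self-link
    ("geneC", "geneD"),
    ("geneE", "geneF"),  # a small separate component
]


def degrees(edges):
    """Return (in_degree, out_degree) counters keyed by gene."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    return in_deg, out_deg


def components(edges):
    """Connected components ignoring link direction, largest first."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    seen, found = set(), []
    for start in neighbors:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal from `start`
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        found.append(component)
    return sorted(found, key=len, reverse=True)


in_deg, out_deg = degrees(EDGES)
parts = components(EDGES)
nodes = {gene for edge in EDGES for gene in edge}

# Share of nodes in the dominant (largest) component.
dominant_share = len(parts[0]) / len(nodes)

# Link density, assuming n * n possible directed links (self-links allowed).
density = len(EDGES) / (len(nodes) ** 2)
```

Under the same density assumption, the yeast network described above, with 690 nodes and 1094 links, would have a density of 1094 / 690², which is well below 1%.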
There are many resources that not only endorse the current strategic importance of network analysis in biology (see, for example, the link of the Weizmann Institute here) but also point out new ground-breaking directions for the future of this approach. The E. coli transcription network is similar to the network presented here. We encourage the reader to visualize it using the Free Trial version of BeGraph.
The BeGraph team thanks Prof. Hilario Ramírez (Granada University) for his help and suggestions in the elaboration of this post.