Hannenhalli Lab

Genomics and Computational Biology

Penn Center for Bioinformatics

Department of Genetics

University of Pennsylvania

Photo: Varanasi, India









Research


While broadly interested in computational biology, we focus on biological questions pertaining to eukaryotic gene regulation and its evolution. We develop computational approaches that harness the large amount of available biological data (genomes, transcriptomes, proteomes, microarrays, ChIP-seq, etc.) to answer specific questions about gene regulation and molecular evolution. Within these parameters, anything is fair game. Please check out our publications, and continue reading for a description of our current research activities.

Characterizing transcription factor DNA interactions

Transcription factors (TFs) bind to short and often degenerate DNA motifs. From an information-theoretic viewpoint, these motifs contain insufficient information to identify binding sites accurately. Evolutionarily conserved non-coding regions may be under purifying selection and are thus likely to be functional. We (and others) have used evolutionary conservation, so-called phylogenetic footprinting, to reduce false positives in binding site recognition. The position weight matrix (PWM) is the most common representation of the DNA binding specificity of a TF. We have closely investigated degenerate motifs and found that the binding sites for a TF often fall into distinct clusters; modeling the TF’s DNA binding with a mixture of PWMs instead of a single PWM predicts binding sites more accurately. The biological relevance of these clusters is, however, not known. One possibility is that the clusters correspond to different contexts in which the TF binds, for instance, different interaction partners. This is consistent with the fact that, despite large variability among the binding sites within a species, specific binding sites are highly conserved over long evolutionary time. A careful quantification of the selection pressure acting on “low-affinity” binding sites remains an open problem.
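To make the PWM idea concrete, here is a minimal Python sketch, not our published method. It builds a log-odds PWM from aligned sites, computes the information content of a motif in bits, and scores a sequence against a mixture of PWMs by taking the best-scoring component. The function names, the uniform background of 0.25, the pseudocount, and the "best component" rule for the mixture are all illustrative assumptions.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM (one dict per position) from aligned sites.

    Assumes equal-length sites over A/C/G/T and a uniform background.
    """
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm

def score(pwm, seq):
    """Sum of per-position log-odds; positions are treated as independent."""
    return sum(column[base] for column, base in zip(pwm, seq))

def information_content(sites):
    """Total information content in bits (no pseudocount, no small-sample correction)."""
    total_ic = 0.0
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        ic = 2.0  # maximum for a 4-letter alphabet
        for b in BASES:
            p = column.count(b) / len(column)
            if p > 0:
                ic += p * math.log2(p)
        total_ic += ic
    return total_ic

def best_hit(pwm, sequence):
    """Return (score, offset) of the best-scoring window in a longer sequence."""
    w = len(pwm)
    return max((score(pwm, sequence[i:i + w]), i) for i in range(len(sequence) - w + 1))

def mixture_score(pwms, seq):
    """Score against a mixture of PWMs: here simply the best component (an assumption)."""
    return max(score(pwm, seq) for pwm in pwms)
```

A short, degenerate motif has low total information content, so many windows in a large genome score well by chance. This is why additional evidence such as conservation helps to reduce false positives.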

The PWM model assumes independence between distinct positions within a binding site. Because multiple bases can interact with a single TF residue, this assumption may not hold. If two positions are indeed interdependent, then a chance mutation at one position is likely to change the selection pressure at the other. We have compared the evolutionary patterns at pairs of positions within TF binding sites (TFBS) and have found a prevalence of interdependence.
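One simple way to look for dependence between two positions is the mutual information between the bases observed at those positions across a set of aligned sites. The Python sketch below is an illustration under that assumption. It is a sequence-based stand-in, not the evolutionary comparison described above, and the function name is hypothetical.

```python
import math
from collections import Counter

def mutual_information(sites, i, j):
    """Mutual information (bits) between positions i and j of aligned sites.

    Uses raw frequencies with no small-sample correction, so values from
    small collections of sites are biased upward.
    """
    n = len(sites)
    count_i = Counter(site[i] for site in sites)
    count_j = Counter(site[j] for site in sites)
    count_ij = Counter((site[i], site[j]) for site in sites)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi
```

A value near zero is what the PWM independence assumption predicts; a value well above zero suggests that the two positions covary.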

Ultimately, any sequence-only approach to analyzing transcription is limited by the fact that transcription is a highly dynamic process, whereas sequence is static. Future work must incorporate TF protein levels, the epigenomic state of the DNA, and the post-translational modification status of histones and TFs.

Cis-regulatory Modules

Transcription factors do not act alone but as groups of interacting TFs, cis-regulatory modules (CRMs), that co-regulate functionally related genes. We have exploited the genome-wide co-occurrence of binding sites for specific TF pairs as an indication of their interaction. Consider a bipartite graph of genes and TFs in which a TF-gene edge indicates that the gene might be regulated by the TF. A completely connected subgraph, or bipartite clique, is likely to represent a CRM. Applications of this approach to several biological contexts have yielded useful results. We have recently extended this to finding dense subgraphs in a multipartite graph (see figure), where the additional parts may represent functional annotations and expression profiles of the genes; this provides further biological interpretation of the detected CRMs. The approach can also be applied to detect subclasses of a motif, where each subclass regulates a different set of functionally related genes.
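The bipartite clique idea can be sketched in a few lines of Python. The example below is a brute-force illustration under assumed inputs (a dictionary mapping each TF to its set of putative target genes), not the algorithm we use in practice. It enumerates every group of TFs, intersects their target sets, and keeps only the maximal bicliques. The function name and the `min_tfs` and `min_genes` thresholds are hypothetical.

```python
from itertools import combinations

def maximal_bicliques(tf_targets, min_tfs=2, min_genes=2):
    """Enumerate maximal bicliques in a TF-gene bipartite graph.

    tf_targets maps a TF name to the set of genes it may regulate.
    Exponential in the number of TFs, so suitable only for small examples.
    """
    tfs = sorted(tf_targets)
    candidates = []
    for k in range(min_tfs, len(tfs) + 1):
        for group in combinations(tfs, k):
            shared = set.intersection(*(tf_targets[tf] for tf in group))
            if len(shared) >= min_genes:
                candidates.append((group, shared))
    # Drop a biclique if a larger TF group shares exactly the same genes.
    return [
        (group, genes)
        for group, genes in candidates
        if not any(set(group) < set(other) and genes <= other_genes
                   for other, other_genes in candidates)
    ]
```

For example, with TF1 targeting g1, g2 and g3, TF2 targeting g1 and g2, and TF3 targeting g2 and g3, the sketch reports two candidate modules: (TF1, TF2) over {g1, g2} and (TF1, TF3) over {g2, g3}.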

Modification-dependent activity of transcription factors

The activity of many transcription factors depends on the precise post-translational state of the TF protein, which in turn is modified by a variety of modification enzymes, kinases being a prime example. As shown in the figure, TF F regulates gene G only while modified by enzyme M. We are extending current models of gene transcription by explicitly incorporating the modifying enzyme and its interactions with the transcription factor. As a starting point for a gold-standard set to benchmark computational approaches, we have developed a database (PTM-Switchboard) of specific cases of such regulatory “triplets”.
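One hypothetical way to test a candidate triplet (M, F, G) in expression data is to ask whether F and G are correlated only in samples where M is highly expressed. The Python sketch below illustrates that idea. The use of expression level as a proxy for enzyme activity, the threshold split, and the function names are all assumptions for illustration rather than a description of our model.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 if undefined (too few points or no variance)."""
    n = len(x)
    if n < 2:
        return 0.0
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def conditional_correlation(m, f, g, threshold):
    """Correlation of F and G in samples with M >= threshold versus M < threshold.

    m, f, g are per-sample expression values of the enzyme, the TF, and the target gene.
    """
    high = [(a, b) for level, a, b in zip(m, f, g) if level >= threshold]
    low = [(a, b) for level, a, b in zip(m, f, g) if level < threshold]
    r_high = pearson([a for a, _ in high], [b for _, b in high])
    r_low = pearson([a for a, _ in low], [b for _, b in low])
    return r_high, r_low
```

A large gap between the two correlations is consistent with M-dependent regulation of G by F, although it does not establish it.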

Evolution of transcriptional regulation

The links between chance DNA alterations and organismal evolution are of fundamental interest. These links are mediated by systems-level interactions between genes, all the way up to the interaction between an individual and its environment. Gene duplications provide significant fodder for evolutionary innovation, and what determines the fate of a duplicated gene is of interest to us. Expression and coding sequence represent two pathways of divergence; relationships between these pathways, especially those with quantifiable functional consequences, may elucidate the selection pressures acting during the evolution of a gene family. For instance, we have found that for TF gene paralogs, expression divergence is inversely related to divergence in their DNA binding motifs. Similar investigations of other aspects of functional divergence, such as neo-functionalization and sub-functionalization, are in progress. We are also interested in investigating the evolution of developmentally important regulatory networks through the duplication and diversification of individual genes in the network.

Natural selection on regulatory elements

Polymorphisms in the non-coding portion of the human genome are likely to underlie a significant component of inter- and intra-species phenotypic variability. If so, these genomic regions are likely to be evolving under natural selection. However, the non-coding region is a heterogeneous mix of functional elements, each under a potentially different selection regime. Our preliminary genome-scale investigation of natural selection on putative transcription factor binding sites in human proximal promoters, based on HapMap and Perlegen SNP data and several population-genetic techniques, indicates that, in general, human-specific and primate-specific binding sites may be evolving under positive selection. Furthermore, a larger-than-expected fraction of high-frequency derived alleles in the human-specific sites yields a binding site gain as opposed to a loss. A closer look at these cases, coupled with experimental validation, may provide insights into human adaptation. We are extending this approach to study signatures of selection in a variety of functional elements, both coding and non-coding, in multiple species.
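The distinction between a binding site gain and a loss can be illustrated with a small Python sketch. It scores the ancestral and derived versions of a site and compares each with a threshold. The scoring function, the threshold, and all names below are hypothetical placeholders; in practice the score would come from a binding model such as a PWM.

```python
def classify_snp(ancestral_site, position, derived_base, score_fn, threshold):
    """Classify a SNP as a binding site 'gain', 'loss', or 'no change'.

    A site counts as bound when score_fn(site) >= threshold (an assumption).
    """
    derived_site = ancestral_site[:position] + derived_base + ancestral_site[position + 1:]
    bound_before = score_fn(ancestral_site) >= threshold
    bound_after = score_fn(derived_site) >= threshold
    if bound_after and not bound_before:
        return "gain"
    if bound_before and not bound_after:
        return "loss"
    return "no change"

def consensus_matches(site, consensus="TATAAA"):
    """Toy scorer: number of positions matching a made-up consensus."""
    return sum(a == b for a, b in zip(site, consensus))
```

For instance, `classify_snp("TATACA", 4, "A", consensus_matches, 6)` classifies the derived allele as a gain, because only the derived site reaches the threshold under this toy scorer.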

People


Sridhar Hannenhalli, Associate Professor, Genetics
sridharh at pcbi dot upenn dot edu
Larry N Singh, Postdoc
larryns at pcbi dot upenn dot edu
Logan Everett, Graduate student
(jointly with Steve Master)
loganje at mail dot med dot upenn dot edu
Anchal Vishnoi, Postdoc
(jointly with Plotkin Lab)
anchalv at sas dot upenn dot edu
Kobby Essien, Postdoc
kobby at pcbi dot upenn dot edu
Mugdha Khaladkar, Postdoc
mugdhak at pcbi dot upenn dot edu
Matt Hansen, Postdoc
mhansen at pcbi dot upenn dot edu
Rajashree Raghunathan, Undergraduate
raghuraj at seas dot upenn dot edu

Past lab members

Junwen Wang, Postdoc (currently faculty at U. Hong Kong)
Praveen Sethupathy, Graduate student (currently postdoc at NIH)
Saran Vardhanabhuti, Research Programmer (currently Penn Biostatistics PhD student)
Antony Vo, Research Programmer (currently undergrad at Penn CIS)


Primary Collaborators

Thomas Cappola (Cardiovascular Institute)
Li-San Wang (Bioinformatics, Pathology)
Maja Bucan (Genetics)
Klaus Kaestner (Genetics)
Josh Plotkin (Biology)
Todd Lamitina (Physiology)
Rick Bushman (Microbiology)
Chris Stoeckert (Bioinformatics, Genetics)
Ted Abel (Biology)
Mary Putt (Biostatistics)
Hongzhe Li (Biostatistics)


Past rotation students

Swetha Garrimalla (CMU), Rithun Mukherjee, Tom Petty, Rumen Kostadinov, Perry Evans, Adam Ewing, Le Ba Nguyen, Greg Donahue, Hanno Hinsch, Elizabeth Schmutter


Rotation projects


If you want to talk about computational or (epi)genomic approaches to investigating gene regulation and molecular evolution, I am all ears. There are several open and fun questions in the areas mentioned above. Please drop by the office if interested. You must be good at scripting/programming, be a quantitative thinker, and be highly interested in biology, mess and all.

Tools


PSPA: Position-specific propensity analysis - a tool for eukaryotic core promoter prediction
PTM-Switchboard: A database of post-translational-modification mediated regulation of transcription factor activity

Teaching


GCB535: Introduction to Bioinformatics (Fall) (Co-director with Steve Master)
GCB537: Advanced Computational Biology (Fall) (Co-director with Li-San Wang)
GCB531: Introduction to Genome Sciences (Fall) (Lecturer)
Genetic basis of diseases (Fall) (Discussion leader)



Contact: 215 746 8683 (v), 215 573 3111 (f); 1409 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104