While broadly interested in computational biology, we focus on biological questions pertaining to eukaryotic gene regulation and its evolution. We develop computational approaches to harness the large volume of biological data (genomes, transcriptomes, proteomes, microarray, ChIP-seq, etc.) to answer specific questions pertaining to gene regulation and molecular evolution. Within these parameters, anything is fair game. Please check out our publications, and continue reading for a description of our current research activities.
Characterizing transcription factor DNA interactions
Transcription factors (TFs) bind to short and often degenerate DNA motifs. From an
information-theoretic viewpoint there is insufficient information in these
motifs to accurately identify the binding sites. Evolutionarily conserved
non-coding regions may be under purifying selection and are thus likely to
be functional. We (and others) have used evolutionary conservation,
so-called phylogenetic footprinting, to reduce false positives in
binding site recognition. The positional weight matrix (PWM) is the most common
representation of the DNA-binding specificity of a TF. We have closely
investigated the degenerate motifs and found that the
binding sites for a TF often fall into distinct clusters, and that by modeling the TF’s DNA
binding as a mixture of PWMs instead of a single PWM we can predict the
binding sites more accurately. The biological relevance of these clusters
is, however, not known. One possibility is that the clusters correspond to
different contexts in which the TF binds, for instance, the interaction
partners. This is consistent with the fact that despite a huge variability
in the binding sites within a species, specific binding sites are highly
conserved over long evolutionary time. A careful quantification of
selection pressure acting upon the “low-affinity” binding sites
remains an open problem.
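As a rough illustration of the single-PWM versus mixture-of-PWMs idea described above, here is a minimal sketch. It is not the lab's implementation: the function names, the toy sites, the uniform background, and the pseudocount are all assumptions made for the example.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Column-wise base probabilities from aligned, equal-length sites."""
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm

def information_content(pwm, background=0.25):
    """Relative entropy (bits) of the motif against a uniform background."""
    return sum(p * math.log2(p / background) for col in pwm for p in col.values())

def log_odds(pwm, seq, background=0.25):
    """Log2-odds score of seq under one PWM (positions treated as independent)."""
    return sum(math.log2(col[b] / background) for col, b in zip(pwm, seq))

def mixture_score(pwms, weights, seq):
    """Log2-odds of seq under a weighted mixture of PWMs."""
    likelihood = sum(w * 2 ** log_odds(p, seq) for p, w in zip(pwms, weights))
    return math.log2(likelihood)

# Hypothetical binding sites that fall into two clusters.
cluster_a = ["AAGT", "AAGT", "AAGT"]
cluster_b = ["CCGT", "CCGT", "CCGT"]
pwm_a = build_pwm(cluster_a)
pwm_b = build_pwm(cluster_b)
pooled = build_pwm(cluster_a + cluster_b)

# "ACGT" is a chimera of the two clusters. The pooled PWM cannot tell it
# apart from the real site "AAGT", while the mixture scores it lower.
for seq in ("AAGT", "ACGT"):
    print(seq, log_odds(pooled, seq), mixture_score([pwm_a, pwm_b], [0.5, 0.5], seq))
```

In practice the cluster assignments and mixture weights would have to be learned from data (for example by expectation-maximization) rather than fixed by hand as they are here.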
The PWM model assumes independence between distinct positions within a binding
site. Because multiple bases can interact with a single TF residue, this
may not be the case. If two positions are indeed interdependent, then a
chance mutation at one position is likely to change the selection pressure
at the other position. We have compared the evolutionary patterns at pairs
of positions within TF binding sites (TFBSs) and have found a prevalence of interdependence.
Ultimately, any sequence-only approach to analyzing transcription is
limited by the fact that transcription is a highly dynamic process while
sequence is static. Future work must incorporate TF protein levels,
the epigenomic state of the DNA, and the post-translational modification
status of histones and TFs.
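One simple way to probe the interdependence between positions discussed above is mutual information between two columns of aligned binding sites. This is a stand-in for illustration only: the analysis described above compares evolutionary patterns across species, which this sketch does not do. The function name and the toy sites are assumptions.

```python
import math
from collections import Counter

def mutual_information(sites, i, j):
    """Mutual information (bits) between columns i and j of aligned sites."""
    n = len(sites)
    col_i = Counter(site[i] for site in sites)
    col_j = Counter(site[j] for site in sites)
    pairs = Counter((site[i], site[j]) for site in sites)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi

# Hypothetical sites: positions 0 and 1 always change together,
# while position 2 is invariant.
sites = ["ACG", "ACG", "TGG", "TGG"]
coupled = mutual_information(sites, 0, 1)      # positions that covary
independent = mutual_information(sites, 0, 2)  # one position is constant
print(coupled, independent)
```

With real data, small sample sizes inflate mutual information, so any estimate would need to be compared against a shuffled-column null.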
Cis-regulatory Modules
Transcription factors do not act alone but as groups of interacting TFs, called cis-regulatory modules (CRMs),
that co-regulate functionally related genes. We have exploited genome-wide co-occurrence
of binding sites for specific TF pairs as an indication of their interaction.
Consider a bipartite graph of genes and TFs where a TF-gene edge indicates that the gene might be regulated by the TF.
A completely connected subgraph, or biclique, is likely to represent a CRM.
Applications of this approach to several biological contexts have yielded useful results.
We have recently extended this to finding dense subgraphs in a multipartite graph (see figure)
where various parts may represent functional annotation and expression profiles of the genes;
this provides further biological interpretation of the detected CRMs. This approach can also be
applied to detect subclasses of a motif where each subclass regulates a different set of functionally related genes.
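The biclique idea above can be sketched on a toy TF-gene graph. This is a brute-force illustration under stated assumptions, not the method used in our work: it enumerates every TF subset, so it only suits very small inputs, and the names `candidate_modules`, `min_tfs`, and `min_genes`, as well as the example graph, are hypothetical.

```python
from itertools import combinations

def candidate_modules(tf_targets, min_tfs=2, min_genes=2):
    """Return maximal bicliques (TF tuple, shared gene set) in a TF-gene graph.

    tf_targets maps each TF name to the set of genes it may regulate.
    """
    found = []
    tfs = sorted(tf_targets)
    for k in range(min_tfs, len(tfs) + 1):
        for combo in combinations(tfs, k):
            shared = set.intersection(*(tf_targets[t] for t in combo))
            if len(shared) >= min_genes:
                found.append((combo, shared))
    # Drop a TF set if a strictly larger TF set shares exactly the same genes.
    return [
        (combo, genes)
        for combo, genes in found
        if not any(set(combo) < set(other) and genes == other_genes
                   for other, other_genes in found)
    ]

# Hypothetical TF -> putative target genes.
tf_targets = {
    "TF1": {"g1", "g2", "g3"},
    "TF2": {"g1", "g2", "g4"},
    "TF3": {"g2", "g5"},
}
modules = candidate_modules(tf_targets)
print(modules)
```

Genome-scale graphs are noisy and large, which is why dense (rather than complete) subgraphs and heuristic search are usually needed in practice.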
Modification-dependent activity of transcription factors
The activity of many transcription factors depends on the precise post-translational state of the TF protein,
which in turn is modified
by a variety of modification enzymes, kinases being a prime example.
As shown in the figure, TF F regulates gene G only while modified by enzyme M. We are extending the
current models of gene transcription by explicitly incorporating the modifying enzyme and its
interactions with the transcription factor. As a starting point for a gold-standard set to benchmark computational
approaches, we have developed a database
(PTM-Switchboard)
of specific cases of such regulatory “triplets”.
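To make the triplet idea concrete, one possible check (an assumption for illustration, not the model we are developing) is whether the expression of F and G is correlated only in samples where M is highly expressed. The function names, the median split, and the expression values below are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def conditional_correlations(f, g, m):
    """Correlation of F and G in samples with high M versus low M.

    Samples are split at the upper median of M (an arbitrary choice).
    """
    cutoff = sorted(m)[len(m) // 2]
    high = [(a, b) for a, b, c in zip(f, g, m) if c >= cutoff]
    low = [(a, b) for a, b, c in zip(f, g, m) if c < cutoff]
    return pearson(*zip(*high)), pearson(*zip(*low))

# Hypothetical expression values across eight samples.
f = [1, 2, 3, 4, 1, 2, 3, 4]                   # transcription factor F
g = [2, 1, 1, 2, 1, 2, 3, 4]                   # target gene G
m = [0.1, 0.2, 0.3, 0.4, 2.0, 2.1, 2.2, 2.3]   # modifying enzyme M

r_high, r_low = conditional_correlations(f, g, m)
print(r_high, r_low)  # F tracks G only when M is high in this toy data
```

Expression of M is only a proxy for its enzymatic activity, so a pattern like this would be suggestive rather than conclusive.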
Evolution of transcriptional regulation
The links between chance DNA alterations and organismal evolution are of fundamental interest.
These links are mediated by systems-level interactions between genes, all the way to the interaction
between an individual and its environment. Gene duplications provide significant fodder for evolutionary
innovation, and what determines
the fate of a duplicated gene is of interest to us. Expression and coding sequence represent two pathways of divergence;
relationships between these pathways, especially the ones with quantifiable functional consequences,
may elucidate the selection pressures during the evolution of a gene family. For instance, we have found that for
TF gene paralogs the expression divergence is inversely related to the divergence in their DNA binding motifs.
Similar investigations of other aspects of functional divergence, such as neo-functionalization and sub-functionalization,
are in progress. We are also interested in investigating the evolution of developmentally important regulatory networks
based on the duplication and diversification of individual genes in the network.
Natural selection on regulatory elements
Polymorphisms in the non-coding portion of the human genome are likely to underlie a significant component of inter- and intra-species phenotypic variability. If so, these genomic regions are likely to be evolving under natural selection. However, the non-coding region is a heterogeneous mix of functional elements, each under a potentially different selection regime. Our preliminary genome-scale investigation of natural selection specifically on putative transcription factor binding sites in human proximal promoters, based on HapMap and Perlegen SNP data and several population-genetic techniques, indicates that, in general, human-specific and primate-specific binding sites may be evolving under positive selection. Furthermore, a larger-than-expected fraction of high-frequency derived alleles in the human-specific sites yields a binding site gain as opposed to a loss. A closer look at these cases, coupled with experimental validation, may provide insights into human adaptation. We are extending this approach to study signatures of selection in a variety of functional elements, both coding and non-coding, in multiple species.
Sridhar Hannenhalli, Associate Professor, Genetics, sridharh at pcbi dot upenn dot edu
Larry N Singh, Postdoc, larryns at pcbi dot upenn dot edu
Logan Everett, Graduate student (jointly with Steve Master), loganje at mail dot med dot upenn dot edu
Anchal Vishnoi, Postdoc (jointly with Plotkin Lab), anchalv at sas dot upenn dot edu
Kobby Essien, Postdoc, kobby at pcbi dot upenn dot edu
Mugdha Khaladkar, Postdoc, mugdhak at pcbi dot upenn dot edu
Matt Hansen, Postdoc, mhansen at pcbi dot upenn dot edu
Rajashree Raghunathan, Undergraduate, raghuraj at seas dot upenn dot edu
Past lab members
Junwen Wang, Postdoc (currently faculty at U. Hong Kong)
Praveen Sethupathy, Graduate student (currently postdoc at NIH)
Saran Vardhanabhuti, Research Programmer (currently Penn Biostatistics PhD student)
Antony Vo, Research Programmer (currently undergrad at Penn CIS)
Primary Collaborators
Thomas Cappola (Cardiovascular Institute)
Li-San Wang (Bioinformatics, Pathology)
Maja Bucan (Genetics)
Klaus Kaestner (Genetics)
Josh Plotkin (Biology)
Todd Lamitina (Physiology)
Rick Bushman (Microbiology)
Chris Stoeckert (Bioinformatics, Genetics)
Ted Abel (Biology)
Mary Putt (Biostatistics)
Hongzhe Li (Biostatistics)
Past rotation students
Swetha Garrimalla (CMU), Rithun Mukherjee, Tom Petty, Rumen Kostadinov, Perry Evans, Adam Ewing, Le Ba Nguyen, Greg Donahue, Hanno Hinsch, Elizabeth Schmutter
If you want to talk about computational/(epi)genomics approaches to investigating gene regulation and molecular evolution, I am all ears. There are several open and fun questions pertaining to the areas mentioned above. Please drop by the office if interested. You must be good at scripting/programming, a quantitative thinker and highly interested in biology, mess and all.
PSPA: Position-specific propensity analysis - a tool for eukaryotic core promoter prediction
PTM-Switchboard: A database of post-translational-modification mediated regulation of transcription factor activity
GCB535: Introduction to Bioinformatics (Fall) (Co-director with Steve Master)
GCB537: Advanced Computational Biology (Fall) (Co-director with Li-San Wang)
GCB531: Introduction to Genome Sciences (Fall) (Lecturer)
Genetic basis of diseases (Fall) (Discussion leader)
Contact: 215 746 8683 (v), 215 573 3111 (f) 1409 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104