WEB TOOLS AND DATA MINING FOR MUSCLE SYSTEMS/OMICS DATA
Bill Duddy Lecturer in Stratified Medicine (Bioinformatics)
Northern Ireland Center for Stratified Medicine
+44(0)28-71-675-686
I develop and apply data mining and systems biology bioinformatics approaches to stratified medicine, with a focus on neuromuscular disease and the functional analysis of omics data. My most recent tool is CellWhere, which facilitates the graphical display of gene association networks organised by subcellular localization.
PROJECTS AND INTERESTS
Stratified Medicine
High-throughput "omics" technologies provide tremendous scope to identify characteristics by which patients can be grouped. This can be used to predict how a disease will progress differently from person to person, or to predict which drug(s) will work best for which patients.
I'm interested in using network biology to identify clusters of interacting genes or proteins that influence the severity of a disease or that can affect a patient’s response to a given therapy.
Muscle, neuromuscular disease, and data mining
I'm interested in trying to make sense of muscle omics data. Which RNAs, proteins, and other molecules are important to the normal function of the muscle? What does it mean when the level of one or more of these molecules is consistently changed in disease or under experimental conditions? How do molecular expression profiles relate to pathways, organelles, and metabolic components of the cell? How can we extract the most information from existing datasets, and how can we best compare new datasets with old? Achieving better approaches to these questions can help to suggest new therapeutic avenues for neuromuscular disorders.

I also want to make bioinformatics resources more accessible to muscle biologists, and to increase the ease of interpretation in the visual display of bioinformatics data.

Aside from muscle research and integrative systems biology, I have a background in mining large datasets.
Pathways, Networking, and
Since the creation of the IMEx consortium and of standardized data formats such as that of the Proteomics Standards Initiative, it has become trivial to obtain (e.g. from resources such as Mentha, IntAct, or STRING) a carefully curated list of experimentally-determined protein-protein interactions (PPIs) for a given organism. Resources for heavily studied model organisms such as yeast and mouse, and for human cell lines, now list tens of thousands of PPIs, and these lists are quite comprehensive in their coverage of 'known' (published) interactions.

I've been using the whole interactome, consisting of all PPIs and of interactions with other molecule types, to help explore muscle function and especially to add another layer of information to muscle transcriptomic or proteomic data. My work has made use of the resources listed above, but in addition I'm working on ways to adapt analyses specifically towards muscle cell types. An example of this is our CellWhere tool, which visually displays PPI networks, organizing them according to protein subcellular location. Alongside its generic function of giving the most frequently annotated subcellular locations of given proteins, it also facilitates the highlighting of subcellular locations that are of special pertinence to a particular research project. In our own work, we often use the CellWhere tool to prioritize locations such as the muscle contractile apparatus or the neuromuscular junction.
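As a rough illustration of the idea of organizing proteins by location with project-specific priorities, here is a minimal Python sketch. It is not CellWhere's actual implementation: the protein names, the annotation counts, and the function names (`pick_location`, `group_by_location`) are all hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical toy annotations: protein -> list of (location, annotation count).
# The names and counts are invented for illustration, not taken from any database.
ANNOTATIONS = {
    "PROT_A": [("sarcolemma", 12), ("cytoplasm", 5)],
    "PROT_B": [("contractile apparatus", 9), ("cytoplasm", 11)],
    "PROT_C": [("neuromuscular junction", 7), ("plasma membrane", 10)],
    "PROT_D": [("cytoplasm", 20), ("nucleus", 3)],
}


def pick_location(annotations, priority=()):
    """Return the first prioritized location that is annotated for the protein,
    otherwise fall back to the most frequently annotated location."""
    annotated = {location for location, _count in annotations}
    for location in priority:
        if location in annotated:
            return location
    return max(annotations, key=lambda pair: pair[1])[0]


def group_by_location(proteins, annotation_table, priority=()):
    """Group proteins under the single location chosen for each of them."""
    groups = defaultdict(list)
    for protein in proteins:
        location = pick_location(annotation_table[protein], priority)
        groups[location].append(protein)
    return dict(groups)


# Generic view: every protein goes to its most frequently annotated location.
generic_view = group_by_location(sorted(ANNOTATIONS), ANNOTATIONS)

# Muscle-oriented view: locations of special interest win when they are annotated.
muscle_view = group_by_location(
    sorted(ANNOTATIONS),
    ANNOTATIONS,
    priority=("contractile apparatus", "neuromuscular junction"),
)
```

In this toy case the generic view places PROT_B in the cytoplasm, whereas the muscle-oriented view moves it to the contractile apparatus because that location was prioritized.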

CellWhere publication (L. Zhu et al., Nucleic Acids Res. 43, W571–W575, 2015). CellWhere is online here.

I have a long-term goal of providing network exploration tools and analyses that are adapted to the specific details of muscle cell function.
Muscle Gene Sets
More than ten thousand samples of muscle transcriptomic data have been uploaded to the public Gene Expression Omnibus in the past ten years, representing many millions of dollars of research expenditure and incalculable hours of research effort. These data ought to serve as a massive reference set for ongoing and future studies of dysferlinopathy and other neuromuscular disorders. One way to distil the data and render them more accessible to bench researchers is to extract from each study lists of genes ("gene sets") that were differentially expressed. With careful curation, each transcriptomic dataset may yield multiple comparisons, not only relating to the primary focus of that study, such as a pathology or an experimental treatment, but also more general comparisons not necessarily envisaged by the study’s authors, but relating to factors such as age, sex, and muscle group.

Muscle gene sets may be used in several ways, including: (1) to aid in the interpretation of new omics data by allowing their comparison with previous data; (2) to uncover overlap between pathologies or treatments, thereby identifying common signatures and possible biomarkers; and (3) to determine which genes are frequently differentially expressed in muscle experiments and disease, thus identifying potentially important contributors to muscle function and pathology. We have extracted several hundred gene sets from published muscle data, focused on in vitro studies, and are now extending this to in vivo studies. Preliminary meta-analysis shows that muscle function ontologies are enriched among the more frequently differentially expressed genes. We have applied muscle gene sets to several research problems, including inflammatory response in dysferlinopathy, myoblast regenerative capacity in muscle ageing, and the identification of disease-contributing genetic variants.
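The three uses listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene identifiers and study names are placeholders, and the Jaccard index is simply one convenient overlap measure chosen for the example, not necessarily the statistic used in our own analyses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical gene sets; the study names and gene identifiers are placeholders.
GENE_SETS = {
    "study1_disease_vs_control": {"G1", "G2", "G3", "G4"},
    "study2_treated_vs_untreated": {"G2", "G3", "G5"},
    "study3_old_vs_young": {"G3", "G6"},
}


def jaccard(a, b):
    """Overlap between two gene sets: shared genes divided by all genes in either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_against_reference(new_genes, gene_sets):
    """Use (1): rank the reference gene sets by their overlap with a new gene list."""
    scored = ((jaccard(new_genes, genes), name) for name, genes in gene_sets.items())
    return sorted(scored, reverse=True)


def pairwise_overlap(gene_sets):
    """Use (2): overlap between every pair of reference gene sets."""
    return {
        (first, second): jaccard(gene_sets[first], gene_sets[second])
        for first, second in combinations(sorted(gene_sets), 2)
    }


def gene_frequency(gene_sets):
    """Use (3): count in how many gene sets each gene appears."""
    return Counter(gene for genes in gene_sets.values() for gene in genes)


ranking = rank_against_reference({"G2", "G3"}, GENE_SETS)
overlaps = pairwise_overlap(GENE_SETS)
frequency = gene_frequency(GENE_SETS)
```

A real analysis would also need a significance test against an appropriate gene background, which this sketch leaves out.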

Our work on muscle gene sets is ongoing in collaboration with the team of Silvio Bicciato at the University of Padova.
Exon Skipping
One of the more frequent and devastating muscle diseases is Duchenne muscular dystrophy (DMD). This disease is caused by mutations in the coding sequence, 79 exons in length, of the gene encoding dystrophin, a very large filamentous structural protein. Generally, mutations that result in DMD are those that render the coding sequence meaningless, whereas other mutations that only cause loss of parts of the coding sequence can sometimes result in a truncated but still partially functional dystrophin protein. These latter mutations are responsible for a milder disease known as Becker muscular dystrophy (BMD). Usually the DMD-causing mutations disrupt the codon reading frame of the mRNA transcript. However, due to the intricacies of the codon alignments among the 79 exons, it is possible to restore the reading frame by the use of small DNA-analogue drug molecules that target exons surrounding the mutation. The goal of this exon skipping approach is to produce a truncated dystrophin protein that will reduce the severity of DMD towards that of BMD. The strategy is a leading possibility among the few therapeutic approaches that may be successful for DMD.

Successfully targeting a given exon, however, is not always easy, and depends on identifying a DNA-analogue sequence that will bind strongly to its target exon at a location that blocks the splicing machinery's capacity to recognise that stretch of pre-mRNA as exonic. I've used previous experimental data together with computational approaches to create a predictive algorithm to help researchers design new exon-skipping drug sequences, and we have validated this algorithm with the help of Toshifumi Yokota's team at the University of Alberta.

Our exon skipping algorithm was published in PLoS ONE.
COLLABORATIONS
Data analysis and tool development
I'm involved concurrently with the omics data analysis aspects of around a dozen collaborative projects, working with researchers (both at the Center for Stratified Medicine and internationally) who study neuromuscular disorders such as Amyotrophic Lateral Sclerosis, Duchenne and Becker muscular dystrophies, and Dysferlinopathy (LGMD2B).

In tool development, I'm mainly interested in better ways to explore omics data, in particular tools that can be adapted towards muscle research. This includes tools to visualize pathways and networks, such as our CellWhere tool (site | paper), and also tools to extract information from previous datasets, such as our current Muscle Gene Sets project, which is under development (site). I have also developed algorithms for the design of better exon skipping drugs for DMD (paper).

I'm always interested in analyzing new datasets and always ready to consider new collaborations in tool development.
PUBLICATIONS
PubMed
Google Scholar