Muscle Gene Sets
Gene signatures of muscle pathology & physiology

More than ten thousand muscle transcriptomic samples have been uploaded to the public Gene Expression Omnibus (GEO) in the past ten years, representing many millions of dollars of research expenditure and incalculable hours of research effort. These data ought to serve as a massive reference set for ongoing and future studies of neuromuscular disorders. One way to distil the data and make them more accessible to bench researchers is to extract from each study the lists of genes ("gene sets") that were differentially expressed. With careful curation, each transcriptomic dataset may yield multiple comparisons: not only those relating to the study's primary focus, such as a pathology or an experimental treatment, but also more general comparisons, not necessarily envisaged by the study's authors, relating to factors such as age, sex, and muscle group.

Muscle gene sets may be used in several ways, including:

1. To aid in the interpretation of new omics data by allowing their comparison with previous data.

2. To uncover overlap between pathologies or treatments, thereby identifying common signatures and possible biomarkers.

3. To determine which genes are most frequently differentially expressed across muscle experiments and diseases, thus identifying potentially important contributors to muscle function and pathology.
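As an illustration of the second use, the overlap between two gene sets can be scored with a one-sided hypergeometric test. This is a generic sketch, not the method used to build or analyse Muscle Gene Sets; the function name `overlap_pvalue` is hypothetical, and the gene universe size (for example, all genes measured on the platform) is an assumption the user must choose.

```python
from math import comb


def overlap_pvalue(set_a: set[str], set_b: set[str], universe_size: int) -> float:
    """One-sided hypergeometric p-value for the overlap of two gene sets.

    Returns P(X >= observed overlap), where X is the overlap expected if
    set_b were a random draw of len(set_b) genes from a universe of
    universe_size genes (assumed to contain every gene in both sets).
    """
    overlap = len(set_a & set_b)
    n_a, n_b, total = len(set_a), len(set_b), universe_size
    # Sum P(X = i) from the observed overlap up to the largest possible overlap.
    upper_tail = sum(
        comb(n_a, i) * comb(total - n_a, n_b - i)
        for i in range(overlap, min(n_a, n_b) + 1)
    )
    return upper_tail / comb(total, n_b)


if __name__ == "__main__":
    # Toy gene sets, for illustration only.
    pathology_x = {"GENE1", "GENE2", "GENE3", "GENE4"}
    treatment_y = {"GENE2", "GENE3", "GENE4", "GENE5"}
    print(overlap_pvalue(pathology_x, treatment_y, universe_size=20000))
```

A small p-value suggests the two signatures share more genes than chance alone would predict; with many pairwise comparisons, a multiple-testing correction would also be needed.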

We have extracted several hundred gene sets from published muscle data, focusing initially on in vitro studies, and are now extending this work to in vivo studies. Preliminary meta-analysis shows that muscle-function ontology terms are enriched among the most frequently differentially expressed genes. We have applied Muscle Gene Sets to several research problems, including the inflammatory response in dysferlinopathy, myoblast regenerative capacity in muscle ageing, and the identification of disease-contributing genetic variants.

Contact us at for more information about Muscle Gene Sets.


We are currently working to derive signatures from in vivo muscle transcriptomic studies published from 2005 to the present and will post updates here as the work progresses. Contact us at if you would like to work with more up-to-date (unpublished) Muscle Gene Sets.

The current download (version 2, released April 2016) comprises 393 Gene Sets. Of these, 185 were derived from a previous meta-analysis by Jelier et al. 2005, and a further 154 from published in vitro muscle microarray studies from 2005 to the present. The remaining 54 come mainly from muscle-related Gene Ontology terms, together with several other muscle-relevant entries from the MSigDB database. The second column of the .gmt file contains a short description of each gene set. For gene sets derived from transcriptomic studies, we provide a PubMed reference or the Gene Expression Omnibus series number.
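Assuming the download follows the standard GMT layout used by MSigDB (tab-separated fields: gene set name, then the description column, then the member genes), it can be read with a few lines of Python. This is a minimal sketch; `parse_gmt` and the file name in the usage comment are hypothetical.

```python
from typing import Iterable


def parse_gmt(lines: Iterable[str]) -> dict[str, tuple[str, list[str]]]:
    """Parse GMT-formatted lines into {set_name: (description, genes)}.

    Assumes the standard GMT layout: tab-separated fields, with the gene set
    name first, the description second, and member genes after that.
    """
    gene_sets: dict[str, tuple[str, list[str]]] = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, description, *genes = fields
        # Drop empty fields left by trailing tabs.
        gene_sets[name] = (description, [g for g in genes if g])
    return gene_sets


# Usage (hypothetical file name):
# with open("muscle_gene_sets.gmt", encoding="utf-8") as handle:
#     gene_sets = parse_gmt(handle)
```

Accepting any iterable of lines, rather than a path, lets the same function read an open file, a decompressed stream, or an in-memory list.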

Legacy download: version 1 (.gmt format)

Version 1 comprises 236 Gene Sets, derived mainly from the previous meta-analysis by Jelier et al. 2005 and from muscle-related Gene Ontology terms, together with several other muscle-relevant entries from the MSigDB database.
Muscle Gene Set Word Clouds
The first cloud shows the 300 most frequent genes across all the Muscle Gene Sets, while the second shows the 100 most frequent genes in the DMD Gene Sets alone. The larger the gene name, the more frequently that gene is represented in the gene sets. Clicking on a gene name links either to a PubMed search for that gene plus the keyword "muscle" or to the UniProt page for that gene.
The word clouds were generated using Tagul.com (though we counted the genes ourselves!).
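Frequency rankings like those behind the word clouds can be computed by counting, for each gene, how many gene sets contain it. The text does not state the exact counting rule, so this sketch assumes each gene is counted at most once per gene set; `gene_frequencies` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable


def gene_frequencies(gene_sets: Iterable[Iterable[str]]) -> Counter:
    """Count, for each gene, how many gene sets contain it (assumed rule)."""
    counts: Counter = Counter()
    for genes in gene_sets:
        counts.update(set(genes))  # a gene counts at most once per gene set
    return counts


if __name__ == "__main__":
    # Toy gene sets, for illustration only.
    toy_sets = [["GENE1", "GENE2"], ["GENE1", "GENE3"], ["GENE1", "GENE2", "GENE2"]]
    # For the clouds, one would take most_common(300) or most_common(100).
    print(gene_frequencies(toy_sets).most_common(2))
    # prints [('GENE1', 3), ('GENE2', 2)]
```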