WEB TOOLS AND DATA MINING FOR MUSCLE SYSTEMS/OMICS DATA
Muscle Gene Sets
Gene signatures of muscle pathology & physiology

More than ten thousand samples of muscle transcriptomic data have been uploaded to the public Gene Expression Omnibus in the past ten years, representing many millions of dollars of research expenditure and incalculable hours of research effort. These data ought to serve as a massive reference set for ongoing and future studies of neuromuscular disorders. One way to distil the data and render them more accessible to bench researchers is to extract from each study lists of genes ("gene sets") that were differentially expressed. With careful curation, each transcriptomic dataset may yield multiple comparisons: not only those relating to the primary focus of the study, such as a pathology or an experimental treatment, but also more general comparisons, not necessarily envisaged by the study’s authors, relating to factors such as age, gender, and muscle group.

Muscle gene sets may be used in several ways, including:

1. To aid in the interpretation of new omics data by allowing their comparison with previous data

2. To uncover overlap between pathologies or treatments, thereby identifying common signatures and possible biomarkers

3. To uncover overlap between muscle gene regulatory processes, muscle disease, and the gene ontology

4. To determine which genes are frequently differentially expressed in muscle experiments and disease, thus identifying potentially important contributors to muscle function and pathology.
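
As an illustration of the first two uses, the sketch below ranks gene sets by their overlap with a new list of differentially expressed genes. It uses a hypergeometric test, which is a common choice for this kind of comparison and not necessarily the method used by the authors. The function names, the toy gene identifiers, and the `background_size` parameter (the number of genes that could have been detected) are assumptions made for the example.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """Probability of seeing at least k shared genes by chance.

    M: background size, n: size of the gene set, N: size of the query list.
    """
    upper = min(n, N)
    total = comb(M, N)
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return tail / total


def rank_overlaps(query, gene_sets, background_size):
    """Return (name, shared gene count, p-value) tuples, smallest p-value first."""
    query = set(query)
    results = []
    for name, genes in gene_sets.items():
        genes = set(genes)
        shared = query & genes
        p = hypergeom_sf(len(shared), background_size, len(genes), len(query))
        results.append((name, len(shared), p))
    return sorted(results, key=lambda r: r[2])


# Toy data with placeholder identifiers (not real gene sets).
toy_sets = {
    "SET_A": {"GENE1", "GENE2", "GENE3"},
    "SET_B": {"GENE7", "GENE8", "GENE9"},
}
ranking = rank_overlaps(["GENE1", "GENE2", "GENE3"], toy_sets, background_size=10)
```

In practice the p-values would need correcting for multiple testing, since a query list is compared against many gene sets at once.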

We have now completed the extraction of Muscle Gene Sets from several hundred published muscle datasets. We have applied muscle gene sets to several research problems, including inflammatory response in dysferlinopathy, myoblast regenerative capacity in muscle ageing, and the identification of disease-contributing genetic variants.

Contact us for more information about Muscle Gene Sets.

If you use the Muscle Gene Sets in your research, please cite:

Malatras A, Duguez S, Duddy W: Muscle Gene Sets: a versatile methodological aid to functional genomics in the neuromuscular field. Skeletal Muscle 2019, 9(1):10.

Skeletal Muscle link

DOWNLOAD MUSCLE GENE SETS (.gmt format)

The current download (version 3, released March 2019) comprises 1,517 Gene Sets. Of these, 1,156 were derived from our recent analysis of 302 studies of muscle physiology and disease published from 2005 to the present. A further 122 were derived from published in vitro muscle microarray studies carried out from 2005 to the present, as used in our previous work, and 185 were derived from a previous meta-analysis carried out by Jelier et al. (2005). The remaining 54 are from muscle-related gene ontology terms and several other muscle-relevant entries in the MSigDB database. The second column of the .gmt file contains a description of each gene set. For muscle gene sets derived from transcriptomic studies, we provide a PubMed reference or the series number from the Gene Expression Omnibus.
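
For readers who want to load the download programmatically, here is a minimal sketch of a .gmt reader. It assumes the standard tab-separated layout (gene set name, description, then one gene per remaining column); the function names and the toy lines are hypothetical, and no particular file name is assumed for the download.

```python
def parse_gmt(lines):
    """Parse .gmt lines into {name: {"description": str, "genes": set}}."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and entries without any genes
        name, description, *genes = fields
        gene_sets[name] = {
            "description": description,
            "genes": {g for g in genes if g},  # drop empty trailing columns
        }
    return gene_sets


def read_gmt(path):
    """Read a .gmt file from disk (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return parse_gmt(handle)


# Toy lines with placeholder names, standing in for a real file.
toy_lines = [
    "SET_A\ttoy description\tGENE1\tGENE2\t\n",
    "\n",
    "SET_B\tanother toy description\tGENE3\n",
]
parsed = parse_gmt(toy_lines)
```

Storing the genes as a set makes the overlap and counting operations used with gene sets straightforward.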


Consensus MGS (.gmt format)

The consensus MGS consists of gene sets that were recurrently dysregulated in the same direction across multiple different studies. For each comparison (e.g. DMD v Healthy), we identified genes that were common to >30%, >50%, and >70% of studies. For each of these percentage cut-offs, we created gene sets that were consistently upregulated, downregulated, or dysregulated in the same direction across the studies. The second column of the .gmt file gives summary statistics: the number of gene sets (i.e. previous studies) from which the consensus is drawn, the total number of genes in those gene sets, and the proportion that made it into the consensus set. The lists of MGS from which the consensus sets are drawn are given here.
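
The cut-off step can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: it takes the gene sets for one comparison and one direction (for example, all upregulated sets), counts the number of studies in which each gene appears, and keeps genes whose fraction of studies is strictly greater than each cut-off. The function name and toy identifiers are hypothetical.

```python
from collections import Counter


def consensus_sets(study_gene_sets, cutoffs=(0.3, 0.5, 0.7)):
    """Return {cutoff: genes found in more than that fraction of studies}."""
    n_studies = len(study_gene_sets)
    # Count each gene once per study, even if it is listed twice.
    counts = Counter(g for genes in study_gene_sets for g in set(genes))
    return {
        cutoff: {g for g, k in counts.items() if k / n_studies > cutoff}
        for cutoff in cutoffs
    }


# Toy example: four studies of the same comparison, same direction.
toy_studies = [
    {"GENE1", "GENE2", "GENE3"},
    {"GENE1", "GENE2"},
    {"GENE1"},
    {"GENE1", "GENE4"},
]
consensus = consensus_sets(toy_studies)
```

Here GENE1 appears in all four studies and passes every cut-off, while GENE2 appears in exactly half of them and so passes only the >30% cut-off, because the comparison is strict.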

Legacy download: version 2 (.gmt format)

Version 2 comprises 393 Gene Sets. Of these, 154 were derived from published in vitro muscle microarray studies carried out from 2005 to the present, and a further 185 were derived from a previous meta-analysis carried out by Jelier et al. (2005). The remainder are from muscle-related gene ontology terms and several other muscle-relevant entries in the MSigDB database.

Legacy download: version 1 (.gmt format)

Version 1 comprises 236 Gene Sets. These were derived mainly from a previous meta-analysis carried out by Jelier et al. (2005) and from muscle-related gene ontology terms, along with several other muscle-relevant entries in the MSigDB database.


The Muscle Gene Sets resource is free for academic/non-profit use.
Muscle Gene Set Word Clouds
The first cloud shows the 300 most frequent genes among the Muscle Gene Sets, while the second cloud shows the 100 most frequent genes in the DMD Gene Sets alone. The bigger the gene name, the more frequently it is represented in the gene sets. Clicking on a gene name links to a PubMed search for that gene + the keyword "muscle", or to the UniProt page for that gene.
The word clouds were generated using Tagul.com (though we counted the genes ourselves!).
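
The counting behind the clouds can be approximated with a few lines of code. The sketch below is an assumption about how such a tally might be done, not the authors' script: each gene is counted once per gene set in which it appears, and the most frequent genes are returned. The function name and toy identifiers are hypothetical.

```python
from collections import Counter


def gene_frequencies(gene_sets, top_n=300):
    """Return the top_n (gene, number of gene sets containing it) pairs."""
    counts = Counter()
    for genes in gene_sets.values():
        counts.update(set(genes))  # count a gene once per gene set
    return counts.most_common(top_n)


# Toy gene sets with placeholder identifiers.
toy_gene_sets = {
    "SET_A": ["GENE1", "GENE2", "GENE3"],
    "SET_B": ["GENE1", "GENE2"],
    "SET_C": ["GENE1", "GENE1"],
}
top_two = gene_frequencies(toy_gene_sets, top_n=2)
```

Restricting `gene_sets` to a subset, such as only the sets from one disease, gives the kind of disease-specific tally shown in the second cloud.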