MyoMiner: Explore Gene Co-expression in Normal and Pathological Muscle

We have gathered the last 10 years of muscle-related microarray data from the public repositories Gene Expression Omnibus (GEO) and ArrayExpress (AE) to perform correlation analyses for both human and mouse. Following searches with muscle-related keywords, the details of individual studies and samples were screened manually so that only samples pertinent to muscle research were selected.

Duplicated raw CEL files were identified and excluded using in-house tools: each CEL file was converted to ASCII text format, its intensities were concatenated into a single string, and that string was assigned a hash key combining the MD5, SHA1 and CRC32 hash algorithms. We performed an overall quality control (QC) analysis using a battery of Bioconductor packages ('simpleaffy', 'affyQCReport', 'genefilter' and 'affyPLM'), each using the MAS 5.0 algorithm and the default Affymetrix Chip Description File (CDF). Arrays with extreme values on the combined QC metrics were excluded. Because the studies (series) were carried out in different labs, by different scientists, on different dates, batch effects are inevitable in such a large-scale analysis, and normalization alone does not address them. We therefore corrected for batch effects statistically with ComBat, supplying technical variables (series and dates) as surrogates for batch.
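The duplicate check described above can be sketched in Python. This is a minimal illustration, not the in-house tool: it assumes the intensities have already been parsed from each CEL file into a list of floats, and the string layout used before hashing (comma-joined repr() values), the function names and the sample IDs are all hypothetical.

```python
import hashlib
import zlib
from typing import Dict, List, Sequence


def intensity_fingerprint(intensities: Sequence[float]) -> str:
    """Hash one array's concatenated intensities with MD5, SHA1 and CRC32.

    Assumption: values are joined as comma-separated repr() strings; the
    exact string layout used by the original in-house tool is not documented.
    """
    payload = ",".join(repr(float(v)) for v in intensities).encode("ascii")
    md5 = hashlib.md5(payload).hexdigest()
    sha1 = hashlib.sha1(payload).hexdigest()
    crc = format(zlib.crc32(payload) & 0xFFFFFFFF, "08x")
    # Concatenate the three digests into one combined key.
    return f"{md5}-{sha1}-{crc}"


def find_duplicates(arrays: Dict[str, Sequence[float]]) -> List[List[str]]:
    """Return groups of sample IDs whose intensity fingerprints are identical."""
    groups: Dict[str, List[str]] = {}
    for sample_id, values in arrays.items():
        groups.setdefault(intensity_fingerprint(values), []).append(sample_id)
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    # Hypothetical sample IDs and intensities for illustration only.
    arrays = {
        "GSM_A": [120.5, 88.0, 301.2],
        "GSM_B": [10.0, 20.0, 30.0],
        "GSM_C": [120.5, 88.0, 301.2],
    }
    print(find_duplicates(arrays))  # [['GSM_A', 'GSM_C']]
```

Hashing the whole intensity string, rather than comparing file names or metadata, catches the same CEL file deposited under different accessions, since identical intensities give identical keys.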

We created the following pipeline for pre-processing the raw intensity data (CEL files): QC, batch-effect correction, background correction, probe summarization and normalization with the SCAN algorithm, using the most up-to-date CDFs from Brainarray. We selected the expressed genes for each organism using the UPC algorithm. We used Spearman correlation coefficients to measure co-expression between pairs of genes and corrected for multiple testing using the False Discovery Rate (FDR). All pre-calculated co-expression values are stored in a relational MySQL database hosted on the okeanos cloud.
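The correlation and FDR steps above can be sketched in plain Python. The text does not name the FDR procedure; this sketch assumes the Benjamini-Hochberg method, and it computes Spearman's rho as the Pearson correlation of average ranks (tied values share the mean rank). Computing a p-value for each rho (commonly via a t approximation) is left out to keep the example dependency-free, and the function names are illustrative.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    print(round(spearman([1, 2, 3, 4, 5], [5, 6, 7, 8, 7]), 4))  # 0.8208
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

In a co-expression scan, one rho and one p-value would be computed per gene pair, and all p-values from the same scan would then be adjusted together.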

We chose the most widely used microarray platforms on the GEO repository, HG-U133 Plus 2.0 for human and MG 430 2.0 for mouse, acquiring 2374 mouse and 2228 human samples.

We built a simple, easy-to-use web interface to search for transcriptional co-expression of any expressed gene pair in muscle cells/tissues and in various pathological conditions. So far we have included 106 human categories based on age, sex, anatomical site and condition. Users select a category and a gene of interest, and MyoMiner returns all expressed genes correlated with it. The significance of each correlation can be assessed with its FDR-adjusted p-value and confidence interval. A standardized scatterplot is available for every gene pair by clicking the corresponding r value. In the network tab, users can build a 2-shell network from the top correlated genes, or input a gene list and find the correlated or linked genes. Users can also test whether two correlation coefficients from different conditions differ significantly from one another.
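The text does not say how MyoMiner computes confidence intervals or tests whether two correlations differ. One standard approach for both is Fisher's r-to-z transformation, sketched below in Python under that assumption. Applying the Pearson standard error 1/sqrt(n - 3) to Spearman coefficients is an approximation, and the function names and example values are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def correlation_ci(r: float, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Approximate CI for a correlation via Fisher's z (needs -1 < r < 1, n > 3)."""
    z_crit = _STD_NORMAL.inv_cdf(0.5 + level / 2)
    se = 1.0 / math.sqrt(n - 3)
    z = math.atanh(r)  # Fisher r-to-z transform
    # Build the interval on the z scale, then map it back to r with tanh.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> Tuple[float, float]:
    """Two-sided z-test that correlations from two independent groups are equal.

    Returns (z statistic, p-value).
    """
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    p = 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    # Hypothetical values: the same gene pair measured in two categories.
    print(correlation_ci(0.72, 60))
    print(compare_correlations(0.72, 60, 0.31, 45))
```

The comparison test assumes the two categories are independent samples; if two categories share samples, this z-test is not strictly valid.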

These co-expression analyses will help muscle researchers delineate the tissue-, cell- and pathology-specific elements of muscle protein interactions, cell signaling and gene regulation. Changes in co-expression between pathological and healthy tissue may point to new disease mechanisms and therapeutic targets. MyoMiner is a powerful muscle-specific database for discovering genes that may share related functions, based on their co-expression.

To access the tool, click Search on the menu bar.