3.6. Lifetime Co-Expressed Transcript Analysis: AgingNet
Co-expression analysis consists of building a network that represents the relationships between transcripts. In this network, called AgingNet, the nodes are transcripts and the relationships between them are given by the similarity between their expression profiles throughout life. It is also possible to identify groups of highly co-expressed transcripts, called co-expression modules, which may represent specific biological processes. Briefly, the similarity between all pairs of transcripts was calculated with the DTW-MIC metric on the ageCollapsed dataset, producing a similarity matrix. This matrix was then clustered with the HDBSCAN algorithm (CAMPELLO; MOULAVI; SANDER, 2013) to identify co-expression modules.
To measure the similarity between the expression profiles of each pair of transcripts throughout life, a metric developed for time series, called DTW-MIC (RICCADONNA et al., 2016), was used. This metric combines two others: Dynamic Time Warping (DTW; ITAKURA, 1975) and the Maximal Information Coefficient (MIC; ALBANESE et al., 2018).
DTW is a distance between two sequences that takes temporal displacements into account. The DTW algorithm uses dynamic programming to find an optimal alignment between the two series through a non-linear warping of their time axes, and the magnitude of this warping is reflected in the dissimilarity value. As a result, the similarity between the shapes of the curves has a greater impact on DTW than the point-to-point distance between the two time series, which is what dominates the Euclidean distance (Figure 3.5). To obtain a similarity measure (DTWs) from the dissimilarity (DTWd), the transformation DTWs = 1/(1 + DTWd) was used, where DTWd is the normalized DTW distance between the two time series, calculated with the dtw package.
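The DTW computation and the DTWs transformation can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the dtw package itself: it assumes an absolute-difference local cost, a symmetric step pattern in which diagonal moves are weighted twice, and normalization by the combined length n + m, modelled on what we understand to be the package's default (symmetric2) pattern. The function names are hypothetical, and values may differ from the package's output.

```python
def dtw_distance(x, y):
    """Normalized DTW distance between two 1-D series (sketch, not the dtw package)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # g[i][j] = cheapest cumulative cost of aligning x[:i+1] with y[:j+1]
    g = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(x[i] - y[j])  # local cost between the two time points
            if i == 0 and j == 0:
                g[i][j] = d
                continue
            candidates = []
            if i > 0 and j > 0:
                candidates.append(g[i - 1][j - 1] + 2 * d)  # diagonal step, weighted twice
            if i > 0:
                candidates.append(g[i - 1][j] + d)  # x advances while y is held (warping)
            if j > 0:
                candidates.append(g[i][j - 1] + d)  # y advances while x is held (warping)
            g[i][j] = min(candidates)
    # Normalize by the combined length so that series of different lengths are comparable
    return g[n - 1][m - 1] / (n + m)


def dtw_similarity(x, y):
    """DTWs = 1 / (1 + DTWd), the transformation described in the text."""
    return 1.0 / (1.0 + dtw_distance(x, y))
```

Because the warping path may map one time point onto several, the series [0, 1, 2, 3] and [0, 0, 1, 2, 3] have a DTW distance of 0 (similarity 1) even though one is a delayed copy of the other, which is the kind of temporal displacement DTW is meant to tolerate.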
The MIC belongs to a family of statistics known as Maximal Information-based Nonparametric Exploration (MINE) and was developed to explore the relationships between pairs of variables in multidimensional datasets. The MIC has two distinguishing features: generality, meaning that it can capture relationships of different kinds, not only linear ones; and equitability (evenness), meaning that it penalizes similar noise levels in the same way, regardless of the type of relationship between the variables, as represented in Figure 3.6.
As mentioned earlier, this work used a combination of the DTW and MIC measures, called DTW-MIC (RICCADONNA et al., 2016), defined as:
This definition combines contributions from both DTW and MIC; that is, it takes into account both temporal shifts and nonlinear correlations. As shown through simulations in RICCADONNA et al. (2016), this combination is more effective than Pearson's correlation coefficient, and also more effective than DTW or MIC considered separately.
Once the similarity has been calculated for every pair of transcripts, the result is a similarity matrix whose rows and columns correspond to the transcripts, where each cell holds the similarity between the corresponding pair of transcripts (probes).
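When the similarity measure is symmetric, each unordered pair only needs to be evaluated once and mirrored, so for N transcripts the matrix costs N(N − 1)/2 off-diagonal evaluations plus N diagonal ones. The sketch below assumes such a symmetric measure, passed in as `similarity` (a hypothetical parameter standing in for DTW-MIC), and expression profiles given as lists of values ordered by age.

```python
from itertools import combinations


def similarity_matrix(profiles, similarity):
    """Symmetric N x N matrix of pairwise similarities between expression profiles.

    `similarity` is a hypothetical stand-in for DTW-MIC and is assumed to be
    symmetric, so each unordered pair is evaluated once and mirrored.
    """
    n = len(profiles)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = similarity(profiles[i], profiles[i])  # self-similarity
    for i, j in combinations(range(n), 2):  # each unordered pair exactly once
        s = similarity(profiles[i], profiles[j])
        matrix[i][j] = matrix[j][i] = s
    return matrix
```

With three toy profiles, the function calls `similarity` six times (three diagonal cells and three pairs) instead of nine, and the saving grows toward one half as N increases.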
3.6.2. Detection of Co-Expression Modules and Sub-Modules
Co-expression modules were detected by clustering the transcript similarity matrix with a density-based algorithm called HDBSCAN (“Density-Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms” [R package dbscan version 1.1-2], [n.d.]). This algorithm groups points in a given space that are close to each other (points with many neighbours) and labels isolated points, located in low-density regions, as outliers (CAMPELLO; MOULAVI; SANDER, 2013). Once the modules were detected, their transcripts were separated into sub-modules according to their expression profiles.
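HDBSCAN operates on dissimilarities, so a similarity matrix must first be converted; one simple option, assumed here only for illustration, is d = 1 − s. The sketch below shows the first stages of the algorithm described by Campello, Moulavi and Sander: core distances (the distance to the k-th nearest neighbour), mutual reachability distances, and a minimum spanning tree over them. Cutting that tree at a single threshold `eps` gives one level of the density hierarchy (DBSCAN* clusters, with components smaller than a minimum size treated as noise). Full HDBSCAN instead scans all thresholds and selects the most stable clusters, which this sketch does not do; in practice one would use an existing implementation such as `hdbscan()` in the R dbscan package or `HDBSCAN` in scikit-learn. Function and parameter names are hypothetical, and implementations differ on whether a point counts as its own neighbour.

```python
import math


def core_distances(D, k):
    """Distance from each point to its k-th nearest neighbour (excluding itself)."""
    return [sorted(row[:i] + row[i + 1:])[k - 1] for i, row in enumerate(D)]


def mutual_reachability(D, k):
    """d_mreach(a, b) = max(core(a), core(b), d(a, b))."""
    core = core_distances(D, k)
    n = len(D)
    return [[0.0 if i == j else max(core[i], core[j], D[i][j]) for j in range(n)]
            for i in range(n)]


def minimum_spanning_tree(M):
    """Prim's algorithm on a dense matrix; returns edges as (weight, i, j)."""
    n = len(M)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and M[u][v] < best[v]:
                best[v], parent[v] = M[u][v], u
    return edges


def dbscan_star_cut(D, k, eps, min_cluster_size):
    """Cluster labels at one level of the hierarchy; -1 marks noise."""
    n = len(D)
    root = list(range(n))

    def find(a):  # union-find with path halving
        while root[a] != a:
            root[a] = root[root[a]]
            a = root[a]
        return a

    # Keep only spanning-tree edges no longer than eps, merging their endpoints
    for w, i, j in minimum_spanning_tree(mutual_reachability(D, k)):
        if w <= eps:
            root[find(i)] = find(j)
    components = {}
    for p in range(n):
        components.setdefault(find(p), []).append(p)
    labels, next_id = [-1] * n, 0
    for p in range(n):  # assign ids in point order for deterministic output
        members = components[find(p)]
        if labels[p] == -1 and len(members) >= min_cluster_size:
            for q in members:
                labels[q] = next_id
            next_id += 1
    return labels
```

For example, with points 0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3 and 20.0 on a line, k = 2, eps = 1.0 and a minimum cluster size of 2, the first four and the next four points form two clusters, while the isolated point at 20.0 is labeled −1 (noise).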
3.6.3. AgingNet Module Pathway Enrichment Analysis
The same procedure for building enrichment structures developed for the AgingGenes (described in Section 3.5.4) was applied to each sub-module, with one change: all “daughter” pathways with a Combined Score < 10 were removed. Then, the correlation between age and the median expression profile of the transcripts in each enriched pathway was calculated. Correlated pathways (|Rho| > 0.35) were classified according to the direction of their trend (positive or negative), and this information was added to the visualization of the enrichment structures.
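The filtering and trend classification steps can be sketched as follows. The data layout is hypothetical: daughter pathways are given as a mapping from pathway name to Combined Score, and a pathway's transcripts as expression profiles aligned with a list of ages. Because the text reports Rho, this sketch assumes Spearman's rank correlation (with tied values given averaged ranks); that choice, the function names and the toy values are assumptions, not the exact implementation used in this work.

```python
from statistics import median


def keep_daughters(scores, min_score=10):
    """Drop daughter pathways whose Combined Score is below min_score."""
    return {name: s for name, s in scores.items() if s >= min_score}


def average_ranks(values):
    """1-based ranks, with tied values receiving the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as Pearson's correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def classify_trend(ages, profiles, threshold=0.35):
    """Correlate age with the pathway's median profile; label the trend if |rho| > threshold."""
    med = [median(p[t] for p in profiles) for t in range(len(ages))]
    rho = spearman(ages, med)
    if rho > threshold:
        return rho, "positive"
    if rho < -threshold:
        return rho, "negative"
    return rho, None  # not correlated with age under this threshold
```

For instance, three transcripts whose median profile rises monotonically across five age points give rho = 1 and a positive trend, while a profile that rises and then falls symmetrically gives rho = 0 and no trend label.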