Using Systems Biology to Understand Immunosenescence

3.3. Data acquisition and pre-processing

Data were acquired with R and bash scripts. ArrayExpress data were downloaded manually, and GEO data were retrieved through Bioconductor project packages such as GEOquery (DAVIS; MELTZER, 2007).
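As a rough illustration of what scripted GEO retrieval has to resolve, the sketch below builds the directory URL of a GEO series on the NCBI file server. It is written in Python only for illustration (the work itself used R and bash), and the helper name `geo_series_url` is hypothetical. It assumes the public GEO layout in which the last one to three digits of the accession are replaced by `nnn` to form the parent folder; nothing is downloaded.

```python
import re

# Assumed root of the public GEO series tree on the NCBI file server.
GEO_SERIES_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo/series"


def geo_series_url(accession):
    """Return the directory URL for a GEO series accession such as GSE46097.

    Hypothetical helper for illustration; it only formats a string.
    """
    if not re.fullmatch(r"GSE\d+", accession):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    # Parent folder: the trailing 1-3 digits are replaced by 'nnn'.
    parent = re.sub(r"\d{1,3}$", "nnn", accession)
    return f"{GEO_SERIES_ROOT}/{parent}/{accession}/"


print(geo_series_url("GSE46097"))
```

In practice a package such as GEOquery hides this step and also parses the sample metadata, which is why it is the more convenient route for GEO data.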

3.3.1. Normalization

Once the raw data and their metadata (clinical and experimental characteristics) were obtained, control samples were normalized with a "barcode" approach called the Universal exPression Code (UPC; PICCOLO et al., 2013). UPC is applied to each sample individually and estimates the probability that each probe is expressed (active). It rests on the hypothesis that expression values originate from two populations: inactive genes, whose measured expression reflects only background variation, and active genes, whose measured expression is background variation plus a signal. As Figure 3.3 shows, after a sample is normalized by UPC its probe values lie between 0 and 1, and only a small portion of the probes is active (expression value > 0.5).
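The two-population idea can be sketched with a deliberately simplified model. The example below is not the published UPC method, which has its own mixture model and probe-level corrections. It assumes two Gaussian components fitted by expectation-maximization on simulated values, and the function `active_probabilities` and all simulation parameters are hypothetical. It only shows how a per-probe probability of being active, bounded between 0 and 1, falls out of a background-plus-signal mixture.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def weighted_sd(values, weights, mean, total):
    """Weighted standard deviation, floored to avoid a collapsed component."""
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return max(math.sqrt(var), 1e-3)


def active_probabilities(values, n_iter=100):
    """Fit a two-Gaussian mixture by EM and return P(active) for each probe.

    Simplified stand-in for illustration only; not the UPC model itself.
    """
    n = len(values)
    mean_all = sum(values) / n
    sd_all = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n) or 1.0
    # Start the background at the lowest value and the signal at the highest.
    mu_bg, mu_sig = min(values), max(values)
    sd_bg = sd_sig = sd_all
    w_sig = 0.5
    post = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each probe is in the active group.
        post = []
        for v in values:
            p_sig = w_sig * normal_pdf(v, mu_sig, sd_sig)
            p_bg = (1.0 - w_sig) * normal_pdf(v, mu_bg, sd_bg)
            total = p_sig + p_bg
            post.append(p_sig / total if total > 0.0 else 0.5)
        # M-step: update the mixing weight, means and standard deviations.
        s_sig = sum(post)
        s_bg = n - s_sig
        if s_sig < 1e-9 or s_bg < 1e-9:
            break
        inv = [1.0 - p for p in post]
        mu_sig = sum(p * v for p, v in zip(post, values)) / s_sig
        mu_bg = sum(q * v for q, v in zip(inv, values)) / s_bg
        sd_sig = weighted_sd(values, post, mu_sig, s_sig)
        sd_bg = weighted_sd(values, inv, mu_bg, s_bg)
        w_sig = s_sig / n
    return post


# Simulated sample (assumed numbers): 80% background probes, 20% active probes.
rng = random.Random(0)
sample = ([rng.gauss(4.0, 0.5) for _ in range(800)]
          + [rng.gauss(9.0, 1.0) for _ in range(200)])
probs = active_probabilities(sample)
active_fraction = sum(p > 0.5 for p in probs) / len(probs)
print(f"fraction of probes called active: {active_fraction:.2f}")
```

Because the model is fitted within a single sample, the resulting values do not depend on which other samples are in the study, which is the property that makes a per-sample method attractive when combining data from many studies.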

3.3.2. Quality Control

Batch effects and possible outlier samples were assessed by principal component analysis (PCA), which shows the relative distances between samples in the plane of their principal components. PCA makes it possible to detect outliers, that is, samples that lie far from the others, as well as clusters of samples that may reflect technical variables such as the date of the experiment or the scanner used. Samples that deviated strongly from the other samples, and whose variation was not explained by phenotypic variables, were removed.
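A minimal sketch of PCA-based outlier screening follows, under stated assumptions: the data are simulated, the helper names `pca_scores` and `flag_outliers` are hypothetical, and the cutoff (median distance plus a multiple of the scaled median absolute deviation) is an illustrative choice, since the text does not state a numeric threshold. The principal components are obtained by power iteration on the sample-by-sample Gram matrix, which stays small when probes far outnumber samples.

```python
import math
import random
import statistics


def pca_scores(samples, n_components=2, n_iter=500, seed=0):
    """Return per-sample scores on the leading principal components.

    `samples` is a list of rows: one row per sample, one column per probe.
    """
    n = len(samples)
    col_means = [sum(col) / n for col in zip(*samples)]
    centered = [[v - m for v, m in zip(row, col_means)] for row in samples]
    # Gram matrix of the centered data (samples x samples).
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered]
            for ri in centered]
    rng = random.Random(seed)
    scores = [[0.0] * n_components for _ in range(n)]
    for k in range(n_components):
        vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(n_iter):
            new = [sum(g * v for g, v in zip(row, vec)) for row in gram]
            norm = math.sqrt(sum(x * x for x in new))
            if norm == 0.0:
                break
            vec = [x / norm for x in new]
        # The Rayleigh quotient gives the eigenvalue (squared singular value).
        gv = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        eigval = max(sum(a * b for a, b in zip(vec, gv)), 0.0)
        for i in range(n):
            scores[i][k] = vec[i] * math.sqrt(eigval)
        # Deflate so the next pass finds the following component.
        gram = [[gram[i][j] - eigval * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    return scores


def flag_outliers(scores, k=5.0):
    """Flag samples far from the centroid of the PC plane (assumed cutoff)."""
    dists = [math.sqrt(sum(s * s for s in row)) for row in scores]
    med = statistics.median(dists)
    mad = statistics.median([abs(d - med) for d in dists])
    cutoff = med + k * 1.4826 * mad
    return [i for i, d in enumerate(dists) if d > cutoff], dists


# Simulated study (assumed numbers): 12 samples, 50 probes, sample 7 is shifted.
rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(12)]
data[7] = [v + 8.0 for v in data[7]]
flagged, dists = flag_outliers(pca_scores(data))
print("flagged samples:", flagged)
```

A flag from a rule like this is only a prompt for inspection. Following the criterion in the text, a flagged sample would be dropped only if its variation is not explained by phenotypic variables, so the scores should be compared against the metadata before any removal.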

Figure 3.3. Density of the expression values of study GSE46097. The horizontal axis represents the expression level and the vertical axis the density of probes. Each line represents one sample of the study.