Arzu Burak (2635213) and Karima Allach (2706743)
DATA ANALYSIS PROMISES POTENTIAL TARGET GENES FOR
THE TREATMENT OF ALZHEIMER’S DISEASE
INTRODUCTION
Alzheimer’s disease (AD), the most common cause of dementia, is a neurological disorder that causes brain shrinkage and the death of brain cells. Because the disease is still common among the elderly, we are interested in finding gene associations that help us understand it better and that could be a first step towards a potential treatment. During the Artificial Intelligence Master's programme we study genes at specific genomic locations that may be risk factors for AD. We hope this will deepen our knowledge of the disease and, after obtaining our master's degree, allow us to work with pharmaceutical companies on identifying genes that could serve as treatment targets. By making our code publicly available at the end of this analysis, we hope to help other researchers develop their own research. We expect gene expression in specific brain regions to differ between AD patients and controls. To test this, we will combine genome-wide association studies (GWAS), gene expression data from patients, brain gene expression data, protein-protein interactions (PPIs) and MRI data.
GENOME-WIDE ASSOCIATION STUDIES
Before analyzing the data, we visualized and explored the GWAS results obtained with FUMA. Next, we cleaned the data in Matlab, retaining only the columns GENE, CHR, ZSTAT, SYMBOL and P to obtain a smaller dataset. We then plotted the distribution of the p-values in a histogram to find potentially valuable candidate genes, and wrote code that selects the genes reaching the significance threshold of 5 × 10⁻⁸. To see which genes have the largest effect, we sorted the rows by ZSTAT. FUMA also showed that the most significant lead SNP, rs41289512, lies closest to the APOE gene.
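The cleaning and selection steps above can be sketched in Python with pandas (the original analysis was done in Matlab). This is a minimal sketch, assuming the FUMA gene-level results are already loaded into a table with the columns named in the text; the extra START column, the placeholder gene labels and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical gene-level GWAS table. Column names follow the text
# (GENE, CHR, ZSTAT, SYMBOL, P); labels and values are illustrative only.
raw = pd.DataFrame({
    "GENE":   ["ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"],
    "CHR":    [19, 19, 2, 11],
    "START":  [100, 200, 300, 400],      # extra column that gets dropped
    "ZSTAT":  [21.9, 15.2, 2.1, 6.0],
    "SYMBOL": ["GENE1", "GENE2", "GENE3", "GENE4"],
    "P":      [8.1e-107, 1.7e-52, 0.018, 9.9e-10],
})

ALPHA = 5e-8  # significance threshold used in the text

# Keep only the columns needed for the analysis.
genes = raw[["GENE", "CHR", "ZSTAT", "SYMBOL", "P"]]

# Select genes below the threshold and rank them by Z-score (largest first).
significant = (
    genes[genes["P"] < ALPHA]
    .sort_values("ZSTAT", ascending=False)
    .reset_index(drop=True)
)
print(significant[["SYMBOL", "CHR", "ZSTAT", "P"]])
```

The p-value histogram mentioned above can be drawn from the same table, for example with `genes["P"].plot.hist(bins=50)` if matplotlib is installed.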
Figure 1: GWAS meta-analysis for AD risk (N=455,258 participants, N=2,357 SNPs). Notice the remarkably high peak at chromosome 19. α = 5 × 10⁻⁸.
GWAS showed that TOMM40 is the gene with the largest effect (Z = 21.921, P = 8.1347 × 10⁻¹⁰⁷), indicating a strong genetic association with AD risk. Further research showed that TOMM40 is in linkage disequilibrium (LD) with the neighbouring gene APOE, so part of the TOMM40 signal may reflect APOE. Previous GWAS have shown that APOE is indeed a strong risk factor for late-onset AD (Mise et al., 2017). Both genes are located on chromosome 19, as is also visible in figure 1 and table 1.
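The reported Z-score and p-value for TOMM40 can be cross-checked under one assumption: that the gene-level p-value is the upper-tail probability of a standard normal Z-score, P = 1 - Φ(Z). The sketch below uses only Python's standard library, and the helper name `one_sided_p` is our own. With Z = 21.921 it gives roughly 8.19 × 10⁻¹⁰⁷, close to the reported 8.1347 × 10⁻¹⁰⁷ (the small gap is consistent with Z having been rounded to three decimals). It also shows that, under this relation, a gene needs a Z-score of about 5.33 to pass the 5 × 10⁻⁸ threshold.

```python
import math
from statistics import NormalDist


def one_sided_p(z: float) -> float:
    """Upper-tail p-value of a standard normal Z-score: P = 1 - Phi(z)."""
    # erfc keeps precision for tiny tail probabilities, where
    # 1 - NormalDist().cdf(z) would round to exactly 0.
    return 0.5 * math.erfc(z / math.sqrt(2))


# p-value implied by the reported TOMM40 Z-score
p_tomm40 = one_sided_p(21.921)
print(f"P(TOMM40) = {p_tomm40:.3e}")

# Z-score needed to reach P < 5e-8 under the same one-sided relation
z_threshold = NormalDist().inv_cdf(1 - 5e-8)
print(f"Z threshold = {z_threshold:.2f}")
```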
Table 1: Top 4 genes with the largest effects. TOMM40 is the gene with the largest effect in this part of the analysis.
GENE EXPRESSION
The goal of the gene expression part of our analysis is to investigate whether the expression levels of TOMM40 and APOE differ between AD patients and controls in the entorhinal cortex (EC), a region involved in long-term memory formation (Puthiyedth et al., 2016). After visualizing the results from the separate data files obtained by Liang et al. (2008), we cleaned these files into structured tables and merged them, using ‘probe_id’ as the unique identifier. Next, we tested for a difference between patients and controls with a t-test (T = 2.5047). As the bar plot in figure 2 shows, gene expression is lower in AD patients than in controls, suggesting that reduced expression may be a marker of AD.
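The merge and the patient-versus-control comparison can be sketched as follows. The file layout, the sample column names (`ctrl_*`, `ad_*`), the gene labels and the expression values are all hypothetical. Because the report does not state which t-test variant was used, the sketch computes Welch's unequal-variance t statistic by hand; `welch_t` is our own helper.

```python
import math
import statistics

import pandas as pd

# Hypothetical layout: one table with expression values per probe and sample,
# one table mapping probes to gene symbols. All values are made up.
expression = pd.DataFrame({
    "probe_id": ["p1", "p2"],
    "ctrl_1": [8.1, 5.0], "ctrl_2": [7.9, 5.2], "ctrl_3": [8.3, 4.9],
    "ad_1":   [7.2, 5.1], "ad_2":   [7.5, 4.8], "ad_3":   [7.0, 5.0],
})
annotation = pd.DataFrame({"probe_id": ["p1", "p2"],
                           "symbol": ["GENE_X", "GENE_Y"]})

# Merge on the shared unique identifier; validate guards against duplicate probes.
merged = expression.merge(annotation, on="probe_id", how="inner",
                          validate="one_to_one")


def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        va / len(a) + vb / len(b))


# Pull the control and patient values for one gene of interest.
row = merged.loc[merged["symbol"] == "GENE_X"].iloc[0]
controls = [float(row[c]) for c in merged.columns if c.startswith("ctrl_")]
patients = [float(row[c]) for c in merged.columns if c.startswith("ad_")]

t = welch_t(controls, patients)
print(f"t = {t:.3f}")  # positive t: mean expression is higher in controls
```

A positive t means the control mean is higher, the same direction as in figure 2. For a p-value as well, `scipy.stats.ttest_ind(controls, patients, equal_var=False)` returns the same Welch statistic.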
Figure 2: Bar plot comparing gene expression between controls and patients affected by AD. Error bars show 1 standard deviation.
PROTEOMICS AND PPI
For the proteomics analysis, we wanted to know which other proteins APOE interacts with, in order to better understand the function of APOE in human brain cells. This required a PPI list. HENA, a heterogeneous network-based data set for Alzheimer’s disease, provided this list from the IntAct molecular interaction database. To make sure we selected the correct interactions, we first searched for the correct Ensembl gene (ENSG) identifier for APOE and looked up the corresponding protein. This resulted in a list containing only the PPIs of APOE. These interactions can also be seen in the PPI network (figure 3) built with the STRING database.
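Filtering the PPI list down to APOE can be sketched as below. ENSG00000130203 is the Ensembl gene ID for APOE; the two-column layout (`protein_a`, `protein_b`) and the partner IDs are placeholders, not the actual HENA export format.

```python
import pandas as pd

APOE_ID = "ENSG00000130203"  # Ensembl gene ID for APOE

# Hypothetical PPI table: column names and partner IDs are placeholders.
ppi = pd.DataFrame({
    "protein_a": [APOE_ID, "ENSG_P2", "ENSG_P2", "ENSG_P3"],
    "protein_b": ["ENSG_P1", APOE_ID, "ENSG_P3", "ENSG_P4"],
})

# Keep interactions in which APOE appears on either side.
apoe_ppi = ppi[(ppi["protein_a"] == APOE_ID) | (ppi["protein_b"] == APOE_ID)]

# Collect APOE's partners, whichever column APOE happens to be in.
partners = sorted(
    set(apoe_ppi["protein_a"]).union(apoe_ppi["protein_b"]) - {APOE_ID}
)
print(partners)
```

Checking both columns matters because an interaction may list APOE as either the first or the second partner.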
HENA also provides a data set, downloadable as a separate txt file, that lists proteins that may be connected to Alzheimer's disease. We used this data set to determine which protein has the most interactions in the disease network. Assuming that all PPIs in this list are bidirectional, we found that the amyloid precursor protein (APP) has the most interactions with the other proteins in the network, which makes it the central player within the PPI network (figure 3). As its name suggests, APP is the precursor of the β-amyloid (Aβ) peptide. Mutations in the APP gene cause aberrant cleavage of the APP protein, leading to abnormal Aβ generation, which in turn leads to the formation of plaques in the brain and causes early-onset Alzheimer's disease (Wang et al., 2017).
Figure 3: Protein-protein interaction network of APOE. The lines indicate interactions between proteins. Blue and pink lines show known interactions (from curated databases and experimentally determined, respectively).
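The interaction count that singles out APP can be sketched as a degree count on an undirected graph, which is what the bidirectionality assumption amounts to. This is a minimal sketch with a made-up edge list; `degree_counts` is a hypothetical helper, and a pair listed in both directions is counted only once.

```python
from collections import Counter

# Hypothetical edge list; protein names are placeholders.
edges = [
    ("PROT_A", "PROT_B"),
    ("PROT_B", "PROT_A"),   # same interaction listed in both directions
    ("PROT_A", "PROT_C"),
    ("PROT_A", "PROT_D"),
    ("PROT_C", "PROT_D"),
]


def degree_counts(edge_list):
    """Count distinct partners per protein, treating every PPI as undirected."""
    # A frozenset makes (x, y) and (y, x) the same edge;
    # self-interactions (x, x) are skipped.
    undirected = {frozenset(e) for e in edge_list if e[0] != e[1]}
    degree = Counter()
    for pair in undirected:
        for protein in pair:
            degree[protein] += 1
    return degree


degree = degree_counts(edges)
hub, n_partners = degree.most_common(1)[0]
print(hub, n_partners)
```

On the real HENA list, `degree.most_common(5)` would show whether APP's lead over the next proteins is large or marginal.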