LSC-30057 bioinformatics and science communication (LSC30057)
Essay
Bioinformatics essay assignment
Institution
Keele University (KU)
Bioinformatics assignment using data interpretation and graphical analysis on the topic. Submitted for biomedical science, first class grade for final year assignment.
The use of Bioinformatic Tools to Assess the Structure and Function of an
Unidentified Coding and Non-Coding Sequence
19002410
Abstract
Nucleotide sequences 4a and 4b were first identified using NCBI nucleotide BLAST and then interpreted
with a range of bioinformatic tools. The aim was to use these tools to characterise each sequence and
its function in regulating cellular processes or its role in pathology within the body. Outputs were
interpreted relative to the purpose of each tool, providing information on the structural features of
each sequence or its role in cell signalling. Sequence 4a was identified as the long non-coding RNA
HOTAIR, which is upregulated in several diseases. Sequence 4b was identified as the coding sequence
BECN1, which is responsible for autophagy and is closely associated with other protein subunits of the
PI3K complex. Further databases, namely STRING, HuRI, AlphaFold, Genevisible and DisGeNET, were
then used to characterise each sequence and establish its function. Outputs were interpreted in terms
of protein-protein interactions, splice variants, or the level of expression of each sequence across a
range of tissues. The level of gene expression within each tissue was associated with normal or
abnormal cellular function, an association that plays a vital role in medical research. Information
provided by each database informed subsequent investigation into the impact of each sequence within
the Homo sapiens genome.
Introduction
Bioinformatics has allowed for a large volume of biological and statistical information to be
processed, stored, and collated in the form of databases. Recall of complex datasets can provide a large
array of information to determine the identity and characteristics of sequences. Of interest was the role
of non-coding and coding nucleotide sequences that were analysed in terms of their homology to a
range of genes, where results demonstrated a variety of differences due to splice variants and their
interactions with structural proteins. Approximately 98% of genetic information can be categorised as
non-coding, demonstrating that a wide array of information can be explored and deduced from sequences
that do not code for proteins (Perenthaler et al., 2019). A variety
of bioinformatic tools can collectively be used to gain further insight into genomics, whereby
sequences can be analysed to reveal their role in disease presentation, cellular function or signalling
pathways. Notably, the Human Genome Project has enabled the field of bioinformatics to
gain traction within the scientific community as an efficient way to store information and to establish a
valuable relationship between the related disciplines of mathematics and computer science (Hood and
Rowen, 2013). The availability of a range of bioinformatic tools allowed for a more objective point of
view on the role of a gene within the body, and a clearer understanding of its functional significance.
This is beneficial, as deductions were made from a greater number of resources that had a large
coverage across several databases. The aims of this project were to gain an understanding of the
functionality of two sequences within the human genome, and to identify the key features that
distinguish them from other elements of the transcriptome. Bioinformatic tools
were used as the basis of research into each sequence, whereby each search result led to further
investigation using other databases, to gain insight into the significance of both non-coding and coding
sequences 4a and 4b.
Methods
The bioinformatic tools used to obtain structural and functional information on sequences 4a and 4b
during this investigation are summarised in Table 1.
Table 1. List of bioinformatic tools that were used, including their purpose in the characterisation of sequences 4a and 4b. A
range of bioinformatic tools was utilised to determine the structure and function of each sequence, starting with NCBI Nucleotide
BLAST. Each sequence was identified as either coding or non-coding and named according to its homology to
predicted sequences stored within the chosen database. Outputs provided by each tool informed subsequent investigation using
other databases to explore the impact of the sequences in more detail.
Name of Bioinformatic Tool    Purpose of Tool
NCBI Nucleotide BLAST         To confirm the identity of each sequence against several predicted and experimentally confirmed sequences
STRING                        To assess the interaction of sequence 4b with other co-regulatory proteins within protein complexes
HuRI                          To check the coverage of STRING and the objectivity of its outputs for sequence 4b by comparing the two databases
AlphaFold                     To visualise the structural features of sequence 4b in relation to its function
Genevisible                   To compare the level of sequence 4a expression across an array of tissue types in healthy samples and in cancer presentation
DisGeNET                      To compare the role of splice variants within exonic and intronic regions of sequence 4a in disease presentation, and to determine the chromosome on which it is located
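The STRING step listed in Table 1 could also be scripted. The following is a minimal sketch, assuming the public STRING REST API and its `network` method. The function name `string_network_url`, the score threshold of 700 and the choice of JSON output are illustrative assumptions and are not taken from the original investigation, which appears to have used the web interface. The sketch only builds the request URL and does not contact the server.

```python
from urllib.parse import urlencode

# Base address of the public STRING REST API.
STRING_API_BASE = "https://string-db.org/api"


def string_network_url(gene, species=9606, required_score=700, output_format="json"):
    """Build a STRING 'network' request URL for one gene.

    species=9606 is the NCBI taxonomy identifier for Homo sapiens.
    required_score=700 is an assumed confidence cut-off chosen for
    illustration; it is not a value reported in the essay.
    """
    params = {
        "identifiers": gene,
        "species": species,
        "required_score": required_score,
    }
    return f"{STRING_API_BASE}/{output_format}/network?{urlencode(params)}"


# Example: the interaction network request for BECN1 (sequence 4b).
print(string_network_url("BECN1"))
```

The returned URL can be opened in a browser or passed to any HTTP client to retrieve the interaction partners for comparison with HuRI.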
Results
Non-coding sequence
A total of 11 sequences were retrieved from NCBI nucleotide BLAST, confirming the identity of
sequence 4a as the non-coding HOX antisense intergenic RNA (HOTAIR). HOTAIR displayed
100% homology across the full query length of 2158 nucleotides. The search was restricted to
Homo sapiens sequences, and only highly similar sequences were selected (Figure 1).
Figure 1. Output from NCBI Nucleotide BLAST using nucleotide sequence 4a (NCBI, 2021). Sequences similar to
query sequence 4a are listed, with results restricted to the Homo sapiens genome.
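For reference, the percentage identity and query coverage that BLAST reports for a hit can be reproduced by hand from a pairwise alignment. The sketch below is illustrative only: the function names and the short toy sequences are assumptions, and it is not the procedure used to generate Figure 1.

```python
def percent_identity(query_aln, subject_aln):
    """Percentage of alignment columns in which both sequences carry the same base.

    Both arguments are rows of one pairwise alignment, so they must have equal
    length; '-' marks a gap. Gap columns count towards the alignment length but
    never count as matches.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have equal length")
    if not query_aln:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(query_aln.upper(), subject_aln.upper())
        if q == s and q != "-"
    )
    return 100.0 * matches / len(query_aln)


def query_coverage(query_start, query_end, query_length):
    """Percentage of the query spanned by the alignment (1-based, inclusive)."""
    if not 1 <= query_start <= query_end <= query_length:
        raise ValueError("coordinates must satisfy 1 <= start <= end <= length")
    return 100.0 * (query_end - query_start + 1) / query_length


# Toy alignment (hypothetical sequences): three of four columns match.
print(percent_identity("ACGT", "ACGA"))
# A hit spanning the whole of a 2158-nucleotide query.
print(query_coverage(1, 2158, 2158))
```

With a hit spanning positions 1 to 2158 of a 2158-nucleotide query, `query_coverage(1, 2158, 2158)` gives 100.0, which corresponds to a match over the full query length.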
Following from the identification of sequence 4a, the extent of HOTAIR expression
within healthy tissue was observed using data from Genevisible (Figure 2).