BIOINFORMATICS:
LECTURE 1: INTRODUCTION TO BIOINFORMATICS
↳ DATABASE STRUCTURE
↳ USE OF PROTEIN DATABASES
↳ PROTEIN SEQUENCE AND BIOLOGICAL FUNCTION
THE CENTRAL PARADIGM:
[DIAGRAM] DNA (REPLICATION) → TRANSCRIPTION → RNA → TRANSLATION → PROTEIN → METABOLISM
GENOTYPE → PHENOTYPE
NATURAL SELECTION → EVOLUTION
DNA → ATGC (2 STRANDS)
RNA → UACG (1 STRAND)
PROTEIN → 20 AMINO ACIDS (N- AND C-TERMINUS)
METABOLISM → PATHWAY
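THE FLOW DNA → RNA → PROTEIN CAN BE SKETCHED IN A FEW LINES OF PYTHON. THIS IS A MINIMAL ILLUSTRATION, NOT PART OF THE LECTURE: THE FUNCTION NAMES transcribe AND translate, THE EXAMPLE SEQUENCE, AND THE DELIBERATELY PARTIAL CODON TABLE ARE ASSUMPTIONS CHOSEN FOR THIS EXAMPLE.

```python
# Toy illustration of the central paradigm: DNA -> RNA -> protein.
# ASSUMPTION: the codon table is partial and only covers this example.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the start codon
    "GCU": "A",  # alanine
    "UGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "UAA": "*",  # stop codons
    "UAG": "*",
    "UGA": "*",
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA codon by codon (N- to C-terminus) until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCTTGGAAATAA"  # made-up example sequence
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)     # AUGGCUUGGAAAUAA
print(protein)  # MAWK
```

A COMPLETE VERSION WOULD USE ALL 64 CODONS OF THE STANDARD GENETIC CODE.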
BIOINFORMATICS IS THE APPLICATION OF COMPUTER TECHNOLOGY TO THE
MANAGEMENT AND ANALYSIS OF BIOLOGICAL DATA
AREAS OF APPLICATION:
• DATA GENERATION, STORAGE, MANAGEMENT, MINING
• DATA INTERPRETATION, INTEGRATION
SEQUENCING:
THE FIRST COMPLETE GENOME OF A FREE-LIVING ORGANISM:
HAEMOPHILUS INFLUENZAE (A BACTERIUM, NOT THE INFLUENZA VIRUS),
ABOUT 1.8 MILLION BP.
HUMAN GENOME ('90-'03): 3.1 BILLION BP, 13 YEARS.
WHAT IS GENOMIC SEQUENCING? https://www.youtube.com/watch?v=2JUu1WqidC4
GENOMIC SEQUENCING IS A PROCESS FOR ANALYSING A SAMPLE OF DNA
TAKEN FROM BLOOD. SAMPLES ARE SUBJECTED TO HIGH-FREQUENCY SOUND WAVES
THAT BREAK THE DNA INTO SMALLER PIECES (~600 BASES LONG). SPECIAL TAGS
ARE ADDED TO THE ENDS OF THE FRAGMENTED DNA, WHICH CAN THEN ATTACH
TO A GLASS SLIDE. IN A SEQUENCER, EACH PIECE OF DNA IS COPIED HUNDREDS
OF THOUSANDS OF TIMES, CREATING CLUSTERS OF IDENTICAL DNA FRAGMENTS.
NEXT, THE SEQUENCER READS THE DNA, ONE BASE AT A TIME, USING DIFFERENT
COLOUR TAGS FOR EACH BASE (ATGC). SPECIAL SENSORS WITHIN THE MACHINE
DETECT THE DIFFERENT COLOURED TAGS. THIS SEQUENCE OF COLOURS REVEALS
THE DNA SEQUENCE OF EACH FRAGMENT.
COMPUTERS PIECE TOGETHER THESE INDIVIDUAL DNA FRAGMENTS AND GIVE THE
SEQUENCE OF THE ORIGINAL DNA. A TEAM OF EXPERTS USES SPECIALIZED
SOFTWARE TO ANALYSE AND COMPARE THE SEQUENCES TO IDENTIFY THE
VARIANTS THAT MAY BE RELEVANT.
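THE STEP WHERE COMPUTERS PIECE THE FRAGMENTS TOGETHER CAN BE ILLUSTRATED WITH A TOY GREEDY OVERLAP MERGE. THIS IS A SKETCH UNDER STRONG ASSUMPTIONS (ERROR-FREE READS, A SINGLE STRAND, NO REPEATS), NOT THE METHOD A REAL SEQUENCING PIPELINE USES. THE NAMES overlap AND greedy_assemble AND THE EXAMPLE READS ARE MADE UP FOR ILLUSTRATION.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; leave the rest unmerged
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return max(reads, key=len)  # longest assembled piece


# Made-up fragments of the made-up sequence ATGCGTACGTTAGC, in shuffled order.
reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
assembled = greedy_assemble(reads)
print(assembled)  # ATGCGTACGTTAGC
```

REAL GENOMES CONTAIN REPEATS AND READ ERRORS, WHICH IS WHY PRACTICAL ASSEMBLERS USE MORE ELABORATE GRAPH-BASED METHODS.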
SEQUENCING:
SANGER SEQUENCING
NEXT-GENERATION SEQUENCING TECHNOLOGY: BILLIONS OF BASES/RUN
• 454 ROCHE
• Illumina HiSeq / MiSeq
• SOLiD / AB 5500xl WILDFIRE
• Ion Torrent PGM / Proton
• PACIFIC BIOSCIENCES
• OXFORD NANOPORE
↳ PACIFIC BIOSCIENCES ALLOWS FOR LONG MOLECULES, BUT SOME ERRORS ARE PRESENT (PROVIDES BACKBONE).
↳ ILLUMINA ALLOWS FOR SHORT FRAGMENTS THAT CAN CORRECT LONG-SEQUENCE MISTAKES.
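THE IDEA OF USING ACCURATE SHORT READS TO CORRECT AN ERROR-PRONE LONG READ CAN BE SKETCHED AS A PER-POSITION MAJORITY VOTE. ASSUMPTIONS, NONE OF THEM FROM THE LECTURE: THE SHORT READS ARE ALREADY PLACED ON THE LONG READ (OFFSETS KNOWN), AND ONLY SUBSTITUTION ERRORS OCCUR. REAL TOOLS MUST FIRST ALIGN THE READS AND ALSO HANDLE INSERTIONS AND DELETIONS. THE NAME polish AND ALL SEQUENCES ARE HYPOTHETICAL.

```python
from collections import Counter


def polish(long_read, placed_short_reads):
    """Correct a long read by majority vote of short reads covering each position.

    placed_short_reads is a list of (offset, sequence) pairs; the offsets are
    ASSUMED to be known already (a real pipeline would compute them by alignment).
    """
    votes = [Counter() for _ in long_read]
    for offset, seq in placed_short_reads:
        for k, base in enumerate(seq):
            votes[offset + k][base] += 1
    corrected = []
    for original_base, counter in zip(long_read, votes):
        # Keep the long-read base where no short read covers the position.
        corrected.append(counter.most_common(1)[0][0] if counter else original_base)
    return "".join(corrected)


# Made-up data: the long read has one wrong base (index 5, A instead of T).
long_read = "ATGCGAACGTTAGC"
short_reads = [(0, "ATGCGT"), (3, "CGTACG"), (4, "GTACGT"), (8, "GTTAGC")]
polished = polish(long_read, short_reads)
print(polished)  # ATGCGTACGTTAGC
```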
A LOT OF DATA IS PRODUCED, CREATING A STORAGE SHORTAGE AND
ANALYSIS / INTERPRETATION ISSUES.
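A ROUGH BACK-OF-THE-ENVELOPE ESTIMATE SHOWS WHY STORAGE BECOMES AN ISSUE. THE GENOME SIZE IS TAKEN FROM THESE NOTES; THE 30X COVERAGE AND THE ENCODINGS ARE ASSUMPTIONS, AND QUALITY SCORES AND READ NAMES (WHICH ADD MORE) ARE IGNORED.

```python
GENOME_BP = 3_100_000_000   # human genome size from the notes
COVERAGE = 30               # ASSUMED sequencing depth, not from the lecture


def size_gb(bases: int, bits_per_base: int) -> float:
    """Storage in gigabytes (1 GB = 1e9 bytes) for a given number of bases."""
    return bases * bits_per_base / 8 / 1e9


one_genome_text = size_gb(GENOME_BP, 8)             # one ASCII character per base
one_genome_packed = size_gb(GENOME_BP, 2)           # 2 bits suffice for A, C, G, T
raw_reads_text = size_gb(GENOME_BP * COVERAGE, 8)   # each base read about 30 times

print(f"{one_genome_text:.3f} GB")    # 3.100 GB
print(f"{one_genome_packed:.3f} GB")  # 0.775 GB
print(f"{raw_reads_text:.3f} GB")     # 93.000 GB
```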
• WHY DO WE NEED BIOINFORMATICS? TO ANALYSE, INTERPRET, AND STORE DATA.
• HOW CAN WE LOOK AT IT? KNOW WHAT DATA TO LOOK FOR.
DATABASE STRUCTURE:
IMPORTANT SOURCES OF INFORMATION ON BIOMOLECULES:
• PUBLIC REPOSITORIES OF SEQUENCE DATA: NCBI, UniProt
• DATA INTEGRATION AND LITERATURE SEARCH ENGINES: STRING, PUBMED, NLM
ESSENTIAL ELEMENTS OF DATABASES:
EVERY DATABASE CAN HAVE ITS OWN FORMAT, BUT SOME ELEMENTS ARE
CHARACTERISTIC FOR EVERY DATABASE:
• ACCESSION CODES (IDENTIFIERS, EACH IS UNIQUE)
• DATA TABLES
• METADATA (DATA ABOUT DATA = ANNOTATION)
IMPORTANT ADDITIONAL INFORMATION:
• DEPOSITION DATE
• NAME OF DEPOSITOR
• LITERATURE REFERENCES
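THE ELEMENTS LISTED ABOVE CAN BE PICTURED AS THE FIELDS OF A SINGLE RECORD. THE SKETCH BELOW IS ONLY AN ILLUSTRATION: THE CLASS NAMES DatabaseEntry AND ToyDatabase, THE FIELD NAMES, AND THE ACCESSION CODE EX0001 ARE INVENTED, AND NO REAL DATABASE IS CLAIMED TO USE THIS EXACT LAYOUT.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DatabaseEntry:
    accession: str                                   # unique identifier
    data: dict                                       # the stored data itself
    annotation: dict = field(default_factory=dict)   # metadata (data about data)
    deposition_date: Optional[date] = None
    depositor: str = ""
    references: list = field(default_factory=list)   # literature references


class ToyDatabase:
    """In-memory store that enforces unique accession codes."""

    def __init__(self) -> None:
        self._entries = {}

    def deposit(self, entry: DatabaseEntry) -> None:
        if entry.accession in self._entries:
            raise ValueError(f"accession {entry.accession} already exists")
        self._entries[entry.accession] = entry

    def fetch(self, accession: str) -> DatabaseEntry:
        return self._entries[accession]


db = ToyDatabase()
entry = DatabaseEntry(
    accession="EX0001",                       # made-up accession code
    data={"sequence": "MAWK"},                # made-up protein sequence
    annotation={"organism": "example organism"},
    deposition_date=date(2020, 1, 1),         # placeholder date
    depositor="A. Student",                   # placeholder name
    references=["placeholder citation"],
)
db.deposit(entry)
print(db.fetch("EX0001").data["sequence"])  # MAWK
```

BECAUSE THE ACCESSION CODE IS THE LOOKUP KEY, DEPOSITING A SECOND ENTRY WITH THE SAME CODE IS REJECTED, WHICH IS WHAT KEEPS IDENTIFIERS UNIQUE.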