Interview study book Bioinformatics of Dev Bukhsh Singh, Rajesh Kumar Pathak - ISBN: 9780323900058 (Study Notes)

  • December 26, 2021
Introduction to Statistics and Biostatistics:

All of us use statistics in day-to-day life, perhaps not in its most complex form, but in the
simple logic we apply to daily occurrences based on basic probability. Examples range from
the outcomes of coin-tossing experiments to public opinion polls that predict the general
consensus ahead of state elections. Statistics finds use in fields as varied as sociology,
economics, health sciences and mathematics, with applications ranging from the lattice
Lotka-Volterra model for population modelling (1) to the use of Fourier statistics to portray
human faces (2). Such varied use has also driven the development of specialised fields within
statistics.

The widely used fields of statistics include astrostatistics, biostatistics, business
analytics, environmental statistics, population ecology, quantitative psychology, statistical
finance, statistical mechanics, statistical physics, statistical thermodynamics, etc.

In this chapter, we start by looking at population types, sampling techniques and basic
analysis. We discuss the different types of data, frequency distributions and frequency
tables. Data representation is an important use of statistics and enables a finer
interpretation of the given data, so we look at various representation methods along with the
measures of central tendency. We also discuss discrete and continuous distributions and the
concept of confidence intervals, which takes us deeper into understanding the range of
predictability in statistics. This prepares us for hypothesis testing. Finally, we look at the
basics of analysis of variance, correlation and regression, together with the basic ideas of
biostatistics.

Definitions:

Data can be defined as any information, collected in raw or organised form and based on
observations (including visual interpretations, measured quantities, survey responses, etc.),
that refers to a set of defined conditions, ideas or objects.

Statistics is the study of planning experiments and then collecting, organising, analysing,
interpreting and presenting data. It therefore covers the whole course of an experiment or
case study, from the design of the experiment to drawing inferences from and presenting the
resulting data. Statistics can be broadly categorised into two types: descriptive and
inferential statistics.

Population refers to the complete collection of all elements (scores, people, measurements,
and so on) from which the data are obtained. The collection includes all subjects to be
studied.

Census refers to the systematic collection of data from every member of the population and
is usually recorded periodically at regular intervals.

Sample refers to a sub-collection of elements selected from a population; data are collected
from the sample, which is assumed to represent the whole population.
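
The difference between a population and a sample can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 simulated body weights, the sample size of 50 and the variable names are all assumptions, not data from the text.

```python
import random
import statistics

# Hypothetical population: 1,000 simulated body weights in kg (made-up data).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(70, 10) for _ in range(1000)]

# A census would measure every member; a sample measures only a subset.
sample = rng.sample(population, 50)  # simple random sample, without replacement

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population size: {len(population)}, mean: {population_mean:.2f}")
print(f"sample size:     {len(sample)}, mean: {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; that gap is what inferential statistics tries to quantify.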

Descriptive statistics describes the population under study using statistical procedures; the
results cannot be extended to a larger population. The results help organise the data and
information about that population and are limited to it. Descriptive statistics is therefore
useful when the results are used only for the population under study and need not be
extended further. Examples of descriptive statistics include frequency distributions,
measures of central tendency and graphical representations.
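
As a small illustration of descriptive statistics, the sketch below summarises a hypothetical set of ten exam scores using Python's standard `statistics` module. The scores are made up for the example, and the summary describes only this group.

```python
import statistics

# Hypothetical exam scores for one class (made-up values); the class is the
# whole population of interest, so nothing is generalised beyond it.
scores = [55, 62, 62, 70, 71, 75, 80, 80, 80, 93]

summary = {
    "mean": statistics.mean(scores),      # arithmetic average
    "median": statistics.median(scores),  # middle value of the sorted data
    "mode": statistics.mode(scores),      # most frequent value
    "range": max(scores) - min(scores),   # spread between the extremes
}
print(summary)
```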

Inferential statistics, as the name suggests, draws inferences about a larger population
based on a study conducted on a sample. It is therefore important to select the sample
carefully, because the results will be extended to the whole population concerned. Tests of
significance such as the chi-square test or the t-test allow us to decide whether a result
observed in the sample is statistically significant, that is, unlikely to have arisen by
chance alone, before it is generalised to the population. Correlation analysis, regression
analysis and ANOVA are examples of inferential statistics.
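
To make the idea of a test of significance concrete, here is a minimal sketch of a one-sample t-test computed by hand. The eight measurements and the hypothesised mean `mu0` are made-up assumptions, and the critical value is the usual two-sided 5% entry for 7 degrees of freedom from a standard t table.

```python
import math
import statistics

# Hypothetical sample of 8 measurements (made-up values) and a hypothesised
# population mean mu0; both are assumptions for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2, 5.5, 5.4]
mu0 = 5.0

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# One-sample t statistic: t = (mean - mu0) / (sd / sqrt(n))
t_stat = (sample_mean - mu0) / (sample_sd / math.sqrt(n))
df = n - 1

# Two-sided 5% critical value for 7 degrees of freedom, from a standard t table.
t_critical = 2.365

print(f"t = {t_stat:.3f} with {df} degrees of freedom")
print("reject H0 at the 5% level" if abs(t_stat) > t_critical else "do not reject H0")
```

The conclusion is about the population mean, not just the eight values, which is what makes this an inferential rather than a descriptive procedure.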

Discrete variables have either a finite number of possible values or a countable number of
possible values. In other words, they can take only certain values and none in between. For
example, the number of students on a class roll can be 44 or 45, but never a value between
the two.

Continuous variables can take any value in a given range, without gaps. In practice, however,
the recorded values are discrete, depending on the measurement method used. For example,
body weight may take any value, but depending on the precision of the weighing machine the
reading may be restricted to one or two decimal places, even though the underlying quantity
can take any value in the continuous range.




Apart from classification as discrete or continuous, data can also be classified by the
level of information they carry, known as the level of measurement: nominal, ordinal,
interval or ratio.

Levels of Measurement

1. Nominal Level means 'names only'. Nominal level data consist of qualitative information
that cannot be ranked or ordered and has no quantitative or numerical significance. The data
usually contain names, labels or categories only. Examples include names of cities, eye
colour and yes/no survey responses.

2. Ordinal Level is the next level, where the data can be arranged in some order, but the
differences between data values, if determined, are meaningless. Examples include the top
ten countries for tourism and exam grades A, B, C, D or F.

3. Interval Level deals with data values that can be ranked and for which the differences
between data points are meaningful. Data at this level do not have an intrinsic zero or
starting point, so ratios of data values are meaningless. For example, on the Fahrenheit or
Celsius temperature scale, 20 degrees and 40 degrees are ordered and their difference makes
sense. However, 0 degrees does not indicate an absence of temperature, and 40 degrees is not
twice as hot as 20 degrees. Similarly, for the years 1000, 2000, 1776 and 1492, differences
are meaningful but ratios are not.

4. Ratio Level deals with data similar to the interval level, except that there is an
intrinsic zero, or starting point, which indicates that none of the quantity is present.
Ratios of data values at this level are therefore meaningful. For example, in distance
measurement 2 inches is twice as long as 1 inch, and distances can be added or subtracted to
give a meaningful value. Prices of commodities (pen, pencil, etc.) are another example.
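
The interval versus ratio distinction can be checked numerically. The sketch below converts the 20 and 40 degree Celsius example to Kelvin, a scale with a true zero. Kelvin is not mentioned in the text and is brought in here only as an illustrative ratio-level scale.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin (a scale with a true zero)."""
    return celsius + 273.15

low_c, high_c = 20.0, 40.0
low_k, high_k = celsius_to_kelvin(low_c), celsius_to_kelvin(high_c)

# Differences agree on both scales, so differences are meaningful.
diff_c = high_c - low_c
diff_k = high_k - low_k

# Ratios disagree, so "40 degrees C is twice as hot as 20 degrees C" is not meaningful.
ratio_c = high_c / low_c
ratio_k = high_k / low_k

print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The Celsius ratio of 2 is an artefact of where the zero happens to sit; on the Kelvin scale the same two temperatures differ by a factor of only about 1.07.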

Frequency distributions:
A group of disorganised data is difficult to manage and interpret. When a large amount of
data is involved, a frequency table makes organising and summarising the data convenient and
effective. To construct a frequency table for grouped data, the first step is to determine
class intervals that cover the entire range of the data. The intervals are arranged in
ascending order and defined so that they do not overlap. Each data value is then allocated to
a class interval, and the number of values in an interval is recorded as its frequency, known
as the class frequency. Sometimes another column is included in the table, showing each class
frequency as a percentage of the total number of observations; this is called the relative
frequency percentage.
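
The construction steps described above can be sketched as follows. The twelve data values, the starting point of 10 and the class width of 10 are all assumptions chosen for illustration; intervals are taken as closed on the left and open on the right so that they do not overlap.

```python
# Hypothetical raw data (made-up values) and class-interval settings.
data = [12, 15, 21, 22, 25, 29, 31, 34, 35, 38, 41, 47]
start, stop, width = 10, 50, 10

# Build non-overlapping class intervals [lower, upper) in ascending order
# and count how many data values fall into each one.
table = {}
for lower in range(start, stop, width):
    upper = lower + width
    table[(lower, upper)] = sum(lower <= x < upper for x in data)

total = len(data)
print("class      freq   rel. freq. %")
for (lower, upper), freq in table.items():
    print(f"[{lower}, {upper})   {freq:>4}   {100 * freq / total:>10.1f}")
```

Choosing half-open intervals is one common convention; other texts use class boundaries such as 9.5 to 19.5 to achieve the same non-overlap.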

The frequency is the number of times a particular datum occurs in the data set.

A relative frequency is the proportion of times a value occurs. To find the relative
frequencies, divide each frequency by the total number of values in the sample.
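
A minimal sketch of these two definitions, using `collections.Counter` from the Python standard library on a hypothetical list of ten blood-group observations (the data are made up):

```python
from collections import Counter

# Hypothetical sample of blood groups (made-up observations).
observations = ["A", "B", "B", "O", "A", "B", "O", "A", "B", "AB"]

# Frequency: number of times each value occurs.
frequency = Counter(observations)

# Relative frequency: each frequency divided by the total number of values.
n = len(observations)
relative_frequency = {value: count / n for value, count in frequency.items()}

for value, count in frequency.items():
    print(f"{value:>2}: frequency = {count}, relative frequency = {relative_frequency[value]:.2f}")
```

The relative frequencies always sum to 1, which is a quick check that no observation was dropped.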

A cumulative frequency table is also constructed sometimes. The cumulative value for a class
interval is obtained by adding its frequency (or relative frequency) to those of all the
preceding class intervals. The table may show the cumulative relative frequency or the
cumulative percentage, which give, respectively, the proportion or percentage of values less
than or equal to the upper boundary of a given class interval.
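
Running totals of this kind can be computed with `itertools.accumulate`. The class labels and frequencies below are hypothetical values chosen for illustration.

```python
from itertools import accumulate

# Hypothetical class intervals and their class frequencies (made-up values).
classes = ["10-19", "20-29", "30-39", "40-49"]
frequencies = [2, 4, 4, 2]

# Cumulative frequency: each class frequency plus all preceding ones.
cumulative = list(accumulate(frequencies))

# Cumulative percentage: share of values at or below each class's upper boundary.
total = sum(frequencies)
cumulative_percent = [100 * c / total for c in cumulative]

for label, cum, pct in zip(classes, cumulative, cumulative_percent):
    print(f"{label}: cumulative frequency = {cum:>2}, cumulative % = {pct:.1f}")
```

The last cumulative frequency equals the total number of observations, and the last cumulative percentage is 100.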

A histogram is widely used to represent a frequency table as a bar graph, where the endpoints
of the class intervals are placed on the x-axis and the frequencies are plotted on the
y-axis. Relative frequencies can be plotted on the y-axis instead, in which case the
histogram is termed a relative frequency histogram.
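
As a rough illustration, a histogram can be drawn in plain text. The class labels and frequencies are hypothetical, and the bars run horizontally (classes down the side) rather than vertically as described above, simply because that is easier to print.

```python
# Hypothetical class intervals and class frequencies (made-up values).
classes = ["10-19", "20-29", "30-39", "40-49"]
frequencies = [2, 4, 4, 2]

# One text bar per class: the bar length equals the class frequency.
lines = [f"{label:>5} | {'#' * freq} ({freq})" for label, freq in zip(classes, frequencies)]
print("\n".join(lines))
```

Replacing each frequency with its relative frequency (scaled to a convenient bar length) would give the text equivalent of a relative frequency histogram.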

Graphical methods:

Another way to represent data is with graphs, which give a visual overview of the essential
features of the population under study. Graphs are easy to understand and give an immediate,
broad qualitative idea of the parameters under study, although they may lack the precision
of a table. Graphs should be simple and essentially self-explanatory, with suitable titles,
units of measurement, properly labelled axes, etc. Here we discuss a few major graphical
methods:

Frequency histograms:

As seen in the previous section, a frequency histogram is a bar graph with class intervals
placed on the x-axis and frequencies plotted on the y-axis. Constructing a histogram is an
art, guided by the presenter's needs as to which information should be highlighted and
prominently displayed. Several guidelines are available for constructing histograms that
efficiently showcase the information of interest. A histogram illustrates a data set, and
its shape gives an idea of the distribution of the data.

Guidelines for Creating Frequency Distributions from Grouped Data (3)

