
Lecture notes

Data Analytics for Engineers


This document covers the definitions and concepts needed for the Data exam.


  • 23 June 2021
  • 14 pages
  • 2020/2021
  • Lecture notes
  • Rick van den Brink
  • All lectures
Seller: eshikanair13
Week 1 - EDA

Association statistic: An association statistic is a summary statistic that describes the strength of the relation
between two variables.

Asymmetry: Asymmetry is a general term to describe lack of symmetry. It does not have a specific quantitative
definition.

Bar chart: A bar chart (or bar plot) is a way of summarizing a set of categorical data. It is often used to illustrate
the major features of the distribution of the data in a convenient form using a number of rectangles. There are
different types of bar charts:

- Stacked: better to compare totals
- Grouped: better for comparing individual categories across groups
- Back to back: visualizes the size of different groups in vertical order

Bimodality: When a graphical representation of a data set, such as a histogram, kernel density plot or bar chart,
shows two peaks, we speak of bimodality. The peaks need not be equally high: typically there is a global maximum
and a local maximum. So bimodality is not as strict as saying that there are two modes.

Bin: A histogram counts the frequencies of data points falling into certain intervals called bins.

Binary data: Binary data (also known as dichotomous data) are categorical data for which only two values are
possible.

Box and Whisker Plot: A box and whisker plot is a way of summarizing a set of numerical data. It is a type of
graph which is used to show the shape of the distribution, its central value, and variability. The picture
produced consists of the most extreme values in the data set (maximum and minimum values), the first and
third quartile, and the median. A box and whisker plot is especially helpful for indicating whether a distribution
is skewed and whether there are any outliers in the data set (outliers are indicated by crosses in box and
whisker plots).

Categorical data (qualitative data): These are data that can be organized into mutually exclusive categories.
Typically they represent characteristics such as marital status, birth place or eye colour. Categorical data can
contain either dichotomous, nominal or ordinal values.

Causation: Causation, or causality, means that one factor influences the outcome of another factor. If heating
in a house is turned on, then the temperature within the house will increase because of that (if you keep
windows and doors closed).

Ceiling: Ceiling is the mathematical operation of rounding a numerical value up to the smallest integer larger than
or equal to that numerical value (so the ceiling of 2.2 equals 3). The opposite operation is called floor.
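As a small illustration (a sketch using Python's standard `math` module, which is not part of the original notes; the sample values are made up), the ceiling leaves integer values unchanged and rounds every other value up:

```python
import math

# Ceiling: smallest integer that is greater than or equal to the value.
values = [2.2, 3.0, -2.2]
ceilings = [math.ceil(v) for v in values]

for v, c in zip(values, ceilings):
    print(f"ceil({v}) = {c}")
# ceil(2.2) = 3
# ceil(3.0) = 3
# ceil(-2.2) = -2
```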

Continuous data: Continuous data are measured along a continuous scale, such as temperature or length. It is
a subclass of numerical data. We distinguish interval data and ratio data.

Correlation: Correlation means that the values of two factors may have some relationship. An example is the
relation between shoe sizes and IQ scores of young children. It does not mean that one factor is causing an effect
on the other factor. Obviously, larger shoe sizes do not cause higher IQ. So correlation need not imply causation.
In the shoe size case, the explanation is that when young children become older, they grow both in a physical way
(they get larger) and in a mental way (their thinking powers increase).

Correlation coefficient: A correlation coefficient is an association statistic that describes how strongly two
variables depend on each other. Correlation coefficients take on values in the closed interval [-1,1]. Values
close to -1 and 1 indicate strong dependence, values close to 0 indicate no or very weak dependence.

Count data: Count data is obtained by counting the occurrence of specific observations. Therefore the data can
only take non-negative integer values.

Covariance: The (sample) covariance of a data set is an association statistic. It cannot be interpreted directly
when analysing data sets, since it has no universal scale. It is the building block for other association
statistics like the correlation coefficient.
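The lack of a universal scale can be seen in a short sketch. It assumes the Pearson correlation coefficient (the notes do not name a specific coefficient) and uses made-up height and weight values; the function names are hypothetical. Changing the unit of one variable rescales the covariance but leaves the correlation coefficient unchanged:

```python
import math
from statistics import mean

def sample_covariance(x, y):
    """Sample covariance with the usual n - 1 denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    sx = math.sqrt(sample_covariance(x, x))
    sy = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (sx * sy)

# Hypothetical data: heights in metres and weights in kilograms.
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weight_kg = [55.0, 65.0, 72.0, 80.0, 88.0]
height_cm = [h * 100 for h in height_m]  # same data, different unit

cov_m = sample_covariance(height_m, weight_kg)
cov_cm = sample_covariance(height_cm, weight_kg)
r_m = correlation(height_m, weight_kg)
r_cm = correlation(height_cm, weight_kg)

print(cov_m, cov_cm)  # the covariance is 100 times larger in centimetres
print(r_m, r_cm)      # the correlation is the same in both units
```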

Cumulative histogram: A cumulative histogram is an alternative version of a histogram. Instead of indicating
frequencies of values in bins, the value is the frequency of that bin plus the frequencies of all bins to the left
(so taking into account all smaller values as well).
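A minimal sketch of this running total, assuming the bin counts of an ordinary histogram are already available (the counts below are made up):

```python
from itertools import accumulate

# Hypothetical bin counts of an ordinary histogram, ordered from the
# leftmost bin (smallest values) to the rightmost bin.
bin_counts = [2, 5, 9, 3, 1]

# Each cumulative value is the count of its own bin plus all bins to the left.
cumulative_counts = list(accumulate(bin_counts))

print(cumulative_counts)  # [2, 7, 16, 19, 20]
```

The last cumulative value always equals the total number of data points.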

Data Analytics: Data analytics (DA) is the process of extracting, modelling and transforming data in order to
draw meaningful conclusions out of it. Data analytics technologies and techniques are widely used in
commercial industries to enable organizations to make more-informed business decisions and by scientists and
researchers to verify or disprove scientific models, theories and hypotheses.

Data Analytics Life Cycle: The Data Analytics Life Cycle is a general workflow that describes which aspects are
involved in performing a data science project, such as formulating a problem statement, organising data,
cleaning data, transforming data, visualising data, applying statistical and data mining techniques and
communicating the results. The concept was introduced by Wickham and Grolemund in their book R for Data
Science: Import, Tidy, Transform, Visualize, and Model Data.

Demonstration table: A demonstration table is a table with summary statistics that is intended to show
certain specific aspects of a data set that are not easily observed from the original data set (the "raw" or
unprocessed data).

Dichotomous data (binary data): Dichotomous data contain precisely two distinct categorical values, for
example heads or tails (coin), or pass or fail.

Discrete Data: Discrete data are numerical data that are not continuous and can only take certain numerical
values. The number of possible values may be infinite (e.g., count data that have no upper bound).

Dispersion: Dispersion is an old-fashioned synonym for spread or variability.

Dot chart: A dot chart (or dot plot) is a one-dimensional plot that shows data by dots (bullets) on a horizontal
or vertical line.

Empirical cumulative distribution function: The empirical cumulative distribution function (ecdf) is a plot of a
staircase function that jumps by 1/n, where n is the number of data points, at every data point. Unlike a
cumulative histogram, it requires no pre-processing of the data.
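The staircase behaviour can be sketched in a few lines. The helper name `ecdf` and the sample values are assumptions made for illustration; the function returns the fraction of data points smaller than or equal to x:

```python
def ecdf(data):
    """Return a function F with F(x) = fraction of data points <= x."""
    ordered = sorted(data)
    n = len(ordered)

    def F(x):
        return sum(1 for value in ordered if value <= x) / n

    return F

# Hypothetical sample of n = 4 data points, so every jump has size 1/4.
F = ecdf([3.0, 1.0, 4.0, 2.0])

print(F(0.5), F(1.0), F(2.5), F(4.0))  # 0.0 0.25 0.5 1.0
```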

Exploratory Data Analysis: Exploratory Data Analysis (EDA) is a set of methods with an emphasis on getting to
know the data and generating the relevant questions for a deeper analysis. An important part of EDA is to
visualize the results with graphs, charts or plots for further exploration. Apart from this, EDA summarizes the
data set using summary statistics.

First quartile: The first quartile of a data set is the value such that 25% of the data is smaller than or equal to
that value.

Floor: The floor is a mathematical operation on numeric data that rounds a number down, i.e. it equals the largest
integer smaller than or equal to that number. For non-negative numbers this amounts to removing the decimal part.
Example: the floor of 2.6 equals 2. The opposite of the floor function is the ceiling.
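For negative numbers, taking the largest integer smaller than or equal to the value is not the same as dropping the decimal part. A short sketch with Python's standard `math` module (an illustration, not part of the original notes) shows the difference:

```python
import math

values = [2.6, -2.6]

floors = [math.floor(v) for v in values]     # largest integer <= v
truncated = [math.trunc(v) for v in values]  # decimal part simply dropped

print(floors)     # [2, -3]
print(truncated)  # [2, -2]
```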

Histogram: A histogram is a graphical representation of the distribution of numerical data. A histogram is
constructed by dividing the entire range of values into a series of intervals and then counting the number of
values that fall into each interval. These intervals are called bins. Histograms are sensitive to bin width, which
limits their use for assessing the shape of data. A plot that is much better suited to assessing the shape of data
is the kernel density plot.
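The sensitivity to bin width can be illustrated with a small sketch. The function `histogram_counts`, the half-open bin convention and the data are all assumptions chosen for the example:

```python
import math

def histogram_counts(data, start, bin_width, n_bins):
    """Count how many values fall into each bin [start + k*w, start + (k+1)*w)."""
    counts = [0] * n_bins
    for value in data:
        k = math.floor((value - start) / bin_width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

# Hypothetical data set.
data = [1.0, 1.5, 2.0, 2.5, 6.0, 6.5, 7.0, 9.5]

wide = histogram_counts(data, start=0, bin_width=5, n_bins=2)
narrow = histogram_counts(data, start=0, bin_width=2, n_bins=5)

print(wide)    # [4, 4]
print(narrow)  # [2, 2, 0, 3, 1]
```

With a bin width of 5 the data look evenly spread, while a bin width of 2 reveals an empty bin between two groups of values.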

Interquartile Range: The interquartile range (IQR) is a scale statistic. It is defined as the difference between
the third and the first quartile.
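As a sketch, the quartiles and the IQR can be computed with `statistics.quantiles` from Python's standard library. Several conventions for computing quartiles exist, and the default method of this function may differ from the one used in the course, so the exact values are an assumption of this example; the data are made up:

```python
from statistics import quantiles

# Hypothetical data set.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# quantiles(..., n=4) returns the three cut points: Q1, the median and Q3.
q1, median, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(q1, median, q3, iqr)  # 3.0 6.0 9.0 6.0
```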
