This document contains a summary of the complete course content of Predicting Food Quality (FQD-31306) at Wageningen University. All modules are described.
- Module 0: Introduction
- Module 1: Predictions, models, errors and uncertainty
- Module 2: Thermodynamics
- Module 3: Reaction rates
-...
Summary Predicting Food Quality (FQD-31306)
Module 0: Introduction to Food Quality
• Quality: satisfying the expectations of the consumer. Decomposed into food quality
attributes (colour, smell, taste, crispiness, texture, convenience, shelf life, nutritional value).
• Continuous product development because of changing consumer demands → efficient to predict
to some extent beforehand what will happen as a function of these changes
• Foods are not stable but tend towards equilibrium with the environment → a battle against this thermodynamic process. Chemical kinetics determine how long this takes, so spoilage can be postponed.
• Predicting: being able to say something sensible about future events. You should therefore always give the uncertainty of a prediction.
• Quality is a combination of intrinsic properties (this course: the properties of the food itself) and extrinsic properties (brand name, price, culturally accepted or not)
• Nutritional value is a food quality attribute that can be decomposed into, for example, the vitamin C content and the lysine content. These are quality performance indicators (QPIs).
• Thermodynamics tells whether a reaction is possible and in which direction it goes towards equilibrium
• How fast reactions occur (reaction rate) depends on kinetics
• Quality Analysis Critical Control Points (QACCP): aimed at promoting food quality. Every actor’s processes in the supply chain can affect the food quality factors. The Salmonella content (a QPI) can be increased by one actor and decreased by another. Critical control points: the most critical processes that influence the quality attributes.
• Get a grip on the quality by: chain analysis (what are the actors doing), identifying the processes that affect quality (chemical, biochemical, physical, microbiological), identifying which factors influence these processes (pH, temperature, water activity), and identifying what is going on in the food & turning it into a model.
Module 1: Predictions: models, errors and uncertainties
Structure of mathematical models
• The equation can be written as: 𝜂 = 𝑓(𝜃, 𝜉)
o 𝜃 = the parameters (e.g. a, b and c)
o 𝜉 = independent variable (the one you control)
o 𝜂 = dependent variable (y, the response value)
• Independent variable: the x-value, for instance time or pH → controlled by ourselves
• Dependent variable: y-variable, for instance colour → response of the system
• Linear equation → Y = a + bx
o Parameter a is the intercept
o Parameter b is the slope
o With the parameters known, you know the relation between x and y, so you can predict y from x.
• Exponential relationship → Y = a · exp(bx)
• Hyperbola → Y = ax / (x + b)
• Polynomial of order 2 → Y = a + bx + cx²
• The first step when choosing a model is always plotting the data and seeing what the relation looks like! Mathematical models can be algebraic (a linear relation between time and concentration, microorganism growth), differential (a first-order model, where the rate becomes lower at lower substrate concentrations) or partial differential (change in both time and position, e.g. sterilization). A minimal sketch of the model forms above follows this list.
• Overfitting: when a model contains too many parameters and too few data points
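A minimal sketch (not part of the course material) of the model forms listed above, assuming Python with numpy and matplotlib; the data points are hypothetical, and the first step of plotting the data is included:

```python
# Sketch: the four model forms from this section as Python functions, plus a plot of hypothetical data.
import numpy as np
import matplotlib.pyplot as plt

def linear(x, a, b):          # Y = a + b*x
    return a + b * x

def exponential(x, a, b):     # Y = a * exp(b*x)
    return a * np.exp(b * x)

def hyperbola(x, a, b):       # Y = a*x / (x + b)
    return (a * x) / (x + b)

def polynomial2(x, a, b, c):  # Y = a + b*x + c*x^2
    return a + b * x + c * x**2

# First step: always plot the data and judge which form looks appropriate.
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)     # e.g. time (hypothetical)
y = np.array([10.1, 8.2, 6.8, 5.5, 4.6, 3.7])     # e.g. concentration (hypothetical)

plt.plot(x, y, "o", label="data")
plt.plot(x, exponential(x, 10, -0.2), "-", label="Y = a*exp(bx), a=10, b=-0.2")
plt.xlabel("x (independent variable)")
plt.ylabel("Y (dependent variable)")
plt.legend()
plt.show()
```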
Variation and errors
• Deterministic models: there is no uncertainty in the parameters of the model, so the same Y-value will always be found for the same X-value. There will always be experimental uncertainty, so absolute certainty about the parameters is impossible to achieve in real life.
• Stochastic or probabilistic models: there IS uncertainty in the parameters, so there is also uncertainty in the outcome of Y. They do not produce the same output with the same input.
• Sources of uncertainty
o Biological variation → the composition of two apples will be different due to changes that happen all the time → you cannot change this
o Experimental variation → can be reduced by repetitions and careful experimental
design
Accuracy and precision
• Accuracy: how close are the measurements to the real value? (The real value is not known.)
o Low accuracy: many systematic errors → cannot be corrected by statistics (for instance by taking more samples)! It is the responsibility of the researcher. For instance, an underestimation for all samples because of a wrong calibration.
o High accuracy: fewer systematic errors, e.g. thanks to proper calibration
• Precision: how close are the measured values to each other? Can be improved by statistics.
o Precise: values close together → for instance because of many samples
o Imprecise: values strongly dispersed → many random errors
• Nature of these random errors
o Homoscedastic errors: constant errors over the whole measuring range of x-values.
o Heteroscedastic errors: errors decrease or increase with increasing x-values.
• Estimate errors
o So in stochastic models, an error term is included that reflects the uncertainty
o Repeating experiments → mean and standard deviation can be calculated
o Experimental uncertainty reduces by more repetitions.
o Biological variation (differences between different milk samples) cannot be lowered, but
it can be characterized by statistical summary of the data!
Statistical summary of data
• Sample variance: sum of squares / degrees of freedom. The degrees of freedom is n − 1 because the mean is estimated from the data, which costs one degree of freedom.
• Standard deviation: the variability in the sample data → the square root of the sample variance
• Coefficient of variation (CV): standard deviation / average * 100%
• Standard error of the mean (SEM): the uncertainty in the mean. Standard deviation / square root of the number of samples → so it becomes smaller with more samples and the mean becomes more precise! But because of the square root, you need many more samples to gain much precision.
• Confidence interval (CI): average ± (t-value × SEM). The t-value can be found with T.INV.2T with 0.05 for a confidence level of 95%. The interval becomes smaller when you have more samples because the SEM is smaller. How to interpret the confidence interval: if the experiment were repeated 100 times, in about 95 cases the calculated interval would contain the true mean, and in about 5 cases it would not.
• Histogram: gives an impression of how the data are distributed. What is the frequency in each bin? A bin value of 16 means that everything between 15 and 16 is counted there.
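A minimal sketch of the summary statistics above, assuming Python with numpy/scipy and hypothetical replicate measurements:

```python
# Sketch: mean, variance, SD, CV, SEM and a 95% confidence interval for hypothetical replicates.
import numpy as np
from scipy import stats

data = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3])  # hypothetical measurements

n = len(data)
mean = data.mean()
variance = data.var(ddof=1)        # sum of squares / (n - 1)
sd = data.std(ddof=1)              # standard deviation
cv = sd / mean * 100               # coefficient of variation in %
sem = sd / np.sqrt(n)              # standard error of the mean

# 95% CI: mean +/- t * SEM; two-sided t-value with n-1 degrees of freedom
# (same value as Excel's T.INV.2T(0.05, n-1))
t = stats.t.ppf(1 - 0.05 / 2, df=n - 1)
ci = (mean - t * sem, mean + t * sem)

print(f"mean={mean:.2f}, sd={sd:.2f}, CV={cv:.1f}%, SEM={sem:.2f}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```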
Resampling: bootstrapping
• Resampling: the experimental results are simulated on the computer many times
• Advantage: we do not need to assume a statistical distribution
• Assumption: the experimental data obtained represent the whole population
• Bootstrapping is efficient for estimating the uncertainty in parameters. New data sets are generated by resampling the original data, and the corresponding parameters are estimated for each one. The distribution of these estimates gives the uncertainty in the parameters.
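A minimal bootstrap sketch under the same assumptions (Python with numpy, hypothetical data): resample the observed data with replacement many times and use the spread of the resampled means as the uncertainty.

```python
# Sketch: bootstrap the mean of a small hypothetical sample.
import numpy as np

rng = np.random.default_rng(1)
data = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3])  # hypothetical sample

n_boot = 5000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    resample = rng.choice(data, size=len(data), replace=True)  # resample with replacement
    boot_means[i] = resample.mean()

# The distribution of boot_means characterises the uncertainty in the mean without
# assuming a statistical distribution; the 2.5% and 97.5% percentiles give a 95% interval.
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"bootstrap 95% interval for the mean: {low:.2f} to {high:.2f}")
```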
Precision in predictions: error propagation
• Without a stated uncertainty, or with a very large uncertainty, your prediction does not make sense!
• When you further use parameters with uncertainty → error propagation (errors accumulate)
• C = a + b → standard deviation in C: s_C = √(s_a² + s_b²)
• D = a / b → relative standard deviation in D: s_D / D = √((s_a / a)² + (s_b / b)²)
• Covariances are important: the formulas above assume that a and b are uncorrelated
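A minimal sketch of the two propagation rules above (Python with numpy, hypothetical values and standard deviations), checked against a simple simulation:

```python
# Sketch: propagate errors through a sum and a ratio, assuming no covariance between a and b.
import numpy as np

a, sa = 10.0, 0.5   # hypothetical value and standard deviation of a
b, sb = 4.0, 0.2    # hypothetical value and standard deviation of b

# sum: c = a + b
sc = np.sqrt(sa**2 + sb**2)

# ratio: d = a / b (relative errors add in quadrature)
d = a / b
sd = d * np.sqrt((sa / a)**2 + (sb / b)**2)

# quick numerical check by simulation
rng = np.random.default_rng(0)
a_sim = rng.normal(a, sa, 100_000)
b_sim = rng.normal(b, sb, 100_000)
print(f"analytical: s_c={sc:.3f}, s_d={sd:.3f}")
print(f"simulated : s_c={np.std(a_sim + b_sim):.3f}, s_d={np.std(a_sim / b_sim):.3f}")
```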
Resampling: Monte Carlo
• Start from an experimental data set from which parameters are estimated; you have to assume the error distribution beforehand. A model is proposed, and an error term that corresponds to the experimentally observed error is added to it. The model is run many times to simulate new data, and the parameters are estimated after each run. This shows the distribution of, and the correlation between, the parameters.
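A minimal Monte Carlo sketch (Python with numpy/scipy); the first-order decay model, the normal error distribution and the data values are all hypothetical assumptions for illustration:

```python
# Sketch: simulate new data sets with an assumed error term, refit each one,
# and inspect the distribution of the parameter estimates.
import numpy as np
from scipy.optimize import curve_fit

def model(t, c0, k):                # proposed model: C = c0 * exp(-k*t)
    return c0 * np.exp(-k * t)

t = np.array([0, 2, 4, 6, 8, 10], dtype=float)
c = np.array([10.2, 8.0, 6.6, 5.3, 4.4, 3.5])            # hypothetical data

popt, _ = curve_fit(model, t, c, p0=(10, 0.1))
residual_sd = np.std(c - model(t, *popt), ddof=2)         # assumed experimental error level

rng = np.random.default_rng(2)
estimates = []
for _ in range(2000):
    c_sim = model(t, *popt) + rng.normal(0, residual_sd, size=t.size)  # simulate new data
    p_sim, _ = curve_fit(model, t, c_sim, p0=popt)                     # re-estimate parameters
    estimates.append(p_sim)
estimates = np.array(estimates)

print("mean estimates :", estimates.mean(axis=0))
print("std of estimates:", estimates.std(axis=0))
print("correlation c0-k:", np.corrcoef(estimates.T)[0, 1])
```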
Fitting models to data: regression techniques
• Confront your data with regression techniques to find the parameters of your model
• Residuals: differences in value between experimental and model data
• Perfect model (never happens!) → the residuals contain only experimental error
• Real life → the residuals contain experimental error + model error
• Why plot the residuals?
o You can see whether the errors are homo- or heteroscedastic
o You can see whether there is a systematic deviation between model and data (when there is a trend in the residuals you have to search for a better model, because the current model did not cover all phenomena) → the ideal case is randomly scattered residuals (as many above as below zero)
o Be aware: it can look like there is a trend when the residuals are actually very small!
• Find the parameter estimates by minimizing the sum of squares (SS) of the residuals: least-squares regression. The lowest SS indicates the best fit. The residuals are squared so that large deviations between model and data weigh more heavily (a minimal least-squares sketch follows this list). However, there are some assumptions:
o You have chosen the correct model function
o The y-value contains an error term
o The error term is independent of the model function (so it is not one of the parameters)
o There are no systematic errors (the errors are normally distributed and the mean of each error is 0)
o The variances of the measurements are the same (homoscedastic errors) → otherwise do weighted regression
o The errors of the measurements are not correlated (covariance = 0, so the measurement at t = 0 should not influence the one at t = 2)
o The error in the x-value is negligible
• Residual standard deviation: s_res = √(V_res)
o V_res = sum of squares of the residuals / degrees of freedom (number of samples − number of parameters)
o s_res is an estimate of the precision with which the determinations have been done. It is important for determining the confidence intervals of the parameters.
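A minimal least-squares sketch (Python with numpy, hypothetical data) showing the parameter estimates, the residuals, and the residual standard deviation defined above:

```python
# Sketch: ordinary least squares for Y = a + b*x, residuals and residual standard deviation.
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])            # hypothetical data

# least squares: minimise the sum of squared residuals
X = np.column_stack([np.ones_like(x), x])                  # design matrix [1, x]
(a, b), *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - (a + b * x)
n, p = len(y), 2                                            # number of samples, number of parameters
v_res = np.sum(residuals**2) / (n - p)                      # residual variance
s_res = np.sqrt(v_res)                                      # residual standard deviation

print(f"a={a:.3f}, b={b:.3f}, s_res={s_res:.3f}")
# Plot 'residuals' against x to check for heteroscedasticity and trends.
```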
Covariances between parameters
• Covariances: for instance, σ_pq indicates how much the fluctuations in p and q depend on each other → so whether their errors are related. If the corresponding correlation is 0, they are independent; −1 means an overestimate in one parameter leads to an underestimate in the other; +1 means a positive correlation.
• Parameters a and b are always correlated because they are estimated from the same data set (see the sketch after this list)
• How to avoid correlation between parameters in a linear relation? Centring the data around the
average value of x.
• If you want to know whether there is a linear relation between x and y, you can look at the correlation coefficient R: 1 means Y increases with increasing X, and a low R means the data points are scattered everywhere. This is not very important for this course; the correlation between parameters is more important!
• If the correlation coefficient between parameters is above 0.9, one should be careful. Above 0.99 it means problems → the parameters cannot be estimated well from the data set, the model is not appropriate, or there is a parameter that is not needed.
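A minimal sketch (Python with numpy/scipy, hypothetical data) of the correlation between the estimates of a and b in a linear fit, and how centring x around its mean removes it:

```python
# Sketch: parameter correlation from the covariance matrix of a linear fit, raw vs. centred x.
import numpy as np
from scipy.optimize import curve_fit

x = np.array([10, 11, 12, 13, 14, 15], dtype=float)
y = np.array([21.2, 23.1, 24.8, 27.2, 28.9, 31.1])        # hypothetical data

def line(x, a, b):
    return a + b * x

def corr_ab(xvals):
    popt, pcov = curve_fit(line, xvals, y)
    # correlation between a and b from the covariance matrix
    return pcov[0, 1] / np.sqrt(pcov[0, 0] * pcov[1, 1])

print("correlation a-b, raw x     :", round(corr_ab(x), 3))
print("correlation a-b, centred x :", round(corr_ab(x - x.mean()), 3))
```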
Imprecision contours
• For a linear relation, you can centre the x-values around the average x to avoid correlation (a numerical sketch follows this list)
o So not y = a + bx but y = a + b(x − x̄)
o The standard deviation of a prediction is then: s_ŷ = √(s_res² + s_a² + s_b²·(x − x̄)²)
• When not centred → take the covariance s_ab into account
o s_ŷ = √(s_res² + s_a² + s_b²·x² + 2·x·s_ab)
• Making the imprecision contour: ŷ ± (t-value × s_ŷ)
• Marginal confidence contour: only looks at the interval for one parameter while the other is kept fixed
• Joint confidence interval: an interval for two parameters together. A large circle = no correlation; an ellipse = correlation. Correlation is a problem because different combinations of parameters then give (nearly) the same SS
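A minimal sketch of an imprecision contour for a centred linear fit (Python with numpy/scipy, hypothetical data), using the standard-deviation formula given above:

```python
# Sketch: imprecision contour y_hat +/- t * s_yhat around a centred linear fit.
import numpy as np
from scipy import stats

x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])             # hypothetical data
xc = x - x.mean()                                           # centred x

X = np.column_stack([np.ones_like(xc), xc])
(a, b), *_ = np.linalg.lstsq(X, y, rcond=None)

n, p = len(y), 2
s_res2 = np.sum((y - (a + b * xc))**2) / (n - p)            # residual variance
s_a2 = s_res2 / n                                           # variance of a for centred x
s_b2 = s_res2 / np.sum(xc**2)                               # variance of b for centred x

x_new = np.linspace(x.min(), x.max(), 50)
y_hat = a + b * (x_new - x.mean())
s_yhat = np.sqrt(s_res2 + s_a2 + s_b2 * (x_new - x.mean())**2)

t = stats.t.ppf(0.975, df=n - p)                            # two-sided 95% t-value
lower, upper = y_hat - t * s_yhat, y_hat + t * s_yhat       # imprecision contour
print(np.column_stack([x_new[:3], lower[:3], upper[:3]]))
```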
Weighted regression
• If the errors are heteroscedastic, you should use weighted regression!
• Each Y-value is weighted by its own variance → precise data have a bigger impact on the
results (more weight) than less precise data
• Transformation (for instance removing an exponential by taking the natural log) also transforms the errors → homoscedastic errors can become heteroscedastic. In that case you have to do weighted regression!
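A minimal weighted-regression sketch (Python with scipy, hypothetical heteroscedastic data); the error structure `sigma` is an assumption for illustration:

```python
# Sketch: weight each y-value by its own error via curve_fit's sigma argument.
import numpy as np
from scipy.optimize import curve_fit

def line(x, a, b):
    return a + b * x

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.0, 4.2, 5.8, 8.5, 9.4, 12.8])              # hypothetical data
sigma = 0.1 * y                                             # assumed error grows with y (heteroscedastic)

# unweighted fit: all points count equally
p_unw, _ = curve_fit(line, x, y)
# weighted fit: precise (small-sigma) points get more influence on the result
p_w, _ = curve_fit(line, x, y, sigma=sigma, absolute_sigma=True)

print("unweighted a, b:", np.round(p_unw, 3))
print("weighted   a, b:", np.round(p_w, 3))
```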