Course code:
358B2
Course name:
Statistics and Advanced Data Processing in Biochemistry

Academic year:

2023/2024.

Attendance requirements:

There are no requirements.

ECTS:

6

Study level:

graduate academic studies

Study program:

Biochemistry: 1st year, winter semester, elective (E51B2) course

Teacher:

Filip Lj. Andrić, Ph.D.
associate professor, Faculty of Chemistry, Studentski trg 12-16, Beograd

Assistants:

Hours of instruction:

Weekly: two hours of lectures + two hours of exercises (2+2+0)

Goals:

The course aims to familiarize students with the elements of statistics and advanced data processing in biochemistry in a clear, understandable and practical way. Beyond the basic statistical techniques that enable fundamental data processing and underpin quality control and good laboratory practice, techniques for optimizing experimental conditions and processing multivariate data are becoming increasingly common in biochemistry. The course therefore aims to introduce both basic statistical and chemometric concepts in an accessible and intuitive way, avoiding rigorous mathematical explanations. Particular attention is given to solving specific problems that arise in practice.

Outcome:

After completing this course, students should be able to: use software packages for basic statistical and advanced data processing in biochemistry, accurately display measurement results, properly select and apply statistical significance tests, understand the concepts of multivariate data analysis methods, use techniques for optimizing experimental conditions, and draw accurate conclusions and interpretations from data processing outcomes.

Teaching methods:

Lectures, theoretical exercises, colloquia.

Extracurricular activities:

Coursebooks:

Main coursebooks:

  1. James N. Miller, Jane C. Miller: Statistics and Chemometrics for Analytical Chemistry, 6th ed., Pearson Education Ltd., Harlow, 2010.
  2. B. Dawson, R. G. Trapp: Basic and Clinical Biostatistics, Lange Medical Books/McGraw-Hill Companies Inc., New York, 2004.
  3. J. W. Kuzma, S. E. Bohnenblust: Basic Statistics for the Health Sciences, McGraw-Hill Companies Inc., New York, 2005.
  4. Richard G. Brereton: Chemometrics: Data Analysis for the Laboratory and Chemical Plant, Wiley, 2003.

Supplementary coursebooks:

  • Handouts, presentations, and other electronic and printed material prepared for lectures and laboratory practice.

Additional material:

  Course activities and grading method

Lectures:

10 points (2 hours a week)

Syllabus:

  1. Introduction - Why statistics and data analysis in Biochemistry?
    Problems in biochemical measurements. What is statistics? How does statistics help us separate relevant information from statistical noise? What are data analysis and data science, and how do they relate to biochemical problems?
  2. Measurements in biochemistry and presentation of measurement results
    Variables and types of variables, histograms, line graphs, scatter plots, spreadsheets, measures of location and dispersion, measurement errors, accuracy, precision, error propagation, significant digits, rounding of numbers and accurate display of measurement results.
  3. Probability theory and basic probability distributions
    Probability, probability distribution and probability density distribution, population, statistical sample, normal distribution, Student’s distribution, Fisher’s distribution, chi-square distribution.
  4. Parametric significance tests of continuous random variables
    Statistical testing, null and alternative hypotheses, notions of statistical significance and confidence, power, conservatism and sensitivity of a statistical test, one-tailed and two-tailed statistical testing, outliers, testing deviations from the normal distribution, comparison of two statistically independent samples, comparison of two statistically dependent samples, comparison of the variances of two statistical samples, testing multiple statistically independent samples, one-factor and two-factor analysis of variance, post hoc significance tests: Fisher’s, Tukey’s and Scheffé’s methods.
  5. Parametric significance tests of a discrete random variable and categorical data
    Observed frequencies, contingency tables, chi-square test, Fisher's exact test, and McNemar's test.
  6. Non-parametric significance tests
    Significance of the difference between a reference value and the mean, comparison of statistically dependent and independent datasets, comparison of multiple datasets, trends in data - sign test, signed-rank test, Mann-Whitney U-test, homogeneous sequence test, Kruskal-Wallis and Friedman analysis of variance.
  7. Correlation and regression
    Correlation - a measure of similarity between variables, Pearson's, Spearman’s, Kendall's and Kruskal's correlation coefficients. Statistical significance of correlation. Linear and nonlinear modelling, least squares method, analysis of variance and chi-square test as means of testing model quality and goodness of fit, regression coefficients and their errors, errors of calibration-derived results.
  8. Basic concepts of quality control and good laboratory practice
    Phases of the analytical process, representative sample and sampling strategies, basic principles of good laboratory practice, quality control and quality assurance programs, parameters of validation and verification of analytical methods, internal and external methods of quality control (control charts, inter-laboratory tests, collaborative studies).
  9. Experimental design and optimization
    Factors and their influence on the outcome of the experiment. Experimental design and design matrix. Planning of experiments according to the schemes of full and fractional factorial design, central-composite design and Box-Behnken design. Construction of a mathematical model. Significance of factors and their interactions. Optimization of experimental conditions using the response surface method.
  10. Exploratory analysis of multivariate data
    Multivariate data. Introduction to exploratory analysis of multivariate data. Principal component analysis and hierarchical cluster analysis.

Exercises:

0 points (2 hours a week)

Syllabus:

  1. Introduction to statistical and data processing software packages.
  2. Graphical and tabular presentation of measurement results, calculation of basic elements of descriptive statistics, estimation of error and uncertainty, propagation of measurement uncertainty, significant figures, rounding numbers, and proper presentation of measurement results.
  3. Introduction to the basic concepts of probability, theoretical and practical aspects of binomial, Gaussian, log-normal, Student's, Fisher's, and chi-squared distributions.
  4. Detection of non-standard observations. Checking the accuracy of measurement results, comparison of measurement results for variances, comparison of arithmetic means of two independent datasets, comparison of paired measurement sets, comparison of multiple statistically independent data sets (analysis of variance). Testing the influence of various factors on the outcome of biochemical processes.
  5. Comparison of categorical parameters. Construction of contingency tables. Application of chi-square test, Fisher's exact test and McNemar test.
  6. Non-parametric approaches in: comparison of a reference value with a set of repeated measurements, comparison of independent and dependent data sets, testing multiple data sets for differences, testing for the presence of trends in the data.
  7. Calibration and construction of linear and curvilinear regression models. Estimation of the model error. Optimization of model complexity. Errors of the regression coefficients and of values derived from the calibration.
  8. Elements of quality control and assurance. Assessment of basic parameters for validation and verification of analytical methods in biochemistry - precision, accuracy, linearity, operating range, limit of detection and quantification. Control charts for arithmetic means and ranges. Basic estimation of measurement uncertainty. The role of collaborative studies and inter-laboratory performance tests in the assessment of verification parameters.
  9. Selection and analysis of factorial experiments. Design of the experiments according to the central-composite and Box-Behnken models. Construction of regression models, estimation of the influence of factors and their cross-coupling terms. Selection of the optimal experimental conditions in biochemical and biotechnological processes.
  10. Pretreatment of multivariate data. Practical aspects of principal component analysis and interpretation of scree plots, and diagrams of scores and loadings. Performing hierarchical cluster analysis and interpreting dendrograms of objects and variables.

Colloquia:

30 points

Written exam:

60 points