Course code:
358B2 
Course name:
Statistics and Advanced Data Processing in Biochemistry 
Academic year: 
2021/2022. 
Attendance requirements: 
There are no requirements. 
ECTS: 
6 
Study level: 
graduate academic studies 
Study programme: 
Biochemistry: 1st year, winter semester, elective course (E51B2) 
Teacher: 
Filip Lj. Andrić, Ph.D.
Associate Professor, Faculty of Chemistry, Studentski trg 12-16, Beograd 
Assistants: 
— 
Hours of instruction: 
Weekly: two hours of lectures + two hours of exercises (2+2+0) 
Goals: 
The course aims to familiarize students with the elements of statistics and advanced data processing in biochemistry in a clear, understandable and practical way. Besides basic statistical techniques, which enable fundamental data processing and are firmly grounded in quality control and good laboratory practice, techniques for optimizing experimental conditions and processing multivariate data are becoming increasingly common in biochemistry. This course therefore introduces both basic statistical and chemometric concepts in an accessible and intuitive way, avoiding rigorous mathematical derivations. Particular attention is given to solving specific problems that arise in practice. 
Outcome: 
After completing this course, students should be able to: use software packages for basic statistical and advanced data processing in biochemistry; accurately present measurement results; properly select and apply statistical significance tests; understand the concepts of multivariate data analysis methods; use techniques for optimizing experimental conditions; and draw sound conclusions and interpretations from the outcomes of data processing. 
Teaching methods: 
Lectures, theoretical exercises, colloquia. 
Extracurricular activities: 
— 
Coursebooks: 
Main coursebooks:
 James N. Miller, Jane C. Miller: Statistics and Chemometrics for Analytical Chemistry, 6th ed., Pearson Education Ltd., Harlow, 2010.
 B. Dawson, R. G. Trapp: Basic and Clinical Biostatistics, Lange Medical Books/McGraw-Hill Companies Inc., New York, 2004.
 J. W. Kuzma, S. E. Bohnenblust: Basic Statistics for the Health Sciences, McGraw-Hill Companies Inc., New York, 2005.
 Richard G. Brereton: Chemometrics: Data Analysis for the Laboratory and Chemical Plant, Wiley, 2003.
Supplementary coursebooks:
 Handouts, presentations, and other electronic and printed material prepared for lectures and laboratory practice.

Additional material: 
— 
Course activities and grading method 
Lectures: 
10 points (2 hours a week)
Syllabus:
 Introduction: why statistics and data analysis in biochemistry?
Problems in biochemical measurements. What is statistics? How does statistics help us separate relevant information from statistical noise? What are data analysis and data science, and how do they relate to biochemical problems?
 Measurements in biochemistry and presentation of measurement results
Variables and types of variables, histograms, line graphs, scatter plots, spreadsheets, measures of location and dispersion, measurement errors, accuracy, precision, error propagation, significant digits, rounding of numbers and accurate presentation of measurement results.
 Probability theory and basic probability distributions
Probability, probability distribution and probability density function, population, statistical sample, normal distribution, Student's t-distribution, Fisher's F-distribution, chi-square distribution.
 Parametric significance tests of continuous random variables
Statistical testing, null and alternative hypotheses, notions of statistical significance and confidence, power, conservatism and sensitivity of a statistical test, one-tailed and two-tailed statistical testing, outliers, testing deviations from the normal distribution, comparison of two statistically independent samples, comparison of two statistically dependent samples, comparison of the variances of two statistical samples, testing multiple statistically independent samples, one-factor and two-factor analysis of variance, post hoc significance tests: Fisher's, Tukey's and Scheffé's methods.
 Parametric significance tests of a discrete random variable and categorical data
Observed frequencies, contingency tables, chi-square test, Fisher's exact test and McNemar's test.
 Nonparametric significance tests
Significance of the difference between a reference value and the mean, comparison of statistically dependent and independent datasets, comparison of multiple datasets, trends in data: sign test, Wilcoxon signed-rank test, Mann-Whitney U test, runs test, Kruskal-Wallis and Friedman analysis of variance.
 Correlation and regression
Correlation as a measure of similarity between variables: Pearson's, Spearman's, Kendall's and Kruskal's correlation coefficients. Statistical significance of correlation. Linear and nonlinear modelling, least squares method, analysis of variance and chi-square test as means of testing model quality and goodness of fit, regression coefficients and their errors, errors of results derived from calibration.
 Basic concepts of quality control and good laboratory practice
Phases of the analytical process, representative sample and sampling strategies, basic principles of good laboratory practice, quality control and quality assurance programmes, parameters for validation and verification of analytical methods, internal and external quality control methods (control charts, interlaboratory tests, collaborative studies).
 Experimental design and optimization
Factors and their influence on the outcome of the experiment. Experimental design and design matrix. Planning experiments according to full and fractional factorial, central composite and Box-Behnken designs. Construction of a mathematical model. Significance of factors and their interactions. Optimization of experimental conditions using response surface methodology.
 Exploratory analysis of multivariate data
Multivariate data. Introduction to exploratory analysis of multivariate data. Principal component analysis and hierarchical cluster analysis.
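As an illustration of one topic listed above, the comparison of two statistically independent samples, the following is a minimal Python sketch of Student's pooled two-sample t-test using only the standard library. It assumes equal population variances; the measurement values are hypothetical, and the critical value t = 2.306 (two-tailed, α = 0.05, 8 degrees of freedom) is taken from standard t-tables rather than computed.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(x, y):
    """Return Student's t statistic and degrees of freedom for two
    independent samples, assuming equal population variances."""
    nx, ny = len(x), len(y)
    # Pooled variance: each sample variance weighted by its degrees of freedom
    s2_pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(s2_pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2


# Hypothetical enzyme activities (U/mL) measured by two methods
method_a = [10.2, 10.5, 10.1, 10.4, 10.3]
method_b = [10.8, 10.6, 10.9, 11.0, 10.7]

t, df = pooled_t_test(method_a, method_b)
T_CRIT = 2.306  # two-tailed critical t for alpha = 0.05, df = 8 (from tables)

print(f"t = {t:.3f}, df = {df}")
print("significant difference" if abs(t) > T_CRIT else "no significant difference")
```

If the equal-variance assumption is doubtful (for example, after an F-test on the two variances), Welch's unequal-variance form of the test is the usual alternative.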

Exercises: 
0 points (2 hours a week)
Syllabus:
 Introduction to statistical and data processing software packages.
 Graphical and tabular presentation of measurement results, calculation of basic elements of descriptive statistics, estimation of error and uncertainty, propagation of measurement uncertainty, significant figures, rounding numbers, and proper presentation of measurement results.
 Introduction to the basic concepts of probability; theoretical and practical aspects of the binomial, Gaussian, log-normal, Student's, Fisher's and chi-square distributions.
 Detection of outlying observations. Checking the accuracy of measurement results, comparison of the variances of measurement results, comparison of the arithmetic means of two independent datasets, comparison of paired measurement sets, comparison of multiple statistically independent datasets (analysis of variance). Testing the influence of various factors on the outcome of biochemical processes.
 Comparison of categorical parameters. Construction of contingency tables. Application of the chi-square test, Fisher's exact test and McNemar's test.
 Nonparametric approaches in: comparison of a reference value with a set of repeated measurements, comparison of independent and dependent data sets, testing multiple data sets for differences, testing for the presence of trends in the data.
 Calibration and construction of linear and curvilinear regression models. Estimation of model error. Optimization of model complexity. Errors of the regression coefficients and of values derived from the calibration.
 Elements of quality control and quality assurance. Assessment of basic parameters for validation and verification of analytical methods in biochemistry: precision, accuracy, linearity, working range, limit of detection and limit of quantification. Control charts for arithmetic means and ranges. Basic estimation of measurement uncertainty. The role of collaborative studies and interlaboratory performance tests in assessing verification parameters.
 Selection and analysis of factorial experiments. Design of experiments according to the central composite and Box-Behnken models. Construction of regression models, estimation of the influence of factors and their cross-coupling (interaction) terms. Selection of optimal experimental conditions in biochemical and biotechnological processes.
 Pretreatment of multivariate data. Practical aspects of principal component analysis and interpretation of scree plots and of score and loading plots. Performing hierarchical cluster analysis and interpreting dendrograms of objects and variables.
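The calibration, regression-error and detection-limit exercises listed above can be sketched in Python with the standard library alone. This is a minimal illustration assuming an unweighted straight-line calibration y = a + bx; the concentration and absorbance values are invented, and the LOD = 3·s_y/x/b and LOQ = 10·s_y/x/b formulas are one common convention, assumed here rather than taken from the course.

```python
from math import sqrt
from statistics import mean


def calibrate(x, y):
    """Unweighted least-squares line y = a + b*x.

    Returns intercept a, slope b, residual standard deviation s_y/x,
    and the standard errors of the intercept (s_a) and slope (s_b)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx           # slope
    a = y_bar - b * x_bar   # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_yx = sqrt(sum(r ** 2 for r in residuals) / (n - 2))  # n - 2 degrees of freedom
    s_b = s_yx / sqrt(sxx)                                   # standard error of slope
    s_a = s_yx * sqrt(sum(xi ** 2 for xi in x) / (n * sxx))  # standard error of intercept
    return a, b, s_yx, s_a, s_b


# Hypothetical calibration data: concentration (ug/mL) vs. absorbance
conc = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
absorb = [0.002, 0.101, 0.199, 0.304, 0.398, 0.502]

a, b, s_yx, s_a, s_b = calibrate(conc, absorb)
lod = 3 * s_yx / b   # assumed convention: LOD = 3 * s_y/x / b
loq = 10 * s_yx / b  # assumed convention: LOQ = 10 * s_y/x / b

print(f"y = {a:.4f} + {b:.5f}x, s_a = {s_a:.5f}, s_b = {s_b:.6f}")
print(f"s_y/x = {s_yx:.5f}, LOD = {lod:.3f} ug/mL, LOQ = {loq:.3f} ug/mL")
```

Here s_y/x stands in for the standard deviation of the blank signal, which is a simplification; where replicate blank measurements are available, their standard deviation can be used instead.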

Colloquia: 
30 points 
Written exam: 
60 points 