**M. Meloun, J. Militký, M. Forina: CHEMOMETRICS FOR ANALYTICAL CHEMISTRY, Volume 2: PC-Aided Regression and Related Methods, Ellis Horwood, Chichester, 1992, 330 pages, ISBN 0-13-126376-5.**

### Preface

Today, data treatment and data interpretation are highly developed areas that make up the subject known as chemometrics. This interdisciplinary subject aims to extract the maximum amount of information from data by means of computer-assisted methods.

## Intended Audience

This chemometrics text is intended for advanced undergraduate, graduate, and postgraduate students in chemistry, other natural sciences, and chemical engineering, and for working professionals in the chemical sciences.

At the College of Chemical Technology in Pardubice (Czechoslovakia) and the College of Pharmaceutical and Food Technology in Genoa (Italy) we have taught chemometrics for some years, and recently we have given postgraduate courses to personnel from the chemical industry.

The first volume contained five chapters on data analysis by statistical methods. The second volume describes interactive model building and testing. Our intent is that each chapter be reasonably self-contained. We explain each method in conjunction with worked examples, and each chapter ends with a summary of the suggested procedure for data treatment. Additional solved problems describe variations on the techniques and present results on the performance of the diagnostics.

## Purpose of this Textbook

In current chemometrics practice, methods of statistical data analysis are very important. The advent of the personal computer has revolutionized the treatment of experimental data in the chemical laboratory. Our daily lives are directly, or indirectly, affected by the ubiquitous personal computer. Chemists and chemical engineers now make extensive use of personal computers for interactive data analysis.

Most chemometrics approaches have been based on the use of the concepts of classical statistical theory. This textbook introduces the use of interactive data treatment by personal computer. The attitudes underlying exploration, though long used by skilled data analysts, have been little exposed to public view. The book provides the basis for an understanding of exploratory and confirmatory techniques by use of examples and a relatively unsophisticated level of mathematical knowledge. Interactive statistical data analysis is introduced here as a form of numerical or graphical detective work.

### Software

Most of the programs used in this book are available in the computer package CHEMSTAT (a later version is called ADSTAT, for ADvanced STATistics), which is distributed by Trilobyte Statistical Software Ltd, Pardubice. More information is provided in the Appendix.

### Contents

Preface ix

Glossary xi

6. Linear regression models 1

6.1 Formulation of a linear regression model 1

6.2 Conditions for the least-squares method 10

6.3 Statistical properties of the least-squares method 12

6.3.1 Construction of confidence intervals 20

6.3.2 Testing of hypotheses 24

6.3.2.1 Testing for multicollinearity 28

6.3.2.2 Test of significance of the intercept term 30

6.3.2.3 Simultaneous test of a composite hypothesis 33

6.3.2.4 Test of agreement of two linear models 36

6.3.2.5 Acceptance test for a proposed linear model 39

6.3.3 Comparison of regression lines 46

6.3.3.1 Test for homogeneity of intercepts 47

6.3.3.2 Test for homogeneity of slopes 48

6.3.3.3 Test for coincidence of regression lines 49

6.4 Numerical problems in the computer calculation of linear regression 52

6.4.1 The method of orthogonal functions 55

6.4.2 The method of rational ranks 57

6.5 Regression diagnostics 62

6.5.1 Exploratory regression analysis 62

6.5.2 Examination of data quality 64

6.5.2.1 Statistical analysis of residuals 64

6.5.2.2 Analysis of projection matrix elements 69

6.5.2.3 Plots for identification of influential points 72

6.5.2.4 Other characteristics of influential points 76

6.5.3 Examination of a proposed regression model 87

6.5.3.1 Partial regression leverage plots 87

6.5.3.2 Partial residual plots 89

6.5.3.3 Sign test for model specification 93

6.5.4 Examination of conditions for the least-squares method 94

6.5.4.1 Heteroscedasticity 94

6.5.4.2 Autocorrelation 95

6.5.4.3 Normality of errors 97

6.6 Procedures when conditions for least-squares are violated 98

6.6.1 Restrictions placed on the parameters 98

6.6.2 The method of generalized least squares (GLS) 102

6.6.2.1 Heteroscedasticity 104

6.6.2.2 Autocorrelation 110

6.6.3 Multicollinearity 116

6.6.4 Variables subject to random errors 121

6.6.5 Other error distributions of the dependent variable 126

6.6.5.1 The M-estimates method 126

6.6.5.2 The L1-approximation method 130

6.6.5.3 Robust estimates with bounded influence 133

6.7 Calibration 141

6.7.1 Types of calibration and calibration models 142

6.7.2 Calibration straight line 145

6.7.3 The precision of calibration 151

6.8 Procedure for linear regression analysis 155

6.9 Additional problems 157

References 175

7. Correlation 178

7.1 Correlation models 179

7.1.1 Correlation models for two random variables 179

7.1.2 The correlation model for many random variables 185

7.2 Correlation coefficients 184

7.2.1 Paired correlation coefficient 194

7.2.2 Partial correlation coefficients 200

7.2.3 Multiple correlation coefficient 202

7.2.4 Rank correlation 204

7.3 Procedure for correlation analysis 206

References 206

8. Nonlinear regression models 207

8.1 Formulation of a nonlinear regression model 209

8.2 Models of measurement errors 214

8.3 Formulation of the regression criterion 219

8.4 Geometry of nonlinear regression 225

8.5 Numerical procedure for parameter estimation 235

8.5.1 Non-derivative optimization procedures 236

8.5.1.1 Direct search methods 237

8.5.1.2 Simplex methods 238

8.5.1.3 Random optimization 244

8.5.1.4 Special procedures for the least-squares method 247

8.5.2 Derivative procedures for the least-squares method 250

8.5.2.1 Gauss-Newton methods 253

8.5.2.2 Marquardt-type methods 257

8.5.2.3 Dog-leg type procedures 259

8.5.3 Complications in nonlinear regression 260

8.5.3.1 Parameter estimability 260

8.5.3.2 Existence of a minimum of U(b) 262

8.5.3.3 Existence of local minima 263

8.5.3.4 Ill-conditioning of parameters 264

8.5.3.5 Small range of experimental data 264

8.5.4 Examination of the reliability of the regression algorithm 267

8.6 Statistical analysis of nonlinear regression 270

8.6.1 Degree of nonlinearity of a regression model 272

8.6.1.1 Bias of parameter estimates 273

8.6.1.2 Asymmetry of parameter estimates 276

8.6.2 Interval estimates of parameters 277

8.6.2.1 Confidence regions of parameters 277

8.6.2.2 Confidence intervals of parameters 282

8.6.2.3 Confidence intervals of prediction 285

8.6.3 Hypothesis tests about parameter estimates 286

8.6.4 Goodness-of-fit tests 288

8.6.4.1 Graphical analysis of residuals 289

8.6.4.2 Statistical analysis of residuals 289

8.6.4.3 Identification of influential points 292

8.7 Procedure for building and testing a nonlinear model 294

8.8 Additional problems 295

References 303

9. Interpolation and approximation 306

9.1 Classical interpolation procedures 307

9.1.1 Lagrange and Newton interpolation formulae 309

9.1.2 Hermite interpolation 315

9.1.3 Rational interpolation 316

9.2 Spline interpolation 319

9.2.1 Local Hermite interpolation 324

9.2.2 Cubic spline 328

9.3 Approximation of functions 336

9.4 Approximation of tabular data 340

9.4.1 Polynomial approximation 340

9.4.2 Piecewise regression 342

9.5 Numerical smoothing 349

9.5.1 Smoothing spline 350

9.5.2 Nonparametric regression 360

9.5.3 Digital filtration 362

9.6 Procedures for interpolation and approximation 372

9.7 Additional problems 373

References 374

10. Derivatives and integrals 375

10.1 Derivatives 376

10.1.1 Analytical derivatives 378

10.1.2 Numerical derivatives 378

10.2 Integrals 381

10.2.1 Analytical integration 381

10.2.2 Numerical integration 381

10.3 Procedure for numerical differentiation and integration 387

10.4 Additional problems 388

References 390

Appendix: CHEMSTAT, ADSTAT 391

Index 393