Linear regression

This document describes a very flexible, efficient and robust linear regression package, developed by me. Please download it for evaluation and let me have your comments.

Notation and conventions

We assume that a dependent quantity $y$ can be expressed in terms of a linear function of a number of independent quantities $x_j$, i.e. $y = \sum_j \alpha_j x_j$. Note that this expression does not allow for a constant term. However, this is not a loss of generality: it can be emulated by making one of the independent quantities, say $x_0$, equal to 1 for all points in your dataset.
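The constant-term trick can be sketched in a few lines of C++. The helper name addConstantColumn and the row-per-datapoint layout are assumptions made for this illustration; they are not part of the interface of the package.

```cpp
#include <vector>

// Hypothetical helper (not part of linmodel.dll): returns a copy of the
// dataset in which every point gets an extra leading quantity x0 = 1,
// so that the parameter alpha_0 plays the role of a constant term.
std::vector<std::vector<double>> addConstantColumn(
    const std::vector<std::vector<double>>& x)
{
    std::vector<std::vector<double>> out;
    out.reserve(x.size());
    for (const std::vector<double>& row : x) {
        std::vector<double> extended;
        extended.reserve(row.size() + 1);
        extended.push_back(1.0);  // x0 = 1 for every point
        extended.insert(extended.end(), row.begin(), row.end());
        out.push_back(extended);
    }
    return out;
}
```

Fitting the extended dataset then yields one extra parameter, which is the intercept of the model.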

Linear regression algorithm

The algorithm used is the well-known least squares method. It determines the values of the model parameters $\alpha_j$ by minimising the quantity $\sum_i w_i (y_i - \sum_j \alpha_j x_{ij})^2$, where the index $i$ runs over all points in the dataset and $w_i$ is a weight factor for each datapoint. A special feature of the algorithm in linmodel.dll is that it allows linear constraints to be imposed on the model parameters: $\sum_j c_j \alpha_j = d$. The algorithm places no restrictions on the number of datapoints, parameters or constraints. It has been implemented in such a way that it does not need any external routines, not even standard mathematical functions like "square root". It is written entirely in terms of basic operations: addition, subtraction, multiplication and division, nothing else!
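To make the idea concrete, here is a minimal sketch of a constrained weighted least squares fit that uses only the four basic operations and comparisons. It is not the code in linmodel.dll, whose internals are not described here. It assumes a textbook approach: the constraints are handled with Lagrange multipliers, and the resulting linear system is solved by Gaussian elimination with partial pivoting. The names solveLinearSystem and fitConstrained, and the convention that each row of C holds the coefficients of one constraint, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Solves M z = b by Gaussian elimination with partial pivoting.
// M and b are taken by value because they are overwritten.
Vec solveLinearSystem(Mat M, Vec b)
{
    const std::size_t n = b.size();
    assert(M.size() == n);
    for (std::size_t k = 0; k < n; ++k) {
        // Use the row with the largest |M[r][k]| as pivot row.
        std::size_t p = k;
        double best = M[k][k] < 0 ? -M[k][k] : M[k][k];
        for (std::size_t r = k + 1; r < n; ++r) {
            const double a = M[r][k] < 0 ? -M[r][k] : M[r][k];
            if (a > best) { best = a; p = r; }
        }
        assert(best > 0.0);  // a zero pivot means the system is singular
        std::swap(M[k], M[p]);
        std::swap(b[k], b[p]);
        for (std::size_t r = k + 1; r < n; ++r) {
            const double f = M[r][k] / M[k][k];
            for (std::size_t c = k; c < n; ++c) M[r][c] -= f * M[k][c];
            b[r] -= f * b[k];
        }
    }
    // Back substitution on the upper triangular system.
    Vec z(n);
    for (std::size_t k = n; k-- > 0;) {
        double s = b[k];
        for (std::size_t c = k + 1; c < n; ++c) s -= M[k][c] * z[c];
        z[k] = s / M[k][k];
    }
    return z;
}

// Minimises sum_i w_i (y_i - sum_j alpha_j x_ij)^2 subject to C alpha = d
// by solving  [X'WX  C'] [alpha ]   [X'Wy]
//             [C     0 ] [lambda] = [d   ].
Vec fitConstrained(const Mat& x, const Vec& y, const Vec& w,
                   const Mat& C, const Vec& d)
{
    const std::size_t n = x.size();
    const std::size_t p = x.empty() ? 0 : x[0].size();
    const std::size_t m = C.size();
    assert(y.size() == n && w.size() == n && d.size() == m);
    Mat K(p + m, Vec(p + m, 0.0));
    Vec rhs(p + m, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        assert(x[i].size() == p);
        for (std::size_t j = 0; j < p; ++j) {
            rhs[j] += w[i] * x[i][j] * y[i];
            for (std::size_t k = 0; k < p; ++k)
                K[j][k] += w[i] * x[i][j] * x[i][k];
        }
    }
    for (std::size_t r = 0; r < m; ++r) {
        assert(C[r].size() == p);
        for (std::size_t j = 0; j < p; ++j) {
            K[p + r][j] = C[r][j];
            K[j][p + r] = C[r][j];
        }
        rhs[p + r] = d[r];
    }
    Vec z = solveLinearSystem(K, rhs);
    z.resize(p);  // drop the Lagrange multipliers, keep alpha
    return z;
}
```

Passing an empty C and d gives the ordinary weighted fit. Forming the normal equations like this is simple, but it can lose accuracy on ill-conditioned data, so the sketch should be read as an illustration of the problem being solved and not as a substitute for the package.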

Download

The regression package can be downloaded, for evaluation purposes only, by clicking this link. Documentation on how to use it in your own programs is included. The package is written in C++, and was compiled and linked under Windows XP. It should work in any 32-bit version of MS Windows (Windows 95 or later).

Testing

I have tested the algorithm on a few standard datasets from the NIST website. These datasets are included in a format that the sample program lintest.cpp can read. In all cases the certified outputs are reproduced to at least 10 significant digits, and I am therefore convinced that the code in linmodel.dll is fully correct (and also very robust and efficient).

N.B. You may notice differences in some of the statistical measures: this is because lintest.cpp only implements the definitions for the case without a constant term.
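A comparison against certified values like the one described above could be automated along the following lines. This is only a sketch: the function name agreesToDigits is hypothetical, it is not taken from lintest.cpp, and it assumes that "agreement to N significant digits" means a relative difference of at most 10^-N. In keeping with the package, it uses basic arithmetic only.

```cpp
// Hypothetical check (not from lintest.cpp): returns true if the computed
// value matches the certified value with a relative difference of at most
// 10^(-digits). Falls back to an absolute check if the certified value is 0.
bool agreesToDigits(double computed, double certified, int digits)
{
    double tol = 1.0;
    for (int k = 0; k < digits; ++k) tol /= 10.0;  // tol = 10^(-digits)
    double diff = computed - certified;
    if (diff < 0) diff = -diff;                    // |computed - certified|
    double scale = certified < 0 ? -certified : certified;
    if (scale == 0.0) return diff <= tol;
    return diff <= tol * scale;
}
```

Each fitted parameter would be passed together with its certified counterpart and digits set to 10.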

Disclaimer

I will not accept any liability for damages that you may incur, either directly or indirectly, by using this software.
