2022 H2 Mathematics Paper 2 Question 10

Linear Correlation and Regression

Answers

There is a negative correlation between $d$ and $p$. As shown by the negative value of the product moment correlation coefficient and by the scatter diagram, an increase in $d$ is associated with a decrease in $p$.

There is only a moderate linear correlation between $d$ and $p$, as indicated by the value of the product moment correlation coefficient. In the scatter diagram, it can be observed that a straight line is not the best fit for the data, as the rate of decrease of $p$ decreases as $d$ increases.

The product moment correlation coefficient will stay the same, as a change in units does not affect the strength of the linear correlation between $d$ and $p$.

As the special car has a bigger engine and many extra features, any data trend between $d$ and $p$ for the other six cars may not apply to this special car. Including the data for this special car could lead to a less accurate result with a weaker correlation.

The distances are squared so that any negative values that arise from subtracting a larger value from a smaller one are made positive.

This is referred to as the "method of least squares" because the least squares regression line of $p$ on $\frac{100\,000}{d}$ is the line for which the sum of the squares of these distances is as small as possible.

$p = 506 \left( \frac{100\,000}{d} \right) + 9400$.
$r = 0.987$.
$\pounds 19\,400$

I would not expect it to be reliable as $d = 5055$ is outside the given data range from $8954$ to $45\,452$. The observed data trend may no longer hold outside the given data range.

Full solutions

(a)

There is a negative correlation between $d$ and $p$. As shown by the negative value of the product moment correlation coefficient and by the scatter diagram, an increase in $d$ is associated with a decrease in $p$.

There is only a moderate linear correlation between $d$ and $p$, as indicated by the value of the product moment correlation coefficient. In the scatter diagram, it can be observed that a straight line is not the best fit for the data, as the rate of decrease of $p$ decreases as $d$ increases.

(b)

The product moment correlation coefficient will stay the same, as a change in units does not affect the strength of the linear correlation between $d$ and $p$.
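The invariance under a change of units can be checked numerically. The sketch below is illustrative only: the data values and the two conversion factors are hypothetical and are not taken from the question, and `pmcc` is a helper name introduced here.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data, NOT the values from the question.
d_original = [9000, 15000, 22000, 30000, 38000, 45000]
p_original = [20500, 16800, 14200, 12900, 12100, 11600]

# Assumed changes of units (positive scale factors), for illustration only.
d_rescaled = [1.609 * d for d in d_original]
p_rescaled = [p / 1000 for p in p_original]

r_original = pmcc(d_original, p_original)
r_rescaled = pmcc(d_rescaled, p_rescaled)

# The two values agree up to floating-point rounding.
print(round(r_original, 6), round(r_rescaled, 6))
```

Multiplying either variable by a positive constant scales the numerator and the denominator of the coefficient by the same factor, so the ratio is unchanged.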

(c)

As the special car has a bigger engine and many extra features, any data trend between $d$ and $p$ for the other six cars may not apply to this special car. Including the data for this special car could lead to a less accurate result with a weaker correlation.

(d)

(e)

The distances are squared so that any negative values that arise from subtracting a larger value from a smaller one are made positive.

This is referred to as the "method of least squares" because the least squares regression line of $p$ on $\frac{100\,000}{d}$ is the line for which the sum of the squares of these distances is as small as possible.
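The minimising property can be illustrated with a short sketch. It assumes hypothetical data (not the values from the question), and the helper names `least_squares_line` and `sum_of_squares` are introduced here. It fits the line of $p$ on $\frac{100\,000}{d}$ with the standard formulas, then checks that nudging the gradient or intercept only increases the sum of squared vertical distances.

```python
def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    gradient = s_xy / s_xx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


def sum_of_squares(xs, ys, gradient, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (gradient * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, NOT the values from the question.
d_values = [9000, 15000, 22000, 30000, 38000, 45000]
p_values = [20500, 16800, 14200, 12900, 12100, 11600]
x_values = [100000 / d for d in d_values]  # transformed variable 100 000 / d

gradient, intercept = least_squares_line(x_values, p_values)
best = sum_of_squares(x_values, p_values, gradient, intercept)

# Try nearby lines: every one of them has a larger sum of squares.
nudged = [
    sum_of_squares(x_values, p_values, gradient + dg, intercept + di)
    for dg in (-5, 0, 5)
    for di in (-100, 0, 100)
    if (dg, di) != (0, 0)
]
print(best < min(nudged))
```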

(f)

$p = 506 \left( \frac{100\,000}{d} \right) + 9400$.
$r = 0.987$.

(g)

\begin{align*}
p &= 505.73 \left( \frac{100\,000}{5055} \right) + 9398.5 \\
&= \pounds 19\,400 \textrm{ (3 sf)} \; \blacksquare
\end{align*}
I would not expect it to be reliable as $d = 5055$ is outside the given data range from $8954$ to $45\,452$. The observed data trend may no longer hold outside the given data range.
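The arithmetic and the range check can be reproduced with a minimal sketch. The coefficients and the data range are the ones stated in this solution; the function names `predict_price` and `within_data_range` are introduced here for illustration.

```python
def predict_price(d, gradient=505.73, intercept=9398.5):
    """Estimate p from the regression line of p on 100 000 / d."""
    return gradient * (100000 / d) + intercept


def within_data_range(d, lowest=8954, highest=45452):
    """True if d lies inside the range of d values used to fit the line."""
    return lowest <= d <= highest


d_new = 5055
estimate = predict_price(d_new)
estimate_3sf = float(f"{estimate:.3g}")  # round to 3 significant figures

print(estimate_3sf)              # 19400.0
print(within_data_range(d_new))  # False, so this is an extrapolation
```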

Question Commentary

The linear correlation and regression questions have become more theoretical in recent years, with more questions on theory and interpretation and fewer that involve only calculations.

In particular, the discussion of the method of least squares is similar to 2019 Paper 2 Question 10.