2022 H2 Mathematics Paper 2 Question 10
Linear Correlation and Regression
Answers
There is a negative correlation between $d$ and $p$. As seen by
the negative value of the product moment correlation coefficient
and the scatter diagram, an increase in $d$ is correlated with a
decrease in $p$.
There is only a moderate linear correlation between $d$
and $p$, as indicated by the value of the product moment
correlation coefficient. In the scatter diagram, it can be observed
that a straight line is not the best fit for the data, as the rate
of decrease of $p$ decreases as $d$ increases.
The product moment correlation coefficient will stay the
same, as a change in units does not affect the strength of the
linear correlation between $d$ and $p$.
As the special car has a bigger engine and many
extra features, any possible data trend between $d$ and $p$
for the other six cars may not be applicable
for this special car. Using the data relating to this special car
could lead to a less accurate result with weaker correlation.
The distances are squared so that any
negative values that arise from subtracting a larger
value from a smaller one will be made positive.
This is referred to as the "method of least squares"
as the least squares regression line of $p$ on
$\frac{100000}{d}$ is the line for which
the sum of the squares of these distances is the least.
$p = 506\left(\frac{100000}{d}\right) + 9400$.
$r = 0.987$.
£19,400
I would not expect it to be reliable as $d = 5055$
is outside the given data range from $8954$
to $45452$. The observed data trend may
no longer hold outside the given data range.
Full solutions
(a)
There is a negative correlation between $d$ and $p$. As seen by
the negative value of the product moment correlation coefficient
and the scatter diagram, an increase in $d$ is correlated with a
decrease in $p$.
There is only a moderate linear correlation between $d$
and $p$, as indicated by the value of the product moment
correlation coefficient. In the scatter diagram, it can be observed
that a straight line is not the best fit for the data, as the rate
of decrease of $p$ decreases as $d$ increases.
(b)
The product moment correlation coefficient will stay the
same, as a change in units does not affect the strength of the
linear correlation between $d$ and $p$.
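The invariance can be checked numerically. The sketch below assumes the change of units amounts to multiplying one variable by a positive constant. The data values and the factor `SCALE` are hypothetical and chosen only for illustration; they are not the values from the question.

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data, NOT the data from the question.
d_values = [10000, 15000, 20000, 30000, 40000]
p_values = [19000, 16500, 14000, 12500, 12000]

# Hypothetical unit conversion factor (an assumption for this sketch).
SCALE = 1.609
d_rescaled = [SCALE * d for d in d_values]

r_original = pearson_r(d_values, p_values)
r_rescaled = pearson_r(d_rescaled, p_values)

print(r_original, r_rescaled)
```

The two printed values agree up to floating-point rounding, because the scale factor appears once in the numerator and once in the denominator of the formula and cancels.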
(c)
As the special car has a bigger engine and many
extra features, any possible data trend between $d$ and $p$
for the other six cars may not be applicable
for this special car. Using the data relating to this special car
could lead to a less accurate result with weaker correlation.
(d)
(e)
The distances are squared so that any
negative values that arise from subtracting a larger
value from a smaller one will be made positive.
This is referred to as the "method of least squares"
as the least squares regression line of $p$ on
$\frac{100000}{d}$ is the line for which
the sum of the squares of these distances is the least.
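In symbols, the idea can be sketched as follows. The names $x_i$, $e_i$, $a$, $b$ and $n$ are notation introduced here for illustration and are not taken from the question; $e_i$ is assumed to be the vertical distance between the $i$-th data point and the line.

```latex
\begin{align*}
x_i &= \frac{100000}{d_i}, \qquad e_i = p_i - (a + b x_i), \\
(a, b) &\text{ are chosen to minimise } \sum_{i=1}^{n} e_i^2 .
\end{align*}
```

Without squaring, positive and negative values of $e_i$ could cancel, so a line that fits poorly could still give a small total.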
(f)
$p = 506\left(\frac{100000}{d}\right) + 9400$.
$r = 0.987$.
(g)
$p = 505.73\left(\frac{100000}{5055}\right) + 9398.5 = 19400$ (3 s.f.), so the estimated price is £19,400. ■
I would not expect it to be reliable as $d = 5055$
is outside the given data range from $8954$
to $45452$. The observed data trend may
no longer hold outside the given data range.
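The arithmetic and the range check can be reproduced with a short script. This is a minimal sketch that uses the unrounded coefficients 505.73 and 9398.5 and the data range quoted above. The function and variable names are hypothetical.

```python
import math

# Coefficients and data range as quoted in the solution above.
SLOPE = 505.73
INTERCEPT = 9398.5
D_MIN, D_MAX = 8954, 45452

def predict_price(d):
    """Estimate p from the regression line of p on 100000/d."""
    return SLOPE * (100000 / d) + INTERCEPT

def round_sig(x, sig=3):
    """Round a non-zero number to the given number of significant figures."""
    return round(x, sig - 1 - math.floor(math.log10(abs(x))))

d_new = 5055
predicted_price = predict_price(d_new)
rounded_price = round_sig(predicted_price)

# An estimate is an extrapolation when d lies outside the observed range.
within_range = D_MIN <= d_new <= D_MAX

print(rounded_price, within_range)
```

Here `within_range` is `False`, which is the reason the estimate is treated as an extrapolation and therefore as unreliable.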
Question Commentary
The linear correlation and regression questions have become
more theoretical in recent years, with more parts on theory and interpretation and fewer
that involve calculation alone.
In particular, the discussion of the method of least squares is similar to
2019 Paper 2 Question 10.