ex3 practical exercises for Example Sheet 3.

The example sheet asks you to implement the three functions given below: sd_confint_parametric, sd_confint_nonparametric, and exp_equality_test. The skeletons are given. To test your answers on Moodle, please upload either a Jupyter notebook called ex3.ipynb or a plain Python file called ex3.py.

These functions are meant to compute 95% confidence intervals and $p$-values, using computational approximation based on sampling. This means that your answers might not be exact. You should be able to use around 30,000 samples with no more than a few seconds of runtime, and this should be enough to pass the tester. If your code takes longer than that, refactor it to make it faster, replacing Python loops with vectorized numpy commands.
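
As an illustration of the kind of vectorization meant here, the following minimal sketch draws all the resampled datasets in a single call and computes one statistic per row, rather than looping in Python. The data `x_demo`, the seed, and the helper name `resampled_sds` are hypothetical and not part of the exercise.

```python
import numpy as np

def resampled_sds(x, num_samples=30_000, seed=0):
    # Hypothetical helper: resample x with replacement num_samples times
    # and return the standard deviation of each resampled dataset.
    rng = np.random.default_rng(seed)
    # One call produces a (num_samples, len(x)) matrix; each row is one resample.
    resamples = rng.choice(x, size=(num_samples, len(x)), replace=True)
    # axis=1 gives one standard deviation per row, with no Python loop.
    return np.std(resamples, axis=1)

x_demo = np.array([6.1, 7.4, 5.9, 8.2, 7.0, 6.6])  # made-up values
sds = resampled_sds(x_demo)
print(sds.shape)
```

The same pattern (generate a matrix of samples, then reduce along `axis=1`) should also work with other sampling calls that accept a `size` argument.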

sd_confint_parametric and sd_confint_nonparametric. We have a dataset $x_1,\dots,x_n$ drawn from $N(\mu,\sigma^2)$. Find a 95% confidence interval for the maximum likelihood estimator $\hat{\sigma}$, using parametric and non-parametric sampling respectively.

def sd_confint_???(p, x):
    # Input: p=0.95 and x = November temperatures in Cambridge from 2000 onwards
    # TODO: compute a p-confidence interval for σhat
    return (lo,hi)
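
For reference, assuming the usual Gaussian model in which both $\mu$ and $\sigma$ are unknown, the maximum likelihood estimators are

$$
\hat{\mu} = \frac{1}{n}\sum_{i=1}^n x_i, \qquad \hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^n (x_i-\hat{\mu})^2}.
$$

Note the divisor $n$ rather than $n-1$. This matches the default `ddof=0` of `np.std`.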

exp_equality_test. We have data $x_1,\dots,x_m$ drawn from $\operatorname{Exp}(\mu)$, and $y_1,\dots,y_n$ drawn from $\operatorname{Exp}(\nu)$. We wish to test if $\mu=\nu$. Compute the $p$-value for this test, using the test statistic $\hat{\nu}-\hat{\mu}$.

def exp_equality_test(x, y):
    # Input: x=np.array([0.89, 0.97, 2.30, 2.85, 2.05, 1.00]), y=np.array([0.51, 0.24, 1.21, 0.16])
    # TODO: compute the p-value
    return p
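
A sampling-based $p$-value compares the observed test statistic against values of the statistic simulated under the null hypothesis. The sketch below shows only that final comparison step, and it assumes a two-sided test, which is one common convention (the text above does not say which sidedness is intended). The helper name `two_sided_p_value` and the stand-in numbers are hypothetical.

```python
import numpy as np

def two_sided_p_value(simulated, observed):
    # Hypothetical helper.
    # simulated: test statistics generated under the null hypothesis
    # observed:  the test statistic computed from the actual data
    simulated = np.asarray(simulated)
    upper = np.mean(simulated >= observed)  # fraction at least as large
    lower = np.mean(simulated <= observed)  # fraction at least as small
    # Double the smaller tail probability, capped at 1.
    return min(1.0, 2 * min(upper, lower))

rng = np.random.default_rng(0)
simulated_demo = rng.normal(size=30_000)  # stand-in for statistics under H0
p_demo = two_sided_p_value(simulated_demo, observed=2.5)
print(p_demo)
```

Generating the simulated statistics under the null hypothesis is the substance of the exercise and is deliberately left out here.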

TEST

The Moodle checker will look for a markdown cell with the contents # TEST, and ignore everything beneath it. Put your working code above this cell, and put any experiments and tests below.

In [ ]:
import numpy as np
import pandas
In [ ]:
url = 'https://www.cl.cam.ac.uk/teaching/current/DataSci/data/climate_202309.csv'
climate = pandas.read_csv(url)
df = climate.loc[(climate.station=='Cambridge') & (climate.yyyy>=2000)]
x = df.temp[df.mm==11].values

print(sd_confint_parametric(0.95, x))
print(sd_confint_nonparametric(0.95, x))
In [ ]:
x = np.array([0.89, 0.97, 2.30, 2.85, 2.05, 1.00])
y = np.array([0.51, 0.24, 1.21, 0.16])
exp_equality_test(x, y)