# Tutor profile: Erin M.

## Questions

### Subject: SAS

The following code performs a one-tailed t-test to determine whether the mean MPG for a sample of 10 cars is significantly lower than 30 at a significance level of $$\alpha = 0.05$$:

```sas
Data NewModel;
  Input MPG;
  Datalines;
26.6
30.4
32.5
26.3
31.0
25.9
29.7
24.8
30.6
28.1
;
Run;

Proc ttest data=NewModel H0=30 plots side=l;
  var MPG;
Run;
```

How would you modify the code to perform the test with a significance level of $$\alpha = 0.01$$?

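Add the `alpha=` option to the `Proc ttest` statement:

```sas
Proc ttest data=NewModel H0=30 plots side=l alpha=0.01;
  var MPG;
Run;
```

The `alpha=` option sets the confidence level of the reported interval to 99%. The p-value itself does not change; it is now compared against 0.01 instead of 0.05.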
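
As a cross-check outside SAS, the same test can be sketched in R. This is an illustrative sketch rather than part of the original answer: `t.test` with `alternative = "less"` plays the role of the lower-tailed test, `conf.level = 0.99` corresponds to $$\alpha = 0.01$$, and the variable names (`mpg`, `result`, `reject_h0`) are assumptions made for the example.

```r
# MPG values from the Datalines block in the question
mpg <- c(26.6, 30.4, 32.5, 26.3, 31.0, 25.9, 29.7, 24.8, 30.6, 28.1)

# Lower-tailed one-sample t-test of H0: mu = 30 against Ha: mu < 30.
# conf.level = 0.99 matches alpha = 0.01 for the reported confidence bound.
result <- t.test(mpg, mu = 30, alternative = "less", conf.level = 0.99)

alpha <- 0.01
t_stat <- unname(result$statistic)  # (mean(mpg) - 30) / (sd(mpg) / sqrt(10))
p_value <- result$p.value
reject_h0 <- p_value < alpha        # decision at the 0.01 level

print(result)
```

For these data the sample mean is 28.59 and the t statistic is about -1.72 on 9 degrees of freedom, so the p-value falls between 0.05 and 0.10 and the null hypothesis is not rejected at either $$\alpha = 0.05$$ or $$\alpha = 0.01$$.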

### Subject: R Programming

We're going to write some code in R to help understand the principle of the Central Limit Theorem.

Part 1: Draw 500 random samples of size n = 10 from a normal distribution with $$\mu = 0$$ and $$\sigma = 1$$. Compute the mean of each of the 500 samples and plot the distribution of means. Describe the shape of the distribution of means and calculate its mean and standard deviation (AKA standard error). How does this distribution of means compare to the original distribution from which we drew our samples?

Part 2: Repeat this process, only now draw samples of size 10,000 from the normal distribution with $$\mu = 0$$ and $$\sigma = 1$$. Compare this new distribution of means to the one where we drew samples of size 10.

Part 3: Repeat the process one more time, only now draw samples of size 100 from a uniform distribution on [a = 0, b = 1] (note: $$\mu = \frac{a+b}{2} = 0.5$$ and $$\sigma = \sqrt{\frac{(b-a)^2}{12}} \approx 0.29$$). Describe the shape of the distribution of means and compute its mean and standard deviation (standard error).

```r
# Part 1: 500 samples of size n = 10 from N(0, 1), one sample per column
x = matrix(rnorm(5000, 0, 1), 10, 500)
means = rep(0, 500)
for (j in 1:500) {
  means[j] = mean(x[, j])
}
hist(means)
mean(means)
sd(means)
```

The distribution of means is approximately normal with a mean near 0 and a standard deviation approximately equal to $$\frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{10}} \approx 0.32$$. It is centered at the same value as the original distribution but is noticeably narrower.

```r
# Part 2: 500 samples of size n = 10,000 from N(0, 1)
y = matrix(rnorm(5000000, 0, 1), 10000, 500)
means = rep(0, 500)
for (j in 1:500) {
  means[j] = mean(y[, j])
}
hist(means)
mean(means)
sd(means)
```

Again, the distribution of means is approximately normal with a mean near 0, but its standard deviation is now approximately $$\frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{10000}} = 0.01$$, far smaller than with samples of size 10.

```r
# Part 3: 500 samples of size n = 100 from Uniform(0, 1)
z = matrix(runif(50000, 0, 1), 100, 500)
means = rep(0, 500)
for (j in 1:500) {
  means[j] = mean(z[, j])
}
hist(means)
mean(means)
sd(means)
```

Just like in our previous two examples, the distribution of means is approximately normal, with a mean near the mean of the uniform distribution, 0.5, and a standard deviation approximately equal to $$\frac{\sigma}{\sqrt{n}} \approx \frac{0.29}{\sqrt{100}} = 0.029$$. The shape of our distribution of means will be approximately normal regardless of the shape of our original distribution (provided a sufficient sample size).
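
Since all three parts repeat the same steps, they could also be wrapped in one small helper. The following is a minimal sketch, not the original solution: `sample_means` is a hypothetical function name, the seed is arbitrary, and `colMeans` stands in for the explicit loop over columns.

```r
# Hypothetical helper: draw `reps` samples of size `n` and return their means.
# `rdist` is any base-R random generator (rnorm, runif, ...); extra arguments
# in `...` are passed through to it.
sample_means <- function(n, reps = 500, rdist = rnorm, ...) {
  draws <- matrix(rdist(n * reps, ...), nrow = n, ncol = reps)
  colMeans(draws)  # one mean per column, i.e. per sample
}

set.seed(1)  # arbitrary seed, only for reproducibility

m_norm <- sample_means(10, rdist = rnorm, mean = 0, sd = 1)
m_unif <- sample_means(100, rdist = runif, min = 0, max = 1)

# Empirical standard error next to the theoretical sigma / sqrt(n)
c(empirical = sd(m_norm), theoretical = 1 / sqrt(10))
c(empirical = sd(m_unif), theoretical = sqrt(1 / 12) / sqrt(100))
```

Calling `hist(m_norm)` or `hist(m_unif)` afterwards gives the same kind of plot as in the three parts, and changing `n` shows how quickly the spread of the means shrinks.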

### Subject: Statistics

In a linear regression framework, what correlation (r) between the predictor variable (X) and the outcome variable (Y) will produce the largest standard error of estimate (i.e. the standard error of the residuals)?

The standard error of estimate is computed from the sum of squared residuals, the quantity that least-squares regression minimizes. Each residual (or "error") is the distance between an actual Y value and its predicted value. The residuals are larger when the linear correlation between X and Y is weaker. Since r = 0 indicates the weakest possible linear relationship, the standard error of estimate will be largest when r = 0.
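
One way to see this is through the standard identity for simple linear regression that ties the standard error of estimate to r. The symbols $$s_{Y \cdot X}$$ (standard error of estimate), $$s_Y$$ (sample standard deviation of Y), and n (number of observations) are notation introduced here for illustration and do not appear in the original question:

$$ s_{Y \cdot X} = \sqrt{\frac{\sum_i (Y_i - \hat{Y}_i)^2}{n - 2}} = s_Y \sqrt{(1 - r^2)\,\frac{n - 1}{n - 2}} $$

The factor $$1 - r^2$$ is largest at r = 0, where $$s_{Y \cdot X} = s_Y \sqrt{\frac{n-1}{n-2}}$$, and it falls to 0 as r approaches $$\pm 1$$.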
