1. Discrete Random Variables. Suppose that we are interested in the number of cups of coffee drunk by a (randomly selected) student at UCLA. This quantity can be represented as a random variable $Y$ with probability mass function:

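\[
p_Y(a) =
\begin{cases}
\frac{1}{4} & \text{if } a \in \{0, 1, 2\} \\
\frac{1}{8} & \text{if } a = 3 \\
\frac{3}{32} & \text{if } a = 4 \\
c & \text{if } a = 5 \\
0 & \text{otherwise}
\end{cases},
\]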

where c is an unknown constant.

(a) Explain why the number of cups of coffee drunk in a day by a randomly selected student at UCLA is a random variable.

(b) What is the relevant outcome space of the random variable $Y$?

(c) Explain what the distribution of this random variable represents. In other words, the distribution of $Y$ assigns a probability to any subset of the outcome space. How do we interpret this probability?

(d) Solve for $c$. (Hint: Recall that $P_Y(O_Y) = 1$, so that $\sum_{a \in O_Y} p_Y(a)$ must equal one.)

(e) What is the probability that a randomly selected student at UCLA drinks at least 3 cups of coffee a day, $P_Y(Y \geq 3)$?

(f) What is the expected number of cups of coffee drunk per day by a randomly selected student at UCLA?
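Answers to parts (d) through (f) can be sanity-checked numerically. The following is a minimal Python sketch, not a model solution: the names `known`, `pmf`, `tail`, and `mean` are illustrative choices, and exact fractions are used so that rounding does not obscure the result.

```python
from fractions import Fraction

# Known part of the pmf p_Y(a); the mass at a = 5 is the unknown constant c.
known = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 4),
         3: Fraction(1, 8), 4: Fraction(3, 32)}

# Probabilities over the outcome space must sum to one, which pins down c.
c = 1 - sum(known.values())
pmf = {**known, 5: c}

# P(Y >= 3): add up the mass on outcomes 3, 4 and 5.
tail = sum(p for a, p in pmf.items() if a >= 3)

# E[Y]: probability-weighted average of the outcomes.
mean = sum(a * p for a, p in pmf.items())

print(c, tail, mean)
```

The same pattern (fill in the missing mass, then sum over the relevant outcomes) applies to any finite pmf with one unknown probability.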

2. Continuous Random Variables. Suppose that we are interested in the income of a randomly selected Angeleno. The distribution of incomes (in tens of thousands of dollars) for residents of Los Angeles can be described as a random variable, $X$, with the following pdf.

\[
f_X(a) =
\begin{cases}
0.11 - ca & \text{if } 0 \leq a \leq 10 \\
0 & \text{otherwise}
\end{cases},
\]

where $c$ is an unknown constant.

(a) What is the outcome space of $X$, $O_X$?

(b) Using the relationship

\[
P_X(l \leq X \leq m) = \int_l^m f_X(a) \, da,
\]

explain why the pdf must always be weakly positive, $f_X(a) \geq 0$, for any $a \in \mathbb{R}$.

(c) Because $P_X(O_X) = 1$ we must have that $\int_0^{10} f_X(a) \, da = 1$. Using this fact, solve for $c$.

(d) What is the expected value of $X$, $E[X]$?

(e) What is the variance of $X$, $\mathrm{Var}(X)$?
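Because the pdf is a polynomial in $a$, parts (c) through (e) reduce to integrals of powers of $a$ over $[0, 10]$. The sketch below checks them with exact arithmetic. It assumes the pdf is $0.11 - ca$ on $[0, 10]$; the helper names `power_integral` and `moment` are hypothetical and introduced only for illustration.

```python
from fractions import Fraction

INTERCEPT = Fraction(11, 100)  # the 0.11 in f_X(a) = 0.11 - c*a
UPPER = 10                     # upper end of the support [0, 10]

def power_integral(k):
    """Exact integral of a**k over [0, UPPER]."""
    return Fraction(UPPER ** (k + 1), k + 1)

# Normalisation: INTERCEPT * int(1) - c * int(a) = 1, solved for c.
c = (INTERCEPT * power_integral(0) - 1) / power_integral(1)

def moment(k):
    """E[X**k] = integral of a**k * (INTERCEPT - c*a) over [0, UPPER]."""
    return INTERCEPT * power_integral(k) - c * power_integral(k + 1)

mean = moment(1)
variance = moment(2) - mean ** 2

# The pdf is decreasing in a when c > 0, so its minimum is at a = UPPER.
pdf_min = INTERCEPT - c * UPPER

print(float(c), float(mean), float(variance), float(pdf_min))
```

Checking that `pdf_min` is nonnegative ties back to part (b): a candidate value of $c$ that made the density negative somewhere on the support could not be valid.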

3. Variance and Covariance. Let $Y$ be a random variable representing income (in tens of thousands of dollars) and $X$ be a random variable representing years of education. Suppose that the marginal distribution of $X$ is described by its probability mass function

\[
p_X(x) =
\begin{cases}
0.05 & \text{if } x \in \{1, 2, \ldots, 12\} \\
0.09 & \text{if } x \in \{13, 14, 15, 16\} \\
0.04 & \text{if } x \in \{17\} \\
0 & \text{otherwise}
\end{cases}.
\]

The marginal distribution of $Y$ is described by its probability density function

\[
f_Y(y) =
\begin{cases}
0.1 & \text{if } 0 \leq y \leq 10 \\
0 & \text{otherwise}
\end{cases}.
\]

(a) What is the expectation of $Y$, $E[Y]$? What is its variance, $\mathrm{Var}(Y)$?

(b) What is the expectation of $X$, $E[X]$? What is its variance, $\mathrm{Var}(X)$?

(c) Using $E[YX] = 60$ compute the covariance between $Y$ and $X$, $\mathrm{Cov}(X, Y)$.

(d) Calculate the correlation coefficient between $X$ and $Y$,

\[
\rho_{YX} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}.
\]

(e) What does this covariance tell us about the relationship between education levels and income? Is there a positive or negative association?

(f) Should we interpret this result as a causal relationship between education and income? What are some reasons we may want to refrain from this interpretation?

(g) (Challenge) A common inequality used in econometrics is the Cauchy-Schwarz inequality. It states that, for any random variables $X$ and $Y$, and any functions $g(\cdot)$ and $h(\cdot)$,

\[
\big| E[g(X)h(Y)] \big| \leq \sqrt{E[g^2(X)]} \, \sqrt{E[h^2(Y)]}.
\]

Use this inequality to show why the correlation coefficient is bounded between negative one and one, $-1 \leq \rho_{XY} \leq 1$. (Hint: Try $g(x) = x - \mu_X$ and $h(y) = y - \mu_Y$.)
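The moments in parts (a) through (d) can be checked with a short script. This is a sketch under stated assumptions: it takes $Y$ to be uniform on $[0, 10]$ (which is what the density above describes) and uses the standard uniform formulas for its mean and variance; the variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# pmf of X (years of education), as given in the problem.
p_x = {x: Fraction(5, 100) for x in range(1, 13)}
p_x.update({x: Fraction(9, 100) for x in range(13, 17)})
p_x[17] = Fraction(4, 100)
assert sum(p_x.values()) == 1  # sanity check: a pmf sums to one

mean_x = sum(x * p for x, p in p_x.items())
var_x = sum(x**2 * p for x, p in p_x.items()) - mean_x**2

# Y is uniform on [0, 10]: mean (a+b)/2 and variance (b-a)^2/12.
mean_y = Fraction(0 + 10, 2)
var_y = Fraction((10 - 0) ** 2, 12)

# Cov(X, Y) = E[YX] - E[Y]E[X], with E[YX] = 60 given in part (c).
cov_xy = 60 - mean_y * mean_x

rho = float(cov_xy) / sqrt(float(var_x) * float(var_y))
print(float(mean_x), float(var_x), float(cov_xy), round(rho, 3))
```

A computed correlation outside $[-1, 1]$ would signal an arithmetic mistake, which is exactly the bound part (g) asks you to prove.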

Introduction to Single Linear Regression

1. Useful Equalities. Recall that in deriving the form of $\hat{\beta}_1$ we used the following equalities

\[
\frac{1}{n} \sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X}) = \frac{1}{n} \sum_{i=1}^n Y_i X_i - \bar{Y}\bar{X}
\quad \text{and} \quad
\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 = \frac{1}{n} \sum_{i=1}^n X_i^2 - (\bar{X})^2.
\]

Show either one of these equalities (only have to show one or the other).
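Before writing the algebraic proof, it can help to confirm numerically that both equalities hold on arbitrary data. The sketch below does this on made-up data (the data-generating process and seed are arbitrary assumptions, not part of the problem). A numerical check is not a proof, but a mismatch would reveal a misread formula.

```python
import math
import random

random.seed(0)  # fixed seed so the run is reproducible
n = 50
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [2 + 3 * x + random.gauss(0, 1) for x in xs]  # made-up data

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# First equality: centred cross-products vs. raw cross-products.
lhs_cov = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys)) / n
rhs_cov = sum(y * x for x, y in zip(xs, ys)) / n - y_bar * x_bar

# Second equality: centred squares vs. raw squares.
lhs_var = sum((x - x_bar) ** 2 for x in xs) / n
rhs_var = sum(x * x for x in xs) / n - x_bar ** 2

print(math.isclose(lhs_cov, rhs_cov), math.isclose(lhs_var, rhs_var))
```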

2. Assumptions for Inference. Suppose we are interested in the relationship between the size of the average American’s social circle, $X$, and whether or not they are unemployed, $Y$. To investigate this relationship we want to estimate the following regression equation$^1$

\[
Y = \beta_0 + \beta_1 X + \epsilon, \qquad E[\epsilon] = E[\epsilon X] = 0.
\]

$^1$ Recall that this regression specification corresponds to finding the line of best fit parameters $\beta_0, \beta_1 = \arg\min_{b_0, b_1} E[(Y$

To estimate the regression coefficient parameters we collect a sample of size $n$, $\{Y_i, X_i\}_{i=1}^n$. Recall that for valid asymptotic inference on our estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ we require the following assumptions: Random Sampling, Homoskedasticity, and Rank Condition.

- Random Sampling: Assume that $\{Y_i, X_i\}$ are independently and identically distributed from the population of interest, $(Y_i, X_i) \overset{\text{i.i.d.}}{\sim} (Y, X)$.

- Homoskedasticity: Assume that $\mathrm{Var}(\epsilon \mid X = x) = \sigma_\epsilon^2$ for all possible values of $x$.

- Rank Condition: There must be at least two distinct values of $X$ that appear in the population.

(a) Suppose we collect our sample by only randomly surveying people on UCLA campus. Which assumption would be violated?

(b) Suppose we collect our sample and find that everyone appears to have exactly one friend. Which assumption would be violated? Why is this a problem when computing the line of best fit through our sample?

(c) Suppose random sampling, homoskedasticity, and the rank condition are all satisfied, but $n = 10$. Why might inferences based on the approximation

\[
\frac{\hat{\beta}_1 - \beta_1}{\hat{\sigma}_{\beta_1} / \sqrt{n}} \sim N(0, 1)
\]

not be valid?
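The scenario in part (b) can be made concrete by writing out the sample slope formula. The following sketch uses a hypothetical helper, `ols_slope`, and made-up samples; it assumes the slope is computed as the sample covariance divided by the sample variance of $X$, as in the equalities from Problem 1.

```python
def ols_slope(xs, ys):
    """Sample slope: (1/n) sum (Y_i - Ybar)(X_i - Xbar) / (1/n) sum (X_i - Xbar)^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((y - y_bar) * (x - x_bar) for x, y in zip(xs, ys)) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs) / n
    if s_xx == 0:
        raise ValueError("X takes a single value in the sample; slope undefined")
    return s_xy / s_xx

# Hypothetical sample: Y = 1 if unemployed, X = number of friends.
ys = [0, 1, 0, 0, 1]
print(ols_slope([1, 3, 4, 2, 0], ys))   # X varies: slope is defined

try:
    ols_slope([1, 1, 1, 1, 1], ys)      # everyone has exactly one friend
except ValueError as err:
    print("cannot fit:", err)
```

When every $X_i$ is identical the denominator is zero, so no unique line of best fit exists. The exact comparison `s_xx == 0` is reliable here because the inputs are integers; with non-integer data a small tolerance would be safer.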

3. Hypothesis Testing. Suppose now that we are interested in investigating the relationship between the size of someone’s social circle, $X$, and their income (in tens of thousands of dollars), $Y$. We want to estimate the following linear regression model

\[
Y = \beta_0 + \beta_1 X + \epsilon, \qquad E[\epsilon] = E[\epsilon X] = 0.
\]

To do so we collect a random sample of size $n = 64$, $\{Y_i, X_i\}_{i=1}^{64}$, and find that $\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 = 100$, $\frac{1}{n} \sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X}) = 225$, $\bar{Y} = 5.5$, and $\bar{X} = 1.5$.

(a) Using this information find and interpret $\hat{\beta}_1$ and $\hat{\beta}_0$.

(b) After finding $\hat{\beta}_0$ and $\hat{\beta}_1$ describe how you would construct the estimated residuals $\hat{\epsilon}_i$.

(c) We find that $\frac{1}{n} \sum_{i=1}^n \hat{\epsilon}_i^2 = 36$. Use this and the result that, for $n$ large,

\[
\frac{\hat{\beta}_1 - \beta_1}{\hat{\sigma}_{\beta_1} / \sqrt{n}} \sim N(0, 1),
\]

to compute the (approximate) probability that, if the true value were $\beta_1 = 0$, we would see a value of $|\hat{\beta}_1|$ equal to or larger than the one that we observed.

(d) Use this result to test, at level $\alpha = 0.1$, the hypotheses

\[
H_0 : \beta_1 = 0 \quad \text{vs.} \quad H_1 : \beta_1 \neq 0.
\]

(e) Conduct this test in another fashion by constructing the test statistic $t^*$ and comparing it to either $z_{0.95} = 1.64$ or $z_{0.9} = 1.28$ (indicate which value you are comparing the test statistic to).

(f) Construct a 90% confidence interval for $\beta_1$. How could we use this to conduct the hypothesis test in part (d)?

(g) Suppose that we find we made an error in our calculation and actually $\frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 = 1$. If all other values stayed the same, how would this change the result of the hypothesis test in part (d)?
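The calculations in this problem chain together, so a short script is a convenient way to check them. The sketch below makes one assumption that is not stated in the problem: under homoskedasticity it estimates the variance term as the mean squared residual divided by the sample variance of $X$. All variable names are illustrative.

```python
from math import erfc, sqrt
from statistics import NormalDist

n = 64
s_xx, s_xy = 100.0, 225.0      # (1/n) sum (X_i - Xbar)^2 and cross-product term
y_bar, x_bar = 5.5, 1.5
mean_sq_resid = 36.0           # (1/n) sum of squared residuals

beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

# ASSUMPTION: homoskedastic variance estimate, sigma_hat^2 = resid / s_xx.
sigma_hat = sqrt(mean_sq_resid / s_xx)
std_err = sigma_hat / sqrt(n)

t_stat = (beta1_hat - 0.0) / std_err          # test statistic under H0: beta1 = 0
p_value = erfc(abs(t_stat) / sqrt(2))         # two-sided normal p-value
z_crit = NormalDist().inv_cdf(1 - 0.10 / 2)   # z_{0.95} for a level-0.1 test
reject = abs(t_stat) > z_crit

ci_90 = (beta1_hat - z_crit * std_err, beta1_hat + z_crit * std_err)
print(beta1_hat, beta0_hat, t_stat, p_value, reject, ci_90)
```

For part (g), setting `s_xx = 1.0` and rerunning shows how the estimate, the standard error, and the test statistic all move together. The p-value is computed with `erfc` rather than `1 - cdf` because the latter would round to zero in floating point for a test statistic this large.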