# 4.4.4 Random Variables: Uniform & Binomial: Video

PROFESSOR: Certain kinds of random variables keep coming up, so let’s look at two basic examples now, namely uniform random variables and binomial random variables. Let’s begin with uniform, because we’ve seen those already.

A uniform random variable is one that takes all of its values with equal probability. So the threshold variable Z, which took all the values from 0 to 6 inclusive, each with probability 1/7, was a basic example of a uniform variable.

Other examples come up as well. If D is the outcome of a fair die (dice are six-sided), then the probability that it comes up 1, or 2, or any value up through 6, is 1/6 each. Another example is a four-digit lottery number, where it’s supposed to be the case that the four digits are each chosen at random. That means the possibilities range from four 0’s up through four 9’s, for 10,000 numbers, and they’re all supposed to be equally likely. So the probability that the lottery winds up with 0000 is the same as the probability that it ends up with 0001, which is the same as the probability that it ends up with four 9’s: each is 1/10,000. So that’s another uniform random variable.

Let’s prove a little lemma

that will be of use later. It’s just some practice with uniformity. Suppose R1, R2, R3 are three random variables. They’re mutually independent, and R1 is uniform. I don’t really care much about the other two, except that, technically, they only take values that R1 can take as well. I haven’t said that on this slide, but that’s what we’re assuming.

The claim is that each of the pairs of events is independent: the event that R1 is equal to R2 is independent of the event that R2 is equal to R3, which is independent of the event that R1 is equal to R3. Now, these events overlap: there’s an R1 here and an R1 there, and an R2 here and an R2 there. So even though R1, R2, R3 are mutually independent, it isn’t at all clear that these events are independent. In fact, they are not mutually independent, but they are pairwise independent. They’re obviously not three-way independent (that is, mutually independent), because if I know that R1 is equal to R2, and I know that R2 is equal to R3, it follows that R1 is equal to R3. So given those two events, the probability of the third changes dramatically, to certainty. So this is the useful lemma: if I have the three variables and I look at the three possible pairs that might be equal, whether any two of them are equal is independent of whether any other two are equal.

Now, let me give a handwaving argument. There’s a more rigorous argument based on total probability that appears as a problem in the text. But for the intuitive idea, let’s look at the case where R1 is the uniform variable, and R1 is independent of R2 and R3. Certainly that implies that R1 is independent of the event that R2 is equal to R3, because R1 is independent of both R2 and R3; it doesn’t matter what they do, so R1 is independent of the event that R2 is equal to R3. Now, because R1 is uniform, it has some probability p of equaling every possible value that it can take. And since R2 and R3 only take values that R1 could take, the probability that R1 hits whatever value R2 and R3 happen to have is still p. That’s the informal argument. In other words, the claim is that the probability that R1 is equal to R2, given that R2 is equal to R3, is simply the probability that R1 happens to hit R2, whatever value R2 has. This equation says that the event R1 = R2 is independent of the event R2 = R3. And in fact, in both cases it’s the same probability p, the probability that the uniform variable R1 has of equaling each of its possible values. You can think about that and see if it’s persuasive. It’s an OK argument, but I was bothered by it. I wasn’t happy with it until I sat down and really worked it out formally to justify this somewhat handwavy proof of the lemma.

All right. Let’s turn from uniform

random variables to binomial random variables. They’re probably the most important single example of a random variable, and they come up all the time. The simplest definition of a binomial random variable is the one you get by flipping n mutually independent coins. Now, the coins have an order, so you can tell them apart. Equivalently, you can say that you flip one coin n times, where each of the flips is independent of all the others.

There are two parameters here, an n and a p, because we don’t assume that the flips are fair. One parameter is how many flips there are. The other parameter is the probability of a head, since the coin might be biased so that heads are more likely or less likely than tails. The fair case would be when p is 1/2.

So for example, if n is 5 and p is 2/3, what’s the probability that we consecutively flip head, head, tail, tail, head? Well, because the flips are independent, the probability is simply the product of the probability that I flip a head on the first toss, which is p; the probability of a head on the second toss; a tail on the third; a tail on the fourth; and a head on the fifth. So I can replace each of those: 2/3 is the probability of a head, and 1 minus 2/3, namely 1/3, is the probability of a tail. That gives 2/3 times 2/3 times 1/3 times 1/3 times 2/3, and I discover that the probability of HHTTH is (2/3)^3 times (1/3)^2.

Abstracting, the probability of a sequence of n tosses in which there are i heads and the rest, n minus i, are tails, is simply the probability of a head raised to the i-th power times the probability of a tail, namely 1 minus p, raised to the (n minus i)-th power: p^i (1 − p)^(n − i). Given any particular sequence of H’s and T’s of length n, this is the probability assigned to that sequence. So all sequences with the same number of H’s have the same probability, but of course sequences with different numbers of H’s have different probabilities.

Well, what’s the probability that you actually toss i heads and n minus i tails in the n tosses? That’s going to be equal to the number of possible sequences with i heads and n minus i tails, times the probability of each such sequence. The number of such sequences is simply the number of ways to choose the i places for the heads out of the n tosses, which is n choose i. So we’ve just figured out that the probability of tossing i heads and n minus i tails is simply (n choose i) times p^i times (1 − p)^(n − i). In short, the probability that the number of heads is i is equal to this number: the probability that the binomial variable with parameters n and p is equal to i is (n choose i) p^i (1 − p)^(n − i). This is a pretty basic formula. If you can’t memorize it, then make sure it’s written on any crib sheet you take to an exam.

So the probability

density function abstracts out some properties of random variables. Basically, it just tells you, for every possible value, the probability that the random variable takes that value. The probability density function of R, PDF of R, is a function on the real values: it tells you, for each a, the probability that R is equal to a. So what we’ve just said is that the probability density function of the binomial random variable with parameters n and p, evaluated at i, is (n choose i) p^i (1 − p)^(n − i), where we’re assuming that i is an integer from 0 to n.

If I look at the probability density function of a uniform variable U, it’s constant: the probability density function at any possible value v that the uniform variable can take is the same, for v in the range of U. So in fact, you can say exactly what it is. It’s simply 1 over the size of the range of U, if U is uniform.

A closely related function that describes a lot about the behavior of a random variable is the cumulative distribution function. It’s simply the probability that R is less than or equal to a. So it’s a function from the reals to the reals, where CDF of R at a is the probability that R is less than or equal to a. Clearly, given the PDF you can get the CDF, and given the CDF you can get the PDF, but it’s convenient to have both around.

Now the key

observation about these is that once we’ve abstracted out to the PDF and the CDF, we don’t have to think about the sample space anymore. They do not depend on the sample space. All they’re telling you is the probability that the random variable takes a given value, which is, in some ways, the most important data about a random variable. You need to fall back on something more general than the PDF or the CDF when you start having dependent random variables, and you need to know how the probability that R takes a value changes given that another variable S has some property or takes some other value. But if you’re just looking at the random variable alone, essentially everything you need to know about it is given by its density or distribution functions, and you don’t have to worry about the sample space.

And this has the advantage that both the uniform distributions and binomial distributions come up [AUDIO OUT] and it means that all of these different random variables, based on different sample spaces, are going to share a whole lot of properties. Everything that I derive based on what the PDF is will apply to all of them. That’s why this abstraction of a random variable in terms of a probability density function is so valuable and key. But remember, the definition of a random variable is not that it is a probability density function; rather, it’s a function from the sample space to values.
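The computations in this lecture are easy to check by machine. Here is a short Python sketch; the helper names `uniform_pdf`, `binomial_pdf`, and `binomial_cdf` are mine, not from the lecture, and exact rational arithmetic is used so the checks aren’t disturbed by floating-point error:

```python
from fractions import Fraction
from math import comb

def uniform_pdf(range_size):
    """PDF of a uniform variable at any value in its range: 1/|range|."""
    return Fraction(1, range_size)

def binomial_pdf(n, p, i):
    """PDF of the binomial variable with parameters n and p at i:
    (n choose i) * p^i * (1 - p)^(n - i)."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

def binomial_cdf(n, p, a):
    """CDF at a: the probability that the number of heads is <= a."""
    return sum(binomial_pdf(n, p, i) for i in range(min(a, n) + 1))

# The threshold variable Z is uniform on {0, ..., 6}: each value has
# probability 1/7. The four-digit lottery gives 1/10,000.
assert uniform_pdf(7) == Fraction(1, 7)
assert uniform_pdf(10_000) == Fraction(1, 10_000)

# The specific sequence HHTTH with n = 5, p = 2/3 has probability
# p * p * (1 - p) * (1 - p) * p = (2/3)^3 * (1/3)^2 = 8/243.
p = Fraction(2, 3)
seq_prob = p * p * (1 - p) * (1 - p) * p
assert seq_prob == Fraction(8, 243)

# Exactly 3 heads in 5 tosses: C(5, 3) = 10 sequences, each 8/243.
assert binomial_pdf(5, p, 3) == 10 * seq_prob == Fraction(80, 243)

# The PDF sums to 1 over i = 0..n, so the CDF at n is 1.
assert binomial_cdf(5, p, 5) == 1
```

Swapping `Fraction(2, 3)` for a float gives the same formulas with approximate arithmetic, which is what you would typically do for large n.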
