# Lecture 12: Discrete vs. Continuous, the Uniform | Statistics 110

So, the main topic for the next couple of lectures is continuous distributions. We've learned about the binomial, and the Poisson, and the hypergeometric, and so on, and at this point we've covered all of the famous discrete distributions that we need in this course. So now is a good time to start talking about the continuous distributions. I like to do discrete before continuous, because conceptually it's simpler to think about discrete. But that doesn't mean continuous is harder, necessarily. Discrete is conceptually easier in a sense, but on the other hand, we have all these nasty sums that come up, and so we learn some ways to sometimes avoid the sums using stories and so on, but sometimes you just have a sum you can't deal with. In the continuous case, we'll be doing integrals instead of sums, and even though this sounds counterintuitive, in general it's easier to do an integral than a sum. The same issue could still come up, though: we could be faced with integrals we don't know how to do. So again, we're gonna try to look for more clever, more conceptual ways to avoid having to do lots and lots of integration. But anyway, we'll come to that later. A lot of the ideas are completely analogous.

So, at this point, I'm assuming you have a pretty good understanding of what a PMF is, what a discrete distribution is and what it really means, and the expected value of a discrete distribution, and now we're just gonna move into the continuous case. Just for having a big picture on this, it helps to contrast the two things, so I'm gonna make a kind of dictionary of the discrete world and the continuous world. We can put the discrete world over here and the continuous world over here. So we have a random variable that we're looking at; usually we've been calling our random variable X in the discrete case, and usually we'll call it X in the continuous case too. So far it's completely analogous: we've got discrete, continuous. Now in the discrete case, as you're very familiar with by now, we have a PMF, which you can just think of as P(X = x), viewed as a function of little x.

So if X takes positive integer values, then I would need to specify this for all positive integers x. In the continuous case, though, P(X = x) = 0, so in that case we have a PDF instead, which usually we would write as f(x), but you can call it whatever you want. I'll call it f sub X of x just to emphasize that this is the PDF of X. I'm gonna tell you what a PDF is, but I'm telling you now that it's analogous to a PMF. The reason we need it is that P(X = x) = 0. Continuous means we're thinking of random variables that could take on any real value, or maybe any real number in some interval. So say we had the interval from zero to one, and X is allowed to take on any real value between zero and one. Well, we could make up examples where this is not true, but in the continuous case there are uncountably many real numbers between zero and one, and any specific number, like pi over four, has probability zero. So if we just tried to write down a PMF, we would just say it's zero, and that would be useless. That's why we need something else instead. So, I'll tell you what a PDF is, but that's the analogy. Let me just continue this dictionary a little more, and then I'll start telling you more about what PDFs are.

We have a CDF. That's the function F(x) = P(X ≤ x), and sometimes we'll subscript it as F sub X, because if we add another random variable Y, we could write F sub Y for its CDF, okay? Well, in the continuous case we have a CDF too, exactly the same thing. So that's one advantage, and we've seen in the discrete case that usually it's easier to deal with the PMF than with the CDF. The CDF in the discrete case is a step function with all these jumps; it's not so easy to deal with, and the PMF is much more direct. But one virtue of CDFs is that a CDF is completely general: every random variable has a CDF, so we don't need to separate out the theory.

Now, let's talk about PDFs; so far this side of the dictionary has the PMF. The PDF is the most common way to specify a continuous distribution. PDF stands for probability density function — not portable document format, probability density function. Okay, so the key word here is density. The common mistake with PDFs is to think that they're probabilities. A PDF is not a probability, it's a probability density. You can think of density in an intuitive sense like this: think of probability as mass. Remember the pebbles, with total mass equal to one? In the continuous case we can't think of pebbles anymore; it's more like we have this mass of mud that we're smearing around the space. So I think of discrete as pebbles, continuous as mud. The total mass of the mud is one, and density makes you think of mass per volume, mass per area, mass per length, things like that. So it's probability per something, but not probability.

So here's a definition. A random variable X has PDF f(x) if, in order to find probabilities for X, we can integrate the PDF. That is, the probability that X is between a and b — that X is in some interval [a, b] — must be given by the integral from a to b of f(x) dx, for all a and b. So f(x) is not a probability, it's what you integrate to get a probability: integrate density, and you get a probability. That's the definition, and let's see how this relates to the CDF and other things. Notice, by the way, that if we let a = b, then we're integrating from a to a of f(x) dx. That's the area under the curve from a to a, which is zero, cuz you haven't actually specified an interval of positive length. Which agrees with what I said there: the probability of any specific point is zero. We need an interval of non-zero length.

Okay, so that's called a PDF. To be valid, remember, I said that a PMF is valid if the values are non-negative and they sum to one, right? So by analogy, for a PDF we want it to be non-negative, and rather than summing to one, it should integrate to one. So to be valid, f(x) must be greater than or equal to 0, and the integral of f(x) from minus infinity to infinity should equal 1. Otherwise, we have not specified a valid PDF.
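Since the two validity conditions are just "non-negative" and "integrates to one," they are easy to check numerically. A minimal sketch in Python — the density f(x) = 2x on [0, 1] is my own example, not one from the lecture:

```python
# Check the two PDF validity conditions numerically for a
# candidate density f(x) = 2x on [0, 1] (and 0 elsewhere).
# (This example density is my choice, not one from the lecture.)

def f(x):
    return 2 * x if 0 <= x <= 1 else 0.0

# Condition 1: f(x) >= 0 everywhere (spot-check a grid of points).
grid = [i / 1000 - 0.5 for i in range(2001)]   # points in [-0.5, 1.5]
assert all(f(x) >= 0 for x in grid)

# Condition 2: the total area under f is 1 (midpoint Riemann sum).
n = 100_000
dx = 1.0 / n
total = sum(f((i + 0.5) * dx) * dx for i in range(n))
print(total)  # ≈ 1.0
```

Note that this density exceeds 1 near x = 1, which is fine: only the total area has to be 1.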

So, just to draw a picture, a PDF might look something like this. A famous example would be the bell curve type of thing that we'll get to later, but for the purpose of the picture I don't really care exactly what the definition of the function is; I'm just drawing some curve from minus infinity to infinity. It might be 0 on the negative side and only positive to the right, or whatever, but it's some continuous-looking curve. And if I shaded the whole area under this curve, the total area would be 1, right? I drew a symmetric one, but it doesn't have to be symmetric; it could be some nasty-looking curve. As long as it's non-negative and the area under the curve is 1, those are the requirements.

Now let's interpret a little more what the density really means, cuz I said it's not a probability. If we take f evaluated at some point x0, what is that really like? If we take some point x0 here and we say the density is this number, what does that mean? It's possible that this number is greater than 1, for example, because you can have a function that is sometimes greater than 1 whose integral is still 1, right? So we can't say it's a probability. But what we can say is this: it's a density, so think of it as probability per unit of length. Then if we multiply by some small number epsilon, f(x0) times epsilon is approximately the probability that X falls in an interval of length epsilon around x0 — say the interval from x0 − epsilon/2 to x0 + epsilon/2. So all I did was take x0. The probability of the random variable exactly equaling x0 is 0, okay? But if we take some tiny interval around x0 — for epsilon very small, I just wrote down an interval of length epsilon — then the probability is approximately the density times the length of that interval. By multiplying by this epsilon, we're converting it back into a probability scale instead of a density scale.
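To see the density-times-length idea concretely, here is a sketch with a made-up density f(x) = 3x² on [0, 1] (my choice of example; its CDF is x³, so exact interval probabilities are easy to compare against):

```python
# Density * length approximates the probability of a tiny interval.
# Example density f(x) = 3x^2 on [0, 1]; its CDF is F(x) = x^3,
# so the exact probability of any interval is available.

def f(x):
    return 3 * x**2 if 0 <= x <= 1 else 0.0

def F(x):          # CDF: integral of f from minus infinity to x
    return min(max(x, 0.0), 1.0) ** 3

x0, eps = 0.5, 1e-3
exact = F(x0 + eps / 2) - F(x0 - eps / 2)   # P(X in the interval)
approx = f(x0) * eps                        # density * length

print(exact, approx)
```

The two numbers agree to many decimal places, and the agreement gets better as epsilon shrinks.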

That's a good intuitive way to think of a density, but I haven't yet shown why it's equivalent to the mathematical definition I wrote down. To see why it's true, just stare at the definition: if we wanna find this probability, what do we do? We integrate the PDF from here to here, right? So imagine that integral from x0 − epsilon/2 to x0 + epsilon/2, and let's think about what it would be. Well, I didn't just say epsilon was small, I said epsilon is very small — I could have said very, very small. If epsilon is very, very, very small, then over that tiny little interval, f is not gonna change very much. So over that minuscule interval, we can treat the function as approximately a constant. And it's easy to integrate a constant: the integral of a constant is just the constant times the length of the interval. That's all we did — we're treating f as approximately the constant f(x0) on that interval, times the length of the interval. So that's why the approximation follows from the definition. The definition is more useful for deriving things, but this gives you some more intuition about the difference between a probability density and a probability.

Okay, so let's see how the PDF is related to the CDF. If X has PDF little f, let's find the CDF. Well, by definition, the CDF is the probability that X is less than or equal to little x. And I said the definition of a PDF is that it's the thing you integrate to get probability, right? So if I wanna know the probability that X is in any region, all I do is integrate the PDF over that region. Here, I simply integrate from minus infinity to x. I could call the integrand f(x) dx, but it's a little clearer to change the letter, so f(t) dt; t is just a dummy variable, and I just didn't want it to clash with this x. That is, for any particular number x, we take this curve — let's say x is here — and just look at the area under the curve up to that point. That gives us the CDF at that point, all right? Because we wanna know the probability of everything to the left, and probability is given by taking area under this curve. So it's just the area under the curve from minus infinity up to x. That's all we're doing. So that shows how to get from a PDF to a CDF, okay?
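The "area up to x" recipe can be sketched numerically. I'm reusing the example density f(x) = 3x² on [0, 1] (again my own choice), whose CDF should come out to x³:

```python
# The CDF is the running integral of the PDF.
# Example density f(x) = 3x^2 on [0, 1]; its CDF should be x^3.

def f(t):
    return 3 * t**2 if 0 <= t <= 1 else 0.0

def cdf(x, n=10_000):
    # Integrate f from minus infinity up to x; f is 0 below 0,
    # so the Riemann sum can start at 0.
    if x <= 0:
        return 0.0
    dt = x / n
    return sum(f((i + 0.5) * dt) * dt for i in range(n))

print(cdf(0.7))   # should be close to 0.7**3 = 0.343
```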

Well, what about the other way around: if we have a CDF, how do we get the PDF? So suppose X has CDF capital F, and of course we're assuming it's a continuous random variable, not a discrete one. In the continuous case, by the way, the terminology could be slightly confusing, because when we say we have a continuous distribution, it does mean capital F is continuous — but we don't just want it to be continuous, we want it to be differentiable. So "continuous" refers not so much to F being a continuous function; it refers to the fact that X can take on a whole continuum of values, rather than just discrete values, okay? So if X has CDF F and X is a continuous random variable, and we want to get from the CDF to the PDF, then f(x) equals what? Let's think about that. The relationship between a CDF and a PDF is that F(x) is the integral of f(t) dt from minus infinity to x, okay? And now I wanna say: if we know this integral, how can we extract f? Well, the answer is just take the derivative: f(x) = F'(x). And why is that true? Your favorite theorem of calculus — the fundamental theorem of calculus, FTC. Actually, we're gonna need both parts of the fundamental theorem of calculus, so it's nice that it actually is pretty fundamental. At least the way I learned it, part one says that if you have an integral like this, up to an indeterminate upper limit, and you take the derivative of that, you just get back the function you were integrating. That's the first part. The second part says that if you wanna do a definite integral, you find an antiderivative and then evaluate it at the two endpoints. So, okay, this is just saying the derivative of the CDF is the PDF, in the continuous case.
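Going the other way can be sketched with a centered difference quotient standing in for the derivative — again with the example CDF F(x) = x³ on [0, 1] (my choice), so F'(x) should be 3x²:

```python
# Differentiate the CDF to recover the PDF, here approximated
# by a centered difference quotient.
# Example CDF F(x) = x^3 on [0, 1], so F'(x) should be 3x^2.

def F(x):
    return min(max(x, 0.0), 1.0) ** 3

def pdf_from_cdf(x, h=1e-5):
    return (F(x + h) - F(x - h)) / (2 * h)

x = 0.4
print(pdf_from_cdf(x), 3 * x**2)   # both ≈ 0.48
```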

So it's a very straightforward relationship between them. And this also confirms something we did earlier. Let's say we wanna know the probability that X is between a and b. In the discrete case it's crucial whether you write less than or less than or equal, and so on; in the continuous case, it makes no difference whether the inequalities are strict or not. According to the definition of a PDF, if we wanna get the probability that X is in that interval, all we do is integrate the PDF from a to b. But another way to think about this: remember your fundamental theorem of calculus — and the notation matches up pretty well too, because, like in AP calculus, if you have a function little f, usually you call its antiderivative capital F, which is exactly what we're doing here. If we wanna do this integral, we take some antiderivative. Well, we already have one: the CDF. Then we evaluate at b and at a, so the probability is just F(b) − F(a). So that's also true by the fundamental theorem of calculus.
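Both routes to P(a < X < b) — integrating the PDF, or taking F(b) − F(a) — can be checked against each other, once more with the example density f(x) = 3x², F(x) = x³ (my choice):

```python
# P(a < X < b) two ways: integrate the PDF from a to b,
# or take F(b) - F(a). Example density f(x) = 3x^2, F(x) = x^3.

def f(x):
    return 3 * x**2 if 0 <= x <= 1 else 0.0

def F(x):
    return min(max(x, 0.0), 1.0) ** 3

a, b, n = 0.2, 0.9, 100_000
dx = (b - a) / n
by_integral = sum(f(a + (i + 0.5) * dx) * dx for i in range(n))
by_cdf = F(b) - F(a)    # 0.9**3 - 0.2**3 = 0.721

print(by_integral, by_cdf)
```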

And that's similar to a result we had earlier for CDFs, so it's consistent with earlier stuff, okay? We'll do some examples in a little while, but right now this is just the general framework, making the analogy, okay? So we have a CDF on both sides of the dictionary, and I'll just add here, to have it in the dictionary too, that the PDF is the derivative of the CDF. Question? >> [INAUDIBLE]

>> Yeah, and the question is: in this framework, is big F always differentiable? Yeah, we have to assume that it's differentiable. I mean, there are functions that are continuous but not differentiable everywhere, but in that case it would just be a more complicated thing. When we say continuous random variable in this course, it means we have a CDF which has a derivative. Because if we don't have a PDF, then we're not dealing with continuous distributions, and things can be much nastier. So yeah, we're assuming this derivative exists.

Okay, so in general, if I ask you to find the distribution of something: in the discrete case, you can give either the PMF or the CDF; those are equally valid ways to describe a distribution. In the continuous case, you can give the PDF or the CDF; those are equally valid ways.

Okay, so let's continue this list. In the discrete case, we have the expected value, right? And remember, for the expected value we just take the sum of the values times the probabilities of the values, okay? In the continuous case, that sum would just be 0, because all of those probabilities are 0, so that's not useful. But by analogy, instead of a sum, we'll do an integral. So the definition of the expected value in the continuous case is that we integrate x times the PDF. It's completely analogous. In general we're gonna integrate from minus infinity to infinity. Sometimes we'll deal with random variables where the only possible values are, say, between 0 and 1, and in that case we're just integrating 0 outside that interval, so we would restrict to the region where the PDF is non-zero. But in general, that's the definition, okay? So that's completely analogous.

Let's do one more concept that applies in both the discrete and continuous cases, and that's the notion of variance. We've been talking about expected values, but that's just a one-number summary of the average, right? It's not telling us anything about the spread of the distribution. How spread out is it? For that, we need the idea of variance. Intuitively, variance is supposed to be a measure of how spread out the distribution is; that is, on average, how far is X from its mean? So we might start by trying to compute the expected value of X minus the expected value of X: here E(X) is the mean, and X − E(X) is the difference between X and its mean. But if we just did this, we would always get 0. Because by linearity, that's E(X) − E(E(X)), and E(E(X)) is just E(X), because E(X) is a constant. So this would not be useful, because it would just be zero.

Okay, so then the most obvious thing to do to fix that problem is to put in absolute value signs, because then we're making it non-negative, and it won't be 0 anymore, except if X is a constant. But absolute values are annoying to deal with. For example, the absolute value function is this V-shaped thing, right? It has a sharp corner, it's not differentiable; it's difficult to work with. So the standard way to deal with this is, instead of absolute values, to square it. One reason, as I said, is that the absolute value is just annoying because it's not differentiable. A deeper reason, though, is that anytime you see squares, it should start to remind you of the Pythagorean theorem, right? There's a lot of beautiful geometry that goes on with squares, and sums of squares, and right triangles, and Euclidean distance, and things like that, and you lose that geometry if you're using absolute values. And there are other reasons as well.

But anyway, this is the standard definition of variance: Var(X) = E((X − E(X))²). So this is, on average, how far X is from its mean, except that we're squaring it. One annoying thing about squaring, though, is that we've changed the units. If X is a measurement in, let's say, miles, and we square it, we've got miles squared, okay? That's no longer in the same units as what we started with. Because of that, something more interpretable is the standard deviation, which is a familiar term: the standard deviation is defined as the square root of the variance. This seems, at first, like a kind of convoluted thing to be doing — first we square everything, then we take the average, then we square-root it back again. The reason is that the variance has really nice properties, but on the other hand we changed the units, so we just change them back at the end. That's the definition of standard deviation. In general, variance is a lot nicer to work with than standard deviation as far as doing the math. But at the end of the day, when you want something interpretable, it's easier to think about what the standard deviation means, because you're back in the original units.

One nice thing about this E notation — it's really good notation, E for expectation — is that I could write down this one definition without assuming that X is continuous or discrete or anything. It's a unified definition; I didn't need separate definitions for the discrete and the continuous case.

Now let me write another way to compute variance, which is more commonly used than the definition itself. The definition is the usual one, but the other way, which I'm about to show you, is usually easier for computing — not always; sometimes the definition is easier. So we want the variance of X; let's just expand the thing out. Squaring, (X − E(X))² = X² − 2X·E(X) + (E(X))². Now let's use linearity. The first term gives E(X²). For the middle term, the 2 is a constant, and constants come out; E(X) is also a constant — X is a random variable, but E(X) is just a number — so 2E(X) comes out, and what's left inside is still an E(X). So the middle term is −2E(X)·E(X). And the last term, (E(X))², is also a constant, so taking its expected value does nothing. So the whole thing becomes E(X²) − 2(E(X))² + (E(X))², which is E(X²) − (E(X))². It sounds like what I just said was 0, but the parentheses are different: in the first term we square first, then take the average; in the second we take the average, then square it. We take that difference, okay? So that's usually easier.

And that answers an age-old question. This question came up for me, I think, in seventh-grade science class, where I had to do a bunch of experiments and I got a bunch of numbers. For some reason I was squaring the numbers and I wanted the average, and I didn't know whether I should square first and then average, or average first and then square. I think I computed it both ways and got slightly different answers. Which one is correct? Well, this identity doesn't say which one is correct, but it does say that E(X²) is always greater than or equal to (E(X))², with equality only when X is a constant. If X is a constant, then the variance is 0, cuz X just equals its mean, obviously. If X is not a constant, then in E((X − E(X))²) you're averaging numbers that may sometimes be 0 but are certainly sometimes positive, and you can't average things like that and get 0 — or a negative number. So the variance would be strictly positive, which means E(X²) is strictly greater than (E(X))², except in the case of a constant. So, okay, that's the variance.

it’s standard to write E(X) squared for E(X) squared this way. That’s just standard notation. So, if you see E(X) squared, you should

always interpret that as squaring first, and then take the E. That’s just a convention,

a pretty standard convention. This way is a little more clearer,

to avoid any possible ambiguity. But, it’s very common to

see it written this way. So interpret it as squaring first. Okay so that’s variance, and over here

we can continue our little dictionary, variance of X=E (X squared)-

E (X) squared the other way, And then the continuous case,

same thing again. And the one difficulty with this is,

we’ve been talking on how do we compute E(X), but

how do we actually compute E(X) squared, that’s the question that

we need to address. How do you actually compute that thing? So we’ll talk about that a little later. But first, we should see at least one

But first, we should see at least one example of a continuous distribution, and the simplest one to start with is called the uniform. As far as what you'll need before the midterm, there are only two continuous distributions you need to know by name: one is the uniform, the other is the normal, and then we'll do more later. The uniform is the simplest continuous distribution, so we'll start with it right now. The normal distribution we'll talk about mostly next week; it's the most famous and important distribution in all of statistics, and the reasons why it's so important will gradually emerge over the semester.

Let's start with the uniform. So here's the uniform distribution on some interval (a, b). We have some interval from a to b — I'll say here's a, here's b — and we wanna pick a "random" point in this interval; I'll put random in quotes. How do we do that? The question is, what does random mean? Intuitively, "random" is too vague, because that just means we have some random variable, okay? What if we said completely random — like, what's the most random it could be? Again, that's a little bit vague, but let's explore the intuition a little and then write down a formula. If it's completely random, I could say the probability of any two points is the same — but every individual real number between here and here has probability 0, so it's not so interesting to say all those probabilities are the same. Pick some random point, say there, x: the probability of getting that exact value is 0. So we still haven't said what it means to be completely random.

Here's the intuition: suppose we broke this interval into two halves, where this is the midpoint, say. Intuitively, if it's completely random, this half should be equally likely as that half. Cuz if it were not, then it seems like the random variable prefers to be more to the right than to the left, and we want a concept where it doesn't care where it is, right? In other words, for the uniform, probability is proportional to length. That's a reasonable definition: if we take two intervals of the same length, they should have the same probability, and if one interval is twice as long, it seems reasonable that it should be twice as likely. So we're just gonna write down a continuous distribution where probability is proportional to length.

And to specify this, we can either write down the PDF and derive the CDF, or figure out what the CDF should be and derive the PDF. Let's start with the PDF, because we're trying to practice PDFs. So here's the PDF: f(x) = c if x is between a and b, and 0 otherwise — because I want the probability to be 0 outside the interval, and inside the interval I want the density to be constant, because if the density were higher at one point than another, that wouldn't seem very uniform. Of course, we should ask: what's c? Well, the integral of the PDF has to be 1. I could start by integrating from minus infinity to infinity, but we only need to integrate from a to b, cuz it's zero outside of there. Integrating the constant c from a to b gives c(b − a), and setting that equal to 1 gives c = 1/(b − a). So c is just one over the length of the interval. It has to be this way; otherwise this would not be a valid PDF.

Now suppose we want the CDF. To get the CDF, we just integrate the PDF from minus infinity up to x. How do we do that? Again, we don't really have to go all the way from minus infinity; we can start at a, since the PDF is 0 below a, and then we have to consider some cases. If x is less than a, the probability is 0, so the CDF has to be 0. If x is greater than b, the CDF is 1, because we know for sure that X is less than b. The interesting case is what happens in the middle. There, all we have to do is integrate a constant: plugging in f(t) = c, the integral from a to x is c(x − a), which is (x − a)/(b − a). So here's the CDF: F(x) = 0 if x < a; F(x) = (x − a)/(b − a) if a ≤ x ≤ b; and F(x) = 1 if x > b. Notice this makes sense, because if we let x equal a, it reduces to 0, and if we let x equal b, it reduces to 1. So this is a continuous function.
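The Unif(a, b) PDF and CDF just derived can be written out directly as a small sketch:

```python
# The Unif(a, b) PDF and CDF, following the derivation:
# constant density 1/(b - a) inside the interval, and a CDF
# rising linearly from 0 at a to 1 at b.

def unif_pdf(x, a, b):
    return 1.0 / (b - a) if a <= x <= b else 0.0

def unif_cdf(x, a, b):
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

a, b = 2.0, 5.0
print(unif_cdf(2.0, a, b), unif_cdf(3.5, a, b), unif_cdf(5.0, a, b))
# 0.0 at a, 0.5 at the midpoint, 1.0 at b
```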

Intuitively, the CDF in the middle is a linear function of x: as you increase x, the probability increases linearly, which makes sense, cuz you're accumulating more and more stuff.

So let's get the expected value of X. Again, it's just an easy integral, because we just have to integrate x times the PDF from a to b. Integrating x is easy, it's just x²/2, so we get x² over 2(b − a), evaluated as x goes from a to b. Let's factor out the 1 over 2(b − a); then we have b² − a². But b² − a² = (b − a)(b + a), so we can cancel the b − a, and we just get (a + b)/2. That's a very intuitive answer: it's just the midpoint. It says the average is in the middle, and it would be really weird if that didn't happen, cuz this is supposed to be uniform. Okay, so that was just a check.

Now we have a bit of a quandary, though, for how to deal with the variance. If we want the variance, we need E(X²), because we know the (EX)² part but not the E(X²) part. So how do we get E(X²)? Well, X² is a random variable; let's call it Y, so Y = X² — if we take a function of a random variable, it's a random variable. So E(X²) = E(Y). And how do we get E(Y)? Well, assuming X is continuous, to get E(Y) we'd need to know the PDF of Y, and then we'd integrate y times the PDF of Y. So the question is, do we need the PDF of Y? That sounds kind of annoying, because we don't know the PDF of Y. We can get it, and later in the course we will talk about how, but right now that seems like a pretty annoying problem.

So let's do this more carelessly instead. Let's just say it's too much hassle to get the PDF of Y; instead I'm gonna reason by analogy. I'm looking at this formula for E(X), but I don't want E(X), I want E(X²). So I'm just gonna change the x to an x², and then still use f(x) dx — that's the PDF of X, that's what I know. I'm too lazy to find the PDF of Y, so I'll just change x to x². Well, that doesn't sound very legitimate.

of the Unconscious Statistician. Which has a nice acronym

that’s just LOTUS. It’s called that because that just

seems like if you’re kind of like half asleep and

you just want to find this thing and you just kind of replace X by X

squared because X squared and it seems like something you might do

if you’re not thinking very hard. So to state it in general

in the continuous case, we want the expected value

of a function of that. X is a random variable who’s PDF we know. We want the expected

value of a function of X. So, the Principled Approach would be, find the distribution of this and

then work with that. The Lazy Approach would be,

still use the distribution of X but that sounds kind of too good to be true. So the Lazy Approach here would be well, I’m gonna take g of X I am

gonna change big X to little x. And then I am still gonna need

insist on using the density of x and not convert anything. Well, this turns out to be true. So I’ll put a box around it. We can talk sometime next week

about the proof, why this is true. But this turns out to be true. And thus, even though it sounds too

good to be true, it actually is true. So that’s called LOTUS. This is the continuous version. In the discrete let me

write both versions. So a continuous LOTUS is

that thing I just wrote, we have LOTUS so

same equation, you can copy that there. And let me just write the discrete case: again, we want the expected value

of some function g of X, so all I’m gonna do is take this. This is the definition

of the expected value. All I’m gonna do is change X to g of x. So this is gonna be g of

x times the PMF of x. It says we don’t need to convert and

get a distribution for g of x. We just do that. This is also valid. We’ll talk more about why later. But it’s useful to know that right now.
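To see the discrete version in action, here’s a small sketch in Python (my example, not from the lecture), using a fair six-sided die whose PMF we know. LOTUS says we can compute E of g of X by summing g of x times the PMF, without ever finding the distribution of g of X:

```python
from fractions import Fraction

# PMF of a fair six-sided die: P(X = k) = 1/6 for k = 1, ..., 6
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

# Discrete LOTUS: E[g(X)] = sum over x of g(x) * P(X = x)
def lotus(g, pmf):
    return sum(g(x) * p for x, p in pmf.items())

e_x = lotus(lambda x: x, pmf)        # E[X] = 7/2
e_x2 = lotus(lambda x: x ** 2, pmf)  # E[X^2] = 91/6
print(e_x, e_x2, e_x2 - e_x ** 2)    # the variance comes out to 35/12
```

Note that we never worked out the PMF of X squared; we just reused the PMF of X.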

So coming back to this problem about the uniform: if we want the variance of the uniform, just for simplicity let’s

let u be uniform between 0 and 1. And suppose we want the variance. So we know the expected value of u,

is one-half, just the midpoint. And if we want E of u squared, according to LOTUS, we don’t need

to first find the PDF of u squared. We can just directly write down

the integral 0 to 1 of u squared times the PDF f

sub u of u, du. But this PDF is actually equal to a constant

and that constant is 1 in this case. So this is just equal to,

this part is just one. So it’s the integral of u squared, du: u cubed over 3, which is 1/3. So therefore the variance

of u equals E of u squared, minus the square of E of u. And that’s one-third minus

one-quarter, which equals one twelfth. So the variance of a uniform

zero one is one twelfth, and that was a very easy calculation

because we were able to use LOTUS here, which we haven’t proven yet, but

we will talk more about that later. I’m showing you how to use it right now,

then we’ll justify it more. So that thing that’s too good

to be true actually works. So that’s LOTUS.
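That one twelfth is easy to sanity-check numerically. Here’s a quick sketch (mine, not from the lecture) that approximates the LOTUS integral for E of u squared by a Riemann sum and also by simulation:

```python
import random

# LOTUS for U ~ Unif(0, 1): E[U^2] = integral from 0 to 1 of u^2 * 1 du,
# approximated by a midpoint Riemann sum with n subintervals.
n = 100_000
e_u2 = sum(((k + 0.5) / n) ** 2 for k in range(n)) / n  # close to 1/3

# The same expectation by simulation: average U^2 over many uniform draws.
random.seed(0)
e_u2_mc = sum(random.random() ** 2 for _ in range(n)) / n

# Var(U) = E[U^2] - (E[U])^2 = 1/3 - 1/4 = 1/12
var_u = e_u2 - 0.5 ** 2
print(e_u2, e_u2_mc, var_u)  # both expectations near 1/3, variance near 1/12
```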

One more thing about the uniform distribution. It seems like the uniform is the simplest continuous distribution that

you could possibly imagine, because the PDF is just a constant on some interval. And one other point about this is we have

to have some bounded interval here. We cannot define a uniform

distribution on the entire real line. Sometimes it’s a bit

annoying that there isn’t one. But on the whole real line there

would be no way to normalize it, there’d be no way to find a constant and

make it integrate to one. So it sounds like this is

an extremely simple distribution. And it is,

it’s just constant PDF on some interval. Extremely easy. So start with the uniform zero one,

it seems very simple, but actually uniform zero one

has the property that if you give me one uniform random variable

and you’re interested in some other distribution, there is a way to

convert it and simulate that. That is, from the uniform

zero one you can simulate or generate from any distribution

no matter how complicated it is. At least in principle. As a matter of computation that may be

easy or hard, but in principle from the uniform you can get anything, so

I call that universality of the uniform. Universality of the uniform means

that given a uniform you can create any distribution that you want. So that’s kind of theoretically nice

in that it kind of unifies concepts and says this thing that seems very,

very simple, just one uniform. You can actually use it to generate

something that’s as complicated as you want. That’s kind of cool but it’s also useful

in practice, where most computer programs can generate random numbers between zero

and one (it’s actually pseudorandom), but they may not know how to generate

whatever complicated distribution you’re interested in. And this, in many cases, gives you a way to convert from the random uniforms

to whatever you want to simulate. So I want to show why that’s true. So the statement is that,

we’re gonna start with the uniform between 0 and 1, and let F be a CDF that we’re interested in. So usually we’ve been talking about, here’s

the random variable, and then find its CDF. Here we’re going the other way

in the sense that we assume that we have some CDF

that’s of interest to us. But we do not yet have access to

a random variable that has that CDF. So let F be a CDF and

it’s possible to generalize this further. But to make this something that we can

do fairly quickly let’s assume that F is strictly increasing, so

we don’t have to deal with flat regions. And let’s also assume that F

is continuous as a function. Just so that we don’t have to

think about jumps right now, although you can generalize this. Now the theorem says: define X to be F inverse of u. The inverse function exists in this

case because I took something that was continuous and strictly increasing,

it will have an inverse. So we take the inverse and we plug in u. Then the statement is that X

is distributed according to F. That is the CDF of X is F. So what this says is we have

this CDF we’re interested in. We take the inverse CDF, plug in

the uniform, and then we’ve constructed a random draw from that distribution

we’re interested in, capital F. So let’s prove this very quickly. And the proof doesn’t require

anything fancy at all. It doesn’t require anything, except for

understanding what a CDF is. So another reason I like to talk about

this is it’s just good practice with really understanding what a CDF is. Cuz the better you understand CDFs, then

the easier it is to see why this is true. So to prove this, all we need to

do is to compute the CDF of X. This notation means that X has the CDF F,

that is, X follows this distribution. So all we have to do is compute the CDF

of X, but that’s actually pretty easy. Because by definition, X is F inverse

of u, I’m just plugging in what X is. Now let’s apply capital F to both sides. So I am just putting F here and F here. And because I made

these nice assumptions about F, that’s equivalent to:

u is less than or equal to F of x. You know, if we didn’t have

an increasing function, then, say, if I multiplied both sides by minus one, then

the inequality flips, things like that. But since we have an increasing

function, it’s preserved. And since it’s invertible, this is really the same event

just written in a different way. Now we are done with that, because what is

the probability that u is less than or equal to F of x? I’ll just draw a simple little picture. U is uniform from 0 to 1, and F of x,

remember that’s a probability, so that’s just some number between 0 and

1, let’s say it’s there, F of x. Now I said that probability is

proportional to length for a uniform and in this case that proportionality

constant is just 1, because the length of the whole interval is 1. So for the uniform on 0 to 1, the probability

of an interval is its length. So we want to know, what’s the probability

that u is between here and here? That’s just the length of

the interval, which is F of x. So the CDF of X is F, and that’s the end. That’s the end of the lecture. So have a good weekend. Thanks.
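To make universality concrete, here’s a closing sketch (mine, not from the lecture). Take F to be the Exponential(1) CDF, F(x) = 1 - e^(-x), which is continuous and strictly increasing on the positive reals. Then F inverse of u is -log(1 - u), and plugging Uniform(0, 1) draws into it generates exponential draws:

```python
import math
import random

# F is the Exponential(1) CDF: F(x) = 1 - e^(-x) for x > 0,
# so F_inverse(u) = -log(1 - u). Then X = F_inverse(U) should follow F.
def f_inverse(u):
    return -math.log(1.0 - u)

random.seed(0)
xs = [f_inverse(random.random()) for _ in range(200_000)]

# Compare the empirical CDF of the simulated draws with F at a few points.
for t in (0.5, 1.0, 2.0):
    empirical = sum(x <= t for x in xs) / len(xs)
    print(t, empirical, 1 - math.exp(-t))  # the two columns should nearly match
```

The empirical CDF of the simulated values matches F, which is exactly the claim that X = F inverse of U is distributed according to F.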