Lecture 12: Discrete vs. Continuous, the Uniform | Statistics 110

# Lecture 12: Discrete vs. Continuous, the Uniform | Statistics 110

So, the main topic for the next couple of lectures is continuous distributions. We've learned about the binomial, the Poisson, the hypergeometric, and so on, and at this point we've covered all of the famous discrete distributions that we need in this course. So now is a good time to start talking about continuous distributions. I like to do discrete before continuous, because conceptually it's simpler to think about discrete. But that doesn't necessarily mean continuous is harder. Discrete is conceptually easier in a sense, but on the other hand, we have all these nasty sums that come up. We learned some ways to avoid the sums using stories and so on, but sometimes you just have a sum you can't deal with. In the continuous case we'll be doing integrals instead of sums, and even though this sounds counterintuitive, in general it's easier to do an integral than a sum. The same thing could still come up, though: we could be faced with integrals we don't know how to do. So again, we're gonna look for more clever, more conceptual ways to avoid having to do lots and lots of integration. But anyway, we'll come to that later.

A lot of the ideas are completely analogous. At this point, I'm assuming you have a pretty good understanding of what a PMF is, what a discrete distribution is and what it really means, and what the expected value of a discrete distribution is. Now we're just gonna move into the continuous case. To have the big picture on this, it helps to contrast the two things. So I'm gonna make kind of a dictionary of the discrete world and the continuous world.
We can put the discrete world over here and the continuous world over here. We have a random variable that we're looking at; usually we've been calling it $X$ in the discrete case, and usually we'll call it $X$ in the continuous case too. So far it's completely analogous.

In the discrete case, as you're very familiar with by now, we have a PMF, which you can just think of as $P(X = x)$ viewed as a function of little $x$. If $X$ takes positive integer values, then I would need to specify this for all positive integers $x$. In the continuous case, $P(X = x) = 0$, so instead we have a PDF, which usually we would write as $f(x)$, but you can call it whatever you want. I'll call it $f_X(x)$ just to emphasize that this is the PDF of $X$. I'm gonna tell you what a PDF is in a moment; for now I'm just telling you that it's analogous to a PMF.

The reason we need this is that $P(X = x) = 0$. Continuous means we're thinking of random variables that could take on any real value, or maybe any real number in some interval. Say we had the interval from zero to one, and $X$ is allowed to take on any real number value between zero and one. We could make up examples where this is not true, but in the continuous case, since there are uncountably many real numbers between zero and one, any specific number like $\pi/4$ has probability zero. So if we just tried to write down a PMF, we would say it's zero everywhere, and that would be useless. That's why we need something else instead. I'll tell you what a PDF is, but that's the analogy.

Just to continue this a little more before I start telling you more about PDFs: we have a CDF. That's the function $F(x) = P(X \le x)$, and sometimes we'll subscript it as $F_X$, because if we add another random variable $Y$, we could write $F_Y$ for its CDF. In the continuous case we have a CDF too, exactly the same thing. That's one advantage. We've seen that in the discrete case it's usually easier to deal with the PMF than with the CDF: the CDF in the discrete case is a step function with all these jumps, so it's not so easy to deal with, and the PMF is much more direct. But one virtue of CDFs is that a CDF is completely general. Every random variable has a CDF, so we don't need to separate out the theory.
Now, let's talk about PDFs. The PDF is the most common way to specify a continuous distribution. PDF stands for probability density function, not portable document format: probability density function. The keyword here is density. The common mistake with PDFs is to think that they're probabilities. A PDF is not a probability, it's a probability density. You can think of density in an intuitive sense: think of probability as mass. Remember the pebbles with total mass equal to one? In the continuous case we can't think of pebbles any more; it's more like we have a mass of mud that we're smearing around the space. So I think of discrete as pebbles and continuous as mud. The total mass of the mud is one, and density makes you think of mass per volume, mass per area, mass per length, things like that. So it's probability per something, but not probability.

Here's the definition. A random variable $X$ has PDF $f(x)$ if we can find probabilities for $X$ by integrating the PDF. That is, the probability that $X$ is in some interval from $a$ to $b$ must be given by
$$P(a \le X \le b) = \int_a^b f(x)\,dx \quad \text{for all } a \le b.$$
So $f(x)$ is not a probability; it's what you integrate to get a probability. Integrate the density, and then you get a probability. That's the definition, and we'll see how it relates to the CDF and other things.

Notice, by the way, that if we let $a = b$, then we're integrating from $a$ to $a$ of $f(x)\,dx$. That's the area under the curve from $a$ to $a$, which is zero, because you haven't actually specified any interval. This agrees with what I said before: the probability of any specific point is zero. We need an interval of non-zero length.

For a PMF, remember, I said it's valid if the values are non-negative and sum to one. By analogy, we want a PDF to be non-negative, and rather than summing to one, it should integrate to one. So to be valid,
$$f(x) \ge 0 \quad \text{and} \quad \int_{-\infty}^{\infty} f(x)\,dx = 1.$$
Otherwise, we have not specified a valid PDF.

To draw a picture of an example PDF: a famous example would be the bell curve type of thing that we'll get to later. For the purpose of the picture, I don't really care exactly what the definition of this function is. I'm just drawing some curve from minus infinity to infinity. It might be 0 on the negative side and only positive to the right, or whatever, but it's some continuous-looking curve. If I shaded the whole area under this curve, the total area would be 1. I drew a symmetric one, but it doesn't have to be symmetric; it could be some nasty-looking curve. As long as it's non-negative and the area under the curve is 1, those are the requirements.
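To make the two validity conditions concrete, here is a minimal Python sketch that checks them numerically for one example density. The density $f(x) = 2x$ on $[0, 1]$ (and 0 elsewhere) is an assumption chosen for illustration, not one from the lecture; the integral is approximated with the midpoint rule, and non-negativity is only spot-checked on a grid, not proven. Note that $f(1) = 2$, so a valid density can exceed 1.

```python
def midpoint_integral(f, lo, hi, n=10_000):
    """Approximate the integral of f over [lo, hi] using the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def example_pdf(x):
    """Example density (an assumption): f(x) = 2x on [0, 1], 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


# Condition 1: f(x) >= 0, spot-checked on a grid from -0.5 to 1.5.
grid = [-0.5 + i / 100 for i in range(201)]
is_nonnegative = all(example_pdf(x) >= 0 for x in grid)

# Condition 2: total area is 1. The density is 0 outside [0, 1], so we integrate there.
total_area = midpoint_integral(example_pdf, 0.0, 1.0)

# Densities are not probabilities: this one reaches 2 at x = 1.
peak = example_pdf(1.0)

print(is_nonnegative, total_area, peak)
```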
So, to interpret a little more: what does the density really mean, since I said it's not a probability? If we take $f(x_0)$ at some point $x_0$ and we say the density is this number, what does that mean? It's possible that this number is greater than 1, for example, because you can have a function that is sometimes greater than 1 while its integral is still 1. So we can't say that's a probability. What we can say is that it's a density. If you think of it as probability per unit of length, then multiplying by some small number $\epsilon$ gives approximately the probability that $X$ falls in an interval of length $\epsilon$ around $x_0$:
$$f(x_0)\,\epsilon \approx P\left(x_0 - \tfrac{\epsilon}{2} < X < x_0 + \tfrac{\epsilon}{2}\right) \quad \text{for very small } \epsilon.$$
The probability of the random variable exactly equaling $x_0$ is 0. But if we take some tiny little interval of length $\epsilon$ around $x_0$, then the probability is approximately the density times the length of that interval. By multiplying by $\epsilon$, we're converting back to a probability scale instead of a density scale. This is a good intuitive way to think of a density.

I haven't yet shown you why this is consistent with the mathematical definition I wrote before. To see why it's true: if we want to find that probability, what do we do? We integrate the PDF from $x_0 - \epsilon/2$ to $x_0 + \epsilon/2$. I didn't just say $\epsilon$ was small, I said $\epsilon$ is very small; I could have said very, very small. If $\epsilon$ is very, very, very small, then in that tiny interval $f$ is not gonna change very much, so over that minuscule interval we can treat the function as approximately constant. It's easy to integrate a constant: the integral of a constant is just the constant times the length of the interval. That's all we did. We treated $f$ as approximately the constant $f(x_0)$ on that interval, times the length of the interval. That's why the approximation follows from the definition. The definition is more useful for deriving things, but the approximation gives you more intuition on the difference between a probability density and a probability.
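The approximation $f(x_0)\,\epsilon \approx P(x_0 - \epsilon/2 < X < x_0 + \epsilon/2)$ can be checked numerically. This sketch assumes, purely for illustration, the density $f(x) = e^{-x}$ for $x \ge 0$, whose CDF is $F(x) = 1 - e^{-x}$. The exact interval probability comes from the CDF, and its ratio to $f(x_0)\,\epsilon$ should approach 1 as $\epsilon$ shrinks.

```python
import math


def f(x):
    """Example density (an assumption): f(x) = e^{-x} for x >= 0, else 0."""
    return math.exp(-x) if x >= 0 else 0.0


def F(x):
    """CDF of that example density: F(x) = 1 - e^{-x} for x >= 0, else 0."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0


def interval_prob(x0, eps):
    """Exact P(x0 - eps/2 < X < x0 + eps/2), computed from the CDF."""
    return F(x0 + eps / 2) - F(x0 - eps / 2)


x0 = 1.0
ratios = []
for eps in (0.5, 0.1, 0.01):
    exact = interval_prob(x0, eps)
    approx = f(x0) * eps  # density times the length of the interval
    ratios.append(exact / approx)
    print(f"eps={eps}: exact={exact:.8f}, density*eps={approx:.8f}, ratio={exact / approx:.8f}")
```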
Okay, so let's see how this is related to the CDF. If $X$ has PDF $f$, let's find the CDF. By definition, the CDF is $P(X \le x)$. And by definition, a PDF is the thing that you integrate to get probability. If I wanna know the probability that $X$ is in any region, all I do is integrate the PDF over that region. So here, we simply integrate from minus infinity to $x$:
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt.$$
I could have written $f(x)\,dx$, but it's a little bit clearer to change the letter; $t$ is just a dummy variable here, and I didn't want it to clash with this $x$. That is, for any particular number $x$, we look at this curve. Say $x$ is here; then we're just looking at the area under the curve up to this point, and that gives us the CDF at that point. We want the probability of everything to the left, and probability is given by taking area under this curve, so it's just the area under the curve from minus infinity up to $x$. That's all we're doing. So that shows how to get from a PDF to a CDF.
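Here is a minimal sketch of going from a PDF to a CDF by numerically computing $F(x) = \int_{-\infty}^{x} f(t)\,dt$. It assumes the illustrative density $f(t) = 2t$ on $[0, 1]$, for which the exact CDF is $x^2$ on $[0, 1]$; since that density is 0 below 0, the integral can start at 0 instead of minus infinity. The function name `cdf_from_pdf` is hypothetical.

```python
def pdf(t):
    """Example density (an assumption): f(t) = 2t on [0, 1], 0 elsewhere."""
    return 2.0 * t if 0.0 <= t <= 1.0 else 0.0


def cdf_from_pdf(pdf, x, support_start=0.0, n=10_000):
    """F(x) = integral of pdf from -infinity to x, via the midpoint rule.

    The density is assumed to be 0 below `support_start`, so the integral
    starts there rather than at -infinity.
    """
    if x <= support_start:
        return 0.0
    h = (x - support_start) / n
    return h * sum(pdf(support_start + (i + 0.5) * h) for i in range(n))


for x in (-1.0, 0.25, 0.5, 1.0, 2.0):
    print(x, cdf_from_pdf(pdf, x))  # exact CDF here: 0 below 0, x**2 on [0, 1], 1 above 1
```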
Well, what about the other way around: if we have a CDF, how do we get the PDF? Suppose $X$ has CDF $F$, and of course we're assuming it's a continuous random variable, not a discrete one. In the continuous case, by the way, the terminology could be slightly confusing, because when we say we have a continuous distribution, it means $F$ should be continuous. But we don't just want it to be continuous, we want it to be differentiable. So "continuous" refers not so much to $F$ being a continuous function; it refers to the fact that $X$ can take on a whole continuum of values, rather than just discrete values.

So if $X$ is a continuous random variable with CDF $F$, how do we get from the CDF to the PDF? We just found the relationship $F(x) = \int_{-\infty}^{x} f(t)\,dt$ between a CDF and a PDF, and now the question is: if we know this integral, how can we extract $f$? The answer is just to take the derivative:
$$f(x) = F'(x).$$
Why is that true? Your favorite theorem of calculus, the fundamental theorem of calculus (FTC). Actually, we're gonna need both parts of the fundamental theorem of calculus, so it's nice that it is pretty fundamental. At least the way I learned it, part one of the fundamental theorem of calculus says that if you have an integral like this, up to a variable upper limit, and you take its derivative, then you just get the integrand. Part two says that if you wanna do a definite integral, you find an antiderivative and then evaluate it at the two endpoints. So this is just saying that the derivative of the CDF is the PDF, in the continuous case. It's a very straightforward relationship between them.
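Going the other way, $f(x) = F'(x)$ can be checked with a numerical derivative. This sketch assumes, for illustration, the CDF $F(x) = 1 - e^{-x}$ for $x \ge 0$, whose derivative is $e^{-x}$. A central difference with a small step $h$ (the value $10^{-5}$ is an arbitrary choice) approximates $F'(x)$.

```python
import math


def F(x):
    """Example CDF (an assumption): F(x) = 1 - e^{-x} for x >= 0, else 0."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0


def numerical_derivative(g, x, h=1e-5):
    """Central-difference approximation to g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


for x in (0.5, 1.0, 2.0):
    # The PDF recovered from the CDF, printed next to the exact density e^{-x}.
    print(x, numerical_derivative(F, x), math.exp(-x))
```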
This also confirms something we did earlier. Let's say we wanna know the probability that $X$ is between $a$ and $b$. In the discrete case it's crucial whether we write less than or less than or equal, and so on. In the continuous case, it makes no difference whether you write strict or non-strict inequalities here. According to the definition of a PDF, to get the probability that $X$ is in that interval, all we do is integrate the PDF from $a$ to $b$. Remember your fundamental theorem of calculus; the notation matches up pretty well too, because, like in AP calculus, if you have a function little $f$, usually you'll call its antiderivative capital $F$, which is exactly what we're doing here. To do this integral, we take some antiderivative. We already have one, that's the CDF. Then we evaluate it at $b$ and at $a$:
$$P(a < X < b) = \int_a^b f(x)\,dx = F(b) - F(a).$$
That's also true by the fundamental theorem of calculus. And it's similar to a result that we had earlier for CDFs, so it's consistent with earlier stuff. We'll do some examples in a little while.
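As a quick numerical check of $P(a < X < b) = \int_a^b f(x)\,dx = F(b) - F(a)$, this sketch integrates an assumed example density, $f(x) = e^{-x}$ for $x \ge 0$, over $[a, b]$ with the midpoint rule and compares the result with the CDF difference. The endpoints $a = 0.5$ and $b = 2$ are arbitrary. Whether the endpoints are included doesn't matter, since single points have probability 0.

```python
import math


def f(x):
    """Example density (an assumption): f(x) = e^{-x} for x >= 0, else 0."""
    return math.exp(-x) if x >= 0 else 0.0


def F(x):
    """Its CDF, an antiderivative of f: F(x) = 1 - e^{-x} for x >= 0, else 0."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0


def midpoint_integral(g, lo, hi, n=10_000):
    """Approximate the integral of g over [lo, hi] using the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


a, b = 0.5, 2.0
by_integral = midpoint_integral(f, a, b)  # integrate the PDF from a to b
by_cdf = F(b) - F(a)                       # evaluate the antiderivative at the endpoints
print(by_integral, by_cdf)
```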
But right now this is just the general framework, making the analogy. So we have a CDF, and I'll just add here, to have it in this dictionary too, that the PDF is the derivative of the CDF. Question?>>[INAUDIBLE]
>>Yeah, and the question is, in this framework, is big $F$ always differentiable? Yes, we have to assume that it's differentiable. There are functions that are continuous but not differentiable everywhere, but in that case it would just be a more complicated thing. When we say continuous random variable in this course, it means we have a CDF that has a derivative. If we don't have a PDF, then we're not dealing with continuous distributions, and things can be much nastier. So yes, we're assuming that this derivative exists.

In general, if I ask you to find the distribution of whatever: in the discrete case, you can give either the PMF or the CDF. Those are equally valid ways to describe a distribution. In the continuous case, you can give the PDF or the CDF; those are equally valid ways.

Okay, let's continue this list. In the discrete case, we have the expected value, and remember that for the expected value we just take the sum of the values times the probabilities of the values:
$$E(X) = \sum_x x\,P(X = x).$$
In the continuous case this sum would just be 0, because all of these probabilities are 0, so that's not useful. By analogy, instead of a sum, we'll do an integral. So the definition of the expected value in the continuous case is that we integrate $x$ times the PDF:
$$E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx.$$
It's completely analogous. In general we integrate from minus infinity to infinity. Sometimes we'll deal with random variables whose only possible values are, say, between 0 and 1, and in that case we're just integrating 0 outside of that interval, so we would restrict the integral to the region where the PDF is non-zero. But in general, that's the definition. So that's completely analogous.
Let's do one more concept that applies in both the discrete and continuous cases, and that's the notion of variance. We've been talking about expected values, but that just gives a one-number summary of the average. It doesn't tell us anything about the spread of the distribution: how spread out is it? For that, we need the idea of variance. Intuitively, variance is supposed to be a measure of how spread out the distribution is; that is, on average, how far is $X$ from its mean?

We might start by trying $E(X - EX)$. Here's the mean, and this is the difference between $X$ and its mean. But if we just did this, we would always get 0, because by linearity that's $E(X) - E(E(X))$, and $E(E(X))$ is just $E(X)$, since $E(X)$ is a constant. So this would be useless, because it would always be zero.

The most obvious fix to that problem is to put in absolute value signs, $E|X - EX|$. That makes it non-negative, so it won't be 0 anymore, except if $X$ is a constant. But absolute values are annoying to deal with. For example, the absolute value function is this V-shaped thing with a sharp corner, so it's not differentiable, and it's difficult to work with. So the standard way to deal with this is to square instead of taking absolute values:
$$\operatorname{Var}(X) = E\left[(X - EX)^2\right].$$
One reason, as I said, is that the absolute value is annoying because it's not differentiable. A deeper reason is that anytime you see squares, it should start reminding you of the Pythagorean theorem. There's a lot of beautiful geometry that goes on with squares, sums of squares, right triangles, Euclidean distance, and things like that, and you lose that geometry if you use absolute values. There are other reasons as well, but anyway, this is the standard definition of variance. It measures, on average, how far $X$ is from its mean, except that we're squaring the distance.

One annoying thing about squaring, though, is that it changes the units. If $X$ is a distance measured in miles and we square it, we get miles squared, which is no longer in the same units as what we started with. Because of that, something more interpretable is the standard deviation, which is a familiar term. The standard deviation is defined as the square root of the variance:
$$\operatorname{SD}(X) = \sqrt{\operatorname{Var}(X)}.$$
At first this seems like a convoluted thing to be doing: first we square everything, then we take the average, then we take the square root again. The reason is that the variance has really nice properties, but we changed the units, so we just change them back at the end. In general, variance is a lot nicer to work with than standard deviation as far as doing the math. But at the end of the day, when you want something interpretable, it's easier to think about what the standard deviation means, because you're back in the original units.
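To see the units point concretely, here is a sketch that computes $\operatorname{Var}(X) = E[(X - EX)^2]$ and $\operatorname{SD}(X)$ for a small discrete example, then rescales $X$ by a constant $c$, as a change of units would. The example PMF (uniform on $\{1, 2, 3, 4\}$) and $c = 2$ are assumptions for illustration. The variance picks up a factor of $c^2$, while the standard deviation picks up a factor of $c$, which is why the standard deviation stays in the original units.

```python
import math
from fractions import Fraction

# Example PMF (an assumption): X equally likely to be 1, 2, 3, or 4 (say, miles).
pmf = {x: Fraction(1, 4) for x in (1, 2, 3, 4)}


def mean(pmf):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Definition: Var(X) = E[(X - EX)^2]."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def sd(pmf):
    """SD(X) = sqrt(Var(X)), back in the original units."""
    return math.sqrt(variance(pmf))


# Change units by multiplying every value by c.
c = 2
scaled = {c * x: p for x, p in pmf.items()}

print(variance(pmf), variance(scaled))  # variance scales by c**2
print(sd(pmf), sd(scaled))              # standard deviation scales by c
```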
One nice thing about this letter $E$ notation, E for expectation, is that it's a really good notation: I could write down this one thing without assuming that $X$ is continuous or discrete or anything. It's a general definition, and I didn't need to write separate definitions for the discrete and the continuous cases. So this is a unified definition.

Now let's write another way to compute variance, which is more commonly used. The one above is the usual definition, but the other way to write it, which I'm about to show you, is usually easier for computing. Not always; sometimes the definition is easier. Let's just expand the square and multiply it out:
$$\operatorname{Var}(X) = E\left(X^2 - 2X\,E(X) + (EX)^2\right).$$
Now use linearity. The first term is $E(X^2)$. For the middle term, the 2 is a constant and constants can come out. $E(X)$ is also a constant: $X$ is a random variable, but $E(X)$ is just a number. So $2E(X)$ comes out, and what's left inside is another $E(X)$. The last term is also a constant, so taking its expected value does nothing. That gives $E(X^2) - 2(EX)^2 + (EX)^2$, so this whole thing becomes
$$\operatorname{Var}(X) = E(X^2) - (EX)^2.$$
It might sound like what I just said was 0, but the parentheses are different. In the first term we square first, then take the average; in the second term we take the average, then square it. We take that difference, and that's usually easier.

This answers the age-old question that came up for me, I think, in seventh-grade science class. I had to do a bunch of experiments, and then I got a bunch of numbers. For some reason I was squaring the numbers and I wanted the average, and I didn't know whether I should square first and then average, or average first and then square. I think I computed it both ways and got slightly different answers. Which one is correct? Well, this identity doesn't say which one is correct, but it says that $E(X^2)$ will always be greater than or equal to $(EX)^2$, and equality holds only in the case when $X$ is a constant. If $X$ is a constant, then the variance is 0, because $X$ just equals its mean. If $X$ is not a constant, then we're averaging $(X - EX)^2$, which may sometimes be 0 but is certainly sometimes positive. You can't average non-negative things and get a negative number, and if some of them are positive, you can't get 0 either. So the variance is strictly positive, which means $E(X^2)$ is strictly greater than $(EX)^2$, except in the case of a constant.

So that's the variance. As far as notation, it's standard to write $EX^2$ for $E(X^2)$. If you see $EX^2$, you should always interpret it as squaring first and then taking $E$. That's just a convention, but a pretty standard one. Writing $E(X^2)$ is a little clearer and avoids any possible ambiguity, but it's very common to see it written as $EX^2$. So interpret it as squaring first.
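The seventh-grade question can be replayed with a few numbers. This sketch treats a small list of made-up measurements as equally likely values, so "average" means expectation under that empirical distribution. It computes "square then average" $E(X^2)$, "average then square" $(EX)^2$, and the variance from the definition, using exact fractions so that $\operatorname{Var}(X) = E(X^2) - (EX)^2$ holds exactly rather than up to rounding.

```python
from fractions import Fraction

# Made-up measurements (an assumption), treated as equally likely values of X.
values = [Fraction(v) for v in (2, 3, 5, 7, 11)]


def average(xs):
    """Expectation under the empirical distribution: the plain average."""
    return sum(xs) / len(xs)


square_then_average = average([x * x for x in values])  # E(X^2)
average_then_square = average(values) ** 2              # (EX)^2
mu = average(values)
var_by_definition = average([(x - mu) ** 2 for x in values])  # E[(X - EX)^2]

print(square_then_average, average_then_square, var_by_definition)

# With a constant "random variable", the two orders agree and the variance is 0.
constant = [Fraction(4)] * 5
print(average([x * x for x in constant]) - average(constant) ** 2)
```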
Okay, so that's variance, and over here we can continue our little dictionary: $\operatorname{Var}(X) = E(X^2) - (EX)^2$ in the discrete case, and the same thing again in the continuous case. The one difficulty with this is that we've been talking about how to compute $E(X)$, but how do we actually compute $E(X^2)$? That's a question we need to address, and we'll talk about it a little later.

But first, we should see at least one example of a continuous distribution. The simplest one to start with is called the uniform. As far as what you'll need before the midterm, there are only two continuous distributions that you need to know by name, and then we'll do more later. One is the uniform and the other is the normal. The uniform is the simplest, and the normal, which is mostly for next week, is the most famous and important distribution in all of statistics. The reasons why it's so important will gradually emerge over the semester. Let's start with the uniform.
emerge over the semester. Let’s start with the uniform. So here’s the uniform distribution
on some interval on (a,b). So we have some interval from a to b. I’ll say here’s a, here’s b. We wanna pick a random
point in this interval. I’ll put random in quotes. In this interval. How do we do that,
the question is what does random mean? If it’s sort of intuitively
random is too vague because that just means we have some
random variable okay? What if we said completely random. Like what’s the most
random that it could be? Again that’s a little bit vague but let’s just kinda explore that intuition a
little bit and then write down a formula. If it’s completely random see I can
just see the probability of any two points is the same because all real
numbers between here and here. Every individual number is probably 0. So it’s not so interesting to say
all the probabilities are the same. So pick some random point say,
there, x but the problem if I got that
exact value right there was 0. Okay so that means we still have
the same way it does mean for it to be completely random. So well the intuitions
now is suppose we broke this interval into two halves
where this is the midpoint say. Intuitively, if it’s completely
random it should be that this half is equally likely as this half. Cuz If it were not then it seems like we
would kind of prefer to be, you know, the random variable prefers to be
more to the right than to the left. And somehow we want a concept
where it’s not gonna, it doesn’t care where it is, right? So in other words we could
say that probability, so for the uniform means that probability
is proportional to length. That’s a reasonable definition. That is, if we take two
intervals of the same length, they should have the same probability. If one interval is twice as long, it seems reasonable that that
one should be twice as likely. So we’re just gonna write down
a continuous distribution where probability is
proportional to length. And so to specify this, we can
either write down the pdf and drive, the cdf, or we can try to figure out what
the cdf should be and derive the pdf. Let’s start with the pdf here,
because we’re trying to practice pdfs. So here’s the pdf, it’s a constant. If x is between a and
b and it’s 0 otherwise, Because I want probability to
be 0 outside of this interval. Inside that interval I want the density
to be constant, because if the density were higher at one point than another,
that doesn’t seem very uniform. So well of course, we could ask, what’s c? Well it has to be that
the integral of the pdf is 1. And I could start out by integrating
from minus infinity to infinity, but of course we only need to integrate from
a to b, cuz it’s zero outside of there. If we integrate this we have to get 1, therefore c equals1 over b minus a. So it’s just one over
the length of the interval. It has to be this way otherwise
this would not be a valid pdf. Now suppose we want the cdf. So to get the cdf we just have to integrate this thing
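Here is a minimal check of the normalizing constant: with $c = 1/(b - a)$, the Uniform$(a, b)$ PDF integrates to 1. The endpoints $a = 2$, $b = 5$ are arbitrary choices for illustration, and the function name `uniform_pdf` is hypothetical. Since the PDF is constant on $[a, b]$, its integral is just height times width; the sketch also cross-checks this with a midpoint-rule sum in exact fractions.

```python
from fractions import Fraction


def uniform_pdf(x, a, b):
    """PDF of Uniform(a, b): the constant 1/(b - a) on [a, b], 0 elsewhere."""
    return Fraction(1) / (b - a) if a <= x <= b else Fraction(0)


a, b = Fraction(2), Fraction(5)
c = uniform_pdf((a + b) / 2, a, b)  # the constant height inside the interval

# Integral of a constant = constant * length of the interval.
area_exact = c * (b - a)

# Cross-check with a midpoint-rule sum over [a, b] (exact here, since f is constant there).
n = 1_000
h = (b - a) / n
area_numeric = h * sum(uniform_pdf(a + (i + Fraction(1, 2)) * h, a, b) for i in range(n))

print(c, area_exact, area_numeric)
```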
Now suppose we want the CDF. To get the CDF we just integrate the PDF from minus infinity up to $x$. How do we do that? Again, we don't really have to go all the way from minus infinity; we can start at $a$ and integrate $f(t)\,dt$ from $a$ to $x$. Then we have to consider some cases, because that expression already assumes $x$ is greater than $a$. If $x$ is less than $a$, then the probability is 0, so the CDF has to be 0. If $x$ is greater than $b$, the CDF is 1, because we know for sure that $X$ is less than $b$. The interesting case is what happens in the middle. To get that, all we have to do is integrate a constant: plug in $f(t) = c$, and we get $c(x - a)$. It's a very easy integral. So the CDF is
$$F(x) = \begin{cases} 0 & \text{if } x < a, \\ \dfrac{x - a}{b - a} & \text{if } a \le x \le b, \\ 1 & \text{if } x > b. \end{cases}$$
Notice this makes sense: if we let $x = a$ in the middle expression, it reduces to 0, and if we let $x = b$, it reduces to 1. So this is a continuous function. Intuitively, in the middle it's a linear function of $x$: as you increase $x$, the probability increases linearly, which makes sense, because you're accumulating more and more stuff.
accumulating more and more stuff. So let’s get the expected value of x. Expected value of x, again,
it’s just gonna be an easy integral. Because we just have to
integrate from a to b of x times the pdf. So I just wrote down x times the pdf. So integrating x is easy,
it’s just gonna be x squared over 2. So this is x squared
over 2 times b minus a. And we evaluate this
as x goes from a to b. So that’s really just b squared. Let’s factor our the 1
over two 2 b minus a. And then it’s b squared minus a squared. But b squared minus a squared
is b minus a, b plus a. So we can actually cancel the b minus
a and we just get a plus b over 2. Just doing that easy integral. Well, that’s a very intuitive answer. That’s just the mid point. It says the average is in middle which
it would really be weird if that didn’t happen cuz this is supposed to be uniform. Okay, so that was just check that. Now, we have a bit of a quandary though. For how to deal with the variants. So let’s try to get the variants. So If we want the variants then that
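The algebra for $E(X)$ can be checked exactly: evaluating the antiderivative $x^2 / (2(b - a))$ between $a$ and $b$ should always give the midpoint $(a + b)/2$. This sketch does that with exact fractions for a few arbitrarily chosen intervals; the function name is hypothetical.

```python
from fractions import Fraction


def uniform_mean_from_integral(a, b):
    """E(X) for Uniform(a, b): evaluate x^2 / (2(b - a)) from a to b, exactly."""
    a, b = Fraction(a), Fraction(b)

    def antiderivative(x):
        return x ** 2 / (2 * (b - a))

    return antiderivative(b) - antiderivative(a)


for a, b in [(0, 1), (2, 5), (-3, 7)]:
    # The integral on the left should match the midpoint on the right.
    print((a, b), uniform_mean_from_integral(a, b), Fraction(a + b, 2))
```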
Now we have a bit of a quandary, though, for how to deal with the variance. Let's try to get the variance. If we want the variance, then we need $E(X^2)$: we know $(EX)^2$, but we don't know $E(X^2)$. How do we get it? If we think carefully about this, $X^2$ is a random variable; if we take a function of a random variable, it's a random variable. Let's call it $Y$, so $Y = X^2$ and we want $E(Y)$. How do we get $E(Y)$? Assuming $X$ is continuous, to get $E(Y)$ we need to know the PDF of $Y$, and then we integrate $y$ times the PDF of $Y$. So the question is, do we need the PDF of $Y$? That sounds kind of annoying, because we don't know the PDF of $Y$. We can get it, and later in the course we'll talk about how to do that, but right now that seems like a pretty annoying problem.

So let's do this more carelessly instead. Let's just say it's too much hassle to get the PDF of $Y$, so I'm gonna reason by analogy. I'm looking at the formula for $E(X)$, but I don't want $E(X)$, I want $E(X^2)$. So I'm just gonna change that $x$ to $x^2$, and then I'll keep $f(x)\,dx$, the PDF of $X$, which is what I know. I'm too lazy to find the PDF of $Y$, so I'll just change $x$ to $x^2$:
$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx.$$
That doesn't sound very legitimate. What I just did is called the Law of the Unconscious Statistician, which has a nice acronym: LOTUS. It's called that because if you're kind of half asleep and you just want to find this thing, you just replace $x$ by $x^2$; it seems like something you might do if you're not thinking very hard.

To state it in general in the continuous case: $X$ is a random variable whose PDF we know, and we want the expected value of a function $g(X)$. The principled approach would be to find the distribution of $g(X)$ and then work with that. The lazy approach would be to still use the distribution of $X$, which sounds kind of too good to be true. The lazy approach takes $g(X)$, changes big $X$ to little $x$, and insists on still using the density of $X$ without converting anything:
$$E\left(g(X)\right) = \int_{-\infty}^{\infty} g(x) f(x)\,dx.$$
Well, this turns out to be true, so I'll put a box around it. We can talk sometime next week about the proof of why it's true. Even though it sounds too good to be true, it actually is true. That's called LOTUS; this is the continuous version.
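Here is a sketch comparing the two routes to $E(X^2)$ for $X \sim \text{Uniform}(0, 1)$, done numerically with the midpoint rule. The lazy route is LOTUS: integrate $x^2 f_X(x)$. The principled route needs the PDF of $Y = X^2$; in this particular case $P(X^2 \le y) = P(X \le \sqrt{y}) = \sqrt{y}$ on $(0, 1)$, so $f_Y(y) = 1/(2\sqrt{y})$. That derivation is supplied here only for the comparison, not taken from the lecture. Both routes should give $1/3$.

```python
import math


def midpoint_integral(g, lo, hi, n=100_000):
    """Approximate the integral of g over [lo, hi] using the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def pdf_x(x):
    """PDF of X ~ Uniform(0, 1)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def pdf_y(y):
    """PDF of Y = X^2 on (0, 1): the derivative of its CDF, sqrt(y)."""
    return 1.0 / (2.0 * math.sqrt(y)) if 0.0 < y < 1.0 else 0.0


# LOTUS: keep the density of X and just replace x by g(x) = x^2.
lotus = midpoint_integral(lambda x: x ** 2 * pdf_x(x), 0.0, 1.0)

# Principled route: first find the PDF of Y, then integrate y * f_Y(y).
principled = midpoint_integral(lambda y: y * pdf_y(y), 0.0, 1.0)

print(lotus, principled)
```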
In the discrete case too, let me write both versions in our dictionary. The continuous LOTUS is the equation I just wrote, so you can copy it there. For the discrete case, again we want the expected value of some function $g(X)$. All I'm gonna do is take the definition of the expected value, $\sum_x x\,P(X = x)$, and change $x$ to $g(x)$:
$$E\left(g(X)\right) = \sum_x g(x)\,P(X = x).$$
So this is $g(x)$ times the PMF of $X$. It says we don't need to convert and get a distribution for $g(X)$; we just do that. This is also valid. We'll talk more about why later, but it's useful to know right now.
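For the discrete version, here is a sketch that computes $E(g(X))$ both ways. The example, $X$ uniform on $\{-2, -1, 0, 1, 2\}$ with $g(x) = x^2$, is an assumption chosen because $g$ is not one-to-one, so the principled route has to merge probabilities (for instance, $x = -2$ and $x = 2$ both map to $y = 4$). LOTUS skips that step and still gets the same answer.

```python
from collections import defaultdict
from fractions import Fraction

# Example (an assumption): X equally likely to be -2, -1, 0, 1, or 2, and g(x) = x^2.
pmf_x = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}


def g(x):
    return x ** 2


# Principled route: build the PMF of Y = g(X) first, merging x-values with the same g(x).
pmf_y = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_y[g(x)] += p
mean_y_principled = sum(y * p for y, p in pmf_y.items())

# LOTUS: sum g(x) * P(X = x) directly, never building the PMF of Y.
mean_y_lotus = sum(g(x) * p for x, p in pmf_x.items())

print(dict(pmf_y), mean_y_principled, mean_y_lotus)
```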
problem about the uniform: suppose we want the variance of the uniform. Just for simplicity,
let U be uniform between 0 and 1. We know the expected value of U
is one-half, just the midpoint. Now we want E of U squared. According to LOTUS, we don’t need
to first find the PDF of U squared. We can directly write down
the integral from 0 to 1 of u squared times the PDF f
sub U of u, du. This PDF is actually equal to a constant,
and that constant is 1 in this case. So the PDF part is just one,
and we have the integral of u squared du, which is u cubed over 3 evaluated from 0 to 1, that is, 1/3. So the variance
of U equals E of U squared minus the square of E of U,
which is one-third minus one-quarter, or one-twelfth. So the variance of a Uniform(0, 1)
is one-twelfth, and that was a very easy calculation
because we were able to use LOTUS here. We haven’t proven LOTUS yet, but
we will talk more about that later. I’m showing you how to use it right now,
then we’ll justify it more. So that thing that’s too good
to be true actually works. So that’s LOTUS. One more thing about
the uniform distribution. The uniform seems like the simplest continuous distribution
you could possibly imagine, because the PDF is just a constant
on some interval. One other point about this is that we have
to have a bounded interval here. We cannot define a uniform
distribution on the entire real line. Sometimes it’s a bit
annoying that there isn’t one. But on the whole real line there
would be no way to normalize it: a positive constant integrates to infinity over the real line, and zero integrates to zero, so there’d be no constant that could
make it integrate to one. So it sounds like this is
an extremely simple distribution, and it is:
it’s just a constant PDF on some interval. Extremely easy. Take the Uniform(0, 1):
it seems very simple, but actually the Uniform(0, 1)
has the property that if you give me one uniform random variable
and you’re interested in some other distribution, there is a way to
convert it and simulate from that distribution. That is, from the Uniform(0, 1)
you can simulate, or generate, from any distribution,
no matter how complicated it is, at least in principle. As a matter of computation that may be
easy or hard, but in principle from the uniform you can get anything, so
I call that universality of the uniform. Universality of the uniform means
that given a uniform, you can create any distribution that you want. That’s theoretically nice
because it unifies concepts: it says that this thing that seems very,
very simple, just one uniform, can actually be used to generate
something as complicated as you want. That’s kind of cool, but it’s also useful
in practice, because most computer programs can generate random numbers between zero
and one (actually pseudorandom ones), but they may not know how to generate
whatever complicated distribution you’re interested in. In many cases this gives you a way to convert from the random uniforms
to whatever you want to simulate. So I want to show why that’s true. Here’s the statement:
we’re gonna start with the uniform between 0 and 1, and let F be a CDF that we’re interested in. Usually we’ve been saying: here’s
the random variable, now find its CDF. Here we’re going the other way,
in the sense that we assume we have some CDF
that’s of interest to us, but we do not yet have access to
a random variable that has that CDF. So let F be a CDF. It’s possible to generalize this further,
but to make this something that we can
do fairly quickly, let’s assume that F is strictly increasing, so
we don’t have to deal with flat regions. And let’s also assume that F
is continuous as a function, just so that we don’t have to
think about jumps right now, although you can generalize this. Then the theorem says: define X to be F inverse of U. The inverse function exists in this
case because F is continuous and strictly increasing, so
it has an inverse. We take the inverse and we plug in U. Then the statement is that X
is distributed according to F, that is, the CDF of X is F. So what this says is that we have
this CDF we’re interested in; we take the inverse CDF, plug in
the uniform, and then we’ve constructed a random draw from the distribution
we’re interested in, capital F. So let’s prove this very quickly. And the proof doesn’t require
anything fancy at all. It doesn’t require anything except
understanding what a CDF is. So another reason I like to talk about
this is that it’s good practice for really understanding what a CDF is: the better you understand CDFs,
the easier it is to see why this is true. So to prove this, all we need to
do is compute the CDF of X. The notation X ~ F means that X has the CDF F,
that is, X follows this distribution. So all we have to do is compute P(X ≤ x), and that’s actually pretty easy. By definition, X is F inverse
of U, so plugging in what X is, P(X ≤ x) = P(F inverse of U ≤ x). Now let’s apply capital F to both sides of the inequality inside, so I’m just putting F here and F here. Because of the nice assumptions I made about F, that’s equivalent to
U ≤ F(x). If we didn’t have
an increasing function, this could go wrong: for example, if I multiplied both sides by minus one, then
the inequality would flip. But since we have an increasing
function, the inequality is preserved, and since F is invertible, this is really the same event,
just written in a different way. Now we’re almost done, because what’s
the probability that U is less than or equal to F of x? I’ll just draw a simple little picture. U is uniform from 0 to 1, and F of x,
remember, is a probability, so it’s just some number between 0 and
1; let’s say F of x is there. Now, I said that probability is
proportional to length for a uniform, and in this case that proportionality
constant is just 1, because the length of the whole interval is 1. So for the Uniform(0, 1), the probability
of an interval is its length. So we want to know the probability
that U is between 0 and F of x, and that’s just the length of
the interval, which is F of x. So P(X ≤ x) = F(x) for every x, and that’s the end of the proof. And that’s the end of the lecture. So have a good weekend. Thanks.
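The LOTUS calculations in this lecture can be checked numerically. Here is a minimal Python sketch, which is not the course's own code.

* It computes E of g of X both ways in a discrete example: a fair die with g(x) = x squared, chosen only for illustration.
* It then uses continuous LOTUS to recompute the variance of the Uniform(0, 1).

The helper names `lotus_discrete`, `pmf_of_function`, and `lotus_continuous` are hypothetical. The integral is approximated with the midpoint rule, which is an assumption of this sketch rather than anything from the lecture.

```python
from collections import defaultdict
from fractions import Fraction


def lotus_discrete(g, pmf):
    """E(g(X)) by discrete LOTUS: sum of g(x) * P(X = x) over the support of X."""
    return sum(g(x) * p for x, p in pmf.items())


def pmf_of_function(g, pmf):
    """The 'principled' route: first build the PMF of Y = g(X)."""
    pmf_y = defaultdict(Fraction)
    for x, p in pmf.items():
        pmf_y[g(x)] += p
    return dict(pmf_y)


def lotus_continuous(g, pdf, a, b, n=10_000):
    """E(g(X)) by continuous LOTUS, approximating the integral of g(x) f(x) dx
    over [a, b] with the midpoint rule on n equal subintervals (an approximation)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += g(x) * pdf(x)
    return total * h


def square(x):
    return x * x


# Discrete: X is a fair die roll, and we want E(X^2).
die = {k: Fraction(1, 6) for k in range(1, 7)}
lazy = lotus_discrete(square, die)
principled = sum(y * p for y, p in pmf_of_function(square, die).items())
print(lazy, principled)  # both routes give 91/6


# Continuous: U ~ Uniform(0, 1), whose PDF is the constant 1 on [0, 1].
def unif_pdf(u):
    return 1.0


mean_u = lotus_continuous(lambda u: u, unif_pdf, 0.0, 1.0)
second_moment = lotus_continuous(square, unif_pdf, 0.0, 1.0)
var_u = second_moment - mean_u ** 2
print(round(var_u, 6), round(1 / 12, 6))
```

The discrete part uses exact fractions, so the two routes agree exactly at 91/6. The lazy route is LOTUS; the principled route first finds the PMF of X squared.

The continuous part only approximates the integral from 0 to 1 of u squared du. With 10,000 subintervals, the computed variance agrees with 1/12 to within about 10^-9.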
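Universality of the uniform can be seen in action with a minimal Python sketch of the construction X = F inverse of U.

* **Target distribution.** This is an assumption chosen for illustration: the Exponential(1) CDF, F(x) = 1 - e^(-x) for x ≥ 0. It is continuous and strictly increasing on the positive half-line, where all of its probability lives. It also has a closed-form inverse, F inverse of u = -log(1 - u).
* **Uniforms.** Python's `random.Random` supplies the pseudorandom uniforms.
* **Names.** The function names are hypothetical helpers, not a library API.

```python
import math
import random


def exp_cdf(x, lam=1.0):
    """CDF of an Exponential(lam) distribution: F(x) = 1 - exp(-lam * x) for x >= 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def exp_inverse_cdf(u, lam=1.0):
    """Inverse of F on [0, 1): solves u = 1 - exp(-lam * x) for x."""
    return -math.log(1.0 - u) / lam


def sample_via_uniform(inverse_cdf, n, rng):
    """Universality of the uniform: X = F^{-1}(U) with U ~ Uniform(0, 1) has CDF F."""
    return [inverse_cdf(rng.random()) for _ in range(n)]


rng = random.Random(110)  # fixed seed so runs are reproducible
draws = sample_via_uniform(exp_inverse_cdf, 100_000, rng)

# The proof says P(X <= x) = P(U <= F(x)) = F(x); compare empirically.
for x in (0.5, 1.0, 2.0):
    empirical = sum(d <= x for d in draws) / len(draws)
    print(f"x = {x}: empirical CDF {empirical:.3f}, F(x) = {exp_cdf(x):.3f}")
```

With 100,000 draws, the empirical CDF should match F to roughly two decimal places. The sketch only illustrates that plugging uniforms into F inverse yields draws with CDF F. In practice, the standard library also offers `random.expovariate` for this particular distribution.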