Continuous Distributions and the Uniform Distribution


This is a video on continuous probability distributions. I will start out today by talking about the definition of a continuous random variable. Then we will move on to the simplest of all distributions for continuous random variables, maybe not the most important one, but the simplest. That’s called the Uniform distribution. After going over the Uniform distribution, we will look at probabilities that involve it. Then we will ask how we find a quartile. How do we find the first or the third quartile? How do we find a percentile in general if we know that a variable follows a Uniform distribution? Finally we will go over the mean and the standard deviation, and we will use those to find the z-score when we know that a continuous random variable follows a Uniform distribution. Today won’t be that long, but it is pretty important, and it’s going to set the stage for what we will be doing for the rest of the class, particularly for Normal distributions, which are a little more complex than the Uniform distribution but are the important distributions for continuous random variables.

Now we talk about what a continuous distribution is. The equation for a continuous probability distribution is called the probability density function. Remember that if a variable is a continuous random variable, it can be any number within an interval. That interval could be something like between zero and one, or it could even be between negative infinity and infinity. So we’re not talking about discrete random variables anymore, where we only had maybe 10 different possibilities or only the whole numbers as possibilities. Now the possibilities are all numbers on the number line or all numbers between two fixed points. There is an uncountably infinite number of values that you can get when you are selecting your random variable.
For the function, we can’t draw a table anymore because there are infinitely many values to put in the table; we can’t even use “…” notation. What we can do is sketch a graph or show an equation that represents the probability density function. It will look something like this picture, where you have a curve above the x-axis. I’m going to define the cumulative distribution function as the area to the left of some value. So here’s a value b. The cumulative distribution function at b is the area to the left of b, above the x-axis and below the curve. On the other hand, the probability that x is between a and b, so greater than a and less than b, is the area to the right of a and to the left of b, under the curve and above the x-axis. By the way, if you want the probability that x is greater than b, that would be this white area: the area to the right of b, below the curve and above the x-axis.

Now let’s look at some properties of continuous distributions. The first property is that the probability density function can never be negative. It can be zero at some values, but it can never be negative. It just doesn’t make any sense to talk about negative numbers when we’re talking about probabilities. Negative probabilities never happen. We’re not even going to go there. The probability density function can be zero or positive. The second property is that the total area bounded by the probability density function and the x-axis is always equal to one. We will be using that over and over again. That makes sense because it says the probability of something happening is 100%. If you throw a dart at a number line, then the probability that it hits somewhere between negative infinity and infinity is 1, because that’s everything. But there’s one thing that you may feel should be true that is not.
The probability density function can reach values greater than one. That feels wrong because we learned that discrete probabilities can never be greater than one. But for continuous distributions the density can, because the probability is the area under the curve, not the height of the curve. For example, if the variable can only be between 0 and 1/2, the density has to get bigger than 1. If it stayed under 1, the area between 0 and 1/2 would be less than 1/2, but the total area must be equal to 1. So the density will have to average 2 over that interval. Don’t be surprised if a probability density function gets higher than one.

Now I have an example. The number of seconds after the exact minute that classes end follows a Uniform distribution. The graph below shows this distribution curve. Find h. A Uniform distribution means that the density is the same for all values, and when a function is the same number for all values of x, the graph of that function is a horizontal line. If we’re talking about the seconds after the minute when class ends, the seconds hand can only be between 0 and 60, because when you hit 60 you start over again and the next reading is 1/2 second or 1 second, for example. So this Uniform distribution is between 0 and 60. We want to find out what the height is. We will use the big formula that states that the area under the entire curve must be equal to 1. So we can say that the area is 1. Because we have a Uniform distribution, we’re talking about a rectangle here. Areas are hard to compute in general, but the area of a rectangle is very easy to compute. If you remember from basic geometry, the area of a rectangle is the base times the height. So the base times the height is 1. The base of this rectangle is 60, or 60 – 0 if you want. The height of this rectangle is h, so we can say 60 times h equals 1. Now we get to do some very basic algebra.
Don’t you wish this were the hardest problem you ever got in algebra? That wasn’t true in your algebra classes, but here you don’t have to do any really difficult algebra, just 60 times h equals 1. To solve that we divide both sides by 60, and what we get is h equals 1/60. There’s the height of the rectangle.

Now let’s use that to answer some questions. Let’s find the probability that a randomly selected class will end with the seconds hand between 10 and 30 seconds. I will start by drawing this picture. Again, we have the rectangle between 0 and 60, but we also have another rectangle between 10 and 30. Remember, the probability that x is between 10 and 30 is the area under the curve, or in this case under the horizontal line segment, between 10 and 30 and above the x-axis. That region is a rectangle, and the area of a rectangle is again the base times the height. The base of this rectangle is 30 – 10, and the height we’ve already found is 1/60. Now we just do the arithmetic: (30 – 10) times 1/60. 30 – 10 is 20, and 20 divided by 60 is the same as 2/6, which is 1/3. So we can say that the probability that the seconds hand will be between 10 and 30 is equal to 1/3.

Let’s look at another example. Let’s find the first quartile for the same question. By the way, look at this notation: X ~ U(0, 60). X means the variable x. The ~ means “has the distribution.” U means Uniform. The 0 means the lower bound is 0, and the 60 means the upper bound is 60. The question is “Find the first quartile.” The first quartile is this value right here, which we call Q1, and we know that the area to the left of Q1 must be 25% because the first quartile is the 25th percentile. So the area is 0.25. Let’s set that up, recalling that the probability that x is between 0 and the first quartile Q1 is equal to that area.
The area again is 0.25, and since we have a rectangle, the area is the base times the height. The base of this rectangle is Q1 – 0 and the height is 1/60, since we are talking about the same problem with the seconds hand. So (Q1 – 0) times 1/60 equals 0.25. Now we just do a little algebra. Q1 – 0 is Q1, and I multiply left and right by 60. I get that Q1 is 0.25 times 60, and 0.25 times 60 is 15. Q1 is 15. That should totally make sense, because a quarter of an hour is 15 minutes and a quarter of a minute is 15 seconds. The first quartile is at 15 seconds, so from 0 to 15 seconds takes up 1/4 of a minute.

Let’s take a look at another question. This is one of those that can be very difficult to wrap your head around. We want to find the probability that a randomly selected class will end with the seconds hand at exactly 40 seconds. That could happen. It could happen that you’re in class, the professor says “The class is over,” you look at the clock, and it’s at exactly 40. Hopefully, you agree that’s a possibility. Now let’s find the probability. Again, the probability is always an area. To be exactly 40 seconds means that it is greater than or equal to 40 seconds and also less than or equal to 40 seconds. It can’t be bigger than 40 and it can’t be less than 40, so it is between 40 and 40, and the probability is the area of that rectangle. When we draw the picture, that rectangle is just a line segment. The area of a rectangle is the base times the height, but the base is 40 – 40, which is 0. The height is still 1/60, and 0 times 1/60 is still just zero. The probability that a randomly selected class will end with the seconds hand at exactly 40 seconds is equal to zero. That just feels wrong, but it is right. Even though it’s possible for the seconds hand to be at 40 when class is over, the probability is 0. That just feels weird. Here’s what’s going on.
We’re talking about a continuous random variable. When you have a continuous random variable, you have an infinite number of possibilities. Remember that before, you thought about a probability as the number of ways of winning divided by the total number of ways that something can happen. Here there is only one way to win, and that’s exactly 40, but there is an infinite number of possibilities between 0 and 60 seconds. One over infinity is kind of a weird thing. You may not have thought about it before, but when you get into calculus, you realize that 1 over infinity goes to 0. It’s kind of a weird idea, but it really is true that the probability that the seconds hand will be at exactly 40 is 0 even though it can happen. It turns out that this makes life easier for us. When we’re dealing with probabilities and an inequality has that little equals sign below it, you don’t have to worry about it, because putting the equals sign there doesn’t change the probability. It allows us to be a little bit sloppy, which is a good thing.

Let’s look at another example. Let’s look at kindergarteners. We will look at a class where there’s a rule that says when you start kindergarten you have to be between five and six years old. That’s standard nowadays. There are exceptions, but I will say we follow the rule. Suppose that the age of a kindergarten child when they first start school follows a Uniform distribution between 60 and 72 months old. 60 months is five years old. 72 months is six years old. It makes sense, but it may or may not be true.
When I say “suppose,” it’s probably not really true, but just for the purposes of our class, we will suppose that it is Uniform. That means it isn’t the case that certain months of the year are more likely than other months for children to be born. We’re going to say that all months are equally likely and all days are equally likely, so that it really is a Uniform distribution. Let’s find the probability that a kindergarten child who is older than 65 months will be younger than 70 months. It is a weird thing to ask, but I am doing it so we can review a little bit about probabilities that involve “given.” These are conditional probabilities.

I can draw the picture. Since we have a Uniform distribution, we have a horizontal line for the probability density function. We don’t know the height of this rectangle, so I will call it h, but we do know that the base of the rectangle goes from 60 to 72. Whenever we deal with probabilities that involve a Uniform distribution, we need to start by finding this height. Let’s do that. Again, the total area under the curve must be equal to 1. That was one of our big properties of any continuous distribution. The area is the base times the height because we have a Uniform distribution. The base is 72 minus 60, the height is h, which we don’t know, and the area is 1. So (72 – 60) times h equals 1. But 72 – 60 is 12, so I divide left and right by 12 and we get that h equals 1/12. There’s my height, h = 1/12.

Now I can continue on and try to figure out this probability. Remember, we wanted the probability that a kindergarten child who is older than 65 months will be younger than 70 months. We can write this as a conditional probability: the probability that x is less than 70 given that x is greater than 65.
Now for the picture, and yes, it’s very important to draw the picture. If you’re in my class and you don’t draw the picture, you will lose points, and hopefully in other classes too, because the picture is really important. We are given that this child is over 65 months, and we want to find the probability that they’re younger than 70. There are a few ways of doing this problem. The way I want to show you is by looking at the definition of conditional probability, which says that to find the probability of A given B, you take the probability of A and B and divide by the probability of B. Here A and B means the age is both less than 70 and greater than 65, which means it is between 65 and 70. So the numerator is the probability that x is between 65 and 70, and we divide by the probability of B, the probability that x is greater than 65.

Now I compute both of these probabilities by finding the area of each rectangle. For the first one, the area between 65 and 70 is the base times the height. The base is 70 – 65 and the height is 1/12, so we get (70 – 65) times 1/12. For the probability that x is greater than 65, we need the area of the rectangle to the right of 65, which is (72 – 65) times the height, 1/12. Then I divide. Notice that the numerator has a 1/12 and the denominator also has a 1/12, so they cancel. We just get (70 – 65) / (72 – 65). 70 – 65 is 5 and 72 – 65 is 7, so my answer is 5/7. The probability that a kindergarten child who is older than 65 months will be younger than 70 months is 5/7. I am done with this question. It is a little complicated, but if you remember conditional probability and you understand probability for a Uniform distribution, it’s not too bad.

Now let’s look at one more example with this kindergarten class. Let’s find the 40th percentile for this kindergarten class.
Remember, the 40th percentile means that the area, or the probability, to the left of the 40th percentile is 0.4. I sketch the picture and say that the area to the left of some value x is 0.4. Then I use the area of a rectangle. The area again is the base times the height. The base here is x – 60 and the height is 1/12, so (x – 60) times 1/12 equals 0.4. Now I need a little bit of algebra. To get rid of the fraction I multiply left and right by 12, and I get that x – 60 is 0.4 times 12, which is 4.8. Now I add 60 to both sides and get that x is equal to 4.8 + 60, so x is equal to 64.8. The 40th percentile is 64.8 months. That makes some sense: if children in kindergarten are between 60 and 72 months old, the 40th percentile might be around 64.8 months.

Whenever you have a distribution or any statistics in this class, remember that two really important concepts are the mean and the standard deviation. If X is uniformly distributed between a and b, we will want formulas for the mean and the standard deviation. The good news for the mean is that Uniform distributions are symmetric, so the mean and the median are the same. The median is the halfway point, and the way you find the halfway point is to take an average: you add the two endpoints and divide by 2. So the mean is (a + b) / 2. The standard deviation, on the other hand, has a formula, but I am not even going to try to explain where it comes from. The standard deviation is the square root of (b – a) squared divided by 12. Don’t ask me where the 12 comes from unless you are really good at math and have had some calculus. I am not going to do that in this class because it’s a little bit difficult for what we will be talking about, but you should understand the formula. You know what a square root is. You know how to square numbers, how to subtract, and how to divide by 12.
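If you like to check this kind of arithmetic on a computer, here is a minimal Python sketch of the rectangle calculations and the two formulas from this video. The function names (height, prob_between, percentile, mean, std_dev) are made up for illustration and are not from any library; the numbers are the ones from the seconds-hand and kindergarten examples.

```python
import math

# Hypothetical helper names for X ~ U(a, b); they are illustration only.

def height(a, b):
    """Height of the density: the rectangle's total area must be 1."""
    return 1 / (b - a)

def prob_between(a, b, lo, hi):
    """P(lo < X < hi): base times height, with the base clipped to [a, b]."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0) * height(a, b)

def percentile(a, b, p):
    """Value x with area p to its left, from (x - a) * height = p."""
    return a + p * (b - a)

def mean(a, b):
    """Halfway point of the interval."""
    return (a + b) / 2

def std_dev(a, b):
    """Square root of (b - a) squared divided by 12."""
    return math.sqrt((b - a) ** 2 / 12)

# Seconds-hand example, X ~ U(0, 60)
print(prob_between(0, 60, 10, 30))   # about 1/3
print(percentile(0, 60, 0.25))       # first quartile: 15.0
print(prob_between(0, 60, 40, 40))   # exactly 40 seconds: 0.0

# Kindergarten example, X ~ U(60, 72)
# Conditional probability: P(65 < X < 70) divided by P(X > 65)
cond = prob_between(60, 72, 65, 70) / prob_between(60, 72, 65, 72)
print(cond)                          # about 5/7
print(percentile(60, 72, 0.40))      # about 64.8 months
```

Notice that every probability here is just a base times a height, which is why the Uniform distribution is the easy one.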
So let’s use this to talk about z-scores. Let’s suppose that the time between when a person arrives at a bus stop and when they’re seated on the bus is uniformly distributed between 2 and 30 minutes. It makes sense. You sit down at the bus stop, and even if the bus is there right away you still have to get up from your seat, get on the bus, put your money in the slot, find your seat, and sit down. That will probably take about 2 minutes. On the other hand, you could get really unlucky and wait and wait and wait until finally the bus comes, and 30 minutes later you’re sitting on the bus.

Let’s also suppose that the time between when you get to the airport and when you get seated on the plane is uniformly distributed, but now between 25 and 75 minutes. If you’ve ever been to an airport, you know you have to go through the check-in area. You need to get checked to make sure you’re not a terrorist. They look at your luggage. They look at your carry-on. You have to take your shoes off. You have to sit down. Then of course you always have to wait a while. So the least it can take before you can get on the plane and sit down is 25 minutes. Let’s suppose the most it can take is 75 minutes and that the time is uniformly distributed. That’s probably not exactly correct. Whenever I say “suppose,” just believe me: for the purposes of this class we will assume that’s true. The same goes for the bus. If it is anything other than Uniform, it’s more difficult, and trust me, you don’t want to go there. We will stay with Uniform distributions for both of these.

First, let’s find the mean and standard deviation for each of these. It’s not too bad. We just have to follow the formulas. For the bus stop, X1 is uniformly distributed with a low wait time of 2 and a high wait time of 30. For the airplane, X2 is uniformly distributed with a low wait time of 25 and a high of 75. Let’s start with the mean.
The mean is (a + b)/2, and for the bus stop that’s (2 + 30)/2. That’s 32/2, or 16. For the airplane that’s (a + b)/2 again, which is (25 + 75)/2, or 100/2, which is 50. For the standard deviation, again we put the numbers into the formula: the square root of (b – a) squared over 12. For the bus stop that’s the square root of (30 – 2) squared divided by 12. I put it into my calculator. I highly recommend you also try putting it into your calculator so you’re familiar with how to do that and can verify that you get about 8.08. Similarly, for the airplane the standard deviation is the square root of (75 – 25) squared divided by 12, and the calculator gives about 14.43.

Now, the second question is more interesting. Let’s suppose that it took Christine 26 minutes to get seated on her bus and it took Pedro 65 minutes to get seated on his airplane. Use z-scores to determine whose wait time was more surprisingly long. We know that Pedro took longer than Christine, but who would be more surprised by how long it took? Airplanes generally just take longer, so you’re not surprised if that takes a long time. But 65 is quite a bit longer than 26, so who should be more surprised? We will use a z-score, which we have seen before in this class, to find out. If you remember, the formula for the z-score is z = (x – mu) / sigma. Just plug in. For Christine, x was 26 minutes, mu was 16, and sigma was 8.08. I put that in my calculator and get 1.24. For Pedro, the wait time was 65 minutes, so I put in 65 – 50 and divide by 14.43. Pedro’s z-score is 1.04. Remember that z-scores tell you how surprised you might be, how rare it would be to be at that value. Christine’s z-score was 1.24 versus Pedro’s, which was 1.04. Christine’s z-score is larger, so Christine would be more surprised.
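The calculator work for the two z-scores can also be sketched in Python. This is only a check on the arithmetic above; the function names (uniform_mean, uniform_sd, z_score) are hypothetical, and the code uses the unrounded standard deviations, so the z-scores agree with the calculator values only after rounding to two decimal places.

```python
import math

def uniform_mean(a, b):
    """Mean of U(a, b): the halfway point."""
    return (a + b) / 2

def uniform_sd(a, b):
    """Standard deviation of U(a, b): sqrt((b - a)^2 / 12)."""
    return math.sqrt((b - a) ** 2 / 12)

def z_score(x, a, b):
    """z = (x - mu) / sigma for a value x from U(a, b)."""
    return (x - uniform_mean(a, b)) / uniform_sd(a, b)

# Christine: 26 minutes at the bus stop, X1 ~ U(2, 30)
z_christine = z_score(26, 2, 30)
# Pedro: 65 minutes at the airport, X2 ~ U(25, 75)
z_pedro = z_score(65, 25, 75)

print(round(z_christine, 2))  # 1.24
print(round(z_pedro, 2))      # 1.04
print("Christine" if z_christine > z_pedro else "Pedro")  # larger z-score
```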
We can conclude that Christine’s time to get seated was more surprisingly long.

That’s about all I have to say about continuous distributions and, in particular, about the Uniform distribution. Hopefully, this is pretty clear to you. If not, please go over this video again, or ask me questions if you’re in my class. If you’re in another instructor’s class, please ask your instructor how to work out these problems and how to understand continuous distributions and the Uniform distribution. Thank you again for watching this video, and ask questions, ask questions, ask questions. I will see you next time when we talk about the Normal distribution, the next really special distribution. It’s not quite as easy, and we will need calculators in a big way to come up with probabilities. Take care. Have a great day. Have a great morning. Have a great night. Whenever you’re watching this, just have a great time. See you later.
