Now Playing: Sweet Like Chocolate by Shanks & Bigfoot
Today I decided to start a public blog at Tripod. I hope to post my ramblings on my day-to-day activities.
The first thing that comes to mind today is that I have a statistics test tomorrow and I must study for it. It is all about hypothesis testing, which includes the null hypothesis, the alternative hypothesis, tests of significance, the P-value, alpha, the one-sample z test and the one-sample t test.
Just a thought on the z value. It tells us how far a given value is from the mean, measured in standard deviations. If something is normally distributed and a value lies 3 standard deviations from the mean, we can conclude that a value that far out on one side won't turn up more than about 15 times in a sample of 10,000 (ten thousand). Got it? Three standard deviations on either side of the mean cover about 99.7% of the values, which leaves about 0.3% outside. An extreme value could sit at +3 standard deviations or beyond, or at -3 standard deviations or beyond, so each tail holds about 0.15% and together they make 0.3%. In 10,000 values that is about 15 in each tail, or about 30 counting both.
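The 99.7 figure is a rounded rule of thumb, so here is a small sketch that works out the tail count from the exact normal curve instead. It uses `NormalDist` from Python's standard library; the helper name `expected_beyond` is my own and not from any textbook. The exact count per tail comes out a little lower than the rule of thumb suggests.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def expected_beyond(z, n):
    """Expected number of values, out of n, lying more than z
    standard deviations above the mean (one tail only)."""
    upper_tail = 1.0 - std_normal.cdf(z)
    return n * upper_tail

one_tail = expected_beyond(3, 10_000)   # values above +3 standard deviations
both_tails = 2 * one_tail               # the curve is symmetric, so double it

print(round(one_tail, 1), round(both_tails, 1))
```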
The t test is used when we don't know the population standard deviation. That makes sense, because in practice we almost never know the population standard deviation; we only know the standard deviation of the sample. About the only time we would know the population standard deviation is when the population is very small, and then there is not much use for statistics. We use statistics when we have to make a prediction about a large population from a sample. So in practice the t test is usually the one to reach for rather than the z test.
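To make the difference concrete, here is a rough sketch of the one-sample t statistic, which uses the sample standard deviation in place of the unknown population one. The data and the hypothesized mean `mu0` are made up for illustration, and `one_sample_t` is just a name I picked.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """Return the t statistic and degrees of freedom for testing
    whether the mean behind `data` equals mu0."""
    n = len(data)
    s = stdev(data)                       # sample standard deviation (n - 1 divisor)
    t = (mean(data) - mu0) / (s / sqrt(n))
    return t, n - 1

# Hypothetical measurements, invented for this example
sample = [5.1, 4.9, 5.6, 5.2, 4.7, 5.4, 5.0, 5.3]
t_stat, df = one_sample_t(sample, mu0=5.0)
print(t_stat, df)
```

The statistic is then compared against a t table with `df` degrees of freedom rather than against the z table.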
Different books lay out the table in different ways. For example, in The Basic Practice of Statistics the Z table is laid out by Z value. If you know the Z value, you can find the area corresponding to it just by matching the row and the column, and this area is the area measured from the left. So if you have a Z value of 1, you search the first column and look for 1.
I can guess the area for this value before looking it up. A Z of 1 means the value is 1 standard deviation above the mean, and I know that 68 percent of all my data lies within 1 standard deviation of the mean. Since the normal curve is symmetric about the mean, 34 percent lies between the mean and +1 and another 34 percent lies between -1 and the mean. The mean sits in the middle with 50 percent of all the values below it (for a symmetric curve, isn't that what the mean means?). So the area to the left of z = 1 is 50 + 34 = 84 percent. If the total area is 1 then it is 0.84, and voilà, you get the value.
Armed with this, can you think what will correspond to z = -1? We know that between z = -1 and z = +1 we have 68 percent, so this will be 50 - 34 = 16 percent of the area starting from the left. In fact we do not need the Z table for both sides. Since the total area is 1, the other side can easily be calculated: if for z = 1 the value is 0.84, then for z = -1 the value will be (1 - 0.84) = 0.16. We are able to do this because the normal distribution is symmetric and we are using the mean in the centre as the reference. To explain this a bit more, think about what z = 1 and z = -1 correspond to when you are measuring the area from the left. Z = 1 and Z = -1 both enclose an equal area measured from the centre. When you take z = 1 you add that area to 0.5, and when you take z = -1 you subtract it FROM 0.5.
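If you want to check a table lookup without the book, here is a minimal sketch using `NormalDist` from Python's standard library, whose `cdf` gives the area from the left just like the table does. The helper name `area_left` is my own.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_left(z):
    """Area under the standard normal curve to the left of z,
    the same quantity a from-the-left Z table lists."""
    return std_normal.cdf(z)

left_of_plus_one = area_left(1.0)
left_of_minus_one = area_left(-1.0)

print(round(left_of_plus_one, 4))        # area to the left of z = 1
print(round(left_of_minus_one, 4))       # area to the left of z = -1
print(round(1 - left_of_plus_one, 4))    # same as the line above, by symmetry
```

The last two lines print the same number, which is the symmetry argument in code: the area left of -1 equals 1 minus the area left of +1.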
One more thing to observe: when we talk about the 68, 95, 99.7 rule, we are speaking in reference to the mean. It says that 68 percent of all the values will be within 1 standard deviation of the mean. However, when we read the table we measure everything from the left. For example, a z score of 0 corresponds to an area of 0.5. That is right, because a z score of zero says the value is the same as the mean, and since the mean is greater than 50 percent of the values, the area to its left will be 0.5 if the total area is 1. When we say that 68 percent of all values lie between z = -1 and z = 1, you can test that too. For z = -1 we have 0.1587 and for z = 1 we have 0.8413, and the difference is 0.8413 - 0.1587 = 0.6826. Similarly, for z = -2 and z = 2 we have 0.0228 and 0.9772, and the difference is 0.9772 - 0.0228 = 0.9544. Last, for z = -3 and z = 3 we have 0.0013 and 0.9987, and the difference is 0.9987 - 0.0013 = 0.9974. So we see that our rule of 68, 95 and 99.7 does hold, with table values of 0.6826, 0.9544 and 0.9974.
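The same check can be done without the table. This is a small sketch using `NormalDist` from Python's standard library; `central_area` is a name I made up. Because the table entries are rounded to four places, the subtraction done by hand can differ from the computed area in the last digit.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def central_area(k):
    """Area between z = -k and z = +k, found by subtracting
    two from-the-left areas."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Area within 1, 2 and 3 standard deviations of the mean
areas = {k: central_area(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(k, round(area, 4))
```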
There is usually some confusion between Z and Z*. Z* is used when we have to find a confidence interval, and Z is used for an area. Suppose we have to find the 90% confidence interval around a mean. What it actually means is that if we took 100 samples and built an interval from each one, about 90 of those intervals would contain the true population mean. Z* is the critical value we use to compute this. The Z* for 90% confidence is the same as the Z value for 95% area from the left, because Z* is centred around the mean: we have 45% on the left of the mean and 45% on the right. Since the Z table measures everything from the left, the upper end of the interval sits at (50 + 45)%, which makes 95%. The Z for 95% area is 1.645, so Z* for a 90% confidence interval is 1.645. (The familiar 1.96 is the Z for 97.5% area, which makes it the Z* for a 95% confidence interval.) The t distribution works in the same way, with t* depending on the degrees of freedom.
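Here is a short sketch of that conversion from a confidence level to Z*. It uses `inv_cdf` from `NormalDist` in Python's standard library, which goes from an area on the left back to a z value; the function name `z_star` is my own. Computed this way, the 90% level gives about 1.645 and the 95% level gives about 1.96.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def z_star(confidence):
    """Critical value z* for a two-sided confidence level such as 0.90.
    Half the confidence sits on each side of the mean, so the upper
    end of the interval is at area 0.5 + confidence / 2 from the left."""
    area_from_left = 0.5 + confidence / 2
    return std_normal.inv_cdf(area_from_left)

z90 = z_star(0.90)
z95 = z_star(0.95)
print(round(z90, 3), round(z95, 3))
```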
It doesn't matter whether you read the table from the left side or the right side. What one should understand and appreciate is that the total area under the normal curve is 1. If we talk about an interval, we are using Z* or T*. If we talk about area and probability, then we are using Z or T.