Friday, May 17, 2019

Statistics Coursework

1st Hypothesis

For my first hypothesis I will investigate the relationship between the number of TV hours watched per week by the pupils and their IQ. I am going to use the columns IQ and Average number of hours TV watched per week taken from the Mayfield High datasheet.
I think that there will be a relationship between them and will attempt to reveal it.

2nd Hypothesis

For my second hypothesis I will investigate the relationship between Average number of TV hours watched per week and weight (kg). I think that there will not be any major relationship between them, as they will not affect each other greatly.

I will present my analysis and the results in graphs and tables, and explain the results using the correlation of the graphs and the arrangement of the figures.

I will select a number of pupils to base my data on and will use stratified sampling to ascertain the correct number of male and female pupils needed to make the investigation fair.

Stratified Sampling

I do not want to use all of the data in the database for my analysis, so I will need to take a sample of the pupils in the school. I would like to take about 10% of the overall figure. I will also need to use stratified sampling so that the sample has the same proportion of males and females as the school, to make it fair.

The total number of pupils at the school is 813, so I will take 10% as my number; 81.3 is rounded down to 81.

The overall ratio of boys to girls in the school is 414:399.

Now I will need to do my sampling:

Males = 414/813 multiplied by 81 = 41.2, rounded to 41
Females = 399/813 multiplied by 81 = 39.8, rounded to 40

Random Sampling

Now that I have the number of samples, I need to select the pupils I will be sampling. To do this I will use random sampling. I will take random samples until I have 81. I can do this in Excel using the following formula: =ROUND(RAND()*120, 0)

Once I have gathered the samples I am ready to start analyzing them.

Analysis

Hypothesis 1 Males

The first thing I need to do in my analysis is to analyze my graphs, which are the source of the investigation. I have created scatter graphs to show the relationship between the two data sources for my first hypothesis.
I have separated them into male and female graphs as there is a separation in the numbers.

First male scatter graph

This first graph presented a bit of a problem. There was an anomalous result that affected the trend line and the scale of the graph. I decided to create a new graph that didn't include that one piece of data. This way it would help me to analyze the rest of the data.

Second male scatter graph

This graph showed the data much more clearly and I could then start analyzing it. There is no correlation between the two sets of data. This means that it is unlikely that there is a relationship between IQ and Average number of TV hours watched per week, so it may be that my hypothesis is incorrect. There is only a very slight gradient on the trend line that leans towards a negative correlation, but the gradient is not steep enough to draw any conclusions about the relationship between the two sets of data. I will have to use the cumulative frequency graphs and box plots to see if any conclusions can be made.

Cumulative frequency graphs for IQ and Average number of TV hours watched per week

From these graphs I could create box plots and compare the two sets of data. Before that I analyzed the cumulative frequency graphs to draw initial conclusions. The majority of the IQs for males are between 90 and 105; this shows that the data is not very spread out, as this section only covers a small area of the graph. For the TV hours graph, again the data is concentrated in one main area, in this case between 5 and 25. There is almost a straight line near the top of the graph; this shows that there are likely to be some anomalous results, with no pupils between those results and the main bulk.
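
The idea of spotting anomalous results from the quartiles can be sketched in code. This is only a minimal Python sketch and not the method used in this coursework, where the graphs were read by eye. It assumes the common rule of flagging any value more than 1.5 interquartile ranges outside the quartiles, the function names are hypothetical, and the TV hours are made up for illustration rather than taken from the Mayfield data.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def anomalous(values, k=1.5):
    """Values more than k interquartile ranges outside the quartiles.

    The 1.5 default is an assumed convention, not a figure from the essay.
    """
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]


# Made-up TV hours per week, for illustration only (not the Mayfield data).
tv_hours = [5, 8, 10, 12, 14, 15, 18, 20, 22, 25, 170]

print(five_number_summary(tv_hours))
print(anomalous(tv_hours))
```

The five values returned by five_number_summary are the ones a box plot is drawn from, so a single very large value shows up as a long line beyond the upper quartile.
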
Now I will create box plots so I can compare the two graphs together.

Box plots for cumulative frequency graphs of IQ and average number of TV hours watched per week (for interquartile ranges look at copies of graphs at the back)

From the box plots I can see that the data spread is relatively the same apart from a possible anomalous result in the TV hours data. This similarity is the reason why the scatter graph had no correlation and therefore no relationship. This means that my hypothesis is wrong.

Hypothesis 1 Females

Again I will start with the scatter graphs. As with the male graph, I had an anomalous result that spread out the data and scaled down the graph so that most of the relevant data couldn't be analyzed. I then did another graph without that specific piece of data.

Scatter Graphs 1 and 2 to show the relationship between IQ and average number of TV hours watched per week for Females

As you can see on both graphs, there is no correlation between the two sets of data. This again means that my first hypothesis is unlikely to be correct. There is only a slight gradient on the trend line, which is not steep enough to draw any conclusions from. There is another anomalous result on the graph, but it doesn't affect the trend line or my conclusions, so I left it on the graph. I will now create cumulative frequency graphs to see if they can help me to draw conclusions.

Cumulative frequency graphs for the IQ and number of TV hours watched per week

I will now analyze the graphs before drawing box plots to compare them. The IQ graph is much more erratic, which means that the data is spread over a larger range, although there is one area, between 95 and 105, where the data is concentrated and the gradient very steep. The TV hours graph is much smoother and the data less spread. The number of hours increases steadily to a certain point, then the graph goes flat until the end. This means that there is an anomalous result somewhere.
I know that there can only be one or two anomalous results because the point where it goes flat is at about 38 and there are only 39 sets of data in the graph. I will now look at the box plots to compare the two cumulative frequency graphs.

Box plots for cumulative frequency graphs of IQ and number of TV hours watched for females

The box plots for these graphs show me that the IQ data has a much larger range and that it is quite evenly spread. I can see this because the interquartile range is quite large and the median is evenly placed. There may be a few exceptions, as one pupil is likely to have a very low IQ, which is why the lowest value is so low. The TV hours data seems to be much more concentrated and the values are generally lower. This shows that there can't be any relationship between them, as they are each grouped in certain areas. Also, the box plot for TV hours shows that there is likely to be an anomalous result, as the highest value is so far above the upper quartile.

Hypothesis 2 Males

In this hypothesis I will be comparing the Average number of TV hours watched per week and Weight, to see if there is any relationship between them. I will again start with Males and the scatter graphs.

Scatter graphs 1 and 2 to show the relationship between Weight and the Average number of TV hours watched per week for males

In these scatter graphs there is a slight negative correlation. This means that as the number of TV hours goes up, Weight goes down. This may not be an accurate graph, as there are a few anomalous results that may have caused the trend line to have that gradient. If this is so, my hypothesis would be correct; if it is not, the gradient of the trend line still isn't steep enough to say with certainty that the correlation is real.
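
How much one anomalous result can tilt a trend line can be illustrated with a short calculation. The sketch below is an assumption on my part rather than part of the original analysis, which relied on trend lines drawn on the scatter graphs: it computes Pearson's correlation coefficient by hand in Python, first with and then without a single extreme value. The data is made up for illustration and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up pupils, for illustration only (not the Mayfield data).
# The last pupil is the anomalous result: 170 TV hours in a week.
tv_hours = [5, 10, 15, 20, 25, 170]
weight_kg = [50, 40, 55, 45, 50, 35]

r_all = pearson_r(tv_hours, weight_kg)
r_trimmed = pearson_r(tv_hours[:-1], weight_kg[:-1])

print(f"with the anomalous result:    r = {r_all:.2f}")
print(f"without the anomalous result: r = {r_trimmed:.2f}")
```

With this made-up data the single extreme pupil is enough to produce a clear negative coefficient, while the remaining five pupils on their own show almost no correlation.
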
I will need to use the cumulative frequency graphs to draw complete conclusions.

Cumulative frequency graphs for the number of TV hours watched and Weights of males

These two graphs look quite different. The weights graph has most of its data concentrated in the middle of the range, between 30 and 50, and looks like a normal cumulative frequency curve, whereas the number of TV hours has most of its data concentrated at the beginning, between 0 and 30, showing that there is likely to be an anomalous result at the end of the range. These anomalous results on the TV hours graph are what caused the slight negative correlation on the trend line. I will be able to make complete conclusions after looking at the female sample and seeing if that graph follows suit. The box plots for these graphs will look quite different and will make it easy to make a simple comparison.

Box plots for Cumulative frequency graphs of Number of TV hours watched and Weight for males

From the box plots I can see that the two sets of data are almost identical in range, which would cause a straight line on the scatter graph; it is the anomalous results in the TV hours that caused the slight negative correlation. The weights box plot shows me that the data is quite evenly spread in the middle of the range, apart from a very heavy person at the end, which is why the highest figure is so far from the upper quartile. Overall the box plots show me that the similarity in the data means there is no relationship and my hypothesis was correct.

Hypothesis 2 Females

Again I will start with the scatter graphs to show the relationship between Number of TV hours watched and weight. The graphs should be similar to the males and the conclusions the same.
Again I had an anomalous result and had to create a second scatter graph without it.

Scatter graphs 1 and 2 to show the relationship between the Number of TV hours watched per week and Weight

The second scatter graph in this section, without the anomalous result, completely changed the trend line. The first graph looks a lot more like the male graph, whereas the second follows my hypothesis a lot better. In graph 1 there is a slight gradient which points towards a negative correlation, like that of the male sample. On the graph without the anomalous result there is clearly no correlation whatsoever, as the line is nearly horizontal. I will take the results of the male sample to be wrong because, as I said earlier, there are a few anomalous results which caused the trend line to have that gradient. Now I will look at the cumulative frequency graphs to see what results I get from them.

Cumulative frequency graphs for Average number of TV hours watched per week and Weight for Females

As on the males graph, the TV hours for females have a lot of anomalous results. But for the scatter graphs I cancelled them all out, which gave no correlation. If the line at the top of the TV hours graph is blanked out, the two graphs look almost identical. This is why the scatter graph got a near horizontal trend line. The box plots for these two graphs will look alike, except that there will be a much longer line at the end of the TV hours plot because of the anomalous results.

Box plots of cumulative frequency graphs for Number of TV hours watched and weights of females

These box plots show me the same as the males did, that the data is almost identical if placed one on top of the other. This is what caused the horizontal line in my scatter graphs and proves my hypothesis.

Conclusion

Hypothesis 1

My first hypothesis has been proved incorrect. The scatter graphs show that there is no correlation between the two sets of data.
For my hypothesis to have been correct there would have needed to be a strong positive correlation. The cumulative frequency graphs and box plots again proved my hypothesis incorrect: the similarities in the two sets of data's box plots showed that there was no relationship and showed why the scatter graphs showed a straight line. Both the male and female samples showed that my hypothesis was incorrect; although some anomalous results created a slight negative correlation in both, it was obvious that it was still wrong.

Hypothesis 2

My second hypothesis was proved correct. The scatter graphs showed that there was absolutely no correlation on the graphs, which means no relationship. Although the male graphs did show a negative correlation, it was shown to be caused by a few anomalous results, first by the cumulative frequency graphs and later by the inconsistency with the female sample. The female scatter graph showed a near horizontal trend line, which was what I needed to prove my hypothesis. The similarities on the cumulative frequency graphs and box plots further proved my hypothesis was correct.

Evaluation

The investigation went quite well. Although my first hypothesis was incorrect, it showed that careful analysis of data is needed before drawing conclusions. When I next do an investigation into data I will use histograms to aid me in my analysis, as they come in useful when looking for relationships in two sets of data, as the cumulative frequency graphs do. I could have made the cumulative frequency graphs a little better, as the program I used did not put a scale on the x axis but only the length of the range.
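
As a check on the sampling arithmetic used at the start of this investigation, the stratified sample sizes can be worked out in a few lines of Python. This is a minimal sketch under stated assumptions and not the spreadsheet method actually used: the function name is hypothetical, and numbering the boys from 1 to 414 for the random draw is an invented convention for illustration only.

```python
import random


def stratified_sizes(strata, sample_size):
    """Share sample_size between the strata in proportion to their counts."""
    total = sum(strata.values())
    return {name: round(count / total * sample_size)
            for name, count in strata.items()}


school = {"males": 414, "females": 399}    # counts given in the essay
total_pupils = sum(school.values())         # 813
sample_size = total_pupils // 10            # 10% of 813 is 81.3, rounded down to 81

sizes = stratified_sizes(school, sample_size)
print(sizes)

# Hypothetical numbering: boys are labelled 1 to 414, and 41 of them are
# drawn at random without repeats. The seed only makes the draw repeatable.
rng = random.Random(0)
chosen_males = rng.sample(range(1, school["males"] + 1), sizes["males"])
print(sorted(chosen_males)[:5])
```

Rounding each group separately happens to add up to 81 here, but that is not guaranteed for every set of numbers, so the total is worth checking after rounding.
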
