I am new to statistical analysis in RStudio. For my R class I was given a task to complete. We have crime statistics for a city:
Month      Crimes  Days in month
January    1680    31
February   1610    28
March      1750    31
April      1885    30
May        1887    31
June       1783    30
July       1698    31
August     1822    31
September  1735    30
October    1829    31
November   1780    30
December   1673    31
I need to check whether crime rates show seasonal variation or are stable from month to month (alpha = 0.05). Should I add a 'daily_crime_rate' column for each month and then perform a Pearson test, a t-test, or a chi-square test? Thank you in advance for any help. I am not really good at statistics, I just want to learn programming...
Kind regards, Mike
I tried calculating the average number of crimes and adding this vector to the data frame. I don't know if adding columns with percentage values is really needed...
Do you have some suggestions?
Hi Mike. First off, I would recommend putting in some time to get stronger at statistics, even though it can be difficult. R is a really powerful tool both for learning and experimenting with statistics and for using statistics to perform analyses. I have worked with a few programmers who didn't have strong mathematical skills, and it was a stumbling block for them.
That having been said, what you should do to complete this depends a little on what the teacher expects. You are correct that you should calculate the average daily number of crimes for each month, since months differ in length. If crime is not seasonal, you would expect the daily averages to be consistent across all months. If you calculate the mean and the standard deviation of those daily averages over the whole year, you can use them to build an approximate 95% range (alpha = 0.05), roughly the mean plus or minus two standard deviations. Monthly daily averages that fall outside that range would let us reject our "null hypothesis" that the variation is not seasonal.
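The steps described above can be sketched in R roughly as follows. This is only a minimal sketch: the data frame name `crime`, the column names, and the use of a normal approximation (`qnorm(0.975)`, about 1.96 standard deviations) for the 95% range are assumptions made for illustration, not something the assignment specifies.

```r
# Monthly totals copied from the question
crime <- data.frame(
  month  = factor(month.name, levels = month.name),
  crimes = c(1680, 1610, 1750, 1885, 1887, 1783,
             1698, 1822, 1735, 1829, 1780, 1673),
  days   = c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
)

# Average number of crimes per day in each month
crime$daily_rate <- crime$crimes / crime$days

# Mean and standard deviation of the twelve daily rates
rate_mean <- mean(crime$daily_rate)
rate_sd   <- sd(crime$daily_rate)

# Approximate 95% range (assumption: normal approximation)
lower <- rate_mean - qnorm(0.975) * rate_sd
upper <- rate_mean + qnorm(0.975) * rate_sd

# Flag months whose daily rate falls outside the range
crime$outside <- crime$daily_rate < lower | crime$daily_rate > upper

print(crime)
cat("Range:", round(lower, 2), "to", round(upper, 2), "\n")
```

With only twelve values, the standard deviation is estimated quite loosely, so the `outside` column is best read as a rough screen rather than a formal test.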
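FWIW, the Pearson correlation test and Student's t-test are for comparing two sets of data, and in this case you only have one. The chi-square test is the exception in its goodness-of-fit form: it compares one set of observed counts with the counts expected under the null hypothesis, which here would be crimes proportional to the number of days in each month. You could also conceivably break the year into four seasons and run an analysis of variance (ANOVA), but each season would have only three data points, so it might be hard to reach your significance level on season-to-season variation.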
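For reference, here is a hedged sketch of two tests that work with a single year of monthly counts: a one-way ANOVA on daily rates grouped by season, and a chi-square goodness-of-fit test, which takes one set of observed counts and compares it with expected proportions. The season assignment (December to February as winter, and so on) and all variable names are assumptions chosen for illustration; your teacher may define seasons differently.

```r
# Monthly totals and month lengths copied from the question
crimes <- c(1680, 1610, 1750, 1885, 1887, 1783,
            1698, 1822, 1735, 1829, 1780, 1673)
days   <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
daily_rate <- crimes / days

# Assumed season labels for January..December (meteorological seasons)
season <- factor(c("winter", "winter",
                   "spring", "spring", "spring",
                   "summer", "summer", "summer",
                   "autumn", "autumn", "autumn",
                   "winter"))

# One-way ANOVA: does the mean daily rate differ between seasons?
fit <- aov(daily_rate ~ season)
print(summary(fit))
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]

# Chi-square goodness-of-fit: observed monthly counts versus counts
# expected if crime were spread evenly over all days of the year
gof <- chisq.test(x = crimes, p = days / sum(days))
print(gof)

alpha <- 0.05
cat("ANOVA p-value:     ", anova_p, "\n")
cat("Chi-square p-value:", gof$p.value, "\n")
cat("Reject 'stable rate' at alpha = 0.05 (chi-square)?",
    gof$p.value < alpha, "\n")
```

Passing `p = days / sum(days)` is what accounts for months having different lengths; without it, `chisq.test` would assume every month should have the same count. Note that the chi-square test treats the counts as independent events, which is itself an assumption about how crimes occur.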
Unless I am badly misunderstanding something, I do not see how one can establish seasonality with data from only one year. In any one year the variation could be random.