I just started working with R and I would like to know how I can extract specific data from a database.
My database contains the year, the age, and the number of deaths for each year and age. My data looks like this:
Year Age Death
1955 0 150
1955 1 132
....
1955 109 0
1955 110 1
1956 0 126
.....
I want to extract the data that follows a specific group over time. For example, I want to keep only the data for age 0 in 1955, age 1 in 1956, age 2 in 1957, and so on, up to age 65 in 2020.
Can someone please explain to me how I can extract this data?
I am able to extract an entire year or an entire age, but not a specific age for a specific year.
When I try this on my database:
SelectedData <- death_filter %>% filter((Year - Age) == 1955)
SelectedData
#(death_filter is the name of my database)
I receive this error message:
Error in filter():
! Problem while computing ..1 = (Year - Age) == 1955.
Caused by error in Year - Age:
! non-numeric argument to binary operator
Run rlang::last_error() to see where the error occurred.
What you're showing us suggests that Age is a character vector. Try going back to your original version and substituting as.numeric(Age) where you had Age.
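One way to check this is to look at the column's class. The sketch below uses a small made-up data frame (`toy` is a hypothetical stand-in for death_filter) in which Age is stored as text. It triggers the same kind of error and then shows the as.numeric(Age) substitution, written in base R so it runs without extra packages.

```r
# Hypothetical stand-in for death_filter, with Age stored as text
toy <- data.frame(
  Year = c(1955, 1955, 1956, 1956, 1957),
  Age  = c("0", "1", "0", "1", "2"),
  stringsAsFactors = FALSE
)

class(toy$Age)   # "character"

# Subtracting a character column fails; capture the message instead of stopping
err <- tryCatch(toy$Year - toy$Age, error = function(e) conditionMessage(e))

# Converting the text to numbers first makes the arithmetic work
cohort <- toy[toy$Year - as.numeric(toy$Age) == 1955, ]
cohort
```

This only helps if Age really is character. If class() reports "factor" instead, as.numeric() alone will not return the printed values.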
When I run the code above, I run into a problem: it gives me the data table below, but what I need is age 2 for year 3, age 3 for year 4, and so on. Do you have any idea why? Thank you in advance.
Your Age is a factor. The following code illustrates the problem. I set the data frame's Age column to be a factor. When I print it, the values look like numbers, but they are character labels for the underlying factor levels. The Levels line printed with DF$Age shows that the levels are in alphabetical order: 1 is followed by 10, and 2 is the 6th entry. When I run the factor through as.numeric(), the result is the position of each value in the levels: 1 maps to 1, but 2 maps to 6 and 10 maps to 2. When I run DF$Age through as.character() and then as.numeric(), the result is what you expected from your data: the factor element whose character representation was 2 ends up with the numeric value 2.
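A minimal sketch of that behaviour, assuming a made-up data frame DF with ages 1 to 13 (the values are illustrative and not taken from the poster's data):

```r
# Illustrative data: Age stored as a factor built from text
DF <- data.frame(Year = 1955:1967, Age = factor(as.character(1:13)))

DF$Age           # prints the values plus a Levels line
levels(DF$Age)   # alphabetical order: "1" "10" "11" "12" "13" "2" ...

# as.numeric() on a factor returns the level positions, not the labels
wrong <- as.numeric(DF$Age)

# Going through as.character() first recovers the printed values
right <- as.numeric(as.character(DF$Age))

data.frame(label = as.character(DF$Age), wrong = wrong, right = right)
```

In the last table, the label 2 sits next to 6 in the wrong column and next to 2 in the right column.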
The best solution to your problem would be to change the process of reading in the data so that Age is numeric. How are the data getting into the data frame?
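How the file is actually read is not known here, so the following is only a sketch under assumptions: it supposes a whitespace-separated text file with a header row and uses read.table() with its colClasses argument to fix the column types at read time. The `raw` string is a made-up stand-in for the real file.

```r
# Stand-in for the real file, whose name and format are unknown
raw <- "Year Age Death
1955 0 150
1955 1 132
1956 0 126"

# colClasses forces each column's type while reading
deaths <- read.table(text = raw, header = TRUE,
                     colClasses = c("integer", "integer", "integer"))

str(deaths)
```

With a real file, the `text = raw` argument would be replaced by the file path. If a column contains entries that are not numbers, read.table() will stop with an error, which at least points to the rows that need cleaning.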
If you cannot change how the data are read into the data frame, use as.character() followed by as.numeric() to get the correct result.
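Applied to the original question, the conversion might look like the sketch below. The data frame is a small invented sample standing in for death_filter, and the base R subsetting is just one way to write the filter.

```r
# Invented sample standing in for death_filter, with Age as a factor
death_filter <- data.frame(
  Year  = c(1955, 1955, 1956, 1956, 1957, 1957),
  Age   = factor(c("0", "10", "1", "2", "2", "10")),
  Death = c(150, 12, 98, 87, 75, 9)
)

# Convert the factor labels to numbers once, up front
death_filter$Age <- as.numeric(as.character(death_filter$Age))

# Keep the group that is age 0 in 1955, age 1 in 1956, ..., up to age 65
keep <- death_filter$Year - death_filter$Age == 1955 & death_filter$Age <= 65
SelectedData <- death_filter[keep, ]
SelectedData

# With dplyr loaded, the equivalent would be:
# SelectedData <- death_filter %>% filter(Year - Age == 1955, Age <= 65)
```

Converting the column once, rather than inside every filter call, keeps later steps from accidentally using the factor codes.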