Extract specific data from database on RStudio

Dear all,

I just started working on R and I would like to know how I can extract specific data from a database.
My database contains the year, the age and the number of death for a specific year and age. My data looks like this :
Year Age Death
1955 0 150
1955 1 132
....
1955 109 0
1955 110 1
1956 0 126
.....
I want to be able to extract the data following a specific group, for example, I want to only keep the data for the age 0 in 1955, the age 1 in 1956, the age 2 in 1957 ... until the age 65 for the year 2020.
Can someone please explain me how I can extract this data ?
I am able to extract an entire year or age but not a specific age for a specific year.

Thank you in advance.

You want to extract the data for a given year-of-birth, if I understand you correctly.

library(dplyr)
DF <- data.frame(Year = rep(1955:2020, each = 100), Age = rep(0:99, 66), Death = sample(20:60, 6600, replace = TRUE))

SelectedData <- DF %>% filter((Year - Age) == 1955)
SelectedData
#>    Year Age Death
#> 1  1955   0    43
#> 2  1956   1    41
#> 3  1957   2    30
#> 4  1958   3    21
#> 5  1959   4    23
#> 6  1960   5    55
#> 7  1961   6    31
#> 8  1962   7    45
#> 9  1963   8    54
#> 10 1964   9    45
#> 11 1965  10    27
#> 12 1966  11    51
#> 13 1967  12    58
#> 14 1968  13    20
#> 15 1969  14    20
#> 16 1970  15    33
#> 17 1971  16    57
#> 18 1972  17    29
#> 19 1973  18    33
#> 20 1974  19    34
#> 21 1975  20    36
#> 22 1976  21    22
#> 23 1977  22    22
#> 24 1978  23    24
#> 25 1979  24    39
#> 26 1980  25    48
#> 27 1981  26    58
#> 28 1982  27    45
#> 29 1983  28    33
#> 30 1984  29    54
#> 31 1985  30    29
#> 32 1986  31    36
#> 33 1987  32    30
#> 34 1988  33    60
#> 35 1989  34    26
#> 36 1990  35    57
#> 37 1991  36    40
#> 38 1992  37    43
#> 39 1993  38    58
#> 40 1994  39    20
#> 41 1995  40    22
#> 42 1996  41    26
#> 43 1997  42    40
#> 44 1998  43    52
#> 45 1999  44    38
#> 46 2000  45    25
#> 47 2001  46    32
#> 48 2002  47    30
#> 49 2003  48    45
#> 50 2004  49    33
#> 51 2005  50    41
#> 52 2006  51    21
#> 53 2007  52    26
#> 54 2008  53    40
#> 55 2009  54    49
#> 56 2010  55    26
#> 57 2011  56    25
#> 58 2012  57    26
#> 59 2013  58    44
#> 60 2014  59    22
#> 61 2015  60    24
#> 62 2016  61    27
#> 63 2017  62    33
#> 64 2018  63    21
#> 65 2019  64    48
#> 66 2020  65    48

Created on 2022-04-04 by the reprex package (v0.2.1)

Thank you for your response!

When I try it on my database :
SelectedData <- death_filter %>% filter((Year - Age) == 1955)
SelectedData

#(death_filter is the name of my database)

I receive this error message :

Error in filter():
! Problem while computing ..1 = (Year - Age) == 1955.
Caused by error in Year - Age:
! non-numeric argument to binary operator
Run rlang::last_error() to see where the error occurred.

Can you maybe explain me the reason ?

Almost certainly either Year or Age is a character rather than numeric. Try

str(Year)
str(Age)

I receive this error now :

library(dplyr)
SelectedData <- death_filter %>% filter((str(Year) - str(Age) == 1955))
int [1:7326] 1955 1955 1955 1955 1955 1955 1955 1955 1955 1955 ...
chr [1:7326] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" ...
Error in filter():
! Problem while computing ..1 = (str(Year) - str(Age) == 1955).
x Input ..1 must be of size 7326 or 1, not size 0.
Run rlang::last_error() to see where the error occurred.

Sorry, I wasn't clear. I just meant to type

str(death_filter$Age)

into the console. But what you're showing us suggests that Age is a character vector. Try going back to your original version and substituting as.numeric(Age) where you had Age.

No worries I am a real beginner so I don't understand a lot yet but I guess it will get easier with time.

But it worked! Thanks to both of you!

Hello, I am working on the same project as Mathias.

death_filter <- filter(Deaths_1x1, Year > 1954)
SelectedData <- death_filter %>% filter((Year - as.numeric(Age) == 1955))
SelectedData

When I run the code I wrote above, I have an issue. Indeed, it gives me the data table below but what I need is Age 2 for year 3, age 3 for year 4 and so on. Do you have any idea why? Thank you in advance

Year Age Female Male Total
1 1956 0 972 1300 2272
2 1957 1 95 129 224
3 1958 10 14 21 35
4 1959 100 2 6 8
5 1960 101 2 0 2
6 1961 102 1 2 3
7 1962 103 4 0 4
8 1963 104 1 1 2
9 1964 105 0 0 0
10 1965 106 0 0 0
11 1966 107 0 0 0
12 1967 108 1 1 2
13 1968 109 0 0 0
14 1969 11 9 17 26
15 1970 110+ 0 0 0
16 1971 12 13 13 26
17 1972 13 15 21 36
18 1973 14 11 33 44
19 1974 15 16 42 58
20 1975 16 17 45 62
21 1976 17 18 44 62
22 1977 18 21 64 85
....

Your Age is a factor. The following code illustrates the problem. I set the data frame's Age column to be a factor. When I print it out, the values look like numbers but those are character representations of the underlying factor levels. The Levels display of the printing or DF$Age shows that the levels are in alphabetical order, 1 is followed by 10 and 2 is the 6th entry. When I run the factor through as.numeric(), the result is the ordering of the levels. 1 maps to 1 but 2 maps to 6 and 10 maps to 2. When I run DF$Age through as.character() and then as.numeric() the result is what you expected in your data. The factor element whose character representation was 2 ends up with the numeric value of 2.

DF <- data.frame(Age = c("1", "2", "3", "10", "11", "100", "101"),
+                  stringsAsFactors = TRUE)
> DF$Age
[1] 1   2   3   10  11  100 101
Levels: 1 10 100 101 11 2 3
> as.numeric(DF$Age)
[1] 1 6 7 2 5 3 4
> as.numeric(as.character(DF$Age))
[1]   1   2   3  10  11 100 101

The best solution to your problem would be to change the process of reading in the data so that Age is numeric. How are the data getting into the data frame?
If you cannot change how the data are read into the data frame, use as.character() followed by as.numeric() to get the correct result.

It worked. Thanks a lot!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.