Error finding Column name

RUserious · January 10, 2024, 12:31am

I used this code for another dataset and it worked fine

birth_data_summary=birth_data %>% group_by(PLACE OF BIRTH - NAME OF HOSPITAL OR FACILITY) %>% summarise(mean(diffdates),sd(diffdates))

The columns are PLACE OF BIRTH and NAME OF HOSPITAL OR FACILITY were successfully found, but when I used the same code for my other dataset

death_data_summary=death_data %>% group_by(DATE OF DEATH - NAME OF FUNERAL ESTABLISHMENT ) %>% summarise(mean(diffdates),sd(diffdates))

and used the column names DATE OF DEATH and NAME OF FUNERAL ESTABLISHED, I received this error message

Error in group_by():
! Must group by variables found in .data.
Column DATE OF DEATH - NAME OF FUNERAL ESTABLISHMENT is not found.
Run rlang::last_trace() to see where the error occurred.

When I use this code

colnames(death_data). The column names are listed.

How do I correct this error?

[1] "STATE FILE NUMBER"
[2] "LOCAL REGISTRATION NUMBER"
[3] "FIRST NAME"
[4] "MIDDLE NAME"
[5] "LAST NAME"
[6] "RECORD STATUS"
[7] "NAME TRUNCATION CODE"
[8] "DATE OF BIRTH"
[9] "YEAR OF BIRTH"
[10] "MONTH OF BIRTH"
[11] "DAY OF BIRTH"
[12] "AGE IN YEARS"
[13] "AGE IN MONTHS"
[14] "AGE IN DAYS"
[15] "AGE IN HOURS"
[16] "AGE IN MINUTES"
[17] "AGE TYPE"
[18] "AGE UNIT"
[19] "SEX"
[20] "DATE OF DEATH"
[21] "HOUR OF DEATH (24 HOUR CLOCK )"
[22] "FND"
[23] "EST"
[24] "YEAR OF DEATH"
[25] "MONTH OF DEATH"
[26] "DAY OF DEATH"
[27] "BIRTH STATE/FOREIGN COUNTRY (TEXT)"
[28] "BIRTH STATE/FOREIGN COUNTRY (CODE)"
[29] "BIRTH COUNTRY"
[30] "BIRTH STATE/PROVINCE"
[31] "SOCIAL SECURITY NUMBER"
[32] "EVER IN US ARMED FORCES?"
[33] "MARITAL STATUS"
[34] "EDUCATION - HIGHEST LEVEL/DEGREE"
[35] "EDUCATION - HIGHEST LEVEL/DEGREE (NCHS CODE)"
[36] "HISPANIC ORIGIN"
[37] "HISPANIC ANCESTRY/ORIGIN (CODE)"
[38] "SPECIFY HISPANIC (TEXT)"
[39] "RACE #1 CODE (PRELIMINARY)"
[40] "RACE #2 CODE (PRELIMINARY)"
[41] "RACE #3 CODE (PRELIMINARY)"
[42] "RACE #1 (TEXT)"
[43] "RACE #2 (TEXT)"
[44] "RACE #3 (TEXT)"
[45] "RACE #1 CODE (FINAL)"
[46] "RACE #2 CODE (FINAL)"
[47] "RACE #3 CODE (FINAL)"
[48] "RACE #1 (NCHS TEXT)"
[49] "RACE #2 (NCHS TEXT)"
[50] "RACE #3 (NCHS TEXT)"
[51] "MULTI-RACE STATUS"
[52] "OCCUPATION"
[53] "BUSINESS OR INDUSTRY"
[54] "YEARS IN OCCUPATION"
[55] "DECEDENT'S RESIDENCE STREET NUMBER"
[56] "DECEDENT'S RESIDENCE STREET NAME"
[57] "DECEDENT'S CITY OF RESIDENCE (TEXT)"
[58] "DECEDENT'S CITY OF RESIDENCE (CODE)"
[59] "DECEDENT'S COUNTY/PROVINCE OF RESIDENCE (CDPH CODE)"
[60] "DECEDENT'S COUNTY/PROVINCE OF RESIDENCE (TEXT)"
[61] "DECEDENT'S COUNTY OF RESIDENCE (NCHS CODE)"
[62] "FINAL DECEDENT'S COUNTY OF RESIDENCE BASED ON GEOCODE (NCHS CODE)"
[63] "DECEDENT'S COUNTY OF RESIDENCE BASED ON GEOCODE (TEXT)"
[64] "DECEDENT'S COUNTY OF RESIDENCE BASED ON GEOCODE (CDPH CODE)"
[65] "DECEDENT'S COUNTY OF RESIDENCE BASED ON GEOCODE (NCHS CODE)"
[66] "DECEDENT'S ZIP CODE OF RESIDENCE 5"
[67] "DECEDENT'S ZIP CODE OF RESIDENCE 4"
[68] "STATE/FOREIGN COUNTRY OF RESIDENCE (TEXT)"
[69] "STATE/FOREIGN COUNTRY OF RESIDENCE (CODE)"
[70] "DECEDENT RESIDENCE COUNTRY (CODE)"
[71] "RESIDENCE STATE/PROVINCE"
[72] "YEARS IN COUNTY"
[73] "DECEDENT RESIDENCE CENSUS TRACT"
[74] "DECEDENT RESIDENCE LATITUDE"
[75] "DECEDENT RESIDENCE LONGITUDE"
[76] "INFORMANT'S RELATIONSHIP"
[77] "INFORMANT'S FIRST NAME"
[78] "INFORMANT'S MIDDLE NAME"
[79] "INFORMANT'S LAST NAME"
[80] "INFORMANT MAILING ADDRESS STREET NUMBER"
[81] "INFORMANT MAILING ADDRESS STREET NAME"
[82] "INFORMANT MAILING ADDRESS APT/SUITE/UNIT #"
[83] "INFORMANT MAILING CITY"
[84] "INFORMANT MAILING STATE"
[85] "INFORMANT MAILING ZIP CODE 5"
[86] "INFORMANT MAILING ZIP CODE 4"
[87] "NAME OF SURVIVING SPOUSE -- FIRST"
[88] "SPOUSE'S MIDDLE NAME"
[89] "SPOUSE'S LAST (MAIDEN NAME)"
[90] "NAME OF FATHER \x96 FIRST"
[91] "MIDDLE NAME OF FATHER"
[92] "LAST (FAMILY) NAME OF FATHER"
[93] "FATHER'S STATE OF BIRTH (TEXT)"
[94] "FATHER'S STATE OF BIRTH (CODE)"
[95] "FATHER'S COUNTRY OF BIRTH (CODE)"
[96] "FATHER'S STATE OF BIRTH (CODE)"
[97] "FIRST (GIVEN) NAME OF MOTHER"
[98] "MIDDLE NAME OF MOTHER"
[99] "LAST (MAIDEN) NAME OF MOTHER"
[100] "MOTHER'S STATE OF BIRTH (TEXT)"
[101] "MOTHER'S STATE OF BIRTH (CODE)"
[102] "MOTHER'S COUNTRY OF BIRTH (CODE)"
[103] "MOTHER'S STATE OF BIRTH (CODE)"
[104] "DISPOSITION DATE"
[105] "PLACE OF FINAL DISPOSITION (DESCRIPTION)"
[106] "PLACE OF FINAL DISPOSITION ADDRESS STREET NUMBER"
[107] "PLACE OF FINAL DISPOSITION ADDRESS STREET NAME"
[108] "PLACE OF FINAL DISPOSITION ADDRESS CITY"
[109] "PLACE OF FINAL DISPOSITION STATE"
[110] "DISPOSITION STATE (CODE)"
[111] "DISPOSITION COUNTRY (CODE)"
[112] "PLACE OF FINAL DISPOSITION ZIP CODE 5"
[113] "PLACE OF FINAL DISPOSITION ZIP CODE 4"
[114] "TYPE OF DISPOSITION (TEXT)"
[115] "TYPE OF DISPOSITION (CODE)"
[116] "SIGNATURE OF EMBALMER"
[117] "EMBALMER'S LICENSE NUMBER"
[118] "NAME OF FUNERAL ESTABLISHMENT"
[119] "FUNERAL Establishment LICENSE NUMBER"
[120] "NAME OF LOCAL REGISTRAR"
[121] "DATE ACCEPTED BY LOCAL REGISTRAR"
[122] "DEATH FACILITY NAME/LOCATION"
[123] "DEATH IN HOSPITAL (CODE)"
[124] "DEATH OUT OF HOSPITAL (CODE)"
[125] "PLACE OF DEATH (FACILITY)"
[126] "COUNTY OF DEATH (TEXT)"
[127] "COUNTY OF DEATH (CODE)"
[128] "DEATH COUNTY (CODE)"
[129] "LOCAL REGISTRATION DISTRICT"
[130] "DEATH STATE"
[131] "DEATH COUNTRY"
[132] "PLACE OF DEATH ADDRESS STREET NUMBER"
[133] "PLACE OF DEATH ADDRESS STREET NAME"
[134] "CITY WHERE DEATH OCCURRED"
[135] "IMMEDIATE CAUSE OF DEATH"
[136] "APPROX INTERVAL BETWEEN ONSET & DEATH"
[137] "A IS DUE TO OR A CONSEQUENCE OF B"
[138] "APPROX. INTERVAL BETWEEN ONSET & DEATH"
[139] "B IS DUE TO OR A CONSEQUENCE OF C"
[140] "APPROX. INTERVAL BETWEEN ONSET & DEATH"
[141] "C IS DUE TO OR AS A CONSEQUENCE OF D"
[142] "APPROX. INTERVAL BETWEEN ONSET & DEATH"
[143] "PRELIMINARY CAUSE OF DEATH (ICD-10 CODE)"
[144] "FINAL CAUSE OF DEATH (ICD10 CODE)"
[145] "FINAL CAUSE OF DEATH (ICD-10 CODE) (DATE RECEIVED)"
[146] "DEATH REPORTED TO CORONER"
[147] "REFERRAL NUMBER"
[148] "BIOPSY PERFORMED"
[149] "AUTOPSY PERFORMED"
[150] "AUTOPSY USED TO DETERMINE CAUSE"
[151] "OTHER SIGNIFICANT CONDITIONS"
[152] "OPERATION PERFORMED"
[153] "OPERATION (CODE)"
[154] "PREGNANCY STATUS"
[155] "PREGNANCY STATUS (CODE)"
[156] "LAST DATE PHYSICIAN ATTENDED DECEDENT"
[157] "DATE PHYSICIAN LAST SAW DECEDENT ALIVE"
[158] "NAME/SIGNATURE/TITLE OF CERTIFIER"
[159] "CERTIFIER LICENSE NUMBER"
[160] "DATE CERTIFIER SIGNED"
[161] "ATTENDING PHYSICIAN'S FIRST, MIDDLE, LAST NAME "
[162] "ATTENDING PHYSICIAN'S ADDRESS STREET NUMBER"
[163] "ATTENDING PHYSICIAN'S ADDRESS STREET NAME"
[164] "ATTENDING PHYSICIAN'S CITY"
[165] "ATTENDING PHYSICIAN'S STATE"
[166] "ATTENDING PHYSICIAN'S ZIP CODE 5"
[167] "ATTENDING PHYSICIAN'S ZIP CODE 4"
[168] "MANNER OF DEATH"
[169] "INJURED AT WORK"
[170] "INJURY DATE"
[171] "INJURY YEAR"
[172] "INJURY MONTH"
[173] "INJURY DAY"
[174] "INJURY HOUR (24 HOUR CLOCK)"
[175] "PLACE OF INJURY"
[176] "PLACE OF INJURY (EDRS CODE)"
[177] "ENVIRONMENTAL PLACE OF INJURY (CODE)"
[178] "DESCRIBE HOW INJURY OCCURRED"
[179] "LOCATION OF PLACE OF INJURY ADDRESS STREET NUMBER"
[180] "LOCATION OF PLACE OF INJURY ADDRESS STREET NAME"
[181] "LOCATION OF PLACE OF INJURY CITY"
[182] "LOCATION OF PLACE OF INJURY STATE"
[183] "LOCATION OF PLACE OF INJURY ZIP CODE 5"
[184] "LOCATION OF PLACE OF INJURY ZIP CODE 4"
[185] "SIGNATURE OF CORONER/DEPUTY CORONER"
[186] "DATE CORONER SIGNED"
[187] "CORONER'S NAME AND TITLE"
[188] "TYPE OF CERTIFIER"
[189] "STATE REGISTRATION DATE"
[190] "MONTH DEATH CERTIFICATE PROCESSED"
[191] "INJURYLOCATIONDESCRIPTION"
[192] "RESIDENCE CENSUS_TRACT_2010"
[193] "RESIDENCE LATITUDE"
[194] "RESIDENCE LONGITUDE"
[195] "PLACE OF DEATH CENSUS_TRACT_2008"
[196] "PLACE OF DEATH LATITUDE"
[197] "PLACE OF DEATH LONGITUDE"
[198] "PLACE OF INJURY CENSUS_TRACT_2008"
[199] "PLACE OF INJURY LATITUDE"
[200] "PLACE OF INJURY LONGITUDE"
[201] "ENTITY AXIS CODE 1"
[202] "ENTITY AXIS CODE 2"
[203] "ENTITY AXIS CODE 3"
[204] "ENTITY AXIS CODE 4"
[205] "ENTITY AXIS CODE 5"
[206] "ENTITY AXIS CODE 6"
[207] "ENTITY AXIS CODE 7"
[208] "ENTITY AXIS CODE 8"
[209] "ENTITY AXIS CODE 9"
[210] "ENTITY AXIS CODE 10"
[211] "ENTITY AXIS CODE 11"
[212] "ENTITY AXIS CODE 12"
[213] "ENTITY AXIS CODE 13"
[214] "ENTITY AXIS CODE 14"
[215] "ENTITY AXIS CODE 15"
[216] "ENTITY AXIS CODE 16"
[217] "ENTITY AXIS CODE 17"
[218] "ENTITY AXIS CODE 18"
[219] "ENTITY AXIS CODE 19"
[220] "ENTITY AXIS CODE 20"
[221] "RECORD AXIS CODE 1"
[222] "RECORD AXIS CODE 2"
[223] "RECORD AXIS CODE 3"
[224] "RECORD AXIS CODE 4"
[225] "RECORD AXIS CODE 5"
[226] "RECORD AXIS CODE 6"
[227] "RECORD AXIS CODE 7"
[228] "RECORD AXIS CODE 8"
[229] "RECORD AXIS CODE 9"
[230] "RECORD AXIS CODE 10"
[231] "RECORD AXIS CODE 11"
[232] "RECORD AXIS CODE 12"
[233] "RECORD AXIS CODE 13"
[234] "RECORD AXIS CODE 14"
[235] "RECORD AXIS CODE 15"
[236] "RECORD AXIS CODE 16"
[237] "RECORD AXIS CODE 17"
[238] "RECORD AXIS CODE 18"
[239] "RECORD AXIS CODE 19"
[240] "RECORD AXIS CODE 20"
[241] "SSN PASSED VALIDATION"
[242] "PLACE OF DEATH ZIP CODE"
[243] "PLACE OF DEATH ZIP CODE 4 DIGIT EXTENSION"
[244] "NIOSH INDUSTRY CODE"
[245] "NIOSH OCCUPATION CODE"
[246] "HOMELESS STATUS"
[247] "HOMELESS STATUS, SPECIFY"
[248] "DISASTER-RELATED"
[249] "DISASTER-RELATED, SPECIFY"
[250] "DATE AND TIME RECORD LAST MODIFIED"
[251] "EXPORT DATE/TIME"
[252] "datedeath"
[253] "dateofregistration"
[254] "diffdates"

FJCC · January 10, 2024, 12:51am

This error

Column DATE OF DEATH - NAME OF FUNERAL ESTABLISHMENT is not found.

suggests that the group_by function is looking for one column named DATE OF DEATH - NAME OF FUNERAL ESTABLISHMENT. Your group_by should look like

group_by(`DATE OF DEATH`, `NAME OF FUNERAL ESTABLISHMENT`)

When posting code or data, please put a line with three back ticks just before and after the content. That will format the code or data in a way that makes it much easier to read. Here is and example. Typing this in your post
```
group_by(DATE OF DEATH, NAME OF FUNERAL ESTABLISHMENT)
```

looks like this when posted

group_by(`DATE OF DEATH`, `NAME OF FUNERAL ESTABLISHMENT`)

RUserious · January 11, 2024, 12:54am

Thank you for your response. The code still didn't work. This is the next error and update code.

> death_data_summary=death_data %>% group_by(`DATE OF DEATH`,`NAME OF FUNERAL ESTABLISHMENT `) %>% summarise(mean(diffdates),sd(diffdates)) Error in `group_by()`:

ERROR MESSAGE

```! Must group by variables found in .data.
Column NAME OF FUNERAL ESTABLISHMENT is not found.
Run rlang::last_trace() to see where the error occurred.

#birth data grouped by place of birth and averages
death_data_summary=death_data %>% group_by(DATE OF DEATH, NAME OF FUNERAL ESTABLISHMENT)%>% summarise(mean(diffdates),sd(diffdates))
Error in as_tibble():
! Column names FATHER'S STATE OF BIRTH (CODE), MOTHER'S STATE OF BIRTH (CODE), APPROX. INTERVAL BETWEEN ONSET & DEATH, and APPROX. INTERVAL BETWEEN ONSET & DEATH must not be
duplicated.
Use .name_repair to specify repair.
Caused by error in repaired_names():
! Names must be unique.
These names are duplicated:

"FATHER'S STATE OF BIRTH (CODE)" at locations 94 and 96.
"MOTHER'S STATE OF BIRTH (CODE)" at locations 101 and 103.
"APPROX. INTERVAL BETWEEN ONSET & DEATH" at locations 138, 140, and 142.
Run rlang::last_trace() to see where the error occurred.```

FJCC · January 11, 2024, 2:47am

This will be easier to solve if I have some of your data. Please run the following code

dput(death_data[1:30, c(20, 118, 254)]

Post the output of that here. That will give me 30 rows of the columns DATE OF DEATH, NAME OF FUNERAL ESTABLISHMENT, and diffdates.

rene_at_coco · January 11, 2024, 10:23pm

Here are a few tips for working this these types of data. This is especially helpful with messy variable names.

clean names

Use janitor::clean_names() to clean up the variable names to make it easier to work with.

birth_data_clean <- birth_data %>%
  clean_names() 
  
birth_data_clean %>%
  names()

get a glimpse of the data

Get a dplyr::glimpse() of the data to help keep yourself grounded with the different variables and their values.

birth_data_clean %>%
  glimpse()

Also be sure not to dput any actual real world vital statistics data as it would likely violate your use of the data.

RUserious · January 18, 2024, 11:39pm

This code still didn't correct the issue.

FJCC · January 19, 2024, 3:13am

FJCC:

This will be easier to solve if I have some of your data. Please run the following code
dput(death_data[1:30, c(20, 118, 254)]
Post the output of that here. That will give me 30 rows of the columns DATE OF DEATH, NAME OF FUNERAL ESTABLISHMENT, and diffdates.

Please post some data here as I requested previously.

RamRS · January 19, 2024, 9:36pm

This has been mentioned to you in a past thread as well, but please use janitor to clean your names, and format your code and results properly when you post them here. Please make it easy for forum users to help you.

nirgrahamuk · January 20, 2024, 12:17pm

[quote="RUserious, post:3, topic:179998"]
> death_data_summary=death_data %>% group_by(`DATE OF DEATH`,`NAME OF FUNERAL ESTABLISHMENT `) %>% summarise(mean(diffdates),sd(diffdates)) Error in `group_by()`:
[/quote]

best guess is that RUserious put an extra space between the T of establishmnent and the final backtick mark, making the symbol name not match the same as without the space.

So this a typographical error.

RUserious · January 22, 2024, 5:40pm

My code with revision you suggested

death_data_summary=death_data %>% group_by(`DATE OF DEATH`, `NAME OF FUNERAL ESTABLISHMENT`)%>% summarise(mean(diffdates),sd(diffdates))

Error message

```Error in as_tibble():
! Column names FATHER'S STATE OF BIRTH (CODE), MOTHER'S STATE OF BIRTH (CODE), APPROX. INTERVAL BETWEEN ONSET & DEATH, and APPROX. INTERVAL BETWEEN ONSET & DEATH must not be
duplicated.
Use .name_repair to specify repair.
Caused by error in repaired_names():
! Names must be unique.
These names are duplicated:

"FATHER'S STATE OF BIRTH (CODE)" at locations 94 and 96.
"MOTHER'S STATE OF BIRTH (CODE)" at locations 101 and 103.
"APPROX. INTERVAL BETWEEN ONSET & DEATH" at locations 138, 140, and 142.
Run rlang::last_trace() to see where the error occurred.
Warning messages:
1: In grep("^[.].", names) : unable to translate 'NAME OF FATHER <96> FIRST' to a wide string 2: In grep("^[.][.](?:[.]|[1-9][0-9]*)", names) :
input string 90 is invalid ```

Then your suggested code

dput(death_data[1:30, c(20, 118, 254)]

Error message

"dput(death_data[1:30, c(20, 118, 254)]
death_data_summary" ```

nirgrahamuk · January 22, 2024, 7:11pm

if your error is about the presence of duplicate names; this implies you did not run janitor::clean_names on that data prior, as that would have assigned a unique name.
see this example

# making a frame with duplicate name
dfx <- data.frame(
  dup_name_1=1:6,
  dup_name_2=1:6
) 
names(dfx)[2] <- names(dfx)[1]
dfx

#cleaning it 
(cleaned_dfx <- janitor::clean_names(dfx))

  dup_name_1 dup_name_1_2
1          1            1
2          2            2
3          3            3
4          4            4
5          5            5
6          6            6

rene_at_coco · January 22, 2024, 8:11pm

There is no one right or wrong way to work in R, but there are things you can do to make it go easier on yourself.

When reading vital statistics data, I use something like this:

birth_data <- haven::read_sas("the_file_to_read") %>%
    janitor::clean_names() %>%
    dplyr::mutate(across(.cols = everything(), as.character))

I then usually inspect the data with:

dplyr::glimpse(birth_data)

At that point, I have a better grasp of the data and how I should proceed. It seems like this is where you're getting hung up - making messy data more tidy.

system · March 4, 2024, 8:12pm

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.