Column names improperly read into RStudio

RUserious · January 5, 2024, 12:27am

I'm very new to R and RStudio. I am importing a dataset into R from Excel, and I copied and pasted the column names from one Excel file to the other. When I import the dataset the column names appear, but above them is another row of column names labeled V1-V215. What code do I use to remove them?

FJCC · January 5, 2024, 1:07am

The best way to avoid the V1-V215 column names is to tell the function importing the data to treat the first row as column names. What command are you using to import the data?

RUserious · January 5, 2024, 5:58pm

birth_data <- import("ccbf__20220101_20230814.csv", header = FALSE)

birth_data_dictionary <- import("ccbf_data_dictionary_v2.3.3.xlsx")

death_data_dictionary <- import("ccdf_data_dictionary_v2.3.1.csv")

death_data <- import("ccdf__20220101_20230814.csv", header = FALSE)

I'm using this code to set the column names for my death data:

colnames(death_data) = colnames(death_data_dictionary)

I haven't had a problem with headers for this code. I was thinking about using this code for the other dataset, but wanted to see if I can correct the problem first.

I hope this helps.

jrkrideau · January 5, 2024, 6:28pm

As FJCC says.

There is a good chance that some of your column names are numeric, so R thinks that row is the first row of your data and not the names.

I am guessing, but try header = TRUE, like this:

birth_data <- import("ccbf__20220101_20230814.csv", header = TRUE)
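To make the difference concrete, here is a minimal sketch. It uses base R's `read.csv()` rather than `import()` (which is presumably from the rio package) so that it runs without extra packages. The temporary file and the column names `id` and `weight_g` are made up for illustration.

```r
# Write a tiny CSV whose first row holds the column names (made-up data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,weight_g", "1,3200", "2,2950"), csv_path)

# header = FALSE: the name row is read as data and the columns become V1, V2, ...
no_header <- read.csv(csv_path, header = FALSE)
names(no_header)

# header = TRUE: the first row supplies the column names instead.
with_header <- read.csv(csv_path, header = TRUE)
names(with_header)
```

With header = FALSE the data frame also gains an extra row, because the names are counted as data.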

RUserious · January 5, 2024, 6:54pm

Thank you so much! That solved the problem.

RUserious · January 5, 2024, 7:02pm

I have another error message for a different line of code. I'm not sure if I should create a new post. This is the code:

birth_data_summary=birth_data %>% group_by(PLACE OF BIRTH - NAME OF HOSPITAL OR FACILITY) %>% summarise(mean(diffdates),sd(diffdates))

This is the error message:

Error in as_tibble():
! Column name SPECIFY HISPANIC (TEXT) must not be duplicated.
Use .name_repair to specify repair.
Caused by error in repaired_names():
! Names must be unique.
These names are duplicated:

"SPECIFY HISPANIC (TEXT)" at locations 101 and 119.
Run rlang::last_trace() to see where the error occurred.
Warning messages:
1: In grep("^[1].", names) :
  unable to translate 'TYPE OF BIRTH (PLURALITY <96> THIS PREGNANCY)' to a wide string
2: In grep("^[.][.](?:[.]|[1-9][0-9]*)", names) :
  input string 16 is invalid
3: In grep("^[2].", names) :
  unable to translate 'BIRTH ORDER (PLURALITY <96> THIS PREGNANCY)' to a wide string
4: In grep("^[.][.](?:[.]|[1-9][0-9]*)", names) :
  input string 17 is invalid


jrkrideau · January 5, 2024, 7:55pm

Probably, but have a look at your variable names first.
This will not work because you have spaces in the variable names:

group_by(PLACE OF BIRTH - NAME OF HOSPITAL OR FACILITY)

If those are the actual names (shudder), try

group_by(`PLACE OF BIRTH` , `NAME OF HOSPITAL OR FACILITY`)

Note the comma rather than the hyphen.
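If the whole string, hyphen included, is actually the name of a single column (an assumption; the thread does not show the names), the backticks go around the entire name instead. A minimal sketch with made-up data, keeping only the `diffdates` column and the object names from the question:

```r
library(dplyr)

# Made-up data with one non-syntactic column name (spaces and a hyphen).
birth_data <- data.frame(
  `PLACE OF BIRTH - NAME OF HOSPITAL OR FACILITY` = c("A", "A", "B", "B"),
  diffdates = c(1, 3, 5, 9),
  check.names = FALSE  # keep the name exactly as written
)

# Backticks tell R to treat everything between them as one name.
birth_data_summary <- birth_data %>%
  group_by(`PLACE OF BIRTH - NAME OF HOSPITAL OR FACILITY`) %>%
  summarise(mean_diff = mean(diffdates), sd_diff = sd(diffdates))
```

Running names(birth_data) first shows which of the two readings matches the real data.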

See https://cran.r-project.org/web/packages/janitor/vignettes/janitor.html for a good way to clean up those names.
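The error itself complains about a duplicated name rather than the spaces, so it may help to find and fix the repeats before grouping. A rough base-R sketch follows. The vector `nms` is invented to mimic the duplicate reported in the error message and is not the real set of column names.

```r
# Made-up names mimicking the duplicate reported in the error message.
nms <- c("ID", "SPECIFY HISPANIC (TEXT)", "AGE", "SPECIFY HISPANIC (TEXT)")

# Which names occur more than once, and at which positions?
dup_names <- unique(nms[duplicated(nms)])
dup_positions <- which(nms %in% dup_names)

# make.unique() leaves the first occurrence alone and appends .1, .2, ... to repeats.
fixed <- make.unique(nms)
fixed
```

On the real data the same idea would be names(birth_data) <- make.unique(names(birth_data)). The clean_names() function from janitor also makes names unique while tidying them.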