I'm very new to R and RStudio. I am importing a dataset into R from Excel, and I copied and pasted the column names from one Excel file to the other. When I import the dataset, the column names appear, but above them is another set of column names labeled V1-V215. What code do I use to remove them?
The best way to avoid the V1-V215 column names is to tell the function importing the data to treat the first row as column names. What command are you using to import the data?
birth_data <-import("ccbf__20220101_20230814.csv",header=FALSE)
birth_data_dictionary <-import("ccbf_data_dictionary_v2.3.3.xlsx" )
death_data_dictionary <-import("ccdf_data_dictionary_v2.3.1.csv")
death_data <-import ("ccdf__20220101_20230814.csv", header=FALSE)
I'm using this code to assign the column names to my death data:
colnames(death_data) = colnames(death_data_dictionary)
I haven't had a problem with headers using this code. I was thinking about using the same approach for the other dataset, but wanted to see if I could correct the problem first.
I hope this helps.
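The `colnames()` approach works as long as the dictionary has exactly one column per field, in the same order as the data file. Below is a sketch that adds a column-count check before renaming, so a mismatch fails loudly instead of silently mislabeling columns. `apply_dictionary_names()` and the example data are hypothetical, not taken from the poster's files.

```r
# Hypothetical helper: copy column names from a data dictionary onto data
# that was read with header = FALSE (columns V1, V2, ...)
apply_dictionary_names <- function(data, dictionary) {
  # Assumes one dictionary column per data column, in the same order
  if (ncol(data) != ncol(dictionary)) {
    stop("data has ", ncol(data), " columns but the dictionary has ",
         ncol(dictionary))
  }
  colnames(data) <- colnames(dictionary)
  data
}

# Made-up example data and dictionary (field names are invented)
raw_data <- data.frame(V1 = c("2022-01-03", "2022-02-11"), V2 = c("A", "B"))
dictionary <- data.frame(`DATE OF DEATH` = character(0),
                         `PLACE OF DEATH` = character(0),
                         check.names = FALSE)

named_data <- apply_dictionary_names(raw_data, dictionary)
names(named_data)  # "DATE OF DEATH" "PLACE OF DEATH"
```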
As FJCC says.
There is a good chance that some of your column names are numeric, so R thinks that row is the first row of your data rather than the names.
I am guessing, but try header = TRUE, like this:
birth_data <-import("ccbf__20220101_20230814.csv", header= TRUE)
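Why `header` matters can be seen with base R's `read.csv()` on a made-up two-row CSV. This sketch does not use rio, although the thread shows that `rio::import()` accepted the same `header` argument: with `header = FALSE` the names row is read as data and R makes up `V1`, `V2`, ... names.

```r
# Made-up CSV text: one header line and two data rows
csv_text <- "weight,age\n3200,30\n2950,27"

# header = FALSE: the first line is treated as data, columns become V1, V2
no_header <- read.csv(text = csv_text, header = FALSE)
names(no_header)    # "V1" "V2"
nrow(no_header)     # 3 (the names row is now a data row)

# header = TRUE: the first line supplies the column names
with_header <- read.csv(text = csv_text, header = TRUE)
names(with_header)  # "weight" "age"
nrow(with_header)   # 2
```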
Thank you so much! That solved the problem.
I have another error message for a different line of code. I'm not sure if I should create a new post. This is the code:
birth_data_summary=birth_data %>% group_by(
PLACE OF BIRTH - NAME OF HOSPITAL OR FACILITY
) %>% summarise(mean(diffdates),sd(diffdates))
This is the error message:
Error in `as_tibble()`:
! Column name `SPECIFY HISPANIC (TEXT)` must not be duplicated.
Use `.name_repair` to specify repair.
Caused by error in `repaired_names()`:
! Names must be unique.
These names are duplicated:
- "SPECIFY HISPANIC (TEXT)" at locations 101 and 119.
Run `rlang::last_trace()` to see where the error occurred.
Warning messages:
1: In grep("[1].", names) :
  unable to translate 'TYPE OF BIRTH (PLURALITY <96> THIS PREGNANCY)' to a wide string
2: In grep("^[.][.](?:[.]|[1-9][0-9]*)", names) :
  input string 16 is invalid
3: In grep("[2].", names) :
  unable to translate 'BIRTH ORDER (PLURALITY <96> THIS PREGNANCY)' to a wide string
4: In grep("^[.][.](?:[.]|[1-9][0-9]*)", names) :
  input string 17 is invalid
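The error comes from `as_tibble()`, which requires unique column names, and it reports two columns named `SPECIFY HISPANIC (TEXT)` at positions 101 and 119. A quick base-R way to list duplicated and non-syntactic names before calling `group_by()` is sketched below; `col_names` is a hypothetical stand-in for `names(birth_data)`.

```r
# Hypothetical stand-in for names(birth_data)
col_names <- c("PLACE OF BIRTH", "SPECIFY HISPANIC (TEXT)",
               "diffdates", "SPECIFY HISPANIC (TEXT)")

# Names that appear more than once, and every position where they occur
dup_names <- unique(col_names[duplicated(col_names)])
dup_positions <- which(col_names %in% dup_names)

# Names that are not syntactic R names (spaces, parentheses, ...) and
# therefore need backticks in code such as group_by()
needs_backticks <- unique(col_names[make.names(col_names) != col_names])

dup_names        # "SPECIFY HISPANIC (TEXT)"
dup_positions    # 2 4
needs_backticks  # "PLACE OF BIRTH" "SPECIFY HISPANIC (TEXT)"
```

With the real data, `col_names <- names(birth_data)` should report positions 101 and 119 for the duplicate, matching the error message.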
Probably, but have a look at your variable names first.
This will not work because you have spaces in the variable names:
group_by(PLACE OF BIRTH - NAME OF HOSPITAL OR FACILITY
If those are the actual names (shudder), try
group_by(`PLACE OF BIRTH` , `NAME OF HOSPITAL OR FACILITY`)
Note the comma rather than the hyphen.
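Backticks are how R refers to any non-syntactic name, in base R as well as inside dplyr verbs. The sketch below uses base R (`tapply()`) so it runs without extra packages. It assumes, unlike the two-column guess above, that `PLACE OF BIRTH - NAME OF HOSPITAL OR FACILITY` is the full name of a single column; in that case the whole name goes inside one pair of backticks, e.g. group_by(`PLACE OF BIRTH - NAME OF HOSPITAL OR FACILITY`). The data frame and its values are made up.

```r
# Made-up example data; check.names = FALSE keeps the spaces and hyphen
births <- data.frame(
  `PLACE OF BIRTH - NAME OF HOSPITAL OR FACILITY` = c("A", "A", "B", "B"),
  diffdates = c(2, 4, 10, 14),
  check.names = FALSE
)

# One pair of backticks around the whole non-syntactic name
place <- births$`PLACE OF BIRTH - NAME OF HOSPITAL OR FACILITY`

# Mean and standard deviation of diffdates for each place of birth
mean_by_place <- tapply(births$diffdates, place, mean)
sd_by_place   <- tapply(births$diffdates, place, sd)

mean_by_place  # A: 3, B: 12
```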
See https://cran.r-project.org/web/packages/janitor/vignettes/janitor.html for a good way to clean up those names.
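`janitor::clean_names()` from that vignette is the usual one-line fix (for example `birth_data <- janitor::clean_names(birth_data)`); it produces lowercase snake_case names and appends numbers to duplicated names. For readers who want to see what such a cleanup involves, here is a rough base-R approximation. `tidy_names()` is a hypothetical helper, not part of janitor. The `iconv()` step assumes the `<96>` in the warnings is byte 0x96 from a Windows-1252 (CP1252) file, where it encodes an en dash.

```r
# Hypothetical helper: a rough base-R approximation of janitor::clean_names()
tidy_names <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)  # runs of spaces/punctuation become "_"
  x <- gsub("^_+|_+$", "", x)      # drop leading/trailing underscores
  make.unique(x, sep = "_")        # later copies of a name get "_1", "_2", ...
}

# Made-up names mimicking the problems in this thread: spaces, parentheses,
# a duplicate, and a raw 0x96 byte (a Windows-1252 en dash)
raw_names <- c("PLACE OF BIRTH",
               "SPECIFY HISPANIC (TEXT)",
               "BIRTH ORDER (PLURALITY \x96 THIS PREGNANCY)",
               "SPECIFY HISPANIC (TEXT)")

# Assumption: the names are Windows-1252; convert them to UTF-8 first
utf8_names <- iconv(raw_names, from = "CP1252", to = "UTF-8")

clean <- tidy_names(utf8_names)
clean
# "place_of_birth" "specify_hispanic_text"
# "birth_order_plurality_this_pregnancy" "specify_hispanic_text_1"
```

With the real data this would be `names(birth_data) <- tidy_names(iconv(names(birth_data), from = "CP1252", to = "UTF-8"))`. Only run the `iconv()` step if the names really are Windows-1252; converting text that is already UTF-8 this way would garble its non-ASCII characters.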