Adding columns, changing all row names

Yonkleinverson · June 6, 2021, 8:29pm

Hello! I'm a newbie and I need some help using R.
I have two databases related to COVID-19:

dataC <- read.csv("https://raw.githubusercontent.com/MinCiencia/Datos-COVID19/master/output/producto3/CasosTotalesCumulativo.csv", header = TRUE, sep = ",")  # cumulative cases
dataF <- read.csv("https://raw.githubusercontent.com/MinCiencia/Datos-COVID19/master/output/producto14/FallecidosCumulativo.csv", header = TRUE, sep = ",")  # cumulative deaths

The problem is that I don't know how to add a column when the two tables have a different number of rows.
I also transposed them (swapping rows and columns) using this:

# For dataC: transpose so that dates become rows and regions become columns
ndataC <- data.frame(t(dataC[-1]))
colnames(ndataC) <- dataC[, 1]
ndataC
# For dataF: same transposition
ndataF <- data.frame(t(dataF[-1]))
colnames(ndataF) <- dataF[, 1]
ndataF

After doing this, I chose a column, for example Coquimbo (ndataC$Coquimbo and ndataF$Coquimbo). These two columns have a different number of rows, so they can't be put together directly. I found some code that fills the missing values with 0, but most of it (due to my limited knowledge) didn't work for me. What I'm trying to do is add 0 for the dates that don't exist in ndataF.
Last but not least, how do I remove the X that appears in front of the dates? Thank you very much for your time!
Any suggestions on how to improve my R skills would be appreciated!

jrkrideau · June 6, 2021, 10:02pm

Welcome to the forum and thanks for the data.

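Where did the data come from (a spreadsheet, a database, etc.)? It looks like the column names originally were dates. By default, read.csv() does not accept a name that starts with a digit, such as a date, as a column (variable) name, so it adds an X to the front of it (this is done by make.names(), because check.names = TRUE is the default).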
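A small sketch of this behaviour, assuming the CSV header holds dates written like 2020-03-03 (the toy CSV text and its values below are invented, not the MinCiencia data). It shows that check.names = FALSE keeps the original names, and that an X already added can be stripped with sub() before parsing the dates:

```r
# A tiny CSV shaped like the COVID-19 files: a Region column followed by one
# column per date. The numbers are made up for illustration.
csv_text <- "Region,2020-03-03,2020-03-04\nCoquimbo,0,1\nValparaiso,2,3\n"

# Default (check.names = TRUE): make.names() prepends an X and turns '-' into '.'
d1 <- read.csv(text = csv_text)
print(names(d1))  # "Region" "X2020.03.03" "X2020.03.04"

# check.names = FALSE keeps the header exactly as it is in the file
d2 <- read.csv(text = csv_text, check.names = FALSE)
print(names(d2))  # "Region" "2020-03-03" "2020-03-04"

# If the data are already loaded, drop the leading X and parse the dates
dates <- as.Date(sub("^X", "", names(d1)[-1]), format = "%Y.%m.%d")
print(dates)
```

Names such as "2020-03-03" are not syntactic, so with check.names = FALSE they have to be written in backticks (for example d2$`2020-03-03`).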

You have two data.frames, dataC (17 x 462) and dataF (17 x 443), so they have the same number of rows. It is the number of columns that is not the same. It looks like 443 columns have the same names and 19 (columns 2:20) in dataC are additional. Does this make sense?
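One way to check which columns are the additional ones is setdiff() on the column names. Below is a minimal sketch on two invented toy tables shaped like dataC and dataF (Region plus one column per date); on the real data, setdiff(names(dataC), names(dataF)) would list the extra dates, 19 of them if the observation above holds.

```r
# Toy wide tables: same rows (regions), but the deaths table starts two days
# later, so it has fewer date columns. All values are invented.
dataC_toy <- data.frame(Region = c("Coquimbo", "Total"),
                        X2020.03.03 = c(0, 1), X2020.03.04 = c(1, 2),
                        X2020.03.05 = c(2, 4), X2020.03.06 = c(3, 5))
dataF_toy <- data.frame(Region = c("Coquimbo", "Total"),
                        X2020.03.05 = c(0, 1), X2020.03.06 = c(0, 1))

print(dim(dataC_toy))  # 2 rows, 5 columns
print(dim(dataF_toy))  # 2 rows, 3 columns: same rows, fewer columns

# Date columns present in the cases table but missing from the deaths table
extra <- setdiff(names(dataC_toy), names(dataF_toy))
print(extra)

# Their positions in the cases table
print(which(names(dataC_toy) %in% extra))
```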

I think you need to explain what analysis you want to do and what the data are. It seems like you want to bind the two data sets together, but we need a bit more information before we can suggest how to do it, or even whether it makes sense.

Yonkleinverson · June 7, 2021, 1:45am

Hey! Thanks for the answer. I'm currently working with ndataC and ndataF, and yes, I would like to bind two columns from different data sets.
The data come from the links I posted above (I don't know the exact website; someone sent me these two links). They relate to COVID-19 in Chile: dataC shows the cumulative number of cases and dataF shows the cumulative number of deaths.
As I said before, my idea is to combine two columns from different data sets. I know that ndataC and ndataF have a different number of rows, so I would like to fill in zeroes for the dates that are not in ndataF.
For example:
ndataC starts on 2020.03.03 and ndataF starts on 2020.03.22. For those 19 days I would like ndataF to be filled with zeroes. I hope this explains it better; if not, please let me know and I'll try my best to explain myself more clearly.
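A minimal base R sketch of the zero-filling described above, staying in the transposed (one row per date) layout. It assumes every date in ndataF also appears in ndataC; the toy tables, region columns and values are invented for illustration:

```r
# Toy versions of ndataC / ndataF: one row per date (row names keep the X
# prefix added by read.csv()), one column per region. Values are invented;
# the deaths table starts two days later than the cases table.
ndataC_toy <- data.frame(Coquimbo   = c(0, 1, 2, 3),
                         Valparaiso = c(1, 2, 3, 4),
                         row.names  = c("X2020.03.03", "X2020.03.04",
                                        "X2020.03.05", "X2020.03.06"))
ndataF_toy <- data.frame(Coquimbo   = c(1, 1),
                         Valparaiso = c(0, 1),
                         row.names  = c("X2020.03.05", "X2020.03.06"))

# Start from an all-zero table with one row per date of the cases table
all_dates <- rownames(ndataC_toy)
padded <- as.data.frame(matrix(0, nrow = length(all_dates),
                               ncol = ncol(ndataF_toy),
                               dimnames = list(all_dates, names(ndataF_toy))))

# Overwrite the zeros with the real values for the dates ndataF_toy has
padded[rownames(ndataF_toy), ] <- ndataF_toy

# Both tables now have the same rows, so columns can be put side by side
coquimbo <- data.frame(cases     = ndataC_toy$Coquimbo,
                       deaths    = padded$Coquimbo,
                       row.names = all_dates)
print(coquimbo)
```

Matching on row names rather than position means the deaths values land on the right dates even though the two tables start on different days.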

FJCC · June 7, 2021, 2:15am

I suggest you reshape the data sets to have one column for the Region, one for the Date and one for the value. You can then join these two data sets by matching the Region and the date. It should be easy to do any analysis or graphing that is required starting from this long form data.

dataC <- read.csv("https://raw.githubusercontent.com/MinCiencia/Datos-COVID19/master/output/producto3/CasosTotalesCumulativo.csv", header = TRUE, sep = ",")
dataF <- read.csv("https://raw.githubusercontent.com/MinCiencia/Datos-COVID19/master/output/producto14/FallecidosCumulativo.csv", header = TRUE, sep = ",")
library(tidyr)
library(lubridate)
library(dplyr)
dataClong <- pivot_longer(dataC, -Region, names_to = "Date")
dataFlong <- pivot_longer(dataF, -Region, names_to = "Date")
dataClong <- mutate(dataClong, Date = substr(Date, 2, 11), # remove the X
                    Date = ymd(Date)) # convert the text to a Date

dataFlong <- mutate(dataFlong, Date = substr(Date, 2, 11), # remove the X
                    Date = ymd(Date)) # convert the text to a Date
AllData <- left_join(dataClong, dataFlong, by = c("Region","Date"), 
                     suffix = c("_C","_F"))
head(AllData)
#> # A tibble: 6 x 4
#>   Region             Date       value_C value_F
#>   <chr>              <date>       <dbl>   <dbl>
#> 1 Arica y Parinacota 2020-03-03       0      NA
#> 2 Arica y Parinacota 2020-03-04       0      NA
#> 3 Arica y Parinacota 2020-03-05       0      NA
#> 4 Arica y Parinacota 2020-03-06       0      NA
#> 5 Arica y Parinacota 2020-03-07       0      NA
#> 6 Arica y Parinacota 2020-03-08       0      NA

tail(AllData)
#> # A tibble: 6 x 4
#>   Region Date       value_C value_F
#>   <chr>  <date>       <dbl>   <dbl>
#> 1 Total  2021-06-01 1389357   29344
#> 2 Total  2021-06-02 1394973   29385
#> 3 Total  2021-06-03 1403101   29598
#> 4 Total  2021-06-04 1411346   29696
#> 5 Total  2021-06-05 1420266   29816
#> 6 Total  2021-06-06 1427956   29937

Created on 2021-06-06 by the reprex package (v0.3.0)
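The left join leaves NA in value_F for the dates before deaths appear in dataF (see the head() output above). Since the original question asked for zeros there, here is a minimal base R sketch of the same left join using merge(all.x = TRUE), followed by replacing NA with 0. The toy tables are invented and only mimic the shape of dataClong and dataFlong:

```r
# Toy long tables shaped like dataClong / dataFlong (Region, Date, value).
# The numbers are invented; deaths start two days after cases.
dataClong_toy <- data.frame(Region = "Coquimbo",
                            Date   = as.Date("2020-03-03") + 0:3,
                            value  = c(0, 1, 2, 3))
dataFlong_toy <- data.frame(Region = "Coquimbo",
                            Date   = as.Date("2020-03-05") + 0:1,
                            value  = c(1, 1))

# Base R counterpart of left_join(): keep every row of the cases table;
# the shared "value" column gets the suffixes _C and _F
AllData_toy <- merge(dataClong_toy, dataFlong_toy, by = c("Region", "Date"),
                     all.x = TRUE, suffixes = c("_C", "_F"))

# Dates with no deaths row come back as NA; replace them with 0
AllData_toy$value_F[is.na(AllData_toy$value_F)] <- 0
print(AllData_toy)
```

With dplyr and tidyr already loaded as in the reprex, mutate(AllData, value_F = replace_na(value_F, 0)) should do the same. Whether 0 or NA is the better filler depends on whether those early dates really had no deaths or simply had no report.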

Yonkleinverson · June 7, 2021, 2:41pm

I see. At first I thought it would be easier to work with the data after transposing it (swapping rows and columns, like a matrix), but this makes more sense to me, especially if I want to put the data into a graph. Thank you very much!

system · June 14, 2021, 2:41pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.