The thing is that I don't know how to add a column when the numbers of rows differ from each other.
I also reordered them, swapping the columns and rows, using this:
After doing this, I chose a column as an example: Coquimbo (ndataC$Coquimbo and ndataF$Coquimbo). As you can see, these two have different numbers of rows, so they can't be put together. However, I saw some code that added 0 as a replacement, but most of it (due to my poor knowledge) didn't work for me. What I'm trying to do is add 0 for the dates that don't exist in ndataF.
Last but not least, how do I remove those X's from the data? Thank you very much for your time!
Any suggestions on how to improve with R will be appreciated!
Where did the data come from: a spreadsheet, a database, etc.? It looks like the column names were originally dates. By default, R will not accept a number or date as a column (variable) name when reading a file, so it adds an X to the front of the column name.
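As a small illustration of where the X comes from, here is a sketch using a made-up two-line CSV (the `csv_text` string is hypothetical, not the real file). By default `read.csv()` uses `check.names = TRUE`, which passes the headers through `make.names()`; setting `check.names = FALSE` keeps them as they are.

```r
# Hypothetical miniature CSV with dates as column headers
csv_text <- "Region,2020-03-03,2020-03-04\nCoquimbo,0,1"

# Default behaviour: headers are made syntactically valid, so an X is added
mangled <- read.csv(text = csv_text)
names(mangled)
#> [1] "Region"      "X2020.03.03" "X2020.03.04"

# check.names = FALSE leaves the headers untouched
kept <- read.csv(text = csv_text, check.names = FALSE)
names(kept)
#> [1] "Region"     "2020-03-03" "2020-03-04"

# The same conversion, applied directly to one header
fixed_name <- make.names("2020-03-03")
fixed_name
#> [1] "X2020.03.03"
```

Keeping the original headers is optional; names that start with a digit need backticks when used with `$`, so stripping the X later is often just as convenient.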
You have two data.frames, dataC (17 x 462) and dataF (17 x 443), so they have the same number of rows. It is the number of columns that is not the same. It looks like 443 columns have the same names and 19 (cols 2:20) in dataC are additional. Does this make sense?
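If it helps, this kind of comparison can be done with `dim()` and `setdiff()`. The sketch below uses two tiny made-up data frames (`a` and `b` are hypothetical stand-ins, not the real data) just to show the idea.

```r
# Hypothetical stand-ins: same rows, but `a` has two extra date columns
a <- data.frame(Region = "Coquimbo", X2020.03.03 = 0, X2020.03.04 = 1, X2020.03.22 = 5)
b <- data.frame(Region = "Coquimbo", X2020.03.22 = 0)

dim(a)  # rows, then columns
#> [1] 1 4
dim(b)
#> [1] 1 2

# Column names present in `a` but missing from `b`
only_in_a <- setdiff(names(a), names(b))
only_in_a
#> [1] "X2020.03.03" "X2020.03.04"
```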
I think you need to explain what you want to do in terms of the analysis that you want and what the data is. It seems like you want to bind the two data sets together but we need a bit more information before we can suggest how to do it or even if it makes sense.
Hey! Thanks for the answer. I'm currently working with ndataC and ndataF, and yes, I would like to bind two columns from different data sets.
The data come from the links I posted above (I don't currently know the exact website; someone sent me these two links). They relate to COVID-19 in Chile: dataC shows the cumulative number of cases and dataF shows the cumulative number of deaths.
My idea, as I said before, is to bind, combine, or put together two columns from different data sets. I know that they differ in the number of rows (ndataC and ndataF), but I would like to fill in zeroes for the dates that are not in ndataF.
For example:
ndataC starts on 2020.03.03 and ndataF starts on 2020.03.22. That's 19 days, and I would like to fill those 19 days with zeroes in ndataF. I hope that makes it clearer; if not, please let me know and I'll try my best to explain it better.
I suggest you reshape the data sets to have one column for the Region, one for the Date and one for the value. You can then join these two data sets by matching the Region and the date. It should be easy to do any analysis or graphing that is required starting from this long form data.
dataC <- read.csv("https://raw.githubusercontent.com/MinCiencia/Datos-COVID19/master/output/producto3/CasosTotalesCumulativo.csv", header = TRUE, sep = ",")
dataF <- read.csv("https://raw.githubusercontent.com/MinCiencia/Datos-COVID19/master/output/producto14/FallecidosCumulativo.csv", header = TRUE, sep = ",")
library(tidyr)
library(lubridate)
library(dplyr)
dataClong <- pivot_longer(dataC, -Region, names_to = "Date")
dataFlong <- pivot_longer(dataF, -Region, names_to = "Date")
dataClong <- mutate(dataClong,
                    Date = substr(Date, 2, 11), # remove the X
                    Date = ymd(Date))           # make the Date column a numeric Date
dataFlong <- mutate(dataFlong,
                    Date = substr(Date, 2, 11),
                    Date = ymd(Date))
AllData <- left_join(dataClong, dataFlong, by = c("Region", "Date"),
                     suffix = c("_C", "_F"))
head(AllData)
#> # A tibble: 6 x 4
#>   Region             Date       value_C value_F
#>   <chr>              <date>       <dbl>   <dbl>
#> 1 Arica y Parinacota 2020-03-03       0      NA
#> 2 Arica y Parinacota 2020-03-04       0      NA
#> 3 Arica y Parinacota 2020-03-05       0      NA
#> 4 Arica y Parinacota 2020-03-06       0      NA
#> 5 Arica y Parinacota 2020-03-07       0      NA
#> 6 Arica y Parinacota 2020-03-08       0      NA
tail(AllData)
#> # A tibble: 6 x 4
#>   Region Date       value_C value_F
#>   <chr>  <date>       <dbl>   <dbl>
#> 1 Total  2021-06-01 1389357   29344
#> 2 Total  2021-06-02 1394973   29385
#> 3 Total  2021-06-03 1403101   29598
#> 4 Total  2021-06-04 1411346   29696
#> 5 Total  2021-06-05 1420266   29816
#> 6 Total  2021-06-06 1427956   29937
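The join leaves NA in value_F for the dates that are missing from the deaths data. If zeroes are wanted there instead, one possible approach is `tidyr::replace_na()`. The sketch below uses a small made-up tibble (`toy`, with invented numbers) in place of the real joined data, so it runs on its own.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the joined data: the first date has no deaths record
toy <- tibble(
  Region  = c("Coquimbo", "Coquimbo", "Coquimbo"),
  Date    = as.Date(c("2020-03-21", "2020-03-22", "2020-03-23")),
  value_C = c(3, 5, 8),   # invented case counts
  value_F = c(NA, 1, 1)   # invented death counts, NA where the date is absent
)

# Replace the missing death counts with 0
toy_filled <- mutate(toy, value_F = replace_na(value_F, 0))
toy_filled$value_F
#> [1] 0 1 1
```

Whether 0 is the right replacement depends on the analysis; it seems reasonable for cumulative deaths before the first reported date, but NA keeps "not reported" distinct from "zero".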
I see. At first I thought it would be easier to work with the data by swapping the rows and columns (like transposing a matrix), but this makes more sense to me if I want to put them into a graph. Thank you very much!