Sorting By Date -- New To R

ImranJ · January 17, 2020, 7:31pm

Hi everyone, I am new to R and looking for some help categorizing my data, or better, a good tutorial on how this can be done. I currently have a data set where the date column is in the format YYYY-MM-DD. I would like to categorize the dates by year, then by month. If there is no date in column 1, I would like to use the date in column 2, meaning port that row's information from column 2 to column 1. This probably means I need to categorize columns 1 and 2 and then sub in 2 where there is no 1.

technocrat · January 17, 2020, 7:44pm

Hi, and welcome.

Questions benefit greatly from a reproducible example, called a reprex. In this case it would resolve an ambiguity: whether your date columns are character strings, like "2020-01-18", or objects of class Date.

The lubridate package has functions to discard DD to get a year/month.
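
As a rough sketch of what that looks like, assuming the dates are character strings in YYYY-MM-DD form (the values below are made up for illustration):

```r
library(lubridate)

# Hypothetical sample values; the real column may differ
dates_chr <- c("2020-01-18", "2019-11-05", "2020-03-02")

# ymd() parses "YYYY-MM-DD" strings into Date objects
dates <- ymd(dates_chr)

years  <- year(dates)   # numeric year
months <- month(dates)  # numeric month, 1 to 12

# floor_date() keeps a Date but snaps it to the first day of its month
year_month <- floor_date(dates, unit = "month")
```

Keeping year_month as a Date (rather than separate numbers) can be convenient if you later want to sort or group by month across several years.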

The dplyr package allows you to replace the contents of the first column with those of the second. Here, again, an assumption is needed without a reprex: that the empty data for column 1 is represented by NA.

The syntax to do this would be

my_data %<>% mutate(Date1 = if_else(is.na(Date1), Date2, Date1))

There's also a sort function, called arrange(), but you'll want to look at the R for Data Science chapters on this first.
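
Putting the two steps together, here is a minimal sketch. The data frame is invented for illustration, and the column names Date1 and Date2 are placeholders; it assumes both columns are already of class Date and that missing values are real NA.

```r
library(dplyr)

# Hypothetical data; NA marks a missing Date1
my_data <- data.frame(
  Date1 = as.Date(c("2020-01-18", NA, "2019-11-05")),
  Date2 = as.Date(c("2020-02-01", "2018-06-30", "2019-12-01"))
)

my_data <- my_data %>%
  # Where Date1 is missing, take Date2; otherwise keep Date1
  mutate(Date1 = if_else(is.na(Date1), Date2, Date1)) %>%
  # Sort rows from earliest to latest Date1
  arrange(Date1)
```

if_else() is used here instead of base ifelse() because it keeps the Date class of its inputs.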

ImranJ · January 17, 2020, 7:58pm

Hello, Thanks for your reply.

Currently, the date column is represented as a string, e.g. YYYY-MM-DD. However, when there is no date, the cell is left empty. Should this change how I go about answering this question?

technocrat · January 17, 2020, 8:04pm

You can use the lubridate::ymd() function to convert the strings to dates. The empty cells will probably come out as NAs.
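
A small sketch of that conversion, assuming the strings are YYYY-MM-DD (for which lubridate's ymd() is the matching parser) and that missing dates are empty strings. The vector below is made up; dplyr's na_if() is used to make the blanks explicit NA before parsing.

```r
library(dplyr)
library(lubridate)

# Hypothetical column: "YYYY-MM-DD" strings with blanks for missing dates
date_chr <- c("2020-01-18", "", "2019-11-05")

# na_if() turns the empty strings into NA, then ymd() parses the rest
date_parsed <- ymd(na_if(date_chr, ""))

# One TRUE/FALSE per element, TRUE where the date was blank
is.na(date_parsed)
```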

ImranJ · January 17, 2020, 8:09pm

Thanks. I will try this and get back to you

ImranJ · January 20, 2020, 3:23pm

Should something go inside the is.na() call? I have it in my script, however I am getting an error:

Error in is.na() : 0 arguments passed to 'is.na' which requires 1

andresrcs · January 20, 2020, 3:41pm

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

FAQ: How to do a minimal reproducible example (reprex) for beginners


ImranJ · January 20, 2020, 3:52pm

I cannot share my data set due to the project I am working on, but I have attached something similar showing what I am trying to do. Where there is no Incident Date, take the date from the Report Date column and put it in the Incident Date column as well, so it shows in both. My goal is to fill in all the NAs and then, once done, organize by date. I think I already have a script to do that last part.

andresrcs · January 20, 2020, 4:09pm

That is not a dataset, that is a screenshot, and it is not very useful since I can't copy your sample data into my R session and give you a working solution. Please read the guide I gave you and try to provide a proper reproducible example, including sample data in a copy/paste friendly format.

ImranJ · January 20, 2020, 4:33pm

Perhaps this might help, however when I am using the dataset I currently have it is from a CSV file and this is a small table I have just created in R. I hope this helps.

datapasta::df_paste(head(iris, 11)[, c('Incident.Date', 'Report.Date','Artifact.Number')])

data.frame(
  Incident.Date = c('1/1/2015','NA','2/2/2015','NA','5/5/2016','NA','4/4/2017','4/4/2018','NA','5/5/2018','1/4/2015'),
  Report.Date = c('3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019'),
  Artifact.Number = c(1,2,3,4,5,6,7,8,9,10,11)
)

In the dataset that I am working with, there are no strings 'NA', they are just blank cells

andresrcs · January 20, 2020, 4:49pm

Yes, it helps, this is sample data in a proper format.

You are getting 'NA' as a string (with quotes) because you are not reading the data correctly from the CSV file. The blanks should be NA (without quotes), which is the way R represents missing values; it stands for Not Available.
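
One way to get real NA values at read time is the na.strings argument of read.csv(). This is only a sketch: the CSV text below is invented to mimic your columns, and with a real file you would pass the file path instead of text =.

```r
# Hypothetical CSV contents with a blank Incident.Date cell
csv_text <- "Incident.Date,Report.Date,Artifact.Number
1/1/2015,3/3/2019,1
,3/3/2019,2
2/2/2015,3/3/2019,3"

# na.strings lists the cell contents that should be read as missing
incidents <- read.csv(text = csv_text,
                      na.strings = c("", "NA"),
                      stringsAsFactors = FALSE)

# TRUE marks the row whose Incident.Date was blank in the file
is.na(incidents$Incident.Date)
```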

Anyway, if I understand you correctly, this is what you are trying to do:

library(tidyverse)
library(lubridate)

# This is just sample data, you can replace this with the actual dataset that you read from the CSV file
sample_data <- data.frame(
    Incident.Date = c('1/1/2015','NA','2/2/15','NA','5/5/2016','NA','4/4/2017','4/4/2018','NA','5/5/2018','1/4/2015'),
    Report.Date = c('3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019','3/3/2019'),
    Artifact.Number = c(1,2,3,4,5,6,7,8,9,10,11)
)

sample_data %>%
    mutate_at(vars(contains("Date")), dmy) %>% 
    rowwise() %>% 
    mutate(Incident.Date = if_else(
        is.na(Incident.Date),
        true = Report.Date,
        false = Incident.Date)
    ) %>% 
    ungroup() %>% 
    mutate(year = year(Incident.Date),
           month = month(Incident.Date)) %>% 
    arrange(Incident.Date)
#> Warning: 4 failed to parse.
#> # A tibble: 11 x 5
#>    Incident.Date Report.Date Artifact.Number  year month
#>    <date>        <date>                <dbl> <dbl> <dbl>
#>  1 2015-01-01    2019-03-03                1  2015     1
#>  2 2015-02-02    2019-03-03                3  2015     2
#>  3 2015-04-01    2019-03-03               11  2015     4
#>  4 2016-05-05    2019-03-03                5  2016     5
#>  5 2017-04-04    2019-03-03                7  2017     4
#>  6 2018-04-04    2019-03-03                8  2018     4
#>  7 2018-05-05    2019-03-03               10  2018     5
#>  8 2019-03-03    2019-03-03                2  2019     3
#>  9 2019-03-03    2019-03-03                4  2019     3
#> 10 2019-03-03    2019-03-03                6  2019     3
#> 11 2019-03-03    2019-03-03                9  2019     3

Created on 2020-01-20 by the reprex package (v0.3.0.9000)
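
For comparison, here is a shorter variant of the same idea that avoids rowwise() by using dplyr's vectorised coalesce(). It is a sketch on a cut-down, made-up version of the sample data, and it assumes, as above, that the dates are day/month/year and that missing values arrive as the literal string 'NA'.

```r
library(dplyr)
library(lubridate)

# Cut-down, hypothetical version of the sample data
sample_data <- data.frame(
  Incident.Date = c("1/1/2015", "NA", "5/5/2016"),
  Report.Date = c("3/3/2019", "3/3/2019", "3/3/2019"),
  Artifact.Number = c(1, 2, 3),
  stringsAsFactors = FALSE
)

result <- sample_data %>%
  mutate(
    # Turn the literal "NA" strings into real NA before parsing
    Incident.Date = dmy(na_if(Incident.Date, "NA")),
    Report.Date = dmy(Report.Date),
    # coalesce() takes the first non-missing value in each row
    Incident.Date = coalesce(Incident.Date, Report.Date),
    year = year(Incident.Date),
    month = month(Incident.Date)
  ) %>%
  arrange(Incident.Date)
```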

ImranJ · January 20, 2020, 6:36pm

Thank you for your solution. On one machine this worked perfectly, but on another I get the following error:

Error in UseMethod("tbl_vars")
    no applicable method for 'tbl_vars' applied to an object of class "c('matrix', 'logical')"

I've double checked to make sure all the correct packages are installed and that the libraries are also being used correctly.

andresrcs · January 20, 2020, 8:14pm

Can you provide a reproducible example for this? Otherwise I have no means to help you any further.

ImranJ · January 20, 2020, 8:16pm

I've been doing some research and it seems that in my provided example, all the date values were entered as strings. However, in my data set they could have been stored as a different type or be in a different format. That could potentially be why I am getting this error. I will have to do some further digging and update the thread when I know more.

technocrat · January 20, 2020, 8:32pm

Yes, is.na(SOMEVARIABLE)
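
For example, in base R, with two made-up character vectors standing in for your date columns:

```r
# Hypothetical columns standing in for the two date columns
Date1 <- c("2020-01-18", NA, "2019-11-05")
Date2 <- c("2020-02-01", "2018-06-30", "2019-12-01")

# is.na() needs the vector to test; it returns one TRUE/FALSE per element
missing_first <- is.na(Date1)

# ifelse() then picks Date2 where Date1 is missing, Date1 otherwise
filled <- ifelse(missing_first, Date2, Date1)
```

Note that this works as shown for character strings; base ifelse() drops the Date class, so for real Date columns dplyr's if_else() is the safer choice.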

andresrcs · January 20, 2020, 9:18pm

You can show the actual structure of your data by using dput() instead of datapasta::df_paste().
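
A sketch of how that inspection might go, using an invented stand-in for the data. The error above mentions class c('matrix', 'logical'), which suggests (this is only a guess without a reprex) that the object reaching the pipeline on the second machine was a logical matrix rather than a data frame, so checking class() first is worthwhile.

```r
# Hypothetical stand-in for the real data
sample_data <- data.frame(
  Incident.Date = c("1/1/2015", NA),
  Report.Date = c("3/3/2019", "3/3/2019"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object, including column types
dput(head(sample_data, 10))

# class() and str() show what kind of object the pipeline will receive
class(sample_data)
str(sample_data)

# A logical matrix, by contrast, is not a data frame
not_a_df <- matrix(NA, nrow = 2, ncol = 2)
is.data.frame(not_a_df)
```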

system · February 10, 2020, 9:18pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.