Convert char to date and numeric

SJB_137 · May 2, 2022, 9:51pm

Hello everyone. This is my first post and reach out to the Community.

I've just completed the Google Data Analytics course and am working on my own case study. Rather than use a nicely cleaned publicly available data set I decided I would source my own data from Yahoo finance. I'm now wondering if this was a tad over ambitious. I thought I had formatted the data appropriately in Google Sheets but all three columns of my dataset are showing as type 'char' in R. I need the 'Date' column to be in date format and 'Close' column to be numeric.

I've tried all kinds of fixes from scouring stackoverflow.com etc. but nothing seems to work. Any pointers would be most welcome. Need to get this project off the ground! Markdown file below, I think? This is all new to me - the Google course is one thing, this is where the real work begins clearly.

https://8bddcf2b612c4486b7e6aa3e656458b8.app.rstudio.cloud/file_show?path=%2Fcloud%2Fproject%2FSafe_Haven.html

williaml · May 2, 2022, 10:00pm

Hi, can you provide a reproducible example?

FAQ: How to do a minimal reproducible example ( reprex ) for beginners Guides & FAQs

A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. Let's quickly go over each one of these with examples: Minimal Dataset (Sample Data) You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame head(iris) #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> 1 5.1 3.5 1.4 0.…

SJB_137 · May 2, 2022, 10:49pm

Hi. I tried to do this but for some reason couldn't read the csv file?

install.packages("ggplot2")
#> Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
#> (as 'lib' is unspecified)
library(ggplot2)
install.packages("tidyverse")
#> Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
#> (as 'lib' is unspecified)
library(tidyverse)
install.packages("datapasta")
#> Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
#> (as 'lib' is unspecified)
library(datapasta)
install.packages("styler")
#> Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
#> (as 'lib' is unspecified)
#> also installing the dependencies 'R.methodsS3', 'R.oo', 'R.utils', 'R.cache', 'rprojroot'
library(styler)
install.packages("reprex")
#> Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
#> (as 'lib' is unspecified)
library(reprex)

safe_haven <- read.csv("Safe_Haven.csv")
#> Warning in file(file, "rt"): cannot open file 'Safe_Haven.csv': No such file or
#> directory
#> Error in file(file, "rt"): cannot open the connection

head(safe_haven)
#> Error in head(safe_haven): object 'safe_haven' not found
str(safe_haven)
#> Error in str(safe_haven): object 'safe_haven' not found

sessionInfo()
#> R version 4.2.0 (2022-04-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] reprex_2.0.1    styler_1.7.0    datapasta_3.1.0 forcats_0.5.1  
#>  [5] stringr_1.4.0   dplyr_1.0.9     purrr_0.3.4     readr_2.1.2    
#>  [9] tidyr_1.2.0     tibble_3.1.6    tidyverse_1.3.1 ggplot2_3.3.5  
#> 
#> loaded via a namespace (and not attached):
#>  [1] lubridate_1.8.0   assertthat_0.2.1  digest_0.6.29     utf8_1.2.2       
#>  [5] R6_2.5.1          cellranger_1.1.0  backports_1.4.1   evaluate_0.15    
#>  [9] httr_1.4.2        highr_0.9         pillar_1.7.0      rlang_1.0.2      
#> [13] readxl_1.4.0      rstudioapi_0.13   R.utils_2.11.0    R.oo_1.24.0      
#> [17] rmarkdown_2.14    munsell_0.5.0     broom_0.8.0       compiler_4.2.0   
#> [21] modelr_0.1.8      xfun_0.30         pkgconfig_2.0.3   htmltools_0.5.2  
#> [25] tidyselect_1.1.2  fansi_1.0.3       crayon_1.5.1      tzdb_0.3.0       
#> [29] dbplyr_2.1.1      withr_2.5.0       R.methodsS3_1.8.1 grid_4.2.0       
#> [33] jsonlite_1.8.0    gtable_0.3.0      lifecycle_1.0.1   DBI_1.1.2        
#> [37] magrittr_2.0.3    scales_1.2.0      cli_3.3.0         stringi_1.7.6    
#> [41] fs_1.5.2          xml2_1.3.3        ellipsis_0.3.2    generics_0.1.2   
#> [45] vctrs_0.4.1       tools_4.2.0       R.cache_0.15.0    glue_1.6.2       
#> [49] hms_1.1.1         fastmap_1.1.0     yaml_2.3.5        colorspace_2.0-3 
#> [53] rvest_1.0.2       knitr_1.39        haven_2.5.0

^{Created on 2022-05-02 by the reprex package (v2.0.1)}

Minimal data set below

head(safe_haven)

          Date    Close Invest

1 04/01/2015 00:00 1,294.20 GOLD
2 04/02/2015 00:00 1,288.20 GOLD
3 04/03/2015 00:00 1,288.20 GOLD
4 04/04/2015 00:00 1,288.20 GOLD
5 04/05/2015 00:00 1,288.20 GOLD
6 04/06/2015 00:00 1,305.40 GOLD

str(safe_haven)
'data.frame': 15342 obs. of 3 variables:
Date : chr "04/01/2015 00:00" "04/02/2015 00:00" "04/03/2015 00:00" "04/04/2015 00:00" ... Close : chr "1,294.20" "1,288.20" "1,288.20" "1,288.20" ...
$ Invest: chr "GOLD" "GOLD" "GOLD" "GOLD" ...

williaml · May 3, 2022, 12:21am

Can you post the output of dput(head(safe_haven))?

Otherwise, this should probably work:

library(tidyverse)
safe_haven %>% 
  mutate(date = as.Date(date, format = "%d/%m/%Y %H:%M"))

yifanliu · May 3, 2022, 7:30am

The read error is possibly because you didn't put the csv file into your current working directory(wd). set the folder of the csv file as the wd or provide the full file path of the csv file might solve the problem:

# assuming that the directory "~/user/home/desktop/" is the folder of csv
setwd("~/user/home/desktop/")
read.csv("Safe_Haven.csv")
# or
read.csv("~/user/home/desktop/Safe_Haven.csv")

Use function as.Date to retrieve dates from chr, function strptime() to retrieve date-time form chr. The function as.numeric() is used to transform other datatypes into numeric. But in your example data, you may use str_remove to delete all the thousand separators at first.
In addition, you can combine using the function mutate from package dplyr to modify columns of a data.frame in clean syntax.

library(tidyverse)

safe_haven %>% mutate(
  Date_ = as.Date(Date,format = "%m/%d/%Y %H:%M"),
  Datetime_ = strptime(Date,format = '%m/%d/%Y %H:%M'),
  Close_ = as.numeric(str_remove_all(Close,','))
)
# A tibble: 6 x 6
  Date             Close    Invest Date_      Datetime_           Close_
  <chr>            <chr>    <chr>  <date>     <dttm>               <dbl>
1 04/01/2015 00:00 1,294.20 GOLD   2015-04-01 2015-04-01 00:00:00  1294.
2 04/02/2015 00:00 1,288.20 GOLD   2015-04-02 2015-04-02 00:00:00  1288.
3 04/03/2015 00:00 1,288.20 GOLD   2015-04-03 2015-04-03 00:00:00  1288.
4 04/04/2015 00:00 1,288.20 GOLD   2015-04-04 2015-04-04 00:00:00  1288.
5 04/05/2015 00:00 1,288.20 GOLD   2015-04-05 2015-04-05 00:00:00  1288.
6 04/06/2015 00:00 1,305.40 GOLD   2015-04-06 2015-04-06 00:00:00  1305.

# name the mutated column as the previous one to replace the original data:
safe_haven %>% mutate(
    Date = as.Date(Date,format = "%m/%d/%Y %H:%M"),
    Close = as.numeric(str_remove_all(Close,','))
)
# A tibble: 6 x 3
  Date       Close Invest
  <date>     <dbl> <chr> 
1 2015-04-01 1294. GOLD  
2 2015-04-02 1288. GOLD  
3 2015-04-03 1288. GOLD  
4 2015-04-04 1288. GOLD  
5 2015-04-05 1288. GOLD  
6 2015-04-06 1305. GOLD

SJB_137 · May 3, 2022, 6:41pm

setwd("/cloud/project")
safe_haven <- read.csv('Safe_Haven.csv')


# Here is the output from dput(head(safe_haven))

dput(head(safe_haven))
#> structure(list(Date = c("04/01/2015 00:00", "04/02/2015 00:00", 
#> "04/03/2015 00:00", "04/04/2015 00:00", "04/05/2015 00:00", "04/06/2015 00:00"
#> ), Close = c("1,294.20", "1,288.20", "1,288.20", "1,288.20", 
#> "1,288.20", "1,305.40"), Invest = c("GOLD", "GOLD", "GOLD", "GOLD", 
#> "GOLD", "GOLD")), row.names = c(NA, 6L), class = "data.frame")

# After running your suggested code I get this erroe

library(tidyverse)
safe_haven %>% 
  mutate(date = as.Date(date, format = "%d/%m/%Y %H:%M"))
#> Error in `mutate()`:
#> ! Problem while computing `date = as.Date(date, format = "%d/%m/%Y
#>   %H:%M")`.
#> Caused by error in `as.Date.default()`:
#> ! do not know how to convert 'date' to class "Date"

SJB_137 · May 3, 2022, 8:50pm

Thank you so much for your help. After running your suggested code I get the same output as you

   Date       Close Invest
   <date>     <dbl> <chr> 
 1 2015-04-01 1294. GOLD  
 2 2015-04-02 1288. GOLD  
 3 2015-04-03 1288. GOLD  
 4 2015-04-04 1288. GOLD  
 5 2015-04-05 1288. GOLD  
 6 2015-04-06 1305. GOLD  
 7 2015-04-07 1298. GOLD  
 8 2015-04-08 1291. GOLD  
 9 2015-04-09 1282. GOLD  
10 2015-04-10 1293. GOLD

(although when I run str(Safe_Haven) the date column still shows as 'chr')?

str(Safe_Haven)
spec_tbl_df [15,342 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Date  : chr [1:15342] "04/01/2015 00:00" "04/02/2015 00:00" "04/03/2015 00:00" "04/04/2015 00:00" ...
 $ Close : num [1:15342] 1294 1288 1288 1288 1288 ...
 $ Invest: chr [1:15342] "GOLD" "GOLD" "GOLD" "GOLD" ...
 - attr(*, "spec")=
  .. cols(
  ..   Date = col_character(),
  ..   Close = col_number(),
  ..   Invest = col_character()
  .. )
 - attr(*, "problems")=<externalptr>

Sorry but when I'm trying to provide you with reproducible example, not able to read the file. Something I'm doing wrong? reprex::reprex()

library(tidyverse)

Safe_Haven %>% mutate(
  Date_ = as.Date(Date,format = "%m/%d/%Y %H:%M"),
  Datetime_ = strptime(Date,format = '%m/%d/%Y %H:%M'),
  Close_ = as.numeric(str_remove_all(Close,','))
)
#> Error in mutate(., Date_ = as.Date(Date, format = "%m/%d/%Y %H:%M"), Datetime_ = strptime(Date, : object 'Safe_Haven' not found

Safe_Haven %>% mutate(
  Date = as.Date(Date,format = "%m/%d/%Y %H:%M"),
  Close = as.numeric(str_remove_all(Close,','))
)
#> Error in mutate(., Date = as.Date(Date, format = "%m/%d/%Y %H:%M"), Close = as.numeric(str_remove_all(Close, : object 'Safe_Haven' not found

williaml · May 3, 2022, 9:28pm

You don't have to set the working directory if you use Projects. It is good practice for collaborating as file locations are different across machines. Also just easier.

yifanliu · May 4, 2022, 6:13am

Watch carefully of the object name, I noticed that in your code, the data object named "safe_haven", but in later code which triggered error, you gave it "Safe_Haven".

Moreover, my code didn't save the result of mutate, you should save the result at first, like:

safe_haven_1 <- safe_haven %>% mutate(
    Date = as.Date(Date,format = "%m/%d/%Y %H:%M"),
    Close = as.numeric(str_remove_all(Close,','))
)

> str(safe_haven)
tibble [6 x 3] (S3: tbl_df/tbl/data.frame)
 $ Date  : chr [1:6] "04/01/2015 00:00" "04/02/2015 00:00" "04/03/2015 00:00" "04/04/2015 00:00" ...
 $ Close : chr [1:6] "1,294.20" "1,288.20" "1,288.20" "1,288.20" ...
 $ Invest: chr [1:6] "GOLD" "GOLD" "GOLD" "GOLD" ...
> str(safe_haven_1)
tibble [6 x 3] (S3: tbl_df/tbl/data.frame)
 $ Date  : Date[1:6], format: "2015-04-01" "2015-04-02" "2015-04-03" "2015-04-04" ...
 $ Close : num [1:6] 1294 1288 1288 1288 1288 ...
 $ Invest: chr [1:6] "GOLD" "GOLD" "GOLD" "GOLD" ...

SJB_137 · May 4, 2022, 8:51am

Thank you so much for your patience

benjaminhlina · May 5, 2022, 2:07pm

Highly suggest not using setwd() for setting working directories and instead use the package {here} in tandem with RProjects. It will make your code reproducible, streamline your workflow, and your working directories won't break as in above. See @jennybryan's blog post about why setwd() is poor and here() makes your life easier. Also want to add that using {lubridate} makes working with date/POSIXct variables a lot easier than base functions. Not that you can't use base functions as per the solution and get the same result but might be worth checking out {lubridate} at some point. Cheers

system · May 12, 2022, 2:07pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.