Hello everyone. This is my first post and reach out to the Community.
I've just completed the Google Data Analytics course and am working on my own case study. Rather than use a nicely cleaned publicly available data set I decided I would source my own data from Yahoo finance. I'm now wondering if this was a tad over ambitious. I thought I had formatted the data appropriately in Google Sheets but all three columns of my dataset are showing as type 'char' in R. I need the 'Date' column to be in date format and 'Close' column to be numeric.
I've tried all kinds of fixes from scouring stackoverflow.com etc. but nothing seems to work. Any pointers would be most welcome. Need to get this project off the ground! Markdown file below, I think? This is all new to me - the Google course is one thing, this is where the real work begins clearly.
The read error is possibly because you didn't put the csv file into your current working directory(wd). set the folder of the csv file as the wd or provide the full file path of the csv file might solve the problem:
# assuming that the directory "~/user/home/desktop/" is the folder of csv
setwd("~/user/home/desktop/")
read.csv("Safe_Haven.csv")
# or
read.csv("~/user/home/desktop/Safe_Haven.csv")
Use function as.Date to retrieve dates from chr, function strptime() to retrieve date-time form chr. The function as.numeric() is used to transform other datatypes into numeric. But in your example data, you may use str_remove to delete all the thousand separators at first.
In addition, you can combine using the function mutate from package dplyr to modify columns of a data.frame in clean syntax.
library(tidyverse)
safe_haven %>% mutate(
Date_ = as.Date(Date,format = "%m/%d/%Y %H:%M"),
Datetime_ = strptime(Date,format = '%m/%d/%Y %H:%M'),
Close_ = as.numeric(str_remove_all(Close,','))
)
# A tibble: 6 x 6
Date Close Invest Date_ Datetime_ Close_
<chr> <chr> <chr> <date> <dttm> <dbl>
1 04/01/2015 00:00 1,294.20 GOLD 2015-04-01 2015-04-01 00:00:00 1294.
2 04/02/2015 00:00 1,288.20 GOLD 2015-04-02 2015-04-02 00:00:00 1288.
3 04/03/2015 00:00 1,288.20 GOLD 2015-04-03 2015-04-03 00:00:00 1288.
4 04/04/2015 00:00 1,288.20 GOLD 2015-04-04 2015-04-04 00:00:00 1288.
5 04/05/2015 00:00 1,288.20 GOLD 2015-04-05 2015-04-05 00:00:00 1288.
6 04/06/2015 00:00 1,305.40 GOLD 2015-04-06 2015-04-06 00:00:00 1305.
# name the mutated column as the previous one to replace the original data:
safe_haven %>% mutate(
Date = as.Date(Date,format = "%m/%d/%Y %H:%M"),
Close = as.numeric(str_remove_all(Close,','))
)
# A tibble: 6 x 3
Date Close Invest
<date> <dbl> <chr>
1 2015-04-01 1294. GOLD
2 2015-04-02 1288. GOLD
3 2015-04-03 1288. GOLD
4 2015-04-04 1288. GOLD
5 2015-04-05 1288. GOLD
6 2015-04-06 1305. GOLD
setwd("/cloud/project")
safe_haven <- read.csv('Safe_Haven.csv')
# Here is the output from dput(head(safe_haven))
dput(head(safe_haven))
#> structure(list(Date = c("04/01/2015 00:00", "04/02/2015 00:00",
#> "04/03/2015 00:00", "04/04/2015 00:00", "04/05/2015 00:00", "04/06/2015 00:00"
#> ), Close = c("1,294.20", "1,288.20", "1,288.20", "1,288.20",
#> "1,288.20", "1,305.40"), Invest = c("GOLD", "GOLD", "GOLD", "GOLD",
#> "GOLD", "GOLD")), row.names = c(NA, 6L), class = "data.frame")
# After running your suggested code I get this erroe
library(tidyverse)
safe_haven %>%
mutate(date = as.Date(date, format = "%d/%m/%Y %H:%M"))
#> Error in `mutate()`:
#> ! Problem while computing `date = as.Date(date, format = "%d/%m/%Y
#> %H:%M")`.
#> Caused by error in `as.Date.default()`:
#> ! do not know how to convert 'date' to class "Date"
You don't have to set the working directory if you use Projects. It is good practice for collaborating as file locations are different across machines. Also just easier.
Watch carefully of the object name, I noticed that in your code, the data object named "safe_haven", but in later code which triggered error, you gave it "Safe_Haven".
Moreover, my code didn't save the result of mutate, you should save the result at first, like:
Highly suggest not using setwd() for setting working directories and instead use the package {here} in tandem with RProjects. It will make your code reproducible, streamline your workflow, and your working directories won't break as in above. See @jennybryan's blog post about why setwd() is poor and here() makes your life easier. Also want to add that using {lubridate} makes working with date/POSIXct variables a lot easier than base functions. Not that you can't use base functions as per the solution and get the same result but might be worth checking out {lubridate} at some point. Cheers