How to work with preloaded datasets in RStudio

aaron.hodges · August 23, 2023, 2:15pm

head(airquality)
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6

Solar.R > 150 & Wind > 10
Error: object 'Solar.R' not found

Can someone explain what I am doing wrong? Thank You!

M_AcostaCH · August 23, 2023, 2:27pm

Hi @aaron.hodges, you first need to refer to the data object that holds the columns. See this example:

# if the data is airquality

data_filtered <- airquality %>%
  filter(Solar.R > 150, Wind > 10)

head(data_filtered)
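For comparison, here is a minimal base R sketch of the same filter, with no extra packages. It assumes the error in the original post comes from the columns living inside the airquality data frame rather than in the global environment. The object names keep, keep_with, filtered_base and filtered_subset are only illustrative.

```r
# airquality ships with base R (package 'datasets'), so nothing needs installing.

# A bare `Solar.R` is not visible at the top level; qualify it with the data frame.
keep <- airquality$Solar.R > 150 & airquality$Wind > 10

# with() evaluates the same expression using the data frame's columns.
keep_with <- with(airquality, Solar.R > 150 & Wind > 10)

# which() keeps only the TRUE positions, so rows where the test is NA are dropped.
filtered_base <- airquality[which(keep), ]

# subset() looks column names up inside the data frame and also drops NA results.
filtered_subset <- subset(airquality, Solar.R > 150 & Wind > 10)

head(filtered_base)
```

Both filtered_base and filtered_subset should contain the same rows as the dplyr version, since all three approaches discard rows where the condition is NA.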

jrkrideau · August 23, 2023, 3:53pm

Another way would be to use the {data.table} package. You probably need to install the package.

install.packages("data.table")

Then

library(data.table)

dat1  <- as.data.table(airquality) ## Convert data.frame to data.table

dat2  <- dat1[Solar.R > 150 & Wind > 10]

dat2

It does the same thing as @M_AcostaCH's code. I just find the syntax less verbose.

aaron.hodges · August 23, 2023, 5:40pm

Error: object 'data_filtered' not found

aaron.hodges · August 23, 2023, 5:45pm

Thank you, it worked! But I noticed you used brackets instead of parentheses in dat2 <- dat1[Solar.R > 150 & Wind > 10]. Is there any particular reason why?

FJCC · August 23, 2023, 6:09pm

@M_AcostaCH's code works if you load the dplyr library.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
data_filtered <- airquality %>%
  filter(Solar.R > 150, Wind > 10)

head(data_filtered)
#>   Ozone Solar.R Wind Temp Month Day
#> 1    18     313 11.5   62     5   4
#> 2    14     274 10.9   68     5  14
#> 3    14     334 11.5   64     5  16
#> 4    34     307 12.0   66     5  17
#> 5    30     322 11.5   68     5  19
#> 6    11     320 16.6   73     5  22

Created on 2023-08-23 with reprex v2.0.2

M_AcostaCH · August 23, 2023, 6:41pm

Nice update @FJCC

jrkrideau · August 23, 2023, 8:12pm

Yes, data.table works well with base R and tidyverse but the syntax and, I think, the design philosophies are quite different.

From "A data.table and dplyr tour":

Syntax:

The general data.table syntax is as follows: DT[i, j, by, ...], which means: "Take DT, subset rows using i, then calculate j, grouped by by", with possible extra options (...). It allows combining several operations in a very concise and consistent expression.
The syntax of dplyr is based on key verbs corresponding to the most common operations: filter(), arrange(), select(), mutate(), summarise(), … These functions can be combined with group_by() to aggregate data "by group" and with a bunch of helper functions. It is a "do one thing at a time" approach, chaining together functions dedicated to a specific task.
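As a rough illustration of the two styles described above, the sketch below computes the mean Ozone per Month for rows with Solar.R > 150, once with DT[i, j, by] and once with dplyr verbs. The names dt_result, dplyr_result and mean_ozone are made up for this example, and it assumes both packages are installed.

```r
library(data.table)
library(dplyr)

dat1 <- as.data.table(airquality)

# data.table: i filters rows, j says what to compute, by sets the groups.
dt_result <- dat1[Solar.R > 150,
                  .(mean_ozone = mean(Ozone, na.rm = TRUE)),
                  by = Month]

# dplyr: one verb per step, chained with the pipe.
dplyr_result <- airquality %>%
  filter(Solar.R > 150) %>%
  group_by(Month) %>%
  summarise(mean_ozone = mean(Ozone, na.rm = TRUE))

dt_result
dplyr_result
```

The two results should hold the same numbers; the data.table call packs the three steps into one bracket, while dplyr spells each step out.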

So the basic answer for the brackets is that this is how R identifies which dataset and which subsets of the data are to be used. You can think of it as roughly the equivalent of

airquality %>%

in the tidyverse.

And so

 airquality %>%
  filter(Solar.R > 150, Wind > 10)

is the equivalent of

dat1[Solar.R > 150 & Wind > 10]
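Brackets are also how plain data frames are subset in base R, so it may help to see the two side by side. This is only a sketch; base_rows and dt_rows are illustrative names. The point to watch is how missing values are handled: a data.frame returns an all-NA row wherever the condition is NA, while data.table treats NA in the condition as FALSE.

```r
library(data.table)

dat1 <- as.data.table(airquality)

# data.frame: columns must be qualified with the data frame name, and the
# trailing comma means "all columns". NA conditions yield all-NA rows.
base_rows <- airquality[airquality$Solar.R > 150 & airquality$Wind > 10, ]

# data.table: column names are looked up inside dat1, no trailing comma is
# needed, and NA conditions are treated as FALSE.
dt_rows <- dat1[Solar.R > 150 & Wind > 10]

nrow(base_rows)   # includes the all-NA rows
nrow(dt_rows)     # only rows where the condition is TRUE
```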

The really nice thing is that we can use most or all base and tidyverse commands within data.table so we get the best of both worlds.

For example try this:

library(data.table)
library(tidyverse)
dat1  <- as.data.table(airquality) ## Convert data.frame to data.table

dat1[ , ggplot(, aes(Ozone, Wind)) + geom_point()]
