Syntax of filter function confusion

I am trying to understand the reasoning behind the syntax of this filter function code from Coursera in the hotel_bookings dataset. I am confused why the first filter condition "hotel" does not need the dollar sign but the second filter condition "market_segment" needs the dollar sign. Both "hotel" and "market_segment" are columns in the data frame. This is the explanation from Coursera:

For the first step, you can use the filter() function to create a data set that only includes the data you want. Input 'City Hotel' in the first set of quotation marks and 'Online TA' in the second set of quotation marks to specify your criteria:

Here is the code they give:
```
onlineta_city_hotels <- filter(hotel_bookings,
                               (hotel == "" &
                                hotel_bookings$market_segment == ""))
```

As far as I can tell, you are right, they are wrong.

In classic, so-called "base" R, you need to use $ every time you reference a column. In filter() you never need to name the data frame again, since you already passed it as the first argument. I suspect they mixed the two syntaxes just to show that it is possible (which might be useful in the rare case where you need a column from a different data frame), but that seems more confusing than helpful.
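To make the difference concrete, here is a minimal sketch comparing base R subsetting with filter(). It assumes dplyr is installed and uses a small made-up data frame called bookings (a hypothetical stand-in, not the real hotel_bookings data), with the blanks filled in as 'City Hotel' and 'Online TA'.

```r
library(dplyr)

# Toy stand-in for hotel_bookings (hypothetical values, not the real dataset)
bookings <- data.frame(
  hotel = c("City Hotel", "Resort Hotel", "City Hotel", "City Hotel"),
  market_segment = c("Online TA", "Online TA", "Direct", "Online TA"),
  stringsAsFactors = FALSE
)

# Base R: the data frame must be named every time a column is referenced
base_result <- bookings[bookings$hotel == "City Hotel" &
                          bookings$market_segment == "Online TA", ]

# dplyr: bare column names are looked up inside the data frame given first
dplyr_result <- filter(bookings,
                       hotel == "City Hotel",
                       market_segment == "Online TA")

# Both approaches should keep the same rows
nrow(base_result)
nrow(dplyr_result)
```

One reason to prefer the bare column names: hotel_bookings$market_segment always refers to the original data frame, so if the data had been changed earlier in a pipe (for example by dropping rows), the $ form would no longer line up with the rows being filtered.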

To be sure, I suggest you try yourself and compare the result of these 5 commands:

```
filter(hotel_bookings,
       (hotel == "" &
        hotel_bookings$market_segment == ""))

filter(hotel_bookings,
       (hotel == "" &
        market_segment == ""))

filter(hotel_bookings,
       (hotel_bookings$hotel == "" &
        hotel_bookings$market_segment == ""))

filter(hotel_bookings,
       (hotel_bookings$hotel == "" &
        market_segment == ""))

filter(hotel_bookings,
       hotel == "",
       market_segment == "")
```

My prediction is that the results will all be identical.
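If you want to check this without eyeballing the output, something like the sketch below should work. It assumes dplyr is installed; the small hotel_bookings data frame here is a made-up stand-in so the example runs on its own, and the blanks are filled in with 'City Hotel' and 'Online TA'. With the real dataset loaded, you would skip the data.frame() step.

```r
library(dplyr)

# Toy stand-in for hotel_bookings (hypothetical values, not the real dataset)
hotel_bookings <- data.frame(
  hotel = c("City Hotel", "Resort Hotel", "City Hotel", "City Hotel"),
  market_segment = c("Online TA", "Online TA", "Direct", "Online TA"),
  stringsAsFactors = FALSE
)

# The five variants, collected in a list so they can be compared
results <- list(
  filter(hotel_bookings,
         (hotel == "City Hotel" &
          hotel_bookings$market_segment == "Online TA")),
  filter(hotel_bookings,
         (hotel == "City Hotel" &
          market_segment == "Online TA")),
  filter(hotel_bookings,
         (hotel_bookings$hotel == "City Hotel" &
          hotel_bookings$market_segment == "Online TA")),
  filter(hotel_bookings,
         (hotel_bookings$hotel == "City Hotel" &
          market_segment == "Online TA")),
  filter(hotel_bookings,
         hotel == "City Hotel",
         market_segment == "Online TA")
)

# Compare every result against the first one; TRUE means "same data frame"
same_as_first <- vapply(
  results,
  function(r) isTRUE(all.equal(r, results[[1]])),
  logical(1)
)
print(same_as_first)
```

The last variant separates the conditions with a comma instead of &; filter() combines comma-separated conditions with "and", so it is equivalent to the others.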
