RStudio, Quarto, Unicode and janitor

RichardJTelford · October 28, 2022, 11:23pm

I have been having a problem in a class I am teaching. The students import an Excel file that contains the Unicode character \u2103 (℃) in the header row and then run janitor::clean_names().

For most of the students, janitor::clean_names() converts the column name to "temperature_c" both in RStudio and when rendering with Quarto.
For about 20% of the students, janitor::clean_names() converts the column name to "temperature_u_00b0_c" in RStudio (00b0 is the Unicode code point for "°") but to "temperature_c" when rendered with Quarto. Because the column name differs between the two, the rest of their code breaks when they render the document.

In both RStudio and Quarto the "℃" is imported correctly as UTF-8 and gives the same output with charToRaw() (e2 84 83), so it is not an import problem. Somehow janitor is treating the character differently depending on how R is being run.

All the affected students are using R 4.2.1 with the current version of RStudio on Windows. Some students might have Norwegian locales; I haven't been able to check that.
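
The checks described above can be run on an affected machine, both in the RStudio console and inside a rendered Quarto chunk, to compare the two sessions. This is only a sketch using base R; the variable name nm is illustrative, and which setting (if any) actually differs between the sessions is an open question here.

```r
# Column name containing U+2103 (DEGREE CELSIUS), written as an escape so the
# script itself stays plain ASCII.
nm <- "Temperature (\u2103)"

# Bytes of the symbol on its own; as UTF-8 these are e2 84 83.
charToRaw(enc2utf8("\u2103"))

# Declared encoding of the string.
Encoding(nm)

# Character-type locale and native-encoding details of the current session.
Sys.getlocale("LC_CTYPE")
l10n_info()
```

If the locale or native encoding reported in RStudio differs from the one reported in the rendered document, that would be a plausible place to look next.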

I have given the students some quick fixes, but it would be good to avoid this confusing problem altogether.

Minimal example

tibble::tibble("Temperature (℃)"= 1) |> janitor::clean_names() |> names()
# "temperature_u_00b0_c" (on affected machines)
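
One possible quick fix, applied to the minimal example: replace the ℃ character in the column names with a plain ASCII "C" before clean_names() sees it, so the result should no longer depend on how the session handles the symbol. This is a sketch that assumes a plain "C" is an acceptable stand-in; the gsub() step and the object name df are illustrative, not something janitor requires.

```r
# Same data as the minimal example, with the symbol written as an escape.
df <- tibble::tibble("Temperature (\u2103)" = 1)

# Swap U+2103 for an ASCII "C" before cleaning; fixed = TRUE treats the
# pattern as a literal string rather than a regular expression.
names(df) <- gsub("\u2103", "C", names(df), fixed = TRUE)

# The name is now pure ASCII, so cleaning gives "temperature_c".
cleaned <- janitor::clean_names(df)
names(cleaned)
```

Doing the replacement straight after import means the same column name reaches the rest of the code in both RStudio and Quarto.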
