R filling in the names of columns automatically, completely bizarre behavior

I have no idea what is going on here, and this seems baffling to me.

I'm sorting through some taxonomy with the taxize package; I need to do something like this:

fung = taxize::fg_name_search("Xanthoparmelia ionnis-simae")
new_name = fung$current_name

I noticed that for some records it would return a number instead of a name, and I noticed one of the column names was "current_name_record_number". It seems to be automatically assuming that I mean "current_name_record_number" instead of "current_name". It's filling in the name automatically inside of the console.

[1] "name_of_fungus" "authors"
[3] "specific_epithet" "infraspecific_rank"
[5] "orthography_comment" "year_of_publication"
[7] "editorial_comment" "sts_flag"
[9] "record_number" "basionym_record_number"
[11] "protonym_record_number" "name_of_fungus_fundic_record_number"
[13] "current_name_record_number" "updatedby"
[15] "updateddate" "addeddate"
[17] "uuid"
[1] "343913"
[1] "343913"

I tested this on a dummy data frame and it does the same thing.

df = data.frame(current_name_record_number="343913")
[1] "343913"

This is dangerous behavior and shouldn't exist. I don't understand how this is possible, and it seems like something that needs to be fixed. Or maybe I've done something wrong? I need help understanding this issue and how it can be solved.

R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.5.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] taxize_0.9.100

loaded via a namespace (and not attached):
[1] crayon_1.5.2 conditionz_0.1.0 nlme_3.1-162 cli_3.6.1 rlang_1.1.1
[6] crul_1.4.0 stringi_1.7.12 jsonlite_1.8.7 data.table_1.14.8 zoo_1.8-12
[11] glue_1.6.2 httpcode_0.3.0 bold_1.3.0 grid_4.3.1 foreach_1.5.2
[16] ape_5.7-1 lifecycle_1.0.3 stringr_1.5.0 compiler_4.3.1 codetools_0.2-19
[21] Rcpp_1.0.11 rstudioapi_0.15.0 lattice_0.21-8 digest_0.6.33 R6_2.5.1
[26] curl_5.0.1 parallel_4.3.1 magrittr_2.0.3 uuid_1.1-1 tools_4.3.1
[31] iterators_1.0.14 xml2_1.3.6

Hi @ksanbon, this is "normal" behaviour for R. Here are two examples to clarify.

Your example dataframe

> df = data.frame(current_name_record_number="343913")
> df$c
[1] "343913"
> df$cu
[1] "343913"
> df$current
[1] "343913"

works since there are no other columns to match against - the same behaviour can be seen also in your original dataframe - there are no other columns that start with current_name so R will "autocomplete" for you.

Here is a smaller example where this magic will not happen:

> df2 = data.frame(current_name_record_number="343913",
+                  current_name = "a")
> df2$c
NULL
> df2$current
NULL
> df2$current_name
[1] "a"
> df2$current_name_
[1] "343913"

It only "autocompletes" when there is exactly one possibility. This is why df2$c and df2$current didn't work: each could match multiple columns, so R returns NULL. df2$current_name matches one column exactly, and df2$current_name_ "autocompletes" to current_name_record_number because that is the only possibility left to match.
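To make the rule concrete, here is a small sketch using base R's pmatch(), which follows the same "exact match first, then unique prefix" logic. I'm using it only as an illustration of the rule; the variable names are made up for this example.

```r
df2 <- data.frame(current_name_record_number = "343913",
                  current_name = "a")
cols <- names(df2)

# pmatch() returns the index of the matching name, or NA when the
# prefix is ambiguous or matches nothing.
exact_hit     <- pmatch("current_name",  cols)  # exact match takes priority
unique_prefix <- pmatch("current_name_", cols)  # only one candidate left
ambiguous     <- pmatch("current",       cols)  # two candidates, so NA

# With `$`, the ambiguous case gives NULL without any message.
is.null(df2$current)
```

So the surprising part is not the ambiguous case (you get NULL) but the unique-prefix case, where you silently get a different column than the one you typed.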

Hope that clarifies it a bit. R is an interpreted language, so "magical" behaviour like what you encountered happens all the time.

This behavior of the $ operator can be modified to throw a warning by setting the warnPartialMatchDollar option. Here is a Stack Overflow thread. I suggest you read both answers. The [[ operator does not do partial matching by default, but it can be turned on with exact = FALSE.
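A minimal sketch of both suggestions, using the dummy data frame from the original post (the via_dollar, strict and partial names are just for illustration):

```r
df <- data.frame(current_name_record_number = "343913")

# Ask R to warn whenever `$` resolves a name by partial matching.
# The value is still returned; the warning just makes it visible.
options(warnPartialMatchDollar = TRUE)
via_dollar <- df$current_name

# `[[` matches exactly by default, so a column that does not exist
# gives NULL instead of a neighbouring column.
strict <- df[["current_name"]]

# Partial matching with `[[` has to be requested explicitly.
partial <- df[["current_name", exact = FALSE]]
```

If you want this in every session, the options() call could go in your .Rprofile. For code that must never pick up the wrong column, I would lean on [[ and check for NULL.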
