Mutate appends .x and .y to new unrepeated column names

August · November 26, 2020, 11:59am

When I conditionally loop with map over the rows of a tibble (from a list of 300+ tibbles), it sometimes appends .x or .y to the new column names. This only happens occasionally and is annoying, but there appears to be nothing wrong with the data (if you are interested, the values in the prop.test are the same type (integer), and the totals (n) equal the sum of the values passed to the x argument).

In order to get around the appended names I have created the following function...

remove_appendedxy <- function (ti) {
  df_names <- ti %>%
    names() %>%
    stringr::str_remove(pattern = "\\.x") %>%
    stringr::str_remove(pattern = "\\.y")
  
  names(ti) <- df_names
}

but this overwrites the tibble entirely and replaces it with the column names. Would someone be able to show me where I have gone wrong, please?

nirgrahamuk · November 26, 2020, 12:02pm

Unfortunately it's likely not going to be possible to support you in this without a reprex of the problem, i.e. the initial problem where you inconsistently and unexpectedly received .x or .y column names.

The function for renaming fails because it has no return object.
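A small sketch of why that happens, using two made-up toy functions (f and g are not from the thread): in R, the value of a replacement assignment such as names(ti) <- value is the right-hand side, so a function that ends on that line returns the names rather than the modified object.

```r
# Toy function (hypothetical) that ends on an assignment, like the original
f <- function(ti) {
  names(ti) <- toupper(names(ti))
}

# Same function with the modified object as the last expression
g <- function(ti) {
  names(ti) <- toupper(names(ti))
  ti
}

out_f <- f(data.frame(a = 1, b = 2))
out_g <- g(data.frame(a = 1, b = 2))

class(out_f)  # a character vector of names, not a data frame
class(out_g)  # a data frame with the renamed columns
```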

library(dplyr)

(hiris <- head(iris) %>% mutate(Petal_width.x = Petal.Width * 2))

remove_appendedxy <- function (ti) {
  df_names <- ti %>%
    names() %>%
    stringr::str_remove(pattern = "\\.x") %>%
    stringr::str_remove(pattern = "\\.y")
  
  names(ti) <- df_names
  ti
}

remove_appendedxy(hiris)
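One possible refinement, offered as a sketch rather than a drop-in replacement: the pattern "\\.x" matches ".x" anywhere in a name, so anchoring it with "$" limits the change to a trailing suffix. The helper name strip_suffix is hypothetical, and the example assumes dplyr 1.0.0 or later for rename_with().

```r
library(dplyr)
library(stringr)

hiris <- head(iris) %>% mutate(Petal_width.x = Petal.Width * 2)

# strip_suffix is a hypothetical helper name.
# "\\.[xy]$" matches a literal ".x" or ".y" only at the end of a column name.
strip_suffix <- function(ti) {
  rename_with(ti, function(nm) str_remove(nm, "\\.[xy]$"))
}

names(strip_suffix(hiris))
```

Because rename_with() returns the renamed data frame, there is no separate assignment step to forget.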

August · November 26, 2020, 12:43pm

@nirgrahamuk

Thank you, this helps me understand how to deal with the symptom perfectly. I will try to add a reprex later for the prop.test/mutate issue, when I have a bit more time.

August · November 26, 2020, 4:33pm

@nirgrahamuk here is a reprex (sorry it's so large) of the code I used to make the prop.test, followed by some data and replication of that data. The names list is used to conditionally select indexes to input into the prop.test; I have, however, removed the additional conditions and extra code, which is basically the same. Thank you in advance for taking the time to read through this, I know it's quite a pain to help people as the complexity ramps up.

As I mentioned before, sometimes the mutate appends .x or .y to the end of the column names, but not all the time. I'm not sure why this occurs.

try(cor_logs <- purrr::pmap(list(df_list, item_names), function(first, forth) {
  if ( forth[[2]] == forth[[3]]) {
    purrr::pmap(first, ~ {
      prop.test(x = c(..2, ..5), n = c(..6, ..6), correct = FALSE)
    }) %>%
      purrr::map_df(broom::tidy, .id = "formula") %>%
      mutate(formula = as.integer(formula)) %>%
      dplyr::select(-one_of(c("parameter", "conf.low", "conf.high", "method", "alternative"))) %>%
      dplyr::rename(
        prop_NL_NP2 = estimate1, prop_NH_PR2 = estimate2,
        high_chi = statistic, HN_pval = p.value
      ) %>%
      dplyr::left_join(first, by = c("formula" = "row_number")) %>%
      dplyr::rename(row_number = formula) %>%
      dplyr::select(!1:5, 1:5)
  }  else {
    print("ERROR in dataframe structure")
  }
}))

df1<- structure(list(day = structure(c(1546300800, 1546387200, 1546473600,
1546560000, 1546646400, 1546732800, 1546819200, 1546905600, 1546992000,
1547078400), tzone = "", class = c("POSIXct", "POSIXt")), ITEM10128_0 = c(1190L,
1532L, 1110L, 1134L, 1566L, 987L, 1324L, 937L, 1007L, 1100L),
ITEM10128_1 = c(2372L, 2865L, 2751L, 3238L, 3156L, 2541L,
2224L, 2032L, 5458L, 2386L), ITEM24373_0 = c(867L, 1062L,
1151L, 1267L, 1311L, 1011L, 916L, 1095L, 1008L, 925L), ITEM24373_1 = c(1814L,
3099L, 1976L, 3610L, 3040L, 2520L, 2520L, 3411L, 2434L, 2413L
), tot_sales_pr1 = c(3004L, 4631L, 3086L, 4744L, 4606L, 3507L,
3844L, 4348L, 3441L, 3513L)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
item_names <- list(c("day", "ITEM10128", "ITEM10128", "ITEM24373", "ITEM24373"
), c("day", "ITEM10128", "ITEM10128", "ITEM24373", "ITEM24373"
), c("day", "ITEM10128", "ITEM10128", "ITEM24373", "ITEM24373"
), c("day", "ITEM10128", "ITEM10128", "ITEM24373", "ITEM24373"
))

df4 <- df1 
df3 <- df1
df2 <- df1
df_list <- list(df1, df2, df3, df4)

nirgrahamuk · November 26, 2020, 4:42pm

I ran your example and got an error about row_number not being present (I assume in the 'first' plucked from df_list).
I had to add

function(first, forth) {
  first <- mutate(first,
                  row_number=row_number()) 
...

However, when I look at cor_logs, I don't see any strange name discrepancies, so it may be that this change clouds the issue you wanted help with?

August · November 26, 2020, 4:57pm

Ahh, I forgot about the row number; it is iteratively added at the end of my code. I don't seem to be able to recreate the issue unless I send the whole list of dataframes (n=362) to the function, which may suggest a data issue. I'll try some things and report back; so far I have been unable to locate what I assume may be a problem dataframe in the list.

One of the errors I received before using try was that x values equal n values (this is the top error message in prop.test). However, looping through the dfs did not find any dataframes where this was TRUE. Weird...

nirgrahamuk · November 26, 2020, 5:05pm

Usually .x and .y columns turn up when using base::merge where two data frames share columns in common; that's the circumstance where I'd seen it before.
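A minimal illustration of that merge behaviour, using two made-up data frames (a and b are not from the thread): any shared column that is not a join key is kept twice and disambiguated with merge's default suffixes.

```r
# Two toy data frames sharing the key "id" and a non-key column "value"
a <- data.frame(id = 1:2, value = c(10, 20))
b <- data.frame(id = 1:2, value = c(30, 40))

# "value" exists on both sides and is not in `by`, so it gets suffixed
merged <- merge(a, b, by = "id")
names(merged)
#> [1] "id"      "value.x" "value.y"
```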

AlexisW · November 26, 2020, 5:17pm

Also happens with left_join()

library(tidyverse)
tibble(a=1, b =2) %>%
  left_join(tibble(a=1, b=3), by="a")
#> # A tibble: 1 x 3
#>       a   b.x   b.y
#>   <dbl> <dbl> <dbl>
#> 1     1     2     3

Created on 2020-11-26 by the reprex package (v0.3.0)
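Applied to the pipeline posted earlier in the thread, this would mean the suffixes appear only for data frames that already contain a column with one of the names created before the left_join. Whether that is what happens in the full list is an assumption; the sketch below shows one way to check it. The names in new_cols are copied from the rename() call in that pipeline, and toy_list is a made-up stand-in for the real df_list.

```r
library(purrr)

# Names the pipeline creates before the join (from the rename() call in the thread)
new_cols <- c("prop_NL_NP2", "prop_NH_PR2", "high_chi", "HN_pval")

# Toy stand-in for the real list; the second element already has a clashing column
toy_list <- list(
  data.frame(row_number = 1:2, ITEM_0 = c(5L, 6L)),
  data.frame(row_number = 1:2, ITEM_0 = c(5L, 6L), high_chi = c(0.1, 0.2))
)

# For each data frame, list the columns that would collide in the join
clashes <- map(toy_list, ~ intersect(names(.x), new_cols))

# Positions of the data frames that would end up with .x/.y columns
which(lengths(clashes) > 0)
```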

August · November 26, 2020, 7:17pm

@AlexisW Thank you Alexis, that makes sense now. I guess the only way around it in this case is to remove the appendages after the mutate.

AlexisW · November 27, 2020, 3:44pm

Or avoid duplicate column names (or include them in the by argument if their contents are identical). If you just remove the .x and .y, you'll end up with two columns with the same name, which is probably not a good idea.
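A rough sketch of both options, using made-up tibbles x and y rather than the data from the thread. It assumes dplyr 1.0.0 or later for all_of().

```r
library(dplyr)

x <- tibble(a = 1, b = 2)
y <- tibble(a = 1, b = 2, c = 3)

# Columns present in both tables that are not the join key "a"
shared <- setdiff(intersect(names(x), names(y)), "a")
shared

# Option 1: join on every shared column (only sensible when their contents agree)
joined_all <- left_join(x, y, by = c("a", "b"))

# Option 2: drop the duplicated columns from one side before joining
joined_drop <- left_join(x, select(y, -all_of(shared)), by = "a")

names(joined_all)
names(joined_drop)
```

Either way the result has a single b column, so there is nothing to strip afterwards.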
