Mutate appends .x and .y to new unrepeated column names

When I conditionally loop with map over the rows of a tibble (taken from a list of 300+ tibbles), mutate sometimes appends .x or .y to the new column names. This only happens occasionally and is annoying, but there appears to be nothing wrong with the data. (If you are interested, the values passed to prop.test are all the same type (integer), and the totals (n) equal the sum of the values passed to the x argument.)

In order to get around the appended names, I have created the following function:

remove_appendedxy <- function (ti) {
  df_names <- ti %>%
    names() %>%
    stringr::str_remove(pattern = "\\.x") %>%
    stringr::str_remove(pattern = "\\.y")
  
  names(ti) <- df_names
}

But this overwrites the tibble entirely and replaces it with the column names. Would someone be able to show me where I have gone wrong, please?

Unfortunately, it's likely not going to be possible to support you with this without a reprex of the problem, i.e. the initial problem where you inconsistently and unexpectedly received .x or .y column names.

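The function for renaming fails because it does not return the modified tibble. Its last expression is the assignment names(ti) <- df_names, so the function returns the value of that assignment (the vector of names) rather than ti.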
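To make the return-value point concrete, here is a minimal sketch in base R. The function names `rename_no_return` and `rename_with_return` are made up for illustration and do not come from the thread.

```r
# Hypothetical toy functions, only to illustrate what an R function returns.

# The last expression is a replacement assignment, so the function
# (invisibly) returns the assigned value: the new names.
rename_no_return <- function(ti) {
  names(ti) <- toupper(names(ti))
}

# Here the modified object is the last expression, so it is what gets returned.
rename_with_return <- function(ti) {
  names(ti) <- toupper(names(ti))
  ti
}

df <- data.frame(a = 1, b = 2)

res_bad <- rename_no_return(df)
res_good <- rename_with_return(df)

print(class(res_bad))   # a character vector, not a data frame
print(class(res_good))  # still a data frame
```

Assigning the result of the first version back to the original object is what replaces the tibble with its column names.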

library(dplyr)  # provides %>% and mutate()

(hiris <- head(iris) %>% mutate(Petal_width.x = Petal.Width * 2))

remove_appendedxy <- function (ti) {
  df_names <- ti %>%
    names() %>%
    stringr::str_remove(pattern = "\\.x") %>%
    stringr::str_remove(pattern = "\\.y")
  
  names(ti) <- df_names
  ti  # return the renamed tibble rather than the value of the assignment
}

remove_appendedxy(hiris)
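One thing to watch with the pattern "\\.x" is that it is not anchored, so it removes the first ".x" wherever it occurs in a name. The sketch below is a possible variant in base R that strips the suffix only at the end of a name. The helper `strip_join_suffix` and the column `max.xval` are hypothetical and used purely for illustration.

```r
# Hypothetical helper (not from the thread): strip only a trailing ".x" or ".y".
strip_join_suffix <- function(ti) {
  names(ti) <- sub("\\.[xy]$", "", names(ti))
  ti
}

hiris <- head(iris)
hiris$Petal_width.x <- hiris$Petal.Width * 2
hiris$max.xval <- 1  # ".x" in the middle of a name should be left alone

stripped_names <- names(strip_join_suffix(hiris))
print(stripped_names)
```

Note that assigning names this way does not check for duplicates, so stripping both "b.x" and "b.y" would leave two columns called "b".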

@nirgrahamuk

Thank you, this helps me understand how to deal with the symptom perfectly. I will try to add a reprex later for the prop.test/mutate issue, when I have a bit more time.

@nirgrahamuk here is a reprex (sorry it's so large) of the code I used to run the prop.test, followed by some data and a replication of that data. The names list is used to conditionally select the indexes to input into the prop.test; I have, however, removed the additional conditions and extra code, which are basically the same. Thank you in advance for taking the time to read through this. I know it's quite a pain to help people as the complexity ramps up.

As I mentioned before, the mutate sometimes appends .x or .y to the end of the column names, but not all the time. I'm not sure why this occurs.

try(cor_logs <- purrr::pmap(list(df_list, item_names), function(first, forth) {
  if ( forth[[2]] == forth[[3]]) {
    purrr::pmap(first, ~ {
      prop.test(x = c(..2, ..5), n = c(..6, ..6), correct = "FALSE")
    }) %>%
      purrr::map_df(broom::tidy, .id = "formula") %>%
      mutate(formula = as.integer(formula)) %>%
      dplyr::select(-one_of(c("parameter", "conf.low", "conf.high", "method", "alternative"))) %>%
      dplyr::rename(
        prop_NL_NP2 = estimate1, prop_NH_PR2 = estimate2,
        high_chi = statistic, HN_pval = p.value
      ) %>%
      dplyr::left_join(first, by = c("formula" = "row_number")) %>%
      dplyr::rename(row_number = formula) %>%
      dplyr::select(!1:5, 1:5)
  }  else {
    print("ERROR in dataframe structure")
  }
}))

df1<- structure(list(day = structure(c(1546300800, 1546387200, 1546473600,
1546560000, 1546646400, 1546732800, 1546819200, 1546905600, 1546992000,
1547078400), tzone = "", class = c("POSIXct", "POSIXt")), ITEM10128_0 = c(1190L,
1532L, 1110L, 1134L, 1566L, 987L, 1324L, 937L, 1007L, 1100L),
ITEM10128_1 = c(2372L, 2865L, 2751L, 3238L, 3156L, 2541L,
2224L, 2032L, 5458L, 2386L), ITEM24373_0 = c(867L, 1062L,
1151L, 1267L, 1311L, 1011L, 916L, 1095L, 1008L, 925L), ITEM24373_1 = c(1814L,
3099L, 1976L, 3610L, 3040L, 2520L, 2520L, 3411L, 2434L, 2413L
), tot_sales_pr1 = c(3004L, 4631L, 3086L, 4744L, 4606L, 3507L,
3844L, 4348L, 3441L, 3513L)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
item_names <- list(c("day", "ITEM10128", "ITEM10128", "ITEM24373", "ITEM24373"
), c("day", "ITEM10128", "ITEM10128", "ITEM24373", "ITEM24373"
), c("day", "ITEM10128", "ITEM10128", "ITEM24373", "ITEM24373"
), c("day", "ITEM10128", "ITEM10128", "ITEM24373", "ITEM24373"
))

df4 <- df1 
df3 <- df1
df2 <- df1
df_list <- list(df1, df2, df3, df4)

I ran your example and got an error about row_number not being present (I assume in the 'first' plucked from df_list).
I had to add:

function(first, forth) {
  first <- mutate(first,
                  row_number=row_number()) 
...

However, when I look at cor_logs, I don't see any strange name discrepancies, so it may be that this change clouds the issue you wanted help with?

Ahh, I forgot about the row number; it is added iteratively at the end of my code. I don't seem to be able to recreate the issue unless I send the whole list of dataframes (n=362) to the function, which may suggest a data issue. I'll try some things and feed back. So far I have been unable to locate what I assume may be a problem dataframe in the list.

One of the errors I received before using try was that x values equal n values (this is the top error message in prop.test). However, looping through the dfs did not find any dataframes where this was TRUE. Weird...

Usually .x and .y columns turn up when using base::merge on two dataframes that share column names which are not used as merge keys. That's the circumstance where I'd seen that before.
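A small base R sketch of that situation, using made-up toy data frames `left` and `right` that share a non-key column `b`:

```r
# Toy data (hypothetical): both frames have a column "b" that is not a merge key.
left  <- data.frame(a = 1, b = 2)
right <- data.frame(a = 1, b = 3)

# Merging on "a" only: merge() disambiguates the two "b" columns
# with its default suffixes ".x" and ".y".
merged <- merge(left, right, by = "a")
print(names(merged))  # "a" "b.x" "b.y"

# The suffixes argument lets you choose something more descriptive.
merged_named <- merge(left, right, by = "a", suffixes = c("_left", "_right"))
print(names(merged_named))  # "a" "b_left" "b_right"
```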

This also happens with left_join():

library(tidyverse)
tibble(a=1, b =2) %>%
  left_join(tibble(a=1, b=3), by="a")
#> # A tibble: 1 x 3
#>       a   b.x   b.y
#>   <dbl> <dbl> <dbl>
#> 1     1     2     3

Created on 2020-11-26 by the reprex package (v0.3.0)
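If the suffixes in the original problem do come from the left_join step (the thread suggests this but does not confirm it for the full data), one way to locate the affected tibbles in a long list is to check for name collisions. The sketch below uses base R and toy stand-ins for df_list and cor_logs. The names `toy_list` and `toy_results`, and their contents, are hypothetical.

```r
# Column names the pipeline creates via rename() (taken from the reprex above).
new_cols <- c("prop_NL_NP2", "prop_NH_PR2", "high_chi", "HN_pval")

# Toy stand-in for df_list: the second element already has a "high_chi" column.
toy_list <- list(
  data.frame(row_number = 1:2, ITEM_0 = c(5L, 6L)),
  data.frame(row_number = 1:2, ITEM_0 = c(5L, 6L), high_chi = c(0.1, 0.2))
)

# For each input, list the columns that would collide with the new ones in a join.
clashes <- lapply(toy_list, function(d) intersect(names(d), new_cols))
problem_idx <- which(lengths(clashes) > 0)
print(problem_idx)

# Alternatively, scan the outputs for names ending in ".x" or ".y".
# Non-data-frame elements (such as a printed error string) have no names
# and are therefore reported as FALSE.
toy_results <- list(
  data.frame(row_number = 1, high_chi = 0.5),
  data.frame(row_number = 1, high_chi.x = 0.5, high_chi.y = 0.1),
  "ERROR in dataframe structure"
)
suffixed_idx <- which(vapply(
  toy_results,
  function(res) any(grepl("\\.[xy]$", names(res))),
  logical(1)
))
print(suffixed_idx)
```

Either index vector points at the list elements worth inspecting by hand.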

@AlexisW Thank you Alexis, that makes sense now. I guess the only way around it in this case is to remove the appendages after the mutate. :smiley:

Or to avoid duplicate column names (or include them in the by argument if their contents are identical). Especially if you just remove the .x and .y, you'll end up with two columns with the same name, which is probably not a good idea.
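The options above could look roughly like this with dplyr. This is a sketch on made-up tibbles `left` and `right`, not the original data, and the third option (explicit suffixes) is an addition that was not mentioned in the thread.

```r
library(dplyr)

# Toy data (hypothetical): "b" appears on both sides with identical contents.
left  <- tibble(a = 1, b = 2, c = 10)
right <- tibble(a = 1, b = 2, d = 20)

# Option 1: the shared column is identical, so include it in `by`.
joined_by_both <- left_join(left, right, by = c("a", "b"))

# Option 2: drop the shared non-key column from one side before joining.
joined_dropped <- left_join(left, select(right, -b), by = "a")

# Option 3: keep both copies, but choose explicit, descriptive suffixes.
joined_suffix <- left_join(left, right, by = "a", suffix = c("_left", "_right"))

print(names(joined_by_both))
print(names(joined_dropped))
print(names(joined_suffix))
```

The first two give a single b column, while the third keeps both copies under distinct names, so none of them ends up with duplicated column names.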
