Need help creating a function for character string creation in R

I have a task where I must create a character string representing ventilation units on a vast set of city government building rooftops. After extracting the attribute information from the shapefile, I have a dataframe of related descriptors. For each ventilation unit, or rooftop fan, there are columns identifying the city borough, city property, city property code and city property building number to which it is assigned. A geometry column is also available.

I created a new dataframe using a count function that tallied all rooftop fans within each city property building code. My problem is that I am unsure how to create a column which, for each city property building code, generates strings numbered from 1 up to the count in the adjacent column. This way every rooftop ID is linked to its corresponding rooftop.

#List the column names of dataframe
> names(RoofFanLocationString)
[1] "vars" "n"   

> #List the length of both columns
> length(RoofFanLocationString$vars)
[1] 931
> length(RoofFanLocationString$n)
[1] 931

#Create vector of first 20 city building codes
> BuildingValue <- (RoofFanLocationString$vars[1:20])

#Print out vector
> print(BuildingValue)
[1] "100.1"  "100.2"  "100.3"  "101.1"  "101.2"  "101.3"  "101.4"  "102.1"  "102.10"
[10] "102.2"  "102.3"  "102.4"  "102.5"  "102.6"  "102.7"  "102.8"  "102.9"  "103.3" 
[19] "103.4"  "103.5" 

# Create vector of first 20 ventilation unit amounts
VentilationCounts <- (RoofFanLocationString$n[1:20])

#Print first 20 values of rooftop units
> print(VentilationCounts)
 [1] 12 13  6  7  7  7  8 12 13  6  5  6 12 18  6  6  6 10  8  8

This example is meant to show that I want to combine every element of the first vector (BuildingValue) with every number from 1 up to the matching element of the second vector (VentilationCounts), and collect the results in a new vector.

It's obvious to me that this new vector will be much longer than each because of the different calculated permutations. It's not at all obvious to me how to execute this in R.

This is my first time posting on the RStudio Community. I mistakenly thought I'd be able to upload a csv file along with code snippets, but it appears that capability is not available here. I hope I've been concise enough to help whoever can help me.

Thank you

Welcome Akil! I'm not sure I understand what your starting data looks like or what the output should look like, so please let me know if the code below (basically three different ways of getting a similar result) is on the right track:


library(tidyverse)  # loads purrr (map2), dplyr, tidyr and tibble

BuildingValue = c("100.1",  "100.2",  "100.3")
VentilationCounts = c(3, 4, 2)

map2(BuildingValue, VentilationCounts, 
     ~ paste(.x, 1:.y, sep="_"))
#> [[1]]
#> [1] "100.1_1" "100.1_2" "100.1_3"
#> [[2]]
#> [1] "100.2_1" "100.2_2" "100.2_3" "100.2_4"
#> [[3]]
#> [1] "100.3_1" "100.3_2"

map2_df(BuildingValue, VentilationCounts, 
        ~tibble(BuildingValue=.x, VentilationCount=.y, new.var=paste(.x, 1:.y, sep="_")))
#> # A tibble: 9 x 3
#>   BuildingValue VentilationCount new.var
#>   <chr>                    <dbl> <chr>  
#> 1 100.1                        3 100.1_1
#> 2 100.1                        3 100.1_2
#> 3 100.1                        3 100.1_3
#> 4 100.2                        4 100.2_1
#> 5 100.2                        4 100.2_2
#> 6 100.2                        4 100.2_3
#> 7 100.2                        4 100.2_4
#> 8 100.3                        2 100.3_1
#> 9 100.3                        2 100.3_2

dat = data.frame(BuildingValue, VentilationCounts)

dat %>% 
  group_by(BuildingValue) %>% 
  mutate(new.var = list(paste(BuildingValue, 1:VentilationCounts, sep="_"))) %>% 
  unnest(new.var)
#> # A tibble: 9 x 3
#> # Groups:   BuildingValue [3]
#>   BuildingValue VentilationCounts new.var
#>   <fct>                     <dbl> <chr>  
#> 1 100.1                         3 100.1_1
#> 2 100.1                         3 100.1_2
#> 3 100.1                         3 100.1_3
#> 4 100.2                         4 100.2_1
#> 5 100.2                         4 100.2_2
#> 6 100.2                         4 100.2_3
#> 7 100.2                         4 100.2_4
#> 8 100.3                         2 100.3_1
#> 9 100.3                         2 100.3_2

Created on 2019-08-28 by the reprex package (v0.3.0)
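
For comparison, the same expansion can probably be done in base R alone, without purrr or dplyr. This is only a sketch using the toy vectors from above; the names fan_id and result are made up for illustration. It relies on rep() to repeat each building code and sequence() to generate 1 up to each count.

```r
# Toy inputs, same as in the examples above
BuildingValue <- c("100.1", "100.2", "100.3")
VentilationCounts <- c(3, 4, 2)

# rep() repeats each building code as many times as it has fans;
# sequence() concatenates 1:3, 1:4 and 1:2 into one vector
fan_id <- paste(rep(BuildingValue, times = VentilationCounts),
                sequence(VentilationCounts),
                sep = "_")

# Optional: keep the building code next to each generated ID
result <- data.frame(
  BuildingValue = rep(BuildingValue, times = VentilationCounts),
  new.var = fan_id,
  stringsAsFactors = FALSE
)

print(fan_id)
```

One difference worth knowing: if a count were ever 0, 1:0 would yield the values 1 and 0, whereas rep() and sequence() simply produce no rows for that building.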

WHOA, thank you so much, Joel. This is really helpful. I tried the 2nd and 3rd methods and they worked LIKE A CHARM with larger vectors. However, I'm still having a problem opening the "tibble". I need to draw out all the rows and be able to write it to a csv file and export it. I've been researching how to display an entire tibble but it's been a very troublesome maze.

When you say "opening the tibble" I'm not sure what you mean. Do you want to display the whole data frame in the console (the output window)? If so, you can do print(dat, n = Inf) (where dat is the name of the tibble). By the way, a tibble is just a data frame with some extra bells and whistles, one of which is that, by default, it prints to the console in an abbreviated form.

To write to a csv file, you can use the base R write.csv or the faster write_csv from the readr package.

write.csv(dat, "my_data.csv", row.names=FALSE)



write_csv(dat, "my_data.csv")
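
If you want to confirm that the export contains every row, a quick check along these lines may help. It is a sketch only: the small data frame and the temporary file path are placeholders, not your real data.

```r
# Placeholder data standing in for the full result
dat <- data.frame(
  BuildingValue = c("100.1", "100.1", "100.2"),
  new.var = c("100.1_1", "100.1_2", "100.2_1"),
  stringsAsFactors = FALSE
)

# Write to a temporary file so nothing in the working directory is touched
out_file <- tempfile(fileext = ".csv")
write.csv(dat, out_file, row.names = FALSE)

# The file should hold one header line plus one line per row
csv_lines <- readLines(out_file)
length(csv_lines) == nrow(dat) + 1
```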

Also, if you prefer a standard R data frame, instead of a tibble, you can do:

dat = map2_df(BuildingValue, VentilationCounts, 
              ~data.frame(BuildingValue=.x, VentilationCount=.y, new.var=paste(.x, 1:.y, sep="_")))

OK, this is all very helpful. Your original solution worked on a small sample of my dataset and your last addition worked in practice on the whole enchilada. However, I’ve come across an issue where the code and resulting dataframe have dropped trailing zeros.

So for instance, “16.10” and “16.20” in my original dataset become “16.1” and “16.2” in the intended final dataframe. This is an issue because these values are character data representing building codes, so losing the trailing zeros may cause confusion.

I tried using sprintf() as suggested in other forums without much satisfaction.

Thank you again for your clear and consistent assistance.

It would be helpful if you could provide a sample of the actual data frame you're working with, including its structure (you can use, for example, dput(my_data[1:5, ]) and paste the output into your question to provide a data sample, or use the reprex package to provide a full reproducible example of your data and code).

By any chance, are the building codes actually stored as numeric values in your data frame?
For example, run the following and see the output in the console:

paste0("16.10", "_1")
paste0(16.10, "_1")
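
If the codes do turn out to be numeric, one possible fix is to read that column as character when importing. The sketch below assumes a CSV import; the file contents are invented, and only the column names vars and n come from your earlier post.

```r
# A number has no trailing zero, text keeps it
paste0(16.10, "_1")    # "16.1_1"
paste0("16.10", "_1")  # "16.10_1"

# Invented CSV standing in for the real attribute table
csv_file <- tempfile(fileext = ".csv")
writeLines(c("vars,n", "16.10,2", "16.20,1"), csv_file)

# Default import guesses that vars is numeric, so 16.10 becomes 16.1
as_numeric <- read.csv(csv_file)

# Forcing vars to character keeps the codes exactly as written
as_text <- read.csv(csv_file, colClasses = c(vars = "character"))

# Build the IDs from the character column
ids <- paste(rep(as_text$vars, times = as_text$n),
             sequence(as_text$n),
             sep = "_")
print(ids)
```

Converting with as.character() after the data has already been read as numeric will not bring the zeros back, so the column type has to be set at import time (or in whatever step first creates the column).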

OK, next time I will make sure to be more helpful by providing a sample of my dataframe, and I will become more adept with the reprex package. I actually went back and uncovered what I did wrong, and the code worked better than ever.

Thank you again for your time and assistance!
