Hi there, I'm new to R and perhaps in over my head, but is there a way to extract values from cells to create new columns when the cells contain identifiers? I'm trying to split the column "Landskapstyp" into new columns and populate them with the various values in the cells, i.e. Skog (S), Våtmark (V), Urban miljö (U), Marin Miljö etc. (each of which should have its own column), while removing the other text. Is this even possible? Thanks for any advice!
This could be a start. Please be more specific if you want more guidance.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(stringr)
df1 <- data.frame(
  Landskapstyp = c(
    'Marin miljö, Skog (S)',
    'Skog (S)'
  )
)
df2 <- df1 |>
  mutate(
    marin = case_when(
      stringr::str_detect(Landskapstyp, fixed("Marin")) ~ TRUE,
      TRUE ~ FALSE
    ),
    skog = case_when(
      stringr::str_detect(Landskapstyp, fixed("Skog (S)")) ~ TRUE,
      TRUE ~ FALSE
    )
  )
print(df2)
#> Landskapstyp marin skog
#> 1 Marin miljö, Skog (S) TRUE TRUE
#> 2 Skog (S) FALSE TRUE
Created on 2022-11-01 with reprex v2.0.2
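As a side note, str_detect() already returns TRUE/FALSE, so the case_when() wrapper can probably be dropped. A minimal sketch using the same toy data as above; the one difference I'm aware of is that a missing (NA) cell would stay NA here, whereas the case_when() version turns it into FALSE.

```r
library(dplyr)
library(stringr)

# Same toy data as in the reply above
df1 <- data.frame(
  Landskapstyp = c("Marin miljö, Skog (S)", "Skog (S)")
)

# str_detect() returns a logical vector, so it can fill the column directly.
# fixed() makes the parentheses in "Skog (S)" match literally, not as regex.
df2 <- df1 |>
  mutate(
    marin = str_detect(Landskapstyp, fixed("Marin")),
    skog  = str_detect(Landskapstyp, fixed("Skog (S)"))
  )

print(df2)
```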
If you share part of your data frame using dput(head(YOURDF, 20))
and describe in detail, with an example, what you want to achieve, we can try to get to the result.
Thank you so much for the help so far! To clarify a bit, I’m trying to split the column “Landskapstyp” into several columns depending on the cell value so that it becomes easier to summarise. I just assumed that separating the values into columns would help with sorting, tallying, plotting, etc., but there might be a better way. A cell can contain any of the following nine values:
Jordbrukslandskap (J)
Skog (S)
Urban miljö (U)
Fjäll (F)
Våtmark (V)
Sötvatten (L)
Havsstrand (H)
Marin Miljö (M)
Bracksvatten (B)
Most records include one or two of these values, but a cell can include all nine. After importing the CSV file into R, some cells also contain additional text that is not needed, such as “- Stor betydelse” and “- Har betydelse”. Part of the output of dput(head(DF, 20)) looks as follows:
Landskapstyp = c("Marin miljö (M) - Stor betydelse", "Marin miljö (M) - Stor betydelse", "Marin miljö (M) - Stor betydelse", "Havsstrand (H) - Stor betydelse", "Skog (S) - Stor betydelse, Våtmark (V) - Har betydelse", "Skog (S) - Stor betydelse, Våtmark (V) - Stor betydelse", "Skog (S) - Stor betydelse, Våtmark (V) - Stor betydelse, Urban miljö (U) - Har betydelse", "Skog (S) - Stor betydelse, Urban miljö (U) - Stor betydelse"….
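The extra text mentioned above can be stripped with a regular expression, and splitting on ", " then gives one element per landscape type, which may be enough for tallying. This is only a sketch: the vector x is a hypothetical stand-in built from a few of the values shown, and it assumes the only suffixes are " - Stor betydelse" and " - Har betydelse".

```r
library(stringr)

# Hypothetical stand-in for the Landskapstyp column (values copied from above)
x <- c("Marin miljö (M) - Stor betydelse",
       "Havsstrand (H) - Stor betydelse",
       "Skog (S) - Stor betydelse, Våtmark (V) - Har betydelse",
       "Skog (S) - Stor betydelse, Urban miljö (U) - Stor betydelse")

# Assumption: only these two suffixes occur in the data
cleaned <- str_remove_all(x, " - (Stor|Har) betydelse")

# One element per landscape type, then count how often each appears
types <- unlist(str_split(cleaned, fixed(", ")))
tally <- table(types)

print(cleaned)
print(tally)
```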
In that case, expand my example with the other classes.
Note that I now treat upper and lower case the same,
because you use both miljö and Miljö.
library(dplyr) ; library(stringr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df1 <- data.frame(
  Landskapstyp = c("Marin miljö (M) - Stor betydelse",
                   "Marin miljö (M) - Stor betydelse",
                   "Marin miljö (M) - Stor betydelse",
                   "Havsstrand (H) - Stor betydelse",
                   "Skog (S) - Stor betydelse, Våtmark (V) - Har betydelse",
                   "Skog (S) - Stor betydelse, Våtmark (V) - Stor betydelse",
                   "Skog (S) - Stor betydelse, Våtmark (V) - Stor betydelse, Urban miljö (U) - Har betydelse",
                   "Skog (S) - Stor betydelse, Urban miljö (U) - Stor betydelse"
  )
)
# Helper: literal (non-regex) match that ignores case
f_ig <- function(x) stringr::fixed(x, ignore_case = TRUE)
df2 <- df1 |>
  mutate(
    L_J = case_when(
      stringr::str_detect(Landskapstyp, f_ig("Jordbrukslandskap (J)")) ~ TRUE,
      TRUE ~ FALSE
    ),
    L_S = case_when(
      stringr::str_detect(Landskapstyp, f_ig("Skog (S)")) ~ TRUE,
      TRUE ~ FALSE
    ),
    L_U = case_when(
      stringr::str_detect(Landskapstyp, f_ig("Urban miljö (U)")) ~ TRUE,
      TRUE ~ FALSE
    ),
    L_F = case_when(
      stringr::str_detect(Landskapstyp, f_ig("Fjäll (F)")) ~ TRUE,
      TRUE ~ FALSE
    ),
    L_V = case_when(
      stringr::str_detect(Landskapstyp, f_ig("Våtmark (V)")) ~ TRUE,
      TRUE ~ FALSE
    ),
    L_L = case_when(
      stringr::str_detect(Landskapstyp, f_ig("Sötvatten (L)")) ~ TRUE,
      TRUE ~ FALSE
    ),
    L_H = case_when(
      stringr::str_detect(Landskapstyp, f_ig("Havsstrand (H)")) ~ TRUE,
      TRUE ~ FALSE
    ),
    L_M = case_when(
      stringr::str_detect(Landskapstyp, f_ig("Marin Miljö (M)")) ~ TRUE,
      TRUE ~ FALSE
    ),
    # f_ig() here too, so this column is case-insensitive like the others
    L_B = case_when(
      stringr::str_detect(Landskapstyp, f_ig("Bracksvatten (B)")) ~ TRUE,
      TRUE ~ FALSE
    )
)
head(df2)
#> Landskapstyp L_J L_S L_U
#> 1 Marin miljö (M) - Stor betydelse FALSE FALSE FALSE
#> 2 Marin miljö (M) - Stor betydelse FALSE FALSE FALSE
#> 3 Marin miljö (M) - Stor betydelse FALSE FALSE FALSE
#> 4 Havsstrand (H) - Stor betydelse FALSE FALSE FALSE
#> 5 Skog (S) - Stor betydelse, Våtmark (V) - Har betydelse FALSE TRUE FALSE
#> 6 Skog (S) - Stor betydelse, Våtmark (V) - Stor betydelse FALSE TRUE FALSE
#> L_F L_V L_L L_H L_M L_B
#> 1 FALSE FALSE FALSE FALSE TRUE FALSE
#> 2 FALSE FALSE FALSE FALSE TRUE FALSE
#> 3 FALSE FALSE FALSE FALSE TRUE FALSE
#> 4 FALSE FALSE FALSE TRUE FALSE FALSE
#> 5 FALSE TRUE FALSE FALSE FALSE FALSE
#> 6 FALSE TRUE FALSE FALSE FALSE FALSE
Created on 2022-11-01 with reprex v2.0.2
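If nine nearly identical blocks feel repetitive, the same columns could be built in a loop over a named vector of labels. This is a sketch rather than a drop-in replacement: the vector name labels is my own, the labels are spelled as listed in the question, and the L_* column names follow the scheme used above. The two-row df1 is a shortened copy of the sample data.

```r
library(stringr)

# Shortened copy of the sample data from the reply above
df1 <- data.frame(Landskapstyp = c(
  "Marin miljö (M) - Stor betydelse",
  "Skog (S) - Stor betydelse, Våtmark (V) - Har betydelse"
))

# Hypothetical lookup: new column name -> label to search for
labels <- c(
  L_J = "Jordbrukslandskap (J)", L_S = "Skog (S)",
  L_U = "Urban miljö (U)",       L_F = "Fjäll (F)",
  L_V = "Våtmark (V)",           L_L = "Sötvatten (L)",
  L_H = "Havsstrand (H)",        L_M = "Marin Miljö (M)",
  L_B = "Bracksvatten (B)"
)

df2 <- df1
for (nm in names(labels)) {
  # Literal, case-insensitive match, same idea as f_ig() above
  df2[[nm]] <- str_detect(df1$Landskapstyp,
                          fixed(labels[[nm]], ignore_case = TRUE))
}

# Number of records per landscape type
print(colSums(df2[names(labels)]))
```

Adding or correcting a label then only means editing the labels vector.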