Extract values from columns

CC_M · November 1, 2022, 10:22am

Hi there, I'm new to R and perhaps in over my head, but is there a way to extract values from cells to create new columns when the cells contain identifiers? I'm trying to make new columns out of the column "Landskapstyp", separating out the various values in the cells, i.e. Skog (S), Våtmark (V), Urban miljö (U), Marin Miljö etc. (each of which should have its own column), while removing the other text. Is this even possible? Thanks for any advice!

HanOostdijk · November 1, 2022, 11:28am

This could be a start. Be more specific if you want more guidance.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)
df1 <- data.frame(
  Landskapstyp=c(
    'Marin miljö, Skog (S)',
    'Skog (S)'
  )
)

df2 <- df1 |>
  mutate(
    marin= case_when(
      stringr::str_detect(Landskapstyp,fixed("Marin")) ~ TRUE,
      TRUE ~ FALSE
      ), 
    skog= case_when(
      stringr::str_detect(Landskapstyp,fixed("Skog (S)")) ~ TRUE,
      TRUE ~ FALSE
      )
  )

print(df2)
#>            Landskapstyp marin skog
#> 1 Marin miljö, Skog (S)  TRUE TRUE
#> 2              Skog (S) FALSE TRUE
Created on 2022-11-01 with reprex v2.0.2
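
A possibly shorter variant of the same idea, as a sketch only: str_detect() already returns TRUE/FALSE, so the case_when() wrapper can be dropped if the column has no missing values. The one difference I am aware of is that an NA in Landskapstyp gives NA here, whereas the case_when() version above gives FALSE.

```r
library(dplyr)
library(stringr)

df1 <- data.frame(
  Landskapstyp = c("Marin miljö, Skog (S)", "Skog (S)")
)

# str_detect() returns a logical vector, so it can be assigned directly.
# fixed() makes the pattern a literal string, so "(" and ")" are not
# interpreted as regular-expression grouping.
df2 <- df1 |>
  mutate(
    marin = str_detect(Landskapstyp, fixed("Marin")),
    skog  = str_detect(Landskapstyp, fixed("Skog (S)"))
  )

print(df2)
```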

Flm · November 1, 2022, 1:38pm

If you provide part of your data frame using dput(head(YOURDF, 20)) and describe in detail an example of what you want to achieve, we can try to get the result.

CC_M · November 1, 2022, 3:35pm

Thank you so much for the help so far! To clarify a bit, I’m trying to separate the column “Landskapstyp” into several columns depending on the cell value so it becomes easier to summarise. I just assumed separating them into columns would help with sorting, tallying, plotting etc., but there might be a better way. The cell value can contain the following 9 values.

Jordbrukslandskap (J)
Skog (S)
Urban miljö (U)
Fjäll (F)
Våtmark (V)
Sötvatten (L)
Havsstrand (H)
Marin Miljö (M)
Bracksvatten (B)

Mostly the cell value for a record includes one or two of these; however, it’s possible for it to include all nine. After importing the CSV file into R, some cells also contain additional text that is not needed, like “- Stor betydelse” and “- Har betydelse”. Running dput(head(DF, 20)), part of the output looks as follows:

Landskapstyp = c("Marin miljö (M) - Stor betydelse", "Marin miljö (M) - Stor betydelse", "Marin miljö (M) - Stor betydelse", "Havsstrand (H) - Stor betydelse", "Skog (S) - Stor betydelse, Våtmark (V) - Har betydelse", "Skog (S) - Stor betydelse, Våtmark (V) - Stor betydelse", "Skog (S) - Stor betydelse, Våtmark (V) - Stor betydelse, Urban miljö (U) - Har betydelse", "Skog (S) - Stor betydelse, Urban miljö (U) - Stor betydelse"….

HanOostdijk · November 1, 2022, 6:41pm

In that case, expand my example with the other classes.
Note that I now treat upper and lower case the same,
as you use both miljö and Miljö.

library(dplyr) ; library(stringr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df1 <- data.frame(
  Landskapstyp = c("Marin miljö (M) - Stor betydelse",
                   "Marin miljö (M) - Stor betydelse",
                   "Marin miljö (M) - Stor betydelse", 
                   "Havsstrand (H) - Stor betydelse", 
                   "Skog (S) - Stor betydelse, Våtmark (V) - Har betydelse", 
                   "Skog (S) - Stor betydelse, Våtmark (V) - Stor betydelse",
                   "Skog (S) - Stor betydelse, Våtmark (V) - Stor betydelse, Urban miljö (U) - Har betydelse",
                   "Skog (S) - Stor betydelse, Urban miljö (U) - Stor betydelse"
  )
)

f_ig <- function(x) stringr::fixed(x, ignore_case = TRUE) 

df2 <- df1 |>
  mutate(
    L_J= case_when(
      stringr::str_detect(Landskapstyp,f_ig("Jordbrukslandskap (J)")) ~ TRUE,
      TRUE ~ FALSE
      ), 
    L_S= case_when(
      stringr::str_detect(Landskapstyp,f_ig("Skog (S)")) ~ TRUE,
      TRUE ~ FALSE
      ),
    L_U= case_when(
      stringr::str_detect(Landskapstyp,f_ig("Urban miljö (U)")) ~ TRUE,
      TRUE ~ FALSE
      ), 
    L_F= case_when(
      stringr::str_detect(Landskapstyp,f_ig("Fjäll (F)")) ~ TRUE,
      TRUE ~ FALSE
      ),
    L_V= case_when(
      stringr::str_detect(Landskapstyp,f_ig("Våtmark (V)")) ~ TRUE,
      TRUE ~ FALSE
      ), 
    L_L= case_when(
      stringr::str_detect(Landskapstyp,f_ig("Sötvatten (L)")) ~ TRUE,
      TRUE ~ FALSE
      ),
    L_H= case_when(
      stringr::str_detect(Landskapstyp,f_ig("Havsstrand (H)")) ~ TRUE,
      TRUE ~ FALSE
      ),
    L_M= case_when(
      stringr::str_detect(Landskapstyp,f_ig("Marin Miljö (M)")) ~ TRUE,
      TRUE ~ FALSE
      ), 
    L_B= case_when(
      stringr::str_detect(Landskapstyp,f_ig("Bracksvatten (B)")) ~ TRUE,
      TRUE ~ FALSE
      )
  ) 

head(df2)
#>                                              Landskapstyp   L_J   L_S   L_U
#> 1                        Marin miljö (M) - Stor betydelse FALSE FALSE FALSE
#> 2                        Marin miljö (M) - Stor betydelse FALSE FALSE FALSE
#> 3                        Marin miljö (M) - Stor betydelse FALSE FALSE FALSE
#> 4                         Havsstrand (H) - Stor betydelse FALSE FALSE FALSE
#> 5  Skog (S) - Stor betydelse, Våtmark (V) - Har betydelse FALSE  TRUE FALSE
#> 6 Skog (S) - Stor betydelse, Våtmark (V) - Stor betydelse FALSE  TRUE FALSE
#>     L_F   L_V   L_L   L_H   L_M   L_B
#> 1 FALSE FALSE FALSE FALSE  TRUE FALSE
#> 2 FALSE FALSE FALSE FALSE  TRUE FALSE
#> 3 FALSE FALSE FALSE FALSE  TRUE FALSE
#> 4 FALSE FALSE FALSE  TRUE FALSE FALSE
#> 5 FALSE  TRUE FALSE FALSE FALSE FALSE
#> 6 FALSE  TRUE FALSE FALSE FALSE FALSE
Created on 2022-11-01 with reprex v2.0.2
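
If typing out nine nearly identical blocks feels error-prone, the same columns could probably be built in a loop over a named vector of labels. This is only a sketch: the vector name `labels` and the three-row sample data are my own choices for illustration, and the column names follow the L_x pattern used above. The colSums() call at the end gives a quick tally per landscape type.

```r
library(stringr)

# Small sample taken from the values posted above.
df1 <- data.frame(
  Landskapstyp = c(
    "Marin miljö (M) - Stor betydelse",
    "Havsstrand (H) - Stor betydelse",
    "Skog (S) - Stor betydelse, Våtmark (V) - Har betydelse"
  )
)

# Names become the new column names, values are the text to look for.
labels <- c(
  L_J = "Jordbrukslandskap (J)",
  L_S = "Skog (S)",
  L_U = "Urban miljö (U)",
  L_F = "Fjäll (F)",
  L_V = "Våtmark (V)",
  L_L = "Sötvatten (L)",
  L_H = "Havsstrand (H)",
  L_M = "Marin Miljö (M)",
  L_B = "Bracksvatten (B)"
)

df2 <- df1
for (nm in names(labels)) {
  # Literal, case-insensitive match, one logical column per label.
  df2[[nm]] <- str_detect(
    df2$Landskapstyp,
    fixed(labels[[nm]], ignore_case = TRUE)
  )
}

# Number of records in which each landscape type occurs.
tally <- colSums(df2[names(labels)])
print(tally)
```

Adding or renaming a class then only means editing the `labels` vector.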
