Arrange variables

sirius1170 · November 28, 2019, 8:01am

Hi,
I am studying political connections and would like to ask for help with the data below:

data.frame(
         Northwest = as.factor(c("Increased region", "2006", "2007", NA)),
       Northwest.1 = as.factor(c("Remained region", "2011", "2012", "2013")),
         Northeast = as.factor(c("Increased region", "2006", "2007", NA)),
   Red.River.Delta = as.factor(c("Increased region", "2011", "2012", "2013"))
)

I would like to arrange my data as follows,

However, I am not sure how to do this.
Could you help me?
Many thanks.

FJCC · November 28, 2019, 2:51pm

DF <- data.frame(
  Northwest = as.factor(c("Increased region", "2006", "2007", NA)),
  Northwest.1 = as.factor(c("Remained region", "2011", "2012", "2013")),
  Northeast = as.factor(c("Increased region", "2006", "2007", NA)),
  Red.River.Delta = as.factor(c("Increased region", "2011", "2012", "2013"))
)

library(tidyr)
library(dplyr)
library(stringr)
DFnew <- gather(DF, key = "Region", value = "Val") %>% 
  filter(!is.na(Val)) %>% 
  mutate(Label = ifelse(str_detect(Val, "^\\d+$"), NA, Val)) %>% #keep non-numeric text
  fill(Label) %>% #replace NA with previous text
  filter(str_detect(Val, "^\\d+$")) %>% #keep only numeric text
  mutate(Region = str_replace_all(Region, "\\.|\\d", " ")) %>% 
  mutate(Region = str_trim(Region))
#> Warning: attributes are not identical across measure variables;
#> they will be dropped
DFnew
#>             Region  Val            Label
#> 1        Northwest 2006 Increased region
#> 2        Northwest 2007 Increased region
#> 3        Northwest 2011  Remained region
#> 4        Northwest 2012  Remained region
#> 5        Northwest 2013  Remained region
#> 6        Northeast 2006 Increased region
#> 7        Northeast 2007 Increased region
#> 8  Red River Delta 2011 Increased region
#> 9  Red River Delta 2012 Increased region
#> 10 Red River Delta 2013 Increased region

Created on 2019-11-28 by the reprex package (v0.2.1)
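
Since `gather()` is superseded in tidyr 1.0.0 and later, the same reshaping can also be sketched with `pivot_longer()`. This is only an illustration, assuming tidyr >= 1.0.0 and that the first row of every column holds the label. The names `region_labels`, `Column`, `Year` and `DFlong` are made up for this example. Building the data with character columns instead of factors also sidesteps the attribute warning.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Same data as above, but stored as character instead of factor
DF <- data.frame(
  Northwest = c("Increased region", "2006", "2007", NA),
  Northwest.1 = c("Remained region", "2011", "2012", "2013"),
  Northeast = c("Increased region", "2006", "2007", NA),
  Red.River.Delta = c("Increased region", "2011", "2012", "2013"),
  stringsAsFactors = FALSE
)

# Named vector: original column name -> label found in the first row
region_labels <- unlist(DF[1, ])

DFlong <- DF %>%
  slice(-1) %>% # drop the label row
  pivot_longer(everything(), names_to = "Column", values_to = "Year",
               values_drop_na = TRUE) %>%
  mutate(
    Label = unname(region_labels[Column]),          # look up the label per column
    Region = str_replace_all(Column, "\\.\\d+$", ""), # drop the ".1" suffix
    Region = str_replace_all(Region, "\\.", " "),     # dots back to spaces
    Year = as.integer(Year)
  ) %>%
  arrange(match(Column, names(DF)), Year) %>% # keep the original column order
  select(Region, Year, Label)

DFlong
```

Converting `Year` to integer is optional, but it makes later filtering or sorting by year behave numerically.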

andresrcs · November 28, 2019, 6:33pm

Just for variety's sake, here is another approach:

library(tidyverse) # attaches dplyr, tidyr, purrr and stringr

df <- data.frame(
    Northwest = as.factor(c("Increased region", "2006", "2007", NA)),
    Northwest.1 = as.factor(c("Remained region", "2011", "2012", "2013")),
    Northeast = as.factor(c("Increased region", "2006", "2007", NA)),
    Red.River.Delta = as.factor(c("Increased region", "2011", "2012", "2013"))
)

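df %>%
    mutate_all(as.character) %>% # factors to plain text
    rename(!! set_names(names(.), nm = paste(names(.), .[1,]))) %>% # append the first-row label to each column name
    tail(-1) %>% # drop the label row
    gather(Region, Value) %>% # reshape to long format
    mutate(Region = str_remove_all(Region, "[:punct:]|[:digit:]")) %>% # strip dots and digits from the names
    separate(Region, c("Region", "Label"), extra = "merge") %>% # split at the first space only
    filter(!is.na(Value)) # drop the padding NAs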
#>           Region            Label Value
#> 1      Northwest Increased region  2006
#> 2      Northwest Increased region  2007
#> 3      Northwest  Remained region  2011
#> 4      Northwest  Remained region  2012
#> 5      Northwest  Remained region  2013
#> 6      Northeast Increased region  2006
#> 7      Northeast Increased region  2007
#> 8  RedRiverDelta Increased region  2011
#> 9  RedRiverDelta Increased region  2012
#> 10 RedRiverDelta Increased region  2013
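
The output above collapses "Red River Delta" into "RedRiverDelta" because the dots are removed before the split. If the spaces should be kept, one possible variant is to join the column name and label with a separator that does not occur in the data and split on that instead. This is a rough sketch, not a tested drop-in replacement. The `|` separator and the names `header` and `result` are assumptions for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)

df <- data.frame(
    Northwest = c("Increased region", "2006", "2007", NA),
    Northwest.1 = c("Remained region", "2011", "2012", "2013"),
    Northeast = c("Increased region", "2006", "2007", NA),
    Red.River.Delta = c("Increased region", "2011", "2012", "2013"),
    stringsAsFactors = FALSE
)

# Combine each column name with its first-row label, e.g. "Northwest.1|Remained region"
header <- paste(names(df), unlist(df[1, ]), sep = "|")

result <- df[-1, ] %>%                 # drop the label row
    setNames(header) %>%               # use the combined names as headers
    gather(Region, Value, na.rm = TRUE) %>%
    separate(Region, c("Region", "Label"), sep = "\\|") %>%
    # turn dots and numeric suffixes into spaces, then trim the ends
    mutate(Region = str_trim(str_replace_all(Region, "[.0-9]+", " ")))

result
```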
