Hi everyone,
This is just a glimpse of my column that I need to transform...What would be the best solution for creating two new columns called Client and Country ?
If I need to clarify please tell me.
Thanks in advance,
Milos
Hi everyone,
This is just a glimpse of my column that I need to transform...What would be the best solution for creating two new columns called Client and Country ?
If I need to clarify please tell me.
Thanks in advance,
Milos
Unless there is a uniform number of clients per country, I think you're going to have to wrestle with the highlighting you've been handed, which isn't something readxl handles.
When you're dealing with the use of formatting (e.g. cell colours) as a sort of data encoding, I'd recommend looking at a pair of packages by Duncan Garmonsway, tidyxl (despite the name, not actually part of the tidyverse) and unpivotr
Duncan's detailed patterns for this scenario in the write-up below:
https://nacnudus.github.io/spreadsheet-munging-strategies/an-example-formatting-lookup.html
If you can make a list of the values in your yellow cells, the following non-tidy solution may work.
library(dplyr)
DF <- data.frame(Orig = c("A", "AAA", "CCC", "EEE", "TTT",
"B", "WS", "ED", "RG", "YH", "UJ"), stringsAsFactors = FALSE)
DF
#> Orig
#> 1 A
#> 2 AAA
#> 3 CCC
#> 4 EEE
#> 5 TTT
#> 6 B
#> 7 WS
#> 8 ED
#> 9 RG
#> 10 YH
#> 11 UJ
Countries <- c("A", "B")
DF <- DF %>% mutate(Tag = ifelse(Orig %in% Countries, "Country", "Client"))
DF
#> Orig Tag
#> 1 A Country
#> 2 AAA Client
#> 3 CCC Client
#> 4 EEE Client
#> 5 TTT Client
#> 6 B Country
#> 7 WS Client
#> 8 ED Client
#> 9 RG Client
#> 10 YH Client
#> 11 UJ Client
DF$Country <- NA
DF[1, "Country"] <- DF[1, "Orig"]
for(i in 2:nrow(DF)){
if (DF[i, "Tag"] == "Client") {
DF[i, "Country"] <- DF[i - 1, "Country"]
} else {
DF[i, "Country"] <- DF[i, "Orig"]
}
}
DF <- DF %>% filter(!Orig %in% Countries) %>%
select(Country, Client = Orig)
DF
#> Country Client
#> 1 A AAA
#> 2 A CCC
#> 3 A EEE
#> 4 A TTT
#> 5 B WS
#> 6 B ED
#> 7 B RG
#> 8 B YH
#> 9 B UJ
Created on 2019-09-03 by the reprex package (v0.2.1)
I'd like to offer a small modification to avoid the for
loop:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
DF <- data.frame(Orig = c("A", "AAA", "CCC", "EEE", "TTT",
"B", "WS", "ED", "RG", "YH", "UJ"), stringsAsFactors = FALSE)
Countries <- c("A", "B")
DF <- DF %>% mutate(Tag = ifelse(Orig %in% Countries, "Country", "Client"))
DF %>%
mutate(Country = ifelse(Tag == "Country", Orig, NA)) %>%
tidyr::fill(Country) %>%
filter(Tag != "Country") %>%
select(Country, Client = Orig)
#> Country Client
#> 1 A AAA
#> 2 A CCC
#> 3 A EEE
#> 4 A TTT
#> 5 B WS
#> 6 B ED
#> 7 B RG
#> 8 B YH
#> 9 B UJ
Created on 2019-09-03 by the reprex package (v0.3.0)
Thanks a lot Valeri, it works perfectly
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.