Get id, coordinate of the value, then output

Comede_way · November 21, 2022, 2:40am

Hi,
I have another practice question about R, from a beginner like me. How can I get the values in the "dat1" data frame, excluding missing values, together with their corresponding id numbers in the "id" data frame? Also, is it possible to output the coordinates of those values (the non-NA values in "dat1")? Could we output a single data frame that contains all of them?
Thank you in advance for your help!

id <- data.frame(id1 = c(11, 12, 13),
                 id2 = c(26, 22, 23),
                 id3 = c(39, 27, 38))

dat1 <- data.frame(c1 = c(2, 9, NA),
                   c2 = c(3, 0.2, 3.3),
                   c3 = c(1, NA, 2.6))

Created on 2022-11-21 with reprex v2.0.2
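For comparison, here is a minimal base-R sketch (a different technique from the tidyverse answer in this thread) that builds the single data frame asked for. It assumes that "id" and "dat1" always have the same dimensions and that the id for a value sits at the same row and column position. `which(..., arr.ind = TRUE)` returns the (row, col) position of every non-missing cell, and indexing a matrix with that two-column matrix picks the matching cells.

```r
# Example data from the question
id <- data.frame(id1 = c(11, 12, 13),
                 id2 = c(26, 22, 23),
                 id3 = c(39, 27, 38))
dat1 <- data.frame(c1 = c(2, 9, NA),
                   c2 = c(3, 0.2, 3.3),
                   c3 = c(1, NA, 2.6))

# Row/column position of every non-missing cell of dat1
# (an integer matrix with columns "row" and "col")
pos <- which(!is.na(as.matrix(dat1)), arr.ind = TRUE)

# Indexing a matrix with a (row, col) matrix returns one cell per row of pos,
# so the id and the value taken from the same position line up
result <- data.frame(
  id             = as.matrix(id)[pos],
  value          = as.matrix(dat1)[pos],
  coordinate_row = pos[, "row"],
  coordinate_col = pos[, "col"]
)
result
```

The rows come out column by column (all of c1, then c2, then c3), which is the order `which()` scans a matrix in.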

DavoWW · November 21, 2022, 7:56am

Hi @Comede_way,
If I understand your question correctly, then this should do it:

suppressPackageStartupMessages(library(tidyverse))

id <- data.frame(id1 = c(11,12,13),
                 id2 = c(26,22,23),
                 id3 = c(39,27,38))

dat1 <- data.frame(c1 = c(2,9,NA),
                   c2 = c(3,0.2,3.3),
                   c3 = c(1,NA,2.6))

# Row position of each id (1, 2, 3)
id$coordinate_row <- seq_len(nrow(id))

(full <- cbind(id, dat1))
#>   id1 id2 id3 coordinate_row c1  c2  c3
#> 1  11  26  39              1  2 3.0 1.0
#> 2  12  22  27              2  9 0.2  NA
#> 3  13  23  38              3 NA 3.3 2.6

# Pair 1 only: keep id1, c1 and coordinate_row, add the column number
# taken from the first column name ("id1" -> 1), then drop rows with NA.
full %>% 
  select(contains("1"), contains("row")) %>% 
  mutate(coordinate_col = parse_number(names(.)[1])) %>% 
  drop_na() -> subset_1

# Make a function to do all column pairs.
# Argument x is a single character string giving the pair number, e.g. "2".
subset_section <- function(x){
  full %>% 
    select(contains(x), contains("row")) %>% 
    mutate(coordinate_col = parse_number(names(.)[1])) %>% 
    drop_na()
}

subset_section("1")
#>   id1 c1 coordinate_row coordinate_col
#> 1  11  2              1              1
#> 2  12  9              2              1

lapply(c("1","2","3","4"), subset_section)
#> Warning: 1 parsing failure.
#> row col expected         actual
#>   1  -- a number coordinate_row
#> [[1]]
#>   id1 c1 coordinate_row coordinate_col
#> 1  11  2              1              1
#> 2  12  9              2              1
#> 
#> [[2]]
#>   id2  c2 coordinate_row coordinate_col
#> 1  26 3.0              1              2
#> 2  22 0.2              2              2
#> 3  23 3.3              3              2
#> 
#> [[3]]
#>   id3  c3 coordinate_row coordinate_col
#> 1  39 1.0              1              3
#> 2  38 2.6              3              3
#> 
#> [[4]]
#> [1] coordinate_row coordinate_col
#> <0 rows> (or 0-length row.names)

Created on 2022-11-21 with reprex v2.0.2
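The list returned by `lapply()` has different column names in each element (id1/c1, id2/c2, ...), so its pieces cannot be stacked directly into one data frame. A minimal base-R sketch (not the answer's tidyverse code) that gives every pair the same column names and then stacks them with `do.call(rbind, ...)`; `pair_k()` is a hypothetical helper name, and pairs are assumed to be matched by column position.

```r
id <- data.frame(id1 = c(11, 12, 13), id2 = c(26, 22, 23), id3 = c(39, 27, 38))
dat1 <- data.frame(c1 = c(2, 9, NA), c2 = c(3, 0.2, 3.3), c3 = c(1, NA, 2.6))

# Hypothetical helper: one data frame for column pair k, with fixed column names
pair_k <- function(k) {
  keep <- !is.na(dat1[[k]])                  # rows where the value is present
  data.frame(id             = id[[k]][keep],
             value          = dat1[[k]][keep],
             coordinate_row = which(keep),
             coordinate_col = rep(k, sum(keep)))  # rep() also copes with 0 rows
}

# One element per column pair, then stack them into a single data frame
combined <- do.call(rbind, lapply(seq_len(ncol(dat1)), pair_k))
combined
```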

Comede_way · November 21, 2022, 9:42am

Hi @DavoWW,
My problem is almost solved. I really appreciate the effort and time you have put into this. Would you mind if I ask a few more questions about it? Sorry to bother you.

My first question is about the code: what does "names(.)[1]" mean? Is it assigning a value? And what does the "4" in "lapply(c("1","2","3","4"), subset_section)" represent? It seems that the "id" and "dat1" data frames both have 3 rows and 3 columns.

Second, what if the data change so that there are many rows and columns, and the column names and row names do not follow a regular pattern? Do we still just type "1", "2", ..., "17", "18", "19" in the final step, or do we need to make some adjustment?

I have great respect for your work and thank you for your help. I only ask these questions because I have just started learning R and there are things I don't understand well yet. Here is some example code to show what I mean.

c1<-c(16.6,NA,10.1,8.6,8.0,17.0,2.4,7.6,5.7,11.6,3.6,NA,6.3,1.5,2.7,16.7,6.7,5.3,12.5)
c2<-c(13.0,11.2,11.0,15.0,10.0,NA,9.6,7.8,9.2,6.6,1.6,8.2,18.0,18.9,NA,NA,2.9,16.1,17.8)
c3<-c(4.2,5.6,1.4,3.4,NA,5.8,5.1,8.2,8.8,9.1,1.9,7.7,9.1,10.6,3.7,9.9,10.2,11.5,NA)
c4<-data.frame(c1,c2,c3)
colnames(c4)<-c("aa","bb","cc")

c5<-c(1:7,99,52,60:69)
c6<-c(71,76,30:45,82)
c7<-c(101,103,202,115:128,108,111)
c8<-data.frame(c5,c6,c7)
colnames(c8)<-c("AA","Bb","cC")

Created on 2022-11-21 with reprex v2.0.2

DavoWW · November 22, 2022, 12:12am

Hi @Comede_way,
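The code parse_number(names(.)[1]) simply extracts the numeric part of the first column name. Nothing is being assigned: the "." stands for the data frame coming through the pipe, so names(.)[1] is its first column name (for pair "1" that is "id1", which gives 1).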
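A small base-R illustration of what `names(.)[1]` evaluates to. The `selected` data frame is a hand-made copy of what the pipe holds for pair "1" after `select(contains("1"), contains("row"))`, and `gsub()` is only a stand-in for `readr::parse_number()`, which handles more number formats.

```r
# Hand-made copy of the piped data for pair "1"
# (inside mutate(), "." refers to this data frame)
selected <- data.frame(id1 = c(11, 12, 13),
                       c1 = c(2, 9, NA),
                       coordinate_row = 1:3)

first_name <- names(selected)[1]   # names() lists the columns, [1] takes the first: "id1"

# Base-R stand-in for parse_number() in this simple case:
# delete every non-digit character, then convert what is left to a number
col_number <- as.numeric(gsub("[^0-9]", "", first_name))
col_number   # 1
```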

Using lapply(c("1","2","3","4"), subset_section) just shows what happens when a pair doesn't exist in the data (i.e. pair "4"): no column number can be parsed, you get a parsing warning, and the function returns an empty data frame (as expected).
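The steps behind the empty result for "4" can be traced with a base-R mimic of the selection (a sketch: `grepl(fixed = TRUE)` imitates `contains()`, and `gsub()` stands in for `parse_number()`).

```r
# Column names of `full` in the example above
full_names <- c("id1", "id2", "id3", "coordinate_row", "c1", "c2", "c3")

# Mimic select(contains("4"), contains("row")):
# no name contains "4", so only "coordinate_row" is kept
picked <- full_names[grepl("4", full_names, fixed = TRUE) |
                       grepl("row", full_names, fixed = TRUE)]
picked   # "coordinate_row"

# The first kept name has no digits, so the extracted number is NA
# (parse_number() reports this as the "parsing failure" warning).
# coordinate_col is then NA in every row, and drop_na() removes them all.
first_number <- suppressWarnings(as.numeric(gsub("[^0-9]", "", picked[1])))
is.na(first_number)   # TRUE
```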

Larger numbers of pairs should work, provided that the columns to be paired can be "matched" by sharing the same number in their column names. If there is no number (as in your latest example), then column indexing may be needed. That said, in your new example the names of the two data frames could be matched by converting them all to the same case.
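A sketch of the same-case idea for the second example, assuming the values are in `c4` and the ids in `c8`, and that a value column and its id column differ only in letter case. `match()` on the lower-cased names gives the position of each value column's id column, which can then be used as a column index.

```r
# Second example from the thread: values in c4, ids in c8
c4 <- data.frame(
  aa = c(16.6, NA, 10.1, 8.6, 8.0, 17.0, 2.4, 7.6, 5.7, 11.6,
         3.6, NA, 6.3, 1.5, 2.7, 16.7, 6.7, 5.3, 12.5),
  bb = c(13.0, 11.2, 11.0, 15.0, 10.0, NA, 9.6, 7.8, 9.2, 6.6,
         1.6, 8.2, 18.0, 18.9, NA, NA, 2.9, 16.1, 17.8),
  cc = c(4.2, 5.6, 1.4, 3.4, NA, 5.8, 5.1, 8.2, 8.8, 9.1,
         1.9, 7.7, 9.1, 10.6, 3.7, 9.9, 10.2, 11.5, NA)
)
c8 <- data.frame(AA = c(1:7, 99, 52, 60:69),
                 Bb = c(71, 76, 30:45, 82),
                 cC = c(101, 103, 202, 115:128, 108, 111))

# Pairing cells only makes sense if both data frames have the same shape
stopifnot(identical(dim(c4), dim(c8)))

# Position of each c4 column's id column in c8, comparing names in lower case
id_col <- match(tolower(names(c4)), tolower(names(c8)))
id_col   # 1 2 3

# Reorder the id columns to follow c4 and give them c4's names,
# so column j of ids_aligned holds the ids for column j of c4
ids_aligned <- c8[id_col]
names(ids_aligned) <- names(c4)
```

After this, `ids_aligned` and `c4` line up column by column, so position-based code (for example `which(!is.na(c4), arr.ind = TRUE)`) can pair each non-missing value with its id regardless of how irregular the original names were.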

Comede_way · November 22, 2022, 12:27am

Thank you very much. As you can see, there are 19 rows in my new example. After making the names of the two data frames the same case, do I need to type "1", "2", "3", ..., "19" in the last step, lapply()? And if there are more than 100 rows, do I still type them one by one, or is there a small function for this?

DavoWW · November 22, 2022, 12:59am

Hi @Comede_way,

# Build the pair labels "1", "2", ..., "19" instead of typing them
column_pairs <- as.character(1:19)
lapply(column_pairs, subset_section)
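One caution when there are 10 or more pairs: `contains()` does plain substring matching, so for pair "1" it would also pick id10 to id19 and c10 to c19. The sketch below imitates that with `grepl(fixed = TRUE)` on hypothetical column names and shows a stricter rule (letters, then exactly the pair number, then the end of the name); `exact_pair()` is a hypothetical helper, and the rule assumes names shaped like "id12"/"c12".

```r
# Hypothetical names for a data set with 19 id/value column pairs
nm <- c(paste0("id", 1:19), "coordinate_row", paste0("c", 1:19))

# Substring matching, as contains("1") does: 22 names instead of 2
loose <- nm[grepl("1", nm, fixed = TRUE)]
length(loose)   # 22

# Hypothetical helper: letters, then exactly k, then end of name
exact_pair <- function(names, k) {
  names[grepl(paste0("^[A-Za-z]+", k, "$"), names)]
}
exact_pair(nm, 1)    # "id1" "c1"
exact_pair(nm, 12)   # "id12" "c12"
```

If that rule fits the real names, `select(matches(paste0("^[A-Za-z]+", x, "$")), contains("row"))` inside `subset_section()` should give the same selection, since tidyselect's `matches()` takes a regular expression (it ignores case by default). The labels themselves can also come from the data, e.g. `as.character(seq_len(ncol(dat1)))`, which counts column pairs rather than rows.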

Comede_way · November 23, 2022, 12:09pm

It works!!! I really appreciate your effort and time on this matter. It helped me a lot.

system · November 30, 2022, 12:09pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
