save a data frame with a list column to tsv

Hi, I want to save a data frame with a list column to a TSV file.

> group_file
# A tibble: 2,233 x 2
   gene_name marker     
   <chr>     <list>     
 1 A3GALT2   <chr [81]> 
 2 AADACL3   <chr [91]> 
 3 AADACL4   <chr [132]>
 4 ABCA4     <chr [756]>
 5 ABCB10    <chr [219]>
 6 ABCD3     <chr [260]>
 7 ABL2      <chr [676]>
 8 ACADM     <chr [305]>
 9 ACAP3     <chr [121]>
10 ACBD3     <chr [200]>
# … with 2,223 more rows

and I want to save it in a format like the one below, where each row consists of the gene name followed by the markers belonging to that gene:

GENE1	chrX:4_A/C	chrX:9_A/C	chrX:10_A/C	chrX:11_A/C
GENE2	chrX:12_A/C	chrX:14_A/C	chrX:15_A/C	chrX:17_A/C

But I couldn't save it directly:

> write_tsv(group_file,"../data/group_fie.tsv")
Error: Flat files can't store the list column `marker`

What should I do about it?

The help for readr::write_tsv says that the function takes

A data frame or tibble to write to disk

but, as the error message shows, it cannot write list columns. Your marker column is a list in which each element is a character vector (the <chr [81]> entries in the printout). Those need to be unpacked before writing. The simplest way is to collapse each one into a single string

"chrX:4_A/C chrX:9_A/C chrX:10_A/C chrX:11_A/C"

but I suspect you want

"chrX:4_A/C\tchrX:9_A/C\tchrX:10_A/C\tchrX:11_A/C"

That's doable, but you'll need to right-pad each row with trailing \t so that the number of tab delimiters on every line matches the number on the longest line. Otherwise, other applications may choke when trying to import the file.
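
As a rough sketch of the tab-joined, padded approach, something like the following might work in base R. The toy group_file, its marker values, and the output path are made up for illustration, since the real data wasn't posted; a plain data frame with a list column stands in for the tibble.

```r
# Toy stand-in for group_file (hypothetical values, not the poster's data)
group_file <- data.frame(gene_name = c("GENE1", "GENE2"), stringsAsFactors = FALSE)
group_file$marker <- list(
  c("chrX:4_A/C", "chrX:9_A/C", "chrX:10_A/C"),
  c("chrX:12_A/C", "chrX:14_A/C")
)

# Length of the longest marker vector; shorter rows are padded up to this
n_max <- max(lengths(group_file$marker))

# Pad each marker vector with empty strings, then join its fields with tabs
marker_fields <- vapply(
  group_file$marker,
  function(m) paste(c(m, rep("", n_max - length(m))), collapse = "\t"),
  character(1)
)

# One line per gene: the gene name, a tab, then the padded markers
lines_out <- paste(group_file$gene_name, marker_fields, sep = "\t")

# Hypothetical output path; swap in the real file name
out_path <- tempfile(fileext = ".tsv")
writeLines(lines_out, out_path)
```

This writes no header row, which matches the target format in the question. If the downstream tool tolerates ragged rows, the rep("", ...) padding can be dropped.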

Could you post a representative gene_name and marker? Use

dput(group_file[1, ])

and cut and paste the result.
