R markdown creates an empty data frame while executed code chunks do not

Hi,
I have a strange issue. One of my code chunks subsets a data frame based on the names of list elements. When I execute the chunk in RStudio, the subsetted data frame is filled, but when I knit my Rmd, the data frame is empty, which leads to an error. However, when I subset the data frame twice and store the results in two different variables, the document knits and the data frame is filled.

The code causing the problem is at the beginning of my Rmd file, which is why I just show these two simple examples:

Example1 leads to an error as regular.df is empty (although it should not be):
regular.df <- regular.df[regular.df$category %in% names(mylist.l),]

Example2 does not lead to an error, and regular.df is filled:

dummy.df <- regular.df[regular.df$category %in% names(mylist.l),]
regular.df <- regular.df[regular.df$category %in% names(mylist.l),]

I have no clue where the problem is. regular.df has 457 rows and 20 columns, which are either chr, num or Factors. So nothing special.

Loaded libraries:
ggplot2
dplyr
gridExtra
knitr
kableExtra
reshape2

Thanks in advance.

Hi,

Welcome to the RStudio community!

This is indeed peculiar behaviour, but it is difficult to track down the source of the problem without being able to run the code myself. Could you try to generate a reprex? A reprex consists of the minimal code and data needed to recreate the issue or question you're having. You can find instructions on how to build and share one here:

Good luck,
PJ

I think I have partially solved the problem. The list I was talking about is actually a CompressedGRangesList, and the corresponding package (GenomicRanges) was not loaded. Therefore, the names() function did not work on the first execution. Why it suddenly worked on the second execution, I still do not know.

Here is some example code that should work. It consists of an R file that creates the GRangesList and saves it, and an Rmd file that loads the list, creates a data frame, and subsets the data frame by the names of the GRangesList:

R-File:

library(dplyr)
library(GenomicRanges)

filter_list.grl <- data.frame(start = 1:2, end = 3:4, chr = "chr1", strand = "+",
                              LSV.ID = c("ENSG00000005893.15:s:120441730-120441894",
                                         "ENSG00000006625.17:t:30498803-30498938")) %>%
  makeGRangesListFromDataFrame(., split.field = "LSV.ID", keep.extra.columns = TRUE)

save(filter_list.grl, file="test.RData")

RMD-File:

---
title: "Reprex"
author: "Mario Keller"
date: "`r format(Sys.time(), '%B %e, %Y')`"
geometry: "left=1cm,right=1cm,top=1.5cm,bottom=1.5cm"
output:
  pdf_document:
    fig_caption: yes
    keep_tex: yes
    toc: true
    toc_depth: 3
    number_sections: true
fontsize: 11pt
header-includes:
- \makeatletter\renewcommand*{\fps@figure}{h}\makeatother
- \usepackage{placeins}
---

library(ggplot2)
library(dplyr)
library(gridExtra)
library(knitr)
library(kableExtra)
library(reshape2)

load("test.RData")

regular.df <- data.frame(LSV.ID=c("ENSG00000005893.15:s:120441730-120441894",
                                  "ENSG00000006194.9:t:3288454-3288570",
                                  "ENSG00000006282.20:t:50548283-50548453",
                                  "ENSG00000006625.17:s:30504569-30504844",
                                  "ENSG00000006625.17:t:30498803-30498938"),
                         Gene.Name = c("LAMP2","ZNF263","SPATA20","GGCT","GGCT"),
                         max.titration = c("KD.10000","KD.10000","KD.10000","KD.10000","KD.10000"),
                         name1 = c("LAMP2:ENSG00000005893.15:s:120441730-120441894:IJ",
                                   "ZNF263:ENSG00000006194.9:t:3288454-3288570:IJ",
                                   "SPATA20:ENSG00000006282.20:t:50548283-50548453:IJ",
                                   "GGCT:ENSG00000006625.17:s:30504569-30504844:IJ",
                                   "GGCT:ENSG00000006625.17:t:30498803-30498938:IJ"),
                         name2 = c("LAMP2:ENSG00000005893.15:s:120441730-120441894:SJ",
                                   "ZNF263:ENSG00000006194.9:t:3288454-3288570:SJ",
                                   "SPATA20:ENSG00000006282.20:t:50548283-50548453:SJ",
                                   "GGCT:ENSG00000006625.17:s:30504569-30504844:SJ",
                                   "GGCT:ENSG00000006625.17:t:30498803-30498938:SJ"),
                         pos1 = c(1,2,4,1,2),
                         pos2 = c(2,1,2,2,1),
                         coords1 = c("120439293-120441730","3286149-3288454","50548161-50548283","30500681-30504569","30498938-30500536"),
                         coords2 = c("120431462-120441730","3284205-3288454","50547767-50548283","30498938-30504569","30498938-30504569"),
                         dPSI1 = c(-0.102114,0.1283633,0.09851803,0.2216248,0.1732019),
                         dPSI2 = c(0.1011261,-0.1283633,-0.06161086,-0.2167608,-0.1732019),
                         pval1 = c(0.9771942,0.9992916,0.9681561,0.9999998,1),
                         pval2 = c(0.9724774,0.9992916,0.7773438,0.9999998,1),
                         type1 = c("IJ","IJ","IJ","IJ","IJ"),
                         type2 = c("SJ","SJ","SJ","SJ","SJ"),
                         nH1 = c(8.117751,-3.791877,-10.35513,-3.306764,-4.280869),
                         nH2 = c(-5.183411,3.791884,NA,3.2531,2.892524),
                         r2.1 = c(0.782191,0.9190228,0.9205928,0.9193401,0.9073389),
                         r2.2 = c(0.8046416,0.9190228,NA,0.8987046,0.9304862),
                         nH.group = c("4","1","1","1","1"),
                         stringsAsFactors = FALSE)

regular.df$nH.group <- factor(regular.df$nH.group, levels = c("1", "2", "3", "4"))

dummy.df <- regular.df[regular.df$LSV.ID %in% names(filter_list.grl),] 
regular.df <- regular.df[regular.df$LSV.ID %in% names(filter_list.grl),] 

print(regular.df)

If you remove the dummy.df line, regular.df will be empty.
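As a side note, here is a minimal base-R sketch of how an empty result can arise. It assumes (this is not confirmed in the thread) that names() returned NULL on the first call because GenomicRanges was not loaded. The data frame toy.df and its values are hypothetical stand-ins for regular.df.

```r
# Toy data frame standing in for regular.df (hypothetical values).
toy.df <- data.frame(LSV.ID = c("a", "b", "c"), value = 1:3,
                     stringsAsFactors = FALSE)

# Assumption: names() on the list returned NULL.
ids <- NULL

# %in% against NULL is FALSE for every element ...
keep <- toy.df$LSV.ID %in% ids
print(keep)

# ... so the subset silently has zero rows instead of raising an error.
empty.df <- toy.df[keep, ]
print(nrow(empty.df))
```

If that assumption holds, the subsetting line itself would never fail; the error would only surface later, when the empty data frame is used.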

Hi,

So is your problem solved now? I'm a bit confused... the code seems to run fine in R when I test it with all packages loaded.

PJ

The problem itself is solved. But I am still curious to know why, if I do not load the GenomicRanges package and remove this line

dummy.df <- regular.df[regular.df$LSV.ID %in% names(filter_list.grl),] 

the regular.df data frame is empty, whereas it is filled if I include the line. In addition, this problem does not occur when I execute the code chunk in RStudio, only when I knit the Rmd file.
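Whatever the underlying cause, a defensive check can make this kind of failure visible at the subsetting step. The following is only a sketch: subset_by_names is a hypothetical helper, not part of any package, and toy.df and toy.list are made-up stand-ins for regular.df and filter_list.grl.

```r
# Hypothetical helper: subset a data frame by the names of another object,
# stopping with an error instead of silently returning zero rows.
subset_by_names <- function(df, column, x) {
  ids <- names(x)
  if (is.null(ids)) {
    stop("names(x) is NULL; is the package defining class '",
         class(x)[1], "' loaded?")
  }
  df[df[[column]] %in% ids, , drop = FALSE]
}

# Toy stand-ins (hypothetical values).
toy.df <- data.frame(LSV.ID = c("a", "b", "c"), value = 1:3,
                     stringsAsFactors = FALSE)
toy.list <- list(a = 1, c = 2)

# A named object subsets as expected.
kept.df <- subset_by_names(toy.df, "LSV.ID", toy.list)
print(kept.df$LSV.ID)

# An unnamed object raises an error rather than emptying the data frame.
result <- tryCatch(subset_by_names(toy.df, "LSV.ID", list(1, 2)),
                   error = function(e) "error")
print(result)
```

In the Rmd itself, the fix described earlier in the thread still applies: add library(GenomicRanges) to the chunk that loads the other packages, so that the knitting session has the package available before load("test.RData") is called.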