Loop to append variable to data frame changes values

I am going crazy over this issue: I have already tried several different ways to append a column to a data frame, and after doing so the values change most of the time (but not always). The good thing is that the data is downloaded from the web, so you can try it yourself:

chain = "polkadot"
current_session = 117
first_session = 126


difference = current_session - first_session
x = c(current_session:(current_session - difference))

for(i in 1:length(x)) {
  validators <- read.csv(url(paste("https://storage.googleapis.com/watcher-csv-exporter/", chain , "_validators_era_", x[i], ".csv", sep=(""))))
  validators$era_points = ifelse(validators$era_points == "undefined",0,validators$era_points)
  validators$era_points = as.numeric(validators$era_points)
  if (i==1){
    validators_overall = as.data.frame(validators$era_points)
    colnames(validators_overall)[ncol(validators_overall)] <- paste0("new", i)
  }
  if (i>1){
    new <- validators$era_points
    validators_overall[ , ncol(validators_overall) + 1] <- new
    colnames(validators_overall)[ncol(validators_overall)] <- paste0("new", i)
  }
}

The individual values of validators$era_points in each downloaded data set are in the thousands. After the loop, most columns hold completely different values, although some columns are correct. What is going on?

Thanks!

Could you try again to explain what your problem is?

I don't know what this means.

Which ones are and which ones aren't? How would we know?

Hi,
Yes, sorry. I have the same data set for different points in time. I want to extract the column "era_points" for every point in time and append it to a new data frame. When you download a data set individually (e.g., by setting i=3), you will see that the "era_points" values are in the thousands, like 1250, 1300, etc. But after I append them to the overall data frame, those points are converted to strange values like 25, 5, 15, etc.

When you View the overall data frame at the end of the loop, you see that most columns have those strange low values, but some have the correct ones. When you individually download the data frame from that point in time you will see that the columns with the larger values are actually correct.

The code you shared produces 10 columns, and each has the range you believe to be correct. I checked column 3 specifically, and downloading it directly or in the loop gives the same result.

Does the example you shared simply not show your issue? Did you provide the wrong parameters?
Do you know a principled reason why the numbers should be in the thousands? Maybe you are simply mistaken in your expectations of what the data is?

Okay, so see below the validators_overall data frame as it is after the loop.

So, column new1 should be equal to the era_points column in the data frame which is downloaded for i=1, right?

So, run the code:

chain = "polkadot"
current_session = 117
first_session = 126


difference = current_session - first_session
x = c(current_session:(current_session - difference))
i=1
  validators <- read.csv(url(paste("https://storage.googleapis.com/watcher-csv-exporter/", chain , "_validators_era_", x[i], ".csv", sep=(""))))
View(validators$era_points)

It clearly gives different values, no?

Somehow, sometimes you are getting the values as factors, so it's the integer level codes that are being turned into numbers. It makes no sense to me why this wouldn't be consistent on one machine, independent of whether you read the csv directly or within a loop.
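To illustrate the factor explanation, here is a minimal sketch that needs no download. The vector `raw` is a made-up stand-in for an era_points column that contains the string "undefined". One possible reason for the inconsistency, offered as an assumption rather than a diagnosis: read.csv only converted character columns to factors by default before R 4.0.0, and a column that is purely numeric is read as numeric either way, so only files containing "undefined" would be affected.

```r
# Hypothetical stand-in for the era_points column of one downloaded file.
raw <- c("1250", "undefined", "1300", "980")

# Before R 4.0.0, read.csv() turned a character column like this into a factor by default.
as_factor <- factor(raw)

# ifelse() drops the factor attributes, so the "no" branch contributes the
# integer level codes instead of the labels: small integers, not 1250 or 1300.
codes <- ifelse(as_factor == "undefined", 0, as_factor)
print(codes)

# Going through character first keeps the real values.
chr <- as.character(as_factor)
chr[chr == "undefined"] <- "0"
values <- as.numeric(chr)
print(values)
```

Comparing `codes` with `values` shows the same kind of mismatch described in the question: the level codes depend only on how many distinct strings the column has, not on the numbers themselves.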

That bizarre detail aside, you can try being explicit and pass stringsAsFactors = FALSE to read.csv.
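A minimal sketch of that suggestion, using an inline CSV instead of the real URL so it runs offline. The text in `csv_text` and the `name` column are invented for illustration; only the era_points column and the "undefined" marker come from the question.

```r
# Hypothetical inline CSV standing in for one downloaded era file.
csv_text <- "name,era_points
a,1250
b,undefined
c,1300"

# Being explicit about stringsAsFactors keeps era_points as character
# regardless of the R version's default.
validators <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Replace the marker string first, then convert the strings to numbers.
validators$era_points[validators$era_points == "undefined"] <- "0"
validators$era_points <- as.numeric(validators$era_points)

print(validators$era_points)
```

In the original loop the same argument would go into the existing read.csv(url(...)) call. If missing values are acceptable instead of zeros, read.csv's na.strings argument (for example na.strings = "undefined") may be an alternative worth trying.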
