read txt.gz file

I've been trying to use RStudio to read a txt.gz file that contains a huge text file.

setwd("directory") # setting directory to read data from it
Data = read.table(gzfile("MCT.txt.gz"),sep="\t")

I tried to read column three of the file, but R reports that Data cannot be found:

A <- Data(651217,3)
Error in Data(651217, 3) : could not find function "Data"

If the text can be read, I would also like to pick a selected column, do the same with the other text files, and average them. What would be the best command for this? Thank you.

For subsetting a data frame you have to use [], not ().

A <- Data[651217, 3]

This will return the value for row = 651217 and column = 3

Thanks. When I try A <- Data[651217, 1] it is OK, but for A <- Data[651217, 3] the data is empty; maybe the format is different.
The data appears like this:
[1] -7.15, 112.25, 0.00, 0.00

I want to take the third column out of the data file. Why doesn't A <- Data[651217, 3] get the third column?

With this code you are selecting a single "cell". If you want to select the entire third column, use

A <- Data[, 3]
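
As a minimal sketch of the difference between these forms of indexing, the snippet below uses a small made-up data frame (the values are illustrative, not taken from MCT.txt.gz) and assumes the file was parsed into separate columns:

```r
# Small stand-in data frame; values are made up for illustration
Data <- data.frame(Lat = c(54.95, 54.85, 54.75),
                   Lon = c(0.05, 0.05, 0.05),
                   HourlyPrecipRate = c(0, 0.2, 0.4))

cell        <- Data[2, 3]               # one value: row 2, column 3
col_by_pos  <- Data[, 3]                # whole third column, as a vector
col_by_name <- Data$HourlyPrecipRate    # same column, selected by name
col_mean    <- mean(col_by_pos)         # average of that column
```

Selecting by name is usually easier to read and does not break if the column order changes.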

If you need more specific help please provide a minimal REPRoducible EXample (reprex). A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

I am sorry, it doesn't work; there is an error:

A <- Data[, 3]
Error in `[.data.frame`(Data, , 3) : undefined columns selected

When I try a REPRoducible EXample (reprex):

head(Data)
V1
1 Lat, Lon, HourlyPrecipRate, HourlyPrecipRateGC
2 54.95, 0.05, 0.00, 0.00
3 54.85, 0.05, 0.00, 0.00
4 54.75, 0.05, 0.00, 0.00
5 54.65, 0.05, 0.00, 0.00
6 54.55, 0.05, 0.00, 0.00

When I try to select two columns, it still can't get the data:

head(Data, 5)[, c('HourlyPrecipRate', 'HourlyPrecipRateGC')]
Error in `[.data.frame`(head(Data, 5), , c("HourlyPrecipRate", "HourlyPrecipRateGC")) :
  undefined columns selected

Since you are not providing a proper reproducible example I can't know what your problem is. As you can see in the reprex below, subsetting works with the data you are showing

Data <- data.frame(Lat = c(54.95, 54.85, 54.75, 54.65, 54.55),
                   Lon = c(0.05, 0.05, 0.05, 0.05, 0.05),
                   HourlyPrecipRate = c(0, 0, 0, 0, 0),
                   HourlyPrecipRateGC = c(0, 0, 0, 0, 0)
)
Data
#>     Lat  Lon HourlyPrecipRate HourlyPrecipRateGC
#> 1 54.95 0.05                0                  0
#> 2 54.85 0.05                0                  0
#> 3 54.75 0.05                0                  0
#> 4 54.65 0.05                0                  0
#> 5 54.55 0.05                0                  0

A <- Data[,3]
A
#> [1] 0 0 0 0 0

head(Data, 5)[, c('HourlyPrecipRate', 'HourlyPrecipRateGC')]
#>   HourlyPrecipRate HourlyPrecipRateGC
#> 1                0                  0
#> 2                0                  0
#> 3                0                  0
#> 4                0                  0
#> 5                0                  0

Created on 2019-07-20 by the reprex package (v0.3.0.9000)

In addition to what Andres told you, my guess is that you are using read.table with the default header = FALSE and a sep that does not match the commas in your file. Your column names appear as the first row, and each line is pulled into a single column (named V1). If this is the case, you should try something like:
Data <- read.table('your_file', header = TRUE, sep = ',')
And then follow Andres' recommendations in the previous posts.
Cheers,
Fer
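
To illustrate the point about the separator, here is a small sketch that parses a few lines in the same layout as the head(Data) output shown earlier. The lines are typed in by hand rather than read from the real file, and strip.white = TRUE is an extra option added here to drop the spaces after the commas:

```r
# A few lines in the same layout as the head(Data) output in this thread
sample_lines <- c("Lat, Lon, HourlyPrecipRate, HourlyPrecipRateGC",
                  "54.95, 0.05, 0.00, 0.00",
                  "54.85, 0.05, 0.00, 0.00")

# Tab separator on comma-separated text: every line stays in one column (V1)
wrong <- read.table(text = sample_lines, sep = "\t")
dim(wrong)

# Comma separator plus header row: four named numeric columns
right <- read.table(text = sample_lines, header = TRUE, sep = ",",
                    strip.white = TRUE)
names(right)
right[, 3]
```

With only one column in the data frame, Data[, 3] has nothing to select, which is why "undefined columns selected" is raised.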

Thanks andresrcs and Fer. This gives me a solution to work with, as I am still learning to use this language. Thanks.

Were you able to read the data properly and reproduce Andres' code? R is always tricky at the start, and reading datasets can be complex because there is a kind of "mother function" with a lot of wrappers for specific cases. If you have any other doubts on this topic, or any other, please don't forget to include the code and try to make a reproducible example. The more information you provide (and the better formatted), the easier it is for people to help you.
Good luck

Yes, it is tricky. I am sharing the steps I used to read the txt.gz file:

setwd("directory") # setting directory to read data from it
Data = read.table(gzfile("MCT.txt.gz"), header = TRUE, sep = ',')

and then read the location of the data that you want:
A <- Data[651217, 3]

A data.frame() call helps when there are 5-10 values to show, but with larger data it can't be typed out quickly. Is there any suggestion for sharing a data frame without typing the data values?
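
One common way to share sample data without typing it out is dput(), which prints R code that recreates an object. A minimal sketch, using a made-up two-row data frame in place of the real Data:

```r
# Stand-in for the real data; values are made up for illustration
Data <- data.frame(Lat = c(54.95, 54.85), Lon = c(0.05, 0.05))

# dput() prints R code that rebuilds the object; paste that output into a post
dput(head(Data, 5))

# Round trip: evaluating the printed code gives back the same data frame
printed <- paste(capture.output(dput(head(Data, 5))), collapse = "\n")
rebuilt <- eval(parse(text = printed))
```

For a large file, dput(head(Data, 5)) keeps the post short while still showing the real column names and types.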

Sorry, I missed part of your first message. You are using a connection to a zipped file or something like that (you are calling gzfile), and I cannot help with that because I don't know it. Someone will come to help, but if you are new to R, please try to keep it simple. If you can extract the data from the compressed file, do it, at least with one file for testing. I always convert everything to a csv if I can. Making things simpler is always a good move (in my 'arena', we say that if you can reduce what you are doing to Bernoulli trials, you know what you are doing; if not, think and work more).
After you read the file with read.table, what does dim(Data) return on the R console?
cheers
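
A quick sketch of the kind of check meant here, run on a hand-typed sample line rather than the real file:

```r
# Hand-typed sample in the same layout as the file in this thread
sample_lines <- c("Lat, Lon, HourlyPrecipRate", "54.95, 0.05, 0.00")
Data <- read.table(text = sample_lines, header = TRUE, sep = ",")

dim(Data)     # number of rows, then number of columns
names(Data)   # column names picked up from the header row
str(Data)     # type of each column

n_cols <- ncol(Data)
```

If the second number from dim(Data) is 1, the separator did not match the file and every line ended up in a single column.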

Yes, learning is part of life. So my basic plan is to first learn to read the zipped file, then locate the part of the data I need and do some addition or subtraction on it without decompressing the file. Reading the file with read.table gives the data in a matrix-like form.

Fer is right - if your file has a header row you need read.table(..., header = TRUE).

One other suggestion - not related to the original question - is that if the file is gzip-compressed (for example, the filename ends with .gz) then read.table() will now (since R 2.10) recognise and read it without you needing to use gzfile(). It probably won't be any faster (the file is still decompressed through a connection internally), but it will be easier to type.
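
For the part of the original question about averaging a column over several files, here is a rough sketch. It writes two tiny gzip-compressed files with made-up values to a temporary directory, so the file names and numbers are assumptions, and it assumes every file has the same rows in the same order:

```r
# Write two small gzip-compressed files with made-up values
tmp <- tempfile("gzdemo")
dir.create(tmp)
for (i in 1:2) {
  con <- gzfile(file.path(tmp, sprintf("file%d.txt.gz", i)), "w")
  writeLines(c("Lat, Lon, HourlyPrecipRate, HourlyPrecipRateGC",
               sprintf("54.95, 0.05, %d.00, 0.00", i),
               sprintf("54.85, 0.05, %d.00, 0.00", i + 2L)), con)
  close(con)
}

files <- list.files(tmp, pattern = "\\.txt\\.gz$", full.names = TRUE)

# Third column of each file; the .gz path goes straight to read.table()
third <- sapply(files, function(f) {
  read.table(f, header = TRUE, sep = ",")[, 3]
})

# Row-by-row average across files (one column of `third` per file)
avg <- rowMeans(third)
```

For very large files, reading only the needed column (for example through the colClasses argument of read.table) may reduce memory use, but that is a separate optimisation.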
