Read.table with no header

jacksonan1 · July 24, 2018, 2:00pm

Community:

If a data set does not have true headers (i.e., line 1 does not contain only column names):

Line 1:  Table No    1

Line 2:  ID   AMT     DV

How should the header argument in the following code be edited?

datr<-read.table(file="popfo7n_100.fit",header=T)

Would the same edit be correct for the next line of code?

datr<-read.table(file="ritd4nb_sim500.fit",header=T,as.is=TRUE)

EconomiCurtis · July 24, 2018, 2:41pm

Just set header = FALSE, no?

Check out the read.table documentation:
https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html

header
a logical value indicating whether the file contains the names of the variables as its first line. If missing, the value is determined from the file format: header is set to TRUE if and only if the first row contains one fewer field than the number of columns.
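The "one fewer field" rule in that quote can be seen with a small inline example. This is a minimal sketch using made-up data and `text =` instead of a file: when `header` is not supplied and the first row has one field fewer than the rows below it, `read.table` treats that row as column names and uses the first column as row names; with `header = FALSE` and rows of equal width, the columns just get the default names `V1`, `V2`, and so on.

```r
# Made-up data: the first row has 2 fields, the data rows have 3.
txt <- "a b
r1 1 2
r2 3 4"

# header not supplied: read.table infers a header from the field counts,
# and the extra leading column becomes the row names.
auto <- read.table(text = txt)
names(auto)      # "a" "b"
rownames(auto)   # "r1" "r2"

# All rows the same width and header = FALSE: no row is used as names,
# so the columns are called V1, V2, V3.
txt2 <- "x 1 2
y 3 4"
plain <- read.table(text = txt2, header = FALSE)
names(plain)     # "V1" "V2" "V3"
```

Note that this detection only triggers for the "one fewer field" case; a title line like "Table No 1" above real column names is not detected and has to be skipped explicitly.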

ron · July 24, 2018, 3:07pm

@jacksonan1 I'm not sure that I've understood what you are asking, but looking at those two lines of data I'm guessing that line 1 holds a title for the whole table (i.e., it is table number 1) and that the column headers are then in line 2?

In this case, the following would work (reading from a text string rather than a file as an example):

junk <- 'Table no 1
ID AMT DV
1 2 3
4 5 6
'
# skip = 1 drops the "Table no 1" line; header = TRUE then uses "ID AMT DV" as column names
DF <- read.table(text = junk, header = TRUE, skip = 1)

Helpful?

jacksonan1 · July 24, 2018, 3:10pm

That was the first thing I tried, but I get this error:

datr<-read.table(file="popfo7n_100.fit",header=FALSE)

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  line 1 did not have 18 elements

So there is still a problem reading this file that this change does not fix.
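That error is what `read.table` reports when the first line has a different number of fields from the lines after it. A minimal sketch with made-up data (4 columns here rather than the 18 in the real file): `read.table` inspects the first few lines to decide how many columns to expect, so a 3-word title line followed by 4-column rows still fails with `header = FALSE`, because the title line is read as data. Skipping the title line avoids the error.

```r
# Made-up stand-in for the real file: a 3-word title line, then 4 columns.
junk <- "Table No 1
ID AMT DV EXTRA
1 2 3 4
5 6 7 8"

# header = FALSE alone does not help: the title line is still treated as a
# data row, and it has fewer fields than the expected column count.
msg <- tryCatch(
  read.table(text = junk, header = FALSE),
  error = function(e) conditionMessage(e)
)
print(msg)   # message says line 1 did not have 4 elements

# Skipping the title line and using the next line as the header works.
ok <- read.table(text = junk, header = TRUE, skip = 1)
str(ok)
```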

jacksonan1 · July 24, 2018, 3:28pm

Ron:

You understood correctly: the first line of the table is junk and needs to be skipped. My question is do I have to put in the entire table as you did or can I just designate the junk entry?

Thanks

jacksonan1 · July 24, 2018, 4:52pm

I was able to read in the data with this code:

datr<-read.table(file="popfo7n_100.fit",skip=1, nrows=46400, header=T)

However, when I tried to read all the variables as character strings with this code:

datr<-read.table(file="popfo7n_100.fit",header=T,as.is=TRUE)

Running fix(datr) on the result showed only 1 row of data.

Do you have any idea of what I did wrong and how to correct?

jcblum · July 24, 2018, 4:52pm

Proper code formatting would make it easier for helpers to understand your situation. Since you’re posting from e-mail, you may not realize it, but none of the whitespace in your posts is preserved unless the text is in a designated code block. To add code formatting, just set off each chunk of code with a 3-backtick fence. You can let the syntax highlighter guess the language, or provide a hint like so:

```r
x <- list(a = c(1:3, 5), b = "six")
```

Some brilliant commentary

```text
Some plain text
           with special
                alignment
```

Becomes:

x <- list(a = c(1:3, 5), b = "six")

Some brilliant commentary

Some plain text
           with special
                alignment

jacksonan1 · July 24, 2018, 5:25pm

Is there an online forum link where I can post and avoid having to use the 3-backtick fence (```) for code?

mara · July 24, 2018, 6:32pm

You can also select the text and click the code (</>) button in the post editor's toolbar. You could make a gist at gist.github.com, but it's nice for the people helping you to have everything in one place.

Also, the reprex package will do the formatting for you.

For pointers specific to the community site, check out the community's reprex FAQ.

jacksonan1 · July 24, 2018, 6:48pm

From the reply button I can see how it works, but can you give me the exact link where I can access the text box? Sorry, but that was not clear.

jcblum · July 24, 2018, 6:49pm

Yes! Your topic lives here: Read.table with no header

(Since you originally posted it at the end of an unrelated thread, it was split off into its own topic by a moderator)

I edited your original post to insert the code formatting (so people could see the layout of the data you're trying to import), but you might want to add code formatting to some of your follow-ups. You can edit your own posts from the forum page by clicking the tiny gray pencil icon at the bottom of a post.

jacksonan1 · July 24, 2018, 7:01pm

I don’t use R a lot, so may I ask for the link to the forum page so that I can post properly in the future?

Thanks

jcblum · July 24, 2018, 7:02pm

The main RStudio Community site link is:

https://forum.posit.co

Is that what you're looking for?

jacksonan1 · July 24, 2018, 7:14pm

Yes, that is it.

Thanks

ron · July 25, 2018, 7:49am

Do you have any idea of what I did wrong and how to correct?

I'm still at a bit of a loss to understand what you are asking.

In an earlier post you asked:

My question is do I have to put in the entire table as you did or can I just designate the junk entry?

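read.table reads from a file or, in my example case, a text string, into a data.frame. I don't understand what you mean by this question. The text string was only there to make the example self-contained; with your data you just pass the file name in file = as usual. My code showed how to skip the first line, which was not required, and then read the rest of the file, which was what you seemed to be trying to do.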
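The same call works unchanged on a file. A minimal sketch: it writes the example lines to a temporary file (a placeholder, not the real `popfo7n_100.fit`) and reads it back with `file =`. Only `skip = 1` deals with the junk line; nothing else about the table has to be typed in.

```r
# Placeholder file standing in for the real data file.
path <- tempfile(fileext = ".fit")
writeLines(c("Table no 1",
             "ID AMT DV",
             "1 2 3",
             "4 5 6"), path)

# Same arguments as the text-string example, but reading from the file:
# skip = 1 drops the title line, header = TRUE uses "ID AMT DV" as names.
DF <- read.table(file = path, header = TRUE, skip = 1)
DF
```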

Then you tried this:

datr<-read.table(file="popfo7n_100.fit",skip=1, nrows=46400, header=T)

which, if I understand you correctly, has done what you wanted except that the character variables have been stored as factors. Yes?

I presume you have more than 46400 rows of data but only want to read 46400 rows. If you are reading the complete file, the nrows parameter is redundant. As an aside, if you do have more than 46400 rows, it may be better to read the whole file and then subset the data.frame based on whatever defines those first 46400 rows as being of interest (more robust if revisited in the future - you may not need to know the number 46400).
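Reading everything and then subsetting could look like the sketch below. It rests on assumptions: the column `REP` and the rule "keep replicate 1" are hypothetical, standing in for whatever actually marks the rows of interest in the real file, and the data are made up.

```r
# Made-up data with a hypothetical REP (replicate) column.
junk <- "Table no 1
REP ID AMT DV
1 1 100 3.1
1 2 100 2.7
2 1 100 3.4
2 2 100 2.9"

# Read every row; no nrows argument needed.
all_rows <- read.table(text = junk, header = TRUE, skip = 1)

# Keep rows by a rule on the data (hypothetical: replicate 1 only)
# instead of a hard-coded row count.
first_rep <- subset(all_rows, REP == 1)
nrow(first_rep)
```

Here this gives the same rows as `nrows = 2`, but the rule keeps working if the number of rows per replicate changes.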

I do not understand why, having got a read.table command that works, you then seem to be trying to read the same file without the skip=1. This will cause the line of garbage to be read and interpreted as headers which has implications for the number of columns read.table will expect to find in the following lines.
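The effect of leaving out `skip = 1` shows up even on the small example. This sketch only illustrates the mechanism described above; it does not reproduce the one-row result from the real file, whose layout isn't known here.

```r
junk <- 'Table no 1
ID AMT DV
1 2 3
4 5 6
'

# Without skip: "Table no 1" becomes the header (the "1" is turned into the
# syntactic name X1), and the real column names become a data row, so every
# column is read as character.
bad <- read.table(text = junk, header = TRUE, as.is = TRUE)
names(bad)
bad[1, ]

# With skip = 1 the columns are named correctly and read as numbers.
good <- read.table(text = junk, header = TRUE, skip = 1, as.is = TRUE)
sapply(good, class)
```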

Try:

datr <- read.table(file = "popfo7n_100.fit", header = TRUE,
                   skip = 1, stringsAsFactors = FALSE)

with or without the "nrows = 46400", as appropriate.
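The difference `stringsAsFactors` (or `as.is`) makes can be checked with `class()`. A minimal sketch with made-up data; note that since R 4.0.0 the default is `stringsAsFactors = FALSE`, so on current R versions text columns already come in as character unless `stringsAsFactors = TRUE` is given.

```r
junk <- "Table no 1
ID GROUP DV
1 low 3
2 high 6"

as_factor <- read.table(text = junk, header = TRUE, skip = 1,
                        stringsAsFactors = TRUE)
as_char   <- read.table(text = junk, header = TRUE, skip = 1,
                        stringsAsFactors = FALSE)
# as.is = TRUE also keeps text columns as character.
as_is     <- read.table(text = junk, header = TRUE, skip = 1, as.is = TRUE)

class(as_factor$GROUP)  # "factor"
class(as_char$GROUP)    # "character"
# Numeric columns stay numeric either way; only text columns are affected.
class(as_char$DV)       # "integer"
```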

Without access to your file, and so not knowing what the first line actually says (how many words), how many columns of data there are, or how they are separated, I'm not sure why you only got 1 row with your last attempt. Did you get more than 1 row when using skip=1 and specifying nrows?
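Those unknowns (what the first line says, how many columns, how they are separated) can be checked from R before calling `read.table`. A sketch, assuming the file is plain whitespace-separated text: `readLines()` shows the raw lines and `count.fields()` (from the utils package, like `read.table`) reports how many fields each line splits into. The temporary file is a placeholder for the real one.

```r
# Placeholder file standing in for the real data file.
path <- tempfile(fileext = ".fit")
writeLines(c("Table No 1",
             "ID AMT DV EXTRA",
             "1 2 3 4",
             "5 6 7 8"), path)

# Raw text of the first few lines.
readLines(path, n = 3)

# Number of whitespace-separated fields on each line.
n_fields <- count.fields(path)
n_fields

# Lines whose field count differs from the most common one are suspects.
mode_n  <- as.integer(names(which.max(table(n_fields))))
suspect <- which(n_fields != mode_n)
suspect
```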

It could possibly be to do with the way you've called read.table or it may be related to the origin of the data file. Is it a text file that was created on Windows that you are attempting to read on a Mac, or vice versa?

jacksonan1 · July 25, 2018, 8:45am

Thanks, this solution solved my problem.

jcblum · July 25, 2018, 8:49am

If your question's been answered, would you mind marking a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. You’ll need to do this from the forum page.

EconomiCurtis · September 26, 2018, 11:38am

2 posts were split to a new topic: Plot truncates on x-axis - advice to avoid this

EconomiCurtis · September 26, 2018, 2:16pm

A post was merged into an existing topic: Plot truncates on x-axis - advice to avoid this