tibble help viewing column names

Hi, I am a very new R user, and I am currently doing a big project in it because Stata wouldn't let me load my dataset, which consists of 23K rows by 8K columns.

My problem is that I need the descriptions of my variables. I managed to see them using the "tibble" command (now I can see what mcsr01 actually means: "whether csr 2010 (0=no/1=yes)").
The problem is that R crashes because there are too many rows in the tibble that I created. I still want to see all 8,000 columns, because I need to pick out around 100 of these variables, but I want to limit the number of rows that the tibble command shows me (since R crashes).

It would also be a great help if someone could tell me how to see the descriptions of my variables. I tried writing "names(name of my data set)", but I just got what you can see in the console in the picture: a lot of names, with no indication of what they stand for.

Thanks so much for your time and help

You can wrap head() or glimpse() around the name of your data frame to look at it without printing all of the rows (though, by default, a tibble will only print its first ten rows).

However, if you have 8,000 columns, you're going to run out of console space with glimpse() (since each column/variable gets one line and the console only goes back 1,000 lines). You can work around this by using glimpse(dat[1:n]) to limit the columns you're glimpsing, or consider using colnames() to take a look at all of your variable names (you might even assign the result to its own object).

library(tibble)
as_tibble(mtcars)
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows
head(mtcars)
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
glimpse(mtcars)
#> Observations: 32
#> Variables: 11
#> $ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2,…
#> $ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4,…
#> $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140…
#> $ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 18…
#> $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92,…
#> $ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.1…
#> $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.…
#> $ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1,…
#> $ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,…
#> $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4,…
#> $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1,…
glimpse(mtcars[3:4])
#> Observations: 32
#> Variables: 2
#> $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140…
#> $ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 18…
colnames(mtcars)
#>  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
#> [11] "carb"

Created on 2019-02-15 by the reprex package (v0.2.1)
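
On the variable descriptions: when data comes from a labelled Stata or SPSS file, the description is often stored as a "label" attribute on each column. That is an assumption about your data, since I can't see how it was loaded. A minimal sketch in base R, using a small made-up data frame `dat` in place of yours (the `mcsr02` and `age` columns and the `get_label()` helper are hypothetical):

```r
# Hypothetical stand-in for the real data set; the "label" attributes mimic
# what an import from a labelled Stata/SPSS file typically attaches.
dat <- data.frame(mcsr01 = c(0, 1, 1), mcsr02 = c(1, 0, 1), age = c(34, 51, 28))
attr(dat$mcsr01, "label") <- "whether csr 2010 (0=no/1=yes)"
attr(dat$mcsr02, "label") <- "whether csr 2011 (0=no/1=yes)"

# Return the label of one column, or NA if the column has none.
get_label <- function(x) {
  lab <- attr(x, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else as.character(lab)
}

# One row per variable: its position, its name, and its description.
var_info <- data.frame(
  position = seq_along(dat),
  name     = names(dat),
  label    = vapply(dat, get_label, character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Look at a slice of the lookup table, or search the descriptions.
var_info[1:2, ]
var_info[grepl("csr", var_info$label), ]
```

Because `var_info` has one row per variable rather than one column, it stays small even for 8,000 variables, and none of the 23K data rows get printed.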


Thanks for the above commands, they all work very well.

The "colnames" command works nicely for displaying the "codes" for the variables (columns), but I still don't get the descriptions like I did with "tibble", as I showed in my first picture. Also, when I write "colnames()" I can only see variables 960 to 8000, not the first 959. Is there a way to see more of the console? (This is not a huge problem, since I can just write "colnames(dat[1:4000])" and then "colnames(dat[4001:8000])".)

I do, however, have a very pressing question.
I would prefer working with the data in Stata, and my version only allows up to 2047 columns per data file. Is there a way for me to cut my data set into 4 bits of 2000 columns, so basically creating 4 new data files, each with only 2000 columns in it? Or can I hand-pick the columns I want? Say I want columns 4, 19, 234 and 1127 and want to create a new data set with only these 4 columns. How do I do this?

Again thanks a lot for your time

You'll have to batch this out, as you've mentioned, because of the limits of console printing.
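
One way to do that batching without typing each range by hand, as a rough sketch (the data frame `dat` and the batch size of 500 are made up for illustration):

```r
# Hypothetical wide data frame standing in for the real one (1,200 columns).
dat <- as.data.frame(matrix(0, nrow = 2, ncol = 1200))

# Keep the names in their own object, then cut them into batches of 500.
vars <- colnames(dat)
batch_size <- 500
batches <- split(vars, ceiling(seq_along(vars) / batch_size))

length(batches)   # number of batches
batches[[1]]      # print the first batch; use [[2]], [[3]], ... for the rest
```

If scrolling through batches gets tedious, writeLines(vars, "some_file.txt") puts all of the names into a text file you can open in an editor instead (the file name is just a placeholder).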

Yes, you can subset columns in a number of ways (see the link here), including the way you've been doing it inside of colnames(). You would subset your data and then write each subset out to a separate file.
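
As a sketch of both options in base R, assuming CSV is an acceptable format to hand over to Stata (the data frame `dat`, the file names, and the temporary output folder are placeholders, not anything from your project):

```r
# Hypothetical stand-in for the real data set (5,000 columns here).
dat <- as.data.frame(matrix(0, nrow = 3, ncol = 5000))

# Hand-pick columns by position (names work too, e.g. dat[c("V4", "V19")]).
picked <- dat[c(4, 19, 234, 1127)]

# Or cut the column positions into consecutive blocks of at most 2,000.
block_size <- 2000
blocks <- split(seq_along(dat), ceiling(seq_along(dat) / block_size))

# Write each block of columns to its own CSV file in a temporary folder.
out_dir <- tempdir()
for (i in seq_along(blocks)) {
  out_file <- file.path(out_dir, paste0("dat_part_", i, ".csv"))
  write.csv(dat[blocks[[i]]], out_file, row.names = FALSE)
}
```

If the pieces need to be matched up again later, you would probably want to include an identifier column in every block, for example dat[c(1, blocks[[i]])] if the first column is the ID. The haven package also has write_dta() for writing a Stata file directly, if you'd rather skip the CSV step.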

