How to format a CSV file as fixed-width text

I have a CSV file in which each row has 6 columns.
The first column is an ID, six characters wide (the first character is a blank space). The second and third columns are latitude and longitude, each 8 characters wide with two decimal places. The next 3 columns are values, each 10 characters wide with two decimal places. How can I reformat this CSV file to this layout and save it as a .txt file? Thanks for your help.

For example, DF represents the CSV file:

DF
12091 39.90 145.90 12.92 100.92
12092 39.95 144.12 14.00 110.10
...

A combination of readr::read_csv() and readr::write_tsv() would help. You can also use base R.
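For reference, a minimal base R sketch of the read-then-write route, assuming the CSV has no header row and the five columns shown in the example; a temporary file stands in for the real CSV. This writes tab-separated text, which is not yet the fixed-width layout asked for.

```r
# Stand-in for the real CSV: the two example rows, no header line
csv_file <- tempfile(fileext = ".csv")
writeLines(c("12091,39.90,145.90,12.92,100.92",
             "12092,39.95,144.12,14.00,110.10"), csv_file)

# Read: ID kept as text, the remaining four columns as numbers
DF <- read.csv(csv_file, header = FALSE,
               colClasses = c("character", rep("numeric", 4)))

# Write tab-separated text without column names, row names, or quotes
txt_file <- tempfile(fileext = ".txt")
write.table(DF, txt_file, sep = "\t", quote = FALSE,
            col.names = FALSE, row.names = FALSE)
readLines(txt_file)
```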

Out of curiosity, why did you leave an empty space in the ID column?

Thanks, but I still don't know how to do it. Could you please write some sample code? I was just following another example file; the leading space can be dropped, so the first column will be 5 characters wide. Thanks for your help.

CSV files are already text files. Open your .csv in a text editor (assuming it's not a huge file) and you'll see that each line is one table row, with the columns separated by commas (usually, anyway).
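To see this from R itself, a small sketch: write a two-row data frame with write.csv() and read the file back with readLines(), which returns the raw lines of text. The column names id and lat are made up for the illustration.

```r
df <- data.frame(id = c("12091", "12092"), lat = c(39.90, 39.95))
csv_file <- tempfile(fileext = ".csv")
write.csv(df, csv_file, row.names = FALSE)

# Each element is one line of plain text: a header line, then one line per row,
# with fields separated by commas
raw_lines <- readLines(csv_file)
raw_lines
```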

RStudio has a great GUI to help you load CSV files. Check out the section called "Importing data from Text and CSV files" in the guide Importing Data with RStudio. One feature I think is really cool in the "import text (readr)" option is the ability to explicitly select the data type of each column.

It might take a few rounds of trial and error to get it right, but once the data loads, note the "Code Preview": it shows code you can reuse in future.
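The code the import dialog generates depends on the options chosen, but for a header-less file with the five columns from the example it might look roughly like this sketch. It assumes readr is installed; with col_names = FALSE, readr names the columns X1 to X5.

```r
library(readr)

# Stand-in CSV with the two example rows (no header line)
csv_file <- tempfile(fileext = ".csv")
writeLines(c("12091,39.90,145.90,12.92,100.92",
             "12092,39.95,144.12,14.00,110.10"), csv_file)

# Explicit type for every column: ID as text, the rest as doubles
DF <- read_csv(csv_file, col_names = FALSE,
               col_types = cols(X1 = col_character(),
                                X2 = col_double(), X3 = col_double(),
                                X4 = col_double(), X5 = col_double()))
DF
```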


In terms of saving your data, the write_csv or write_tsv functions are helpful.
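A short sketch of both writers, assuming readr is installed; the data frame holds the first two example columns and the output paths are temporary files. write_tsv() separates fields with tabs and write_csv() with commas; neither pads values to a fixed width.

```r
library(readr)

DF <- data.frame(X1 = c("12091", "12092"), X2 = c(39.90, 39.95))
tsv_file <- tempfile(fileext = ".txt")
csv_file <- tempfile(fileext = ".csv")

write_tsv(DF, tsv_file, col_names = FALSE)  # fields separated by tabs
write_csv(DF, csv_file, col_names = FALSE)  # fields separated by commas
readLines(tsv_file)
readLines(csv_file)
```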


Once you get more comfortable with the basics of importing and saving data, the "Data Import Cheat Sheet" will be a very good friend.


There was a typo: each row should have 5 columns.
I tried to read the .csv file and write it as a .dat file, but ran into some problems.

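library(readr)
# Read the CSV without a header row: ID as text, the other four columns as numbers
DF <- read_csv("originaldata.csv", col_names = FALSE,
               col_types = list(col_character(), col_double(), col_double(),
                                col_double(), col_double()))

# Write space-separated text (sep = " " is write.table's default), without quotes
write.table(DF, "reformatdata.dat", col.names = FALSE, row.names = FALSE,
            quote = FALSE, fileEncoding = "UTF-8")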

But how do I leave the required spaces I mentioned above? Here are the requirements:
The first column is the ID, five characters wide. The second and third columns are latitude and longitude, each 8 characters wide with two decimal places. The next 2 columns are values, each 10 characters wide with two decimal places. For example, if the value in the last column of the first row takes up 7 characters, there should be 3 blank spaces before it, so that the columns line up in every row. Thanks again for your help.
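One way to check the padding rule is base R's formatC(), which right-justifies a number in a field of a given width with a fixed number of decimals (sprintf("%8.2f", x) does the same). The value 1234.56 below is made up only to match the 7-character example.

```r
# Right-justify each number in a fixed-width field with 2 decimals
lat  <- formatC(39.90,   width = 8,  format = "f", digits = 2)
val  <- formatC(100.92,  width = 10, format = "f", digits = 2)
val7 <- formatC(1234.56, width = 10, format = "f", digits = 2)  # 7 characters

lat   # "   39.90"   (3 spaces + 5 characters = 8)
val   # "    100.92" (4 spaces + 6 characters = 10)
val7  # "   1234.56" (3 spaces + 7 characters = 10)
```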

I did not use write_tsv() because it uses tabs to separate columns, while I want blank spaces. Did I understand that correctly?

The reason for setting the width of each column is to have the data frame formatted like this:

DF
12091   39.90  145.90     12.92    100.92
12092   39.95  144.12      4.00    110.10
...

That is to say, the value in row 1, column 2 (39.90) takes up 5 characters including the two decimals, so 3 blank spaces go before it to make 8 characters in total. Similarly, the value in row 1, column 5 (100.92) takes up 6 characters, so 4 blank spaces go before it to make 10 characters in total. My goal is to convert the comma-separated CSV file to this format and export it as a .dat file. How can I do this? Thanks again.
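A minimal base R sketch of that conversion, using sprintf() to build each line rather than a dedicated fixed-width writer. It assumes the five columns and widths 5, 8, 8, 10, 10 described above; temporary files stand in for originaldata.csv and reformatdata.dat.

```r
# Stand-in for originaldata.csv: the two example rows, no header line
csv_file <- tempfile(fileext = ".csv")
writeLines(c("12091,39.90,145.90,12.92,100.92",
             "12092,39.95,144.12,4.00,110.10"), csv_file)

DF <- read.csv(csv_file, header = FALSE,
               colClasses = c("character", rep("numeric", 4)))

# Widths 5, 8, 8, 10, 10: the ID as a string, numbers right-justified, 2 decimals
lines <- sprintf("%5s%8.2f%8.2f%10.2f%10.2f",
                 DF$V1, DF$V2, DF$V3, DF$V4, DF$V5)

dat_file <- tempfile(fileext = ".dat")  # "reformatdata.dat" for the real output
writeLines(lines, dat_file)
readLines(dat_file)
```

sprintf() never truncates, so a value that needs more characters than its field width pushes the following columns to the right; checking max(nchar(lines)) against the expected total width (41 here) is a cheap safeguard.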

You can use readr::write_delim() to specify whatever delimiter you want.
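A sketch of that suggestion, assuming readr is installed: write_delim() with delim = " " puts exactly one space between fields, so it settles the tab question but does not pad each column to a fixed width.

```r
library(readr)

DF <- data.frame(X1 = c("12091", "12092"), X2 = c(39.90, 39.95),
                 X3 = c(145.90, 144.12))
out <- tempfile(fileext = ".dat")
write_delim(DF, out, delim = " ", col_names = FALSE)

# One space between fields and no padding, so columns of different
# lengths will not line up
delim_lines <- readLines(out)
delim_lines
```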


Thanks, I see how to set the delimiter, but my problem is defining a specific width for each column. For example, the blank spaces before 39.90 plus the number 39.90 itself should total 8 characters. Since 39.90 takes up five characters, there should be three blank spaces before it. The same rule applies to every column, just with different widths.
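A sketch of a general version of this rule, where each column gets its own width from a vector. format_fixed_width() is a hypothetical helper written for this illustration, not a function from any package; it right-justifies every column and gives numeric columns a fixed number of decimals via formatC().

```r
# Hypothetical helper: pad every column to its own width (right-justified);
# numeric columns are formatted with a fixed number of decimals
format_fixed_width <- function(df, widths, digits = 2) {
  cols <- Map(function(col, w) {
    if (is.numeric(col)) {
      formatC(col, width = w, format = "f", digits = digits)
    } else {
      formatC(as.character(col), width = w)
    }
  }, df, widths)
  do.call(paste0, unname(cols))  # glue the padded columns into one string per row
}

DF <- data.frame(id = c("12091", "12092"),
                 lat = c(39.90, 39.95), lon = c(145.90, 144.12),
                 v1 = c(12.92, 4.00), v2 = c(100.92, 110.10))
fixed <- format_fixed_width(DF, widths = c(5, 8, 8, 10, 10))
fixed
```

The result is a character vector with one string per row, which writeLines(fixed, "reformatdata.dat") would save as the .dat file.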

Sorry for the weird question, but I'm preparing input files for another program and need to format the data well. I just got stuck here...

Is that fixed-width format? If so, perhaps look at one of the functions below (it's not a format I've worked with):

https://rdrr.io/cran/miceadds/man/write.fwf2.html

https://rdrr.io/github/DavideMassidda/testing/man/write.fwf.html

There's also a thread here on controlling the widths in fwf.
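Once a fixed-width file exists, base R's utils::read.fwf() can read it back, which is a handy check that every column landed at the intended width. A minimal sketch, assuming the widths 5, 8, 8, 10, 10 from the example:

```r
# Two fixed-width lines (widths 5, 8, 8, 10, 10), then parse them back
dat_file <- tempfile(fileext = ".dat")
writeLines(c("12091   39.90  145.90     12.92    100.92",
             "12092   39.95  144.12      4.00    110.10"), dat_file)

# colClasses is passed through to read.table(); numeric fields are stripped
# of their padding spaces when parsed
back <- read.fwf(dat_file, widths = c(5, 8, 8, 10, 10),
                 colClasses = c("character", rep("numeric", 4)))
back
```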

Yes, I want to convert the CSV files to fixed-width .dat files, where each column has its own width. Which packages provide read.fwf, write.fwf, and write.fwf2? I loaded the utils package, but R cannot find the write.fwf2 function. Thanks for your help.
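A quick sketch that checks what base R's utils package actually exports: it provides read.fwf() for reading fixed-width files but no writer, which is why write.fwf2() is not found until a package that defines it is loaded.

```r
# Which fixed-width helpers does the utils package export?
utils_fns <- getNamespaceExports("utils")
has_read_fwf  <- "read.fwf" %in% utils_fns
has_write_fwf <- any(c("write.fwf", "write.fwf2") %in% utils_fns)
c(read.fwf = has_read_fwf, write.fwf = has_write_fwf)
```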

The package to be used is "miceadds", thanks.

There's a write.fwf() function in the gdata package, see here:

write.fwf2(), linked in my previous reply, is from a package called miceadds. The links go to function reference pages. Those functions are part of packages (miceadds for the first; the second is from someone's package on GitHub, so ignore that one for now), and the package names are given on the reference pages in several places (including in the URL), see below:

The rdrr.io page for the gdata package is here (rdrr.io and rdocumentation are two of several sites that show package and function documentation, in addition to the usual CRAN reference page: https://CRAN.R-project.org/package=gdata).

"Man pages" is where you'll find the index for specific functions, including write.fwf():

https://rdrr.io/cran/gdata/man/write.fwf.html

You can also search for such functions by keyword or by exact function name. For example, the link below shows a search for "write.fwf":
https://www.rdocumentation.org/search?q=write.fwf

Kieran Healy has a really nice figure of how to read an R help page in the appendix of his book (free online):

