Reading a .cfl file into R

mustafghan · July 10, 2019, 4:34pm

I'm trying to read the following .cfl file into R so that I can use this "control" (mapping) file to aggregate data from another input file. Basically, I want to skip all the lines that start with an exclamation mark and then organize the remaining lines into a data frame. I'm a little lost on where to start. What is the best way to approach this?

! PS&D commodity definitions and aggregations.
! Comments are indicated by the following symbols, starting in column 1:
! ! begins a comment line.
! /* begins a comment block.
! */ ends a comment block.

! Table: Define commodities
! | CSV commodity code; input or aggregate. Output order implied. This field is treated as text.
! | (Many of the numeric codes match SITC Revision 3, http://www.intracen.org/tradstat/sitc3list.htm)
! | | TS file name; output
! | | | TS commodity name; output
! | | | | First output year. Default=all years.
! | | | | | Year offset, from CSV file year (left-hand number) to TS year. Default=0.
! | | | | | | Print empty tables: 1=yes, 0=no=default
! | | | | | | | Comments
! --------- ------- -------- ----------------------- ---- -- - ......
COMMODITY 0011000 CATTLE Cattle
COMMODITY 0013000 SWINE Swine

COMMODITY 0111000 BFVEAL Beefveal
COMMODITY 0113000 PORK Pork
COMMODITY 0114000 POULTRY_ Poultry_Chicken+Turkey
COMMODITY 0115000 CHICKEN Chicken
COMMODITY 0114300 TURKEY Turkey
COMMODITY EGGS EGGS Eggs
COMMODITY LAMBMUT LAMBMUT Lamb Mutton

COMMODITY 0223000 FLUIDMK Fluid Milk
COMMODITY 0224200 NFDMILK Nonfat Dry Milk
COMMODITY 0224400 DAIRYDM Dairy Dry Milk
COMMODITY 0230000 BUTTER Butter
COMMODITY 0240000 CHEESE Cheese

! Cotton, bales
COMAGGR COTBALE + 2631000 'Cotton

! Total grains
COMAGGR TOTGRNS + 0430000 'Barley
COMAGGR TOTGRNS + 0440000 'Corn
COMAGGR TOTGRNS + 0451000 'Rye
COMAGGR TOTGRNS + 0452000 'Oats
COMAGGR TOTGRNS + 0459100 'Millet
COMAGGR TOTGRNS + 0459200 'Sorghum
COMAGGR TOTGRNS + 0459900 'Mixed Grains
COMAGGR TOTGRNS + 0410000 'Wheat
COMAGGR TOTGRNS + 0422110 'Rice
! ------- ------- - ------- ......

pieterjanvc · July 20, 2019, 4:27pm

Hi,

I can get you started on the first part, keeping only the lines of interest:

library(stringr)

# Read the file line by line, then drop comment lines (starting with "!") and empty lines
myData <- readLines("testData.cfl")
myData <- myData[!str_detect(myData, "^!") & myData != ""]

You end up with a character vector in which each element is one line of data from your file, like this:

 [1] "COMMODITY 0011000 CATTLE Cattle"                  
 [2] "COMMODITY 0013000 SWINE Swine"                    
 [3] "COMMODITY 0111000 BFVEAL Beefveal"                
 [4] "COMMODITY 0113000 PORK Pork"                      
 [5] "COMMODITY 0114000 POULTRY_ Poultry_Chicken+Turkey"
 [6] "COMMODITY 0115000 CHICKEN Chicken"                
 [7] "COMMODITY 0114300 TURKEY Turkey"                  
 [8] "COMMODITY EGGS EGGS Eggs"                         
 [9] "COMMODITY LAMBMUT LAMBMUT Lamb Mutton"            
[10] "COMMODITY 0223000 FLUIDMK Fluid Milk"             
[11] "COMMODITY 0224200 NFDMILK Nonfat Dry Milk"        
[12] "COMMODITY 0224400 DAIRYDM Dairy Dry Milk"         
[13] "COMMODITY 0230000 BUTTER Butter"                  
[14] "COMMODITY 0240000 CHEESE Cheese"                  
[15] "COMAGGR COTBALE + 2631000 'Cotton"                
[16] "COMAGGR TOTGRNS + 0430000 'Barley"                
[17] "COMAGGR TOTGRNS + 0440000 'Corn"                  
[18] "COMAGGR TOTGRNS + 0451000 'Rye"                   
[19] "COMAGGR TOTGRNS + 0452000 'Oats"                  
[20] "COMAGGR TOTGRNS + 0459100 'Millet"                
[21] "COMAGGR TOTGRNS + 0459200 'Sorghum"               
[22] "COMAGGR TOTGRNS + 0459900 'Mixed Grains"          
[23] "COMAGGR TOTGRNS + 0410000 'Wheat"                 
[24] "COMAGGR TOTGRNS + 0422110 'Rice"
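
Note that the filter above only handles the "!" comment lines. The header of the file also mentions /* ... */ comment blocks, which the sample does not contain. A minimal sketch for those, assuming the markers always start in column 1, sit on their own lines, and are never nested (strip_cfl_comments is a made-up helper name, and the IGNORED line is invented for the demo):

```r
# Hypothetical helper: drop "!" comment lines, /* ... */ comment blocks and blank lines.
# Assumes block markers start in column 1, are on their own lines, and are not nested.
strip_cfl_comments <- function(lines) {
  opens  <- startsWith(lines, "/*")
  closes <- startsWith(lines, "*/")
  # Inside a block while more blocks have been opened than closed;
  # the closing marker line itself is dropped as well.
  in_block <- (cumsum(opens) - cumsum(closes)) > 0 | closes
  keep <- !in_block & !startsWith(lines, "!") & trimws(lines) != ""
  lines[keep]
}

# Small demo; the IGNORED line is invented to show the block being skipped
example <- c(
  "! a comment line",
  "COMMODITY 0011000 CATTLE Cattle",
  "/*",
  "COMMODITY 9999999 IGNORED Ignored",
  "*/",
  "",
  "COMMODITY 0013000 SWINE Swine"
)
strip_cfl_comments(example)
```

With the demo input only the CATTLE and SWINE lines should remain. You would call it as strip_cfl_comments(readLines("testData.cfl")).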

Now, it's not clear from just looking at the data what you want (one table, several tables, ...) or which part of each string goes in which column. Depending on how you split, you end up with a different number of columns for different rows, or with different data types (integer, string, ...).
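
For what it's worth, here is a rough sketch of one possible split, using base R only. It is a guess at the layout, not a definitive parser: I'm assuming COMMODITY lines are "code, TS file name, then the TS commodity name as the rest of the line", and COMAGGR lines are "aggregate code, sign, member code, then a comment after the apostrophe". The optional fields from the header (first output year, year offset, ...) don't appear in the sample and are not handled. The column names and the extract_fields helper are made up for illustration, and only "+" shows up as a sign in the sample, so accepting "-" is an assumption too.

```r
# A few lines from the sample, already stripped of comments and blanks
lines <- c(
  "COMMODITY 0011000 CATTLE Cattle",
  "COMMODITY LAMBMUT LAMBMUT Lamb Mutton",
  "COMMODITY 0224200 NFDMILK Nonfat Dry Milk",
  "COMAGGR COTBALE + 2631000 'Cotton",
  "COMAGGR TOTGRNS + 0459900 'Mixed Grains"
)

# Hypothetical helper: apply a regex with capture groups to each line and
# return the captured pieces as a data frame of character columns.
# Assumes x is non-empty and every line matches the pattern.
extract_fields <- function(x, pattern, col_names) {
  m <- regmatches(x, regexec(pattern, x))
  fields <- do.call(rbind, lapply(m, function(hit) hit[-1]))  # drop the full match
  out <- as.data.frame(fields, stringsAsFactors = FALSE)
  names(out) <- col_names
  out
}

# COMMODITY records: code, TS file name, rest of line as the name
commodities <- extract_fields(
  lines[startsWith(lines, "COMMODITY ")],
  "^COMMODITY\\s+(\\S+)\\s+(\\S+)\\s+(.*)$",
  c("code", "ts_file", "ts_name")
)

# COMAGGR records: aggregate code, sign, member code, comment after the apostrophe
aggregates <- extract_fields(
  lines[startsWith(lines, "COMAGGR ")],
  "^COMAGGR\\s+(\\S+)\\s+([+-])\\s+(\\S+)\\s*'?(.*)$",
  c("aggregate", "sign", "member_code", "comment")
)

print(commodities)
print(aggregates)
```

Everything is kept as character on purpose: the header says the commodity code is treated as text, and converting to numbers would drop the leading zeros (0011000 would become 11000). Two separate data frames avoid the problem of rows with different numbers of columns.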

Please provide more info or, if this is all you need to get started, enjoy the rest.
PJ

mustafghan · July 22, 2019, 2:55pm

@pieterjanvc Thanks very much, this is very helpful. Eventually I want to convert this into a table with a separate column for each field: commodity name, code, etc.
