Thanks for that idea!
I tried Drill, but for some reason the CSV comes back as a single column of data, even though selecting individual columns works from the Drill console.
I get output like:
# Source: table<dfs.`/tmp/data.csv`> [?? x 1]
# Database: DrillConnection
columns
<chr>
1 "[\"fname\",\"lname\",\"score\"]"
2 "[\"mark\",\"smith\",\"1\"]"
3 "[\"betty\",\"wilson\",\"2\"]"
4 "[\"jim\",\"mccoy\",\"3\"]"
Here is the example code:
library(tibble)
library(readr)
library(dplyr)
library(sergeant)
#write out example csv file
df1 <- tibble(
'fname'=c('mark','betty','jim'),
'lname'=c('smith','wilson','mccoy'),
'score'=1:3
)
write_csv(df1,'/tmp/data.csv')
#startup drill
#check out: https://drill.apache.org/docs/drill-in-10-minutes/
#in another shell run
#bin/drill-embedded --verbose
#works from drill
#0: jdbc:drill:zk=local> select columns[1],columns[2] from dfs.`/tmp/data.csv`;
db <- src_drill('localhost')
#sees the file but does not split into columns?
drill_df <- tbl(db, "dfs.`/tmp/data.csv`")
drill_df
# Source: table<dfs.`/tmp/data.csv`> [?? x 1]
# Database: DrillConnection
columns
<chr>
1 "[\"fname\",\"lname\",\"score\"]"
2 "[\"mark\",\"smith\",\"1\"]"
3 "[\"betty\",\"wilson\",\"2\"]"
4 "[\"jim\",\"mccoy\",\"3\"]"
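As a stopgap, the rows could presumably be split on the R side once they are collected. This is only a minimal base R sketch, not a fix for the Drill side. It assumes the four strings shown above are exactly what comes back, and that no field contains a comma, quote, or bracket. The names raw, fields, and parsed are just illustrative.

```r
# Strings copied from the output above: each row is a JSON-style array in one column
raw <- c(
  '["fname","lname","score"]',
  '["mark","smith","1"]',
  '["betty","wilson","2"]',
  '["jim","mccoy","3"]'
)

# Strip the brackets and double quotes, then split each row on commas
# (assumption: no field contains a comma, a quote, or a bracket)
fields <- strsplit(gsub('[]["]', "", raw), ",", fixed = TRUE)

# The first row holds the header; the remaining rows are data
parsed <- as.data.frame(do.call(rbind, fields[-1]), stringsAsFactors = FALSE)
names(parsed) <- fields[[1]]

# Everything arrives as text, so convert score back to integer by hand
parsed$score <- as.integer(parsed$score)
parsed
```

This gets back a three-column data frame, but it means pulling every row into R first, so I would still prefer that Drill split the columns itself.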
Thanks for any input,
John