I am pulling files from an HDFS directory and writing them to a Hive table using sparklyr in R.
This will be done each month, but the number of CSV files required varies from month to month.
The files are numbered as follows:
January
part0000.csv
part0001.csv
part0002.csv
....
part0453.csv
June
part0000.csv
.....
part0268.csv
I was thinking of making an initial call to get the total number of files in the directory, then looping through with a counter to grab them.
january_count = 454
june_count = 269
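To make the looping idea concrete, here is a minimal sketch in plain R of how I would turn a count into the list of file names to read. The helper name `part_file_names` is made up for illustration, and it assumes the files are always numbered from part0000.csv upward with no gaps. It does not solve the actual problem of getting the count from HDFS.

```r
# Hypothetical helper: build the expected file names for one month,
# assuming files run from part0000.csv to part<count - 1>.csv with no gaps.
part_file_names <- function(file_count) {
  if (file_count < 1) {
    return(character(0))
  }
  # seq_len() gives 1..file_count, so subtract 1 to start at 0.
  sprintf("part%04d.csv", seq_len(file_count) - 1L)
}

january_files <- part_file_names(454)
head(january_files, 3)  # "part0000.csv" "part0001.csv" "part0002.csv"
tail(january_files, 1)  # "part0453.csv"
```

Each name would then be appended to the month's HDFS directory path and read in turn, which is why I need the count up front.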
However, I am having trouble figuring out how to get the total number of files in the directory.
Recommendations?