Learning text mining technique: trouble with importing a file


#1

Hello,
When I import the .csv file, which has one comment per row (some are quite long), I get these warnings:

data = fread("Q8_Comment.csv")
Warning messages:
1: In fread("Q8_Comment.csv") :
  Detected 18 column names but the data has 14 columns. Filling rows automatically. Set fill=TRUE explicitly to avoid this warning.
2: In fread("Q8_Comment.csv") :
  Stopped early on line 69. Expected 18 fields but found 21. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<Thank you for the opportunity and consideration! I wish you the best of luck in the hiring process. Thank you again! >>

How can I approach this issue?
I cannot even share the reprex() result.

Thanks!


#2

What does the header of Q8_Comment.csv look like? And the first row?

I'd give as close to a reproducible example as you can.


When creating your reprex, note that you can inline the data, as in the one below, without having to upload any file:

library(readr)
read_csv(
  "C1,    C2,   C3
   100,   a1,   b1
   200,   a2,   b2
   300,   a3,   b3
   400,   a4,   b4")
#> # A tibble: 4 x 3
#>      C1 C2    C3   
#>   <int> <chr> <chr>
#> 1   100 a1    b1   
#> 2   200 a2    b2   
#> 3   300 a3    b3   
#> 4   400 a4    b4

Created on 2018-07-20 by the reprex package (v0.2.0.9000).


I can replicate the gist of your warning below.
This might occur if your header row is long and includes whatever character is used to separate values.

library(data.table)
fread(
  "C1,    C2,   C3, C4,
  100,   a1,   b1, 
  200,   a2,   b2, 
  300,   a3,   b3, 
  400,   a4,   b4")
#> Warning in fread("C1, C2, C3, C4,\n 100, a1, b1, \n 200, a2, b2, \n 300,
#> a3, b3, \n 400, a4, b4"): Detected 4 column names but the data has 3
#> columns. Filling rows automatically. Set fill=TRUE explicitly to avoid this
#> warning.
#>     C1, C2, C3, C4,
#> 1: 100, a1, b1,  NA
#> 2: 200, a2, b2,  NA
#> 3: 300, a3, b3,  NA
#> 4: 400, a4,  b4  NA

Created on 2018-07-20 by the reprex package (v0.2.0.9000).
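
To see which lines have a different number of fields from the header, a minimal sketch with base R's count.fields() might look like the following. The three-line file here is made up for illustration (it is not taken from Q8_Comment.csv), and it assumes the comments are unquoted and contain commas.

```r
# Hypothetical stand-in for the real file: a header plus two unquoted comments.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "comment",
  "Thank you for the opportunity, and good luck!",
  "Too long, too repetitive, and no cover letter upload"
), path)

# Count comma-separated fields on each line. Quoting and comment characters
# are switched off so apostrophes and '#' in the text are treated literally.
n_fields <- count.fields(path, sep = ",", quote = "", comment.char = "")
print(n_fields)

# Lines whose field count differs from the header line are the suspects.
suspects <- which(n_fields != n_fields[1])
print(suspects)
```

If the counts differ from line to line, the commas inside the comments are probably being treated as column separators.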


#3

Curtis:
Yes, the header is the first row.
I am learning text analytics through R for text mining.
The textbook shows how to input text using c(). For instance,

text <- c("Because I could not stop for Death -",
          "He kindly stopped for me -",
          "The Carriage held but just Ourselves -",
          "and Immortality")

My file has one comment per row and, for some reason, I cannot run:

  unnest_tokens(word, text)

However, I have found an inefficient way to get my input into a form similar to what the textbook suggests. I copied this from jcblum.

sink( "myoutput.txt")
data = fread("myfile.csv", header=FALSE)
sink()

In the .txt file, the format of text is exactly the same as the one suggested by the textbook.
I was then able to do some text analytics.

If you have a better way to do this, please suggest it.

Thank you.
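
One possible alternative to the sink() detour, sketched under some assumptions: the file has a header line followed by exactly one comment per line, and the tibble and tidytext packages are installed. The file contents and the names `path`, `comments`, `text_df` and `words` are made up for illustration.

```r
library(tibble)
library(tidytext)

# Hypothetical stand-in for the real file: a header plus one comment per line.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "comment",
  "Thank you, and good luck!",
  "Your online application was user friendly."
), path)

# readLines() returns each line as one string, so commas inside a comment
# are never treated as column separators. Drop the header line.
comments <- readLines(path)[-1]

# Same shape as the textbook example: one row per line of text.
text_df <- tibble(line = seq_along(comments), text = comments)

# One row per word, keeping the line number each word came from.
words <- unnest_tokens(text_df, word, text)
print(words)
```

This keeps each comment as a single string, which is the same form as the c() vector in the textbook.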


#4

I fail to see the connection between this reply and my post. Can you offer a reprex of the data you're trying to load that creates your error?


#5

library(tm)
#> Loading required package: NLP
library(tidytext)
library(tidyverse) 
library(forcats) 
library(dplyr)
library(data.table)
#> 
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:dplyr':
#> 
#>     between, first, last
#> The following object is masked from 'package:purrr':
#> 
#>     transpose
library(reprex)
library(wordcloud)
#> Loading required package: RColorBrewer
data = structure(list(`Still do not know if my resume loaded or not.` = c("Application discrimination is very widespread and obvious by not hiring completely qualified candidates. Age discrimination appears to be a strong factor to not hire", 
                                                                   "I wish that your site would pull work experience and auto populate.", 
                                                                   "I had no option to Opt Out of providing my SSN or DOB. I would never provide these without a job offer in hand.", 
                                                                   "I felt a little uncomfortable giving my SSN online coming from an HR background.", 
                                                                   "need to fix the year box on application.  It took 3 times to get the system to accept the year", 
                                                                   "your date functions would disappear as soon as you hit enter......I probably had to redo dates 50 times......very frustrating", 
                                                                   "There was no place to upload a cover letter separate from the resume.", 
                                                                   "I was removed from consideration for the open GM position in Greenville, SC. Upon reviewing my profile on the Patterson website, I discovered the changes to address I made did not accurately reflect. I have since made the necessary change that shows I live in the Greenville area less than 10 miles from the branch. I will also be updating my resume to reflect that change as well. I resubmitted my application in the event it was not considered due to my old California address showing. I hope to speak with someone soon to be able to discuss how I can be a great addition to the Patterson team. Thank you. Larry White", 
                                                                   "If employment history wasn't required if you would test resume why is it we also required so he answered on the second step of the employment application itself that seems contradictory", 
                                                                   "During the initial application the system locked up and I could not complete the application.", 
                                                                   "I want this job 1 min away from my home", "Following phone interview, never heard back from HR Specialist. Sent follow-up inquiry and inquired in person at a career fair. Several months and still no follow-up.", 
                                                                   "At one point in the application process, I felt that the online form was broken, and had low confidence that it would be submitted correctly.", 
                                                                   "Your online application was user friendly.", "I would have liked to add a cover letter and my resume did not populate", 
                                                                   "Took some careful time navigating", "Please allow cover letters.  There is much information to be gleaned from a cover letter that may affect the hiring decision.  Your process currently is very aseptic and impersonal.", 
                                                                   "Thank you!", "I am looking forward to potentially contributing to team and company goals a representative of Patterson Dental", 
                                                                   "But I'm one of the few that are out here still thinking that paper apps should be still offered", 
                                                                   "Too much repetition. Upload a resume, then type in the same information. I was asked about my degree 3 times.  This redundancy causes many otherwise qualified candidates to not apply.", 
                                                                   "For the application, let an applicant go back to previous sections they completed", 
                                                                   "Sounds like a great company to work for#", "Application process was very user friendly.", 
                                                                   "Entered through LinkedIn. The transfer of work experience from LinkedIn to Patterson did not transfer in correct order.", 
                                                                   "I am pretty sure that I won't be asked for an interview because my salary will be deemed too high even though I am willing to negotiate on salary for a good fit. Disappointed with required fields that force candidates to enter a salary figure.", 
                                                                   "It would be very nice if all the information on my resume would transfer to the application.", 
                                                                   "I am a U.S. citizen currently living in Costa Rica, but amd ready, willing and able to return to work in Tampa immediately.  I hope my resume on file is given proper consideration as I am extremely qualified for the position and believe I have been sidelined because i am not on U.S. soil.  Please consider me as an applicant, I truly am prefectly experienced and suited for the job at hand.  Thanks!  Howard Siegler", 
                                                                   "I like your easy format application form.", "Well done! The technology toolset was user friendly, intuitive and made the process easy.  Thanks for the consideration.", 
                                                                   "Your process checks off many of the boxes, but the key is to get back to the candidate with disposition!  Even a \"no\" is better than silence.", 
                                                                   "I didn't see where I could upload a cover letter of interest which makes me feel like my application is incomplete.", 
                                                                   "Need to allow us to upload a cover letter. Also a different cover letter and/or resume for a different position we may want to apply for.", 
                                                                   "I'm excited to tell you why I would be a great fit with your company", 
                                                                   "It was a pleasure applying for this position and would do it agin thank you for everything", 
                                                                   "I am very interested in this position and look forward to hearing from you!", 
                                                                   "Would have liked to add a more personal touch (ie submit cover letter, etc)", 
                                                                   "Great company to work to for", "I would love an apportunity to be a Patterson team employee", 
                                                                   "I was not able to erase a wrong entry. I put my explanation in the next comment section.", 
                                                                   "Overall process is smooth, would've been nice if the 7 year work history was picked up off the resume and pre-filled. To be fair it may be due to my uploading a word doc vs. plain text.", 
                                                                   "There are no questions on here pertaining to the application itself.  Seems very outdated.  Almost all the other sites I have used in the last month autofill fields in the employment section once resume is submitted.  With your site, you have to submit resume (which has ALL the info), then retype all the employment info.  Seems redundant.  And the fields, usability is not very streamline.  Found myself more frustrated than anything.", 
                                                                   "Mis-spelling in your survey!", "Your company uses and shows me there is wide spread professionalism and structure within the system.  This is high on my list of priorities with my next job position.", 
                                                                   "I would have rated my experience as Satisfied, but I was disappointed that a cover letter could not be part of the process.  I feel a cover letter allows a candidate to provide insights that are not easily conveyed in a resume.", 
                                                                   "Thanks", "I was wondering if I would be informed that my application had been received and reviewed by a company representative.", 
                                                                   "Maybe call or text candidates to acknowledge the time they spent applying instead of an automatic e-mail assuring me that my application and 20 minutes of time just went into a bucket.", 
                                                                   "Have not heard anything back since application was submitted 10 days ago", 
                                                                   "This was the second time that I applied for the same position.  I was not asked to submit a new resume or cover letter - I figured that they will use the one I already submitted."
)), row.names = c(NA, -50L), class = c("tbl_df", "tbl", "data.frame"
)) 

data1 = fread(data)
#> Error in fread(data): 'input' must be a single character string containing a file name, a system command containing at least one space, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or, the input data itself containing at least one \n or \r

Created on 2018-07-20 by the reprex package (v0.2.0).
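
The error message says fread() wants a file name or raw text, while `data` here is already a tibble, so no second read should be needed. A minimal sketch of carrying on from that object, using a made-up two-row stand-in rather than the full 50 rows:

```r
library(tibble)

# Hypothetical two-row stand-in for `data` above. Its only column is named
# after the first comment, because that line was read as the header.
data <- tibble(
  `Still do not know if my resume loaded or not.` = c("Thank you!", "Thanks")
)

# Give the column a usable name and add a line number; no fread() call needed.
names(data) <- "text"
data$line <- seq_len(nrow(data))
print(data)
```

Note that the first comment ended up as the column name, which suggests the file has no real header and may be better read with header = FALSE.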


#6

I am sorry.
I meant to say I resolved the problem.
I edited my code and I don't remember what caused the error.
Right now, off the top of my head, I would say the commands below gave me an error:

data = fread("Q8_Comments.csv", header  = FALSE)
text = data_frame(line = 1:386, text = data)
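
One plausible cause, offered as a guess rather than a diagnosis: `data` is a whole data.table, not a character vector, and the hard-coded 1:386 only works if the row count matches exactly. A minimal sketch of the same step that avoids both, using a made-up two-row data.table in place of the fread() result, and tibble() in place of the deprecated data_frame():

```r
library(data.table)
library(tibble)

# Hypothetical stand-in for the result of fread(..., header = FALSE):
# a one-column data.table whose column is named V1.
data <- data.table(
  V1 = c("Thank you!", "Your online application was user friendly.")
)

# Pull the single column out as a character vector, and number the rows
# from the data itself instead of hard-coding the row count.
text <- tibble(line = seq_len(nrow(data)), text = data[[1]])
print(text)
```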