Help! I have a problem using readHTMLTable()

Question: I use readHTMLTable() from the XML package to scrape a table from a web page (the 高考网/Gaokao.com page "上海健康医学院2019年浙江本科分专业录取分数线", i.e. the 2019 Zhejiang undergraduate admission cutoff scores by major for Shanghai University of Medicine & Health Sciences). But the table is unusual: it looks like the author merged some cells. So when I scrape it, the result is weird, as you can see in the picture. What should I do? I heard there are a lot of experts in this community, so I came here. Thanks a lot!
[Screenshot of the readHTMLTable() result]
Here is my code:

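```r
library(RCurl)  # getURL()
library(XML)    # htmlParse(), readHTMLTable()
# url: address of the page above; myHttpheader: request headers (given below)
temp   <- getURL(url, httpheader = myHttpheader, .encoding = "GB2312")
temp1  <- iconv(temp, "GB2312", "UTF-8")                 # convert the GB2312 page to UTF-8
doc    <- htmlParse(temp1, asText = TRUE, encoding = "UTF-8")
table1 <- readHTMLTable(doc, header = TRUE, which = 1)   # first <table> on the page
```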
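One way to confirm that the page really contains merged cells is to list the cells that carry a colspan or rowspan attribute. The sketch below does this with the XML package's xpathSApply(); the small HTML string is a hypothetical stand-in for the real page, and with the real page the doc object from the code above could be queried instead.

```r
library(XML)

# Hypothetical snippet standing in for the real page: one cell spans two columns
html <- '<table>
  <tr><th>a</th><th>b</th><th>c</th></tr>
  <tr><td colspan="2">merged</td><td>x</td></tr>
</table>'
doc <- htmlParse(html, asText = TRUE)

# Text of every <td> that has a colspan or rowspan attribute
merged_text <- xpathSApply(doc, "//td[@colspan or @rowspan]", xmlValue)
# The colspan value of each horizontally merged cell
spans <- xpathSApply(doc, "//td[@colspan]", xmlGetAttr, "colspan")

merged_text
spans
```

If this returns any cells, the page uses merged cells and readHTMLTable()'s column alignment should be checked.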

What package is getURL() from, and what is the content of myHttpheader?

Sorry, getURL() is from the RCurl package, and myHttpheader is:

```r
myHttpheader <- c(
  "User-Agent" = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.48 Safari/537.36 QQBrowser/7.7.31732.400",
  "Accept" = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
  "Accept-Language" = "en-us",
  "Connection" = "keep-alive",
  "Accept-Charset" = "GB2312,utf-8;q=0.7,*;q=0.7",
  "Referer" = "http://t.dianping.com/"
)
```

Help! This has been a big problem for me.

It's hard when I don't speak the language... I made a simple example to show that XML might not give the desired behaviour compared to rvest. Maybe there are even other packages that would do better.

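```r
library(XML)
library(rvest)

# htmltablewithmerge.html: a minimal table in which one cell spans two columns
html <- '<body>
  <table style="border: 1px solid black;">
    <tr><th>head1</th><th>head2</th><th>head3</th></tr>
    <tr><td colspan="2">merged text 1 &amp; 2 ?</td><td>text3</td></tr>
  </table>
</body>'
writeLines(html, "htmltablewithmerge.html")

# XML: readHTMLTable() may not spread the merged cell across both columns
(table1 <- readHTMLTable("htmltablewithmerge.html", which = 1))

# rvest: html_table() takes the colspan into account
read_html("htmltablewithmerge.html") %>%
  html_node("table") %>%
  html_table()
```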
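The real score table may also merge cells vertically (rowspan) rather than horizontally. Below is a minimal sketch, assuming rvest 1.0 or later, whose html_table() repeats a merged cell's value in every cell it covers; the table and its column names are hypothetical.

```r
library(rvest)

# Hypothetical table: the "A" cell spans two rows (rowspan="2"),
# like one major listed over several score rows
html <- '<table>
  <tr><th>major</th><th>year</th><th>score</th></tr>
  <tr><td rowspan="2">A</td><td>2018</td><td>500</td></tr>
  <tr><td>2019</td><td>510</td></tr>
</table>'

# read_html() treats a string containing "<" as literal HTML
tbl <- read_html(html) %>%
  html_node("table") %>%
  html_table()

tbl  # "A" appears in both rows of the major column
```

For the actual page, the UTF-8 string temp1 from the question's code could be passed to read_html() in place of html; whether the first table on that page is the right one has not been checked here.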

Thank you so much! I tried your method and it worked. I'm a student and sometimes I'm really desperate for help. Thanks again!
