rama27
November 3, 2020, 12:25pm
Hi, I have the following problem. I downloaded the HTML body of a table and saved it as myfile.txt. See the header and the first two rows below:
<tr class="header">
<th class="display_name"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Beneficiary</font></font></th>
<th class="district_display"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Location of the beneficiary</font></font></th>
<th class="formatted_year"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Period of payments received</font></font></th>
<th class="sum"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">EAGF and EAFRD, EUR</font></font></th>
</tr>
<tr onclick="show_hide_tr('pd_1');" style="cursor: pointer; width: 100%;" class=" row1"><td class="display_name"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Dzidra Breidaga</font></font></td>
<td class="district_display"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Salacgriva county</font></font></td>
<td class="formatted_year"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">2017-2018</font></font></td>
<td class="sum"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">
4491.20
</font></font></td>
</tr>
<tr onclick="show_hide_tr('pd_2');" style="cursor: pointer; width: 100%;" class=" row1"><td class="display_name"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Jānis Mikijanskis</font></font></td>
<td class="district_display"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Ludzas nov.</font></font></td>
<td class="formatted_year"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">2017-2018</font></font></td>
<td class="sum"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">
2926.31
</font></font></td>
</tr>
I would like to read this file as a df in R. I tried this:
library(rvest)
adresa <- 'C:/Users/.../myfile.txt'
table <- html_nodes(adresa, "table")
But I got an error "Error in UseMethod("xml_find_all") :
no applicable method for 'xml_find_all' applied to an object of class "character""
Desired output is:
Beneficiary        Location of the beneficiary  Period of payments received  EAGF and EAFRD, EUR
Dzidra Breidaga    Salacgriva county            2017-2018                    4491.20
Jānis Mikijanskis  Ludzas nov.                  2017-2018                    2926.31
How can I fix it please? Thanks
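First, you need to use xml2::read_html() to parse the text as HTML/XML; html_nodes() has no method for a plain character string such as a file path, which is what the error is telling you. Second, you can't select a node that isn't present in the data: the file contains only <tr> rows, so you need to wrap them in <table></table> before you can select the table. The reprex below shows how to fix this.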
tb_text <- '<tr class="header">
<th class="display_name"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Beneficiary</font></font></th>
<th class="district_display"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Location of the beneficiary</font></font></th>
<th class="formatted_year"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Period of payments received</font></font></th>
<th class="sum"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">EAGF and EAFRD, EUR</font></font></th>
</tr>
<tr onclick="show_hide_tr(\'pd_1\');" style="cursor: pointer; width: 100%;" class=" row1"><td class="display_name"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Dzidra Breidaga</font></font></td>
<td class="district_display"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Salacgriva county</font></font></td>
<td class="formatted_year"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">2017-2018</font></font></td>
<td class="sum"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">
4491.20
</font></font></td>
</tr>
<tr onclick="show_hide_tr(\'pd_2\');" style="cursor: pointer; width: 100%;" class=" row1"><td class="display_name"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Janis Mikijanskis</font></font></td>
<td class="district_display"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Ludzas nov.</font></font></td>
<td class="formatted_year"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">2017-2018</font></font></td>
<td class="sum"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">
2926.31
</font></font></td>
</tr>'
library(tidyverse)
library(rvest)
# you can't select a node that isn't present
read_html(tb_text) %>%
html_nodes("table")
#> {xml_nodeset (0)}
# you can select it when present
read_html(paste("<table>",tb_text,"</table>")) %>%
html_nodes("table") %>%
html_table()
#> [[1]]
#> Beneficiary Location of the beneficiary Period of payments received
#> 1 Dzidra Breidaga Salacgriva county 2017-2018
#> 2 Janis Mikijanskis Ludzas nov. 2017-2018
#> EAGF and EAFRD, EUR
#> 1 4491.20
#> 2 2926.31
Created on 2020-11-03 by the reprex package (v0.3.0)
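The reprex above works on a string pasted into the script, while the question starts from a file on disk. A minimal sketch of the same idea applied to a file could look like the following. The helper name read_tr_file and the UTF-8 encoding are assumptions for illustration only (neither comes from rvest nor from the original post), and the demo writes its own small temporary file rather than using the real myfile.txt.

```r
library(rvest)

# Hypothetical helper (not part of rvest): read a file that holds only
# <tr> rows, wrap them in <table> tags, and return a data frame.
# The UTF-8 default is an assumption; adjust it to how the file was saved.
read_tr_file <- function(path, encoding = "UTF-8") {
  # Collapse the file's lines into a single string of HTML
  rows <- paste(readLines(path, encoding = encoding, warn = FALSE),
                collapse = "\n")
  # Add the missing <table> element so there is a node to select
  doc <- read_html(paste0("<table>", rows, "</table>"))
  # html_node() returns the first match, so html_table() gives one data frame
  html_table(html_node(doc, "table"))
}

# Self-contained demo with a temporary file standing in for myfile.txt
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  '<tr class="header"><th>Beneficiary</th><th>EAGF and EAFRD, EUR</th></tr>',
  '<tr class="row1"><td>Dzidra Breidaga</td><td>',
  '4491.20',
  '</td></tr>'
), tmp)

df <- read_tr_file(tmp)
print(df)
```

For the real file you would call read_tr_file() with the full path to myfile.txt. If accented names come out garbled, the encoding argument is the first thing I would check.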