xml - Load a table from Wikipedia into R

- June 15, 2012

I'm trying to load the table of Supreme Court justices into R from the following URL: https://en.wikipedia.org/wiki/List_of_Justices_of_the_Supreme_Court_of_the_United_States

I'm using the following code:

    scotusurl <- "https://en.wikipedia.org/wiki/List_of_Justices_of_the_Supreme_Court_of_the_United_States"
    scotusdata <- getURL(scotusurl, ssl.verifypeer = FALSE)
    scotusdoc <- htmlParse(scotusdata)
    scotusdata <- scotusdoc['//table[@class="wikitable"]']
    scotustable <- readHTMLTable(scotusdata[[1]], stringsAsFactors = FALSE)

R returns scotustable as NULL. The goal here is to get a data.frame in R that I can use to make a ggplot of SCOTUS justice tenure on the court. I previously had the script working to make an awesome plot, but after the recent decisions something changed on the page, and now the script no longer functions. I went through the HTML on Wikipedia to try to find any changes, but I'm not a web developer, so whatever would break my script isn't immediately apparent.

Additionally, is there a method in R that would allow me to cache the data from this page so I'm not constantly referencing the URL? That would seem to be the ideal way to avoid this issue in the future. I appreciate the help.

As an aside, SCOTUS is an ongoing hobby/side project of mine, so if there's another data source out there that's better than Wikipedia, I'm all ears.

Edit: Sorry, I should have listed my dependencies. I'm using the XML, plyr, RCurl, data.table, and ggplot2 libraries.

If you don't mind using a different package, you can try the "rvest" package.

    library(rvest)
    scotusurl <- "https://en.wikipedia.org/wiki/List_of_Justices_of_the_Supreme_Court_of_the_United_States"

Option 1: Grab the tables from the page and use the html_table function to extract the tables you're interested in.

    temp <- scotusurl %>%
      read_html() %>%
      html_nodes("table")

    html_table(temp[1]) ## "legend" table
    html_table(temp[2]) ## table you're interested in

Option 2: Inspect the table element and copy the XPath to read that table directly (right-click, Inspect Element, scroll to the relevant "table" tag, right-click on it, and select "Copy XPath").
```
scotusurl %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="mw-content-text"]/table[2]') %>%
  html_table()
```
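
Both options pick the table by its position on the page, so they can break again if Wikipedia adds or removes a table above it. A more defensive variant is to choose the table by one of its column headers. The following is only a sketch: the inline HTML is a made-up stand-in for the real page so that it runs offline, `find_table` is a hypothetical helper (not part of rvest), and the column name "Justice" is an assumption about the real table's header.

```r
library(rvest)

# Made-up stand-in for the Wikipedia page so the sketch runs offline.
# For the real page you would call read_html() on the article URL instead.
page <- read_html('
  <html><body>
    <table class="wikitable">
      <tr><th>Key</th></tr>
      <tr><td>legend entry</td></tr>
    </table>
    <table class="wikitable">
      <tr><th>Justice</th><th>Tenure</th></tr>
      <tr><td>A</td><td>10</td></tr>
      <tr><td>B</td><td>20</td></tr>
    </table>
  </body></html>')

tables <- html_nodes(page, "table.wikitable")

# Hypothetical helper: return the first table that has a column with the
# requested name, or NULL if no table matches.
find_table <- function(nodes, column) {
  for (i in seq_along(nodes)) {
    tbl <- html_table(nodes[[i]])
    if (column %in% names(tbl)) {
      return(tbl)
    }
  }
  NULL
}

# "Justice" is an assumed header name; check it against the live page.
justices <- find_table(tables, "Justice")
```

If `find_table` returns NULL, the header you searched for is no longer on the page, which is an easier failure to diagnose than silently getting the wrong table.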

Another option is loading the data into a Google spreadsheet and reading it using the "googlesheets" package.

In Google Drive, create a new spreadsheet named, for instance, "Supreme Court". In the first worksheet, enter:

    =importhtml("https://en.wikipedia.org/wiki/List_of_Justices_of_the_Supreme_Court_of_the_United_States", "table", 2)

This will automatically scrape the table into your Google spreadsheet.

From there, in R you can do:

    library(googlesheets)
    sc <- gs_title("Supreme Court")
    gs_read(sc)
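
On the caching part of the question: one simple approach is to save the scraped data.frame to disk with base R's saveRDS() and read it back with readRDS(), so the page is only fetched when no local copy exists. This is a minimal sketch, not a definitive implementation: `cached_table` and `fetch_justices` are hypothetical names, the file name is arbitrary, and `fetch_justices` returns made-up rows in place of the real scrape so that the example runs offline.

```r
# Hypothetical helper: return the cached copy if one exists; otherwise call
# fetch(), store its result in cache_file, and return it.
cached_table <- function(cache_file, fetch) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  tbl <- fetch()
  saveRDS(tbl, cache_file)
  tbl
}

# Stand-in for the real scrape (made-up rows). In practice this function
# would contain whichever scraping code you settled on.
fetch_justices <- function() {
  data.frame(justice = c("A", "B"),
             years = c(10, 20),
             stringsAsFactors = FALSE)
}

# tempdir() keeps the sketch tidy; use a path in your project for a cache
# that survives between R sessions.
cache_file <- file.path(tempdir(), "scotus_justices.rds")
scotustable <- cached_table(cache_file, fetch_justices)
```

Deleting the .rds file forces a fresh download the next time the script runs, which is a convenient way to refresh the data after the court's membership changes.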
