asp.net - Scraping using Html Agility Pack
I am trying to scrape data from a news article using HtmlAgilityPack. The link is as follows: http://www.ndtv.com/india-news/vyapam-scam-documents-show-chief-minister-shivraj-chouhan-delayed-probe-780528
I have written the code below to extract the comments in the article, but for some reason the variable aTags is returning a null value.
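Code:
// Requires: using HtmlAgilityPack;
var getHtmlWeb = new HtmlWeb();
var document = getHtmlWeb.Load(txtInputUrl.Text);
// SelectNodes returns null (not an empty list) when nothing matches.
var aTags = document.DocumentNode.SelectNodes("//div[@class='com_user_text']");
int counter = 1;
if (aTags != null)
{
    foreach (var aTag in aTags)
    {
        // Prefix each comment with its number; appending lblOutput.Text to itself would duplicate the output.
        lblOutput.Text += counter + ". " + aTag.InnerHtml + "\t" + "<br />";
        counter++;
    }
}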
I have also tried the XPath //div[@class='newcomment_list']/ul/li/div[@class='headerwrap']/div[@class='com_user_text'] but still get the same result. Please help me correct the XPath to extract the comments. I have searched on the net but found no solution.
Do a 'View Source' on the page and search for com_user_text. The user comments don't appear at all. They are loaded via JavaScript after the page has loaded. When you load the page content via getHtmlWeb.Load(), you don't get the user comments.
As this answer says, Html Agility Pack is not a tool capable of emulating a browser and running JavaScript. Instead, you need something like WatiN, which "allows programmatic access to web pages through a given browser engine and will load the full document."
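The 'View Source' check can also be done in code before spending time on the XPath. The following is only a rough sketch: the class name StaticHtmlCheck and the method HasClass are hypothetical helpers, not part of Html Agility Pack, and the regular expression is a quick test on raw markup rather than a real HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Html Agility Pack): a rough check of
// whether the HTML as delivered by the server mentions a given CSS class.
public static class StaticHtmlCheck
{
    // Returns true when some class attribute in rawHtml lists className
    // as one of its whitespace-separated tokens.
    public static bool HasClass(string rawHtml, string className)
    {
        if (string.IsNullOrEmpty(rawHtml) || string.IsNullOrEmpty(className))
            return false;

        // Capture the value of every class="..." or class='...' attribute.
        var matches = Regex.Matches(
            rawHtml,
            @"class\s*=\s*([""'])(.*?)\1",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        foreach (Match match in matches)
        {
            var tokens = match.Groups[2].Value.Split(
                new[] { ' ', '\t', '\r', '\n' },
                StringSplitOptions.RemoveEmptyEntries);

            if (Array.IndexOf(tokens, className) >= 0)
                return true;
        }

        return false;
    }
}
```

If HasClass returns false for the string you get back from the server (for example from WebClient.DownloadString), the elements are being added later by JavaScript, and no XPath run against that response will find them.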