Scrapy xpath returns an empty list although tag and syntax are correct

- August 15, 2013

In my parse function, here is the code I have written:

    hs = Selector(response)
    links = hs.xpath(".//*[@id='requisitionlistinterface.listrequisition']")
    items = []
    for x in links:
        item = crawlsiteitem()
        item["title"] = x.xpath('.//*[contains(@title, "view job description")]/text()').extract()
        items.append(item)
    return items

But title comes back as an empty list.

I select the element with that id into links, and inside links I want the list of text values from the elements whose title contains "view job description".

Please help me fix the error in my code.

If you make a curl request to the URL you provided, curl "https://cognizant.taleo.net/careersection/indapac_itbpo_ext_career/moresearch.ftl?lang=en", you will see that the site is very different from the one you see in the browser. The search results sit in the following <a> element, which does not have any text() to select:

<a id="requisitionlistinterface.reqtitlelinkaction"
   title="view job description"
   href="#"
   onclick="javascript:setevent(event);requisition_openrequisitiondescription('requisitionlistinterface','actopenrequisitiondescription',_ftl_api.lstval('requisitionlistinterface', 'requisitionlistinterface.id5645', this),_ftl_api.intval('requisitionlistinterface', 'requisitionlistinterface.id5649', this));return ftlutil_followlink(this);"> </a>
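To check this behaviour without running the spider, here is a minimal sketch using Python's standard-library ElementTree instead of Scrapy. The markup is a simplified copy of the anchor above: the onclick attribute is left out, and the wrapping div with the listrequisition id is an assumption made only for illustration. The element is found, but it carries no usable text.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the anchor from the raw response. The onclick attribute
# is omitted and the wrapping <div> is an assumption for illustration only.
RAW = (
    '<div id="requisitionlistinterface.listrequisition">'
    '<a id="requisitionlistinterface.reqtitlelinkaction" '
    'title="view job description" href="#"> </a>'
    '</div>'
)

root = ET.fromstring(RAW)

# ElementTree supports only a subset of XPath, so this uses an exact
# attribute match rather than contains().
matches = root.findall(".//a[@title='view job description']")

# Collect the text of every match and drop whitespace-only values.
texts = [(a.text or "").strip() for a in matches]
non_empty = [t for t in texts if t]

print(len(matches), non_empty)
```

So the expression itself is fine. There is simply nothing to extract from the page as it is served before the XHR response fills it in.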

This is because the site loads the displayed information with an XHR request (you can see it in Chrome, for example), and the page is then updated dynamically with the returned information.

For the information you want to extract, you should find this XHR request (it is not hard, because there is only one) and call it from your scraper. From the resulting dataset you can extract the required data, but you have to write a parsing algorithm that goes through the pipe-separated format, splits it into job postings, and extracts the information you need, such as position, id, date and location.
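The parsing step could look roughly like the sketch below. Everything specific in it is an assumption: the sample payload is made up, the real response may use a different field order, a different number of fields per posting, or extra separators, and the names split_postings, parse_postings and FIELDS are hypothetical. Inspect the actual XHR response and adjust it before relying on anything like this.

```python
# Assumed field order per posting; the real response may differ.
FIELDS = ("position", "id", "date", "location")


def split_postings(payload, fields_per_posting):
    """Split a pipe-separated payload into fixed-size records."""
    tokens = payload.split("|")
    # Ignore a trailing partial record, if any.
    usable = len(tokens) - len(tokens) % fields_per_posting
    return [
        tokens[i:i + fields_per_posting]
        for i in range(0, usable, fields_per_posting)
    ]


def parse_postings(payload):
    """Turn each record into a dict keyed by the assumed field names."""
    return [
        dict(zip(FIELDS, record))
        for record in split_postings(payload, len(FIELDS))
    ]


# Made-up sample data, not taken from the real site.
SAMPLE = "Analyst|00001|Aug 15, 2013|Chennai|Developer|00002|Aug 14, 2013|Pune"

jobs = parse_postings(SAMPLE)
for job in jobs:
    print(job["position"], job["id"], job["date"], job["location"])
```

Splitting into fixed-size chunks only works if every posting has the same number of fields. If the real data marks record boundaries differently, split on that marker first and on the pipe afterwards.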
