Scrapy XPath returns an empty list although the tag and syntax are correct
In the parse function, here is the code I have written:
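    from scrapy.selector import Selector  # at module level

    # body of parse(self, response)
    hs = Selector(response)
    links = hs.xpath(".//*[@id='requisitionlistinterface.listrequisition']")
    items = []
    for x in links:
        item = crawlsiteitem()
        # collect the text of every element whose title contains the phrase
        item["title"] = x.xpath('.//*[contains(@title, "view job description")]/text()').extract()
        items.append(item)
    return items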
and title returns an empty list.
I am capturing the element with that id in links, and within links I want a list of the values whose title contains "view job description".
Please help me fix the error in the code.
If you make a curl request to the URL provided:
    curl "https://cognizant.taleo.net/careersection/indapac_itbpo_ext_career/moresearch.ftl?lang=en"
you will see that the site is very different from the one you see in the browser. Your search results in the following <a> element, which does not have any text() to select, only attributes:
<a id="requisitionlistinterface.reqtitlelinkaction" title="view job description" href="#" onclick="javascript:setevent(event);requisition_openrequisitiondescription('requisitionlistinterface','actopenrequisitiondescription',_ftl_api.lstval('requisitionlistinterface', 'requisitionlistinterface.listrequisition', 'requisitionlistinterface.id5645', this),_ftl_api.intval('requisitionlistinterface', 'requisitionlistinterface.id5649', this));return ftlutil_followlink(this);"> </a>
This is because the site loads the information it displays through an XHR request (you can see this in Chrome, for example), and the page is then updated dynamically with the returned information.
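To make the point about the empty <a> element concrete, here is a minimal sketch. It uses Python's standard-library ElementTree rather than Scrapy's selector, and the markup is a simplified, hypothetical stand-in for what curl returns (the onclick handler is left out). It only illustrates that the anchor matches on its title attribute while carrying no usable text.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for the markup returned to curl;
# the onclick handler is omitted to keep the snippet short.
SAMPLE = """
<div id="requisitionlistinterface.listrequisition">
  <a id="requisitionlistinterface.reqtitlelinkaction"
     title="view job description" href="#"> </a>
</div>
"""

root = ET.fromstring(SAMPLE)

# Select the anchor by its title attribute instead of by its text.
anchors = root.findall(".//a[@title='view job description']")

for a in anchors:
    # The element exists, but its text is only whitespace.
    text = (a.text or "").strip()
    print(repr(text), a.get("title"), a.get("id"))
```

In a Scrapy spider the comparable move would be to select an attribute, for example by ending the XPath in /@title instead of /text(). That still would not give you the job titles, because they are not in this HTML at all.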
To get the information you want to extract, you should find this XHR request (it is not hard because there is only one) and call it from your scraper. From the resulting dataset you can extract the required data; you have to create a parsing algorithm that goes through the pipe-separated format, splits it into job postings, and then extracts the information you need, such as position, id, date and location.
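The parsing step described above could be sketched roughly as follows. This is only an illustration under loud assumptions: the field order in FIELDS, the fixed number of fields per posting, the function name parse_postings and the sample payload are all made up here, and the real layout has to be read off the actual XHR response.

```python
from typing import Dict, List

# Hypothetical field order; check it against the real XHR payload.
FIELDS = ["position", "id", "location", "date"]


def parse_postings(payload: str, fields: List[str] = FIELDS) -> List[Dict[str, str]]:
    """Split a pipe-separated payload into one dict per job posting."""
    tokens = [t.strip() for t in payload.split("|")]
    size = len(fields)
    postings = []
    # Walk the tokens in chunks of `size`; an incomplete trailing chunk is ignored.
    for start in range(0, len(tokens) - size + 1, size):
        chunk = tokens[start:start + size]
        postings.append(dict(zip(fields, chunk)))
    return postings


# Made-up sample data, not taken from the real site.
sample = "Developer|101|Chennai|Jan 5|Tester|102|Pune|Jan 6"
for posting in parse_postings(sample):
    print(posting)
```

If the real response uses a different delimiter or a variable number of fields per posting, the chunking logic has to change accordingly.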