Web scraping - How to identify a request's crucial information that needs to be sent?
I wanted to scrape fares from this website. The site uses a request for autocompletion. This is my code:
import scrapy
from scrapy.http import Request, FormRequest
import urllib

class CabforceSpider(scrapy.Spider):
    name = 'cabforce'
    start_urls = ['https://www.cabforce.com']
    complete_url = 'https://www.cabforce.com/v1/geo/autocomplete'

    def parse(self, response):
        payload = {
            'chnl': 'cforce',
            'complete': 'Barcelona Airport',
            'destination': 'Barcelona'
        }
        return Request(
            self.complete_url,
            self.print_json,
            method='POST',
            body=urllib.urlencode(payload),
            headers={'X-Requested-With': 'XMLHttpRequest'})

    def print_json(self, response):
        print response.body
Unfortunately, the response looks like this:
{"status":"argumenterror","reason":"cannot validate input","description":null,"reasontype":2000,"details":[]}
How do I find out what information is missing and needs to be sent with the request? I thought of the JSESSIONID and the version, but I couldn't figure out how to do that. Thanks for any hints, and have a lovely day!
You do not need any cookies to send this request. The problem is with

body=urllib.urlencode(payload),

This encodes the body in URL format, but if you look at the body of the request your browser sends, you will see a JSON body.
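To see the difference, you can compare the two encodings directly (Python 2, as in the question; the key order in the output may vary):

import urllib
import json

payload = {'chnl': 'cforce', 'complete': 'Barcelona Airport', 'destination': 'Barcelona'}

# URL-encoded form body, e.g. chnl=cforce&complete=Barcelona+Airport&destination=Barcelona
print urllib.urlencode(payload)

# JSON body, e.g. {"chnl": "cforce", "complete": "Barcelona Airport", "destination": "Barcelona"}
print json.dumps(payload)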
So the solution is to import json and change the line mentioned above to this one:

body=json.dumps(payload),
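Put together, a minimal sketch of the corrected spider could look like this (the Content-Type header is my addition and may not be strictly required, but it is the usual thing to send with a JSON body):

import json
import scrapy
from scrapy.http import Request

class CabforceSpider(scrapy.Spider):
    name = 'cabforce'
    start_urls = ['https://www.cabforce.com']
    complete_url = 'https://www.cabforce.com/v1/geo/autocomplete'

    def parse(self, response):
        payload = {
            'chnl': 'cforce',
            'complete': 'Barcelona Airport',
            'destination': 'Barcelona'
        }
        return Request(
            self.complete_url,
            self.print_json,
            method='POST',
            # JSON-encode the payload instead of URL-encoding it
            body=json.dumps(payload),
            headers={
                'X-Requested-With': 'XMLHttpRequest',
                # Assumption: not in the original code, but typical for JSON requests
                'Content-Type': 'application/json'
            })

    def print_json(self, response):
        print response.body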
In this case I get the following result with your spider:
{"status":"ok","result":{"autocomplete":{"elements":[{"type":16,"description":"(bcn) - barcelona airport, barcelona, spain","location":{"lat":41.289545,"lng":2.072639},"raw":{"name":"(bcn) - barcelona airport","city":"barcelona","country":"spain"}},{"location":{"lat":41.3181887517739,"lng":2.07441323388724},"description":"barcelona airport hotel, plaza volaterÃa, 3, el prat de llobregat, spain","raw":{"name":"barcelona airport hotel","city":"el prat de llobregat","country":"spain"},"type":4},{"location":{"lat":41.3176275,"lng":2.0249774},"description":"airport barcelona apartments, rafael casanova, 37, viladecans, spain","raw":{"name":"airport barcelona apartments","city":"viladecans","country":"spain"},"type":4}]}}}