Mechanize Python page download does not work with HTTPS
I'm on Linux Mint 13 Xfce 32-bit (kernel 3.2.0-7) with Python 2.7.3. I'm trying to read the source code of a webpage protected by HTTPS. Here's my little program:
#!/usr/bin/env python
import mechanize

browser = mechanize.Browser()
browser.set_handle_robots(False)
browser.set_handle_equiv(False)
browser.addheaders = [
    ('User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'),
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'),
    ('Accept-Encoding', 'gzip, deflate, sdch'),
    ('Accept-Language', 'en-US,en;q=0.8,ru;q=0.6'),
    ('Cache-Control', 'max-age=0'),
    ('Connection', 'keep-alive'),
]

html = browser.open('https://scholar.google.com/citations?view_op=search_authors')
print html.read()
But instead of the source code of the page, I see this:
What's the problem, and how do I fix it? I need to use mechanize, since I need to interact with the page later on.
Your code works for me once I remove the line

('Accept-Encoding', 'gzip, deflate, sdch'),

so that there is no encoding to reverse afterwards. To clarify: you are getting the content, but you expect it in "clear text". To get clear text, do not request gzipped content.
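If you do want to keep the Accept-Encoding header, you would have to undo the compression yourself. Here is a minimal sketch of that alternative (not part of the original answer), using Python 2's standard gzip and StringIO modules; whether decompression is needed depends on the Content-Encoding header the server actually sends back:

import gzip
import StringIO

response = browser.open('https://scholar.google.com/citations?view_op=search_authors')
data = response.read()
# Only decompress if the server actually returned gzipped content.
if response.info().get('Content-Encoding') == 'gzip':
    data = gzip.GzipFile(fileobj=StringIO.StringIO(data)).read()
print data

Simply dropping the Accept-Encoding header, as recommended above, avoids this extra step entirely.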