parsing - urllib2 returning nothing in Python

I'm confused. Can anyone tell me what the problem is? This code used to work, but it started returning nothing yesterday. I did not make any changes to it. Does anyone have an idea?

    import re
    from re import sub
    import time
    import cookielib
    from cookielib import CookieJar
    import urllib2
    from urllib2 import urlopen
    import difflib
    import requests


    def twitparser():
        try:
            cj = CookieJar()
            opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
            res = opener.open('https://twitter.com/haberturk')
            html = res.read()

            splitsource = re.findall(r'<p class="js-tweet-text tweet-text">(.*?)</p>', html)
            print len(splitsource)

            for item in splitsource:
                atweet = re.sub(r'<.*?>', '', item)
                print atweet

        except Exception, e:
            print str(e)
            print 'error in main try'


    twitparser()

If your code did not change, probably something else did:

This tag does not exist anymore:

<p class="js-tweet-text tweet-text">

Instead, the class attribute now looks something like:

ProfileTweet-text js-tweet-text u-dir
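To see why the pattern from the question stops matching, here is a small check against a made-up HTML fragment. The sample markup, the looser pattern, and the variable names are assumptions for illustration, not Twitter's actual page. The sketch is written for Python 3:

```python
import re

# Hypothetical fragment modelled on the new class list; not real Twitter output.
sample_html = (
    '<p class="ProfileTweet-text js-tweet-text u-dir" lang="tr">'
    'merhaba <b>dunya</b></p>'
)

# The pattern from the question requires the class attribute to match exactly.
old_pattern = r'<p class="js-tweet-text tweet-text">(.*?)</p>'

# A looser pattern: any <p> whose class attribute contains js-tweet-text
# as a whole, whitespace-separated token.
new_pattern = (
    r'<p\b[^>]*\bclass="(?:[^"]*\s)?js-tweet-text(?:\s[^"]*)?"[^>]*>(.*?)</p>'
)

old_matches = re.findall(old_pattern, sample_html, re.S)
new_matches = re.findall(new_pattern, sample_html, re.S)

# Strip nested tags the same way the question does.
tweets = [re.sub(r'<.*?>', '', m) for m in new_matches]

print(len(old_matches), tweets)
```

The exact-match pattern finds nothing once the class list changes, which is consistent with the script printing 0 and no tweets without raising an error.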

Although it is possible to do what you want using a regexp, I would not use it. Use an XML parser instead:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    ptags = soup.find_all("p")
    # p.get avoids a KeyError on <p> tags that have no class attribute
    texts = [p.text for p in ptags if "js-tweet-text" in p.get("class", [])]

I would probably split the function up: first make sure you actually get the HTML, then check whether you find any p tags, then check whether you find the ones that meet your criteria.
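A minimal sketch of that split, assuming Python 3. It swaps in the standard-library `html.parser` for BeautifulSoup so that it runs without extra packages. The function names, the `ParagraphCollector` class, and the sample markup are all hypothetical:

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html(url):
    """Step 1: download the page and return it as text (may be empty)."""
    with urlopen(url) as res:
        return res.read().decode("utf-8", errors="replace")


class ParagraphCollector(HTMLParser):
    """Collects a (class_list, text) pair for every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished (classes, text) pairs
        self._classes = None   # class list of the <p> being read, if any
        self._chunks = []      # text pieces of the <p> being read

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._classes = (dict(attrs).get("class") or "").split()
            self._chunks = []

    def handle_data(self, data):
        if self._classes is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._classes is not None:
            self.paragraphs.append((self._classes, "".join(self._chunks)))
            self._classes = None


def find_paragraphs(html):
    """Step 2: return every <p> as a (classes, text) pair."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    return collector.paragraphs


def filter_tweets(paragraphs, wanted="js-tweet-text"):
    """Step 3: keep the text of paragraphs carrying the wanted class."""
    return [text for classes, text in paragraphs if wanted in classes]


# Made-up markup for a dry run that does not touch the network.
sample_html = (
    '<p class="ProfileTweet-text js-tweet-text u-dir">first <b>tweet</b></p>'
    '<p>no class here</p>'
    '<p class="other">not a tweet</p>'
)
paragraphs = find_paragraphs(sample_html)
tweets = filter_tweets(paragraphs)
print(len(sample_html), len(paragraphs), len(tweets))
```

Printing the size at each stage shows which step comes back empty: no HTML at all, HTML without p tags, or p tags that no longer carry the expected class.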

As Wooble said, use the Twitter API instead. These companies offer APIs so that you don't have to scrape, which costs them resources.
