parsing - urllib2 returning nothing in Python


I'm confused! Can anyone tell me what the problem is? This code used to work, but since yesterday it has started returning nothing, and I did not make any changes to it. Any ideas?

```python
import re
from re import sub
import time
import cookielib
from cookielib import CookieJar
import urllib2
from urllib2 import urlopen
import difflib
import requests


def twitparser():
    try:
        cj = CookieJar()
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        res = opener.open('https://twitter.com/haberturk')
        html = res.read()
        splitsource = re.findall(r'<p class="js-tweet-text tweet-text">(.*?)</p>', html)
        print len(splitsource)
        for item in splitsource:
            atweet = re.sub(r'<.*?>', '', item)
            print atweet
    except Exception, e:
        print str(e)
        print 'error in main try'


twitparser()
```

If your code did not change, then probably something else did:

This tag does not exist anymore:

<p class="js-tweet-text tweet-text"> 

Instead, there is something like:

ProfileTweet-text js-tweet-text u-dir
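A quick way to confirm this diagnosis, sketched in Python 3 rather than the question's Python 2: run the question's exact pattern against two made-up snippets, one with the old class attribute and one with the new class list. The snippets are hypothetical stand-ins, not real Twitter markup; the point is only that a regex matching the literal attribute string finds nothing once the class list changes.

```python
import re

# The exact pattern from the question: it only matches this literal class string.
OLD_PATTERN = r'<p class="js-tweet-text tweet-text">(.*?)</p>'

# Hypothetical snippets for illustration; real pages contain far more markup.
old_markup = '<p class="js-tweet-text tweet-text">Hello <b>world</b></p>'
new_markup = '<p class="ProfileTweet-text js-tweet-text u-dir">Hello <b>world</b></p>'


def extract(html):
    """Return tweet texts with inner tags stripped, as the question's loop does."""
    return [re.sub(r'<.*?>', '', item) for item in re.findall(OLD_PATTERN, html)]


print(extract(old_markup))  # ['Hello world']
print(extract(new_markup))  # [] -> len() is 0 and the loop prints nothing
```

This is exactly the symptom in the question: no exception is raised, `len(splitsource)` is 0, and the `for` loop simply never runs.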

Although it is possible to do what you want with a regexp, I would not use one; use an HTML parser instead:

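```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
ptags = soup.find_all("p")
# p.get("class", []) avoids a KeyError for <p> tags that have no class attribute;
# BeautifulSoup returns "class" as a list of tokens, so the membership test works
# no matter which other classes Twitter adds
texts = [p.text for p in ptags if "js-tweet-text" in p.get("class", [])]
```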

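I would probably split this into separate functions: first make sure you actually get the HTML, then check whether you find any p tags, then check whether any of those meet your criteria.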
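A rough sketch of that split in Python 3, using only the standard library: one function per stage, with the driver reporting which stage comes up empty. The names `fetch_html`, `find_p_tags`, `filter_tweets` and `parse_tweets` are hypothetical, `urllib.request` stands in for Python 2's `urllib2`, and `html.parser` is used here instead of BeautifulSoup only to avoid a third-party dependency.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html(url):
    """Step 1: download the page (needs network access)."""
    with urlopen(url) as res:
        return res.read().decode("utf-8", errors="replace")


class _PTagCollector(HTMLParser):
    """Record (class tokens, text) for every <p> element in the document."""

    def __init__(self):
        super().__init__()
        self.ptags = []
        self._current = None  # [classes, text fragments] while inside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # attrs is a list of (name, value) pairs; value may be None
            classes = (dict(attrs).get("class") or "").split()
            self._current = [classes, []]

    def handle_endtag(self, tag):
        if tag == "p" and self._current is not None:
            classes, fragments = self._current
            self.ptags.append((classes, "".join(fragments).strip()))
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)


def find_p_tags(html):
    """Step 2: find all <p> tags together with their class tokens and text."""
    collector = _PTagCollector()
    collector.feed(html)
    collector.close()
    return collector.ptags


def filter_tweets(ptags, token="js-tweet-text"):
    """Step 3: keep only the <p> tags whose class list contains the token."""
    return [text for classes, text in ptags if token in classes]


def parse_tweets(html):
    """Run the checks in order and say where the pipeline comes up empty."""
    if not html:
        print("no HTML received")
        return []
    ptags = find_p_tags(html)
    if not ptags:
        print("HTML received, but no <p> tags found")
        return []
    tweets = filter_tweets(ptags)
    if not tweets:
        print("found %d <p> tags, but none have the expected class" % len(ptags))
    return tweets


# Offline usage with a made-up snippet; for the real page you would call
# parse_tweets(fetch_html("https://twitter.com/haberturk")) instead.
sample = '<p class="ProfileTweet-text js-tweet-text u-dir">Hello</p><p>Other</p>'
print(parse_tweets(sample))  # ['Hello']
```

Because each stage is checked separately, a markup change like the one above shows up as "found N <p> tags, but none have the expected class" instead of silently printing 0.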

As Wooble said, use the Twitter API instead. These companies offer it so that you don't have to scrape their pages; scraping costs them resources.

