Python regex invalid syntax
I am testing code from the current 2600 magazine: a wordlist generator based on a bunch of Google searches. The invalid syntax is on this line:
results.extend(re.findall("<a href="/%201d([^/%201d]*)/%201d">class=(?:1|s)",data.read()))
I am new to regex. I did some research on the basics of re, and it seemed easy, but I still didn't understand /%201d. I searched for it and found that it is the hex of a character code. I am still stuck on making it work. Here is the rest of the code. The line I'm having a problem with is line 36.
This is the function:
import re, sys, os, urllib

### custom useragent ###
class appurlopener(urllib.FancyURLopener):
    version = "mozilla/5.0(compatable;msie 9.0; windows nt 6.1; trident/5.0)"

urllib._urlopener = appurlopener()
uopen = urllib.urlopen
uencode = urllib.urlencode

def google(query, numget=10, verbose=0):
    numget = int(numget)
    start = 0
    results = []
    if verbose == 2:
        print("[+]getting " + str(numget) + " results")
    while len(results) < numget:
        print("[+]" + str(len(results)) + " far...")
        data = uopen("https://www.google.com/search?q="+query+"&star="+str(start))
        if data.code != 200:
            print("error " + str(data.code))
            break
        results.extend(re.findall("<a href="/%201d([^/%201d]*)/%201d">class=(?:1|s)",data.read()))
        print(data.read())
        start += 10
    if verbose == 2:
        print("[+] got " + str(numget) + " results")
    return results[:numget]
First, you need to escape the " characters in <a href=" because the string literal itself is delimited by double quotes:
"<a href=\"/%201d([^/%201d]*)/%201d\">class=(?:1|s)"
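An alternative to backslash-escaping is to delimit the pattern with single quotes, so the double quotes inside it need no escaping. The following is a minimal sketch, assuming Python 3; the sample string is made up for illustration and is not real Google output.

```python
import re

# Escaped double quotes inside a double-quoted literal (the fix above).
escaped = "<a href=\"/%201d([^/%201d]*)/%201d\">class=(?:1|s)"

# The same pattern as a single-quoted raw string: no escaping needed.
single_quoted = r'<a href="/%201d([^/%201d]*)/%201d">class=(?:1|s)'

# Both literals produce the identical pattern string.
same_pattern = (escaped == single_quoted)

compiled = re.compile(single_quoted)

# Hypothetical input, invented only to show that the pattern now parses and runs.
sample = '<a href="/%201dabc/%201d">class=s'
matches = compiled.findall(sample)

print(same_pattern)
print(matches)
```

Either form removes the syntax error; the single-quoted raw string is usually easier to read when the pattern contains HTML attributes.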
Second, %20 encodes a single space in URLs, so %201d corresponds to " 1d".
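The decoding can be checked directly. This is a small sketch, assuming Python 3 (where `unquote` lives in `urllib.parse`). It also shows that the `re` module does not decode percent-escapes: inside the pattern, `%201d` is just the five literal characters, and inside a character class such as `[^/%201d]` each character is excluded individually.

```python
import re
from urllib.parse import unquote

# Percent-decoding: %20 is a space, so %201d decodes to a space followed by "1d".
space = unquote("%20")
decoded = unquote("%201d")

# The regex engine treats %201d as literal characters; it does not decode them.
matches_literal = re.search("%201d", "/%201d/") is not None
matches_decoded = re.search("%201d", "/ 1d/") is not None

# In a character class, [^/%201d] excludes each of / % 2 0 1 d separately.
kept = re.findall("[^/%201d]", "/%201dabc")

print(repr(space), repr(decoded))
print(matches_literal, matches_decoded)
print(kept)
```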