java - Crawler4j: crawling jQuery live content
I have a website where, on the category pages, the product list is generated by JavaScript after the page has loaded. When the crawler visits such a page, it cannot find any products. How can I solve this problem?
CrawlConfig config = new CrawlConfig();
config.setCrawlStorageFolder(rootFolder);
config.setMaxPagesToFetch(100000000);
config.setMaxDepthOfCrawling(-1); // -1 means unlimited depth
config.setPolitenessDelay(1);
config.setUserAgentString("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36");
//config.setResumableCrawling(true);
config.setIncludeHttpsPages(true);

PageFetcher pageFetcher = new PageFetcher(config);
RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
robotstxtConfig.setEnabled(false);
RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

controller.addSeed(siteDomain);
// Optional extra seeds are passed as command-line arguments 4 to 14.
for (int i = 4; i <= 14; i++) {
    if (i < args.length) {
        controller.addSeed(args[i]);
    }
}

controller.start(Crawling.class, numberOfCrawlers);
List<Object> crawlersLocalData = controller.getCrawlersLocalData();
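Unfortunately, crawler4j only supports static content: it parses the HTML exactly as the server returns it and never executes scripts. For JavaScript and AJAX support, use a crawler such as Crawljax, or Nutch with Selenium.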
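Before switching tools, it can help to confirm that the products really are missing from the static HTML. The following is only a minimal sketch: `StaticContentCheck`, `countClass`, and the class names `product-list` and `product-item` are hypothetical and not part of crawler4j or of the site in question. It counts how many elements carry a given CSS class in a raw HTML string, using a plain regular expression rather than a real HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of crawler4j: counts how often a CSS class
// appears in raw HTML, i.e. the markup as a static crawler receives it.
class StaticContentCheck {

    // Matches double-quoted class attributes; a sketch, not a full HTML parser.
    private static final Pattern CLASS_ATTR =
            Pattern.compile("class\\s*=\\s*\"([^\"]*)\"");

    // Returns the number of class attributes that contain cssClass as a whole token.
    static int countClass(String html, String cssClass) {
        Matcher m = CLASS_ATTR.matcher(html);
        int count = 0;
        while (m.find()) {
            for (String token : m.group(1).trim().split("\\s+")) {
                if (token.equals(cssClass)) {
                    count++;
                    break; // count each attribute at most once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed server response: an empty container that a script fills in later.
        String raw = "<div class=\"product-list\"></div><script src=\"load.js\"></script>";
        // Assumed markup in a browser after the script has run.
        String rendered = "<div class=\"product-list\">"
                + "<div class=\"product-item\">A</div>"
                + "<div class=\"product-item featured\">B</div></div>";

        System.out.println(countClass(raw, "product-item"));      // prints 0
        System.out.println(countClass(rendered, "product-item")); // prints 2
    }
}
```

Inside a crawler4j `visit(Page)` method, the raw markup should be available through `HtmlParseData.getHtml()`. If a check like this reports zero matches for a page that shows products in a browser, the list is being built client-side and a browser-driven crawler is needed.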