Why does HtmlUnit always show the host page no matter what URL I type in (crawlable GWT app)?
Here is the full code:
```java
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URLDecoder;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebClientOptions;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class CrawlServlet implements Filter {

    public static String getFullURL(HttpServletRequest request) {
        StringBuffer requestURL = request.getRequestURL();
        String queryString = request.getQueryString();

        if (queryString == null) {
            return requestURL.toString();
        } else {
            return requestURL.append('?').append(queryString).toString();
        }
    }

    @Override
    public void destroy() {
        // TODO Auto-generated method stub
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest httpRequest = (HttpServletRequest) request;
        String fullURLQueryString = getFullURL(httpRequest);
        System.out.println(fullURLQueryString + " wrong");
        if ((fullURLQueryString != null) && (fullURLQueryString.contains("_escaped_fragment_"))) {
            // Remember to unescape any %XX characters
            fullURLQueryString = URLDecoder.decode(fullURLQueryString, "UTF-8");
            // Rewrite the URL back to the original #! version
            String url_with_hash_fragment = fullURLQueryString.replace("?_escaped_fragment_=", "#!");

            final WebClient webClient = new WebClient();
            WebClientOptions options = webClient.getOptions();
            options.setCssEnabled(false);
            options.setThrowExceptionOnScriptError(false);
            options.setThrowExceptionOnFailingStatusCode(false);
            options.setJavaScriptEnabled(false);
            HtmlPage page = webClient.getPage(url_with_hash_fragment);

            // Important! Give the headless browser enough time to execute JavaScript.
            // The exact time to wait may depend on your application.
            webClient.waitForBackgroundJavaScript(20000);

            // Return the snapshot
            //String originalHtml = page.getWebResponse().getContentAsString();
            //System.out.println(originalHtml + " +++++++++");
            System.out.println(page.asXml() + " +++++++++");
            PrintWriter out = response.getWriter();
            out.println(page.asXml());
            //out.println(originalHtml);
        } else {
            try {
                // Not an _escaped_fragment_ URL, so move on along the chain of servlets (filters)
                chain.doFilter(request, response);
            } catch (ServletException e) {
                System.err.println("Servlet exception caught: " + e);
                e.printStackTrace();
            }
        }
    }

    @Override
    public void init(FilterConfig arg0) throws ServletException {
        // TODO Auto-generated method stub
    }
}
```
After I opened the URL "http://127.0.0.1:8888/myproject.html?gwt.codesvr=127.0.0.1:9997?_escaped_fragment_=article", it showed the host page's HTML code, like this:
```html
<html>
  <head>
    <meta name="fragment" content="!">
    <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
    <!--                                           -->
    <!-- Consider inlining CSS to reduce the number of requested files -->
    <!--                                           -->
    <link type="text/css" rel="stylesheet" href="myproject.css"/>
    <!--                                           -->
    <!-- Any title is fine                         -->
    <!--                                           -->
    <title>myproject</title>
    <!--                                           -->
    <!-- This script loads your compiled module.   -->
    <!-- If you add any GWT meta tags, they must   -->
    <!-- be added before this line.                -->
    <!--                                           -->
    <script type="text/javascript" language="javascript" ></script>
    <!--                                           -->
    <!-- The body can have arbitrary html, or      -->
    <!-- you can leave the body empty if you want  -->
    <!-- to create a dynamic UI.                   -->
    <!--                                           -->
  </head>
  <body>
    <div id="loading">
      loading <br/>
      <img src="../images/loading.gif"/>
    </div>
    <!-- OPTIONAL: include this if you want history support -->
    <iframe src="javascript:''" id="__gwt_historyframe" tabindex="-1" style="position: absolute; width: 0; height: 0; border: 0;"></iframe>
    <!-- RECOMMENDED if your web app will not function without JavaScript enabled -->
    <noscript>
      <div style="width: 22em; position: absolute; left: 50%; margin-left: -11em; color: red; background-color: white; border: 1px solid red; padding: 4px; font-family: sans-serif;">
        Your web browser must have JavaScript enabled
        in order for this application to display correctly.
      </div>
    </noscript>
  </body>
</html>
```
On the other hand, "http://127.0.0.1:8888/myproject.html?gwt.codesvr=127.0.0.1:9997#!article" works OK and shows the article without any problem.
I compiled the whole project and ran it under Tomcat 7, and I have the same problem: it shows the HTML of the host page.
Note: the Article page is a nested presenter embedded inside the Header presenter. I don't think that is the main reason, because it didn't show the Header page either.
First, instead of `?_escaped_fragment_=article`, perhaps try `&_escaped_fragment_=article`. You already have a `?` before `gwt.codesvr`, and a second `?` may mess up the URL parameter parsing.
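To illustrate the first point, here is a simplified model of how a query string is usually split into parameters: on `&`, then on the first `=`, with no percent-decoding. The class and method names (`QueryParamDemo`, `parseQuery`) are hypothetical, and real containers such as Tomcat use their own parser, so treat this as a sketch rather than the container's exact behavior.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryParamDemo {

    /** Hypothetical helper: splits the raw query on '&' and each pair on its first '=' (no decoding). */
    public static Map<String, String> parseQuery(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        // Everything after the FIRST '?' (and before any '#') is the query.
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, "");
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String twoQuestionMarks =
            "http://127.0.0.1:8888/myproject.html?gwt.codesvr=127.0.0.1:9997?_escaped_fragment_=article";
        String ampersand =
            "http://127.0.0.1:8888/myproject.html?gwt.codesvr=127.0.0.1:9997&_escaped_fragment_=article";
        // prints {gwt.codesvr=127.0.0.1:9997?_escaped_fragment_=article}
        System.out.println(parseQuery(twoQuestionMarks));
        // prints {gwt.codesvr=127.0.0.1:9997, _escaped_fragment_=article}
        System.out.println(parseQuery(ampersand));
    }
}
```

With two `?` characters, the second one is just part of the `gwt.codesvr` value, so a call like `request.getParameter("_escaped_fragment_")` would most likely return `null` on the server.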
Second, you need to make sure your filter handles the case where a `gwt.codesvr` parameter is present. It looks like your filter assumes `_escaped_fragment_` is the first parameter, i.e., the one that starts with `?`. I believe the example here should work either way.
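Under Google's AJAX crawling scheme (which the `<meta name="fragment" content="!">` tag opts into), a crawler appends `_escaped_fragment_` with `&` when the URL already has a query string, so `...?gwt.codesvr=127.0.0.1:9997#!article` would be requested as `...?gwt.codesvr=127.0.0.1:9997&_escaped_fragment_=article`. Here is a minimal sketch of a rewrite step that handles the parameter wherever it appears and keeps the other parameters. `EscapedFragmentRewriter` and `toHashBangUrl` are hypothetical names, not part of the original post or of HtmlUnit. The sketch assumes the input was already split into the request URL and the raw query string, and that parameters are separated by `&` only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class EscapedFragmentRewriter {

    private static final String KEY = "_escaped_fragment_=";

    /**
     * Hypothetical helper: turns the crawler's "_escaped_fragment_" URL back into the #! URL.
     * requestUrl: URL without the query (e.g. request.getRequestURL().toString()).
     * queryString: raw query (e.g. request.getQueryString()).
     * Returns null when there is no _escaped_fragment_ parameter.
     */
    public static String toHashBangUrl(String requestUrl, String queryString) {
        if (queryString == null) {
            return null;
        }
        StringBuilder keptParams = new StringBuilder();
        String fragment = null;
        // The parameter may come first ("?_escaped_fragment_=") or after
        // others such as gwt.codesvr ("&_escaped_fragment_="), so check every pair.
        for (String pair : queryString.split("&")) {
            if (pair.startsWith(KEY)) {
                // Decode only the fragment value; other parameters stay as they were.
                fragment = decodeUtf8(pair.substring(KEY.length()));
            } else if (!pair.isEmpty()) {
                if (keptParams.length() > 0) {
                    keptParams.append('&');
                }
                keptParams.append(pair);
            }
        }
        if (fragment == null) {
            return null;
        }
        StringBuilder url = new StringBuilder(requestUrl);
        if (keptParams.length() > 0) {
            url.append('?').append(keptParams);
        }
        return url.append("#!").append(fragment).toString();
    }

    private static String decodeUtf8(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String base = "http://127.0.0.1:8888/myproject.html";
        // prints http://127.0.0.1:8888/myproject.html?gwt.codesvr=127.0.0.1:9997#!article
        System.out.println(toHashBangUrl(base, "gwt.codesvr=127.0.0.1:9997&_escaped_fragment_=article"));
        // prints http://127.0.0.1:8888/myproject.html#!article
        System.out.println(toHashBangUrl(base, "_escaped_fragment_=article"));
    }
}
```

In the filter, something like `toHashBangUrl(httpRequest.getRequestURL().toString(), httpRequest.getQueryString())` could replace the `replace("?_escaped_fragment_=", "#!")` call, falling through to `chain.doFilter(...)` when it returns `null`. Also note that the posted filter calls `options.setJavaScriptEnabled(false)`. Because a GWT UI is built by JavaScript, HtmlUnit will most likely keep returning the bare host page until that option is set to `true`.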