java - Using JSoup to scrape Google Results
I'm trying to use Jsoup to scrape search results from Google. Here is my code.
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class GoogleOptimization {
    public static void main(String[] args) {
        Document doc;
        try {
            // Fetch the search results page
            doc = Jsoup.connect("https://www.google.com/search?as_q=&as_epq=%22yorkshire+capital%22+&as_oq=fraud+or+allegations+or+scam&as_eq=&as_nlo=&as_nhi=&lr=lang_en&cr=countryca&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&tbs=&as_filetype=&as_rights=").userAgent("Mozilla").ignoreHttpErrors(true).timeout(0).get();
            Elements links = doc.select("what should I put here?");
            for (Element link : links) {
                System.out.println("\n" + link.text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
I'm trying to get the titles of the search results and the snippets below each title. So yeah, I don't know which element to select in order to scrape these. If anyone has a better method to scrape Google using Java, I would love to know.
Thanks.
Here you go.
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ScanWebSO {
    public static void main(String[] args) {
        Document doc;
        try {
            doc = Jsoup.connect("https://www.google.com/search?as_q=&as_epq=%22yorkshire+capital%22+&as_oq=fraud+or+allegations+or+scam&as_eq=&as_nlo=&as_nhi=&lr=lang_en&cr=countryca&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&tbs=&as_filetype=&as_rights=").userAgent("Mozilla").ignoreHttpErrors(true).timeout(0).get();
            // Each search result is an <li class="g"> listing
            Elements links = doc.select("li[class=g]");
            for (Element link : links) {
                // The title and snippet are looked up inside the current listing only
                Elements titles = link.select("h3[class=r]");
                String title = titles.text();
                Elements bodies = link.select("span[class=st]");
                String body = bodies.text();
                System.out.println("Title: " + title);
                System.out.println("Body: " + body + "\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Also, I suggest using Chrome. Right-click on whatever you want to scrape and go to Inspect Element. It will take you to the exact spot in the HTML where that element is located. In this case, you first want to find out where the root of the result listings is. When you find that, you want to specify the element, and preferably a unique attribute to search by. In this case the root element is
<ol eid="" id="rso">
Below that you will see a bunch of listings that start with
<li class="g">
This is what you want to put into your initial Elements array. Then, for each element, you want to find the spot where the title and body are. In this case, I found the title under the
<h3 class="r" style="white-space: normal;">
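element. I then search for that element in each listing. The same goes for the body. I found the body under <span class="st">, so I searched for that, called the .text() method, and it returned all the text under that element. The key is to try to find the element with a unique attribute (using the class name is ideal). If you don't, and just search for something like "div", it will search the entire page for any element containing a div and return that, so you will get way more results than you want. I hope this explains it well. Let me know if you have any more questions.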
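To make the point about unique attributes concrete, here is a minimal sketch that runs offline. It assumes the markup looks like the fragments quoted above. The sample HTML string, the class name ScopedSelectDemo and the method extractResults are made up for illustration, and it uses Jsoup.parse on a string instead of fetching the live page.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ScopedSelectDemo {

    // Hypothetical sample markup that imitates the structure described above,
    // plus one unrelated <h3> outside the result list.
    static final String SAMPLE_HTML =
          "<html><body>"
        + "<div id=\"nav\"><h3>Unrelated heading</h3></div>"
        + "<ol eid=\"\" id=\"rso\">"
        + "<li class=\"g\"><h3 class=\"r\">First title</h3>"
        + "<span class=\"st\">First snippet</span></li>"
        + "<li class=\"g\"><h3 class=\"r\">Second title</h3>"
        + "<span class=\"st\">Second snippet</span></li>"
        + "</ol>"
        + "</body></html>";

    // Returns one "title | body" string per listing found under the root list.
    static List<String> extractResults(String html) {
        Document doc = Jsoup.parse(html);
        List<String> results = new ArrayList<>();
        // Start from the root element, then take each listing inside it
        for (Element listing : doc.select("ol#rso li.g")) {
            // Search only within the current listing, by element and class
            String title = listing.select("h3.r").text();
            String body = listing.select("span.st").text();
            results.add(title + " | " + body);
        }
        return results;
    }

    public static void main(String[] args) {
        for (String result : extractResults(SAMPLE_HTML)) {
            System.out.println(result);
        }
        Document doc = Jsoup.parse(SAMPLE_HTML);
        // A bare tag selector matches every <h3> on the page
        System.out.println("all h3: " + doc.select("h3").size());
        // The scoped selector matches only the result titles
        System.out.println("scoped h3.r: " + doc.select("ol#rso li.g h3.r").size());
    }
}
```

With this sample input, the bare "h3" selector also picks up the unrelated heading, while the scoped selector returns only the two result titles. The same thing happens on a larger scale if you search a real page for "div".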