Java - How to divide lines using the html() method in Jsoup
I'm experiencing a problem capturing tag elements with Jsoup. The return value of links.html(), stored with String crawlingNode = links.html();, is written to the .txt file as one entire string without spaces or line breaks. In the console, however, the links appear one per line. So I need to ask: is there a way to write the links to the .txt file one per line using the html() method? It doesn't make sense to me that the returned value shows up divided into lines on the console but can't look the same in the .txt file.
PS: I'm sorry for not giving a shorter version; the code is complete and runnable. Focus on the

    Elements links = doc.getElementsByTag("cite");
    String crawlingNode = links.html();
    crawlingNode = crawlingNode.replaceAll("(?=<).*?(>=?)", ""); // remove undesired HTML tags
    System.out.println(crawlingNode);
    HttpTest.writeOnFile(writer, crawlingNode);

part, which contains the problem I want to solve. Thanks in advance!
    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;
    import javax.swing.JOptionPane;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;

    public class HttpTest {
        static File file;
        File folder = null;
        String crawlingNode, date, timeZone, tag = "Google node";
        static BufferedWriter writer = null;
        static HttpTest ht;

        public HttpTest() throws IOException {
            // MM = month, HH = hour of day (lowercase mm would mean minutes)
            date = new SimpleDateFormat("yyyy.MM.dd HH-mm-ss").format(new Date());
            folder = new File("queries/downloads/" + date + " " + TimeZone.getDefault().getDisplayName());
            file = new File(folder.getPath() + "\\" + date + " " + tag + ".txt");
            folder.mkdir();
        }

        private void getLinks() throws IOException {
            Document doc = Jsoup.connect("http://google.com/search?q=mamamia")
                    .userAgent("Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.8.1.6) Gecko/20070723 Iceweasel/2.0.0.6 (Debian-2.0.0.6-0etch1)")
                    .cookie("auth", "token")
                    .timeout(3000)
                    .get();
            Elements links = doc.getElementsByTag("cite");
            String crawlingNode = links.html();
            crawlingNode = crawlingNode.replaceAll("(?=<).*?(>=?)", ""); // remove undesired HTML tags
            System.out.println(crawlingNode);
            HttpTest.writeOnFile(writer, crawlingNode);
        }

        private static void openWriter(File file) {
            try {
                writer = new BufferedWriter(new FileWriter(file));
            } catch (IOException e) {
                JOptionPane.showMessageDialog(null, "Failed to open URL Writer");
                e.printStackTrace();
            }
        }

        private static void writeOnFile(BufferedWriter writer, String crawlingNode) {
            try {
                writer.write(crawlingNode);
            } catch (IOException e) {
                JOptionPane.showMessageDialog(null, "Failed to write URL Node");
                e.printStackTrace();
            }
        }

        private static void closeWriter(BufferedWriter writer) {
            try {
                writer.close();
            } catch (IOException e) {
                JOptionPane.showMessageDialog(null, "Unable to close URL Writer");
                System.err.println(e);
            }
        }

        public static void main(String[] args) throws IOException {
            ht = new HttpTest();
            HttpTest.openWriter(file);
            ht.getLinks();
            HttpTest.closeWriter(writer);
        }
    }
The lines in crawlingNode are separated by the Unix line separator \n. Windows uses \r\n, so you will have problems seeing the line breaks in, e.g., Notepad. Use a different editor or replace the separators:

    crawlingNode = crawlingNode.replace("\n", System.getProperty("line.separator"));
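To make the replacement above concrete, here is a minimal, self-contained sketch. The class name LineSeparatorDemo, the helper normalize, and the sample link strings are hypothetical and not part of the original code. The sketch assumes the text may already contain some \r\n pairs, so it collapses those to \n first to avoid producing \r\r\n, and it writes to a temporary file rather than the folder used in the question.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineSeparatorDemo {

    // Hypothetical helper: rewrites every \r\n or lone \n as the given separator.
    static String normalize(String text, String separator) {
        // Collapse existing Windows separators first so they are not doubled.
        return text.replace("\r\n", "\n").replace("\n", separator);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the string returned by links.html() after tag removal.
        String crawlingNode = "www.example.com/a\nwww.example.com/b";

        // System.lineSeparator() returns the same value as
        // System.getProperty("line.separator").
        String converted = normalize(crawlingNode, System.lineSeparator());

        // Write to a temporary file; the path is illustrative only.
        Path out = Files.createTempFile("links", ".txt");
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            writer.write(converted);
        }
        System.out.println("Wrote " + out);
    }
}
```

On Windows the resulting file should then show one link per line in Notepad; on Unix-like systems the text is left effectively unchanged.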