java - How to divide lines using html() method in Jsoup


I'm having a problem collecting elements by tag in Jsoup. The value returned by links.html(), stored with String crawlingNode = links.html();, is written to the .txt file as one long string without spaces or line breaks. In the console, however, the links are shown one per line. Is there a way to write the links to the .txt file one per line using the html() method? It doesn't make sense to me that the returned value is split into lines on the console but not in the .txt file.

PS: I'm sorry for not giving a shorter version, but the code is complete and runnable. Focus on the

    Elements links = doc.getElementsByTag("cite");
    String crawlingNode = links.html();
    crawlingNode = crawlingNode.replaceAll("(?=<).*?(>=?)", ""); // remove undesired HTML tags
    System.out.println(crawlingNode);
    HttpTest.writeOnFile(writer, crawlingNode);

part, which contains the problem I want to solve. Thanks in advance!

    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;
    import javax.swing.JOptionPane;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;

    public class HttpTest {

        static File file;
        File folder = null;
        String crawlingNode, date, timezone, tag = "google node";
        static BufferedWriter writer = null;
        static HttpTest ht;

        public HttpTest() throws IOException {
            date = new SimpleDateFormat("yyyy.MM.dd HH-mm-ss").format(new Date());
            folder = new File("queries/downloads/" + date + " " + TimeZone.getDefault().getDisplayName());
            file = new File(folder, date + " " + tag + ".txt");
            folder.mkdirs(); // mkdirs() also creates the missing parent folders
        }

        private void getLinks() throws IOException {
            Document doc = Jsoup.connect("http://google.com/search?q=mamamia")
                    .userAgent("Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.8.1.6) Gecko/20070723 Iceweasel/2.0.0.6 (Debian-2.0.0.6-0etch1)")
                    .cookie("auth", "token")
                    .timeout(3000)
                    .get();

            Elements links = doc.getElementsByTag("cite");
            String crawlingNode = links.html();
            crawlingNode = crawlingNode.replaceAll("(?=<).*?(>=?)", ""); // remove undesired HTML tags
            System.out.println(crawlingNode);
            HttpTest.writeOnFile(writer, crawlingNode);
        }

        private static void openWriter(File file) {
            try {
                writer = new BufferedWriter(new FileWriter(file));
            } catch (IOException e) {
                JOptionPane.showMessageDialog(null, "Failed to open URL writer");
                e.printStackTrace();
            }
        }

        private static void writeOnFile(BufferedWriter writer, String crawlingNode) {
            try {
                writer.write(crawlingNode);
            } catch (IOException e) {
                JOptionPane.showMessageDialog(null, "Failed to write URL node");
                e.printStackTrace();
            }
        }

        private static void closeWriter(BufferedWriter writer) {
            try {
                writer.close();
            } catch (IOException e) {
                JOptionPane.showMessageDialog(null, "Unable to close URL writer");
                System.err.println(e);
            }
        }

        public static void main(String[] args) throws IOException {
            ht = new HttpTest();
            HttpTest.openWriter(file);
            ht.getLinks();
            HttpTest.closeWriter(writer);
        }
    }

The lines in crawlingNode are separated by the Unix line separator \n. Windows uses \r\n, so the line breaks will not show up in editors such as Notepad. Either use a different editor or replace the separators:

    crawlingNode = crawlingNode.replace("\n", System.getProperty("line.separator"));
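As a rough sketch of that replacement outside the crawler, the class below converts a \n-separated string to a chosen separator and also shows an alternative that writes line by line with BufferedWriter.newLine(). The class name LineBreakSketch, both helper methods, and the sample URLs are made up for illustration and are not part of the original code. It also assumes the input contains only \n breaks; text that already holds \r\n would end up with a doubled \r.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

public class LineBreakSketch {

    // Hypothetical helper: swap every Unix "\n" for the given separator,
    // e.g. "\r\n" on Windows. Assumes the text contains no "\r\n" yet.
    static String toSeparator(String text, String separator) {
        return text.replace("\n", separator);
    }

    // Hypothetical alternative: write one line at a time and let
    // BufferedWriter.newLine() emit the platform's own separator.
    static void writeLines(BufferedWriter writer, String text) throws IOException {
        for (String line : text.split("\n")) {
            writer.write(line);
            writer.newLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the string returned by links.html() after tag removal.
        String crawlingNode = "www.example.com/a\nwww.example.org/b";

        // Variant 1: replace the separators before writing.
        System.out.println(toSeparator(crawlingNode, System.lineSeparator()));

        // Variant 2: write line by line; a StringWriter stands in for the file.
        StringWriter out = new StringWriter();
        try (BufferedWriter writer = new BufferedWriter(out)) {
            writeLines(writer, crawlingNode);
        }
        System.out.print(out);
    }
}
```

In the question's code, either variant would go inside writeOnFile, so the console output stays unchanged and only the file gets platform line breaks.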
