java - How to divide lines using html() method in Jsoup -

April 15, 2012

I'm having a problem extracting tag elements with Jsoup. The return value of links.html(), stored via String crawlingNode = links.html();, is written to the .txt file as one continuous string without spaces or line breaks. In the console, however, the links are shown one per line. So I need to ask: is there a way to write the links to the .txt file one per line while still using the html() method? It doesn't make sense to me that the returned value is shown split into lines on the console but the .txt file can't look the same.

PS: I'm sorry for not giving a shorter version; the code is complete and runnable. Focus on this part, which contains the problem I want to solve:

    Elements links = doc.getElementsByTag("cite");
    String crawlingNode = links.html();
    crawlingNode = crawlingNode.replaceAll("(?=<).*?(>=?)", ""); // remove undesired HTML tags
    System.out.println(crawlingNode);
    HttpTest.writeOnFile(writer, crawlingNode);

Thanks in advance!

    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    import javax.swing.JOptionPane;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;

    public class HttpTest {

        static File file;
        File folder = null;
        String crawlingNode, date, timeZone, tag = "google node";
        static BufferedWriter writer = null;
        static HttpTest ht;

        public HttpTest() throws IOException {
            // The timestamp is used in both the folder name and the file name
            date = new SimpleDateFormat("yyyy.MM.dd hh-mm-ss").format(new Date());
            folder = new File("queries/downloads/" + date + " " + TimeZone.getDefault().getDisplayName());
            file = new File(folder.getPath() + "\\" + date + " " + tag + ".txt");
            folder.mkdir();
        }

        private void getLinks() throws IOException {
            Document doc = Jsoup.connect("http://google.com/search?q=mamamia")
                    .userAgent("Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.8.1.6) Gecko/20070723 Iceweasel/2.0.0.6 (Debian-2.0.0.6-0etch1)")
                    .cookie("auth", "token")
                    .timeout(3000)
                    .get();

            Elements links = doc.getElementsByTag("cite");
            String crawlingNode = links.html();
            crawlingNode = crawlingNode.replaceAll("(?=<).*?(>=?)", ""); // remove undesired HTML tags
            System.out.println(crawlingNode);
            HttpTest.writeOnFile(writer, crawlingNode);
        }

        private static void openWriter(File file) {
            try {
                writer = new BufferedWriter(new FileWriter(file));
            } catch (IOException e) {
                JOptionPane.showMessageDialog(null, "Failed to open URL writer");
                e.printStackTrace();
            }
        }

        private static void writeOnFile(BufferedWriter writer, String crawlingNode) {
            try {
                writer.write(crawlingNode);
            } catch (IOException e) {
                JOptionPane.showMessageDialog(null, "Failed to write URL node");
                e.printStackTrace();
            }
        }

        private static void closeWriter(BufferedWriter writer) {
            try {
                writer.close();
            } catch (IOException e) {
                JOptionPane.showMessageDialog(null, "Unable to close URL writer");
                System.err.println(e);
            }
        }

        public static void main(String[] args) throws IOException {
            ht = new HttpTest();
            HttpTest.openWriter(file);
            ht.getLinks();
            HttpTest.closeWriter(writer);
        }
    }

The lines in crawlingNode are separated by the Unix line separator \n. Windows uses \r\n, so you will have trouble seeing the line breaks in, e.g., Notepad. Either use a different editor or replace the separators:

    crawlingNode = crawlingNode.replace("\n", System.getProperty("line.separator"));
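
As a rough, self-contained sketch of that replacement (not the asker's actual program), the snippet below swaps the Jsoup call for a hard-coded string so it runs without any library. The class name LineBreakDemo, the helper toPlatformLineBreaks, and the example.com links are all made up for illustration; System.lineSeparator() is assumed to be available (Java 7+) and returns the same value as System.getProperty("line.separator").

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineBreakDemo {

    // Hypothetical helper: converts "\n" line breaks to the platform separator.
    static String toPlatformLineBreaks(String text) {
        // Normalize any existing "\r\n" first so it does not turn into "\r\r\n" on Windows.
        return text.replace("\r\n", "\n").replace("\n", System.lineSeparator());
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the value returned by links.html() (made-up links, "\n"-separated).
        String crawlingNode = "www.example.com/a\nwww.example.com/b\nwww.example.com/c";

        // Write the converted text to a temporary .txt file.
        Path out = Files.createTempFile("links", ".txt");
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            writer.write(toPlatformLineBreaks(crawlingNode));
        }

        // Read the file back line by line and print each link on its own line.
        for (String line : Files.readAllLines(out, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```

On Windows the written file should then show one link per line in Notepad. On systems where the separator is already \n, the helper leaves \n-only input unchanged.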
