java - Text from URL gets copied 3 to 4 times


When I execute the code, the raw data from the website gets written to the file, but the same content is written 3-4 times. I'm not sure how to resolve this. Can anyone help me out?

I'm using jsoup:

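    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.PrintWriter;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class Scraper { // class name is arbitrary
        public static void main(String[] args) {
            // URL url = new URL("https://in.yahoo.com/?p=us");
            // try-with-resources closes (and flushes) the writer when done
            try (PrintWriter out = new PrintWriter(new FileWriter("e:/outputtext.txt"))) {
                Document doc = Jsoup.connect("http://www.businessinsider.in/").get();
                Elements contents = doc.select("div");
                for (Element p : contents) {
                    out.print(p.text());
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }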

The website content gets saved in the .txt file, but it is copied 3-4 times in the same file.

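With the selector doc.select("div") you pick all div elements of the document, including the ones nested inside other div elements. Because Element.text() returns the combined text of an element and all of its descendants, the text of a nested div is printed once for every div that encloses it, which results in the duplication.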

You should be more selective and pick only the elements you actually need.
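To see why nesting produces the repetition, here is a minimal JDK-only model. It is not jsoup itself: the `NestedDivDemo` and `Div` classes are hypothetical, and `Div.text()` imitates jsoup's `Element.text()` by joining an element's own text with that of all its descendants. Printing `text()` for every div repeats the nested text; printing only each div's own text (jsoup's counterpart is `Element.ownText()`) emits every piece once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// JDK-only model (not jsoup): a hypothetical Div whose text() behaves like jsoup's Element.text().
class NestedDivDemo {

    static final class Div {
        final String ownText;                         // text directly inside this div
        final List<Div> children = new ArrayList<>(); // nested divs

        Div(String ownText, Div... kids) {
            this.ownText = ownText;
            children.addAll(Arrays.asList(kids));
        }

        // Own text plus the text of every descendant, space-separated.
        String text() {
            StringBuilder sb = new StringBuilder(ownText);
            for (Div child : children) {
                String t = child.text();
                if (!t.isEmpty()) {
                    if (sb.length() > 0) sb.append(' ');
                    sb.append(t);
                }
            }
            return sb.toString();
        }
    }

    // Every div in document order, like doc.select("div").
    static List<Div> selectAll(Div root) {
        List<Div> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Div div, List<Div> out) {
        out.add(div);
        for (Div child : div.children) collect(child, out);
    }

    // The question's approach: text() of every div, so nested text repeats.
    static String allText(Div root) {
        StringBuilder sb = new StringBuilder();
        for (Div d : selectAll(root)) sb.append(d.text()).append('\n');
        return sb.toString();
    }

    // Own text only: each piece of text is emitted exactly once.
    static String ownTextOnly(Div root) {
        StringBuilder sb = new StringBuilder();
        for (Div d : selectAll(root)) {
            if (!d.ownText.isEmpty()) sb.append(d.ownText).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Models <div>A<div>B<div>C</div></div></div>
        Div page = new Div("A", new Div("B", new Div("C")));
        System.out.print(allText(page));     // lines "A B C", "B C", "C": C is written three times
        System.out.print(ownTextOnly(page)); // lines "A", "B", "C"
    }
}
```

On a real page, `ownText()` only covers text nodes directly inside each div, so text wrapped in child elements such as `<p>` or `<span>` would be skipped; a more specific selector for the elements you care about is often the better fix.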

If you want the full content, you don't need the jsoup parser at all. You can still use jsoup for the network access but leave out the parser, like this:

    Connection con = Jsoup.connect("http://www.businessinsider.in/");
    Connection.Response res = con.execute();
    String rawContent = res.body();
    try (PrintWriter out = new PrintWriter(new FileWriter("e:/outputtext.txt"))) {
        out.print(rawContent);
    }
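In that variant jsoup only does the download. If you would rather not depend on it for that, the JDK (Java 11+) can fetch the raw HTML as well. A minimal sketch using `java.net.http.HttpClient`, assuming you want the body saved as UTF-8; the class name `RawPageSaver` is hypothetical, and the URL and output path are the ones from the question:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: download a page's raw HTML with the JDK's HttpClient and write it to a file once.
class RawPageSaver {

    // Fetches the response body as a String; throws on a non-2xx status code.
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL) // follow redirects, e.g. http -> https
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IOException("HTTP " + response.statusCode() + " for " + uri);
        }
        return response.body();
    }

    // Opens, writes and closes the file in one call (replacing any previous contents).
    static void save(URI uri, Path target) throws IOException, InterruptedException {
        Files.writeString(target, fetch(uri), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        save(URI.create("http://www.businessinsider.in/"), Path.of("e:/outputtext.txt"));
    }
}
```

`Files.writeString` uses the CREATE, TRUNCATE_EXISTING and WRITE options by default, so there is no writer left open or unflushed, and re-running the program replaces the previous output instead of adding to it.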

Or, if you have to use the parser after all, take the body tag instead of the divs:

    Document doc = Jsoup.connect("http://www.businessinsider.in/").get();
    Element bodyEl = doc.body();
    String bodySt = bodyEl.html(); // inner HTML of <body>; use bodyEl.text() for the text only
