Java - Text from URL gets copied 3 to 4 times
When I execute the code, the raw data from the website gets written, but the same content is written 3-4 times. I'm not sure how to resolve this; can someone help me out?
I use jsoup.
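```java
import java.io.FileWriter;
import java.io.PrintWriter;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static void main(String[] a) {
    try {
        //URL url = new URL("https://in.yahoo.com/?p=us");
        Document doc = Jsoup.connect("http://www.businessinsider.in/").get();
        // Selects every <div>, including divs nested inside other divs
        Elements contents = doc.select("div");
        // try-with-resources closes (and flushes) the writer
        try (PrintWriter out = new PrintWriter(new FileWriter("e:/outputtext.txt"))) {
            for (Element p : contents) {
                out.print(p.text());
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
```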
The website content gets saved in the .txt file, but it is copied 3-4 times within the same file.
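With the selector doc.select("div") you pick all div elements of the document, including the ones nested inside other div elements. Because an element's text() includes the text of all its descendants, each nested div's text is printed once for itself and again for every enclosing div, resulting in the duplication.
You should differentiate and select only the ones you need.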
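To see why nesting causes the repetition, here is a self-contained sketch that does not use jsoup: the hypothetical `Node` class only mimics the fact that jsoup's `Element.text()` includes the text of all descendants. Collecting every div and printing each one's text repeats the inner content, while keeping only divs without a div ancestor prints it once. In real jsoup code a similar filter could presumably be written as `if (!p.parents().is("div"))` inside the loop, but treat that as an untested suggestion.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, NOT the jsoup API) of why printing text() for
// every <div> duplicates content when divs are nested.
public class NestedDivDemo {

    // Simplified element: tag name, its own text, children and parent link.
    public static final class Node {
        final String tag;
        final String ownText;
        final List<Node> children = new ArrayList<>();
        Node parent;

        public Node(String tag, String ownText) {
            this.tag = tag;
            this.ownText = ownText;
        }

        // Appends a child and returns this node so calls can be chained.
        public Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        // Like jsoup's Element.text(): own text plus all descendant text.
        public String text() {
            StringBuilder sb = new StringBuilder(ownText);
            for (Node c : children) {
                String t = c.text();
                if (t.isEmpty()) continue;
                if (sb.length() > 0) sb.append(' ');
                sb.append(t);
            }
            return sb.toString();
        }

        // True if any ancestor of this node has the given tag.
        public boolean hasAncestor(String name) {
            for (Node p = parent; p != null; p = p.parent) {
                if (p.tag.equals(name)) return true;
            }
            return false;
        }
    }

    // Depth-first, document-order collection of nodes with the given tag.
    static void collect(Node n, String name, List<Node> out) {
        if (n.tag.equals(name)) out.add(n);
        for (Node c : n.children) collect(c, name, out);
    }

    // Mirrors the question's loop: one line per div, so nested text repeats.
    public static String allDivsText(Node root) {
        List<Node> divs = new ArrayList<>();
        collect(root, "div", divs);
        StringBuilder sb = new StringBuilder();
        for (Node d : divs) sb.append(d.text()).append('\n');
        return sb.toString();
    }

    // Keeps only divs that have no div ancestor, so each text appears once.
    public static String outermostDivsText(Node root) {
        List<Node> divs = new ArrayList<>();
        collect(root, "div", divs);
        StringBuilder sb = new StringBuilder();
        for (Node d : divs) {
            if (!d.hasAncestor("div")) sb.append(d.text()).append('\n');
        }
        return sb.toString();
    }

    // Builds <body><div>A<div>B<div>C</div></div></div></body>.
    public static Node sampleTree() {
        Node inner = new Node("div", "C");
        Node middle = new Node("div", "B").add(inner);
        Node outer = new Node("div", "A").add(middle);
        return new Node("body", "").add(outer);
    }

    public static void main(String[] args) {
        Node body = sampleTree();
        System.out.print(allDivsText(body));       // A B C / B C / C (three lines)
        System.out.print(outermostDivsText(body)); // A B C (one line)
    }
}
```

With three levels of nesting the innermost text "C" appears three times, which is the same kind of 3-4x repetition described in the question.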
If you want the full content, you don't need the jsoup parser at all. You can still use jsoup for network access and leave out the parser, like this:
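```java
import java.io.FileWriter;
import java.io.PrintWriter;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

// Fetch the page with jsoup but skip parsing; body() is the raw response text
Connection con = Jsoup.connect("http://www.businessinsider.in/");
Connection.Response res = con.execute();
String rawContent = res.body();
try (PrintWriter out = new PrintWriter(new FileWriter("e:/outputtext.txt"))) {
    out.print(rawContent);
}
```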
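However the content is obtained, the PrintWriter must be closed: a PrintWriter wrapped around a FileWriter buffers its output, and text that is never flushed may not reach the file. A minimal sketch, assuming Java 7+ for try-with-resources, of a hypothetical `save` helper that closes the writer automatically (the class name, method name and default file name are illustrative, not from the original):

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Sketch: write already-fetched page text to a file and guarantee the
// writer is closed (and therefore flushed).
public class SaveText {

    // Overwrites fileName with content using the platform default charset,
    // as a plain FileWriter does.
    public static void save(String fileName, String content) throws IOException {
        // try-with-resources calls out.close() even if an exception is thrown
        try (PrintWriter out = new PrintWriter(new FileWriter(fileName))) {
            out.print(content);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder text; in the answer this would be res.body().
        String rawContent = "<html><body>example</body></html>";
        // Hypothetical default file name; the question used e:/outputtext.txt
        save(args.length > 0 ? args[0] : "outputtext.txt", rawContent);
    }
}
```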
Or, if you have to use the parser after all, take the body tag instead of div:
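```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = Jsoup.connect("http://www.businessinsider.in/").get();
Element bodyEl = doc.body();
// html() returns the inner HTML of <body>; bodyEl.text() gives plain text
String bodySt = bodyEl.html();
```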