Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.2k views
in Technique[技术] by (71.8m points)

java - Jsoup: Extracting innertext from anchor tag

Here's my problem. I have a html content: innerText I need to extract the "innerText". While trying this in Jsoup I found that the innertext goes outside the anchor tag when parsed by Jsoup.

Here's my code

Document doc=Jsoup.parse("<div>  <a href="#"> innerText  </a> </div>");
System.out.println(doc.html());

output:

<html>
 <head></head>
 <body>
  <div >
   <a href="#"></a>innerText
  </div>
 </body>
</html>

why is "innerText" moved outside the anchor tag?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can access the text by calling the text()method on the element.

Document doc = Jsoup.parse("<div>  <a href="#"> innerText  </a> </div>");
System.out.println(doc.html());
Elements rows = doc.getElementsByTag("a");
for (Element element : rows) {
    System.out.println("element = " + element.text());
}

btw. Using your posted code (and JSoup 1.8.1) produces the following output

<html>
    <head></head>
    <body>
        <div> 
            <a href="#"> innerText </a> 
        </div>
    </body>
</html>

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...