This article collects typical usage examples of the Java class org.cyberneko.html.parsers.DOMFragmentParser. If you are wondering how DOMFragmentParser is used in practice, the selected examples below may help.
DOMFragmentParser belongs to the org.cyberneko.html.parsers package. Six code examples are shown below, sorted by popularity by default.
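Example 1: htmlToText
import org.cyberneko.html.parsers.DOMFragmentParser; // import the required package/class
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
/**
 * Extracts the plain text from an HTML string.
 *
 * @param html the HTML markup to parse
 * @return the text content of the HTML, or null if parsing fails
 */
public static String htmlToText(String html) {
    DOMFragmentParser parser = new DOMFragmentParser();
    StringBuffer buffer = new StringBuffer();
    try {
        ByteArrayInputStream fin = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        InputSource inSource = new InputSource(fin);
        // Declare the encoding of the bytes so the parser does not fall back to its default
        inSource.setEncoding("UTF-8");
        CoreDocumentImpl codeDoc = new CoreDocumentImpl();
        DocumentFragment doc = codeDoc.createDocumentFragment();
        parser.parse(inSource, doc);
        // processNode is not shown in this excerpt; it collects the text into the buffer
        processNode(buffer, doc);
        fin.close();
    } catch (Exception e) {
        // Any failure (including a null input) is reported as null
        return null;
    }
    return buffer.toString();
}
Developer: MobileManAG, project: Project-H-Backend, lines of code: 24, source file: HTMLTextParser.java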
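Example 1 relies on a processNode helper that the excerpt does not show. The following is a minimal sketch of what such a helper might look like, assuming it only needs to concatenate text nodes in document order. The class name FragmentTextSketch and the textOf method are hypothetical, and the demo builds its fragment with the JDK XML parser from well-formed markup instead of DOMFragmentParser, so that it runs without extra libraries.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class FragmentTextSketch {

    // Hypothetical stand-in for the processNode helper called in Example 1:
    // appends the value of every text node below `node`, in document order.
    static void processNode(StringBuffer buffer, Node node) {
        if (node == null) {
            return;
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            buffer.append(node.getNodeValue());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            processNode(buffer, child);
        }
    }

    // Demo only: parses well-formed markup with the JDK parser, moves the root
    // element into a DocumentFragment, and extracts its text.
    static String textOf(String wellFormedMarkup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedMarkup)));
            DocumentFragment fragment = doc.createDocumentFragment();
            fragment.appendChild(doc.getDocumentElement());
            StringBuffer buffer = new StringBuffer();
            processNode(buffer, fragment);
            return buffer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf("<div><p>Hello, <b>world</b>!</p></div>"));
    }
}
```

With a fragment produced by DOMFragmentParser, the same processNode call should apply unchanged, since it only depends on the org.w3c.dom.Node interface.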
Example 2: stringToNode
import org.cyberneko.html.parsers.DOMFragmentParser; // import the required package/class
protected Node stringToNode(String str) {
    try {
        final DOMFragmentParser parser = new DOMFragmentParser();
        // `document` is declared outside this excerpt
        final DocumentFragment fragment = document.createDocumentFragment();
        parser.parse(new InputSource(new StringReader(str)), fragment);
        return fragment;
        // try and return the element itself if possible...
        // NodeList nl = fragment.getChildNodes();
        // for (int i=0; i<nl.getLength(); i++) if (nl.item(i).getNodeType()
        // == Node.ELEMENT_NODE) return nl.item(i);
        // return fragment;
    } catch (final Exception e) {
        throw new RuntimeException(e);
    }
}
Developer: openimaj, project: openimaj, lines of code: 18, source file: Readability.java
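The commented-out lines in Example 2 describe an alternative: return the first element child of the fragment and fall back to the fragment itself. Below is a small sketch of that idea as a standalone helper. The names FirstElementSketch, firstElementOrSelf, and demoNodeName are hypothetical, and the demo fragment is built by hand with the JDK DOM implementation, not parsed with DOMFragmentParser.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class FirstElementSketch {

    // Returns the first direct child that is an element, or the node itself
    // when it has no element children (for example, text-only fragments).
    static Node firstElementOrSelf(Node fragment) {
        NodeList children = fragment.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return child;
            }
        }
        return fragment;
    }

    // Demo only: builds a fragment with leading text and, optionally, a <p>
    // element, then reports the name of the node the helper selects.
    static String demoNodeName(boolean withElement) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            DocumentFragment fragment = doc.createDocumentFragment();
            fragment.appendChild(doc.createTextNode("leading text "));
            if (withElement) {
                fragment.appendChild(doc.createElement("p"));
            }
            return firstElementOrSelf(fragment).getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoNodeName(true));
        System.out.println(demoNodeName(false));
    }
}
```

Returning the fragment as a fallback keeps the caller's contract (a non-null Node) intact even when the input contains no markup at all.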
Example 3: htmlToText
import org.cyberneko.html.parsers.DOMFragmentParser; // import the required package/class
/**
 * Strips the tags from HTML-formatted text.
 *
 * @param html
 *            the HTML-formatted string
 * @return the text with the HTML tags removed; if html is null, the empty string ""
 */
private String htmlToText(String html) {
    if (html == null) {
        return "";
    }
    DOMFragmentParser parser = new DOMFragmentParser();
    CoreDocumentImpl codeDoc = new CoreDocumentImpl();
    DocumentFragment doc = codeDoc.createDocumentFragment();
    try {
        // Encode the string with the same charset that is declared to the parser;
        // getBytes() without an argument would use the platform default instead
        InputSource inSource = new InputSource(new ByteArrayInputStream(html.getBytes(textCharset)));
        inSource.setEncoding(textCharset);
        parser.parse(inSource, doc);
    } catch (Exception e) {
        return "";
    }
    // textBuffer and processNode are members of the enclosing class (not shown in this excerpt)
    textBuffer = new StringBuffer();
    processNode(doc);
    return textBuffer.toString();
}
Developer: heartsome, project: translationstudio8, lines of code: 28, source file: MessageParser.java
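Example 3 hands the parser a byte stream together with a declared encoding, so the bytes and the declaration have to agree. The sketch below illustrates, with plain JDK calls only, what happens to non-ASCII text when they do not. The class and method names are hypothetical, and no parser is involved: the decoding step merely stands in for what a parser does with the declared encoding.

```java
import java.nio.charset.Charset;

final class CharsetMismatchSketch {

    // Encodes `text` with one charset and decodes the bytes with another,
    // mimicking a byte stream whose declared encoding may not match its content.
    static String roundTrip(String text, String encodeWith, String decodeWith) {
        byte[] bytes = text.getBytes(Charset.forName(encodeWith));
        return new String(bytes, Charset.forName(decodeWith));
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo"; // "héllo", written with an escape to stay source-encoding neutral
        // Matching charsets preserve the text.
        System.out.println(roundTrip(original, "UTF-8", "UTF-8").equals(original));
        // Mismatched charsets turn the single character é into two characters.
        System.out.println(roundTrip(original, "UTF-8", "ISO-8859-1").length());
    }
}
```

Passing a java.io.Reader to InputSource, as Examples 2, 4, and 5 do, sidesteps the question because the text never goes through a byte encoding.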
Example 4: parse
import org.cyberneko.html.parsers.DOMFragmentParser; // import the required package/class
public static Node parse(String content) throws SAXException, IOException {
    DOMFragmentParser parser = new DOMFragmentParser();
    HTMLDocument document = new HTMLDocumentImpl();
    DocumentFragment fragment = document.createDocumentFragment();
    InputSource is = new InputSource(new StringReader(content));
    parser.parse(is, fragment);
    return fragment;
}
Developer: bsspirit, project: kettle-4.4.0-stable, lines of code: 10, source file: CarteTest.java
Example 5: parse
import org.cyberneko.html.parsers.DOMFragmentParser; // import the required package/class
public static Node parse( String content ) throws SAXException, IOException {
    DOMFragmentParser parser = new DOMFragmentParser();
    HTMLDocument document = new HTMLDocumentImpl();
    DocumentFragment fragment = document.createDocumentFragment();
    InputSource is = new InputSource( new StringReader( content ) );
    parser.parse( is, fragment );
    return fragment;
}
Developer: pentaho, project: pentaho-kettle, lines of code: 10, source file: CarteIT.java
Example 6: setup
import org.cyberneko.html.parsers.DOMFragmentParser; // import the required package/class
@Before
public void setup() throws Exception {
    conf = NutchConfiguration.create();
    conf.setBoolean("parser.html.form.use_action", true);
    utils = new DOMContentUtils(conf);
    DOMFragmentParser parser = new DOMFragmentParser();
    parser.setFeature(
            "http://cyberneko.org/html/features/scanner/allow-selfclosing-iframe",
            true);
    // Parse every test page into its own fragment and record its base URL
    for (int i = 0; i < testPages.length; i++) {
        DocumentFragment node = new HTMLDocumentImpl().createDocumentFragment();
        try {
            parser.parse(
                    new InputSource(new ByteArrayInputStream(testPages[i].getBytes())),
                    node);
            testBaseHrefURLs[i] = new URL(testBaseHrefs[i]);
        } catch (Exception e) {
            Assert.fail("caught exception: " + e);
        }
        testDOMs[i] = node;
    }
    // Expected outlinks, one inner array per test page
    answerOutlinks = new Outlink[][] {
            { new Outlink("http://www.nutch.org", "anchor"), },
            { new Outlink("http://www.nutch.org/", "home"),
                    new Outlink("http://www.nutch.org/docs/bot.html", "bots"), },
            { new Outlink("http://www.nutch.org/", "separate this"),
                    new Outlink("http://www.nutch.org/docs/ok", "from this"), },
            { new Outlink("http://www.nutch.org/", "home"),
                    new Outlink("http://www.nutch.org/docs/1", "1"),
                    new Outlink("http://www.nutch.org/docs/2", "2"), },
            { new Outlink("http://www.nutch.org/frames/top.html", ""),
                    new Outlink("http://www.nutch.org/frames/left.html", ""),
                    new Outlink("http://www.nutch.org/frames/invalid.html", ""),
                    new Outlink("http://www.nutch.org/frames/right.html", ""), },
            { new Outlink("http://www.nutch.org/maps/logo.gif", ""),
                    new Outlink("http://www.nutch.org/index.html", ""),
                    new Outlink("http://www.nutch.org/maps/#bottom", ""),
                    new Outlink("http://www.nutch.org/bot.html", ""),
                    new Outlink("http://www.nutch.org/docs/index.html", ""), },
            { new Outlink("http://www.nutch.org/index.html", "whitespace test"), },
            {},
            { new Outlink("http://www.nutch.org/dummy.jsp", "test2"), },
            {},
            { new Outlink("http://www.nutch.org/;x", "anchor1"),
                    new Outlink("http://www.nutch.org/g;x", "anchor2"),
                    new Outlink("http://www.nutch.org/g;x?y#s", "anchor3") },
            {
                    // this is tricky - see RFC3986 section 5.4.1 example 7
                    new Outlink("http://www.nutch.org/g", "anchor1"),
                    new Outlink("http://www.nutch.org/g?y#s", "anchor2"),
                    new Outlink("http://www.nutch.org/;something?y=1", "anchor3"),
                    new Outlink("http://www.nutch.org/;something?y=1#s", "anchor4"),
                    new Outlink("http://www.nutch.org/;something?y=1;somethingelse",
                            "anchor5") } };
}
Developer: jorcox, project: GeoCrawler, lines of code: 57, source file: TestDOMContentUtils.java
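The expected values in Example 6 are absolute URLs, which implies that each href found in a parsed page is resolved against that page's base URL. The sketch below shows one way this could be done with java.net.URI on a DOM tree. It is not the DOMContentUtils implementation from Nutch: the names OutlinkSketch, collectLinks, and linksOf are hypothetical, only `a` elements are considered, anchor text is ignored, and the demo parses well-formed markup with the JDK parser.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class OutlinkSketch {

    // Walks the tree below `node` and adds the resolved href of every <a> element.
    // URI.resolve throws IllegalArgumentException for hrefs that are not valid URIs.
    static void collectLinks(Node node, URI base, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE && "a".equalsIgnoreCase(node.getNodeName())) {
            String href = ((Element) node).getAttribute("href"); // "" when the attribute is absent
            if (!href.isEmpty()) {
                out.add(base.resolve(href).toString());
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectLinks(child, base, out);
        }
    }

    // Demo only: parses well-formed markup with the JDK parser and collects its links.
    static List<String> linksOf(String wellFormedMarkup, String baseUrl) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedMarkup)));
            List<String> out = new ArrayList<>();
            collectLinks(doc, URI.create(baseUrl), out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String page = "<div><a href=\"docs/bot.html\">bots</a><a href=\"/\">home</a></div>";
        System.out.println(linksOf(page, "http://www.nutch.org/index.html"));
    }
}
```

The java.net.URI class is documented against RFC 2396, so its results for edge cases such as the RFC 3986 examples referenced in the test comment should be checked before relying on it for them.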
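Note: The org.cyberneko.html.parsers.DOMFragmentParser examples in this article were collected from GitHub, MSDocs, and similar source-code and documentation hosting platforms. The snippets were selected from open-source projects; copyright remains with the original authors, and distribution and use are governed by each project's license. Do not reproduce without permission.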