本文整理汇总了Java中org.apache.tika.parser.microsoft.ooxml.OOXMLParser类的典型用法代码示例。如果您正苦于以下问题:Java OOXMLParser类的具体用法?Java OOXMLParser怎么用?Java OOXMLParser使用的例子?那么恭喜您, 这里精选的类代码示例或许可以为您提供帮助。
OOXMLParser类属于org.apache.tika.parser.microsoft.ooxml包,在下文中一共展示了OOXMLParser类的7个代码示例,这些例子默认根据受欢迎程度排序。您可以为喜欢或者感觉有用的代码点赞,您的评价将有助于我们的系统推荐出更棒的Java代码示例。
示例1: convertWordDocumentIntoHtml
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser; //导入依赖的package包/类
/**
* Converts a .docx document into HTML markup. This code
* is based on <a href="http://stackoverflow.com/a/9053258/313554">this StackOverflow</a> answer.
*
* @param wordDocument The converted .docx document.
* @return
*/
public ConvertedDocumentDTO convertWordDocumentIntoHtml(MultipartFile wordDocument) {
LOGGER.info("Converting word document: {} into HTML", wordDocument.getOriginalFilename());
try {
InputStream input = wordDocument.getInputStream();
Parser parser = new OOXMLParser();
StringWriter sw = new StringWriter();
SAXTransformerFactory factory = (SAXTransformerFactory)
SAXTransformerFactory.newInstance();
TransformerHandler handler = factory.newTransformerHandler();
handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "utf-8");
handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
handler.setResult(new StreamResult(sw));
Metadata metadata = new Metadata();
metadata.add(Metadata.CONTENT_TYPE, "text/html;charset=utf-8");
parser.parse(input, handler, metadata, new ParseContext());
return new ConvertedDocumentDTO(wordDocument.getOriginalFilename(), sw.toString());
}
catch (IOException | SAXException | TransformerException | TikaException ex) {
LOGGER.error("Conversion failed because an exception was thrown", ex);
throw new DocumentConversionException(ex.getMessage(), ex);
}
}
开发者ID:Vincit,项目名称:spring-boot-word-to-html-example,代码行数:33,代码来源:WordToHtmlConverter.java
示例2: testSupports
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser; //导入依赖的package包/类
public void testSupports() throws Exception
{
ArrayList<String> mimeTypes = new ArrayList<String>();
for (Parser p : new Parser[] {
new OfficeParser(), new OpenDocumentParser(),
new Mp3Parser(), new OOXMLParser()
}) {
Set<MediaType> mts = p.getSupportedTypes(new ParseContext());
for (MediaType mt : mts)
{
mimeTypes.add(mt.toString());
}
}
for (String mimetype : mimeTypes)
{
boolean supports = extracter.isSupported(mimetype);
assertTrue("Mimetype should be supported: " + mimetype, supports);
}
}
开发者ID:Alfresco,项目名称:alfresco-repository,代码行数:21,代码来源:TikaAutoMetadataExtracterTest.java
示例3: readXlsx
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser; //导入依赖的package包/类
public static ExcelData readXlsx(String xlsxFilePath)
throws IOException, InvalidFormatException, XmlException, TikaException, SAXException {
BodyContentHandler bcHandler = new BodyContentHandler();
Metadata metadata = new Metadata();
FileInputStream inputStream = new FileInputStream(new File(xlsxFilePath));
ParseContext pcontext = new ParseContext();
OOXMLParser parser = new OOXMLParser();
parser.parse(inputStream, bcHandler, metadata, pcontext);
if (DEBUG_PRINT_META_DATA) {
System.err.println("Metadata:");
for (String name : metadata.names())
System.out.println(name + "\t:\t" + metadata.get(name));
}
ExcelData spreedsheet = new ExcelData(bcHandler.toString());
return spreedsheet;
}
开发者ID:mark-watson,项目名称:power-java,代码行数:17,代码来源:PoiMicrosoftFileReader.java
示例4: getParser
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser; //导入依赖的package包/类
@Override
protected Parser getParser() {
return new OOXMLParser();
}
开发者ID:Alfresco,项目名称:alfresco-repository,代码行数:5,代码来源:PoiOOXMLContentTransformer.java
示例5: getParser
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser; //导入依赖的package包/类
@Override
protected Parser getParser()
{
return new OOXMLParser();
}
开发者ID:Alfresco,项目名称:alfresco-repository,代码行数:6,代码来源:PoiMetadataExtracter.java
示例6: testExcelXLSB
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser; //导入依赖的package包/类
/**
* We don't currently support the .xlsb file format
* (an OOXML container with binary blobs), but we
* shouldn't break on these files either (TIKA-826)
*/
@Test
public void testExcelXLSB() throws Exception {
Detector detector = new DefaultDetector();
AutoDetectParser parser = new AutoDetectParser();
InputStream input = ExcelParserTest.class.getResourceAsStream(
"/test-documents/testEXCEL.xlsb");
Metadata m = new Metadata();
m.add(Metadata.RESOURCE_NAME_KEY, "excel.xlsb");
// Should be detected correctly
MediaType type = null;
try {
type = detector.detect(input, m);
assertEquals("application/vnd.ms-excel.sheet.binary.macroenabled.12", type.toString());
} finally {
input.close();
}
// OfficeParser won't handle it
assertEquals(false, (new OfficeParser()).getSupportedTypes(new ParseContext()).contains(type));
// OOXMLParser won't handle it
assertEquals(false, (new OOXMLParser()).getSupportedTypes(new ParseContext()).contains(type));
// AutoDetectParser doesn't break on it
input = ExcelParserTest.class.getResourceAsStream("/test-documents/testEXCEL.xlsb");
try {
ContentHandler handler = new BodyContentHandler(-1);
ParseContext context = new ParseContext();
context.set(Locale.class, Locale.US);
parser.parse(input, handler, m, context);
String content = handler.toString();
assertEquals("", content);
} finally {
input.close();
}
}
开发者ID:kanrourou,项目名称:software-testing,代码行数:46,代码来源:ExcelParserTest.java
示例7: testExcel95
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser; //导入依赖的package包/类
/**
* We don't currently support the old Excel 95 .xls file format,
* but we shouldn't break on these files either (TIKA-976)
*/
@Test
public void testExcel95() throws Exception {
Detector detector = new DefaultDetector();
AutoDetectParser parser = new AutoDetectParser();
InputStream input = ExcelParserTest.class.getResourceAsStream(
"/test-documents/testEXCEL_95.xls");
Metadata m = new Metadata();
m.add(Metadata.RESOURCE_NAME_KEY, "excel_95.xls");
// Should be detected correctly
MediaType type = null;
try {
type = detector.detect(input, m);
assertEquals("application/vnd.ms-excel", type.toString());
} finally {
input.close();
}
// OfficeParser will claim to handle it
assertEquals(true, (new OfficeParser()).getSupportedTypes(new ParseContext()).contains(type));
// OOXMLParser won't handle it
assertEquals(false, (new OOXMLParser()).getSupportedTypes(new ParseContext()).contains(type));
// AutoDetectParser doesn't break on it
input = ExcelParserTest.class.getResourceAsStream("/test-documents/testEXCEL_95.xls");
try {
ContentHandler handler = new BodyContentHandler(-1);
ParseContext context = new ParseContext();
context.set(Locale.class, Locale.US);
parser.parse(input, handler, m, context);
String content = handler.toString();
assertEquals("", content);
} finally {
input.close();
}
}
开发者ID:kanrourou,项目名称:software-testing,代码行数:45,代码来源:ExcelParserTest.java
注:本文中的org.apache.tika.parser.microsoft.ooxml.OOXMLParser类示例整理自Github/MSDocs等源码及文档管理平台,相关代码片段筛选自各路编程大神贡献的开源项目,源码版权归原作者所有,传播和使用请参考对应项目的License;未经允许,请勿转载。 |
请发表评论