Java PDDocument Class Code Examples

This article collects typical usage examples of the org.pdfbox.pdmodel.PDDocument class in Java. If you are wondering how the PDDocument class is used in practice, or are looking for working examples, the code samples selected here may help.

The PDDocument class belongs to the org.pdfbox.pdmodel package. The 20 code examples below are sorted by popularity by default.

Example 1: getWordsToHighlight

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
public String[] getWordsToHighlight(String[] highlightWords) {
    NRC_PDFHighlighter hl;
    CharArrayWriter xmlOutput = null;
    String[] wordsToHighlight = null;
    try {
        hl = new NRC_PDFHighlighter();
        PDDocument pdDocument = new PDDocument(document);
        xmlOutput = new CharArrayWriter();
        hl.generateXMLHighlight(pdDocument, highlightWords, xmlOutput);
        wordsToHighlight = hl.getWordsToHighlight();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return wordsToHighlight;
    
}

Developer ID: LowResourceLanguages, Project: InuktitutComputing, Lines of code: 17, Source: NRC_PDFDocument.java

Example 2: generateXMLHighlight

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
/**
 * Generate an XML highlight string based on the PDF.
 * 
 * @param pdDocument The PDF to find words in.
 * @param sWords The words to search for.
 * @param xmlOutput The resulting output xml file.
 * 
 * @throws IOException If there is an error reading from the PDF, or writing to the XML.
 */
public void generateXMLHighlight(PDDocument pdDocument, String[] sWords, Writer xmlOutput ) throws IOException
{
    String ls = System.getProperty("line.separator");
    highlighterOutput = xmlOutput;
    searchedWords = sWords;
    foundWords = new Vector(); // initialization - vector filled  in endPage()
    highlighterOutput.write("<XML>"+ls+"<Body units=characters " + 
                            //color and mode are not implemented by the highlight spec
                            //so don't include them for now
                            //" color=#" + getHighlightColorAsString() + 
                            //" mode=active " + */ 
                            " version=2>"+ls+"<Highlight>");
    highlighterOutput.write(ls);
    textOS = new ByteArrayOutputStream();
    textWriter = new OutputStreamWriter( textOS, "UTF-16" );
    writeText(pdDocument, textWriter);
    highlighterOutput.write("</Highlight>"+ls+"</Body>"+ls+"</XML>");
    highlighterOutput.flush();
}

Developer ID: LowResourceLanguages, Project: InuktitutComputing, Lines of code: 29, Source: NRC_PDFHighlighter.java
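
The markup that generateXMLHighlight writes around the highlight data can be looked at in isolation with a small sketch that uses only the JDK. The class name HighlightEnvelope and the method envelope are hypothetical. The entries between <Highlight> and </Highlight> come from writeText, which is not shown in this article, so the sketch takes them as an opaque string and builds the result in memory instead of writing to a Writer.

```java
public class HighlightEnvelope {

    // Builds the same opening and closing markup as generateXMLHighlight.
    // 'entries' stands in for whatever writeText emits (an assumption).
    public static String envelope(String entries, String ls) {
        StringBuilder sb = new StringBuilder();
        sb.append("<XML>").append(ls)
          .append("<Body units=characters ").append(" version=2>").append(ls)
          .append("<Highlight>");
        sb.append(ls);
        sb.append(entries);
        sb.append("</Highlight>").append(ls)
          .append("</Body>").append(ls)
          .append("</XML>");
        return sb.toString();
    }

    public static void main(String[] args) {
        // A fixed "\n" is used here instead of the platform line separator
        // so that the printed result is the same everywhere.
        System.out.println(envelope("", "\n"));
    }
}
```

Passing the line separator as a parameter keeps the output deterministic; the original reads it from the line.separator system property.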

Example 3: getPDFdocument

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
/**
 * @param inputStream
 * @param contentHandler
 * @return
 */
private PDDocument getPDFdocument(InputStream inputStream, ContentHandler contentHandler) {
    PDDocument doc = null;
    // Create access to PDF Document
    try {
        // We get the document from the inputstream
        doc = PDDocument.load(inputStream);
    } catch (IOException e) {
        logger.error("PDFParser(InputStream)", e);
        doc = null; // We reset the object
        // We write some information into the output document, so we have a
        // chance to see what went wrong
        addErrorTagToOutput(contentHandler, e.toString());
    }

    return doc;

}

Developer ID: evlist, Project: orbeon-forms, Lines of code: 23, Source: FromPdfConverter.java

Example 4: textContentOf

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
private static String textContentOf(byte[] pdfData) throws IOException {
    PDDocument pdfDocument = PDDocument.load(new ByteArrayInputStream(pdfData));
    try {
        return new PDFTextStripper().getText(pdfDocument);
    } finally {
        pdfDocument.close();
    }
}

Developer ID: hmcts, Project: cmc-pdf-service, Lines of code: 9, Source: GeneratedPDFContentV2Test.java
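
Example 4 closes the document in a finally block so that it is released even when text extraction throws. On Java 7 and later the same guarantee can be written with try-with-resources, provided the resource type implements AutoCloseable. Whether the PDDocument version on your classpath does so is an assumption to verify, so the sketch below uses a hypothetical stand-in class, TrackedResource, rather than PDDocument itself.

```java
public class CloseDemo {

    // Hypothetical stand-in for a document-like resource.
    public static class TrackedResource implements AutoCloseable {
        public boolean closed = false;

        public String read() {
            if (closed) {
                throw new IllegalStateException("already closed");
            }
            return "text";
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Behaves like: try { return r.read(); } finally { r.close(); }
    public static String readAndClose(TrackedResource resource) {
        try (TrackedResource r = resource) {
            return r.read();
        }
    }

    public static void main(String[] args) {
        TrackedResource res = new TrackedResource();
        System.out.println(readAndClose(res) + " closed=" + res.closed);
        // prints "text closed=true"
    }
}
```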

Example 5: getHighlightPositions

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
public OutputStreamWriter getHighlightPositions(String highlightWord, File filePath) {
    NRC_PDFHighlighter hl;
    OutputStreamWriter xmlOutput = null;
    try {
        hl = new NRC_PDFHighlighter();
        PDDocument pdDocument = new PDDocument(document);
        xmlOutput = new OutputStreamWriter(new FileOutputStream(filePath),"UTF-8");
        // generateXMLHighlight (see Example 2) takes a String[], so wrap the single word
        hl.generateXMLHighlight(pdDocument, new String[] { highlightWord }, xmlOutput);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return xmlOutput;
}

Developer ID: LowResourceLanguages, Project: InuktitutComputing, Lines of code: 14, Source: NRC_PDFDocument.java

Example 6: main

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
/**
 * Command line application.
 * 
 * @param args The command line arguments to the application.
 * 
 * @throws IOException If there is an error generating the highlight file.
 */
public static void main(String[] args) throws IOException 
{
    NRC_PDFHighlighter xmlExtractor = new NRC_PDFHighlighter();
    PDDocument doc = null;
    try
    {
        if( args.length < 2 )
        {
            usage();
        }
        String[] highlightStrings = new String[ args.length - 1];
        System.arraycopy( args, 1, highlightStrings, 0, highlightStrings.length );
        doc = PDDocument.load( args[0] );
        
        xmlExtractor.generateXMLHighlight( 
            doc, 
            highlightStrings, 
            new OutputStreamWriter( System.out ) );
    }
    finally
    {
        if( doc != null )
        {
            doc.close();
        }
    }
}

Developer ID: LowResourceLanguages, Project: InuktitutComputing, Lines of code: 35, Source: NRC_PDFHighlighter.java
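
The main method treats the first command-line argument as the PDF path and every remaining argument as a word to highlight. That split can be sketched with the JDK alone. Arrays.copyOfRange is swapped in here for the System.arraycopy call in the original, the class and method names are hypothetical, and throwing an exception stands in for usage(), whose behavior is not shown in the source.

```java
import java.util.Arrays;

public class ArgSplit {

    // Returns every argument after the first, i.e. the words to highlight.
    public static String[] highlightWords(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("usage: <pdf-file> <word> [<word> ...]");
        }
        return Arrays.copyOfRange(args, 1, args.length);
    }

    public static void main(String[] args) {
        String[] sample = {"report.pdf", "alpha", "beta"};
        System.out.println(sample[0] + " -> " + Arrays.toString(highlightWords(sample)));
        // prints "report.pdf -> [alpha, beta]"
    }
}
```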

Example 7: getText

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
/**
 * This will return the text of a document.  See writeText. <br />
 * NOTE: The document must not be encrypted when coming into this method.
 *
 * @param doc The document to get the text from.
 *
 * @return The text of the PDF document.
 *
 * @throws IOException if the doc state is invalid or it is encrypted.
 */

public String getText (PDDocument doc) throws IOException
{
    StringWriter outputStream = new StringWriter();
    List ft = new ArrayList();
    writeText( doc, outputStream, ft);
    return outputStream.toString();
}

Developer ID: LowResourceLanguages, Project: InuktitutComputing, Lines of code: 21, Source: NRC_PDFTextStripperWithFonts.java

Example 8: startDocument

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
/**
 * This method is available for subclasses of this class.  It will be called before processing
 * of the document starts.
 * 
 * @param pdf The PDF document that is being processed.
 * @throws IOException If an IO error occurs.
 */
protected void startDocument(PDDocument pdf) throws IOException 
{
    Iterator textIter = getCharactersByArticle().iterator();
    guessTitle(textIter);
    writeHeader();
    pageNumber = 0;
}

Developer ID: LowResourceLanguages, Project: InuktitutComputing, Lines of code: 15, Source: NRC_PDFText2XML.java

Example 9: endDocument

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
/**
 * @see PDFTextStripper#endDocument( PDDocument )
 */
public void endDocument(PDDocument pdf) throws IOException 
{
    output.write("</body>");
    output.write("</xmlstream>");
    output.flush();
}

Developer ID: LowResourceLanguages, Project: InuktitutComputing, Lines of code: 10, Source: NRC_PDFText2XML.java

Example 10: toPDDocument

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
public PDDocument toPDDocument() throws CryptographyException, InvalidPasswordException, IOException {
	PDDocument doc;
	if(barr!=null) 
		doc= PDDocument.load(new ByteArrayInputStream(barr,0,barr.length));
	else if(resource instanceof FileResource)
		doc= PDDocument.load((File)resource);
	else 
		// barr is null in this branch, so the stream must not be sized with barr.length
		doc= PDDocument.load(new ByteArrayInputStream(IOUtil.toBytes(resource)));
	
	if(password!=null)doc.decrypt(password);
	
	
	return doc;
	
}

Developer ID: lucee, Project: Lucee4, Lines of code: 16, Source: PDFDocument.java
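
Each branch of toPDDocument hands PDDocument.load either a file or an in-memory stream. For the in-memory case, the three-argument ByteArrayInputStream constructor with offset 0 and the array's own length exposes the same bytes as the one-argument constructor, so the length should always come from the array that is actually being wrapped. A JDK-only sketch, with hypothetical class and method names:

```java
import java.io.ByteArrayInputStream;

public class StreamFromBytes {

    // Wraps the whole array; the length is taken from the array itself.
    public static ByteArrayInputStream wholeArray(byte[] data) {
        return new ByteArrayInputStream(data);
    }

    // Same bytes, spelled out with an explicit offset and length.
    public static ByteArrayInputStream explicitRange(byte[] data) {
        return new ByteArrayInputStream(data, 0, data.length);
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        // Both streams report the same number of readable bytes.
        System.out.println(wholeArray(data).available() == explicitRange(data).available());
        // prints "true"
    }
}
```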

Example 11: extractText

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
public static Object extractText(PDFDocument doc, Set<Integer> pageNumbers) throws IOException, CryptographyException, InvalidPasswordException {
	PDDocument pdDoc = doc.toPDDocument();
	//PDPageNode pages = pdDoc.getDocumentCatalog().getPages();
	//pages.
	//pdDoc.getDocumentCatalog().
	
	/*Iterator<Integer> it = pageNumbers.iterator();
	int p;
	while(it.hasNext()){
		p=it.next().intValue();
	
		pdDoc.getDocumentCatalog().getPages()
	}
	*/
	
	//print.o(pages);
	
	
	
	//pdDoc.
	
	
	//PDFTextStripperByArea  stripper = new PDFTextStripperByArea();
	//PDFHighlighter  stripper = new PDFHighlighter();
	PDFText2HTML  stripper = new PDFText2HTML();
	//PDFTextStripper stripper = new PDFTextStripper();
	StringWriter writer = new StringWriter();
	try {
		stripper.writeText(pdDoc, writer);
	} finally {
		// release the document even if text extraction fails
		pdDoc.close();
	}
	
	return writer.toString();
}

Developer ID: lucee, Project: Lucee4, Lines of code: 33, Source: PDFUtil.java

Example 12: getDocumentInformation

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
/**
 * @param doc
 * @param contentHandler
 * @return PDDocumentInformation
 */
private PDDocumentInformation getDocumentInformation(PDDocument doc, ContentHandler contentHandler) {
    PDDocumentInformation tmpInfo = null;
    try {
        tmpInfo = doc.getDocumentInformation();
    } catch (Exception e) {
        logger.error(e);
        addErrorTagToOutput(contentHandler, e.toString());
    }
    return tmpInfo;
}

Developer ID: evlist, Project: orbeon-forms, Lines of code: 15, Source: FromPdfConverter.java

Example 13: addPageCountAttribute

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
/**
 * @param atts
 * @param doc
 */
private void addPageCountAttribute(AttributesImpl atts, PDDocument doc) {
    int pageCount = 0; //The number of pages in this document
    try {
        pageCount = doc.getPageCount();
    } catch (IOException e) {
        logger.error(e);
        pageCount = 0;
    }
    if (pageCount > 0) {
        atts.addAttribute("", ATT_PAGES, ATT_PAGES, ATT_CDATA, String.valueOf(pageCount));
    }

}

Developer ID: evlist, Project: orbeon-forms, Lines of code: 18, Source: FromPdfConverter.java

Example 14: endDocument

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
/**
 * @see PDFTextStripper#endDocument( PDDocument )
 */
public void endDocument(PDDocument pdf) throws IOException 
{
    output.write("</body></html>");      
}

Developer ID: LowResourceLanguages, Project: InuktitutComputing, Lines of code: 8, Source: NRC_PDFText2HTML.java

Example 15: getText

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
/**
 * This will return the text of a document.  See writeText. <br />
 * NOTE: The document must not be encrypted when coming into this method.
 *
 * @param doc The document to get the text from.
 *
 * @return The text of the PDF document.
 *
 * @throws IOException if the doc state is invalid or it is encrypted.
 */
public Object[][] getText( PDDocument doc ) throws IOException
{
    StringWriter outputStream = new StringWriter();
    List ft = new ArrayList();
    writeText( doc, outputStream, ft);
    return (Object[][])ft.toArray(new Object[][]{});
}

Developer ID: LowResourceLanguages, Project: InuktitutComputing, Lines of code: 18, Source: NRC_PDFFonttedTextStripper.java
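
The return statement in Example 15 converts a raw List whose elements are Object[] rows into an Object[][]. The conversion can be tried without PDFBox. The row contents below (a text run paired with a font name) are invented purely for illustration, since the source does not show what writeText puts into the list, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RowsToArray {

    // Copies a list of rows into a two-dimensional array.
    public static Object[][] toTable(List<Object[]> rows) {
        return rows.toArray(new Object[0][]);
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"Hello", "Times-Roman"}); // illustrative values only
        rows.add(new Object[] {"world", "Helvetica"});
        Object[][] table = toTable(rows);
        System.out.println(table.length + " rows, first font: " + table[0][1]);
        // prints "2 rows, first font: Times-Roman"
    }
}
```

With a raw List, as in the original, toArray throws ArrayStoreException at run time if any element is not an Object[]; the typed List<Object[]> used here moves that check to compile time.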

Example 16: getText

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
/**
 * This will return the text of a document.  See writeText. <br />
 * NOTE: The document must not be encrypted when coming into this method.
 *
 * @param doc The document to get the text from.
 *
 * @return The text of the PDF document.
 *
 * @throws IOException if the doc state is invalid or it is encrypted.
 */
public String getText( PDDocument doc ) throws IOException
{
    StringWriter outputStream = new StringWriter();
    writeText( doc, outputStream );
    return outputStream.toString();
}

Developer ID: LowResourceLanguages, Project: InuktitutComputing, Lines of code: 17, Source: NRC_PDFTextStripper.java

Example 17: parseDocument

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
/**
 *  This will parse a PDF document.
 *
 * @param  input         The input stream for the document.
 * @return               The document.
 * @throws  IOException  If there is an error parsing the document.
 */
private static PDDocument parseDocument(InputStream input) throws IOException {
	PDFParser parser = new PDFParser(input);
	parser.parse();
	return parser.getPDDocument();
}

Developer ID: NCAR, Project: joai-project, Lines of code: 13, Source: PageDesc.java

Example 18: writeText

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
/**
 * @deprecated
 * @see PDFFonttedTextStripper#writeText( PDDocument, Writer )
 * @param doc The document to extract the text.
 * @param outputStream The stream to write the text to.
 * @throws IOException If there is an error extracting the text.
 */
public void writeText( COSDocument doc, Writer outputStream, List ft ) throws IOException
{
    writeText( new PDDocument( doc ), outputStream, ft );
}

Developer ID: LowResourceLanguages, Project: InuktitutComputing, Lines of code: 12, Source: NRC_PDFFonttedTextStripper.java

Example 19: startDocument

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
/**
 * This method is available for subclasses of this class.  It will be called before processing
 * of the document starts.
 * 
 * @param pdf The PDF document that is being processed.
 * @throws IOException If an IO error occurs.
 */
protected void startDocument(PDDocument pdf) throws IOException 
{
    // no default implementation, but available for subclasses    
}

Developer ID: LowResourceLanguages, Project: InuktitutComputing, Lines of code: 12, Source: NRC_PDFFonttedTextStripper.java

Example 20: endDocument

import org.pdfbox.pdmodel.PDDocument; // import the required package/class
/**
 * This method is available for subclasses of this class.  It will be called after processing
 * of the document finishes.
 * 
 * @param pdf The PDF document that is being processed.
 * @throws IOException If an IO error occurs.
 */
protected void endDocument(PDDocument pdf ) throws IOException 
{
    // no default implementation, but available for subclasses
}

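Developer ID: LowResourceLanguages, Project: InuktitutComputing, Lines of code: 12, Source: NRC_PDFFonttedTextStripper.java

Note: The org.pdfbox.pdmodel.PDDocument examples in this article were compiled from source-code and documentation hosting platforms such as GitHub and MSDocs. The code snippets were selected from open-source projects, and copyright in the source code remains with the original authors. For distribution and use, refer to each project's License. Do not reproduce without permission.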
