Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Convert PDF to HTML in PHP?

I want to be able to convert a PDF file to an HTML file via PHP, but am running into some trouble.

I found a basic way to do this using Saaspose, which lets you convert PDFs to HTML files. However, it has some problems with SVGs, images, positioning, fonts, etc.

All I need is the ability to grab the text from the PDF file, plus any images associated with it, and then display it in a linear layout rather than with absolute positioning.

What I mean by this is that if the PDF looks like this:

[Image: example PDF page (not preserved)]

I'd want to convert it to a single-column HTML file. If there were images, I'd want them returned as well.

Is this possible in PHP? I know I can simply grab the text from the PDF file, but what about grabbing images as well?

Another problem is that I want everything to be inline, since it's served to the client as a single file. Currently, my setup handles this with the following code:

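// $html is the phpQuery document of the converted page;
// $object_number is the number of <object><embed> pairs in it.
$converted_obj = array(); // inline SVG markup read from disk
$original_obj  = array(); // the <embed> elements to be replaced

// First pass: collect each <embed> and the SVG file it points to.
for ($i = 0; $i < $object_number; $i++) {
    $object = $html->find("object")->find("embed")->eq($i);
    $embed  = file_get_contents("Output/OutputHtml/" . $object->attr("src"));
    $converted_obj[] = $embed;
    $original_obj[]  = $object;
}

// Second pass: swap each <embed> for its inline SVG. Replacing only after
// collecting keeps the eq($i) indices stable during the first pass.
for ($i = 0; $i < $object_number; $i++) {
    pq($original_obj[$i])->replaceWith($converted_obj[$i]);
}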

This grabs all the SVG files and embeds them inline. Raster images would be easier, since I could inline them as base64.
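For the raster images, the base64 idea can be sketched with PHP's built-in DOM extension instead of phpQuery. This is a minimal sketch under stated assumptions, not the asker's code: image paths in the converted page are assumed to be relative to one directory (`$baseDir`), the MIME type is guessed from the file extension, and `image_data_uri()`, `guess_image_mime()` and `inline_images()` are hypothetical helper names.

```php
<?php
// Minimal sketch: replace each <img src="..."> in a converted HTML page with a
// base64 data URI so the page can be served as one self-contained file.
// Assumption: src paths are relative to $baseDir; helper names are hypothetical.

function image_data_uri(string $bytes, string $mime): string
{
    return 'data:' . $mime . ';base64,' . base64_encode($bytes);
}

function guess_image_mime(string $path): string
{
    // Extension lookup, so the fileinfo extension is not required.
    $map = [
        'png'  => 'image/png',
        'jpg'  => 'image/jpeg',
        'jpeg' => 'image/jpeg',
        'gif'  => 'image/gif',
        'svg'  => 'image/svg+xml',
    ];
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return $map[$ext] ?? 'application/octet-stream';
}

function inline_images(string $html, string $baseDir): string
{
    $doc = new DOMDocument();
    // Converter output is rarely valid HTML; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Skip empty sources and images that are already inlined.
        if ($src === '' || strpos($src, 'data:') === 0) {
            continue;
        }
        $path = $baseDir . '/' . $src;
        if (!is_file($path)) {
            continue; // leave unresolvable references untouched
        }
        $img->setAttribute('src', image_data_uri(file_get_contents($path), guess_image_mime($path)));
    }
    return $doc->saveHTML();
}
```

A possible call, with a hypothetical page name: `echo inline_images(file_get_contents("Output/OutputHtml/page.html"), "Output/OutputHtml");`. Note that `DOMDocument::loadHTML()` treats input as ISO-8859-1 unless the page declares its charset, so non-ASCII text may need a `<meta charset>` in the converted output.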


Fight dragons too long and you become a dragon yourself; gaze into the abyss too long, and the abyss will gaze back…

1 Reply


1) Download pdftohtml from http://sourceforge.net/projects/pdftohtml/ and unpack the .exe file into a folder.

2) Create a .php file in that folder and put the following code in it (this assumes that pdftohtml.exe and the source file, sample.pdf, are both in that folder):

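<?php
$source_pdf    = "sample.pdf";
$output_folder = "MyFolder";

// Create the output folder if it does not exist yet.
if (!file_exists($output_folder)) { mkdir($output_folder, 0777, true); }

// Build the command; escapeshellarg() guards against spaces and shell metacharacters.
$cmd = "pdftohtml " . escapeshellarg($source_pdf) . " " . escapeshellarg("$output_folder/new_file_name");
// passthru() echoes the tool's output and stores its exit status in $status (0 = success).
passthru($cmd, $status);
var_dump($status);
?>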

3) Open MyFolder and you will see the converted files (how many depends on the number of pages).
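If you want the script to fail loudly instead of just dumping a status code, the steps above could be wrapped in a small helper. This is a sketch, not part of the original answer: `build_pdftohtml_command()` and `convert_pdf()` are hypothetical names, it assumes a `pdftohtml` binary on the PATH or in the working directory, and it assumes every file pdftohtml writes starts with the base name you give it. It passes `-noframes` by default, an option listed in the `-help` output of both the SourceForge pdftohtml and Poppler's pdftohtml that asks for a single HTML file instead of a frameset; check it against your version.

```php
<?php
// Sketch: run pdftohtml from PHP, check its exit status, and list the output.
// Assumptions: a pdftohtml binary is on the PATH or in the working directory,
// and every file it writes starts with $baseName. Helper names are hypothetical.

function build_pdftohtml_command(string $binary, array $options, string $pdf, string $outPrefix): string
{
    // Quote every argument so paths with spaces or shell metacharacters are safe.
    $args = array_map('escapeshellarg', array_merge($options, [$pdf, $outPrefix]));
    return escapeshellcmd($binary) . ' ' . implode(' ', $args);
}

function convert_pdf(
    string $pdf,
    string $outDir,
    string $baseName,
    array $options = ['-noframes'],
    string $binary = 'pdftohtml'
): array {
    if (!is_dir($outDir) && !mkdir($outDir, 0777, true)) {
        throw new RuntimeException("Cannot create output folder $outDir");
    }
    $cmd = build_pdftohtml_command($binary, $options, $pdf, $outDir . '/' . $baseName);

    // exec() captures the console output (stderr too, via 2>&1) instead of
    // echoing it, and stores the exit status in $status (0 means success).
    $output = [];
    exec($cmd . ' 2>&1', $output, $status);
    if ($status !== 0) {
        throw new RuntimeException("pdftohtml exited with $status: " . implode("\n", $output));
    }

    // Collect the generated files (HTML pages and any extracted images).
    return glob($outDir . '/' . $baseName . '*') ?: [];
}
```

With the answer's file names, usage would be `print_r(convert_pdf('sample.pdf', 'MyFolder', 'new_file_name'));`. Using `exec()` rather than `passthru()` keeps pdftohtml's console chatter out of the page you serve.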

P.S. I'm not sure about them, but there are also many commercial and trial APIs.



...