Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python 3.x - How to insert blank pages into a pdf using PyPDF2

Problem: I have an array of page numbers at which blank pages need to be inserted into the original PDF, for example [1, 3, 5, 8, 10]. The blank pages must be added alongside the existing ones, so the resulting document has more pages than the original.

I have a Python script that searches a PDF file for specific text marking the end of a letter. Each letter has a different number of pages. Using PyPDF2, I have tried merge() with a single-page blank PDF in the same directory, as well as insertBlankPage(), addPage() and addBlankPage(). The problem I ran into was that the blank pages overwrote original pages: the first blank page landed in the right place, but the following ones were wrong. It seemed as if the blank pages were being written on top of existing pages instead of being inserted at the given page numbers.
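One likely explanation for the "first one right, the rest wrong" symptom (a guess, since the failing insertion code isn't shown): every insertion shifts all later pages by one, so a position computed against the original document lands one page earlier for each blank already inserted. The sketch below uses a plain Python list as a stand-in for the output document; `insert_front_to_back` and `insert_back_to_front` are hypothetical helpers, not PyPDF2 functions. The same reasoning would apply to positional inserts such as PyPDF2's `PdfFileWriter.insertBlankPage(index=...)`.

```python
def insert_front_to_back(pages, positions):
    """Insert "BLANK" at each 0-based position in ascending order (positions drift)."""
    result = list(pages)
    for pos in positions:
        result.insert(pos, "BLANK")
    return result


def insert_back_to_front(pages, positions):
    """Insert "BLANK" at each position, highest first, so earlier positions never shift."""
    result = list(pages)
    for pos in sorted(positions, reverse=True):
        result.insert(pos, "BLANK")
    return result


original = ["p1", "p2", "p3", "p4"]
# Goal: a blank page after original pages 1 and 3, i.e. at 0-based indices 1 and 3.
print(insert_front_to_back(original, [1, 3]))  # ['p1', 'BLANK', 'p2', 'BLANK', 'p3', 'p4']
print(insert_back_to_front(original, [1, 3]))  # ['p1', 'BLANK', 'p2', 'p3', 'BLANK', 'p4']
```

In the first result the second blank ends up after p2 instead of p3, which matches the behaviour described above; inserting from the highest position down keeps the original page numbers valid.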

How can I insert blank pages at the page numbers listed in the array? The code is below. The output array of page numbers does not need to be a string; I only converted it to a string to pass it to another program. If I can add the blank pages in Python, the array can stay a list of numbers.

import re

import PyPDF2

pdfIn = open('sample_letter.pdf', 'rb')
pdfFile = PyPDF2.PdfFileReader(pdfIn)
NumPages = pdfFile.getNumPages()
string = "Text I am searching for."
separator = ', '


def end_of_letter():
    """Return the 1-based numbers of the pages containing the end-of-letter text."""
    pages = []
    for page in range(NumPages):
        pgObj = pdfFile.getPage(page)
        text = pgObj.extractText()
        # re.escape makes the search literal; otherwise "." matches any character
        match = re.search(re.escape(string), text)
        if match:
            pages.append(str(page + 1))
    mystring = separator.join(pages)
    print(mystring)
    return mystring


end_of_letter()
Question from: https://stackoverflow.com/questions/65830082/how-to-insert-blank-pages-into-a-pdf-using-pypdf2
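A minimal sketch of how the page numbers returned above could be turned into a final page order in a single pass, assuming the intent is "a blank page right after each listed page of the original document". `parse_page_list` and `output_order` are hypothetical helpers; with the PyPDF2 1.x API used in this thread, each integer in the result would map to `writer.addPage(reader.getPage(i))` and each `None` to `writer.addBlankPage()`.

```python
def parse_page_list(text):
    """Turn a string such as "1, 3, 5" (as printed by end_of_letter) into [1, 3, 5]."""
    return [int(part) for part in text.split(",") if part.strip()]


def output_order(num_pages, blank_after):
    """Return 0-based source page indices in output order, with None for a blank page.

    blank_after holds 1-based page numbers of the ORIGINAL document; a blank
    page is placed right after each of them. Because the whole sequence is
    built in one forward pass, no insertion can shift a later one.
    """
    wanted = set(blank_after)
    order = []
    for index in range(num_pages):
        order.append(index)          # copy the original page
        if index + 1 in wanted:      # page numbers are 1-based
            order.append(None)       # then a blank page after it
    return order


print(output_order(5, parse_page_list("1, 3")))  # [0, None, 1, 2, None, 3, 4]
```

Building a fresh output in order, rather than inserting into an existing one, sidesteps the index-shift problem entirely; numbers larger than the page count are simply ignored.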

1 Reply

I was able to find a solution: it iterates through the PDF, looks for the text that marks the end of each letter, and inserts a blank page right after every page where that text is found. Code below.

"""This program takes an input PDF file and searches it for a string that marks the end of a letter.
After each page where the string is found, a blank page is added. The output file, with the blank
pages added, is written to the current directory."""

import re

import PyPDF2

string = "Text I am searching for"
pattern = re.compile(re.escape(string))  # match the text literally, not as a regex


def end_of_letter():
    pages = []
    with open('sample_letter.pdf', 'rb') as pdfIn, open('added_blank_pages.pdf', 'wb') as outputStream:
        pdfFile = PyPDF2.PdfFileReader(pdfIn)
        output = PyPDF2.PdfFileWriter()
        for page in range(pdfFile.getNumPages()):
            pgObj = pdfFile.getPage(page)
            text = pgObj.extractText()
            # Copy the original page first...
            output.addPage(pgObj)
            # ...then, if it ends a letter, append a blank page after it.
            # addBlankPage() with no size uses the size of the last page added.
            if pattern.search(text):
                pages.append(page + 1)
                output.addBlankPage()
        # Write while the input file is still open: PyPDF2 reads page data lazily.
        output.write(outputStream)
    print(pages)


end_of_letter()
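Both scripts pass the search text to `re.search` as a regular expression. The question's string ends in ".", which in a regex matches any character, so unintended text can match. A small illustration using only the standard library; `contains_literal` is a hypothetical helper, not part of either script.

```python
import re


def contains_literal(page_text, needle):
    """True if needle occurs verbatim in page_text (regex metacharacters escaped)."""
    return re.search(re.escape(needle), page_text) is not None


search_text = "Text I am searching for."  # the string from the question

# Unescaped, "." matches ANY character, so "for!" also counts as a match.
print(re.search(search_text, "Text I am searching for!") is not None)  # True
# Escaped, only the exact text matches.
print(contains_literal("Text I am searching for!", search_text))        # False
print(contains_literal("Text I am searching for.", search_text))        # True
```

For a plain substring check, `search_text in text` gives the same result without `re`.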


...