python - Pytesseract OCR multiple config options

Question

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

I am having some problems with pytesseract. I need to configure Tesseract so that it recognizes single digits and accepts only numbers, because the digit zero is often confused with the letter 'O'.

Like this:

target = pytesseract.image_to_string(im,config='-psm 7',config='outputbase digits')

Question from: https://stackoverflow.com/questions/44619077/pytesseract-ocr-multiple-config-options

1 Reply

深蓝 · Answer 1 · 2021-10-06T17:30:07+0000

tesseract-4.0.0a supports the page segmentation modes (psm) listed below. For single-character recognition, set psm to 10. If your text consists of digits only, you can also set tessedit_char_whitelist=0123456789.

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
                        bypassing hacks that are Tesseract-specific.
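
Because image_to_string takes a single config argument, several options have to be combined into one space-separated string instead of being passed as repeated config= keywords. Below is a minimal sketch of a helper that assembles such a string. The name build_tesseract_config and its parameters are hypothetical and not part of pytesseract; the psm range check simply mirrors the table above.

```python
def build_tesseract_config(psm=None, oem=None, variables=None):
    """Join several Tesseract options into one config string (hypothetical helper)."""
    parts = []
    if psm is not None:
        # The table above lists page segmentation modes 0 through 13.
        if not 0 <= psm <= 13:
            raise ValueError(f"psm must be between 0 and 13, got {psm}")
        parts.append(f"--psm {psm}")
    if oem is not None:
        parts.append(f"--oem {oem}")
    # Each Tesseract variable is passed as its own "-c name=value" pair.
    for name, value in (variables or {}).items():
        parts.append(f"-c {name}={value}")
    return " ".join(parts)


config = build_tesseract_config(
    psm=10,
    oem=3,
    variables={"tessedit_char_whitelist": "0123456789"},
)
print(config)
```

The resulting string can be passed as the config argument of image_to_string. As far as I know, Tesseract 4 spells the flag --psm with two dashes, while the single-dash -psm form belongs to the 3.x command line.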

Here is a sample usage of image_to_string with multiple parameters.

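import pytesseract
# All Tesseract options go into one space-separated config string (the old boxes argument is dropped, since recent pytesseract releases no longer accept it).
target = pytesseract.image_to_string(image, lang='eng',
        config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')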
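
The string returned by image_to_string usually carries trailing whitespace, so it is worth validating before treating it as a digit. Here is a rough sketch under some assumptions: clean_digit and read_single_digit are hypothetical names, image is assumed to be something pytesseract accepts (for example a PIL image), and pytesseract is imported lazily so the cleaning step can be used on its own.

```python
DIGIT_CONFIG = '--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789'


def clean_digit(raw_text):
    """Return the single digit found in raw OCR output, or None otherwise."""
    text = raw_text.strip()  # drops surrounding whitespace such as newlines
    if len(text) == 1 and text in "0123456789":
        return text
    return None


def read_single_digit(image):
    """Run Tesseract in single-character mode and validate the result."""
    import pytesseract  # lazy import: only needed when OCR is actually run
    raw = pytesseract.image_to_string(image, lang='eng', config=DIGIT_CONFIG)
    return clean_digit(raw)


print(clean_digit("7\n"))   # a clean single digit is kept
print(clean_digit("O\n"))   # a letter is rejected
```

Returning None for anything that is not exactly one ASCII digit makes a misread such as 'O' visible to the caller instead of passing it through silently.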

Hope this helps.
