I am trying to use Tesseract (through pytesseract) to extract text from the image below:
text = pytesseract.image_to_string(image, config='-c preserve_interword_spaces=1 --psm 1 --oem 1')
Here is the result from Tesseract 4 OCR:
print(text)
Wrote Datastream application
e Used Kafka to get the accounts
As you can see, the bullet point in the image is converted to "e". I found several other bullet points in the document that were converted into single ASCII characters in the same way.
If anyone is familiar with this issue and has a solution, please let me know.
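Until recognition itself is fixed, one possible stopgap is to clean up the text after OCR. Below is a minimal post-processing sketch, assuming a misread bullet always shows up as a single character followed by spaces at the very start of a line. `normalize_bullets` and `MISREAD_BULLETS` are hypothetical names. Only "e" is confirmed by the output above. The other characters in the set are guesses and should be replaced with whatever your documents actually produce.

```python
import re

# Characters assumed to be misreadings of a round bullet glyph.
# "e" comes from the OCR output in the question; the rest are GUESSES.
MISREAD_BULLETS = "eo*«»•·©®¢"

# Start of a line (MULTILINE), optional indentation, one suspect character,
# at least one space or tab, then real (non-space) text.
_BULLET_RE = re.compile(
    r"^([ \t]*)[" + re.escape(MISREAD_BULLETS) + r"][ \t]+(?=\S)",
    re.MULTILINE,
)


def normalize_bullets(text: str, marker: str = "- ") -> str:
    """Replace a lone misread bullet character at the start of each line with `marker`.

    Everything else, including newlines and Tesseract's trailing form feed, is left untouched.
    """
    # A function replacement keeps `marker` literal (no backreference parsing).
    return _BULLET_RE.sub(lambda m: m.group(1) + marker, text)


if __name__ == "__main__":
    ocr_text = "Wrote Datastream application\ne Used Kafka to get the accounts\n"
    print(normalize_bullets(ocr_text))
```

In practice it would wrap the existing call, e.g. `normalize_bullets(pytesseract.image_to_string(image, config=...))`. This is a heuristic: a line that legitimately begins with a lone "e" or "o" followed by a space would also be rewritten, so the character set should be kept as small as your documents allow.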