Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

javascript - Determine Text Orientation in a PDF

Is there a way to detect whether the text on a PDF page is in landscape or portrait orientation, using JavaScript or any library? I cannot rely on width > height, because some pages are also rotated. For example, compare a rotated page with portrait-oriented text against a rotated page with landscape-oriented text.

Comparing width and height does not help, and neither does checking whether the page is rotated: both of these pages are rotated 90 degrees, and I cannot figure out how to detect the orientation of the text itself.

I also do some preprocessing on the PDF using Node.js and pdfjs, so if either of those has an API or library that can give me this information, I would appreciate the help.



1 Reply


You can do this with Tesseract, which is mainly used for OCR. I use it with PHP, but you can also use it from JavaScript: https://ourcodeworld.com/articles/read/580/how-to-convert-images-to-text-with-pure-javascript-using-tesseract-js

Tesseract can detect orientation. Here is some information on doing it from Python, in the question "Is it possible to check orientation of an image before passing it through pytesseract ocr module".
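To make the idea concrete, here is a minimal sketch of reading Tesseract's orientation and script detection (OSD) report from JavaScript. It assumes the page has already been rendered to an image and that the report text was obtained from the Tesseract command line in OSD mode (for example `tesseract page.png stdout --psm 0`). The function names `parseOsd` and `isTextSideways` are hypothetical helpers for illustration, not part of any library.

```javascript
// Parse the plain-text report printed by Tesseract's OSD mode.
// Assumption: the report uses "Key: value" lines such as
// "Orientation in degrees: 270" and "Orientation confidence: 21.27".
function parseOsd(report) {
  const fields = {};
  for (const line of report.split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip lines that are not "Key: value"
    fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return {
    orientationDegrees: Number(fields["Orientation in degrees"]),
    rotateDegrees: Number(fields["Rotate"]),
    confidence: Number(fields["Orientation confidence"]),
  };
}

// Text is "sideways" in the rendered image when it is turned by 90 or 270 degrees.
function isTextSideways(osd) {
  return osd.orientationDegrees % 180 === 90;
}

// Example report in the assumed format.
const sampleReport = [
  "Page number: 0",
  "Orientation in degrees: 270",
  "Rotate: 90",
  "Orientation confidence: 21.27",
  "Script: Latin",
  "Script confidence: 4.14",
].join("\n");

const osd = parseOsd(sampleReport);
console.log(osd.orientationDegrees, isTextSideways(osd));
```

A low orientation confidence usually means the page has too little text to decide, so it is probably worth treating such pages as undetermined rather than trusting the angle.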

All you would need to do is adapt this to JavaScript, using the tool from the first link above.
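Since the question mentions pdfjs, a possible alternative that avoids OCR is to look at the text items themselves. This only works when the PDF has a real text layer; scanned pages would still need Tesseract. The sketch below assumes each item has a `str` and a 6-element `transform` array `[a, b, c, d, e, f]`, as in the items returned by pdf.js `page.getTextContent()`, and that `pageRotation` is the page's rotation in degrees (`page.rotate`). The function name `dominantTextDirection` is hypothetical.

```javascript
// Classify how most of the text runs on the displayed page:
// "horizontal" (left-right or upside down) or "vertical" (turned 90/270 degrees).
// Assumption: items look like pdf.js text items ({ str, transform: [a, b, c, d, e, f] }).
function dominantTextDirection(items, pageRotation = 0) {
  const weight = { horizontal: 0, vertical: 0 };
  for (const item of items) {
    if (!item.str || !item.str.trim()) continue; // ignore whitespace-only items
    const [a, b] = item.transform;
    // Angle of the text baseline in page space, snapped to a multiple of 90.
    const textAngle = Math.round((Math.atan2(b, a) * 180) / Math.PI / 90) * 90;
    // Combine with the page rotation; modulo 180 the sign convention does not matter.
    const shown = (((textAngle + pageRotation) % 180) + 180) % 180;
    // Weight by character count so long paragraphs outvote stray labels.
    weight[shown === 90 ? "vertical" : "horizontal"] += item.str.length;
  }
  if (weight.horizontal === 0 && weight.vertical === 0) return "unknown";
  return weight.vertical > weight.horizontal ? "vertical" : "horizontal";
}

// Usage with hand-written items: upright text on a page rotated by 90 degrees.
const items = [{ str: "Hello world", transform: [12, 0, 0, 12, 50, 700] }];
console.log(dominantTextDirection(items, 90));
```

With that direction in hand, you can compare it against the displayed width and height of the page to decide whether the text runs along the long side or the short side.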

