Text Extraction from Images and PDFs Using Python and Tesseract

Text extraction from both images and PDFs in Python using the Tesseract OCR engine (accessed through pytesseract).

Description:

Extracts text from both images (.png and .jpg) and PDFs using the Tesseract OCR engine. The project can be modified further to extract specific text from an image or PDF.
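OCR output is plain text, so one way to extract particular text is to filter it with a regular expression. The following is a minimal sketch, not part of the project: the name extract_matches and the e-mail pattern are hypothetical examples of what "particular text" might mean.

```python
import re

# Hypothetical example pattern: simple e-mail addresses. Swap in whatever
# pattern matches the text you actually need (invoice numbers, dates, ...).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def extract_matches(ocr_text: str, pattern: re.Pattern = EMAIL_PATTERN) -> list[str]:
    """Return every substring of ocr_text that matches pattern, in order of appearance."""
    return pattern.findall(ocr_text)


if __name__ == "__main__":
    sample = "Contact: jane.doe@example.com\nInvoice No. 1042\nsupport@example.org"
    print(extract_matches(sample))
```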

Requirements:

Tesseract OCR:

Installation details for Tesseract: https://github.com/tesseract-ocr/tesseract/wiki#windows

(The Tesseract installation folder must be added to the PATH environment variable.)
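A quick way to confirm that the PATH setting worked is to look the executable up from Python. This is a minimal sketch using only the standard library's shutil.which; the helper names find_tesseract and configure_pytesseract are hypothetical. As an alternative to editing PATH, pytesseract lets you point it at the executable through its pytesseract.pytesseract.tesseract_cmd attribute, which the second helper sets.

```python
import shutil
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path to the tesseract executable if it is on PATH, else None."""
    return shutil.which("tesseract")


def configure_pytesseract(explicit_path: Optional[str] = None) -> str:
    """Hypothetical helper: point pytesseract at tesseract.

    Uses explicit_path if given, otherwise falls back to a PATH lookup.
    Raises FileNotFoundError if neither yields an executable.
    """
    path = explicit_path or find_tesseract()
    if path is None:
        raise FileNotFoundError(
            "tesseract not found: add its install folder to PATH or pass explicit_path"
        )
    import pytesseract  # deferred so find_tesseract() works without pytesseract installed

    pytesseract.pytesseract.tesseract_cmd = path
    return path


if __name__ == "__main__":
    print(find_tesseract() or "tesseract is not on PATH")
```

On Windows, a call such as configure_pytesseract(r"C:\Program Files\Tesseract-OCR\tesseract.exe") may work, but that install location is an assumption; use whatever folder the installer actually chose.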

Pytesseract:

pip install pytesseract

PyMuPDF:

pip install PyMuPDF

Pillow:

pip install Pillow
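Note that the import names differ from the pip package names: PyMuPDF is imported as fitz and Pillow as PIL. The following minimal sketch checks that all three packages can be imported; the helper name missing_packages is hypothetical.

```python
import importlib.util

# Maps pip package name -> module name used in `import` statements.
REQUIRED_MODULES = {
    "pytesseract": "pytesseract",
    "PyMuPDF": "fitz",
    "Pillow": "PIL",
}


def missing_packages(required: dict[str, str] = REQUIRED_MODULES) -> list[str]:
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All Python dependencies are installed.")
```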

Usage:

Open a command prompt (terminal) in the project folder and run:

python text_extractor.py --file path_to_file

For example: python text_extractor.py --file test.pdf
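The project's own text_extractor.py is not shown here. The following is a minimal sketch of how a script like it could handle the --file option. It assumes that images are passed directly to pytesseract.image_to_string and that each PDF page is first rendered to an image with PyMuPDF and then OCR'd. The helper names (file_kind, text_from_image, text_from_pdf) and the 300 dpi default are assumptions.

```python
import argparse
from pathlib import Path
from typing import Optional

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def file_kind(path: str) -> str:
    """Classify a file as 'image' or 'pdf' from its extension."""
    suffix = Path(path).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return "image"
    if suffix == ".pdf":
        return "pdf"
    raise ValueError(f"Unsupported file type: {suffix or '(no extension)'}")


def text_from_image(path: str) -> str:
    """OCR a single image file."""
    # Third-party imports are deferred so the module loads without them.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img)


def text_from_pdf(path: str, dpi: int = 300) -> str:
    """Render every PDF page to an RGB image and OCR it page by page."""
    import fitz  # PyMuPDF
    import pytesseract
    from PIL import Image

    pages = []
    with fitz.open(path) as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)  # RGB, no alpha channel by default
            img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
            pages.append(pytesseract.image_to_string(img))
    return "\n".join(pages)


def main(argv: Optional[list[str]] = None) -> None:
    parser = argparse.ArgumentParser(
        description="Extract text from an image (.png/.jpg) or a PDF using Tesseract OCR."
    )
    parser.add_argument("--file", help="path to the image or PDF to read")
    args = parser.parse_args(argv)
    if args.file is None:
        parser.print_help()  # nothing to do without an input file
        return
    if file_kind(args.file) == "image":
        print(text_from_image(args.file))
    else:
        print(text_from_pdf(args.file))


if __name__ == "__main__":
    main()
```

Rendering each page and running OCR is what makes scanned PDFs readable. For PDFs that already contain a text layer, PyMuPDF's page.get_text() returns that text directly without OCR.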
