Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I have a rather small set of images which contains dates. The size might be a problem, but I'd say that the quality is OK. I have followed the guidelines to provide the clearest image I can to the engine. After resizing, apply filters, lots of trial and error, etc. I came up with an image that is almost properly read. I put an example below:

tesseract test

Now, this is read as “9 MAR 2021 x0c. Not bad, but the first 2 is read as ". At this point I think I'm misusing part of the power of Tesseract. After all, I know what it should expect, i.e. something as "%d %b %Y".

Is there a way to tell Tesseract that it should try to find the best match given this strong constraint? Providing this metadata to the engine should heavily facilitate the task. I have been reading the documentation, but I can't find the way to do this.

I'm using pytesseract on Tesseract 4.1. with Pytyon 3.9.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
210 views
Welcome To Ask or Share your Answers For Others

1 Answer

You need to know the followings:

Now if we center the image (by adding borders):

  • enter image description here

  • We up-sample the image without losing any pixel.

Second, we need to make the characters in the image bold to make the OCR result accurate.

  • enter image description here

Now OCR:

29 MAR 2021

Code:

import cv2
import pytesseract

# Load the image
img = cv2.imread("xsGBK.jpg")

# Center the image
img = cv2.copyMakeBorder(img, 50, 50, 50, 50, cv2.BORDER_CONSTANT, value=[0, 0, 0])

# Convert to the gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Dilate
gry = cv2.dilate(gry, None, iterations=1)

# OCR
print(pytesseract.image_to_string(gry))

# Display
cv2.imshow("", gry)
cv2.waitKey(0)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...