I have a rather small set of images which contains dates. The size might be a problem, but I'd say that the quality is OK. I have followed the guidelines to provide the clearest image I can to the engine. After resizing, apply filters, lots of trial and error, etc. I came up with an image that is almost properly read. I put an example below:
Now, this is read as “9 MAR 2021
x0c
. Not bad, but the first 2
is read as "
. At this point I think I'm misusing part of the power of Tesseract. After all, I know what it should expect, i.e. something as "%d %b %Y"
.
Is there a way to tell Tesseract that it should try to find the best match given this strong constraint? Providing this metadata to the engine should heavily facilitate the task. I have been reading the documentation, but I can't find the way to do this.
I'm using pytesseract
on Tesseract 4.1. with Pytyon 3.9.