Hello All,
I am trying to read a .PNG image and need to extract the text from it.
I used the tesseract and magick packages, but the output is weird. Please find the source and the output below, and let me know what should be done to fix it.
tesseract::ocr("C:/Users/Prasanna.Mathivanan1/Desktop/Image_Processing/git.PNG", engine = tesseract("eng"), HOCR = FALSE)
[1] "ANU NT UCB PRU es ele re Ea Lee UCB) a Led ELLOS Lose ee\nAOU MICuEe\n\nALIJHU, Bala\n\nTee]\n\nTse ee)\n"
OCR is always going to be somewhat prone to errors, but the tesseract library and its R bindings provide various methods you can try to improve your results.
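As a starting point, here is a minimal sketch of the kind of preprocessing that often helps with terminal screenshots, using the magick functions shown in the tesseract package vignette. The helper name `extract_terminal_text`, the resize width, and the colour inversion (which assumes the screenshot is light text on a dark background) are illustrative assumptions rather than anything taken from your setup, so treat the values as things to experiment with.

```r
# Hypothetical helper: the name and the default values are illustrative only.
extract_terminal_text <- function(path, invert = TRUE, width = "2000x") {
  img <- magick::image_read(path)
  # Upscale to the given width; small screenshot fonts are a common cause of garbled output.
  img <- magick::image_resize(img, width)
  # Drop colour information so only brightness remains.
  img <- magick::image_convert(img, type = "Grayscale")
  # Assumption: light text on a dark background, so invert to dark text on light.
  if (invert) img <- magick::image_negate(img)
  # Hand the processed image to tesseract as in-memory PNG data.
  png_data <- magick::image_write(img, format = "png", density = "300x300")
  tesseract::ocr(png_data, engine = tesseract::tesseract("eng"))
}

# Usage, with the path from the original post:
# text <- extract_terminal_text("C:/Users/Prasanna.Mathivanan1/Desktop/Image_Processing/git.PNG")
# cat(text)
```

If the result is still poor, it may be worth trying `invert = FALSE` or a different width before changing anything else, since either step can hurt as well as help depending on the image.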
Maybe it's worth backing up a step.
Must you create a solution involving OCR?
If you are capturing the content of a DOS terminal, there are ways to capture it that don't involve images. Perhaps you can describe your underlying use case and requirements so we can see whether something more effective than relying on OCR is available?
My requirement is that the client will provide a screenshot, and I need to retrieve the text shown in that image.
The client will provide the image from a DOS terminal, from PuTTY, or of the output produced by running a script.
That's why I opted for OCR to fulfill my requirement.
Is this about logging the client's activity? There are better ways to transmit a log of output than OCR.
When a script is run from the command line, you can always redirect its output to a file; emailing that file will reproduce the script's output exactly, with no need to read an image.
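In a shell this is usually just `command > output.txt`. Since this is an R forum, here is a rough equivalent in R using `system2()`, which can send a command's standard output to a file. The command being run (`Rscript -e "cat(42)"`) and the file name are stand-ins chosen for illustration; the client's real script would go in their place.

```r
# Where the captured output will be written (illustrative location and name).
log_file <- file.path(tempdir(), "run_output.txt")

# Stand-in command: run a tiny R expression through Rscript.
rscript <- file.path(R.home("bin"), "Rscript")

# stdout = <file name> redirects the command's standard output into that file.
status <- system2(rscript, args = c("-e", shQuote("cat(42)")), stdout = log_file)

# The file now holds exactly what the command printed; no image is involved.
captured <- readLines(log_file, warn = FALSE)
print(captured)
```

The resulting text file can be attached to an email or copied into a log document as-is.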
The client logs his activity and sends the picture.
We are supposed to retrieve the text from the image and add it to our log document.
I'm sorry to say, this approach sounds like a mistake. It's very common for people to run a process, log the result, and send both the outcome and the log. It is not standard practice to use images and OCR for this. Please consider alternatives, for your own sake.
The client is not going to change his approach.
They will provide the image from a DOS terminal; I need to retrieve the text from the image and provide it to another client.
My requirement has to be fulfilled, so I am open to anything.
In that case, I think you're going to need to take a deep dive into the tesseract documentation I linked to earlier. The R package is just a binding to the tesseract library, so I'd go right to the source.
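For reference, the R binding exposes the engine's control parameters through the `options` argument of `tesseract()`, and `tesseract_params()` lists what is available. A rough sketch, assuming the screenshot can be treated as a single uniform block of text (page segmentation mode 6); the helper name `make_terminal_engine` and the chosen mode are assumptions for illustration and have not been tested against your images:

```r
# Hypothetical helper: builds an English engine with a custom page segmentation mode.
make_terminal_engine <- function(psm = 6) {
  tesseract::tesseract(
    language = "eng",
    # Parameter values are passed as strings; 6 = assume a single uniform block of text.
    options = list(tessedit_pageseg_mode = as.character(psm))
  )
}

# Usage:
# tesseract::tesseract_params("pageseg")   # browse the related parameters
# engine <- make_terminal_engine()
# tesseract::ocr("C:/Users/Prasanna.Mathivanan1/Desktop/Image_Processing/git.PNG", engine = engine)
```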
You might look into customizing your configuration to deal with the specific input. I've never done this, but it's the