Hi, I am using Ubuntu 8.04. I want to scan documents and OCR them into Office documents so that I don't have to type all the text. But XSane and Kooka convert to .txt files with junk characters, and many times the .txt file is blank. How can I convert directly into Office Doc files?
It used to convert in WinXP directly into MS Doc with no or very few junk characters, retaining the same format.
Can you suggest any good software for this on Ubuntu?
Have you tried tesseract?
http://code.google.com/p/tesseract-ocr/
It's in the Ubuntu archive, at least in 8.10.
shirish wrote:
Have you tried tesseract?
http://code.google.com/p/tesseract-ocr/
It's in the Ubuntu archive, at least in 8.10.
Thanks for the link. I installed it but didn't know how to use it, as Google's page does not have a howto. I found this link for the howto:
http://www.howtoforge.com/ocr_with_tesseract_on_ubuntu704
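For anyone following along, here is a minimal sketch of the basic tesseract invocation, wrapped in Python. It assumes the `tesseract` binary is on the PATH and that the scan is saved in an image format your tesseract version can read (older releases expect TIFF). The file names `scan.tif` and `scan_text` and the helper function names are made up for illustration.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract <image> <outputbase> -l <lang>.

    tesseract appends ".txt" to output_base on its own.
    """
    return ["tesseract", image_path, output_base, "-l", lang]


def ocr_image(image_path, output_base, lang="eng"):
    """Run tesseract if it is installed.

    Returns the path of the text file, or None when tesseract is not found.
    """
    if shutil.which("tesseract") is None:
        return None
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return output_base + ".txt"


if __name__ == "__main__":
    # Hypothetical file names; this only prints the command that would be run.
    print(" ".join(build_tesseract_command("scan.tif", "scan_text")))
```

Calling `ocr_image("scan.tif", "scan_text")` should leave the recognised text in `scan_text.txt`. The result is plain text only, so layout and tables are not preserved.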
Hi Rony, the package is in Intrepid at least; I don't know whether it was in Hardy. I do not have a scanner, hence can't test it, but I have heard some nice things about it.
If it's not in the main repository for Hardy then it may well be in somebody's PPA.
I would be interested to know what you make of it.
shirish wrote:
Hi Rony, the package is in Intrepid at least; I don't know whether it was in Hardy. I do not have a scanner, hence can't test it, but I have heard some nice things about it.
If it's not in the main repository for Hardy then it may well be in somebody's PPA.
I would be interested to know what you make of it.
I found it in the apt packages of Etch. My scanner is Windows-only, so the most I may get working is converting a scanned JPEG into text.
IPS Khurana wrote:
Hi, I am using Ubuntu 8.04. I want to scan documents and OCR them into Office documents so that I don't have to type all the text. But XSane and Kooka convert to .txt files with junk characters, and many times the .txt file is blank. How can I convert directly into Office Doc files?
I am not sure if there is an open source OCR engine that will retain the formatting and table structures. gocr is a good tool, and I believe it is used by the software you mentioned.
It used to convert in WinXP directly into MS Doc with no or very few junk characters, retaining the same format.
If you already have a good OCR program, why not run the software you already have under Wine?
Can you suggest any good software for this on Ubuntu?
IPS Khurana wrote:
Hi, I am using Ubuntu 8.04. I want to scan documents and OCR them into Office documents so that I don't have to type all the text. But XSane and Kooka convert to .txt files with junk characters, and many times the .txt file is blank. How can I convert directly into Office Doc files?
In GNU/Linux you convert to ODT format. Then let OpenOffice.org (OO) convert it to doc format if necessary.
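A rough sketch of how that two-step conversion could be scripted, in Python. This is an assumption on my part, not something tested on 8.04: it relies on a `soffice` binary that supports the `--headless --convert-to` options (as LibreOffice does), which the OpenOffice.org shipped with older releases may not. If yours does not, opening the file in Writer and using Save As does the same job. The file names and helper names are hypothetical.

```python
import os
import shutil
import subprocess


def build_convert_command(input_path, target_format, out_dir):
    """Return the argument list for a headless soffice conversion."""
    return ["soffice", "--headless", "--convert-to", target_format,
            "--outdir", out_dir, input_path]


def converted_path(input_path, target_format, out_dir):
    """Path where soffice is expected to write the converted file."""
    base = os.path.splitext(os.path.basename(input_path))[0]
    return os.path.join(out_dir, base + "." + target_format)


def convert(input_path, target_format, out_dir="."):
    """Convert a document if soffice is installed; return the new path or None."""
    if shutil.which("soffice") is None:
        return None
    subprocess.run(build_convert_command(input_path, target_format, out_dir), check=True)
    return converted_path(input_path, target_format, out_dir)


if __name__ == "__main__":
    # Hypothetical OCR output file: text to ODT first, then ODT to doc if needed.
    odt = convert("scan_text.txt", "odt")
    if odt is not None:
        convert(odt, "doc")
```

Since the OCR output is plain text, the resulting ODT or doc file contains the text only; the original page formatting is not recovered by this step.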