I propose to purchase a scanner and to use it, among other things, for reading the business cards of contacts to build a database. My queries:
1. Can I assume that any scanner can produce PDF files, and that these can be OCRed (using the pdfocr script, which uses Cuneiform)?
2. Are there any scanners I should avoid?
3. Any experience with, or contra-indication against, the Canon LiDE 100?
4. Is there any application that reads business cards and intelligently (or interactively) gives parsed data from them?
5. Any recommendation for a FOSS application for building a contact database using scanner/OCR/manual effort?
jitendra
On Fri, Dec 31, 2010 at 10:17 AM, jitendra jituviju@gmail.com wrote:
I propose to purchase a scanner and to use it, among other things, for reading the business cards of contacts to build a database. My queries:
1. Can I assume that any scanner can produce PDF files, and that these can be OCRed (using the pdfocr script, which uses Cuneiform)?
You can choose PDF output in xsane; I have no idea whether your script can OCR it.
I am using an Epson Perfection V30 in my office. You will need to download the Linux drivers from Epson's site. The download also includes two minimalist scanner apps that produce output in JPEG, PDF and PNG formats; the output is good enough for my use.
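For the scan-then-OCR question, a rough sketch of the pipeline might look like the following. It skips the PDF step and feeds a scanned image straight to Cuneiform, so it is not what the pdfocr script does. Everything here is an assumption rather than something tested in this thread: that scanimage (from SANE) and the cuneiform command are installed, that your backend accepts --resolution, and that your Cuneiform build can read TIFF (otherwise convert the image to BMP first). The function name scan_and_ocr and the DRY_RUN switch are made up for illustration.

```shell
#!/bin/sh
# Sketch only: scan one page with SANE and OCR it with Cuneiform.
# Assumptions (not from the thread): scanimage and cuneiform are installed,
# the backend accepts --resolution, and this Cuneiform build can read TIFF.

scan_and_ocr() {
    base="$1"    # hypothetical output name, without extension
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: only print the commands that would be executed.
        echo "scanimage --format=tiff --resolution 300 > $base.tiff"
        echo "cuneiform -l eng -f text -o $base.txt $base.tiff"
        return 0
    fi
    # 300 dpi is a common starting point for OCR of small print.
    scanimage --format=tiff --resolution 300 > "$base.tiff" || return 1
    # -l sets the language, -f the output format, -o the output file.
    cuneiform -l eng -f text -o "$base.txt" "$base.tiff"
}
```

Running DRY_RUN=1 scan_and_ocr card only prints the two commands; scan_and_ocr card would write card.tiff and card.txt in the current directory.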
2. Are there any scanners I should avoid?
See below.
3. Any experience with, or contra-indication against, the Canon LiDE 100?
About four months ago I installed a unit at a client site. It did not support Linux, and at that time Canon had no Linux drivers for it on their website.
From similar queries on other forums, I have read that HP scanners work with Linux. Do your own research on model numbers etc.
4. Is there any application that reads business cards and intelligently (or interactively) gives parsed data from them?
5. Any recommendation for a FOSS application for building a contact database using scanner/OCR/manual effort?
I am also interested in the above two. Please do post your findings when you find something useful.
HTH, -- Arun Khan
Thanks Arun,
I found this on the web:
Support has recently been added for this scanner. You have to download the latest source code and compile it. Below are the instructions, copied from a Shutter4U post.
To get this working, here are the steps to take:
1) You need some USB libraries, so, in a terminal type:
sudo apt-get install libusb-dev build-essential libsane-dev
2) To get the sane backends from git you need git-core. If you don't already have it, type this (also in a terminal):
sudo apt-get install git-core
3) Now use the git that was just installed to get the sane backends using the following command:
git clone git://git.debian.org/sane/sane-backends.git
That downloads the backends and puts them in a folder called sane-backends inside the current directory (your home folder, if that is where the terminal opened).
4) Change directory into the new sane-backends folder and compile them:
cd sane-backends
./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
make    # this one takes a while
sudo make install
Now everything is installed, but you still won't be able to scan (except as root) until you set up some permissions.
5) You need to edit a file, but you need to be root to edit it, so:
sudo gedit /lib/udev/rules.d/40-libsane.rules
and add the following 2 lines:
# Canon CanoScan Lide 100
ATTRS{idVendor}=="04a9", ATTRS{idProduct}=="1904", ENV{libsane_matched}="yes"
save the file, exit gedit, exit terminal, reboot, and...
SCAN AWAY!
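If scanning still fails for a normal user after these steps, it may help to confirm that the rule really landed in the rules file. The snippet below is a minimal sketch of such a check; the helper name has_lide100_rule is hypothetical, and the rules path is the one given in the instructions, which may differ on other distributions. The commented commands at the end are the usual ways to reload udev rules and to list the devices SANE can see; whether reloading is enough to avoid the reboot on your system is an assumption.

```shell
#!/bin/sh
# Sketch: check that the LiDE 100 rule (USB ID 04a9:1904) is in a udev rules file.

has_lide100_rule() {
    # -F: fixed string, so the braces and quotes are matched literally.
    grep -qF 'ATTRS{idVendor}=="04a9", ATTRS{idProduct}=="1904"' "$1"
}

rules=/lib/udev/rules.d/40-libsane.rules
if [ -r "$rules" ] && has_lide100_rule "$rules"; then
    echo "rule found in $rules"
else
    echo "rule not found in $rules"
fi

# Then, with the scanner plugged in (not run here):
#   sudo udevadm control --reload-rules   # may make the reboot unnecessary
#   scanimage -L                          # lists the devices SANE can see
```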
On Android phones there are several business card apps that you can use. I use CamCard to take photos of business cards. The application then runs OCR and files the card away in the contacts, with phone, mobile and fax numbers all appearing in the correct fields. Some editing is required but, by and large, it does a good job. It also stores the card as an image, so one can review it and correct details at a later date. I found it a very useful application.
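On the desktop side, a very crude version of the "correct fields" step can be sketched with grep on the OCR text. This is only an illustration of the idea and certainly not how CamCard works: it assumes the OCR output is a plain text file, that your grep supports -o (GNU and BSD grep do), and that the first email-like and phone-like strings on the card are the right ones. The function name extract_contact is made up.

```shell
#!/bin/sh
# Sketch: pull the first email address and phone number out of OCR text.

extract_contact() {
    # First thing that looks like an email address.
    email=$(grep -Eo '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' "$1" | head -n 1)
    # First run of 8 or more digits, allowing spaces, dashes and a leading +.
    phone=$(grep -Eo '\+?[0-9][0-9 -]{6,}[0-9]' "$1" | head -n 1)
    printf 'email=%s\nphone=%s\n' "$email" "$phone"
}
```

Calling extract_contact card.txt prints one email= line and one phone= line, which could then be appended to a CSV file or checked by hand. Names, addresses and job titles are much harder to pick out this way, so manual correction would still be needed.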
Venky
On Friday 31 December 2010 10:17 AM, jitendra wrote:
I propose to purchase a scanner and to use it, among other things, for reading the business cards of contacts to build a database. My queries:
1. Can I assume that any scanner can produce PDF files, and that these can be OCRed (using the pdfocr script, which uses Cuneiform)?
2. Are there any scanners I should avoid?
3. Any experience with, or contra-indication against, the Canon LiDE 100?
4. Is there any application that reads business cards and intelligently (or interactively) gives parsed data from them?
5. Any recommendation for a FOSS application for building a contact database using scanner/OCR/manual effort?
Avoid Canon scanners, as they do not have proper Linux drivers. Plus, they draw power from the USB port itself, which is not ideal and can make them unusable with older motherboards. HP now supports its printers and scanners well, and its HPLIP software is good. Just go through the list of supported devices before buying.