Besides htmldoc, is there any other way to convert all HTML documents in a folder recursively to PDF files?
I need to convert HTML to PDF in some automatic way (script-based, GUI, etc.). Typically, websites downloaded for offline reading consist of HTML documents, and these documents are sometimes many pages long. Since there is no way to remember how much of an HTML file was read before stopping for the day, I need to convert the HTML files to PDF, so that the reading done up to a particular day in a particular PDF file can be noted.
Thanks for any help.
Kussh
On Wed, Jun 23, 2010 at 7:45 AM, Kussh Singh kussh.singh@gmail.com wrote:
I have used html2text (the package is available in the Ubuntu repos, at least), but I can't find a text-to-PDF converter. Maybe you could mark how far you have read by inserting a line of whatever character you choose (@, _, -, etc.) into the text file?
Regards,
Easwar
Registered Linux user #442065
On Wed, Jun 23, 2010 at 9:22 AM, Easwar Hariharan meindian523@gmail.com wrote:
Forgot to mention that html2text is a CLI tool, the man page is good, and I've done batch processing with it (just wrapped it in a bash loop).
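A minimal sketch of such a loop (assuming GNU find and bash; html2text writes to stdout, so each result is redirected to a .txt file next to its source):

  #!/bin/bash
  # Recursively convert every .html file under the current directory
  # to a .txt file alongside its source, using the html2text CLI.
  find . -type f -name '*.html' -print0 |
  while IFS= read -r -d '' f; do
      html2text "$f" > "${f%.html}.txt"
  done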
On Wed, Jun 23, 2010 at 7:45 AM, Kussh Singh kussh.singh@gmail.com wrote:
You can try http://pypi.python.org/pypi/pisa/ but it requires some reading to customize it to the type of output you expect. Some Python scripting is required, I think.
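For what it's worth, a minimal sketch of a single-file conversion, assuming pisa installs a command-line entry point named pisa (the script name is an assumption; check what your version provides, as later releases of the project ship it as xhtml2pdf):

  # Convert one HTML file to PDF via pisa's CLI wrapper.
  # The 'pisa' command name is an assumption -- newer releases
  # install the script as xhtml2pdf instead.
  pisa input.html output.pdf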
If it is a standalone web page, there are some free online services that convert the page to PDF on the fly. One of them is http://html-pdf-converter.com/ but batch processing is not possible there.
Raghu
Why the cc to bom-lug linuxers@mm.glug-bom.org?
I receive a double dose of your mails ;-).
I had a similar requirement in my last project and used Qt to do it.
Check these links: http://bharatikunal.wordpress.com/2010/01/31/converting-html-to-pdf-with-pyt... http://bharatikunal.wordpress.com/2010/02/01/converting-html-to-pdf-with-jav...
I've used something called wkhtmltopdf with much success: http://code.google.com/p/wkhtmltopdf/ or just apt-get install wkhtmltopdf.
Usage is on the command line, like wkhtmltopdf <path-to-html-file-or-url> output.pdf, and there are many parameters you can tweak for quality, size, etc. Of course, it's easy to put this in a bash script to loop over all the HTML files in a folder.
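A minimal sketch of such a script (assuming GNU find and bash, with the single-file invocation shown above):

  #!/bin/bash
  # Recursively render every .html file under the given directory
  # (default: the current directory) to a PDF next to its source.
  find "${1:-.}" -type f -name '*.html' -print0 |
  while IFS= read -r -d '' f; do
      wkhtmltopdf "$f" "${f%.html}.pdf"
  done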
This tool uses the WebKit rendering engine to render the HTML page and then converts that snapshot to a PDF. In my experience, it has rendered even quite complicated page layouts to PDF flawlessly, which other HTML-to-PDF converters seem not to do. Basically, if a page looks fine in a WebKit-based browser, wkhtmltopdf is likely to output the PDF exactly the way the page looks in the browser.
I think the version in the repositories requires the machine to have X installed, but if you install the newest version from the Google Code page, it can run on a server without X.
Best of luck, Sanjay
Hi,
On 06/23/2010 07:45 AM, Kussh Singh wrote:
BTW, not directly related to your question, but if the basic aim is to easily read websites/HTML docs which you have downloaded, I would suggest using a proper ebook reader for the job. I highly recommend FBReader: http://www.fbreader.org/about.php (it supports a lot of formats, you can set bookmarks to remember where to start from next time, and it can also read from compressed archives: gz/bz2/tar.gz/zip).
cheers, - steve
On Thu, Jun 24, 2010 at 12:54 PM, steve steve@lonetwin.net wrote:
Hi Steve,
Thanks for the suggestion. I had earlier tried FBReader for CHM books but finally opted for xchm. I will now try reading the downloaded websites via FBReader using bookmarks, though I still feel the numbered pages in a PDF file give a better idea of how much was read per day.
Cheers, Kussh