VOGONS


First post, by Kerr Avon

Rank Oldbie

After hating the idea of e-book readers (to be fair, the first couple I tried were slow and had poor picture quality), I've spent the last few years using them extensively and they are great, so I'm trying to get all of my old books into e-book format. Anyway, there are some that I can't find online, so I want to scan them in, and convert them via OCR, but many of the books won't open enough so that all of the page lies flush on the scanner, so is there any solution other than manually removing each page and scanning them that way? The books aren't valuable or anything (or even rare; they're some science fiction books and a few books of collected essays), but I'd rather not destroy them if possible.

To clarify, the problem isn't that my (flatbed) scanner won't scan, it's that the books themselves won't physically open enough so that the two pages both lie flat on the lens (or whatever you call the glass sheet) of the scanner. The pages curve back into the spine of the book, and so the text near the spine isn't scanned properly. I can solve this by removing each page from the book and then scanning each individual page, but then I'm left with just the pages, and not the book anymore. This has to be a problem for lots of people when scanning books, so I was hoping that there was a solution that would not involve physically ripping the pages out of the books.

All but one of the books are paperback, and none of them has a two-page spread that's larger than A4.

Reply 2 of 14, by Tertz

Rank Oldbie
Kerr Avon wrote:

so I want to scan them in, and convert them via OCR

Keep the books in image format too (>=200 dpi, 8-bit grey, 24-bit for colour parts; scan the covers too), as OCR makes many mistakes and it is almost impossible to remove them all. A 200 MB PDF is not a problem to store and transfer today, while losing or distorting information is very bad. Lines of text should be straight along their whole length, and all text and images should be clearly readable; you'll need to check every page after scanning, and rescan if there are issues. The page numbers in the PDF should match the numbers printed on the pages.
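To put rough numbers on those settings, here is a minimal sketch that estimates the uncompressed size of a scanned book. The page dimensions (4.25 x 7 inches) and the 300-page count are assumptions picked for illustration, not figures from this thread, and the function name is made up. Real PDFs compress the images, so the actual file will usually be smaller.

```python
def raw_scan_bytes(pages, width_in, height_in, dpi=200, bits_per_pixel=8):
    """Uncompressed size in bytes of `pages` scanned page images.

    width_in / height_in are the page dimensions in inches.
    bits_per_pixel: 8 for greyscale, 24 for colour.
    """
    pixels_per_page = round(width_in * dpi) * round(height_in * dpi)
    return pages * pixels_per_page * bits_per_pixel // 8


if __name__ == "__main__":
    # Assumed example: a 300-page paperback, 4.25 x 7 inch pages.
    grey = raw_scan_bytes(300, 4.25, 7.0, dpi=200, bits_per_pixel=8)
    colour = raw_scan_bytes(300, 4.25, 7.0, dpi=200, bits_per_pixel=24)
    print(f"8-bit grey, uncompressed:    {grey / 1e6:.0f} MB")
    print(f"24-bit colour, uncompressed: {colour / 1e6:.0f} MB")
```

Doubling the dpi quadruples the pixel count, so 400 dpi costs four times as much space as 200 dpi before compression.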

but many of the books won't open enough so that all of the page lies flush on the scanner

The most important thing is to get a good-quality digital version. In general, you won't use the paper copy later.

so is there any solution other than manually removing each page and scanning them that way

Open the book as far as it will go [a picture of unfolding sequence: II \/ -- /\ ], and then hold the pages down by hand during scanning so that they are pressed fully against the glass. You may break the book's binding a little, but the digital version is the only thing you should care about.

so I was hoping that there was a solution that would not involve physically ripping the pages out of the books

You don't have to remove every page, as I've described above, but you'll still risk some damage to the books.

DOSBox CPU Benchmark
Yamaha YMF7x4 Guide

Reply 3 of 14, by Lo Wang

Rank Member

I'd much rather use a good camera mounted on a tripod for that. You'll be able to cover every detail without distortion and any OCR software should be able to pick it all up.

"That if thou shalt confess with thy mouth the Lord Jesus, and shalt believe in thine heart that God hath raised him from the dead, thou shalt be saved" - Romans 10:9

Reply 4 of 14, by Tertz

Rank Oldbie
Lo Wang wrote:

I'd much rather use a good camera mounted on a tripod for that.

Camera, good light, tripod. But as the pages will not be pressed against glass, the text will not be straight. And it would be significantly harder (try to "scan" a 500-page book that way, or several of them). With more effort you'll get lower quality.
The Plustek OpticBook seems to be the best choice, if $300 is not a serious problem for such fun.


Reply 5 of 14, by Lo Wang

Rank Member
Tertz wrote:

But as the pages will not be pressed against glass, the text will not be straight.

It's precisely the distance between the camera's lens and the document that makes it possible to sample the page without distortion.
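A rough way to see the effect of distance, as a sketch under a simple pinhole-camera assumption: the camera sits directly above the page, and a point near the spine is lifted some height above the flat page plane. The function name and the example numbers (50 mm off-axis, 10 mm lift) are made up for illustration.

```python
def apparent_shift_mm(offset_mm, lift_mm, camera_height_mm):
    """Sideways shift of a lifted point as seen by an overhead pinhole camera.

    offset_mm:        horizontal distance of the point from the optical axis
    lift_mm:          how far the point sits above the flat page plane
    camera_height_mm: distance from the lens to the flat page plane

    The ray from the lens through the point meets the page plane at
    offset * H / (H - lift), so the shift is offset * lift / (H - lift).
    """
    if lift_mm >= camera_height_mm:
        raise ValueError("the point must be below the camera")
    return offset_mm * lift_mm / (camera_height_mm - lift_mm)


if __name__ == "__main__":
    # Assumed example: a point 50 mm off-axis, lifted 10 mm by page curl.
    for height in (300, 600, 1000):
        shift = apparent_shift_mm(50, 10, height)
        print(f"camera at {height} mm: shift of about {shift:.2f} mm")
```

The shift shrinks as the camera moves away. This only models the perspective part, though; a page that slopes into the spine still looks compressed along the slope at any distance.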


Reply 6 of 14, by Stiletto

Rank l33t++

http://diybookscanner.org 😁

"I see a little silhouette-o of a man, Scaramouche, Scaramouche, will you
do the Fandango!" - Queen

Stiletto

Reply 7 of 14, by Dominus

Rank DOSBox Moderator

The OpticBook is good for text, but for graphics that extend right to the middle it is not 100% ideal, since it does have a slight border.

Windows 3.1x guide for DOSBox
60 seconds guide to DOSBox
DOSBox SVN snapshot for macOS (10.4-11.x ppc/intel 32/64bit) notarized for gatekeeper

Reply 8 of 14, by obobskivich

Rank l33t

Lo Wang's advice is dead-on for over-sized books and things you don't want to damage; it's more tedious though. For books that can be easily replaced and you just want an electronic copy, strip the binding off (if it's a perfect-bound book you can do this with a guillotine) and run 'em through an auto-feeder page by page. OTR/OCR the PDF if you like, but make sure you keep an original beforehand, as OTR/OCR is not always perfect and can in some cases make even larger files (it isn't supposed to, but I've seen automated OTR turn a ~900k PDF into a 30MB monstrosity in the past).

Also note that 100% reproduction of a printed book, even for personal use, is a huge legal gray area (and/or potentially copyright infringement), depending on where you are in the world, and the book's copyright status.

Reply 9 of 14, by mockingbird

Rank Oldbie

Does anyone remember these things?

The attachment scanman.jpg is no longer available


Reply 10 of 14, by Matth79

Rank Oldbie

Used to have a hand scanner, and at one time, a self-propelled motorized hand scanner.

As for the photo method, the key is to have the book held open at 90 degrees, maybe with glass flattening the side you are photographing, and even, non-reflecting illumination. To avoid having to reverse the book, it would be better for the stand to flatten both sides and have a camera position which can be swung across, or just two plates with lens holes against which you press the camera.

Reply 11 of 14, by idspispopd

User metadata
Rank Oldbie
Rank
Oldbie

Is your issue only the focus, or also the distortion? Focus shouldn't be an issue with a CCD scanner, but modern scanners (especially the cheaper ones) are mostly CIS, which has very little depth of field.

Reply 13 of 14, by Kerr Avon

Rank Oldbie

Thanks for all of the great suggestions, but something came up, and I've not been able to get home to try them (and no, I've not been arrested or abducted by aliens!). I'll probably be home on Sunday, so hopefully I can try them then (I'm not too confident about my (digital) camera though, it's old and probably not up to the task, so I'll probably end up borrowing a mate's camera, or seeing if my tablet's camera is good enough).

If I can escape from this space-craft first!