The Curse of OCR

Posted by Paul Samael on Sunday, November 1, 2015 Under: Random thoughts



I’m a little hesitant about criticising books for having typos, as I’m sure that – despite my best endeavours to weed them out – my own are not entirely error-free. So having a pop at William Boyd’s publishers over the numerous typos in the Kindle editions of some of his older novels could be seen as mild hypocrisy on my part. Someone with higher moral scruples might conceivably agonise about this for several paragraphs – perhaps even whole pages. But a couple of sentences is enough hesitation for me. So without further ado, let’s just get on with putting the boot in:

The worst case I’ve come across so far was “Armadillo,” which seemed to have been converted to a Kindle file via optical character recognition (OCR) without anyone having bothered to run any checks at all on the output. Both capital “I” and lower-case “l” were frequently rendered as the number “1,” separate words ran into one another, vast amounts of punctuation were missing and there were numerous other mis-transcriptions. All this suggests that no one had even bothered to run a spellcheck, let alone proofread the text. It was far worse than any self-published book I have read (and I have read a fair few). I can only agree with the reviewer on Amazon who described the Kindle conversion as “a monument to illiteracy, lack of proof reading and indifference to the basics of written English. In short, an insult to its author.” There are no such complaints about non-Kindle editions, so it does look as if the publisher is to blame.

With “A Good Man in Africa”, which I’ve just finished, a spellchecker does seem to have been used (hurrah!) – but alas, it wasn’t clever enough to spot mistakes like a description of a boy having “a look of hair” falling down over his face (presumably it should have been “lock”). Since “look” is a perfectly valid word, a spellchecker would have had no reason to flag it. Again, I suspect the culprit was OCR (as with quite a number of other similar mistakes).

I know publishers are not having the easiest of times right now, but William Boyd is an author who must have repaid their initial investment in him many times over, based on hard copy sales alone – so you would have thought that resources could have been found to do a more careful job of converting his older titles to ebooks.   It’s certainly one to ponder the next time you hear someone talking up the value that publishers supposedly add.  

I’d also like to know why these books had to be OCR’d in the first place. Word processors have been around for some time – can it really be the case that no electronic files of these novels existed? Perhaps someone out there will enlighten me. In the meantime, I will return to wrestling with my rather feeble conscience…

UPDATE 5.11.2015: A little Googling around this topic has not turned up much beyond a suggestion on a discussion forum that many publishers didn’t bother keeping electronic files of the final edited text (hence their reliance on OCR). It’s good to see that publishers have been thinking ahead to the probable future of their industry... But I did find this article about a fairly common and particularly unfortunate OCR-induced error, which sort of sums up the whole thing for me.

Tags: typos in kindle versions of ebooks 

About Me


Paul Samael

Welcome to my blog, “Publishing Waste”, which will either (a) chronicle my heroic efforts to self-publish my own fiction; or (b) demonstrate beyond a scintilla of doubt the utter futility of (a). And along the way, I will also be doing some reviews of other people’s books and occasionally blogging about other stuff.