Google ate my story!

December 18, 2016




According to a report in The Guardian, Google has recently attempted to improve the language capabilities of one of its Artificial Intelligence programs by feeding it over 10,000 free ebooks downloaded from Smashwords (out of a total of well over 50,000 free ebooks).  Apparently the idea was to help the AI produce more natural-sounding sentences.

Being The Guardian, the report was a bit po-faced about the whole thing and the journalist seemed to think that the authors ought to have been remunerated – or that at the very least, Google ought to have asked permission.  However, doing that for 10,000+ books would’ve been a massive undertaking – and it’s not as if they were trying to pass off the material as their own.  Besides, if you make your work available in digital form for free, it is inevitably going to be read by web-bots (even if no one else reads it).

Anyway, as someone who has self-published on Smashwords, I was curious as to whether my own work had been guzzled by Google’s AI - or whether it had chomped down on any of the excellent free Smashwords books I have reviewed on this site.  I was also curious to know what, if anything, the researchers had done to ensure that this giant helping of “brain-food” was actually any good. After all, feeding a load of garbage into your AI program wouldn’t necessarily produce very satisfactory results – and even a cursory look at Smashwords is enough to tell you that the quality of work there is, well, a little variable, shall we say.  Indiscriminate downloading could also result in the AI program developing an unhealthy fixation with language used to describe acts of sexual congress...

So I requested the BooksCorpus dataset from the people who originally compiled it at the University of Toronto (NB these are not the people at Google – Google just borrowed their dataset).  It is sorted into a wide range of files by genre, ranging from “Adventure” through to “Vampires.”  I’m not sure what the AI program made of all the vampire stuff.  “Erotica” was not included, although “Romance” was, as was something called “New Adult” (again, not sure what the AI made of these…).   Of books reviewed on this site, the following were included:


Of authors whose books I have reviewed on this site, several books by Tom Lichtenberg were included (e.g. Girl in the Trees), as was a book by Steve Anderson (Underheroes).

As for the other books in the dataset, I’ve opened up a few of them at random, but I haven’t found anything that struck me as utter garbage – so whatever they did to select the books, they don’t seem to have done too bad a job.  Indeed, I may use sections of the BooksCorpus as a way of trying to find free Smashwords gems that have so far escaped my attentions.

I did ask them how they selected which books to put into the dataset (e.g. did they take account of downloads, reviews or was it just random?), but the researcher who replied hadn’t been directly involved.  It looks as if the idea was to feed the dataset with as much material as possible in the hope that the good would overwhelm the bad – although there was some mention of a facility in the program which can correct for bad grammar/spelling.  That said, the researcher did admit that their model sometimes produced broken sentences, including ones that read a bit like porn (just fancy that!).

So there you have it – instead of being fed on a carefully controlled diet of Great Literature from established giants of the publishing world, Google’s AI has been gorging itself on the modest scribblings of a bunch of self-published indie writers. It feels a bit like this scene towards the end of “2001:  A Space Odyssey” where astronaut Dave Bowman is deactivating the HAL 9000 supercomputer - only in reverse:

HAL 9000:  “Dave, my mind is going. I can feel it. I can feel it. My mind is going. There is no question about it. I can feel it. I can feel it. I can feel it. I'm a... fraid. Good afternoon, gentlemen. I am a HAL 9000 computer. I became operational at the H.A.L. plant in Urbana, Illinois on the 12th of January 1992. My instructor was Mr. Langley, and he taught me to sing a song. If you'd like to hear it I can sing it for you.”

 

Stumps of mystery

October 31, 2016



“Stumps of mystery: stories from the end of an era” by Susan Wickstrom describes itself as “a novel in stories” – and it’s certainly true that it occupies a space somewhere in between a full-blown novel and a book of short stories.  Structurally, it’s similar to some of David Mitchell’s fiction, where you get a series of separate but linked stories - I am thinking in particular of “Ghostwritten” and “Cloud Atlas”.  

But whereas Mitchell tends to leap around a lot in ter...

 

Inselaffen!

June 27, 2016


Some thoughts on the EU referendum result.


Now we know why, when they are feeling frustrated with us (as well they might right now), the Germans refer to us as “Inselaffen” (island apes).  Here’s a picture of one of those island apes watching a graph of his currency dropping to a 30-year low against the dollar (having at long last managed to switch on his laptop).

If you have read any of my previous, rather geeky (and evidently totally ineffectual) posts on Brexit (they start here and the...

 

Can't decide about Brexit? Read this

June 20, 2016


Unsure about which way to vote in the EU referendum?  Well, who can blame you, given that debate on the subject has descended into an unedifying slanging match?

It’s hard to feel enthused about voting to remain because the EU is not a particularly lovable organisation – and it’s going through a particularly bad patch right now with the euro and migration crises, which highlight the fact that it is far from perfect.  So your heart may be telling you we should leave, buoyed up by stirring s...

 

Is the EU a giant squid?

June 12, 2016


In this post I’m going to look at whether the EU is so dysfunctional and plagued by major problems (e.g. migration, the euro etc.) that it has become like a giant squid, threatening to drag us down into the abyss – so the safest course is to disentangle ourselves and leave.  For me, geography means that this “safer out” argument doesn’t hold much water (excuse the pun).  This is because, if we leave, “the squid” will still be sat there right next to us, with all the same problems...
 

Brexit: a broader perspective (3)

June 6, 2016


Having discussed security and trade in previous posts, I’m now going to look at the impact of the EU on the domestic economy.  Maybe I should retitle this “Boring for Brexit,” as I suspect most people are sick of hearing about it – but it’s also hard to find much in the way of reasoned analysis of the issues, hence this series of posts.

Anyway, my starting point is the argument of pro-Brexit campaigners that since the vast majority of UK businesses don’t export, the Single Market i...

 

Brexit: a broader perspective (2)

May 23, 2016



Having looked at the security position in my last post, I'm now going to look at whether the EU is good for trade.  The remain side says it is (and prophesies economic doom if we leave), whereas the leave campaign says we’d do better for ourselves outside the EU (and prophesies economic doom if we stay).  Both sides have been overstating their case whilst lobbing statistics at each other - so in this post I’m going to try to keep the numbers to a minimum and focus more on practical example...

 

Brexit: a broader perspective (1)

May 16, 2016


I don’t usually blog that much about politics, but the referendum on 23 June 2016 on whether the UK should leave the EU is probably one of the biggest decisions voters will be asked to make in my lifetime.  Both sides in the debate have been throwing somewhat extreme and wholly contradictory claims around – when the reality is probably somewhere in between these two extremes.  So what I’m trying to do here is to look at things from a broader perspective.  If you’ve already made up you...

 

The Metamorphosis of Prime Intellect

February 26, 2016



This is an excellent “big picture” sci-fi novel, which is available for free online – but it’s not one for the faint hearted (owing to a certain amount of disturbingly graphic content – of which more later).

Caroline – along with the rest of the human race – “lives” in a virtual environment where she can do almost anything.  But being something of a contrary sort, Caroline most wants what she can’t have.  She is a so-called “death jockey”, who spends much of her time arrang...

 

The Curse of OCR

November 1, 2015



I’m a little hesitant about criticising books for having typos, as I’m sure that – despite my best endeavours to weed them out - my own are not entirely error-free.   So having a pop at William Boyd’s publishers over the numerous typos in the Kindle editions of some of his older novels could be seen as mild hypocrisy on my part.  Someone with higher moral scruples might conceivably agonise about this for several paragraphs – perhaps even whole pages.  But a couple of sentences is en...

 

About Me


Paul Samael

Welcome to my blog, "Publishing Waste", which will either (a) chronicle my heroic efforts to self-publish my own fiction; or (b) demonstrate beyond a scintilla of doubt the utter futility of (a). And along the way, I will also be doing some reviews of other people's books and occasionally blogging about other stuff.