Monday 27 February 2012

Generating Gibberish in Python - Markov Chains

Markov Chains let us generate letter sequences that can contain sensible keywords and bulk up your blog.

I was once taught to speed read. A sham, a party trick, but interesting too. Essentially you just have to practice gliding over the pages of a book (or whatever) and picking out key information, as opposed to just words. You do this by focusing on the starts of paragraphs, which tend to be information rich, and on capitalized words, which tend to be names and other useful information, and ignoring the rest.

What I'm suggesting here is almost the opposite: generating keywords which are beefed up by noise in such a way that, I believe, most search engines will not see the join.

Markov Chains have been around for the best part of a hundred years. They work by splitting a text into groups of letters of a fixed length and recording which groups follow each group anywhere in the text. To generate gibberish, all you have to do is choose any group of letters, randomly pick one of the groups that you know can follow it, and loop until you have enough text. If you use short groups of letters (say two) the output contains less information from the original text than if you use longer groups (say six), because with short groups there are more choices available at each step. You should try this yourself to see what I mean, but I find that a group length of about four is a good start.
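
To make that concrete, here is a toy illustration (not the code from this post) of the lookup table you get with a group length of two; the input string and variable names are just for the example:

# Toy example: split a short string into non-overlapping two-letter groups
# and map each group to the groups that follow it somewhere in the text.
text = "the cat sat on the mat"
groups = [text[i:i + 2] for i in range(0, len(text) - 1, 2)]
followers = {}
for current, nxt in zip(groups, groups[1:]):
    followers.setdefault(current, []).append(nxt)

# followers["t "] == ["sa", "on"]: two possible continuations, which is why
# shorter groups give you more choices and therefore noisier output.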

Now to the code. Firstly, I would love to see your improvements, so please let me know or fork it on GitHub. And don't forget to read the extra information at the bottom of this page to understand how it works.
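
The full listing isn't reproduced here, so below is a minimal sketch of the approach described above (the real code on GitHub may differ). It assumes non-overlapping groups, a default group length of four, and a plain-text file called input.txt; the function names are illustrative.

import random
from collections import defaultdict

def build_chain(text, group_length=4):
    """Map each fixed-length group of letters to the groups that follow it."""
    chain = defaultdict(list)
    for i in range(0, len(text) - 2 * group_length + 1, group_length):
        group = text[i:i + group_length]
        chain[group].append(text[i + group_length:i + 2 * group_length])
    return chain

def generate(chain, length=500):
    """Walk the chain, picking a random follower at each step, until there is enough text."""
    group = random.choice(list(chain))
    output = [group]
    total = len(group)
    while total < length:
        followers = chain.get(group)
        if not followers:
            # Dead end: jump to a random group and carry on.
            group = random.choice(list(chain))
        else:
            group = random.choice(followers)
        output.append(group)
        total += len(group)
    return ''.join(output)

if __name__ == '__main__':
    with open('input.txt') as f:
        source = f.read()
    print(generate(build_chain(source, group_length=4)))

Run over the text of this post with a group length of four, something like this produces gibberish of the kind shown below.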


Essentially you just have to practice gliding over pages in a book or whatever and pick out key information (as opposite, generate gibberish all you use short groups of letters of the join. Firstly ins have been around for the most of a hundred years and use longer groups (say six), because obviously there are more choices available. Essentially you just have to do is choose any group of letters and randomly pick one of the other groups that you know can follow that groups of letters of a fixed length and randomly pick one of a fixed length and randomly pick one of a fixed length and randomly pick one of the code. 

Chains have been around for the most of a hundred years and words which are beefed up by noise in such a way that, I believe, most the join. GitHub op until you have to do is choose any group of letter sequences which can contain sensible keywords and capitalized words which groups follow that groups of letters (say two) the text contains less information from the original text that you know can follow that groups of letters (say two) the text contains less information from the original text. 

What up by noise in such a way that, I believe, most the other groups than if you get enough text than if you use short group anywhere information rich, and capitalized words which tent to the opposed to just words which tent to be names and work by splitting a text into groups of paragraphs, which tend to be information from the original text than if you use short group anywhere in the most of a hundred years and words which are been around for the most of a hundred years and work by splitting a text into group; loop until you have to do is choose any group of letter sequences which can contain sensible keywords and bulk up your blog. Chains have beefed up by noise in such a way that, I believe, most the opposed to just words which groups (say six), because obviously there are more choices available.