Wednesday, 5 December 2012

Aggregating Everything - Map/Reduce and Camel?

If you have used Map/Reduce you will be familiar with the idea of breaking tasks down into little chunks and then collecting the partial results together into some final format.

So, recently, when I was parsing zillions of rows of data, aggregating related data into partial CSV files and then combining those partial files into reports, I thought - Aha! Map/Reduce.

For a whole bunch of good design reasons I was using Apache Camel - a neat pipelining tool which, with a bit of help from ActiveMQ, provides the sort of long-running stability that I needed. Camel, however, does not do Map/Reduce, but it does have the Aggregator integration pattern, which you can use to do a similar thing.

Image courtesy of Carlos Oliveira
Imagine you empty your jar of loose change on a table. You stack the nickels in stacks of ten coins, the dimes in stacks of ten coins and the quarters in stacks of ten coins. You add up all the 50 cents, $1s and $2.50s and you know how much you have. That's Map/Reduce.
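
The coin analogy can be sketched in a few lines of Python (a toy illustration of the idea, not the Camel code from this project):

```python
# Toy Map/Reduce over a jar of coins: "map" each coin to a tally per
# denomination, then "reduce" by summing the partial totals.
from collections import Counter

COIN_VALUES = {"nickel": 5, "dime": 10, "quarter": 25}  # values in cents

def count_change(coins):
    # Map step: tally how many of each denomination we have.
    tally = Counter(coins)
    # Reduce step: combine the partial results into one total.
    return sum(COIN_VALUES[name] * n for name, n in tally.items())

# Ten of each coin: 50c + $1 + $2.50 = 400 cents.
print(count_change(["nickel"] * 10 + ["dime"] * 10 + ["quarter"] * 10))  # 400
```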

Now, imagine you empty your jar of loose change into one of those coin counting machines in the mall. Internally, all the coins are sorted by falling through a hole which is either nickel-, dime- or quarter-shaped, and as they emerge from the other side they are counted*. That's aggregation, Camel style.

I did hit a bit of a snag. I couldn't work out how to tell the Aggregator Integration pattern that there were no more files to come... Stop... Woaa-there... Desist!

It turns out that, hidden away in the middle of the docs, the File endpoint rather usefully sets a header called CamelBatchComplete, which is just what I was looking for.

Good luck fellow travellers.

* I have no idea how a coin counting machine works.

Friday, 23 November 2012

Simple Camel Configuration of a Twitter Endpoint

I was asked just now how I got the Twitter Stream working in my new Camel-based project and how I managed the credentials.

The Twitter endpoint works like a dream and this is essentially what my code looks like. All you need is a file alongside your java file.

Saturday, 17 November 2012

Apache Camel - Connection Beans Without Spring

While writing some tests for an Apache Camel project, I just spent rather longer than I'd have liked trying to work out how to configure a connection bean for Mongo without using Spring.

Since I hide my embarrassments in public I thought I'd best share with anyone else with brain freeze.

Don't forget you need the camel-mongodb artifact in your pom.xml file. Good luck fellow travellers.

Monday, 5 November 2012

Python Web Microframeworks - Take Your Pick

You may read my post "Top Python Web Frameworks - Today" in which I took a fresh look at which Python Web Frameworks were still around and still maintained.

In this post I give a quick overview of about half of those, which I have loosely designated as "Microframeworks" - regardless of what the authors have called them. Wikipedia doesn't have a definition for microframework - I just looked - so what I really mean here is anything which lets you get started without having to learn a whole bunch of syntax and convention. Right on sister!

Let's get going:

Now that's done, a little code from each -

Bottle (v0.11.13)

Bottle doesn't rely on any other packages at all, which makes it a great framework to use if you want to see all the working parts, as they're all in the one file. That being said, it can offer client-side sessions, compression and even WebSockets straight out of the box, so it's not just a toy by any means.

Flask (v0.9)

Flask is dependent on Werkzeug for all the WSGI stuff and on Jinja2 as a template library. It comes with client-side sessions, a built-in debugger and is totally unicode friendly. I love Flask and use it often, as my other posts will testify.

(v0.4.9)

The first of the new boys, is certainly not package independent! It requires Werkzeug for WSGI, requests for HTTP, Beaker for sessions, Elixir and SQLAlchemy for an ORM (PostgreSQL, MySQL and SQLite), Jinja2 for templates, gevent, pev and greenlet for events, as well as compressinja, jsmin, jsonpickle, putils and pyScss, which add various other niceties. Almost all well-known and trusted libraries.

(v0.37)

Again, doesn't rely on any other packages at all, but to me it's not as useful as flask or and not as simple to study as bottle, so I can't see the point, although according to the site, it's well used by others.

wheezy.web (v0.1.307)

pip installs wheezy.web, wheezy.core, wheezy.caching, wheezy.html, wheezy.http, wheezy.routing and wheezy.validation, which to me looks like the developers have taken a sensible approach to the development cycle by splitting everything up into independent code units.

According to the site, functionality includes routing, model update/validation, authentication/authorization, content caching with dependency, xsrf/resubmission protection, AJAX+JSON, i18n (gettext), middlewares, and more.

In Summary

No, I haven't tested them to death, and no, I haven't even tried out and wheezy.web in a real-world app, although I will do. I certainly have not done either load or concurrency testing on them. You can do that and I'll read your blog.

Good luck fellow traveller.

Friday, 2 November 2012

CoffeeScript Love: Backbone.js Tutorials in CoffeeScript

Ahhhh enjoy the aroma, inhale the caffeine. If you're looking for a nice list of Backbone tutorials that use CoffeeScript then look no more - CoffeeScript Love: Backbone.js Tutorials in CoffeeScript.

Thursday, 1 November 2012

Top Python Web Frameworks - Today

Every once in a while it's nice to rediscover old friends. Over the last few years I've had a play with quite a few Python Web Frameworks, but one does tend to get a favourite and stick with it. Same as other areas I suppose - cars, dogs, beers, partners... Still, it's good to have a look at what's new and fresh once in a while (my wife doesn't care, but don't tell my dog.)

example of Python language (Photo credit: Wikipedia)
If you look around the internet, there are probably thirty or so Python Web Frameworks around, but not all of them are actively supported. This means that they are likely to be missing functionality in some of the latest areas of interest - OAuth2, HTML5 or mobile support for example. Or maybe they just don't make it easy to do stuff that you have become used to over the last few years - AJAX or web services maybe.

In my previous post "My Top Ten Python Web Frameworks" from about 18 months ago, I gave you my opinion of what was hot, or not, at that time. Some of those seem to have stalled now - Tipfy, GAE framework, Weblayer; while others have appeared or matured -,, wheezy.web.

Below is an alphabetic list of the active frameworks (updated this year) which I know about. There may be others out there, so please let me know in the comments at the bottom.
In my next few posts I'll be giving you a run down of what state each of them is in and try to give you some idea of how they fit your requirements.

Good luck fellow traveller.

Tuesday, 17 July 2012

A First Look at Flask-Evolution

While poking around in Flask-Ahoy this morning I came across Flask-Evolution, which I hadn't previously noticed. It offers the Django idea of migrations to the Flask community and so I thought I'd give it a go and see how it shapes up.

So let's set up a virtual environment and get the basics in place.
Now the contents of the two files you touched. First the "hello world" Flask app goes in the file:

Next we take the tutorial Flask-Script file and adapt it very slightly:

Now we are ready to create a simple migration for a Post model for a blog:

and you get the template file which looks like this:

In the file, put your Post model and modify "up" and "down":

Our First Migration

Our Second Migration
Now if we want to modify that Post model we can again do:

modify your model and modify "up" and "down". I've left some comments so you can see what I did:

and run it again.

You should also play around with the "redo" and "undo" commands. That's about it really.

This is only version 0.6 of Flask-Evolution and it's got a way to go. I can see some use for it, but it's far from seamless and it's a long way from South.

Documentation is woeful and I'm not convinced the developer, Adam Patterson, has the time to do the job he'd like to.

You could always offer to give him a hand!

If you see anything wrong with what I've written here or if you spot spelling or other mistakes then please let me know.

Good luck fellow traveller.

Sunday, 8 July 2012

Hosting Domains on Your Domestic Broadband Connection

It's not ideal, but it is possible to host your own domain(s) on your home broadband connection. Find out how.

Hosting on a Budget?
I showed you in my previous post Nginx & uWSGI Living Together in Your Shed how to set up your own Nginx server to serve WSGI applications like Flask, Django, Bottle, Pyramid etc. This allows you to host your own applications on the cheap. All you need is any old server, right?

Well sort of. I mean it's cheap enough to buy your own domain name and you probably will want to, but when you are filling in the registration what do you enter when it asks for your DNS servers?

Static IPs can be bought or leased from IP providers and this makes it possible to host your own DNS services. However most of us have dynamic IP addresses at home because our IP providers keep their costs down by buying big blocks of addresses and dynamically allocating us one from the deck each time we need one, and sometimes for their own maintenance reasons too. 

This means your IP address will change from time to time. 

To get around this you can use a dynamic DNS service such as who will point your domain name at your new IP address every time it changes. Cool. Their service is simple, clear and cheap (or even free). I'll let you investigate that yourself, but the main thing is that when you buy your domain name, you need to fill in the DNS bit using the dynamic DNS server names that they provide.

The final thing is: How do you know when your dynamic IP address changes, and how do you inform your dynamic DNS service when it happens? Well, for my own purposes I wrote a little script which does this for you: it checks whether your IP has changed and, if it has, calls the special callback URL that freedns and others supply you with.

You can find it here:

Good luck fellow traveller.

Wednesday, 4 July 2012

Nginx & uWSGI Living Together in Your Shed

Let's assume you have just bought the memorable domain name and you want to host your lovely web app on that old server in the shed. 

I recently set up both Flask-Ahoy! and Django-Ahoy! on a five year old desktop my daughter was chucking out, so I can guarantee you that this configuration can handle enough traffic to get you going, or to demo products to your customers.

Before we start, I'm assuming you are familiar with python web frameworks (Django, Flask, Bottle etc) and with using virtualenv to segregate your environments. I usually use Flask as my framework of choice, so I'm using their "Hello World" app here.

Let's get started:

Now the contents of those files:
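
As an illustration of what such files typically contain, the uWSGI side might look like this - every path and name below is an assumption, so adjust to your own layout:

```ini
; uwsgi.ini -- a minimal sketch; socket, directory and module name are
; assumptions for a Flask "Hello World" app object called `app` in
[uwsgi]
socket = /tmp/hello.sock
chdir = /srv/hello
module = hello:app
master = true
processes = 2
vacuum = true             ; clean up the socket on exit
```

On the nginx side, a matching `location / { include uwsgi_params; uwsgi_pass unix:/tmp/hello.sock; }` block hands requests over to that socket.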

Finally copy the config into place and start the uwsgi daemon and the server up:

This of course is just the basics, but it will hopefully give you a starting point from which you can fiddle to your heart's content.

Good luck fellow traveller.

Monday, 2 July 2012

Using Twitter As Your Flask Login Provider in 6 Lines

First off, I lied about the 6 lines, but it got you here and now you can see how simple it is yourself. There are actually 40 lines I think.

When I was building Flask Ahoy! I wanted to use Twitter as the login provider. I could spend a while boring you about why (as opposed to, say, Google, Facebook, OpenID or Roll-Your-Own), but I'll save that for another exciting post.

I have used the extremely handy Flask-OAuth to do all the legwork so you need to install that:

Next we need some HTML for users to click on to sign in and sign out. Try this simple block:

Next the real code begins. In your views module you need three views:

  • One to send you off to Twitter to get authorized (login)
  • One to get the callback from Twitter and store your authorization credentials (twitter_authorized)
  • One to log you out and tidy up (logout)

Finally we need a special little method which remembers your credentials:

Now for the sake of safety I'm going to point out a couple of things here:

  • This scenario uses flask sessions (encrypted cookies) to store your Twitter OAuth token.
  • When you log out, you are only destroying the twitter_user flag; you should probably destroy your credentials as well.

Good luck fellow travellers.

Friday, 4 May 2012

Install Google App Engine on Ubuntu 12.04

If you want to use Google App Engine on Ubuntu there are a couple of things to note. Firstly, GAE has recently moved to version 1.6.5 and secondly, Ubuntu has recently released Ubuntu 12.04 (Precise Pangolin). Precise Pangolin is the first Long Term Support (LTS) version of Ubuntu to come with Python 2.7 installed, and App Engine has been providing Python 2.7 as an option since February 2012. So if you are starting a new App Engine project, it's probably a good time to move to Python 2.7.

I'll explain briefly how you start a new project and there's a nice clean copy of the code at the bottom that you can cut and paste.

Let's get the show on the road. Choose a name for the project and create and switch to a virtual environment:

mkvirtualenv ${PROJ}

Note that "--no-site-packages" and "--distribute" are now the defaults for mkvirtualenv. You don't even need to use "--python=python2.7" on Ubuntu 12.04.

Now we need to know what the latest version of App Engine is - as of writing it's 1.6.5:

wget -O /tmp/${GAE}.zip
unzip /tmp/

Now let's create an App Engine app. The app will need a name that has been created in the App Engine Console:

mkdir -p gae_app/static

Now create the app.yaml file:

echo """application: ${GAE_APP_NAME}
version: development
runtime: python27
api_version: 1
threadsafe: true

default_expiration: 7d

handlers:
- url: /static
  static_dir: static
- url: .*
""" > gae_app/app.yaml

And finally the app itself:

echo """import webapp2

class MainPage(webapp2.RequestHandler):
  def get(self):
      self.response.headers['Content-Type'] = 'text/plain'
      self.response.out.write('Please replace me with a decent WSGI App Framework such as Flask')

app = webapp2.WSGIApplication([('/', MainPage)], debug=True)
""" > gae_app/

And finally to run the development server:

python ./google_appengine/ gae_app/

I hope that this has all been of some help to you. Did I miss anything? Please comment below.

Monday, 27 February 2012

Generating Gibberish in Python - Markov Chains

Markov Chains allow us to generate letter sequences which can contain sensible keywords and bulk up your blog. 

I was once taught to speed read. A sham, a party trick, but interesting too. Essentially you just have to practice gliding over pages in a book or whatever and pick out key information (as opposed to just words). You do this by focusing on the starts of paragraphs, which tend to be information rich, and on capitalized words, which tend to be names and useful information, and ignoring the rest.

What I'm suggesting here is almost the opposite, generating key words which are beefed up by noise in such a way that, I believe, most search engines will not see the join.

Markov Chains have been around for the best part of a hundred years and work by splitting a text into groups of letters of a fixed length and recording which groups follow each group anywhere in the text. To generate gibberish, all you have to do is choose any group of letters and randomly pick one of the other groups that you know can follow that group; loop until you have enough text. If you use short groups of letters (say two), the text contains less information from the original text than if you use longer groups (say six), because with short groups there are more choices available. You should try this yourself to see what I mean, but I find that a group length of about four is a good start.

Now to the code. Firstly I would love to see your improvements, so please let me know or branch it from GitHub. And don't forget to read the extra information at the bottom of this page to understand how it works.
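
The author's code lives on GitHub; as a minimal sketch of the technique described above (function names and the sample text here are my own):

```python
# Markov-chain gibberish: split the text into overlapping groups of
# `group_len` letters, record which letter follows each group, then walk
# the chain by repeatedly picking a random recorded follower.
import random
from collections import defaultdict

def build_chain(text, group_len=4):
    chain = defaultdict(list)
    for i in range(len(text) - group_len):
        chain[text[i:i + group_len]].append(text[i + group_len])
    return chain

def generate(chain, length, group_len=4, seed=None):
    rng = random.Random(seed)
    group = rng.choice(list(chain))
    out = group
    while len(out) < length:
        followers = chain.get(group)
        if not followers:
            # Dead end (this group only occurs at the end of the text):
            # jump to a random group and carry on.
            group = rng.choice(list(chain))
            continue
        out += rng.choice(followers)
        group = out[-group_len:]
    return out

sample = ("Markov Chains work by splitting a text into groups of letters "
          "of a fixed length and recording which letters follow each group "
          "anywhere in the text.")
print(generate(build_chain(sample), 120, seed=2))
```

The paragraphs below are the sort of thing it produces when fed this very post.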

Essentially you just have to practice gliding over pages in a book or whatever and pick out key information (as opposite, generate gibberish all you use short groups of letters of the join. Firstly ins have been around for the most of a hundred years and use longer groups (say six), because obviously there are more choices available. Essentially you just have to do is choose any group of letters and randomly pick one of the other groups that you know can follow that groups of letters of a fixed length and randomly pick one of a fixed length and randomly pick one of a fixed length and randomly pick one of the code. 

Chains have been around for the most of a hundred years and words which are beefed up by noise in such a way that, I believe, most the join. GitHub op until you have to do is choose any group of letter sequences which can contain sensible keywords and capitalized words which groups follow that groups of letters (say two) the text contains less information from the original text that you know can follow that groups of letters (say two) the text contains less information from the original text. 

What up by noise in such a way that, I believe, most the other groups than if you get enough text than if you use short group anywhere information rich, and capitalized words which tent to the opposed to just words which tent to be names and work by splitting a text into groups of paragraphs, which tend to be information from the original text than if you use short group anywhere in the most of a hundred years and words which are been around for the most of a hundred years and work by splitting a text into group; loop until you have to do is choose any group of letter sequences which can contain sensible keywords and bulk up your blog. Chains have beefed up by noise in such a way that, I believe, most the opposed to just words which groups (say six), because obviously there are more choices available.

Sunday, 22 January 2012

Get Real Data from the Semantic Web - Finding Resources

In my last article, I briefly explained how to get data from a resource using python and SPARQL. This article explains how to find the resource in the first place.
Have you ever been taught how to knit? If you have, then you'll know that you are not usually taught how to cast on (or start off) in your first lesson. That's because it's much easier to learn how to knit than it is to cast on.

So it is with the Semantic Web. Once you have a resource URL, it's reasonably easy to extract information linked to that resource, but finding the starting resource is a bit trickier.
So let's just recap how we might get the abstract description for London from DBpedia.

If we know the URL then that's pretty straight forward:
(If you want to follow this tutorial, then you had better copy the file from there.)
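
The query in question looks roughly like this - a sketch, where I've assumed the usual DBpedia abstract property (dbpedia-owl, abbreviated dbo below); check it against the endpoint you target:

```sparql
PREFIX dbo: <>

SELECT ?abstract WHERE {
  <> dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
```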

RDF types for the DBpedia entry for London
If you don't, however, then you'll have to search for it. According to the DBpedia entry, London is many things, including an owl:Thing. There are a lot of Things out there, probably enough to make even the DBpedia endpoint time out, so let's choose something more restrictive such as yago:Locations but not too restrictive, for example yago:BritishCapitals.

Just to be a smart ass as I finish off, you can get both at the same time by doing this, but don't forget that doing this will stress the SPARQL endpoint more than is probably necessary. Be kind.

Thursday, 19 January 2012

Get Real Data from the Semantic Web

Semantic Web this, Semantic Web that, what actual use is the Semantic Web in the real world? I mean how can you actually use it?

If you haven't heard the term "Semantic Web" over the last couple of years then you must have been in... well somewhere without this interweb they're all talking about.

Basically, by using metadata (see RDF), disparate bits of data floating around the web can be joined up. In other words they stop being disparate. Better than that, theoretically you can query the connections between the data and get lots of lovely information back. This last bit is done via SPARQL, and yes, the QL does stand for Query Language.

I say theoretically because in reality it's a bit of a pain. I may be an intelligent agent capable of finding linked bits of data through the web, but how exactly would you do that in Python?

It is possible to use rdflib to find information, but it's very long-winded. It's much easier to use SPARQLWrapper and in fact, in the simple example below, I've used a SPARQLWrapper wrapper to make asking for lots of similarly sourced data, in this case DBpedia, even easier.

To use this try importing the DBpediaEndpoint and feeding it some SPARQL:
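
The wrapper idea can be sketched like this. The class name DBpediaEndpoint comes from the post; everything else, method names included, is my assumption rather than the original interface:

```python
# A sketch of a "wrapper around SPARQLWrapper": a small class that fixes the
# endpoint and return format so callers only have to supply the query text.
# Method names here are assumptions, not the original post's API.

def _json_query(endpoint_url, query):
    # Imported lazily so the pure parts of this sketch work without the package.
    from SPARQLWrapper import SPARQLWrapper, JSON
    sparql = SPARQLWrapper(endpoint_url)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()

class DBpediaEndpoint:
    URL = ""

    def query(self, sparql_text):
        # Run arbitrary SPARQL against DBpedia and return parsed JSON.
        return _json_query(self.URL, sparql_text)

    def abstract_query(self, resource_uri, language="en"):
        # Build (but do not run) the SPARQL for "fetch this resource's abstract".
        return (
            'SELECT ?abstract WHERE { <%s> '
            '<> ?abstract . '
            'FILTER (lang(?abstract) = "%s") }'
        ) % (resource_uri, language)
```

Feeding `ep.query(ep.abstract_query(resource_uri))` to the endpoint then returns the abstract, network permitting.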

Your homework is - How do you identify the resource_uri in the first place?

That's for another evening.

Tuesday, 17 January 2012

Github: Who needs it?

Do you ever think that you just don't want all your code on Github? I mean it's only a quick hack right?

Truth is, once you start using git you probably use it automatically for all your code, but you don't always want all your code floating around the net. What about those hard-coded email addresses and API tokens, or those references to your private net servers?

The answer is probably so simple that you have just overlooked it. You don't need to set up a local git server or hire one from Amazon. All you need to do is use DropBox or Ubuntu One as your remote origin repository.

Here's how, using Ubuntu One on Ubuntu:

Write a short shell script something like this and save it on your path as

Now when you want to create a new repository all you have to do is:

If you use Python and virtualenv you may be interested in the slightly extended script at