Powering Conversocial's Analytics


We recently released our new analytics functionality for our customers. It allows them to see stats like:

  • Number of messages received each day
  • Messages processed by each agent
  • Response times split into buckets (less than 30 minutes, less than 1 hour, etc.)
  • Sentiment breakdown

All of this can be viewed for different date ranges and compared to previous time periods.

We've done some interesting things to make this possible and I'd like to share them.

Queues, MongoDB and Service Oriented Architecture

When an action is performed in the system (e.g. new content arrives, someone replies to a message) an event is generated and placed in our queueing system (backed by Redis and using pyres).

This is then picked up by a worker, which identifies which metrics need updating (often tens of metrics for a single event), and a call is then made to our analytics API to perform the actual update.

The analytics API itself then handles pushing this data into MongoDB.
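
To make this concrete, here's a rough sketch of what the worker side could look like. The job class, queue name, metric names and endpoint URL are illustrative assumptions, not our actual code:

# Sketch of a pyres job that fans one event out into metric updates.
# Class name, queue name, metric names and endpoint URL are illustrative.
import json
from datetime import datetime

import requests
from pyres import ResQ


class MessageReceived(object):
    """pyres job fired whenever a new message arrives."""
    queue = 'analytics_events'

    @staticmethod
    def perform(account_id, timestamp, sentiment):
        when = datetime.utcfromtimestamp(timestamp)
        # A single event usually touches several metrics.
        metrics = [
            'message-count',
            'message-count-%d' % (when.hour + 1),  # hourly breakdown, 1-24
            'sentiment-%s' % sentiment,
        ]
        for metric in metrics:
            # The actual write is delegated to the internal analytics API.
            requests.post(
                'http://analytics.internal/v1/metrics',
                data=json.dumps({
                    'account': account_id,
                    'type': metric,
                    'month': when.strftime('%Y-%m'),
                    'day': when.day,
                }),
                headers={'Content-Type': 'application/json'},
            )


def enqueue_message_received(account_id, timestamp, sentiment):
    """Called from the main application whenever the event occurs."""
    ResQ().enqueue(MessageReceived, account_id, timestamp, sentiment)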

Why have an internal API?

Creating an internal service for analytics with its own API has given us a lot of benefits:

  • Small and self-contained code base that deals purely with analytics - this makes debugging far simpler
  • Failure is isolated - if the analytics servers go down then everything else carries on running
  • Freedom to use different technologies - our main application is Django, our analytics service uses Flask and pymongo as they fit the requirements better (see the sketch after this list)
  • Upgrading / changing is easier - create a new analytics machine with the new code and redirect a percentage of requests to the new machine until we're happy with it
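
For illustration, a stripped-down version of such an endpoint could look like the following. The route, payload fields and collection name are assumptions rather than our real API:

# A minimal sketch of the internal analytics API: Flask in front, pymongo
# behind. Route, payload fields and collection name are illustrative.
from flask import Flask, request, jsonify
from pymongo import MongoClient

app = Flask(__name__)
metrics = MongoClient()['analytics']['metrics']


@app.route('/v1/metrics', methods=['POST'])
def update_metric():
    payload = request.get_json()
    # Increment the day's counter inside the per-month document
    # (see the Data Structure section below).
    metrics.update_one(
        {
            'account': payload['account'],
            'type': payload['type'],      # e.g. 'message-count'
            'date': payload['month'],     # e.g. '2012-07'
        },
        {'$inc': {str(payload['day']): 1}},  # e.g. field '14' for the 14th
        upsert=True,
    )
    return jsonify(ok=True)


if __name__ == '__main__':
    app.run(port=5001)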

The two downsides to this:

  1. The whole system now has extra moving parts - but each part is simpler, so this is OK
  2. Our development environment becomes more complex - but we now have a Virtual Machine with scripts to start everything, so this is largely a non-issue

Why MongoDB?

MongoDB is very fast when the working set is in memory. As most of our metric updates are to the current day's readings, we can safely assume that our counter updates will be hitting data that is already in memory.

To ensure good read performance we grouped the readings for an individual metric together into a single document for each month (more on this in the Data Structure section below).

We also considered Redis and Cassandra but ruled them out:

  • Redis keeps its entire dataset in memory, which rules it out for us - we want our customers to be able to query last year's data in just as much detail as today's
  • Cassandra would also have been a good fit - it has tremendous write performance. However, we have no experience deploying Cassandra and a lot of experience deploying MongoDB, so we went with MongoDB

Data Structure

All our metrics are stored in the same way: a single document per month with a value for each day. E.g.

{ type: 'message-count', date: '2012-07', 1: 756, 2: 754, ..., 30: 760 }
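
Reading a metric back for a date range is then just a matter of fetching the relevant month document(s) and pulling out the day fields. A rough sketch, using the same illustrative collection and field names as above:

# Sketch: turn a per-month document into an ordered list of (day, count)
# pairs for a single metric. Collection and field names are illustrative.
from calendar import monthrange


def daily_counts(metrics, account_id, metric_type, year, month):
    doc = metrics.find_one({
        'account': account_id,
        'type': metric_type,                 # e.g. 'message-count'
        'date': '%04d-%02d' % (year, month),
    }) or {}
    days_in_month = monthrange(year, month)[1]
    # Days with no activity simply have no field in the document.
    return [(day, doc.get(str(day), 0)) for day in range(1, days_in_month + 1)]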

This works perfectly for simple statistics such as the number of messages per day. However, it doesn't really work if you want to see the breakdown of messages by hour. To achieve this we store 24 metrics, one for each hour:

{ type: 'message-count-1', date: '2012-07', 1: 1, 2: 1, ..., 30: 3 }
{ type: 'message-count-2', date: '2012-07', 1: 2, 2: 3, ..., 30: 7 }
...
{ type: 'message-count-24', date: '2012-07', 1: 1, 2: 5, ..., 30: 6 }

Then, if we want the breakdown by hour, we query all 24 of these documents and combine them.
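
Roughly, that combination step could look like this (again with illustrative field and collection names):

# Sketch: combine the 24 per-hour metric documents for one month into a
# single hourly breakdown (hour -> total messages). Names are illustrative.
def hourly_breakdown(metrics, account_id, month):
    totals = {}
    for hour in range(1, 25):
        doc = metrics.find_one({
            'account': account_id,
            'type': 'message-count-%d' % hour,
            'date': month,                   # e.g. '2012-07'
        }) or {}
        # Sum the per-day counters to get the month's total for this hour.
        totals[hour] = sum(v for k, v in doc.items() if k.isdigit())
    return totals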

Performance

We currently use three of Amazon's small instances in a replica set to power this. We haven't really stress-tested read performance - most queries respond in 1 or 2 milliseconds, and doing lots of them at once hasn't pushed CPU usage above 2%.

Whilst migrating our existing data over to the new analytics system, we maxed out at around 2,000 metric updates per second.

Conclusions

We've really enjoyed creating this new analytics system for ourselves. By isolating the entire system behind an internal API we've made our lives a lot simpler :)

We're Hiring!

We're on a mission to help companies give fantastic customer service on social media such as Facebook and Twitter.

If you want to join a London based VC funded startup working with fun things like Redis, MongoDB, Amazon Web Services and hundreds of millions of messages then e-mail jobs@conversocial.com with your CV and covering letter.


Colin Howe

I'm Colin. I like coding, ultimate frisbee and startups. I am VP of engineering at Conversocial.