MySQL Performance for More Data than Memory

In a previous post I looked at the performance hit you get in MongoDB when the working set is larger than the memory available. I thought I'd have a look to see how MySQL fares under similar conditions.

The experiment was very similar to the one I ran for MongoDB.

Setup

MySQL server: A large Amazon RDS instance. I chose RDS because it comes well configured for MySQL; I'm more interested in how performance changes than in tuning, so I didn't want to spend time on configuration.

MySQL client: An EC2 small instance.

Code: up on GitHub.

Test

The test will involve inserting X rows into MySQL with the following structure:

- key: n (where n is 0, 1, ..., X - 1)
- text: 'Mary had a little lamb. ' repeated 100 times

There will be an index on key to prevent full scans of the data.
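As a rough sketch of the setup phase, this might look like the following (the table name docs, the connection details and the mysql-connector-python driver are all illustrative assumptions; the actual benchmark code is what's on GitHub):

    import mysql.connector  # assumed driver; the real code is on GitHub

    X = 10000000  # number of rows for the largest run

    conn = mysql.connector.connect(host='rds-endpoint', user='bench',
                                   password='...', database='bench')
    cur = conn.cursor()

    # `key` is a reserved word in MySQL, hence the backticks.
    # The index on `key` is what prevents full scans on reads.
    cur.execute("""
        CREATE TABLE docs (
            `key` INT NOT NULL,
            `text` TEXT NOT NULL,
            INDEX (`key`)
        )
    """)

    text = 'Mary had a little lamb. ' * 100
    for n in range(X):
        cur.execute("INSERT INTO docs (`key`, `text`) VALUES (%s, %s)", (n, text))
    conn.commit()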

After the insert there will be 30,000 reads with random keys.
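Continuing the sketch above, the read phase is just uniform random lookups:

    import random

    for _ in range(30000):
        k = random.randrange(X)  # any key in the full dataset, uniformly
        cur.execute("SELECT `text` FROM docs WHERE `key` = %s", (k,))
        cur.fetchone()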

The expectation is that when the data set gets too large to fit in memory the random reads will become very slow, as most reads will then have to be served from disk rather than from memory.

When this thrashing of the disk starts happening it will be interesting to see what happens when reads are concentrated on a subset of the dataset. To investigate this, a further test will be run that:

- 99% of the time - reads a random key chosen from only Y% of the keys
- 1% of the time - reads a random key chosen from the entire dataset

The expectation here is that for small Y the performance will be similar to when the entire data set is in memory, as the pages containing the subset will already be in memory and will not need to be read from disk.
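The skewed key selection could be sketched like this (pick_key and its parameters are illustrative, with Y expressed as a fraction):

    def pick_key(X, y):
        # 99% of the time pick from the 'hot' first y fraction of keys;
        # the remaining 1% of the time pick from the entire key range.
        if random.random() < 0.99:
            return random.randrange(int(X * y))
        return random.randrange(X)

    # e.g. pick_key(10000000, 0.01) corresponds to the 1% focus row below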

Results

A spreadsheet of the results is available here (Google Doc).

The interesting part is when you reach 10,000,000 keys and start reading from subsets:

    Focus (%)  Read 1 (s)  Read 2 (s)  Read 3 (s)
      100        172.35      135.02      154.77
       10        124.70       95.48       77.29
        1         32.58       24.46       22.59

For comparison, the reads for a database with only 10,000 keys average out to 20.5s.

Unsurprisingly, this is similar to the results with MongoDB. What I think makes it interesting is that many people don't seem to be aware of it - there is plenty of discussion about how to optimise indices but not much on how to keep your working set small. I know that we've benefited from being more aware of this when optimising MySQL queries. Maybe you will too?

