2007-02-21

MySQL - Awesome Delayed Inserts

MySQL has a great feature that IMHO, isn't written about often enough, Delayed INSERTS. Check the link for the sytax details.

I was tasked with creating a public information site for the FDOT for a roadway improvement project. The site's purpose it to provide the public with information about scheduled lane closures and the the like. The site has a potential to get a lot of traffic as the site will be promoted both on handouts to the locals and on roadside light-signs. Keeping the potential for high-volume traffic and a likely low-tech audience, I designed a simple strict CSS site with great cross-platform rendering with NO JavaScript. The backend is a LAMP architecture. I setup a Google Analytics include for traffic monitoring.

When we submitted the site, however, the FDOTreviewer stated in his comments that he wanted a "hit-counter" on the site for some reason. That got me thinking. If, and that's a big IF, the site gets a ton of traffic (which I'm not really set-up for), the hit-counter would likely be one of the biggest I/O hogs. I decided to figure out the fastest way to both collect hit counts (pageviews is what I actually am tracking) and display them on the site. If I have to collect the information anyway, I might as well grab a couple of other things too such as IP, page viewed, and the referrer because no matter how cool google analytics is, there are still questions it can't easily answer that an SQL query can.

I decided to use a small MySQL table to store the hits for now, but I was concerned about the INSERTS locking the table or taking too long under load to return control to PHP->Apache->TheBrowser. I even tried to see if wrapping the INSERTS in transactions could help me get a delayed/batch effect. That's when I discovered the DELAYED option to the INSERT statement. This option is very useful if you have clients that cannot or need not wait for the INSERT to complete. Be aware that the table will need to use the MyISAM, MEMORY, or ARCHIVE engine. InnoDB won't work, however keep in mind that you can select the engine on a table-by-table basis withing a database (schema).

From the MySQL Docs:

When a client uses INSERT DELAYED, it gets an okay from the server at once, and the row is queued to be inserted when the table is not in use by any other thread.

Another major benefit of using INSERT DELAYED is that inserts from many clients are bundled together and written in one block. This is much faster than performing many separate inserts.

Voila, problem possibly solved. We'll see as traffic and table sizes icrease. Right now I'm using a count of the rows in the table for the "hit" counter. That may not scale. If anybody has empirical data on what the most efficient wat to do this is, I'd love to hear from you.

No comments: