If you ever see a list like this in your MySQL process list, run!

+---------+--------------------------------------------+--------------------------------+------------+-----------------+------------+-------------+------------+
| Id  | User                 | Host           | db   | Command | Time | State | Info |
+---------+--------------------------------------------+--------------------------------+------------+-----------------+------------+-------------+------------+
| 442 | unauthenticated user | 10.1.1.2:55885 | NULL | Killed  | NULL | login | NULL | 
| 443 | unauthenticated user | 10.1.1.1:48009 | NULL | Killed  | NULL | login | NULL | 
.....
| 444 | unauthenticated user | 10.1.1.1:58105 | NULL | Killed  | NULL | login | NULL | 
| 445 | unauthenticated user | 10.1.1.2:60799 | NULL | Killed  | NULL | login | NULL | 
+---------+--------------------------------------------+--------------------------------+------------+-----------------+------------+-------------+------------+

No seriously. Apparently it's something of a known bug which comes up at unpredictable points in time.

One proposed solution is to add all your internal hosts which can't be looked up through DNS to your /etc/hosts file:

10.1.1.1    appserver1
10.1.1.2    appserver2

Et voila. The exact same moment, all the users were authenticated. Scary? Yes. Weird? Yes. Problem solved? Yes.

Tags: mysql

It's a classic. You want to return random rows from a table, say a collection of random users in your social network. Easy, MySQL's ORDER BY RAND() to the rescue. After all, everybody's doing it. At least on my last search on that topic, all the PHP kids did it.

SELECT * FROM USERS ORDER BY RAND() LIMIT 20

There. Does what it's supposed to.

Your social network keeps growing and growing and after about a year and 50000 new users you realize a slow-down on the page where you show random users. You're thinking of caching it, but what's the point? It wouldn't be random.

You break it down with EXPLAIN and realize with horror that your fancy query doesn't use the nice index you placed on the table ages ago. And why would it? It's calling RAND() for every row in your table, there's just no way around it. So what to do?

One alternative is to fetch random IDs first and then join the IDs found with the USERS table to fetch the real data. How do we do that? Why, using ORDER BY RAND(), of course. Wait, didn't I just say you're not supposed to use it? Well, I did say that, but the difference is that we'll run the ORDER BY RAND() on the best-indexed column there is, the primary key. Let's break it down and get our hands dirty:

SELECT USERS.* FROM (SELECT ID FROM USERS WHERE IS\_ACTIVE = 1
ORDER BY RAND() LIMIT 20)  
AS RANDOM\_USERS JOIN USERS ON USERS.ID = RANDOM\_USERS.ID

And with a little bit of thinking we got ourselves a nice and fast way to fetch random data. Most likely there are other ways out there (sometimes I do miss Oracle's ROWID), but this one worked fairly well for me. It probably won't scale forever though, so be prepared to get back to it every once in a while.

Paul Tuckfield (YouTube's MySQL administrator) gave a nice talk on do's and don'ts when it comes to scaling MySQL. He held it at last year's MySQL Conference, so it's not that recent, but it's still very much worth it.

You can (and should) watch his presentation, though unfortunately his slides don't seem to be available anywhere. Colin Charles took some notes which sum it up quite nicely.

At the end of his talks, Paul mentions the oracle caching mechanism. On each of their replication slaves they have a script running that basically reads the relay log on the slaves a little bit ahead of the SQL thread and turns the statements into SELECTs. The data is fetched into the cache and will already be there when the slave wants to update the data, so that I/O is minimized at the point of the actual applying of the data in the relay log. Pretty neat stuff.

A tool called MaatKit now includes a command that does exactly that. The author of MaatKit also wrote a nice article on the issue.

Paul also says that Python is one of the factors that YouTube scales. A little something to think about.

Tags: mysql