Sorry about the long, and rather vague, acronym-fest. Please ignore, if you really don’t like that kind of thing.
Now that’s a very good question.
First, a bit of housekeeping:
- for user tsu2's slide show, I found that I had to go to the ‘30 minute presentations’ section; after that, it was readable (in Opera)
- for the two Phoronix articles, well one is a forum thread linking to the other, so there is only one, really
Now, traditionally, the advice given (…and it seemed to make sense at the time…) was that while CFQ was the best choice for HDDs, on SSDs elevator-type scheduling makes no sense and you might as well go for ‘noop’ scheduling (effectively, no scheduling), because the phenomenon of head strokes taking a long time is no longer present, so optimisations that work around head stroking just aren’t relevant any more. And noop scheduling should have the lower overhead, so even if noop’s measured performance on an otherwise unloaded system were the same as with some other scheduler, there ought to be more processor time left over for other things.
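For reference, on Linux you can see which scheduler is active, and switch it at runtime, through sysfs. A quick sketch (sda is just an example device name; adjust for your drive, and note that root is needed to write):

```shell
#!/bin/sh
# The kernel lists the available schedulers for a device and marks the
# active one with brackets, e.g. "noop deadline [cfq]".
sched_file=/sys/block/sda/queue/scheduler
if [ -r "$sched_file" ]; then
    cat "$sched_file"
fi

# Switching at runtime is just a write to the same file (as root):
#   echo noop > /sys/block/sda/queue/scheduler

# Small helper: extract the active (bracketed) scheduler from that line.
active_sched() {
    sed -n 's/.*\[\(.*\)\].*/\1/p'
}
```

The change takes effect immediately but doesn’t survive a reboot; for that you’d set it via a boot parameter or a udev rule, depending on your distro.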
So far, so convincing, but…purely looking at the SSD figures from Phoronix (and you’ll have to look at the original article for exact test definitions, if you are interested):
- FS-Mark tests (2 tests): no big difference, but the order is CFQ, then NOOP, then Deadline (worst)
- Blogbench test: NOOP, then Deadline, then CFQ; unlike the FS-Mark tests, these weren’t close results, with the best being twice as good as the worst
- Compile Bench compile: CFQ, then NOOP, then Deadline; fastest about one third better than the slowest
- Compile Bench initial create: CFQ wins, and the order between second and third depends on the hardware platform (Sandy Bridge vs Clarksfield) - you’d probably call that a tie
- Compile Bench read compiled tree: on Sandy Bridge, CFQ wins, with NOOP and Deadline tied about 20% behind; on Clarksfield, Deadline wins, with CFQ and NOOP about 20% behind
- IOzone: CFQ wins, but not by that much (~5%). On Sandy Bridge, NOOP is ahead of Deadline by a couple of percent; on Clarksfield, NOOP and Deadline are about equal, with Deadline trivially ahead
- TIOT (read/write, 128 MB, 8 threads): on Sandy Bridge, CFQ > NOOP > Deadline; on Clarksfield, Deadline > NOOP > CFQ
- TIOT (read/write, 64 MB, 16 threads): on Sandy Bridge, not much in it; on Clarksfield, Deadline > CFQ > NOOP, and although the margins still aren’t large, at least there is a clear order
Now, call me dumb, but I find it difficult to call a clear winner from that lot: you’d probably say that CFQ is the best overall bet, but if your use case is more like Blogbench, CFQ is the worst, and not by a small margin. That is unexpected, but it does make the point about benchmarking that the results you get can be heavily dependent on exactly which tests you perform.
(And a couple of passing notes - given that the results vary by platform (not just proportionally - the order also changes), who would take bets on how results would turn out on Haswell or Atom, say? Results on an AMD platform would probably be more difficult still to predict. And this was all on ext4; other filesystems would introduce another variable.)
Now, on to traditional rotating bits of magnetic matter. The first disturbing thing is that on some tests the HDD manages about 50% of the SSD’s performance, while on others it gets nowhere close to that. This is probably just a reflection of the fact that on some tests stroke time is a significant proportion of the total, and on others it isn’t. The second is that there are tests where the results are closer on the HDD than they were on the SSD: see, for example, Blogbench, where on the SSD it is NOOP > DL > CFQ, with NOOP handily in front, while on the HDD it is CFQ > DL > NOOP, with much smaller differences. Now, it isn’t unexpected that HDD vs SSD changes the order, but, even if you don’t write Blogbench off as an ‘outlier’, you might say that the result has little significance on an HDD, but is really significant on an SSD (a 2:1 range, versus more like 20%).
Anyway, I think that on the HDD, the scheduler you don’t want is NOOP, although there are a couple of tests that it wins. Given that this is the situation CFQ was designed for, you might expect CFQ to win handily. As so often in these matters, it isn’t really that clear: DL is probably slightly ahead on points, but that could change if your use case is closer to one test than the others.
Now, here is another complicating factor, or two. For the HDD, the size of the cache is probably also a factor (as would be rotational speed and interface, but probably all of the drives that you’d consider would be 7200 rpm SATA drives). These days, caches can be quite large (32/64 MB) and there are many use cases where disk I/O can largely be satisfied from the cache, rather than having the CPU wait for the disk mechanics. And most drives that you’d buy these days will have Native Command Queuing (whereas, back in the PATA days, they wouldn’t). Now, NCQ is (vaguely) like elevator seeking done in the drive itself, so you could probably argue that something like CFQ or Deadline is more relevant on an older drive than on a modern drive with NCQ.
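If you want to check whether NCQ is actually in play on a given drive, the kernel exposes the queue depth through sysfs: a depth of 1 means NCQ is off or unsupported, while modern SATA drives typically report around 31. A sketch (again, sda is just an example device):

```shell
#!/bin/sh
# Queue depth as seen by the kernel: 1 = no NCQ, ~31 = NCQ active.
depth_file=/sys/block/sda/device/queue_depth
if [ -r "$depth_file" ]; then
    cat "$depth_file"
fi

# hdparm -I /dev/sda also reports a "Queue depth: 32" line; a small
# helper to pull the number out of that output:
queue_depth() {
    sed -n 's/.*[Qq]ueue depth:[^0-9]*\([0-9]*\).*/\1/p'
}
```

(hdparm -I will also tell you the drive’s cache size, if you want to check the 32/64 MB point above.)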
I’m sorry that I couldn’t come up with something simpler or results with more clarity, but, based on that set of Phoronix results, I didn’t manage it.