Very interesting experiment, but in a way it is proof that Redis, like any other memory-intensive application where memory bandwidth is one of the most important factors (MySQL is another example, as are most database systems in general), is penalized when running inside VMs compared to real hardware. A 10-20% improvement is small compared to the order-of-magnitude difference between real hardware and virtualized hardware.
This does not mean virtualization is in general a bad fit for Redis and other DBs. It actually works well already, and with the introduction of Redis Cluster, taking many small VMs with 1 or 2 GB each and running a cluster will probably be the way to go for most users (every instance will serve a subset of the queries, and there are cheap VMs with a good amount of memory).
But still, Redis on the most basic real-hardware Linux box will deliver 100k operations per second per core, while on a small virtualized instance it is an order of magnitude slower. Something to keep in mind for deployments.
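To put the two claims side by side, here is a rough back-of-the-envelope sketch. The 100k figure and the 10-20% range come from the comment above; the exact 10x penalty is only an assumption standing in for "an order of magnitude", not a measurement.

```python
# Illustrative numbers only: 100k ops/s is the bare-metal figure quoted above,
# and the 10x penalty is an assumed stand-in for "an order of magnitude".
BARE_METAL_OPS = 100_000
VIRTUALIZATION_PENALTY = 10  # assumed factor, not measured


def virtualized_ops(bare_metal_ops, penalty, tuning_gain=0.0):
    """Throughput on a VM after a fractional tuning gain (0.20 means 20%)."""
    return bare_metal_ops / penalty * (1.0 + tuning_gain)


for gain in (0.0, 0.10, 0.20):
    ops = virtualized_ops(BARE_METAL_OPS, VIRTUALIZATION_PENALTY, gain)
    gap = BARE_METAL_OPS / ops
    print(f"gain {gain:.0%}: {ops:,.0f} ops/s, {gap:.1f}x below bare metal")
```

Under these assumptions, even the full 20% gain leaves the VM more than 8x below the bare-metal figure.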
Are you sure there's an order of magnitude of difference between real and virtualized hardware? Does Redis actually deliver 100k operations _per core_? (I thought it was supposed to be per machine, because memory bandwidth is a bottleneck or something.) Can you point to the relevant benchmarks, please?
I am interested to know how big the difference is. Maybe Redis on Xen does not perform badly after all, considering that a 2.2 GHz Athlon 64 3700+ (single-core CPU, circa 2005) is less powerful than what is considered a server box today.
Yes, I'm sure. You get from 100k to 140k operations per second on any new computer (not a server or special hardware) that you can buy today as a desktop anywhere.
Check the Redis benchmark page on Google Code to see these numbers, but I want to show you how Redis performs on an EEE PC 701:
PING: 7942.81 requests per second
PING (multi bulk): 7861.64 requests per second
MSET (10 keys, multi bulk): 6435.01 requests per second
SET: 5698.01 requests per second
GET: 8183.31 requests per second
INCR: 9074.41 requests per second
LPUSH: 8710.80 requests per second
LPOP: 8583.69 requests per second
SADD: 8396.31 requests per second
SPOP: 8123.48 requests per second
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 13
model name : Intel(R) Celeron(R) M processor 900MHz
stepping : 8
cpu MHz : 630.029
cache size : 512 KB
Note that it is clocked at 630 MHz.
I don't know exactly, but the PC you mention cannot deliver less than 50k operations per second.
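For anyone who wants to compare output like the above against the 100k figure, here is a small sketch that parses the benchmark lines and prints the ratio. The numbers are copied from the EEE PC 701 output in this comment; the regular expression is just an assumption about the line format shown here.

```python
import re

# Benchmark lines copied from the EEE PC 701 output above.
OUTPUT = """\
PING: 7942.81 requests per second
PING (multi bulk): 7861.64 requests per second
MSET (10 keys, multi bulk): 6435.01 requests per second
SET: 5698.01 requests per second
GET: 8183.31 requests per second
INCR: 9074.41 requests per second
LPUSH: 8710.80 requests per second
LPOP: 8583.69 requests per second
SADD: 8396.31 requests per second
SPOP: 8123.48 requests per second
"""

# Assumed line format: "<test name>: <number> requests per second".
LINE_RE = re.compile(r"^(?P<test>.+): (?P<rps>[\d.]+) requests per second$")

REFERENCE_OPS = 100_000  # desktop figure quoted in this thread


def parse_benchmark(text):
    """Map each test name to its requests-per-second value."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results[match.group("test")] = float(match.group("rps"))
    return results


results = parse_benchmark(OUTPUT)
for test, rps in results.items():
    print(f"{test}: {rps:.0f} req/s, {REFERENCE_OPS / rps:.1f}x below 100k")
```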
Yes, but with a big caveat ;) That is: if you are not using multi-key operations, or your multi-key operations have great locality so that you can use "hash tags" (http://antirez.com/post/Sorting-in-key-value-data-model.html), then it is an easy upgrade.
We should release guidelines ASAP to make sure current users are designing schemas that can run on Redis Cluster with little effort.
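To illustrate why hash tags keep related keys together, here is a minimal sketch. It assumes the key-to-slot scheme from the Redis Cluster specification as later published (CRC16/XMODEM of the key modulo 16384, hashing only the text between the first `{` and the next `}` when that text is non-empty), which may differ from what was planned at the time of this thread. The key names are hypothetical.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Slot for a key; only a non-empty {tag} is hashed when one is present."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384


# Hypothetical keys: both share the tag "user:1000", so they share a slot
# and multi-key operations across them can stay on one node.
print(hash_slot("{user:1000}.following"), hash_slot("{user:1000}.followers"))
```

Keys without a common tag generally land on different slots, which is why multi-key operations with poor locality make the upgrade harder.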
Yes, this is definitely an interesting question. Salvatore says above that it's an order of magnitude, but it would be great to see the numbers for some specific CPU.
I'd also love to see how much of a penalty is imposed by the OS for things like virtual memory support. It would be fun to see the numbers for something closer to bare metal like LoperOS, or TinyOS running directly without Xen. Not that this would be practical for many real-world circumstances, but I don't really have a clue what the overhead is.
Yeah, I've always wondered too why we're running servers with the full OS stack (sometimes a horrific beast like Solaris) to operate a single simple program.
What a waste. Xen is looking more interesting every day, with stuff like this and HaLVM.
Because there is more to running an application than just running the application. You need disks, network configuration, somewhere to send your log messages, etc., etc.
By the time you've implemented all that, congratulations, you have an OS.
In the case of this Redis example, they dispensed with all of that and used the host OS's TCP stack via Xen. If you were running on the bare metal, you wouldn't even have that.
If you ever checked out TinyOS (for embedded systems), you will like their idea a lot. The main premise of TinyOS is to wrap a thin layer of necessary OS components around your application (where "necessary" means, for example, that if your app only uses TCP/IP, UDP support is not packed in) and write the generated binary directly to ROM.
I don't think I'd use it even if Redis were a bottleneck [1]. With a performance increase of around 11% for SET/GET commands, I don't think there's that much to gain from running Redis without an underlying OS. There's quite a bit to lose, too, because if you think about it, the implication here is that you aren't (ever) going to be able to make your data persist. Redis is a better fit for data that doesn't necessarily require persistence anyway, but putting this all on a scale, 11% doesn't win me over.
That being said (which is specifically about Redis on Xen), though, I think being able to run other server software without an underlying OS is pretty damn neat.
[1] Redis already scales nicely, and with the upcoming Redis Cluster it's going to be trivial to run a fault-tolerant, scalable environment.
You're right - bad choice of words. I can only hope that Salvatore's implementation does make it trivial. I have spoken to him about it and seen him present on it, and I do feel confident at this point. It's obviously a little soon to say it'll make it trivial - I like the thinking behind it, that is all.