I will start with the command first, because some people (like me) are impatient.

sync && date && dd if=/dev/zero of=/somefile bs=16M count=64 && sync && date
sync && date && dd if=/dev/sda of=/dev/null bs=16M count=64 && sync && date

Most people don’t really know what happens there and why, how to measure the throughput, or what values to choose. Let’s go through each command:

  • sync - flushes the write cache buffer to disk
  • date - obvious - prints the current date and time
  • dd - copies 64 blocks of 16MB each (1GB in total) from /dev/zero to /somefile; the second command reads the same amount from /dev/sda and discards it in /dev/null

/dev/zero is extremely fast; after all, it only produces zeros. Some disks have optimizations for writing zeros, especially in virtual machines and iSCSI SANs. A workaround is to use if=/dev/random or if=/dev/urandom on some systems, which keeps those optimizations from kicking in. However, the random number generator on Linux is slow and can become the bottleneck of the test. Instead, build a large file of pre-generated random data and use that as the input.
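As a minimal sketch of that last idea, assuming GNU dd on Linux: generate the random file once, then use it as the source for the timed write. The paths in SRC and DST and the size in MB are hypothetical placeholders, and the size is kept small here; the source file should ideally sit in memory or on a faster device than the one being tested, otherwise you measure the read as well.

```shell
#!/bin/sh
# SRC, DST and MB are hypothetical defaults; point DST at the disk under test.
SRC="${SRC:-/tmp/random.bin}"
DST="${DST:-/tmp/dd-write-test.bin}"
MB="${MB:-16}"   # size of the random file in MiB; use far more for a real test

# Slow step, done once and not timed: pull random bytes from the kernel.
# iflag=fullblock makes dd retry short reads so the file has the exact size.
dd if=/dev/urandom of="$SRC" bs=1M count="$MB" iflag=fullblock 2>/dev/null

# Timed step: write the pre-generated data to the target.
# conv=fdatasync flushes the OS cache before dd reports its speed.
dd if="$SRC" of="$DST" bs=1M conv=fdatasync
```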

How big should the block size be?

  • On a physical disk, you might want to have it as big as the disk cache for maximum performance.
  • On a physical RAID, check the buffer size of the card.
  • On a virtual server, it depends on the hypervisor. For example VMware has preallocated 32MB blocks of memory, so keep bs below 32MB.
  • On a remote disk, it depends on multiple things (local OS buffering, any hypervisor buffering, iSCSI buffering, remote RAID and disk buffering).

To all these you should add the file system you are using, as some are faster at writing small blocks and others at writing bigger blocks of data. To take the file system out of the picture, write directly to the disk (assuming you have a spare disk to test on), let’s say of=/dev/sdc. Keep in mind that this overwrites whatever is on that disk.
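Since the right block size depends on so many layers, one practical approach is simply to try several and compare. The following is a rough sketch assuming GNU dd; TARGET and TOTAL_KB are hypothetical names, and the total is kept small so it runs quickly. Each run writes the same amount of data with a different block size.

```shell
#!/bin/sh
# TARGET is a hypothetical scratch file on the file system being tested.
TARGET="${TARGET:-/tmp/dd-bs-test.bin}"
TOTAL_KB="${TOTAL_KB:-65536}"   # 64 MiB per run; use much more for real tests

# Block sizes in KiB: 4 KiB, 64 KiB, 1 MiB and 16 MiB.
for bs_kb in 4 64 1024 16384; do
    count=$((TOTAL_KB / bs_kb))
    printf 'bs=%sK count=%s: ' "$bs_kb" "$count"
    # dd prints its statistics on stderr; keep only the summary line.
    dd if=/dev/zero of="$TARGET" bs="${bs_kb}K" count="$count" conv=fdatasync 2>&1 | tail -n 1
done
```

Run it a few times before drawing conclusions, since a single pass can be skewed by whatever else the machine is doing.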

When dd finishes, it prints the time taken and the speed, but what it actually reports is how long it took until the OS returned control. In reality most of the data is still in a buffer, and it’s up to the kernel, the underlying hypervisor and the remote SAN to decide when to flush it to disk. sync only takes care of the OS buffer. To overcome this, test with enough data to fill all these buffers; once they are full, writes proceed at actual disk speed. So make count big enough.
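For the OS buffer specifically, GNU dd can do the flush itself so that the speed it prints already includes it. This is a small sketch assuming GNU dd; TARGET and COUNT are hypothetical names, and neither option does anything about hypervisor or SAN caches.

```shell
#!/bin/sh
# TARGET is a hypothetical scratch file; COUNT is the number of 16 MiB blocks.
TARGET="${TARGET:-/tmp/dd-sync-test.bin}"
COUNT="${COUNT:-4}"   # raise this well past your cache sizes for a real test

# conv=fdatasync: dd calls fdatasync() before exiting, so the reported
# time includes flushing the OS write cache for this file.
dd if=/dev/zero of="$TARGET" bs=16M count="$COUNT" conv=fdatasync

# oflag=direct: request O_DIRECT writes that bypass the OS page cache.
# Some file systems do not support it, hence the fallback message.
dd if=/dev/zero of="$TARGET.direct" bs=16M count="$COUNT" oflag=direct \
    || echo "O_DIRECT is not supported on this file system"
```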

So if you want to figure out your disk speed, divide the amount of data read or written by the time spent (the difference between the two date calls).
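As a worked example of that division: 64 blocks of 16MB is 1024MB, so if the two date calls are 8 seconds apart (a made-up figure), the speed is 128MB/s. The helper below is a small sketch of the same arithmetic; throughput is a hypothetical function name, and it treats 1MB as 1048576 bytes, which is what dd means by bs=16M.

```shell
#!/bin/sh
# throughput BYTES SECONDS
# Prints the speed in MiB/s with one decimal, or "n/a" if no time elapsed.
throughput() {
    awk -v b="$1" -v s="$2" 'BEGIN {
        if (s <= 0) { print "n/a"; exit }
        printf "%.1f\n", b / s / 1048576
    }'
}

# 64 blocks of 16 MiB in 8 seconds (illustrative numbers only).
throughput $((64 * 16 * 1024 * 1024)) 8   # prints 128.0
```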

In a real life scenario, you will never write that much data all of a sudden, or will you? It all depends on the kind of applications you are running. Many applications write small bits of data every now and then, so you might want to test writing with 4K blocks.

In most virtual servers, such as those on AWS, you will see that write speed is higher than read speed, especially for small amounts of data, up to 1GB or so. That is because of the extensive use of buffers. When reading data, however, there is no way around actually going to the disk to get it. This means a lot of time spent seeking with the disk heads and waiting for the data to arrive over the network.

If you read random data you will see quite a low disk speed. If you read the same data over and over again, caches come into play again and read speed will increase significantly.
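The cache effect is easy to see by reading the same file twice. The sketch below assumes Linux and GNU dd; FILE is a hypothetical scratch path. Dropping the page cache through /proc/sys/vm/drop_caches needs root, so the script skips that step when the file is not writable, in which case both reads will probably come from memory.

```shell
#!/bin/sh
# FILE is a hypothetical scratch file on the disk being tested.
FILE="${FILE:-/tmp/dd-read-test.bin}"

# Create the test file and make sure it has reached the disk.
dd if=/dev/zero of="$FILE" bs=1M count=64 conv=fdatasync 2>/dev/null

# Empty the OS page cache so the first read has to go to the disk.
if [ -w /proc/sys/vm/drop_caches ]; then
    sync
    echo 3 > /proc/sys/vm/drop_caches
else
    echo "cannot drop caches (not root?), first read may be cached"
fi

printf 'first read:  '
dd if="$FILE" of=/dev/null bs=1M 2>&1 | tail -n 1

# The file is now in the page cache, so this read should be much faster.
printf 'second read: '
dd if="$FILE" of=/dev/null bs=1M 2>&1 | tail -n 1
```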

What you might need to do to get an accurate idea of how the disks behave is to run dd at regular intervals over a longer period of time. If your disks are not dedicated to you alone, you will see huge spikes caused by other people’s activity, as they share those disks with you. At Amazon, each server has a 1Gb/s connection to the central EBS storage. Since 1Gb/s divided by 8 bits per byte is about 125MB/s, you will never get a higher speed than that. In reality, and in all my tests, you should expect speed to go up to 60MB/s and sometimes as low as 2MB/s, with an average of 40MB/s or so.
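A simple way to do that is a loop that runs a small test, logs the result with a timestamp, and sleeps. This is only a sketch assuming GNU dd; TARGET, LOG, RUNS and INTERVAL are hypothetical names with tiny defaults. For real monitoring you would set INTERVAL to minutes and RUNS to cover hours or days, or call the dd line from cron instead.

```shell
#!/bin/sh
# All four variables are hypothetical defaults; adjust them to your setup.
TARGET="${TARGET:-/tmp/dd-interval-test.bin}"
LOG="${LOG:-/tmp/dd-bench.log}"
RUNS="${RUNS:-3}"
INTERVAL="${INTERVAL:-1}"   # seconds between runs

i=0
while [ "$i" -lt "$RUNS" ]; do
    # Keep only the summary line that dd prints on stderr.
    result=$(dd if=/dev/zero of="$TARGET" bs=1M count=16 conv=fdatasync 2>&1 | tail -n 1)
    echo "$(date '+%Y-%m-%d %H:%M:%S') $result" >> "$LOG"
    i=$((i + 1))
    sleep "$INTERVAL"
done
```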

Another figure you will be interested in is the average disk response time. This is probably a more important factor than bandwidth, assuming the bandwidth is still decent. For small writes, this is crazy fast on virtual machines due to the buffer usage discussed earlier; however, Linux systems suffer a lot when reads are slow.
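A rough way to estimate write response time with dd alone is to issue many small synchronous writes and divide the elapsed time by their number. The sketch below assumes GNU dd and GNU date (for %N, nanoseconds); TARGET and N are hypothetical names. The figure includes dd’s own startup time, so treat it as an upper bound rather than a precise latency.

```shell
#!/bin/sh
# TARGET is a hypothetical scratch file; N is the number of 4 KiB writes.
TARGET="${TARGET:-/tmp/dd-latency-test.bin}"
N="${N:-200}"

start=$(date +%s%N)
# oflag=dsync: every 4 KiB write must be flushed by the OS before the next.
dd if=/dev/zero of="$TARGET" bs=4k count="$N" oflag=dsync 2>/dev/null
end=$(date +%s%N)

# Average time per write in milliseconds.
awk -v ns=$((end - start)) -v n="$N" \
    'BEGIN { printf "avg %.3f ms per 4 KiB synchronous write\n", ns / n / 1000000 }'
```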

I should probably discuss Linux memory allocation in more detail now, but suffice it to say that when you copy files you can see a lot of memory in use that is never deallocated. That’s because the memory is used for caching those files. You can see it under “Cached” in top. This memory is, however, freed the moment another program needs memory and there is nothing else free. When you run some big databases, this cache ends up being quite small and is cleared often. As a result, disk access becomes slower, and we all know that disk access is quite important for big databases. This is one reason I do not use Amazon for anything really big and memory hungry; however, it is great for anything that can be properly split between many servers.
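The same “Cached” figure can be read from /proc/meminfo, which makes it easy to watch the cache grow during a test. This is a minimal sketch for Linux; cached_mb and FILE are hypothetical names.

```shell
#!/bin/sh
# Print the page cache size in MiB, from the "Cached:" line (value is in kB).
cached_mb() {
    awk '/^Cached:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# FILE is a hypothetical scratch file.
FILE="${FILE:-/tmp/dd-meminfo-test.bin}"

echo "Cached before: $(cached_mb) MiB"
# No sync or direct flags here, so the written data stays in the cache.
dd if=/dev/zero of="$FILE" bs=1M count=64 2>/dev/null
echo "Cached after:  $(cached_mb) MiB"
```

On a busy machine the difference will not be exactly 64, as other processes touch the cache at the same time.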

Do you have a massive database? After working with VMware, Xen, AWS, Rackspace and many others, I can say there is nothing out there to compete with a physical machine. If you can afford AWS prices, you can afford buying a server with a PCI-E SSD card. Have a look at something like the OCZ RevoDrive or Fusion-io.

About the author

Mircea Danila Dumitrescu is a highly technical advisor to startups, CTO, Entrepreneur, Geek, Mentor, Best AI Startup Winner, who previously ran multiple complex systems with billions of records and millions of customers.