All too often I have heard things like “You need fast storage? Use a SAN”. While this advice is not wrong as such, most people have only a vague understanding of SANs in general, and of how they compare performance-wise to local disks in particular. I will not go into details like the various protocols. Rather, I want to highlight where this requirement for high performance comes from and why SANs are in many cases a good answer to it.
I am looking at all this from the point of view of commercial applications only. What they do, in a nutshell, is retrieve, process, and send data. By the way, if you have never looked into the semantics of terms like data, information, and knowledge and how they relate to one another, I suggest doing so. It is an interesting topic, and knowing the differences can be helpful in many discussions.
But back to the topic. For most commercial applications the overall performance is determined not so much by CPU power as by the throughput with which information can be received or sent (or read from and written to disk, when we are talking about persistence). In technical terms this is commonly referred to as I/O throughput. And that, by the way, is a major reason why mainframes will most likely stay around for quite a few more years: they have been optimized for I/O performance for literally decades. This includes the hardware (special I/O subsystems that free the CPUs from processing I/O requests), the operating system, and also the applications (because people there have been much more aware of this than in the open systems world).
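To make “I/O-bound” a bit more concrete, here is a minimal Python sketch that creates a scratch file and times a sequential read of it. The file size and chunk size are made-up illustrative values, and on a modern machine the read may largely come from the page cache, so treat the number as a toy measurement, not a benchmark:

```python
import os
import tempfile
import time

CHUNK = 1024 * 1024            # 1 MiB read size
SIZE = 64 * CHUNK              # 64 MiB scratch file (illustrative)

# Create a scratch file so the example is self-contained.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    for _ in range(SIZE // CHUNK):
        f.write(os.urandom(CHUNK))

# Time a sequential read; the wall-clock time here is almost all I/O,
# since the loop itself does no real computation.
start = time.perf_counter()
total = 0
with open(path, "rb") as f:
    while chunk := f.read(CHUNK):
        total += len(chunk)
elapsed = time.perf_counter() - start
os.remove(path)

print(f"read {total / CHUNK:.0f} MiB in {elapsed:.3f} s "
      f"-> {total / elapsed / CHUNK:.0f} MiB/s")
```

Tools like `fio` or `iostat` do this properly (bypassing caches, mixing access patterns), but the principle is the same: what matters is bytes moved per second, not CPU cycles.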
But I did not want to focus on mainframes today. Rather, this post is about a common misconception when it comes to I/O, or more specifically storage, performance. Many people, whenever they hear “We need high performance storage”, instantly think about putting the respective data onto a SAN. While this may be the right choice in many cases, the reasoning behind it is often flawed. For many folks SAN is just a synonym for “some centralized ultra-fast storage pool”. It is, in my view, important to understand how a SAN relates to other storage devices.
There is no reason that local hard disks are “by definition” slower than their counterparts in the SAN. At the end of the day, the main difference between a SAN and local hard disks is how the disks are connected to the computer. For a SAN this is either Fibre Channel (FC) or iSCSI (basically SCSI over IP). For local disks it can be SCSI (serial or good old parallel), Serial ATA, or whatever. What determines the speed is the number and individual performance of the physical disks that make up a RAID. The more disks you have and the faster each of them is, the faster your logical drive will be. This is generally independent of whether they are connected to a local RAID controller or “simply” sit in the SAN somewhere in your data center.
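The “more disks, faster logical drive” point can be put into a back-of-envelope formula. The per-disk throughput below is an assumed illustrative figure, and the model deliberately ignores controller, bus, and protocol limits:

```python
# Assumed sequential throughput of a single disk; real values depend
# on the drive and workload (120 MiB/s is just an illustration).
PER_DISK_MIB_S = 120.0

def striped_read_throughput(disks: int, per_disk: float = PER_DISK_MIB_S) -> float:
    """Idealized sequential read throughput of a striped array:
    reads are spread evenly over all member disks, so the aggregate
    scales linearly with the disk count (ignoring controller, bus,
    and protocol overhead)."""
    return disks * per_disk

for n in (2, 4, 8):
    print(f"{n} disks: ~{striped_read_throughput(n):.0f} MiB/s")
```

The same arithmetic applies whether those spindles sit behind a local RAID controller or inside a SAN array; the interconnect only matters once it becomes the narrower pipe.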
So as soon as you compare the SAN to a local RAID system (and I am not talking about the RAID capabilities many motherboards offer these days, but dedicated RAID controllers), the perceived performance gap pretty much disappears. You may even have an advantage with local disks, because you have direct control over how many drives your file system spreads across, while with a SAN this may be a bit more difficult to achieve.
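One way to picture “spreading” a file system over drives is striping: logical blocks go round-robin to the member disks, so a sequential read keeps all spindles busy at once. Here is a toy sketch of that mapping, not tied to any real volume manager:

```python
def stripe_map(block: int, disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block on that disk)
    for simple round-robin striping with a one-block stripe unit."""
    return block % disks, block // disks

# With 4 disks, consecutive logical blocks land on different spindles.
print([stripe_map(b, 4)[0] for b in range(8)])  # → [0, 1, 2, 3, 0, 1, 2, 3]
```

With local storage you decide how many disks sit behind that mapping; on a SAN, the storage team carves your LUN out of a shared pool, and you may have little visibility into how many spindles actually back it.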
I will leave it here for now. Hopefully this somewhat demystifies things for you.