Poor Storage IOPS

USMCGrunt

I have a makeshift SAN node that a pair of VM hosts connect to, and I get shit IOPS performance. Initially I had a RAID0 array of six 1TB 7200rpm HDDs on an Adaptec 3805 SATA II RAID controller, with 1Gb connectivity through a PowerConnect 2816 switch and everything configured for 9k jumbo frames. I'm using StarWind Virtual SAN to handle the iSCSI connections and provide 5GB of RAM cache. This netted me about 800 IOPS...shit.

So I got myself a 500GB SSD (going through a motherboard SATA3 connection) and an Intel quad-port NIC. Using NIC teaming, with static teams configured on the switch and in the OS and a 2Gb team to each host, I very occasionally see a spike into the 2,200 IOPS region...still shit.

Any idea what's causing such poor performance?
 
It's your shitty drives. A 15K spinner is good for about 250 IOPS; a 7200rpm drive for about half of that, minus whatever you lose to compression if you have that enabled.

We're running a 72TB all-flash array with about 60TB of that being heavily accessed databases, and we barely see over 5,000 IOPS of usage. Even though the array is capable of something stupid like 250,000+ IOPS, we'll never see that.
 
First off, the RAID card is old, and only PCIe 1.1 if I remember correctly. 800 IOPS is not bad from 7200rpm drives (a single drive is ~100 IOPS). The SSD averages about 3,000 IOPS. What are you looking for? What do you consider poor performance? What programs, etc. are the client VMs running? What are your hosts' specs?
 
As the other two said, going by the 70/30 read/write numbers a storage vendor would use, you shouldn't even have been able to hit 800 IOPS. I have a dozen 7200rpm HDDs and I believe they're quoted at about ~750 IOPS in RAID6. This calculator is fairly close to what my storage guy would quote me:

http://thecloudcalculator.com/calculators/disk-raid-and-iops.html

If you need IOPS then SSDs are the way to go. I haven't seen quotes yet, but I'm being told that on a cost-per-GB basis SSD pricing can be more cost effective than even 15K SAS drives these days.
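For anyone curious, here's roughly the math that calculator is doing; a quick sketch in Python, where the per-drive IOPS figures and the 70/30 mix are assumptions rather than vendor specs:

```python
# Rough RAID IOPS estimate, same idea as the calculator linked above.
# Per-drive IOPS and the 70/30 read/write mix are assumed values.

WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}

def raid_iops(drives, iops_per_drive, raid_level, read_pct=0.70):
    """Host-visible IOPS for a given read/write mix."""
    raw = drives * iops_per_drive        # back-end IOPS of the whole set
    penalty = WRITE_PENALTY[raid_level]  # back-end IOs per host write
    write_pct = 1.0 - read_pct
    # Each host read costs 1 back-end IO, each host write costs `penalty`.
    return raw / (read_pct + write_pct * penalty)

print(raid_iops(6, 80, "raid0"))  # 6x 7200rpm @ ~80 IOPS each -> 480.0
print(raid_iops(6, 80, "raid6"))  # same drives in RAID6       -> 192.0
```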
 
The basic problem is that a single disk delivers around 100 IOPS. It's limited by the time to position a head plus the time until a specific block passes under the head. If you build a RAID from such disks, it doesn't matter whether it's a striped RAID like 0, 5, or 6: every single disk in the array must be positioned for every block you want to read. That means the raw IOPS the whole array can deliver is about the same as a single disk. With a mirror, read IOPS doubles because you can read from both disks simultaneously. With RAID-10 or RAID 50/60, the value doubles as well. On ZFS, IOPS scales with the number of vdevs.

If you don't measure IOPS with single random data blocks but with larger blocks, or you add a read/write cache, you push the workload toward something more sequential, where the heads don't have to reposition or wait for every block to come around. That's why you can achieve higher benchmark values, but the effect is quite limited. If you really want more than a few hundred IOPS from a RAID pool with any sort of realistic benchmark, you must improve the raw IOPS of the underlying system, which means lots of mirrors (like a ZFS pool built from many mirror vdevs) or SSD-only arrays. Since SSDs don't suffer from positioning time or fragmentation, they can offer much higher IOPS: desktop SSDs can give you 50,000 read IOPS and 5,000 write IOPS under steady load. Don't believe the 100,000 write IOPS claims of cheap SSDs; only the very best enterprise SSDs can hold up to 80,000 write IOPS under steady load, and NVMe drives much more.

https://en.wikipedia.org/wiki/IOPS
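To put rough numbers on the above, a quick sketch where the seek and rotational figures are ballpark assumptions for a generic 7200rpm drive:

```python
# Ballpark random-read IOPS of one spindle, and how a pool of mirrors scales.
# Seek time and rpm are assumptions for a generic 7200rpm drive.

def disk_read_iops(avg_seek_ms, rpm):
    rot_ms = 0.5 * 60_000 / rpm            # average rotational wait: half a turn
    return 1000 / (avg_seek_ms + rot_ms)   # IOs per second at queue depth 1

single = disk_read_iops(avg_seek_ms=8.5, rpm=7200)
print(round(single))                       # ~79 IOPS

# Per the post above: a striped RAID 0/5/6 of N such disks still behaves like
# roughly one disk for small random IO, while mirror vdevs scale with count.
for vdevs in (1, 2, 4, 8):
    print(vdevs, "x 2-way mirror ->", round(single * 2 * vdevs), "read IOPS")
```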
 
You never team iSCSI connections - go with MPIO.

To check network performance, start with a RAM disk, and don't move on to "real" storage until the RAM-based storage gives you wire speed.

nttcp & iperf are your friends here,

followed by DiskSpd.
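If you don't have iperf handy, even a dead-simple stdlib push like the hypothetical sketch below will tell you whether a link is anywhere near wire speed before you blame the storage (it's not a substitute for iperf/nttcp):

```python
# Minimal network throughput sanity check, Python stdlib only.
# Run "python net_check.py server" on the target box and
# "python net_check.py client <target-ip>" on the initiator.
import socket, sys, time

PORT, CHUNK = 5001, 1 << 20   # 1 MiB sends

def server():
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        total, start = 0, time.time()
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
        secs = time.time() - start
        print(f"{total / 1e6:.0f} MB in {secs:.1f}s = {total * 8 / secs / 1e9:.2f} Gbit/s")

def client(host, seconds=10):
    buf = b"\0" * CHUNK
    with socket.create_connection((host, PORT)) as conn:
        end = time.time() + seconds
        while time.time() < end:
            conn.sendall(buf)

if __name__ == "__main__":
    client(sys.argv[2]) if sys.argv[1] == "client" else server()
```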

 
I'm currently running on a cheapo 480GB SSD that says it can handle 80k random iops. I took those numbers with a grain of salt though.


My teamed iSCSI connections are utilizing MPIO.
 
Get rid of the teaming and do MPIO with links on 2 subnets. You can easily trace how the performance goes up when you try 1 link without teaming and then 2 links without teaming.
Also, what's the cache configuration? Is it write-through or write-back?
 
Write-through caching, with a 2.25GB RAM cache.
 
The IOPS performance of StarWind's vSAN is pretty low on low-end equipment. I tried it out, but unless you have 32GB+ of memory, it doesn't do well.

I had a coworker who set up the iSCSI we have here on Ubuntu Server, using targetcli to give direct access to the RAID set. It works beautifully.
 
Are you testing IO from the system that has the HDDs installed, or from a host over iSCSI? Test locally first, then remotely, to rule out network issues. You can also set up a RAM disk in StarWind to test your iSCSI setup and remove local disk issues from your testing.

Having used StarWind extensively in the past, it is more than capable of 10k+ IOPS with the right hardware (disregarding IO patterns and just benchmarking).
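For the "test locally first" step, something like this rough sketch gives a ballpark single-queue-depth 4K random-read number (the test file path is just an example; DiskSpd or fio are the proper tools):

```python
# Crude 4K random-read IOPS check against an existing file, to compare
# local vs. over-iSCSI numbers. No O_DIRECT here, so use a test file much
# larger than RAM or the page cache will inflate the result.
import os, random, time

PATH, BLOCK, SECONDS = "testfile.bin", 4096, 10   # example path and run time

blocks = os.path.getsize(PATH) // BLOCK

ops, end = 0, time.time() + SECONDS
with open(PATH, "rb", buffering=0) as f:          # unbuffered at the Python level
    while time.time() < end:
        f.seek(random.randrange(blocks) * BLOCK)  # jump to a random 4K block
        f.read(BLOCK)
        ops += 1

print(f"~{ops / SECONDS:.0f} random 4K read IOPS at queue depth 1")
```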
 
I'm using RAM cache in addition to an LSFS virtual drive on an SSD. Should I abandon the LSFS format and use a physical drive or a non-LSFS virtual drive for better performance? The server has 8GB, but for some reason it only allowed me to use 2.25GB for RAM cache; not sure if that was because of the LSFS format or not.
 
Try using flat images or a disk bridge device.

 
Blah...trying to set up MPIO, but for some reason I'm not able to get the iSCSI initiators to create connections to more than one IP on the iSCSI target computer. On an initiator, if I select a Discovered Target > Properties > Add session > Advanced and change the local adapter to Microsoft iSCSI Initiator, I can select an Initiator IP but only one of the Target Portal IPs. If I leave the local adapter at default, I can't select a separate Initiator IP, but both IPs are selectable under Target Portal IP. Basically, my VM hosts are allowed to communicate over multiple NICs to only a single storage NIC, instead of multiple NICs on the storage side.
 
As far as I know, StarWind does not recommend teaming, so breaking the team may improve performance. You may also try going with multiple iSCSI sessions instead of just one.
 
1. Get rid of that teaming.
2. Get rid of the complex setup you made and decompose it into smaller parts you can diagnose: remove LSFS and use image files; remove the WT cache and see how it works without it. If you still see bad performance, go to the StarWind forum and get support from their engineers. They have some really smart people on the forum.
3. Get one of their performance benchmarking guides and make sure you follow it. You may be benchmarking with your handbrake on.
 
It's your shitty drives. A 15K spinner is good for about 250 IOPS; a 7200rpm drive for about half of that, minus whatever you lose to compression if you have that enabled.

We're running a 72TB all-flash array with about 60TB of that being heavily accessed databases, and we barely see over 5,000 IOPS of usage. Even though the array is capable of something stupid like 250,000+ IOPS, we'll never see that.

This is your answer.

Pushing 5k+ IOPS sustained is no easy feat. My storage at work runs about 1,500 IOPS and usually never goes above 4,000. I've seen it push 7,000 at times, but mostly in bursts. Six EqualLogics: two all-flash, four with 7200rpm NL-SAS. I'm sure the theoretical numbers are crazy, but we'll never see that in production.
 
I'm running a 2TB iSCSI LUN in StarWind VSAN 8 over 10Gb Ethernet, mounted to a file on a 6x 4TB Ultrastar RAID 10 set on a Dell PERC H710 controller, specifically for my Acronis backups from my desktop. It backs up 210GB of OS and programs in less than 10 minutes. Testing it with CrystalDiskMark last night, I got about 850 IOPS out of that set and a transfer rate of about 500MB/s. That's with no dedup or compression, but with LSFS and RAM caching.

(I keep my data storage on an SMB share on the server rather than locally, and I wanted 10Gb Ethernet so I didn't have to sacrifice access speed to that data by removing the drives from my main machine. I also wanted access to that same data from other machines in my apartment, and I wanted to get rid of the USB drives I used for backups; but Acronis has issues backing up to an SMB share, so I set up StarWind VSAN 8 as my backup destination. That's how I ended up with this setup. My server is a Core i7 4930K with 32GB of low-latency DDR3 1600. My network includes a D-Link DGS-1510-28X with four 10Gb ports, and both my server and my main machine run QLogic QLE3242-SR NICs for 10Gb access. I got the QLogic NICs at cost through my old company because of their partnership with QLogic. The server runs VMs from the second port, and a second server runs from the fourth 10Gb port on the switch. So it gets used, that's for sure.)

Using only 1Gb Ethernet, older 1TB drives, and an older 3Gb SATA controller, feel lucky you get 800 IOPS.
 

IOPS are generally limited by rotational speed. It really doesn't matter if you have new 7200rpm drives and he has old 7200rpm drives; the I/O performance is going to be similar because the actual disk latencies are quite similar. (And the speeds involved are low enough that even SATA 1 and gigabit Ethernet are plenty fast.) You increase I/O by decreasing latency or by using more disks in parallel. I could probably get more random I/O out of an old Raptor than you can out of a new 7200rpm drive because of this. The only thing that has really changed is density, which means more data passes under the read/write head in a given amount of time, so sequential access, where all of that data is read, goes up significantly.

The biggest issue HDDs have is that when you need to read a random piece of data somewhere on the disk, the time required for the platter to rotate that data under the read/write head (plus the time for the head to actuate) hasn't really changed. Add to that, a lot of newer drives aren't truly 7,200 rpm anymore, because the number of platters required to make such a large drive makes it difficult to spin at a high rate of speed. If you do the simple math, 7,200 rotations per minute / 60 seconds = 120 rotations per second, so one full rotation takes about 8.3ms. So if you had two bits of data, one on one track and one right next to it on another track, the drive can read the first bit but then has to do nothing and wait up to 8.3ms for the platter to come around again to get the other bit. This is as true in 2017 as it was in 1997. If data is scattered about the drive, it will spend most of its time waiting for the read/write head to position itself rather than actually reading data. More exotic disks like 15K SAS attack this directly: at 2x the rotational speed, it takes half the time for the platter to go around once, and I/O basically doubles because the drive only has to wait half as long for that next bit of information as it did with a 7200rpm drive.
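Put the numbers from the post above into a couple of lines and the 15K-vs-7200 difference falls straight out:

```python
# Rotational latency: time for one full revolution and the average
# (half-revolution) wait, for 7200 vs 15,000 rpm.
for rpm in (7200, 15000):
    full_ms = 60_000 / rpm   # ms per complete revolution
    avg_ms = full_ms / 2     # on average the target sector is half a turn away
    print(f"{rpm:>6} rpm: {full_ms:.1f} ms per rev, ~{avg_ms:.1f} ms average wait")
# 7200 rpm -> 8.3 ms per rev; 15,000 rpm -> 4.0 ms, i.e. roughly half the wait
```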
 
Multiple read/write heads though, so it's not quite ~that~ bad :p But still not good. And data locality often means it doesn't have to go quite that far either. But you're right: we're literally dealing with physical hardware ~moving~ to pull data - that's gonna be slow.
 
Too lazy to look at the last config I mentioned on here, and I haven't touched the setup in weeks because I've been busy with work, but I am running a single SSD, I think a Crucial BX 480GB (not a great one). That should negate the latency issue, or at least a great deal of it, for my setup.
 