Fall 2015 Solid State Drive Technology Update @ [H]

FrgMstr

Fall 2015 Solid State Drive Technology Update - Since our last SSD update article, the last 7 months have seen no shortage of exciting announcements, and the enthusiast market has rapidly evolved in both positive and confusing ways. Let’s get up to speed on U.2, NVMe, 3D XPoint, M&A, and the rest of the buzzword soup that makes up this market.
 
So it seems that the real issue for anybody who wants to go faster than a couple of SATA drives in RAID is the PCIe bottleneck. One or two GPUs, whether for gaming or rendering, will starve these things for bandwidth.

Other than trying to force people onto -E or -EP chips, is there any real reason for Intel persisting with the low number of lanes? It's like the US cable companies shifting more things onto streaming without boosting speed or data caps.
 
Awesome write-up, just the right amount of detail and length to sum it all up. Thanks for the time and effort.
 
I'm hoping for a 950 Pro [H] test in the near future! The 512GB version certainly has my interest.
 
Great overview. I'm still cloudy on a few things with my ASUS Z170-Deluxe, but I'll worry about that if I actually want a PCIe SSD. For now I'm sticking with my 850 Pro.
 
So it seems that the real issue for anybody who wants to go faster than a couple of SATA drives in RAID is the PCIe bottleneck. One or two GPUs, whether for gaming or rendering, will starve these things for bandwidth.

Other than trying to force people onto -E or -EP chips, is there any real reason for Intel persisting with the low number of lanes? It's like the US cable companies shifting more things onto streaming without boosting speed or data caps.

Ideally the mainstream Core i processors should have at least 24 PCI-E lanes from the processor (16 for graphics and 8 for PCI-E based storage, or some combination thereof). I don't see that happening though as it would cut into sales of the E series chips.
 
Loving my Intel 750. Looking forward to what comes next, but I can honestly say the speeds I get today I will be happy with for the next several years.
 
Awesome article, [H]!

Thanks for putting in the time for this. \m/
 
I don't have any heat issues throttling my SM951 that I am aware of.
Would like to see a comparison with the SM951 and 950 Pro, both M.2.
 
Good article, it surely can be confusing. I just helped my friend build a comp for his friend; picked out all the parts and sent him a list... he deviated from the list and screwed it up bad LOL. They went to Fry's, which had a 5820K on sale for the same price as a 4790K, so he bought it, but kept the same mobo meant for the 4790K. Doh. Then went with a refurbed M.2 drive, which the motherboard didn't have a slot for. Bought a cheapo PSU because he wanted to keep his budget, which wouldn't have even booted with the 980 Ti and 5820K. Doh. They went back that night and stuck to the list, and it's a super nice machine.

Needless to say, SSD standards are confusing; this was a clear and concise article, well done.

Anyone see that article about NAND on a DRAM board? I think it was Intel who conjured it up, looked cool. I've always wondered when we would transition to something like that.
 
AnandTech reviewed the 950 Pro last week, and yeah, it's fast as hell! For normal consumer-type IO it is faster than the Intel 750, but when you really throw tons of IO at it, especially on a drive in 'steady state', the Intel 750 handles it much better, really showing its roots as a server-based SSD.
 
The 950 Pro reviews pretty much show that, for the consumer market, the performance gains are not that big of a deal IMHO, and the price/performance takes a big hit. I'd stick with SATA drives for now; you can get a 1TB drive for close to 30 cents per gig, not too shabby.
 
But back to the top of the heap. Can the Samsung 950 Pro or one of the other scrappy challengers unseat the mighty Intel SSD 750? Will developers finally give us a reason to care? Stay tuned, and brace for impact.

I don't really want the 950 Pro to beat out the 750; I want the 950 Pro to be more price competitive instead of being priced similarly.

I also think we have enough reasons to care about faster storage right now, but none of them are on the gaming side, or even the consumer side. I would love to see motherboards for data centers/VM labs that have tons of connectors for M.2/U.2 drives.

Great article, Chris/Kyle; it's always nice to get something that brings together all the technologies and compares them. It's real easy to forget about some of the other tech that's been released with all that's been going on.
 
Now that's what I call a refresh on current SSD tech.

Seriously appreciate the time spent on this bad boy. It'll save many of us so much trouble when we go past SATA-spec stuff.

Cheers!
 
AnandTech reviewed the 950 Pro last week, and yeah, it's fast as hell! For normal consumer-type IO it is faster than the Intel 750, but when you really throw tons of IO at it, especially on a drive in 'steady state', the Intel 750 handles it much better, really showing its roots as a server-based SSD.

Intel released new firmware for the 750 on Oct. 23, and I don't think AnandTech used it. It does speed up the drive.

https://downloadmirror.intel.com/18455/eng/Intel_SSD_Toolbox_3_3_2_Release_Notes_325993-021US.pdf
 
Yep, I agree. I've had mine since before they came across the pond, I hammer mine hard every day, and it has not throttled once! But I have a 540 Air, which IMO is a great case for cooling. Most people that complain about throttling are using crappy cases and multiple video cards. All my stuff is water cooled, so that helps too, I'm sure.
 
This new stuff is a bit confusing. Thanks for the chart it helps make sense of it.
 
Very informative and clear article and brings more excitement into the PC world. I too will stick with SATA until my next major build late 2016 or 2017.
 
So it seems that the real issue for anybody who wants to go faster than a couple of SATA drives in RAID is the PCIe bottleneck. One or two GPUs, whether for gaming or rendering, will starve these things for bandwidth.

My understanding of the Z170 chipset is that you get 16 lanes of PCI-E 3.0 dedicated to graphics that are connected directly to the CPU, and 20 lanes of PCI-E 3.0 from the chipset for non-graphics stuff like U.2/M.2 drives. Why would GPUs starve the drives for bandwidth?
 
Great writeup.

I had to do a double take on this one though, as I read it too fast and thought it said "The Fall of SSDs" :p
 
Two things I would bring up in a discussion like this, that were missing:


1.) With the ever-increasing lifespan of motherboards and CPUs due to vanishingly small performance increases generation over generation, there are going to be a lot of people who don't have M.2 or U.2 slots wanting to upgrade to PCIe SSDs. For these people, either discrete PCIe SSD expansion cards or M.2/U.2-to-PCIe adapters will be the solution. The issue of bootability will inevitably come up when doing this. Most M.2/U.2 drives will NOT feature boot ROMs, so if your motherboard BIOS doesn't come with NVMe drivers built in, booting from them will never work. Some PCIe expansion card models DO feature boot ROMs (like the Intel SSD 750) and thus will boot on most systems, making them a better choice for people on pre-M.2/U.2 motherboards.

Moving installed OS partitions from MBR/BIOS installs to a UEFI-only drive can be a nightmare though, but it is possible (I did it, and documented it here).

So, the booting issue is huge. Those of us on pre-M.2/pre-U.2 motherboards (which is a lot of us) will likely appreciate that as part of the discussion.




2.) Some discussion of Queue Depths, what they mean, and how they affect client workloads is probably also useful.

Many SSD reviews compare high queue depth (like QD=32) figures and point out that one drive is much faster than the other in this regard. Showing a contrast between cards like this can be great, but it completely ignores the fact that high queue depths don't reflect even an enthusiast's desktop load very well. These are mostly server/database/VM host metrics.

What really counts on the client side are the low queue depth figures. QD=1 and QD=2 are both very good metrics for the desktop/laptop. As soon as we start getting towards QD=4 the impact on client workloads becomes more limited. Anything above QD=4 should probably be completely disregarded by all desktop/laptop shoppers, unless they know they have very specific needs that will take advantage of the higher queue depths.

The problem here is that you can wind up spending lots of extra money on a fancy SSD that benches extremely high QD=32 numbers, when a cheaper one actually performs better at QD=1, QD=2 and partially QD=4, where it really counts :p

This was touched on in less specific terms in the article when discussing how NVMe and PCIe/M.2 drives mostly aren't benefiting clients much yet, but the level of detail above will likely be helpful to people when comparing benchmarks of drives.
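
Point 2 is easy to see for yourself. Below is a rough Python sketch, not how the review sites test (they use dedicated tools like Iometer or fio), that approximates queue depth with a thread pool doing synchronous 4 KiB random reads. The "testfile.bin" path is just a placeholder for a large file on the drive you want to poke at, and since this goes through the OS page cache rather than O_DIRECT, treat the output as illustrative only:

```python
import os
import random
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096     # 4 KiB random reads, the classic client-workload size
READS = 2000     # total reads per run; kept small for illustration

def read_at(path, offset):
    # One synchronous 4 KiB read; each worker thread keeps one read
    # outstanding, so the worker count roughly stands in for queue depth.
    with open(path, "rb") as f:
        f.seek(offset)
        return len(f.read(BLOCK))

def run(path, queue_depth):
    size = os.path.getsize(path)
    offsets = [random.randrange(0, max(size - BLOCK, 1)) for _ in range(READS)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=queue_depth) as pool:
        list(pool.map(lambda off: read_at(path, off), offsets))
    elapsed = time.perf_counter() - start
    print(f"QD={queue_depth:>2}: {READS / elapsed:,.0f} IOPS")

if __name__ == "__main__":
    target = "testfile.bin"  # placeholder: a large file sitting on the SSD under test
    for qd in (1, 2, 4, 32):
        run(target, qd)
```

On a typical desktop the QD=1/QD=2 runs are the ones that resemble what your applications actually do; the QD=32 run is the kind of number that looks great on a spec sheet.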
 
I've been trying to find an answer to this question:

Can you run AHCI and NVMe drives together?

I'm strongly considering a 950 Pro as a new boot drive. Common sense says of course I'd still be able to use all my SATA channels and such, but I remember having terrible issues with the IDE to AHCI switch.
 
I've been trying to find an answer to this question:

Can you run AHCI and NVMe drives together?

I'm strongly considering a 950 Pro as a new boot drive. Common sense says of course I'd still be able to use all my SATA channels and such, but I remember having terrible issues with the IDE to AHCI switch.


Yeah, no issues at all. They work completely independently of each other. At least on my motherboard. I wouldn't be surprised if somewhere out there there might be a motherboard with a BIOS that gets itself all confused, but that ought to be the exception, not the rule.
 
Zarathustra[H];1041948000 said:
Yeah, no issues at all. They work completely independently of each other. At least on my motherboard. I wouldn't be surprised if somewhere out there there might be a motherboard with a BIOS that gets itself all confused, but that ought to be the exception, not the rule.

Great! Thanks. I've been playing around with my Z170X-Gaming 7, but without one of the M.2 slots populated I can't see if I've got the option to toggle protocols on a per-disk/per-interface level vs. the whole subsystem.
 
Great! Thanks. I've been playing around with my Z170X-Gaming 7, but without one of the M.2 slots populated I can't see if I've got the option to toggle protocols on a per-disk/per-interface level vs. the whole subsystem.

Well, if your motherboard works like mine does with a PCIe Intel 750, it won't show up in the list of drives at all.

The only place you will see it in BIOS is in the UEFI boot menu.

There should be no option to select AHCI/RAID or NVMe mode, as NVMe SSDs tend to be NVMe or nothing.

For what it's worth, I had my Samsung 850 Pro hooked up to onboard SATA, copying data over to my Intel 750 in a PCIe slot running on NVMe just fine.
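
For anyone who wants to sanity-check that coexistence on a Linux box, here's a trivial sketch (Linux-only /sys paths, and note that sd* also covers USB and other SCSI-layer disks) that lists which block devices sit behind the NVMe driver versus the AHCI/SCSI stack:

```python
import os

# Quick look at which block devices are NVMe vs. SATA/AHCI-attached on Linux.
# NVMe namespaces show up as nvme0n1, nvme1n1, ...; AHCI (and other SCSI-layer)
# disks show up as sda, sdb, ... - both kinds coexist as ordinary block devices.
for dev in sorted(os.listdir("/sys/block")):
    if dev.startswith("nvme"):
        print(f"{dev}: NVMe")
    elif dev.startswith("sd"):
        print(f"{dev}: SCSI layer (onboard SATA via AHCI, or SAS/USB)")
```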
 
My understanding of the Z170 chipset is that you get 16 lanes of PCI-E 3.0 dedicated to graphics that are connected directly to the CPU, and 20 lanes of PCI-E 3.0 from the chipset for non-graphics stuff like U.2/M.2 drives. Why would GPUs starve the drives for bandwidth?

The problem with the Z170 chipset is that the DMI 3.0 link has bandwidth equivalent to just four PCIe 3.0 lanes. All of your SATA, network, USB, etc. and NVMe traffic has to go through this bottleneck.
The X99 chipset doesn't have this bottleneck.
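
To put rough numbers on that ceiling, here's a quick back-of-the-envelope sketch in Python. It only accounts for the 128b/130b line coding, not packet or protocol overhead, so real-world throughput lands a bit lower:

```python
def pcie_gb_per_s(lanes, gt_per_s=8.0, encoding=128 / 130):
    """Usable bandwidth in GB/s for a PCIe 3.0-style link.

    gt_per_s: raw transfer rate per lane (8 GT/s for Gen3, which DMI 3.0 matches).
    encoding: 128b/130b line coding leaves 128 payload bits per 130 bits on the wire.
    Packet/protocol overhead is ignored, so real numbers come in a bit lower.
    """
    raw_gbit = lanes * gt_per_s        # raw line rate in Gbit/s
    return raw_gbit * encoding / 8     # payload bandwidth in GB/s

print(f"DMI 3.0 / PCIe 3.0 x4 : {pcie_gb_per_s(4):.2f} GB/s")   # ~3.94 GB/s
print(f"PCIe 3.0 x16          : {pcie_gb_per_s(16):.2f} GB/s")  # ~15.75 GB/s
```

Everything hanging off the Z170 PCH shares that ~3.9 GB/s figure, which is why a single fast NVMe drive can already get close to saturating the link.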
 
The problem with the Z170 chipset is that the DMI 3.0 link has bandwidth equivalent to just four PCIe 3.0 lanes. All of your SATA, network, USB, etc. and NVMe traffic has to go through this bottleneck.
The X99 chipset doesn't have this bottleneck.

True,

But I feel like these things will rarely (if ever) all be going full bore at the same time.

I mean, if you look at bus usage on a video card, it is rarely over single digits in percentage utilization on my system, and that is with two GPUs in PCIe 3.0 x8.

It's nowhere near as good as having 40+ PCIe lanes, but I think in practice the bandwidth starvation/collision won't be as bad as you suggest, and probably better than non-E Sandy/Ivy/Haswell before it.

No way of knowing without testing though :p

IMHO, I would love to have dynamic PCIe lanes for everything.

If I could pool my 40 PCIe lanes over my 7 slots, with the device currently using them getting the lanes assigned to it, that would be fantastic, as I have so many devices in my slots that are not used at the same time and just sit there occupying lanes for no reason when not used.

Currently it looks something like this:

980ti (8x Gen3)
Intel SSD 750 PCIe (4x Gen3)
Sound Blaster X-Fi Titanium (1x Gen1)
980ti (8x Gen3)
(slot covered by second GPU)
Brocade 10gig ethernet, (8x Gen2) (for dedicated line to my NAS server.)

Pooling the 40 Gen3 lanes' worth of bandwidth and letting each device use it when needed, rather than statically assigning it to each device, would be a thing of beauty, as long as that doesn't result in significantly added PCIe latency.
 
MRFSYS:

The video output connectors on the I/O panel are connected directly to the CPU; this isn't a PCIe path, it is purely output.

The PCIe slots are connected directly to the CPU using the PCIe lanes. Your CPU is the one that is only physically capable of handling 16 lanes at once. It has a predetermined amount it will allocate to each slot depending on use.

All other PCIe lanes that your board uses (typically for storage) are connected to the PCH chipset now; that chipset is connected directly to the CPU via the DMI link (a dedicated path).
 
The block diagram is easier to read in HardOCP's article about Z170. When that box's outline is white instead of dark blue, it's easier to see the word OR between the three choices: "1x16 lanes" OR "2x8 lanes" OR "1x8 and 2x4 lanes".
 
Zarathustra[H];1041948065 said:
Pooling the 40 Gen3 lanes' worth of bandwidth and letting each device use it when needed, rather than statically assigning it to each device, would be a thing of beauty, as long as that doesn't result in significantly added PCIe latency.

Almost all of the latency is in just going over PCIe in the first place. The latency addition of PCIe switch chips is basically immaterial: PCIe latency is several to tens of µs, while PCIe switch latency is in the tens of ns.

The main problem is with the ridiculous price increases of PCIe switch chips since PLX got bought out. Hopefully others that are entering the PCIe switch biz will bring pricing back to reality. If nothing else there is going to be increased competition in Yx4 switches.
 
https://en.wikipedia.org/wiki/Direct_Media_Interface

> DMI 3.0, released in August 2015, allows the 8 GT/s transfer rate per lane, for a total of four lanes and 3.93 GB/s for the CPU–PCH link.

Yes, I calculate the same result: 4 lanes @ 8 GT/s = 32 Gb/s / 8.125 bits per byte = 3.93 GB/s
(PCIe 3.0's 128b/130b encoding means 130 bits per 16 bytes of payload = 8.125 bits per byte)

So, if I understand this correctly, an NVMe RAID controller with an x16 edge connector
will presently face an upstream bottleneck at the 4 PCIe 3.0 lanes assigned to the DMI link
on the Z170 chipset.

Same will be true of NVMe RAID controllers with an x8 edge connector.

Sounds to me as if the DMI link should have x16 lanes instead of x4
to avoid another glass ceiling on high-performance storage subsystems.

MRFS

The DMI link is the choke point, but again it's a single link and is being arbitrarily choked. If you have a need for greater bandwidth use a solution that uses the PCIe lanes.

What you are seeing with the PCIe configuration is the different allocations that are supported on your motherboard. It depends solely on what slots you have populated. For example, if you have a graphics card installed in the x16 slot and nothing else, the CPU will give it 16 dedicated lanes. If you have two graphics cards installed, then your CPU will split it to x8/x8. If you have a graphics card installed and a NIC in the x4 spot, you are getting x8 (graphics) and x4 (NIC) instead, with x4 for whatever else.
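
A toy way to picture that CPU-side bifurcation (the "1x16" OR "2x8" OR "1x8 and 2x4" options from the Z170 block diagram mentioned earlier; real boards pick the split in firmware based on slot detection, so this is purely illustrative):

```python
def skylake_cpu_lane_split(populated_slots: int):
    """Toy model of the Skylake CPU's PCIe 3.0 bifurcation options.

    The 16 CPU lanes can only be carved up as 1x16, 2x8, or 1x8 + 2x4;
    firmware picks the split based on which CPU-attached slots are populated.
    """
    options = {1: [16], 2: [8, 8], 3: [8, 4, 4]}
    return options.get(populated_slots)  # None for configurations the CPU can't do

for n in (1, 2, 3):
    print(f"{n} CPU slot(s) populated -> lanes per slot: {skylake_cpu_lane_split(n)}")
```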
 
I think this is a typo in the chipset article above:

> "In other words after overhead is accounted for[,] you are going to see an actual limit of 40GB/s of bandwidth across the DMI bus."

Should be 4.0 GB/s of bandwidth across the DMI bus.

The exact number is 3.938 GB/s: 4 lanes @ 8 GT/s = 32 Gb/s / 8.125 bits per byte = 3.938 GB/s,
because PCIe 3.0 uses 128b/130b encoding, i.e. 130 bits / 16 bytes of payload = 8.125 bits per byte.

MRFS

The article writer got sloppy and wrote GB/sec when they should have written Gb/sec. GB = gigabytes, Gb = gigabits.
 
> If you have a need for greater bandwidth use a solution that uses the PCIe lanes

In this block diagram of the Z170 chipset, follow the flow:

http://www.hardocp.com/article/2015/08/12/intel_z170_chipset_summary#.Vjn_rW4uaxY

... from CPU, through DMI 3.0, to the Z170 Chipset, to "Up to 20 PCI Express 3.0".

Conclusions:

EVEN IF we were to design and build an NVMe RAID controller
with an x16 edge connector compatible with PCIe 3.0,
such an advanced controller must communicate to the CPU
via the chipset's DMI 3.0 link.

(The lanes issuing directly from the CPU are output ONLY.)

The DMI 3.0 link uses four lanes @ 8 GT/s = 32 Gb/s / 8.125 bits per byte
= 3.938 GB/sec MAX HEADROOM.

Our desired controller needs 16 lanes @ 8 GT/s = 128 Gb/s / 8.125 bits per byte
= 15.75 GB/sec MAX HEADROOM.

Thus, the DMI "chokepoint" cuts raw bandwidth by a factor of 4X.

Try doing the same comparison with a SAS RAID controller
and x8 edge connector compatible with PCIe 3.0 and
a clock rate over data cables of 12 Gb/s.

p.s. I looked around yesterday, just briefly, and did NOT
find any Intel server chipsets that are any different:

the latest ones that I found all use a DMI 3.0 link.

MRFS

Some peripherals on the motherboard connect to the PCH chip using PCIe lanes, and the PCH is linked to the CPU via the DMI link. Some motherboards connect x1 and x4 PCIe connectors, and sometimes the M.2 connector, to the PCH. The x16 slots should never be connected to the PCH unless you are looking at a Z170 motherboard with more than two x16 connectors. Intel got super confusing with their documentation, and had most people thinking the Z170/1151 platform had a total of 20 PCIe lanes, which wasn't true.

For those MBs that don't connect the M.2 or certain PCIe slots to the PCH, you won't be limited by the DMI link.

...

(The lanes issuing directly from the CPU are output ONLY.)


MRFS

hmmm...
 
Good article.

I'm struggling with a bit of dissonance on my next upgrade. It's mostly storage centric and I'm torn. Looking to get an NVMe drive for the OS and a 2TB 850 EVO for my D drive (Steam library and other stuff).

I 'know' that I won't see any real tangible performance improvements in most stuff (media creation scratch space aside) from having the NVMe drive in there, but I just want it. It's very conflicting when I have a wedding to pay for and shouldn't be spending money on things that aren't tangible.
 