Dell Quad M.2 PCIe card.

So, has anyone used the Squid?

1) The fan is there to cool the IC that allows you to use 4x NVMe M.2's.

2) You shouldn't need a motherboard/chipset to perform bifurcation since that IC will already take care of that.

==================

That one from Amazon only supports SATA M.2's, not NVMe M.2's.
 
I found more from other brands that look better, with no fan and no noise:

2 x SSD:

https://www.synology.com/en-us/products/M2D17#features

4 x SSD:

https://www.amazon.com/SEDNA-Hyoper...58&sr=8-1-spons&keywords=sedna+pcie+ssd&psc=1

It looks like the Synology is the better build, w/ some sort of IC w/ a heat sink to control the 2x SSDs,

whereas Sedna isn't saying much; there is no IC of any kind on their PCB.

The price is the same:

https://www.newegg.com/Product/Prod...3&cm_re=synology_m2d17-_-40-990-003-_-Product

2 of the Synology = 1 x Sedna

but 2 of Synology = price of the Dell

https://www.amazon.com/Dell-Ultra-S...rd_wg=o3dHw&psc=1&refRID=8GQPQ0A1A21EEEXG9GC8

I wonder if the Dell's fan is noisy or silent.
 

If you want NVMe, go w/ the Asus, only $89, cheaper than the Dell:


https://www.amazon.com/Hyper-M-2-x1...rd_wg=lT058&psc=1&refRID=V8R0MEKTPQKDQDMDBNMY

Then there is Aplicata, but I've never heard of that brand:

https://www.amazon.com/Aplicata-Qua...rd_wg=lT058&psc=1&refRID=V8R0MEKTPQKDQDMDBNMY
 
No idea on Squid.
On the Dell card the fan does cool the SSDs; I've never heard it myself, but I know it is running. No idea what the fan on the Amfeltec card is for.
With the Dell card your motherboard must have a PLX chip for it to work (bifurcation). The Amfeltec card has the PLX chip on the card.
On the Dell card there is no hardware raid capability. You can software raid the individual SSD's or configure them as independent drives.
The Dell card and the Amfeltec cards are for PCIe NVMe SSDs. That Sedna card is for SATA SSDs. Totally different ballgames and technologies.

Thing is, with the SATA ones there is no PLX or bifurcation. Those are just using some flavor of SATA controller chip.

If you want to use NVMe SSD's then either the card or your motherboard must have a PLX chip to handle the PCIe bifurcation.
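
If you are not sure what a given card is actually doing, a quick sanity check on Linux (just a sketch; the exact device strings vary by vendor, but the switch chips usually show up as PLX/PEX) is to look at the PCIe topology:

# look for a PLX/PEX switch between the root port and the drives
lspci | grep -i -e plx -e pex

# or eyeball the whole PCIe tree and see where the NVMe drives hang
lspci -tv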

Don't really know anything about that Asus card. A quick look shows it's somewhat specialized and only for a few select boards with some sort of Intel RAID capability built in.


Highpoint does have one out now with an onboard PLX chip and raid support for NVME drives. http://www.highpoint-tech.com/USA_new/CS-product_nvme.htm

Not sure what else there is though.
 
Well, I bought this beast as well, but due to a limited budget I started with one Samsung 960 EVO M.2 1TB stick ... for now. Here are my results with this one drive:

Note I am on a Gentoo Linux system with dual Xeons (32 CPU threads total, 128GB RAM).

# first I write some dummy file for benchmarking
andromeda /opt # dd if=/dev/zero of=tempfile bs=1M count=1024 conv=fdatasync,notrunc status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.14752 s, 936 MB/s

# I clean up system buffer
andromeda /opt # echo 3 > /proc/sys/vm/drop_caches

# I try to read with empty buffer in default operation
andromeda /opt # dd if=tempfile of=/dev/null bs=1M count=1024 status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.384912 s, 2.8 GB/s

# now the same, but without cleaning the system buffer
andromeda /opt # dd if=tempfile of=/dev/null bs=1M count=1024 status=progress
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.161535 s, 6.6 GB/s

So without cleaning the buffer I get 6.6 GB/s; here I started thinking about RAID 0 over 4x M.2 :)))
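
Worth noting: that 6.6 GB/s is the page cache talking, not the drive. A minimal sketch to read the same test file while bypassing the cache (iflag=direct uses O_DIRECT, which ext4/xfs support):

# read the test file with O_DIRECT so the page cache is out of the picture
dd if=tempfile of=/dev/null bs=1M count=1024 iflag=direct status=progress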

So I tried some other tests available on Gentoo or Linux in general:

andromeda ~ # hdparm -Tt /dev/nvme0n1

/dev/nvme0n1:
Timing cached reads: 23776 MB in 2.00 seconds = 11899.49 MB/sec
Timing buffered disk reads: 7650 MB in 3.00 seconds = 2549.95 MB/sec

The method above is independent of partition alignment.

I decided to do some tests for sync/async access using the fio utility, this time not on a 1GB file but on a 4GB one.

# SYNC IO RANDOM ACCESS - CONFIG

andromeda /opt # cat random-read-test.fio
; random read of 4096mb of data

[random-read]
rw=randread
size=4096m
directory=/opt/fio-test

# SYNC IO RANDOM ACCESS - RESULT
andromeda /opt # fio random-read-test.fio
random-read: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
fio-2.15
Starting 1 process
random-read: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [r(1)] [100.0% done] [64824KB/0KB/0KB /s] [16.3K/0/0 iops] [eta 00m:00s]
random-read: (groupid=0, jobs=1): err= 0: pid=6125: Wed Sep 20 18:08:38 2017
read : io=4096.0MB, bw=64992KB/s, iops=16247, runt= 64536msec
clat (usec): min=13, max=2288, avg=60.52, stdev= 5.64
lat (usec): min=13, max=2288, avg=60.62, stdev= 5.64
clat percentiles (usec):
| 1.00th=[ 57], 5.00th=[ 58], 10.00th=[ 58], 20.00th=[ 59],
| 30.00th=[ 59], 40.00th=[ 59], 50.00th=[ 60], 60.00th=[ 60],
| 70.00th=[ 60], 80.00th=[ 61], 90.00th=[ 62], 95.00th=[ 67],
| 99.00th=[ 91], 99.50th=[ 92], 99.90th=[ 95], 99.95th=[ 98],
| 99.99th=[ 110]
lat (usec) : 20=0.01%, 50=0.01%, 100=99.96%, 250=0.04%, 750=0.01%
lat (msec) : 2=0.01%, 4=0.01%
cpu : usr=4.58%, sys=6.32%, ctx=1048635, majf=0, minf=17
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
READ: io=4096.0MB, aggrb=64991KB/s, minb=64991KB/s, maxb=64991KB/s, mint=64536msec, maxt=64536msec

Disk stats (read/write):
nvme0n1: ios=1045941/17, merge=0/34, ticks=57687/3, in_queue=57616, util=89.58%

# ASYNC IO RANDOM ACCESS - CONFIG
andromeda /opt # cat random-read-test-aio.fio
[random-read]
rw=randread
size=4096m
directory=/opt/fio-test
ioengine=libaio
iodepth=8
direct=1
invalidate=1
andromeda /opt #

# ASYNC AIO RANDOM ACCESS - RESULT
andromeda /opt # fio random-read-test-aio.fio
random-read: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=8
fio-2.15
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [338.2MB/0KB/0KB /s] [86.6K/0/0 iops] [eta 00m:00s]
random-read: (groupid=0, jobs=1): err= 0: pid=11209: Wed Sep 20 18:17:49 2017
read : io=4096.0MB, bw=329120KB/s, iops=82279, runt= 12744msec
slat (usec): min=2, max=93, avg= 3.17, stdev= 1.73
clat (usec): min=28, max=23455, avg=87.64, stdev=80.48
lat (usec): min=31, max=23458, avg=90.94, stdev=80.50
clat percentiles (usec):
| 1.00th=[ 57], 5.00th=[ 71], 10.00th=[ 73], 20.00th=[ 74],
| 30.00th=[ 76], 40.00th=[ 78], 50.00th=[ 82], 60.00th=[ 88],
| 70.00th=[ 91], 80.00th=[ 95], 90.00th=[ 108], 95.00th=[ 124],
| 99.00th=[ 155], 99.50th=[ 169], 99.90th=[ 209], 99.95th=[ 243],
| 99.99th=[ 2448]
lat (usec) : 50=0.01%, 100=85.59%, 250=14.36%, 500=0.02%, 750=0.01%
lat (usec) : 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
cpu : usr=17.83%, sys=36.17%, ctx=507915, majf=0, minf=58
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
READ: io=4096.0MB, aggrb=329119KB/s, minb=329119KB/s, maxb=329119KB/s, mint=12744msec, maxt=12744msec

Disk stats (read/write):
nvme0n1: ios=1040194/0, merge=0/0, ticks=84617/0, in_queue=84553, util=93.80%

=======================================

In the end, as my machine has 32 CPU threads (16 physical cores), I tried to simulate a more complex scenario:
4x mmap query engines
1x updating thread - to simulate file system journaling
2x background updaters - reading and writing at the same time, with a 20 or 40 ms pause period to simulate some kind of data-processing step; the data size is 32MB for one and 64MB for the other.

# SPECIAL TEST - CONFIG
andromeda /opt # cat seven-threads-randio.fio
; seven threads: one background writer, four query engines, two background updaters.

[global]
rw=randread
size=4096m
directory=/opt/fio-test
ioengine=libaio
iodepth=4
invalidate=1
direct=1

[bgwriter]
rw=randwrite
iodepth=32

[queryA]
iodepth=1
ioengine=mmap
direct=0
thinktime=3

[queryB]
iodepth=1
ioengine=mmap
direct=0
thinktime=5

[queryC]
iodepth=1
ioengine=mmap
direct=0
thinktime=4

[queryD]
iodepth=1
ioengine=mmap
direct=0
thinktime=2

[bgupdaterA]
rw=randrw
iodepth=16
thinktime=20
size=32m

[bgupdaterB]
rw=randrw
iodepth=16
thinktime=40
size=64m

# SPECIAL TEST - RESULT
andromeda /opt # fio seven-threads-randio.fio
bgwriter: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
queryA: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=mmap, iodepth=1
queryB: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=mmap, iodepth=1
queryC: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=mmap, iodepth=1
queryD: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=mmap, iodepth=1
bgupdaterA: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=16
bgupdaterB: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=16
fio-2.15
Starting 7 processes
queryC: Laying out IO file(s) (1 file(s) / 4096MB)
queryD: Laying out IO file(s) (1 file(s) / 4096MB)
bgupdaterA: Laying out IO file(s) (1 file(s) / 32MB)
bgupdaterB: Laying out IO file(s) (1 file(s) / 64MB)
Jobs: 1 (f=1): [_(2),r(1),_(4)] [100.0% done] [35323KB/0KB/0KB /s] [8830/0/0 iops] [eta 00m:00s]
bgwriter: (groupid=0, jobs=1): err= 0: pid=11772: Wed Sep 20 18:34:23 2017
write: io=4096.0MB, bw=669910KB/s, iops=167477, runt= 6261msec
slat (usec): min=2, max=63, avg= 4.69, stdev= 2.18
clat (usec): min=18, max=6017, avg=185.43, stdev=35.27
lat (usec): min=23, max=6020, avg=190.23, stdev=35.38
clat percentiles (usec):
| 1.00th=[ 171], 5.00th=[ 171], 10.00th=[ 173], 20.00th=[ 175],
| 30.00th=[ 177], 40.00th=[ 177], 50.00th=[ 181], 60.00th=[ 185],
| 70.00th=[ 191], 80.00th=[ 197], 90.00th=[ 205], 95.00th=[ 213],
| 99.00th=[ 231], 99.50th=[ 239], 99.90th=[ 278], 99.95th=[ 342],
| 99.99th=[ 390]
lat (usec) : 20=0.01%, 50=0.01%, 100=0.01%, 250=99.79%, 500=0.20%
lat (msec) : 10=0.01%
cpu : usr=21.49%, sys=78.40%, ctx=7, majf=0, minf=12
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=1048576/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=32
queryA: (groupid=0, jobs=1): err= 0: pid=11773: Wed Sep 20 18:34:23 2017
read : io=4096.0MB, bw=41277KB/s, iops=10319, runt=101613msec
clat (usec): min=66, max=5590, avg=92.93, stdev=84.01
lat (usec): min=66, max=5591, avg=92.98, stdev=84.01
clat percentiles (usec):
| 1.00th=[ 72], 5.00th=[ 79], 10.00th=[ 79], 20.00th=[ 80],
| 30.00th=[ 81], 40.00th=[ 82], 50.00th=[ 84], 60.00th=[ 89],
| 70.00th=[ 95], 80.00th=[ 97], 90.00th=[ 100], 95.00th=[ 106],
| 99.00th=[ 143], 99.50th=[ 197], 99.90th=[ 1848], 99.95th=[ 2224],
| 99.99th=[ 2576]
lat (usec) : 100=89.32%, 250=10.25%, 500=0.17%, 750=0.06%, 1000=0.03%
lat (msec) : 2=0.11%, 4=0.08%, 10=0.01%
cpu : usr=5.49%, sys=8.59%, ctx=1048668, majf=1048576, minf=94
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
queryB: (groupid=0, jobs=1): err= 0: pid=11774: Wed Sep 20 18:34:23 2017
read : io=4096.0MB, bw=40250KB/s, iops=10062, runt=104207msec
clat (usec): min=17, max=5694, avg=93.18, stdev=84.29
lat (usec): min=17, max=5694, avg=93.21, stdev=84.29
clat percentiles (usec):
| 1.00th=[ 73], 5.00th=[ 78], 10.00th=[ 79], 20.00th=[ 80],
| 30.00th=[ 81], 40.00th=[ 82], 50.00th=[ 85], 60.00th=[ 90],
| 70.00th=[ 95], 80.00th=[ 97], 90.00th=[ 101], 95.00th=[ 106],
| 99.00th=[ 141], 99.50th=[ 189], 99.90th=[ 1800], 99.95th=[ 2224],
| 99.99th=[ 2608]
lat (usec) : 20=0.01%, 50=0.01%, 100=87.76%, 250=11.82%, 500=0.16%
lat (usec) : 750=0.05%, 1000=0.03%
lat (msec) : 2=0.11%, 4=0.08%, 10=0.01%
cpu : usr=7.93%, sys=8.39%, ctx=1048689, majf=1048576, minf=62
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
queryC: (groupid=0, jobs=1): err= 0: pid=11775: Wed Sep 20 18:34:23 2017
read : io=4096.0MB, bw=47849KB/s, iops=11962, runt= 87658msec
clat (usec): min=57, max=6160, avg=78.94, stdev=85.48
lat (usec): min=57, max=6160, avg=78.98, stdev=85.49
clat percentiles (usec):
| 1.00th=[ 60], 5.00th=[ 61], 10.00th=[ 62], 20.00th=[ 62],
| 30.00th=[ 63], 40.00th=[ 64], 50.00th=[ 67], 60.00th=[ 78],
| 70.00th=[ 81], 80.00th=[ 85], 90.00th=[ 96], 95.00th=[ 101],
| 99.00th=[ 135], 99.50th=[ 213], 99.90th=[ 1816], 99.95th=[ 2224],
| 99.99th=[ 2576]
lat (usec) : 100=94.35%, 250=5.20%, 500=0.18%, 750=0.05%, 1000=0.03%
lat (msec) : 2=0.11%, 4=0.08%, 10=0.01%
cpu : usr=7.62%, sys=9.23%, ctx=1048640, majf=1048576, minf=48
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
queryD: (groupid=0, jobs=1): err= 0: pid=11776: Wed Sep 20 18:34:23 2017
read : io=4096.0MB, bw=54710KB/s, iops=13677, runt= 76664msec
clat (usec): min=57, max=6988, avg=70.45, stdev=86.49
lat (usec): min=57, max=6988, avg=70.48, stdev=86.49
clat percentiles (usec):
| 1.00th=[ 60], 5.00th=[ 61], 10.00th=[ 61], 20.00th=[ 62],
| 30.00th=[ 62], 40.00th=[ 63], 50.00th=[ 63], 60.00th=[ 64],
| 70.00th=[ 65], 80.00th=[ 66], 90.00th=[ 71], 95.00th=[ 83],
| 99.00th=[ 124], 99.50th=[ 213], 99.90th=[ 1848], 99.95th=[ 2224],
| 99.99th=[ 2544]
lat (usec) : 100=97.17%, 250=2.38%, 500=0.18%, 750=0.05%, 1000=0.03%
lat (msec) : 2=0.11%, 4=0.08%, 10=0.01%
cpu : usr=5.58%, sys=10.84%, ctx=1048637, majf=1048576, minf=156
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=1048576/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
bgupdaterA: (groupid=0, jobs=1): err= 0: pid=11777: Wed Sep 20 18:34:23 2017
read : io=16824KB, bw=17955KB/s, iops=4488, runt= 937msec
slat (usec): min=3, max=35, avg= 4.58, stdev= 2.26
clat (usec): min=52, max=3446, avg=160.21, stdev=290.90
lat (usec): min=58, max=3450, avg=165.03, stdev=290.86
clat percentiles (usec):
| 1.00th=[ 55], 5.00th=[ 56], 10.00th=[ 57], 20.00th=[ 58],
| 30.00th=[ 59], 40.00th=[ 62], 50.00th=[ 71], 60.00th=[ 89],
| 70.00th=[ 119], 80.00th=[ 179], 90.00th=[ 310], 95.00th=[ 402],
| 99.00th=[ 1928], 99.50th=[ 2288], 99.90th=[ 3024], 99.95th=[ 3248],
| 99.99th=[ 3440]
write: io=15944KB, bw=17016KB/s, iops=4254, runt= 937msec
slat (usec): min=3, max=48, avg= 5.29, stdev= 2.64
clat (usec): min=4, max=102, avg=13.30, stdev= 4.47
lat (usec): min=15, max=110, avg=18.76, stdev= 5.32
clat percentiles (usec):
| 1.00th=[ 9], 5.00th=[ 11], 10.00th=[ 12], 20.00th=[ 12],
| 30.00th=[ 12], 40.00th=[ 12], 50.00th=[ 13], 60.00th=[ 13],
| 70.00th=[ 13], 80.00th=[ 14], 90.00th=[ 14], 95.00th=[ 16],
| 99.00th=[ 37], 99.50th=[ 41], 99.90th=[ 81], 99.95th=[ 97],
| 99.99th=[ 102]
lat (usec) : 10=0.65%, 20=46.45%, 50=1.48%, 100=32.58%, 250=11.44%
lat (usec) : 500=5.52%, 750=0.65%, 1000=0.16%
lat (msec) : 2=0.63%, 4=0.45%
cpu : usr=18.80%, sys=6.41%, ctx=8182, majf=0, minf=8
IO depths : 1=99.8%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=4206/w=3986/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=16
bgupdaterB: (groupid=0, jobs=1): err= 0: pid=11778: Wed Sep 20 18:34:23 2017
read : io=32684KB, bw=7351.4KB/s, iops=1837, runt= 4446msec
slat (usec): min=2, max=35, avg= 4.15, stdev= 2.40
clat (usec): min=40, max=5188, avg=440.41, stdev=680.03
lat (usec): min=58, max=5191, avg=444.85, stdev=679.99
clat percentiles (usec):
| 1.00th=[ 55], 5.00th=[ 57], 10.00th=[ 57], 20.00th=[ 59],
| 30.00th=[ 61], 40.00th=[ 71], 50.00th=[ 102], 60.00th=[ 161],
| 70.00th=[ 294], 80.00th=[ 628], 90.00th=[ 1672], 95.00th=[ 2160],
| 99.00th=[ 2512], 99.50th=[ 2640], 99.90th=[ 4384], 99.95th=[ 5088],
| 99.99th=[ 5216]
write: io=32852KB, bw=7389.2KB/s, iops=1847, runt= 4446msec
slat (usec): min=3, max=32, avg= 4.94, stdev= 2.33
clat (usec): min=0, max=109, avg=13.04, stdev= 4.17
lat (usec): min=14, max=116, avg=18.08, stdev= 4.82
clat percentiles (usec):
| 1.00th=[ 10], 5.00th=[ 11], 10.00th=[ 11], 20.00th=[ 12],
| 30.00th=[ 12], 40.00th=[ 12], 50.00th=[ 13], 60.00th=[ 13],
| 70.00th=[ 13], 80.00th=[ 13], 90.00th=[ 14], 95.00th=[ 15],
| 99.00th=[ 23], 99.50th=[ 35], 99.90th=[ 87], 99.95th=[ 101],
| 99.99th=[ 109]
lat (usec) : 2=0.01%, 10=0.47%, 20=48.25%, 50=1.32%, 100=24.66%
lat (usec) : 250=8.94%, 500=5.20%, 750=1.76%, 1000=1.34%
lat (msec) : 2=4.72%, 4=3.27%, 10=0.06%
cpu : usr=15.43%, sys=2.25%, ctx=16378, majf=0, minf=9
IO depths : 1=99.9%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=8171/w=8213/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
READ: io=16432MB, aggrb=161474KB/s, minb=7351KB/s, maxb=54710KB/s, mint=937msec, maxt=104207msec
WRITE: io=4143.7MB, aggrb=677703KB/s, minb=7389KB/s, maxb=669909KB/s, mint=937msec, maxt=6261msec

Disk stats (read/write):
nvme0n1: ios=4205750/1060793, merge=0/64, ticks=313136/11710, in_queue=325124, util=99.10%

Clearly Async IO is pumping up the performance.

Now, sorry guys for spamming all this data here, but I would like to see these tests executed on a 4x M.2 software RAID.

We have theoretical throughput as per the PCIe specification:
single direction - 16 GB/s
bidirectional - 32 GB/s - could be reached in a mixed load
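
Those figures line up with a PCIe 3.0 x16 link if you do the rough math (assuming the card sits in a full x16 slot):

PCIe 3.0: 8 GT/s per lane with 128b/130b encoding -> ~0.985 GB/s per lane per direction
x16 link: 16 x 0.985 -> ~15.8 GB/s each way, ~31.5 GB/s with both directions saturated
single M.2 drive (x4): 4 x 0.985 -> ~3.9 GB/s ceiling per drive, before protocol overhead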

Now, the original testing by this thread owner showed much lower numbers.

FUTURE PLANS
Once I have 2 or more of these drives (Christmas is coming :)) I will tune RAID 0 for max throughput and rerun this kind of test with the same parameters, so the only thing that changes is the additional storage.
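
For reference, the array itself would be something like this with mdadm (just a sketch, assuming the four drives enumerate as nvme0n1..nvme3n1 and using xfs only as an example; chunk size is one of the knobs to tune):

# create a 4-drive RAID 0 array (chunk size in KiB, worth experimenting with for large sequential IO)
mdadm --create /dev/md0 --level=0 --raid-devices=4 --chunk=512 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

# filesystem + the mount point used for the fio tests above
mkfs.xfs /dev/md0
mount /dev/md0 /opt/fio-test

# persist the array definition so it assembles at boot
mdadm --detail --scan >> /etc/mdadm.conf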

On Linux I think one of the major players will be the kernel I/O scheduler (see the NVMe note after these loops):
## for Deadline
for drive in {a..p}; do echo deadline > /sys/block/sd${drive}/queue/scheduler; done

## For CFQ
for drive in {a..p}; do echo cfq > /sys/block/sd${drive}/queue/scheduler; done

## For Noop
for drive in {a..p}; do echo noop > /sys/block/sd${drive}/queue/scheduler; done
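
One caveat with those loops: they hit /sys/block/sdX, i.e. SATA devices. NVMe drives enumerate as nvme0n1 and so on, and on blk-mq kernels the choices there are typically just none / mq-deadline ("none" is usually the right call for NVMe). A sketch of the equivalent, assuming four drives as nvme0n1..nvme3n1:

## For NVMe (blk-mq): see what is offered, then pick one
for n in 0 1 2 3; do cat /sys/block/nvme${n}n1/queue/scheduler; done
for n in 0 1 2 3; do echo none > /sys/block/nvme${n}n1/queue/scheduler; done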

Here are some links as my source:
https://wiki.mikejung.biz/Software_RAID
https://raid.wiki.kernel.org/index.php/Tweaking,_tuning_and_troubleshooting
https://lucatnt.com/2013/06/improve-software-raid-speed-on-linux/
 