Raid 10 performance?

I'm building a RAID 10 with 6 WD RE4 drives: https://hdd.userbenchmark.com/SpeedTest/5792/WDC-WD2503ABYX-01WERA0 <- The drives individually test at an average of 95 MB/s with GNOME Disk Utility.

When I assemble them in any RAID 10 configuration, the array gets a total write speed of about 70-130 MB/s max. Is there a way to get this closer to 250-300 MB/s?

I'm using Fedora 32 with MATE and mdadm at the moment. I thought a RAID 10 could get up to 3x the write speed of a single drive with 6 drives. I'm testing with GNOME Disk Utility.

Is there anything like asynchronous writes, or a better-suited file system, or something else that can get more speed out of it?

When I used the format function to zero the drives, they started at around 65 MB/s per drive and ended at around 40 MB/s.

Does it matter that I don't have swap on this computer? My main drive is an SSD and I didn't set up swap, to avoid writes.

The drives read at nearly 133.7 MB/s individually, except one that does 144 MB/s.
 
A single mirror has the same write performance as its weakest disk. Only read performance can approach the sum of both disks, on a good mirror implementation that can read from both disks simultaneously. A rated sequential performance of >200 MB/s per disk is only reached with a pure data stream to the outer tracks, more or less only under benchmark conditions with some filesystems. In a RAID, or with a modern filesystem, data is spread across the disk; depending on that, and on fragmentation, performance can drop to 20-60 MB/s, especially with small files.

A RAID 10 is a stripe of mirrors. With two mirrors, IOPS and sequential performance double compared to a single mirror. If you assume an average of 120 MB/s per disk for reads and writes, expect at best 240 MB/s write and 480 MB/s read when the disks are empty; with small files, or with the disks more full, maybe half of that. In general, with little RAM and therefore small RAM-based read/write caches, older filesystems like ext4 or NTFS are faster (and less safe) than newer ones like ZFS.
 
When I manually created RAID 1s, they all got around 64 MB/s, which is 50-66% of a single disk's performance. Should that happen?

After that I striped 3 of them together, which even then should get near 180 MB/s, but I still only get 70-130 MB/s max.

Ideally, if all the disks average 95 MB/s, I should be getting near 270 MB/s across 3 mirrors, shouldn't I? I might be missing something.

My RAID 10 is a stripe of 3 mirrors. I think mdadm's default is 2 stripes of 3 disks, though; I'm not sure. Even if I manually mirror the disks and build a stripe over the mirrors, it still gets the same performance or worse.

The disks' rated minimum transfer rate is around 40 MB/s, and it's as if the array is always stuck at that minimum. I've even built a 2-mirror array to test and got essentially the same performance, minus about 10% at most. The mdadm default near=2 array is the highest-performing of all; that's a set of two stripes of 3 or something, I'm not sure exactly.
 
A RAID's write performance is limited by its weakest disk.
I would check disk by disk to find out whether all the disks perform similarly or one of them is bad.
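If it helps, here is a rough sketch of testing each member disk on its own (this assumes the RAID members are /dev/sdc through /dev/sdh, as listed later in the thread; the commented-out dd line is destructive and only for disks holding no data):
Code:
# Non-destructive buffered sequential read test, one disk at a time
for d in /dev/sd{c..h}; do
    echo "== $d =="
    sudo hdparm -t "$d"
done

# Optional raw write test -- DESTRUCTIVE, overwrites the start of the disk,
# only for disks with nothing on them:
# sudo dd if=/dev/zero of=/dev/sdX bs=1M count=4096 oflag=direct status=progress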
 
I have (as much as I know how to). The performance is identical unless I missed something.

The only oddity I saw was when I zeroed the disks. One of the disks slowed down below the others around the 75% mark; it went from 40-45 MB/s to near 35 MB/s until around 80%, at which point it went back up. I have no idea what caused it, though.

But that disk shows no difference on any performance measure. Is there a way to test them more thoroughly?



This is its SMART data (the disk that slowed down for that 5% stretch between 75% and 80%):
Code:
$ sudo smartctl -A /dev/sdg
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.7.12-200.fc32.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       21
  3 Spin_Up_Time            0x0027   137   137   021    Pre-fail  Always       -       4133
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       113
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1665
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       113
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       81
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       31
194 Temperature_Celsius     0x0022   110   106   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

Here are the other ones. I'm not sure what to look for, to be honest. The RAID disks are sdc through sdh.
Code:
sudo smartctl -A /dev/sdc
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.7.12-200.fc32.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       3
  3 Spin_Up_Time            0x0027   136   136   021    Pre-fail  Always       -       4175
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       106
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       322
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       104
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       77
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       28
194 Temperature_Celsius     0x0022   111   100   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

$ sudo smartctl -A /dev/sdd
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.7.12-200.fc32.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0027   142   140   021    Pre-fail  Always       -       3875
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2417
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always       -       16805
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       2090
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       605
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1813
194 Temperature_Celsius     0x0022   111   100   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

$ sudo smartctl -A /dev/sde
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.7.12-200.fc32.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1127
  3 Spin_Up_Time            0x0027   139   137   021    Pre-fail  Always       -       4050
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2401
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       17703
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       2091
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       628
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1772
194 Temperature_Celsius     0x0022   111   102   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       9

$ sudo smartctl -A /dev/sdf
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.7.12-200.fc32.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       3
  3 Spin_Up_Time            0x0027   137   137   021    Pre-fail  Always       -       4133
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       82
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   056   056   000    Old_age   Always       -       32506
10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       80
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       57
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       24
194 Temperature_Celsius     0x0022   110   101   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

$ sudo smartctl -A /dev/sdh
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.7.12-200.fc32.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   137   137   021    Pre-fail  Always       -       4108
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       114
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       445
10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       114
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       83
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       30
194 Temperature_Celsius     0x0022   114   104   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

EDIT: Got something interesting. I was messing around with the chunk size and currently have it at 512M. I started a transfer of a game folder that is around 33 GB; it started at 125 MB/s. I then changed the vm dirty/background settings from 5/10 to 10/20 and it started writing at 250 MB/s, but slowly degraded back to 125 MB/s. Does this indicate anything? (Actually, in some cases it now starts near 400 MB/s and then drops to 150-125 MB/s, but it's very random.)

I can't seem to fully reproduce it, though. It now starts at 150 MB/s and drops to 125 MB/s. It seems to have the capacity to do more, but I can't figure out how to get it to work. It seems to temporarily work better right after changing the vm dirty and background settings, but only after the first change. It also doesn't seem to increase RAM usage much despite the changes. I think something is not utilizing the system fully.

Does it matter for file transfers that I don't have a swap location? I wonder if Linux is programmed to only use swap, or something weird, and not RAM directly first. I've asked things like this on Linux forums and sadly nobody will answer any of it.

Or maybe the read speed of the SSD I'm copying from isn't fast enough.

Code:
$ sudo dd if=/dev/zero of=/mnt/Storage/zero bs=4k count=100000
100000+0 records in
100000+0 records out
409600000 bytes (410 MB, 391 MiB) copied, 0.933487 s, 439 MB/s

$ sudo dd if=/dev/zero of=/mnt/Storage/zero bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 3.43706 s, 312 MB/s

According to this it's apparently writing at full speed, but it never does more in practice. Is it not using all the disks? I don't understand what it's doing. Is something else getting in the way?
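(For what it's worth, dd runs like the ones above mostly measure the page cache unless the timing includes a flush; a sketch of a cache-honest variant, using the same mount point as above:)
Code:
# conv=fdatasync makes dd include the final flush in its timing,
# so the reported MB/s reflects the array rather than RAM
sudo dd if=/dev/zero of=/mnt/Storage/zero bs=1M count=8192 conv=fdatasync

# oflag=direct bypasses the page cache entirely
sudo dd if=/dev/zero of=/mnt/Storage/zero bs=1M count=8192 oflag=direct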

Actually, my SSD is getting oddly low writes as well.

Code:
$ sudo dd if=/dev/zero of=/home/zero bs=4k count=100000
100000+0 records in
100000+0 records out
409600000 bytes (410 MB, 391 MiB) copied, 0.450286 s, 910 MB/s
$ sudo dd if=/dev/zero of=/home/zero bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 10.4647 s, 103 MB/s

Trying to figure out if this has anything to do with: https://serverfault.com/questions/471070/linux-file-system-cache-move-data-from-dirty-to-writeback

Edit: It turns out that on my system, using vm.dirty_ratio limits it to about 1 GB of RAM. You have to manually set vm.dirty_bytes in order to raise it above that limit.
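For reference, a sketch of what setting the byte-based knobs looks like (the sizes below are just examples, not recommendations):
Code:
# Writing the *_bytes knobs automatically zeroes the corresponding *_ratio knobs
sudo sysctl -w vm.dirty_background_bytes=$((1 * 1024 * 1024 * 1024))   # start background writeback at 1 GiB
sudo sysctl -w vm.dirty_bytes=$((4 * 1024 * 1024 * 1024))              # hard throttle at 4 GiB of dirty pages

# Confirm
sudo sysctl -a | grep dirty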

Still trying to optimize the speed. I had it bursting from a 400 MB/s start, but then it would rapidly slow down, and I can't figure out how to reproduce that while setting dirty bytes instead of ratio. I think one attempt reversed the dirty ratio and background ratio; I'm still not sure what I did. I'm switching over to the offset layout to play with that. I've been running the chunk size at 512M and similar to experiment with the speed on the near layout, and I'm changing to offset2 with a 64M chunk size to see if that helps it pull data more evenly and maximize disk usage across the array, let alone whatever else is happening. I don't understand the specifics and can't find thorough enough information to get a grasp on how to make it perform better yet. Too much underlying information is missing everywhere, and there are too many shallow tutorials.
 
In real-world (set-and-forget) usage, is it going to be a big issue? One thing I've learned in 30+ years of IT is that things rarely perform as well as the box (or the logic) says. Sometimes you just have to put up with 'good enough' unless you want to spend $X,000 more.

The biggest issue I see with data transfer is file size. Massive 4K video files will pump at the fastest your setup can push/take, but then you hit 500 MB of microfiles and you'll be waiting an hour for those to move at KB/s!

I hate microfiles. Especially recovery of Apple Time Machine archives. Painful.
 
And how should it be set up to get its proper performance? When configured certain ways using vm.dirty_bytes etc., it starts off bursting at the speed it should, then constantly degrades down to 125 MB/s. This fits the descriptions of Linux not using enough RAM. It should be getting a lot more than 125 MB/s.

How should the RAID be created to get the best performance? I've read about aligning with an offset so that it uses all the drives, rather than just taking the default. I would like to actually set this up so it works well; a quarter of its proper speed is not really acceptable.

If you've worked in IT for 30 years, why don't you know how to optimize a RAID and have information to give? This stuff used to be more common. Do you know what should be happening speed-wise?

I don't understand why nobody can actually help with this. How is telling someone to give up and deal with it helpful when I haven't even gotten the chance to try doing it correctly? It used to be really easy to get help with RAIDs, too. I honestly don't understand what is happening nowadays. It's not just this forum; nobody responds to RAID help at all anymore. It used to be fairly common knowledge. RAIDs are not out of use, so I don't get what's up, and they will probably keep coming in and out of use until the cold death of the universe.

I haven't done one of these in nearly a decade and I don't remember the specifics anymore. All internet searches bring up basically nothing deeper than the commands to create the RAID, and those are completely shallow because of all the annoying websites that just want ad money.
 
First, if you're in Linux, make sure you mount with noatime so it's not trying to update the access time every time a file is read. Also, keep in mind that MB/s is a poor metric for servers and RAID. What you're really looking at is IOPS; MB/s only applies to purely sequential reads/writes (which rarely happen, although this is highly dependent on your workload) and isn't a great indicator of actual performance. I run a 6-disk RAID 10 and I can get 900 MB/s sequential reads (I can't recall write speeds, but my workload is much more read-centric), yet my file transfers tank because most folders contain many small files. My RAID 10 is on a hardware RAID card with built-in RAM cache and a battery backup, so I'm not sure of all the commands and settings for software RAID.

RAID 10 with 6 disks should be approximately 6x read speed and 3x write speed. Of course this will also depend on your hard drives (and even the disk format). Platters are also known to have different speeds at the start/end of the platter, so depending on how full the drive is or where it's writing, you may see degraded performance as well.

Also, dd isn't great for benchmarking; please use something like fio, which will give you MB/s as well as IOPS.
https://support.binarylane.com.au/support/solutions/articles/1000055889-how-to-benchmark-disk-i-o
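On the noatime point, a minimal sketch of what that looks like (the fstab line is a placeholder, not your actual entry):
Code:
# Try it on the live mount first:
sudo mount -o remount,noatime /mnt/Storage

# Then make it permanent in /etc/fstab (UUID here is a placeholder):
# UUID=xxxx-xxxx  /mnt/Storage  ext4  defaults,noatime  0  2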
 

Does this indicate anything wrong? I redid the array as n2. This normally happens at this point after making an array, and I usually reformat with GNOME Disk Utility to fix it. I'm not sure why this happens or what it is. Does this reformatting mess up anything related to the array?
 
If you've worked in IT for 30 years, why don't you know how to optimize a RAID and have information to give? This stuff used to be more common. Do you know what should be happening speed-wise?
I learned not to sweat the small stuff. Is it fast enough to do the job? If so live with it and move on to the next project.
 
I learned not to sweat the small stuff. Is it fast enough to do the job? If so live with it and move on to the next project.

Or you could stop posting in a thread asking for help when you have not done anything to try to answer the question. It's my decision to ask a question, not yours. I wouldn't be asking any questions if I didn't feel the need to. Kindly cease posting if you had no interest in helping to begin with. Nobody cares about your opinion on the matter if it's not answering the question.

Find a hobby instead of amusing yourself by harassing people who are asking for help. Nobody gives one shit what you think.
 
You're using mdadm to create the array? What parameters are you using? How are you mounting? I only run Linux headless, so I'm much more familiar with good ol' console commands and manually editing my fstab file. For giggles, can I see what you get for your disk info when you run:
lsblk -o NAME,SIZE,FSTYPE,TYPE,MOUNTPOINT
And also
cat /proc/mdstat
 
Code:
cat /proc/mdstat
Personalities : [raid10]
md10 : active raid10 sdh[5] sdg[4] sdf[3] sde[2] sdd[1] sdc[0]
      734527488 blocks super 1.2 262144K chunks 2 near-copies [6/6] [UUUUUU]
      bitmap: 0/6 pages [0KB], 65536KB chunk

#order of commands
sudo mdadm --stop /dev/md10
sudo mdadm --zero-superblock /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
sudo mdadm --create --verbose --assume-clean /dev/md10 --level=10 --layout=n2 --raid-devices=6 --chunk=256M /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
#The rest is done in gnome utility disk.

lsblk -o NAME,SIZE,FSTYPE,TYPE,MOUNTPOINT
NAME                                            SIZE FSTYPE            TYPE   MOUNTPOINT
sda                                           238.5G                   disk
├─sda1                                            1G ext4              part   /boot
└─sda2                                        237.5G crypto_LUKS       part
  └─luks-e5147da0-2b15-4f00-bd75-5af1fe9f6f75 237.5G LVM2_member       crypt
    ├─fedora_localhost--live-root                25G ext4              lvm    /
    └─fedora_localhost--live-home             212.5G ext4              lvm    /home
sdb                                             1.8T                   disk
└─sdb1                                          1.8T crypto_LUKS       part
  └─luks-223344de-b882-4444-becd-a4e6a5fe1cfd   1.8T ext4              crypt  /mnt/Backup
sdc                                           233.8G linux_raid_member disk
└─md10                                        700.5G ext4              raid10 /mnt/Storage
sdd                                           233.8G linux_raid_member disk
└─md10                                        700.5G ext4              raid10 /mnt/Storage
sde                                           233.8G linux_raid_member disk
└─md10                                        700.5G ext4              raid10 /mnt/Storage
sdf                                           233.8G linux_raid_member disk
└─md10                                        700.5G ext4              raid10 /mnt/Storage
sdg                                           233.8G linux_raid_member disk
└─md10                                        700.5G ext4              raid10 /mnt/Storage
sdh                                           233.8G linux_raid_member disk
└─md10                                        700.5G ext4              raid10 /mnt/Storage
sr0                                            1024M                   rom
sr0 is a tmpfs I was trying to figure out how to turn into a ramdisk, to see if it would let the RAID use more than 1 GB of RAM. That can be somewhat worked around by using dirty_bytes instead of dirty_ratio, but I don't know to what extent, or whether it fixes everything, let alone why everything acts so weird. I basically have to reverse the background and normal settings to get the burst mode. I don't know enough about what any of it does under the hood to rule it out as a problem yet.
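For the ramdisk experiment, a minimal tmpfs sketch (the mount point and size are arbitrary; the contents vanish on reboot):
Code:
sudo mkdir -p /mnt/ramtest
sudo mount -t tmpfs -o size=8G tmpfs /mnt/ramtest
# copy a test file from here onto the array to take the source SSD out of the equation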


Does anyone know where fio installs by default from dnf? I installed it that way from the Fedora repositories and it's not in ./fio like the instructions assume. Doing a search at the moment, but it will likely take a few hours. Edit: never mind, you just type fio instead of ./fio.

Also, when I originally zeroed the drives during a format, they all ran from 65 down to 40 MB/s from one side of the disk to the other. So the array's performance is about 3x the worst-case performance of a single drive. Each drive should be doing 95 MB/s sequential and 70 MB/s random. mdadm also seems to be doing this with two stripes instead of 3 mirrors. And when I manually created 3 mirrors, the mirrors individually tested at around 65 MB/s writes. Could this be showing where the problem is?

Code:
sudo mdadm -D /dev/md10
/dev/md10:
           Version : 1.2
     Creation Time : Mon Aug 17 17:52:17 2020
        Raid Level : raid10
        Array Size : 734527488 (700.50 GiB 752.16 GB)
     Used Dev Size : 244842496 (233.50 GiB 250.72 GB)
      Raid Devices : 6
     Total Devices : 6
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Mon Aug 17 19:51:20 2020
             State : clean
    Active Devices : 6
   Working Devices : 6
    Failed Devices : 0
     Spare Devices : 0

            Layout : near=2
        Chunk Size : 262144K

Consistency Policy : bitmap

              Name : localhost-live.attlocal.net:10  (local to host localhost-live.attlocal.net)
              UUID : a3fe09a9:09cf65b5:02b99912:54859302
            Events : 2

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync set-A   /dev/sdc
       1       8       48        1      active sync set-B   /dev/sdd
       2       8       64        2      active sync set-A   /dev/sde
       3       8       80        3      active sync set-B   /dev/sdf
       4       8       96        4      active sync set-A   /dev/sdg
       5       8      112        5      active sync set-B   /dev/sdh

Just noticing this, but /proc/mdstat shows a chunk size of 64M where mdadm -D shows 256M. 64 MB is the size of the cache. Is that a limit? I was going above it to see if it helped use more of the drives at once. I usually just look at the mdadm info and always forget the other exists. Edit: never mind, there are two chunk values listed; one is the RAID's and the other is the 64M bitmap chunk.

Should I be setting the chunk size to something like 4M instead of above 64M? I was originally using 512M and higher and getting a burst mode that started at max speed and then dropped. I think this was usually after starting a copy, stopping it, and restarting it with a real file. I wonder if this is from the chunk size getting things into RAM fast enough or something. Is it safe to set chunk sizes in the multi-gigabyte range? I could do 50 GB chunks and let it do its thing; that is probably the largest file I would transfer. Or size the chunks to match the fastest parts of the disks so each is one chunk. Maybe 8 GB chunks so it only uses up to half of my RAM. I'm not sure how this affects performance overall. Edit: the highest chunk size it will let me set is 1 GB.
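For comparison, mdadm's default chunk is 512 KiB rather than hundreds of megabytes; a sketch of recreating the array that way (destructive, wipes the array; device names as used above):
Code:
sudo mdadm --stop /dev/md10
sudo mdadm --zero-superblock /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
# --chunk is in KiB unless a suffix is given; 512 = 512 KiB, the mdadm default
sudo mdadm --create --verbose /dev/md10 --level=10 --layout=n2 --raid-devices=6 \
    --chunk=512 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh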

I'll have to get fio working and test performance more.

Edit again: https://hdd.userbenchmark.com/SpeedTest/5792/WDC-WD2503ABYX-01WERA0

What out-of-the-box performance should be expected with 6 of these disks, given the speeds stated in the link:


            Min     Avg     Max
Read        50      110     141
Write       43.7    104     132
Mixed       40.2    78.3    106
SusWrite    39.3    95.5    133
(71.7%, 97.1 MB/s)

            Min     Avg     Max
4K Read     0.27    1.1     1.4
4K Write    0.7     2.34    3.1
4K Mixed    0.4     0.92    1.1
(191%, 1.45 MB/s)

I'm copying a 33.7 GB War Thunder install folder to test.

Code:
sudo sysctl -a | grep dirty
vm.dirty_background_bytes = 67108864
vm.dirty_background_ratio = 0
vm.dirty_bytes = 6442450944
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 0
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200

This just got me a 400+ MB/s starting burst that slows down rapidly, but it only happens after I start, stop, and then restart a copy. It has slight stalls and noticeable hiccups. I wonder if any of this is down to the read speed of the SSD, since that is where the file is coming from. It should be faster, though.

Code:
sudo sysctl -a | grep dirty
vm.dirty_background_bytes = 2796202
vm.dirty_background_ratio = 0
vm.dirty_bytes = 17179869184
vm.dirty_expire_centisecs = 150
vm.dirty_ratio = 0
vm.dirty_writeback_centisecs = 25
vm.dirtytime_expire_seconds = 9

This is also getting interesting results at first: it holds 250 MB/s and below for longer, but still takes a long time as the speed drops.

Code:
$ sudo sysctl -a | grep dirty
vm.dirty_background_bytes = 17179869184
vm.dirty_background_ratio = 0
vm.dirty_bytes = 17179869184
vm.dirty_expire_centisecs = 150
vm.dirty_ratio = 0
vm.dirty_writeback_centisecs = 25
vm.dirtytime_expire_seconds = 9

This is getting the best results. It started above 450 MB/s until 9 GB had been copied, then stuttered, dropped to 200 MB/s, and kept slowing from there. (An initial run without restarting the copy job jumps to around 300 MB/s or more before slowing down more gradually, settling at about 140 MB/s.) The system is only using 3.4 GB of RAM in total, so it's not using most of the RAM. And it's only maxing one core at a time on my CPU. I wonder if the problem is that it isn't using multiple cores or threads.

I have 16 GB of 1600 MHz RAM at 1/9/9/9/24 timings and a Phenom II X6 1100T CPU (6 cores / 6 threads). Would it help to turn all my RAM into a tmpfs? I think the system is not using RAM and threads correctly or something. I have 6 disks; it would be nice to use all 6 threads and all the RAM.

Either that, or the throughput on a single core is maxing out and stalling. The rest of the time the cores are all jumping up and down together.
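One way to see whether all six members are actually being hit during a transfer is to watch per-disk utilization while the copy runs (iostat comes from the sysstat package on Fedora):
Code:
sudo dnf install sysstat
# Refresh every second while copying; watch the sdc-sdh rows.
# If the load is spread across all six, wkB/s and %util should look similar on each.
iostat -x 1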
 
Here are the fio tests for my SSD and my RAID. I did the SSD in case my settings are introducing problems:

sda: SSD/boot drive:
Code:
#First test with no existing file.

$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.21
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=69.9MiB/s,w=23.1MiB/s][r=17.9k,w=5904 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=33335: Tue Aug 18 08:50:48 2020
  read: IOPS=16.2k, BW=63.5MiB/s (66.5MB/s)(3070MiB/48376msec)
   bw (  KiB/s): min= 5728, max=117480, per=100.00%, avg=65077.40, stdev=17463.74, samples=96
   iops        : min= 1432, max=29370, avg=16269.22, stdev=4365.93, samples=96
  write: IOPS=5429, BW=21.2MiB/s (22.2MB/s)(1026MiB/48376msec); 0 zone resets
   bw (  KiB/s): min= 2392, max=39272, per=100.00%, avg=21750.82, stdev=5720.15, samples=96
   iops        : min=  598, max= 9818, avg=5437.54, stdev=1430.04, samples=96
  cpu          : usr=5.75%, sys=30.18%, ctx=229526, majf=1, minf=6
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=63.5MiB/s (66.5MB/s), 63.5MiB/s-63.5MiB/s (66.5MB/s-66.5MB/s), io=3070MiB (3219MB), run=48376-48376msec
  WRITE: bw=21.2MiB/s (22.2MB/s), 21.2MiB/s-21.2MiB/s (22.2MB/s-22.2MB/s), io=1026MiB (1076MB), run=48376-48376msec

Disk stats (read/write):
    dm-2: ios=802991/262361, merge=0/0, ticks=2400759/459547, in_queue=2860306, util=97.69%, aggrios=804279/262741, aggrmerge=0/0, aggrticks=2403207/459744, aggrin_queue=2862951, aggrutil=97.71%
    dm-0: ios=804279/262741, merge=0/0, ticks=2403207/459744, in_queue=2862951, util=97.71%, aggrios=802820/262643, aggrmerge=1458/98, aggrticks=2178768/335518, aggrin_queue=2514546, aggrutil=99.05%
  sda: ios=802820/262643, merge=1458/98, ticks=2178768/335518, in_queue=2514546, util=99.05%

Code:
# Second test with preexisting file.

$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.21
Starting 1 process
Jobs: 1 (f=1): [m(1)][100.0%][r=140MiB/s,w=46.1MiB/s][r=35.9k,w=11.8k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=6956: Tue Aug 18 03:14:19 2020
  read: IOPS=31.0k, BW=125MiB/s (131MB/s)(3070MiB/24577msec)
   bw (  KiB/s): min=18067, max=177856, per=100.00%, avg=128085.43, stdev=40458.00, samples=49
   iops        : min= 4516, max=44464, avg=32021.27, stdev=10114.52, samples=49
  write: IOPS=10.7k, BW=41.7MiB/s (43.8MB/s)(1026MiB/24577msec); 0 zone resets
   bw (  KiB/s): min= 6275, max=60312, per=100.00%, avg=42806.94, stdev=13609.43, samples=49
   iops        : min= 1568, max=15078, avg=10701.61, stdev=3402.45, samples=49
  cpu          : usr=6.44%, sys=49.73%, ctx=241921, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=125MiB/s (131MB/s), 125MiB/s-125MiB/s (131MB/s-131MB/s), io=3070MiB (3219MB), run=24577-24577msec
  WRITE: bw=41.7MiB/s (43.8MB/s), 41.7MiB/s-41.7MiB/s (43.8MB/s-43.8MB/s), io=1026MiB (1076MB), run=24577-24577msec

Disk stats (read/write):
    dm-2: ios=779113/260369, merge=0/0, ticks=592114/375472, in_queue=967586, util=98.17%, aggrios=786299/262809, aggrmerge=0/0, aggrticks=597756/377389, aggrin_queue=975145, aggrutil=98.11%
    dm-0: ios=786299/262809, merge=0/0, ticks=597756/377389, in_queue=975145, util=98.11%, aggrios=786179/262734, aggrmerge=120/73, aggrticks=360370/171541, aggrin_queue=531993, aggrutil=98.23%
  sda: ios=786179/262734, merge=120/73, ticks=360370/171541, in_queue=531993, util=98.23%
Code:
$ sudo sysctl -a | grep dirty
vm.dirty_background_bytes = 17179869184
vm.dirty_background_ratio = 0
vm.dirty_bytes = 17179869184
vm.dirty_expire_centisecs = 150
vm.dirty_ratio = 0
vm.dirty_writeback_centisecs = 25
vm.dirtytime_expire_seconds = 9

RAID: 6x RE4 drives. Waiting for this... it's taking 30 minutes... Redoing it; I forgot to remove the test files after restarting. Not doing this test again for a bit.

Code:
$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.21
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][99.9%][r=1221KiB/s,w=468KiB/s][r=305,w=117 IOPS][eta 00m:01s]
test: (groupid=0, jobs=1): err= 0: pid=30801: Tue Aug 18 08:47:16 2020
  read: IOPS=465, BW=1861KiB/s (1905kB/s)(3070MiB/1689628msec)
   bw (  KiB/s): min=    7, max= 4848, per=100.00%, avg=1871.71, stdev=860.37, samples=3356
   iops        : min=    1, max= 1212, avg=467.84, stdev=215.09, samples=3356
  write: IOPS=155, BW=622KiB/s (637kB/s)(1026MiB/1689628msec); 0 zone resets
   bw (  KiB/s): min=    7, max= 1580, per=100.00%, avg=633.04, stdev=283.49, samples=3316
   iops        : min=    1, max=  395, avg=158.12, stdev=70.90, samples=3316
  cpu          : usr=0.75%, sys=3.72%, ctx=786359, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=1861KiB/s (1905kB/s), 1861KiB/s-1861KiB/s (1905kB/s-1905kB/s), io=3070MiB (3219MB), run=1689628-1689628msec
  WRITE: bw=622KiB/s (637kB/s), 622KiB/s-622KiB/s (637kB/s-637kB/s), io=1026MiB (1076MB), run=1689628-1689628msec

Disk stats (read/write):
    md127: ios=1518582/508616, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=253030/169972, aggrmerge=96/164, aggrticks=29612300/4718268, aggrin_queue=34460811, aggrutil=73.05%
  sdf: ios=241726/171883, merge=31/24, ticks=14092330/476543, in_queue=14638250, util=65.50%
  sdd: ios=199980/165570, merge=23/383, ticks=8747882/547114, in_queue=9380725, util=60.80%
  sdg: ios=263819/172514, merge=91/40, ticks=34148155/8038456, in_queue=42355444, util=32.56%
  sde: ios=269005/171792, merge=276/115, ticks=60122375/10491011, in_queue=70782755, util=73.05%
  sdc: ios=291931/165563, merge=56/390, ticks=29043228/1622936, in_queue=30824624, util=71.59%
  sdh: ios=251722/172515, merge=100/36, ticks=31519833/7133553, in_queue=38783069, util=31.70%

Code:
sudo mdadm -D /dev/md*
mdadm: /dev/md does not appear to be an md device
/dev/md127:
           Version : 1.2
     Creation Time : Mon Aug 17 21:07:47 2020
        Raid Level : raid10
        Array Size : 732954624 (699.00 GiB 750.55 GB)
     Used Dev Size : 244318208 (233.00 GiB 250.18 GB)
      Raid Devices : 6
     Total Devices : 6
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Tue Aug 18 08:17:02 2020
             State : active
    Active Devices : 6
   Working Devices : 6
    Failed Devices : 0
     Spare Devices : 0

            Layout : near=2
        Chunk Size : 1048576K

Consistency Policy : bitmap

              Name : localhost-live.attlocal.net:10  (local to host localhost-live.attlocal.net)
              UUID : e738a64c:b49d0cf9:77313225:28c9bc6e
            Events : 5

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync set-A   /dev/sdc
       1       8       48        1      active sync set-B   /dev/sdd
       2       8       64        2      active sync set-A   /dev/sde
       3       8       80        3      active sync set-B   /dev/sdf
       4       8       96        4      active sync set-A   /dev/sdg
       5       8      112        5      active sync set-B   /dev/sdh

And yet when I do an actual file transfer I get at least 120 MB/s?!

Would the offset layout be good with this sort of large chunk size? I'm assuming this is not optimal. I don't think it behaves any differently with a normal chunk size or different dirty settings, though.

The upside is that this test actually used 12.2 GB of my RAM (I think only 8 GB of that is the actual transfer). Not sure why this does and my normal file transfers don't. I'm going to let the drives fully sync via create instead of using --assume-clean and see if it makes a difference. Does formatting the RAID afterwards in GNOME Disk Utility (or any program) mess up the alignment from the initial creation? That is another potential issue.
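On the alignment question: ext4 can be told the RAID geometry at mkfs time through stride/stripe_width (mkfs.ext4 usually picks these up from md automatically, but they can be checked and set by hand). A sketch assuming a 512 KiB chunk and 3 data disks (near=2 over 6 drives), which is not the geometry above, so the numbers would need recalculating:
Code:
# stride = chunk size / 4K block = 512K / 4K = 128
# stripe_width = stride * number of data disks = 128 * 3 = 384
sudo mkfs.ext4 -E stride=128,stripe_width=384 /dev/md10

# check what an existing filesystem thinks it is on
sudo tune2fs -l /dev/md10 | grep -i 'stride\|stripe'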

It also won't retain the name md10. I can't figure out why.
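On the md10 vs md127 naming: that rename usually happens when the array isn't listed in /etc/mdadm.conf when the initramfs gets rebuilt. A sketch of pinning the name on Fedora (assuming the stock /etc/mdadm.conf and dracut setup):
Code:
# record the array so it assembles under a stable name
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm.conf
# rebuild the initramfs so early boot uses the same definition
sudo dracut -f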

When I start, delete, and restart the file transfer it gets the weird burst mode. I wonder if that is a legitimate way to run a RAID. Can whatever is happening be used to make it run at full speed?

Is RAID 0 better for testing purposes?
 
I repaired and checked the file system instead of reformatting it. This now finishes in less than a minute with the 4 GB test:

Code:
$ sudo fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.21
Starting 1 process
Jobs: 1 (f=1): [m(1)][100.0%][r=151MiB/s,w=49.7MiB/s][r=38.7k,w=12.7k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=44992: Tue Aug 18 11:32:43 2020
  read: IOPS=33.3k, BW=130MiB/s (137MB/s)(3070MiB/23576msec)
   bw (  KiB/s): min=16830, max=167632, per=100.00%, avg=133591.87, stdev=47595.49, samples=47
   iops        : min= 4207, max=41908, avg=33397.96, stdev=11898.94, samples=47
  write: IOPS=11.1k, BW=43.5MiB/s (45.6MB/s)(1026MiB/23576msec); 0 zone resets
   bw (  KiB/s): min= 5397, max=55568, per=100.00%, avg=44647.55, stdev=15975.42, samples=47
   iops        : min= 1349, max=13892, avg=11161.77, stdev=3993.89, samples=47
  cpu          : usr=7.48%, sys=45.87%, ctx=269502, majf=0, minf=6
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=130MiB/s (137MB/s), 130MiB/s-130MiB/s (137MB/s-137MB/s), io=3070MiB (3219MB), run=23576-23576msec
  WRITE: bw=43.5MiB/s (45.6MB/s), 43.5MiB/s-43.5MiB/s (45.6MB/s-45.6MB/s), io=1026MiB (1076MB), run=23576-23576msec

Disk stats (read/write):
    dm-1: ios=781655/261276, merge=0/0, ticks=608360/379836, in_queue=988196, util=99.73%, aggrios=785920/262711, aggrmerge=0/0, aggrticks=613678/381500, aggrin_queue=995178, aggrutil=99.54%
    dm-0: ios=785920/262711, merge=0/0, ticks=613678/381500, in_queue=995178, util=99.54%, aggrios=785466/262614, aggrmerge=459/96, aggrticks=496533/225885, aggrin_queue=722611, aggrutil=99.63%
  sda: ios=785466/262614, merge=459/96, ticks=496533/225885, in_queue=722611, util=99.63%
Code:
$ sudo sysctl -a | grep dirty
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200
Here is the same test with the other set of dirty settings:
Code:
$ sudo fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75 
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.21
Starting 1 process
Jobs: 1 (f=1): [m(1)][100.0%][r=156MiB/s,w=51.2MiB/s][r=39.8k,w=13.1k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=45982: Tue Aug 18 11:36:10 2020
  read: IOPS=34.2k, BW=134MiB/s (140MB/s)(3070MiB/22971msec)
   bw (  KiB/s): min=15473, max=178376, per=100.00%, avg=137228.58, stdev=48632.96, samples=45
   iops        : min= 3868, max=44594, avg=34307.09, stdev=12158.36, samples=45
  write: IOPS=11.4k, BW=44.7MiB/s (46.8MB/s)(1026MiB/22971msec); 0 zone resets
   bw (  KiB/s): min= 5133, max=59912, per=100.00%, avg=45867.78, stdev=16286.19, samples=45
   iops        : min= 1283, max=14978, avg=11466.87, stdev=4071.62, samples=45
  cpu          : usr=6.49%, sys=46.06%, ctx=275876, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=134MiB/s (140MB/s), 134MiB/s-134MiB/s (140MB/s-140MB/s), io=3070MiB (3219MB), run=22971-22971msec
  WRITE: bw=44.7MiB/s (46.8MB/s), 44.7MiB/s-44.7MiB/s (46.8MB/s-46.8MB/s), io=1026MiB (1076MB), run=22971-22971msec

Disk stats (read/write):
    dm-1: ios=784039/262196, merge=0/0, ticks=530436/346596, in_queue=877032, util=99.66%, aggrios=785920/262828, aggrmerge=0/0, aggrticks=532380/347303, aggrin_queue=879683, aggrutil=99.54%
    dm-0: ios=785920/262828, merge=0/0, ticks=532380/347303, in_queue=879683, util=99.54%, aggrios=785246/262671, aggrmerge=679/152, aggrticks=420972/194342, aggrin_queue=615515, aggrutil=99.67%
  sda: ios=785246/262671, merge=679/152, ticks=420972/194342, in_queue=615515, util=99.67%
Code:
$ sudo sysctl -a | grep dirty             
vm.dirty_background_bytes = 17179869184
vm.dirty_background_ratio = 0
vm.dirty_bytes = 17179869184
vm.dirty_expire_centisecs = 150
vm.dirty_ratio = 0
vm.dirty_writeback_centisecs = 25
vm.dirtytime_expire_seconds = 9

These may be from the SSD. Redoing.

Never mind, those were the messed-up SSD speeds. I can't take this for now; I'm going to take a nap for a bit and maybe work on it later. I can't even find basic information on how to check which superblock version is in use, or other details, to understand what is going on. And all the info is 10+ years old with no way of telling whether it's outdated, let alone any way to check it personally, because of the lazy nonsense that comes from these idiots. I'm getting really fed up with the useless Linux community.

FYI (or for example), I know you can use --examine and other commands to look at the array, but none of them give superblock details like the version or other internals. I need actual information to figure out the performance. This is as stupid as things get: endless idiots clamoring for ad money for their blogs, giving half-assed information at every turn. You can't learn things on your own anymore even if you want to. The entire tech world is now filled with completely useless idiots, and if it isn't yet, it will be, since Google does nothing but throw the lowest common denominator at you for its own financial gain, and without knowing search terms that aren't already flooded it's impossible to find solid information now. Those terms probably get flooded quickly too, by people reading the same shallow Google results to churn out crappy websites...
 
A significant part of that is because doing it "this way" has been slowly dying out for quite some time. I'm going to dig through and see what I remember of this, but most workloads have either migrated to a SAN (especially if virtualized) or are using something like Btrfs/ZFS instead of md RAID these days (or even LVM). Just the progression of the technology. Thinking back, it's probably been 10+ years since I've even tried any of this, but I'll see what I remember.
 
I assumed there was no fundamental difference between using one file system or another. I can't find anything on any of them for this, though. I've tried putting different file systems on it, including LVM; it does not change the performance at all. Or are you referring to something else?

Fundamentally, there seems to be something affecting performance. I'm assuming, from testing (which I could have done wrong), that it's not the bus, since the array is about as slow as it could possibly be even at max speed. I don't understand what is causing the problem in the first place. I keep reading conflicting info: one source says mdadm RAID now uses all the disks, others indicate you have to offset things to make them all get used. Nothing makes any sense, and there are no specifics anywhere to understand what they are talking about in detail.

What does moving to a virtualized SAN, or any other filesystem or system, have to do with the fact that my RAID performs like a single hard drive unless I stop and restart the copy process, which I assume forces it to flush for a few GB? I'm assuming this is some stupid Linux software nonsense or something to do with alignment. Am I mistaken in this?! I'm trying to get a grasp on what is even happening.

Also, I tried removing the bitmap. This increased the IOPS in the fio test by a long shot, but it still runs at the same file transfer speed in practice. What the heck is going on?!

https://louwrentius.com/the-impact-of-the-mdadm-bitmap-on-raid-performance.html
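For anyone following along, the bitmap can be toggled without rebuilding the array; a sketch (using the array name from above):
Code:
sudo mdadm --grow --bitmap=none /dev/md10       # remove the write-intent bitmap
sudo mdadm --grow --bitmap=internal /dev/md10   # add it back once done testing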

Now the fio test takes 15 minutes to run instead of 30... Removing the bitmap doubled the IOPS for writes, but it doesn't budge the basic file transfer. Still 120-133 MB/s?! Does that say what is going on? Is it the alignment of the chunks versus the physical 512K locations?! Which, by the way, I don't remember how to adjust anymore and can't find info on.

All the base performance tests act as if the array can and should be running at full speed. dd and the other tools all show it can hit max performance, unless those are unreliable, and the weird bursts when starting and stopping a copy job suggest it really can reach those speeds. Does it have to do with the Linux kernel not using more than 1 GB at a time? It's not using RAM during file transfers even though I'm copying a 33 GB file. The system is not doing anything to make full use of the RAID. Did Linux drop all RAID support for physical disks on the desktop and just choose to be asses like they normally are about some things?!

Nothing I do changes the performance, and it's not the performance it should be getting; it's all single-disk speed. This makes no sense.

Here is a test result with no bitmap:
Code:
$ sudo fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.21
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=2220KiB/s,w=736KiB/s][r=555,w=184 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=126828: Wed Aug 19 03:26:08 2020
  read: IOPS=769, BW=3078KiB/s (3152kB/s)(3070MiB/1021405msec)
   bw (  KiB/s): min=  400, max= 5804, per=100.00%, avg=3080.99, stdev=751.44, samples=2039
   iops        : min=  100, max= 1451, avg=770.17, stdev=187.88, samples=2039
  write: IOPS=257, BW=1029KiB/s (1053kB/s)(1026MiB/1021405msec); 0 zone resets
   bw (  KiB/s): min=   47, max= 2064, per=100.00%, avg=1030.18, stdev=298.99, samples=2038
   iops        : min=   11, max=  516, avg=257.45, stdev=74.78, samples=2038
  cpu          : usr=1.52%, sys=6.99%, ctx=708447, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=3078KiB/s (3152kB/s), 3078KiB/s-3078KiB/s (3152kB/s-3152kB/s), io=3070MiB (3219MB), run=1021405-1021405msec
  WRITE: bw=1029KiB/s (1053kB/s), 1029KiB/s-1029KiB/s (1053kB/s-1053kB/s), io=1026MiB (1076MB), run=1021405-1021405msec

Disk stats (read/write):
    md10: ios=785874/267914, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=131126/89361, aggrmerge=69/152, aggrticks=9715045/1307764, aggrin_queue=11064546, aggrutil=70.98%
  sdf: ios=196576/124785, merge=29/9, ticks=11329403/655252, in_queue=12035445, util=68.93%
  sdd: ios=196611/75710, merge=229/448, ticks=7704569/224224, in_queue=7958968, util=70.98%
  sdg: ios=170335/124487, merge=115/25, ticks=13776624/3585545, in_queue=17417518, util=31.34%
  sde: ios=172167/124552, merge=44/7, ticks=25113338/3182450, in_queue=28401715, util=68.93%
  sdc: ios=24869/19046, merge=0/425, ticks=106398/19027, in_queue=131499, util=12.69%
  sdh: ios=26199/67590, merge=0/0, ticks=259941/180089, in_queue=442135, util=19.22%

BTW, I don't get what you mean by out of use. What part of what I'm doing is any different, or are you saying it's old? SSDs vs. HDDs? If not, I thought the underlying reality was the same regardless for HDDs.

I can't even find information on what the fields in the fio output mean. And something being out of use shouldn't get rid of the information; this stuff comes in and out of use constantly, it always does. If things were documented properly to start with, it wouldn't be an issue. (Which, by the way, it's illegal for a college to give out a degree without the recipient fully and easily knowing and being able to do the material!)

Also, I think this is two mirrored stripes of 3 instead of a stripe of 3 mirrors. That seems to be the default with mdadm. I'm not sure how to make it use all the disks. The utilization in the tests also seems to indicate that it's favoring one stripe over the other, and in a weird way.

sdc/e/g are one set and sdd/f/h are the other, although now I'm not sure with the offset layout...

Code:
$ sudo mdadm -D /dev/md10
/dev/md10:
           Version : 1.2
     Creation Time : Wed Aug 19 03:04:48 2020
        Raid Level : raid10
        Array Size : 729808896 (696.00 GiB 747.32 GB)
     Used Dev Size : 243269632 (232.00 GiB 249.11 GB)
      Raid Devices : 6
     Total Devices : 6
       Persistence : Superblock is persistent

       Update Time : Wed Aug 19 12:17:18 2020
             State : clean
    Active Devices : 6
   Working Devices : 6
    Failed Devices : 0
     Spare Devices : 0

            Layout : offset=2
        Chunk Size : 1048576K

Consistency Policy : resync

              Name : localhost-live.attlocal.net:10  (local to host localhost-live.attlocal.net)
              UUID : 3bb33cc4:bd012be8:90733bf3:02c23931
            Events : 2

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       48        1      active sync   /dev/sdd
       2       8       64        2      active sync   /dev/sde
       3       8       80        3      active sync   /dev/sdf
       4       8       96        4      active sync   /dev/sdg
       5       8      112        5      active sync   /dev/sdh

It doesn't even list them in sets now...

Read test:
Code:
$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read.fio --bs=4k --iodepth=64 --size=4G --readwrite=randread
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.21
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [r(1)][100.0%][r=14.8MiB/s][r=3795 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=193991: Wed Aug 19 12:31:14 2020
  read: IOPS=1252, BW=5011KiB/s (5131kB/s)(4096MiB/837067msec)
   bw (  KiB/s): min= 2307, max=16950, per=100.00%, avg=5013.74, stdev=874.01, samples=1671
   iops        : min=  576, max= 4237, avg=1253.36, stdev=218.51, samples=1671
  cpu          : usr=1.88%, sys=8.40%, ctx=759944, majf=0, minf=71
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=1048576,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=5011KiB/s (5131kB/s), 5011KiB/s-5011KiB/s (5131kB/s-5131kB/s), io=4096MiB (4295MB), run=837067-837067msec

Disk stats (read/write):
    md10: ios=1048462/2, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=174805/2, aggrmerge=130/0, aggrticks=8905688/184, aggrin_queue=8906039, aggrutil=83.91%
  sdf: ios=100524/2, merge=0/0, ticks=949710/53, in_queue=949816, util=53.04%
  sdd: ios=262297/4, merge=19/0, ticks=9489670/89, in_queue=9489847, util=83.91%
  sdg: ios=172/2, merge=0/0, ticks=78/16, in_queue=110, util=0.03%
  sde: ios=229537/2, merge=11/0, ticks=8680969/86, in_queue=8681141, util=80.50%
  sdc: ios=229418/4, merge=135/0, ticks=8879868/166, in_queue=8880200, util=80.40%
  sdh: ios=226885/2, merge=620/0, ticks=25433835/695, in_queue=25435121, util=31.85%
Write test:
Code:
$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.21
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [w(1)][100.0%][w=10.3MiB/s][w=2626 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=201217: Wed Aug 19 13:06:21 2020
  write: IOPS=730, BW=2922KiB/s (2992kB/s)(4096MiB/1435257msec); 0 zone resets
   bw (  KiB/s): min=    7, max=10440, per=100.00%, avg=2950.22, stdev=1089.29, samples=2840
   iops        : min=    1, max= 2610, avg=737.44, stdev=272.35, samples=2840
  cpu          : usr=0.80%, sys=7.61%, ctx=452531, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,1048576,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=2922KiB/s (2992kB/s), 2922KiB/s-2922KiB/s (2992kB/s-2992kB/s), io=4096MiB (4295MB), run=1435257-1435257msec

Disk stats (read/write):
    md10: ios=0/1646526, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=297/392071, aggrmerge=0/157113, aggrticks=7334/14222126, aggrin_queue=14295449, aggrutil=97.04%
  sdf: ios=297/172129, merge=0/4553, ticks=285/60550, in_queue=62168, util=9.20%
  sdd: ios=297/505322, merge=0/462181, ticks=890/1671866, in_queue=1712775, util=27.59%
  sdg: ios=297/409284, merge=0/8299, ticks=698/580599, in_queue=596196, util=18.31%
  sde: ios=297/214842, merge=0/51486, ticks=375/179244, in_queue=182077, util=10.19%
  sdc: ios=297/551165, merge=0/412480, ticks=32263/62186264, in_queue=62432608, util=97.04%
  sdh: ios=297/499689, merge=0/3682, ticks=9494/20654234, in_queue=20786873, util=42.31%

Edit: I'm going to go out on a limb and say it's using a single disk... Not sure if it was doing this with the near layout.

Does it matter if these tests are being done on an existing file?

And how is it taking 15+ minutes to do a random read/write test?! How is it not fully loading the cache and simply filling it over and over in a timely manner? It should be done in seconds.
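
For what it's worth, the random 4k tests above are IOPS-bound: roughly 1,000 IOPS x 4KiB is only a few mb/s, which is why pushing 4GiB takes that long. A sequential run is closer to what the per-drive mb/s figures in this thread measure; a sketch (the filename path is a placeholder on the mounted array):
Code:
fio --name=seqwrite --ioengine=libaio --direct=1 --bs=1M --iodepth=32 \
    --size=4G --rw=write --filename=/path/on/the/array/seqtest.fio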
 
Last edited by a moderator:
NVM, that was the messed-up SSD speeds. I can't take this for now. I'm going to take a nap for a bit and maybe work on it later. I can't even find basic information on how to check which superblock is in use or other things to understand what is going on. And all the info is 10-plus years old with no way of telling if it's outdated, let alone checking personally, because of the lazy nonsense that comes from these idiots. I'm getting really fed up with the useless Linux community.

FYI (or for example), I know you can use examine and other things to see the array, but none of them give superblock info like the version or other details. I need actual info to figure out the performance. This is as stupid as things get: endless idiots clamoring for ad money for their blogs, giving half-assed information at every point. You can't learn things on your own anymore even if you want to. The entire tech world is now filled with completely useless idiots. If it isn't yet, it will be, since Google does nothing but throw the lowest common denominator at you for its financial gain, and without knowing other search terms that aren't flooded out it's impossible to find solid information now. And those terms probably get flooded quickly too, by the same half-read Google results people use to make crappy websites...

Since you are fed up with the members of "the useless linux community" and find that "it's impossible to find solid information now", I suggest you contact an SI or SE firm, who would be MORE THAN HAPPY to slot you in for a few hundred hours of testing and equipment replacement to get you whatever performance you crave. DDN is an excellent choice.
 
This is for a home computer. Why on God's earth would I do that? It's bad enough that everyone is this irresponsible and can't document things properly enough to carry out their responsibilities. It would be wrong of me to give money to anyone until and unless that is resolved. You have to learn to do the work properly before making money, not the other way around. Why would I support a company taking advantage of such a situation?

And why do you appear to be taking offense at that statement? Or is that an ad for your company?!
 
Last edited by a moderator:
Just my opinion (but I'm right ... a lot)... fio isn't bad, but it can be complicated to get "right". For general filesystem benchmarking I'd use bonnie++. The ReWrite figure is a good indicator of general mixed-use speed (what the average consumer will likely experience).

I've been doing this for years. While other tools are more "exact"... they usually just prove what I already found out with a quick bonnie++ test.

(bonnie++ isn't bonnie... avoid bonnie)

And like many tools, bonnie++ from just one client is just that... a single-client test. I mention that because sometimes there are huge pathways, and if you don't use multiple clients you can't really saturate everything that might be restricted by other parts of the overall system. You won't see what the storage system can actually deliver.
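
A minimal bonnie++ run against the mounted array might look like this (the mount point and size are placeholders; the size should comfortably exceed installed RAM so the page cache can't flatter the result):
Code:
# -d: directory on the array under test   -s: total test size (use well above RAM)
# -n 0: skip the small-file creation tests and just measure block I/O and rewrites
bonnie++ -d /mnt/md10 -s 16g -n 0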
 
I think fio is also hanging up the drives indefinitely. Every time I use it I can't stop the array, even after the test is done. One more example of wonderful Linux software...

I'll try bonnie++

https://gzhls.at/blob/ldb/7/6/1/e/ee7a79edc97b885933949eeefdb2d9fbdf1b.pdf Here are the disk specs. It's the 251GB model.

I thought these sheets normally listed platter counts and the like as well...

Hopefully, if performance is that good using effectively a single disk, it will be good once it's working with all the disks properly.

I'm assuming the problem is that it's not using all the disks. How do I make it use all the disks at once?

From what I can tell the disks should be getting around 75 IOPS each, possibly an average of around 83. So I should be seeing roughly 250 IOPS write and 500 IOPS read if I had full disk utilization.
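
As a rough sanity check of that arithmetic, assuming ~83 random IOPS per disk and a three-mirror-pair layout:
Code:
per_disk=83
echo "expected write IOPS: $((per_disk * 3))"   # each write lands on one of 3 mirror pairs (2 disk writes each)
echo "expected read IOPS:  $((per_disk * 6))"   # reads can be spread across all 6 spindles
# prints 249 and 498, i.e. roughly the 250/500 estimated above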
 
Last edited by a moderator:
I assumed there was no fundamental difference between using one file system or another. I can't find anything on how to do it with any of them, though. I've tried putting on different file systems, including LVM. It does not change the performance at all. Or are you referring to something else?

This is incorrect; most modern filesystems include the LVM / data-redundancy layer built in (NTFS aside, which still uses Windows dynamic disks as its LVM instead).

With ZFS/BTRFS/etc. you pass the filesystem the raw disks and it handles placing data for redundancy/performance/parity/etc.; software-defined RAID generally outperforms and is more portable/reliable than hardware RAID these days, which is why the filesystems have taken it up. MDRAID is built on emulating / acting like a traditional RAID system - fixed width, defined disks, etc. - vs the others, which are far more flexible. This also gives you snapshots, performance management, caching, etc. etc.

LVM is just an alternative way of handling multiple devices; in my experience it's generally used to group together multiple "reliable" devices to increase queuing (SAN devices) or to make resizing more flexible/easier (VMs, etc). It's built into most modern filesystems.

That's part of why you're not finding a ton of data - most people aren't using MDRAID anymore, just because, well, you don't really need it - and if you do, chances are you really, really know what you're doing already (no offense intended; there are senior Linux devs who use it, but of all people, they know what they're doing).
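
A sketch of how a stripe of three mirrors over these six drives might look with ZFS (the pool name "tank" and ashift=12 are assumptions, and /dev/disk/by-id paths would be safer than sdX names):
Code:
sudo zpool create -o ashift=12 tank \
    mirror /dev/sdc /dev/sdd \
    mirror /dev/sde /dev/sdf \
    mirror /dev/sdg /dev/sdh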
 
most people aren't using MDRAID anymore

I still use it on a RAID 6 array I created 7 or so years ago at work; however, that is a small 10 x 2TB array which I could now replace with a single drive, a RAID 1 of two drives, or even a 3-way RAID 1 for extra redundancy. On new storage I am using ZFS for all arrays. I used to tune the mdadm RAID parameters, but that was very long ago.
 
Last edited:
I still use it on a RAID 6 array I created 7 or so years ago at work; however, that is a small 10 x 2TB array which I could now replace with a single drive, a RAID 1 of two drives, or even a 3-way RAID 1 for extra redundancy. On new storage I am using ZFS for all arrays. I used to tune the mdadm RAID parameters, but that was very long ago.

Yeah, legacy setups will keep it - but I haven't seen anyone intentionally build a ~new~ one since at least 2014, off the top of my head - and that was appliance software that was written poorly.
 
Which RAID setup with 6 disks gets the best performance in practice? Shouldn't they all be using as many disks as possible normally? Can anything at least get the 3x read/write that a RAID 10 can potentially reach?
 
Last edited by a moderator:
Which RAID setup with 6 disks gets the best performance in practice? Shouldn't they all be using as many disks as possible normally? Can anything at least get the 3x read/write that a RAID 10 can potentially reach?

I’d try ZFS with its single parity configuration for that, most likely. RAIDZ1. But write performance will suffer with that configuration
 
I’d try ZFS with its single parity configuration for that, most likely. RAIDZ1. But write performance will suffer with that configuration
No reason not to do a stripe of mirrors with ZFS as well to keep the raw advantages of the disk arrangement while also being backed up by ZFS. I do a four-disk array for my primary NAS this way, which is then backed up to a parity-based array using cheaper disks.
 
Example with two pools, freshly scrubbed:

Code:
  pool: BackupPool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 03:22:52 with 0 errors on Thu Aug 20 01:03:58 2020
config:

        NAME                                 STATE     READ WRITE CKSUM
        BackupPool                           ONLINE       0     0     0
          raidz1-0                           ONLINE       0     0     0
            ata-ST8000DM004-2CX188_WG800ZNL  ONLINE       0     0     0
            ata-ST8000DM004-2CX188_WG802AEY  ONLINE       0     0     0
            ata-ST8000DM004-2CX188_WCT0PBWB  ONLINE       0     0     0
            ata-ST8000DM004-2CX188_WCT0P2Z7  ONLINE       0     0     0

errors: No known data errors

  pool: NAS
 state: ONLINE
  scan: scrub repaired 0B in 0 days 04:29:08 with 0 errors on Thu Aug 20 02:10:26 2020
config:

        NAME                                  STATE     READ WRITE CKSUM
        NAS                                   ONLINE       0     0     0
          mirror-0                            ONLINE       0     0     0
            ata-ST6000VN0033-2EE110_ZAD54QCR  ONLINE       0     0     0
            ata-ST6000VN0033-2EE110_ZAD5PKE2  ONLINE       0     0     0
          mirror-1                            ONLINE       0     0     0
            ata-ST6000VN0033-2EE110_ZAD5PGJ3  ONLINE       0     0     0
            ata-ST6000VN0033-2EE110_ZAD52R2T  ONLINE       0     0     0

errors: No known data errors
 
https://serverfault.com/questions/751752/poor-performance-with-linux-software-raid-10

This looks very much like a bottleneck caused by the single thread in md.

Why haven't they updated md to use more threads?! Is this the issue? I thought this stuff was up to date ages ago and always used multiple threads. Did the guy making it stop updating it?

https://wiki.archlinux.org/index.php/RAID

It uses the word "layer" over and over, but there's no definition of it anywhere - or it's hidden somewhere random. What is it referring to when it says "layer"?

My brain is literally trying to cannibalize itself trying to read this. Does anyone learn how to do proper technical writing anymore? We live in a post-millennium world where all of the advancements in documentation died and all cross-links have vanished from the world - let alone a basic understanding of writing and the audience. I'm still traumatized by those events. Is that what they were destroying on 9/11?! Did they kill all human knowledge of the need to define your document's own terms and/or how to cross-link something in a computer document to enhance previous writing techniques... It really may have been the world's most tragic event.

Does anyone remember how much better documentation was back in the 90's and earlier?!

BTW, I'm completely shocked that all RAID setups aren't multi-core and much more advanced than this. I can make it flush nearly 500mb/s for about 10gb of data by stopping and starting. I'm surprised they can't make RAID do this automatically. It would be equal to or better than SSDs if it could be sustained.
 
Last edited by a moderator:
https://serverfault.com/questions/751752/poor-performance-with-linux-software-raid-10



Why haven't they updated md to use more threads?! Is this the issue? I thought this stuff was up to date ages ago and always used multiple threads. Did the guy making it stop updating it?
Because no one uses MDRAID anymore. It was also written in the 90s. No idea who the maintainer is on it, but it's certainly not high priority - people have moved to modern filesystems instead of RAID. If you actually need old school RAID, you buy a RAID card.
https://wiki.archlinux.org/index.php/RAID

It uses the word "layer" over and over, but there's no definition of it anywhere - or it's hidden somewhere random. What is it referring to when it says "layer"?
Different places, different uses - most of the time they're referring to doing true native RAID 10 / 50 / etc instead of a RAID5 and then mirroring it, etc
My brain is literally trying to cannibalize itself trying to read this. Does anyone learn how to do proper technical writing anymore? We live in a post-millennium world where all of the advancements in documentation died and all cross-links have vanished from the world - let alone a basic understanding of writing and the audience. I'm still traumatized by those events. Is that what they were destroying on 9/11?! Did they kill all human knowledge of the need to define your document's own terms and/or how to cross-link something in a computer document to enhance previous writing techniques... It really may have been the world's most tragic event.
It's an open source project and an open source wiki - there's no paid writer for Arch as far as I know, and it is what it is. I followed it just fine, but I know storage - that's my career. If you're reading the wiki for Arch, it's somewhat assumed you know what you're doing - triply so if you're reading the mdraid wiki. Go dig up RedHat docs - they're paid and tend to be ... better... for laymen.
Does anyone remember how much better documentation was back in the 90's and earlier?!

BTW, I'm completely shocked that all RAID setups aren't multi-core and much more advanced than this. I can make it flush nearly 500mb/s for about 10gb of data by stopping and starting. I'm surprised they can't make RAID do this automatically. It would be equal to or better than SSDs if it could be sustained.
They are. No one uses MDRAID anymore. ZFS is multi-threaded, BTRFS is multi-threaded, ReFS is multi-threaded, etc. etc. Seriously, my notes on MDRAID were last updated / accessed in 2011 - NINE years ago. Back then, ZFS was really Solaris-only, BSD had a half-assed implementation, and Linux didn't even try.

Throughput is easy; IOPS are hard. RAID will never equal SSDs, based on the fact that almost all use cases people care about (eg: have money in) require high IOPS first - spinning disks (unless you literally have 10,000+ of them - I just decommissioned a system with that) can't get there; only SSDs can. Once IOPS are an afterthought, then you go back to caring about throughput and/or latency, which is why NVMe is now king (it gets all three at once, at a cost), with SAS / SATA SSDs as secondary considerations.

You're doing what we used to do circa mid-2000s. Fifteen years ago people dug into this a lot - and then, we had single core systems and had just gotten AMD64.

Seriously, if you're doing this now, you use a filesystem with data reliability built in. If you need old-school raid for some reason, you buy a RAID card (again; all things are cyclical) and let it handle things.
 
So, I tried to get ZFS, but it seems not to be available on Fedora atm (or it says the daemon isn't working). Now I'm trying Btrfs, but I used the command in the tutorial and it simply put the file system on a bunch of disks and didn't assemble them into a single drive. It shows up as 6 different drives with no way to access the data?! Why does it even have a RAID 10 option if it doesn't make a RAID 10? Is this only for hardware RAID?

Code:
$ sudo mkfs.btrfs -m RAID10 -d RAID10 -f /dev/sd[cdefgh]
btrfs-progs v5.7
See http://btrfs.wiki.kernel.org for more information.

Label:              (null)
UUID:               9c75d6d9-1ac5-4030-b058-308682e21d18
Node size:          16384
Sector size:        4096
Filesystem size:    1.37TiB
Block group profiles:
  Data:             RAID10            3.00GiB
  Metadata:         RAID10          511.88MiB
  System:           RAID10            7.88MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Runtime features: 
Checksum:           crc32c
Number of devices:  6
Devices:
   ID        SIZE  PATH
    1   233.82GiB  /dev/sdc
    2   233.82GiB  /dev/sdd
    3   233.82GiB  /dev/sde
    4   233.82GiB  /dev/sdf
    5   233.82GiB  /dev/sdg
    6   233.82GiB  /dev/sdh

And I'm using software RAID so that it can potentially be transferred between systems. And it costs money to get a RAID card.
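
For what it's worth, those six devices should already be one Btrfs filesystem; mounting any one member device brings up the whole multi-device volume. A sketch (the mount point is a placeholder):
Code:
sudo mkdir -p /mnt/btrfs10
sudo mount /dev/sdc /mnt/btrfs10

# list all member devices of the one filesystem and its RAID10 profiles
sudo btrfs filesystem show /mnt/btrfs10
sudo btrfs filesystem usage /mnt/btrfs10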
 
Last edited by a moderator:
Now it won't let me paste anything into the folder. Is that normal? I was trying to copy and paste a game folder into it to see how much performance it got.

Edit: NVM, it seems to be the permissions.

It's getting the same 133mb/s and slowing.

How do you get a raid 10 of 6 disks to get the 300mb/s plus it should be getting?

It should be running 210-399mb/s easily. Mixed writes on these drives are 70mb/s. It's literally running at the bare minimum at all times, only 40mb/s. What am I not setting up correctly?

It's going up and down between 127 and 133ish. So it seems to be running from all 3 disks at near 40mb/s or so.

I set it to this, and now it's doing the same burst mode it was doing before but still slowing down to a lower speed by the end. But it's not maxing out a CPU core, so it's dealing with that better.

Code:
vm.dirty_background_bytes = 17179869184
vm.dirty_background_ratio = 0
vm.dirty_bytes = 17179869184
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 0
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200

It's slowing down around 160mb/s, but it's not going back up. And it's not using a lot of RAM - only 2.2gb over the entire system. The file transfer ended around 150mb/s. That might still have been because I had copied and pasted previously.
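
If this writeback tuning ends up helping, a sketch of making whichever values work survive a reboot (using the ones from the block above as an example; the file name under /etc/sysctl.d is arbitrary):
Code:
# note: setting the *_bytes knobs automatically zeroes the corresponding *_ratio ones
sudo tee /etc/sysctl.d/90-writeback.conf >/dev/null <<'EOF'
vm.dirty_background_bytes = 17179869184
vm.dirty_bytes = 17179869184
EOF
sudo sysctl --system   # reload settings from all sysctl config files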
 
Last edited by a moderator:
It's a copy of War Thunder's main game folder. It's a mix.

My drive also seems to be 1/3 full and may not be retaining files - at least not on the surface. Or it may be misreporting used and free space. It doesn't change when I add a file.

It also keeps adding infinite versions of the folder to the mount area of my file system if I click on it or try to mount it in GNOME Disk Utility. Apparently Fedora 32 and Caja aren't designed with Btrfs in mind or something, despite its age. Edit: the mounts were preserved over a logout, and so was the file. Edit 2: they reset on a system reboot, and so did the file, thankfully.

Do I have to purposely misalign the file system in order to get full performance?
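
On the used/free confusion: plain df can look odd over a multi-device Btrfs RAID10 because every block of data is stored twice, so raw allocation is roughly double the file data. These commands show the per-profile and per-device breakdown (same placeholder mount point as above):
Code:
sudo btrfs filesystem df /mnt/btrfs10
sudo btrfs device usage /mnt/btrfs10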
 
Last edited by a moderator:
Smaller files have file-op overhead; that requires IOPS more than throughput.

Try copying over a single 50G file or the like. That’ll be pure throughput. Let’s sanity check the operation first.
 
It started at 500mb/s and ended a bit over 400mb/s with a 53.7gb dd test image.

How do I let it sanity check it, or is that not a program? There is a program called qemu-sanity-check.

What is stupid is that it does not burst the War Thunder folder above 133mb/s unless I stop and start it once. I'm surprised it cannot load more into RAM and use the cache more.

Edit: This little change and it's now sustaining 580mb/s...
Code:
$ sudo sysctl -a | grep dirty
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 17179869184
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 0
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200

Ended at around 474mb/s...

Now it's starting at 622mb/s and slowly dropping... It's faster than my ssd. It's ending at the same 475ish mb/s. Transfers are around 105 seconds for 50gb.

The warthunder folder still only does 133mb/s on the first copy job.

Each copy job of the test.img is faster than the last (maxing out at around a 650mb/s starting speed). Should I go out on a limb, put my tinfoil hat on, and wonder if these programs are deliberately not written correctly, to stop older hardware from staying relevant....

Is there a way to tune Btrfs to get the most out of that file type? War Thunder might be made up of more small files than I thought. I figured all the vehicle data would be a decent size.

And if you can get that much performance that easily, I'm surprised they don't just chunk game folders into large files for transfers (assuming that would work), then just unpackage them when they have to be used and keep a base file around for transfer. Or at least have support for such things. In fact, is there a way to do that? I would love to have improved file transfers by turning it all into a single large file, assuming that would work.
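
Packing a directory of small files into one archive is easy enough to try; tar turns it into a single large sequential stream (all paths below are placeholders):
Code:
# pack once...
tar -cf /mnt/btrfs10/warthunder.tar /path/to/WarThunder
# ...and unpack when it actually needs to be used
tar -xf /mnt/btrfs10/warthunder.tar -C /path/to/restore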

The other problem is that it is acting like the disks are 2/3 full. Is this from inodes being added on a per-disk basis and then overwhelming the RAID volume?



Or do I not want to use the first disk as the mount point?

Code:
$ sudo fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.21
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=8316KiB/s,w=2718KiB/s][r=2079,w=679 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=24881: Sat Aug 22 22:19:17 2020
  read: IOPS=694, BW=2777KiB/s (2843kB/s)(3070MiB/1132153msec)
   bw (  KiB/s): min= 1400, max= 9504, per=100.00%, avg=2779.77, stdev=314.98, samples=2261
   iops        : min=  350, max= 2376, avg=694.94, stdev=78.75, samples=2261
  write: IOPS=231, BW=928KiB/s (950kB/s)(1026MiB/1132153msec); 0 zone resets
   bw (  KiB/s): min=  344, max= 3136, per=100.00%, avg=929.08, stdev=150.48, samples=2261
   iops        : min=   86, max=  784, avg=232.26, stdev=37.62, samples=2261
  cpu          : usr=0.85%, sys=5.33%, ctx=661640, majf=0, minf=7
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=2777KiB/s (2843kB/s), 2777KiB/s-2777KiB/s (2843kB/s-2843kB/s), io=3070MiB (3219MB), run=1132153-1132153msec
  WRITE: bw=928KiB/s (950kB/s), 928KiB/s-928KiB/s (950kB/s-950kB/s), io=1026MiB (1076MB), run=1132153-1132153msec

I guess one downside is that it's not giving the disk-by-disk utilization it did with mdadm. If the md version was only using one disk plus some random other bit, I'm going to imagine this isn't using more than that either and isn't spreading the writes across more disks when doing normal files.

Couldn't they logically make writes on RAID 10 go across all 6 disks and then handle the redundancy afterwards, like a file sync? Then it could run after the fact.
 
Last edited by a moderator:
How many files are in the game directory? The big file sounds right, which makes me think we have a ton of little files; and if you want to optimize that, well... use NVMe and tune the crap out of the metadata and file op subsystem.

from the cli:
find <directory> -type f | wc -l
 
Oh, and this is filesystem redundancy; it's putting things in chunks, like RAID 10, across the disks. Where those chunks land, there are utilities to figure out, but you're generally not supposed to worry about it. That's part of the point.
 