ZFS suitable for media streamer?

Again, you're confusing instruction length with data structure size, and then inferring some kind of performance conclusion from that. A 64-bit processor is not automatically twice as fast as a 32-bit processor. CPU performance is way, WAY more complex than that. True, you can load a larger data structure into a single register, but that doesn't mean performance is better.

A poorly designed 64-bit processor may actually have to do more work than a 32-bit processor. It just depends on the architecture.
 
Just because ZFS is a 128-bit file system doesn't mean that any significant amount of the data processed when reading/writing is 128-bit numbers, at least not in its basic form. Perhaps data can be loaded and stored in 128-bit chunks, but then you need to look at SSE performance, which blurs the line between register widths on 32- and 64-bit CPUs.
 
And if the CPU is 128 bits, the entire Data1 field gets loaded at once and the checksum can be calculated in one step.

A checksum calculation is not baked into the CPU as a single instruction, nor is the block size only 128 bits.

A ZFS block varies in size, from (I think) 4 KB up to some upper limit in the megabytes. Checksum calculation is hugely complex in human terms but easy for a CPU. That doesn't mean it happens in one step.

Computing a checksum over a single ZFS block involves, as a wild guess, thousands or tens of thousands of register accesses. I don't know the exact number, but I know it's not one step.
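As a rough sanity check, assuming the SHA-256 checksum discussed below: SHA-256 consumes data in 64-byte chunks and runs 64 rounds per chunk, so a 128 KB block works out to 2048 chunks × 64 rounds = 131,072 round computations, and every round reads and writes a handful of 32-bit registers. Definitely not one step.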

Just look at the algorithm for SHA-256 and you get an idea of the computations involved, as well as the data sizes: http://csrc.nist.gov/groups/STM/cavp/documents/shs/sha256-384-512.pdf

Oh, and BTW, SHA-256 produces a 256-bit checksum. I know you want to draw certain conclusions from that, but don't. It doesn't require a 256-bit CPU, and it's not dog slow on a 32-bit CPU. A hint from the PDF: "Six logical functions are used in SHA-256. Each of these functions operates on 32-bit words and produces a 32-bit word as output."
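To make that hint concrete, here are those six functions written out in C (a direct transcription of the FIPS 180 definitions, not code from ZFS). Every operation is a plain 32-bit AND/XOR/shift/rotate that any 32-bit CPU executes natively:

```c
#include <stdint.h>

/* The six SHA-256 logical functions from FIPS 180. Every operand
 * and every result is a 32-bit word. */
static uint32_t rotr(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

static uint32_t ch(uint32_t x, uint32_t y, uint32_t z)  { return (x & y) ^ (~x & z); }
static uint32_t maj(uint32_t x, uint32_t y, uint32_t z) { return (x & y) ^ (x & z) ^ (y & z); }
static uint32_t big_sigma0(uint32_t x)   { return rotr(x, 2)  ^ rotr(x, 13) ^ rotr(x, 22); }
static uint32_t big_sigma1(uint32_t x)   { return rotr(x, 6)  ^ rotr(x, 11) ^ rotr(x, 25); }
static uint32_t small_sigma0(uint32_t x) { return rotr(x, 7)  ^ rotr(x, 18) ^ (x >> 3);  }
static uint32_t small_sigma1(uint32_t x) { return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10); }
```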
 
This is now way OT but rather interesting. I design CPUs for a living. Brutalizer is correct when he says that a 64-bit CPU can do arithmetic on a 128-bit value in fewer instructions than a 32-bit CPU. (I think that's what he's saying, yes?) But as someone pointed out, the amount of 128-bit arithmetic in ZFS code is probably only a tiny fraction of the execution time, so that's mostly irrelevant.
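To make the point concrete, here's a minimal sketch (my own illustration, nothing from the ZFS source): adding two 128-bit integers takes two limb operations on a 64-bit machine but four on a 32-bit machine, with the carries propagated by hand.

```c
#include <stdint.h>

/* A 128-bit value as two 64-bit limbs, least significant first. */
typedef struct { uint64_t lo, hi; } u128;

/* 64-bit machine: one add plus one add-with-carry (carry out of hi ignored). */
static u128 add128_as_64bit_limbs(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);  /* (r.lo < a.lo) detects the carry */
    return r;
}

/* 32-bit machine: four add-with-carry steps over 32-bit limbs. */
static void add128_as_32bit_limbs(const uint32_t a[4], const uint32_t b[4],
                                  uint32_t r[4])
{
    uint32_t carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t sum = (uint64_t)a[i] + b[i] + carry;
        r[i] = (uint32_t)sum;
        carry = (uint32_t)(sum >> 32);  /* the "bookkeeping" between limbs */
    }
}
```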

Also, some of you are confusing instruction length with word size. The term "32-bit CPU" does not refer to instruction length, at least not in the x86 world, which has variable-length instructions. In 32-bit mode an x86 instruction can be anywhere from 1 to 15 bytes, and the same 15-byte limit applies in 64-bit mode, but 64-bit code tends to be larger because of REX prefixes and because some offsets and immediate values grow from 4 bytes to 8. This is why some programs compiled in 64-bit mode can actually run slower than the same code compiled for 32-bit mode: the instructions are larger, so fewer of them fit into the instruction cache.

As for ZFS, I think I've talked myself out of using it for media storage.
 
This is now way OT but rather interesting. I design CPUs for a living. Brutalizer is correct when he says that a 64-bit CPU can do arithmetic on a 128-bit value in fewer instructions than a 32-bit CPU. (I think that's what he's saying, yes?)
This is exactly what I am trying to say. (And if you use a 32-bit CPU, you also need to do some additional bookkeeping and administration to keep track of which bits you have processed and which you have not.) But yes, it is interesting. I think everybody thinks so, because we are geeks on a tech site. :)

But as someone pointed out, the amount of 128-bit arithmetic in ZFS code is probably only a tiny fraction of the execution time, so that's mostly irrelevant.
Yes, so there should be a performance hit using 32-bit CPUs instead of 64-bit CPUs. I have seen 20-30 MB/s ZFS speeds on 32-bit CPUs, and hundreds of MB/s on 64-bit CPUs. I am not talking about the ARC, but disks. A goal of ZFS was to reach platter speed, so if you had 10 disks it would go 10x faster, and I think ZFS has made substantial strides in that respect. But I would like to see 10 disks on a 32-bit CPU with 4 GB of RAM. On 64-bit CPUs it should reach 400 MB/s or so.
 
This is exactly what I am trying to say. (And if you use a 32-bit CPU, you also need to do some additional bookkeeping and administration to keep track of which bits you have processed and which you have not.) But yes, it is interesting. I think everybody thinks so, because we are geeks on a tech site. :)

I should qualify that I wasn't considering SSE and other extensions like that. Some instructions, even on a so-called "32-bit CPU", can operate on 64-bit or even 128-bit data. However, I don't know whether a typical compiler would use those instructions without specially crafted code.
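For example (a sketch assuming SSE2 intrinsics; the function name is mine): the XMM registers are 128 bits wide even in 32-bit mode, so a single instruction can operate on 128 bits of data, but the compiler generally won't emit it unless you ask:

```c
#include <emmintrin.h>  /* SSE2 intrinsics, present on late-era 32-bit x86 too */

/* A single paddd instruction performs four 32-bit additions at once,
 * regardless of whether the surrounding code is 32-bit or 64-bit. */
static __m128i add_four_uint32(__m128i a, __m128i b)
{
    return _mm_add_epi32(a, b);
}
```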

The difference between an 8-bit CPU and a 16-bit CPU was pretty clear-cut back in the day. But "32-bit" vs. "64-bit" is a little fuzzier. That number has never referred to the physical address width: the 16-bit 8088 had a 20-bit address bus, and the 80286 had 24. Even some 32-bit Intel CPUs had PAE and PSE-36, which could address 64 GB of RAM, and today's x86-64 CPUs have an address bus somewhere south of 64 bits wide (I think 48 is typical).

The virtual address space does indeed widen from 32 to 64, which makes a big difference for memory-hungry code like ZFS.
 
This is exactly what I am trying to say. (And if you use a 32-bit CPU, you also need to do some additional bookkeeping and administration to keep track of which bits you have processed and which you have not.) But yes, it is interesting. I think everybody thinks so, because we are geeks on a tech site. :)


Yes, so there should be a performance hit using 32-bit CPUs instead of 64-bit CPUs. I have seen 20-30 MB/s ZFS speeds on 32-bit CPUs, and hundreds of MB/s on 64-bit CPUs. I am not talking about the ARC, but disks. A goal of ZFS was to reach platter speed, so if you had 10 disks it would go 10x faster, and I think ZFS has made substantial strides in that respect. But I would like to see 10 disks on a 32-bit CPU with 4 GB of RAM. On 64-bit CPUs it should reach 400 MB/s or so.

There is a performance and stability hit on a 32-bit CPU, but not for the reason you are leaning toward (the 128-bit file system). As others have stated, it has more to do with memory limitations.

Additionally, if raw performance is what you're after, don't be misled into thinking that 10 disks in ZFS are ~10x faster than a single disk. ZFS is primarily concerned with data integrity, not performance.
 
What CPU is this anyway that doesn't do 64-bit, and what was it compared to? An old Pentium III vs. a Core 2, or what?
 
The "book keeping" is called RAM.
You have to keep track of which 32 bit pattern you are processing, and which you have not processed yet. etc. These bits are typically stored in a register, not in RAM.

Just like when you manually subtract 100 - 99, you have to keep track of borrowed digits (keep a 1 in memory, and so on). So the bookkeeping increases slightly.

There is a performance and stability hit on a 32-bit CPU, but not for the reason you are leaning toward (the 128-bit file system). As others have stated, it has more to do with memory limitations.
And how do you know that? Have you studied the 32-bit ZFS code, or what?

My point is, if you have a dataset that is not helped by the ARC (reading new data all the time, so the cache will never be used), then ZFS will degrade to platter speed. In that case ARC size is irrelevant: it does not matter whether you have a 2 GB ARC or 64 GB. And in that case (where the cache is unused), why do you get 20 MB/s with 32-bit CPUs, and hundreds of MB/s with 64-bit CPUs?

I am convinced that if you had PAE or something similar that let you use 8 GB of RAM with a 32-bit CPU, you would still get 20 MB/s. In short, no matter how much RAM you have with a 32-bit CPU, you will still get 20 MB/s. Thus RAM size is not a parameter in this question. And the ZFS devs never said that RAM size is the bandwidth limitation on 32-bit CPUs; they specifically said that the code was not that good. I guess that is because 32-bit code managing 128-bit-wide data gets punished performance-wise.

Just as when a 128-bit GPU has to handle 512-bit data: it takes four times as many instructions for the 128-bit GPU, while a 512-bit GPU does it in one transfer. Four times as much work.

One way to settle this question for everyone is to look at the ZFS code; there you have all the answers. Another way is to ask a ZFS developer.


Additionally, if raw performance is what you're after, don't be misled into thinking that 10 disks in ZFS are ~10x faster than a single disk. ZFS is primarily concerned with data integrity, not performance.
Well, the ZFS devs had this as a goal: to reach disk platter speed. Have you not followed the ZFS mailing lists and ZFS blogs? Sun Microsystems engineers were encouraged to blog about ZFS and all sorts of other tech; I've followed the major ZFS blogs since the very beginning, many years back. Oracle engineers are not encouraged in the same way. If you want to follow the major ZFS blogs today, you can go here:
www.dtrace.org
Matt Ahrens (one of the fathers of ZFS) posts ZFS technical details there, and so do the DTrace creators. They are doing exciting work on ZFS now.

And yes, back then I saw benchmarks showing that ZFS reached platter speeds, which was one of the stated goals of the ZFS devs.
 
...
And how do you know that? Have you studied the 32-bit ZFS code, or what?

My point is, if you have a dataset that is not helped by the ARC (reading new data all the time, so the cache will never be used), then ZFS will degrade to platter speed. In that case ARC size is irrelevant: it does not matter whether you have a 2 GB ARC or 64 GB. And in that case (where the cache is unused), why do you get 20 MB/s with 32-bit CPUs, and hundreds of MB/s with 64-bit CPUs?

I am convinced that if you had PAE or something similar that let you use 8 GB of RAM with a 32-bit CPU, you would still get 20 MB/s. In short, no matter how much RAM you have with a 32-bit CPU, you will still get 20 MB/s. Thus RAM size is not a parameter in this question. And the ZFS devs never said that RAM size is the bandwidth limitation on 32-bit CPUs; they specifically said that the code was not that good. I guess that is because 32-bit code managing 128-bit-wide data gets punished performance-wise.


There is no 32-bit version of the ZFS source code; that is another misunderstanding. The difference is in how it is compiled. Also, the only thing 128-bit about ZFS is addressing, and the file system does not spend much time manipulating that kind of bookkeeping data.

The majority of what ZFS does is either I/O-bound or memory-bound; very little of it could be considered CPU-bound. The issue ZFS has with 32-bit is that the code base makes heavy use of virtual memory addressing, and on modern 32-bit systems (not considering PAE at the moment) the virtual address space is close in size to physical memory, which is not the case on 64-bit systems. This causes overhead in managing the ARC.
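One way to see which limit PAE does and does not move (a trivial sketch, not ZFS code): PAE widens physical addresses, but the size of a single virtual address, which is what the ARC has to map its buffers into, stays at 32 bits:

```c
#include <stdio.h>

int main(void)
{
    /* A 32-bit build prints 4: at most 2^32 = 4 GiB of virtual addresses,
     * however much physical RAM PAE exposes, so cached buffers must be
     * mapped and unmapped constantly. A 64-bit build prints 8, and the
     * ARC can simply keep everything mapped. */
    printf("virtual address size: %zu bytes\n", sizeof(void *));
    return 0;
}
```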
 
Just as when a 128-bit GPU has to handle 512-bit data: it takes four times as many instructions for the 128-bit GPU, while a 512-bit GPU does it in one transfer. Four times as much work.

That's fantastic that you can divide numbers and understand that 512/4 = 128. That doesn't imply there is 4x as much work. You're making a gross oversimplification, and it is incorrect.


Well, the ZFS devs had this as a goal: to reach disk platter speed. Have you not followed the ZFS mailing lists and ZFS blogs? Sun Microsystems engineers were encouraged to blog about ZFS and all sorts of other tech; I've followed the major ZFS blogs since the very beginning, many years back. Oracle engineers are not encouraged in the same way. If you want to follow the major ZFS blogs today, you can go here:
www.dtrace.org
Matt Ahrens (one of the fathers of ZFS) posts ZFS technical details there, and so do the DTrace creators. They are doing exciting work on ZFS now.

And yes, back then I saw benchmarks showing that ZFS reached platter speeds, which was one of the stated goals of the ZFS devs.

ZFS can get excellent speeds, but performance depends on the number of vdevs as well as the physical arrangement of your disks. Like your 128/4 = 32 fascination, you are oversimplifying things. Focusing only on reaching platter speed ignores the other goals ZFS was engineered for.

I encourage you to continue following ZFS related topics and blogs.
 