How NOT to set up your ZFS storage

Hi all, I wanted to share an unfortunate experience I had setting up ZFS storage, in the hope that it helps others avoid the same mistakes.

It all begins with an "old" socket 775 Asus P5W-DH motherboard I had lying around. Since it is on the OpenSolaris HCL, I thought it would be a good idea to use it as the mainboard for my new storage server. I didn't have a spare CPU, so I bought a Q8300. Besides that, I bought some "classic" ZFS gear: a 1068E SAS card and 9 ("magic number"?) Samsung F4 drives, which I flashed to the new firmware before creating the zpool. I also used the "ashift"-patched zpool binary to account for the 4K sectors.

The OpenIndiana b148 install didn't complete successfully: GRUB complained that it could not find the dedicated boot drive I had set up on the integrated ICH7 controller. It turned out that the P5W-DH has a SATA port splitter that GRUB doesn't like in a certain setting, and moving a jumper on the board (to the "BIG" setting, for those curious) solved the issue. I had a system freeze some time later, but I blamed it on the "young" b148 release. As we'll see, that was my first fatal mistake: I went on without investigating further.

I believed my system was stable, as everything was left at the defaults in the mainboard BIOS. Three passes of memtest made me think that it was indeed stable.

I transferred my data to the ZFS server and, second fatal mistake, deleted the originals as I began to feel confident in my new system.

It turned out that this old P5W-DH doesn't really support the "new" Q8300. It was silently corrupting my data as I transferred it from my other computer. ZFS cannot save your data if it gets corrupted in RAM or in the chipset before it is written to the drives.
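To see why ZFS checksums could not catch this, here is a toy model in Python (a sketch, not ZFS code). SHA-256 stands in for the per-block checksum ZFS stores (fletcher4 is the usual default for data), and the hypothetical `write_block` flips a bit *before* the checksum is computed, as faulty RAM or an unstable chipset would. The scrub then sees a block that matches its own checksum perfectly.

```python
import hashlib


def checksum(block: bytes) -> str:
    # SHA-256 stands in for the per-block checksum ZFS stores alongside the data.
    return hashlib.sha256(block).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    # Simulate a single bit flipped by bad RAM or an unstable chipset.
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def write_block(data: bytes, corrupt_in_ram: bool) -> tuple[bytes, str]:
    # Hypothetical write path: the corruption happens BEFORE the checksum is
    # computed, so the stored checksum describes the already-corrupted bytes.
    if corrupt_in_ram:
        data = flip_bit(data, 3)
    return data, checksum(data)


def scrub_ok(stored: bytes, stored_sum: str) -> bool:
    # A scrub can only check that the on-disk bytes still match their checksum.
    return checksum(stored) == stored_sum


original = b"some file contents"
stored, stored_sum = write_block(original, corrupt_in_ram=True)

print(scrub_ok(stored, stored_sum))  # True: the scrub finds nothing wrong
print(stored == original)            # False: the data no longer matches the source
```

Checksums protect data from the moment they are computed; anything that goes wrong upstream of that point is, from ZFS's point of view, valid data.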

The system finally crashed after a few days, and a chill ran down my spine when I remembered that, with a 65nm-generation Q6600, the board was only stable below 338 MHz FSB. The Q6600 is a 266 MHz FSB part and the Q8300 a 333 MHz one, so 333 MHz is very close to that 338 MHz limit. In fact, the Q8300 is not stable at stock settings on this board; I had to pull the FSB back down to 266 MHz. The real limit is somewhere in between, don't ask me where.

Scrubbing fixed most of the corrupted files, but not all. Some of the files I hadn't deleted from the source show a different md5 sum. So all in all, I ended up with corrupted data when all I wanted was to move it to a safe place.
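Comparing the copy against the source before deleting anything is easy to automate. A minimal Python sketch, assuming both trees are reachable as local paths (the function names and demo files are hypothetical): it md5-hashes every file under the source and reports files that are missing or differ in the copy.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    # Hash the file in chunks so large files don't have to fit in memory.
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(source: Path, copy: Path) -> list[str]:
    # Walk the source tree and compare each file with its counterpart in the copy.
    problems = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        dst_file = copy / rel
        if not dst_file.is_file():
            problems.append(f"missing: {rel}")
        elif md5_of(src_file) != md5_of(dst_file):
            problems.append(f"differs: {rel}")
    return problems


# Demo on throwaway directories standing in for the source PC and the ZFS share.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    src, dst = Path(a), Path(b)
    (src / "ok.txt").write_bytes(b"same")
    (dst / "ok.txt").write_bytes(b"same")
    (src / "bad.txt").write_bytes(b"original")
    (dst / "bad.txt").write_bytes(b"origina1")
    (src / "gone.txt").write_bytes(b"x")
    result = find_mismatches(src, dst)

print(result)  # ['differs: bad.txt', 'missing: gone.txt']
```

Run it while the originals still exist; an empty list is the minimum bar before deleting anything.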

Don't get me wrong, OpenIndiana does the job I want from it, and I think ZFS is great. But bad luck and poor planning can still lead to data loss.

- Don't reuse old hardware, especially a mix of old and recent parts. It's just not worth it. I really hesitated about buying a socket 1156 board; I should have.
- Test your storage system extensively for stability. I personally use a mix of memtest and prime95. I should have done so this time, even though my system was running at stock settings.
- Do NOT delete your original data before you have extensively tested your new storage system...

Cheers
 
There is a reason servers use ECC RAM, and this is exactly it. At least you learned something from it all.
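ECC memory stores extra check bits with every word so the memory controller can correct a single flipped bit on the fly. Real ECC DIMMs use a SECDED code over 64-bit words (8 check bits per 64 data bits); the toy Python sketch below uses the much smaller Hamming(7,4) code to show the principle. It corrects one flipped bit but, unlike SECDED, cannot detect double-bit errors.

```python
def encode(d: list[int]) -> list[int]:
    # Hamming(7,4): 4 data bits, 3 parity bits at positions 1, 2 and 4.
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]  # codeword, positions 1..7


def syndrome(code: list[int]) -> int:
    # XOR of the positions of all set bits: 0 for a valid codeword,
    # otherwise the 1-based position of a single flipped bit.
    s = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            s ^= pos
    return s


def decode(code: list[int]) -> list[int]:
    # Correct a single-bit error (if any), then extract the data bits.
    fixed = list(code)
    s = syndrome(fixed)
    if s:
        fixed[s - 1] ^= 1
    return [fixed[2], fixed[4], fixed[5], fixed[6]]


data = [1, 0, 1, 1]
stored = encode(data)
stored[4] ^= 1                 # a bit flips at position 5 while the word sits in RAM
print(decode(stored) == data)  # True: the flip is corrected transparently
```

Without the check bits, that flipped bit would have gone straight into the ZFS write path and been checksummed as if it were correct.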

I don't think you should say "don't reuse hardware", as I purchase used stuff all the time without any problems. You just have to be careful not to buy failing, incompatible or damaged hardware.
 
odditory: sure, at least keep your most important data on separate storage. Fortunately, an inner fear kept me from transferring the content I care about.

ghost6303: obviously my "don't reuse hardware" advice is something to keep in mind, not to take literally. As for ECC memory, I agree that it would probably have saved me from data corruption. For a home system it seemed like overkill, but on reflection, either you do it right or you don't. Nice engine with lousy tyres.
 
Bummer, man...

AMD allows ECC RAM on the Athlon and Phenom lines, whereas Intel requires a Xeon for ECC support...

To my knowledge.

Hopefully you didn't lose anything too valuable.
 
With AMD you also need a motherboard that supports ECC RAM, and not all do.

@OP: Good reason to research and invest in solid storage.
 
The system finally crashed after a few days, and I had a chill through the spine when I remembered that the board was stable only below 338 MHz FSB with a 65nm generation Q6600. The Q6600 is a 266 MHz FSB unit, the Q8300 is a 333 MHz one, so 333 MHz is pretty close to 338. In fact, the Q8300 is not stable at stock settings on this board, I had to pull the FSB back down to 266 MHz. The limit is somewhere in between, don't ask where.

Do you mean the core clock or the FSB? The FSB on those should be 1066 MHz or 1333 MHz. The core clock speeds you gave are correct, though. ;)
 
Actually, the FSB clock frequency is 266 or 333 MHz, but it is a quad data rate bus (it transfers data four times per clock cycle), resulting in a "perceived" data frequency of 1066 or 1333 MHz (more precisely, 1066 or 1333 MT/s).
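The arithmetic behind the last two posts, as a small Python sketch: base clock times four transfers per cycle gives the familiar 1066/1333 figures, and comparing the Q8300's stock clock with the roughly 338 MHz limit reported earlier for this board shows how thin the margin was. The exact base clocks (800/3 and 1000/3 MHz) are the standard values for these bus speeds; since the 338 MHz limit was measured with a different CPU, the headroom figure is only indicative.

```python
TRANSFERS_PER_CLOCK = 4  # quad data rate: four transfers per bus clock


def effective_rate(base_clock_mhz: float) -> float:
    # Effective data rate in MT/s, which is what "1066" and "1333" describe.
    return base_clock_mhz * TRANSFERS_PER_CLOCK


def headroom_pct(base_clock_mhz: float, limit_mhz: float) -> float:
    # How far a base clock sits below a measured stability limit, in percent.
    return (limit_mhz / base_clock_mhz - 1) * 100


q6600_fsb = 800 / 3    # "266 MHz" is really 266.67 MHz
q8300_fsb = 1000 / 3   # "333 MHz" is really 333.33 MHz

print(f"{effective_rate(q6600_fsb):.1f}")  # 1066.7
print(f"{effective_rate(q8300_fsb):.1f}")  # 1333.3
# Against the ~338 MHz limit observed with a Q6600 on this board:
print(f"{headroom_pct(q8300_fsb, 338):.1f}")  # 1.4
```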
 
Some news

After a few days of normal use, my system got stuck in a panic loop. It crashed while trying to import the raidz pool during boot. So, on top of the data corruption, I thought I had lost everything for good.

I was able to import it read-only in Solaris 11 Express. I don't know whether the read-only import would have worked in OpenIndiana. I don't know why the zpool import was crashing, but I suspect the patched zpool binary *may* be related to the issue, as the panic messages referred to corrupted space maps.

So I decided to try another way to keep this holy "ashift=12". I first installed FreeBSD, created the zpool on gnop-generated pseudo-devices, and exported it. Then I installed Solaris 11 Express, and imported and upgraded the zpool. It still shows ashift=12: because the zpool was created on 4K virtual drives, the setting persists after a reboot. There is a lot of confusion about this.
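For readers wondering why anyone bothers: ashift is the base-2 logarithm of the smallest block ZFS writes to a vdev, and it is fixed when the vdev is created, which is why a pool built on 4096-byte gnop devices keeps ashift=12 once the pseudo-devices are gone. A small Python sketch of the arithmetic, assuming drives that advertise 512-byte sectors while using 4K physical sectors; the function names are illustrative and not part of any ZFS tool.

```python
def ashift_for(sector_size: int) -> int:
    # ashift is the base-2 exponent of the smallest block ZFS writes to a vdev:
    # 2**9 = 512 bytes, 2**12 = 4096 bytes.
    if sector_size <= 0 or sector_size & (sector_size - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_size.bit_length() - 1


def needs_read_modify_write(offset: int, length: int, physical_sector: int = 4096) -> bool:
    # A write that doesn't start and end on a physical-sector boundary forces
    # the drive to read the whole 4K sector, patch it and write it back.
    # Offsets are assumed relative to a partition that itself starts 4K-aligned.
    return offset % physical_sector != 0 or length % physical_sector != 0


# A drive reporting 512-byte sectors gets ashift=9, so ZFS may issue 512-byte
# writes at 512-byte-aligned offsets that are not 4K-aligned:
print(ashift_for(512), needs_read_modify_write(3 * 512, 512))     # 9 True
# With ashift=12 every write is a multiple of 4 KiB at a 4 KiB-aligned offset:
print(ashift_for(4096), needs_read_modify_write(2 * 4096, 4096))  # 12 False
```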

Some info about the motherboard as well: it turned out that the SATA port splitter is NOT supported in AHCI mode, and OpenIndiana or Solaris 11 Express won't install; you have to configure the ICH7R in IDE mode. So I connected my 9th drive to the JM363 controller port in AHCI mode, and installed Solaris 11 Express in IDE mode on one of the ICH7R ports. The EZ-Raid jumper has no "good" setting. And there is NO WAY of getting into the 1068E card's BIOS. Sometimes the mainboard just hangs on a blank screen after the 1068E initializes; I found that setting the PCI latency to 128 in the mainboard BIOS can help.

All in all, STAY AWAY from this board for your ZFS install. Its "supported" status in the OpenSolaris HCL is a bad joke. I ordered a SuperMicro board with ECC memory; enough is enough.
 
Actually the FSB clock frequency is 266 or 333 MHz, but it is a quad data rate bus (transfers data four times per clock cycle) resulting in a "perceived" data frequency of 1066 or 1333 MHz.

Ah, that's cool, thanks for telling me.
 