OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

how much RAM is recommended if not using deduplication?
 
As much as possible, because ZFS uses RAM as cache memory :)

But I think anything above 2 GB will work.
 
As much as possible, because ZFS uses RAM as cache memory :)

But I think anything above 2 GB will work.
Well, I can afford to max out my server with 32 GB, but I'd rather not if it's going to be wasted.
I need enough performance to saturate a gigabit Ethernet connection.
 
Depends on your working set (disk-data wise). If it's largely streaming media, no amount of RAM will help much. If 90% or more of your data fits in RAM, you should be good.
 
how much RAM is recommended if not using deduplication?

ZFS deduplication is realtime/online, which means that every read/write must be deduplicated.
You can activate it with low RAM, but you will quite soon reach a situation where the dedup tables are
too large for RAM and must be processed from disk.

A simple "delete a snap" can take a week under this condition.
The common suggestion for dedup is about 2-3 GB RAM per TB of data for deduplication, plus what is needed
for the storage server itself to be fast.

This is the reason that there are only very few use cases (for example, dedup ratios > 20) where dedup makes sense at all.
Activating compression and buying another disk is usually the better way.

Without dedup, you need about 1-2 GB for the OS itself. But then it is slow. Performance is mostly a factor of RAM because
every read or write is cached in RAM. If you have a single-user home/media server, 2-4 GB may be enough. If you want a fast
multiuser server with a lot of different file reads and writes, RAM up to the amount of disk space (100% guaranteed read-cache rate) may be useful.

Mostly it depends on needs and money.
If you want to save money or max RAM is limited, you can extend your fast RAM read cache with a slower read-cache SSD (L2ARC),
which is much faster than regular disks regarding I/O but slower than more RAM.
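For example, a read-cache SSD can be added to an existing pool with a single command (pool and device names here are only placeholders):

Code:
# add an SSD as L2ARC (read cache) to pool "tank"; c1t5d0 is a placeholder device name
zpool add tank cache c1t5d0
# the device then appears under "cache" in the pool status
zpool status tank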
 
Last edited:
ARC is quite smart and will happily consume ~90% of available memory to speed up I/O across the system.

If this is for a SOHO environment, though, you likely won't see much difference between 8 GB and 32 GB. If you have, say, VDI running, I would absolutely max out the RAM, but only after making sure I can cache about 90% of my working dataset in L2ARC SSDs.
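To see how much memory ARC is actually using on an illumos/Solaris box, the kstat counters are a quick check (values are in bytes):

Code:
# current ARC size and ARC target size
kstat -p zfs:0:arcstats:size
kstat -p zfs:0:arcstats:c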
 
ZFS deduplication is realtime/online, which means that every read/write must be deduplicated.
You can activate it with low RAM, but you will quite soon reach a situation where the dedup tables are
too large for RAM and must be processed from disk.

A simple "delete a snap" can take a week under this condition.
The common suggestion for dedup is about 2-3 GB RAM per TB of data for deduplication, plus what is needed
for the storage server itself to be fast.

This is the reason that there are only very few use cases (for example, dedup ratios > 20) where dedup makes sense at all.
Activating compression and buying another disk is usually the better way.
The DDT (dedup tables) can be mostly cached in L2ARC, though, right?

It would be a really nice feature for the future to either have the option of post-write dedup (the way NetApp/EMC do it), or an option like "zpool add tank dcache disk1 disk2" which would forcibly store the DDT only in dcache.

There is a use case for cloud backup / data graveyard, where performance requirements aren't high and enabling gzip-9 + dedup could really decrease hardware costs.
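As a sketch of that backup/graveyard setup (the dataset name is hypothetical, and the usual RAM requirements for dedup still apply):

Code:
# create a backup dataset with maximum gzip compression and dedup enabled
zfs create -o compression=gzip-9 -o dedup=on tank/backup
# later, see how much the pool-wide dedup and the dataset compression are saving
zpool get dedupratio tank
zfs get compressratio tank/backup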
 
how much RAM is recommended if not using deduplication?

I have a similar question in determining how much to allocate to OpenIndiana when it is running as an all-in-one box with ESXi. I have an M1015 card, set up in pass-through mode, so ESXi makes you "reserve" all the memory you are allocating to the OpenIndiana VM. Since there will be other VMs running on ESXi, I can't give OpenIndiana all the memory, but I want to make sure it gets enough to keep up with the SMB/NFS requests. I'm not really sure how to systematically determine how much to use.
 
ZFS deduplication is realtime/online, which means that every read/write must be deduplicated.
You can activate it with low RAM, but you will quite soon reach a situation where the dedup tables are
too large for RAM and must be processed from disk.

A simple "delete a snap" can take a week under this condition.
The common suggestion for dedup is about 2-3 GB RAM per TB of data for deduplication, plus what is needed
for the storage server itself to be fast.

This is the reason that there are only very few use cases (for example, dedup ratios > 20) where dedup makes sense at all.
Activating compression and buying another disk is usually the better way.

Without dedup, you need about 1-2 GB for the OS itself. But then it is slow. Performance is mostly a factor of RAM because
every read or write is cached in RAM. If you have a single-user home/media server, 2-4 GB may be enough. If you want a fast
multiuser server with a lot of different file reads and writes, RAM up to the amount of disk space (100% guaranteed read-cache rate) may be useful.

Mostly it depends on needs and money.
Thanks for the reply, Gea.
I'll start with 2x 8 GB DIMMs and add more later if needed :)
 
I have a similar question in determining how much to allocate to OpenIndiana when it is running as an all-in-one box with ESXi. I have an M1015 card, set up in pass-through mode, so ESXi makes you "reserve" all the memory you are allocating to the OpenIndiana VM. Since there will be other VMs running on ESXi, I can't give OpenIndiana all the memory, but I want to make sure it gets enough to keep up with the SMB/NFS requests. I'm not really sure how to systematically determine how much to use.

Memory over-commitment is not possible with pass-through, and it does not make sense with ZFS because all RAM is used.
(In general you should be careful with over-commitment; most modern OSes use what they have.)

So the answer is: the performance of a virtualized SAN is quite similar to a barebone SAN with the same hardware specs.
If you virtualize with low RAM, it is similarly slow to a barebone SAN with low RAM, and the same with a lot of RAM.
So: give OI as much RAM as possible, i.e. as much as you do not need for the VMs.

My home machine has 12 GB RAM with OI = 5 GB and 3-4 VMs. My machines at work have up to 48 GB RAM with OI using 24 GB and 6-8 VMs.
Main problem: free ESXi 5 is limited to 32 GB RAM, unlike free ESXi 4, which has no limit. For large RAM and free all-in-ones, ESXi 4.1 is needed.
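Once the VM is up, a quick sanity check of how much RAM OI actually sees and how much of it ARC is currently using (the counter is in bytes):

Code:
# RAM visible to the OpenIndiana guest
prtconf | grep -i "memory size"
# current ARC size, in bytes
kstat -p zfs:0:arcstats:size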
 
Can ZFS NFS shares have something like an alias name?

E.g. if I have:
/tank/users/someuser

and I enable NFS sharing on someuser, they'd mount it with: server:/tank/users/someuser.

Can I make an alias or something so they can use 'server:/someuser' instead?

Without actually having to create a directory /someuser and change the mountpoint for tank/users/someuser to /someuser.
 
ln -s /source/of/share /name

Or, when you create the filesystem, you can manually mount it anywhere, which it will do henceforth, via: zfs set mountpoint=/location tank/users/someuser ...
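Putting both suggestions together for the example from the question (all names are just the ones used above):

Code:
# give the dataset a shorter mountpoint and share it over NFS
zfs set mountpoint=/someuser tank/users/someuser
zfs set sharenfs=on tank/users/someuser
# clients can then mount server:/someuser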
 
Hello. I have been following this thread for a very long time, trying and testing; some things work and some don't. Well, it might be just me. Anyway,

has anyone tried these 4-node twin servers from Super Micro?

http://www.supermicro.com/products/system/2U/6026/SYS-6026TT-BIBQF.cfm

The idea is:

1. Break it into 4 VM nodes
2. Install OpenSolaris (maybe OI)
3. Create a cluster (using Pacemaker)
4. NFS
5. Then have 4 full nodes with HA storage, so that we can take one host down and the storage gets replicated to the other hosts, so no long downtime

Would that be the end result?

Now, what I'm missing is: why do I need Nexenta?

Thanks
 
Nexenta only supports 2-head clusters. RSF-1, the HA component Nexenta uses, supports more than that (so I've been told), but things likely begin to get trickier past 2 heads.

4-node failover can in theory be done with Pacemaker, but you'll need to manage the fail-over/fail-back scenarios yourself.

Also, you won't be able to do active/active with Pacemaker... pretty sure you won't, anyway.

You're better off looking at the storage bridge bay as a stand-alone device, IMO. You don't 'need' Nexenta; napp-it has menus to integrate the RSF-1 cluster piece from the highavailability.com folks. I haven't used it in napp-it, I think Gea has. I have used Nexenta's (same thing, RSF-1 from ha.com) and it works great.

Gea and some others are working on the Pacemaker thing, and from what I understand it works but isn't a bare-metal solution; IIRC it requires the all-in-one VM approach. I could be wrong. I'm not a fan of running a hypervisor just so I can run storage clustering.
 
Thanks Gea!

It turns out the root password was cached on my Windows box, so the permission change worked but root had access to everything; rebooting the box solved the issue.
 
Gea and some others are working on the Pacemaker thing, and from what I understand it works but isn't a bare-metal solution; IIRC it requires the all-in-one VM approach. I could be wrong. I'm not a fan of running a hypervisor just so I can run storage clustering.

This Pacemaker solution usually runs on real hardware. We will try to build an ESXi solution to evaluate it while avoiding special, expensive shared storage.

Currently this project is not ready and needs people with Pacemaker experience to build and document a sample NFS/CIFS active/passive cluster.
(I have no experience with Pacemaker yet, but I am interested in a free HA solution on OI. My part is a basic integration in napp-it for ease of use when it is working some day.)
 
I created an auto snap job yesterday, and today I got:

---------
Job Alert
---------

Job 1337725331.err
request|1337767213|
req_date_time|23.05.2012, 06:00 13 s|
end snap: 23.05.2012, 06:00 13 s too many arguments
usage:
snapshot [-r] [-o property=value] ... <filesystem@snapname|volume@snapname>

For the property list, run: zfs set|get

For the delegated permission list, run: zfs allow|unallow

Jobinfo:
id=1337725331
pool=ZFS/Users
keep=60
name=usersauto snaps
delnull=yes.+
id=1337725331
text1=snap pool
opt1=ZFS/Users
text2=snaps_keep_60_del_size0
opt2=usersauto snaps
ps=
rps=
month=every
day=every
hour=6
min=0

Any idea?
 
I wanted to move my OI install to another disk as part of the move to a new case and expansion of my OI server. I was going to move it from the 36 GB disk it's on now to a 16 GB SSD I had lying around. Is this something that can be done, or is this going to require a new install?

I figured Acronis/GParted would be able to resize the partition, but it doesn't appear that it can.

This is why:
https://www.dropbox.com/s/roavorq5ijxsvm9/wd36.jpg

What options do I have to use a new disk as the boot/system disk?
 
I created an auto snap job yesterday, and today I got:

---------
Job Alert
---------

end snap: 23.05.2012, 06:00 13 s too many arguments
usage:
snapshot [-r] [-o property=value] ... <filesystem@snapname|volume@snapname>


name=usersauto snaps

Do not use spaces in a job name; spaces in jobs are used to separate arguments.
(Current napp-it should delete spaces here.)

Delete the job and recreate it, e.g. as usersauto_snaps.
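That is what the "too many arguments" usage message is pointing at: the space splits the snapshot name into extra arguments when zfs is called. Roughly (the snapshot names here are only illustrative):

Code:
# a space in the name becomes a second argument -> "too many arguments"
zfs snapshot ZFS/Users@usersauto snaps_2012.05.23
# an underscore keeps it one argument and works
zfs snapshot ZFS/Users@usersauto_snaps_2012.05.23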
 
I wanted to move my OI install to another disk as part of the move to a new case and expansion of my OI server. I was going to move it from the 36 GB disk it's on now to a 16 GB SSD I had lying around. Is this something that can be done, or is this going to require a new install?

I figured Acronis/GParted would be able to resize the partition, but it doesn't appear that it can.

This is why:
https://www.dropbox.com/s/roavorq5ijxsvm9/wd36.jpg

What options do I have to use a new disk as the boot/system disk?

Re-install, import drives?
 
So just install to the SSD, then import the drives and it'll know about the structure of the vdev/pools?
 
Yes: export the pool, install the OS on the new drive, and then import the pool. It will know the layout/structure of your pool.
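In commands, the whole move is roughly (pool name "tank" is a placeholder):

Code:
# on the old install, before swapping boot disks
zpool export tank
# on the fresh install: list importable pools, then import by name
zpool import
zpool import tank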
 
I was also able to import pools that weren't properly exported, but I think I needed to add a special option (-f, as in force) to import the pools and then do the scrub.

Matej
 
Ok, I had missed plugging in one of my drives and the pool was degraded. Found the problem and rebooting now. Will attempt to export.
 
System has been hanging about 3 minutes after getting a logon screen. I think it's the OS drive, but when trying to install OI to the SSD I received an error:

Code:
syncing file systems...
panic[cpu2]/thread=fffff00077d4c40: BAD TRAP: type=d (#gp General protection) rp=ffffffffffbc796d0 addr=0
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
and it hangs.

Removed the LSI card, testing the install again. If it fails, I'll stress test the machine. Maybe it's the new Seasonic 620W PSU.
 
Is there a good resource with current best practices for choosing raidz vdev sizes with 4K disks or mixed 512-byte/4K? I've read that the wiki is out of date, and on the forums there's no consistent message.

I'm trying to decide between a 3- and a 4-drive pool for my backup server, with at least some of the disks being 4K. But I've seen conflicting benchmarks and I'm not sure what the most current info is. I don't want to waste a port, but I also don't want to throw away performance. Is it better to stick with 3 and figure something else out in the future with another controller or larger disks?
 
System has been hanging about 3 minutes after getting a logon screen. I think it's the OS drive, but when trying to install OI to the SSD I received an error:

Code:
syncing file systems...
panic[cpu2]/thread=fffff00077d4c40: BAD TRAP: type=d (#gp General protection) rp=ffffffffffbc796d0 addr=0
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
and it hangs.

Removed the LSI card, testing the install again. If it fails, I'll stress test the machine. Maybe it's the new Seasonic 620W PSU.

Bad memory module. Down to 2 GB, which should be fine for now; I'm sure someone around here has some DDR2 lying around. I'll be bugging friends tomorrow.

Back on to exporting.
 
Is there a good resource with current best practices for choosing raidz vdev sizes with 4K disks or mixed 512-byte/4K? I've read that the wiki is out of date, and on the forums there's no consistent message.

I'm trying to decide between a 3- and a 4-drive pool for my backup server, with at least some of the disks being 4K. But I've seen conflicting benchmarks and I'm not sure what the most current info is. I don't want to waste a port, but I also don't want to throw away performance. Is it better to stick with 3 and figure something else out in the future with another controller or larger disks?
I wouldn't mix sector sizes within the same pool.
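If you're not sure what a pool was built with, the ashift value in the pool config tells you (ashift=9 means 512-byte sectors, ashift=12 means 4K); one quick, if blunt, way to look:

Code:
# print the ashift of every vdev known to zdb (9 = 512-byte sectors, 12 = 4K)
zdb | grep ashift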
 
The pool is online, no errors. But the export errors out, telling me to try a reboot. I have, and there's no change. Nothing is showing in the pool history log for these errors.

ShareNFS and ShareSMB were turned off after the reboot.

Edit - Guess it needed a reboot after the shares were disabled. It appears to have worked now! Off to install the new OI on the SSD.
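For anyone hitting the same thing, explicitly disabling the shares before the export looks roughly like this (the filesystem name is a placeholder):

Code:
# disable sharing on the filesystem(s), then export the pool
zfs set sharesmb=off tank/share
zfs set sharenfs=off tank/share
zpool export tank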
 
OK, back online.

OI installed on the new 16 GB SSD. Napp-it installed (had to run it 3 times on the OI server). Pool imported. Trying to set up my SMB shares... it failed to allow guest access.
 
Grrrr. Shut down, moved downstairs (headless), powered on, can ping, napp-it won't load.

EDIT - Oh, what a long day. After dragging a monitor and keyboard downstairs, and all required cables, I found the system sitting at an install screen. Just remember: after the install is done, remove the install media or make sure the boot order has the HDD first. I'm going to get a beer!
 
Does Solaris Express/ZFS support SMB2?

Also, I tried doing a dd of a 10 GB file, testing the difference between SMB and NFS:

NFS:
10240000000 bytes (10 GB) copied, 45.4108 s, 225 MB/s

SMB/CIFS:
10240000000 bytes (10 GB) copied, 120.952 s, 84.7 MB/s

Does that seem right?! I've run the test several times just to make sure but sure enough NFS seems MUCH, much faster.
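The write side of such a test looks roughly like this (mount points are hypothetical, and /dev/zero data is highly compressible, which can skew results if compression is enabled on the dataset):

Code:
# ~10 GB sequential write over an NFS mount and an SMB/CIFS mount
dd if=/dev/zero of=/mnt/nfs/test.bin bs=1M count=10000
dd if=/dev/zero of=/mnt/smb/test.bin bs=1M count=10000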
 
Does Solaris Express/ZFS support SMB2?
No, it does not.

Also, I tried doing a dd of a 10 GB file, testing the difference between SMB and NFS:

NFS:
10240000000 bytes (10 GB) copied, 45.4108 s, 225 MB/s

SMB/CIFS:
10240000000 bytes (10 GB) copied, 120.952 s, 84.7 MB/s

Does that seem right?! I've run the test several times just to make sure but sure enough NFS seems MUCH, much faster.
This is not completely surprising. My experience (over a 10GbE network) was that SMB transfers from Solaris to Windows stall at just over 100 MB/s, while NFS can be pushed to well over 700 MB/s. Assuming the client side is a Windows-based system, you need an optimized NFS client to reach these speeds (e.g., Hummingbird). The built-in Windows NFS client is just crap.
 
Huh. Very interesting... thanks!

One more question: does ZFS (again, Solaris 11 Express) only support NFSv4, or does it support earlier versions like 2 and 3? (I'm trying to learn more about NFS...)

I've read that NFSv4 servers are incompatible with earlier versions, yet it seems like my Ubuntu box can connect to ZFS (using NFS) with either NFSv4 or NFSv3...
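One way to check this from both sides (server properties on Solaris/OI, and forcing a version from the Ubuntu client; server name and path are placeholders):

Code:
# on the Solaris/OI server: show the NFS version range the server offers
sharectl get nfs | grep vers
# on the Ubuntu client: force an NFSv3 mount for comparison
sudo mount -t nfs -o vers=3 server:/tank/users/someuser /mnt/test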
 