Monitoring ZFS

ldoodle (Limp Gawd, joined Jun 29, 2011, 172 messages)

Hiya,

I'm about 2 months into my ZFS build (S11E) now and performance is so much better than WHS!

However, one of the nice things about WHS was the client console - giving you alerts when things go wrong.

I'm not overly bothered about having a local console with ZFS, but being able to run monitoring tools from the console would be nice. I'm just not sure what tools exist or can be added, e.g.:

SMART status monitoring of disks (is this the CKSUM column in zpool status?)
CPU/RAM usage
IOPS (zpool iostat?)
fan speeds (long shot!)
disk usage (zfs list? Anything more detailed?)
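For a quick console view of most of the items above, here is a rough sketch that just runs stock Solaris commands in one go. It assumes Solaris 11 Express tools (`zpool`, `zfs`, `iostat`, `prstat`, `vmstat`). Note that `iostat -En` shows per-device soft/hard/transport error counters rather than full SMART data, that the CKSUM column in `zpool status` counts ZFS checksum errors rather than SMART results, and that fan speeds are left out because they depend on the hardware. The `--run` flag is made up for this example.

```shell
#!/bin/sh
# Hypothetical console status report built from stock Solaris commands.
# Run as: sh zfs_report.sh --run   (script name and flag are illustrative)

# section TITLE CMD [ARGS...]: print a header, run the command, note failures.
section() {
    title=$1
    shift
    printf '== %s ==\n' "$title"
    "$@" 2>&1 || printf '(command failed: %s)\n' "$*"
    echo
}

report() {
    section "Pool health (prints 'all pools are healthy' if OK)" zpool status -x
    section "Per-device error counters (not full SMART data)" iostat -En
    section "Top 5 processes by CPU" prstat -s cpu -n 5 1 1
    section "Memory/CPU summary" vmstat 1 2
    section "Pool I/O and IOPS" zpool iostat -v 1 2
    section "Dataset space usage" zfs list -o space
}

if [ "${1:-}" = "--run" ]; then
    report
fi
```

Each command runs even if an earlier one fails, so a missing tool only costs you one section of the report.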

Re point 1: I run weekly scrubs from cron, but I haven't figured out how to get the result of the scrub emailed to me.
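On emailing scrub results: one way, sketched here under assumptions, is to let a single cron script start the scrub, wait for it to finish, and mail the `zpool status -v` output with a subject line that flags problems. It assumes `mailx` can already send mail from the box; the script path, the pool name `tank` and the address in the cron line are placeholders.

```shell
#!/bin/sh
# Hypothetical weekly scrub + email report. Replace the existing cron scrub
# with e.g.:  0 2 * * 0 /root/bin/scrub_report.sh tank you@example.com
# Assumes mailx can already deliver mail from this host.

# scrub_subject POOL STATUS_TEXT: pick a subject line from `zpool status` output.
scrub_subject() {
    if printf '%s\n' "$2" | grep -q 'state: ONLINE' &&
       printf '%s\n' "$2" | grep -q 'errors: No known data errors'; then
        echo "ZFS scrub OK: $1"
    else
        echo "ZFS scrub PROBLEM: $1"
    fi
}

main() {
    pool=$1
    recipient=${2:-root}
    zpool scrub "$pool" || exit 1
    # zpool scrub returns immediately, so poll until the scrub has finished.
    while zpool status "$pool" | grep -q 'scrub in progress'; do
        sleep 300
    done
    status_text=$(zpool status -v "$pool")
    printf '%s\n' "$status_text" |
        mailx -s "$(scrub_subject "$pool" "$status_text")" "$recipient"
}

if [ "$#" -ge 1 ]; then
    main "$@"
fi
```

The OK/PROBLEM decision only looks at the `state: ONLINE` and `errors: No known data errors` lines, which keeps it simple; anything else gets flagged, and the full status is in the mail body either way.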

And any other monitoring/management things that typically go on (brain freeze at the moment!)

Cheers
 
Re. the checking, I use Nagios to monitor my servers, and it runs a script to check the pool status:
Code:
# ./check_zfs cloud 2
OK ZPOOL cloud : ONLINE {Size:19T Used:9.50T Avail:9.50T Cap:50%}

Perhaps Nagios is overkill for your needs, but the script itself may still be of use, and you can expand it to check other items like CPU, RAM and disk usage.

Check out this post in my NAS build thread for the script:
http://forums.overclockers.com.au/showpost.php?p=11953171&postcount=192
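For anyone who can't get at the linked script, here is a minimal sketch of a Nagios-style pool check that prints the same kind of line. It is not the check_zfs script from that thread: the meaning of its `2` argument isn't stated here, so this version treats the second argument as a hypothetical capacity warning threshold in percent (default 80). It follows the standard Nagios plugin convention of one status line plus exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN.

```shell
#!/bin/sh
# Hypothetical minimal Nagios-style pool check; NOT the check_zfs script linked
# above. Second argument here is a capacity warning threshold in percent.
# Nagios plugin convention: exit 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.

# format_check LINE WARN_CAP: LINE is one tab-separated line from
#   zpool list -H -o name,health,size,allocated,free,capacity POOL
# Prints a status line and returns the matching Nagios exit code.
format_check() {
    warn_cap=$2
    set -- $1            # split on tabs (fields contain no spaces)
    name=$1 health=$2 size=$3 alloc=$4 free=$5 cap=$6
    cap_num=${cap%\%}    # "50%" -> "50"
    if [ "$health" != "ONLINE" ]; then
        state=CRITICAL rc=2
    elif [ "$cap_num" -ge "$warn_cap" ]; then
        state=WARNING rc=1
    else
        state=OK rc=0
    fi
    echo "$state ZPOOL $name : $health {Size:$size Used:$alloc Avail:$free Cap:$cap}"
    return "$rc"
}

if [ "$#" -ge 1 ]; then
    if ! line=$(zpool list -H -o name,health,size,allocated,free,capacity "$1" 2>/dev/null); then
        echo "UNKNOWN ZPOOL $1 : pool not found"
        exit 3
    fi
    format_check "$line" "${2:-80}"
    exit $?
fi
```

Run as e.g. `./check_zpool.sh cloud 80` (hypothetical name); for a healthy pool below the threshold it should print a line in the same format as the check_zfs output above and exit 0.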
 
Davros, I may go this route, nice suggestion. I already have Nagios running in an Ubuntu VM monitoring all kinds of other things, so...
 
I spent a bit of time trying to get this loaded yesterday. Here's the blow by blow:

As root (or sudo):

1. Install the required Lighttpd and PHP packages:
Code:
pkg install lighttpd-14
pkg install php-52

2. Enable the lighttpd service:
Code:
svcadm enable svc:/network/http:lighttpd14

3. Open /etc/lighttpd/1.4/lighttpd.conf and do the following:
i) Uncomment the mod_auth and mod_fastcgi module lines:
Code:
## modules to load
# at least mod_access and mod_accesslog should be loaded
# all other module should only be loaded if really neccesary
# - saves some time
# - saves memory
server.modules              = (
#                               "mod_rewrite",
#                               "mod_redirect",
#                               "mod_alias",
                                "mod_access",
#                               "mod_trigger_b4_dl",
                                "mod_auth",
#                               "mod_status",
#                               "mod_setenv",
                                "mod_fastcgi",
#                               "mod_proxy",
#                               "mod_simple_vhost",
#                               "mod_evhost",
#                               "mod_userdir",
#                               "mod_cgi",
#                               "mod_compress",
#                               "mod_ssi",
#                               "mod_usertrack",
#                               "mod_expire",
#                               "mod_secdownload",
#                               "mod_rrdtool",
                                "mod_accesslog" )

ii) Add this to the end of the config file:
Code:
include "conf.d/fcgi-php.conf"

4. Uncomment the "cgi.fix_pathinfo" line in the /etc/php/5.2/php.ini file:
Code:
cgi.fix_pathinfo=1

5. Restart the lighttpd service:
Code:
svcadm restart svc:/network/http:lighttpd14

You can now test whether lighttpd is working by opening your browser, going to your server's IP address, and seeing if you get a nice "404 Not Found" page. If you do, great! If you don't, something has gone wrong in the steps above.

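6. Download and unpack the SolarStatus source zip:
Code:
cd /var/lighttpd/1.4/docroot
# -O saves under a clean name; otherwise wget keeps the "?raw=true" suffix
wget -O SolarStatus_0.4.zip "https://github.com/hotzen/SolarStatus/blob/master/dist/SolarStatus_0.4.zip?raw=true"
unzip SolarStatus_0.4.zip
# make the helper scripts executable
chmod a+x scripts/*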

There you have it. You should now be able to browse to your server's IP and see the "SolarStatus Login Page". The default password is "f00bar".
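If you'd rather check from the shell than from a browser, here is a rough command-line version of the step 5 test. It assumes `curl` is installed; by default it expects the same "404 Not Found" as step 5 (i.e. before the SolarStatus files are unpacked), so set `EXPECT` to whatever status your SolarStatus page returns afterwards. The `--run` flag is made up for this sketch.

```shell
#!/bin/sh
# Hypothetical command-line version of the browser test in step 5.
# Assumes curl is installed. EXPECT defaults to 404 (bare docroot);
# override it once SolarStatus is unpacked.

# check DESCRIPTION ACTUAL EXPECTED: print PASS/FAIL, return 1 on mismatch.
check() {
    if [ "$2" = "$3" ]; then
        echo "PASS: $1"
    else
        echo "FAIL: $1 (got '$2', expected '$3')"
        return 1
    fi
}

if [ "${1:-}" = "--run" ]; then
    status=0
    check "lighttpd SMF state" \
        "$(svcs -H -o state svc:/network/http:lighttpd14)" online || status=1
    # curl prints 000 as the status code if it cannot connect at all
    check "HTTP status of http://localhost/" \
        "$(curl -s -o /dev/null -w '%{http_code}' http://localhost/)" "${EXPECT:-404}" || status=1
    exit "$status"
fi
```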
 
Thanks tormentum.

That didn't work for me (IE just shows page cannot be found), so I want to uninstall and start again.

However, when I try to uninstall lighttpd it complains that it's not installed:

user@server:~# pkg uninstall lighttpd
Creating Plan
pkg: 'lighttpd' matches no installed packages

Any ideas? PHP uninstalled just fine.
 
OK, I got lighttpd removed, but now when I try to reinstall it I get the following:

user@server:~# pkg install lighttpd-14
pkg: 0/1 catalogs successfully updated:

Unable to contact valid package repository: http://pkg.oracle.com/solaris/release
Encountered the following error(s):
Unable to parse repository response

for any package I try to install!!
 
It appears that the Oracle Solaris pkg repo is down, maybe pending the release of Solaris 11?
 
Ah OK, that's good news then - will try again later in the week.

Thanks guys.
 
CPU/RAM usage
IOPS (zpool iostat?)
disk usage (zfs list? Anything more detailed?)

I used collectd for a lot of these in particular.

collectd is not so good for "Is the box down?", but it's pretty good for trending data and "WTF just happened?". The CPU charts aren't very useful for me, though, as it takes a while to render a graph for each T3 thread.

http://collectd.org/wiki/index.php/Plugin:ZFS_ARC

[Image: Plugin-zfs_arc-hits.png]
 
All working now (both the Oracle pkg repo and SolarStatus).

One thing: is there any documentation on the commands SolarStatus runs? I see each tab shows the commands it's running, so it would be good to learn the CLI commands. ;)
 
Hello, I hadn't seen this thread. Thanks for trying out SolarStatus ;)
The install was and is online at: https://github.com/hotzen/SolarStatus/tree/master/install

I collected the commands by reading the man page ("man <cmd>") for everything and searching Google.
Commands whose output is auto-TableTransformed (into HTML tables) should have a tooltip on the column headers.

All commands are dynamically loaded from conf.ini.php, so feel free to adapt them and comment out anything that doesn't suit your needs.
And don't forget to edit conf.ini.php for the proper hard-disk paths!

Cheers
 
Yes, take a look at the "scripts" folder. It has all the scripts that SolarStatus runs to generate the statistics.
 
Weird: I just installed this and had to convert the line endings of the files in the scripts folder for them to work (about half had DOS line endings and were bombing). Running this in the scripts folder hits all of them:
Code:
for f in *; do [[ -f $f ]] && dos2unix "$f" "$f"; done
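If `dos2unix` isn't available, or takes different arguments on your system, a hedged alternative is to strip the carriage returns with plain POSIX `tr`. This sketch only touches files that actually contain a CR, and writes the result back through the original file so the execute bit set by `chmod a+x` survives. The script name in the usage comment is hypothetical.

```shell
#!/bin/sh
# Hypothetical alternative to the dos2unix loop for systems where dos2unix is
# missing or takes different arguments: strip CR bytes with POSIX tr.
# Usage: sh fix_crlf.sh /var/lighttpd/1.4/docroot/scripts

CR=$(printf '\r')

# strip_crlf FILE: remove all CR bytes. Writes back through the original
# file (cat >) instead of mv, so permissions such as a+x are kept.
strip_crlf() {
    tmp="$1.crlf.$$"
    if tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"; then
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# convert_dir DIR: convert only the regular files that actually contain a CR.
convert_dir() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        if grep -q "$CR" "$f"; then
            strip_crlf "$f" && echo "converted: $f"
        fi
    done
}

if [ "$#" -ge 1 ]; then
    convert_dir "$1"
fi
```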
 
The install was and is online

Hi Hotzen

Yes, I did see the INSTALL.TXT file, but it doesn't go into as much detail as tormentum's post. If he's OK with it, I would suggest copying his post into INSTALL.TXT for anyone else wanting to use it.
 
No problem on this end. Note that I only tested with OpenIndiana oi_151a; it might be worth testing on SE11 and others.

Also, it looks like Oracle's pages are back up now, opensolaris.org included.
 
Might be worthwhile testing on SE11 and others.

I use S11E and can confirm it works OK.

On the subject of monitoring ZFS, does anyone know if you can get the server to shut down when the UPS drops to battery?

I've been searching for ages, but it seems to be quite a 'niche' thing to do!
 
Thank you, I will improve the install instructions as well.
One problem might be that I chose to move the whole lighttpd installation from its version-dependent /1.4/ directory straight to /etc/lighttpd...

I think I will need to set up a VM to write down the installation step by step ;)

Thanks for the support, guys
 
Yeah, working fine on SE11 for me as well.

Re: UPS shutdown
What kind of UPS do you have? I have mine on a Liebert with an SNMP card, so I just have a script which polls the UPS and starts a shutdown if necessary. If it's an APC, I think you can use apcupsd.
 
I have mine on a Liebert with a SNMP card so just have a script which polls the UPS and starts a shutdown if necessary.

Interesting, would you mind sharing your SNMP script? I have a similar situation I'm dealing with.
 
While we're on UPSes, is there a similar app for Eaton/Powerware UPSes?

Paul
 
Interesting, would you mind sharing your SNMP script? I have a similar situation I'm dealing with.

Yeah, I can dig it up after work at home later if you need it. It's nothing fancy, though: I took the Perl boilerplate at http://www.cuddletech.com/articles/snmp/node18.html, compared the battery capacity to a threshold (90%, I think), and then called a shutdown.sh script, which stops the database, sends a shutdown command to a Windows VM over the network, and then, after a set amount of time (2 minutes, I think), shuts down the main computer.
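For the SNMP route, here is a sketch of the same logic as a shell script rather than Perl. Assumptions, all of which need checking against your hardware: net-snmp's `snmpget` is installed, the SNMP card implements the standard UPS-MIB (RFC 1628) object `upsEstimatedChargeRemaining`, and the host name, community string, script path and lock path are placeholders. The database and Windows VM steps are left as comments because they depend entirely on your setup.

```shell
#!/bin/sh
# Hypothetical shell version of the approach described above (the original
# was Perl): poll the UPS battery charge over SNMP and shut down below a
# threshold. Assumptions: net-snmp's snmpget is installed and the SNMP card
# supports the standard UPS-MIB (RFC 1628). Host/community are placeholders.
# Example cron entry: * * * * * /root/bin/ups_check.sh --poll

UPS_HOST=${UPS_HOST:-ups.example.com}   # hypothetical SNMP card address
COMMUNITY=${COMMUNITY:-public}
THRESHOLD=${THRESHOLD:-90}              # percent, as in the post above
CHARGE_OID=1.3.6.1.2.1.33.1.2.4.0       # UPS-MIB upsEstimatedChargeRemaining
LOCK=/tmp/ups_shutdown.lock

# is_percent VALUE: succeed if VALUE is an integer from 0 to 100.
is_percent() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -le 100 ]
}

# below_threshold CHARGE THRESHOLD: succeed if CHARGE < THRESHOLD.
below_threshold() {
    [ "$1" -lt "$2" ]
}

# -Oqv prints only the value; -OU drops the "percent" units suffix.
poll_charge() {
    snmpget -v 1 -c "$COMMUNITY" -OqvU "$UPS_HOST" "$CHARGE_OID"
}

shutdown_sequence() {
    # Placeholders for the steps in the post: stop the database and tell the
    # Windows VM to shut down here, then give them time to finish.
    sleep 120
    /usr/sbin/shutdown -y -g0 -i5
}

if [ "${1:-}" = "--poll" ]; then
    charge=$(poll_charge) || exit 1
    is_percent "$charge" || { echo "unexpected reply: $charge" >&2; exit 1; }
    if below_threshold "$charge" "$THRESHOLD"; then
        # mkdir is atomic: only the first poll below threshold starts a shutdown.
        mkdir "$LOCK" 2>/dev/null || exit 0
        shutdown_sequence
    fi
fi
```

The `mkdir` lock is there because cron keeps polling every minute while the two-minute `sleep` runs; without it, each poll below the threshold would start another shutdown sequence.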
 
Unfortunately, the way the check_zfs script is posted on the other forum, it downloads with the indentation messed up :( Also, where/how did you set up OpenSolaris to handle Nagios plugins?
 