New Infiniband cables are in:
They were a bit more expensive ($90/each), but note the ribbon-looking IB cables on the left...these are from 3M and are *significantly* easier to install and have a much more forgiving bend radius than typical round IB cables.
5 of 8 nodes now up:
Cluster...
They say I got the first two of these from the factory:
48TB *and* 12 cores PER 1U!
Four per 4U chassis gives you almost 200TB and 96 (hyperthreaded) cores.
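Quick back-of-envelope to sanity-check those density numbers (drive size and per-node layout assumed from the 48TB/12-core figures above):

```python
# Hypothetical sanity check: 12 x 4TB drives per 1U node (assumed),
# 4 nodes in a 4U FatTwin-style chassis, hyperthreading doubles cores.
drives_per_node = 12
tb_per_drive = 4
nodes_per_chassis = 4
cores_per_node = 12

tb_per_node = drives_per_node * tb_per_drive                  # 48 TB per 1U
tb_per_chassis = tb_per_node * nodes_per_chassis              # 192 TB per 4U
threads_per_chassis = cores_per_node * 2 * nodes_per_chassis  # 96 with HT

print(tb_per_node, tb_per_chassis, threads_per_chassis)  # 48 192 96
```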
Some initial comments:
* Amazing density...amazing price for what you get
* Great power efficiency...E5-2600 CPUs *and*...
What hardware platform are you using for your cluster?
Supermicro recently announced a sweet box for Hadoop:
http://www.supermicro.com/products/system/4U/F617/SYS-F617H6-FT_.cfm
(you actually want the other chassis that has the LSI 2308 (non-RAID) HBA ... unfortunately there's not a...
Wow..somehow I missed this until now:
http://www.supermicro.com/products/system/4u/6047/ssg-6047r-e1r72l.cfm
72 3.5" drives in 4U PLUS room for a main board!
Hitachi 4TB is a supported drive, so 288TB in 4U...amazing.
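Worth remembering that the 288TB is marketing terabytes (10^12 bytes); what the OS reports in TiB is noticeably less. Rough math:

```python
# 72 x 4TB drives: decimal TB (10^12 bytes) vs binary TiB (2^40 bytes).
drives, tb_each = 72, 4
raw_tb = drives * tb_each            # 288 "marketing" TB
raw_bytes = raw_tb * 10**12
raw_tib = raw_bytes / 2**40          # what the OS will report

print(raw_tb, round(raw_tib, 1))     # 288 261.9
```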
I can only imagine the noise the fans on this thing make...crazy.
Setup:
Solaris 11
Supermicro motherboard
8 3TB Hitachi SATA (mix of 5400 and 7200RPM)
Sans Digital 8-drive SAS enclosure (no Expander)
Array is ZFS RaidZ1
Array keeps intermittently dropping out...shows a CRC error for each of the 8 drives...reboot the server and then all is good again for a...
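One quick way to see which devices are racking up checksum errors is to scrape the per-device CKSUM column out of `zpool status` output. A rough Python sketch (the pool/device names here are made up, and the sample text stands in for the real command's output):

```python
import re

# Sample `zpool status`-style output; on a live box you'd capture the
# real command's output instead of this stand-in text.
sample = """\
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     3
            c2t1d0  ONLINE       0     0     0
"""

def cksum_errors(status_text):
    """Return {device: cksum_count} for devices with nonzero CKSUM."""
    errs = {}
    for line in status_text.splitlines():
        m = re.match(r"\s+(c\d+t\d+d\d+)\s+\S+\s+\d+\s+\d+\s+(\d+)", line)
        if m and int(m.group(2)) > 0:
            errs[m.group(1)] = int(m.group(2))
    return errs

print(cksum_errors(sample))  # {'c2t0d0': 3}
```

If every drive shows errors at once (as described above), that usually points at the shared path (cable, enclosure, HBA) rather than the drives themselves.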
I don't think this is what the card you posted is all about, but on a similar topic I happened to hit on stuff discussing new 'SCSI over PCI-e' standards which look REAL interesting:
http://www.snia.org/forums/sssi/knowledge/standards
and...
Let me expand on what Gea said with regard to effective ZFS snapshot differences.
IMO one of the most powerful features of ZFS is the ability to not only take a snapshot, but also **to access the prior state of any file in any snapshot**.
For example here is one of my zvols, I...
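For anyone who hasn't used this: every mounted ZFS filesystem exposes read-only copies of its files under the hidden `.zfs/snapshot/<snapname>/` directory. A tiny sketch of the path construction (the mountpoint, snapshot, and file names are made up for illustration):

```python
import os.path

# ZFS exposes each snapshot's files read-only under
# <mountpoint>/.zfs/snapshot/<snapname>/<relative path>.
def snapshot_path(mountpoint, snapname, relpath):
    return os.path.join(mountpoint, ".zfs", "snapshot", snapname, relpath)

p = snapshot_path("/tank/home", "daily-2012-11-10", "alice/report.txt")
print(p)  # /tank/home/.zfs/snapshot/daily-2012-11-10/alice/report.txt
```

So recovering yesterday's version of a file is just a copy out of that path, no restore tooling required.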
Have had this happen a bunch of times over the last couple of years:
A drive fails in a storage pool (NOT rpool)...ZFS automatically resilvers to the hot-spare no problem and pool returns to normal state...however, if you leave the failed drive physically connected to the system *and* try to...
I would not go with a 4U server chassis...go with a 2U that has bunches of 2.5" drive slots...use those drive slots for OS, ZIL and/or L2ARC drives NOT your actual storage drives.
For 1PB you'd need about 250 4TB drives...to achieve this use SIX Supermicro SC847(E16)-JBOD enclosures...they...
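The drive/enclosure counts above check out; rough math (45 bays per SC847 JBOD assumed):

```python
import math

# Back-of-envelope for the 1PB build: raw capacity only, no
# RAIDZ/spare overhead factored in.
target_tb = 1000
drive_tb = 4
bays_per_jbod = 45   # SC847 JBOD: 45 x 3.5" bays (assumed)

drives_needed = math.ceil(target_tb / drive_tb)          # 250
jbods_needed = math.ceil(drives_needed / bays_per_jbod)  # 6

print(drives_needed, jbods_needed)  # 250 6
```

Six enclosures gives 270 bays, so there's a little headroom for hot spares on top of the 250 data drives.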
Ditto!
I've lost data countless times with hardware RAID due to how poorly it (Adaptec) handled drive failures and replacement. I vowed never again.
I have 4 ZFS servers, Solaris 10, FreeBSD 9, OI151, and even Solaris 11...over 100TB of data and in 3 years haven't lost a single byte.
More...
I can't even figure out where this product is on IBM's web site...they *do* have several other 16 port LSI cards listed, but NOT that 9202-16e ... or at least as far as I can tell.
Anyone know what IBM's product names/numbers are?
I google for the text names that show up in LSIutil (e.g. SAS...
Got my LSI 9202-16e fired up today...for those that missed the prior discussion on this, it is essentially TWO 9200-8e's jammed into one low-profile PCI-e 2.0 card with an x16 PCI Express connection to the bus.
# lsiutil
Main menu, select an option: [1-99 or e/p/w or 0 to quit] 1
Current...
*Red:
I don't know what you're doing but all the links you're posting are broken.
Here's the chassis you need:
http://www.supermicro.com/products/chassis/4U/847/SC847E16-RJBOD1.cfm
*or*
If you ever think you will be using SAS drives *and* want to make use of full dual-pathing...
That is exactly what we are about to do. We had toyed around with SRP about nine months ago and got it working but it seemed very unstable...however at the time we didn't really understand that there are bunches of iSCSI initiator and target subsystems out there...I'm not even sure what we were...
Very interesting Intel article on PCI-e performance testing:
http://download.intel.com/design/intarch/papers/321071.pdf
I had already noticed Mellanox's recommendation of setting PCI-e Max Read Request Size to 4096 byte (some motherboards default to a setting as low as 128 bytes).
You can see...
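The intuition behind bumping Max Read Request Size: each PCI-e transaction carries a fixed chunk of header/CRC overhead, so bigger payloads amortize it better. A crude model (the 24-byte overhead figure and PCIe 2.0 x16 numbers are assumptions, not measurements):

```python
# Crude PCI-e efficiency model: per-TLP overhead (~24 bytes assumed)
# amortized over the payload. PCIe 2.0 = 5 GT/s/lane, 8b/10b encoding
# -> 500 MB/s/lane of data bandwidth.
lane_mb_s = 500
lanes = 16
overhead = 24  # assumed bytes of TLP header/CRC per transaction

for payload in (128, 256, 512, 4096):
    eff = payload / (payload + overhead)
    print(payload, round(lane_mb_s * lanes * eff), "MB/s")
```

Going from a 128-byte to a 4096-byte request size is worth well over 1GB/s of headroom on an x16 slot in this model, which lines up with why Mellanox cares about the setting.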
BTW: There are a LOT of tunables to consider when attempting to get max performance out of IB:
http://www.mellanox.com/related-docs/prod_software/Performance_Tuning_Guide_for_Mellanox_Network_Adapters.pdf
What commands are you using for tput tests?
I am currently testing ConnectX-2 using stock combinations of CentOS 6.3 and OI151a5.
I believe that RDMA is the key to building an IB storage server that will rival local disks as you need RDMA to achieve the full BW potential of your IB adapter...
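For reference, here's roughly where the ~3200MB/s ceiling on QDR IB comes from (the ~20% protocol/DMA overhead factor is an assumption, not a spec figure):

```python
# QDR InfiniBand: 4 lanes x 10 Gbps signaling = 40 Gbps on the wire.
signal_gbps = 40
data_gbps = signal_gbps * 8 / 10   # 8b/10b line coding -> 32 Gbps of data
raw_mb_s = data_gbps * 1000 / 8    # -> 4000 MB/s theoretical
usable = raw_mb_s * 0.8            # assumed ~20% protocol/DMA overhead

print(raw_mb_s, usable)  # 4000.0 3200.0
```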
Your cheapest way to go is probably:
Rackable Systems SE3016 (search Ebay)
16 Drive Jbod enclosure *with* built in SAS Expander
Has SAS wide *in* and *out* ports
Use with LSI SAS 9200-8e or LSI SAS 9205-8e HBA
Get two enclosures and you can connect 32 drives using a single HBA (HBA...
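Keep in mind the bandwidth trade-off of hanging that many drives off one wide port. Rough numbers (the SE3016 is 3Gbps-era SAS; figures assumed):

```python
# Per-drive bandwidth when sharing one x4 wide SAS port.
lane_gbps = 3
mb_per_lane = lane_gbps * 1000 / 10  # 8b/10b -> ~300 MB/s per lane
port_mb = mb_per_lane * 4            # x4 wide port -> ~1200 MB/s

for drives in (16, 32):
    print(drives, "drives ->", port_mb / drives, "MB/s each")
```

16 drives per enclosure is fine for spinning disks; daisy-chaining to 32 drives on one HBA port halves the per-drive share, which matters if you're chasing sequential throughput.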
Would the E26 have an effect on the ability to achieve an x8 wide link between the enclosure and the HBA?
I just ordered an E26 about a week ago, so will be testing this out. It was only about $200 more than the E16 so I went with E26 as in the worst case I figured it would set me up later...
Search that drive on YouTube and you can see examples of people cracking the case to extract the drive. In at least some cases it's a 6Gbps, 7200rpm Hitachi drive...however, you void your warranty and destroy the case.
I had briefly entertained this approach, but then B&H Photo put the...
Well...as I'm about to restart our iSER (iSCSI over RDMA) over 40Gbps Infiniband (~3200MB/s) testing, I ordered EIGHT of these drives today along with another Supermicro SC847 6Gbps JBOD chassis. Figured I'd use these to try to generate 4000MB/s of IO and to try to saturate IB and then use...
How did you end up with a 512 byte recordsize initially?
Also, suggest you verify the block size requests being made by your Windows server using tcpdump and/or snoop on your iSCSI traffic.
hotkernel seems to indicate there is a fair amount of ZFS IO going on (e.g. thus indirectly resulting in lots of data being put into the L2ARC and ejected as needed).
What level of IO is actually going on ?
# zpool iostat 10 10 ?
Personally, I'd benchmark the pool locally first using...
Run the 'hotkernel' DTrace script from the DTrace toolkit...please post the last 20 lines or so (top kernel hits):
http://www.brendangregg.com/dtrace.html#DTraceToolkit
Given the box is running 40% kernel time, seems that something major is going on at the kernel level.
I did a 'zfs destroy' on the volume...it's totally gone...I deleted all snapshots on that zvol first, as you can't do a destroy without deleting the snapshots first.
OK..I moved 5.5TB of space from the old pool ('rz2pool') to the new pool ('zulu01') now each pool is less than 50% full:
root@zulu01:~# zpool list
NAME SIZE ALLOC FREE EXPANDSZ CAP DEDUP HEALTH ALTROOT
rz2pool 16.2T 7.15T 9.10T - 44% 1.00x ONLINE -
zulu01...
Agreed...this is absolutely critical. I already changed the server from the default value of 30 to '4' as discussed in those forum posts...otherwise instead of the server just being dog slow, it's completely unstable.
The problem is I was probably running this server for many months with...
Somehow I'm experiencing what I'm going to call ZFS Rot...by this I do NOT mean bit rot, but rather gradual deterioration of read and write performance across a pool that has been around for many years, has lots of data and lots of snapshots.
When I first built this array, was getting...
Multi-threaded dd tests to the raw disks (10 in parallel) confirm that the expander and HBA are working as expected (e.g. seeing about 900-1000MB/s):
us sy wt id
0 1 0 99
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t...
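A rough Python stand-in for that parallel-dd pattern, if you want it scriptable (temp files here instead of real `/dev/rdsk/...` raw devices, and the file count/size are made up):

```python
import concurrent.futures, os, tempfile, time

def stream(path, bs=1 << 20):
    """Read a file/device sequentially in 1MB chunks; return bytes read."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(bs):
            total += len(chunk)
    return total

# Stand-in "devices": 4 temp files of 1MB each. On a real box you'd
# point this at a list of raw disk paths instead.
paths = []
for _ in range(4):
    tf = tempfile.NamedTemporaryFile(delete=False)
    tf.write(os.urandom(1 << 20))
    tf.close()
    paths.append(tf.name)

t0 = time.time()
with concurrent.futures.ThreadPoolExecutor(10) as ex:  # 10 parallel readers
    total = sum(ex.map(stream, paths))
elapsed = max(time.time() - t0, 1e-9)
print(total, "bytes,", round(total / elapsed / 1e6, 1), "MB/s aggregate")

for p in paths:
    os.unlink(p)
```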
Hmmm...I'm telling filebench to do 1MB iosize, however, iostat says iosize is about 64K:
Filebench profile:
set $dir=/rz2pool
set $filesize=40g
set $nthreads=1
set $iosize=1m <----
define file name=largefile1,path=$dir,size=$filesize,prealloc,reuse
define file...
OK..now I'm really confused...decided to do an iostat while filebench was running (8 threads)...even with 8 threads the filebench test is maxing out at 100-200MB/s:
7163: 207.167: Per-Operation Breakdown
seqread 17120ops 106ops/s 106.0mb/s 75.2ms/op 807us/op-cpu [0ms -...
Nothing showing in fmadm.
I *DO* intermittently see this in /var/adm/messages:
Nov 10 15:05:25 zulu01 unix: [ID 954099 kern.info] NOTICE: IRQ19 is being shared by drivers with different interrupt levels.
Nov 10 15:05:25 zulu01 This may result in reduced system performance.
Nov 10...
More details of hardware setup and problems:
* SMC X8DTL
* 32GB RAM
* 10 x Hitachi 2TB 3Gbps Sata
* HP SAS Expander
* Norco SAS chassis
* LSI 9200-8e HBA (v14.0 FW)
pool: rz2pool
state: ONLINE
scan: none requested
config:
NAME STATE READ...
Looks like seeing 'mwait' on the kernel stack when the system is idle may be expected behavior as at some point mwait is used to implement the Solaris 'idle loop':
http://docs.oracle.com/cd/E19957-01/820-0724/ggdqv/index.html
"x86: MONITOR and MWAIT CPU Idle Loop
This kernel functions...
How many files do you have in an individual folder?
Or how many folders within a given parent folder?
Putting thousands (or even hundreds) of file entries in the same directory will swamp just about any operating system...Windows more so than Unix, but even Linux has problems once you get into...
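One common workaround for the flat-directory problem is to hash filenames into fixed-size bucket subdirectories so no single folder ever holds everything. A minimal sketch (the root path and bucket count are made up for illustration):

```python
import hashlib, os.path

def bucketed_path(root, filename, buckets=256):
    """Map a filename into one of `buckets` hex-named subdirectories."""
    h = hashlib.md5(filename.encode()).hexdigest()
    bucket = int(h[:2], 16) % buckets   # stable 00..ff subdirectory
    return os.path.join(root, f"{bucket:02x}", filename)

print(bucketed_path("/tank/files", "invoice-000123.pdf"))
```

Since the bucket is derived from the name, lookups stay a pure computation (no directory scan needed to find where a file lives), and each bucket holds roughly 1/256th of the files.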