"TRIM" Over CIFS/NFS/iSCSI?

bao__zhe

Weaksauce | Joined: Apr 18, 2012 | Messages: 123
I'm considering using ZFS + CIFS or NFS or iSCSI to serve storage space to Windows 7/2008R2 clients.

TRIM is usually associated with SSDs, but this time I'm thinking of a similar concept for NAS/SAN. That is, which protocol can let the server side know which space is no longer required?

A typical scenario would be as follows:

10 GB of space is provided by the ZFS server to the Windows client using CIFS/NFS/iSCSI
->
The client writes a 6GB file A
->
The client deletes the 6GB file A
->
The client writes another 6GB file B
->
The client deletes the 6GB file B
->
What is the space occupied on the server? 0 GB? 6 GB? 6-10 GB? 10 GB? Or is it not even possible to write file B?
 
I think you are confusing what TRIM does. Regardless of the OS and filesystem, if you have a 10 GB filesystem and write 6 GB file A, delete A, write 6 GB file B and then delete B, you will have 10 GB available. Depending on the OS (and whether the physical files are on an SSD), without TRIM the low-level layout of the physical blocks may not be allocated optimally, but that has no bearing on how much space is advertised as available. (Again, depending on the OS and FS of both client and server, it can take a few seconds for the space to be marked as free after a deletion, whether due to the age of the OS/FS or to underlying mechanics such as volume shadow copies, HSM, etc.)

Regardless of the daemon/server you choose to service your file I/O requests, the underlying OS and filesystem will maintain the available space and offer it to the client (and/or report the amount of free space, depending on the client and server OS and FS).
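
For example, on the ZFS side you can simply watch the dataset's space accounting; the pool/dataset name below is just a placeholder. Once the client's delete reaches the server-side filesystem, USED drops back down regardless of which sharing protocol was used:

Code:
zfs list -o name,used,available tank/share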
 
Hmm... I guess I didn't make my point clear. Here is another (better) example:

10 GB of space is provided by the ZFS server to 2 Windows clients C and D using CIFS/NFS/iSCSI
->
The client C writes a 6GB file A
->
The client C deletes the 6GB file A
->
The client D writes another 6GB file B
->
The client D deletes the 6GB file B
->
What is the space occupied on the server?

This question arises because I'm currently having a storage space problem with my ESXi server. It hosts several Win2008R2 guests installed on NTFS partitions/disks encapsulated in thin-provisioned VMDKs. Sometimes large temp files are created and deleted in the guests' NTFS, but the space is not released on VMFS, and the VMDKs can only grow steadily even though the NTFS space utilization is relatively stable. I did not expect this when I first planned the storage of this ESXi host, so I'd like to avoid the same situation when planning a storage server to serve this ESXi server and my desktop.

Hopefully this can provide a better context.
 
VMDKs don't shrink automatically.

You're confusing a thin-provisioned filesystem (block device) on ZFS with how VMDKs work.
 
TRIM is only necessary for SSDs.

The TRIM function is how an OS filesystem tells the SSD which blocks no longer contain valid data, so the SSD can erase those cells in the background while they are not being used.

This operation only benefits write operations on the SSD.

Think of an SSD as a chalkboard.

You can write to the chalkboard quite nicely if it is blank. BUT, if the chalkboard already contains writing you must take time to erase the existing writing or else you will have a big mess on your hands.

So TRIM is like having a person who erases the chalkboard for you between lectures; it saves you time because you can just concentrate on writing.

Either way the erasure must occur before writing; TRIM is just a neat scheduling mechanism that does it more efficiently.

Magnetic hard drives do not have this problem because individual blocks do not need to be erased before writing. Therefore TRIM is unnecessary for them.
 
TRIM could in principle work for VMDKs, but there is no support for it yet, so it won't work over NFS/CIFS.

I am not sure about the FreeBSD/Linux iSCSI targets, but the COMSTAR iSCSI target in illumos-based OSes (OpenIndiana) has supported SCSI UNMAP (the SCSI equivalent of TRIM) for about a year now. This works fine after turning on discard support in Linux. Windows 7 only has TRIM support and no SCSI UNMAP support. Windows 8 and Windows Server 2012 support SCSI UNMAP and should work fine with it, though I haven't personally tested it.

But again, I don't know of any virtualization software that supports TRIM/UNMAP, so this will only work correctly when the target is mapped directly to the machine, be it virtual or physical. I personally do this to support diskless workstations, and it works well.
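
For reference, here is a minimal sketch of what enabling discard looks like on a Linux initiator; the device name and mount point below are just placeholders:

Code:
# mount the iSCSI-backed filesystem with inline discard (ext4/XFS)
mount -o discard /dev/sdb1 /mnt/iscsi
# or skip inline discard and trim on demand / on a schedule instead
fstrim -v /mnt/iscsi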
 
Thanks for the information! I suppose TRIM is to SATA what UNMAP is to SCSI?

So if I expose an iSCSI target to a Win2008R2 guest directly, will it work?

If I expose NFS/CIFS to a Win2008R2 guest, does this problem exist? (I have a feeling it might not, because it's file-level.)
 
At the file level it doesn't matter, since it's file based :)

But exposing an iSCSI target to Win2008R2 won't do it; there is no UNMAP support in Win7/2008R2.

If you expose it to Win8 or Win2012, it is supposed to work.
 
Got you, thanks!
It's just that I don't feel good about upgrading from Win2008R2 to Win2012... a Metro touch interface for servers that don't even have a screen???

I'll make another post exploring the NFS/CIFS route.
 
Sorry for responding to an old thread, but it may help to understand why TRIM/UNMAP/discard are not valid operations for file-sharing (NAS) protocols like NFS/CIFS.

Or, to look at it another way, with NAS protocols the server side manages the blocks, so NFS/CIFS get TRIM support automagically as long as the server OS can do it (which they all can now).
 
I think it's good to update a thread with relevant information. This thread is currently the top search result on Google for "TRIM NFS" and "TRIM CIFS" and the second result for "TRIM ISCSI", so you are being a good forum citizen to people who land here from Google.

To address your point, you are absolutely correct. Network protocols don't deal with block devices directly; the only filesystem that needs to be concerned with TRIM is the filesystem on the actual block device.
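
As an illustration of "the server OS can do it": on a ZFS server, for instance, trimming of the underlying devices is handled entirely at the pool level. This assumes a recent OpenZFS release with TRIM support, and "tank" is just a placeholder pool name:

Code:
zpool set autotrim=on tank   # issue discards to the vdevs automatically
zpool trim tank              # or run a one-off manual trim
zpool status -t tank         # show per-vdev trim status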
 
I don't know about CIFS, but NFS supports sparse files, which is as close to TRIM as you can get for NFS. ESXi supports this by default.
 
The point of the recent post bumping the thread was that such network protocols do ultimately work with TRIM. That's because the network protocol itself is agnostic of the block device and filesystem, and modifies files locally through normal file operations. For example, a delete over NFS translates to an unlink locally. Thus if the filesystem and underlying block device support TRIM, deletions will issue the correct TRIM notifications to the block device.

One major caveat is that passing discards through intermediate block layers, such as device mapper or software RAID, is currently not supported. Meaning in most cases, unless your shared folder resides on a single physical block device, TRIM still doesn't work.
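
On Linux you can check how far down the stack discards actually reach; this is a minimal check with no arguments needed:

Code:
lsblk --discard
# DISC-GRAN / DISC-MAX of 0 on a device means that layer does not pass discards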
 
I am not talking about file deletions on NFS; obviously those would equate to a lower-level TRIM operation, but that is NOT equivalent to a TRIM over NFS.

To compare TRIM over NFS/CIFS/iSCSI, you would have to match the results.

In this case, you can punch holes in files over NFS to simulate TRIM support, so a file that was 10 gigs might now only use 8 gigs of space but still appear as a 10 gig file.

This is what happens with iSCSI, FC, or SAS when you use UNMAP/TRIM: the resulting space used is less than the logical space allocated.
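
For the curious, hole punching is an ordinary file operation on the local side; here is a hypothetical Linux example (the file name and sizes are made up, and the underlying filesystem must support hole punching):

Code:
# punch a 2 GiB hole into an existing file without changing its apparent size
fallocate --punch-hole --offset 4GiB --length 2GiB disk.img
ls -lh disk.img   # still shows the full logical size
du -h disk.img    # shows the smaller physical allocation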

Now the second issue: does this penetrate down the layers of the stack? Currently that is pretty much never the case.

If you issue a TRIM command inside a VM, you MIGHT (though it's not often supported yet) recover space on the host system.

If you issue TRIM from ESXi, or from Windows 8, 2012, or Linux, you can recover space on an iSCSI-mounted LUN, freeing up space the virtual disk would have used.

Then if you go back to whatever is hosting that LUN, it is very unlikely that the space freed up over iSCSI would in turn result in a TRIM or UNMAP command to its disks, even if they were SSDs or other devices that supported it.

Note that, as stated when the thread started, Windows 7 and 2008R2 only support TRIM, so the only devices they can issue it to are SATA devices. It will not work over SCSI, SAS, iSCSI, or FC.

If you want Windows to issue UNMAP commands over iSCSI you need Windows 8 or 2012.
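
If you want to confirm whether a given Windows box is even issuing delete notifications (the mechanism behind both TRIM and UNMAP), a quick check is:

Code:
fsutil behavior query DisableDeleteNotify
rem DisableDeleteNotify = 0 means delete notifications (TRIM/UNMAP) are enabled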
 
Sparse files are a much bigger issue in the virtualization world, not so much otherwise. Like you said though, having the deallocations propagate through the various layers of abstraction is a big problem currently.
 
Now that I have explored the virtualization world much more deeply, and thanks to this recent bump of the thread, I can finally come back to this with some useful information.

My original question 1.5 years ago was not very well posed. As @patrickdk mentioned, the CIFS/NFS/iSCSI protocols are not of the same type. Virtualization adds another layer to the already complicated question of TRIM, underlying devices, and protocols. However, the intention is clear: to save space.

So here are the possible situations:

1. The VM accesses CIFS/NFS files directly from a NAS server. In this case virtualization does not play a role. Since these are file-level protocols, the most a VM can do is use sparse files, which CIFS and NFS both support, but it cannot issue TRIM/UNMAP. Thus the NAS server is responsible for issuing TRIM/UNMAP to the underlying device when the need arises.

2. The VM mounts iSCSI devices directly from a SAN server. In this case virtualization does not play a role either. The VM can issue TRIM/UNMAP commands to the iSCSI devices (depending on the OS version, as stated in the previous posts). Whether these commands are honored by the SAN server depends on its configuration and how the iSCSI devices are stored on it; e.g. if the iSCSI device is thick-provisioned then it may not need to honor them.

3. The VM mounts VMDK disks stored in an ESXi local datastore. This is the simplest case where virtualization is involved. Historically a VMDK can only grow, not shrink. Thus if the VM issues TRIM/UNMAP to the VMDK disk, its size will not shrink. It may even grow, since more is happening to the disk and that extra information has to be faithfully preserved in the VMDK file.

4. The VM mounts VMDK disks stored in an iSCSI datastore from a SAN server. This is more complicated, but the situation is similar: TRIM/UNMAP from the VM terminates at the VMDK layer and is not passed to the iSCSI devices containing the datastore.

5. The VM mounts VMDK disks stored in CIFS/NFS shares from a NAS server. This is actually a little bit simpler because, although we still have the issue of TRIM/UNMAP from the VM terminating at the VMDK layer, ESXi already uses sparse files for some VMDK disk files.


To address the issue of passing the VM's TRIM/UNMAP down to VMDK files, the solution is hole punching, which is a manual process. First, all unused space in the file system inside the VMDK must be zeroed out so ESXi can recognize it, and these zeros get written into the VMDK. This can be accomplished by using, for example, sdelete (http://technet.microsoft.com/en-us/sysinternals/bb897443.aspx) inside the VM:

Code:
sdelete -z


After this stage the thin-provisioned VMDK will grow to almost its maximum provisioned size, recording all the zeros. Then a hole punching command can be issued inside ESXi:

Code:
vmkfstools -K /path/to/thin-provisioned.vmdk

At the end the VMDK file will shrink to almost the same size as what the guest file system actually uses.
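
To verify the result you can compare the provisioned size with the actual allocation of the thin disk's data extent on the datastore (the path below is a placeholder; the data extent is normally the "-flat.vmdk" file):

Code:
ls -lh /vmfs/volumes/DATASTORENAME/vmname/vmname-flat.vmdk   # provisioned size
du -h /vmfs/volumes/DATASTORENAME/vmname/vmname-flat.vmdk    # blocks actually allocated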

This process, besides being manual, also has the limitation of not being applicable to delta disks (VMDK linked clones, ending in "-delta.vmdk") but only to base disks (VMDK files ending with "-thin.vmdk"). Thus, if you use sdelete to zero out a VMDK in a linked clone, the delta disk will irreversibly grow to its maximum size and there is no way to shrink it. So don't do this.


With the recent introduction of vSphere 5.1 (and 5.5u1, etc.), a new type of VMDK was introduced: SE Sparse disks. They are conceptually similar to thin-provisioned disks but have the additional ability to shrink on their own when coordinated with the VMware Tools inside the VM. To create and shrink SE Sparse disks refer here:

http://www.virtuallyghetto.com/2012/09/how-to-create-se-sparse-space-efficient.html

http://www.virtuallyghetto.com/2012/09/how-to-initiate-wipe-shrink-operation.html

I view this as a more automated and efficient way of hole punching, with the additional benefit of also working with delta disks (SE Sparse disks chained from an SE Sparse base disk, both ending in "-sesparse.vmdk"). However it does have some drawbacks, the first being that you need vSphere 5.1+ to create them and vCenter Server (which is not free) to shrink them. In addition, it does not use the OS-native TRIM/UNMAP either, but instead analyzes the file system independently, same as sdelete. As of now this is still a manual process, closely resembling the two-stage hole-punching process. Also note that this is not officially supported outside of VMware View.


To address the issue of ESXi issuing TRIM/UNMAP to a block-based datastore (local or iSCSI disk), this command can be issued from ESXi:

Code:
esxcli storage vmfs unmap --volume-label=DATASTORENAME

To see whether this is supported, run these:

Code:
esxcli storage core device list

esxcli storage core device vaai status get -d <device id>

After running this, the unused space occupied by previously shrunk VMDK files, deleted VMDK files, and other datastore files will be TRIM/UNMAP'ed. This matters primarily for performance on a local SSD datastore and for space saving on an iSCSI datastore (if the SAN server honors it, as said previously). Note that this can cause a heavy performance impact on the datastore, so use it with caution. This is also the reason the command has to be run manually rather than automatically (as of 5.1u1).


Regarding case 5 above, where VMDK files are stored in CIFS/NFS shares: if properly configured and supported, ESXi will use sparse files for thin-provisioned disks ("-thin.vmdk") and SE Sparse disks ("-sesparse.vmdk"), but not for thin-provisioned delta disks ("-delta.vmdk").


There are still unexplored areas, such as analyzing this from a performance perspective (primarily for SSDs) and the support of TRIM/UNMAP for ZFS-backed iSCSI.
 
I never liked VMware's decision to make this a manual process, and the new method in 5.5 is much more resource intensive than it was in 5.1.

In 5.1, I let ESXi unmap in real time and didn't have an issue with it. When I upgraded to 5.1u1 I think that ability was lost, so I scheduled it weekly.

In 5.5, they redid how it works, and it's very intensive: it chews up about 8 hours at 200 MB/sec when it processes. 5.1 never used that much bandwidth to do its unmaps.

In Linux, I just let it unmap in real time as well.

I think the real issue people were having is that they were using these large NetApp/EMC/etc. systems with allocated LUNs, but the underlying disk space was real spinning disks. So it took those systems a while to calculate and mark all that space.

In my case, the underlying systems are all SSDs, and I have never seen that performance impact. Ubuntu reported similar findings, and they created a trim job that runs on a schedule instead of doing it in real time for desktop/laptop users with SSDs. There I also haven't noticed the issue.
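
For reference, the scheduled approach on the Linux side boils down to a periodic fstrim; a minimal sketch of such a job (the cron path is just an example, and Ubuntu ships its own variant of this script):

Code:
#!/bin/sh
# e.g. dropped into /etc/cron.weekly/ to trim all mounted filesystems that support it
fstrim --all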
 
FYI, NFS cannot pass TRIM unless you're using NFS v4.2 on both server _and_ client.

RHEL 7.4 and newer have official NFS v4.2 support.

I'm unsure about FreeBSD as of this writing.
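
For anyone testing this on Linux, forcing the v4.2 protocol on the client is just a mount option (the server name and export path are placeholders); v4.2 is the version that adds the ALLOCATE/DEALLOCATE operations that let hole punching pass through:

Code:
mount -t nfs -o vers=4.2 server:/export /mnt/share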
 