hp-ux networking
august 2003

forcibly unmounting nfs filesystems
technical white paper

By: Dave Olker, SNSL Advanced Technology Center
Contributor: Randy Saum, Global Solutions Engineering

table of contents
introduction
problem statement
other vendors’ solution
hp’s future direction
available solutions
    wait and try umount(1M) again
    kill any processes accessing the nfs filesystem
    setup a temporary “surrogate” nfs server
preventative steps
summary
for more information

inst
introduction

Network File System (NFS) is an industry-standard file-sharing protocol that allows the filesystems residing on one system (the server) to be accessed from other systems (clients) across a network. When both the client and server systems are configured properly and performing well, clients can use remote filesystem resources as if they resided on locally mounted disks.
problem statement

Most attempts at unmounting hung NFS filesystems from an unavailable server fail with an error such as “Device busy.” This typically occurs when client-side processes were referencing NFS files or directories on the server at the time it went down, or when client-side processes continuously attempt to access NFS resources on the down server.
other vendors’ solution

Several NFS vendors, including Sun Microsystems, have added support for a forcible unmount feature to their operating systems. Sun introduced a new “-f” option to the umount(1M) command in Solaris 8, which instructs the client to forcibly unmount the filesystem regardless of whether any processes are accessing it.
available solutions

While HP will not offer an officially supported forcible unmount solution for some time, there are several steps that you can take today to work around this problem. All of the solutions documented in this section should work on currently supported HP-UX 11.0 or 11i systems.
4. Use the fuser(1M) command with the “-k” option to kill the processes holding files open in the target filesystem. (See important fuser(1M) syntax note below.)
5. Issue the umount(1M) command again to successfully unmount the NFS filesystem now that all processes holding files open in the filesystem have been killed.
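Steps 4 and 5 above can be sketched as a small script. This is only an illustration under stated assumptions: the helper names (run_command, kill_and_unmount) are hypothetical, and the exact argument that fuser(1M) expects for an NFS filesystem is deliberately left as a parameter, since it is governed by the syntax note this paper refers to. Only the “-k” option and the plain umount(1M) retry come from the steps themselves.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argument vector and returns the command's exit status.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run argv and return its exit status (0 conventionally means success)."""
    return subprocess.run(list(argv)).returncode


def kill_and_unmount(fuser_target: str, mount_point: str,
                     run: Runner = run_command) -> bool:
    """Hypothetical helper mirroring steps 4 and 5.

    fuser_target is whatever name fuser(1M) should be given for the
    filesystem (an assumption left to the caller); mount_point is the
    local directory to unmount.
    """
    # Step 4: ask fuser to kill the processes holding files open.
    # Its exit status is ignored here; the umount result is what matters.
    run(["fuser", "-k", fuser_target])
    # Step 5: try the unmount again and report whether it succeeded.
    return run(["umount", mount_point]) == 0
```

The runner is injectable so the command sequence can be reviewed or logged before anything is actually killed. If the processes are sleeping uninterruptibly, the function simply returns False and the filesystem remains mounted.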
setup a temporary “surrogate” nfs server

In the example shown in Figure 1 the fuser(1M) command was able to kill the processes holding the NFS filesystem busy, and thereby allow the filesystem to be unmounted. However, in many situations the processes accessing a down NFS filesystem are sleeping at an uninterruptible level in the kernel and are unable to receive signals, such as SIGKILL or SIGTERM.
Figure 3 - Setup a “surrogate” NFS server (part 2)

The following steps were used in the above example to simulate an NFS filesystem hang scenario and then to create a “surrogate” NFS server:

1. The bdf(1M) command shows the mounted NFS filesystems.
2. The dd(1) command is used to generate file I/O in the target NFS filesystem. In this example, the dd(1) command is writing to a file in the NFS filesystem.
3.
At first glance this fuser(1M) output doesn’t seem accurate since we know that the dd(1) command launched in step #2 is referencing a file in the target filesystem.
Depending upon the design of the application, most processes, upon receiving an ESTALE error, will give up attempting to contact the NFS server and will either exit on their own or will transition to a state where they can be successfully killed. In this example, the dd(1) application returned an error and exited.
10. Even though the dd(1) command has exited, the filesystem cannot be immediately unmounted.
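The “give up on ESTALE” behavior described above can be illustrated from the application's side. The sketch below is not how dd(1) is implemented; write_until_stale is a hypothetical helper, and the injectable write function exists only so the logic can be exercised without a real stale file handle.

```python
import errno
import os
from typing import Callable, Iterable, Tuple


def write_until_stale(fd: int, chunks: Iterable[bytes],
                      write: Callable[[int, bytes], int] = os.write
                      ) -> Tuple[int, bool]:
    """Write chunks to fd and stop cleanly if the file handle goes stale.

    Returns (bytes_written, stale). Short writes are counted but not
    retried, to keep the sketch small.
    """
    written = 0
    for chunk in chunks:
        try:
            written += write(fd, chunk)
        except OSError as exc:
            if exc.errno == errno.ESTALE:
                # The server no longer recognizes the file handle:
                # stop issuing requests instead of retrying forever.
                return written, True
            raise  # any other error is not handled here
    return written, False
```

A caller would typically log the stale condition, close the descriptor, and exit, which releases the process's references to the filesystem.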
preventative steps

As discussed earlier in this paper, there are many reasons why an NFS filesystem may at some point transition to a state where it cannot be unmounted, such as when processes are holding resources open on the filesystem and cannot be killed. There are also times when it appears no processes are holding any NFS filesystem resources and yet the filesystem is still considered “busy.”
use the “soft” NFS mount option

By default, NFS filesystems are mounted with the “hard” option, which instructs the client’s kernel to retransmit indefinitely any NFS request that the NFS server does not answer.
important note on the “soft” NFS mount option

Use of the “soft” option on NFS filesystems mounted for read/write access can be dangerous if your applications are not designed to gracefully handle receiving a timeout error for operations such as read() or write().
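One way an application might handle such timeouts gracefully is to retry the failed operation a bounded number of times and then report the failure, rather than assuming the I/O succeeded. The sketch below rests on assumptions that are not stated in this paper: the name retry_on_timeout is hypothetical, and the set of errno values treated as a timeout (ETIMEDOUT and EIO) varies by platform and should be checked against the client's documentation.

```python
import errno
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: a soft-mount timeout surfaces as one of these errno values.
TIMEOUT_ERRNOS = frozenset({errno.ETIMEDOUT, errno.EIO})


def retry_on_timeout(op: Callable[[], T], attempts: int = 3,
                     delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(), retrying only on timeout-style OSErrors.

    Re-raises the last error once the attempts are exhausted, and
    re-raises any non-timeout error immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except OSError as exc:
            if exc.errno not in TIMEOUT_ERRNOS or attempt == attempts:
                raise
            sleep(delay)  # pause before the next attempt
    raise AssertionError("unreachable")
```

Whether retrying is safe depends on the operation. A write at a fixed offset can usually be repeated, whereas an append or a non-idempotent update may need application-level checks before it is reissued.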
The Highly Available NFS component of MC/ServiceGuard is a toolkit that enables you to create NFS packages that run on highly available servers. With MC/ServiceGuard NFS, an NFS server package containing exported filesystems can move from one node (the primary node) to a different node (the adoptive node) in the cluster in the event of failure.
summary

There are many times when an NFS filesystem will be in a state where it cannot be unmounted. In some situations these filesystems are considered “busy” for legitimate reasons – such as when processes are actively accessing the filesystem or when one or more processes are holding buffer cache memory resources that reference the filesystem.