Forcibly Unmounting NFS Filesystems
available solutions
At first glance this fuser(1M) output doesn’t seem accurate since we know that the
dd(1) command launched in step #2 is referencing a file in the target filesystem.
Why then does fuser(1M) not report the dd(1) process as having open files? The
reason for this apparent discrepancy is that the outstanding NFS write requests
for this file are queued in the client’s buffer cache memory waiting to be written
to the server, so the server’s file is not technically “open” at this point. However,
these buffer cache pages count against the client’s overall usage of the NFS
filesystem, so it is these buffer cache pages that are keeping the filesystem busy.
Even if fuser(1M) had been able to identify the dd(1) process as the one holding
the target NFS filesystem busy, it would not have been able to kill this process.
dd(1) was in the middle of performing file I/O operations when the server crashed,
so it would be sleeping at an uninterruptible level in the kernel, unable to
receive signals like SIGKILL and SIGTERM. Even manually sending a “kill -9” to
this process would have no effect, given the state the process was in.
At this point we have a client NFS filesystem that is hung and cannot be
unmounted until it gets a response from the NFS server. Since the original server
is unavailable and presumably cannot be restored in a timely manner, the
alternative solution is to set up a “surrogate” NFS server.
6. The first step in creating a “surrogate” server is to determine the IP address of the
down NFS server. In this example, the nslookup(1) command is used to retrieve
this information.
7. Now a suitable replacement system must be found. The system needs to be
running NFS server daemons (i.e. nfsds). In this example, we will use the NFS
client system, which happens to be running nfsds, as the “surrogate” server.
Before configuring the server’s IP address on this system, we first need to
determine which IP interface to plumb the server’s address to. The netstat(1)
command is used to display the configured IP interfaces on the surrogate system.
This output shows that the system has three IP interfaces: lan3, lan0,
and lo0. The lan0 interface is connected to the 15.43.208.0 subnet, lan3 is
connected to the 192.1.1.0 network, and lo0 is the loopback interface. Since
the down server’s IP address is a member of the 192.1.1.X network, lan3 is the
appropriate interface on this system to plumb this address to.
8. The ifconfig(1M) command is used to add the server’s IP address to the client’s
lan3 interface.
9. Almost immediately the dd(1) command reports an “I/O error” and exits. This is
expected behavior since the temporary NFS server (i.e. the client in our example)
is not exporting the same filesystems as the original server, so the NFS requests
for the original target file will be considered “stale” and will be responded to
with an ESTALE error. This ESTALE error indicates to the client dd(1) process that
the file it was referencing no longer exists on the responding server.
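
The interface choice made in step 7 comes down to a subnet-membership test: the down server's address is compared against the network each interface is attached to. The following Python sketch illustrates that reasoning only and is not part of the procedure itself. The interface names and network numbers come from the text, but the prefix lengths, the sample address 192.1.1.50, and the function name pick_interface are assumptions made for illustration.

```python
import ipaddress

# Hypothetical interface table modelled on the netstat(1) output described in
# step 7. The network numbers for lan3 and lan0 come from the text; the /24
# prefix lengths are assumptions, since the text does not give the netmasks.
INTERFACES = {
    "lan3": ipaddress.ip_network("192.1.1.0/24"),
    "lan0": ipaddress.ip_network("15.43.208.0/24"),
    "lo0": ipaddress.ip_network("127.0.0.0/8"),
}


def pick_interface(server_ip, interfaces):
    """Return the first interface whose network contains server_ip, or None."""
    address = ipaddress.ip_address(server_ip)
    for name, network in interfaces.items():
        if address in network:
            return name
    return None


# 192.1.1.50 is a made-up stand-in for the down server's address, which the
# text gives only as a member of the 192.1.1.X network.
print(pick_interface("192.1.1.50", INTERFACES))
```

If no interface matches, the function returns None, which corresponds to a candidate system that has no connection to the down server's network and is therefore not a suitable surrogate.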