Forcibly Unmounting NFS Filesystems

available solutions

At first glance this fuser(1M) output does not seem accurate, since we know that the dd(1) command launched in step #2 is referencing a file in the target filesystem. Why then does fuser(1M) not report the dd(1) process as having open files? The reason for this apparent discrepancy is that the outstanding NFS write requests for this file are queued in the client’s buffer cache memory, waiting to be written to the server, so the server’s file is not technically “open” at this point. However, these buffer cache pages count against the client’s overall usage of the NFS filesystem, so it is these buffer cache pages that are keeping the filesystem busy.

Even if fuser(1M) had been able to identify the dd(1) process as the one holding the target NFS filesystem busy, it would not have been able to kill this process. dd(1) was in the middle of performing file I/O when the server crashed, so it is sleeping at an uninterruptible level in the kernel, where it cannot receive signals such as SIGKILL and SIGTERM. Even manually sending a “kill -9” to this process would have no effect while the process remains in that state.

At this point we have a client NFS filesystem that is hung and cannot be unmounted until it gets a response from the NFS server. Since the original server is unavailable and presumably cannot be restored in a timely manner, the alternative solution is to set up a “surrogate” NFS server.

6. The first step in creating a “surrogate” server is to determine the IP address of the down NFS server. In this example, the nslookup(1) command is used to retrieve this information.

7. Now a suitable replacement system must be found. The system needs to be running NFS server daemons (i.e., nfsds). In this example, we will use the NFS client system, which happens to be running nfsds, as the “surrogate” server. Before configuring the server’s IP address on this system, we first need to determine which IP interface to plumb the server’s address to. The netstat(1) command is used to display the configured IP interfaces on the surrogate system. Examining this output, it appears this system has three IP interfaces: lan3, lan0, and lo0. The lan0 interface is connected to the 15.43.208.0 subnet, lan3 is connected to the 192.1.1.0 network, and lo0 is the loopback interface. Since the down server’s IP address is a member of the 192.1.1.X network, lan3 is the appropriate interface on this system to plumb this address to.
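
The interface choice in step 7 comes down to a netmask comparison: the server's address must fall inside the subnet of the chosen interface. The following is a minimal sketch of that check in POSIX shell, not a command taken from the original procedure. The function names (ip_to_int, in_subnet, pick_interface), the host address 192.1.1.50, and all three netmasks are hypothetical values chosen for illustration; only the network numbers and interface names come from the text above.

```shell
#!/bin/sh
# Sketch only: helper names, the host address, and the netmasks below are
# illustrative assumptions, not values taken from the original example.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) if address $1 lies in network $2 with netmask $3.
in_subnet() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Print the first interface whose subnet contains address $1.
# The remaining arguments are "name:network:netmask" triples.
pick_interface() {
    target=$1
    shift
    for entry in "$@"; do
        name=${entry%%:*}
        rest=${entry#*:}
        network=${rest%%:*}
        netmask=${rest#*:}
        if in_subnet "$target" "$network" "$netmask"; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

# Hypothetical server address on the 192.1.1.X network; netmasks are assumed.
pick_interface 192.1.1.50 \
    lan3:192.1.1.0:255.255.255.0 \
    lan0:15.43.208.0:255.255.255.0 \
    lo0:127.0.0.0:255.0.0.0
```

On a real system the network and netmask values would be read from the netstat(1) and ifconfig(1M) output for each interface rather than typed in by hand.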

8. The ifconfig(1M) command is used to add the server’s IP address to the client’s lan3 interface.
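
The exact ifconfig(1M) invocation for step 8 is not reproduced in the text, so the following is only a sketch of what it might look like. It assumes the address is added as a secondary (logical) instance, written lan3:1, and that the netmask is 255.255.255.0; the address 192.1.1.50 and the helper name surrogate_ifconfig_cmd are hypothetical. The helper only prints the command so that it can be reviewed before being run as root on the surrogate system.

```shell
#!/bin/sh
# Sketch only: the logical instance number, address, and netmask are
# assumptions; substitute the values found in steps 6 and 7.

# Build (but do not run) an ifconfig command that adds a secondary address.
#   $1 = interface   $2 = logical instance   $3 = address   $4 = netmask
surrogate_ifconfig_cmd() {
    echo "ifconfig $1:$2 $3 netmask $4 up"
}

SERVER_IP=192.1.1.50     # hypothetical address of the down NFS server
cmd=$(surrogate_ifconfig_cmd lan3 1 "$SERVER_IP" 255.255.255.0)

# Show the command for review instead of executing it.
echo "$cmd"
```

Printing rather than executing is a deliberate choice here: bringing up another host's address on a live network should be a reviewed, manual action, and the address should be removed again once the hung filesystem has been unmounted.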

9. Almost immediately the dd(1) command reports an “I/O error” and exits. This is expected behavior: the temporary NFS server (i.e., the client in our example) is not exporting the same filesystems as the original server, so the NFS requests for the original target file are considered “stale” and are answered with an ESTALE error. This ESTALE error indicates to the client dd(1) process that the file it was referencing no longer exists on the responding server.