Subject: Problem with nfs server reboot causing exports to jumble


We're using the 2.6.18 x86_64 kernel (RH SL 5.1) with an NFS
v3 server and NFS v3 clients using the automounter with options
(rw,noatime,hard,intr,rsize=32768,wsize=32768,proto=tcp).

Our NFS server had several exports:

/export/a/{files-a}
/export/b/{files-b}

These were mounted on various clients.

We then added a new mount and export without rebooting:

/export/a/{files-a}
/export/b/{files-b}
/export/c/{files-c}

All were visible on the clients, and everything was fine:

/nfs/a/{files-a}
/nfs/b/{files-b}
/nfs/c/{files-c}

Our NFS server then crashed and rebooted. After the reboot the
clients had:

/nfs/a/{files-c}
/nfs/b/{files-a}
/nfs/c/{files-b}

Note that the files correspond to the wrong exports. Remounting
on every client fixed the problem - once we realized what
was going on! (files-a/b/c were all similar, so it wasn't
immediately apparent.)

My speculation is that the reboot changed some enumeration,
causing a mismatch between the clients and the server.

This is a new one on me - is this a known issue, and if so
is a patch available?

Other than patching, are there any precautions we can take
to avoid this in the future? (Other than not crashing the
server, which we'd be happy not to do. :)

Thanks


2010-02-25 20:06:07

by Ben Myers

Subject: Re: Problem with nfs server reboot causing exports to jumble

Wilson,

On Thu, Feb 25, 2010 at 02:29:10PM -0500, Wilson Snyder wrote:
> Other than patching, are there any precautions we can take
> to avoid this in the future? (Other than not crashing the
> server, which we'd be happy not to do. :)

Use the mp and fsid export options documented in the exports manpage.

-Ben

Subject: Re: Problem with nfs server reboot causing exports to jumble


>Use the mp and fsid export options documented in the exports manpage.

Reading between the lines, are you suggesting the mounts
weren't ready when the exports started (or from the
pre-reboot's mount), causing the fsid's to be identical? I
could see that happening.

I presume then that it's enough to just use mp - assuming
each mountpoint has a unique UUID? Wouldn't adding fsid now
(with the server and clients all up) require me to remount
every client?

Finally, is there a way to examine the present
(automatically assigned) fsid's so I can confirm this was
the problem?

Thanks much

2010-02-26 16:37:50

by Ben Myers

Subject: Re: Problem with nfs server reboot causing exports to jumble

Hey Wilson,

On Thu, Feb 25, 2010 at 03:33:36PM -0500, Wilson Snyder wrote:
> >Use the mp and fsid export options documented in the exports manpage.
>
> Reading between the lines, are you suggesting the mounts
> weren't ready when the exports started

I'm suggesting that it's a good idea to use the mp export option so that
you don't export your root filesystem on the off chance that the
filesystems you want to export aren't mounted yet.
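
As a rough illustration of the situation the mp option guards against,
here is a minimal Python sketch that reports which export directories
are not currently mount points. The helper name unmounted_exports is
hypothetical, and the path list is only assumed from the layout
described in this thread. It is a manual check, not a replacement for
the option itself.

```python
import os


def unmounted_exports(paths):
    """Return the export paths that are not currently mount points."""
    return [p for p in paths if not os.path.ismount(p)]


if __name__ == "__main__":
    # Assumed export directories, based on the layout described in the thread.
    exports = ["/export/a", "/export/b", "/export/c"]
    for path in unmounted_exports(exports):
        # Without mp, exporting such a directory would expose whatever
        # filesystem the directory itself happens to live on.
        print(f"warning: {path} is not a mount point")
```

Run on the server before (re)exporting; an empty result means every
listed directory has a filesystem mounted on it.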

> I presume then that it's enough to just use mp - assuming
> each mountpoint has a unique UUID?

Look into how the fsid that identifies each filesystem to the nfs client
is chosen by the nfs server. Is it possible that it changed across the
reboot? I don't know offhand. If it did change across a reboot, using
the fsid export option to explicitly set the fsid for each export might
prevent this from happening again.

> Wouldn't adding fsid now (with the server and clients all up) require
> me to remount every client?

Yep, I think so. Maybe you want to schedule some downtime.

> Finally, is there a way to examine the present
> (automatically assigned) fsid's so I can confirm this was
> the problem?

Maybe you can try ethereal (now called Wireshark) or something like that.
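
Short of a packet capture, one thing that can be inspected directly on
the server is the device number behind each exported directory. The
sketch below does not read the fsid that the server puts in its file
handles. It only prints major:minor numbers, which are relevant under
the assumption that the server derives its identifier from the device.
The helper name device_id and the path list are hypothetical.

```python
import os


def device_id(path):
    """Return (major, minor) of the device holding the filesystem at path."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev)


if __name__ == "__main__":
    # Assumed export directories, based on the layout described in the thread.
    for path in ["/export/a", "/export/b", "/export/c"]:
        try:
            major, minor = device_id(path)
            print(f"{path}: device {major}:{minor}")
        except OSError as err:
            print(f"{path}: {err}")
```

Recording this output before and after a reboot would show whether the
device numbers moved between exports.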

-Ben

2010-02-27 21:10:13

by NeilBrown

Subject: Re: Problem with nfs server reboot causing exports to jumble

On Thu, 25 Feb 2010 14:29:10 -0500 (EST)
Wilson Snyder wrote:

>
> We're using the 2.6.18 x86_64 kernel (RH SL 5.1) with a NFS
> v3 server and NFS v3 clients using the automounter with options
> (rw,noatime,hard,intr,rsize=32768,wsize=32768,proto=tcp).
>
> Our NFS server had several exports
>
> /export/a/{files-a}
> /export/b/{files-b}
>
> these were mounted on various clients.
>
> We then added a new mount and export without rebooting:
>
> /export/a/{files-a}
> /export/b/{files-b}
> /export/c/{files-c}
>
> All were visible on the clients, and everything was fine:
>
> /nfs/a/{files-a}
> /nfs/b/{files-b}
> /nfs/c/{files-c}
>
> Our NFS server then crashed, and rebooted. After reboot the
> clients then had:
>
> /nfs/a/{files-c}
> /nfs/b/{files-a}
> /nfs/c/{files-b}

There is a lot of useful information you haven't given us, such as a copy
of /etc/exports and /etc/fstab, and information about what hardware the
files are on, but at a guess:
files-a and files-b were on /dev/sdc and /dev/sdd.
You plugged in a new hard drive to store files-c. It was called
/dev/sde.
But after a reboot the devices were discovered in a different order,
and the one containing files-c is now called /dev/sdc,
files-a is on /dev/sdd and files-b is on /dev/sde.

As nfs in 2.6.18 uses the device id to identify a filesystem, the name of the
device that was mounted would be more important than the name of the place
where it was mounted on the server.
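
The guess above can be worked through with a toy model. This is only a
sketch: real NFS file handles are opaque binary data, and the device
names are the guessed ones, not confirmed facts about the system in
question. The function name resolve is hypothetical.

```python
# Toy model only: a "handle" here is just the device name the client was
# given at mount time, standing in for a device-id-based file handle.

def resolve(handles, device_contents):
    """Map each client mount to whatever now lives on the device its handle names."""
    return {mount: device_contents[dev] for mount, dev in handles.items()}


# Device naming at the time the clients mounted (per the guess above).
before = {"sdc": "files-a", "sdd": "files-b", "sde": "files-c"}

# Handles the clients obtained at mount time and kept across the server reboot.
handles = {"/nfs/a": "sdc", "/nfs/b": "sdd", "/nfs/c": "sde"}

# Device naming after the reboot discovered the disks in a different order.
after = {"sdc": "files-c", "sdd": "files-a", "sde": "files-b"}

print(resolve(handles, before))
# {'/nfs/a': 'files-a', '/nfs/b': 'files-b', '/nfs/c': 'files-c'}
print(resolve(handles, after))
# {'/nfs/a': 'files-c', '/nfs/b': 'files-a', '/nfs/c': 'files-b'}
```

The second result matches the jumbled layout reported by the clients,
which is why a remount (fetching fresh handles) cleared it up.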

In more recent kernels and nfs-utils, the uuid of the filesystem is used in
preference to the device id, so this problem is avoided.

On your system you can add "fsid=1", "fsid=2" etc (avoid fsid=0) to help
disambiguate.
When you do this, the nfs server will still honour the old device-id-based
file handles, but will only give out the fsid= filehandles for new mounts.
So it will not break your current clients, but they will need to remount
before they can benefit from the increased stability of fsid= exports.
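
To make that concrete, here is a small Python sketch with a made-up
/etc/exports embedded as a string, plus a check that every export
carries a distinct, non-zero fsid. The client pattern *.example.com,
the rw and mp options, and the helper name fsid_problems are all
assumptions for illustration. The parser is deliberately simplistic
(no line continuations or quoted paths).

```python
import re

# Hypothetical /etc/exports content; only the fsid= values follow the advice
# above, everything else is a placeholder.
SAMPLE_EXPORTS = """\
/export/a  *.example.com(rw,mp,fsid=1)
/export/b  *.example.com(rw,mp,fsid=2)
/export/c  *.example.com(rw,mp,fsid=3)
"""


def fsid_problems(exports_text):
    """Return a list of human-readable problems with fsid= usage."""
    problems = []
    seen = {}  # fsid value -> first export path that used it
    for line in exports_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        path = line.split()[0]
        fsids = sorted(set(re.findall(r"fsid=([^,)\s]+)", line)))
        if not fsids:
            problems.append(f"{path}: no fsid= option")
        for fsid in fsids:
            if fsid == "0":
                problems.append(f"{path}: fsid=0 should be avoided")
            elif fsid in seen and seen[fsid] != path:
                problems.append(f"{path}: fsid={fsid} already used by {seen[fsid]}")
            else:
                seen.setdefault(fsid, path)
    return problems


print(fsid_problems(SAMPLE_EXPORTS))  # [] for the sample above
```

Pointing fsid_problems at the text of the real /etc/exports would flag
exports that still rely on an automatically chosen identifier.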

NeilBrown
