2005-10-05 18:42:13

by Ed L. Cashin

Subject: block dev minor > 255 and exporting fs

Hi. I've noticed that an NFS mount times out when I export a
filesystem residing on a block device with a "large" minor number,
i.e. beyond the old limit of 255 from when there were only eight bits
for the minor number of devices.

If I use a block device with a lower minor number, things work as
expected, and if I "wrap" a high-numbered device in a trivial md set,
using /dev/md0 with its minor number of zero, things work as expected.

Without initial success I've looked at the kernel sources to see where
the nfs server might be using only eight of the twenty bits 2.6 uses
for minor numbers. Does anyone know where that might be occurring?

The nfs server in my tests is a debian testing machine running
2.6.12-1-amd64-generic, and the client is a debian stable system
running a custom 2.6.13-rc6 kernel, but I've seen this problem on
other systems a while ago. At that time I found out that 255 was the
magic minor number after which problems started occurring, if I recall
correctly. If you don't have block devices with high minor numbers to
test with, you can replicate this problem using the vblade:

http://sourceforge.net/projects/aoetools/

... and the aoe driver in any 2.6 kernel from 2.6.11 onward. Anyway, here
are the details for interested parties. The nfs server is "makki" and
the client is "kokone".

makki:/home/ecashin# modprobe aoe
makki:/home/ecashin# ls -l /dev/etherd/e2.1
brw-rw---- 1 root disk 152, 336 2005-10-05 08:24 /dev/etherd/e2.1
makki:/home/ecashin# mount /dev/etherd/e2.1 /mnt/aoe/e2.1
makki:/home/ecashin# grep aoe /etc/exports
/mnt/aoe/e2.1 *.coraid.com(rw,sync)
makki:/home/ecashin#

On the client, mount times out.

root@kokone root# mount -t nfs makki:/mnt/aoe/e2.1 /mnt/makki
mount: makki:/mnt/aoe/e2.1: can't read superblock
root@kokone root# tail /var/log/everything
...
Oct 5 12:27:16 kokone kernel: nfs: server makki not responding, timed out
Oct 5 12:27:37 kokone last message repeated 2 times
root@kokone root#

I can use a trivial one-device linear software RAID on the nfs server
so that nfs doesn't see the high minor device number. This is just
using a low-minor-number md device as a wrapper for the
high-minor-number aoe device.

makki:/home/ecashin# /etc/init.d/nfs-kernel-server stop && /etc/init.d/nfs-common stop
Stopping NFS kernel daemon: mountd nfsd.
Unexporting directories for NFS kernel daemon...done.
Stopping NFS common utilities: statd.
makki:/home/ecashin# umount /mnt/aoe/e2.1
makki:/home/ecashin# ls -l /dev/md0
brw-rw---- 1 root disk 9, 0 2005-10-05 08:40 /dev/md0
makki:/home/ecashin# mdadm -B --auto=md --force -l linear -n 1 /dev/md0 /dev/etherd/e2.1
mdadm: array /dev/md0 built and started.
makki:/home/ecashin# mount /dev/md0 /mnt/aoe/e2.1
makki:/home/ecashin# ls /mnt/aoe/e2.1
screen
makki:/home/ecashin# /etc/init.d/nfs-common start && /etc/init.d/nfs-kernel-server start
Starting NFS common utilities: statd.
Exporting directories for NFS kernel daemon...done.
Starting NFS kernel daemon: nfsd mountd.
makki:/home/ecashin#

Then on the client, all goes well:

root@kokone root# mount -t nfs makki:/mnt/aoe/e2.1 /mnt/makki
root@kokone root# ls /mnt/makki
screen
root@kokone root# umount /mnt/makki

So I have a nice workaround, but I would rather not need it. Things
go well *without* the md wrapper if the aoe device has a minor number
below 256. What part of the nfs server doesn't use all twenty bits
that 2.6 uses for the device minor number? I remember guessing that
it was a handle or tag used in the protocol, but that was a long time
ago.

makki:/home/ecashin# /etc/init.d/nfs-kernel-server stop && /etc/init.d/nfs-common stop
Stopping NFS kernel daemon: mountd nfsd.
Unexporting directories for NFS kernel daemon...done.
Stopping NFS common utilities: statd.
makki:/home/ecashin# umount /mnt/aoe/e2.1
makki:/home/ecashin# mdadm -S /dev/md0
makki:/home/ecashin# sync
makki:/home/ecashin# ls -l /dev/etherd/e0.0
brw-rw---- 1 root disk 152, 0 2005-10-05 08:49 /dev/etherd/e0.0
makki:/home/ecashin# mount /dev/etherd/e0.0 /mnt/aoe/e2.1
makki:/home/ecashin# /etc/init.d/nfs-common start && /etc/init.d/nfs-kernel-server start
Starting NFS common utilities: statd.
Exporting directories for NFS kernel daemon...done.
Starting NFS kernel daemon: nfsd mountd.
makki:/home/ecashin#

root@kokone root# mount -t nfs makki:/mnt/aoe/e2.1 /mnt/makki
root@kokone root# ls /mnt/makki
screen
root@kokone root#

--
Ed L Cashin <[email protected]>



-------------------------------------------------------
This SF.Net email is sponsored by:
Power Architecture Resource Center: Free content, downloads, discussions,
and more. http://solutions.newsforge.com/ibmarch.tmpl
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2005-10-06 06:33:28

by NeilBrown

Subject: Re: block dev minor > 255 and exporting fs

On Wednesday October 5, [email protected] wrote:
> Hi. I've noticed that an NFS mount times out when I export a
> filesystem residing on a block device with a "large" minor number,
> i.e. beyond the old limit of 255 from when there were only eight bits
> for the minor number of devices.

Hmmm. It definitely shouldn't do that, but I'm not surprised. knfsd
has some fairly old-fashioned ideas about how device numbers work.
I'll put it on my todo list to look at, but in the meantime there is
another workaround you could use that is slightly less awkward than
making an 'md' device.

If you add 'fsid=23', or some other number, as an option in
/etc/exports, then it will use that number rather than the device
number to identify the filesystem, and so it should work better.
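
Applied to the export line from the original report, that would look
something like the fragment below. The value 23 is arbitrary; as far as
the exports manpage goes, it only needs to be unique among the
filesystems the server exports.

```text
/mnt/aoe/e2.1 *.coraid.com(rw,sync,fsid=23)
```

After editing /etc/exports, the export table has to be reloaded, for
example with "exportfs -ra" or by restarting the NFS server.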

NeilBrown



2005-10-06 17:01:22

by Ed L. Cashin

Subject: Re: block dev minor > 255 and exporting fs

Neil Brown <[email protected]> writes:

> On Wednesday October 5, [email protected] wrote:
>> Hi. I've noticed that an NFS mount times out when I export a
>> filesystem residing on a block device with a "large" minor number,
>> i.e. beyond the old limit of 255 from when there were only eight bits
>> for the minor number of devices.
>
> Hmmm. It definitely shouldn't do that, but I'm not surprised. knfsd
> has some fairly old-fashioned ideas about how device numbers work.
> I'll put it on my todo list to look at,

Excellent! I'd like to hear about what you find.

...
> but in the meantime there is
> another workaround you could use that is slightly less awkward than
> making an 'md' device.
>
> If you add 'fsid=23', or some other number, as an option in
> /etc/exports, then it will use that number rather than the device
> number to identify the filesystem, and so it should work better.

Well I'll be! That works great and is indeed much more convenient
than using md. Thanks for the tip---I hadn't noticed fsid in the
exports manpage before.

--
Ed L Cashin <[email protected]>




2005-10-07 09:45:44

by Richard Hirst

Subject: Re: block dev minor > 255 and exporting fs

> Hi. I've noticed that an NFS mount times out when I export a
> filesystem residing on a block device with a "large" minor number,
> i.e. beyond the old limit of 255 from when there were only eight bits
> for the minor number of devices.

When I looked into this I decided the problem lay in userland, not
kernel land. Once you get to minor numbers greater than 255, this
kernel code:

+++ linux-2.6.10/fs/nfsd/nfsfh.c 2005-08-05 17:35:12.128552514 +0100
@@ -351,8 +351,13 @@

    if (!old_valid_dev(ex_dev) && ref_fh_fsid_type == 0) {
        /* for newer device numbers, we must use a newer fsid format */
        ref_fh_version = 1;
        ref_fh_fsid_type = 3;
    }


switches from using a type 0 fsid to a type 3 fsid.
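
For reference, the check that triggers that switch can be sketched in
user-space C. The macros below mirror the 2.6 device number layout as
found in include/linux/kdev_t.h (20 minor bits, and an "old" device
number being one whose major and minor are both below 256). The helper
needs_new_fsid() is a made-up name for this illustration and does not
exist in the kernel.

```c
#include <stdint.h>

/* 2.6 keeps the minor in the low 20 bits of the in-kernel dev_t. */
#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)
#define MAJOR(dev) ((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev) ((unsigned int)((dev) & MINORMASK))
#define MKDEV(ma, mi) ((((uint32_t)(ma)) << MINORBITS) | (mi))

/* True only if the device fits the old 8-bit major / 8-bit minor layout. */
static int old_valid_dev(uint32_t dev)
{
    return MAJOR(dev) < 256 && MINOR(dev) < 256;
}

/* Hypothetical helper: would knfsd move from fsid type 0 to type 3? */
static int needs_new_fsid(unsigned int major, unsigned int minor)
{
    return !old_valid_dev(MKDEV(major, minor));
}
```

With the devices from this thread, needs_new_fsid(152, 336) is true for
/dev/etherd/e2.1, while needs_new_fsid(152, 0) and needs_new_fsid(9, 0)
are false for e0.0 and md0, which matches the exports that worked.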

Then somewhere in mountd it reads that fsid and tries to interpret
it. Trouble is nfs-utils only understands fsid types 0 and 1. I'm
a bit vague about this; it was a while ago that I looked at it, but IIRC
the nfs-utils code was here:

nfs-utils-1.0.6/utils/mountd/cache.c around line 122:

    if (fsidtype < 0 || fsidtype > 1)
        goto out; /* unknown type */


Anyway, the fsid type 0 can actually handle up to 16 bits for major
and minor and 16 bits was enough for me, so I hacked my kernel to
use fsid type 0 for minors up to 64K.
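
To illustrate the 16-bit headroom: a type 0 fsid stores the device as
one 32-bit word, with the major in the upper 16 bits and the minor in
the lower 16, followed by the inode number. The sketch below is
user-space C with made-up names (fsid0_fits, fsid0_dev_word). It is not
the kernel patch described above, and it leaves out the byte-order
conversion the kernel applies to the word.

```c
#include <stdint.h>

/* Hypothetical check: do major and minor each fit in 16 bits? */
static int fsid0_fits(unsigned int major, unsigned int minor)
{
    return major <= 0xffff && minor <= 0xffff;
}

/* Hypothetical packer for the device word of a type 0 fsid:
 * major in the upper 16 bits, minor in the lower 16 bits. */
static uint32_t fsid0_dev_word(unsigned int major, unsigned int minor)
{
    return ((uint32_t)major << 16) | (uint32_t)minor;
}
```

For e2.1 (152, 336), fsid0_fits() returns 1 and the device word is
0x00980150, so the minor survives intact even though it is above 255.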

Obviously things might have moved on since I looked at those code
versions.

(I'm not subscribed, please CC me on replies)

Richard






2005-10-14 07:42:15

by NeilBrown

Subject: Re: block dev minor > 255 and exporting fs

On Friday October 7, [email protected] wrote:
> > Hi. I've noticed that an NFS mount times out when I export a
> > filesystem residing on a block device with a "large" minor number,
> > i.e. beyond the old limit of 255 from when there were only eight bits
> > for the minor number of devices.
>
> When I looked into this I decided the problem lay in userland, not
> kernel land. Once you get to minor numbers greater than 255, this
> kernel code:
>
> +++ linux-2.6.10/fs/nfsd/nfsfh.c 2005-08-05 17:35:12.128552514 +0100
> @@ -351,8 +351,13 @@
>
>     if (!old_valid_dev(ex_dev) && ref_fh_fsid_type == 0) {
>         /* for newer device numbers, we must use a newer fsid format */
>         ref_fh_version = 1;
>         ref_fh_fsid_type = 3;
>     }
>
>
> switches from using a type 0 fsid to a type 3 fsid.
>
> Then somewhere in mountd it reads that fsid and tries to interpret
> it. Trouble is nfs-utils only understands fsid types 0 and 1. I'm
> a bit vague about this; it was a while ago that I looked at it, but IIRC
> the nfs-utils code was here:
>
> nfs-utils-1.0.6/utils/mountd/cache.c around line 122:
>
>     if (fsidtype < 0 || fsidtype > 1)
>         goto out; /* unknown type */

Oh yes, user-land hasn't kept up with the kernel very well, has it?
I've just committed a change to the nfs-utils CVS which makes it
understand types 2 and 3.
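
Roughly, accepting the extra types means recognising their key lengths
as well as their type numbers. The sketch below is only an
illustration and not the committed nfs-utils change: the function name
fsid_key_bytes is invented here, and the lengths are taken from the
fsid type descriptions in the kernel's file handle header (type 0 and
type 3 are 8 bytes, type 1 is 4 bytes, type 2 is 12 bytes).

```c
/* Hypothetical helper: expected fsid length in bytes for each fsid type,
 * or -1 for a type this sketch does not know about. */
static int fsid_key_bytes(int fsidtype)
{
    switch (fsidtype) {
    case 0: return 8;   /* 16-bit major, 16-bit minor, 32-bit inode */
    case 1: return 4;   /* 32-bit user-specified fsid (fsid=N in exports) */
    case 2: return 12;  /* 32-bit major, 32-bit minor, 32-bit inode */
    case 3: return 8;   /* 32-bit encoded device number, 32-bit inode */
    default: return -1; /* unknown type */
    }
}
```

A caller in the style of the old check would then reject a request
when fsid_key_bytes(fsidtype) is negative or does not match the length
of the fsid it was handed.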

I guess it might be nearly time for a new nfs-utils release...

>
>
> Anyway, the fsid type 0 can actually handle up to 16 bits for major
> and minor and 16 bits was enough for me, so I hacked my kernel to
> use fsid type 0 for minors up to 64K.

Sounds like a sensible approach.

>
> Obviously things might have moved on since I looked at those code
> versions.

They hadn't, until now. Thanks.

NeilBrown

