2004-01-31 05:24:04

by Greg Banks

Subject: [PATCH] SGI 907674: document fsid export option

G'day,

[Resending as the previous attempt never appeared on the list]

This patch against nfs-utils 1.0.6 documents the fsid=num export option.

--- nfs-utils-sgi/nfs-utils-1.0.6/utils/exportfs/exports.man Thu Jan 29 13:52:34 2004
+++ nfs-utils-work/nfs-utils-1.0.6/utils/exportfs/exports.man Thu Jan 29 14:22:17 2004
@@ -271,6 +271,24 @@
then the nominted path must be a mountpoint for the exportpoint to be
exported.

+.TP
+.IR fsid= num
+This option forces the filesystem identification portion of the file
+handle and file attributes used on the wire to be
+.I num
+instead of a number derived from the major and minor number of the
+block device on which the filesystem is mounted. Any 32 bit number
+can be used, but it must be unique amongst all the exported filesystems.
+
+This can be useful for NFS failover, to ensure that both servers of
+the failover pair use the same NFS file handles for the shared filesystem
+thus avoiding stale file handles after failover.
+
+Some Linux filesystems are not mounted on a block device; exporting
+these via NFS requires the use of the
+.I fsid
+option (although that may still not be enough).
+
.SS User ID Mapping
.PP
.I nfsd
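
As a rough illustration of the failover paragraph in the patch above, the following Python sketch renders the same /etc/exports entries for both servers of a pair and refuses duplicate fsid values. The export paths, the client spec `*.example.com` and the helper name `render_exports` are invented for the example; only the `fsid=num` option itself comes from the patch.

```python
def render_exports(exports, clients="*.example.com", options="rw,sync"):
    """Render /etc/exports lines from a {path: fsid} mapping.

    Hypothetical helper: raises ValueError if two paths share an fsid,
    because the man page text requires fsids to be unique among exports.
    """
    seen = {}
    lines = []
    for path, fsid in sorted(exports.items()):
        if not 0 <= fsid < 2**32:
            raise ValueError(f"fsid {fsid} for {path} is not a 32 bit number")
        if fsid in seen:
            raise ValueError(f"fsid {fsid} used by both {seen[fsid]} and {path}")
        seen[fsid] = path
        lines.append(f"{path} {clients}({options},fsid={fsid})")
    return lines


# Both servers of a failover pair would install the identical output, so the
# file handles they hand out carry the same filesystem identifier.
shared = {"/export/home": 101, "/export/scratch": 102}
for line in render_exports(shared):
    print(line)
```

Generating the file from one shared mapping is only one way to keep the pair in sync; copying a hand-written /etc/exports works just as well.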

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


-------------------------------------------------------
The SF.Net email is sponsored by EclipseCon 2004
Premiere Conference on Open Tools Development and Integration
See the breadth of Eclipse activity. February 3-5 in Anaheim, CA.
http://www.eclipsecon.org/osdn
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-02-25 04:40:49

by Greg Banks

Subject: Re: [PATCH] SGI 907674: document fsid export option

Neil Brown wrote:
>
> On Wednesday February 25, [email protected] wrote:
> > Neil Brown wrote:
> >
> > > This interface is not needed in 2.6 and will be going away in 2.7, and
> > > the new interface (via text written into /proc ) doesn't have the 16
> > > bit limit.
> > >
> > > I think we should document it as a 32bit number, but note that only 16
> > > bits are significant in certain situations.

Ok, is this patch suitable?

--- nfs-utils-1.0.6/utils/exportfs/exports.man.orig Tue Feb 24 15:06:35 2004
+++ nfs-utils-1.0.6/utils/exportfs/exports.man Wed Feb 25 15:31:20 2004
@@ -278,7 +278,9 @@
.I num
instead of a number derived from the major and minor number of the
block device on which the filesystem is mounted. Any 32 bit number
-can be used, but it must be unique amongst all the exported filesystems.
+can be used (although only the lower 16 bits will be significant in
+certain situations), as long as the number is unique for each export
+point.

This can be useful for NFS failover, to ensure that both servers of
the failover pair use the same NFS file handles for the shared filesystem
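
To make the "only the lower 16 bits will be significant" caveat concrete, here is a small Python sketch that reports which fsid values in a list would become indistinguishable once the upper bits are dropped. The function name `colliding_fsids` and the sample values are assumptions made for the example.

```python
def colliding_fsids(fsids, significant_bits=16):
    """Group fsids that become equal once only the low bits are kept."""
    mask = (1 << significant_bits) - 1
    groups = {}
    for fsid in fsids:
        groups.setdefault(fsid & mask, []).append(fsid)
    # Keep only the groups in which more than one fsid landed.
    return {low: vals for low, vals in groups.items() if len(vals) > 1}


# 0x10001 and 0x20001 are distinct 32 bit numbers but share their low 16 bits.
print(colliding_fsids([0x10001, 0x20001, 0x0002]))  # {1: [65537, 131073]}
```

An admin who wants to be safe in every situation could run such a check with `significant_bits=16` and treat any reported group as a conflict.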


> > From my reading of nfs-utils last night it seems it still gets truncated
> > even with the new interface, because a struct nfsctl_export is used as
> > temporary storage.
>
> Hmm. That can and should be fixed. I will look at it.

FWIW my still-very-alpha patch addresses that as a side effect.

> An md5sum is 16 bytes: half an NFSv2 filehandle (a quarter of NFSv3, an eighth of NFSv4...)
> We would have to disallow it for NFSv2, but that probably isn't a big cost.

Agreed.

> We would have to use some hash of it for the fsid field in the GETATTR
> result, but I don't think that is a big deal.
>
> So that might be an option.
> I'd rather not use the pathname as filesystems can be mounted in
> different places. Most filesystems have some sort of UUID, but there
> is no standard way of extracting that information (and no guarantee
> that it is there).

Yes, the UUID solution would require per-fs plumbing, which makes it
unattractive.

> However an fsid really identifies an exportpoint, not a filesystem.
> There can be several exportpoints in one filesystem. So a UUID from
> the filesystem isn't always sufficient.

No, but a combination of UUID + path of the export from the root of the fs would be.

Perhaps the easiest thing to do is to use the first 8 bytes
of the md5 hash if the export options have "fsid=md5", and warn in syslog
if the resulting fsids aren't unique.
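
The "fsid=md5" idea could look roughly like the Python sketch below: take the leading 8 bytes of the md5 of the export path and complain when two exports end up with the same value. The function names, the big-endian byte order and the use of `logging.warning` in place of syslog are all assumptions; the thread does not specify them.

```python
import hashlib
import logging


def md5_fsid(export_path, nbytes=8):
    """Derive an fsid from the first nbytes of the md5 of the export path."""
    digest = hashlib.md5(export_path.encode("utf-8")).digest()
    return int.from_bytes(digest[:nbytes], "big")


def assign_fsids(export_paths, nbytes=8):
    """Map each path to its fsid, logging a warning on any collision."""
    first_owner = {}
    result = {}
    for path in export_paths:
        fsid = md5_fsid(path, nbytes)
        if fsid in first_owner:
            logging.warning("fsid %#x of %s collides with %s",
                            fsid, path, first_owner[fsid])
        first_owner.setdefault(fsid, path)
        result[path] = fsid
    return result
```

Calling `assign_fsids(["/export/a", "/export/b"])` returns one 64 bit number per path; with `nbytes=2` the same code models the 16 bit situation, where warnings become far more likely.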

BTW, I have another reason for not wanting to expand the fsid too much:
I need to expand the fileid part for XFS to handle 64-bit inode numbers.

> I guess that as the MOUNT protocol uses a path to identify an export
> point, we should be safe in identifying the one with the other in the
> fsid too.

Yep. Certainly it's safer than a device pair. We have two specific
cases here in the office where the device pair is unstable: loopback
mounts and SAN Fibre Channel disks. In both cases the export point
pathname is stable.

> > Yes. However I'd be much happier if the common cases were handled
> > completely automatically in exportfs or inside the kernel without any
> > further intervention being necessary.
>
> Definitely. But we need to know what the common cases are.
>
> The simplest approach, an md5sum of the filename, seems nice and general.
> However suppose someone wanted to export their cdrom which they always
> mounted at /mnt/cdrom (That desire has been mentioned on this list
> occasionally).

> When I change CDs should I get a different fsid, thereby forcing the
> client to unmount and remount (which seems reasonable)?

Yes, you need to have different fsids for different insertions of
removable media. I presume the current device number scheme is really
quite broken for cdroms ? (I've always been careful when doing that).

> That would
> require a different fsid for each cdrom.

Or incorporate a generation number into the fsid, e.g. by hashing
the path+mount generation. This would imply either the VFS or the
block device maintaining a mount generation, or a notification mechanism
back into NFS when the mount/umount happens.
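
A minimal sketch of the path+generation hashing, assuming some component (VFS, block device or a notifier) can supply a counter that increases on every mount. The counter is just an integer argument here, and the function name, separator byte and 4 byte width are illustrative choices, not anything proposed in the thread.

```python
import hashlib


def generation_fsid(export_path, mount_generation, nbytes=4):
    """Hash the export path together with a per-mount generation counter."""
    data = (export_path.encode("utf-8") + b"\0"
            + str(mount_generation).encode("ascii"))
    return int.from_bytes(hashlib.md5(data).digest()[:nbytes], "big")


# Same mount point, two different insertions of removable media. The two
# values will almost certainly differ, which is what makes old handles stale.
first_cd = generation_fsid("/mnt/cdrom", 1)
second_cd = generation_fsid("/mnt/cdrom", 2)
print(hex(first_cd), hex(second_cd))
```

Because the result is a truncated hash, uniqueness is still only probabilistic, which is the objection raised elsewhere in the thread against md5-derived fsids.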

Actually, that notification mechanism could also deal with the other
issue that happens when an export point gets mounted over.

> If an auto-mounter were used which chose a mount-point name based on a
> label in the CDrom, then we could just use a name-based fsid and put
> the burden onto the auto-mounter :-)

Ugh, Solaris vold rises again.

> It might even be nice to have a different fsid for the filesystem root
> than for the rest of the fs. Then if you change the filesystem
> mounted as the exportpoint, any filehandles into the content would go
> stale, but the mountpoint would still be accessible.

Sure, but I don't see how it helps to have one unstale filehandle
which doesn't let you get any unstale files from it.


Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.



2004-02-27 11:40:19

by Greg Banks

Subject: Re: [PATCH] SGI 907674: document fsid export option

Neil Brown wrote:
>
> On Wednesday February 25, [email protected] wrote:
> > Neil Brown wrote:
> >
> > > This interface is not needed in 2.6 and will be going away in 2.7, and
> > > the new interface (via text written into /proc ) doesn't have the 16
> > > bit limit.
> > >
> > > I think we should document it as a 32bit number, but note that only 16
> > > bits are significant in certain situations.
> >
> > From my reading of nfs-utils last night it seems it still gets truncated
> > even with the new interface, because a struct nfsctl_export is used as
> > temporary storage.
>
> Hmm. That can and should be fixed. I will look at it.



This patch against nfs-utils 1.0.6 enables 32 bit fsid when the new
/proc/fs/nfsd interface is being used. It even works with fsid > 2^31
(although both the kernel and nfs-utils print such fsids as negative,
they do work).
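
The negative display is just the same 32 bit pattern read as a signed integer. A short Python sketch of that reinterpretation follows; the helper name `as_signed32` is made up for the example.

```python
import struct


def as_signed32(fsid):
    """Reinterpret an unsigned 32 bit fsid as a signed 32 bit integer."""
    return struct.unpack("<i", struct.pack("<I", fsid))[0]


fsid = 0x80000001                  # larger than 2^31
shown = as_signed32(fsid)
print(shown)                       # -2147483647, as a signed format would show it
print((shown & 0xFFFFFFFF) == fsid)  # True: the underlying bits are unchanged
```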

I have a much larger and uglier patch which enables 32 bit fsid on
2.4 kernels and the 2.6 kernels using nfsservctl(), which does work
but I believe risks compatibility issues. I won't post it.



diff -Napur --exclude-from=excludes nfs-utils-sgi/nfs-utils-1.0.6/support/include/nfs/nfs.h nfs-utils-work/nfs-utils-1.0.6/support/include/nfs/nfs.h
--- nfs-utils-sgi/nfs-utils-1.0.6/support/include/nfs/nfs.h Wed Feb 25 20:43:02 2004
+++ nfs-utils-work/nfs-utils-1.0.6/support/include/nfs/nfs.h Fri Feb 27 21:49:58 2004
@@ -71,7 +71,8 @@ struct nfsctl_client {
#endif

/* EXPORT/UNEXPORT */
-struct nfsctl_export {
+/* kernel's version */
+struct nfsctl_kexport {
char ex_client[NFSCLNT_IDMAX+1];
char ex_path[NFS_MAXPATHLEN+1];
__nfsd_dev_t ex_dev;
@@ -80,6 +81,18 @@ struct nfsctl_export {
__kernel_uid_t ex_anon_uid;
__kernel_gid_t ex_anon_gid;
};
+/* userspace version with 32bit ex_dev/fsid field */
+struct nfsctl_export {
+ char ex_client[NFSCLNT_IDMAX+1];
+ char ex_path[NFS_MAXPATHLEN+1];
+ __nfsd_dev_t ex_unused1;
+ __kernel_ino_t ex_ino;
+ int ex_flags;
+ __kernel_uid_t ex_anon_uid;
+ __kernel_gid_t ex_anon_gid;
+
+ uint32_t ex_dev;
+};

/* UGIDUPDATE */
struct nfsctl_uidmap {
@@ -128,7 +141,7 @@ struct nfsctl_arg {
union {
struct nfsctl_svc u_svc;
struct nfsctl_client u_client;
- struct nfsctl_export u_export;
+ struct nfsctl_kexport u_export;
struct nfsctl_uidmap u_umap;
struct nfsctl_fhparm u_getfh;
struct nfsctl_fdparm u_getfd;
diff -Napur --exclude-from=excludes nfs-utils-sgi/nfs-utils-1.0.6/support/nfs/nfsexport.c nfs-utils-work/nfs-utils-1.0.6/support/nfs/nfsexport.c
--- nfs-utils-sgi/nfs-utils-1.0.6/support/nfs/nfsexport.c Fri Aug 1 13:42:04 2003
+++ nfs-utils-work/nfs-utils-1.0.6/support/nfs/nfsexport.c Fri Feb 27 21:50:43 2004
@@ -99,6 +99,7 @@ nfsexport(struct nfsctl_export *exp)
}
arg.ca_version = NFSCTL_VERSION;
memcpy(&arg.ca_export, exp, sizeof(arg.ca_export));
+ arg.ca_export.ex_dev = (__nfsd_dev_t)exp->ex_dev;
return nfsctl(NFSCTL_EXPORT, &arg, NULL);
}

@@ -115,5 +116,6 @@ nfsunexport(struct nfsctl_export *exp)

arg.ca_version = NFSCTL_VERSION;
memcpy(&arg.ca_export, exp, sizeof(arg.ca_export));
+ arg.ca_export.ex_dev = (__nfsd_dev_t)exp->ex_dev;
return nfsctl(NFSCTL_UNEXPORT, &arg, NULL);
}



Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.



2004-02-24 03:40:00

by NeilBrown

Subject: Re: [PATCH] SGI 907674: document fsid export option

On Thursday January 29, [email protected] wrote:
> G'day,
>
> This patch against nfs-utils 1.0.6 documents the fsid=num export
> option.

Thanks. I've added a paragraph about the special meaning of fsid=0
for NFSv4 and committed it to CVS.

NeilBrown

>
> --- nfs-utils-sgi/nfs-utils-1.0.6/utils/exportfs/exports.man Thu Jan 29 13:52:34 2004
> +++ nfs-utils-work/nfs-utils-1.0.6/utils/exportfs/exports.man Thu Jan 29 14:22:17 2004
> @@ -271,6 +271,24 @@
> then the nominted path must be a mountpoint for the exportpoint to be
> exported.
>
> +.TP
> +.IR fsid= num
> +This option forces the filesystem identification portion of the file
> +handle and file attributes used on the wire to be
> +.I num
> +instead of a number derived from the major and minor number of the
> +block device on which the filesystem is mounted. Any 32 bit number
> +can be used, but it must be unique amongst all the exported filesystems.
> +
> +This can be useful for NFS failover, to ensure that both servers of
> +the failover pair use the same NFS file handles for the shared filesystem
> +thus avoiding stale file handles after failover.
> +
> +Some Linux filesystems are not mounted on a block device; exporting
> +these via NFS requires the use of the
> +.I fsid
> +option (although that may still not be enough).
> +
> .SS User ID Mapping
> .PP
> .I nfsd
>
>
> Greg.
> --
> Greg Banks, R&D Software Engineer, SGI Australian Software Group.
> I don't speak for SGI.
>



2004-02-24 04:11:07

by NeilBrown

Subject: Re: [PATCH] SGI 907674: document fsid export option

On Tuesday February 24, [email protected] wrote:
>
> Aha, it's embarrassment time. Since sending the patch I've discovered
> that this part
>
> > > +instead of a number derived from the major and minor number of the
> > > +block device on which the filesystem is mounted. Any 32 bit number
> > > +can be used, but it must be unique amongst all the exported filesystems.
>
> is wrong; the fsid passes through a dev_t interface and is silently
> truncated to 16 bits. The following fixes my gaffe. Sorry.
>

Hmm... I'd much rather we actually used 32 bits. Where does the
truncate happen? nfs-utils / kernel-2.4 / kernel-2.6 ??

NeilBrown

>
> --- utils/exportfs/exports.man.orig Tue Feb 24 15:06:35 2004
> +++ utils/exportfs/exports.man Tue Feb 24 15:06:38 2004
> @@ -277,7 +277,7 @@
> handle and file attributes used on the wire to be
> .I num
> instead of a number derived from the major and minor number of the
> -block device on which the filesystem is mounted. Any 32 bit number
> +block device on which the filesystem is mounted. Any 16 bit number
> can be used, but it must be unique amongst all the exported filesystems.
>
> This can be useful for NFS failover, to ensure that both servers of
>
>
> Greg.
> --
> Greg Banks, R&D Software Engineer, SGI Australian Software Group.
> I don't speak for SGI.



2004-02-24 04:08:09

by Greg Banks

Subject: Re: [PATCH] SGI 907674: document fsid export option

Neil Brown wrote:
>
> On Thursday January 29, [email protected] wrote:
> > G'day,
> >
> > This patch against nfs-utils 1.0.6 documents the fsid=num export
> > option.
>
> Thanks. I've added a paragraph about the special meaning of fsid=0
> for NFSv4 and committed it to CVS.
>

Aha, it's embarrassment time. Since sending the patch I've discovered
that this part

> > +instead of a number derived from the major and minor number of the
> > +block device on which the filesystem is mounted. Any 32 bit number
> > +can be used, but it must be unique amongst all the exported filesystems.

is wrong; the fsid passes through a dev_t interface and is silently
truncated to 16 bits. The following fixes my gaffe. Sorry.


--- utils/exportfs/exports.man.orig Tue Feb 24 15:06:35 2004
+++ utils/exportfs/exports.man Tue Feb 24 15:06:38 2004
@@ -277,7 +277,7 @@
handle and file attributes used on the wire to be
.I num
instead of a number derived from the major and minor number of the
-block device on which the filesystem is mounted. Any 32 bit number
+block device on which the filesystem is mounted. Any 16 bit number
can be used, but it must be unique amongst all the exported filesystems.

This can be useful for NFS failover, to ensure that both servers of
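
The silent truncation described above can be reproduced with a 16 bit field in Python's `ctypes`, which, like a C assignment, keeps only the low bits without raising an error. The `ExportArg` structure is a stand-in with a single field, not the real `struct nfsctl_export`, and the 16 bit width is taken from the description in this message.

```python
import ctypes


class ExportArg(ctypes.Structure):
    # Stand-in for the relevant part of struct nfsctl_export; the real
    # structure has more fields and a platform-defined device type.
    _fields_ = [("ex_dev", ctypes.c_uint16)]


arg = ExportArg()
arg.ex_dev = 0x12345           # a 32 bit fsid chosen by the admin
print(hex(arg.ex_dev))         # 0x2345: only the low 16 bits survive
```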


Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.



2004-02-24 04:31:45

by Greg Banks

Subject: Re: [PATCH] SGI 907674: document fsid export option

Neil Brown wrote:
>
> On Tuesday February 24, [email protected] wrote:
> >
> > Aha, it's embarrassment time. Since sending the patch I've discovered
> > that this part
> >
> > > > +instead of a number derived from the major and minor number of the
> > > > +block device on which the filesystem is mounted. Any 32 bit number
> > > > +can be used, but it must be unique amongst all the exported filesystems.
> >
> > is wrong; the fsid passes through a dev_t interface and is silently
> > truncated to 16 bits. The following fixes my gaffe. Sorry.
> >
>
> Hmm... I'd much rather we actually used 32 bits.

Actually the field in the file handle on the wire is 64 bits:

/* fs/nfsd/nfs3xdr.c */
static inline u32 *
encode_fattr3(struct svc_rqst *rqstp, u32 *p, struct svc_fh *fhp)
{
    [...]
    if (rqstp->rq_reffh->fh_version == 1
        && rqstp->rq_reffh->fh_fsid_type == 1
        && (fhp->fh_export->ex_flags & NFSEXP_FSID))
        p = xdr_encode_hyper(p, (u64) fhp->fh_export->ex_fsid);
    else
        p = xdr_encode_hyper(p, (u64) inode->i_dev);
    [...]
}
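
For readers unfamiliar with XDR, a hyper is a 64 bit integer sent as 8 bytes, most significant byte first. The Python sketch below shows what widening a small fsid to a hyper produces; the function is named after the kernel helper for readability but returns bytes instead of advancing a buffer pointer, so it is an analogy, not a port.

```python
import struct


def xdr_encode_hyper(value):
    """Encode an unsigned 64 bit value as an XDR hyper (8 bytes, big-endian)."""
    return struct.pack(">Q", value)


# A 16 bit fsid cast to u64 simply has zero bytes in front of it.
print(xdr_encode_hyper(0x1234).hex())   # 0000000000001234
```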


> Where does the
> truncate happen? nfs-utils / kernel-2.4 / kernel-2.6 ??

The fsid is passed through the ex_dev field in struct nfsctl_export,
which (presumably for compatibility) is 16 bits both in 2.4 and 2.6.
There are two copies, one each in the kernel and nfs-utils.

/* linux/include/linux/nfsd/syscall.h */
/* EXPORT/UNEXPORT */
struct nfsctl_export {
    char            ex_client[NFSCLNT_IDMAX+1];
    char            ex_path[NFS_MAXPATHLEN+1];
    __kernel_dev_t  ex_dev;         <---
    __nfsd_ino_t    ex_ino;
    int             ex_flags;
    __kernel_uid_t  ex_anon_uid;
    __kernel_gid_t  ex_anon_gid;
};

/* nfs-utils/support/include/nfs/nfs.h */
/* EXPORT/UNEXPORT */
struct nfsctl_export {
    char            ex_client[NFSCLNT_IDMAX+1];
    char            ex_path[NFS_MAXPATHLEN+1];
    __nfsd_dev_t    ex_dev;         <---
    __kernel_ino_t  ex_ino;
    int             ex_flags;
    __kernel_uid_t  ex_anon_uid;
    __kernel_gid_t  ex_anon_gid;
};


I agree the truncate is unfortunate. We have a 2.4.25 machine here with
dozens of exports each with an fsid= option automatically created by taking
the first 2 bytes of the md5sum of their names (because their devices aren't
stable) and some of the fsids are uncomfortably close.
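
To put a rough number on the worry, the Python sketch below derives a 16 bit fsid from the first 2 bytes of the md5 of an export name and estimates the chance that at least two of n such values coincide, assuming the hash output behaves like a uniform random draw. The function names and the big-endian byte order are assumptions; the script actually used on that machine is not shown in the thread.

```python
import hashlib


def fsid16_from_name(name):
    """First 2 bytes of the md5 of the export name, as a 16 bit number."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def collision_probability(n_exports, bits=16):
    """Chance that at least two of n uniformly random fsids coincide."""
    space = 1 << bits
    p_unique = 1.0
    for i in range(n_exports):
        p_unique *= (space - i) / space
    return 1.0 - p_unique


for n in (12, 48, 100):
    print(n, round(collision_probability(n), 4))
```

With 65536 possible values the probability is on the order of a couple of percent for a few dozen exports and grows roughly with the square of the export count, which is why moving to 32 bits helps a lot without making collisions impossible.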


Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.



2004-02-25 00:05:42

by NeilBrown

Subject: Re: [PATCH] SGI 907674: document fsid export option

On Tuesday February 24, [email protected] wrote:
> Neil Brown wrote:
> >
> > On Tuesday February 24, [email protected] wrote:
> > >
> > > Aha, it's embarrassment time. Since sending the patch I've discovered
> > > that this part
> > >
> > > > > +instead of a number derived from the major and minor number of the
> > > > > +block device on which the filesystem is mounted. Any 32 bit number
> > > > > +can be used, but it must be unique amongst all the exported filesystems.
> > >
> > > is wrong; the fsid passes through a dev_t interface and is silently
> > > truncated to 16 bits. The following fixes my gaffe. Sorry.
> > >
> >
> > Hmm... I'd much rather we actually used 32 bits.
>
> Actually the field in the file handle on the wire is 64 bits:
>
> /* fs/nfsd/nfs3xdr.c */
> static inline u32 *
> encode_fattr3(struct svc_rqst *rqstp, u32 *p, struct svc_fh *fhp)
> {
> [...]
> if (rqstp->rq_reffh->fh_version == 1
> && rqstp->rq_reffh->fh_fsid_type == 1
> && (fhp->fh_export->ex_flags & NFSEXP_FSID))
> p = xdr_encode_hyper(p, (u64) fhp->fh_export->ex_fsid);
> else
> p = xdr_encode_hyper(p, (u64) inode->i_dev);
> [...]
> }
>

The fsid is also used in the filehandle, and there only 32 bits are
used. This was the usage I was thinking of - I had forgotten the
other one.


>
> > Where does the
> > truncate happen? nfs-utils / kernel-2.4 / kernel-2.6 ??
>
> The fsid is passed through the ex_dev field in struct nfsctl_export,
> which (presumably for compatibility) is 16 bits both in 2.4 and 2.6.
> There are two copies, one each in the kernel and nfs-utils.
>
> /* linux/include/linux/nfsd/syscall.h */
> /* EXPORT/UNEXPORT */
> struct nfsctl_export {
> char ex_client[NFSCLNT_IDMAX+1];
> char ex_path[NFS_MAXPATHLEN+1];
> __kernel_dev_t ex_dev; <---
> __nfsd_ino_t ex_ino;

yuk... and there is probably 2 bytes of padding in there on most
architectures... not that we can really use it.

This interface is not needed in 2.6 and will be going away in 2.7, and
the new interface (via text written into /proc ) doesn't have the 16
bit limit.

I think we should document it as a 32bit number, but note that only 16
bits are significant in certain situations.

>
> I agree the truncate is unfortunate. We have a 2.4.25 machine here with
> dozens of exports each with an fsid= option automatically created by taking
> the first 2 bytes of the md5sum of their names (because their devices aren't
> stable) and some of the fsids are uncomfortably close.

This relates to the next big issue with filehandles: how to identify
the filesystem reliably.
We now have a nice interface into the filesystem so that "which
file in the filesystem" can be encoded in the filehandle reliably, but
at the same time, the way we identify the filesystem is becoming less
reliable due to device number instability.

I don't like the md5sum approach as it is only probabilistically
reliable. If we could use all the bits it might be OK, but we clearly
cannot and with only 16 bits, you are already seeing some fsid's being
"uncomfortably close". 32bits will be better, but still not ideal.

There really needs to be a way for a site to centrally allocate fsid
numbers. Each filesystem's fsid would need to be stored on the
filesystem itself otherwise we would be back to the bad-old-days of
depending on a state file in /var like /var/lib/nfs/rmtab.

I'm leaning towards something like:

fsid=auto
means look in the exportpoint for a file called ".nfs-fsid"
If it exists, read 8 hex bytes and use that to determine a 32bit
number.
If it doesn't exist and /sbin/nfs-fsid does, run that, passing it the
export point. It should write 8 hex bytes to stdout.
It might also write them to .nfs-fsid if it wants to.
If /sbin/nfs-fsid doesn't exist, assume /var/lib/nfs/fsid
contains a hex number which should be used, stored in .nfs-fsid,
and incremented.

This would allow a fairly reliable way of automatically allocating
unique fsids on a per-machine basis, but would allow admins to define
their own nfs-fsid program that allocated ids on a site-wide basis.
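
The lookup order proposed for fsid=auto can be sketched in Python as below. This is one reading of the proposal, not an implementation of anything in nfs-utils: "8 hex bytes" is interpreted as eight hexadecimal digits (32 bits), the helper is assumed to take the export point as its only argument, and there is no locking around the counter file.

```python
import os
import subprocess


def auto_fsid(export_point, helper="/sbin/nfs-fsid",
              counter_file="/var/lib/nfs/fsid"):
    """Follow the proposed fsid=auto lookup order and return a 32 bit fsid."""
    marker = os.path.join(export_point, ".nfs-fsid")

    # 1. A .nfs-fsid file in the export point wins if present.
    if os.path.exists(marker):
        with open(marker) as f:
            return int(f.read().strip()[:8], 16)

    # 2. Otherwise ask the site-provided helper, if one is installed.
    if os.path.exists(helper):
        out = subprocess.run([helper, export_point], capture_output=True,
                             text=True, check=True).stdout
        return int(out.strip()[:8], 16)

    # 3. Otherwise allocate from a per-machine counter, remember the
    #    result in the export point, and bump the counter.
    with open(counter_file) as f:
        fsid = int(f.read().strip(), 16)
    with open(marker, "w") as f:
        f.write(f"{fsid:08x}\n")
    with open(counter_file, "w") as f:
        f.write(f"{fsid + 1:08x}\n")
    return fsid
```

Step 3 writes into the export point, so it cannot work for read-only exports and inherits whatever trust problems a file inside the export has.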


Thoughts?

NeilBrown



2004-02-25 00:32:09

by Greg Banks

Subject: Re: [PATCH] SGI 907674: document fsid export option

Neil Brown wrote:
>
>
> The fsid is also used in the filehandle, and there only 32 bits are
> used. This was the usage I was thinking of - I had forgotten the
> other one.

Yep, realised I was looking at the wrong code after I got home. Doh!
So the real limit is 32 bits.

> > > Where does the
> > > truncate happen? nfs-utils / kernel-2.4 / kernel-2.6 ??
> >
> > The fsid is passed through the ex_dev field in struct nfsctl_export,
> > which (presumably for compatibility) is 16 bits both in 2.4 and 2.6.
> > There are two copies, one each in the kernel and nfs-utils.
> >
> > /* linux/include/linux/nfsd/syscall.h */
> > /* EXPORT/UNEXPORT */
> > struct nfsctl_export {
> > char ex_client[NFSCLNT_IDMAX+1];
> > char ex_path[NFS_MAXPATHLEN+1];
> > __kernel_dev_t ex_dev; <---
> > __nfsd_ino_t ex_ino;
>
> yuk... and there is probably 2 bytes of padding in there on most
> architectures... not that we can really use it.

I think any solution would involve extending the nfsctl_export structure,
hopefully in a compatible way. I have a very alpha patch I hacked up
last night to do that. With luck I might get to see if it compiles today.

> This interface is not needed in 2.6 and will be going away in 2.7, and
> the new interface (via text written into /proc ) doesn't have the 16
> bit limit.
>
> I think we should document it as a 32bit number, but note that only 16
> bits are significant in certain situations.

From my reading of nfs-utils last night it seems it still gets truncated
even with the new interface, because a struct nfsctl_export is used as
temporary storage.

> > I agree the truncate is unfortunate. We have a 2.4.25 machine here with
> > dozens of exports each with an fsid= option automatically created by taking
> > the first 2 bytes of the md5sum of their names (because their devices aren't
> > stable) and some of the fsids are uncomfortably close.
>
> This relates to the next big issue with filehandles: how to identify
> the filesystem reliably.
> We now have a nice interface into the filesystem so that "which
> file in the filesystem" can be encoded in the filehandle reliably, but
> at the same time, the way we identify the filesystem is becoming less
> reliable due to device number instability.
>
> I don't like the md5sum approach as it is only probabilistically
> reliable. If we could use all the bits it might be OK, but we clearly
> cannot and with only 16 bits, you are already seeing some fsid's being
> "uncomfortably close". 32bits will be better, but still not ideal.

Good point. How about defining a new fsid type in the file handle
which has enough space to store the md5sum of the path? We could then
fall back to using that automatically when we can tell from the underlying
fs that it either hasn't got a device or has an unstable device. This
would solve an issue seen here in SGI Melbourne when we tried using
userfs to present all those exports as a single fs union: userfs doesn't
have enough support to allow NFS export.

> There really needs to be a way for a site to centrally allocate fsid
> numbers. Each filesystem's fsid would need to be stored on the
> filesystem itself otherwise we would be back to the bad-old-days of
> depending on a state file in /var like /var/lib/nfs/rmtab.

On XFS you could use the SCSI UUID of the filesystem which is 16B,
generated to be unique, and has uniqueness enforced at mount time
(to handle the case of ghosting an fs).

> I'm leaning towards something like:
>
> fsid=auto
> means look in the exportpoint for a file called ".nfs-fsid"
> If it exists, read 8 hex bytes and use that to determine a 32bit
> number.

This is interesting, but we would have a problem for the machine here
in Melbourne: all the exports are readonly loopback-mounted ISO9660
images.

Also, if the export point is writable by non-root (which IIRC might
happen for some NIS setups) you have an entertaining security issue.
Also, if it's accidentally written or deleted by non-squashed root
remotely you have a problem.

I don't think putting the fsid inside the export is going to work.

> If it doesn't exist and /sbin/nfs-fsid does, run that, passing it the
> export point. It should write 8 hex bytes to stdout.
> It might also write them to .nfs-fsid if it wants to.
> If /sbin/nfs-fsid doesn't exist, assume /var/lib/nfs/fsid
> contains a hex number which should be used, stored in .nfs-fsid,
> and incremented.

Ok, nice and general; handles the case of multiple exports from a single
local fs. But the nfs-fsid program needs to be per-fstype to handle
those cases where the fs already gives you a useful number, like XFS.

> This would allow a fairly reliable way of automatically allocating
> unique fsids on a per-machine basis, but would allow admins to define
> their own nfs-fsid program that allocated ids on a site-wide basis.

Yes. However I'd be much happier if the common cases were handled
completely automatically in exportfs or inside the kernel without any
further intervention being necessary.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.



2004-02-25 03:23:13

by NeilBrown

Subject: Re: [PATCH] SGI 907674: document fsid export option

On Wednesday February 25, [email protected] wrote:
> Neil Brown wrote:
>
> > This interface is not needed in 2.6 and will be going away in 2.7, and
> > the new interface (via text written into /proc ) doesn't have the 16
> > bit limit.
> >
> > I think we should document it as a 32bit number, but note that only 16
> > bits are significant in certain situations.
>
> From my reading of nfs-utils last night it seems it still gets truncated
> even with the new interface, because a struct nfsctl_export is used as
> temporary storage.

Hmm. That can and should be fixed. I will look at it.

>
> Good point. How about defining a new fsid type in the file handle
> which has enough space to store the md5sum of the path? We could then
> fall back to using that automatically when we can tell from the underlying
> fs that it either hasn't got a device or has an unstable device. This
> would solve an issue seen here in SGI Melbourne when we tried using
> userfs to present all those exports as a single fs union: userfs doesn't
> have enough support to allow NFS export.

An md5sum is 16 bytes: half an NFSv2 filehandle (a quarter of NFSv3, an
eighth of NFSv4...).
We would have to disallow it for NFSv2, but that probably isn't a big
cost.
We would have to use some hash of it for the fsid field in the GETATTR
result, but I don't think that is a big deal.
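
The size arithmetic, and one possible way to reduce the digest for the GETATTR fsid field, can be sketched in Python as follows. The file handle sizes are the protocol maxima (32 bytes fixed for NFSv2, up to 64 for NFSv3, up to 128 for NFSv4); the XOR fold in `fold_to_fsid64` is only an example of "some hash of it", since no particular function is named here.

```python
import hashlib

# Maximum file handle sizes in bytes for each protocol version.
FH_SIZE = {"NFSv2": 32, "NFSv3": 64, "NFSv4": 128}
MD5_LEN = hashlib.md5().digest_size   # 16 bytes

for version, size in FH_SIZE.items():
    print(f"{version}: md5 takes {MD5_LEN} of {size} bytes")


def fold_to_fsid64(digest):
    """XOR the two 8 byte halves of a 16 byte digest into one 64 bit value."""
    if len(digest) != 16:
        raise ValueError("expected a 16 byte digest")
    hi = int.from_bytes(digest[:8], "big")
    lo = int.from_bytes(digest[8:], "big")
    return hi ^ lo


print(hex(fold_to_fsid64(hashlib.md5(b"/export/home").digest())))
```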

So that might be an option.
I'd rather not use the pathname as filesystems can be mounted in
different places. Most filesystems have some sort of UUID, but there
is no standard way of extracting that information (and no guarantee
that it is there).

However an fsid really identifies an exportpoint, not a filesystem.
There can be several exportpoints in one filesystem. So a UUID from
the filesystem isn't always sufficient.

I guess that as the MOUNT protocol uses a path to identify an export
point, we should be safe in identifying the one with the other in the
fsid too.

> > I'm leaning towards something like:
> >
> > fsid=auto
> > means look in the exportpoint for a file called ".nfs-fsid"
> > If it exists, read 8 hex bytes and use that to determine a 32bit
> > number.
>
> This is interesting, but we would have a problem for the machine here
> in Melbourne: all the exports are readonly loopback-mounted ISO9660
> images.

That would need to be handled by the /sbin/nfs-fsid program, but you
are right that it would be best if most common scenarios were handled
transparently, and the .nfs-fsid file does seem to have a number of
issues.

>
> Yes. However I'd be much happier if the common cases were handled
> completely automatically in exportfs or inside the kernel without any
> further intervention being necessary.

Definitely. But we need to know what the common cases are.

The simplest approach, an md5sum of the filename, seems nice and general.
However suppose someone wanted to export their cdrom which they always
mounted at /mnt/cdrom (That desire has been mentioned on this list
occasionally).
When I change CDs should I get a different fsid, thereby forcing the
client to unmount and remount (which seems reasonable)? That would
require a different fsid for each cdrom.
If an auto-mounter were used which chose a mount-point name based on a
label in the CDrom, then we could just use a name-based fsid and put
the burden onto the auto-mounter :-)

It might even be nice to have a different fsid for the filesystem root
than for the rest of the fs. Then if you change the filesystem
mounted as the exportpoint, any filehandles into the content would go
stale, but the mountpoint would still be accessible.

So as you can probably tell, it really isn't clear to me what is
best. Thank you for your thoughts and suggestions.

NeilBrown

