2006-05-25 07:35:25

by Vijay Chauhan

Subject: 2.4 vs 2.6

Hi,
Can anyone tell me what the differences are between the Linux kernel 2.4
and 2.6 versions with respect to the NFS implementation?

TIA,

Vijay



2006-05-26 07:29:57

by NeilBrown

Subject: Re: 2.4 vs 2.6

On Thursday May 25, [email protected] wrote:
> Hi,
> Can anyone tell me what the differences are between the Linux kernel
> 2.4 and 2.6 versions with respect to the NFS implementation?

How much detail do you want?

Apart from adding NFSv4, the main difference is the authentication
cache with support for up-calls.
In 2.4, exportfs and mountd had to tell the kernel about any export
information that it might need to know.
In 2.6, the kernel can ask mountd, and mountd will provide information
on-demand.
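
For a rough picture of the mechanism: in 2.6 a userspace daemon opens a
cache "channel" file, reads the request lines the kernel writes on a
cache miss, and writes back answers carrying an expiry time. Below is a
minimal sketch of that loop; the path and line format are from memory of
the auth.unix.ip cache and may not match your kernel exactly, so treat
it as an illustration rather than real mountd code.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/proc/net/rpc/auth.unix.ip/channel", O_RDWR);
        char req[256], reply[512];
        ssize_t n;

        if (fd < 0)
            return 1;
        /* The kernel hands us one request line per cache miss. */
        while ((n = read(fd, req, sizeof(req) - 1)) > 0) {
            char class[32], addr[64];

            req[n] = '\0';
            if (sscanf(req, "%31s %63s", class, addr) != 2)
                continue;
            /* Resolve addr to a client name here (DNS, netgroups,
             * /etc/exports tags); "clientname" is a placeholder. */
            snprintf(reply, sizeof(reply), "%s %s %ld %s\n",
                     class, addr, (long)time(NULL) + 30 * 60,
                     "clientname");
            write(fd, reply, strlen(reply));
        }
        return 0;
    }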

NeilBrown



2006-05-26 07:31:03

by NeilBrown

Subject: Re: 2.4 vs 2.6

On Friday May 26, [email protected] wrote:
> On Thursday May 25, [email protected] wrote:
> > Hi,
> > Can anyone tell me what the differences are between the Linux kernel
> > 2.4 and 2.6 versions with respect to the NFS implementation?
>
> How much detail do you want?
>
> Apart from adding NFSv4, the main difference is the authentication
> cache with support for up-calls.
> In 2.4, exportfs and mountd had to tell the kernel about any export
> information that it might need to know.
> In 2.6, the kernel can ask mountd, and mountd will provide information
> on-demand.
>
> NeilBrown

Of course, that is just the server. There have been lots of changes to
the client as well, including RPCSEC support and much, much more.

What exactly do you need to know?

NeilBrown



2006-05-26 08:19:14

by mehta kiran

Subject: Re: 2.4 vs 2.6

Hi,

1. I noticed that the 2.6 kernel is not aware of an export until a
client tries to mount; it is only after this that the proc nfsd
filesystem shows it in the list of exports, due to the upcall.
Was the 2.4 kernel aware of the export when the filesystem was
first exported or mounted?

2. Can you let me know the advantage of the 2.6 export approach
over the 2.4 approach?
Is this approach required due to security/authentication
integration with NFS (say, ip->name conversion for security)?

Thanks,
kiran

--- Neil Brown <[email protected]> wrote:

> On Thursday May 25, [email protected] wrote:
> > Hi,
> > Can anyone tell me what the differences are between the Linux kernel
> > 2.4 and 2.6 versions with respect to the NFS implementation?
>
> How much detail do you want?
>
> Apart from adding NFSv4, the main difference is the authentication
> cache with support for up-calls.
> In 2.4, exportfs and mountd had to tell the kernel about any export
> information that it might need to know.
> In 2.6, the kernel can ask mountd, and mountd will provide information
> on-demand.
>
> NeilBrown



2006-05-26 19:31:29

by J. Bruce Fields

Subject: Re: 2.4 vs 2.6

On Fri, May 26, 2006 at 01:19:05AM -0700, mehta kiran wrote:
> Hi,
>
> 1. I noticed that the 2.6 kernel is not aware of an export until a
> client tries to mount; it is only after this that the proc nfsd
> filesystem shows it in the list of exports, due to the upcall.
> Was the 2.4 kernel aware of the export when the filesystem was
> first exported or mounted?
>
> 2. Can you let me know the advantage of the 2.6 export approach
> over the 2.4 approach?
> Is this approach required due to security/authentication
> integration with NFS (say, ip->name conversion for security)?

The kernel has to know whether an incoming IP address matches, say,
"*.umich.edu." But we don't want to add an entire DNS client
implementation to the kernel. And we also can't preload the kernel with
every possible IP address that might match *.umich.edu. So we allow
the kernel to ask mountd for that information when it needs it.
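
To see why this has to live in userspace: matching an address against
"*.umich.edu" takes a reverse DNS lookup plus a glob match, roughly as
in the sketch below. This is only an illustration; the real mountd
matching also handles aliases, case, netgroups, and so on.

    #include <fnmatch.h>
    #include <netdb.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Would this client address match an /etc/exports pattern? */
    static int client_matches(const char *pattern, struct in_addr addr)
    {
        struct hostent *he = gethostbyaddr(&addr, sizeof(addr), AF_INET);

        if (he == NULL)
            return 0;       /* no reverse mapping: no match */
        return fnmatch(pattern, he->h_name, 0) == 0;
    }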

This isn't actually necessary for an NFSv2/v3 server, because mountd
always gets a "mount" request from any client before the kernel gets any
NFS requests from that client, so mountd can tell the kernel about the
new client then.

But NFSv4 clients don't contact mountd first. Neither do v2/v3 clients
that are failing over from another server. So you need the upcalls to
handle those cases.

That justifies the auth_unix_ip upcall, anyway; I'm less sure about the
others.

--b.



2006-05-29 05:55:32

by NeilBrown

Subject: Re: 2.4 vs 2.6

On Friday May 26, [email protected] wrote:
> The kernel has to know whether an incoming IP address matches, say,
> "*.umich.edu." But we don't want to add an entire DNS client
> implementation to the kernel. And we also can't preload the kernel with
> every possible IP address that might match *.umich.edu. So we allow
> the kernel to ask mountd for that information when it needs it.
>
> This isn't actually necessary for an NFSv2/v3 server, because mountd
> always gets a "mount" request from any client before the kernel gets any
> NFS requests from that client, so mountd can tell the kernel about the
> new client then.

Not exactly necessary, but without it we depend on 'rmtab' to record
which clients have the filesystem mounted across a server reboot, and
it is not possible to maintain an rmtab file reliably.

>
> But NFSv4 clients don't contact mountd first. Neither do v2/v3 clients
> that are failing over from another server. So you need the upcalls to
> handle those cases.
>
> That justifies the auth_unix_ip upcall, anyway; I'm less sure about the
> others.

The nfsd.fh upcall - which maps a file-handle-fragment to a directory -
can be used to implement on-demand loading of filesystems,
e.g. CD-ROMs from a CD library. We don't actually do that now, but we
could.
It also means that if a filesystem isn't currently being used by a
client (where 'currently' means in the last 30 minutes, I think), then
you can unmount it without having to unexport it. This is (arguably?)
a good thing.
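
To make the shape of that upcall concrete, here is a toy sketch. The
struct and names are invented for illustration (the real channel speaks
a line-based text format), so don't read this as the actual
kernel/mountd interface.

    #include <string.h>

    /* Invented picture of an nfsd.fh request: the kernel has a client
     * name and a filehandle fragment, and wants the directory. */
    struct fh_request {
        const char *client;     /* e.g. "@somehosts" */
        int fsidtype;           /* how to interpret fsid[] */
        unsigned char fsid[8];  /* device/inode numbers, or an fsid */
    };

    /* mountd's side of the mapping; this is where a filesystem could
     * be mounted on demand (the CD library idea) before answering. */
    static const char *fh_to_path(const struct fh_request *req)
    {
        static const unsigned char data_fsid[8] = { 0x01 };

        if (req->fsidtype == 1 && memcmp(req->fsid, data_fsid, 8) == 0)
            return "/export/data";
        return NULL;            /* unknown: the client gets ESTALE */
    }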

nfsd.export is needed because, once auth.unix.ip and nfsd.fh have
provided their information, it combines the two to produce the
export options.

auth.domain was a mistake that is gone in recent kernels.

That leaves nfs4.nametoid and nfs4.idtoname. I'm sure they do
something useful.... :-)

Hope that completes the clarification.

NeilBrown



2006-05-29 08:52:57

by mehta kiran

Subject: Re: 2.4 vs 2.6



--- Neil Brown <[email protected]> wrote:

> The nfsd.fh upcall - which maps a file-handle-fragment to a directory -
> can be used to implement on-demand loading of filesystems,
> e.g. CD-ROMs from a CD library. We don't actually do that now, but we
> could.
> It also means that if a filesystem isn't currently being used by a
> client (where 'currently' means in the last 30 minutes, I think), then
> you can unmount it without having to unexport it. This is (arguably?)
> a good thing.

KM>
1. I tried this, but found that an exported filesystem may not unmount
cleanly when a client has mounted that fs at least once in the past.
Is this expected? It looks like the kernel holds a reference to the
exported fs.
2. Just a thought: consider a case where the server exports a
filesystem (and nobody mounts it for a long time), and due to the
inactivity it unmounts the filesystem. Now if some new client tries
to mount it, it may access some data (data on the mount point) which
the server may not have exported, right?



2006-05-29 16:02:44

by J. Bruce Fields

Subject: Re: 2.4 vs 2.6

On Mon, May 29, 2006 at 03:55:19PM +1000, Neil Brown wrote:
> Not exactly necessary, but without it we depend on 'rmtab' to record
> which clients have the filesystem mounted across a server reboot, and
> it is not possible to maintain an rmtab file reliably.

Oh, right. Hm. I don't actually understand exactly what the obstacles
are to maintaining it reliably, though intuitively it's easy to believe
that it's impossible. Is the problem just clients that die and never
come back, or are there inherent race conditions updating the rmtab on
unmount?

> nfsd.export is needed because once auth.unix.ip and nfsd.fh have
> provided their information - it combines the two together to get
> export options.

I don't see why this *has* to be done on demand, though, unless the
export table is extremely large and only sparsely used. I might be
missing something.

> That leaves nfs4.nametoid and nfs4.idtoname. I'm sure they do
> something useful.... :-)

On good days....

--b.



2006-05-30 01:12:37

by Greg Banks

Subject: Re: 2.4 vs 2.6

On Mon, May 29, 2006 at 12:02:36PM -0400, J. Bruce Fields wrote:
> On Mon, May 29, 2006 at 03:55:19PM +1000, Neil Brown wrote:
> > Not exactly necessary, but without it we depend on 'rmtab' to record
> > which clients have the filesystem mounted across a server reboot, and
> > it is not possible to maintain an rmtab file reliably.
>
> Oh, right. Hm. I don't actually understand exactly what the obstacles
> are to maintaining it reliably, though intuitively it's easy to believe
> that it's impossible. Is the problem just clients that die and never
> come back, or are there inherent race conditions updating the rmtab on
> unmount?

Clients can also ignore network errors when they unmount, or do
some kind of forced unmount which doesn't send an RPC to mountd.
Because clients still work regardless of whether the server-side state
is cleaned up, umount is effectively optional.

So rmtab is not only unreliable but tends to accumulate cruft. I've
seen reports of rmtab growing to over 500 megabytes (admittedly not
on a Linux box, but the principle is the same).

> > nfsd.export is needed because once auth.unix.ip and nfsd.fh have
> > provided their information - it combines the two together to get
> > export options.
>
> I don't see why this *has* to be done on demand, though, unless the
> export table is extremely large and only sparsely used. I might be
> missing something.

/mnt/data *.domain1.company.com(ro,sync) *.domain2.company.com(rw,sync)

Also, ISTR that the combination of the nohide export option and
a client wildcard didn't work properly in 2.4, and the kernel upcall
to mountd in 2.6 fixed that.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.



2006-05-30 01:59:25

by J. Bruce Fields

Subject: Re: 2.4 vs 2.6

On Tue, May 30, 2006 at 11:12:08AM +1000, Greg Banks wrote:
> On Mon, May 29, 2006 at 12:02:36PM -0400, J. Bruce Fields wrote:
> > I don't see why this *has* to be done on demand, though, unless the
> > export table is extremely large and only sparsely used. I might be
> > missing something.
>
> /mnt/data *.domain1.company.com(ro,sync) *.domain2.company.com(rw,sync)

I realize that the ip->client-name mapping is too large to preload into
the kernel.

It's the (path, client-name) --> export options that I wonder about.

--b.



by William A.(Andy) Adamson

Subject: Re: 2.4 vs 2.6


> There are only two criticisms I would make of Linux's upcall design.
>
> First, Linux uses a wacky special-purpose filesystem. Solaris and
> IRIX use a normal RPC to a special RPC program number implemented
> in mountd, using the existing RPC client code which is already needed
> in the server to initiate lockd callbacks. This reduces code
> complexity in the kernel, and means you can watch the upcall traffic
> with wireshark (or whatever ethereal is called this week) or snoop
> instead of strace on mountd. But whatever, it mostly works now.
>

Funny you should mention the RPC upcall. That was the original design I
coded; it was working just fine, and was nixed!

-->Andy



2006-06-15 03:17:23

by Greg Banks

Subject: Re: 2.4 vs 2.6

On Thu, 2006-06-15 at 05:40, William A.(Andy) Adamson wrote:
> Funny you should mention the RPC upcall. That was the original design I
> coded; it was working just fine, and was nixed!

Whyever?

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.





2006-06-16 03:53:12

by NeilBrown

Subject: Re: multiple mountds (was: 2.4 vs 2.6)

On Wednesday June 14, [email protected] wrote:
>
> How about the attached patch against nfs-utils ToT (top of tree)? It
> adds a -t option to set the number of forked workers.
> Default is 1 thread, i.e. the old behaviour.
>

Looks perfect, thanks.
I've added it to the git at
http://linux-nfs.org/cgi-bin/gitweb.cgi?p=nfs-utils;a=summary
git://linux-nfs.org/nfs-utils

Documentation and everything. No Changelog though... I should do that
at some stage.

Thanks,

NeilBrown



2006-06-13 03:30:03

by NeilBrown

Subject: Re: 2.4 vs 2.6

On Monday May 29, [email protected] wrote:
> On Tue, May 30, 2006 at 11:12:08AM +1000, Greg Banks wrote:
> > On Mon, May 29, 2006 at 12:02:36PM -0400, J. Bruce Fields wrote:
> > > I don't see why this *has* to be done on demand, though, unless the
> > > export table is extremely large and only sparsely used. I might be
> > > missing something.
> >
> > /mnt/data *.domain1.company.com(ro,sync) *.domain2.company.com(rw,sync)
>
> I realize that the ip->client-name mapping is too large to preload into
> the kernel.
>
> It's the (path, client-name) --> export options that I wonder about.

Having the mapping permanently in the kernel rather than filled in on
demand would mean that:

- the filesystem has to be mounted the whole time, and we cannot do
any on-demand mounting (based on fsid) - not that we currently do.
This is a fairly small point however.
- the client-names would need to be known in advance. It seems
obvious that they would be, but they aren't.
What do you do if you get a request from an IP address that matches
several of the tags in /etc/exports? e.g.
/export1 @somehosts(rw,root_squash)
/export2 @otherhosts(ro,no_root_squash)

If a request arrives from a host which is in both 'somehosts' and
'otherhosts', then what name do you give to the kernel for that IP
address?
We currently say the IP address maps to
@somehosts+@otherhosts
(or something like that) and then tell the kernel any of the
following as required:
/export1 @somehosts+@otherhosts -> rw,root_squash
/export1 @somehosts -> rw,root_squash
/export2 @somehosts+@otherhosts -> ro,no_root_squash
/export2 @otherhosts -> ro,no_root_squash

Hopefully you can see that giving the kernel a full (path, client-name)
-> export mapping in advance is not practical.
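
For illustration, here is a toy helper (invented, not actual nfs-utils
code) showing the merged-name trick: collect every tag from /etc/exports
that the address matches and join them with '+'.

    #include <string.h>

    /* Join matching tags, e.g. {"@somehosts", "@otherhosts"}
     * -> "@somehosts+@otherhosts". */
    static void merged_name(char *buf, size_t len,
                            const char *tags[], int ntags)
    {
        buf[0] = '\0';
        for (int i = 0; i < ntags; i++) {
            if (i > 0)
                strncat(buf, "+", len - strlen(buf) - 1);
            strncat(buf, tags[i], len - strlen(buf) - 1);
        }
    }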

NeilBrown



2006-06-13 20:42:50

by J. Bruce Fields

Subject: Re: 2.4 vs 2.6

On Tue, Jun 13, 2006 at 01:29:42PM +1000, Neil Brown wrote:
> Having the mapping permanently in the kernel rather than filled in on
> demand would mean that:
>
> - the filesystem has to be mounted the whole time, and we cannot do
> any on-demand mounting (based on fsid) - not that we currently do.
> This is a fairly small point however.

Fair enough.

> If a request arrives from a host which is in both 'somehosts' and
> 'otherhosts', then what name do you give to the kernel for that IP
> address?
> We currently say the IP address maps to
> @somehosts+@otherhosts
> (or something like that) and then tell the kernel any of the
> following as required:
> /export1 @somehosts+@otherhosts -> rw,root_squash
> /export1 @somehosts -> rw,root_squash
> /export2 @somehosts+@otherhosts -> ro,no_root_squash
> /export2 @otherhosts -> ro,no_root_squash

The kernel could of course just split these plus+separated+name+lists
itself before mapping to export options. But I guess the point is that
in cases such as the above there's a policy decision that's better made
in userspace. OK.

> Hopefully you can see that giving the kernel a full (path, client-name)
> -> export mapping in advance is not practical.

Just to be clear--I'm definitely not on some crusade to change all this.
I'm just curious about the motivation for the design; thanks for the
explanation.

--b.



2006-06-13 23:22:56

by NeilBrown

Subject: Re: 2.4 vs 2.6

On Tuesday June 13, [email protected] wrote:
>
> Just to be clear--I'm definitely not on some crusade to change all this.
> I'm just curious about the motivation for the design; thanks for the
> explanation.

Not the caped crusader then :-)

Oh! You wanted the motivation? I thought you wanted the justification!
Completely different things you know.....

The motivation was that I needed to do upcalls to remove the rmtab
problem, so it seemed 'obvious' to use upcalls for everything and
create a caching structure that could be used generally. I cannot say
that I considered every cache and asked myself if it could be static
or not. Had I thought more fully about such things I might not have
included the 'client' in the key for the fh -> path mapping. But it
seemed like the right idea at the time, and I didn't have anyone
questioning my design decisions. Looking back, I wish I had!


NeilBrown



2006-06-14 01:29:46

by Greg Banks

Subject: Re: 2.4 vs 2.6

On Wed, 2006-06-14 at 06:42, J. Bruce Fields wrote:
> On Tue, Jun 13, 2006 at 01:29:42PM +1000, Neil Brown wrote:
> > If a request arrives from a host which is in both 'somehosts' and
> > 'otherhosts', then what name do you give to the kernel for that IP
> > address?
> > We currently say the IP address maps to
> > @somehosts+@otherhosts
> > (or something like that) and then tell the kernel any of the
> > following as required:
> > /export1 @somehosts+@otherhosts -> rw,root_squash
> > /export1 @somehosts -> rw,root_squash
> > /export2 @somehosts+@otherhosts -> ro,no_root_squash
> > /export2 @otherhosts -> ro,no_root_squash
>
> The kernel could of course just split these plus+separated+name+lists
> itself before mapping to export options. But I guess the point is that
> in cases such as the above there's a policy decision that's better made
> in userspace. OK.

There are a couple more considerations.

First, the list of IP addresses to which a netgroup maps can be
quite large; we've seen problems on customer sites which had
netgroups with over 8K entries. Getting this amount of data
into the kernel in an exportfs operation is problematic.

Second, the membership of a netgroup can be controlled on a NIS
server and so can vary behind the kernel's back, so that the list
of addresses in the kernel becomes stale. Of course this is both
most likely to happen and most painful with the same customers who
have the enormous netgroups.

Both of these mean that enumerating netgroups and shoving the
results into the kernel isn't practical. The only real solution
here is to do a NIS query in userspace on mount. Thanks to NFSv3's
mythical "statelessness" this really means on first NFS call from
a new host. Hence you need an upcall.
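
For what it's worth, the userspace half of that check is one libc call,
which consults local files or NIS as configured; there is no kernel
equivalent, which is the point. A minimal sketch (the netgroup and host
names here are made up):

    #include <netdb.h>
    #include <stdio.h>

    int main(void)
    {
        /* innetgr(3) asks "is this host in that netgroup?" */
        if (innetgr("somehosts", "node042.example.com", NULL, NULL))
            printf("in the netgroup\n");
        else
            printf("not in the netgroup\n");
        return 0;
    }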

There are only two criticisms I would make of Linux's upcall design.

First, Linux uses a wacky special-purpose filesystem. Solaris and
IRIX use a normal RPC to a special RPC program number implemented
in mountd, using the existing RPC client code which is already needed
in the server to initiate lockd callbacks. This reduces code
complexity in the kernel, and means you can watch the upcall traffic
with wireshark (or whatever ethereal is called this week) or snoop
instead of strace on mountd. But whatever, it mostly works now.

Second, rpc.mountd is single-threaded, and needs to do a blocking
reverse hostname lookup on every mount and needs to respond to at
least one upcall shortly afterward. When you have a thousand
compute cluster nodes all trying to mount in the same second, this
gets to be something of a problem.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.





2006-06-14 02:16:08

by J. Bruce Fields

Subject: Re: 2.4 vs 2.6

On Wed, Jun 14, 2006 at 11:29:31AM +1000, Greg Banks wrote:
> There are a couple more considerations.
>
> First, the list of IP addresses to which a netgroup maps can be
> quite large

Yeah, it's obvious why we need the ip address->netgroup mapping; it's
the netgroup->export options that I was interested in.

> Second, rpc.mountd is single-threaded, and needs to do a blocking
> reverse hostname lookup on every mount and needs to respond to at
> least one upcall shortly afterward. When you have a thousand
> compute cluster nodes all trying to mount in the same second, this
> gets to be something of a problem.

This should be trivially fixable, shouldn't it? (Actually, can you run
multiple concurrent mountd's right now?)

--b.



2006-06-14 02:22:36

by Greg Banks

Subject: Re: 2.4 vs 2.6

On Wed, 2006-06-14 at 12:15, J. Bruce Fields wrote:
> On Wed, Jun 14, 2006 at 11:29:31AM +1000, Greg Banks wrote:

> This should be trivially fixable, shouldn't it? (Actually, can you run
> multiple concurrent mountd's right now?)

I believe the kernel cache can handle multiple readers
and writers (Neil?) but there's only one portmap registration,
so all the client RPCs will go to the last mountd started.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.





2006-06-14 04:18:52

by NeilBrown

Subject: Re: 2.4 vs 2.6

On Wednesday June 14, [email protected] wrote:
> On Wed, 2006-06-14 at 12:15, J. Bruce Fields wrote:
> > On Wed, Jun 14, 2006 at 11:29:31AM +1000, Greg Banks wrote:
>
> > This should be trivially fixable, shouldn't it? (Actually, can you run
> > multiple concurrent mountd's right now?)
>
> > I believe the kernel cache can handle multiple readers
> and writers (Neil?) but there's only one portmap registration,
> so all the client RPCs will go to the last mountd started.

Yes, the kernel caches can handle multiple readers and writers quite
transparently.

As the RPC service is fairly stateless, it should be possible to get
mountd to fork N times after registering the service and before
entering the loop. All of the state of significance lives in files or
in the kernel, and the file access is already done with locking, so it
should "just work" - though some review and testing would not go
astray of course.
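
Something like the following shape, presumably (a sketch of the idea,
not a tested patch):

    #include <rpc/rpc.h>
    #include <sys/types.h>
    #include <unistd.h>

    static void run_workers(int nworkers)
    {
        /* Called after svc_register(): every worker inherits the one
         * registered socket, so portmap still sees a single mountd. */
        for (int i = 1; i < nworkers; i++) {
            pid_t pid = fork();

            if (pid <= 0)
                break;  /* child, or fork failure: stop forking */
        }
        svc_run();      /* all processes service requests; never returns */
    }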

Do we have a volunteer? :-)

NeilBrown



2006-06-14 04:36:29

by Greg Banks

Subject: Re: 2.4 vs 2.6

On Wed, 2006-06-14 at 14:18, Neil Brown wrote:
> On Wednesday June 14, [email protected] wrote:
> > On Wed, 2006-06-14 at 12:15, J. Bruce Fields wrote:
> > > On Wed, Jun 14, 2006 at 11:29:31AM +1000, Greg Banks wrote:
> >
> > > This should be trivially fixable, shouldn't it? (Actually, can you run
> > > multiple concurrent mountd's right now?)
> >
> > I believe the kernel cache can handle multiple readers
> > and writers (Neil?) but there's only one portmap registration,
> > so all the client RPCs will go to the last mountd started.
>
> Yes, the kernel caches can handle multiple reader/writers quite
> transparently.
>
> As rpc service is fairly stateless, it should be possible to get
> mountd to fork N times after registering the service and before
> entering the loop.

Sounds relatively simple, assuming libc's svc_* code is fork-safe.

> All of the state of significance lives in files or
> in the kernel, and the file access is already done with locking, so it
> should "just work" - though some review and testing would not go
> astray of course.
>
> Do we have a volunteer? :-)

I have this terrible feeling that you do ;-)

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.



