2001-02-18 20:11:20

by Mark Hemment

[permalink] [raw]
Subject: [PATCH] nfsd + scalability

Hi Neil, all,

The nfs daemons run holding the global kernel lock. They still hold
this lock over calls to file_op's read and write.

The file system kernel interface (FSKI) doesn't require the kernel lock
to be held over these read/write calls. The nfs daemons do not require
that the reads or writes do not block (would be v silly if they did), so
they have no guarantee the lock isn't dropped and retaken during
blocking. ie. they aren't using it as a guard across the calls.

Dropping the kernel lock around read and write in fs/nfsd/vfs.c is a
_big_ SMP scalability win!

Attached patch is against 2.4.1-ac18, but should apply to most recent
kernel versions.

Mark


Attachments:
nfsd.patch (2.19 kB)
nfsd.patch

2001-02-22 00:22:05

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] nfsd + scalability

On Sunday February 18, [email protected] wrote:
> Hi Neil, all,
>
> The nfs daemons run holding the global kernel lock. They still hold
> this lock over calls to file_op's read and write.
>
> The file system kernel interface (FSKI) doesn't require the kernel lock
> to be held over these read/write calls. The nfs daemons do not require
> that the reads or writes do not block (would be v silly if they did), so
> they have no guarantee the lock isn't dropped and retaken during
> blocking. ie. they aren't using it as a guard across the calls.
>
> Dropping the kernel lock around read and write in fs/nfsd/vfs.c is a
> _big_ SMP scalability win!

Certainly I would like to not hold the BKL so much, but I'm curious
how much effect it will really have. Do you have any data on the
effect of this change?

Also, I would much rather drop the lock in nfsd_dispatch() and then go
around reclaiming it whereever it was needed.
Subsequently we would drop the lock in nfsd() and make sure sunrpc is
safe.
This would be more work (any volunteers?:-) but I feel that dropping
it in little places like this is a bit unsatisfying.

Thanks for the input,

NeilBrown


[patch deleted]

2001-02-22 08:57:23

by Mark Hemment

[permalink] [raw]
Subject: Re: [PATCH] nfsd + scalability

On Thu, 22 Feb 2001, Neil Brown wrote:

> On Sunday February 18, [email protected] wrote:
> > Hi Neil, all,
> >
> > The nfs daemons run holding the global kernel lock. They still hold
> > this lock over calls to file_op's read and write.
> > [snip]
> > Dropping the kernel lock around read and write in fs/nfsd/vfs.c is a
> > _big_ SMP scalability win!
>
> Certainly I would like to not hold the BKL so much, but I'm curious
> how much effect it will really have. Do you have any data on the
> effect of this change?

Depends very much on the hardware configuration, and underlying
filesystem.
On an I/O bound system, obviously this has little effect.

Using a filesystem which has a quite deep stack (CPU cycle heavy),
this is a big win. I've been running with this for so long that I can't
find my original data files at the moment, but it was around +8%
improvment in throughput for a 4-way box under SpecFS with vxfs as the
underlying filesystem. Less benefit for ext2 (all filesystems NFS
exported "sync" and "no_subtree_check"). Some of the benefit came from
the fact that there is also a volume manager sitting under the filesystem
(more CPU cycles with the kernel lock held!).

Holding the kernel lock for less cycles has an important side benefit!
If it is held for less cycles, then the probability of it being held
when an interrupt is processed is reduced. This benefit is further
enhanced if there are bottom-halfs running off the back of the interrupt
(such as networking code).
I need to get actual figures for how many cycles are we spinning at the
reacquire_kernel_lock() (in schedule()), but my gut feeling is that we
aren't doing too well when running as a file server.

> Also, I would much rather drop the lock in nfsd_dispatch() and then go
> around reclaiming it whereever it was needed.
> Subsequently we would drop the lock in nfsd() and make sure sunrpc is
> safe.

sunrpc definitely tries to be SMP safe, I haven't convienced myself that
it actually is.
In sunrpc, dropping the kernel lock when checksuming the UDP packet, and
when sending, is another win-win case (again, need to find my data files,
but this was approx +3% improvement in throughput).

> This would be more work (any volunteers?:-) but I feel that dropping
> it in little places like this is a bit unsatisfying.

True, not 100% satisfying, but even little places give an overall
improvement in scalability. If they can be proved to be correct, then
why not do it? It moves the code in the right direction.

I am planning on slow expanding the dropping of the kernel lock within
NFS, rather than do it in one single bang. It looks do able, but there
_might_ be an issue with the interaction with the dcache.

Mark

2001-02-23 05:45:44

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] nfsd + scalability

On Thursday February 22, [email protected] wrote:
> On Thu, 22 Feb 2001, Neil Brown wrote:
> >
> > Certainly I would like to not hold the BKL so much, but I'm curious
> > how much effect it will really have. Do you have any data on the
> > effect of this change?
>
> Depends very much on the hardware configuration, and underlying
> filesystem.
> On an I/O bound system, obviously this has little effect.
>
> Using a filesystem which has a quite deep stack (CPU cycle heavy),
> this is a big win. I've been running with this for so long that I can't
> find my original data files at the moment, but it was around +8%
> improvment in throughput for a 4-way box under SpecFS with vxfs as the
> underlying filesystem. Less benefit for ext2 (all filesystems NFS
> exported "sync" and "no_subtree_check"). Some of the benefit came from
> the fact that there is also a volume manager sitting under the filesystem
> (more CPU cycles with the kernel lock held!).
.... and more ....

Well, you are quite convincing. I am keen to see if I can measure a
performance difference for ext2/raid5, but unfortuantely I don't have
any 4-way boxed (only duals).

If you progress with this can create any patches that do more
interesting things than just drop the lock (e.g. protect the export
table or the read-ahead cache properly), let me know and I will
certainly take them. Meanwhile I will try to find time to have a look
myself.

Thanks,
NeilBrown