On Mon, 31 Jul 2006 10:42:34 +1000
NeilBrown <[email protected]> wrote:
> knfsd: Actually implement multiple pools. On NUMA machines, allocate
> a svc_pool per NUMA node; on SMP a svc_pool per CPU; otherwise a single
> global pool. Enqueue sockets on the svc_pool corresponding to the CPU
> on which the socket bh is run (i.e. the NIC interrupt CPU). Threads
> have their cpu mask set to limit them to the CPUs in the svc_pool that
> owns them.
>
> This is the patch that allows an Altix to scale NFS traffic linearly
> beyond 4 CPUs and 4 NICs.
>
> Incorporates changes and feedback from Neil Brown, Trond Myklebust,
> and Christoph Hellwig.
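In practical terms, the quoted description boils down to computing a pool
index from the CPU on which the socket's bottom half ran. Here is a
simplified user-space sketch of that mapping; the names used here
(pool_index_for_cpu, fake_cpu_to_node) are illustrative stand-ins, not the
actual sunrpc code:

#include <stdio.h>

/* Pool-mapping modes, mirroring the description quoted above. */
enum pool_mode {
	POOL_GLOBAL,	/* uniprocessor: a single global pool */
	POOL_PERCPU,	/* SMP: one svc_pool per CPU          */
	POOL_PERNODE,	/* NUMA: one svc_pool per NUMA node   */
};

/* Hypothetical stand-in for the kernel's cpu_to_node(). */
static unsigned int fake_cpu_to_node(unsigned int cpu)
{
	return cpu / 4;		/* pretend there are 4 CPUs per node */
}

/*
 * Pick the pool a socket should be enqueued on, given the CPU that
 * ran the socket's bottom half (i.e. took the NIC interrupt).
 */
static unsigned int pool_index_for_cpu(enum pool_mode mode, unsigned int cpu)
{
	switch (mode) {
	case POOL_PERCPU:
		return cpu;
	case POOL_PERNODE:
		return fake_cpu_to_node(cpu);
	default:
		return 0;	/* single global pool */
	}
}

int main(void)
{
	printf("cpu 6 -> pool %u in per-node mode\n",
	       pool_index_for_cpu(POOL_PERNODE, 6));
	return 0;
}

Each pool's worker threads then have their cpumask restricted to that
pool's CPUs, so a request tends to be serviced on the same CPU or node
that took the interrupt.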
This patch makes the NFS client go BUG. Simple nfsv3 workload (i.e. mount,
read stuff). Uniproc, FC5.
+ BUG_ON(m->mode == SVC_POOL_NONE);
kernel BUG at net/sunrpc/svc.c:244!
invalid opcode: 0000 [#1]
4K_STACKS
last sysfs file: /class/net/eth1/flags
Modules linked in: nfs lockd nfs_acl ipw2200 sonypi autofs4 hidp l2cap bluetooth sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables video sony_acpi sbs i2c_ec button battery asus_acpi ac nvram ohci1394 ieee1394 ehci_hcd uhci_hcd sg joydev snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device ieee80211 snd_pcm_oss snd_mixer_oss ieee80211_crypt snd_pcm snd_timer snd i2c_i801 soundcore i2c_core piix snd_page_alloc pcspkr generic ext3 jbd ide_disk ide_core
CPU: 0
EIP: 0060:[<f8d6d308>] Not tainted VLI
EFLAGS: 00210246 (2.6.18-rc3-mm1 #21)
EIP is at svc_pool_for_cpu+0xc/0x43 [sunrpc]
eax: ffffffff ebx: f59a75c0 ecx: f59a76c0 edx: 00000000
esi: f59cbc20 edi: f59a75c0 ebp: f582d5c0 esp: f59cbb98
ds: 007b es: 007b ss: 0068
Process mount (pid: 2599, ti=f59cb000 task=f59a5550 task.ti=f59cb000)
Stack: f8d6e506 f59cbbb0 00000000 00200282 00000014 00000006 f59a76c0 f59a75c0
f59cbc20 f59a75c0 f582d5c0 f8d6ea83 00200286 c0270376 f582d5c0 f59a76c0
f59a75c0 c02dbbc0 00000006 f59a76c0 f5956c80 f8d6ebd7 00000001 00000000
Call Trace:
[<f8d6e506>] svc_sock_enqueue+0x33/0x294 [sunrpc]
[<f8d6ea83>] svc_setup_socket+0x31c/0x326 [sunrpc]
[<c0270376>] release_sock+0xc/0x83
[<f8d6ebd7>] svc_makesock+0x14a/0x185 [sunrpc]
[<f8ca3b10>] make_socks+0x72/0xae [lockd]
[<f8ca3bce>] lockd_up+0x82/0xd9 [lockd]
[<c01169a6>] __wake_up+0x11/0x1a
[<f9227743>] nfs_start_lockd+0x26/0x43 [nfs]
[<f9228264>] nfs_create_server+0x1dc/0x3da [nfs]
[<c02c4298>] wait_for_completion+0x70/0x99
[<c0116293>] default_wake_function+0x0/0xc
[<c0124918>] call_usermodehelper_keys+0xc4/0xd3
[<f922e348>] nfs_get_sb+0x398/0x3b4 [nfs]
[<c0124927>] __call_usermodehelper+0x0/0x43
[<c0158d68>] vfs_kern_mount+0x83/0xf6
[<c0158e1d>] do_kern_mount+0x2d/0x3e
[<c016a8ac>] do_mount+0x5b2/0x625
[<c019facb>] task_has_capability+0x56/0x5e
[<c029479e>] inet_bind_bucket_create+0x11/0x3c
[<c0295e57>] inet_csk_get_port+0x196/0x1a0
[<c0270376>] release_sock+0xc/0x83
[<c02add33>] inet_bind+0x1c6/0x1d0
[<c01397fe>] handle_IRQ_event+0x23/0x49
[<c013ec5e>] __alloc_pages+0x5e/0x28d
[<c01034d2>] common_interrupt+0x1a/0x20
[<c0169815>] copy_mount_options+0x26/0x109
[<c016a991>] sys_mount+0x72/0xa4
[<c0102b4b>] syscall_call+0x7/0xb
Code: 31 c0 eb 15 8b 40 10 89 d1 c1 e9 02 8b 50 1c 8d 41 02 89 42 04 8d 44 8b 08 5a 59 5b c3 90 90 89 c1 a1 88 86 d8 f8 83 f8 ff 75 0a <0f> 0b f4 00 2c 6f d7 f8 eb 0a 83 f8 01 74 09 83 f8 02 74 0e 31
EIP: [<f8d6d308>] svc_pool_for_cpu+0xc/0x43 [sunrpc] SS:ESP 0068:f59cbb98
On Sun, 2006-08-06 at 19:47, Andrew Morton wrote:
> On Mon, 31 Jul 2006 10:42:34 +1000
> NeilBrown <[email protected]> wrote:
>
> > [ knfsd multiple-pools patch description snipped; quoted in full at the top of the thread ]
>
> This makes the NFS client go BUG. Simple nfsv3 workload (ie: mount, read
> stuff). Uniproc, FC5.
>
> + BUG_ON(m->mode == SVC_POOL_NONE);
Aha, I see what I b0rked up. On the client, lockd starts an RPC
service via the old svc_create() interface, which avoids calling
svc_pool_map_init(). When the first NLM callback arrives,
svc_sock_enqueue() calls svc_pool_for_cpu(), which BUGs out because
the map is still uninitialised. The BUG_ON() was introduced in one
of the rewrites made in response to review feedback over the last
few days; previously the code was simpler and would trivially
return pool 0, which is the right thing to do in this case. The
bug was hidden on my test machines because they run SLES
userspace, where lockd is broken: both the kernel and userspace
think the other one is handling the rpc.statd functionality.
A simple patch should fix this, coming up as soon as I can find
a non-SLES machine and run some client tests.
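To make the failure mode concrete: only the pooled setup path initialises
the map via svc_pool_map_init(); the legacy svc_create() path used by
lockd never does, and the next enqueue trips the assertion. A minimal
user-space sketch of that sequence follows; the _sketch and _before_fix
names are illustrative stand-ins, not the kernel functions:

#include <assert.h>
#include <stdio.h>

/* Pool-map modes as discussed in the thread; the values are illustrative. */
enum pool_map_mode {
	SVC_POOL_NONE = -1,	/* map never initialised */
	SVC_POOL_GLOBAL,	/* one global pool */
	SVC_POOL_PERCPU,	/* one pool per CPU */
	SVC_POOL_PERNODE,	/* one pool per NUMA node */
};

/* A single global map, analogous to svc_pool_map in net/sunrpc/svc.c. */
static enum pool_map_mode pool_map_mode = SVC_POOL_NONE;

/*
 * Stand-in for svc_pool_map_init(): the pooled setup path calls this
 * and records a mode; the legacy svc_create() path never gets here,
 * so the mode stays SVC_POOL_NONE.
 */
static void pool_map_init_sketch(void)
{
	pool_map_mode = SVC_POOL_PERCPU;	/* pretend we are on SMP */
}

/* Enqueue-time lookup as it stood before the fix. */
static unsigned int pool_for_cpu_before_fix(unsigned int cpu)
{
	assert(pool_map_mode != SVC_POOL_NONE);	/* the BUG_ON() in question */
	return pool_map_mode == SVC_POOL_PERCPU ? cpu : 0;
}

int main(void)
{
	/* nfsd's path: the map gets initialised, lookups work. */
	pool_map_init_sketch();
	printf("after init: cpu 2 -> pool %u\n", pool_for_cpu_before_fix(2));

	/*
	 * A pure NFS client never runs the init above: lockd comes up via
	 * the legacy svc_create() path and the map stays SVC_POOL_NONE.
	 */
	pool_map_mode = SVC_POOL_NONE;

	/* The next enqueue then asserts, mirroring the oops above. */
	printf("client: pool %u\n", pool_for_cpu_before_fix(0));
	return 0;
}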
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
On Sun, 2006-08-06 at 19:47, Andrew Morton wrote:
> On Mon, 31 Jul 2006 10:42:34 +1000
> NeilBrown <[email protected]> wrote:
>
> > [ knfsd multiple-pools patch description snipped; quoted in full at the top of the thread ]
>
> This makes the NFS client go BUG. Simple nfsv3 workload (ie: mount, read
> stuff). Uniproc, FC5.
>
> + BUG_ON(m->mode == SVC_POOL_NONE);
>
Reproduced on RHAS4; this patch fixes it for me.
--
knfsd: Fix a regression on an NFS client where mounting an
NFS filesystem trips a spurious BUG_ON() in the server code.
Tested using cthon04 lock tests on RHAS4-U2 userspace.
Signed-off-by: Greg Banks <[email protected]>
---
net/sunrpc/svc.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletion(-)
Index: linux-2.6.18-rc2/net/sunrpc/svc.c
===================================================================
--- linux-2.6.18-rc2.orig/net/sunrpc/svc.c
+++ linux-2.6.18-rc2/net/sunrpc/svc.c
@@ -211,6 +211,11 @@ svc_pool_map_set_cpumask(unsigned int pi
 	struct svc_pool_map *m = &svc_pool_map;
 	unsigned int node; /* or cpu */
 
+	/*
+	 * The caller checks for sv_nrpools > 1, which
+	 * implies that we've been initialized and the
+	 * map mode is not NONE.
+	 */
 	BUG_ON(m->mode == SVC_POOL_NONE);
 
 	switch (m->mode)
@@ -241,7 +246,11 @@ svc_pool_for_cpu(struct svc_serv *serv,
 	struct svc_pool_map *m = &svc_pool_map;
 	unsigned int pidx = 0;
 
-	BUG_ON(m->mode == SVC_POOL_NONE);
+	/*
+	 * SVC_POOL_NONE happens in a pure client when
+	 * lockd is brought up, so silently treat it the
+	 * same as SVC_POOL_GLOBAL.
+	 */
 
 	switch (m->mode) {
 	case SVC_POOL_PERCPU:
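The net effect of removing the BUG_ON() is that an uninitialised map
matches no case in the switch, the pool index stays 0, and the caller
gets the single global pool, which is exactly the behaviour described
above as the right thing for a pure client. A tiny stand-alone sketch of
the post-fix lookup (illustrative, not the kernel source):

#include <stdio.h>

enum pool_map_mode {
	SVC_POOL_NONE = -1, SVC_POOL_GLOBAL, SVC_POOL_PERCPU, SVC_POOL_PERNODE,
};

/*
 * Post-fix lookup: with the BUG_ON() gone, SVC_POOL_NONE (like
 * SVC_POOL_GLOBAL) matches no case, pidx stays 0, and the result is
 * clamped to the number of pools the service actually has.
 */
static unsigned int pool_for_cpu_after_fix(enum pool_map_mode mode,
					   unsigned int cpu,
					   unsigned int node,
					   unsigned int nrpools)
{
	unsigned int pidx = 0;

	switch (mode) {
	case SVC_POOL_PERCPU:
		pidx = cpu;
		break;
	case SVC_POOL_PERNODE:
		pidx = node;
		break;
	default:
		break;
	}
	return pidx % nrpools;
}

int main(void)
{
	/* A pure client's lockd has one pool and an uninitialised map. */
	printf("pool %u\n", pool_for_cpu_after_fix(SVC_POOL_NONE, 3, 0, 1));
	return 0;
}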
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.