2013-12-04 05:55:17

by Weng Meiling

[permalink] [raw]
Subject: [PATCH 3.4 0/9] fix the NULL pointer when use nfs in different net ns

When testing NFS in different network namespace with stable-3.4,
the situation start NFSd in one non init_net network namespace,
and stop it in another one will trigger kernel panic, because
RPCBIND client is stored per net, and will be NULL on NFSd shutdown.

Walk through the code, this problem also exists in stable-3.5
to stable-3.7. Stanislav Kinsbursky had committed a fixed patch
for 3.8: commit f7fb86c6e639360ad9c253cec534819ef928a674 (nfsd:
use "init_net" for portmapper). But it causes another bug, When
starting NFSd in a non init_net network namespace will trigger
kernel panic. Because RPCBIND client will be NULL when register
RPC service with the local portmapper in svc_addsock().
This new bug also exists in 3.8, but disappears after patch
commit 11f779421a39b86da8a523d97e5fd3477878d44f ("containerize
NFSd filesystem") in 3.9.

So backport Stanislav's patches from 3.8 to fix the former bug and
cleanup init_net reference, and then using the current->nsproxy->net_ns
to repalce the init_net to make NFSd keep using a consistent network
namespace all the time to resolve the new bug.

After this patchset, testing NFS in different network namespace will
not trigger kernel panic any more, and other NFS client can use the
NFS server's shared files normally which was started in init_net namespace:

# ip netns add test
# ip netns list
test
# ip netns exec test service nfsserver start
Starting kernel based NFS server: idmapd mountd statd nfsd sm-notify done
# service nfsserver status
Checking for kernel based NFS server: idmapd running
mountd running
statd running
nfsd running
sles28:~ #
# service nfsserver stop
Shutting down kernel based NFS server: nfsd statd mountd idmapd done
# service nfsserver status
Checking for kernel based NFS server: idmapd unused
mountd unused
statd unused
nfsd unused

# dmesg
[ 74.505511] eth0: no IPv6 routers present
[ 99.412867] RPC: Registered named UNIX socket transport module.
[ 99.412871] RPC: Registered udp transport module.
[ 99.412873] RPC: Registered tcp transport module.
[ 99.412875] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 99.637872] Installing knfsd (copyright (C) 1996 [email protected]).
[ 100.055152] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[ 100.058699] NFSD: starting 90-second grace period
[ 129.449088] nfsd: last server has exited, flushing export cache



Stanislav Kinsbursky (8):
nfsd: use "init_net" for portmapper
nfsd: pass net to nfsd_init_socks()
nfsd: pass net to nfsd_startup() and nfsd_shutdown()
nfsd: pass net to nfsd_create_serv()
nfsd: pass net to nfsd_svc()
nfsd: pass net to nfsd_set_nrthreads()
nfsd: pass net to __write_ports() and down
nfsd: pass proper net to nfsd_destroy() from NFSd kthreads

Weng Meiling (1):
nfsd: use the current net ns in write_threads() and write_ports()

fs/nfsd/nfsctl.c | 34 +++++++++++++++++++---------------
fs/nfsd/nfsd.h | 6 +++---
fs/nfsd/nfssvc.c | 40 +++++++++++++++++++---------------------
3 files changed, 41 insertions(+), 39 deletions(-)

--
1.8.2.2




2013-12-06 18:32:31

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 3.4 2/9] nfsd: pass net to nfsd_init_socks()

On Wed, Dec 04, 2013 at 01:53:28PM +0800, Weng Meiling wrote:
> From: Stanislav Kinsbursky <[email protected]>
>
> commit db6e182c17cb1a7069f7f8924721ce58ac05d9a3 upstream.
>
> Precursor patch. Hard-coded "init_net" will be replaced by proper one in
> future.
>
> Signed-off-by: Stanislav Kinsbursky <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>
> [wengmeiling: backport to 3.4:
> - adjust context
> - one more parameter(int port) for nfsd_init_socks()]

This patch breaks the build, so I can't take it, sorry.

I took the first patch in the series, but had to stop here.

Care to fix this up and resend?

thanks,

greg k-h

2013-12-04 05:54:43

by Weng Meiling

[permalink] [raw]
Subject: [PATCH 3.4 3/9] nfsd: pass net to nfsd_startup() and nfsd_shutdown()

From: Stanislav Kinsbursky <[email protected]>

commit db42d1a76a8dfcaba7a2dc9c591fa4e231db22b3 upstream.

Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
[wengmeiling: backport to 3.4:
- adjust context
- one more parameter(int port) for nfsd_startup()
- no net ns initialization in nfsd_startup()
- no net ns initialization in nfsd_shutdown()
- pass @net to lockd_up() in nfsd_startup()
- pass @net to lockd_down() in nfsd_shutdown()]
Signed-off-by: Weng Meiling <[email protected]>
---
fs/nfsd/nfssvc.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index d1ed729..f8dae75 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -203,7 +203,7 @@ static int nfsd_init_socks(int port, struct net *net)

static bool nfsd_up = false;

-static int nfsd_startup(unsigned short port, int nrservs)
+static int nfsd_startup(unsigned short port, int nrservs, struct net *net)
{
int ret;

@@ -220,7 +220,7 @@ static int nfsd_startup(unsigned short port, int nrservs)
ret = nfsd_init_socks(port, net);
if (ret)
goto out_racache;
- ret = lockd_up(&init_net);
+ ret = lockd_up(net);
if (ret)
goto out_racache;
ret = nfs4_state_start();
@@ -229,13 +229,13 @@ static int nfsd_startup(unsigned short port, int nrservs)
nfsd_up = true;
return 0;
out_lockd:
- lockd_down(&init_net);
+ lockd_down(net);
out_racache:
nfsd_racache_shutdown();
return ret;
}

-static void nfsd_shutdown(void)
+static void nfsd_shutdown(struct net *net)
{
/*
* write_ports can create the server without actually starting
@@ -246,14 +246,14 @@ static void nfsd_shutdown(void)
if (!nfsd_up)
return;
nfs4_state_shutdown();
- lockd_down(&init_net);
+ lockd_down(net);
nfsd_racache_shutdown();
nfsd_up = false;
}

static void nfsd_last_thread(struct svc_serv *serv, struct net *net)
{
- nfsd_shutdown();
+ nfsd_shutdown(net);

svc_rpcb_cleanup(serv, net);

@@ -457,7 +457,7 @@ nfsd_svc(unsigned short port, int nrservs)

nfsd_up_before = nfsd_up;

- error = nfsd_startup(port, nrservs);
+ error = nfsd_startup(port, nrservs, net);
if (error)
goto out_destroy;
error = svc_set_num_threads(nfsd_serv, NULL, nrservs);
@@ -470,7 +470,7 @@ nfsd_svc(unsigned short port, int nrservs)
error = nfsd_serv->sv_nrthreads - 1;
out_shutdown:
if (error < 0 && !nfsd_up_before)
- nfsd_shutdown();
+ nfsd_shutdown(net);
out_destroy:
nfsd_destroy(net); /* Release server */
out:
--
1.8.2.2



2013-12-04 05:54:34

by Weng Meiling

[permalink] [raw]
Subject: [PATCH 3.4 8/9] nfsd: pass proper net to nfsd_destroy() from NFSd kthreads

From: Stanislav Kinsbursky <[email protected]>

commit 88c47666171989ed4c5b1a5687df09511e8c5e35 upstream.

Since NFSd service is per-net now, we have to pass proper network
context in nfsd_shutdown() from NFSd kthreads.

The simplest way I found is to get proper net from one of transports
with permanent sockets.

Signed-off-by: Stanislav Kinsbursky <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
[wengmeiling: backport to 3.4: adjust context]
Signed-off-by: Weng Meiling <[email protected]>
---
fs/nfsd/nfssvc.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 0974818..5bc9380 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -483,6 +483,8 @@ static int
nfsd(void *vrqstp)
{
struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp;
+ struct svc_xprt *perm_sock = list_entry(rqstp->rq_server->sv_permsocks.next, typeof(struct svc_xprt), xpt_list);
+ struct net *net = perm_sock->xpt_net;
int err, preverr = 0;

/* Lock module and set up kernel thread */
@@ -557,7 +559,7 @@ out:
/* Release the thread */
svc_exit_thread(rqstp);

- nfsd_destroy(&init_net);
+ nfsd_destroy(net);

/* Release module */
mutex_unlock(&nfsd_mutex);
--
1.8.2.2



2013-12-16 01:27:39

by Weng Meiling

[permalink] [raw]
Subject: Re: NFSd 3.13 bug (Was "Re: [PATCH 3.4 9/9] nfsd: use the current net ns in write_threads() and write_ports()")

Hi Bruce, Stanislav:
Do you have any ideas about this problem?

On 2013/12/10 11:12, Weng Meiling wrote:
> Hi guys,
>
> When I test NFS in different network namespace with the
> 3.13-rc2 kernel, I trigger a kernel panic.
>
> On 2013/12/5 5:25, J. Bruce Fields wrote:
>> On Wed, Dec 04, 2013 at 01:53:35PM +0800, Weng Meiling wrote:
>>> Upstream commit f7fb86c6e639360ad9c253cec534819ef928a674 (nfsd: use
>>> "init_net" for portmapper) introduced a bug.
>>>
>>> Starting NFSd in a non init_net network namespace will lead to
>>> NULL pointer deference. Because RPCBIND client will be NULL when register
>>> RPC service with the local portmapper in svc_addsock().
>>>
>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
>>> IP: [<ffffffffa0439150>] call_start+0x10/0x30 [sunrpc]
>>> ...
>>> Pid: 27770, comm: rpc.nfsd ...
>>> RIP: 0010:[<ffffffffa0439150>] [<ffffffffa0439150>] call_start+0x10/0x30 [sunrpc]
>>> ...
>>> [<ffffffffa0442841>] __rpc_execute+0x91/0x160 [sunrpc]
>>> [<ffffffffa0442981>] rpc_execute+0x71/0x80 [sunrpc]
>>> [<ffffffffa043ab49>] rpc_run_task+0x89/0xa0 [sunrpc]
>>> [<ffffffffa043ac5d>] rpc_call_sync+0x3d/0x70 [sunrpc]
>>> [<ffffffffa044b316>] rpcb_register+0xa6/0xd0 [sunrpc]
>>> [<ffffffffa0444ede>] __svc_register+0x1ae/0x1c0 [sunrpc]
>>> [<ffffffff8114f975>] ? cache_alloc_refill+0x85/0x290
>>> [<ffffffffa0444f7f>] svc_register+0x8f/0xc0 [sunrpc]
>>> [<ffffffff811504f3>] ? kmem_cache_alloc_trace+0xc3/0x1d0
>>> [<ffffffffa04472f8>] svc_setup_socket+0x1a8/0x2c0 [sunrpc]
>>> [<ffffffff81009546>] ? read_tsc+0x16/0x40
>>> [<ffffffffa0448078>] svc_addsock+0x118/0x1c0 [sunrpc]
>>> [<ffffffff81090ee5>] ? do_gettimeofday+0x15/0x50
>>> [<ffffffffa049e69c>] ? nfsd_create_serv+0xdc/0x150 [nfsd]
>>> [<ffffffff8125605c>] ? simple_strtoull+0x2c/0x50
>>> [<ffffffffa049fdce>] __write_ports+0x1fe/0x230 [nfsd]
>>> [<ffffffffa049fe37>] write_ports+0x37/0x60 [nfsd]
>>> [<ffffffffa049fe00>] ? __write_ports+0x230/0x230 [nfsd]
>>> [<ffffffffa049edd2>] nfsctl_transaction_write+0x72/0x90 [nfsd]
>>> [<ffffffff8116573b>] vfs_write+0xcb/0x130
>>> [<ffffffff81165890>] sys_write+0x50/0x90
>>>
>>> Fix it by using the current's network namespace so NFSd uses the
>>> consistent net ns all the time.
>>
>> Everything else looks like a straightforward backport, but doing this
>> differently from upstream makes me nervous. Don't we also want to take
>> 11f779421a39b86da8a523d97e5fd3477878d44f "nfsd: containerize NFSd
>> filesystem" ? (Stanislav?)
>>
>> --b.
>>
>
> I backport the patch 11f779421a39b86da8a523d97e5fd3477878d44f "nfsd: containerize NFSd
> filesystem" and test. But I trigger a bug, this bug still exists in 3.13 kernel. The following
> is what I do:
>
> The steps:
>
> step 1: start NFS server in init_net net ns
> #service nfsserver start
>
> step 2: stop NFS server in non init_net net ns
> #ip netns add test
> #ip netns list
> test
> #ip netns exec test service nfsserver stop
>
> step 3: start NFS server again in the non init_net net ns
> #ip netns exec test service nfsserver start
>
> This step 3 will trigger kernel panic. The reason seems that "ip
> netns exec" creates a new mount namespace, the changes to the
> new mount namespace don't propgate to other namespaces. So
> when stop NFS server in second step, the NFSD filesystem isn't
> umounted. When restart NFS server in third step, the NFSD
> filesystem will not remount, this result to the NFSD file
> system superblock's net ns is still init_net and RPCBIND client
> will be NULL when register RPC service with the local portmapper
> in svc_addsock(). Do you have any ideas about this problem?
>
> the detail call trace:
> [ 497.554677] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
> [ 497.554687] IP: [<ffffffffa031a170>] call_start+0x10/0x30 [sunrpc]
> [ 497.554707] PGD 0
> [ 497.554711] Oops: 0000 [#1] SMP
> [ 497.554716] Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc oid_registry edd af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave loop dm_mod e1000e iTCO_wdt
> iTCO_vendor_support i2c_i801 bnx2 ipv6 lpc_ich i7core_edac edac_core acpi_cpufreq ehci_pci button ses enclosure serio_raw sg rtc_cmos mfd_core ptp hid_generic pps_core i2c_core pcspkr ext3 jbd mbcache
> usbhid hid uhci_hcd ehci_hcd usbcore sd_mod usb_common crc_t10dif crct10dif_common processor thermal_sys hwmon scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh ata_generic ata_piix libata
> megaraid_sas scsi_mod
> [ 497.554788] CPU: 2 PID: 7837 Comm: rpc.nfsd Not tainted 3.13.0-rc2-0.1-default+ #1
> [ 497.554793] Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285 /BC11BTSA , BIOS CTSAV036 04/27/2011
> [ 497.554800] task: ffff8800ba76e2d0 ti: ffff88043e8e8000 task.ti: ffff88043e8e8000
> [ 497.554805] RIP: 0010:[<ffffffffa031a170>] [<ffffffffa031a170>] call_start+0x10/0x30 [sunrpc]
> [ 497.554819] RSP: 0018:ffff88043e8e9aa8 EFLAGS: 00010202
> [ 497.554823] RAX: ffffffffa033f4b8 RBX: ffff8800bb030040 RCX: 0000000000000034
> [ 497.554828] RDX: 0000000000000000 RSI: ffff8800bb0300b0 RDI: ffff8800bb030040
> [ 497.554832] RBP: ffff88043e8e9aa8 R08: 0040000000000000 R09: 0200000000000000
> [ 497.554836] R10: 0000000000000000 R11: ffff8802348fe040 R12: ffff8800bb030040
> [ 497.554841] R13: ffffffffa031a160 R14: 0000000000000000 R15: ffffffffa031a160
> [ 497.554846] FS: 00007f2fa0536700(0000) GS:ffff88023fc40000(0000) knlGS:0000000000000000
> [ 497.554851] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 497.554855] CR2: 0000000000000058 CR3: 0000000434e30000 CR4: 00000000000007e0
> [ 497.554859] Stack:
> [ 497.554862] ffff88043e8e9af8 ffffffffa0323f61 ffff00066c0a0100 ffff8800bb0300b0
> [ 497.554871] 000000003e8e9ae8 ffff8800bb030040 ffff8800bb030040 0000000000000000
> [ 497.554878] 0000000000000000 0000000000000002 ffff88043e8e9b28 ffffffffa03240ed
> [ 497.554886] Call Trace:
> [ 497.554902] [<ffffffffa0323f61>] __rpc_execute+0xa1/0x190 [sunrpc]
> [ 497.554918] [<ffffffffa03240ed>] rpc_execute+0x9d/0xc0 [sunrpc]
> [ 497.554930] [<ffffffffa031c3e9>] rpc_run_task+0x89/0xa0 [sunrpc]
> [ 497.554943] [<ffffffffa031c4fe>] rpc_call_sync+0x3e/0xa0 [sunrpc]
> [ 497.554961] [<ffffffffa032d337>] rpcb_register_call+0x37/0x60 [sunrpc]
> [ 497.554979] [<ffffffffa032d53c>] rpcb_register+0x9c/0xb0 [sunrpc]
> [ 497.554996] [<ffffffffa03270ee>] __svc_register+0x1ae/0x1c0 [sunrpc]
> [ 497.555012] [<ffffffffa0327190>] svc_register+0x90/0xe0 [sunrpc]
> [ 497.555029] [<ffffffffa032a157>] svc_setup_socket+0x1e7/0x300 [sunrpc]
> [ 497.555038] [<ffffffff810b39b3>] ? __getnstimeofday+0x43/0xd0
> [ 497.555055] [<ffffffffa032a78a>] svc_addsock+0xca/0x1e0 [sunrpc]
> [ 497.555068] [<ffffffffa0396b31>] ? nfsd_create_serv+0x111/0x180 [nfsd]
> [ 497.555075] [<ffffffff8128d47e>] ? simple_strtol+0xe/0x30
> [ 497.555084] [<ffffffffa03972b7>] ? get_int+0x57/0x70 [nfsd]
> [ 497.555094] [<ffffffffa03977e9>] __write_ports+0x119/0x140 [nfsd]
> [ 497.555103] [<ffffffffa039788a>] write_ports+0x7a/0xb0 [nfsd]
> [ 497.555112] [<ffffffffa0397810>] ? __write_ports+0x140/0x140 [nfsd]
> [ 497.555122] [<ffffffffa039713a>] nfsctl_transaction_write+0x6a/0x80 [nfsd]
> [ 497.555129] [<ffffffff81186207>] vfs_write+0xc7/0x1e0
> [ 497.555134] [<ffffffff8118643d>] SyS_write+0x5d/0xa0
> [ 497.555142] [<ffffffff814deaa2>] system_call_fastpath+0x16/0x1b
> [ 497.555146] Code: 00 00 00 01 55 48 89 e5 75 0d 48 c7 47 50 60 a1 31 a0 b8 01 00 00 00 c9 c3 66 90 48 8b 47 28 48 8b 57 18 55 83 40 20 01 48 89 e5 <48> 8b 42 58 83 40 1c 01 48 c7 47 50 f0 a1 31 a0
> c9 c3 66 66 66
> [ 497.555189] RIP [<ffffffffa031a170>] call_start+0x10/0x30 [sunrpc]
> [ 497.555200] RSP <ffff88043e8e9aa8>
> [ 497.555203] CR2: 0000000000000058
> [ 497.555208] ---[ end trace 34ca8d40727792e2 ]---
>
>>>
>>> Signed-off-by: Weng Meiling <[email protected]>
>>> ---
>>> fs/nfsd/nfsctl.c | 5 +++--
>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>>> index 1d74af2..4ff0db9 100644
>>> --- a/fs/nfsd/nfsctl.c
>>> +++ b/fs/nfsd/nfsctl.c
>>> @@ -15,6 +15,7 @@
>>> #include <linux/sunrpc/gss_krb5_enctypes.h>
>>> #include <linux/sunrpc/rpc_pipe_fs.h>
>>> #include <linux/module.h>
>>> +#include <linux/nsproxy.h>
>>>
>>> #include "idmap.h"
>>> #include "nfsd.h"
>>> @@ -389,7 +390,7 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size)
>>> {
>>> char *mesg = buf;
>>> int rv;
>>> - struct net *net = &init_net;
>>> + struct net *net = current->nsproxy->net_ns;
>>>
>>> if (size > 0) {
>>> int newthreads;
>>> @@ -857,7 +858,7 @@ static ssize_t __write_ports(struct file *file, char *buf, size_t size,
>>> static ssize_t write_ports(struct file *file, char *buf, size_t size)
>>> {
>>> ssize_t rv;
>>> - struct net *net = &init_net;
>>> + struct net *net = current->nsproxy->net_ns;
>>>
>>> mutex_lock(&nfsd_mutex);
>>> rv = __write_ports(file, buf, size, net);
>>> --
>>> 1.8.2.2
>>>
>>>
>>
>> .
>>
>



2013-12-30 09:06:08

by Weng Meiling

[permalink] [raw]
Subject: Re: NFSd 3.13 bug (Was "Re: [PATCH 3.4 9/9] nfsd: use the current net ns in write_threads() and write_ports()")

Hi Stanislav,

I test kernel with this patch, the problem has be fixed. Would you please send a
formal one? :) Thanks very much!

Weng Meiling
Thanks

On 2013/12/16 23:27, Stanislav Kinsbursky wrote:
> 16.12.2013 05:26, Weng Meiling пишет:
>> I backport the patch 11f779421a39b86da8a523d97e5fd3477878d44f "nfsd: containerize NFSd
>>> filesystem" and test. But I trigger a bug, this bug still exists in 3.13 kernel. The following
>>> is what I do:
>>>
>>> The steps:
>>>
>>> step 1: start NFS server in init_net net ns
>>> #service nfsserver start
>>>
>>> step 2: stop NFS server in non init_net net ns
>>> #ip netns add test
>>> #ip netns list
>>> test
>>> #ip netns exec test service nfsserver stop
>>>
>>> step 3: start NFS server again in the non init_net net ns
>>> #ip netns exec test service nfsserver start
>>>
>>> This step 3 will trigger kernel panic.
>
>
> This sequence can be reduced to steps 2 and 3.
>
>
>>> The reason seems that "ip
>>> netns exec" creates a new mount namespace, the changes to the
>>> new mount namespace don't propgate to other namespaces. So
>>> when stop NFS server in second step, the NFSD filesystem isn't
>>> umounted. When restart NFS server in third step, the NFSD
>>> filesystem will not remount, this result to the NFSD file
>>> system superblock's net ns is still init_net and RPCBIND client
>>> will be NULL when register RPC service with the local portmapper
>>> in svc_addsock(). Do you have any ideas about this problem?
>>>
>
> The problem here is that on NFS server stop, RPCBIND client were destroyed for init_net,
> because network namespace context is being taken from NFSd superblock.
> On NFS start start rpc.nfsd process creates socket in nested net and passes it into "write_ports",
> which leads to NFSd creation of RPCBIND socket in init_net because of the same reason. An attempt
> to register passed socket in nested net leads to panic. I think, this collusion should be handled
> as error and can be fixed like below.
>
> BTW, it looks to me. that mounts with namespace-aware superblocks can't just use the same
> superblock on new mount namespace creation and should be handled in more complex way.
>
> Eric, Al, could you share your opinion how this problem should be solved?
>
> =======================================================================================
>
>
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 7f55517..f34d9de 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -699,6 +699,11 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net)
> if (err != 0 || fd < 0)
> return -EINVAL;
>
> + if (svc_alien_sock(net, fd)) {
> + printk(KERN_ERR "%s: socket net is different to NFSd's one\n", __func__);
> + return -EINVAL;
> + }
> +
> err = nfsd_create_serv(net);
> if (err != 0)
> return err;
> diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
> index 62fd1b7..947009e 100644
> --- a/include/linux/sunrpc/svcsock.h
> +++ b/include/linux/sunrpc/svcsock.h
> @@ -56,6 +56,7 @@ int svc_recv(struct svc_rqst *, long);
> int svc_send(struct svc_rqst *);
> void svc_drop(struct svc_rqst *);
> void svc_sock_update_bufs(struct svc_serv *serv);
> +bool svc_alien_sock(struct net *net, int fd);
> int svc_addsock(struct svc_serv *serv, const int fd,
> char *name_return, const size_t len);
> void svc_init_xprt_sock(void);
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index b6e59f0..3ba5b87 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1397,6 +1397,17 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> return svsk;
> }
>
> +bool svc_alien_sock(struct net *net, int fd)
> +{
> + int err;
> + struct socket *sock = sockfd_lookup(fd, &err);
> +
> + if (sock && (sock_net(sock->sk) != net))
> + return true;
> + return false;
> +}
> +EXPORT_SYMBOL_GPL(svc_alien_sock);
> +
> /**
> * svc_addsock - add a listener socket to an RPC service
> * @serv: pointer to RPC service to which to add a new listener
>
>
>
>



2013-12-10 03:15:42

by Weng Meiling

[permalink] [raw]
Subject: NFSd 3.13 bug (Was "Re: [PATCH 3.4 9/9] nfsd: use the current net ns in write_threads() and write_ports()")

Hi guys,

When I test NFS in different network namespace with the
3.13-rc2 kernel, I trigger a kernel panic.

On 2013/12/5 5:25, J. Bruce Fields wrote:
> On Wed, Dec 04, 2013 at 01:53:35PM +0800, Weng Meiling wrote:
>> Upstream commit f7fb86c6e639360ad9c253cec534819ef928a674 (nfsd: use
>> "init_net" for portmapper) introduced a bug.
>>
>> Starting NFSd in a non init_net network namespace will lead to
>> NULL pointer deference. Because RPCBIND client will be NULL when register
>> RPC service with the local portmapper in svc_addsock().
>>
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
>> IP: [<ffffffffa0439150>] call_start+0x10/0x30 [sunrpc]
>> ...
>> Pid: 27770, comm: rpc.nfsd ...
>> RIP: 0010:[<ffffffffa0439150>] [<ffffffffa0439150>] call_start+0x10/0x30 [sunrpc]
>> ...
>> [<ffffffffa0442841>] __rpc_execute+0x91/0x160 [sunrpc]
>> [<ffffffffa0442981>] rpc_execute+0x71/0x80 [sunrpc]
>> [<ffffffffa043ab49>] rpc_run_task+0x89/0xa0 [sunrpc]
>> [<ffffffffa043ac5d>] rpc_call_sync+0x3d/0x70 [sunrpc]
>> [<ffffffffa044b316>] rpcb_register+0xa6/0xd0 [sunrpc]
>> [<ffffffffa0444ede>] __svc_register+0x1ae/0x1c0 [sunrpc]
>> [<ffffffff8114f975>] ? cache_alloc_refill+0x85/0x290
>> [<ffffffffa0444f7f>] svc_register+0x8f/0xc0 [sunrpc]
>> [<ffffffff811504f3>] ? kmem_cache_alloc_trace+0xc3/0x1d0
>> [<ffffffffa04472f8>] svc_setup_socket+0x1a8/0x2c0 [sunrpc]
>> [<ffffffff81009546>] ? read_tsc+0x16/0x40
>> [<ffffffffa0448078>] svc_addsock+0x118/0x1c0 [sunrpc]
>> [<ffffffff81090ee5>] ? do_gettimeofday+0x15/0x50
>> [<ffffffffa049e69c>] ? nfsd_create_serv+0xdc/0x150 [nfsd]
>> [<ffffffff8125605c>] ? simple_strtoull+0x2c/0x50
>> [<ffffffffa049fdce>] __write_ports+0x1fe/0x230 [nfsd]
>> [<ffffffffa049fe37>] write_ports+0x37/0x60 [nfsd]
>> [<ffffffffa049fe00>] ? __write_ports+0x230/0x230 [nfsd]
>> [<ffffffffa049edd2>] nfsctl_transaction_write+0x72/0x90 [nfsd]
>> [<ffffffff8116573b>] vfs_write+0xcb/0x130
>> [<ffffffff81165890>] sys_write+0x50/0x90
>>
>> Fix it by using the current's network namespace so NFSd uses the
>> consistent net ns all the time.
>
> Everything else looks like a straightforward backport, but doing this
> differently from upstream makes me nervous. Don't we also want to take
> 11f779421a39b86da8a523d97e5fd3477878d44f "nfsd: containerize NFSd
> filesystem" ? (Stanislav?)
>
> --b.
>

I backport the patch 11f779421a39b86da8a523d97e5fd3477878d44f "nfsd: containerize NFSd
filesystem" and test. But I trigger a bug, this bug still exists in 3.13 kernel. The following
is what I do:

The steps:

step 1: start NFS server in init_net net ns
#service nfsserver start

step 2: stop NFS server in non init_net net ns
#ip netns add test
#ip netns list
test
#ip netns exec test service nfsserver stop

step 3: start NFS server again in the non init_net net ns
#ip netns exec test service nfsserver start

This step 3 will trigger kernel panic. The reason seems that "ip
netns exec" creates a new mount namespace, the changes to the
new mount namespace don't propgate to other namespaces. So
when stop NFS server in second step, the NFSD filesystem isn't
umounted. When restart NFS server in third step, the NFSD
filesystem will not remount, this result to the NFSD file
system superblock's net ns is still init_net and RPCBIND client
will be NULL when register RPC service with the local portmapper
in svc_addsock(). Do you have any ideas about this problem?

the detail call trace:
[ 497.554677] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[ 497.554687] IP: [<ffffffffa031a170>] call_start+0x10/0x30 [sunrpc]
[ 497.554707] PGD 0
[ 497.554711] Oops: 0000 [#1] SMP
[ 497.554716] Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc oid_registry edd af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave loop dm_mod e1000e iTCO_wdt
iTCO_vendor_support i2c_i801 bnx2 ipv6 lpc_ich i7core_edac edac_core acpi_cpufreq ehci_pci button ses enclosure serio_raw sg rtc_cmos mfd_core ptp hid_generic pps_core i2c_core pcspkr ext3 jbd mbcache
usbhid hid uhci_hcd ehci_hcd usbcore sd_mod usb_common crc_t10dif crct10dif_common processor thermal_sys hwmon scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh ata_generic ata_piix libata
megaraid_sas scsi_mod
[ 497.554788] CPU: 2 PID: 7837 Comm: rpc.nfsd Not tainted 3.13.0-rc2-0.1-default+ #1
[ 497.554793] Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285 /BC11BTSA , BIOS CTSAV036 04/27/2011
[ 497.554800] task: ffff8800ba76e2d0 ti: ffff88043e8e8000 task.ti: ffff88043e8e8000
[ 497.554805] RIP: 0010:[<ffffffffa031a170>] [<ffffffffa031a170>] call_start+0x10/0x30 [sunrpc]
[ 497.554819] RSP: 0018:ffff88043e8e9aa8 EFLAGS: 00010202
[ 497.554823] RAX: ffffffffa033f4b8 RBX: ffff8800bb030040 RCX: 0000000000000034
[ 497.554828] RDX: 0000000000000000 RSI: ffff8800bb0300b0 RDI: ffff8800bb030040
[ 497.554832] RBP: ffff88043e8e9aa8 R08: 0040000000000000 R09: 0200000000000000
[ 497.554836] R10: 0000000000000000 R11: ffff8802348fe040 R12: ffff8800bb030040
[ 497.554841] R13: ffffffffa031a160 R14: 0000000000000000 R15: ffffffffa031a160
[ 497.554846] FS: 00007f2fa0536700(0000) GS:ffff88023fc40000(0000) knlGS:0000000000000000
[ 497.554851] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 497.554855] CR2: 0000000000000058 CR3: 0000000434e30000 CR4: 00000000000007e0
[ 497.554859] Stack:
[ 497.554862] ffff88043e8e9af8 ffffffffa0323f61 ffff00066c0a0100 ffff8800bb0300b0
[ 497.554871] 000000003e8e9ae8 ffff8800bb030040 ffff8800bb030040 0000000000000000
[ 497.554878] 0000000000000000 0000000000000002 ffff88043e8e9b28 ffffffffa03240ed
[ 497.554886] Call Trace:
[ 497.554902] [<ffffffffa0323f61>] __rpc_execute+0xa1/0x190 [sunrpc]
[ 497.554918] [<ffffffffa03240ed>] rpc_execute+0x9d/0xc0 [sunrpc]
[ 497.554930] [<ffffffffa031c3e9>] rpc_run_task+0x89/0xa0 [sunrpc]
[ 497.554943] [<ffffffffa031c4fe>] rpc_call_sync+0x3e/0xa0 [sunrpc]
[ 497.554961] [<ffffffffa032d337>] rpcb_register_call+0x37/0x60 [sunrpc]
[ 497.554979] [<ffffffffa032d53c>] rpcb_register+0x9c/0xb0 [sunrpc]
[ 497.554996] [<ffffffffa03270ee>] __svc_register+0x1ae/0x1c0 [sunrpc]
[ 497.555012] [<ffffffffa0327190>] svc_register+0x90/0xe0 [sunrpc]
[ 497.555029] [<ffffffffa032a157>] svc_setup_socket+0x1e7/0x300 [sunrpc]
[ 497.555038] [<ffffffff810b39b3>] ? __getnstimeofday+0x43/0xd0
[ 497.555055] [<ffffffffa032a78a>] svc_addsock+0xca/0x1e0 [sunrpc]
[ 497.555068] [<ffffffffa0396b31>] ? nfsd_create_serv+0x111/0x180 [nfsd]
[ 497.555075] [<ffffffff8128d47e>] ? simple_strtol+0xe/0x30
[ 497.555084] [<ffffffffa03972b7>] ? get_int+0x57/0x70 [nfsd]
[ 497.555094] [<ffffffffa03977e9>] __write_ports+0x119/0x140 [nfsd]
[ 497.555103] [<ffffffffa039788a>] write_ports+0x7a/0xb0 [nfsd]
[ 497.555112] [<ffffffffa0397810>] ? __write_ports+0x140/0x140 [nfsd]
[ 497.555122] [<ffffffffa039713a>] nfsctl_transaction_write+0x6a/0x80 [nfsd]
[ 497.555129] [<ffffffff81186207>] vfs_write+0xc7/0x1e0
[ 497.555134] [<ffffffff8118643d>] SyS_write+0x5d/0xa0
[ 497.555142] [<ffffffff814deaa2>] system_call_fastpath+0x16/0x1b
[ 497.555146] Code: 00 00 00 01 55 48 89 e5 75 0d 48 c7 47 50 60 a1 31 a0 b8 01 00 00 00 c9 c3 66 90 48 8b 47 28 48 8b 57 18 55 83 40 20 01 48 89 e5 <48> 8b 42 58 83 40 1c 01 48 c7 47 50 f0 a1 31 a0
c9 c3 66 66 66
[ 497.555189] RIP [<ffffffffa031a170>] call_start+0x10/0x30 [sunrpc]
[ 497.555200] RSP <ffff88043e8e9aa8>
[ 497.555203] CR2: 0000000000000058
[ 497.555208] ---[ end trace 34ca8d40727792e2 ]---

>>
>> Signed-off-by: Weng Meiling <[email protected]>
>> ---
>> fs/nfsd/nfsctl.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>> index 1d74af2..4ff0db9 100644
>> --- a/fs/nfsd/nfsctl.c
>> +++ b/fs/nfsd/nfsctl.c
>> @@ -15,6 +15,7 @@
>> #include <linux/sunrpc/gss_krb5_enctypes.h>
>> #include <linux/sunrpc/rpc_pipe_fs.h>
>> #include <linux/module.h>
>> +#include <linux/nsproxy.h>
>>
>> #include "idmap.h"
>> #include "nfsd.h"
>> @@ -389,7 +390,7 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size)
>> {
>> char *mesg = buf;
>> int rv;
>> - struct net *net = &init_net;
>> + struct net *net = current->nsproxy->net_ns;
>>
>> if (size > 0) {
>> int newthreads;
>> @@ -857,7 +858,7 @@ static ssize_t __write_ports(struct file *file, char *buf, size_t size,
>> static ssize_t write_ports(struct file *file, char *buf, size_t size)
>> {
>> ssize_t rv;
>> - struct net *net = &init_net;
>> + struct net *net = current->nsproxy->net_ns;
>>
>> mutex_lock(&nfsd_mutex);
>> rv = __write_ports(file, buf, size, net);
>> --
>> 1.8.2.2
>>
>>
>
> .
>



2013-12-04 05:55:16

by Weng Meiling

[permalink] [raw]
Subject: [PATCH 3.4 9/9] nfsd: use the current net ns in write_threads() and write_ports()

Upstream commit f7fb86c6e639360ad9c253cec534819ef928a674 (nfsd: use
"init_net" for portmapper) introduced a bug.

Starting NFSd in a non init_net network namespace will lead to
NULL pointer deference. Because RPCBIND client will be NULL when register
RPC service with the local portmapper in svc_addsock().

BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
IP: [<ffffffffa0439150>] call_start+0x10/0x30 [sunrpc]
...
Pid: 27770, comm: rpc.nfsd ...
RIP: 0010:[<ffffffffa0439150>] [<ffffffffa0439150>] call_start+0x10/0x30 [sunrpc]
...
[<ffffffffa0442841>] __rpc_execute+0x91/0x160 [sunrpc]
[<ffffffffa0442981>] rpc_execute+0x71/0x80 [sunrpc]
[<ffffffffa043ab49>] rpc_run_task+0x89/0xa0 [sunrpc]
[<ffffffffa043ac5d>] rpc_call_sync+0x3d/0x70 [sunrpc]
[<ffffffffa044b316>] rpcb_register+0xa6/0xd0 [sunrpc]
[<ffffffffa0444ede>] __svc_register+0x1ae/0x1c0 [sunrpc]
[<ffffffff8114f975>] ? cache_alloc_refill+0x85/0x290
[<ffffffffa0444f7f>] svc_register+0x8f/0xc0 [sunrpc]
[<ffffffff811504f3>] ? kmem_cache_alloc_trace+0xc3/0x1d0
[<ffffffffa04472f8>] svc_setup_socket+0x1a8/0x2c0 [sunrpc]
[<ffffffff81009546>] ? read_tsc+0x16/0x40
[<ffffffffa0448078>] svc_addsock+0x118/0x1c0 [sunrpc]
[<ffffffff81090ee5>] ? do_gettimeofday+0x15/0x50
[<ffffffffa049e69c>] ? nfsd_create_serv+0xdc/0x150 [nfsd]
[<ffffffff8125605c>] ? simple_strtoull+0x2c/0x50
[<ffffffffa049fdce>] __write_ports+0x1fe/0x230 [nfsd]
[<ffffffffa049fe37>] write_ports+0x37/0x60 [nfsd]
[<ffffffffa049fe00>] ? __write_ports+0x230/0x230 [nfsd]
[<ffffffffa049edd2>] nfsctl_transaction_write+0x72/0x90 [nfsd]
[<ffffffff8116573b>] vfs_write+0xcb/0x130
[<ffffffff81165890>] sys_write+0x50/0x90

Fix it by using the current's network namespace so NFSd uses the
consistent net ns all the time.

Signed-off-by: Weng Meiling <[email protected]>
---
fs/nfsd/nfsctl.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 1d74af2..4ff0db9 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -15,6 +15,7 @@
#include <linux/sunrpc/gss_krb5_enctypes.h>
#include <linux/sunrpc/rpc_pipe_fs.h>
#include <linux/module.h>
+#include <linux/nsproxy.h>

#include "idmap.h"
#include "nfsd.h"
@@ -389,7 +390,7 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size)
{
char *mesg = buf;
int rv;
- struct net *net = &init_net;
+ struct net *net = current->nsproxy->net_ns;

if (size > 0) {
int newthreads;
@@ -857,7 +858,7 @@ static ssize_t __write_ports(struct file *file, char *buf, size_t size,
static ssize_t write_ports(struct file *file, char *buf, size_t size)
{
ssize_t rv;
- struct net *net = &init_net;
+ struct net *net = current->nsproxy->net_ns;

mutex_lock(&nfsd_mutex);
rv = __write_ports(file, buf, size, net);
--
1.8.2.2



2013-12-04 05:55:34

by Weng Meiling

[permalink] [raw]
Subject: [PATCH 3.4 4/9] nfsd: pass net to nfsd_create_serv()

From: Stanislav Kinsbursky <[email protected]>

commit 6777436b0f072fb20a025a73e9b67a35ad8a5451 upstream.

Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Weng Meiling <[email protected]>
---
fs/nfsd/nfsctl.c | 4 ++--
fs/nfsd/nfsd.h | 2 +-
fs/nfsd/nfssvc.c | 5 ++---
3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index d014727..e18f0a5 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -657,7 +657,7 @@ static ssize_t __write_ports_addfd(char *buf)
if (err != 0 || fd < 0)
return -EINVAL;

- err = nfsd_create_serv();
+ err = nfsd_create_serv(net);
if (err != 0)
return err;

@@ -708,7 +708,7 @@ static ssize_t __write_ports_addxprt(char *buf)
if (port < 1 || port > USHRT_MAX)
return -EINVAL;

- err = nfsd_create_serv();
+ err = nfsd_create_serv(net);
if (err != 0)
return err;

diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 1336a65..356da43 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -101,7 +101,7 @@ enum vers_op {NFSD_SET, NFSD_CLEAR, NFSD_TEST, NFSD_AVAIL };
int nfsd_vers(int vers, enum vers_op change);
int nfsd_minorversion(u32 minorversion, enum vers_op change);
void nfsd_reset_versions(void);
-int nfsd_create_serv(void);
+int nfsd_create_serv(struct net *net);

extern int nfsd_max_blksize;

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index f8dae75..632d11e 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -326,10 +326,9 @@ static int nfsd_get_default_max_blksize(void)
return ret;
}

-int nfsd_create_serv(void)
+int nfsd_create_serv(struct net *net)
{
int error;
- struct net *net = &init_net;

WARN_ON(!mutex_is_locked(&nfsd_mutex));
if (nfsd_serv) {
@@ -451,7 +450,7 @@ nfsd_svc(unsigned short port, int nrservs)
if (nrservs == 0 && nfsd_serv == NULL)
goto out;

- error = nfsd_create_serv();
+ error = nfsd_create_serv(net);
if (error)
goto out;

--
1.8.2.2



2013-12-04 05:54:43

by Weng Meiling

[permalink] [raw]
Subject: [PATCH 3.4 1/9] nfsd: use "init_net" for portmapper

From: Stanislav Kinsbursky <[email protected]>

commit f7fb86c6e639360ad9c253cec534819ef928a674 upstream.

There could be a situation, when NFSd was started in one network namespace, but
stopped in another one.
This will trigger kernel panic, because RPCBIND client is stored on per-net
NFSd data, and will be NULL on NFSd shutdown.

Signed-off-by: Stanislav Kinsbursky <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Weng Meiling <[email protected]>
---
fs/nfsd/nfssvc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 53459b0..25d839e 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -11,7 +11,6 @@
#include <linux/module.h>
#include <linux/fs_struct.h>
#include <linux/swap.h>
-#include <linux/nsproxy.h>

#include <linux/sunrpc/stats.h>
#include <linux/sunrpc/svcsock.h>
@@ -330,7 +329,7 @@ static int nfsd_get_default_max_blksize(void)
int nfsd_create_serv(void)
{
int error;
- struct net *net = current->nsproxy->net_ns;
+ struct net *net = &init_net;

WARN_ON(!mutex_is_locked(&nfsd_mutex));
if (nfsd_serv) {
--
1.8.2.2



2013-12-04 05:55:09

by Weng Meiling

[permalink] [raw]
Subject: [PATCH 3.4 7/9] nfsd: pass net to __write_ports() and down

From: Stanislav Kinsbursky <[email protected]>

commit 081603520b25f7b35ef63a363376a17c36ef74ed upstream.

Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
[wengmeiling: backport to 3.4:
- adjust context
- add net_ns parameter to __write_ports_delxprt()]
Signed-off-by: Weng Meiling <[email protected]>
---
fs/nfsd/nfsctl.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index f58e0f9..1d74af2 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -650,11 +650,10 @@ static ssize_t __write_ports_names(char *buf)
* a socket of a supported family/protocol, and we use it as an
* nfsd listener.
*/
-static ssize_t __write_ports_addfd(char *buf)
+static ssize_t __write_ports_addfd(char *buf, struct net *net)
{
char *mesg = buf;
int fd, err;
- struct net *net = &init_net;

err = get_int(&mesg, &fd);
if (err != 0 || fd < 0)
@@ -698,12 +697,11 @@ static ssize_t __write_ports_delfd(char *buf)
* A transport listener is added by writing it's transport name and
* a port number.
*/
-static ssize_t __write_ports_addxprt(char *buf)
+static ssize_t __write_ports_addxprt(char *buf, struct net *net)
{
char transport[16];
struct svc_xprt *xprt;
int port, err;
- struct net *net = &init_net;

if (sscanf(buf, "%15s %4u", transport, &port) != 2)
return -EINVAL;
@@ -743,7 +741,7 @@ out_err:
* A transport listener is removed by writing a "-", it's transport
* name, and it's port number.
*/
-static ssize_t __write_ports_delxprt(char *buf)
+static ssize_t __write_ports_delxprt(char *buf, struct net *net)
{
struct svc_xprt *xprt;
char transport[16];
@@ -755,7 +753,7 @@ static ssize_t __write_ports_delxprt(char *buf)
if (port < 1 || port > USHRT_MAX || nfsd_serv == NULL)
return -EINVAL;

- xprt = svc_find_xprt(nfsd_serv, transport, &init_net, AF_UNSPEC, port);
+ xprt = svc_find_xprt(nfsd_serv, transport, net, AF_UNSPEC, port);
if (xprt == NULL)
return -ENOTCONN;

@@ -764,22 +762,23 @@ static ssize_t __write_ports_delxprt(char *buf)
return 0;
}

-static ssize_t __write_ports(struct file *file, char *buf, size_t size)
+static ssize_t __write_ports(struct file *file, char *buf, size_t size,
+ struct net *net)
{
if (size == 0)
return __write_ports_names(buf);

if (isdigit(buf[0]))
- return __write_ports_addfd(buf);
+ return __write_ports_addfd(buf, net);

if (buf[0] == '-' && isdigit(buf[1]))
return __write_ports_delfd(buf);

if (isalpha(buf[0]))
- return __write_ports_addxprt(buf);
+ return __write_ports_addxprt(buf, net);

if (buf[0] == '-' && isalpha(buf[1]))
- return __write_ports_delxprt(buf);
+ return __write_ports_delxprt(buf, net);

return -EINVAL;
}
@@ -858,9 +857,10 @@ static ssize_t __write_ports(struct file *file, char *buf, size_t size)
static ssize_t write_ports(struct file *file, char *buf, size_t size)
{
ssize_t rv;
+ struct net *net = &init_net;

mutex_lock(&nfsd_mutex);
- rv = __write_ports(file, buf, size);
+ rv = __write_ports(file, buf, size, net);
mutex_unlock(&nfsd_mutex);
return rv;
}
--
1.8.2.2



2013-12-30 09:22:54

by Stanislav Kinsbursky

[permalink] [raw]
Subject: Re: NFSd 3.13 bug (Was "Re: [PATCH 3.4 9/9] nfsd: use the current net ns in write_threads() and write_ports()")

30.12.2013 13:04, Weng Meiling пишет:
> Hi Stanislav,
>
> I test kernel with this patch, the problem has be fixed. Would you please send a
> formal one? :) Thanks very much!
>

Thank you, Weng!
Sure, I'll resend.

> Weng Meiling
> Thanks
>
> On 2013/12/16 23:27, Stanislav Kinsbursky wrote:
>> 16.12.2013 05:26, Weng Meiling пишет:
>>> I backport the patch 11f779421a39b86da8a523d97e5fd3477878d44f "nfsd: containerize NFSd
>>>> filesystem" and test. But I trigger a bug, this bug still exists in 3.13 kernel. The following
>>>> is what I do:
>>>>
>>>> The steps:
>>>>
>>>> step 1: start NFS server in init_net net ns
>>>> #service nfsserver start
>>>>
>>>> step 2: stop NFS server in non init_net net ns
>>>> #ip netns add test
>>>> #ip netns list
>>>> test
>>>> #ip netns exec test service nfsserver stop
>>>>
>>>> step 3: start NFS server again in the non init_net net ns
>>>> #ip netns exec test service nfsserver start
>>>>
>>>> This step 3 will trigger kernel panic.
>>
>>
>> This sequence can be reduced to steps 2 and 3.
>>
>>
>>>> The reason seems that "ip
>>>> netns exec" creates a new mount namespace, the changes to the
>>>> new mount namespace don't propgate to other namespaces. So
>>>> when stop NFS server in second step, the NFSD filesystem isn't
>>>> umounted. When restart NFS server in third step, the NFSD
>>>> filesystem will not remount, this result to the NFSD file
>>>> system superblock's net ns is still init_net and RPCBIND client
>>>> will be NULL when register RPC service with the local portmapper
>>>> in svc_addsock(). Do you have any ideas about this problem?
>>>>
>>
>> The problem here is that on NFS server stop, RPCBIND client were destroyed for init_net,
>> because network namespace context is being taken from NFSd superblock.
>> On NFS start start rpc.nfsd process creates socket in nested net and passes it into "write_ports",
>> which leads to NFSd creation of RPCBIND socket in init_net because of the same reason. An attempt
>> to register passed socket in nested net leads to panic. I think, this collusion should be handled
>> as error and can be fixed like below.
>>
>> BTW, it looks to me. that mounts with namespace-aware superblocks can't just use the same
>> superblock on new mount namespace creation and should be handled in more complex way.
>>
>> Eric, Al, could you share your opinion how this problem should be solved?
>>
>> =======================================================================================
>>
>>
>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>> index 7f55517..f34d9de 100644
>> --- a/fs/nfsd/nfsctl.c
>> +++ b/fs/nfsd/nfsctl.c
>> @@ -699,6 +699,11 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net)
>> if (err != 0 || fd < 0)
>> return -EINVAL;
>>
>> + if (svc_alien_sock(net, fd)) {
>> + printk(KERN_ERR "%s: socket net is different to NFSd's one\n", __func__);
>> + return -EINVAL;
>> + }
>> +
>> err = nfsd_create_serv(net);
>> if (err != 0)
>> return err;
>> diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
>> index 62fd1b7..947009e 100644
>> --- a/include/linux/sunrpc/svcsock.h
>> +++ b/include/linux/sunrpc/svcsock.h
>> @@ -56,6 +56,7 @@ int svc_recv(struct svc_rqst *, long);
>> int svc_send(struct svc_rqst *);
>> void svc_drop(struct svc_rqst *);
>> void svc_sock_update_bufs(struct svc_serv *serv);
>> +bool svc_alien_sock(struct net *net, int fd);
>> int svc_addsock(struct svc_serv *serv, const int fd,
>> char *name_return, const size_t len);
>> void svc_init_xprt_sock(void);
>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>> index b6e59f0..3ba5b87 100644
>> --- a/net/sunrpc/svcsock.c
>> +++ b/net/sunrpc/svcsock.c
>> @@ -1397,6 +1397,17 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>> return svsk;
>> }
>>
>> +bool svc_alien_sock(struct net *net, int fd)
>> +{
>> + int err;
>> + struct socket *sock = sockfd_lookup(fd, &err);
>> +
>> + if (sock && (sock_net(sock->sk) != net))
>> + return true;
>> + return false;
>> +}
>> +EXPORT_SYMBOL_GPL(svc_alien_sock);
>> +
>> /**
>> * svc_addsock - add a listener socket to an RPC service
>> * @serv: pointer to RPC service to which to add a new listener
>>
>>
>>
>>
>
>


--
Best regards,
Stanislav Kinsbursky

2013-12-16 15:28:01

by Stanislav Kinsbursky

[permalink] [raw]
Subject: Re: NFSd 3.13 bug (Was "Re: [PATCH 3.4 9/9] nfsd: use the current net ns in write_threads() and write_ports()")

16.12.2013 05:26, Weng Meiling пишет:
> I backport the patch 11f779421a39b86da8a523d97e5fd3477878d44f "nfsd: containerize NFSd
>>filesystem" and test. But I trigger a bug, this bug still exists in 3.13 kernel. The following
>>is what I do:
>>
>>The steps:
>>
>>step 1: start NFS server in init_net net ns
>>#service nfsserver start
>>
>>step 2: stop NFS server in non init_net net ns
>>#ip netns add test
>>#ip netns list
>>test
>>#ip netns exec test service nfsserver stop
>>
>>step 3: start NFS server again in the non init_net net ns
>>#ip netns exec test service nfsserver start
>>
>>This step 3 will trigger kernel panic.


This sequence can be reduced to steps 2 and 3.


>>The reason seems that "ip
>>netns exec" creates a new mount namespace, the changes to the
>>new mount namespace don't propgate to other namespaces. So
>>when stop NFS server in second step, the NFSD filesystem isn't
>>umounted. When restart NFS server in third step, the NFSD
>>filesystem will not remount, this result to the NFSD file
>>system superblock's net ns is still init_net and RPCBIND client
>>will be NULL when register RPC service with the local portmapper
>>in svc_addsock(). Do you have any ideas about this problem?
>>

The problem here is that on NFS server stop, RPCBIND client were destroyed for init_net,
because network namespace context is being taken from NFSd superblock.
On NFS start start rpc.nfsd process creates socket in nested net and passes it into "write_ports",
which leads to NFSd creation of RPCBIND socket in init_net because of the same reason. An attempt
to register passed socket in nested net leads to panic. I think, this collusion should be handled
as error and can be fixed like below.

BTW, it looks to me. that mounts with namespace-aware superblocks can't just use the same
superblock on new mount namespace creation and should be handled in more complex way.

Eric, Al, could you share your opinion how this problem should be solved?

=======================================================================================


diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 7f55517..f34d9de 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -699,6 +699,11 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net)
if (err != 0 || fd < 0)
return -EINVAL;

+ if (svc_alien_sock(net, fd)) {
+ printk(KERN_ERR "%s: socket net is different to NFSd's one\n", __func__);
+ return -EINVAL;
+ }
+
err = nfsd_create_serv(net);
if (err != 0)
return err;
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index 62fd1b7..947009e 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -56,6 +56,7 @@ int svc_recv(struct svc_rqst *, long);
int svc_send(struct svc_rqst *);
void svc_drop(struct svc_rqst *);
void svc_sock_update_bufs(struct svc_serv *serv);
+bool svc_alien_sock(struct net *net, int fd);
int svc_addsock(struct svc_serv *serv, const int fd,
char *name_return, const size_t len);
void svc_init_xprt_sock(void);
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index b6e59f0..3ba5b87 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1397,6 +1397,17 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
return svsk;
}

+bool svc_alien_sock(struct net *net, int fd)
+{
+ int err;
+ struct socket *sock = sockfd_lookup(fd, &err);
+
+ if (sock && (sock_net(sock->sk) != net))
+ return true;
+ return false;
+}
+EXPORT_SYMBOL_GPL(svc_alien_sock);
+
/**
* svc_addsock - add a listener socket to an RPC service
* @serv: pointer to RPC service to which to add a new listener




--
Best regards,
Stanislav Kinsbursky

2013-12-04 05:54:39

by Weng Meiling

[permalink] [raw]
Subject: [PATCH 3.4 5/9] nfsd: pass net to nfsd_svc()

From: Stanislav Kinsbursky <[email protected]>

commit d41a9417cd89a69f58a26935034b4264a2d882d6 upstream.

Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
[wengmeiling: backport to 3.4:
- adjust context
- one more parameter(int port) for nfsd_svc()]
Signed-off-by: Weng Meiling <[email protected]>
---
fs/nfsd/nfsctl.c | 4 +++-
fs/nfsd/nfsd.h | 2 +-
fs/nfsd/nfssvc.c | 3 +--
3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index e18f0a5..9463bc0 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -389,6 +389,8 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size)
{
char *mesg = buf;
int rv;
+ struct net *net = &init_net;
+
if (size > 0) {
int newthreads;
rv = get_int(&mesg, &newthreads);
@@ -396,7 +398,7 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size)
return rv;
if (newthreads < 0)
return -EINVAL;
- rv = nfsd_svc(NFS_PORT, newthreads);
+ rv = nfsd_svc(NFS_PORT, newthreads, net);
if (rv < 0)
return rv;
} else
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 356da43..5007654 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -65,7 +65,7 @@ extern const struct seq_operations nfs_exports_op;
/*
* Function prototypes.
*/
-int nfsd_svc(unsigned short port, int nrservs);
+int nfsd_svc(unsigned short port, int nrservs, struct net *net);
int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp);

int nfsd_nrthreads(void);
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 632d11e..b8aea64 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -434,11 +434,10 @@ int nfsd_set_nrthreads(int n, int *nthreads)
* this is the first time nrservs is nonzero.
*/
int
-nfsd_svc(unsigned short port, int nrservs)
+nfsd_svc(unsigned short port, int nrservs, struct net *net)
{
int error;
bool nfsd_up_before;
- struct net *net = &init_net;

mutex_lock(&nfsd_mutex);
dprintk("nfsd: creating service\n");
--
1.8.2.2



2013-12-06 18:32:47

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 3.4 9/9] nfsd: use the current net ns in write_threads() and write_ports()

On Wed, Dec 04, 2013 at 04:25:33PM -0500, J. Bruce Fields wrote:
> On Wed, Dec 04, 2013 at 01:53:35PM +0800, Weng Meiling wrote:
> > Upstream commit f7fb86c6e639360ad9c253cec534819ef928a674 (nfsd: use
> > "init_net" for portmapper) introduced a bug.
> >
> > Starting NFSd in a non init_net network namespace will lead to
> > NULL pointer deference. Because RPCBIND client will be NULL when register
> > RPC service with the local portmapper in svc_addsock().
> >
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
> > IP: [<ffffffffa0439150>] call_start+0x10/0x30 [sunrpc]
> > ...
> > Pid: 27770, comm: rpc.nfsd ...
> > RIP: 0010:[<ffffffffa0439150>] [<ffffffffa0439150>] call_start+0x10/0x30 [sunrpc]
> > ...
> > [<ffffffffa0442841>] __rpc_execute+0x91/0x160 [sunrpc]
> > [<ffffffffa0442981>] rpc_execute+0x71/0x80 [sunrpc]
> > [<ffffffffa043ab49>] rpc_run_task+0x89/0xa0 [sunrpc]
> > [<ffffffffa043ac5d>] rpc_call_sync+0x3d/0x70 [sunrpc]
> > [<ffffffffa044b316>] rpcb_register+0xa6/0xd0 [sunrpc]
> > [<ffffffffa0444ede>] __svc_register+0x1ae/0x1c0 [sunrpc]
> > [<ffffffff8114f975>] ? cache_alloc_refill+0x85/0x290
> > [<ffffffffa0444f7f>] svc_register+0x8f/0xc0 [sunrpc]
> > [<ffffffff811504f3>] ? kmem_cache_alloc_trace+0xc3/0x1d0
> > [<ffffffffa04472f8>] svc_setup_socket+0x1a8/0x2c0 [sunrpc]
> > [<ffffffff81009546>] ? read_tsc+0x16/0x40
> > [<ffffffffa0448078>] svc_addsock+0x118/0x1c0 [sunrpc]
> > [<ffffffff81090ee5>] ? do_gettimeofday+0x15/0x50
> > [<ffffffffa049e69c>] ? nfsd_create_serv+0xdc/0x150 [nfsd]
> > [<ffffffff8125605c>] ? simple_strtoull+0x2c/0x50
> > [<ffffffffa049fdce>] __write_ports+0x1fe/0x230 [nfsd]
> > [<ffffffffa049fe37>] write_ports+0x37/0x60 [nfsd]
> > [<ffffffffa049fe00>] ? __write_ports+0x230/0x230 [nfsd]
> > [<ffffffffa049edd2>] nfsctl_transaction_write+0x72/0x90 [nfsd]
> > [<ffffffff8116573b>] vfs_write+0xcb/0x130
> > [<ffffffff81165890>] sys_write+0x50/0x90
> >
> > Fix it by using the current's network namespace so NFSd uses the
> > consistent net ns all the time.
>
> Everything else looks like a straightforward backport, but doing this
> differently from upstream makes me nervous. Don't we also want to take
> 11f779421a39b86da8a523d97e5fd3477878d44f "nfsd: containerize NFSd
> filesystem" ? (Stanislav?)

I'd prefer not doing it differently from upstream as well.

thanks,

greg k-h

2013-12-04 21:25:34

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 3.4 9/9] nfsd: use the current net ns in write_threads() and write_ports()

On Wed, Dec 04, 2013 at 01:53:35PM +0800, Weng Meiling wrote:
> Upstream commit f7fb86c6e639360ad9c253cec534819ef928a674 (nfsd: use
> "init_net" for portmapper) introduced a bug.
>
> Starting NFSd in a non init_net network namespace will lead to
> NULL pointer deference. Because RPCBIND client will be NULL when register
> RPC service with the local portmapper in svc_addsock().
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
> IP: [<ffffffffa0439150>] call_start+0x10/0x30 [sunrpc]
> ...
> Pid: 27770, comm: rpc.nfsd ...
> RIP: 0010:[<ffffffffa0439150>] [<ffffffffa0439150>] call_start+0x10/0x30 [sunrpc]
> ...
> [<ffffffffa0442841>] __rpc_execute+0x91/0x160 [sunrpc]
> [<ffffffffa0442981>] rpc_execute+0x71/0x80 [sunrpc]
> [<ffffffffa043ab49>] rpc_run_task+0x89/0xa0 [sunrpc]
> [<ffffffffa043ac5d>] rpc_call_sync+0x3d/0x70 [sunrpc]
> [<ffffffffa044b316>] rpcb_register+0xa6/0xd0 [sunrpc]
> [<ffffffffa0444ede>] __svc_register+0x1ae/0x1c0 [sunrpc]
> [<ffffffff8114f975>] ? cache_alloc_refill+0x85/0x290
> [<ffffffffa0444f7f>] svc_register+0x8f/0xc0 [sunrpc]
> [<ffffffff811504f3>] ? kmem_cache_alloc_trace+0xc3/0x1d0
> [<ffffffffa04472f8>] svc_setup_socket+0x1a8/0x2c0 [sunrpc]
> [<ffffffff81009546>] ? read_tsc+0x16/0x40
> [<ffffffffa0448078>] svc_addsock+0x118/0x1c0 [sunrpc]
> [<ffffffff81090ee5>] ? do_gettimeofday+0x15/0x50
> [<ffffffffa049e69c>] ? nfsd_create_serv+0xdc/0x150 [nfsd]
> [<ffffffff8125605c>] ? simple_strtoull+0x2c/0x50
> [<ffffffffa049fdce>] __write_ports+0x1fe/0x230 [nfsd]
> [<ffffffffa049fe37>] write_ports+0x37/0x60 [nfsd]
> [<ffffffffa049fe00>] ? __write_ports+0x230/0x230 [nfsd]
> [<ffffffffa049edd2>] nfsctl_transaction_write+0x72/0x90 [nfsd]
> [<ffffffff8116573b>] vfs_write+0xcb/0x130
> [<ffffffff81165890>] sys_write+0x50/0x90
>
> Fix it by using the current's network namespace so NFSd uses the
> consistent net ns all the time.

Everything else looks like a straightforward backport, but doing this
differently from upstream makes me nervous. Don't we also want to take
11f779421a39b86da8a523d97e5fd3477878d44f "nfsd: containerize NFSd
filesystem" ? (Stanislav?)

--b.

>
> Signed-off-by: Weng Meiling <[email protected]>
> ---
> fs/nfsd/nfsctl.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 1d74af2..4ff0db9 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -15,6 +15,7 @@
> #include <linux/sunrpc/gss_krb5_enctypes.h>
> #include <linux/sunrpc/rpc_pipe_fs.h>
> #include <linux/module.h>
> +#include <linux/nsproxy.h>
>
> #include "idmap.h"
> #include "nfsd.h"
> @@ -389,7 +390,7 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size)
> {
> char *mesg = buf;
> int rv;
> - struct net *net = &init_net;
> + struct net *net = current->nsproxy->net_ns;
>
> if (size > 0) {
> int newthreads;
> @@ -857,7 +858,7 @@ static ssize_t __write_ports(struct file *file, char *buf, size_t size,
> static ssize_t write_ports(struct file *file, char *buf, size_t size)
> {
> ssize_t rv;
> - struct net *net = &init_net;
> + struct net *net = current->nsproxy->net_ns;
>
> mutex_lock(&nfsd_mutex);
> rv = __write_ports(file, buf, size, net);
> --
> 1.8.2.2
>
>

2013-12-16 07:01:48

by Stanislav Kinsbursky

[permalink] [raw]
Subject: Re: NFSd 3.13 bug (Was "Re: [PATCH 3.4 9/9] nfsd: use the current net ns in write_threads() and write_ports()")

Hello, sorry, was out of the office, network, etc.
A couple of comment below.

16.12.2013 05:26, Weng Meiling пишет:
> Hi Bruce, Stanislav:
> Do you have any ideas about this problem?
>
> On 2013/12/10 11:12, Weng Meiling wrote:
>> Hi guys,
>>
>> When I test NFS in different network namespace with the
>> 3.13-rc2 kernel, I trigger a kernel panic.
>>
>> On 2013/12/5 5:25, J. Bruce Fields wrote:
>>> On Wed, Dec 04, 2013 at 01:53:35PM +0800, Weng Meiling wrote:
>>>> Upstream commit f7fb86c6e639360ad9c253cec534819ef928a674 (nfsd: use
>>>> "init_net" for portmapper) introduced a bug.
>>>>
>>>> Starting NFSd in a non init_net network namespace will lead to
>>>> NULL pointer deference. Because RPCBIND client will be NULL when register
>>>> RPC service with the local portmapper in svc_addsock().
>>>>
>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
>>>> IP: [<ffffffffa0439150>] call_start+0x10/0x30 [sunrpc]
>>>> ...
>>>> Pid: 27770, comm: rpc.nfsd ...
>>>> RIP: 0010:[<ffffffffa0439150>] [<ffffffffa0439150>] call_start+0x10/0x30 [sunrpc]
>>>> ...
>>>> [<ffffffffa0442841>] __rpc_execute+0x91/0x160 [sunrpc]
>>>> [<ffffffffa0442981>] rpc_execute+0x71/0x80 [sunrpc]
>>>> [<ffffffffa043ab49>] rpc_run_task+0x89/0xa0 [sunrpc]
>>>> [<ffffffffa043ac5d>] rpc_call_sync+0x3d/0x70 [sunrpc]
>>>> [<ffffffffa044b316>] rpcb_register+0xa6/0xd0 [sunrpc]
>>>> [<ffffffffa0444ede>] __svc_register+0x1ae/0x1c0 [sunrpc]
>>>> [<ffffffff8114f975>] ? cache_alloc_refill+0x85/0x290
>>>> [<ffffffffa0444f7f>] svc_register+0x8f/0xc0 [sunrpc]
>>>> [<ffffffff811504f3>] ? kmem_cache_alloc_trace+0xc3/0x1d0
>>>> [<ffffffffa04472f8>] svc_setup_socket+0x1a8/0x2c0 [sunrpc]
>>>> [<ffffffff81009546>] ? read_tsc+0x16/0x40
>>>> [<ffffffffa0448078>] svc_addsock+0x118/0x1c0 [sunrpc]
>>>> [<ffffffff81090ee5>] ? do_gettimeofday+0x15/0x50
>>>> [<ffffffffa049e69c>] ? nfsd_create_serv+0xdc/0x150 [nfsd]
>>>> [<ffffffff8125605c>] ? simple_strtoull+0x2c/0x50
>>>> [<ffffffffa049fdce>] __write_ports+0x1fe/0x230 [nfsd]
>>>> [<ffffffffa049fe37>] write_ports+0x37/0x60 [nfsd]
>>>> [<ffffffffa049fe00>] ? __write_ports+0x230/0x230 [nfsd]
>>>> [<ffffffffa049edd2>] nfsctl_transaction_write+0x72/0x90 [nfsd]
>>>> [<ffffffff8116573b>] vfs_write+0xcb/0x130
>>>> [<ffffffff81165890>] sys_write+0x50/0x90
>>>>
>>>> Fix it by using the current's network namespace so NFSd uses the
>>>> consistent net ns all the time.
>>>
>>> Everything else looks like a straightforward backport, but doing this
>>> differently from upstream makes me nervous. Don't we also want to take
>>> 11f779421a39b86da8a523d97e5fd3477878d44f "nfsd: containerize NFSd
>>> filesystem" ? (Stanislav?)
>>>
>>> --b.
>>>

Merging of 11f779421a39b86da8a523d97e5fd3477878d44f "nfsd: containerize NFSd
filesystem" depend on what network namespace is passed to svc_addsock(). If hard-coded init_net
is used, then no need in this commit, else otherwise.

>>
>> I backport the patch 11f779421a39b86da8a523d97e5fd3477878d44f "nfsd: containerize NFSd
>> filesystem" and test. But I trigger a bug, this bug still exists in 3.13 kernel. The following
>> is what I do:
>>
>> The steps:
>>
>> step 1: start NFS server in init_net net ns
>> #service nfsserver start
>>
>> step 2: stop NFS server in non init_net net ns
>> #ip netns add test
>> #ip netns list
>> test
>> #ip netns exec test service nfsserver stop
>>
>> step 3: start NFS server again in the non init_net net ns
>> #ip netns exec test service nfsserver start
>>
>> This step 3 will trigger kernel panic. The reason seems that "ip
>> netns exec" creates a new mount namespace, the changes to the
>> new mount namespace don't propgate to other namespaces. So
>> when stop NFS server in second step, the NFSD filesystem isn't
>> umounted. When restart NFS server in third step, the NFSD
>> filesystem will not remount, this result to the NFSD file
>> system superblock's net ns is still init_net and RPCBIND client
>> will be NULL when register RPC service with the local portmapper
>> in svc_addsock(). Do you have any ideas about this problem?
>>
>> the detail call trace:
>> [ 497.554677] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
>> [ 497.554687] IP: [<ffffffffa031a170>] call_start+0x10/0x30 [sunrpc]
>> [ 497.554707] PGD 0
>> [ 497.554711] Oops: 0000 [#1] SMP
>> [ 497.554716] Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc oid_registry edd af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave loop dm_mod e1000e iTCO_wdt
>> iTCO_vendor_support i2c_i801 bnx2 ipv6 lpc_ich i7core_edac edac_core acpi_cpufreq ehci_pci button ses enclosure serio_raw sg rtc_cmos mfd_core ptp hid_generic pps_core i2c_core pcspkr ext3 jbd mbcache
>> usbhid hid uhci_hcd ehci_hcd usbcore sd_mod usb_common crc_t10dif crct10dif_common processor thermal_sys hwmon scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh ata_generic ata_piix libata
>> megaraid_sas scsi_mod
>> [ 497.554788] CPU: 2 PID: 7837 Comm: rpc.nfsd Not tainted 3.13.0-rc2-0.1-default+ #1
>> [ 497.554793] Hardware name: Huawei Technologies Co., Ltd. Tecal RH2285 /BC11BTSA , BIOS CTSAV036 04/27/2011
>> [ 497.554800] task: ffff8800ba76e2d0 ti: ffff88043e8e8000 task.ti: ffff88043e8e8000
>> [ 497.554805] RIP: 0010:[<ffffffffa031a170>] [<ffffffffa031a170>] call_start+0x10/0x30 [sunrpc]
>> [ 497.554819] RSP: 0018:ffff88043e8e9aa8 EFLAGS: 00010202
>> [ 497.554823] RAX: ffffffffa033f4b8 RBX: ffff8800bb030040 RCX: 0000000000000034
>> [ 497.554828] RDX: 0000000000000000 RSI: ffff8800bb0300b0 RDI: ffff8800bb030040
>> [ 497.554832] RBP: ffff88043e8e9aa8 R08: 0040000000000000 R09: 0200000000000000
>> [ 497.554836] R10: 0000000000000000 R11: ffff8802348fe040 R12: ffff8800bb030040
>> [ 497.554841] R13: ffffffffa031a160 R14: 0000000000000000 R15: ffffffffa031a160
>> [ 497.554846] FS: 00007f2fa0536700(0000) GS:ffff88023fc40000(0000) knlGS:0000000000000000
>> [ 497.554851] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [ 497.554855] CR2: 0000000000000058 CR3: 0000000434e30000 CR4: 00000000000007e0
>> [ 497.554859] Stack:
>> [ 497.554862] ffff88043e8e9af8 ffffffffa0323f61 ffff00066c0a0100 ffff8800bb0300b0
>> [ 497.554871] 000000003e8e9ae8 ffff8800bb030040 ffff8800bb030040 0000000000000000
>> [ 497.554878] 0000000000000000 0000000000000002 ffff88043e8e9b28 ffffffffa03240ed
>> [ 497.554886] Call Trace:
>> [ 497.554902] [<ffffffffa0323f61>] __rpc_execute+0xa1/0x190 [sunrpc]
>> [ 497.554918] [<ffffffffa03240ed>] rpc_execute+0x9d/0xc0 [sunrpc]
>> [ 497.554930] [<ffffffffa031c3e9>] rpc_run_task+0x89/0xa0 [sunrpc]
>> [ 497.554943] [<ffffffffa031c4fe>] rpc_call_sync+0x3e/0xa0 [sunrpc]
>> [ 497.554961] [<ffffffffa032d337>] rpcb_register_call+0x37/0x60 [sunrpc]
>> [ 497.554979] [<ffffffffa032d53c>] rpcb_register+0x9c/0xb0 [sunrpc]
>> [ 497.554996] [<ffffffffa03270ee>] __svc_register+0x1ae/0x1c0 [sunrpc]
>> [ 497.555012] [<ffffffffa0327190>] svc_register+0x90/0xe0 [sunrpc]
>> [ 497.555029] [<ffffffffa032a157>] svc_setup_socket+0x1e7/0x300 [sunrpc]
>> [ 497.555038] [<ffffffff810b39b3>] ? __getnstimeofday+0x43/0xd0
>> [ 497.555055] [<ffffffffa032a78a>] svc_addsock+0xca/0x1e0 [sunrpc]
>> [ 497.555068] [<ffffffffa0396b31>] ? nfsd_create_serv+0x111/0x180 [nfsd]
>> [ 497.555075] [<ffffffff8128d47e>] ? simple_strtol+0xe/0x30
>> [ 497.555084] [<ffffffffa03972b7>] ? get_int+0x57/0x70 [nfsd]
>> [ 497.555094] [<ffffffffa03977e9>] __write_ports+0x119/0x140 [nfsd]
>> [ 497.555103] [<ffffffffa039788a>] write_ports+0x7a/0xb0 [nfsd]
>> [ 497.555112] [<ffffffffa0397810>] ? __write_ports+0x140/0x140 [nfsd]
>> [ 497.555122] [<ffffffffa039713a>] nfsctl_transaction_write+0x6a/0x80 [nfsd]
>> [ 497.555129] [<ffffffff81186207>] vfs_write+0xc7/0x1e0
>> [ 497.555134] [<ffffffff8118643d>] SyS_write+0x5d/0xa0
>> [ 497.555142] [<ffffffff814deaa2>] system_call_fastpath+0x16/0x1b
>> [ 497.555146] Code: 00 00 00 01 55 48 89 e5 75 0d 48 c7 47 50 60 a1 31 a0 b8 01 00 00 00 c9 c3 66 90 48 8b 47 28 48 8b 57 18 55 83 40 20 01 48 89 e5 <48> 8b 42 58 83 40 1c 01 48 c7 47 50 f0 a1 31 a0
>> c9 c3 66 66 66
>> [ 497.555189] RIP [<ffffffffa031a170>] call_start+0x10/0x30 [sunrpc]
>> [ 497.555200] RSP <ffff88043e8e9aa8>
>> [ 497.555203] CR2: 0000000000000058
>> [ 497.555208] ---[ end trace 34ca8d40727792e2 ]---
>>

Nice...
I'll try to reproduce and figure out, how we can fix it.
Thanks!


>>>>
>>>> Signed-off-by: Weng Meiling <[email protected]>
>>>> ---
>>>> fs/nfsd/nfsctl.c | 5 +++--
>>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>>>> index 1d74af2..4ff0db9 100644
>>>> --- a/fs/nfsd/nfsctl.c
>>>> +++ b/fs/nfsd/nfsctl.c
>>>> @@ -15,6 +15,7 @@
>>>> #include <linux/sunrpc/gss_krb5_enctypes.h>
>>>> #include <linux/sunrpc/rpc_pipe_fs.h>
>>>> #include <linux/module.h>
>>>> +#include <linux/nsproxy.h>
>>>>
>>>> #include "idmap.h"
>>>> #include "nfsd.h"
>>>> @@ -389,7 +390,7 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size)
>>>> {
>>>> char *mesg = buf;
>>>> int rv;
>>>> - struct net *net = &init_net;
>>>> + struct net *net = current->nsproxy->net_ns;
>>>>
>>>> if (size > 0) {
>>>> int newthreads;
>>>> @@ -857,7 +858,7 @@ static ssize_t __write_ports(struct file *file, char *buf, size_t size,
>>>> static ssize_t write_ports(struct file *file, char *buf, size_t size)
>>>> {
>>>> ssize_t rv;
>>>> - struct net *net = &init_net;
>>>> + struct net *net = current->nsproxy->net_ns;
>>>>
>>>> mutex_lock(&nfsd_mutex);
>>>> rv = __write_ports(file, buf, size, net);
>>>> --
>>>> 1.8.2.2
>>>>
>>>>
>>>
>>> .
>>>
>>
>
>


--
Best regards,
Stanislav Kinsbursky

2013-12-04 05:54:43

by Weng Meiling

[permalink] [raw]
Subject: [PATCH 3.4 2/9] nfsd: pass net to nfsd_init_socks()

From: Stanislav Kinsbursky <[email protected]>

commit db6e182c17cb1a7069f7f8924721ce58ac05d9a3 upstream.

Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
[wengmeiling: backport to 3.4:
- adjust context
- one more parameter(int port) for nfsd_init_socks()]
Signed-off-by: Weng Meiling <[email protected]>
---
fs/nfsd/nfssvc.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 25d839e..d1ed729 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -182,18 +182,18 @@ int nfsd_nrthreads(void)
return rv;
}

-static int nfsd_init_socks(int port)
+static int nfsd_init_socks(int port, struct net *net)
{
int error;
if (!list_empty(&nfsd_serv->sv_permsocks))
return 0;

- error = svc_create_xprt(nfsd_serv, "udp", &init_net, PF_INET, port,
+ error = svc_create_xprt(nfsd_serv, "udp", net, PF_INET, port,
SVC_SOCK_DEFAULTS);
if (error < 0)
return error;

- error = svc_create_xprt(nfsd_serv, "tcp", &init_net, PF_INET, port,
+ error = svc_create_xprt(nfsd_serv, "tcp", net, PF_INET, port,
SVC_SOCK_DEFAULTS);
if (error < 0)
return error;
@@ -217,7 +217,7 @@ static int nfsd_startup(unsigned short port, int nrservs)
ret = nfsd_racache_init(2*nrservs);
if (ret)
return ret;
- ret = nfsd_init_socks(port);
+ ret = nfsd_init_socks(port, net);
if (ret)
goto out_racache;
ret = lockd_up(&init_net);
--
1.8.2.2



2013-12-04 05:54:35

by Weng Meiling

[permalink] [raw]
Subject: [PATCH 3.4 6/9] nfsd: pass net to nfsd_set_nrthreads()

From: Stanislav Kinsbursky <[email protected]>

commit 3938a0d5eb5effcc89c6909741403f4e6a37252d upstream.

Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
[wengmeiling: backport to 3.4: adjust context]
Signed-off-by: Weng Meiling <[email protected]>
---
fs/nfsd/nfsctl.c | 3 ++-
fs/nfsd/nfsd.h | 2 +-
fs/nfsd/nfssvc.c | 3 +--
3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 9463bc0..f58e0f9 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -440,6 +440,7 @@ static ssize_t write_pool_threads(struct file *file, char *buf, size_t size)
int len;
int npools;
int *nthreads;
+ struct net *net = &init_net;

mutex_lock(&nfsd_mutex);
npools = nfsd_nrpools();
@@ -470,7 +471,7 @@ static ssize_t write_pool_threads(struct file *file, char *buf, size_t size)
if (nthreads[i] < 0)
goto out_free;
}
- rv = nfsd_set_nrthreads(i, nthreads);
+ rv = nfsd_set_nrthreads(i, nthreads, net);
if (rv)
goto out_free;
}
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 5007654..a0989a2 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -71,7 +71,7 @@ int nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp);
int nfsd_nrthreads(void);
int nfsd_nrpools(void);
int nfsd_get_nrthreads(int n, int *);
-int nfsd_set_nrthreads(int n, int *);
+int nfsd_set_nrthreads(int n, int *, struct net *);

static inline void nfsd_destroy(struct net *net)
{
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index b8aea64..0974818 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -374,12 +374,11 @@ int nfsd_get_nrthreads(int n, int *nthreads)
return 0;
}

-int nfsd_set_nrthreads(int n, int *nthreads)
+int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
{
int i = 0;
int tot = 0;
int err = 0;
- struct net *net = &init_net;

WARN_ON(!mutex_is_locked(&nfsd_mutex));

--
1.8.2.2