2017-10-17 16:40:52

by Vasily Averin

[permalink] [raw]
Subject: [RFC PATCH 0/2] race of lockd/nfsd inetaddr notifiers with pointers change

lockd and nfsd inet[6]addr notifiers use pointer that can be changed during execution.

lockd_inet[6]addr_event use nlmsvc_rqst without taken nlmsvc_mutex,
nfsd notifier have similar trouble.

We got few crashes from OpenVz customers on RHEL6-based kernel,
and I have reproduced the problem locally on this kernel.

I was unable to reproduce the problem on new kernels,
however seems they are affected.

We cannot add mutexes into notifiers because inet6addr notifiers should be atomic.

To fix the problem I use atomic counter and waitqueue:
counter allows notifier to access the pointer,
waitqueue allows to delay stop of service until notifier is in use.

Patches was not tested because I was unable to reproduce the problem on new kernels.

Please review it carefully and let me know if this can be fixed in a better way.

Vasily Averin (2):
race of lockd inetaddr notifiers with nlmsvc_rqst change
race of nfsd inetaddr notifiers with nn->nfsd_serv change

fs/lockd/svc.c | 16 ++++++++++++++--
fs/nfsd/netns.h | 3 +++
fs/nfsd/nfsctl.c | 3 +++
fs/nfsd/nfssvc.c | 14 +++++++++++---
4 files changed, 31 insertions(+), 5 deletions(-)

--
2.7.4



2017-10-19 15:42:55

by Vasily Averin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] race of lockd/nfsd inetaddr notifiers with pointers change

cc: Scott Mayhew

Dear Scott,
could you please take look at patches?

Let me describe the problem once again:

lockd_inetaddr_event()
...
if (nlmsvc_rqst) {
...
svc_age_temp_xprts_now(nlmsvc_rqst->rq_server, (struct sockaddr *)&sin);
}

Usually access to nlmsvc_rqst is protected by nlmsvc_mutex
However lockd_inet[6]addr_event does not take the mutex,
therefore nlmsvc_rqst can be changed during execution.

as result "if (nlmsvc_rqst)" can be passed,
then another thread frees the memory or zeroes this pointer,
and then svc_age_temp_xprts_now crash the host on access to already freed memory.

Moreover on initialization nlmsvc_rqst can be temporally set to ERR_PTR.

NFSD have similar issue.

On 2017-10-17 19:40, Vasily Averin wrote:
> lockd and nfsd inet[6]addr notifiers use pointer that can be changed during execution.
>
> lockd_inet[6]addr_event use nlmsvc_rqst without taken nlmsvc_mutex,
> nfsd notifier have similar trouble.
>
> We got few crashes from OpenVz customers on RHEL6-based kernel,
> and I have reproduced the problem locally on this kernel.
>
> I was unable to reproduce the problem on new kernels,
> however seems they are affected.
>
> We cannot add mutexes into notifiers because inet6addr notifiers should be atomic.
>
> To fix the problem I use atomic counter and waitqueue:
> counter allows notifier to access the pointer,
> waitqueue allows to delay stop of service until notifier is in use.
>
> Patches was not tested because I was unable to reproduce the problem on new kernels.
>
> Please review it carefully and let me know if this can be fixed in a better way.
>
> Vasily Averin (2):
> race of lockd inetaddr notifiers with nlmsvc_rqst change
> race of nfsd inetaddr notifiers with nn->nfsd_serv change
>
> fs/lockd/svc.c | 16 ++++++++++++++--
> fs/nfsd/netns.h | 3 +++
> fs/nfsd/nfsctl.c | 3 +++
> fs/nfsd/nfssvc.c | 14 +++++++++++---
> 4 files changed, 31 insertions(+), 5 deletions(-)
>

2017-10-30 14:55:24

by Vasily Averin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] race of lockd/nfsd inetaddr notifiers with pointers change

I've reproduced the problem both on RHEL7 and then on last mainline kernel:

1) start nfsd on host
# service nfs start

2) create separate net and mount namespaces:
# unshare -m -n ; mount -t nfsd nfsd /proc/fs/nfsd

3) execute screen (we need 2 consoles with newly created namespaces)
4) on first console:
# ifconfig lo up
# while : ; do ip a a 1.2.3.4/32 dev lo ; do ip a d 1.2.3.4/32 dev lo ; done

5) on second console:
# while : ; do echo 1 > /proc/fs/nfsd/threads ; sleep 1 ; echo 0 > /proc/fs/nfsd/threads ; sleep 1 ; done

Result: crash inside nfsd_inteddr_event(), see attached log.

Submitted patches have resolved the problem, patched kernel was not crashed after day of testing.

NB: during my experiments I've found "list_add double add" in set_grace_period()
and fixed it by recently submitted "[PATCH] lockd: fix lockd shutdown race with signal"

Thank you,
Vasily Averin

On 2017-10-19 18:42, Vasily Averin wrote:
> cc: Scott Mayhew
>
> Dear Scott,
> could you please take look at patches?
>
> Let me describe the problem once again:
>
> lockd_inetaddr_event()
> ...
> if (nlmsvc_rqst) {
> ...
> svc_age_temp_xprts_now(nlmsvc_rqst->rq_server, (struct sockaddr *)&sin);
> }
>
> Usually access to nlmsvc_rqst is protected by nlmsvc_mutex
> However lockd_inet[6]addr_event does not take the mutex,
> therefore nlmsvc_rqst can be changed during execution.
>
> as result "if (nlmsvc_rqst)" can be passed,
> then another thread frees the memory or zeroes this pointer,
> and then svc_age_temp_xprts_now crash the host on access to already freed memory.
>
> Moreover on initialization nlmsvc_rqst can be temporally set to ERR_PTR.
>
> NFSD have similar issue.
>
> On 2017-10-17 19:40, Vasily Averin wrote:
>> lockd and nfsd inet[6]addr notifiers use pointer that can be changed during execution.
>>
>> lockd_inet[6]addr_event use nlmsvc_rqst without taken nlmsvc_mutex,
>> nfsd notifier have similar trouble.
>>
>> We got few crashes from OpenVz customers on RHEL6-based kernel,
>> and I have reproduced the problem locally on this kernel.
>>
>> I was unable to reproduce the problem on new kernels,
>> however seems they are affected.
>>
>> We cannot add mutexes into notifiers because inet6addr notifiers should be atomic.
>>
>> To fix the problem I use atomic counter and waitqueue:
>> counter allows notifier to access the pointer,
>> waitqueue allows to delay stop of service until notifier is in use.
>>
>> Patches was not tested because I was unable to reproduce the problem on new kernels.
>>
>> Please review it carefully and let me know if this can be fixed in a better way.
>>
>> Vasily Averin (2):
>> race of lockd inetaddr notifiers with nlmsvc_rqst change
>> race of nfsd inetaddr notifiers with nn->nfsd_serv change
>>
>> fs/lockd/svc.c | 16 ++++++++++++++--
>> fs/nfsd/netns.h | 3 +++
>> fs/nfsd/nfsctl.c | 3 +++
>> fs/nfsd/nfssvc.c | 14 +++++++++++---
>> 4 files changed, 31 insertions(+), 5 deletions(-)
>>


Attachments:
ml2.txt (3.92 kB)

2017-10-31 17:29:28

by Scott Mayhew

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] race of lockd/nfsd inetaddr notifiers with pointers change

On Mon, 30 Oct 2017, Vasily Averin wrote:

> I've reproduced the problem both on RHEL7 and then on last mainline kernel:
>
> 1) start nfsd on host
> # service nfs start
>
> 2) create separate net and mount namespaces:
> # unshare -m -n ; mount -t nfsd nfsd /proc/fs/nfsd
>
> 3) execute screen (we need 2 consoles with newly created namespaces)
> 4) on first console:
> # ifconfig lo up
> # while : ; do ip a a 1.2.3.4/32 dev lo ; do ip a d 1.2.3.4/32 dev lo ; done
>
> 5) on second console:
> # while : ; do echo 1 > /proc/fs/nfsd/threads ; sleep 1 ; echo 0 > /proc/fs/nfsd/threads ; sleep 1 ; done
>
> Result: crash inside nfsd_inteddr_event(), see attached log.
>
> Submitted patches have resolved the problem, patched kernel was not crashed after day of testing.

Thanks for that reproducer. I see the same panic (it seems to
reproduce more quickly with the rpcdebug printk's enabled). I've been
running the same reproducer with your patches for the past day and
haven't seen the panic.

Tested-by: Scott Mayhew <[email protected]>

>
> NB: during my experiments I've found "list_add double add" in set_grace_period()
> and fixed it by recently submitted "[PATCH] lockd: fix lockd shutdown race with signal"
>
> Thank you,
> Vasily Averin
>
> On 2017-10-19 18:42, Vasily Averin wrote:
> > cc: Scott Mayhew
> >
> > Dear Scott,
> > could you please take look at patches?
> >
> > Let me describe the problem once again:
> >
> > lockd_inetaddr_event()
> > ...
> > if (nlmsvc_rqst) {
> > ...
> > svc_age_temp_xprts_now(nlmsvc_rqst->rq_server, (struct sockaddr *)&sin);
> > }
> >
> > Usually access to nlmsvc_rqst is protected by nlmsvc_mutex
> > However lockd_inet[6]addr_event does not take the mutex,
> > therefore nlmsvc_rqst can be changed during execution.
> >
> > as result "if (nlmsvc_rqst)" can be passed,
> > then another thread frees the memory or zeroes this pointer,
> > and then svc_age_temp_xprts_now crash the host on access to already freed memory.
> >
> > Moreover on initialization nlmsvc_rqst can be temporally set to ERR_PTR.
> >
> > NFSD have similar issue.
> >
> > On 2017-10-17 19:40, Vasily Averin wrote:
> >> lockd and nfsd inet[6]addr notifiers use pointer that can be changed during execution.
> >>
> >> lockd_inet[6]addr_event use nlmsvc_rqst without taken nlmsvc_mutex,
> >> nfsd notifier have similar trouble.
> >>
> >> We got few crashes from OpenVz customers on RHEL6-based kernel,
> >> and I have reproduced the problem locally on this kernel.
> >>
> >> I was unable to reproduce the problem on new kernels,
> >> however seems they are affected.
> >>
> >> We cannot add mutexes into notifiers because inet6addr notifiers should be atomic.
> >>
> >> To fix the problem I use atomic counter and waitqueue:
> >> counter allows notifier to access the pointer,
> >> waitqueue allows to delay stop of service until notifier is in use.
> >>
> >> Patches was not tested because I was unable to reproduce the problem on new kernels.
> >>
> >> Please review it carefully and let me know if this can be fixed in a better way.
> >>
> >> Vasily Averin (2):
> >> race of lockd inetaddr notifiers with nlmsvc_rqst change
> >> race of nfsd inetaddr notifiers with nn->nfsd_serv change
> >>
> >> fs/lockd/svc.c | 16 ++++++++++++++--
> >> fs/nfsd/netns.h | 3 +++
> >> fs/nfsd/nfsctl.c | 3 +++
> >> fs/nfsd/nfssvc.c | 14 +++++++++++---
> >> 4 files changed, 31 insertions(+), 5 deletions(-)
> >>
>

> [ 604.294055] nfsd_inetaddr_event: removed 1.2.3.4
> [ 604.294060] nfsd: last server has exited, flushing export cache
> [ 604.295922] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> [ 604.296189] IP: _raw_spin_lock_bh+0x1b/0x30
> [ 604.296189] PGD 5a596067 P4D 5a596067 PUD 3052e067 PMD 0
> [ 604.296189] Oops: 0002 [#1] SMP
> [ 604.298844] Modules linked in: binfmt_misc nfsd auth_rpcgss nfs_acl lockd(E) grace ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables sunrpc joydev ppdev virtio_balloon crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pvpanic parport_pc pcspkr parport i2c_piix4 xfs libcrc32c virtio_console virtio_net virtio_scsi bochs_drm drm_kms_helper crc32c_intel ttm drm serio_raw virtio_pci virtio_ring virtio ata_generic pata_acpi floppy
> [ 604.302188] CPU: 6 PID: 4310 Comm: ip Tainted: G E 4.14.0-rc6+ #2
> [ 604.302188] Hardware name: Virtuozzo KVM, BIOS 1.9.1-5.3.2.vz7.7 04/01/2014
> [ 604.305117] task: ffff8e9eda512840 task.stack: ffffb1074f288000
> [ 604.305166] RIP: 0010:_raw_spin_lock_bh+0x1b/0x30
> [ 604.306034] RSP: 0018:ffffb1074f28b950 EFLAGS: 00010246
> [ 604.306034] RAX: 0000000000000000 RBX: 0000000000000038 RCX: 0000000000000000
> [ 604.307034] RDX: 0000000000000001 RSI: ffffb1074f28b9d0 RDI: 0000000000000010
> [ 604.307034] RBP: ffffb1074f28b950 R08: 00000000000190bd R09: 0000000000000000
> [ 604.307034] R10: 00000000ff000000 R11: 00000000ffffffff R12: ffffb1074f28b978
> [ 604.307034] R13: ffffb1074f28b9d0 R14: ffff8e9eefcd8ae8 R15: 0000000000000000
> [ 604.307034] FS: 00007f16f5e720c0(0000) GS:ffff8e9effd80000(0000) knlGS:0000000000000000
> [ 604.313236] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 604.313236] CR2: 0000000000000010 CR3: 000000005a695005 CR4: 00000000001606e0
> [ 604.313236] Call Trace:
> [ 604.313236] svc_age_temp_xprts_now+0x4b/0x200 [sunrpc]
> [ 604.315173] nfsd_inetaddr_event+0x87/0xb0 [nfsd]
> [ 604.315173] notifier_call_chain+0x4a/0x70
> [ 604.315173] blocking_notifier_call_chain+0x43/0x60
> [ 604.315173] __inet_del_ifa+0x16b/0x2c0
> [ 604.315173] inet_rtm_deladdr+0x129/0x1c0
> [ 604.315173] rtnetlink_rcv_msg+0x1f9/0x280
> [ 604.315173] ? rtnl_calcit.isra.24+0x110/0x110
> [ 604.315173] netlink_rcv_skb+0x91/0x130
> [ 604.322850] rtnetlink_rcv+0x15/0x20
> [ 604.322850] netlink_unicast+0x18e/0x220
> [ 604.322850] netlink_sendmsg+0x2c5/0x3c0
> [ 604.325114] sock_sendmsg+0x38/0x50
> [ 604.325150] ___sys_sendmsg+0x29a/0x2f0
> [ 604.325150] ? lru_cache_add+0x3a/0x80
> [ 604.325150] ? lru_cache_add_active_or_unevictable+0x4c/0xf0
> [ 604.325150] ? __handle_mm_fault+0x9be/0x11a0
> [ 604.325150] ? handle_mm_fault+0xb1/0x200
> [ 604.325150] __sys_sendmsg+0x54/0x90
> [ 604.325150] ? __sys_sendmsg+0x54/0x90
> [ 604.325150] SyS_sendmsg+0x12/0x20
> [ 604.325150] entry_SYSCALL_64_fastpath+0x1a/0xa5
> [ 604.325150] RIP: 0033:0x7f16f5579e57
> [ 604.331665] RSP: 002b:00007fffa38b4628 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> [ 604.332366] RAX: ffffffffffffffda RBX: 00000000006714c0 RCX: 00007f16f5579e57
> [ 604.332920] RDX: 0000000000000000 RSI: 00007fffa38b4670 RDI: 0000000000000003
> [ 604.333191] RBP: 00007fffa38bcaf0 R08: 0000000000000001 R09: fefefeff77686d74
> [ 604.333191] R10: 0000000000000006 R11: 0000000000000246 R12: 00007fffa38bc800
> [ 604.333191] R13: 0000000000000000 R14: 00007fffa38bc7a0 R15: 00007fffa38bc7a8
> [ 604.333191] Code: 00 5d c3 31 c0 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 65 81 05 af 47 76 64 00 02 00 00 48 89 e5 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 02 5d c3 89 c6 e8 d4 ac 84 ff 5d c3 66 90
> [ 604.335102] RIP: _raw_spin_lock_bh+0x1b/0x30 RSP: ffffb1074f28b950
> [ 604.335102] CR2: 0000000000000010


2017-11-10 07:19:29

by Vasily Averin

[permalink] [raw]
Subject: [PATCH 0/2] race of lockd/nfsd inetaddr notifiers vs pointers change

lockd and nfsd inet[6]addr notifiers use pointer that can be changed during execution.

lockd_inetaddr_event()
...
if (nlmsvc_rqst) {
...
svc_age_temp_xprts_now(nlmsvc_rqst->rq_server, (struct sockaddr *)&sin);
}

Usually access to nlmsvc_rqst is protected by nlmsvc_mutex
However lockd_inet[6]addr_event does not take the mutex,
therefore nlmsvc_rqst can be changed during execution.

As result "if (nlmsvc_rqst)" can be passed,
then another thread frees the memory or zeroes this pointer,
and then svc_age_temp_xprts_now crash the host on access to already freed memory.

Moreover on initialization nlmsvc_rqst can be temporally set to ERR_PTR.

NFSD have similar issue, its reproducer is below

1) start nfsd on host
# service nfs start

2) create separate net and mount namespaces:
# unshare -m -n ; mount -t nfsd nfsd /proc/fs/nfsd

3) execute screen (we need 2 consoles with newly created namespaces)
4) on first console:
# ifconfig lo up
# while : ; do ip a a 1.2.3.4/32 dev lo ; do ip a d 1.2.3.4/32 dev lo ; done

5) on second console:
# while : ; do echo 1 > /proc/fs/nfsd/threads ; sleep 1 ; echo 0 > /proc/fs/nfsd/threads ; sleep 1 ; done

Result: crash inside nfsd_inteddr_event(), see demsg in attachment.

We cannot add mutexes into notifiers because inet6addr notifiers should be atomic.

To fix the problem I use atomic counter and waitqueue:
counter allows notifier to access the pointer,
waitqueue allows to delay stop of service until notifier is in use.

Vasily Averin (2):
race of lockd inetaddr notifiers vs nlmsvc_rqst change
race of nfsd inetaddr notifiers vs nn->nfsd_serv change

fs/lockd/svc.c | 16 ++++++++++++++--
fs/nfsd/netns.h | 3 +++
fs/nfsd/nfsctl.c | 3 +++
fs/nfsd/nfssvc.c | 14 +++++++++++---
4 files changed, 31 insertions(+), 5 deletions(-)

--
2.7.4


Attachments:
ml2.txt (3.92 kB)

2017-11-10 07:19:34

by Vasily Averin

[permalink] [raw]
Subject: [PATCH 1/2] race of lockd inetaddr notifiers vs nlmsvc_rqst change

lockd_inet[6]addr_event use nlmsvc_rqst without taken nlmsvc_mutex,
nlmsvc_rqst can be changed during execution of notifiers and crash the host.

Patch enables access to nlmsvc_rqst only when it was correctly initialized
and delays its cleanup until notifiers are in use.

Note that nlmsvc_rqst can be temporally set to ERR_PTR,
in this case "if (nlmsvc_rqst)" check in notifiers will be passed incorrectly.

Signed-off-by: Vasily Averin <[email protected]>
Tested-by: Scott Mayhew <[email protected]>
---
fs/lockd/svc.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index f04ecfc..c1573860 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -57,6 +57,9 @@ static struct task_struct *nlmsvc_task;
static struct svc_rqst *nlmsvc_rqst;
unsigned long nlmsvc_timeout;

+atomic_t nlm_ntf_refcnt = ATOMIC_INIT(0);
+DECLARE_WAIT_QUEUE_HEAD(nlm_ntf_wq);
+
unsigned int lockd_net_id;

/*
@@ -290,7 +293,8 @@ static int lockd_inetaddr_event(struct notifier_block *this,
struct in_ifaddr *ifa = (struct in_ifaddr *)ptr;
struct sockaddr_in sin;

- if (event != NETDEV_DOWN)
+ if ((event != NETDEV_DOWN) ||
+ !atomic_inc_not_zero(&nlm_ntf_refcnt))
goto out;

if (nlmsvc_rqst) {
@@ -301,6 +305,8 @@ static int lockd_inetaddr_event(struct notifier_block *this,
svc_age_temp_xprts_now(nlmsvc_rqst->rq_server,
(struct sockaddr *)&sin);
}
+ atomic_dec(&nlm_ntf_refcnt);
+ wake_up(&nlm_ntf_wq);

out:
return NOTIFY_DONE;
@@ -317,7 +323,8 @@ static int lockd_inet6addr_event(struct notifier_block *this,
struct inet6_ifaddr *ifa = (struct inet6_ifaddr *)ptr;
struct sockaddr_in6 sin6;

- if (event != NETDEV_DOWN)
+ if ((event != NETDEV_DOWN) ||
+ !atomic_inc_not_zero(&nlm_ntf_refcnt))
goto out;

if (nlmsvc_rqst) {
@@ -329,6 +336,8 @@ static int lockd_inet6addr_event(struct notifier_block *this,
svc_age_temp_xprts_now(nlmsvc_rqst->rq_server,
(struct sockaddr *)&sin6);
}
+ atomic_dec(&nlm_ntf_refcnt);
+ wake_up(&nlm_ntf_wq);

out:
return NOTIFY_DONE;
@@ -345,10 +354,12 @@ static void lockd_unregister_notifiers(void)
#if IS_ENABLED(CONFIG_IPV6)
unregister_inet6addr_notifier(&lockd_inet6addr_notifier);
#endif
+ wait_event(nlm_ntf_wq, atomic_read(&nlm_ntf_refcnt) == 0);
}

static void lockd_svc_exit_thread(void)
{
+ atomic_dec(&nlm_ntf_refcnt);
lockd_unregister_notifiers();
svc_exit_thread(nlmsvc_rqst);
}
@@ -373,6 +384,7 @@ static int lockd_start_svc(struct svc_serv *serv)
goto out_rqst;
}

+ atomic_inc(&nlm_ntf_refcnt);
svc_sock_update_bufs(serv);
serv->sv_maxconn = nlm_max_connections;

--
2.7.4


2017-11-10 07:19:43

by Vasily Averin

[permalink] [raw]
Subject: [PATCH 2/2] race of nfsd inetaddr notifiers vs nn->nfsd_serv change

nfsd_inet[6]addr_event use nn->nfsd_serv without taken nfsd_mutex,
pointer can be changed during execution of notifiers and crash the host.

Moreover if notifiers was enabled in one net namespace it will be enabled
in all other net namespaces, from its creation until its destroy.

Patch allows notifiers to access to nn->nfsd_serv only when the pointer was
correctly initialized and delays its cleanup until notifiers are in use.

Signed-off-by: Vasily Averin <[email protected]>
Tested-by: Scott Mayhew <[email protected]>
---
fs/nfsd/netns.h | 3 +++
fs/nfsd/nfsctl.c | 3 +++
fs/nfsd/nfssvc.c | 14 +++++++++++---
3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 3714231..82f7999 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -119,6 +119,9 @@ struct nfsd_net {
u32 clverifier_counter;

struct svc_serv *nfsd_serv;
+
+ wait_queue_head_t ntf_wq;
+ atomic_t ntf_refcnt;
};

/* Simple check to find out if a given net was properly initialized */
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 6493df6..d107b44 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1241,6 +1241,9 @@ static __net_init int nfsd_init_net(struct net *net)
nn->nfsd4_grace = 90;
nn->clverifier_counter = prandom_u32();
nn->clientid_counter = prandom_u32();
+
+ atomic_set(&nn->ntf_refcnt, 0);
+ init_waitqueue_head(&nn->ntf_wq);
return 0;

out_idmap_error:
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 7e3af3e..29a263b 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -334,7 +334,8 @@ static int nfsd_inetaddr_event(struct notifier_block *this, unsigned long event,
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
struct sockaddr_in sin;

- if (event != NETDEV_DOWN)
+ if ((event != NETDEV_DOWN) ||
+ !atomic_inc_not_zero(&nn->ntf_refcnt))
goto out;

if (nn->nfsd_serv) {
@@ -343,6 +344,8 @@ static int nfsd_inetaddr_event(struct notifier_block *this, unsigned long event,
sin.sin_addr.s_addr = ifa->ifa_local;
svc_age_temp_xprts_now(nn->nfsd_serv, (struct sockaddr *)&sin);
}
+ atomic_dec(&nn->ntf_refcnt);
+ wake_up(&nn->ntf_wq);

out:
return NOTIFY_DONE;
@@ -362,7 +365,8 @@ static int nfsd_inet6addr_event(struct notifier_block *this,
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
struct sockaddr_in6 sin6;

- if (event != NETDEV_DOWN)
+ if ((event != NETDEV_DOWN) ||
+ !atomic_inc_not_zero(&nn->ntf_refcnt))
goto out;

if (nn->nfsd_serv) {
@@ -373,7 +377,8 @@ static int nfsd_inet6addr_event(struct notifier_block *this,
sin6.sin6_scope_id = ifa->idev->dev->ifindex;
svc_age_temp_xprts_now(nn->nfsd_serv, (struct sockaddr *)&sin6);
}
-
+ atomic_dec(&nn->ntf_refcnt);
+ wake_up(&nn->ntf_wq);
out:
return NOTIFY_DONE;
}
@@ -390,6 +395,7 @@ static void nfsd_last_thread(struct svc_serv *serv, struct net *net)
{
struct nfsd_net *nn = net_generic(net, nfsd_net_id);

+ atomic_dec(&nn->ntf_refcnt);
/* check if the notifier still has clients */
if (atomic_dec_return(&nfsd_notifier_refcount) == 0) {
unregister_inetaddr_notifier(&nfsd_inetaddr_notifier);
@@ -397,6 +403,7 @@ static void nfsd_last_thread(struct svc_serv *serv, struct net *net)
unregister_inet6addr_notifier(&nfsd_inet6addr_notifier);
#endif
}
+ wait_event(nn->ntf_wq, atomic_read(&nn->ntf_refcnt) == 0);

/*
* write_ports can create the server without actually starting
@@ -516,6 +523,7 @@ int nfsd_create_serv(struct net *net)
register_inet6addr_notifier(&nfsd_inet6addr_notifier);
#endif
}
+ atomic_inc(&nn->ntf_refcnt);
do_gettimeofday(&nn->nfssvc_boot); /* record boot time */
return 0;
}
--
2.7.4


2017-11-10 21:57:54

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 0/2] race of lockd/nfsd inetaddr notifiers vs pointers change

Thanks! Both applied.

--b.

On Fri, Nov 10, 2017 at 10:19:16AM +0300, Vasily Averin wrote:
> lockd and nfsd inet[6]addr notifiers use pointer that can be changed during execution.
>
> lockd_inetaddr_event()
> ...
> if (nlmsvc_rqst) {
> ...
> svc_age_temp_xprts_now(nlmsvc_rqst->rq_server, (struct sockaddr *)&sin);
> }
>
> Usually access to nlmsvc_rqst is protected by nlmsvc_mutex
> However lockd_inet[6]addr_event does not take the mutex,
> therefore nlmsvc_rqst can be changed during execution.
>
> As result "if (nlmsvc_rqst)" can be passed,
> then another thread frees the memory or zeroes this pointer,
> and then svc_age_temp_xprts_now crash the host on access to already freed memory.
>
> Moreover on initialization nlmsvc_rqst can be temporally set to ERR_PTR.
>
> NFSD have similar issue, its reproducer is below
>
> 1) start nfsd on host
> # service nfs start
>
> 2) create separate net and mount namespaces:
> # unshare -m -n ; mount -t nfsd nfsd /proc/fs/nfsd
>
> 3) execute screen (we need 2 consoles with newly created namespaces)
> 4) on first console:
> # ifconfig lo up
> # while : ; do ip a a 1.2.3.4/32 dev lo ; do ip a d 1.2.3.4/32 dev lo ; done
>
> 5) on second console:
> # while : ; do echo 1 > /proc/fs/nfsd/threads ; sleep 1 ; echo 0 > /proc/fs/nfsd/threads ; sleep 1 ; done
>
> Result: crash inside nfsd_inteddr_event(), see demsg in attachment.
>
> We cannot add mutexes into notifiers because inet6addr notifiers should be atomic.
>
> To fix the problem I use atomic counter and waitqueue:
> counter allows notifier to access the pointer,
> waitqueue allows to delay stop of service until notifier is in use.
>
> Vasily Averin (2):
> race of lockd inetaddr notifiers vs nlmsvc_rqst change
> race of nfsd inetaddr notifiers vs nn->nfsd_serv change
>
> fs/lockd/svc.c | 16 ++++++++++++++--
> fs/nfsd/netns.h | 3 +++
> fs/nfsd/nfsctl.c | 3 +++
> fs/nfsd/nfssvc.c | 14 +++++++++++---
> 4 files changed, 31 insertions(+), 5 deletions(-)
>
> --
> 2.7.4
>

> [ 604.294055] nfsd_inetaddr_event: removed 1.2.3.4
> [ 604.294060] nfsd: last server has exited, flushing export cache
> [ 604.295922] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> [ 604.296189] IP: _raw_spin_lock_bh+0x1b/0x30
> [ 604.296189] PGD 5a596067 P4D 5a596067 PUD 3052e067 PMD 0
> [ 604.296189] Oops: 0002 [#1] SMP
> [ 604.298844] Modules linked in: binfmt_misc nfsd auth_rpcgss nfs_acl lockd(E) grace ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables sunrpc joydev ppdev virtio_balloon crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pvpanic parport_pc pcspkr parport i2c_piix4 xfs libcrc32c virtio_console virtio_net virtio_scsi bochs_drm drm_kms_helper crc32c_intel ttm drm serio_raw virtio_pci virtio_ring virtio ata_generic pata_acpi floppy
> [ 604.302188] CPU: 6 PID: 4310 Comm: ip Tainted: G E 4.14.0-rc6+ #2
> [ 604.302188] Hardware name: Virtuozzo KVM, BIOS 1.9.1-5.3.2.vz7.7 04/01/2014
> [ 604.305117] task: ffff8e9eda512840 task.stack: ffffb1074f288000
> [ 604.305166] RIP: 0010:_raw_spin_lock_bh+0x1b/0x30
> [ 604.306034] RSP: 0018:ffffb1074f28b950 EFLAGS: 00010246
> [ 604.306034] RAX: 0000000000000000 RBX: 0000000000000038 RCX: 0000000000000000
> [ 604.307034] RDX: 0000000000000001 RSI: ffffb1074f28b9d0 RDI: 0000000000000010
> [ 604.307034] RBP: ffffb1074f28b950 R08: 00000000000190bd R09: 0000000000000000
> [ 604.307034] R10: 00000000ff000000 R11: 00000000ffffffff R12: ffffb1074f28b978
> [ 604.307034] R13: ffffb1074f28b9d0 R14: ffff8e9eefcd8ae8 R15: 0000000000000000
> [ 604.307034] FS: 00007f16f5e720c0(0000) GS:ffff8e9effd80000(0000) knlGS:0000000000000000
> [ 604.313236] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 604.313236] CR2: 0000000000000010 CR3: 000000005a695005 CR4: 00000000001606e0
> [ 604.313236] Call Trace:
> [ 604.313236] svc_age_temp_xprts_now+0x4b/0x200 [sunrpc]
> [ 604.315173] nfsd_inetaddr_event+0x87/0xb0 [nfsd]
> [ 604.315173] notifier_call_chain+0x4a/0x70
> [ 604.315173] blocking_notifier_call_chain+0x43/0x60
> [ 604.315173] __inet_del_ifa+0x16b/0x2c0
> [ 604.315173] inet_rtm_deladdr+0x129/0x1c0
> [ 604.315173] rtnetlink_rcv_msg+0x1f9/0x280
> [ 604.315173] ? rtnl_calcit.isra.24+0x110/0x110
> [ 604.315173] netlink_rcv_skb+0x91/0x130
> [ 604.322850] rtnetlink_rcv+0x15/0x20
> [ 604.322850] netlink_unicast+0x18e/0x220
> [ 604.322850] netlink_sendmsg+0x2c5/0x3c0
> [ 604.325114] sock_sendmsg+0x38/0x50
> [ 604.325150] ___sys_sendmsg+0x29a/0x2f0
> [ 604.325150] ? lru_cache_add+0x3a/0x80
> [ 604.325150] ? lru_cache_add_active_or_unevictable+0x4c/0xf0
> [ 604.325150] ? __handle_mm_fault+0x9be/0x11a0
> [ 604.325150] ? handle_mm_fault+0xb1/0x200
> [ 604.325150] __sys_sendmsg+0x54/0x90
> [ 604.325150] ? __sys_sendmsg+0x54/0x90
> [ 604.325150] SyS_sendmsg+0x12/0x20
> [ 604.325150] entry_SYSCALL_64_fastpath+0x1a/0xa5
> [ 604.325150] RIP: 0033:0x7f16f5579e57
> [ 604.331665] RSP: 002b:00007fffa38b4628 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> [ 604.332366] RAX: ffffffffffffffda RBX: 00000000006714c0 RCX: 00007f16f5579e57
> [ 604.332920] RDX: 0000000000000000 RSI: 00007fffa38b4670 RDI: 0000000000000003
> [ 604.333191] RBP: 00007fffa38bcaf0 R08: 0000000000000001 R09: fefefeff77686d74
> [ 604.333191] R10: 0000000000000006 R11: 0000000000000246 R12: 00007fffa38bc800
> [ 604.333191] R13: 0000000000000000 R14: 00007fffa38bc7a0 R15: 00007fffa38bc7a8
> [ 604.333191] Code: 00 5d c3 31 c0 5d c3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 65 81 05 af 47 76 64 00 02 00 00 48 89 e5 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 02 5d c3 89 c6 e8 d4 ac 84 ff 5d c3 66 90
> [ 604.335102] RIP: _raw_spin_lock_bh+0x1b/0x30 RSP: ffffb1074f28b950
> [ 604.335102] CR2: 0000000000000010