2017-11-12 08:40:07

by Vasily Averin

[permalink] [raw]
Subject: [PATCH v4 00/18] exit_net checks for objects initialized in net_init hook

OpenVz kernel team have a long history of fighting against namespace-related bugs,
some of them could be excluded by using simple checks described below.

One of typical errors is related to live cycle of namespaces:
usually objects created for some namespace should not live longer than namespace itself.

Such kind of issues can be invisible on usual systems where additional namespaces
are not used, because initial namespaces usually lives forever and never destroyed.

However in systems with namespaces it can lead to memory leaks or to use-after-free.
Both of them are critical for systems with running containers.
As you knows it's quite hard to find the reason of such issues,
especially in rarely-triggered scenarios on production nodes on default kernels
without specially enabled debug settings. Any additional hints can be useful here.

This patch set should help to detect some of these issues.
It is based on assumption that objects initialized in init hook of pernet_operations
should return to initial state until end of exit hook.

Many drivers and subsystems already have such checks, however I've found number
of places where list_empty check would be useful at least as smoke test.

These checks are useful for long-term stable kernels,
they allows to detect problems related to incomplete or incorrectly
backported patches.

Also this patch set replaces BUG_ON in existing checks:
memory leaks and possible memory corruptions are bad of course,
however in many cases they are not fatal
and should not crash production hosts unconditionally.

Changes:
v4:
- excluded grace and lockd patches taken by Bruce Fields
- let's use WARN_ON_ONCE without any extra messages
adobriyan@ is right, output of net Id gives nothing to host admin,
and developers in any case will extract information from core dump
- updated description in cover letter
- dropped nfs4blocklayout patch: waitqueue check does not look useful
- patches was reordered to be per-subsystem grouped
- cover letter should be sent to all people included into cc: of any patches
- minor cosmetic changes in some patches

v3:
- use net->ns.inum as net Id
- removed patches for hashlimit and recent,
they handle tables list in exit_net hook.
- added patches for grace and lockd

v2:
- net pointer removed from output
- fixed compilation for phonet driver


Vasily Averin (18):
af_key: replace BUG_ON on WARN_ON in net_exit hook
geneve: exit_net cleanup check added
packet: exit_net cleanup check added
vxlan: exit_net cleanup checks added
netdev: exit_net cleanup check added
fib_notifier: exit_net cleanup check added
fib_rules: exit_net cleanup check added
l2tp: exit_net cleanup check added
clusterip: exit_net cleanup check added
nf_tables: exit_net cleanup check added
nfnetlink_log: exit_net cleanup check added
nfnetlink_gueue: exit_net cleanup check added
x_tables: exit_net cleanup check added
nfs client: exit_net cleanup check added
sunrpc: exit_net cleanup check added
phonet: exit_net cleanup check added
ppp: exit_net cleanup checks added
xfrm6_tunnel: exit_net cleanup check added

drivers/net/geneve.c | 1 +
drivers/net/ppp/ppp_generic.c | 2 ++
drivers/net/vxlan.c | 5 +++++
fs/nfs/inode.c | 4 ++++
net/core/dev.c | 2 ++
net/core/fib_notifier.c | 6 ++++++
net/core/fib_rules.c | 6 ++++++
net/ipv4/netfilter/ipt_CLUSTERIP.c | 1 +
net/ipv6/xfrm6_tunnel.c | 10 ++++++++++
net/key/af_key.c | 2 +-
net/l2tp/l2tp_core.c | 5 +++++
net/netfilter/nf_tables_api.c | 7 +++++++
net/netfilter/nfnetlink_log.c | 5 +++++
net/netfilter/nfnetlink_queue.c | 6 ++++++
net/netfilter/x_tables.c | 10 ++++++++++
net/packet/af_packet.c | 1 +
net/phonet/pn_dev.c | 3 +++
net/sunrpc/sunrpc_syms.c | 3 +++
18 files changed, 78 insertions(+), 1 deletion(-)

--
2.7.4



2017-11-12 19:27:16

by Vasily Averin

[permalink] [raw]
Subject: [PATCH v5 00/13] exit_net checks for objects initialized in net_init hook

OpenVz kernel team have a long history of fighting against namespace-related bugs,
some of them could be prevented by using simple checks described below.

One of typical errors is related to live cycle of namespaces:
usually objects created for some namespace should not live longer than namespace itself.

Such kind of issues can be invisible on usual systems where additional namespaces
are not used, because initial namespaces usually lives forever and never destroyed.

However in systems with namespaces it can lead to memory leaks or to use-after-free.
Both of them are critical for systems with running containers.
As you knows it's quite hard to find the reason of such issues,
especially in rarely-triggered scenarios on production nodes on default kernels
without specially enabled debug settings. Any additional hints can be useful here.

This patch set should help to detect some of these issues.
It is based on assumption that objects initialized in init hook of pernet_operations
should return to initial state until end of exit hook.

Many drivers and subsystems already have such checks, however I've found number
of places where list_empty check would be useful at least as smoke test.

These checks are useful for long-term stable kernels,
they allows to detect problems related to incomplete or incorrectly
backported patches.

Changes:
v5:
- fixed nit pointed by Florian Westphal
- netfilter patches are send separately to netfilter-devel@

v4:
- excluded grace and lockd patches taken by Bruce Fields
- let's use WARN_ON_ONCE without any extra messages
adobriyan@ is right, output of net Id gives nothing to host admin,
and developers in any case will extract information from core dump
- updated description in cover letter
- dropped nfs4blocklayout patch: waitqueue check does not look useful
- patches was reordered to be per-subsystem grouped
- cover letter should be sent to all people included into cc: of any patches
- minor cosmetic changes in some patches

v3:
- use net->ns.inum as net Id
- removed patches for hashlimit and recent,
they handle tables list in exit_net hook.
- added patches for grace and lockd

v2:
- net pointer removed from output
- fixed compilation for phonet driver


Vasily Averin (13):
af_key: replace BUG_ON on WARN_ON in net_exit hook
geneve: exit_net cleanup check added
packet: exit_net cleanup check added
vxlan: exit_net cleanup checks added
netdev: exit_net cleanup check added
fib_notifier: exit_net cleanup check added
fib_rules: exit_net cleanup check added
l2tp: exit_net cleanup check added
nfs client: exit_net cleanup check added
sunrpc: exit_net cleanup check added
phonet: exit_net cleanup check added
ppp: exit_net cleanup checks added
xfrm6_tunnel: exit_net cleanup check added

drivers/net/geneve.c | 1 +
drivers/net/ppp/ppp_generic.c | 2 ++
drivers/net/vxlan.c | 4 ++++
fs/nfs/inode.c | 4 ++++
net/core/dev.c | 2 ++
net/core/fib_notifier.c | 6 ++++++
net/core/fib_rules.c | 6 ++++++
net/ipv6/xfrm6_tunnel.c | 8 ++++++++
net/key/af_key.c | 2 +-
net/l2tp/l2tp_core.c | 4 ++++
net/packet/af_packet.c | 1 +
net/phonet/pn_dev.c | 3 +++
net/sunrpc/sunrpc_syms.c | 3 +++
13 files changed, 45 insertions(+), 1 deletion(-)

--
2.7.4


2017-11-12 19:31:29

by Vasily Averin

[permalink] [raw]
Subject: [PATCH v5 09/13] nfs client: exit_net cleanup check added

Be sure that nfs_client_list and nfs_volume_list lists initialized
in net_init hook were return to initial state in net_exit hook.

Signed-off-by: Vasily Averin <[email protected]>
---
fs/nfs/inode.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 134d9f5..4ef515f 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2084,8 +2084,12 @@ static int nfs_net_init(struct net *net)

static void nfs_net_exit(struct net *net)
{
+ struct nfs_net *nn = net_generic(net, nfs_net_id);
+
nfs_fs_proc_net_exit(net);
nfs_cleanup_cb_ident_idr(net);
+ WARN_ON_ONCE(!list_empty(&nn->nfs_client_list));
+ WARN_ON_ONCE(!list_empty(&nn->nfs_volume_list));
}

static struct pernet_operations nfs_net_ops = {
--
2.7.4


2017-11-12 19:32:21

by Vasily Averin

[permalink] [raw]
Subject: [PATCH v5 10/13] sunrpc: exit_net cleanup check added

Be sure that all_clients list initialized in net_init hook was return
to initial state.

Signed-off-by: Vasily Averin <[email protected]>
---
net/sunrpc/sunrpc_syms.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/net/sunrpc/sunrpc_syms.c b/net/sunrpc/sunrpc_syms.c
index c73de18..56f9eff 100644
--- a/net/sunrpc/sunrpc_syms.c
+++ b/net/sunrpc/sunrpc_syms.c
@@ -65,10 +65,13 @@ static __net_init int sunrpc_init_net(struct net *net)

static __net_exit void sunrpc_exit_net(struct net *net)
{
+ struct sunrpc_net *sn = net_generic(net, sunrpc_net_id);
+
rpc_pipefs_exit_net(net);
unix_gid_cache_destroy(net);
ip_map_cache_destroy(net);
rpc_proc_exit(net);
+ WARN_ON_ONCE(!list_empty(&sn->all_clients));
}

static struct pernet_operations sunrpc_net_ops = {
--
2.7.4


2017-11-14 06:47:24

by David Miller

[permalink] [raw]
Subject: Re: [PATCH v5 00/13] exit_net checks for objects initialized in net_init hook

From: Vasily Averin <[email protected]>
Date: Sun, 12 Nov 2017 22:26:44 +0300

> OpenVz kernel team have a long history of fighting against namespace-related bugs,
> some of them could be prevented by using simple checks described below.
>
> One of typical errors is related to live cycle of namespaces:
> usually objects created for some namespace should not live longer than namespace itself.
>
> Such kind of issues can be invisible on usual systems where additional namespaces
> are not used, because initial namespaces usually lives forever and never destroyed.
>
> However in systems with namespaces it can lead to memory leaks or to use-after-free.
> Both of them are critical for systems with running containers.
> As you knows it's quite hard to find the reason of such issues,
> especially in rarely-triggered scenarios on production nodes on default kernels
> without specially enabled debug settings. Any additional hints can be useful here.
>
> This patch set should help to detect some of these issues.
> It is based on assumption that objects initialized in init hook of pernet_operations
> should return to initial state until end of exit hook.
>
> Many drivers and subsystems already have such checks, however I've found number
> of places where list_empty check would be useful at least as smoke test.
>
> These checks are useful for long-term stable kernels,
> they allows to detect problems related to incomplete or incorrectly
> backported patches.

All applied to net-next except patch #9 and #10 which need to go via the
NFS maintainer.

2017-11-14 14:41:44

by Vasily Averin

[permalink] [raw]
Subject: Re: [PATCH v5 00/13] exit_net checks for objects initialized in net_init hook

On 2017-11-14 09:47, David Miller wrote:
> From: Vasily Averin <[email protected]>
> Date: Sun, 12 Nov 2017 22:26:44 +0300
>
>> OpenVz kernel team have a long history of fighting against namespace-related bugs,
>> some of them could be prevented by using simple checks described below.
>>
>> One of typical errors is related to live cycle of namespaces:
>> usually objects created for some namespace should not live longer than namespace itself.
>
> All applied to net-next except patch #9 and #10 which need to go via the
> NFS maintainer.

As far as I see Anna included them into Anna's linux-nfs work git.

Thank you!
Vasily Averin