2021-03-04 14:25:00

by Mark Tomlinson

[permalink] [raw]
Subject: [PATCH 0/3] Don't use RCU for x_tables synchronization

The patches to change to using RCU synchronization in x_tables cause
updating tables to be slowed down by an order of magnitude. This has
been tried before, see https://lore.kernel.org/patchwork/patch/151796/
and ultimately was rejected. As mentioned in the patch description, a
different method can be used to ensure ordering of reads/writes. This
can simply be done by changing from smp_wmb() to smp_mb().

Mark Tomlinson (3):
Revert "netfilter: x_tables: Update remaining dereference to RCU"
Revert "netfilter: x_tables: Switch synchronization to RCU"
netfilter: x_tables: Use correct memory barriers.

include/linux/netfilter/x_tables.h | 7 ++---
net/ipv4/netfilter/arp_tables.c | 16 +++++-----
net/ipv4/netfilter/ip_tables.c | 16 +++++-----
net/ipv6/netfilter/ip6_tables.c | 16 +++++-----
net/netfilter/x_tables.c | 49 +++++++++++++++++++++---------
5 files changed, 60 insertions(+), 44 deletions(-)

--
2.30.1


2021-03-04 23:59:06

by Mark Tomlinson

[permalink] [raw]
Subject: [PATCH 1/3] Revert "netfilter: x_tables: Update remaining dereference to RCU"

This reverts commit 443d6e86f821a165fae3fc3fc13086d27ac140b1.

This (and the following) patch basically re-implemented the RCU
mechanisms of patch 784544739a25. That patch was replaced because of the
performance problems that it created when replacing tables. Now, we have
the same issue: the call to synchronize_rcu() makes replacing tables
slower by as much as an order of magnitude.

See https://lore.kernel.org/patchwork/patch/151796/ for why using RCU is
not a good idea.

Revert these patches and fix the issue in a different way.

Signed-off-by: Mark Tomlinson <[email protected]>
---
net/ipv4/netfilter/arp_tables.c | 2 +-
net/ipv4/netfilter/ip_tables.c | 2 +-
net/ipv6/netfilter/ip6_tables.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index c576a63d09db..563b62b76a5f 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -1379,7 +1379,7 @@ static int compat_get_entries(struct net *net,
xt_compat_lock(NFPROTO_ARP);
t = xt_find_table_lock(net, NFPROTO_ARP, get.name);
if (!IS_ERR(t)) {
- const struct xt_table_info *private = xt_table_get_private_protected(t);
+ const struct xt_table_info *private = t->private;
struct xt_table_info info;

ret = compat_table_info(private, &info);
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index e8f6f9d86237..6e2851f8d3a3 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -1589,7 +1589,7 @@ compat_get_entries(struct net *net, struct compat_ipt_get_entries __user *uptr,
xt_compat_lock(AF_INET);
t = xt_find_table_lock(net, AF_INET, get.name);
if (!IS_ERR(t)) {
- const struct xt_table_info *private = xt_table_get_private_protected(t);
+ const struct xt_table_info *private = t->private;
struct xt_table_info info;
ret = compat_table_info(private, &info);
if (!ret && get.size == info.size)
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 0d453fa9e327..c4f532f4d311 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -1598,7 +1598,7 @@ compat_get_entries(struct net *net, struct compat_ip6t_get_entries __user *uptr,
xt_compat_lock(AF_INET6);
t = xt_find_table_lock(net, AF_INET6, get.name);
if (!IS_ERR(t)) {
- const struct xt_table_info *private = xt_table_get_private_protected(t);
+ const struct xt_table_info *private = t->private;
struct xt_table_info info;
ret = compat_table_info(private, &info);
if (!ret && get.size == info.size)
--
2.30.1

2021-03-05 00:00:04

by Mark Tomlinson

[permalink] [raw]
Subject: [PATCH 3/3] netfilter: x_tables: Use correct memory barriers.

When a new table value was assigned, it was followed by a write memory
barrier. This ensured that all writes before this point would complete
before any writes after this point. However, to determine whether the
rules are unused, the sequence counter is read. To ensure that all
writes have been done before these reads, a full memory barrier is
needed, not just a write memory barrier. The same argument applies when
incrementing the counter, before the rules are read.

Changing to using smp_mb() instead of smp_wmb() fixes the kernel panic
reported in cc00bcaa5899, while still maintaining the same speed of
replacing tables.

Fixes: 7f5c6d4f665b ("netfilter: get rid of atomic ops in fast path")
Signed-off-by: Mark Tomlinson <[email protected]>
---
include/linux/netfilter/x_tables.h | 2 +-
net/netfilter/x_tables.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 5deb099d156d..8ec48466410a 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -376,7 +376,7 @@ static inline unsigned int xt_write_recseq_begin(void)
* since addend is most likely 1
*/
__this_cpu_add(xt_recseq.sequence, addend);
- smp_wmb();
+ smp_mb();

return addend;
}
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index af22dbe85e2c..a2b50596b87e 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1387,7 +1387,7 @@ xt_replace_table(struct xt_table *table,
table->private = newinfo;

/* make sure all cpus see new ->private value */
- smp_wmb();
+ smp_mb();

/*
* Even though table entries have now been swapped, other CPU's
--
2.30.1

2021-03-05 00:06:08

by Florian Westphal

[permalink] [raw]
Subject: Re: [PATCH 1/3] Revert "netfilter: x_tables: Update remaining dereference to RCU"

Mark Tomlinson <[email protected]> wrote:
> This reverts commit 443d6e86f821a165fae3fc3fc13086d27ac140b1.
>
> This (and the following) patch basically re-implemented the RCU
> mechanisms of patch 784544739a25. That patch was replaced because of the
> performance problems that it created when replacing tables. Now, we have
> the same issue: the call to synchronize_rcu() makes replacing tables
> slower by as much as an order of magnitude.
>
> See https://lore.kernel.org/patchwork/patch/151796/ for why using RCU is
> not a good idea.

Please don't add links for the rationale.