2022-07-06 23:42:31

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v2 net 00/12] sysctl: Fix data-races around ipv4_table.

A sysctl variable is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.

The first half of this series changes some proc handlers used in ipv4_table
to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the
sysctl side. Then, the second half adds READ_ONCE() to the other readers
of ipv4_table.


Changes:
v2:
* Drop some changes that makes backporting difficult
* First cleanup patch
* Lockless helpers and .proc_handler changes
* Drop the tracing part for .sysctl_mem
* Steve already posted a fix
* Drop int-to-bool change for cipso
* Should be posted to net-next later
* Drop proc_dobool() change
* Can be included in another series

v1: https://lore.kernel.org/netdev/[email protected]/


Kuniyuki Iwashima (12):
sysctl: Fix data races in proc_dointvec().
sysctl: Fix data races in proc_douintvec().
sysctl: Fix data races in proc_dointvec_minmax().
sysctl: Fix data races in proc_douintvec_minmax().
sysctl: Fix data races in proc_doulongvec_minmax().
sysctl: Fix data races in proc_dointvec_jiffies().
tcp: Fix a data-race around sysctl_tcp_max_orphans.
inetpeer: Fix data-races around sysctl.
net: Fix data-races around sysctl_mem.
cipso: Fix data-races around sysctl.
icmp: Fix data-races around sysctl.
ipv4: Fix a data-race around sysctl_fib_sync_mem.

Documentation/networking/ip-sysctl.rst | 2 +-
include/net/sock.h | 2 +-
kernel/sysctl.c | 25 ++++++++++++++-----------
net/ipv4/cipso_ipv4.c | 12 +++++++-----
net/ipv4/fib_trie.c | 2 +-
net/ipv4/icmp.c | 5 +++--
net/ipv4/inetpeer.c | 12 ++++++++----
net/ipv4/tcp.c | 3 ++-
8 files changed, 37 insertions(+), 26 deletions(-)

--
2.30.2


2022-07-06 23:44:20

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v2 net 04/12] sysctl: Fix data races in proc_douintvec_minmax().

A sysctl variable is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.

This patch changes proc_douintvec_minmax() to use READ_ONCE() and
WRITE_ONCE() internally to fix data-races on the sysctl side. For now,
proc_douintvec_minmax() itself is tolerant to a data-race, but we still
need to add annotations on the other subsystem's side.

Fixes: 61d9b56a8920 ("sysctl: add unsigned int range support")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
kernel/sysctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4d87832367f2..379721a03d41 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -923,7 +923,7 @@ static int do_proc_douintvec_minmax_conv(unsigned long *lvalp,
(param->max && *param->max < tmp))
return -ERANGE;

- *valp = tmp;
+ WRITE_ONCE(*valp, tmp);
}

return 0;
--
2.30.2

2022-07-06 23:50:34

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v2 net 06/12] sysctl: Fix data races in proc_dointvec_jiffies().

A sysctl variable is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.

This patch changes proc_dointvec_jiffies() to use READ_ONCE() and
WRITE_ONCE() internally to fix data-races on the sysctl side. For now,
proc_dointvec_jiffies() itself is tolerant to a data-race, but we still
need to add annotations on the other subsystem's side.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
kernel/sysctl.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 8c55ba01f41b..bf9383d17e1b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1173,9 +1173,12 @@ static int do_proc_dointvec_jiffies_conv(bool *negp, unsigned long *lvalp,
if (write) {
if (*lvalp > INT_MAX / HZ)
return 1;
- *valp = *negp ? -(*lvalp*HZ) : (*lvalp*HZ);
+ if (*negp)
+ WRITE_ONCE(*valp, -*lvalp * HZ);
+ else
+ WRITE_ONCE(*valp, *lvalp * HZ);
} else {
- int val = *valp;
+ int val = READ_ONCE(*valp);
unsigned long lval;
if (val < 0) {
*negp = true;
--
2.30.2

2022-07-06 23:50:43

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v2 net 12/12] ipv4: Fix a data-race around sysctl_fib_sync_mem.

While reading sysctl_fib_sync_mem, it can be changed concurrently.
So, we need to add READ_ONCE() to avoid a data-race.

Fixes: 9ab948a91b2c ("ipv4: Allow amount of dirty memory from fib resizing to be controllable")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
CC: David Ahern <[email protected]>
---
net/ipv4/fib_trie.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 2734c3af7e24..46e8a5125853 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -498,7 +498,7 @@ static void tnode_free(struct key_vector *tn)
tn = container_of(head, struct tnode, rcu)->kv;
}

- if (tnode_free_size >= sysctl_fib_sync_mem) {
+ if (tnode_free_size >= READ_ONCE(sysctl_fib_sync_mem)) {
tnode_free_size = 0;
synchronize_rcu();
}
--
2.30.2

2022-07-06 23:52:43

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v2 net 05/12] sysctl: Fix data races in proc_doulongvec_minmax().

A sysctl variable is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.

This patch changes proc_doulongvec_minmax() to use READ_ONCE() and
WRITE_ONCE() internally to fix data-races on the sysctl side. For now,
proc_doulongvec_minmax() itself is tolerant to a data-race, but we still
need to add annotations on the other subsystem's side.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
kernel/sysctl.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 379721a03d41..8c55ba01f41b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1090,9 +1090,9 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table,
err = -EINVAL;
break;
}
- *i = val;
+ WRITE_ONCE(*i, val);
} else {
- val = convdiv * (*i) / convmul;
+ val = convdiv * READ_ONCE(*i) / convmul;
if (!first)
proc_put_char(&buffer, &left, '\t');
proc_put_long(&buffer, &left, val, false);
--
2.30.2

2022-07-06 23:53:02

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v2 net 01/12] sysctl: Fix data races in proc_dointvec().

A sysctl variable is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.

This patch changes proc_dointvec() to use READ_ONCE() and WRITE_ONCE()
internally to fix data-races on the sysctl side. For now, proc_dointvec()
itself is tolerant to a data-race, but we still need to add annotations on
the other subsystem's side.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
kernel/sysctl.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index e52b6e372c60..c8a05655ae60 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -446,14 +446,14 @@ static int do_proc_dointvec_conv(bool *negp, unsigned long *lvalp,
if (*negp) {
if (*lvalp > (unsigned long) INT_MAX + 1)
return -EINVAL;
- *valp = -*lvalp;
+ WRITE_ONCE(*valp, -*lvalp);
} else {
if (*lvalp > (unsigned long) INT_MAX)
return -EINVAL;
- *valp = *lvalp;
+ WRITE_ONCE(*valp, *lvalp);
}
} else {
- int val = *valp;
+ int val = READ_ONCE(*valp);
if (val < 0) {
*negp = true;
*lvalp = -(unsigned long)val;
--
2.30.2

2022-07-06 23:53:38

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v2 net 11/12] icmp: Fix data-races around sysctl.

While reading icmp sysctl variables, they can be changed concurrently.
So, we need to add READ_ONCE() to avoid data-races.

Fixes: 4cdf507d5452 ("icmp: add a global rate limitation")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
net/ipv4/icmp.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index efea0e796f06..0f9e61d29f73 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -253,11 +253,12 @@ bool icmp_global_allow(void)
spin_lock(&icmp_global.lock);
delta = min_t(u32, now - icmp_global.stamp, HZ);
if (delta >= HZ / 50) {
- incr = sysctl_icmp_msgs_per_sec * delta / HZ ;
+ incr = READ_ONCE(sysctl_icmp_msgs_per_sec) * delta / HZ;
if (incr)
WRITE_ONCE(icmp_global.stamp, now);
}
- credit = min_t(u32, icmp_global.credit + incr, sysctl_icmp_msgs_burst);
+ credit = min_t(u32, icmp_global.credit + incr,
+ READ_ONCE(sysctl_icmp_msgs_burst));
if (credit) {
/* We want to use a credit of one in average, but need to randomize
* it for security reasons.
--
2.30.2

2022-07-06 23:54:08

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v2 net 09/12] net: Fix data-races around sysctl_mem.

While reading .sysctl_mem, it can be changed concurrently.
So, we need to add READ_ONCE() to avoid data-races.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
include/net/sock.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 72ca97ccb460..9fa54762e077 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1529,7 +1529,7 @@ void __sk_mem_reclaim(struct sock *sk, int amount);
/* sysctl_mem values are in pages, we convert them in SK_MEM_QUANTUM units */
static inline long sk_prot_mem_limits(const struct sock *sk, int index)
{
- long val = sk->sk_prot->sysctl_mem[index];
+ long val = READ_ONCE(sk->sk_prot->sysctl_mem[index]);

#if PAGE_SIZE > SK_MEM_QUANTUM
val <<= PAGE_SHIFT - SK_MEM_QUANTUM_SHIFT;
--
2.30.2

2022-07-06 23:57:28

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v2 net 07/12] tcp: Fix a data-race around sysctl_tcp_max_orphans.

While reading sysctl_tcp_max_orphans, it can be changed concurrently.
So, we need to add READ_ONCE() to avoid a data-race.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
net/ipv4/tcp.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 028513d3e2a2..2222dfdde316 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2715,7 +2715,8 @@ static void tcp_orphan_update(struct timer_list *unused)

static bool tcp_too_many_orphans(int shift)
{
- return READ_ONCE(tcp_orphan_cache) << shift > sysctl_tcp_max_orphans;
+ return READ_ONCE(tcp_orphan_cache) << shift >
+ READ_ONCE(sysctl_tcp_max_orphans);
}

bool tcp_check_oom(struct sock *sk, int shift)
--
2.30.2

2022-07-07 00:09:45

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v2 net 03/12] sysctl: Fix data races in proc_dointvec_minmax().

A sysctl variable is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.

This patch changes proc_dointvec_minmax() to use READ_ONCE() and
WRITE_ONCE() internally to fix data-races on the sysctl side. For now,
proc_dointvec_minmax() itself is tolerant to a data-race, but we still
need to add annotations on the other subsystem's side.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
kernel/sysctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 2ab8c2a37e8f..4d87832367f2 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -857,7 +857,7 @@ static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp,
if ((param->min && *param->min > tmp) ||
(param->max && *param->max < tmp))
return -EINVAL;
- *valp = tmp;
+ WRITE_ONCE(*valp, tmp);
}

return 0;
--
2.30.2

2022-07-07 00:12:33

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v2 net 02/12] sysctl: Fix data races in proc_douintvec().

A sysctl variable is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.

This patch changes proc_douintvec() to use READ_ONCE() and WRITE_ONCE()
internally to fix data-races on the sysctl side. For now, proc_douintvec()
itself is tolerant to a data-race, but we still need to add annotations on
the other subsystem's side.

Fixes: e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32 fields")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
CC: Subash Abhinov Kasiviswanathan <[email protected]>
---
kernel/sysctl.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c8a05655ae60..2ab8c2a37e8f 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -472,9 +472,9 @@ static int do_proc_douintvec_conv(unsigned long *lvalp,
if (write) {
if (*lvalp > UINT_MAX)
return -EINVAL;
- *valp = *lvalp;
+ WRITE_ONCE(*valp, *lvalp);
} else {
- unsigned int val = *valp;
+ unsigned int val = READ_ONCE(*valp);
*lvalp = (unsigned long)val;
}
return 0;
--
2.30.2

2022-07-07 00:16:12

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v2 net 08/12] inetpeer: Fix data-races around sysctl.

While reading inetpeer sysctl variables, they can be changed
concurrently. So, we need to add READ_ONCE() to avoid data-races.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
net/ipv4/inetpeer.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index da21dfce24d7..e9fed83e9b3c 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -141,16 +141,20 @@ static void inet_peer_gc(struct inet_peer_base *base,
struct inet_peer *gc_stack[],
unsigned int gc_cnt)
{
+ int peer_threshold, peer_maxttl, peer_minttl;
struct inet_peer *p;
__u32 delta, ttl;
int i;

- if (base->total >= inet_peer_threshold)
+ peer_threshold = READ_ONCE(inet_peer_threshold);
+ peer_maxttl = READ_ONCE(inet_peer_maxttl);
+ peer_minttl = READ_ONCE(inet_peer_minttl);
+
+ if (base->total >= peer_threshold)
ttl = 0; /* be aggressive */
else
- ttl = inet_peer_maxttl
- - (inet_peer_maxttl - inet_peer_minttl) / HZ *
- base->total / inet_peer_threshold * HZ;
+ ttl = peer_maxttl - (peer_maxttl - peer_minttl) / HZ *
+ base->total / peer_threshold * HZ;
for (i = 0; i < gc_cnt; i++) {
p = gc_stack[i];

--
2.30.2

2022-07-07 00:43:40

by Kuniyuki Iwashima

[permalink] [raw]
Subject: [PATCH v2 net 10/12] cipso: Fix data-races around sysctl.

While reading cipso sysctl variables, they can be changed concurrently.
So, we need to add READ_ONCE() to avoid data-races.

Fixes: 446fda4f2682 ("[NetLabel]: CIPSOv4 engine")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
CC: Paul Moore <[email protected]>
---
Documentation/networking/ip-sysctl.rst | 2 +-
net/ipv4/cipso_ipv4.c | 12 +++++++-----
2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 9f41961d11d5..0e58001f8580 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1085,7 +1085,7 @@ cipso_cache_enable - BOOLEAN
cipso_cache_bucket_size - INTEGER
The CIPSO label cache consists of a fixed size hash table with each
hash bucket containing a number of cache entries. This variable limits
- the number of entries in each hash bucket; the larger the value the
+ the number of entries in each hash bucket; the larger the value is, the
more CIPSO label mappings that can be cached. When the number of
entries in a given hash bucket reaches this limit adding new entries
causes the oldest entry in the bucket to be removed to make room.
diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index 62d5f99760aa..6cd3b6c559f0 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -239,7 +239,7 @@ static int cipso_v4_cache_check(const unsigned char *key,
struct cipso_v4_map_cache_entry *prev_entry = NULL;
u32 hash;

- if (!cipso_v4_cache_enabled)
+ if (!READ_ONCE(cipso_v4_cache_enabled))
return -ENOENT;

hash = cipso_v4_map_cache_hash(key, key_len);
@@ -296,13 +296,14 @@ static int cipso_v4_cache_check(const unsigned char *key,
int cipso_v4_cache_add(const unsigned char *cipso_ptr,
const struct netlbl_lsm_secattr *secattr)
{
+ int bkt_size = READ_ONCE(cipso_v4_cache_bucketsize);
int ret_val = -EPERM;
u32 bkt;
struct cipso_v4_map_cache_entry *entry = NULL;
struct cipso_v4_map_cache_entry *old_entry = NULL;
u32 cipso_ptr_len;

- if (!cipso_v4_cache_enabled || cipso_v4_cache_bucketsize <= 0)
+ if (!READ_ONCE(cipso_v4_cache_enabled) || bkt_size <= 0)
return 0;

cipso_ptr_len = cipso_ptr[1];
@@ -322,7 +323,7 @@ int cipso_v4_cache_add(const unsigned char *cipso_ptr,

bkt = entry->hash & (CIPSO_V4_CACHE_BUCKETS - 1);
spin_lock_bh(&cipso_v4_cache[bkt].lock);
- if (cipso_v4_cache[bkt].size < cipso_v4_cache_bucketsize) {
+ if (cipso_v4_cache[bkt].size < bkt_size) {
list_add(&entry->list, &cipso_v4_cache[bkt].list);
cipso_v4_cache[bkt].size += 1;
} else {
@@ -1199,7 +1200,8 @@ static int cipso_v4_gentag_rbm(const struct cipso_v4_doi *doi_def,
/* This will send packets using the "optimized" format when
* possible as specified in section 3.4.2.6 of the
* CIPSO draft. */
- if (cipso_v4_rbm_optfmt && ret_val > 0 && ret_val <= 10)
+ if (READ_ONCE(cipso_v4_rbm_optfmt) && ret_val > 0 &&
+ ret_val <= 10)
tag_len = 14;
else
tag_len = 4 + ret_val;
@@ -1603,7 +1605,7 @@ int cipso_v4_validate(const struct sk_buff *skb, unsigned char **option)
* all the CIPSO validations here but it doesn't
* really specify _exactly_ what we need to validate
* ... so, just make it a sysctl tunable. */
- if (cipso_v4_rbm_strictvalid) {
+ if (READ_ONCE(cipso_v4_rbm_strictvalid)) {
if (cipso_v4_map_lvl_valid(doi_def,
tag[3]) < 0) {
err_offset = opt_iter + 3;
--
2.30.2

2022-07-07 19:36:24

by Paul Moore

[permalink] [raw]
Subject: Re: [PATCH v2 net 10/12] cipso: Fix data-races around sysctl.

On Wed, Jul 6, 2022 at 7:43 PM Kuniyuki Iwashima <[email protected]> wrote:
>
> While reading cipso sysctl variables, they can be changed concurrently.
> So, we need to add READ_ONCE() to avoid data-races.
>
> Fixes: 446fda4f2682 ("[NetLabel]: CIPSOv4 engine")
> Signed-off-by: Kuniyuki Iwashima <[email protected]>
> ---
> CC: Paul Moore <[email protected]>

Thanks for the patch, this looks good to me. However, in the future
you should probably drop the extra "---" separator (just leave the one
before the diffstat below) and move my "Cc:" up above "Fixes:".

Acked-by: Paul Moore <[email protected]>

> ---
> Documentation/networking/ip-sysctl.rst | 2 +-
> net/ipv4/cipso_ipv4.c | 12 +++++++-----
> 2 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> index 9f41961d11d5..0e58001f8580 100644
> --- a/Documentation/networking/ip-sysctl.rst
> +++ b/Documentation/networking/ip-sysctl.rst
> @@ -1085,7 +1085,7 @@ cipso_cache_enable - BOOLEAN
> cipso_cache_bucket_size - INTEGER
> The CIPSO label cache consists of a fixed size hash table with each
> hash bucket containing a number of cache entries. This variable limits
> - the number of entries in each hash bucket; the larger the value the
> + the number of entries in each hash bucket; the larger the value is, the
> more CIPSO label mappings that can be cached. When the number of
> entries in a given hash bucket reaches this limit adding new entries
> causes the oldest entry in the bucket to be removed to make room.
> diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
> index 62d5f99760aa..6cd3b6c559f0 100644
> --- a/net/ipv4/cipso_ipv4.c
> +++ b/net/ipv4/cipso_ipv4.c
> @@ -239,7 +239,7 @@ static int cipso_v4_cache_check(const unsigned char *key,
> struct cipso_v4_map_cache_entry *prev_entry = NULL;
> u32 hash;
>
> - if (!cipso_v4_cache_enabled)
> + if (!READ_ONCE(cipso_v4_cache_enabled))
> return -ENOENT;
>
> hash = cipso_v4_map_cache_hash(key, key_len);
> @@ -296,13 +296,14 @@ static int cipso_v4_cache_check(const unsigned char *key,
> int cipso_v4_cache_add(const unsigned char *cipso_ptr,
> const struct netlbl_lsm_secattr *secattr)
> {
> + int bkt_size = READ_ONCE(cipso_v4_cache_bucketsize);
> int ret_val = -EPERM;
> u32 bkt;
> struct cipso_v4_map_cache_entry *entry = NULL;
> struct cipso_v4_map_cache_entry *old_entry = NULL;
> u32 cipso_ptr_len;
>
> - if (!cipso_v4_cache_enabled || cipso_v4_cache_bucketsize <= 0)
> + if (!READ_ONCE(cipso_v4_cache_enabled) || bkt_size <= 0)
> return 0;
>
> cipso_ptr_len = cipso_ptr[1];
> @@ -322,7 +323,7 @@ int cipso_v4_cache_add(const unsigned char *cipso_ptr,
>
> bkt = entry->hash & (CIPSO_V4_CACHE_BUCKETS - 1);
> spin_lock_bh(&cipso_v4_cache[bkt].lock);
> - if (cipso_v4_cache[bkt].size < cipso_v4_cache_bucketsize) {
> + if (cipso_v4_cache[bkt].size < bkt_size) {
> list_add(&entry->list, &cipso_v4_cache[bkt].list);
> cipso_v4_cache[bkt].size += 1;
> } else {
> @@ -1199,7 +1200,8 @@ static int cipso_v4_gentag_rbm(const struct cipso_v4_doi *doi_def,
> /* This will send packets using the "optimized" format when
> * possible as specified in section 3.4.2.6 of the
> * CIPSO draft. */
> - if (cipso_v4_rbm_optfmt && ret_val > 0 && ret_val <= 10)
> + if (READ_ONCE(cipso_v4_rbm_optfmt) && ret_val > 0 &&
> + ret_val <= 10)
> tag_len = 14;
> else
> tag_len = 4 + ret_val;
> @@ -1603,7 +1605,7 @@ int cipso_v4_validate(const struct sk_buff *skb, unsigned char **option)
> * all the CIPSO validations here but it doesn't
> * really specify _exactly_ what we need to validate
> * ... so, just make it a sysctl tunable. */
> - if (cipso_v4_rbm_strictvalid) {
> + if (READ_ONCE(cipso_v4_rbm_strictvalid)) {
> if (cipso_v4_map_lvl_valid(doi_def,
> tag[3]) < 0) {
> err_offset = opt_iter + 3;
> --
> 2.30.2

--
paul-moore.com

2022-07-07 22:33:47

by Kuniyuki Iwashima

[permalink] [raw]
Subject: Re: [PATCH v2 net 10/12] cipso: Fix data-races around sysctl.

From: Paul Moore <[email protected]>
Date: Thu, 7 Jul 2022 15:15:52 -0400
> On Wed, Jul 6, 2022 at 7:43 PM Kuniyuki Iwashima <[email protected]> wrote:
> >
> > While reading cipso sysctl variables, they can be changed concurrently.
> > So, we need to add READ_ONCE() to avoid data-races.
> >
> > Fixes: 446fda4f2682 ("[NetLabel]: CIPSOv4 engine")
> > Signed-off-by: Kuniyuki Iwashima <[email protected]>
> > ---
> > CC: Paul Moore <[email protected]>
>
> Thanks for the patch, this looks good to me. However, in the future
> you should probably drop the extra "---" separator (just leave the one
> before the diffstat below) and move my "Cc:" up above "Fixes:".
>
> Acked-by: Paul Moore <[email protected]>

I was wondering if both CC and Acked-by should stay in each commit, but
will do so in the next time.

Thank you for taking a look!

2022-07-08 00:42:56

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH v2 net 10/12] cipso: Fix data-races around sysctl.

On Thu, 7 Jul 2022 15:15:37 -0700 Kuniyuki Iwashima wrote:
> I was wondering if both CC and Acked-by should stay in each commit, but
> will do so in the next time.

For Paul only, don't take that as general advice.

2022-07-08 00:46:22

by Kuniyuki Iwashima

[permalink] [raw]
Subject: Re: [PATCH v2 net 10/12] cipso: Fix data-races around sysctl.

From: Jakub Kicinski <[email protected]>
Date: Thu, 7 Jul 2022 17:24:24 -0700
> On Thu, 7 Jul 2022 15:15:37 -0700 Kuniyuki Iwashima wrote:
> > I was wondering if both CC and Acked-by should stay in each commit, but
> > will do so in the next time.
>
> For Paul only, don't take that as general advice.

...got it, thanks :)

2022-07-08 12:47:29

by patchwork-bot+netdevbpf

[permalink] [raw]
Subject: Re: [PATCH v2 net 00/12] sysctl: Fix data-races around ipv4_table.

Hello:

This series was applied to netdev/net.git (master)
by David S. Miller <[email protected]>:

On Wed, 6 Jul 2022 16:39:51 -0700 you wrote:
> A sysctl variable is accessed concurrently, and there is always a chance
> of data-race. So, all readers and writers need some basic protection to
> avoid load/store-tearing.
>
> The first half of this series changes some proc handlers used in ipv4_table
> to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the
> sysctl side. Then, the second half adds READ_ONCE() to the other readers
> of ipv4_table.
>
> [...]

Here is the summary with links:
- [v2,net,01/12] sysctl: Fix data races in proc_dointvec().
https://git.kernel.org/netdev/net/c/1f1be04b4d48
- [v2,net,02/12] sysctl: Fix data races in proc_douintvec().
https://git.kernel.org/netdev/net/c/4762b532ec95
- [v2,net,03/12] sysctl: Fix data races in proc_dointvec_minmax().
https://git.kernel.org/netdev/net/c/f613d86d014b
- [v2,net,04/12] sysctl: Fix data races in proc_douintvec_minmax().
https://git.kernel.org/netdev/net/c/2d3b559df3ed
- [v2,net,05/12] sysctl: Fix data races in proc_doulongvec_minmax().
https://git.kernel.org/netdev/net/c/c31bcc8fb89f
- [v2,net,06/12] sysctl: Fix data races in proc_dointvec_jiffies().
https://git.kernel.org/netdev/net/c/e87782087766
- [v2,net,07/12] tcp: Fix a data-race around sysctl_tcp_max_orphans.
https://git.kernel.org/netdev/net/c/47e6ab24e8c6
- [v2,net,08/12] inetpeer: Fix data-races around sysctl.
https://git.kernel.org/netdev/net/c/3d32edf1f3c3
- [v2,net,09/12] net: Fix data-races around sysctl_mem.
https://git.kernel.org/netdev/net/c/310731e2f161
- [v2,net,10/12] cipso: Fix data-races around sysctl.
https://git.kernel.org/netdev/net/c/dd44f04b9214
- [v2,net,11/12] icmp: Fix data-races around sysctl.
https://git.kernel.org/netdev/net/c/48d7ee321ea5
- [v2,net,12/12] ipv4: Fix a data-race around sysctl_fib_sync_mem.
https://git.kernel.org/netdev/net/c/73318c4b7dbd

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


2022-07-09 09:45:40

by Matthieu Baerts

[permalink] [raw]
Subject: Re: [PATCH v2 net 09/12] net: Fix data-races around sysctl_mem.

Hello,

On 07/07/2022 01:40, Kuniyuki Iwashima wrote:
> While reading .sysctl_mem, it can be changed concurrently.
> So, we need to add READ_ONCE() to avoid data-races.

FYI, we got a small conflict when merging -net in net-next in the MPTCP
tree due to this patch applied in -net:

310731e2f161 ("net: Fix data-races around sysctl_mem.")

and this one from net-next:

e70f3c701276 ("Revert "net: set SK_MEM_QUANTUM to 4096"")

The conflict has been resolved on our side[1] and the resolution we
suggest is attached to this email.

I'm sharing this thinking it can help others but if it only creates
noise, please tell me! :)

Cheers,
Matt

[1] https://github.com/multipath-tcp/mptcp_net-next/commit/b01bda9d0fe6
--
Tessares | Belgium | Hybrid Access Solutions
http://www.tessares.net


Attachments:
b01bda9d0fe6.patch (836.00 B)