2022-07-06 05:35:32

by Kuniyuki Iwashima

Subject: [PATCH v1 net 03/16] sysctl: Add proc_dointvec_lockless().

A sysctl variable is accessed concurrently, and there is always a chance of
data-race. So, all readers and writers need some basic protection to avoid
load/store-tearing.

This patch changes proc_dointvec() to use READ_ONCE()/WRITE_ONCE()
internally to fix a data-race on the sysctl side. For now, proc_dointvec()
itself is tolerant to a data-race, but we still need to add annotations on
the other subsystem's side.

In case we miss such fixes, this patch converts proc_dointvec() to a
wrapper of proc_dointvec_lockless(). When we fix a data-race in the other
subsystem, we can explicitly set it as a handler.

Also, this patch moves the kernel-doc comment from proc_dointvec() to
proc_dointvec_lockless() so that no one will use proc_dointvec()
anymore.

While we are at it, we remove some trailing spaces.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
include/linux/sysctl.h | 1 +
kernel/sysctl.c | 27 +++++++++++++++++++--------
2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index fcafc16abbad..cb87919b5508 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -84,6 +84,7 @@ PROC_HANDLER(proc_do_large_bitmap);
PROC_HANDLER(proc_do_static_key);

PROC_HANDLER(proc_dobool_lockless);
+PROC_HANDLER(proc_dointvec_lockless);

/*
* Register a set of sysctl names by calling register_sysctl_table
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index bc6fcc64eeaf..50d9b78aa0b3 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -445,14 +445,17 @@ static int do_proc_dointvec_conv(bool *negp, unsigned long *lvalp,
if (*negp) {
if (*lvalp > (unsigned long) INT_MAX + 1)
return -EINVAL;
- *valp = -*lvalp;
+
+ WRITE_ONCE(*valp, -*lvalp);
} else {
if (*lvalp > (unsigned long) INT_MAX)
return -EINVAL;
- *valp = *lvalp;
+
+ WRITE_ONCE(*valp, *lvalp);
}
} else {
- int val = *valp;
+ int val = READ_ONCE(*valp);
+
if (val < 0) {
*negp = true;
*lvalp = -(unsigned long)val;
@@ -491,12 +494,12 @@ static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table,
int *i, vleft, first = 1, err = 0;
size_t left;
char *p;
-
+
if (!tbl_data || !table->maxlen || !*lenp || (*ppos && !write)) {
*lenp = 0;
return 0;
}
-
+
i = (int *) tbl_data;
vleft = table->maxlen / sizeof(*i);
left = *lenp;
@@ -726,7 +729,7 @@ int proc_dobool(struct ctl_table *table, int write, void *buffer,
}

/**
- * proc_dointvec - read a vector of integers
+ * proc_dointvec_lockless - read/write a vector of integers locklessly
* @table: the sysctl table
* @write: %TRUE if this is a write to the sysctl file
* @buffer: the user buffer
@@ -734,14 +737,20 @@ int proc_dobool(struct ctl_table *table, int write, void *buffer,
* @ppos: file position
*
* Reads/writes up to table->maxlen/sizeof(unsigned int) integer
- * values from/to the user buffer, treated as an ASCII string.
+ * values from/to the user buffer, treated as an ASCII string.
*
* Returns 0 on success.
*/
+int proc_dointvec_lockless(struct ctl_table *table, int write, void *buffer,
+ size_t *lenp, loff_t *ppos)
+{
+ return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL);
+}
+
int proc_dointvec(struct ctl_table *table, int write, void *buffer,
size_t *lenp, loff_t *ppos)
{
- return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL);
+ return proc_dointvec_lockless(table, write, buffer, lenp, ppos);
}

#ifdef CONFIG_COMPACTION
@@ -1503,6 +1512,7 @@ PROC_HANDLER_ENOSYS(proc_do_cad_pid);
PROC_HANDLER_ENOSYS(proc_do_large_bitmap);

PROC_HANDLER_ENOSYS(proc_dobool_lockless);
+PROC_HANDLER_ENOSYS(proc_dointvec_lockless);

#endif /* CONFIG_PROC_SYSCTL */

@@ -2414,3 +2424,4 @@ EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
EXPORT_SYMBOL(proc_do_large_bitmap);

EXPORT_SYMBOL(proc_dobool_lockless);
+EXPORT_SYMBOL(proc_dointvec_lockless);
--
2.30.2


2022-07-06 07:41:14

by Eric Dumazet

Subject: Re: [PATCH v1 net 03/16] sysctl: Add proc_dointvec_lockless().

On Wed, Jul 6, 2022 at 7:22 AM Kuniyuki Iwashima <[email protected]> wrote:
>
> A sysctl variable is accessed concurrently, and there is always a chance of
> data-race. So, all readers and writers need some basic protection to avoid
> load/store-tearing.
>
> This patch changes proc_dointvec() to use READ_ONCE()/WRITE_ONCE()
> internally to fix a data-race on the sysctl side. For now, proc_dointvec()
> itself is tolerant to a data-race, but we still need to add annotations on
> the other subsystem's side.
>
> In case we miss such fixes, this patch converts proc_dointvec() to a
> wrapper of proc_dointvec_lockless(). When we fix a data-race in the other
> subsystem, we can explicitly set it as a handler.
>
> Also, this patch moves the kernel-doc comment from proc_dointvec() to
> proc_dointvec_lockless() so that no one will use proc_dointvec()
> anymore.
>
> While we are at it, we remove some trailing spaces.


I do not see why you add more functions.

Really all sysctls can change locklessly by nature, as I pointed out.

So I would simply add WRITE_ONCE() whenever they are written, and
READ_ONCE() when they are read.

If stable teams care enough, they will have to backport these changes,
so I would rather not change proc_dointvec() to proc_dointvec_lockless()
in many files, with many conflicts, which would ultimately either add
bugs or create extra work for maintainers.
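
The approach suggested above (annotate every write with WRITE_ONCE() and every read with READ_ONCE(), without introducing new handler names) can be sketched in plain userspace C. This is only an illustration under stated assumptions: the two macros are simplified stand-ins for the kernel's versions and rely on the GCC/Clang `__typeof__` extension, and `sysctl_example_limit`, `example_set_limit()` and `example_clamp()` are hypothetical names that do not appear in the patch.

```c
#include <assert.h>

/*
 * Simplified userspace stand-ins for the kernel macros (assumption: the
 * real definitions do additional type checking). The volatile access
 * asks the compiler to emit the load or store as written instead of
 * splitting, repeating, or eliding it.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical knob standing in for a variable exposed through sysctl. */
static int sysctl_example_limit = 128;

/* Writer side: roughly what a sysctl handler does when the knob is set. */
static void example_set_limit(int new_limit)
{
	WRITE_ONCE(sysctl_example_limit, new_limit);
}

/*
 * Reader side: take one snapshot and use only the local copy afterwards,
 * so a concurrent write cannot change the value between the check and
 * the use.
 */
static int example_clamp(int requested)
{
	int limit = READ_ONCE(sysctl_example_limit);

	if (limit < 0)
		return 0;
	return requested > limit ? limit : requested;
}

/* Small single-threaded self-check of the helpers above. */
static void example_selfcheck(void)
{
	example_set_limit(64);
	assert(example_clamp(100) == 64);
	example_set_limit(128);
}
```

The point of the local snapshot in `example_clamp()` is that the reader never reads the shared variable twice, so no new handler name is needed on the sysctl side for the reader to be annotated correctly.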

2022-07-06 16:25:46

by Kuniyuki Iwashima

Subject: Re: [PATCH v1 net 03/16] sysctl: Add proc_dointvec_lockless().

From: Eric Dumazet <[email protected]>
Date: Wed, 6 Jul 2022 09:00:11 +0200
> On Wed, Jul 6, 2022 at 7:22 AM Kuniyuki Iwashima <[email protected]> wrote:
> >
> > A sysctl variable is accessed concurrently, and there is always a chance of
> > data-race. So, all readers and writers need some basic protection to avoid
> > load/store-tearing.
> >
> > This patch changes proc_dointvec() to use READ_ONCE()/WRITE_ONCE()
> > internally to fix a data-race on the sysctl side. For now, proc_dointvec()
> > itself is tolerant to a data-race, but we still need to add annotations on
> > the other subsystem's side.
> >
> > In case we miss such fixes, this patch converts proc_dointvec() to a
> > wrapper of proc_dointvec_lockless(). When we fix a data-race in the other
> > subsystem, we can explicitly set it as a handler.
> >
> > Also, this patch moves the kernel-doc comment from proc_dointvec() to
> > proc_dointvec_lockless() so that no one will use proc_dointvec()
> > anymore.
> >
> > While we are at it, we remove some trailing spaces.
>
>
> I do not see why you add more functions.

The intent was to make it easy to see which places still need fixes,
and to make sure newly added sysctl knobs are taken care of.


> Really all sysctls can change locklessly by nature, as I pointed out.
>
> So I would simply add WRITE_ONCE() whenever they are written, and
> READ_ONCE() when they are read.
>
> If stable teams care enough, they will have to backport these changes,
> so I would rather not change proc_dointvec() to proc_dointvec_lockless()
> in many files, with many conflicts, which would ultimately either add
> bugs or create extra work for maintainers.

Indeed, I will drop such changes and just add annotations in *_conv().
Thank you!
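
As a rough illustration of what "just add annotations in *_conv()" amounts to, the following userspace sketch mirrors the logic of the do_proc_dointvec_conv() hunk from the patch above. It is not the follow-up patch itself: `example_conv()` is a hypothetical name, the macros are simplified stand-ins for the kernel's (using the GCC/Clang `__typeof__` extension), and the conversion of an out-of-range unsigned value to `int` is assumed to wrap as it does with GCC and Clang.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Simplified userspace stand-ins for the kernel macros (assumption). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/*
 * Convert between a (sign, magnitude) pair and the int stored in *valp.
 * On write, the magnitude is range-checked and stored with a single
 * annotated store; on read, the int is loaded once and split into sign
 * and magnitude.
 */
static int example_conv(bool *negp, unsigned long *lvalp, int *valp,
			int write)
{
	if (write) {
		if (*negp) {
			/* INT_MIN has magnitude INT_MAX + 1. */
			if (*lvalp > (unsigned long)INT_MAX + 1)
				return -EINVAL;

			WRITE_ONCE(*valp, -*lvalp);
		} else {
			if (*lvalp > (unsigned long)INT_MAX)
				return -EINVAL;

			WRITE_ONCE(*valp, *lvalp);
		}
	} else {
		int val = READ_ONCE(*valp);

		if (val < 0) {
			*negp = true;
			*lvalp = -(unsigned long)val;
		} else {
			*negp = false;
			*lvalp = (unsigned long)val;
		}
	}
	return 0;
}
```

Because every handler built on the conversion callback goes through this one function, annotating it covers the sysctl side for all existing users without renaming anything; the readers in each subsystem still need their own READ_ONCE() annotations.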