2022-07-06 06:00:29

by Kuniyuki Iwashima

Subject: [PATCH v1 net 00/16] sysctl: Fix data-races around ipv4_table.

A sysctl variable is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.

This series changes some proc handlers to use READ_ONCE()/WRITE_ONCE()
internally and tries to fix a data-race on the sysctl side. However, we
still need a fix for readers/writers in other subsystems.

So that no fix is missed, we convert each such handler into a wrapper around
a new function with the "_lockless" suffix. Once the readers and writers in
the other subsystem are fixed, we set the lockless handler as .proc_handler
to mark the sysctl knob as safe.

After this series, a proc handler without the lockless suffix means that its
users in other subsystems still need fixes. Finally, once no users of the
proc handlers without the lockless suffix remain, we can remove those
handlers and be free of sysctl data-races.
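
As a rough userspace model of this naming convention (not the kernel code),
the sketch below keeps the old handler as a thin wrapper and uses the handler
pointer in a table entry to tell audited knobs from unaudited ones. All names
with the my_ prefix, the two knobs, and the counting helper are hypothetical;
plain volatile accesses stand in for READ_ONCE()/WRITE_ONCE().

```c
#include <stddef.h>

typedef int (*my_handler_t)(int *data, int write, int *val);

/* Lockless variant: every access to *data is a single volatile load/store. */
static int my_dointvec_lockless(int *data, int write, int *val)
{
	if (write)
		*(volatile int *)data = *val;
	else
		*val = *(volatile int *)data;
	return 0;
}

/* Old name kept as a wrapper; using it means "readers not audited yet". */
static int my_dointvec(int *data, int write, int *val)
{
	return my_dointvec_lockless(data, write, val);
}

/* Hypothetical, heavily reduced stand-in for struct ctl_table. */
struct my_ctl_entry {
	const char *procname;
	int *data;
	my_handler_t proc_handler;
};

static int my_knob_a, my_knob_b;

static const struct my_ctl_entry my_table[] = {
	{ "knob_a", &my_knob_a, my_dointvec_lockless },	/* readers annotated */
	{ "knob_b", &my_knob_b, my_dointvec },		/* still to be audited */
};

/* Count entries that still use the wrapper, i.e. knobs left to fix. */
static size_t my_count_unaudited(const struct my_ctl_entry *t, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (t[i].proc_handler == my_dointvec)
			count++;
	return count;
}
```

In this model, the wrapper can be deleted once my_count_unaudited() reaches
zero for every table, which mirrors the end state described above.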

This series starts with the fixes for ipv4_table.


Kuniyuki Iwashima (16):
sysctl: Clean up proc_handler definitions.
sysctl: Add proc_dobool_lockless().
sysctl: Add proc_dointvec_lockless().
sysctl: Add proc_douintvec_lockless().
sysctl: Add proc_dointvec_minmax_lockless().
sysctl: Add proc_douintvec_minmax_lockless().
sysctl: Add proc_doulongvec_minmax_lockless().
sysctl: Add proc_dointvec_jiffies_lockless().
tcp: Fix a data-race around sysctl_tcp_max_orphans.
inetpeer: Fix data-races around sysctl.
net: Fix a data-race around sysctl_mem.
tcp: Mark sysctl_tcp_low_latency obsolete.
cipso: Fix a data-race around cipso_v4_cache_bucketsize.
cipso: Fix data-races around boolean sysctl.
icmp: Fix data-races around sysctl.
ipv4: Fix a data-race around sysctl_fib_sync_mem.

Documentation/networking/ip-sysctl.rst | 2 +-
include/linux/sysctl.h | 51 ++---
include/net/sock.h | 2 +-
include/trace/events/sock.h | 6 +-
kernel/sysctl.c | 258 ++++++++++++++-----------
net/decnet/sysctl_net_decnet.c | 2 +-
net/ipv4/cipso_ipv4.c | 19 +-
net/ipv4/fib_trie.c | 2 +-
net/ipv4/icmp.c | 5 +-
net/ipv4/inetpeer.c | 13 +-
net/ipv4/sysctl_net_ipv4.c | 29 +--
net/ipv4/tcp.c | 3 +-
net/sctp/sysctl.c | 2 +-
13 files changed, 214 insertions(+), 180 deletions(-)

--
2.30.2


2022-07-06 06:00:45

by Kuniyuki Iwashima

Subject: [PATCH v1 net 11/16] net: Fix a data-race around sysctl_mem.

While reading .sysctl_mem, it can be changed concurrently. So, we need to
add READ_ONCE(). Then we can set proc_doulongvec_minmax_lockless() as the
handler to mark it safe.
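
A minimal userspace sketch of the reader side, loosely modelled on
sk_prot_mem_limits(): the array element is loaded exactly once into a local,
and only the local is shifted. MY_READ_ONCE() is a simplified stand-in for
the kernel macro (it relies on the GNU C __typeof__ extension), and the
array contents and both shift values are assumptions chosen for illustration.

```c
/* Simplified stand-in for READ_ONCE(): one volatile load of x. */
#define MY_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Assumed values: 64K pages and a 4K accounting quantum. */
#define MY_PAGE_SHIFT		16
#define MY_MEM_QUANTUM_SHIFT	12

/* Hypothetical stand-in for a protocol's sysctl_mem[] (values in pages). */
static long my_sysctl_mem[3] = { 10, 20, 30 };

/* Load the limit once, then convert pages to quantum units on the copy. */
static long my_mem_limit(int index)
{
	long val = MY_READ_ONCE(my_sysctl_mem[index]);

	val <<= MY_PAGE_SHIFT - MY_MEM_QUANTUM_SHIFT;
	return val;
}
```

With the assumed shifts, each page counts as 16 quantum units, so the first
limit of 10 pages becomes 160.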

Fixes: 3847ce32aea9 ("core: add tracepoints for queueing skb to rcvbuf")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
CC: Satoru Moriya <[email protected]>
CC: Steven Rostedt <[email protected]>
---
include/net/sock.h | 2 +-
include/trace/events/sock.h | 6 +++---
net/decnet/sysctl_net_decnet.c | 2 +-
net/ipv4/sysctl_net_ipv4.c | 4 ++--
net/sctp/sysctl.c | 2 +-
5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 72ca97ccb460..9fa54762e077 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1529,7 +1529,7 @@ void __sk_mem_reclaim(struct sock *sk, int amount);
/* sysctl_mem values are in pages, we convert them in SK_MEM_QUANTUM units */
static inline long sk_prot_mem_limits(const struct sock *sk, int index)
{
- long val = sk->sk_prot->sysctl_mem[index];
+ long val = READ_ONCE(sk->sk_prot->sysctl_mem[index]);

#if PAGE_SIZE > SK_MEM_QUANTUM
val <<= PAGE_SHIFT - SK_MEM_QUANTUM_SHIFT;
diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h
index 12c315782766..3c36c2812782 100644
--- a/include/trace/events/sock.h
+++ b/include/trace/events/sock.h
@@ -122,9 +122,9 @@ TRACE_EVENT(sock_exceed_buf_limit,

TP_printk("proto:%s sysctl_mem=%ld,%ld,%ld allocated=%ld sysctl_rmem=%d rmem_alloc=%d sysctl_wmem=%d wmem_alloc=%d wmem_queued=%d kind=%s",
__entry->name,
- __entry->sysctl_mem[0],
- __entry->sysctl_mem[1],
- __entry->sysctl_mem[2],
+ READ_ONCE(__entry->sysctl_mem[0]),
+ READ_ONCE(__entry->sysctl_mem[1]),
+ READ_ONCE(__entry->sysctl_mem[2]),
__entry->allocated,
__entry->sysctl_rmem,
__entry->rmem_alloc,
diff --git a/net/decnet/sysctl_net_decnet.c b/net/decnet/sysctl_net_decnet.c
index 67b5ab2657b7..e7e658f1ba67 100644
--- a/net/decnet/sysctl_net_decnet.c
+++ b/net/decnet/sysctl_net_decnet.c
@@ -315,7 +315,7 @@ static struct ctl_table dn_table[] = {
.data = &sysctl_decnet_mem,
.maxlen = sizeof(sysctl_decnet_mem),
.mode = 0644,
- .proc_handler = proc_doulongvec_minmax
+ .proc_handler = proc_doulongvec_minmax_lockless,
},
{
.procname = "decnet_rmem",
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index eea11218a663..b14931ca5c85 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -504,7 +504,7 @@ static struct ctl_table ipv4_table[] = {
.maxlen = sizeof(sysctl_tcp_mem),
.data = &sysctl_tcp_mem,
.mode = 0644,
- .proc_handler = proc_doulongvec_minmax,
+ .proc_handler = proc_doulongvec_minmax_lockless,
},
{
.procname = "tcp_low_latency",
@@ -570,7 +570,7 @@ static struct ctl_table ipv4_table[] = {
.data = &sysctl_udp_mem,
.maxlen = sizeof(sysctl_udp_mem),
.mode = 0644,
- .proc_handler = proc_doulongvec_minmax,
+ .proc_handler = proc_doulongvec_minmax_lockless,
},
{
.procname = "fib_sync_mem",
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index b46a416787ec..fa79bf4059d1 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -64,7 +64,7 @@ static struct ctl_table sctp_table[] = {
.data = &sysctl_sctp_mem,
.maxlen = sizeof(sysctl_sctp_mem),
.mode = 0644,
- .proc_handler = proc_doulongvec_minmax
+ .proc_handler = proc_doulongvec_minmax_lockless,
},
{
.procname = "sctp_rmem",
--
2.30.2

2022-07-06 06:01:17

by Kuniyuki Iwashima

Subject: [PATCH v1 net 09/16] tcp: Fix a data-race around sysctl_tcp_max_orphans.

While reading sysctl_tcp_max_orphans, it can be changed concurrently. So,
we need to add READ_ONCE(). Then we can set proc_dointvec_lockless() as
the handler to mark it safe.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
net/ipv4/sysctl_net_ipv4.c | 2 +-
net/ipv4/tcp.c | 3 ++-
2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index cd448cdd3b38..aa5adf136556 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -476,7 +476,7 @@ static struct ctl_table ipv4_table[] = {
.data = &sysctl_tcp_max_orphans,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec
+ .proc_handler = proc_dointvec_lockless,
},
{
.procname = "inet_peer_threshold",
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 028513d3e2a2..2222dfdde316 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2715,7 +2715,8 @@ static void tcp_orphan_update(struct timer_list *unused)

static bool tcp_too_many_orphans(int shift)
{
- return READ_ONCE(tcp_orphan_cache) << shift > sysctl_tcp_max_orphans;
+ return READ_ONCE(tcp_orphan_cache) << shift >
+ READ_ONCE(sysctl_tcp_max_orphans);
}

bool tcp_check_oom(struct sock *sk, int shift)
--
2.30.2

2022-07-06 06:01:44

by Kuniyuki Iwashima

Subject: [PATCH v1 net 04/16] sysctl: Add proc_douintvec_lockless().

A sysctl variable is accessed concurrently, and there is always a chance of
data-race. So, all readers and writers need some basic protection to avoid
load/store-tearing.

This patch changes proc_douintvec() to use READ_ONCE()/WRITE_ONCE()
internally to fix a data-race on the sysctl side. For now,
proc_douintvec() itself is tolerant to a data-race, but we still need to
add annotations on the other subsystem's side.

So that such fixes are not missed, this patch converts proc_douintvec() to a
wrapper of proc_douintvec_lockless(). When we fix a data-race in the other
subsystem, we can explicitly set proc_douintvec_lockless() as the handler.

Also, this patch removes the kernel-doc comment of proc_douintvec() and adds
one for proc_douintvec_lockless() so that no one will use proc_douintvec()
anymore.
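
The shape of the change can be sketched in userspace as follows. This is a
simplified model, not the kernel implementation: the my_ names and the
reduced three-argument signature are hypothetical, and MY_READ_ONCE() and
MY_WRITE_ONCE() are minimal stand-ins built on volatile accesses and the
GNU C __typeof__ extension.

```c
#include <limits.h>

#define MY_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define MY_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/*
 * Hypothetical reduced handler: on write, range-check *val and store it
 * into *data with one marked store; on read, load *data once into *val.
 */
static int my_douintvec_lockless(unsigned int *data, int write,
				 unsigned long *val)
{
	if (write) {
		if (*val > UINT_MAX)
			return -1;

		MY_WRITE_ONCE(*data, (unsigned int)*val);
	} else {
		unsigned int tmp = MY_READ_ONCE(*data);

		*val = (unsigned long)tmp;
	}
	return 0;
}

/* The old entry point only forwards, so existing users keep working. */
static int my_douintvec(unsigned int *data, int write, unsigned long *val)
{
	return my_douintvec_lockless(data, write, val);
}
```

Both entry points behave identically in this sketch; the only difference is
what the chosen name says about whether the knob's other readers and writers
have been annotated.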

Fixes: e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32 fields")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
CC: Subash Abhinov Kasiviswanathan <[email protected]>
---
include/linux/sysctl.h | 1 +
kernel/sysctl.c | 20 +++++++++++++++-----
2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index cb87919b5508..770ee1833c25 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -85,6 +85,7 @@ PROC_HANDLER(proc_do_static_key);

PROC_HANDLER(proc_dobool_lockless);
PROC_HANDLER(proc_dointvec_lockless);
+PROC_HANDLER(proc_douintvec_lockless);

/*
* Register a set of sysctl names by calling register_sysctl_table
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 50d9b78aa0b3..be8a7d912180 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -474,9 +474,11 @@ static int do_proc_douintvec_conv(unsigned long *lvalp,
if (write) {
if (*lvalp > UINT_MAX)
return -EINVAL;
- *valp = *lvalp;
+
+ WRITE_ONCE(*valp, *lvalp);
} else {
- unsigned int val = *valp;
+ unsigned int val = READ_ONCE(*valp);
+
*lvalp = (unsigned long)val;
}
return 0;
@@ -775,7 +777,7 @@ static int proc_dointvec_minmax_warn_RT_change(struct ctl_table *table,
#endif

/**
- * proc_douintvec - read a vector of unsigned integers
+ * proc_douintvec_lockless - read/write a vector of unsigned integers locklessly
* @table: the sysctl table
* @write: %TRUE if this is a write to the sysctl file
* @buffer: the user buffer
@@ -787,13 +789,19 @@ static int proc_dointvec_minmax_warn_RT_change(struct ctl_table *table,
*
* Returns 0 on success.
*/
-int proc_douintvec(struct ctl_table *table, int write, void *buffer,
- size_t *lenp, loff_t *ppos)
+int proc_douintvec_lockless(struct ctl_table *table, int write, void *buffer,
+ size_t *lenp, loff_t *ppos)
{
return do_proc_douintvec(table, write, buffer, lenp, ppos,
do_proc_douintvec_conv, NULL);
}

+int proc_douintvec(struct ctl_table *table, int write, void *buffer,
+ size_t *lenp, loff_t *ppos)
+{
+ return proc_douintvec_lockless(table, write, buffer, lenp, ppos);
+}
+
/*
* Taint values can only be increased
* This means we can safely use a temporary.
@@ -1513,6 +1521,7 @@ PROC_HANDLER_ENOSYS(proc_do_large_bitmap);

PROC_HANDLER_ENOSYS(proc_dobool_lockless);
PROC_HANDLER_ENOSYS(proc_dointvec_lockless);
+PROC_HANDLER_ENOSYS(proc_douintvec_lockless);

#endif /* CONFIG_PROC_SYSCTL */

@@ -2425,3 +2434,4 @@ EXPORT_SYMBOL(proc_do_large_bitmap);

EXPORT_SYMBOL(proc_dobool_lockless);
EXPORT_SYMBOL(proc_dointvec_lockless);
+EXPORT_SYMBOL(proc_douintvec_lockless);
--
2.30.2

2022-07-06 06:28:52

by Kuniyuki Iwashima

Subject: [PATCH v1 net 15/16] icmp: Fix data-races around sysctl.

While reading sysctl variables, they can be changed concurrently. So, we
need to add READ_ONCE(). Then we can set proc_dointvec_minmax_lockless()
as the handler to mark them safe.
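
A reduced userspace model of the credit refill is sketched below. It keeps
only the arithmetic and the two marked loads; the spinlock, the timestamp
update, and the randomized credit consumption of icmp_global_allow() are
left out. MY_HZ, the initial knob values, and all my_ names are assumptions
made for the example.

```c
#include <stdint.h>

/* Simplified stand-in for READ_ONCE(): one volatile load of x. */
#define MY_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

#define MY_HZ 1000u	/* assumed tick rate */

/* Hypothetical stand-ins for the two sysctl knobs. */
static int my_msgs_per_sec = 1000;
static int my_msgs_burst = 50;

/* Return the new credit after delta ticks, capped at the burst limit. */
static uint32_t my_refill(uint32_t credit, uint32_t delta)
{
	uint32_t incr = 0;
	uint32_t burst;

	if (delta > MY_HZ)
		delta = MY_HZ;
	if (delta >= MY_HZ / 50)
		incr = (uint32_t)MY_READ_ONCE(my_msgs_per_sec) * delta / MY_HZ;

	/* Each knob is loaded once, so one call never sees two values. */
	burst = (uint32_t)MY_READ_ONCE(my_msgs_burst);
	credit += incr;
	return credit < burst ? credit : burst;
}
```

A concurrent writer may still change either knob between two calls; the
marked loads only ensure that each load within a call is a single,
untorn access.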

Fixes: 4cdf507d5452 ("icmp: add a global rate limitation")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
---
net/ipv4/icmp.c | 5 +++--
net/ipv4/sysctl_net_ipv4.c | 4 ++--
2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index efea0e796f06..0f9e61d29f73 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -253,11 +253,12 @@ bool icmp_global_allow(void)
spin_lock(&icmp_global.lock);
delta = min_t(u32, now - icmp_global.stamp, HZ);
if (delta >= HZ / 50) {
- incr = sysctl_icmp_msgs_per_sec * delta / HZ ;
+ incr = READ_ONCE(sysctl_icmp_msgs_per_sec) * delta / HZ;
if (incr)
WRITE_ONCE(icmp_global.stamp, now);
}
- credit = min_t(u32, icmp_global.credit + incr, sysctl_icmp_msgs_burst);
+ credit = min_t(u32, icmp_global.credit + incr,
+ READ_ONCE(sysctl_icmp_msgs_burst));
if (credit) {
/* We want to use a credit of one in average, but need to randomize
* it for security reasons.
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 706795a3b369..3b1d18be0857 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -555,7 +555,7 @@ static struct ctl_table ipv4_table[] = {
.data = &sysctl_icmp_msgs_per_sec,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec_minmax,
+ .proc_handler = proc_dointvec_minmax_lockless,
.extra1 = SYSCTL_ZERO,
},
{
@@ -563,7 +563,7 @@ static struct ctl_table ipv4_table[] = {
.data = &sysctl_icmp_msgs_burst,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec_minmax,
+ .proc_handler = proc_dointvec_minmax_lockless,
.extra1 = SYSCTL_ZERO,
},
{
--
2.30.2

2022-07-06 13:24:28

by Steven Rostedt

Subject: Re: [PATCH v1 net 11/16] net: Fix a data-race around sysctl_mem.

On Tue, 5 Jul 2022 22:21:25 -0700
Kuniyuki Iwashima <[email protected]> wrote:

> --- a/include/trace/events/sock.h
> +++ b/include/trace/events/sock.h
> @@ -122,9 +122,9 @@ TRACE_EVENT(sock_exceed_buf_limit,
>
> TP_printk("proto:%s sysctl_mem=%ld,%ld,%ld allocated=%ld sysctl_rmem=%d rmem_alloc=%d sysctl_wmem=%d wmem_alloc=%d wmem_queued=%d kind=%s",
> __entry->name,
> - __entry->sysctl_mem[0],
> - __entry->sysctl_mem[1],
> - __entry->sysctl_mem[2],
> + READ_ONCE(__entry->sysctl_mem[0]),
> + READ_ONCE(__entry->sysctl_mem[1]),
> + READ_ONCE(__entry->sysctl_mem[2]),

This is not reading anything to do with sysctl. It's reading the content of
what was recorded in the ring buffer.

That is, the READ_ONCE() here is not necessary, and if anything will break
user space parsing, as this is exported to user space to tell it how to
read the binary format in the ring buffer.

-- Steve


> __entry->allocated,
> __entry->sysctl_rmem,
> __entry->rmem_alloc,

2022-07-06 13:35:17

by Steven Rostedt

Subject: Re: [PATCH v1 net 11/16] net: Fix a data-race around sysctl_mem.

On Wed, 6 Jul 2022 09:17:07 -0400
Steven Rostedt <[email protected]> wrote:

> On Tue, 5 Jul 2022 22:21:25 -0700
> Kuniyuki Iwashima <[email protected]> wrote:
>
> > --- a/include/trace/events/sock.h
> > +++ b/include/trace/events/sock.h
> > @@ -122,9 +122,9 @@ TRACE_EVENT(sock_exceed_buf_limit,
> >
> > TP_printk("proto:%s sysctl_mem=%ld,%ld,%ld allocated=%ld sysctl_rmem=%d rmem_alloc=%d sysctl_wmem=%d wmem_alloc=%d wmem_queued=%d kind=%s",
> > __entry->name,
> > - __entry->sysctl_mem[0],
> > - __entry->sysctl_mem[1],
> > - __entry->sysctl_mem[2],
> > + READ_ONCE(__entry->sysctl_mem[0]),
> > + READ_ONCE(__entry->sysctl_mem[1]),
> > + READ_ONCE(__entry->sysctl_mem[2]),
>
> This is not reading anything to do with sysctl. It's reading the content of
> what was recorded in the ring buffer.
>
> That is, the READ_ONCE() here is not necessary, and if anything will break
> user space parsing, as this is exported to user space to tell it how to
> read the binary format in the ring buffer.

I take that back. Looking at the actual trace event, it is pointing to
sysctl memory, which is a major bug.

TRACE_EVENT(sock_exceed_buf_limit,

TP_PROTO(struct sock *sk, struct proto *prot, long allocated, int kind),

TP_ARGS(sk, prot, allocated, kind),

TP_STRUCT__entry(
__array(char, name, 32)
__field(long *, sysctl_mem)

sysctl_mem is a pointer.

__field(long, allocated)
__field(int, sysctl_rmem)
__field(int, rmem_alloc)
__field(int, sysctl_wmem)
__field(int, wmem_alloc)
__field(int, wmem_queued)
__field(int, kind)
),

TP_fast_assign(
strncpy(__entry->name, prot->name, 32);

__entry->sysctl_mem = prot->sysctl_mem;


They save the pointer **IN THE RING BUFFER**!!!

__entry->allocated = allocated;
__entry->sysctl_rmem = sk_get_rmem0(sk, prot);
__entry->rmem_alloc = atomic_read(&sk->sk_rmem_alloc);
__entry->sysctl_wmem = sk_get_wmem0(sk, prot);
__entry->wmem_alloc = refcount_read(&sk->sk_wmem_alloc);
__entry->wmem_queued = READ_ONCE(sk->sk_wmem_queued);
__entry->kind = kind;
),

TP_printk("proto:%s sysctl_mem=%ld,%ld,%ld allocated=%ld sysctl_rmem=%d rmem_alloc=%d sysctl_wmem=%d wmem_alloc=%d wmem_queued=%d kind=%s",
__entry->name,
__entry->sysctl_mem[0],
__entry->sysctl_mem[1],
__entry->sysctl_mem[2],

They are now reading a stale pointer, which can be read at any time. That
is, you get the information of what is in sysctl_mem at the time the ring
buffer is read (which is useless from user space), and not at the time of
the event.

Thanks for pointing this out. This needs to be fixed.

-- Steve


__entry->allocated,
__entry->sysctl_rmem,
__entry->rmem_alloc,
__entry->sysctl_wmem,
__entry->wmem_alloc,
__entry->wmem_queued,
show_skmem_kind_names(__entry->kind)
)
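
The difference between recording a pointer and recording the values can be
modelled in userspace as below. This is only an illustration of the problem
described in this reply and of one possible shape of a fix, not the change
that was actually applied; the struct and function names are hypothetical.

```c
/* Hypothetical stand-in for a protocol's sysctl_mem[] array. */
static long my_sysctl_mem[3] = { 1, 2, 3 };

/* Problematic record: only the address is saved at event time. */
struct my_event_by_pointer {
	const long *sysctl_mem;
};

/* Alternative record: the three values are copied at event time. */
struct my_event_by_value {
	long sysctl_mem[3];
};

static void my_record_by_pointer(struct my_event_by_pointer *e)
{
	e->sysctl_mem = my_sysctl_mem;
}

static void my_record_by_value(struct my_event_by_value *e)
{
	int i;

	/* One volatile load per element, standing in for READ_ONCE(). */
	for (i = 0; i < 3; i++)
		e->sysctl_mem[i] = *(const volatile long *)&my_sysctl_mem[i];
}
```

If my_sysctl_mem[0] is changed after both records are taken, printing the
pointer-based record shows the new value, while the value-based record still
shows what was current when the event fired.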

2022-07-06 16:44:09

by Kuniyuki Iwashima

Subject: Re: [PATCH v1 net 11/16] net: Fix a data-race around sysctl_mem.

From: Steven Rostedt <[email protected]>
Date: Wed, 6 Jul 2022 09:27:11 -0400
> On Wed, 6 Jul 2022 09:17:07 -0400
> Steven Rostedt <[email protected]> wrote:
>
> > On Tue, 5 Jul 2022 22:21:25 -0700
> > Kuniyuki Iwashima <[email protected]> wrote:
> >
> > > --- a/include/trace/events/sock.h
> > > +++ b/include/trace/events/sock.h
> > > @@ -122,9 +122,9 @@ TRACE_EVENT(sock_exceed_buf_limit,
> > >
> > > TP_printk("proto:%s sysctl_mem=%ld,%ld,%ld allocated=%ld sysctl_rmem=%d rmem_alloc=%d sysctl_wmem=%d wmem_alloc=%d wmem_queued=%d kind=%s",
> > > __entry->name,
> > > - __entry->sysctl_mem[0],
> > > - __entry->sysctl_mem[1],
> > > - __entry->sysctl_mem[2],
> > > + READ_ONCE(__entry->sysctl_mem[0]),
> > > + READ_ONCE(__entry->sysctl_mem[1]),
> > > + READ_ONCE(__entry->sysctl_mem[2]),
> >
> > This is not reading anything to do with sysctl. It's reading the content of
> > what was recorded in the ring buffer.
> >
> > That is, the READ_ONCE() here is not necessary, and if anything will break
> > user space parsing, as this is exported to user space to tell it how to
> > read the binary format in the ring buffer.
>
> I take that back. Looking at the actual trace event, it is pointing to
> sysctl memory, which is a major bug.
>
> TRACE_EVENT(sock_exceed_buf_limit,
>
> TP_PROTO(struct sock *sk, struct proto *prot, long allocated, int kind),
>
> TP_ARGS(sk, prot, allocated, kind),
>
> TP_STRUCT__entry(
> __array(char, name, 32)
> __field(long *, sysctl_mem)
>
> sysctl_mem is a pointer.
>
> __field(long, allocated)
> __field(int, sysctl_rmem)
> __field(int, rmem_alloc)
> __field(int, sysctl_wmem)
> __field(int, wmem_alloc)
> __field(int, wmem_queued)
> __field(int, kind)
> ),
>
> TP_fast_assign(
> strncpy(__entry->name, prot->name, 32);
>
> __entry->sysctl_mem = prot->sysctl_mem;
>
>
> They save the pointer **IN THE RING BUFFER**!!!
>
> __entry->allocated = allocated;
> __entry->sysctl_rmem = sk_get_rmem0(sk, prot);
> __entry->rmem_alloc = atomic_read(&sk->sk_rmem_alloc);
> __entry->sysctl_wmem = sk_get_wmem0(sk, prot);
> __entry->wmem_alloc = refcount_read(&sk->sk_wmem_alloc);
> __entry->wmem_queued = READ_ONCE(sk->sk_wmem_queued);
> __entry->kind = kind;
> ),
>
> TP_printk("proto:%s sysctl_mem=%ld,%ld,%ld allocated=%ld sysctl_rmem=%d rmem_alloc=%d sysctl_wmem=%d wmem_alloc=%d wmem_queued=%d kind=%s",
> __entry->name,
> __entry->sysctl_mem[0],
> __entry->sysctl_mem[1],
> __entry->sysctl_mem[2],
>
> They are now reading a stale pointer, which can be read at any time. That
> is, you get the information of what is in sysctl_mem at the time the ring
> buffer is read (which is useless from user space), and not at the time of
> the event.
>
> Thanks for pointing this out. This needs to be fixed.

For the record, Steve fixed this properly here, so I'll drop the tracing
part in v2.
https://lore.kernel.org/netdev/[email protected]/