2024-05-10 14:20:15

by Breno Leitao

[permalink] [raw]
Subject: [PATCH] block: Annotate a racy read in blk_do_io_stat()

KCSAN has reported a potential data race in blk_mq subsystem where
reading the rq->flag.

BUG: KCSAN: data-race in __blk_mq_end_request / blk_mq_check_inflight

read-write to 0xffff888120514d1c of 4 bytes by interrupt on cpu 6:
__blk_mq_end_request (block/blk-mq.c:700 block/blk-mq.c:1040)
scsi_end_request (drivers/scsi/scsi_lib.c:667)
scsi_io_completion (drivers/scsi/scsi_lib.c:1068)
scsi_finish_command (drivers/scsi/scsi.c:199)
scsi_complete (drivers/scsi/scsi_lib.c:?)
blk_done_softirq (block/blk-mq.c:? block/blk-mq.c:1134)
handle_softirqs (./arch/x86/include/asm/jump_label.h:27
./include/linux/jump_label.h:207
./include/trace/events/irq.h:142 kernel/softirq.c:555)
__irq_exit_rcu (kernel/softirq.c:617 kernel/softirq.c:639)
irq_exit_rcu (kernel/softirq.c:651)
common_interrupt (arch/x86/kernel/irq.c:247)
asm_common_interrupt (./arch/x86/include/asm/idtentry.h:693)
cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291)
cpuidle_enter (drivers/cpuidle/cpuidle.c:388)
do_idle (kernel/sched/idle.c:155 kernel/sched/idle.c:236
kernel/sched/idle.c:332)
cpu_startup_entry (kernel/sched/idle.c:429)
start_secondary (arch/x86/kernel/smpboot.c:313)
common_startup_64 (arch/x86/kernel/head_64.S:421)

read to 0xffff888120514d1c of 4 bytes by task 9106 on cpu 51:
blk_mq_check_inflight (block/blk.h:356 block/blk-mq.c:94)
14:06:18 bt_iter (block/blk-mq-tag.c:292)
sbitmap_for_each_set (./include/linux/sbitmap.h:284
./include/linux/sbitmap.h:302)
blk_mq_queue_tag_busy_iter (block/blk-mq-tag.c:? block/blk-mq-tag.c:533)
blk_mq_in_flight (block/blk-mq.c:109)
diskstats_show (block/genhd.c:?)
seq_read_iter (fs/seq_file.c:?)
proc_reg_read_iter (fs/proc/inode.c:299)
vfs_read (fs/read_write.c:396 fs/read_write.c:476)
ksys_read (fs/read_write.c:619)
__x64_sys_read (fs/read_write.c:627)
x64_sys_call (arch/x86/entry/syscall_64.c:33)
do_syscall_64 (arch/x86/entry/common.c:?)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

value changed: 0x00022382 -> 0x00022182

Discussing it with Jens Axboe and Pavel Begunkov, they suggested we just
want to annotated this with data_race(), since disk statistic reading
isn't critical, and it will not be a big deal if this bit is not stable.

Suggested-by: Jens Axboe <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>
---
block/blk.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk.h b/block/blk.h
index d9f584984bc4..57a1d73a0718 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -353,7 +353,8 @@ int blk_dev_init(void);
*/
static inline bool blk_do_io_stat(struct request *rq)
{
- return (rq->rq_flags & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
+ /* Disk stats reading isn’t critical, let it race */
+ return (data_race(rq->rq_flags) & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
}

void update_io_ticks(struct block_device *part, unsigned long now, bool end);
--
2.43.0



2024-05-10 14:29:06

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On 5/10/24 07:19, Breno Leitao wrote:
> diff --git a/block/blk.h b/block/blk.h
> index d9f584984bc4..57a1d73a0718 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -353,7 +353,8 @@ int blk_dev_init(void);
> */
> static inline bool blk_do_io_stat(struct request *rq)
> {
> - return (rq->rq_flags & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
> + /* Disk stats reading isn’t critical, let it race */
> + return (data_race(rq->rq_flags) & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
> }
>
> void update_io_ticks(struct block_device *part, unsigned long now, bool end);

Why to annotate this race with data_race() instead of READ_ONCE()? Are
there any cases in which it is better to use data_race() than
READ_ONCE()?

Thanks,

Bart.

2024-05-10 14:57:39

by Breno Leitao

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On Fri, May 10, 2024 at 07:28:41AM -0700, Bart Van Assche wrote:
> On 5/10/24 07:19, Breno Leitao wrote:
> > diff --git a/block/blk.h b/block/blk.h
> > index d9f584984bc4..57a1d73a0718 100644
> > --- a/block/blk.h
> > +++ b/block/blk.h
> > @@ -353,7 +353,8 @@ int blk_dev_init(void);
> > */
> > static inline bool blk_do_io_stat(struct request *rq)
> > {
> > - return (rq->rq_flags & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
> > + /* Disk stats reading isn’t critical, let it race */
> > + return (data_race(rq->rq_flags) & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
> > }
> > void update_io_ticks(struct block_device *part, unsigned long now, bool end);
>
> Why to annotate this race with data_race() instead of READ_ONCE()? Are
> there any cases in which it is better to use data_race() than
> READ_ONCE()?

data_race() doesn't not emit any code, but, keep KCSAN silent.
READ_ONCE()/WRITE_ONCE() emits code.

So, if you do not want to change the current behaviour, but, keep KCSAN
away, data_race() is preferred.

2024-05-10 15:41:33

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On Fri, May 10, 2024 at 07:28:41AM -0700, Bart Van Assche wrote:
> On 5/10/24 07:19, Breno Leitao wrote:
> > diff --git a/block/blk.h b/block/blk.h
> > index d9f584984bc4..57a1d73a0718 100644
> > --- a/block/blk.h
> > +++ b/block/blk.h
> > @@ -353,7 +353,8 @@ int blk_dev_init(void);
> > */
> > static inline bool blk_do_io_stat(struct request *rq)
> > {
> > - return (rq->rq_flags & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
> > + /* Disk stats reading isn’t critical, let it race */
> > + return (data_race(rq->rq_flags) & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
> > }
> > void update_io_ticks(struct block_device *part, unsigned long now, bool end);
>
> Why to annotate this race with data_race() instead of READ_ONCE()? Are
> there any cases in which it is better to use data_race() than
> READ_ONCE()?

We use this pattern quite a bit in RCU. For example, suppose that we
have a variable that is accessed only under a given lock, except that it
is also locklessly accessed for diagnostics or statistics. Then having
unmarked (normal C language) accesses under the lock and data_race()
for that statistics enables KCSAN to flag other (buggy) lockless accesses.

Thanx, Paul

2024-05-10 16:21:27

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On 5/10/24 8:41 AM, Paul E. McKenney wrote:
> On Fri, May 10, 2024 at 07:28:41AM -0700, Bart Van Assche wrote:
>> On 5/10/24 07:19, Breno Leitao wrote:
>>> diff --git a/block/blk.h b/block/blk.h
>>> index d9f584984bc4..57a1d73a0718 100644
>>> --- a/block/blk.h
>>> +++ b/block/blk.h
>>> @@ -353,7 +353,8 @@ int blk_dev_init(void);
>>> */
>>> static inline bool blk_do_io_stat(struct request *rq)
>>> {
>>> - return (rq->rq_flags & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
>>> + /* Disk stats reading isn’t critical, let it race */
>>> + return (data_race(rq->rq_flags) & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
>>> }
>>> void update_io_ticks(struct block_device *part, unsigned long now, bool end);
>>
>> Why to annotate this race with data_race() instead of READ_ONCE()? Are
>> there any cases in which it is better to use data_race() than
>> READ_ONCE()?
>
> We use this pattern quite a bit in RCU. For example, suppose that we
> have a variable that is accessed only under a given lock, except that it
> is also locklessly accessed for diagnostics or statistics. Then having
> unmarked (normal C language) accesses under the lock and data_race()
> for that statistics enables KCSAN to flag other (buggy) lockless accesses.

Can using data_race() instead of READ_ONCE() result in incorrect code
generation, e.g. the compiler emitting a read twice and reading two
different values?

Thanks,

Bart.


2024-05-10 17:08:25

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On Fri, May 10, 2024 at 09:20:58AM -0700, Bart Van Assche wrote:
> On 5/10/24 8:41 AM, Paul E. McKenney wrote:
> > On Fri, May 10, 2024 at 07:28:41AM -0700, Bart Van Assche wrote:
> > > On 5/10/24 07:19, Breno Leitao wrote:
> > > > diff --git a/block/blk.h b/block/blk.h
> > > > index d9f584984bc4..57a1d73a0718 100644
> > > > --- a/block/blk.h
> > > > +++ b/block/blk.h
> > > > @@ -353,7 +353,8 @@ int blk_dev_init(void);
> > > > */
> > > > static inline bool blk_do_io_stat(struct request *rq)
> > > > {
> > > > - return (rq->rq_flags & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
> > > > + /* Disk stats reading isn’t critical, let it race */
> > > > + return (data_race(rq->rq_flags) & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
> > > > }
> > > > void update_io_ticks(struct block_device *part, unsigned long now, bool end);
> > >
> > > Why to annotate this race with data_race() instead of READ_ONCE()? Are
> > > there any cases in which it is better to use data_race() than
> > > READ_ONCE()?
> >
> > We use this pattern quite a bit in RCU. For example, suppose that we
> > have a variable that is accessed only under a given lock, except that it
> > is also locklessly accessed for diagnostics or statistics. Then having
> > unmarked (normal C language) accesses under the lock and data_race()
> > for that statistics enables KCSAN to flag other (buggy) lockless accesses.
>
> Can using data_race() instead of READ_ONCE() result in incorrect code
> generation, e.g. the compiler emitting a read twice and reading two
> different values?

It could.

And if that was a big enough problem, you might want READ_ONCE() there.
The cases in Linux-kernel RCU involve quantities that rarely change,
so even if the compiler does something counterproductive, the odds of
output being affected are low.

So why not just always use READ_ONCE() for debugging/statistical accesses?

To see that, consider a variable that is supposed to be accessed only
under a lock (aside from the debugging/statistical access). Under RCU's
KCSAN rules, marking those debugging/statistical accesses with READ_ONCE()
would require all the updates to be marked with WRITE_ONCE(). Which would
prevent KCSAN from noticing a buggy lockless WRITE_ONCE() update of
that variable.

In contrast, if we use data_race() for the debugging/statistical accesses
and leave the normal lock-protected accesses unmarked (as normal
C-language accesses), then KCSAN will complain about buggy lockless
accesses, even if they are marked with READ_ONCE() or WRITE_ONCE().

Does that help, or am I missing your point?

Thanx, Paul

2024-05-10 20:30:25

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On 5/10/24 10:08 AM, Paul E. McKenney wrote:
> To see that, consider a variable that is supposed to be accessed only
> under a lock (aside from the debugging/statistical access). Under RCU's
> KCSAN rules, marking those debugging/statistical accesses with READ_ONCE()
> would require all the updates to be marked with WRITE_ONCE(). Which would
> prevent KCSAN from noticing a buggy lockless WRITE_ONCE() update of
> that variable.
>
> In contrast, if we use data_race() for the debugging/statistical accesses
> and leave the normal lock-protected accesses unmarked (as normal
> C-language accesses), then KCSAN will complain about buggy lockless
> accesses, even if they are marked with READ_ONCE() or WRITE_ONCE().
>
> Does that help, or am I missing your point?

Thanks, that's very helpful. Has it been considered to add this
explanation as a comment above the data_race() macro definition?
There may be other kernel developers who are wondering about when
to use data_race() and when to use READ_ONCE().

Thanks,

Bart.



2024-05-10 22:35:45

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On Fri, May 10, 2024 at 01:30:03PM -0700, Bart Van Assche wrote:
> On 5/10/24 10:08 AM, Paul E. McKenney wrote:
> > To see that, consider a variable that is supposed to be accessed only
> > under a lock (aside from the debugging/statistical access). Under RCU's
> > KCSAN rules, marking those debugging/statistical accesses with READ_ONCE()
> > would require all the updates to be marked with WRITE_ONCE(). Which would
> > prevent KCSAN from noticing a buggy lockless WRITE_ONCE() update of
> > that variable.
> >
> > In contrast, if we use data_race() for the debugging/statistical accesses
> > and leave the normal lock-protected accesses unmarked (as normal
> > C-language accesses), then KCSAN will complain about buggy lockless
> > accesses, even if they are marked with READ_ONCE() or WRITE_ONCE().
> >
> > Does that help, or am I missing your point?
>
> Thanks, that's very helpful. Has it been considered to add this
> explanation as a comment above the data_race() macro definition?
> There may be other kernel developers who are wondering about when
> to use data_race() and when to use READ_ONCE().

Well, sometimes you need to use both!

Does the prototype patch below help?

(Also adding Marco on CC for his thoughts.)

Thanx, Paul

------------------------------------------------------------------------

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index c00cc6c0878a1..78593b40fe7e9 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -194,9 +194,18 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
* This data_race() macro is useful for situations in which data races
* should be forgiven. One example is diagnostic code that accesses
* shared variables but is not a part of the core synchronization design.
+ * For example, if accesses to a given variable are protected by a lock,
+ * except for diagnostic code, then the accesses under the lock should
+ * be plain C-language accesses and those in the diagnostic code should
+ * use data_race(). This way, KCSAN will complain if buggy lockless
+ * accesses to that variable are introduced, even if the buggy accesses
+ * are protected by READ_ONCE() or WRITE_ONCE().
+ *
+ * This macro *does not* affect normal code generation, but is a hint to
+ * tooling that data races here are to be ignored. If code generation must
+ * be protected *and* KCSAN should ignore the access, use both data_race()
+ * and READ_ONCE(), for example, data_race(READ_ONCE(x)).
*
- * This macro *does not* affect normal code generation, but is a hint
- * to tooling that data races here are to be ignored.
*/
#define data_race(expr) \
({ \

2024-05-10 23:29:02

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On 5/10/24 3:35 PM, Paul E. McKenney wrote:
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index c00cc6c0878a1..78593b40fe7e9 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -194,9 +194,18 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> * This data_race() macro is useful for situations in which data races
> * should be forgiven. One example is diagnostic code that accesses
> * shared variables but is not a part of the core synchronization design.
> + * For example, if accesses to a given variable are protected by a lock,
> + * except for diagnostic code, then the accesses under the lock should
> + * be plain C-language accesses and those in the diagnostic code should
> + * use data_race(). This way, KCSAN will complain if buggy lockless
> + * accesses to that variable are introduced, even if the buggy accesses
> + * are protected by READ_ONCE() or WRITE_ONCE().
> + *
> + * This macro *does not* affect normal code generation, but is a hint to
> + * tooling that data races here are to be ignored. If code generation must
> + * be protected *and* KCSAN should ignore the access, use both data_race()
> + * and READ_ONCE(), for example, data_race(READ_ONCE(x)).
> *
> - * This macro *does not* affect normal code generation, but is a hint
> - * to tooling that data races here are to be ignored.
> */

This patch changes the end of the comment from "*/" into "*\n*/".
That's probably unintended? Otherwise this patch looks good to me.

Thanks,

Bart.


2024-05-11 00:41:14

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On Fri, May 10, 2024 at 04:22:58PM -0700, Bart Van Assche wrote:
> On 5/10/24 3:35 PM, Paul E. McKenney wrote:
> > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > index c00cc6c0878a1..78593b40fe7e9 100644
> > --- a/include/linux/compiler.h
> > +++ b/include/linux/compiler.h
> > @@ -194,9 +194,18 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> > * This data_race() macro is useful for situations in which data races
> > * should be forgiven. One example is diagnostic code that accesses
> > * shared variables but is not a part of the core synchronization design.
> > + * For example, if accesses to a given variable are protected by a lock,
> > + * except for diagnostic code, then the accesses under the lock should
> > + * be plain C-language accesses and those in the diagnostic code should
> > + * use data_race(). This way, KCSAN will complain if buggy lockless
> > + * accesses to that variable are introduced, even if the buggy accesses
> > + * are protected by READ_ONCE() or WRITE_ONCE().
> > + *
> > + * This macro *does not* affect normal code generation, but is a hint to
> > + * tooling that data races here are to be ignored. If code generation must
> > + * be protected *and* KCSAN should ignore the access, use both data_race()
> > + * and READ_ONCE(), for example, data_race(READ_ONCE(x)).
> > *
> > - * This macro *does not* affect normal code generation, but is a hint
> > - * to tooling that data races here are to be ignored.
> > */
>
> This patch changes the end of the comment from "*/" into "*\n*/".
> That's probably unintended? Otherwise this patch looks good to me.

Good eyes, thank you!

How about like this instead?

Thanx, Paul

------------------------------------------------------------------------

commit 930cb5f711443d8044e88080ee21b0a5edda33bd
Author: Paul E. McKenney <[email protected]>
Date: Fri May 10 15:36:57 2024 -0700

kcsan: Add example to data_race() kerneldoc header

Although the data_race() kerneldoc header accurately states what it does,
some of the implications and usage patterns are non-obvious. Therefore,
add a brief locking example and also state how to have KCSAN ignore
accesses while also preventing the compiler from folding, spindling,
or otherwise mutilating the access.

[ paulmck: Apply Bart Van Assche feedback. ]

Reported-by: Bart Van Assche <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Breno Leitao <[email protected]>
Cc: Jens Axboe <[email protected]>

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index c00cc6c0878a1..9249768ec7a26 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -194,9 +194,17 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
* This data_race() macro is useful for situations in which data races
* should be forgiven. One example is diagnostic code that accesses
* shared variables but is not a part of the core synchronization design.
+ * For example, if accesses to a given variable are protected by a lock,
+ * except for diagnostic code, then the accesses under the lock should
+ * be plain C-language accesses and those in the diagnostic code should
+ * use data_race(). This way, KCSAN will complain if buggy lockless
+ * accesses to that variable are introduced, even if the buggy accesses
+ * are protected by READ_ONCE() or WRITE_ONCE().
*
- * This macro *does not* affect normal code generation, but is a hint
- * to tooling that data races here are to be ignored.
+ * This macro *does not* affect normal code generation, but is a hint to
+ * tooling that data races here are to be ignored. If code generation must
+ * be protected *and* KCSAN should ignore the access, use both data_race()
+ * and READ_ONCE(), for example, data_race(READ_ONCE(x)).
*/
#define data_race(expr) \
({ \

2024-05-13 08:14:37

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On Sat, 11 May 2024 at 02:41, Paul E. McKenney <[email protected]> wrote:
[...]
> ------------------------------------------------------------------------
>
> commit 930cb5f711443d8044e88080ee21b0a5edda33bd
> Author: Paul E. McKenney <[email protected]>
> Date: Fri May 10 15:36:57 2024 -0700
>
> kcsan: Add example to data_race() kerneldoc header
>
> Although the data_race() kerneldoc header accurately states what it does,
> some of the implications and usage patterns are non-obvious. Therefore,
> add a brief locking example and also state how to have KCSAN ignore
> accesses while also preventing the compiler from folding, spindling,
> or otherwise mutilating the access.
>
> [ paulmck: Apply Bart Van Assche feedback. ]
>
> Reported-by: Bart Van Assche <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
> Cc: Marco Elver <[email protected]>
> Cc: Breno Leitao <[email protected]>
> Cc: Jens Axboe <[email protected]>
>
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index c00cc6c0878a1..9249768ec7a26 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -194,9 +194,17 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> * This data_race() macro is useful for situations in which data races
> * should be forgiven. One example is diagnostic code that accesses
> * shared variables but is not a part of the core synchronization design.
> + * For example, if accesses to a given variable are protected by a lock,
> + * except for diagnostic code, then the accesses under the lock should
> + * be plain C-language accesses and those in the diagnostic code should
> + * use data_race(). This way, KCSAN will complain if buggy lockless
> + * accesses to that variable are introduced, even if the buggy accesses
> + * are protected by READ_ONCE() or WRITE_ONCE().
> *
> - * This macro *does not* affect normal code generation, but is a hint
> - * to tooling that data races here are to be ignored.
> + * This macro *does not* affect normal code generation, but is a hint to
> + * tooling that data races here are to be ignored. If code generation must
> + * be protected *and* KCSAN should ignore the access, use both data_race()

"code generation must be protected" seems ambiguous, because
protecting code generation could mean a lot of different things to
different people.

The more precise thing would be to write that "If the access must be
atomic *and* KCSAN should ignore the access, use both ...".

I've also had trouble in the past referring to "miscompilation" or
similar, because that also entirely depends on the promised vs.
expected semantics: if the code in question assumes for the access to
be atomic, the compiler compiling the code in a way that the access is
no longer atomic would be a "miscompilation". Although is it still a
"miscompilation" if the compiler generated code according to specified
language semantics (say according to our own LKMM) - and that's where
opinions can diverge because it depends on which side we stand
(compiler vs. our code).

> + * and READ_ONCE(), for example, data_race(READ_ONCE(x)).

Having more documentation sounds good to me, thanks for adding!

This extra bit of documentation also exists in a longer form in
access-marking.txt, correct? I wonder how it would be possible to
refer to it, in case the reader wants to learn even more.

Thanks,
-- Marco

2024-05-14 23:47:43

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On Mon, May 13, 2024 at 10:13:49AM +0200, Marco Elver wrote:
> On Sat, 11 May 2024 at 02:41, Paul E. McKenney <[email protected]> wrote:
> [...]
> > ------------------------------------------------------------------------
> >
> > commit 930cb5f711443d8044e88080ee21b0a5edda33bd
> > Author: Paul E. McKenney <[email protected]>
> > Date: Fri May 10 15:36:57 2024 -0700
> >
> > kcsan: Add example to data_race() kerneldoc header
> >
> > Although the data_race() kerneldoc header accurately states what it does,
> > some of the implications and usage patterns are non-obvious. Therefore,
> > add a brief locking example and also state how to have KCSAN ignore
> > accesses while also preventing the compiler from folding, spindling,
> > or otherwise mutilating the access.
> >
> > [ paulmck: Apply Bart Van Assche feedback. ]
> >
> > Reported-by: Bart Van Assche <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> > Cc: Marco Elver <[email protected]>
> > Cc: Breno Leitao <[email protected]>
> > Cc: Jens Axboe <[email protected]>
> >
> > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > index c00cc6c0878a1..9249768ec7a26 100644
> > --- a/include/linux/compiler.h
> > +++ b/include/linux/compiler.h
> > @@ -194,9 +194,17 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> > * This data_race() macro is useful for situations in which data races
> > * should be forgiven. One example is diagnostic code that accesses
> > * shared variables but is not a part of the core synchronization design.
> > + * For example, if accesses to a given variable are protected by a lock,
> > + * except for diagnostic code, then the accesses under the lock should
> > + * be plain C-language accesses and those in the diagnostic code should
> > + * use data_race(). This way, KCSAN will complain if buggy lockless
> > + * accesses to that variable are introduced, even if the buggy accesses
> > + * are protected by READ_ONCE() or WRITE_ONCE().
> > *
> > - * This macro *does not* affect normal code generation, but is a hint
> > - * to tooling that data races here are to be ignored.
> > + * This macro *does not* affect normal code generation, but is a hint to
> > + * tooling that data races here are to be ignored. If code generation must
> > + * be protected *and* KCSAN should ignore the access, use both data_race()
>
> "code generation must be protected" seems ambiguous, because
> protecting code generation could mean a lot of different things to
> different people.
>
> The more precise thing would be to write that "If the access must be
> atomic *and* KCSAN should ignore the access, use both ...".

Good point, and I took your wording, thank you.

> I've also had trouble in the past referring to "miscompilation" or
> similar, because that also entirely depends on the promised vs.
> expected semantics: if the code in question assumes for the access to
> be atomic, the compiler compiling the code in a way that the access is
> no longer atomic would be a "miscompilation". Although is it still a
> "miscompilation" if the compiler generated code according to specified
> language semantics (say according to our own LKMM) - and that's where
> opinions can diverge because it depends on which side we stand
> (compiler vs. our code).

Agreed, use of words like "miscompilation" can annoy people. What
word would you suggest using instead?

> > + * and READ_ONCE(), for example, data_race(READ_ONCE(x)).
>
> Having more documentation sounds good to me, thanks for adding!
>
> This extra bit of documentation also exists in a longer form in
> access-marking.txt, correct? I wonder how it would be possible to
> refer to it, in case the reader wants to learn even more.

Good point, especially given that I had forgotten about it.

I don't have any immediate ideas for calling attention to this file,
but would the following update be helpful?

Thanx, Paul

------------------------------------------------------------------------

diff --git a/tools/memory-model/Documentation/access-marking.txt b/tools/memory-model/Documentation/access-marking.txt
index 65778222183e3..690dd59b7ac59 100644
--- a/tools/memory-model/Documentation/access-marking.txt
+++ b/tools/memory-model/Documentation/access-marking.txt
@@ -24,6 +24,12 @@ The Linux kernel provides the following access-marking options:
4. WRITE_ONCE(), for example, "WRITE_ONCE(a, b);"
The various forms of atomic_set() also fit in here.

+5. ASSERT_EXCLUSIVE_ACCESS() and ASSERT_EXCLUSIVE_WRITER().
+
+6. The volatile keyword, for example, "int volatile a;"
+
+7. __data_racy, for example "int __data_racy a;"
+

These may be used in combination, as shown in this admittedly improbable
example:
@@ -205,6 +211,27 @@ because doing otherwise prevents KCSAN from detecting violations of your
code's synchronization rules.


+Use of volatile and __data_racy
+-------------------------------
+
+Adding the volatile keyword to the declaration of a variable causes both
+the compiler and KCSAN to treat all reads from that variable as if they
+were protected by READ_ONCE() and all writes to that variable as if they
+were protected by WRITE_ONCE().
+
+Adding the __data_racy type qualifier to the declaration of a variable
+causes KCSAN to treat all accesses to that variable as if they were
+enclosed by data_race(). However, __data_racy does not affect the
+compiler, though one could imagine hardened kernel builds treating the
+__data_racy type qualifier as if it was the volatile keyword.
+
+Note well that __data_racy is subject to the same pointer-declaration
+rules as is the volatile keyword. For example:
+
+ int __data_racy *p; // Pointer to data-racy data.
+ int *__data_racy p; // Data-racy pointer to non-data-racy data.
+
+
ACCESS-DOCUMENTATION OPTIONS
============================

@@ -342,7 +369,7 @@ as follows:

Because foo is read locklessly, all accesses are marked. The purpose
of the ASSERT_EXCLUSIVE_WRITER() is to allow KCSAN to check for a buggy
-concurrent lockless write.
+concurrent write, whether marked or not.


Lock-Protected Writes With Heuristic Lockless Reads

2024-05-15 08:05:57

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On Wed, 15 May 2024 at 01:47, Paul E. McKenney <[email protected]> wrote:
>
> On Mon, May 13, 2024 at 10:13:49AM +0200, Marco Elver wrote:
> > On Sat, 11 May 2024 at 02:41, Paul E. McKenney <[email protected]> wrote:
> > [...]
> > > ------------------------------------------------------------------------
> > >
> > > commit 930cb5f711443d8044e88080ee21b0a5edda33bd
> > > Author: Paul E. McKenney <[email protected]>
> > > Date: Fri May 10 15:36:57 2024 -0700
> > >
> > > kcsan: Add example to data_race() kerneldoc header
> > >
> > > Although the data_race() kerneldoc header accurately states what it does,
> > > some of the implications and usage patterns are non-obvious. Therefore,
> > > add a brief locking example and also state how to have KCSAN ignore
> > > accesses while also preventing the compiler from folding, spindling,
> > > or otherwise mutilating the access.
> > >
> > > [ paulmck: Apply Bart Van Assche feedback. ]
> > >
> > > Reported-by: Bart Van Assche <[email protected]>
> > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > Cc: Marco Elver <[email protected]>
> > > Cc: Breno Leitao <[email protected]>
> > > Cc: Jens Axboe <[email protected]>
> > >
> > > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > > index c00cc6c0878a1..9249768ec7a26 100644
> > > --- a/include/linux/compiler.h
> > > +++ b/include/linux/compiler.h
> > > @@ -194,9 +194,17 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> > > * This data_race() macro is useful for situations in which data races
> > > * should be forgiven. One example is diagnostic code that accesses
> > > * shared variables but is not a part of the core synchronization design.
> > > + * For example, if accesses to a given variable are protected by a lock,
> > > + * except for diagnostic code, then the accesses under the lock should
> > > + * be plain C-language accesses and those in the diagnostic code should
> > > + * use data_race(). This way, KCSAN will complain if buggy lockless
> > > + * accesses to that variable are introduced, even if the buggy accesses
> > > + * are protected by READ_ONCE() or WRITE_ONCE().
> > > *
> > > - * This macro *does not* affect normal code generation, but is a hint
> > > - * to tooling that data races here are to be ignored.
> > > + * This macro *does not* affect normal code generation, but is a hint to
> > > + * tooling that data races here are to be ignored. If code generation must
> > > + * be protected *and* KCSAN should ignore the access, use both data_race()
> >
> > "code generation must be protected" seems ambiguous, because
> > protecting code generation could mean a lot of different things to
> > different people.
> >
> > The more precise thing would be to write that "If the access must be
> > atomic *and* KCSAN should ignore the access, use both ...".
>
> Good point, and I took your wording, thank you.
>
> > I've also had trouble in the past referring to "miscompilation" or
> > similar, because that also entirely depends on the promised vs.
> > expected semantics: if the code in question assumes for the access to
> > be atomic, the compiler compiling the code in a way that the access is
> > no longer atomic would be a "miscompilation". Although is it still a
> > "miscompilation" if the compiler generated code according to specified
> > language semantics (say according to our own LKMM) - and that's where
> > opinions can diverge because it depends on which side we stand
> > (compiler vs. our code).
>
> Agreed, use of words like "miscompilation" can annoy people. What
> word would you suggest using instead?

Not sure. As suggested above, I try to just stick to "atomic" vs
"non-atomic" because that's ultimately the functional end result of
such a miscompilation. Although I've also had people be confused as in
"what atomic?! as in atomic RMW?!", but I don't know how to remove
that kind of confusion.

If, however, our intended model is the LKMM and e.g. a compiler breaks
a dependency-chain, then we could talk about miscompilation, because
the compiler violates our desired language semantics. Of course the
compiler writers then will say that we try to do things that are
outside any implemented language semantics the compiler is aware of,
so it's not a miscompilation again. So it all depends on which side
we're arguing for. Fun times.

> > > + * and READ_ONCE(), for example, data_race(READ_ONCE(x)).
> >
> > Having more documentation sounds good to me, thanks for adding!
> >
> > This extra bit of documentation also exists in a longer form in
> > access-marking.txt, correct? I wonder how it would be possible to
> > refer to it, in case the reader wants to learn even more.
>
> Good point, especially given that I had forgotten about it.
>
> I don't have any immediate ideas for calling attention to this file,
> but would the following update be helpful?

Mentioning __data_racy along with data_race() could be helpful, thank
you. See comments below.

Thanks,
-- Marco

> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/tools/memory-model/Documentation/access-marking.txt b/tools/memory-model/Documentation/access-marking.txt
> index 65778222183e3..690dd59b7ac59 100644
> --- a/tools/memory-model/Documentation/access-marking.txt
> +++ b/tools/memory-model/Documentation/access-marking.txt
> @@ -24,6 +24,12 @@ The Linux kernel provides the following access-marking options:
> 4. WRITE_ONCE(), for example, "WRITE_ONCE(a, b);"
> The various forms of atomic_set() also fit in here.
>
> +5. ASSERT_EXCLUSIVE_ACCESS() and ASSERT_EXCLUSIVE_WRITER().

Perhaps worth mentioning, but they aren't strictly access-marking
options. In the interest of simplicity could leave it out.

> +6. The volatile keyword, for example, "int volatile a;"

See below - I'm not sure we should mention volatile. It may set the
wrong example.

> +7. __data_racy, for example "int __data_racy a;"
> +
>
> These may be used in combination, as shown in this admittedly improbable
> example:
> @@ -205,6 +211,27 @@ because doing otherwise prevents KCSAN from detecting violations of your
> code's synchronization rules.
>
>
> +Use of volatile and __data_racy
> +-------------------------------
> +
> +Adding the volatile keyword to the declaration of a variable causes both
> +the compiler and KCSAN to treat all reads from that variable as if they
> +were protected by READ_ONCE() and all writes to that variable as if they
> +were protected by WRITE_ONCE().

"volatile" isn't something we encourage, right? In which case, I think
to avoid confusion we should not mention volatile. After all we have
this: Documentation/process/volatile-considered-harmful.rst

> +Adding the __data_racy type qualifier to the declaration of a variable
> +causes KCSAN to treat all accesses to that variable as if they were
> +enclosed by data_race(). However, __data_racy does not affect the
> +compiler, though one could imagine hardened kernel builds treating the
> +__data_racy type qualifier as if it was the volatile keyword.
> +
> +Note well that __data_racy is subject to the same pointer-declaration
> +rules as is the volatile keyword. For example:

To avoid referring to volatile could make it more general and say "..
rules as other type qualifiers like const and volatile".


> + int __data_racy *p; // Pointer to data-racy data.
> + int *__data_racy p; // Data-racy pointer to non-data-racy data.
> +
> +
> ACCESS-DOCUMENTATION OPTIONS
> ============================
>
> @@ -342,7 +369,7 @@ as follows:
>
> Because foo is read locklessly, all accesses are marked. The purpose
> of the ASSERT_EXCLUSIVE_WRITER() is to allow KCSAN to check for a buggy
> -concurrent lockless write.
> +concurrent write, whether marked or not.
>
>
> Lock-Protected Writes With Heuristic Lockless Reads

2024-05-15 12:48:56

by Breno Leitao

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On Wed, May 15, 2024 at 09:58:35AM +0200, Marco Elver wrote:
> On Wed, 15 May 2024 at 01:47, Paul E. McKenney <[email protected]> wrote:
> > On Mon, May 13, 2024 at 10:13:49AM +0200, Marco Elver wrote:
> > +Use of volatile and __data_racy
> > +-------------------------------
> > +
> > +Adding the volatile keyword to the declaration of a variable causes both
> > +the compiler and KCSAN to treat all reads from that variable as if they
> > +were protected by READ_ONCE() and all writes to that variable as if they
> > +were protected by WRITE_ONCE().

> "volatile" isn't something we encourage, right? In which case, I think
> to avoid confusion we should not mention volatile. After all we have
> this: Documentation/process/volatile-considered-harmful.rst

Since you mentioned this document, the other day I was reading
volatile-considered-harmful.rst document, and I was surprised that there
is no reference for READ_ONCE() primitive at all (same for WRITE_ONCE).

# grep -c READ_ONCE Documentation/process/volatile-considered-harmful.rst
0

From my perspective, READ_ONCE() is another way of doing real memory
read (volatile) when you really need, at the same time keeping the
compiler free to optimize and reuse the value that was read.

Should volatile-considered-harmful.rst be also expanded to cover
READ_ONCE()?

Thanks!

--breno

2024-05-15 13:22:02

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On Wed, 15 May 2024 at 14:48, Breno Leitao <[email protected]> wrote:
>
> On Wed, May 15, 2024 at 09:58:35AM +0200, Marco Elver wrote:
> > On Wed, 15 May 2024 at 01:47, Paul E. McKenney <[email protected]> wrote:
> > > On Mon, May 13, 2024 at 10:13:49AM +0200, Marco Elver wrote:
> > > +Use of volatile and __data_racy
> > > +-------------------------------
> > > +
> > > +Adding the volatile keyword to the declaration of a variable causes both
> > > +the compiler and KCSAN to treat all reads from that variable as if they
> > > +were protected by READ_ONCE() and all writes to that variable as if they
> > > +were protected by WRITE_ONCE().
>
> > "volatile" isn't something we encourage, right? In which case, I think
> > to avoid confusion we should not mention volatile. After all we have
> > this: Documentation/process/volatile-considered-harmful.rst
>
> Since you mentioned this document, the other day I was reading
> volatile-considered-harmful.rst document, and I was surprised that there
> is no reference for READ_ONCE() primitive at all (same for WRITE_ONCE).
>
> # grep -c READ_ONCE Documentation/process/volatile-considered-harmful.rst
> 0
>
> From my perspective, READ_ONCE() is another way of doing real memory
> read (volatile) when you really need, at the same time keeping the
> compiler free to optimize and reuse the value that was read.

The Linux kernel memory model [1] defines the semantics of READ_ONCE()
/ WRITE_ONCE(). You could say that a READ_ONCE() is something like a
"relaxed (but sometimes consume) atomic load" (in C11 lingo), and a
WRITE_ONCE() is a "relaxed atomic store". But again, not exactly
because the kernel wants to do things that are outside the C standard
and no compiler fully supports. This has fun consequences, such as
[2].

[1] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0124r6.html
[2] https://lpc.events/event/16/contributions/1174/attachments/1108/2121/Status%20Report%20-%20Broken%20Dependency%20Orderings%20in%20the%20Linux%20Kernel.pdf

To get what the kernel wants, implementing *ONCE in terms of
"volatile" works in all important cases. But we know that it can go
wrong - [2] discusses this.

The big hope is that one day, the kernel can just switch the *ONCE()
implementation to use something better but as a programmer we
shouldn't care that there's volatile underneath.

> Should volatile-considered-harmful.rst be also expanded to cover
> READ_ONCE()?

No, on most architectures READ_ONCE/WRITE_ONCE are implemented with
volatile, but that's merely an implementation detail. Once upon a
time, READ_ONCE on alpha actually implied a real barrier. On arm64
with LTO, READ_ONCE translates into an acquire-load [3] to mitigate
the risk of "volatile" actually resulting in "miscompilation"
(according to our desired semantics of READ_ONCE).

[3] https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/asm/rwonce.h#L26

Thanks,
-- Marco

2024-05-15 15:57:55

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On Wed, May 15, 2024 at 09:58:35AM +0200, Marco Elver wrote:
> On Wed, 15 May 2024 at 01:47, Paul E. McKenney <[email protected]> wrote:
> > On Mon, May 13, 2024 at 10:13:49AM +0200, Marco Elver wrote:
> > > On Sat, 11 May 2024 at 02:41, Paul E. McKenney <[email protected]> wrote:
> > > [...]
> > > > ------------------------------------------------------------------------
> > > >
> > > > commit 930cb5f711443d8044e88080ee21b0a5edda33bd
> > > > Author: Paul E. McKenney <[email protected]>
> > > > Date: Fri May 10 15:36:57 2024 -0700
> > > >
> > > > kcsan: Add example to data_race() kerneldoc header
> > > >
> > > > Although the data_race() kerneldoc header accurately states what it does,
> > > > some of the implications and usage patterns are non-obvious. Therefore,
> > > > add a brief locking example and also state how to have KCSAN ignore
> > > > accesses while also preventing the compiler from folding, spindling,
> > > > or otherwise mutilating the access.
> > > >
> > > > [ paulmck: Apply Bart Van Assche feedback. ]
> > > >
> > > > Reported-by: Bart Van Assche <[email protected]>
> > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > > Cc: Marco Elver <[email protected]>
> > > > Cc: Breno Leitao <[email protected]>
> > > > Cc: Jens Axboe <[email protected]>
> > > >
> > > > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > > > index c00cc6c0878a1..9249768ec7a26 100644
> > > > --- a/include/linux/compiler.h
> > > > +++ b/include/linux/compiler.h
> > > > @@ -194,9 +194,17 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> > > > * This data_race() macro is useful for situations in which data races
> > > > * should be forgiven. One example is diagnostic code that accesses
> > > > * shared variables but is not a part of the core synchronization design.
> > > > + * For example, if accesses to a given variable are protected by a lock,
> > > > + * except for diagnostic code, then the accesses under the lock should
> > > > + * be plain C-language accesses and those in the diagnostic code should
> > > > + * use data_race(). This way, KCSAN will complain if buggy lockless
> > > > + * accesses to that variable are introduced, even if the buggy accesses
> > > > + * are protected by READ_ONCE() or WRITE_ONCE().
> > > > *
> > > > - * This macro *does not* affect normal code generation, but is a hint
> > > > - * to tooling that data races here are to be ignored.
> > > > + * This macro *does not* affect normal code generation, but is a hint to
> > > > + * tooling that data races here are to be ignored. If code generation must
> > > > + * be protected *and* KCSAN should ignore the access, use both data_race()
> > >
> > > "code generation must be protected" seems ambiguous, because
> > > protecting code generation could mean a lot of different things to
> > > different people.
> > >
> > > The more precise thing would be to write that "If the access must be
> > > atomic *and* KCSAN should ignore the access, use both ...".
> >
> > Good point, and I took your wording, thank you.
> >
> > > I've also had trouble in the past referring to "miscompilation" or
> > > similar, because that also entirely depends on the promised vs.
> > > expected semantics: if the code in question assumes for the access to
> > > be atomic, the compiler compiling the code in a way that the access is
> > > no longer atomic would be a "miscompilation". Although is it still a
> > > "miscompilation" if the compiler generated code according to specified
> > > language semantics (say according to our own LKMM) - and that's where
> > > opinions can diverge because it depends on which side we stand
> > > (compiler vs. our code).
> >
> > Agreed, use of words like "miscompilation" can annoy people. What
> > word would you suggest using instead?
>
> Not sure. As suggested above, I try to just stick to "atomic" vs
> "non-atomic" because that's ultimately the functional end result of
> such a miscompilation. Although I've also had people be confused as in
> "what atomic?! as in atomic RMW?!", but I don't know how to remove
> that kind of confusion.
>
> If, however, our intended model is the LKMM and e.g. a compiler breaks
> a dependency-chain, then we could talk about miscompilation, because
> the compiler violates our desired language semantics. Of course the
> compiler writers then will say that we try to do things that are
> outside any implemented language semantics the compiler is aware of,
> so it's not a miscompilation again. So it all depends on which side
> we're arguing for. Fun times.

;-) ;-) ;-)

> > > > + * and READ_ONCE(), for example, data_race(READ_ONCE(x)).
> > >
> > > Having more documentation sounds good to me, thanks for adding!
> > >
> > > This extra bit of documentation also exists in a longer form in
> > > access-marking.txt, correct? I wonder how it would be possible to
> > > refer to it, in case the reader wants to learn even more.
> >
> > Good point, especially given that I had forgotten about it.
> >
> > I don't have any immediate ideas for calling attention to this file,
> > but would the following update be helpful?
>
> Mentioning __data_racy along with data_race() could be helpful, thank
> you. See comments below.

I did add a mention of it in "Linux-Kernel RCU Shared-Variable Marking"
[1], but just a mention, given that I do not expect that we will use it
within RCU.

> Thanks,
> -- Marco
>
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > diff --git a/tools/memory-model/Documentation/access-marking.txt b/tools/memory-model/Documentation/access-marking.txt
> > index 65778222183e3..690dd59b7ac59 100644
> > --- a/tools/memory-model/Documentation/access-marking.txt
> > +++ b/tools/memory-model/Documentation/access-marking.txt
> > @@ -24,6 +24,12 @@ The Linux kernel provides the following access-marking options:
> > 4. WRITE_ONCE(), for example, "WRITE_ONCE(a, b);"
> > The various forms of atomic_set() also fit in here.
> >
> > +5. ASSERT_EXCLUSIVE_ACCESS() and ASSERT_EXCLUSIVE_WRITER().
>
> Perhaps worth mentioning, but they aren't strictly access-marking
> options. In the interest of simplicity could leave it out.

Interestingly enough, they can be said to be implicitly marking other
concurrent accesses to the variable. ;-)

I believe that the do need to be mentioned more prominently, though.

Maybe a second list following this one? In that case, what do we name
the list? I suppose the other alternative would be to leave them in
this list, and change the preceding sentence to say something like
"The Linux kernel provides the following access-marking-related primitives"

Thoughts?

> > +6. The volatile keyword, for example, "int volatile a;"
>
> See below - I'm not sure we should mention volatile. It may set the
> wrong example.

Good point. I removed this item from this list.

> > +7. __data_racy, for example "int __data_racy a;"
> > +
> >
> > These may be used in combination, as shown in this admittedly improbable
> > example:
> > @@ -205,6 +211,27 @@ because doing otherwise prevents KCSAN from detecting violations of your
> > code's synchronization rules.
> >
> >
> > +Use of volatile and __data_racy
> > +-------------------------------
> > +
> > +Adding the volatile keyword to the declaration of a variable causes both
> > +the compiler and KCSAN to treat all reads from that variable as if they
> > +were protected by READ_ONCE() and all writes to that variable as if they
> > +were protected by WRITE_ONCE().
>
> "volatile" isn't something we encourage, right? In which case, I think
> to avoid confusion we should not mention volatile. After all we have
> this: Documentation/process/volatile-considered-harmful.rst

Good point, I removed this paragraph. But we do sometimes use volatile,
for example for atomic_t and jiffies. Nevertheless, agreed, we don't
want to encourage it and additions of this keyword should be subjected
to serious scrutiny.

> > +Adding the __data_racy type qualifier to the declaration of a variable
> > +causes KCSAN to treat all accesses to that variable as if they were
> > +enclosed by data_race(). However, __data_racy does not affect the
> > +compiler, though one could imagine hardened kernel builds treating the
> > +__data_racy type qualifier as if it was the volatile keyword.
> > +
> > +Note well that __data_racy is subject to the same pointer-declaration
> > +rules as is the volatile keyword. For example:
>
> To avoid referring to volatile could make it more general and say "..
> rules as other type qualifiers like const and volatile".

Very good, thank you! I happily took your wording.

Thanx, Paul

> > + int __data_racy *p; // Pointer to data-racy data.
> > + int *__data_racy p; // Data-racy pointer to non-data-racy data.
> > +
> > +
> > ACCESS-DOCUMENTATION OPTIONS
> > ============================
> >
> > @@ -342,7 +369,7 @@ as follows:
> >
> > Because foo is read locklessly, all accesses are marked. The purpose
> > of the ASSERT_EXCLUSIVE_WRITER() is to allow KCSAN to check for a buggy
> > -concurrent lockless write.
> > +concurrent write, whether marked or not.
> >
> >
> > Lock-Protected Writes With Heuristic Lockless Reads

2024-05-15 17:47:15

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On Wed, 15 May 2024 at 17:57, Paul E. McKenney <[email protected]> wrote:
>
> On Wed, May 15, 2024 at 09:58:35AM +0200, Marco Elver wrote:
> > On Wed, 15 May 2024 at 01:47, Paul E. McKenney <[email protected]> wrote:
> > > On Mon, May 13, 2024 at 10:13:49AM +0200, Marco Elver wrote:
> > > > On Sat, 11 May 2024 at 02:41, Paul E. McKenney <[email protected]> wrote:
> > > > [...]
> > > > > ------------------------------------------------------------------------
> > > > >
> > > > > commit 930cb5f711443d8044e88080ee21b0a5edda33bd
> > > > > Author: Paul E. McKenney <[email protected]>
> > > > > Date: Fri May 10 15:36:57 2024 -0700
> > > > >
> > > > > kcsan: Add example to data_race() kerneldoc header
> > > > >
> > > > > Although the data_race() kerneldoc header accurately states what it does,
> > > > > some of the implications and usage patterns are non-obvious. Therefore,
> > > > > add a brief locking example and also state how to have KCSAN ignore
> > > > > accesses while also preventing the compiler from folding, spindling,
> > > > > or otherwise mutilating the access.
> > > > >
> > > > > [ paulmck: Apply Bart Van Assche feedback. ]
> > > > >
> > > > > Reported-by: Bart Van Assche <[email protected]>
> > > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > > > Cc: Marco Elver <[email protected]>
> > > > > Cc: Breno Leitao <[email protected]>
> > > > > Cc: Jens Axboe <[email protected]>
> > > > >
> > > > > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > > > > index c00cc6c0878a1..9249768ec7a26 100644
> > > > > --- a/include/linux/compiler.h
> > > > > +++ b/include/linux/compiler.h
> > > > > @@ -194,9 +194,17 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> > > > > * This data_race() macro is useful for situations in which data races
> > > > > * should be forgiven. One example is diagnostic code that accesses
> > > > > * shared variables but is not a part of the core synchronization design.
> > > > > + * For example, if accesses to a given variable are protected by a lock,
> > > > > + * except for diagnostic code, then the accesses under the lock should
> > > > > + * be plain C-language accesses and those in the diagnostic code should
> > > > > + * use data_race(). This way, KCSAN will complain if buggy lockless
> > > > > + * accesses to that variable are introduced, even if the buggy accesses
> > > > > + * are protected by READ_ONCE() or WRITE_ONCE().
> > > > > *
> > > > > - * This macro *does not* affect normal code generation, but is a hint
> > > > > - * to tooling that data races here are to be ignored.
> > > > > + * This macro *does not* affect normal code generation, but is a hint to
> > > > > + * tooling that data races here are to be ignored. If code generation must
> > > > > + * be protected *and* KCSAN should ignore the access, use both data_race()
> > > >
> > > > "code generation must be protected" seems ambiguous, because
> > > > protecting code generation could mean a lot of different things to
> > > > different people.
> > > >
> > > > The more precise thing would be to write that "If the access must be
> > > > atomic *and* KCSAN should ignore the access, use both ...".
> > >
> > > Good point, and I took your wording, thank you.
> > >
> > > > I've also had trouble in the past referring to "miscompilation" or
> > > > similar, because that also entirely depends on the promised vs.
> > > > expected semantics: if the code in question assumes for the access to
> > > > be atomic, the compiler compiling the code in a way that the access is
> > > > no longer atomic would be a "miscompilation". Although is it still a
> > > > "miscompilation" if the compiler generated code according to specified
> > > > language semantics (say according to our own LKMM) - and that's where
> > > > opinions can diverge because it depends on which side we stand
> > > > (compiler vs. our code).
> > >
> > > Agreed, use of words like "miscompilation" can annoy people. What
> > > word would you suggest using instead?
> >
> > Not sure. As suggested above, I try to just stick to "atomic" vs
> > "non-atomic" because that's ultimately the functional end result of
> > such a miscompilation. Although I've also had people be confused as in
> > "what atomic?! as in atomic RMW?!", but I don't know how to remove
> > that kind of confusion.
> >
> > If, however, our intended model is the LKMM and e.g. a compiler breaks
> > a dependency-chain, then we could talk about miscompilation, because
> > the compiler violates our desired language semantics. Of course the
> > compiler writers then will say that we try to do things that are
> > outside any implemented language semantics the compiler is aware of,
> > so it's not a miscompilation again. So it all depends on which side
> > we're arguing for. Fun times.
>
> ;-) ;-) ;-)
>
> > > > > + * and READ_ONCE(), for example, data_race(READ_ONCE(x)).
> > > >
> > > > Having more documentation sounds good to me, thanks for adding!
> > > >
> > > > This extra bit of documentation also exists in a longer form in
> > > > access-marking.txt, correct? I wonder how it would be possible to
> > > > refer to it, in case the reader wants to learn even more.
> > >
> > > Good point, especially given that I had forgotten about it.
> > >
> > > I don't have any immediate ideas for calling attention to this file,
> > > but would the following update be helpful?
> >
> > Mentioning __data_racy along with data_race() could be helpful, thank
> > you. See comments below.
>
> I did add a mention of it in "Linux-Kernel RCU Shared-Variable Marking"
> [1], but just a mention, given that I do not expect that we will use it
> within RCU.
>
> > Thanks,
> > -- Marco
> >
> > > Thanx, Paul
> > >
> > > ------------------------------------------------------------------------
> > >
> > > diff --git a/tools/memory-model/Documentation/access-marking.txt b/tools/memory-model/Documentation/access-marking.txt
> > > index 65778222183e3..690dd59b7ac59 100644
> > > --- a/tools/memory-model/Documentation/access-marking.txt
> > > +++ b/tools/memory-model/Documentation/access-marking.txt
> > > @@ -24,6 +24,12 @@ The Linux kernel provides the following access-marking options:
> > > 4. WRITE_ONCE(), for example, "WRITE_ONCE(a, b);"
> > > The various forms of atomic_set() also fit in here.
> > >
> > > +5. ASSERT_EXCLUSIVE_ACCESS() and ASSERT_EXCLUSIVE_WRITER().
> >
> > Perhaps worth mentioning, but they aren't strictly access-marking
> > options. In the interest of simplicity could leave it out.
>
> Interestingly enough, they can be said to be implicitly marking other
> concurrent accesses to the variable. ;-)

The document starts with "guidelines for marking intentionally
concurrent normal accesses to shared memory". The ASSERT_EXCLUSIVE
macros do capture more of the concurrency rules, and perhaps they
could be seen as some kind of "negative marking" where concurrent
access should _not_ happen concurrently with these. But I'm still not
convinced it's the same kind of marking the document introduces.

I always considered them in the realm of general assertions that we
can just use to tell the tool more than can be inferred from the bits
of C code required for the functional implementation of whatever we're
doing.

> I believe that the do need to be mentioned more prominently, though.
>
> Maybe a second list following this one? In that case, what do we name
> the list? I suppose the other alternative would be to leave them in
> this list, and change the preceding sentence to say something like
> "The Linux kernel provides the following access-marking-related primitives"
>
> Thoughts?

And I just checked the current access-marking.txt to see where we
might add more, and found the section "ACCESS-DOCUMENTATION OPTIONS"
already exists. I think that section is perfectly reasonable as is,
and it does explicitly talk about ASSERT_EXCLUSIVE* macros.

Did you want to add it more prominently at the top? If so, maybe a
brief forward-reference to that section might be helpful.

2024-05-15 21:51:14

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On Wed, May 15, 2024 at 07:40:08PM +0200, Marco Elver wrote:
> On Wed, 15 May 2024 at 17:57, Paul E. McKenney <[email protected]> wrote:
> > On Wed, May 15, 2024 at 09:58:35AM +0200, Marco Elver wrote:
> > > On Wed, 15 May 2024 at 01:47, Paul E. McKenney <[email protected]> wrote:
> > > > On Mon, May 13, 2024 at 10:13:49AM +0200, Marco Elver wrote:
> > > > > On Sat, 11 May 2024 at 02:41, Paul E. McKenney <[email protected]> wrote:
> > > > > [...]
> > > > > > ------------------------------------------------------------------------
> > > > > >
> > > > > > commit 930cb5f711443d8044e88080ee21b0a5edda33bd
> > > > > > Author: Paul E. McKenney <[email protected]>
> > > > > > Date: Fri May 10 15:36:57 2024 -0700
> > > > > >
> > > > > > kcsan: Add example to data_race() kerneldoc header
> > > > > >
> > > > > > Although the data_race() kerneldoc header accurately states what it does,
> > > > > > some of the implications and usage patterns are non-obvious. Therefore,
> > > > > > add a brief locking example and also state how to have KCSAN ignore
> > > > > > accesses while also preventing the compiler from folding, spindling,
> > > > > > or otherwise mutilating the access.
> > > > > >
> > > > > > [ paulmck: Apply Bart Van Assche feedback. ]
> > > > > >
> > > > > > Reported-by: Bart Van Assche <[email protected]>
> > > > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > > > > Cc: Marco Elver <[email protected]>
> > > > > > Cc: Breno Leitao <[email protected]>
> > > > > > Cc: Jens Axboe <[email protected]>
> > > > > >
> > > > > > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > > > > > index c00cc6c0878a1..9249768ec7a26 100644
> > > > > > --- a/include/linux/compiler.h
> > > > > > +++ b/include/linux/compiler.h
> > > > > > @@ -194,9 +194,17 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> > > > > > * This data_race() macro is useful for situations in which data races
> > > > > > * should be forgiven. One example is diagnostic code that accesses
> > > > > > * shared variables but is not a part of the core synchronization design.
> > > > > > + * For example, if accesses to a given variable are protected by a lock,
> > > > > > + * except for diagnostic code, then the accesses under the lock should
> > > > > > + * be plain C-language accesses and those in the diagnostic code should
> > > > > > + * use data_race(). This way, KCSAN will complain if buggy lockless
> > > > > > + * accesses to that variable are introduced, even if the buggy accesses
> > > > > > + * are protected by READ_ONCE() or WRITE_ONCE().
> > > > > > *
> > > > > > - * This macro *does not* affect normal code generation, but is a hint
> > > > > > - * to tooling that data races here are to be ignored.
> > > > > > + * This macro *does not* affect normal code generation, but is a hint to
> > > > > > + * tooling that data races here are to be ignored. If code generation must
> > > > > > + * be protected *and* KCSAN should ignore the access, use both data_race()
> > > > >
> > > > > "code generation must be protected" seems ambiguous, because
> > > > > protecting code generation could mean a lot of different things to
> > > > > different people.
> > > > >
> > > > > The more precise thing would be to write that "If the access must be
> > > > > atomic *and* KCSAN should ignore the access, use both ...".
> > > >
> > > > Good point, and I took your wording, thank you.
> > > >
> > > > > I've also had trouble in the past referring to "miscompilation" or
> > > > > similar, because that also entirely depends on the promised vs.
> > > > > expected semantics: if the code in question assumes for the access to
> > > > > be atomic, the compiler compiling the code in a way that the access is
> > > > > no longer atomic would be a "miscompilation". Although is it still a
> > > > > "miscompilation" if the compiler generated code according to specified
> > > > > language semantics (say according to our own LKMM) - and that's where
> > > > > opinions can diverge because it depends on which side we stand
> > > > > (compiler vs. our code).
> > > >
> > > > Agreed, use of words like "miscompilation" can annoy people. What
> > > > word would you suggest using instead?
> > >
> > > Not sure. As suggested above, I try to just stick to "atomic" vs
> > > "non-atomic" because that's ultimately the functional end result of
> > > such a miscompilation. Although I've also had people be confused as in
> > > "what atomic?! as in atomic RMW?!", but I don't know how to remove
> > > that kind of confusion.
> > >
> > > If, however, our intended model is the LKMM and e.g. a compiler breaks
> > > a dependency-chain, then we could talk about miscompilation, because
> > > the compiler violates our desired language semantics. Of course the
> > > compiler writers then will say that we try to do things that are
> > > outside any implemented language semantics the compiler is aware of,
> > > so it's not a miscompilation again. So it all depends on which side
> > > we're arguing for. Fun times.
> >
> > ;-) ;-) ;-)
> >
> > > > > > + * and READ_ONCE(), for example, data_race(READ_ONCE(x)).
> > > > >
> > > > > Having more documentation sounds good to me, thanks for adding!
> > > > >
> > > > > This extra bit of documentation also exists in a longer form in
> > > > > access-marking.txt, correct? I wonder how it would be possible to
> > > > > refer to it, in case the reader wants to learn even more.
> > > >
> > > > Good point, especially given that I had forgotten about it.
> > > >
> > > > I don't have any immediate ideas for calling attention to this file,
> > > > but would the following update be helpful?
> > >
> > > Mentioning __data_racy along with data_race() could be helpful, thank
> > > you. See comments below.
> >
> > I did add a mention of it in "Linux-Kernel RCU Shared-Variable Marking"
> > [1], but just a mention, given that I do not expect that we will use it
> > within RCU.
> >
> > > Thanks,
> > > -- Marco
> > >
> > > > Thanx, Paul
> > > >
> > > > ------------------------------------------------------------------------
> > > >
> > > > diff --git a/tools/memory-model/Documentation/access-marking.txt b/tools/memory-model/Documentation/access-marking.txt
> > > > index 65778222183e3..690dd59b7ac59 100644
> > > > --- a/tools/memory-model/Documentation/access-marking.txt
> > > > +++ b/tools/memory-model/Documentation/access-marking.txt
> > > > @@ -24,6 +24,12 @@ The Linux kernel provides the following access-marking options:
> > > > 4. WRITE_ONCE(), for example, "WRITE_ONCE(a, b);"
> > > > The various forms of atomic_set() also fit in here.
> > > >
> > > > +5. ASSERT_EXCLUSIVE_ACCESS() and ASSERT_EXCLUSIVE_WRITER().
> > >
> > > Perhaps worth mentioning, but they aren't strictly access-marking
> > > options. In the interest of simplicity could leave it out.
> >
> > Interestingly enough, they can be said to be implicitly marking other
> > concurrent accesses to the variable. ;-)
>
> The document starts with "guidelines for marking intentionally
> concurrent normal accesses to shared memory". The ASSERT_EXCLUSIVE
> macros do capture more of the concurrency rules, and perhaps they
> could be seen as some kind of "negative marking" where concurrent
> access should _not_ happen concurrently with these. But I'm still not
> convinced it's the same kind of marking the document introduces.
>
> I always considered them in the realm of general assertions that we
> can just use to tell the tool more than can be inferred from the bits
> of C code required for the functional implementation of whatever we're
> doing.
>
> > I believe that the do need to be mentioned more prominently, though.
> >
> > Maybe a second list following this one? In that case, what do we name
> > the list? I suppose the other alternative would be to leave them in
> > this list, and change the preceding sentence to say something like
> > "The Linux kernel provides the following access-marking-related primitives"
> >
> > Thoughts?
>
> And I just checked the current access-marking.txt to see where we
> might add more, and found the section "ACCESS-DOCUMENTATION OPTIONS"
> already exists. I think that section is perfectly reasonable as is,
> and it does explicitly talk about ASSERT_EXCLUSIVE* macros.
>
> Did you want to add it more prominently at the top? If so, maybe a
> brief forward-reference to that section might be helpful.

How about like this?

------------------------------------------------------------------------

The Linux kernel provides the following access-marking options:

1. Plain C-language accesses (unmarked), for example, "a = b;"

2. Data-race marking, for example, "data_race(a = b);"

3. READ_ONCE(), for example, "a = READ_ONCE(b);"
The various forms of atomic_read() also fit in here.

4. WRITE_ONCE(), for example, "WRITE_ONCE(a, b);"
The various forms of atomic_set() also fit in here.

5. __data_racy, for example "int __data_racy a;"

6. KCSAN's negative-marking assertions, ASSERT_EXCLUSIVE_ACCESS()
and ASSERT_EXCLUSIVE_WRITER(), are desccribed in the
"ACCESS-DOCUMENTATION OPTIONS" section below.

------------------------------------------------------------------------

Would that work?

Thanx, Paul

2024-05-16 06:35:49

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On Wed, 15 May 2024 at 23:51, Paul E. McKenney <[email protected]> wrote:
[...]
> > And I just checked the current access-marking.txt to see where we
> > might add more, and found the section "ACCESS-DOCUMENTATION OPTIONS"
> > already exists. I think that section is perfectly reasonable as is,
> > and it does explicitly talk about ASSERT_EXCLUSIVE* macros.
> >
> > Did you want to add it more prominently at the top? If so, maybe a
> > brief forward-reference to that section might be helpful.
>
> How about like this?
>
> ------------------------------------------------------------------------
>
> The Linux kernel provides the following access-marking options:
>
> 1. Plain C-language accesses (unmarked), for example, "a = b;"
>
> 2. Data-race marking, for example, "data_race(a = b);"
>
> 3. READ_ONCE(), for example, "a = READ_ONCE(b);"
> The various forms of atomic_read() also fit in here.
>
> 4. WRITE_ONCE(), for example, "WRITE_ONCE(a, b);"
> The various forms of atomic_set() also fit in here.
>
> 5. __data_racy, for example "int __data_racy a;"
>
> 6. KCSAN's negative-marking assertions, ASSERT_EXCLUSIVE_ACCESS()
> and ASSERT_EXCLUSIVE_WRITER(), are desccribed in the
> "ACCESS-DOCUMENTATION OPTIONS" section below.

s/desccribed/described/

> ------------------------------------------------------------------------
>
> Would that work?

It works for me, if we agree that "negative marking" makes sense: if
the other markings indicate the access is happening concurrently with
others, a negative marking does the opposite.

Thanks,
-- Marco

2024-05-20 18:05:42

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] block: Annotate a racy read in blk_do_io_stat()

On Thu, May 16, 2024 at 08:35:02AM +0200, Marco Elver wrote:
> On Wed, 15 May 2024 at 23:51, Paul E. McKenney <[email protected]> wrote:
> [...]
> > > And I just checked the current access-marking.txt to see where we
> > > might add more, and found the section "ACCESS-DOCUMENTATION OPTIONS"
> > > already exists. I think that section is perfectly reasonable as is,
> > > and it does explicitly talk about ASSERT_EXCLUSIVE* macros.
> > >
> > > Did you want to add it more prominently at the top? If so, maybe a
> > > brief forward-reference to that section might be helpful.
> >
> > How about like this?
> >
> > ------------------------------------------------------------------------
> >
> > The Linux kernel provides the following access-marking options:
> >
> > 1. Plain C-language accesses (unmarked), for example, "a = b;"
> >
> > 2. Data-race marking, for example, "data_race(a = b);"
> >
> > 3. READ_ONCE(), for example, "a = READ_ONCE(b);"
> > The various forms of atomic_read() also fit in here.
> >
> > 4. WRITE_ONCE(), for example, "WRITE_ONCE(a, b);"
> > The various forms of atomic_set() also fit in here.
> >
> > 5. __data_racy, for example "int __data_racy a;"
> >
> > 6. KCSAN's negative-marking assertions, ASSERT_EXCLUSIVE_ACCESS()
> > and ASSERT_EXCLUSIVE_WRITER(), are desccribed in the
> > "ACCESS-DOCUMENTATION OPTIONS" section below.
>
> s/desccribed/described/

Good eyes, fixed!

> > ------------------------------------------------------------------------
> >
> > Would that work?
>
> It works for me, if we agree that "negative marking" makes sense: if
> the other markings indicate the access is happening concurrently with
> others, a negative marking does the opposite.

Very good, I will send this out shortly after v6.10-rc1 comes out and
let's see where the bikeshedding leads. ;-)

Thanx, Paul