2021-05-12 20:52:28

by Manfred Spraul

Subject: ipc/sem, ipc/msg, ipc/mqueue.c kcsan questions

Hi,

I got a report from kcsan for sem_lock()/sem_unlock(), but I'm fairly
certain that this is a false positive:

> [  184.344960] BUG: KCSAN: data-race in sem_lock / sem_unlock.part.0
> [  184.360437]
> [  184.375443] write to 0xffff8881022fd6c0 of 4 bytes by task 1128 on
> cpu 0:
> [  184.391192]  sem_unlock.part.0+0xfa/0x118
0000000000001371 <sem_unlock.part.0>:
static inline void sem_unlock(struct sem_array *sma, int locknum)
    1464:       eb 0f                   jmp    1475 <sem_unlock.part.0+0x104>
                sma->use_global_lock--;
    1466:       e8 00 00 00 00          callq  146b <sem_unlock.part.0+0xfa>
                        1467: R_X86_64_PLT32    __tsan_write4-0x4
    146b:       41 ff cc                dec    %r12d

> [  184.406693]  do_semtimedop+0x690/0xab3
> [  184.422032]  __x64_sys_semop+0x3e/0x43
> [  184.437180]  do_syscall_64+0x9e/0xb5
> [  184.452125]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  184.467269]
> [  184.482215] read to 0xffff8881022fd6c0 of 4 bytes by task 1129 on
> cpu 2:
> [  184.497750]  sem_lock+0x59/0xe0
0000000000001bbc <sem_lock>:
        if (!sma->use_global_lock) {
    1c0a:       4c 89 ef                mov    %r13,%rdi
        idx = array_index_nospec(sops->sem_num, sma->sem_nsems);
    1c0d:       0f b7 db                movzwl %bx,%ebx
        if (!sma->use_global_lock) {
    1c10:       e8 00 00 00 00          callq  1c15 <sem_lock+0x59>
                        1c11: R_X86_64_PLT32    __tsan_read4-0x4

> [  184.513121]  do_semtimedop+0x4f6/0xab3
> [  184.528427]  __x64_sys_semop+0x3e/0x43
> [  184.543540]  do_syscall_64+0x9e/0xb5
> [  184.558473]  entry_SYSCALL_64_after_hwframe+0x44/0xae


sma->use_global_lock is evaluated in sem_lock() twice:

>        /*
>          * Initial check for use_global_lock. Just an optimization,
>          * no locking, no memory barrier.
>          */
>         if (!sma->use_global_lock) {
Both sides of the if-clause handle possible data races.

Is

    if (!data_race(sma->use_global_lock)) {

the correct thing to suppress the warning?

>                 /*
>                  * It appears that no complex operation is around.
>                  * Acquire the per-semaphore lock.
>                  */
>                 spin_lock(&sem->lock);
>
>                 /* see SEM_BARRIER_1 for purpose/pairing */
>                 if (!smp_load_acquire(&sma->use_global_lock)) {
Here I would need advice: the code only checks for zero / non-zero.

This pairs with complexmode_tryleave():

>         if (sma->use_global_lock == 1) {
>
>                 /* See SEM_BARRIER_1 for purpose/pairing */
>                 smp_store_release(&sma->use_global_lock, 0);
>         } else {
>                 sma->use_global_lock--;
>         }

If use_global_lock is reduced from e.g. 6 to 5, it is undefined whether a
concurrent reader sees 6 or 5. But it doesn't matter, as both values are
non-zero.

The change to 0 is protected.

What is the right way to prevent false positives from kcsan?

As a 2nd question:

net/netfilter/nf_conntrack_core.c, nf_conntrack_all_lock():

Is a data_race() needed around "nf_conntrack_locks_all = true;"?
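
For reference, the lockless reader side is nf_conntrack_lock(), which
looks roughly like this (paraphrased sketch, not a verbatim copy):

        void nf_conntrack_lock(spinlock_t *lock)
        {
                spin_lock(lock);

                /* Pairs with smp_store_release() in nf_conntrack_all_unlock() */
                if (likely(smp_load_acquire(&nf_conntrack_locks_all) == false))
                        return;

                /* Slow path: wait for a concurrent nf_conntrack_all_lock() */
                spin_unlock(lock);
                spin_lock(&nf_conntrack_locks_all_lock);
                spin_lock(lock);
                spin_unlock(&nf_conntrack_locks_all_lock);
        }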

--

    Manfred


2021-05-12 20:52:28

by Paul E. McKenney

Subject: Re: ipc/sem, ipc/msg, ipc/mqueue.c kcsan questions

On Wed, May 12, 2021 at 09:58:18PM +0200, Manfred Spraul wrote:
> Hi,
>
> I got a report from kcsan for sem_lock()/sem_unlock(), but I'm fairly
> certain that this is a false positive:
>
> > [  184.344960] BUG: KCSAN: data-race in sem_lock / sem_unlock.part.0
> > [  184.360437]
> > [  184.375443] write to 0xffff8881022fd6c0 of 4 bytes by task 1128 on
> > cpu 0:
> > [  184.391192]  sem_unlock.part.0+0xfa/0x118
> 0000000000001371 <sem_unlock.part.0>:
> static inline void sem_unlock(struct sem_array *sma, int locknum)
>     1464:       eb 0f                   jmp    1475 <sem_unlock.part.0+0x104>
>                 sma->use_global_lock--;
>     1466:       e8 00 00 00 00          callq  146b <sem_unlock.part.0+0xfa>
>                         1467: R_X86_64_PLT32    __tsan_write4-0x4
>     146b:       41 ff cc                dec    %r12d
>
> > [  184.406693]  do_semtimedop+0x690/0xab3
> > [  184.422032]  __x64_sys_semop+0x3e/0x43
> > [  184.437180]  do_syscall_64+0x9e/0xb5
> > [  184.452125]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [  184.467269]
> > [  184.482215] read to 0xffff8881022fd6c0 of 4 bytes by task 1129 on cpu
> > 2:
> > [  184.497750]  sem_lock+0x59/0xe0
> 0000000000001bbc <sem_lock>:
>         if (!sma->use_global_lock) {
>     1c0a:       4c 89 ef                mov    %r13,%rdi
>         idx = array_index_nospec(sops->sem_num, sma->sem_nsems);
>     1c0d:       0f b7 db                movzwl %bx,%ebx
>         if (!sma->use_global_lock) {
>     1c10:       e8 00 00 00 00          callq  1c15 <sem_lock+0x59>
>                         1c11: R_X86_64_PLT32    __tsan_read4-0x4
>
> > [? 184.513121]? do_semtimedop+0x4f6/0xab3
> > [? 184.528427]? __x64_sys_semop+0x3e/0x43
> > [? 184.543540]? do_syscall_64+0x9e/0xb5
> > [? 184.558473]? entry_SYSCALL_64_after_hwframe+0x44/0xae
>
>
> sma->use_global_lock is evaluated in sem_lock() twice:
>
> >        /*
> >          * Initial check for use_global_lock. Just an optimization,
> >          * no locking, no memory barrier.
> >          */
> >         if (!sma->use_global_lock) {
> Both sides of the if-clause handle possible data races.
>
> Is
>
>     if (!data_race(sma->use_global_lock)) {
>
> the correct thing to suppress the warning?

Most likely READ_ONCE() rather than data_race(), but please see
the end of this message.

> >                 /*
> >                  * It appears that no complex operation is around.
> >                  * Acquire the per-semaphore lock.
> >                  */
> >                 spin_lock(&sem->lock);
> >
> >                 /* see SEM_BARRIER_1 for purpose/pairing */
> >                 if (!smp_load_acquire(&sma->use_global_lock)) {
> Here I would need advice: the code only checks for zero / non-zero.

The smp_load_acquire() is just fine.

> This pairs with complexmode_tryleave():
>
> >         if (sma->use_global_lock == 1) {
> >
> >                 /* See SEM_BARRIER_1 for purpose/pairing */
> >                 smp_store_release(&sma->use_global_lock, 0);
> >         } else {
> >                 sma->use_global_lock--;
> >         }
>
> If use_global_lock is reduced from e.g. 6 to 5, it is undefined whether a
> concurrent reader sees 6 or 5. But it doesn't matter, as both values are
> non-zero.
>
> The change to 0 is protected.

Again, most likely a READ_ONCE() for sma->use_global_lock, but again
please see the end of this message.

The key point is that adding (or avoiding) markings is not a mechanical
process.

> What is the right way to prevent false positives from kcsan?
>
> As a 2nd question:
>
> net/netfilter/nf_conntrack_core.c, nf_conntrack_all_lock():
>
> Is a data_race() needed around "nf_conntrack_locks_all = true;"?

Interesting code. The nf_conntrack_all_lock() function acquires
nf_conntrack_locks_all_lock, except that the smp_load_acquire() of
nf_conntrack_locks_all in nf_conntrack_lock() might be protected by any
of a number of locks.

In contrast, it appears that the smp_store_release()
in nf_conntrack_all_unlock() is always protected by
nf_conntrack_locks_all_lock.

Is the fact that nf_conntrack_all_lock()'s store can run concurrently
with nf_conntrack_lock() smp_load_acquire() intentional? If not, then
KCSAN is letting you know of a bug. Otherwise, WRITE_ONCE() might
be helpful, but I don't know this code, so that is just a guess.

Does tools/memory-model/Documentation/access-marking.txt, shown below,
help?

Thanx, Paul

------------------------------------------------------------------------

MARKING SHARED-MEMORY ACCESSES
==============================

This document provides guidelines for marking intentionally concurrent
normal accesses to shared memory, that is "normal" as in accesses that do
not use read-modify-write atomic operations. It also describes how to
document these accesses, both with comments and with special assertions
processed by the Kernel Concurrency Sanitizer (KCSAN). This discussion
builds on an earlier LWN article [1].


ACCESS-MARKING OPTIONS
======================

The Linux kernel provides the following access-marking options:

1. Plain C-language accesses (unmarked), for example, "a = b;"

2. Data-race marking, for example, "data_race(a = b);"

3. READ_ONCE(), for example, "a = READ_ONCE(b);"
The various forms of atomic_read() also fit in here.

4. WRITE_ONCE(), for example, "WRITE_ONCE(a, b);"
The various forms of atomic_set() also fit in here.


These may be used in combination, as shown in this admittedly improbable
example:

WRITE_ONCE(a, b + data_race(c + d) + READ_ONCE(e));

Neither plain C-language accesses nor data_race() (#1 and #2 above) place
any sort of constraint on the compiler's choice of optimizations [2].
In contrast, READ_ONCE() and WRITE_ONCE() (#3 and #4 above) restrict the
compiler's use of code-motion and common-subexpression optimizations.
Therefore, if a given access is involved in an intentional data race,
using READ_ONCE() for loads and WRITE_ONCE() for stores is usually
preferable to data_race(), which in turn is usually preferable to plain
C-language accesses.

KCSAN will complain about many types of data races involving plain
C-language accesses, but marking all accesses involved in a given data
race with one of data_race(), READ_ONCE(), or WRITE_ONCE(), will prevent
KCSAN from complaining. Of course, lack of KCSAN complaints does not
imply correct code. Therefore, please take a thoughtful approach
when responding to KCSAN complaints. Churning the code base with
ill-considered additions of data_race(), READ_ONCE(), and WRITE_ONCE()
is unhelpful.

In fact, the following sections describe situations where use of
data_race() and even plain C-language accesses is preferable to
READ_ONCE() and WRITE_ONCE().


Use of the data_race() Macro
----------------------------

Here are some situations where data_race() should be used instead of
READ_ONCE() and WRITE_ONCE():

1. Data-racy loads from shared variables whose values are used only
for diagnostic purposes.

2. Data-racy reads whose values are checked against marked reload.

3. Reads whose values feed into error-tolerant heuristics.

4. Writes setting values that feed into error-tolerant heuristics.


Data-Racy Reads for Approximate Diagnostics

Approximate diagnostics include lockdep reports, monitoring/statistics
(including /proc and /sys output), WARN*()/BUG*() checks whose return
values are ignored, and other situations where reads from shared variables
are not an integral part of the core concurrency design.

In fact, use of data_race() instead of READ_ONCE() for these diagnostic
reads can enable better checking of the remaining accesses implementing
the core concurrency design. For example, suppose that the core design
prevents any non-diagnostic reads from shared variable x from running
concurrently with updates to x. Then using plain C-language writes
to x allows KCSAN to detect reads from x from within regions of code
that fail to exclude the updates. In this case, it is important to use
data_race() for the diagnostic reads because otherwise KCSAN would give
false-positive warnings about these diagnostic reads.

In theory, plain C-language loads can also be used for this use case.
However, in practice this will have the disadvantage of causing KCSAN
to generate false positives because KCSAN will have no way of knowing
that the resulting data race was intentional.


Data-Racy Reads That Are Checked Against Marked Reload

The values from some reads are not implicitly trusted. They are instead
fed into some operation that checks the full value against a later marked
load from memory, which means that the occasional arbitrarily bogus value
is not a problem. For example, if a bogus value is fed into cmpxchg(),
all that happens is that this cmpxchg() fails, which normally results
in a retry. Unless the race condition that resulted in the bogus value
recurs, this retry will with high probability succeed, so no harm done.

However, please keep in mind that a data_race() load feeding into
a cmpxchg_relaxed() might still be subject to load fusing on some
architectures. Therefore, it is best to capture the return value from
the failing cmpxchg() for the next iteration of the loop, an approach
that provides the compiler much less scope for mischievous optimizations.
Capturing the return value from cmpxchg() also saves a memory reference
in many cases.
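
For example, a minimal sketch of this capture pattern might look as
follows (foo and add_to_foo() are illustrative names; a fuller version
appears in the "Checking Stress-Test Race Coverage" section below):

        int foo;

        int add_to_foo(int i)
        {
                int old;
                int newold = data_race(foo);    /* Checked by cmpxchg(). */

                do {
                        old = newold;
                        /* Capture the return value for the next pass. */
                        newold = cmpxchg(&foo, old, old + i);
                } while (newold != old);
                return old;
        }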

In theory, plain C-language loads can also be used for this use case.
However, in practice this will have the disadvantage of causing KCSAN
to generate false positives because KCSAN will have no way of knowing
that the resulting data race was intentional.


Reads Feeding Into Error-Tolerant Heuristics

Values from some reads feed into heuristics that can tolerate occasional
errors. Such reads can use data_race(), thus allowing KCSAN to focus on
the other accesses to the relevant shared variables. But please note
that data_race() loads are subject to load fusing, which can result in
consistent errors, which in turn are quite capable of breaking heuristics.
Therefore use of data_race() should be limited to cases where some other
code (such as a barrier() call) will force the occasional reload.
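
For example, here is a minimal sketch, assuming that foo is written only
while holding foo_lock and that its lockless value is used only as a hint:

        int foo;
        DEFINE_SPINLOCK(foo_lock);

        void maybe_process_foo(void)
        {
                /* Heuristic: a stale value costs only an extra lock round trip. */
                if (!data_race(foo))
                        return;
                spin_lock(&foo_lock);   /* Implies barrier(), forcing a reload. */
                if (foo)                /* Recheck under the lock. */
                        do_something(foo);
                spin_unlock(&foo_lock);
        }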

In theory, plain C-language loads can also be used for this use case.
However, in practice this will have the disadvantage of causing KCSAN
to generate false positives because KCSAN will have no way of knowing
that the resulting data race was intentional.


Writes Setting Values Feeding Into Error-Tolerant Heuristics

The values read into error-tolerant heuristics come from somewhere,
for example, from sysfs. This means that some code in sysfs writes
to this same variable, and these writes can also use data_race().
After all, if the heuristic can tolerate the occasional bogus value
due to compiler-mangled reads, it can also tolerate the occasional
compiler-mangled write, at least assuming that the proper value is in
place once the write completes.
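
For example, a minimal sketch, assuming a hypothetical tuning knob
foo_threshold whose heuristic readers use data_race() as described above:

        int foo_threshold = 10;

        /* Hypothetical sysfs-style store handler. */
        void set_foo_threshold(int val)
        {
                /* Heuristic readers tolerate a transiently mangled value. */
                data_race(foo_threshold = val);
        }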

Plain C-language stores can also be used for this use case. However,
in kernels built with CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n, this
will have the disadvantage of causing KCSAN to generate false positives
because KCSAN will have no way of knowing that the resulting data race
was intentional.


Use of Plain C-Language Accesses
--------------------------------

Here are some example situations where plain C-language accesses should
be used instead of READ_ONCE(), WRITE_ONCE(), and data_race():

1. Accesses protected by mutual exclusion, including strict locking
and sequence locking.

2. Initialization-time and cleanup-time accesses. This covers a
wide variety of situations, including the uniprocessor phase of
system boot, variables to be used by not-yet-spawned kthreads,
structures not yet published to reference-counted or RCU-protected
data structures, and the cleanup side of any of these situations.

3. Per-CPU variables that are not accessed from other CPUs.

4. Private per-task variables, including on-stack variables, some
fields in the task_struct structure, and task-private heap data.

5. Any other loads for which there is not supposed to be a concurrent
store to that same variable.

6. Any other stores for which there should be neither concurrent
loads nor concurrent stores to that same variable.

But note that KCSAN makes three explicit exceptions to this rule
by default, refraining from flagging plain C-language stores:

a. No matter what. You can override this default by building
with CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n.

b. When the store writes the value already contained in
that variable. You can override this default by building
with CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n.

c. When one of the stores is in an interrupt handler and
the other in the interrupted code. You can override this
default by building with CONFIG_KCSAN_INTERRUPT_WATCHER=y.

Note that it is important to use plain C-language accesses in these cases,
because doing otherwise prevents KCSAN from detecting violations of your
code's synchronization rules.


ACCESS-DOCUMENTATION OPTIONS
============================

It is important to comment marked accesses so that people reading your
code, yourself included, are reminded of the synchronization design.
However, it is even more important to comment plain C-language accesses
that are intentionally involved in data races. Such comments are
needed to remind people reading your code, again, yourself included,
of how the compiler has been prevented from optimizing those accesses
into concurrency bugs.

It is also possible to tell KCSAN about your synchronization design.
For example, ASSERT_EXCLUSIVE_ACCESS(foo) tells KCSAN that any
concurrent access to variable foo by any other CPU is an error, even
if that concurrent access is marked with READ_ONCE(). In addition,
ASSERT_EXCLUSIVE_WRITER(foo) tells KCSAN that although it is OK for there
to be concurrent reads from foo from other CPUs, it is an error for some
other CPU to be concurrently writing to foo, even if that concurrent
write is marked with data_race() or WRITE_ONCE().

Note that although KCSAN will call out data races involving either
ASSERT_EXCLUSIVE_ACCESS() or ASSERT_EXCLUSIVE_WRITER() on the one hand
and data_race() writes on the other, KCSAN will not report the location
of these data_race() writes.
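
For example, a minimal sketch using foo and foo_lock as in the
"Lock-Protected Writes With Lockless Reads" example below:

        void update_foo(int newval)
        {
                spin_lock(&foo_lock);
                WRITE_ONCE(foo, newval);        /* Concurrent marked reads are OK, */
                ASSERT_EXCLUSIVE_WRITER(foo);   /* but any concurrent write is an error. */
                do_something(newval);
                spin_unlock(&foo_lock);
        }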


EXAMPLES
========

As noted earlier, the goal is to prevent the compiler from destroying
your concurrent algorithm, to help the human reader, and to inform
KCSAN of aspects of your concurrency design. This section looks at a
few examples showing how this can be done.


Lock Protection With Lockless Diagnostic Access
-----------------------------------------------

For example, suppose a shared variable "foo" is read only while a
reader-writer spinlock is read-held, written only while that same
spinlock is write-held, except that it is also read locklessly for
diagnostic purposes. The code might look as follows:

int foo;
DEFINE_RWLOCK(foo_rwlock);

void update_foo(int newval)
{
        write_lock(&foo_rwlock);
        foo = newval;
        do_something(newval);
        write_unlock(&foo_rwlock);
}

int read_foo(void)
{
        int ret;

        read_lock(&foo_rwlock);
        do_something_else();
        ret = foo;
        read_unlock(&foo_rwlock);
        return ret;
}

int read_foo_diagnostic(void)
{
        return data_race(foo);
}

The reader-writer lock prevents the compiler from introducing concurrency
bugs into any part of the main algorithm using foo, which means that
the accesses to foo within both update_foo() and read_foo() can (and
should) be plain C-language accesses. One benefit of making them be
plain C-language accesses is that KCSAN can detect any erroneous lockless
reads from or updates to foo. The data_race() in read_foo_diagnostic()
tells KCSAN that data races are expected, and should be silently
ignored. This data_race() also tells the human reading the code that
read_foo_diagnostic() might sometimes return a bogus value.

However, please note that your kernel must be built with
CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n in order for KCSAN to
detect a buggy lockless write. If you need KCSAN to detect such a
write even if that write did not change the value of foo, you also
need CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n. If you need KCSAN to
detect such a write happening in an interrupt handler running on the
same CPU doing the legitimate lock-protected write, you also need
CONFIG_KCSAN_INTERRUPT_WATCHER=y. With some or all of these Kconfig
options set properly, KCSAN can be quite helpful, although it is not
necessarily a full replacement for hardware watchpoints. On the other
hand, neither are hardware watchpoints a full replacement for KCSAN
because it is not always easy to tell a hardware watchpoint to
conditionally trap on accesses.


Lock-Protected Writes With Lockless Reads
-----------------------------------------

For another example, suppose a shared variable "foo" is updated only
while holding a spinlock, but is read locklessly. The code might look
as follows:

int foo;
DEFINE_SPINLOCK(foo_lock);

void update_foo(int newval)
{
        spin_lock(&foo_lock);
        WRITE_ONCE(foo, newval);
        ASSERT_EXCLUSIVE_WRITER(foo);
        do_something(newval);
        spin_unlock(&foo_lock);
}

int read_foo(void)
{
        do_something_else();
        return READ_ONCE(foo);
}

Because foo is read locklessly, all accesses are marked. The purpose
of the ASSERT_EXCLUSIVE_WRITER() is to allow KCSAN to check for a buggy
concurrent lockless write.


Lockless Reads and Writes
-------------------------

For another example, suppose a shared variable "foo" is both read and
updated locklessly. The code might look as follows:

int foo;

int update_foo(int newval)
{
        int ret;

        ret = xchg(&foo, newval);
        do_something(newval);
        return ret;
}

int read_foo(void)
{
        do_something_else();
        return READ_ONCE(foo);
}

Because foo is accessed locklessly, all accesses are marked. It does
not make sense to use ASSERT_EXCLUSIVE_WRITER() in this case because
there really can be concurrent lockless writers. KCSAN would
flag any concurrent plain C-language reads from foo, and given
CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n, also any concurrent plain
C-language writes to foo.


Lockless Reads and Writes, But With Single-Threaded Initialization
------------------------------------------------------------------

For yet another example, suppose that foo is initialized in a
single-threaded manner, but that a number of kthreads are then created
that locklessly and concurrently access foo. Some snippets of this code
might look as follows:

int foo;

void initialize_foo(int initval, int nkthreads)
{
        int i;

        foo = initval;
        ASSERT_EXCLUSIVE_ACCESS(foo);
        for (i = 0; i < nkthreads; i++)
                kthread_run(access_foo_concurrently, ...);
}

/* Called from access_foo_concurrently(). */
int update_foo(int newval)
{
        int ret;

        ret = xchg(&foo, newval);
        do_something(newval);
        return ret;
}

/* Also called from access_foo_concurrently(). */
int read_foo(void)
{
        do_something_else();
        return READ_ONCE(foo);
}

The initialize_foo() uses a plain C-language write to foo because there
are not supposed to be concurrent accesses during initialization. The
ASSERT_EXCLUSIVE_ACCESS() allows KCSAN to flag buggy concurrent unmarked
reads, and the ASSERT_EXCLUSIVE_ACCESS() call further allows KCSAN to
flag buggy concurrent writes, even if: (1) Those writes are marked or
(2) The kernel was built with CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=y.


Checking Stress-Test Race Coverage
----------------------------------

When designing stress tests it is important to ensure that race conditions
of interest really do occur. For example, consider the following code
fragment:

int foo;

int update_foo(int newval)
{
        return xchg(&foo, newval);
}

int xor_shift_foo(int shift, int mask)
{
        int old, new, newold;

        newold = data_race(foo); /* Checked by cmpxchg(). */
        do {
                old = newold;
                new = (old << shift) ^ mask;
                newold = cmpxchg(&foo, old, new);
        } while (newold != old);
        return old;
}

int read_foo(void)
{
        return READ_ONCE(foo);
}

If it is possible for update_foo(), xor_shift_foo(), and read_foo() to be
invoked concurrently, the stress test should force this concurrency to
actually happen. KCSAN can evaluate the stress test when the above code
is modified to read as follows:

int foo;

int update_foo(int newval)
{
        ASSERT_EXCLUSIVE_ACCESS(foo);
        return xchg(&foo, newval);
}

int xor_shift_foo(int shift, int mask)
{
        int old, new, newold;

        newold = data_race(foo); /* Checked by cmpxchg(). */
        do {
                old = newold;
                new = (old << shift) ^ mask;
                ASSERT_EXCLUSIVE_ACCESS(foo);
                newold = cmpxchg(&foo, old, new);
        } while (newold != old);
        return old;
}

int read_foo(void)
{
        ASSERT_EXCLUSIVE_ACCESS(foo);
        return READ_ONCE(foo);
}

If a given stress-test run does not result in KCSAN complaints from
each possible pair of ASSERT_EXCLUSIVE_ACCESS() invocations, the
stress test needs improvement. If the stress test was to be evaluated
on a regular basis, it would be wise to place the above instances of
ASSERT_EXCLUSIVE_ACCESS() under #ifdef so that they did not result in
false positives when not evaluating the stress test.


REFERENCES
==========

[1] "Concurrency bugs should fear the big bad data-race detector (part 2)"
https://lwn.net/Articles/816854/

[2] "Who's afraid of a big bad optimizing compiler?"
https://lwn.net/Articles/793253/

2021-05-13 06:14:31

by Manfred Spraul

Subject: Re: ipc/sem, ipc/msg, ipc/mqueue.c kcsan questions

Hi Paul,

On 5/12/21 10:17 PM, Paul E. McKenney wrote:
> On Wed, May 12, 2021 at 09:58:18PM +0200, Manfred Spraul wrote:
>> [...]
>> sma->use_global_lock is evaluated in sem_lock() twice:
>>
>>>        /*
>>>          * Initial check for use_global_lock. Just an optimization,
>>>          * no locking, no memory barrier.
>>>          */
>>>         if (!sma->use_global_lock) {
>> Both sides of the if-clause handle possible data races.
>>
>> Is
>>
>>     if (!data_race(sma->use_global_lock)) {
>>
>> the correct thing to suppress the warning?
> Most likely READ_ONCE() rather than data_race(), but please see
> the end of this message.

Based on the document, I would say data_race() is sufficient:

I have replaced the code with "if (jiffies %2)", and it runs fine.

Thus I don't see which evil things a compiler could do, ... .

[...]

> Does tools/memory-model/Documentation/access-marking.txt, shown below,
> help?
>
[...]
> int foo;
> DEFINE_RWLOCK(foo_rwlock);
>
> void update_foo(int newval)
> {
>         write_lock(&foo_rwlock);
>         foo = newval;
>         do_something(newval);
>         write_unlock(&foo_rwlock);
> }
>
> int read_foo(void)
> {
>         int ret;
>
>         read_lock(&foo_rwlock);
>         do_something_else();
>         ret = foo;
>         read_unlock(&foo_rwlock);
>         return ret;
> }
>
> int read_foo_diagnostic(void)
> {
>         return data_race(foo);
> }

The text didn't help, the example has helped:

It was not clear to me if I have to use data_race() both on the read and
the write side, or only on one side.

Based on this example: plain C may be paired with data_race(), there is
no need to mark both sides.


Attached is a dummy change to ipc/sem.c, where I have added comments to
every access.

If data_race() is sufficient, then I think I have understood the rules,
and I would recheck ipc/*.c and the netfilter code.


--

    Manfred



Attachments:
ipc-sem-dummy-change (3.93 kB)

2021-05-14 01:06:44

by Paul E. McKenney

Subject: Re: ipc/sem, ipc/msg, ipc/mqueue.c kcsan questions

On Thu, May 13, 2021 at 12:02:01PM -0700, Paul E. McKenney wrote:
> On Thu, May 13, 2021 at 08:10:51AM +0200, Manfred Spraul wrote:
> > Hi Paul,
> >
> > On 5/12/21 10:17 PM, Paul E. McKenney wrote:
> > > On Wed, May 12, 2021 at 09:58:18PM +0200, Manfred Spraul wrote:
> > > > [...]
> > > > sma->use_global_lock is evaluated in sem_lock() twice:
> > > >
> > > > >        /*
> > > > >          * Initial check for use_global_lock. Just an optimization,
> > > > >          * no locking, no memory barrier.
> > > > >          */
> > > > >         if (!sma->use_global_lock) {
> > > > Both sides of the if-clause handle possible data races.
> > > >
> > > > Is
> > > >
> > > >     if (!data_race(sma->use_global_lock)) {
> > > >
> > > > the correct thing to suppress the warning?
> > > Most likely READ_ONCE() rather than data_race(), but please see
> > > the end of this message.
> >
> > Based on the document, I would say data_race() is sufficient:
> >
> > I have replaced the code with "if (jiffies %2)", and it runs fine.
>
> OK, but please note that "jiffies" is marked volatile, which prevents the
> compiler from fusing loads. You just happen to be OK in this particular
> case, as described below. Use of the "jiffies_64" non-volatile synonym
> for "jiffies" is better for this sort of checking. But even so, just
> because a particular version of a particular compiler refrains from
> fusing loads in a particular situation does not mean that all future
> versions of all future compilers will behave so nicely.
>
> Again, you are OK in this particular situation, as described below.
>
> > Thus I don't see which evil things a compiler could do, ... .
>
> Fair enough, and your example is covered by the section "Reads Feeding
> Into Error-Tolerant Heuristics". The worst that the compiler can do is
> to force an unnecessary acquisition of the global lock.
>
> This cannot cause incorrect execution, but could result in poor
> scalability. This could be a problem if load fusing were possible, that
> is, if successive calls to this function were inlined and the compiler
> just reused the value initially loaded.
>
> The reason that load fusing cannot happen in this case is that the
> load is immediately followed by a lock acquisition, which implies a
> barrier(), which prevents the compiler from fusing loads on opposite
> sides of that barrier().
>
> > [...]
> >
> > > Does tools/memory-model/Documentation/access-marking.txt, shown below,
> > > help?
> > >
> > [...]
> > > int foo;
> > > DEFINE_RWLOCK(foo_rwlock);
> > >
> > > void update_foo(int newval)
> > > {
> > >         write_lock(&foo_rwlock);
> > >         foo = newval;
> > >         do_something(newval);
> > >         write_unlock(&foo_rwlock);
> > > }
> > >
> > > int read_foo(void)
> > > {
> > >         int ret;
> > >
> > >         read_lock(&foo_rwlock);
> > >         do_something_else();
> > >         ret = foo;
> > >         read_unlock(&foo_rwlock);
> > >         return ret;
> > > }
> > >
> > > int read_foo_diagnostic(void)
> > > {
> > >         return data_race(foo);
> > > }
> >
> > The text didn't help, the example has helped:
> >
> > It was not clear to me if I have to use data_race() both on the read and the
> > write side, or only on one side.
> >
> > Based on this example: plain C may be paired with data_race(), there is no
> > need to mark both sides.
>
> Actually, you just demonstrated that this example is quite misleading.
> That data_race() works only because the read is for diagnostic
> purposes. I am queuing a commit with your Reported-by that makes
> read_foo_diagnostic() just do a pr_info(), like this:
>
> void read_foo_diagnostic(void)
> {
>         pr_info("Current value of foo: %d\n", data_race(foo));
> }
>
> So thank you for that!

And please see below for an example better illustrating your use case.
Anything messed up or missing?

Thanx, Paul

------------------------------------------------------------------------

commit b4287410ee93109501defc4695ccc29144e8f3a3
Author: Paul E. McKenney <[email protected]>
Date: Thu May 13 14:54:58 2021 -0700

tools/memory-model: Add example for heuristic lockless reads

This commit adds example code for heuristic lockless reads, based loosely
on the sem_lock() and sem_unlock() functions.

Reported-by: Manfred Spraul <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/tools/memory-model/Documentation/access-marking.txt b/tools/memory-model/Documentation/access-marking.txt
index 58bff2619876..e4a20ebf565d 100644
--- a/tools/memory-model/Documentation/access-marking.txt
+++ b/tools/memory-model/Documentation/access-marking.txt
@@ -319,6 +319,98 @@ of the ASSERT_EXCLUSIVE_WRITER() is to allow KCSAN to check for a buggy
concurrent lockless write.


+Lock-Protected Writes With Heuristic Lockless Reads
+---------------------------------------------------
+
+For another example, suppose that the code can normally make use of
+a per-data-structure lock, but there are times when a global lock is
+required. These times are indicated via a global flag. The code might
+look as follows, and is based loosely on sem_lock() and sem_unlock():
+
+        bool global_flag;
+        DEFINE_SPINLOCK(global_lock);
+        struct foo {
+                spinlock_t f_lock;
+                int f_data;
+        };
+
+        /* All foo structures are in the following array. */
+        int nfoo;
+        struct foo *foo_array;
+
+        void do_something_locked(struct foo *fp)
+        {
+                /* IMPORTANT: Heuristic plus spin_lock()! */
+                if (!data_race(global_flag)) {
+                        spin_lock(&fp->f_lock);
+                        if (!smp_load_acquire(&global_flag)) {
+                                do_something(fp);
+                                spin_unlock(&fp->f_lock);
+                                return;
+                        }
+                        spin_unlock(&fp->f_lock);
+                }
+                spin_lock(&global_flag);
+                /* Lock held, thus global flag cannot change. */
+                if (!global_flag) {
+                        spin_lock(&fp->f_lock);
+                        spin_unlock(&global_flag);
+                }
+                do_something(fp);
+                if (global_flag)
+                        spin_unlock(&global_flag);
+                else
+                        spin_lock(&fp->f_lock);
+        }
+
+        void begin_global(void)
+        {
+                int i;
+
+                spin_lock(&global_flag);
+                WRITE_ONCE(global_flag, true);
+                for (i = 0; i < nfoo; i++) {
+                        /* Wait for pre-existing local locks. */
+                        spin_lock(&fp->f_lock);
+                        spin_unlock(&fp->f_lock);
+                }
+                spin_unlock(&global_flag);
+        }
+
+        void end_global(void)
+        {
+                spin_lock(&global_flag);
+                smp_store_release(&global_flag, false);
+                /* Pre-existing global lock acquisitions will recheck. */
+                spin_unlock(&global_flag);
+        }
+
+All code paths leading from the do_something_locked() function's first
+read from global_flag acquire a lock, so endless load fusing cannot
+happen.
+
+If the value read from global_flag is true, then global_flag is rechecked
+while holding global_lock, which prevents global_flag from changing.
+If this recheck finds that global_flag is now false, the acquisition
+of ->f_lock prior to the release of global_lock will result in any subsequent
+begin_global() invocation waiting to acquire ->f_lock.
+
+On the other hand, if the value read from global_flag is false, then
+rechecking under ->f_lock combined with synchronization with
+begin_global() guarantees that any erroneous read will cause the
+do_something_locked() function's first do_something() invocation to happen
+before begin_global() returns. The combination of the smp_load_acquire()
+in do_something_locked() and the smp_store_release() in end_global()
+guarantees that either the do_something_locked() function's first
+do_something() invocation happens after the call to end_global() or that
+do_something_locked() acquires global_lock and rechecks under the lock.
+
+For this to work, only those foo structures in foo_array[] may be
+passed to do_something_locked(). The reason for this is that the
+synchronization with begin_global() relies on momentarily locking each
+and every foo structure.
+
+
Lockless Reads and Writes
-------------------------


2021-05-14 05:39:40

by Paul E. McKenney

Subject: Re: ipc/sem, ipc/msg, ipc/mqueue.c kcsan questions

On Thu, May 13, 2021 at 08:10:51AM +0200, Manfred Spraul wrote:
> Hi Paul,
>
> On 5/12/21 10:17 PM, Paul E. McKenney wrote:
> > On Wed, May 12, 2021 at 09:58:18PM +0200, Manfred Spraul wrote:
> > > [...]
> > > sma->use_global_lock is evaluated in sem_lock() twice:
> > >
> > > >        /*
> > > >          * Initial check for use_global_lock. Just an optimization,
> > > >          * no locking, no memory barrier.
> > > >          */
> > > >         if (!sma->use_global_lock) {
> > > Both sides of the if-clause handle possible data races.
> > >
> > > Is
> > >
> > >     if (!data_race(sma->use_global_lock)) {
> > >
> > > the correct thing to suppress the warning?
> > Most likely READ_ONCE() rather than data_race(), but please see
> > the end of this message.
>
> Based on the document, I would say data_race() is sufficient:
>
> I have replaced the code with "if (jiffies %2)", and it runs fine.

OK, but please note that "jiffies" is marked volatile, which prevents the
compiler from fusing loads. You just happen to be OK in this particular
case, as described below. Use of the "jiffies_64" non-volatile synonym
for "jiffies" is better for this sort of checking. But even so, just
because a particular version of a particular compiler refrains from
fusing loads in a particular situation does not mean that all future
versions of all future compilers will behave so nicely.

Again, you are OK in this particular situation, as described below.

> Thus I don't see which evil things a compiler could do, ... .

Fair enough, and your example is covered by the section "Reads Feeding
Into Error-Tolerant Heuristics". The worst that the compiler can do is
to force an unnecessary acquisition of the global lock.

This cannot cause incorrect execution, but could result in poor
scalability. This could be a problem if load fusing were possible, that
is, if successive calls to this function were inlined and the compiler
just reused the value initially loaded.

The reason that load fusing cannot happen in this case is that the
load is immediately followed by a lock acquisition, which implies a
barrier(), which prevents the compiler from fusing loads on opposite
sides of that barrier().
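
In other words, the overall shape is something like this sketch
(illustrative names, not the actual ipc/sem.c code):

        bool flag;
        DEFINE_SPINLOCK(fast_lock);
        DEFINE_SPINLOCK(slow_lock);

        void do_work(void)
        {
                if (!data_race(flag)) {         /* Heuristic; may be stale. */
                        spin_lock(&fast_lock);  /* Implies barrier(), so a */
                        fast_work();            /* later call reloads flag. */
                        spin_unlock(&fast_lock);
                        return;
                }
                spin_lock(&slow_lock);          /* Also implies barrier(). */
                slow_work();
                spin_unlock(&slow_lock);
        }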

> [...]
>
> > Does tools/memory-model/Documentation/access-marking.txt, shown below,
> > help?
> >
> [...]
> > int foo;
> > DEFINE_RWLOCK(foo_rwlock);
> >
> > void update_foo(int newval)
> > {
> >         write_lock(&foo_rwlock);
> >         foo = newval;
> >         do_something(newval);
> >         write_unlock(&foo_rwlock);
> > }
> >
> > int read_foo(void)
> > {
> >         int ret;
> >
> >         read_lock(&foo_rwlock);
> >         do_something_else();
> >         ret = foo;
> >         read_unlock(&foo_rwlock);
> >         return ret;
> > }
> >
> > int read_foo_diagnostic(void)
> > {
> >         return data_race(foo);
> > }
>
> The text didn't help, the example has helped:
>
> It was not clear to me if I have to use data_race() both on the read and the
> write side, or only on one side.
>
> Based on this example: plain C may be paired with data_race(), there is no
> need to mark both sides.

Actually, you just demonstrated that this example is quite misleading.
That data_race() works only because the read is for diagnostic
purposes. I am queuing a commit with your Reported-by that makes
read_foo_diagnostic() just do a pr_info(), like this:

void read_foo_diagnostic(void)
{
        pr_info("Current value of foo: %d\n", data_race(foo));
}

So thank you for that!

> Attached is a dummy change to ipc/sem.c, where I have added comments to
> every access.

Please see below.

> If data_race() is sufficient, then I think I have understood the rules, and
> I would recheck ipc/*.c and the netfilter code.
>
> --
>
>     Manfred
>
>

> diff --git a/ipc/sem.c b/ipc/sem.c
> index bf534c74293e..6026187f79f8 100644
> --- a/ipc/sem.c
> +++ b/ipc/sem.c
> @@ -87,6 +87,7 @@
> #include <linux/sched/wake_q.h>
> #include <linux/nospec.h>
> #include <linux/rhashtable.h>
> +#include <linux/jiffies.h>
>
> #include <linux/uaccess.h>
> #include "util.h"
> @@ -336,20 +337,43 @@ static void complexmode_enter(struct sem_array *sma)
> int i;
> struct sem *sem;
>
> + /* caller owns sem_perm.lock -> plain C access */
> if (sma->use_global_lock > 0) {
> /*
> * We are already in global lock mode.
> * Nothing to do, just reset the
> * counter until we return to simple mode.
> */
> + /* a change from a non-zero value to another
> + * non-zero value. Plain C is sufficient, as all
> + * readers either own sem_perm.lock or are using
> + * data_race() or smp_load_acquire().

This is OK, but only because all of the bits are confined to a byte.
If (say) 0x10000 and 0x0ffff were legal values, then store tearing could
result in a momentary zero when switching between them. There has
been a claim that compilers should not tear stores, but there was
recently one that would do so when storing certain 32-bit constants.
And the standard does not prohibit tearing unmarked loads and stores,
even if all the values are confined to a single byte. (But a compiler
that tore stores bit-at-a-time would be of questionable value on any of
the architectures that the Linux kernel supports.)

> + */
> sma->use_global_lock = USE_GLOBAL_LOCK_HYSTERESIS;
> return;
> }
> + /* Question: This pairs with the smp_load_acquire
> + * in sem_lock(), in a racy way:
> + * The reader in sem_lock() may see the new value
> + * immediately, ...
> + */
> sma->use_global_lock = USE_GLOBAL_LOCK_HYSTERESIS;

In my code, I would make this use WRITE_ONCE(). One fewer set of compiler
tricks to worry about.
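
That is, something like this (sketch):

        WRITE_ONCE(sma->use_global_lock, USE_GLOBAL_LOCK_HYSTERESIS);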

> for (i = 0; i < sma->sem_nsems; i++) {
> sem = &sma->sems[i];
> spin_lock(&sem->lock);
> + /* ..., or much later.
> + * But this is the latest possible time:
> + * sem_lock() owns one of the sem->lock locks
> + * when using smp_load_acquire(). Thus one of the
> + * spin_unlock()s in this loop is the _release for
> + * the plain C write above.
> + * My current understanding: Plain C is correct,
> + * as the reader is either using data_race() or
> + * smp_load_acquire(), or it is a trivial case
> + * of the reader owns sem_perm.lock - and we own
> + * that lock all the time.

Yes, once we release a given sem->lock, any future acquisitions of that
lock must see the new value of sma->use_global_lock. If they get to
their sem->lock before we do, then the above spin_lock() acquisition
will wait for them. This use of locks is an unusual form of RCU. ;-)

(Grace-period latencies might be a bit long for actual RCU here.)

> + */
> spin_unlock(&sem->lock);
> }
> }
> @@ -366,11 +390,21 @@ static void complexmode_tryleave(struct sem_array *sma)
> */
> return;
> }
> + /* sem_perm.lock owned, and all writes to sma->use_global_lock
> + * happen under that lock -> plain C

Other than the smp_store_release()?

> + */
> if (sma->use_global_lock == 1) {
>
> /* See SEM_BARRIER_1 for purpose/pairing */
> smp_store_release(&sma->use_global_lock, 0);
> } else {
> + /* the read side is maked -> plain C.

s/maked/marked/?

> + * Question: Old value 4, new value 3.
> + * If it might happen that the actual
> + * change is 4 -> 0 -> 3 (i.e. first
> + * clear bit 2, then set bits 0&1),
> + * then this would break the algorithm.
> + * Is therefore WRITE_ONCE() required? */
> sma->use_global_lock--;

In my code, I would use WRITE_ONCE() here.
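
For example (sketch; the load and the subtraction still happen under
sem_perm.lock, only the store itself is marked):

        WRITE_ONCE(sma->use_global_lock, sma->use_global_lock - 1);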

> }
> }
> @@ -412,7 +446,20 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
> * Initial check for use_global_lock. Just an optimization,
> * no locking, no memory barrier.
> */
> - if (!sma->use_global_lock) {
> +#if 1
> + /* the code works fine regardless of the returned value
> + * -> data_race().
> + */
> + if (!data_race(sma->use_global_lock)) {
> +#else
> + /* proof of the claim that the code always works:
> + * My benchmarks ran fine with this implementation :-)
> + */
> + if (jiffies%2) {

As noted above, use of jiffies_64 would be more convincing because
jiffies is immune from load fusing and jiffies_64 is not. But this
still does not constitute a proof. You have instead only shown that a
given version of a given compiler does what you want. ;-)

> + pr_info("jiffies mod 2 is 1.\n");
> + } else {
> + pr_info("jiffies mod 2 is 0.\n");
> +#endif
> /*
> * It appears that no complex operation is around.
> * Acquire the per-semaphore lock.
> @@ -420,6 +467,11 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
> spin_lock(&sem->lock);
>
> /* see SEM_BARRIER_1 for purpose/pairing */
> + /* sma->use_global_lock is written to with plain C
> + * within a spinlock protected region (but: another
> + * lock, not the sem->lock that we own). No need
> + * for data_race(), as we use smp_load_acquire().
> + */
> if (!smp_load_acquire(&sma->use_global_lock)) {
> /* fast path successful! */
> return sops->sem_num;
> @@ -430,6 +482,10 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
> /* slow path: acquire the full lock */
> ipc_lock_object(&sma->sem_perm);
>
> + /* Trivial case: All writes to sma->use_global_lock happen under
> + * sma->sem_perm.lock. We own that lock, thus plain C access is
> + * correct.
> + */
> if (sma->use_global_lock == 0) {
> /*
> * The use_global_lock mode ended while we waited for


2021-05-14 13:46:47

by Manfred Spraul

Subject: Re: ipc/sem, ipc/msg, ipc/mqueue.c kcsan questions

On 5/13/21 9:02 PM, Paul E. McKenney wrote:
> On Thu, May 13, 2021 at 08:10:51AM +0200, Manfred Spraul wrote:
>> Hi Paul,
>>
>> On 5/12/21 10:17 PM, Paul E. McKenney wrote:
>> [...]
>>> int foo;
>>> DEFINE_RWLOCK(foo_rwlock);
>>>
>>> void update_foo(int newval)
>>> {
>>>         write_lock(&foo_rwlock);
>>>         foo = newval;
>>>         do_something(newval);
>>>         write_unlock(&foo_rwlock);
>>> }
>>>
>>> int read_foo(void)
>>> {
>>>         int ret;
>>>
>>>         read_lock(&foo_rwlock);
>>>         do_something_else();
>>>         ret = foo;
>>>         read_unlock(&foo_rwlock);
>>>         return ret;
>>> }
>>>
>>> int read_foo_diagnostic(void)
>>> {
>>>         return data_race(foo);
>>> }
>> The text didn't help, the example has helped:
>>
>> It was not clear to me if I have to use data_race() both on the read and the
>> write side, or only on one side.
>>
>> Based on this example: plain C may be paired with data_race(), there is no
>> need to mark both sides.
> Actually, you just demonstrated that this example is quite misleading.
> That data_race() works only because the read is for diagnostic
> purposes. I am queuing a commit with your Reported-by that makes
> read_foo_diagnostic() just do a pr_info(), like this:
>
> void read_foo_diagnostic(void)
> {
>         pr_info("Current value of foo: %d\n", data_race(foo));
> }
>
> So thank you for that!

I would not like this change at all.
Assume you chase a rare bug, and notice an odd pr_info() output.
It will take you a really long time to figure out that a data_race()
misled you.
Thus for a pr_info(), I would consider READ_ONCE() as the correct thing.

What about something like the attached change?

--

    Manfred



Attachments:
access-marking.txt (3.12 kB)

2021-05-14 19:35:23

by Manfred Spraul

Subject: Re: ipc/sem, ipc/msg, ipc/mqueue.c kcsan questions

Hi Paul,

On 5/14/21 12:01 AM, Paul E. McKenney wrote:
> On Thu, May 13, 2021 at 12:02:01PM -0700, Paul E. McKenney wrote:
>> On Thu, May 13, 2021 at 08:10:51AM +0200, Manfred Spraul wrote:
>>> Hi Paul,
>>>
>>> On 5/12/21 10:17 PM, Paul E. McKenney wrote:
>>>> On Wed, May 12, 2021 at 09:58:18PM +0200, Manfred Spraul wrote:
>>>>> [...]
>>>>> sma->use_global_lock is evaluated in sem_lock() twice:
>>>>>
>>>>>>        /*
>>>>>>          * Initial check for use_global_lock. Just an optimization,
>>>>>>          * no locking, no memory barrier.
>>>>>>          */
>>>>>>         if (!sma->use_global_lock) {
>>>>> Both sides of the if-clause handle possible data races.
>>>>>
>>>>> Is
>>>>>
>>>>>     if (!data_race(sma->use_global_lock)) {
>>>>>
>>>>> the correct thing to suppress the warning?
>>>> Most likely READ_ONCE() rather than data_race(), but please see
>>>> the end of this message.
>>> Based on the document, I would say data_race() is sufficient:
>>>
>>> I have replaced the code with "if (jiffies %2)", and it runs fine.
>> OK, but please note that "jiffies" is marked volatile, which prevents the
>> compiler from fusing loads. You just happen to be OK in this particular
>> case, as described below. Use of the "jiffies_64" non-volatile synonym
>> for "jiffies" is better for this sort of checking. But even so, just
>> because a particular version of a particular compiler refrains from
>> fusing loads in a particular situation does not mean that all future
>> versions of all future compilers will behave so nicely.
>>
>> Again, you are OK in this particular situation, as described below.
>>
>>> Thus I don't see which evil things a compiler could do, ... .
>> Fair enough, and your example is covered by the section "Reads Feeding
>> Into Error-Tolerant Heuristics". The worst that the compiler can do is
>> to force an unnecessary acquisition of the global lock.
>>
>> This cannot cause incorrect execution, but could result in poor
>> scalability. This could be a problem if load fusing were possible, that
>> is, if successive calls to this function were inlined and the compiler
>> just reused the value initially loaded.
>>
>> The reason that load fusing cannot happen in this case is that the
>> load is immediately followed by a lock acquisition, which implies a
>> barrier(), which prevents the compiler from fusing loads on opposite
>> sides of that barrier().
>>
>>> [...]
>>>
>>>> Does tools/memory-model/Documentation/access-marking.txt, shown below,
>>>> help?
>>>>
>>> [...]
>>>> int foo;
>>>> DEFINE_RWLOCK(foo_rwlock);
>>>>
>>>> void update_foo(int newval)
>>>> {
>>>>         write_lock(&foo_rwlock);
>>>>         foo = newval;
>>>>         do_something(newval);
>>>>         write_unlock(&foo_rwlock);
>>>> }
>>>>
>>>> int read_foo(void)
>>>> {
>>>>         int ret;
>>>>
>>>>         read_lock(&foo_rwlock);
>>>>         do_something_else();
>>>>         ret = foo;
>>>>         read_unlock(&foo_rwlock);
>>>>         return ret;
>>>> }
>>>>
>>>> int read_foo_diagnostic(void)
>>>> {
>>>>         return data_race(foo);
>>>> }
>>> The text didn't help, the example has helped:
>>>
>>> It was not clear to me if I have to use data_race() both on the read and the
>>> write side, or only on one side.
>>>
>>> Based on this example: plain C may be paired with data_race(), there is no
>>> need to mark both sides.
>> Actually, you just demonstrated that this example is quite misleading.
>> That data_race() works only because the read is for diagnostic
>> purposes. I am queuing a commit with your Reported-by that makes
>> read_foo_diagnostic() just do a pr_info(), like this:
>>
>> void read_foo_diagnostic(void)
>> {
>>         pr_info("Current value of foo: %d\n", data_race(foo));
>> }
>>
>> So thank you for that!
> And please see below for an example better illustrating your use case.
> Anything messed up or missing?
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit b4287410ee93109501defc4695ccc29144e8f3a3
> Author: Paul E. McKenney <[email protected]>
> Date: Thu May 13 14:54:58 2021 -0700
>
> tools/memory-model: Add example for heuristic lockless reads
>
> This commit adds example code for heuristic lockless reads, based loosely
> on the sem_lock() and sem_unlock() functions.

I would refer to nf_conntrack_all_lock() instead of sem_lock():

nf_conntrack_all_lock() is far easier to read, and it contains the same
heuristics.

>
> Reported-by: Manfred Spraul <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/tools/memory-model/Documentation/access-marking.txt b/tools/memory-model/Documentation/access-marking.txt
> index 58bff2619876..e4a20ebf565d 100644
> --- a/tools/memory-model/Documentation/access-marking.txt
> +++ b/tools/memory-model/Documentation/access-marking.txt
> @@ -319,6 +319,98 @@ of the ASSERT_EXCLUSIVE_WRITER() is to allow KCSAN to check for a buggy
> concurrent lockless write.
>
>
> +Lock-Protected Writes With Heuristic Lockless Reads
> +---------------------------------------------------
> +
> +For another example, suppose that the code can normally make use of
> +a per-data-structure lock, but there are times when a global lock is
> +required. These times are indicated via a global flag. The code might
> +look as follows, and is based loosely on sem_lock() and sem_unlock():
> +
> +        bool global_flag;
> +        DEFINE_SPINLOCK(global_lock);
> +        struct foo {
> +                spinlock_t f_lock;
> +                int f_data;
> +        };
> +
> +        /* All foo structures are in the following array. */
> +        int nfoo;
> +        struct foo *foo_array;
> +
> +        void do_something_locked(struct foo *fp)
> +        {
> +                /* IMPORTANT: Heuristic plus spin_lock()! */
> +                if (!data_race(global_flag)) {
> +                        spin_lock(&fp->f_lock);
> +                        if (!smp_load_acquire(&global_flag)) {
> +                                do_something(fp);
> +                                spin_unlock(&fp->f_lock);
> +                                return;
> +                        }
> +                        spin_unlock(&fp->f_lock);
> +                }
> +                spin_lock(&global_flag);
> +                /* Lock held, thus global flag cannot change. */
> +                if (!global_flag) {
> +                        spin_lock(&fp->f_lock);
> +                        spin_unlock(&global_flag);

spin_unlock(&global_lock), not &global_flag.

That was the main result of the discussions a few years ago:

Split global_lock and global_flag. Do not try to use
spin_is_locked(&global_lock). Just add a flag. The 4 bytes are well
invested.

> +                }
> +                do_something(fp);
> +                if (global_flag)
> +                        spin_unlock(&global_flag);
&global_lock
> +                else
> +                        spin_lock(&fp->f_lock);
> +        }
> +
> +        void begin_global(void)
> +        {
> +                int i;
> +
> +                spin_lock(&global_flag);
> +                WRITE_ONCE(global_flag, true);
> +                for (i = 0; i < nfoo; i++) {
> +                        /* Wait for pre-existing local locks. */
> +                        spin_lock(&fp->f_lock);
> +                        spin_unlock(&fp->f_lock);
> +                }
> +                spin_unlock(&global_flag);
> +        }
> +
> +        void end_global(void)
> +        {
> +                spin_lock(&global_flag);
> +                smp_store_release(&global_flag, false);
> +                /* Pre-existing global lock acquisitions will recheck. */
> +                spin_unlock(&global_flag);
> +        }
> +
> +All code paths leading from the do_something_locked() function's first
> +read from global_flag acquire a lock, so endless load fusing cannot
> +happen.
> +
> +If the value read from global_flag is true, then global_flag is rechecked
> +while holding global_lock, which prevents global_flag from changing.
> +If this recheck finds that global_flag is now false, the acquisition
> +of ->f_lock prior to the release of global_lock will result in any subsequent
> +begin_global() invocation waiting to acquire ->f_lock.
> +
> +On the other hand, if the value read from global_flag is false, then
> +rechecking under ->f_lock combined with synchronization with
> +begin_global() guarantees that any erroneous read will cause the
> +do_something_locked() function's first do_something() invocation to happen
> +before begin_global() returns. The combination of the smp_load_acquire()
> +in do_something_locked() and the smp_store_release() in end_global()
> +guarantees that either the do_something_locked() function's first
> +do_something() invocation happens after the call to end_global() or that
> +do_something_locked() acquires global_lock and rechecks under the lock.
> +
> +For this to work, only those foo structures in foo_array[] may be
> +passed to do_something_locked(). The reason for this is that the
> +synchronization with begin_global() relies on momentarily locking each
> +and every foo structure.
> +
> +
> Lockless Reads and Writes
> -------------------------
>



2021-05-14 20:21:28

by Paul E. McKenney

Subject: Re: ipc/sem, ipc/msg, ipc/mqueue.c kcsan questions

On Fri, May 14, 2021 at 07:41:02AM +0200, Manfred Spraul wrote:
> On 5/13/21 9:02 PM, Paul E. McKenney wrote:
> > On Thu, May 13, 2021 at 08:10:51AM +0200, Manfred Spraul wrote:
> > > Hi Paul,
> > >
> > > On 5/12/21 10:17 PM, Paul E. McKenney wrote:
> > > [...]
> > > > int foo;
> > > > DEFINE_RWLOCK(foo_rwlock);
> > > >
> > > > void update_foo(int newval)
> > > > {
> > > >         write_lock(&foo_rwlock);
> > > >         foo = newval;
> > > >         do_something(newval);
> > > >         write_unlock(&foo_rwlock);
> > > > }
> > > >
> > > > int read_foo(void)
> > > > {
> > > >         int ret;
> > > >
> > > >         read_lock(&foo_rwlock);
> > > >         do_something_else();
> > > >         ret = foo;
> > > >         read_unlock(&foo_rwlock);
> > > >         return ret;
> > > > }
> > > >
> > > > int read_foo_diagnostic(void)
> > > > {
> > > >         return data_race(foo);
> > > > }
> > > The text didn't help, the example has helped:
> > >
> > > It was not clear to me if I have to use data_race() both on the read and the
> > > write side, or only on one side.
> > >
> > > Based on this example: plain C may be paired with data_race(), there is no
> > > need to mark both sides.
> > Actually, you just demonstrated that this example is quite misleading.
> > That data_race() works only because the read is for diagnostic
> > purposes. I am queuing a commit with your Reported-by that makes
> > read_foo_diagnostic() just do a pr_info(), like this:
> >
> > void read_foo_diagnostic(void)
> > {
> >         pr_info("Current value of foo: %d\n", data_race(foo));
> > }
> >
> > So thank you for that!
>
> I would not like this change at all.
> Assume you chase a rare bug, and notice an odd pr_info() output.
> It will take you a really long time to figure out that a data_race() misled
> you.
> Thus for a pr_info(), I would consider READ_ONCE() as the correct thing.

It depends, but I agree with a general preference for READ_ONCE() over
data_race().

However, for some types of concurrency designs, using a READ_ONCE()
can make it more difficult to enlist KCSAN's help. For example, if this
variable is read or written only while holding a particular lock, so that
read_foo_diagnostic() is the only lockless read, then using READ_ONCE()
adds a concurrent read. In turn, the updates would now need WRITE_ONCE(),
which would cause KCSAN to fail to detect a buggy lockless WRITE_ONCE().
If data_race() is used, then adding a buggy lockless WRITE_ONCE() will
cause KCSAN to complain.
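
For example (sketch): once read_foo_diagnostic() uses READ_ONCE(), the
lock-protected updates must be marked as well:

        spin_lock(&foo_lock);
        WRITE_ONCE(foo, newval);        /* Now needed to keep KCSAN quiet. */
        spin_unlock(&foo_lock);

and at that point a buggy lockless WRITE_ONCE(foo, 0) elsewhere looks,
to KCSAN, just like a legitimate update.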

Of course, you would be quite correct to say that this must be balanced
against the possibility of a messed-up pr_info() due to compiler mischief.
Tradeoffs, tradeoffs! ;-)

I should document this tradeoff, shouldn't I?

> What about something like the attached change?
>
> --
>
>     Manfred
>
>

> diff --git a/tools/memory-model/Documentation/access-marking.txt b/tools/memory-model/Documentation/access-marking.txt
> index 1ab189f51f55..588326b60834 100644
> --- a/tools/memory-model/Documentation/access-marking.txt
> +++ b/tools/memory-model/Documentation/access-marking.txt
> @@ -68,6 +68,11 @@ READ_ONCE() and WRITE_ONCE():
>
> 4. Writes setting values that feed into error-tolerant heuristics.
>
> +In theory, plain C-language loads can also be used for these use cases.
> +However, in practice this will have the disadvantage of causing KCSAN
> +to generate false positives because KCSAN will have no way of knowing
> +that the resulting data race was intentional.
> +
>
> Data-Racy Reads for Approximate Diagnostics
>
> @@ -86,11 +91,6 @@ that fail to exclude the updates. In this case, it is important to use
> data_race() for the diagnostic reads because otherwise KCSAN would give
> false-positive warnings about these diagnostic reads.
>
> -In theory, plain C-language loads can also be used for this use case.
> -However, in practice this will have the disadvantage of causing KCSAN
> -to generate false positives because KCSAN will have no way of knowing
> -that the resulting data race was intentional.
> -
>
> Data-Racy Reads That Are Checked Against Marked Reload
>
> @@ -110,11 +110,6 @@ that provides the compiler much less scope for mischievous optimizations.
> Capturing the return value from cmpxchg() also saves a memory reference
> in many cases.
>
> -In theory, plain C-language loads can also be used for this use case.
> -However, in practice this will have the disadvantage of causing KCSAN
> -to generate false positives because KCSAN will have no way of knowing
> -that the resulting data race was intentional.

Normally, I would be completely in favor of your suggestion to give
this advice only once. But in this case, there are likely to be people
reading just the part of the document that they think applies to their
situation. So it is necessary to replicate the reminder into all the
sections.

That said, I do applaud your approach of reading the whole thing. That
of course gets you a much more complete understanding of the situation,
and gets me more feedback. ;-)

> Reads Feeding Into Error-Tolerant Heuristics
>
> @@ -125,11 +120,9 @@ that data_race() loads are subject to load fusing, which can result in
> consistent errors, which in turn are quite capable of breaking heuristics.
> Therefore use of data_race() should be limited to cases where some other
> code (such as a barrier() call) will force the occasional reload.
> -
> -In theory, plain C-language loads can also be used for this use case.
> -However, in practice this will have the disadvantage of causing KCSAN
> -to generate false positives because KCSAN will have no way of knowing
> -that the resulting data race was intentional.
> +The heuristics must be able to handle any error. If the heuristics are
> +only able to handle old and new values, then WRITE_ONCE()/READ_ONCE()
> +must be used.

Excellent addition! I have applied the commit shown below with your
Signed-off-by. Please let me know if you would like me to take some other
course of action. And also please let me know if I messed something up.

> Writes Setting Values Feeding Into Error-Tolerant Heuristics
> @@ -142,11 +135,8 @@ due to compiler-mangled reads, it can also tolerate the occasional
> compiler-mangled write, at least assuming that the proper value is in
> place once the write completes.
>
> -Plain C-language stores can also be used for this use case. However,
> -in kernels built with CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n, this
> -will have the disadvantage of causing KCSAN to generate false positives
> -because KCSAN will have no way of knowing that the resulting data race
> -was intentional.
> +Note that KCSAN will only detect mangled writes in kernels built with
> +CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n.

And the same point on needing to say this more than once.

Thanx, Paul

------------------------------------------------------------------------

commit 48db6caa1d32c39e7405df3940f9f7ba07ed0527
Author: Manfred Spraul <[email protected]>
Date: Fri May 14 11:40:06 2021 -0700

tools/memory-model: Heuristics using data_race() must handle all values

Data loaded for use by some sorts of heuristics can tolerate the
occasional erroneous value. In this case the loads may use data_race()
to give the compiler full freedom to optimize while also informing KCSAN
of the intent. However, for this to work, the heuristic needs to be
able to tolerate any erroneous value that could possibly arise. This
commit therefore adds a paragraph spelling this out.

Signed-off-by: Manfred Spraul <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/tools/memory-model/Documentation/access-marking.txt b/tools/memory-model/Documentation/access-marking.txt
index e4a20ebf565d..22ecadec4894 100644
--- a/tools/memory-model/Documentation/access-marking.txt
+++ b/tools/memory-model/Documentation/access-marking.txt
@@ -126,6 +126,11 @@ consistent errors, which in turn are quite capable of breaking heuristics.
Therefore use of data_race() should be limited to cases where some other
code (such as a barrier() call) will force the occasional reload.

+Note that this use case requires that the heuristic be able to handle
+any possible error. In contrast, if the heuristics might be fatally
+confused by one or more of the possible erroneous values, use READ_ONCE()
+instead of data_race().
+
In theory, plain C-language loads can also be used for this use case.
However, in practice this will have the disadvantage of causing KCSAN
to generate false positives because KCSAN will have no way of knowing
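
As a concrete illustration of the new paragraph, consider a resize
heuristic along the following lines (names invented for this example,
not taken from any existing code):

        /* t->nr_entries is updated under t->lock. */
        static bool should_resize(struct htab *t)
        {
                /*
                 * Heuristic: a stale or mangled value merely delays or
                 * hastens an otherwise-safe resize, so data_race()
                 * suffices.
                 */
                return data_race(t->nr_entries) > t->resize_threshold;
        }

By contrast, if the loaded value were used as, say, an array index, a
single erroneous value could be fatal, and READ_ONCE() plus a range
check would be the minimum requirement.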

2021-05-14 20:21:33

by Paul E. McKenney

[permalink] [raw]
Subject: Re: ipc/sem, ipc/msg, ipc/mqueue.c kcsan questions

On Fri, May 14, 2021 at 04:29:18PM +0800, Hillf Danton wrote:
> On Thu, 13 May 2021 15:01:27 Paul E. McKenney wrote:
> >On Thu, May 13, 2021 at 12:02:01PM -0700, Paul E. McKenney wrote:
> >> On Thu, May 13, 2021 at 08:10:51AM +0200, Manfred Spraul wrote:
> >> > Hi Paul,
> >> >
> >> > On 5/12/21 10:17 PM, Paul E. McKenney wrote:
> >> > > On Wed, May 12, 2021 at 09:58:18PM +0200, Manfred Spraul wrote:
> >> > > > [...]
> >> > > > sma->use_global_lock is evaluated in sem_lock() twice:
> >> > > >
> >> > > > >         /*
> >> > > > >          * Initial check for use_global_lock. Just an optimization,
> >> > > > >          * no locking, no memory barrier.
> >> > > > >          */
> >> > > > >         if (!sma->use_global_lock) {
> >> > > > Both sides of the if-clause handle possible data races.
> >> > > >
> >> > > > Is
> >> > > >
> >> > > >     if (!data_race(sma->use_global_lock)) {
> >> > > >
> >> > > > the correct thing to suppress the warning?
> >> > > Most likely READ_ONCE() rather than data_race(), but please see
> >> > > the end of this message.
> >> >
> >> > Based on the document, I would say data_race() is sufficient:
> >> >
> >> > I have replaced the code with "if (jiffies %2)", and it runs fine.
> >>
> >> OK, but please note that "jiffies" is marked volatile, which prevents the
> >> compiler from fusing loads. You just happen to be OK in this particular
> >> case, as described below. Use of the "jiffies_64" non-volatile synonym
> >> for "jiffies" is better for this sort of checking. But even so, just
> >> because a particular version of a particular compiler refrains from
> >> fusing loads in a particular situation does not mean that all future
> >> versions of all future compilers will behave so nicely.
> >>
> >> Again, you are OK in this particular situation, as described below.
> >>
> >> > Thus I don't see which evil things a compiler could do, ... .
> >>
> >> Fair enough, and your example is covered by the section "Reads Feeding
> >> Into Error-Tolerant Heuristics". The worst that the compiler can do is
> >> to force an unnecessary acquisition of the global lock.
> >>
> >> This cannot cause incorrect execution, but could result in poor
> >> scalability. This could be a problem if load fusing were possible, that
> >> is, if successive calls to this function were inlined and the compiler
> >> just reused the value initially loaded.
> >>
> >> The reason that load fusing cannot happen in this case is that the
> >> load is immediately followed by a lock acquisition, which implies a
> >> barrier(), which prevents the compiler from fusing loads on opposite
> >> sides of that barrier().
> >>
> >> > [...]
> >> >
> >> > Does tools/memory-model/Documentation/access-marking.txt, shown below,
> >> > > help?
> >> > >
> >> > [...]
> >> > > int foo;
> >> > > DEFINE_RWLOCK(foo_rwlock);
> >> > >
> >> > > void update_foo(int newval)
> >> > > {
> >> > >         write_lock(&foo_rwlock);
> >> > >         foo = newval;
> >> > >         do_something(newval);
> >> > >         write_unlock(&foo_rwlock);
> >> > > }
> >> > >
> >> > > int read_foo(void)
> >> > > {
> >> > >         int ret;
> >> > >
> >> > >         read_lock(&foo_rwlock);
> >> > >         do_something_else();
> >> > >         ret = foo;
> >> > >         read_unlock(&foo_rwlock);
> >> > >         return ret;
> >> > > }
> >> > >
> >> > > int read_foo_diagnostic(void)
> >> > > {
> >> > >         return data_race(foo);
> >> > > }
> >> >
> >> > The text didn't help, the example has helped:
> >> >
> >> > It was not clear to me if I have to use data_race() both on the read and the
> >> > write side, or only on one side.
> >> >
> >> > Based on this example: plain C may be paired with data_race(), there is no
> >> > need to mark both sides.
> >>
> >> Actually, you just demonstrated that this example is quite misleading.
> >> That data_race() works only because the read is for diagnostic
> >> purposes. I am queuing a commit with your Reported-by that makes
> >> read_foo_diagnostic() just do a pr_info(), like this:
> >>
> >> void read_foo_diagnostic(void)
> >> {
> >>         pr_info("Current value of foo: %d\n", data_race(foo));
> >> }
> >>
> >> So thank you for that!
> >
> >And please see below for an example better illustrating your use case.
> >Anything messed up or missing?
> >
> > Thanx, Paul
> >
> >------------------------------------------------------------------------
> >
> >commit b4287410ee93109501defc4695ccc29144e8f3a3
> >Author: Paul E. McKenney <[email protected]>
> >Date: Thu May 13 14:54:58 2021 -0700
> >
> > tools/memory-model: Add example for heuristic lockless reads
> >
> > This commit adds example code for heuristic lockless reads, based loosely
> > on the sem_lock() and sem_unlock() functions.
> >
> > Reported-by: Manfred Spraul <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> >diff --git a/tools/memory-model/Documentation/access-marking.txt b/tools/memory-model/Documentation/access-marking.txt
> >index 58bff2619876..e4a20ebf565d 100644
> >--- a/tools/memory-model/Documentation/access-marking.txt
> >+++ b/tools/memory-model/Documentation/access-marking.txt
> >@@ -319,6 +319,98 @@ of the ASSERT_EXCLUSIVE_WRITER() is to allow KCSAN to check for a buggy
> > concurrent lockless write.
> >
> >
> >+Lock-Protected Writes With Heuristic Lockless Reads
> >+---------------------------------------------------
> >+
> >+For another example, suppose that the code can normally make use of
> >+a per-data-structure lock, but there are times when a global lock is
> >+required. These times are indicated via a global flag. The code might
> >+look as follows, and is based loosely on sem_lock() and sem_unlock():
> >+
> >+ bool global_flag;
> >+ DEFINE_SPINLOCK(global_lock);
> >+ struct foo {
> >+         spinlock_t f_lock;
> >+         int f_data;
> >+ };
> >+
> >+ /* All foo structures are in the following array. */
> >+ int nfoo;
> >+ struct foo *foo_array;
> >+
> >+ void do_something_locked(struct foo *fp)
> >+ {
> >+         /* IMPORTANT: Heuristic plus spin_lock()! */
> >+         if (!data_race(global_flag)) {
> >+                 spin_lock(&fp->f_lock);
> >+                 if (!smp_load_acquire(&global_flag)) {
> >+                         do_something(fp);
> >+                         spin_unlock(&fp->f_lock);
> >+                         return;
> >+                 }
> >+                 spin_unlock(&fp->f_lock);
> >+         }
> >+         spin_lock(&global_flag);
> >+         /* Lock held, thus global flag cannot change. */
> >+         if (!global_flag) {
> >+                 spin_lock(&fp->f_lock);
> >+                 spin_unlock(&global_flag);
> >+         }
> >+         do_something(fp);
> >+         if (global_flag)
>
> The global flag may change without the global lock held - we will likely have the
> wrong lock released if we can see the change.

Right you are! I am adding a local variable to address this, thank you!
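
Just to make sure we are talking about the same fix, something like
this untested sketch:

        void do_something_locked(struct foo *fp)
        {
                bool gf = true;

                /* IMPORTANT: Heuristic plus spin_lock()! */
                if (!data_race(global_flag)) {
                        spin_lock(&fp->f_lock);
                        if (!smp_load_acquire(&global_flag)) {
                                do_something(fp);
                                spin_unlock(&fp->f_lock);
                                return;
                        }
                        spin_unlock(&fp->f_lock);
                }
                spin_lock(&global_lock);
                /* global_lock held, thus global_flag cannot change. */
                if (!global_flag) {
                        spin_lock(&fp->f_lock);
                        spin_unlock(&global_lock);
                        gf = false;
                }
                do_something(fp);
                if (gf)
                        spin_unlock(&global_lock);
                else
                        spin_unlock(&fp->f_lock);
        }

The local variable gf records whether global_lock is still held, so
that the correct lock is released regardless of concurrent changes to
global_flag. (And the loop in begin_global() presumably wants
&foo_array[i].f_lock rather than the undefined fp, while I am in the
area.)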

Thanx, Paul

> >+                 spin_unlock(&global_flag);
> >+         else
> >+                 spin_lock(&fp->f_lock);
> >+ }
> >+
> >+ void begin_global(void)
> >+ {
> >+         int i;
> >+
> >+         spin_lock(&global_flag);
> >+         WRITE_ONCE(global_flag, true);
> >+         for (i = 0; i < nfoo; i++) {
> >+                 /* Wait for pre-existing local locks. */
> >+                 spin_lock(&fp->f_lock);
> >+                 spin_unlock(&fp->f_lock);
> >+         }
> >+         spin_unlock(&global_flag);
> >+ }
> >+
> >+ void end_global(void)
> >+ {
> >+         spin_lock(&global_flag);
> >+         smp_store_release(&global_flag, false);
> >+         /* Pre-existing global lock acquisitions will recheck. */
> >+         spin_unlock(&global_flag);
> >+ }
> >+
> >+All code paths leading from the do_something_locked() function's first
> >+read from global_flag acquire a lock, so endless load fusing cannot
> >+happen.
> >+
> >+If the value read from global_flag is true, then global_flag is rechecked
> >+while holding global_lock, which prevents global_flag from changing.
> >+If this recheck finds that global_flag is now false, the acquisition
> >+of ->f_lock prior to the release of global_lock will result in any subsequent
> >+begin_global() invocation waiting to acquire ->f_lock.
> >+
> >+On the other hand, if the value read from global_flag is false, then
> >+rechecking under ->f_lock combined with synchronization with
> >+begin_global() guarantees that any erroneous read will cause the
> >+do_something_locked() function's first do_something() invocation to happen
> >+before begin_global() returns. The combination of the smp_load_acquire()
> >+in do_something_locked() and the smp_store_release() in end_global()
> >+guarantees that either the do_something_locked() function's first
> >+do_something() invocation happens after the call to end_global() or that
> >+do_something_locked() acquires global_lock() and rechecks under the lock.
> >+
> >+For this to work, only those foo structures in foo_array[] may be
> >+passed to do_something_locked(). The reason for this is that the
> >+synchronization with begin_global() relies on momentarily locking each
> >+and every foo structure.
> >+
> >+
> > Lockless Reads and Writes
> > -------------------------
> >
> >
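
As an aside on the load-fusing point above, the failure mode being
prevented looks roughly like this (invented example):

        while (!data_race(done))
                ;               /* BUG: the load may be fused, looping forever. */

        while (!data_race(done))
                cpu_relax();    /* Implies barrier(), so the load is redone. */

In do_something_locked(), the spin_lock() calls play the role of that
barrier(), so the load from global_flag cannot be fused across them.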

2021-05-14 20:21:52

by Paul E. McKenney

[permalink] [raw]
Subject: Re: ipc/sem, ipc/msg, ipc/mqueue.c kcsan questions

On Fri, May 14, 2021 at 06:01:37PM +0200, Manfred Spraul wrote:
> Hi Paul,
>
> On 5/14/21 12:01 AM, Paul E. McKenney wrote:
> > On Thu, May 13, 2021 at 12:02:01PM -0700, Paul E. McKenney wrote:
> > > On Thu, May 13, 2021 at 08:10:51AM +0200, Manfred Spraul wrote:
> > > > Hi Paul,
> > > >
> > > > On 5/12/21 10:17 PM, Paul E. McKenney wrote:
> > > > > On Wed, May 12, 2021 at 09:58:18PM +0200, Manfred Spraul wrote:
> > > > > > [...]
> > > > > > sma->use_global_lock is evaluated in sem_lock() twice:
> > > > > >
> > > > > > >         /*
> > > > > > >          * Initial check for use_global_lock. Just an optimization,
> > > > > > >          * no locking, no memory barrier.
> > > > > > >          */
> > > > > > >         if (!sma->use_global_lock) {
> > > > > > Both sides of the if-clause handle possible data races.
> > > > > >
> > > > > > Is
> > > > > >
> > > > > >     if (!data_race(sma->use_global_lock)) {
> > > > > >
> > > > > > the correct thing to suppress the warning?
> > > > > Most likely READ_ONCE() rather than data_race(), but please see
> > > > > the end of this message.
> > > > Based on the document, I would say data_race() is sufficient:
> > > >
> > > > I have replaced the code with "if (jiffies %2)", and it runs fine.
> > > OK, but please note that "jiffies" is marked volatile, which prevents the
> > > compiler from fusing loads. You just happen to be OK in this particular
> > > case, as described below. Use of the "jiffies_64" non-volatile synonym
> > > for "jiffies" is better for this sort of checking. But even so, just
> > > because a particular version of a particular compiler refrains from
> > > fusing loads in a particular situation does not mean that all future
> > > versions of all future compilers will behave so nicely.
> > >
> > > Again, you are OK in this particular situation, as described below.
> > >
> > > > Thus I don't see which evil things a compiler could do, ... .
> > > Fair enough, and your example is covered by the section "Reads Feeding
> > > Into Error-Tolerant Heuristics". The worst that the compiler can do is
> > > to force an unnecessary acquisition of the global lock.
> > >
> > > This cannot cause incorrect execution, but could result in poor
> > > scalability. This could be a problem if load fusing were possible, that
> > > is, if successive calls to this function were inlined and the compiler
> > > just reused the value initially loaded.
> > >
> > > The reason that load fusing cannot happen in this case is that the
> > > load is immediately followed by a lock acquisition, which implies a
> > > barrier(), which prevents the compiler from fusing loads on opposite
> > > sides of that barrier().
> > >
> > > > [...]
> > > >
> > > > Does tools/memory-model/Documentation/access-marking.txt, shown below,
> > > > > help?
> > > > >
> > > > [...]
> > > > > int foo;
> > > > > DEFINE_RWLOCK(foo_rwlock);
> > > > >
> > > > > void update_foo(int newval)
> > > > > {
> > > > >         write_lock(&foo_rwlock);
> > > > >         foo = newval;
> > > > >         do_something(newval);
> > > > >         write_unlock(&foo_rwlock);
> > > > > }
> > > > >
> > > > > int read_foo(void)
> > > > > {
> > > > >         int ret;
> > > > >
> > > > >         read_lock(&foo_rwlock);
> > > > >         do_something_else();
> > > > >         ret = foo;
> > > > >         read_unlock(&foo_rwlock);
> > > > >         return ret;
> > > > > }
> > > > >
> > > > > int read_foo_diagnostic(void)
> > > > > {
> > > > >         return data_race(foo);
> > > > > }
> > > > The text didn't help, the example has helped:
> > > >
> > > > It was not clear to me if I have to use data_race() both on the read and the
> > > > write side, or only on one side.
> > > >
> > > > Based on this example: plain C may be paired with data_race(), there is no
> > > > need to mark both sides.
> > > Actually, you just demonstrated that this example is quite misleading.
> > > That data_race() works only because the read is for diagnostic
> > > purposes. I am queuing a commit with your Reported-by that makes
> > > read_foo_diagnostic() just do a pr_info(), like this:
> > >
> > > void read_foo_diagnostic(void)
> > > {
> > >         pr_info("Current value of foo: %d\n", data_race(foo));
> > > }
> > >
> > > So thank you for that!
> > And please see below for an example better illustrating your use case.
> > Anything messed up or missing?
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit b4287410ee93109501defc4695ccc29144e8f3a3
> > Author: Paul E. McKenney <[email protected]>
> > Date: Thu May 13 14:54:58 2021 -0700
> >
> > tools/memory-model: Add example for heuristic lockless reads
> >
> > This commit adds example code for heuristic lockless reads, based loosely
> > on the sem_lock() and sem_unlock() functions.
>
> I would refer to nf_conntrack_all_lock() instead of sem_lock():
>
> nf_conntrack_all_lock() is far easier to read, and it contains the same
> heuristics.

Sounds good, updated to nf_conntrack_lock(), nf_conntrack_all_lock(),
and nf_conntrack_all_unlock().

> > Reported-by: Manfred Spraul <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> > diff --git a/tools/memory-model/Documentation/access-marking.txt b/tools/memory-model/Documentation/access-marking.txt
> > index 58bff2619876..e4a20ebf565d 100644
> > --- a/tools/memory-model/Documentation/access-marking.txt
> > +++ b/tools/memory-model/Documentation/access-marking.txt
> > @@ -319,6 +319,98 @@ of the ASSERT_EXCLUSIVE_WRITER() is to allow KCSAN to check for a buggy
> > concurrent lockless write.
> > +Lock-Protected Writes With Heuristic Lockless Reads
> > +---------------------------------------------------
> > +
> > +For another example, suppose that the code can normally make use of
> > +a per-data-structure lock, but there are times when a global lock is
> > +required. These times are indicated via a global flag. The code might
> > +look as follows, and is based loosely on sem_lock() and sem_unlock():
> > +
> > + bool global_flag;
> > + DEFINE_SPINLOCK(global_lock);
> > + struct foo {
> > +         spinlock_t f_lock;
> > +         int f_data;
> > + };
> > +
> > + /* All foo structures are in the following array. */
> > + int nfoo;
> > + struct foo *foo_array;
> > +
> > + void do_something_locked(struct foo *fp)
> > + {
> > +         /* IMPORTANT: Heuristic plus spin_lock()! */
> > +         if (!data_race(global_flag)) {
> > +                 spin_lock(&fp->f_lock);
> > +                 if (!smp_load_acquire(&global_flag)) {
> > +                         do_something(fp);
> > +                         spin_unlock(&fp->f_lock);
> > +                         return;
> > +                 }
> > +                 spin_unlock(&fp->f_lock);
> > +         }
> > +         spin_lock(&global_flag);
> > +         /* Lock held, thus global flag cannot change. */
> > +         if (!global_flag) {
> > +                 spin_lock(&fp->f_lock);
> > +                 spin_unlock(&global_flag);
>
> spin_unlock(&global_lock), not &global_flag.
>
> That was the main result from the discussions a few years ago:
>
> Split global_lock and global_flag. Do not try to use
> spin_is_locked(&global_lock). Just add a flag. The 4 bytes are well
> invested.

Thank you for catching this typo! It is now global_lock.
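
For the archives, here is the shape of that split as an invented
minimal example (not the actual nf_conntrack code):

        static DEFINE_SPINLOCK(mode_lock);  /* Serializes mode changes. */
        static bool global_mode;            /* Written only under mode_lock. */

        static void enter_global_mode(void)
        {
                spin_lock(&mode_lock);
                WRITE_ONCE(global_mode, true);
                /* ... momentarily acquire each per-object lock ... */
                spin_unlock(&mode_lock);
        }

        static void leave_global_mode(void)
        {
                spin_lock(&mode_lock);
                /* Pairs with readers' smp_load_acquire(). */
                smp_store_release(&global_mode, false);
                spin_unlock(&mode_lock);
        }

Lockless readers test the dedicated flag rather than calling
spin_is_locked(&mode_lock): the flag remains set for as long as the
global mode lasts, while spin_is_locked() reflects only the
instantaneous state of the lock.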

Thanx, Paul

> > +         }
> > +         do_something(fp);
> > +         if (global_flag)
> > +                 spin_unlock(&global_flag);
> &global_lock
> > +         else
> > +                 spin_lock(&fp->f_lock);
> > + }
> > +
> > + void begin_global(void)
> > + {
> > +         int i;
> > +
> > +         spin_lock(&global_flag);
> > +         WRITE_ONCE(global_flag, true);
> > +         for (i = 0; i < nfoo; i++) {
> > +                 /* Wait for pre-existing local locks. */
> > +                 spin_lock(&fp->f_lock);
> > +                 spin_unlock(&fp->f_lock);
> > +         }
> > +         spin_unlock(&global_flag);
> > + }
> > +
> > + void end_global(void)
> > + {
> > +         spin_lock(&global_flag);
> > +         smp_store_release(&global_flag, false);
> > +         /* Pre-existing global lock acquisitions will recheck. */
> > +         spin_unlock(&global_flag);
> > + }
> > +
> > +All code paths leading from the do_something_locked() function's first
> > +read from global_flag acquire a lock, so endless load fusing cannot
> > +happen.
> > +
> > +If the value read from global_flag is true, then global_flag is rechecked
> > +while holding global_lock, which prevents global_flag from changing.
> > +If this recheck finds that global_flag is now false, the acquisition
> > +of ->f_lock prior to the release of global_lock will result in any subsequent
> > +begin_global() invocation waiting to acquire ->f_lock.
> > +
> > +On the other hand, if the value read from global_flag is false, then
> > +rechecking under ->f_lock combined with synchronization with
> > +begin_global() guarantees that any erroneous read will cause the
> > +do_something_locked() function's first do_something() invocation to happen
> > +before begin_global() returns. The combination of the smp_load_acquire()
> > +in do_something_locked() and the smp_store_release() in end_global()
> > +guarantees that either the do_something_locked() function's first
> > +do_something() invocation happens after the call to end_global() or that
> > +do_something_locked() acquires global_lock() and rechecks under the lock.
> > +
> > +For this to work, only those foo structures in foo_array[] may be
> > +passed to do_something_locked(). The reason for this is that the
> > +synchronization with begin_global() relies on momentarily locking each
> > +and every foo structure.
> > +
> > +
> > Lockless Reads and Writes
> > -------------------------
>
>

2021-05-19 00:45:53

by Paul E. McKenney

[permalink] [raw]
Subject: Re: ipc/sem, ipc/msg, ipc/mqueue.c kcsan questions

On Fri, May 14, 2021 at 11:44:55AM -0700, Paul E. McKenney wrote:
> On Fri, May 14, 2021 at 07:41:02AM +0200, Manfred Spraul wrote:
> > On 5/13/21 9:02 PM, Paul E. McKenney wrote:
> > > On Thu, May 13, 2021 at 08:10:51AM +0200, Manfred Spraul wrote:
> > > > On 5/12/21 10:17 PM, Paul E. McKenney wrote:

[ . . . ]

> > > > > int foo;
> > > > > DEFINE_RWLOCK(foo_rwlock);
> > > > >
> > > > > void update_foo(int newval)
> > > > > {
> > > > >         write_lock(&foo_rwlock);
> > > > >         foo = newval;
> > > > >         do_something(newval);
> > > > >         write_unlock(&foo_rwlock);
> > > > > }
> > > > >
> > > > > int read_foo(void)
> > > > > {
> > > > >         int ret;
> > > > >
> > > > >         read_lock(&foo_rwlock);
> > > > >         do_something_else();
> > > > >         ret = foo;
> > > > >         read_unlock(&foo_rwlock);
> > > > >         return ret;
> > > > > }
> > > > >
> > > > > int read_foo_diagnostic(void)
> > > > > {
> > > > >         return data_race(foo);
> > > > > }
> > > > The text didn't help, the example has helped:
> > > >
> > > > It was not clear to me if I have to use data_race() both on the read and the
> > > > write side, or only on one side.
> > > >
> > > > Based on this example: plain C may be paired with data_race(), there is no
> > > > need to mark both sides.
> > > Actually, you just demonstrated that this example is quite misleading.
> > > That data_race() works only because the read is for diagnostic
> > > purposes. I am queuing a commit with your Reported-by that makes
> > > read_foo_diagnostic() just do a pr_info(), like this:
> > >
> > > void read_foo_diagnostic(void)
> > > {
> > >         pr_info("Current value of foo: %d\n", data_race(foo));
> > > }
> > >
> > > So thank you for that!
> >
> > I would not like this change at all.
> > Assume you chase a rare bug, and notice an odd pr_info() output.
> > It will take you a really long time to figure out that a data_race() misled
> > you.
> > Thus for a pr_info(), I would consider READ_ONCE() as the correct thing.
>
> It depends, but I agree with a general preference for READ_ONCE() over
> data_race().
>
> However, for some types of concurrency designs, using a READ_ONCE()
> can make it more difficult to enlist KCSAN's help. For example, if this
> variable is read or written only while holding a particular lock, so that
> read_foo_diagnostic() is the only lockless read, then using READ_ONCE()
> adds a concurrent read. In turn, the updates would now need WRITE_ONCE(),
> which would cause KCSAN to fail to detect a buggy lockless WRITE_ONCE().
> If data_race() is used, then adding a buggy lockless WRITE_ONCE() will
> cause KCSAN to complain.
>
> Of course, you would be quite correct to say that this must be balanced
> against the possibility of a messed-up pr_info() due to compiler mischief.
> Tradeoffs, tradeoffs! ;-)
>
> I should document this tradeoff, shouldn't I?

Except that Marco Elver reminds me that there are two other possibilities:

1. data_race(READ_ONCE(foo)), which both suppresses compiler
optimizations and causes KCSAN to ignore the access.

2. "void __no_kcsan read_foo_diagnostic(void)" to cause KCSAN to
ignore the entire function, and READ_ONCE() on the access.

So things might be the way you want anyway. Does the patch below work
for you?

Thanx, Paul


------------------------------------------------------------------------

diff --git a/tools/memory-model/Documentation/access-marking.txt b/tools/memory-model/Documentation/access-marking.txt
index fe4ad6d12d24..e3012f666e62 100644
--- a/tools/memory-model/Documentation/access-marking.txt
+++ b/tools/memory-model/Documentation/access-marking.txt
@@ -279,19 +279,34 @@ tells KCSAN that data races are expected, and should be silently
ignored. This data_race() also tells the human reading the code that
read_foo_diagnostic() might sometimes return a bogus value.

-However, please note that your kernel must be built with
-CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n in order for KCSAN to
-detect a buggy lockless write. If you need KCSAN to detect such a
-write even if that write did not change the value of foo, you also
-need CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n. If you need KCSAN to
-detect such a write happening in an interrupt handler running on the
-same CPU doing the legitimate lock-protected write, you also need
-CONFIG_KCSAN_INTERRUPT_WATCHER=y. With some or all of these Kconfig
-options set properly, KCSAN can be quite helpful, although it is not
-necessarily a full replacement for hardware watchpoints. On the other
-hand, neither are hardware watchpoints a full replacement for KCSAN
-because it is not always easy to tell hardware watchpoint to conditionally
-trap on accesses.
+If it is necessary to suppress compiler optimization and also detect
+buggy lockless writes, read_foo_diagnostic() can be updated as follows:
+
+ void read_foo_diagnostic(void)
+ {
+         pr_info("Current value of foo: %d\n", data_race(READ_ONCE(foo)));
+ }
+
+Alternatively, given that KCSAN is to ignore all accesses in this function,
+this function can be marked __no_kcsan and the data_race() can be dropped:
+
+ void __no_kcsan read_foo_diagnostic(void)
+ {
+         pr_info("Current value of foo: %d\n", READ_ONCE(foo));
+ }
+
+However, in order for KCSAN to detect buggy lockless writes, your kernel
+must be built with CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n. If you
+need KCSAN to detect such a write even if that write did not change
+the value of foo, you also need CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n.
+If you need KCSAN to detect such a write happening in an interrupt handler
+running on the same CPU doing the legitimate lock-protected write, you
+also need CONFIG_KCSAN_INTERRUPT_WATCHER=y. With some or all of these
+Kconfig options set properly, KCSAN can be quite helpful, although
+it is not necessarily a full replacement for hardware watchpoints.
+On the other hand, neither are hardware watchpoints a full replacement
+for KCSAN because it is not always easy to tell a hardware watchpoint to
+conditionally trap on accesses.


Lock-Protected Writes With Lockless Reads