2018-05-22 19:41:24

by Prasad Sodagudi

Subject: write_lock_irq(&tasklist_lock)

Hi All,

When the following test is executed on the 4.14.41 stable kernel, I observed
that one of the cores waits on tasklist_lock for a long time with IRQs
disabled.
./stress-ng-64 --get 8 -t 3h --times --metrics-brief

Every time the device crashes, I observe that one of the tasks is stuck in
the fork system call, waiting for tasklist_lock as a writer with IRQs
disabled.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/kernel/fork.c?h=linux-4.14.y#n1843

Some other tasks are making getrlimit and prlimit system calls, so these
readers are continuously taking the tasklist_lock read lock.

The writer has local IRQs disabled and is waiting for the readers to finish,
but the readers keep tasklist_lock busy for quite a long time.
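For illustration, the reader path in question looks roughly like the sketch
below; this is loosely based on do_prlimit() in kernel/sys.c and is not the
exact 4.14 code:

/*
 * Sketch only, loosely based on do_prlimit() in kernel/sys.c; the real
 * code differs in detail between kernel versions.
 */
static int prlimit_read_sketch(struct task_struct *tsk, unsigned int resource,
                               struct rlimit *old_rlim)
{
        int retval = 0;

        /* taken for read on every getrlimit()/prlimit() call */
        read_lock(&tasklist_lock);
        if (!tsk->sighand)
                retval = -ESRCH;
        else if (old_rlim)
                *old_rlim = tsk->signal->rlim[resource];
        read_unlock(&tasklist_lock);

        return retval;
}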

I think the --get N option creates N workers, and they make the following
system calls.
========================================================================
start N workers that call system calls that fetch data from the kernel,
currently these are: getpid,
getppid, getcwd, getgid, getegid, getuid, getgroups, getpgrp, getpgid,
getpriority, getresgid, getresuid,
getrlimit, prlimit, getrusage, getsid, gettid, getcpu, gettimeofday,
uname, adjtimex, sysfs.
Some of these system calls are OS specific.
========================================================================

Have you observed this type of issue with tasklist_lock?
Do we need write_lock_irq(&tasklist_lock) in the below portion of code, or
can I use write_lock instead of write_lock_irq there?
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/kernel/fork.c?h=linux-4.14.y#n1843

-Thanks, Prasad

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


2018-05-22 20:29:35

by Linus Torvalds

Subject: Re: write_lock_irq(&tasklist_lock)

On Tue, May 22, 2018 at 12:40 PM Sodagudi Prasad <[email protected]>
wrote:

> Have you observed this type of issues with tasklist_lock ?

tasklist_lock remains pretty painful. It covers too much, but trying to
split it up has never worked well.

It's usually not a huge problem because there are so few writers, but what
you're seeing is the writer starvation issue because readers can be
plentiful.

> Do we need write_lock_irq(&tasklist_lock) in the below portion of code, or
> can I use write_lock instead of write_lock_irq there?

You absolutely need write_lock_irq(), because taking the tasklist_lock
without disabling interrupts will deadlock very quickly due to an interrupt
taking the tasklist_lock for reading.

That said, the write_lock_irq(&tasklist_lock) could possibly be replaced
with something like

static void tasklist_write_lock(void)
{
        unsigned long flags;
        local_irq_save(flags);
        while (!write_trylock(&tasklist_lock)) {
                local_irq_restore(flags);
                do { cpu_relax(); } while (write_islocked(&tasklist_lock));
                local_irq_disable();
        }
}

but we don't have that "write_islocked()" function.

So the above would need more work, and is entirely untested anyway,
obviously.
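One way to sidestep the missing helper, again as a completely untested
sketch, is to simply re-enable interrupts between write_trylock() attempts:

/* Untested sketch: no write_islocked() needed, at the cost of hammering
 * write_trylock() instead of waiting for the lock to look free first. */
static void tasklist_write_lock(void)
{
        local_irq_disable();
        while (!write_trylock(&tasklist_lock)) {
                /* give interrupts (and their readers) a window */
                local_irq_enable();
                cpu_relax();
                local_irq_disable();
        }
}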

Linus

2018-05-22 21:18:51

by Peter Zijlstra

Subject: Re: write_lock_irq(&tasklist_lock)

On Tue, May 22, 2018 at 01:27:17PM -0700, Linus Torvalds wrote:

> It's usually not a huge problem because there are so few writers, but what
> you're seeing is the writer starvation issue because readers can be
> plentiful.

qrwlock is a fair lock and should not exhibit writer starvation. Write
acquire time is of course proportional to the number of CPUs in the
system because each can be holding a reader, but once there is a pending
writer there should not be new readers.

> > Do we need write_lock_irq(&tasklist_lock) in the below portion of code, or
> > can I use write_lock instead of write_lock_irq there?
>
> You absolutely need write_lock_irq(), because taking the tasklist_lock
> without disabling interrupts will deadlock very quickly due to an interrupt
> taking the tasklist_lock for reading.
>
> That said, the write_lock_irq(&tasklist_lock) could possibly be replaced
> with something like
>
> static void tasklist_write_lock(void)
> {
>         unsigned long flags;
>         local_irq_save(flags);
>         while (!write_trylock(&tasklist_lock)) {
>                 local_irq_restore(flags);
>                 do { cpu_relax(); } while (write_islocked(&tasklist_lock));
>                 local_irq_disable();
>         }
> }
>
> but we don't have that "write_islocked()" function.

You basically want to spin-wait with interrupts enabled, right? I think
there were problems with that, but I can't remember; my brain is entirely
shot for the day. But given how qrwlock is constructed, you'd have to look
at the arch spinlock implementation for that (qspinlock for x86).

The obvious problem of course is:


        spin_lock_irq(&L)
          // contended
          local_irq_enable();
          while (...)
                  cpu_relax();
        <IRQ>
          spin_lock(&L);

Which because the spinlock is fair, is an instant deadlock.


You can do the IRQ-enabled spin-wait for unfair locks, but by now all
our locks are fair (because we kept hitting starvation).


2018-05-22 21:33:13

by Linus Torvalds

Subject: Re: write_lock_irq(&tasklist_lock)

On Tue, May 22, 2018 at 2:17 PM Peter Zijlstra <[email protected]> wrote:

> qrwlock is a fair lock and should not exhibit writer starvation.

We actually have a special rule to make it *not* be fair, in that
interrupts are allowed to take the read lock if there are readers - even if
there are waiting writers.

I'm not sure how much of a fairness effect this has, but it's required
because of our rule that you can take it for reading without disabling
interrupts.

See

void queued_read_lock_slowpath(struct qrwlock *lock)

in kernel/locking/qrwlock.c.
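Paraphrased rather than quoted verbatim, the special case amounts to
something like:

/* Paraphrase of the in_interrupt() special case, not the exact source. */
void queued_read_lock_slowpath(struct qrwlock *lock)
{
        if (unlikely(in_interrupt())) {
                /*
                 * In-interrupt readers bypass the queue: wait only while a
                 * writer actually *holds* the lock, ignoring pending writers.
                 */
                atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
                return;
        }

        /* ... process-context readers queue up fairly here ... */
}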

> You basically want to spin-wait with interrupts enabled, right?

That was the intent of my (untested) pseudo-code. It should work fine. Note
that I used write_trylock() only, so there is no queueing (which also
implies no fairness).

I'm not saying it's a _good_ idea. I'm saying it might work if all you
worry about is the irq-disabled part.

Linus

2018-05-23 08:21:55

by Peter Zijlstra

Subject: Re: write_lock_irq(&tasklist_lock)

On Tue, May 22, 2018 at 02:31:42PM -0700, Linus Torvalds wrote:
> On Tue, May 22, 2018 at 2:17 PM Peter Zijlstra <[email protected]> wrote:
>
> > qrwlock is a fair lock and should not exhibit writer starvation.
>
> We actually have a special rule to make it *not* be fair, in that
> interrupts are allowed to take the read lock if there are readers - even if
> there are waiting writers.

Urgh, right.. would be interesting to know how much of that is happening
in that workload. I assumed the readers were mostly due to the syscalls
the reporter talked about, and those should not trigger that case.

> > You basically want to spin-wait with interrupts enabled, right?
>
> That was the intent of my (untested) pseudo-code. It should work fine. Note
> that I used write_trylock() only, so there is no queueing (which also
> implies no fairness).
>
> I'm not saying it's a _good_ idea. I'm saying it might work if all you
> worry about is the irq-disabled part.

Right, if you make it unfair and utterly prone to starvation then yes,
you can make it 'work'.

2018-05-23 13:06:20

by Will Deacon

Subject: Re: write_lock_irq(&tasklist_lock)

Hi Prasad,

On Tue, May 22, 2018 at 12:40:05PM -0700, Sodagudi Prasad wrote:
> When the following test is executed on the 4.14.41 stable kernel, I observed
> that one of the cores waits on tasklist_lock for a long time with IRQs disabled.
> ./stress-ng-64 --get 8 -t 3h --times --metrics-brief
>
> Every time the device crashes, I observe that one of the tasks is stuck in the
> fork system call, waiting for tasklist_lock as a writer with IRQs disabled.
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/kernel/fork.c?h=linux-4.14.y#n1843

Please use a newer kernel. We've addressed this in mainline by moving
arm64 over to the qrwlock implementation which (after some other changes)
guarantees forward progress for well-behaved readers and writers.

To give an example from locktorture with 2 writers and 8 readers, after
a few seconds I see:

rwlock:
Writes: Total: 6725 Max/Min: 0/0 Fail: 0
Reads: Total: 5103727 Max/Min: 0/0 Fail: 0

qrwlock:
Writes: Total: 155284 Max/Min: 0/0 Fail: 0
Reads: Total: 767703 Max/Min: 0/0 Fail: 0

so the ratio is closer to ~6:1 rather than ~191:1 for this case. The
total locking throughput has dropped, but that's expected for this type
of lock where maximum throughput would be achieved by favouring readers.
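For reference, numbers like the above come from the locktorture module; a
run along these lines should reproduce the setup (parameter names are from
kernel/locking/locktorture.c, the exact values here are only illustrative):

# example invocation, not the one used for the numbers above
modprobe locktorture torture_type=rw_lock_irq \
         nwriters_stress=2 nreaders_stress=8 stat_interval=5
sleep 30
rmmod locktorture    # write/read totals are printed to the kernel log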

Will

2018-05-23 15:28:04

by Linus Torvalds

Subject: Re: write_lock_irq(&tasklist_lock)

On Wed, May 23, 2018 at 6:05 AM Will Deacon <[email protected]> wrote:

> Please use a newer kernel. We've addressed this in mainline by moving
> arm64 over to the qrwlock implementation which (after some other changes)
> guarantees forward progress for well-behaved readers and writers.

Oh, I didn't even realize that this wasn't x86, and that there was still
the very unfair rwlock issue on 4.14 on arm.

Yeah, the queuing rwlocks shouldn't show the really pathological problems
we used to have long ago.

Linus

2018-05-23 15:36:32

by Will Deacon

Subject: Re: write_lock_irq(&tasklist_lock)

[+Boqun]

On Wed, May 23, 2018 at 08:25:06AM -0700, Linus Torvalds wrote:
> On Wed, May 23, 2018 at 6:05 AM Will Deacon <[email protected]> wrote:
>
> > Please use a newer kernel. We've addressed this in mainline by moving
> > arm64 over to the qrwlock implementation which (after some other changes)
> > guarantees forward progress for well-behaved readers and writers.
>
> Oh, I didn't even realize that this wasn't x86, and that there was still
> the very unfair rwlock issue on 4.14 on arm.
>
> Yeah, the queuing rwlocks shouldn't show the really pathological problems
> we used to have long ago.

Yup, although they do reveal new issues that Boqun has been running into
recently with his lockdep improvements. The general pattern is if you
have:

P0:                     P1:                     P2:

spin_lock(&slock)       read_lock(&rwlock)      write_lock(&rwlock)
read_lock(&rwlock)      spin_lock(&slock)

then the CPUs can be queued on the rwlock such that P1 has the lock, then
P2 is queued and then P0. If P0 has taken the spinlock, we have a deadlock
which wouldn't arise with the non-queued version.

In other words, qrwlock requires consistent locking order wrt spinlocks.
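To make that ordering requirement concrete, here is a minimal illustration
(hypothetical lock names, not taken from any real code path):

/* Illustrative only; slock and rwlock are hypothetical locks. */
static DEFINE_SPINLOCK(slock);
static DEFINE_RWLOCK(rwlock);

static void p0_path(void)       /* order: slock -> rwlock (read) */
{
        spin_lock(&slock);
        read_lock(&rwlock);     /* queues behind P2's pending writer */
        read_unlock(&rwlock);
        spin_unlock(&slock);
}

static void p1_path(void)       /* order: rwlock (read) -> slock */
{
        read_lock(&rwlock);
        spin_lock(&slock);      /* waits on P0, which waits behind P2 */
        spin_unlock(&slock);
        read_unlock(&rwlock);
}

static void p2_path(void)
{
        write_lock(&rwlock);    /* queued between P1's and P0's readers */
        write_unlock(&rwlock);
}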

Will

2018-05-23 16:28:17

by Linus Torvalds

Subject: Re: write_lock_irq(&tasklist_lock)

On Wed, May 23, 2018 at 8:35 AM Will Deacon <[email protected]> wrote:

> In other words, qrwlock requires consistent locking order wrt spinlocks.

I *thought* lockdep already tracked and detected this. Or is that only
with the sleeping versions?

But yes, that's equivalent to the irq-unfairness thing we have a special
case for.

Linus

2018-05-24 22:42:37

by Will Deacon

Subject: Re: write_lock_irq(&tasklist_lock)

On Wed, May 23, 2018 at 09:26:35AM -0700, Linus Torvalds wrote:
> On Wed, May 23, 2018 at 8:35 AM Will Deacon <[email protected]> wrote:
>
> > In other words, qrwlock requires consistent locking order wrt spinlocks.
>
> I *thought* lockdep already tracked and detected this. Or is that only
> with the sleeping versions?

There are patches in-flight to detect this:

https://marc.info/?l=linux-kernel&m=152483640529740&w=2k

as part of Boqun's work into recursive read locking.

Will

2018-05-25 00:10:44

by Boqun Feng

Subject: Re: write_lock_irq(&tasklist_lock)

On Thu, May 24, 2018 at 01:49:31PM +0100, Will Deacon wrote:
> On Wed, May 23, 2018 at 09:26:35AM -0700, Linus Torvalds wrote:
> > On Wed, May 23, 2018 at 8:35 AM Will Deacon <[email protected]> wrote:
> >
> > > In other words, qrwlock requires consistent locking order wrt spinlocks.
> >
> > I *thought* lockdep already tracked and detected this. Or is that only
> > with the sleeping versions?
>
> There are patches in-flight to detect this:
>
> https://marc.info/?l=linux-kernel&m=152483640529740&w=2k
>
> as part of Boqun's work into recursive read locking.
>

Yeah, lemme put some details here:

So we have three cases:

Case #1 (from Will)

P0:                     P1:                     P2:

spin_lock(&slock)       read_lock(&rwlock)
                                                write_lock(&rwlock)
read_lock(&rwlock)      spin_lock(&slock)

, which is a deadlock, and cannot be detected by lockdep yet. lockdep
could detect it with the patch I attach at the end of this mail.

Case #2

P0:                     P1:                     P2:

<in irq handler>
spin_lock(&slock)       read_lock(&rwlock)
                                                write_lock(&rwlock)
read_lock(&rwlock)      spin_lock_irq(&slock)

, which is not a deadlock, as the read_lock() on P0 can use the unfair
fastpath.

Case #3

P0:                     P1:                     P2:

                        <in irq handler>
spin_lock_irq(&slock)   read_lock(&rwlock)
                                                write_lock_irq(&rwlock)
read_lock(&rwlock)      spin_lock(&slock)

, which is a deadlock, as the read_lock() on P0 cannot use the fastpath.
To detect this without flagging case #2 as a false positive, the recursive
deadlock detection patchset is needed; the WIP version is at:

git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git arr-rfc-wip

The code is done; I'm just reworking the documentation, so if anyone is
interested, please try it out ;-)
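A minimal config for trying it, assuming lockdep plus the locking API
selftests are what you want exercised at boot (my assumption, not something
spelled out above):

# assumption: enough to exercise the new read_lock() annotations
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LOCKING_API_SELFTESTS=y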

Regards,
Boqun

------------------->8
Subject: [PATCH] locking: More accurate annotations for read_lock()

On the archs using QUEUED_RWLOCKS, read_lock() is not always a recursive
read lock, actually it's only recursive if in_interrupt() is true. So
change the annotation accordingly to catch more deadlocks.

Note we used to treat read_lock() as pure recursive read locks in
lib/locking-selftest.c, and this is useful, especially for the lockdep
development selftest, so we keep this via a variable to force switching
lock annotation for read_lock().

Signed-off-by: Boqun Feng <[email protected]>
---
include/linux/lockdep.h | 35 ++++++++++++++++++++++++++++++++++-
lib/locking-selftest.c | 11 +++++++++++
2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..07793986c063 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -27,6 +27,7 @@ extern int lock_stat;
 #include <linux/list.h>
 #include <linux/debug_locks.h>
 #include <linux/stacktrace.h>
+#include <linux/preempt.h>
 
 /*
  * We'd rather not expose kernel/lockdep_states.h this wide, but we do need
@@ -540,6 +541,31 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 }
 #endif
 
+/* Variable used to make lockdep treat read_lock() as recursive in selftests */
+#ifdef CONFIG_DEBUG_LOCKING_API_SELFTESTS
+extern unsigned int force_read_lock_recursive;
+#else /* CONFIG_DEBUG_LOCKING_API_SELFTESTS */
+#define force_read_lock_recursive 0
+#endif /* CONFIG_DEBUG_LOCKING_API_SELFTESTS */
+
+#ifdef CONFIG_LOCKDEP
+/*
+ * read_lock() is recursive if:
+ * 1. We force lockdep think this way in selftests or
+ * 2. The implementation is not queued read/write lock or
+ * 3. The locker is at an in_interrupt() context.
+ */
+static inline bool read_lock_is_recursive(void)
+{
+        return force_read_lock_recursive ||
+               !IS_ENABLED(CONFIG_QUEUED_RWLOCKS) ||
+               in_interrupt();
+}
+#else /* CONFIG_LOCKDEP */
+/* If !LOCKDEP, the value is meaningless */
+#define read_lock_is_recursive() 0
+#endif
+
 /*
  * For trivial one-depth nesting of a lock-class, the following
  * global define can be used. (Subsystems with multiple levels
@@ -561,7 +587,14 @@ static inline void print_irqtrace_events(struct task_struct *curr)
 #define spin_release(l, n, i)                   lock_release(l, n, i)
 
 #define rwlock_acquire(l, s, t, i)              lock_acquire_exclusive(l, s, t, NULL, i)
-#define rwlock_acquire_read(l, s, t, i)         lock_acquire_shared_recursive(l, s, t, NULL, i)
+#define rwlock_acquire_read(l, s, t, i)                         \
+do {                                                            \
+        if (read_lock_is_recursive())                           \
+                lock_acquire_shared_recursive(l, s, t, NULL, i);\
+        else                                                    \
+                lock_acquire_shared(l, s, t, NULL, i);          \
+} while (0)
+
 #define rwlock_release(l, n, i)                 lock_release(l, n, i)
 
 #define seqcount_acquire(l, s, t, i)            lock_acquire_exclusive(l, s, t, NULL, i)
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index b5c1293ce147..dd92f8ad83d5 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -28,6 +28,7 @@
  * Change this to 1 if you want to see the failure printouts:
  */
 static unsigned int debug_locks_verbose;
+unsigned int force_read_lock_recursive;
 
 static DEFINE_WW_CLASS(ww_lockdep);
 
@@ -1978,6 +1979,11 @@ void locking_selftest(void)
                return;
        }
 
+       /*
+        * treats read_lock() as recursive read locks for testing purpose
+        */
+       force_read_lock_recursive = 1;
+
        /*
         * Run the testsuite:
         */
@@ -2072,6 +2078,11 @@ void locking_selftest(void)
 
        ww_tests();
 
+       force_read_lock_recursive = 0;
+       /*
+        * queued_read_lock() specific test cases can be put here
+        */
+
        if (unexpected_testcase_failures) {
                printk("-----------------------------------------------------------------\n");
                debug_locks = 0;
--
2.16.2


2018-05-25 02:35:37

by Prasad Sodagudi

Subject: Re: write_lock_irq(&tasklist_lock)

On 2018-05-24 06:51, Boqun Feng wrote:
> On Thu, May 24, 2018 at 01:49:31PM +0100, Will Deacon wrote:
>> On Wed, May 23, 2018 at 09:26:35AM -0700, Linus Torvalds wrote:
>> > On Wed, May 23, 2018 at 8:35 AM Will Deacon <[email protected]> wrote:
>> >
>> > > In other words, qrwlock requires consistent locking order wrt spinlocks.
>> >
>> > I *thought* lockdep already tracked and detected this. Or is that only
>> > with the sleeping versions?
>>
>> There are patches in-flight to detect this:
>>
>> https://marc.info/?l=linux-kernel&m=152483640529740&w=2k
>>
>> as part of Boqun's work into recursive read locking.
>>
>
Hi Linus and Will,

Thank you for the quick responses and suggestions.

The kernel version is locked for a couple of products, and the same issue is
observed on both the 4.14.41 and 4.9.96 kernels. We can only accept stable
updates from upstream for these products.
If QUEUED_RWLOCKS works on these kernel versions without any issues, we can
enable QUEUED_RWLOCKS.

Can we go ahead with Linus's suggestion for these kernel versions, so that
IRQs won't be disabled for quite so long?

static void tasklist_write_lock(void)
{
        unsigned long flags;
        local_irq_save(flags);
        while (!write_trylock(&tasklist_lock)) {
                local_irq_restore(flags);
                do { cpu_relax(); } while (write_islocked(&tasklist_lock));
                local_irq_disable();
        }
}


-Thanks, Prasad

> [...]

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

2018-05-25 02:37:34

by Peter Zijlstra

Subject: Re: write_lock_irq(&tasklist_lock)

On Thu, May 24, 2018 at 10:37:25AM -0700, Sodagudi Prasad wrote:
> The kernel version is locked for a couple of products, and the same issue is
> observed on both the 4.14.41 and 4.9.96 kernels. We can only accept stable
> updates from upstream for these products.
>
> If QUEUED_RWLOCKS works on these kernel versions without any issues, we can
> enable QUEUED_RWLOCKS.

You want:

e0d02285f16e ("locking/qrwlock: Use 'struct qrwlock' instead of 'struct __qrwlock'")
4df714be4dcf ("locking/atomic: Add atomic_cond_read_acquire()")
b519b56e378e ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock")
087133ac9076 ("locking/qrwlock, arm64: Move rwlock implementation over to qrwlocks")
d13316614633 ("locking/qrwlock: Prevent slowpath writers getting held up by fastpath")

IIRC, enabling QUEUED_RWLOCKS will 'work' but esp. that
atomic_cond_read_acquire() one is goodness for ARM64.
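For reference, the arm64 part of that batch boils down to roughly the
following (a sketch from memory, not the exact upstream diff):

/*
 * Sketch, not the exact upstream change: arch/arm64/Kconfig gains
 * "select ARCH_USE_QUEUED_RWLOCKS", and arch/arm64/include/asm/spinlock.h
 * drops its hand-rolled rwlock assembly in favour of the generic header:
 */
#include <asm-generic/qrwlock.h>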

> Can we go ahead with Linus's suggestion for these kernel versions, so that
> IRQs won't be disabled for quite so long?
>
> static void tasklist_write_lock(void)
> {
>         unsigned long flags;
>         local_irq_save(flags);
>         while (!write_trylock(&tasklist_lock)) {
>                 local_irq_restore(flags);
>                 do { cpu_relax(); } while (write_islocked(&tasklist_lock));
>                 local_irq_disable();
>         }
> }

First you say you can only take stable updates, then you ask for a gruesome
hack that will seriously regress architectures that do use qrwlock and which
will hence never make it into stable.

Just take the arm64 qrwlock patches and pretend they're from stable.

2018-05-25 02:44:39

by Andrea Parri

Subject: Re: write_lock_irq(&tasklist_lock)

> Yeah, lemme put some details here:
>
> So we have three cases:
>
> Case #1 (from Will)
>
> P0:                     P1:                     P2:
>
> spin_lock(&slock)       read_lock(&rwlock)
>                                                 write_lock(&rwlock)
> read_lock(&rwlock)      spin_lock(&slock)
>
> , which is a deadlock, and cannot be detected by lockdep yet. lockdep
> could detect it with the patch I attach at the end of this mail.
>
> Case #2
>
> P0:                     P1:                     P2:
>
> <in irq handler>
> spin_lock(&slock)       read_lock(&rwlock)
>                                                 write_lock(&rwlock)
> read_lock(&rwlock)      spin_lock_irq(&slock)
>
> , which is not a deadlock, as the read_lock() on P0 can use the unfair
> fastpath.
>
> Case #3
>
> P0:                     P1:                     P2:
>
>                         <in irq handler>
> spin_lock_irq(&slock)   read_lock(&rwlock)
>                                                 write_lock_irq(&rwlock)
> read_lock(&rwlock)      spin_lock(&slock)
>
> , which is a deadlock, as the read_lock() on P0 cannot use the fastpath.

Mmh, I'm starting to think that, maybe, we need a model (a tool) to
distinguish these and other(?) cases (sorry, I could not resist ;-)

[...]


> ------------------->8
> Subject: [PATCH] locking: More accurate annotations for read_lock()
>
> On the archs using QUEUED_RWLOCKS, read_lock() is not always a recursive
> read lock, actually it's only recursive if in_interrupt() is true. So

Mmh, taking the "common denominator" over archs/Kconfig options and
CPU states, this would suggest that read_lock() is non-recursive;

it looks like I can say "good-bye" to my idea to define (formalize)
consistent executions/the memory ordering of RW-LOCKS "by following"
the following _emulation_:

void read_lock(rwlock_t *s)
{
        r0 = atomic_fetch_inc_acquire(&s->val);
}

void read_unlock(rwlock_t *s)
{
        r0 = atomic_fetch_sub_release(&s->val);
}

void write_lock(rwlock_t *s)
{
        r0 = atomic_cmpxchg_acquire(&s->val, 0, -1);
}

void write_unlock(rwlock_t *s)
{
        atomic_set_release(&s->val, 0);
}

filter (~read_lock:r0=-1 /\ write_lock:r0=0)

[...]


> The code is done; I'm just reworking the documentation, so if anyone is
> interested, please try it out ;-)

Any idea on how to "educate" the LKMM about this code/documentation?

Andrea