LinuxLists.cc - [PATCH RFC] ashmem: Fix lockdep RECLAIM

2018-02-06 00:50:33

Subject: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive

During invocation of ashmem shrinker under memory pressure, ashmem
calls into VFS code via vfs_fallocate. We however make sure we
don't enter it if the allocation was GFP_FS to prevent looping
into filesystem code. However lockdep doesn't know this and prints
a lockdep splat as below.

This patch fixes the issue by releasing the reclaim_fs lock after
checking for GFP_FS but before calling into the VFS path, and
reacquiring it after so that lockdep can continue reporting any
reclaim issues later.

[ 2115.359650] -(1)[106:kswapd0]=================================
[ 2115.359665] -(1)[106:kswapd0][ INFO: inconsistent lock state ]
[ 2115.359684] -(1)[106:kswapd0]4.9.60+ #2 Tainted: G W O
[ 2115.359699] -(1)[106:kswapd0]---------------------------------
[ 2115.359715] -(1)[106:kswapd0]inconsistent {RECLAIM_FS-ON-W} ->
{IN-RECLAIM_FS-W} usage.
[ 2115.359732] -(1)[106:kswapd0]kswapd0/106 [HC0[0]:SC0[0]:HE1:SE1]
takes:
[ 2115.359748] (&sb->s_type->i_mutex_key#9){++++?+}, at:
[<ffffff9008470644>] shmem_fallocate+0x104/0x880
[ 2115.359809] -(1)[106:kswapd0]{RECLAIM_FS-ON-W} state was registered
at:
[ 2115.359828] -(1)[106:kswapd0] mark_lock+0x2a4/0x10c0
[ 2115.359845] -(1)[106:kswapd0] mark_held_locks+0xc0/0x128
[ 2115.359862] -(1)[106:kswapd0] lockdep_trace_alloc+0x284/0x368
[ 2115.359881] -(1)[106:kswapd0] kmem_cache_alloc+0x3c/0x368
[ 2115.359900] -(1)[106:kswapd0] __d_alloc+0x3c/0x7a8
[ 2115.359917] -(1)[106:kswapd0] d_alloc+0x34/0x140
[ 2115.359934] -(1)[106:kswapd0] d_alloc_parallel+0xfc/0x1480
[ 2115.359953] -(1)[106:kswapd0] lookup_open+0x3c4/0x12a8
[ 2115.359971] -(1)[106:kswapd0] path_openat+0xb40/0x1dc0
[ 2115.359987] -(1)[106:kswapd0] do_filp_open+0x170/0x258
[ 2115.360006] -(1)[106:kswapd0] do_sys_open+0x1b8/0x2f0
[ 2115.360023] -(1)[106:kswapd0] SyS_openat+0x10/0x18
[ 2115.360041] -(1)[106:kswapd0] el0_svc_naked+0x24/0x28
[ 2115.360056] -(1)[106:kswapd0]irq event stamp: 2437365
[ 2115.360079] -(1)[106:kswapd0]hardirqs last enabled at (2437365):
[<ffffff900a0ec6ec>] mutex_trylock+0x224/0x460
[ 2115.360098] -(1)[106:kswapd0]hardirqs last disabled at (2437364):
[<ffffff900a0ec5b0>] mutex_trylock+0xe8/0x460
[ 2115.360116] -(1)[106:kswapd0]softirqs last enabled at (2436534):
[<ffffff90080822a0>] __do_softirq+0xc38/0x1190
[ 2115.360138] -(1)[106:kswapd0]softirqs last disabled at (2436515):
[<ffffff90080de05c>] irq_exit+0x1ac/0x228
[ 2115.360153] -(1)[106:kswapd0]\x0aother info that might help us debug
this:
[ 2115.360169] -(1)[106:kswapd0] Possible unsafe locking scenario:\x0a
[ 2115.360184] -(1)[106:kswapd0] CPU0
[ 2115.360198] -(1)[106:kswapd0] ----
[ 2115.360211] -(1)[106:kswapd0] lock(&sb->s_type->i_mutex_key#9);
[ 2115.360252] -(1)[106:kswapd0] <Interrupt>
[ 2115.360265] -(1)[106:kswapd0] lock(&sb->s_type->i_mutex_key#9);
[ 2115.360304] -(1)[106:kswapd0]\x0a *** DEADLOCK ***\x0a
[ 2115.360322] -(1)[106:kswapd0] #0: (shrinker_rwsem){++++..}, at:
[<ffffff9008459e08>] shrink_slab.part.15.constprop.28+0xb0/0xe10
[ 2115.360382] -(1)[106:kswapd0] #1: (ashmem_mutex){+.+.+.}, at:
[<ffffff90098c4ba0>] ashmem_shrink_scan+0x80/0x308
[ 2115.360439] -(1)[106:kswapd0]\x0astack backtrace:
[ 2115.360462] -(1)[106:kswapd0]CPU: 1 PID: 106 Comm: kswapd0 Tainted: G
W O 4.9.60+ #2
[ 2115.360478] -(1)[106:kswapd0]Hardware name: MT6765 (DT)
[ 2115.360494] -(1)[106:kswapd0]Call trace:
[ 2115.360515] -(1)[106:kswapd0][<ffffff9008092938>]
dump_backtrace+0x0/0x400
[ 2115.360533] -(1)[106:kswapd0][<ffffff900809302c>]
show_stack+0x14/0x20
[ 2115.360555] -(1)[106:kswapd0][<ffffff9008988bc0>]
dump_stack+0xb0/0xe8
[ 2115.360576] -(1)[106:kswapd0][<ffffff900841e8b0>]
print_usage_bug.part.24+0x548/0x568
[ 2115.360595] -(1)[106:kswapd0][<ffffff90082398bc>]
mark_lock+0x494/0x10c0
[ 2115.360613] -(1)[106:kswapd0][<ffffff900823bf1c>]
__lock_acquire+0xc94/0x58e8
[ 2115.360630] -(1)[106:kswapd0][<ffffff9008241ba0>]
lock_acquire+0x1d0/0x708
[ 2115.360650] -(1)[106:kswapd0][<ffffff900a0f27a8>]
down_write+0x48/0xd0
[ 2115.360669] -(1)[106:kswapd0][<ffffff9008470644>]
shmem_fallocate+0x104/0x880
[ 2115.360688] -(1)[106:kswapd0][<ffffff90098c4cc4>]
ashmem_shrink_scan+0x1a4/0x308
[ 2115.360709] -(1)[106:kswapd0][<ffffff900845a088>]
shrink_slab.part.15.constprop.28+0x330/0xe10
[ 2115.360729] -(1)[106:kswapd0][<ffffff9008463b34>]
shrink_node+0x1b4/0x588
[ 2115.360747] -(1)[106:kswapd0][<ffffff90084655f4>] kswapd+0x774/0x1640
[ 2115.360767] -(1)[106:kswapd0][<ffffff9008128a4c>] kthread+0x28c/0x310
[ 2115.360786] -(1)[106:kswapd0][<ffffff9008083f00>]
ret_from_fork+0x10/0x50

Cc: Peter Zilstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Joel Fernandes <[email protected]>
---
drivers/staging/android/ashmem.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index 372ce9913e6d..7e060f32aaa8 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -32,6 +32,7 @@
#include <linux/bitops.h>
#include <linux/mutex.h>
#include <linux/shmem_fs.h>
+#include <linux/sched/mm.h>
#include "ashmem.h"

#define ASHMEM_NAME_PREFIX "dev/ashmem/"
@@ -446,8 +447,17 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
if (!(sc->gfp_mask & __GFP_FS))
return SHRINK_STOP;

- if (!mutex_trylock(&ashmem_mutex))
+ /*
+ * Release reclaim-fs marking since we've already checked GFP_FS, This
+ * will prevent lockdep's reclaim recursion deadlock false positives.
+ * We'll renable it before returning from this function.
+ */
+ fs_reclaim_release(sc->gfp_mask);
+
+ if (!mutex_trylock(&ashmem_mutex)) {
+ fs_reclaim_acquire(sc->gfp_mask);
return -1;
+ }

list_for_each_entry_safe(range, next, &ashmem_lru_list, lru) {
loff_t start = range->pgstart * PAGE_SIZE;
@@ -464,6 +474,8 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
break;
}
mutex_unlock(&ashmem_mutex);
+
+ fs_reclaim_acquire(sc->gfp_mask);
return freed;
}

--
2.16.0.rc1.238.g530d649a79-goog

2018-02-06 22:04:36

by Minchan Kim

[permalink] [raw]

Subject: Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive

Hi Joel,

On Mon, Feb 05, 2018 at 04:49:03PM -0800, Joel Fernandes wrote:
> During invocation of ashmem shrinker under memory pressure, ashmem
> calls into VFS code via vfs_fallocate. We however make sure we
> don't enter it if the allocation was GFP_FS to prevent looping
> into filesystem code. However lockdep doesn't know this and prints
> a lockdep splat as below.
>
> This patch fixes the issue by releasing the reclaim_fs lock after
> checking for GFP_FS but before calling into the VFS path, and
> reacquiring it after so that lockdep can continue reporting any
> reclaim issues later.

At first glance, it looks reasonable. However, Couldn't we return
just 0 in ashmem_shrink_count when the context is under FS?

>
> [ 2115.359650] -(1)[106:kswapd0]=================================
> [ 2115.359665] -(1)[106:kswapd0][ INFO: inconsistent lock state ]
> [ 2115.359684] -(1)[106:kswapd0]4.9.60+ #2 Tainted: G W O
> [ 2115.359699] -(1)[106:kswapd0]---------------------------------
> [ 2115.359715] -(1)[106:kswapd0]inconsistent {RECLAIM_FS-ON-W} ->
> {IN-RECLAIM_FS-W} usage.
> [ 2115.359732] -(1)[106:kswapd0]kswapd0/106 [HC0[0]:SC0[0]:HE1:SE1]
> takes:
> [ 2115.359748] (&sb->s_type->i_mutex_key#9){++++?+}, at:
> [<ffffff9008470644>] shmem_fallocate+0x104/0x880
> [ 2115.359809] -(1)[106:kswapd0]{RECLAIM_FS-ON-W} state was registered
> at:
> [ 2115.359828] -(1)[106:kswapd0] mark_lock+0x2a4/0x10c0
> [ 2115.359845] -(1)[106:kswapd0] mark_held_locks+0xc0/0x128
> [ 2115.359862] -(1)[106:kswapd0] lockdep_trace_alloc+0x284/0x368
> [ 2115.359881] -(1)[106:kswapd0] kmem_cache_alloc+0x3c/0x368
> [ 2115.359900] -(1)[106:kswapd0] __d_alloc+0x3c/0x7a8
> [ 2115.359917] -(1)[106:kswapd0] d_alloc+0x34/0x140
> [ 2115.359934] -(1)[106:kswapd0] d_alloc_parallel+0xfc/0x1480
> [ 2115.359953] -(1)[106:kswapd0] lookup_open+0x3c4/0x12a8
> [ 2115.359971] -(1)[106:kswapd0] path_openat+0xb40/0x1dc0
> [ 2115.359987] -(1)[106:kswapd0] do_filp_open+0x170/0x258
> [ 2115.360006] -(1)[106:kswapd0] do_sys_open+0x1b8/0x2f0
> [ 2115.360023] -(1)[106:kswapd0] SyS_openat+0x10/0x18
> [ 2115.360041] -(1)[106:kswapd0] el0_svc_naked+0x24/0x28
> [ 2115.360056] -(1)[106:kswapd0]irq event stamp: 2437365
> [ 2115.360079] -(1)[106:kswapd0]hardirqs last enabled at (2437365):
> [<ffffff900a0ec6ec>] mutex_trylock+0x224/0x460
> [ 2115.360098] -(1)[106:kswapd0]hardirqs last disabled at (2437364):
> [<ffffff900a0ec5b0>] mutex_trylock+0xe8/0x460
> [ 2115.360116] -(1)[106:kswapd0]softirqs last enabled at (2436534):
> [<ffffff90080822a0>] __do_softirq+0xc38/0x1190
> [ 2115.360138] -(1)[106:kswapd0]softirqs last disabled at (2436515):
> [<ffffff90080de05c>] irq_exit+0x1ac/0x228
> [ 2115.360153] -(1)[106:kswapd0]\x0aother info that might help us debug
> this:
> [ 2115.360169] -(1)[106:kswapd0] Possible unsafe locking scenario:\x0a
> [ 2115.360184] -(1)[106:kswapd0] CPU0
> [ 2115.360198] -(1)[106:kswapd0] ----
> [ 2115.360211] -(1)[106:kswapd0] lock(&sb->s_type->i_mutex_key#9);
> [ 2115.360252] -(1)[106:kswapd0] <Interrupt>
> [ 2115.360265] -(1)[106:kswapd0] lock(&sb->s_type->i_mutex_key#9);
> [ 2115.360304] -(1)[106:kswapd0]\x0a *** DEADLOCK ***\x0a
> [ 2115.360322] -(1)[106:kswapd0] #0: (shrinker_rwsem){++++..}, at:
> [<ffffff9008459e08>] shrink_slab.part.15.constprop.28+0xb0/0xe10
> [ 2115.360382] -(1)[106:kswapd0] #1: (ashmem_mutex){+.+.+.}, at:
> [<ffffff90098c4ba0>] ashmem_shrink_scan+0x80/0x308
> [ 2115.360439] -(1)[106:kswapd0]\x0astack backtrace:
> [ 2115.360462] -(1)[106:kswapd0]CPU: 1 PID: 106 Comm: kswapd0 Tainted: G
> W O 4.9.60+ #2
> [ 2115.360478] -(1)[106:kswapd0]Hardware name: MT6765 (DT)
> [ 2115.360494] -(1)[106:kswapd0]Call trace:
> [ 2115.360515] -(1)[106:kswapd0][<ffffff9008092938>]
> dump_backtrace+0x0/0x400
> [ 2115.360533] -(1)[106:kswapd0][<ffffff900809302c>]
> show_stack+0x14/0x20
> [ 2115.360555] -(1)[106:kswapd0][<ffffff9008988bc0>]
> dump_stack+0xb0/0xe8
> [ 2115.360576] -(1)[106:kswapd0][<ffffff900841e8b0>]
> print_usage_bug.part.24+0x548/0x568
> [ 2115.360595] -(1)[106:kswapd0][<ffffff90082398bc>]
> mark_lock+0x494/0x10c0
> [ 2115.360613] -(1)[106:kswapd0][<ffffff900823bf1c>]
> __lock_acquire+0xc94/0x58e8
> [ 2115.360630] -(1)[106:kswapd0][<ffffff9008241ba0>]
> lock_acquire+0x1d0/0x708
> [ 2115.360650] -(1)[106:kswapd0][<ffffff900a0f27a8>]
> down_write+0x48/0xd0
> [ 2115.360669] -(1)[106:kswapd0][<ffffff9008470644>]
> shmem_fallocate+0x104/0x880
> [ 2115.360688] -(1)[106:kswapd0][<ffffff90098c4cc4>]
> ashmem_shrink_scan+0x1a4/0x308
> [ 2115.360709] -(1)[106:kswapd0][<ffffff900845a088>]
> shrink_slab.part.15.constprop.28+0x330/0xe10
> [ 2115.360729] -(1)[106:kswapd0][<ffffff9008463b34>]
> shrink_node+0x1b4/0x588
> [ 2115.360747] -(1)[106:kswapd0][<ffffff90084655f4>] kswapd+0x774/0x1640
> [ 2115.360767] -(1)[106:kswapd0][<ffffff9008128a4c>] kthread+0x28c/0x310
> [ 2115.360786] -(1)[106:kswapd0][<ffffff9008083f00>]
> ret_from_fork+0x10/0x50
>
> Cc: Peter Zilstra <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Joel Fernandes <[email protected]>
> ---
> drivers/staging/android/ashmem.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
> index 372ce9913e6d..7e060f32aaa8 100644
> --- a/drivers/staging/android/ashmem.c
> +++ b/drivers/staging/android/ashmem.c
> @@ -32,6 +32,7 @@
> #include <linux/bitops.h>
> #include <linux/mutex.h>
> #include <linux/shmem_fs.h>
> +#include <linux/sched/mm.h>
> #include "ashmem.h"
>
> #define ASHMEM_NAME_PREFIX "dev/ashmem/"
> @@ -446,8 +447,17 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> if (!(sc->gfp_mask & __GFP_FS))
> return SHRINK_STOP;
>
> - if (!mutex_trylock(&ashmem_mutex))
> + /*
> + * Release reclaim-fs marking since we've already checked GFP_FS, This
> + * will prevent lockdep's reclaim recursion deadlock false positives.
> + * We'll renable it before returning from this function.
> + */
> + fs_reclaim_release(sc->gfp_mask);
> +
> + if (!mutex_trylock(&ashmem_mutex)) {
> + fs_reclaim_acquire(sc->gfp_mask);
> return -1;
> + }
>
> list_for_each_entry_safe(range, next, &ashmem_lru_list, lru) {
> loff_t start = range->pgstart * PAGE_SIZE;
> @@ -464,6 +474,8 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> break;
> }
> mutex_unlock(&ashmem_mutex);
> +
> + fs_reclaim_acquire(sc->gfp_mask);
> return freed;
> }
>
> --
> 2.16.0.rc1.238.g530d649a79-goog
>

2018-02-06 22:35:10

by Joel Fernandes

[permalink] [raw]

Subject: Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive

Hi Minchan,

On Tue, Feb 6, 2018 at 2:01 PM, Minchan Kim <[email protected]> wrote:
[...]
> On Mon, Feb 05, 2018 at 04:49:03PM -0800, Joel Fernandes wrote:
>> During invocation of ashmem shrinker under memory pressure, ashmem
>> calls into VFS code via vfs_fallocate. We however make sure we
>> don't enter it if the allocation was GFP_FS to prevent looping
>> into filesystem code. However lockdep doesn't know this and prints
>> a lockdep splat as below.
>>
>> This patch fixes the issue by releasing the reclaim_fs lock after
>> checking for GFP_FS but before calling into the VFS path, and
>> reacquiring it after so that lockdep can continue reporting any
>> reclaim issues later.
>
> At first glance, it looks reasonable. However, Couldn't we return
> just 0 in ashmem_shrink_count when the context is under FS?
>

We're already checking if GFP_FS in ashmem_shrink_scan and bailing out
though, did I miss something?

The problem is not that there is a deadlock that occurs, the problem
that even when we're not under FS, lockdep reports an issue that can't
happen. The fix is for the lockdep false positive that occurs.

thanks,

- Joel

2018-02-06 22:57:45

by Minchan Kim

[permalink] [raw]

Subject: Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive

On Tue, Feb 06, 2018 at 02:32:13PM -0800, Joel Fernandes wrote:
> Hi Minchan,
>
> On Tue, Feb 6, 2018 at 2:01 PM, Minchan Kim <[email protected]> wrote:
> [...]
> > On Mon, Feb 05, 2018 at 04:49:03PM -0800, Joel Fernandes wrote:
> >> During invocation of ashmem shrinker under memory pressure, ashmem
> >> calls into VFS code via vfs_fallocate. We however make sure we
> >> don't enter it if the allocation was GFP_FS to prevent looping
> >> into filesystem code. However lockdep doesn't know this and prints
> >> a lockdep splat as below.
> >>
> >> This patch fixes the issue by releasing the reclaim_fs lock after
> >> checking for GFP_FS but before calling into the VFS path, and
> >> reacquiring it after so that lockdep can continue reporting any
> >> reclaim issues later.
> >
> > At first glance, it looks reasonable. However, Couldn't we return
> > just 0 in ashmem_shrink_count when the context is under FS?
> >
>
> We're already checking if GFP_FS in ashmem_shrink_scan and bailing out
> though, did I miss something?

I understand your concern now. Apart from that, if ashmem_shrink_count
is called under GFP_FS, you can just return 0 for removing pointless
ashmem_shrink_scan calling. But it might be trivial so up to you. :)

>
> The problem is not that there is a deadlock that occurs, the problem
> that even when we're not under FS, lockdep reports an issue that can't
> happen. The fix is for the lockdep false positive that occurs.

Yub, you are right. I am happy to add

Reviewed-by: Minchan Kim <[email protected]?

Other than that, I thought a while we could make it in generic so we
can add SHRINKER_FS_AWARE like that so VM code itself can do for
preventing such false positive instead of doing it in each driver
itself.

However, if driver can do by itself, it could be more flexible.
Also, at this moment, my suggestion would be also overengineering so
I'm not against you.

Thanks!

2018-02-06 23:17:55

by Joel Fernandes

[permalink] [raw]

Subject: Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive

Hi Minchan,

On Tue, Feb 6, 2018 at 2:55 PM, Minchan Kim <[email protected]> wrote:
> On Tue, Feb 06, 2018 at 02:32:13PM -0800, Joel Fernandes wrote:
[...]
>> On Tue, Feb 6, 2018 at 2:01 PM, Minchan Kim <[email protected]> wrote:
>> [...]
>> > On Mon, Feb 05, 2018 at 04:49:03PM -0800, Joel Fernandes wrote:
>> >> During invocation of ashmem shrinker under memory pressure, ashmem
>> >> calls into VFS code via vfs_fallocate. We however make sure we
>> >> don't enter it if the allocation was GFP_FS to prevent looping
>> >> into filesystem code. However lockdep doesn't know this and prints
>> >> a lockdep splat as below.
>> >>
>> >> This patch fixes the issue by releasing the reclaim_fs lock after
>> >> checking for GFP_FS but before calling into the VFS path, and
>> >> reacquiring it after so that lockdep can continue reporting any
>> >> reclaim issues later.
>> >
>> > At first glance, it looks reasonable. However, Couldn't we return
>> > just 0 in ashmem_shrink_count when the context is under FS?
>> >
>>
>> We're already checking if GFP_FS in ashmem_shrink_scan and bailing out
>> though, did I miss something?
>
> I understand your concern now. Apart from that, if ashmem_shrink_count
> is called under GFP_FS, you can just return 0 for removing pointless
> ashmem_shrink_scan calling. But it might be trivial so up to you. :)

Yes, I think we can do that in a subsequent patch since that's a
different optimization. Thanks for the suggestion.

>>
>> The problem is not that there is a deadlock that occurs, the problem
>> that even when we're not under FS, lockdep reports an issue that can't
>> happen. The fix is for the lockdep false positive that occurs.
>
> Yub, you are right. I am happy to add
>
> Reviewed-by: Minchan Kim <[email protected]?

Great, thanks!

>
> Other than that, I thought a while we could make it in generic so we
> can add SHRINKER_FS_AWARE like that so VM code itself can do for
> preventing such false positive instead of doing it in each driver
> itself.
>
> However, if driver can do by itself, it could be more flexible.

Yes, off the top it feels like something that driver you should do at
more finer grained level, since probably only driver knows that/if we
will be looping into FS.

Thanks for the review,

- Joel

2018-02-07 08:08:59

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive

On Mon, Feb 05, 2018 at 04:49:03PM -0800, Joel Fernandes wrote:

> [ 2115.359650] -(1)[106:kswapd0]=================================
> [ 2115.359665] -(1)[106:kswapd0][ INFO: inconsistent lock state ]
> [ 2115.359684] -(1)[106:kswapd0]4.9.60+ #2 Tainted: G W O
> [ 2115.359699] -(1)[106:kswapd0]---------------------------------
> [ 2115.359715] -(1)[106:kswapd0]inconsistent {RECLAIM_FS-ON-W} ->
> {IN-RECLAIM_FS-W} usage.

Please don't wrap log output, this is unreadable :/

Also, the output is from an ancient kernel and doesn't match the current
code.

> ---
> drivers/staging/android/ashmem.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
> index 372ce9913e6d..7e060f32aaa8 100644
> --- a/drivers/staging/android/ashmem.c
> +++ b/drivers/staging/android/ashmem.c
> @@ -32,6 +32,7 @@
> #include <linux/bitops.h>
> #include <linux/mutex.h>
> #include <linux/shmem_fs.h>
> +#include <linux/sched/mm.h>
> #include "ashmem.h"
>
> #define ASHMEM_NAME_PREFIX "dev/ashmem/"
> @@ -446,8 +447,17 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> if (!(sc->gfp_mask & __GFP_FS))
> return SHRINK_STOP;
>
> - if (!mutex_trylock(&ashmem_mutex))
> + /*
> + * Release reclaim-fs marking since we've already checked GFP_FS, This
> + * will prevent lockdep's reclaim recursion deadlock false positives.
> + * We'll renable it before returning from this function.
> + */
> + fs_reclaim_release(sc->gfp_mask);
> +
> + if (!mutex_trylock(&ashmem_mutex)) {
> + fs_reclaim_acquire(sc->gfp_mask);
> return -1;
> + }
>
> list_for_each_entry_safe(range, next, &ashmem_lru_list, lru) {
> loff_t start = range->pgstart * PAGE_SIZE;
> @@ -464,6 +474,8 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> break;
> }
> mutex_unlock(&ashmem_mutex);
> +
> + fs_reclaim_acquire(sc->gfp_mask);
> return freed;
> }

Yuck that is horrible.. so if GFP_FS was set, we bail, but if GFP_FS
wasn't set, why is fs_reclaim_*() doing anything at all?

That is, __need_fd_reclaim() should return false when !GFP_FS.

2018-02-07 16:12:10

by Joel Fernandes

[permalink] [raw]

Subject: Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive

Hi Peter,

On Wed, Feb 7, 2018 at 12:07 AM, Peter Zijlstra <[email protected]> wrote:
> On Mon, Feb 05, 2018 at 04:49:03PM -0800, Joel Fernandes wrote:
>
>> [ 2115.359650] -(1)[106:kswapd0]=================================
>> [ 2115.359665] -(1)[106:kswapd0][ INFO: inconsistent lock state ]
>> [ 2115.359684] -(1)[106:kswapd0]4.9.60+ #2 Tainted: G W O
>> [ 2115.359699] -(1)[106:kswapd0]---------------------------------
>> [ 2115.359715] -(1)[106:kswapd0]inconsistent {RECLAIM_FS-ON-W} ->
>> {IN-RECLAIM_FS-W} usage.
>
> Please don't wrap log output, this is unreadable :/

Sorry about that, here's the unwrapped output, I'll fix the commit
message in next rev: https://pastebin.com/e0BNGkaN

>
> Also, the output is from an ancient kernel and doesn't match the current
> code.

Right, however the driver hasn't changed and I don't see immediately
how lockdep handles this differently upstream, so I thought of fixing
it upstream.

>> diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
>> index 372ce9913e6d..7e060f32aaa8 100644
>> --- a/drivers/staging/android/ashmem.c
>> +++ b/drivers/staging/android/ashmem.c
>> @@ -32,6 +32,7 @@
>> #include <linux/bitops.h>
>> #include <linux/mutex.h>
>> #include <linux/shmem_fs.h>
>> +#include <linux/sched/mm.h>
>> #include "ashmem.h"
>>
>> #define ASHMEM_NAME_PREFIX "dev/ashmem/"
>> @@ -446,8 +447,17 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>> if (!(sc->gfp_mask & __GFP_FS))
>> return SHRINK_STOP;
>>
>> - if (!mutex_trylock(&ashmem_mutex))
>> + /*
>> + * Release reclaim-fs marking since we've already checked GFP_FS, This
>> + * will prevent lockdep's reclaim recursion deadlock false positives.
>> + * We'll renable it before returning from this function.
>> + */
>> + fs_reclaim_release(sc->gfp_mask);
>> +
>> + if (!mutex_trylock(&ashmem_mutex)) {
>> + fs_reclaim_acquire(sc->gfp_mask);
>> return -1;
>> + }
>>
>> list_for_each_entry_safe(range, next, &ashmem_lru_list, lru) {
>> loff_t start = range->pgstart * PAGE_SIZE;
>> @@ -464,6 +474,8 @@ ashmem_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>> break;
>> }
>> mutex_unlock(&ashmem_mutex);
>> +
>> + fs_reclaim_acquire(sc->gfp_mask);
>> return freed;
>> }
>
> Yuck that is horrible.. so if GFP_FS was set, we bail, but if GFP_FS
> wasn't set, why is fs_reclaim_*() doing anything at all?
>
> That is, __need_fd_reclaim() should return false when !GFP_FS.

So my patch is wrong, very sorry about that. That's why I marked it as
RFC and wanted to get your expert eyes on it.
The bail out happens when GFP_FS is *not* set. Lockdep reports this
issue when GFP_FS is infact set, and we enter this path and acquire
the lock. So lockdep seems to be doing the right thing however by
design it is reporting a false-positive.

The real issue is that the lock being acquired is of the same lock
class and a different lock instance is acquired under GFP_FS that
happens to be of the same class.

So the issue seems to me to be:
Process A kswapd
--------- ------
acquire i_mutex Enter RECLAIM_FS

Enter RECLAIM_FS acquire different i_mutex

Neil tried to fix this sometime back:
https://www.mail-archive.com/[email protected]/msg623909.html
but it was kind of NAK'ed.

Any thoughts on how we can fix this?

Thanks Peter,

- Joel

2018-02-07 17:01:12

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive

On Wed, Feb 07, 2018 at 08:09:36AM -0800, Joel Fernandes wrote:
> Hi Peter,
>
> On Wed, Feb 7, 2018 at 12:07 AM, Peter Zijlstra <[email protected]> wrote:
> > On Mon, Feb 05, 2018 at 04:49:03PM -0800, Joel Fernandes wrote:
> >
> >> [ 2115.359650] -(1)[106:kswapd0]=================================
> >> [ 2115.359665] -(1)[106:kswapd0][ INFO: inconsistent lock state ]
> >> [ 2115.359684] -(1)[106:kswapd0]4.9.60+ #2 Tainted: G W O
> >> [ 2115.359699] -(1)[106:kswapd0]---------------------------------
> >> [ 2115.359715] -(1)[106:kswapd0]inconsistent {RECLAIM_FS-ON-W} ->
> >> {IN-RECLAIM_FS-W} usage.
> >
> > Please don't wrap log output, this is unreadable :/
>
> Sorry about that, here's the unwrapped output, I'll fix the commit
> message in next rev: https://pastebin.com/e0BNGkaN

So if you trim that leading garbage: "[ 2115.359650] -(1)[106:kswapd0]"
you instantly have half you screen back.

> > Also, the output is from an ancient kernel and doesn't match the current
> > code.
>
> Right, however the driver hasn't changed and I don't see immediately
> how lockdep handles this differently upstream, so I thought of fixing
> it upstream.

Well, the annotation got a complete rewrite. Granted, it _should_ be
similar, but the output will be different.

> The bail out happens when GFP_FS is *not* set.

Argh, reading is hard.

> Lockdep reports this issue when GFP_FS is infact set, and we enter
> this path and acquire the lock. So lockdep seems to be doing the right
> thing however by design it is reporting a false-positive.

So I'm not seeing how its a false positive. fs/inode.c sets a different
lock class per filesystem type. So recursing on an i_mutex within a
filesystem does sound dodgy.

> The real issue is that the lock being acquired is of the same lock
> class and a different lock instance is acquired under GFP_FS that
> happens to be of the same class.
>
> So the issue seems to me to be:
> Process A kswapd
> --------- ------
> acquire i_mutex Enter RECLAIM_FS
>
> Enter RECLAIM_FS acquire different i_mutex

That's not a false positive, that's a 2 process way of writing i_mutex
recursion.

What are the rules of acquiring two i_mutexes within a filesystem?

> Neil tried to fix this sometime back:
> https://www.mail-archive.com/[email protected]/msg623909.html
> but it was kind of NAK'ed.

So that got nacked because Neil tried to fix it in the vfs core. Also
not entirely sure that's the same problem.

2018-02-07 22:28:35

by Joel Fernandes

[permalink] [raw]

Subject: Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive

Hi Peter,

On Wed, Feb 7, 2018 at 8:58 AM, Peter Zijlstra <[email protected]> wrote:
[...]
>
>> Lockdep reports this issue when GFP_FS is infact set, and we enter
>> this path and acquire the lock. So lockdep seems to be doing the right
>> thing however by design it is reporting a false-positive.
>
> So I'm not seeing how its a false positive. fs/inode.c sets a different
> lock class per filesystem type. So recursing on an i_mutex within a
> filesystem does sound dodgy.

But directory inodes and file inodes in the same filesystem share the
same lock class right? All the issues I've seen (both our's and
Neil's) are similar in that a directory inode's lock is held followed
by a RECLAIM_FS allocation, and in parallel to that, memory reclaim
involving the same FS is going on in another thread. In the splat I
shared, during the VFS lookup- the d_alloc is called with an inode's
lock held (I am guessing this the lock of the directory inode which is
locked just before the d_alloc), and in parallel (kswapd or some other
thread) is doing memory reclaim.

>> The real issue is that the lock being acquired is of the same lock
>> class and a different lock instance is acquired under GFP_FS that
>> happens to be of the same class.
>>
>> So the issue seems to me to be:
>> Process A kswapd
>> --------- ------
>> acquire i_mutex Enter RECLAIM_FS
>>
>> Enter RECLAIM_FS acquire different i_mutex
>
> That's not a false positive, that's a 2 process way of writing i_mutex
> recursion.

Yes, but I mention false positive since the kswapd->ashmem_shrink_scan
path can never acquire the mutex of a directory inode AFAIK. So from
that perspective it seems a false-positive.

>
> What are the rules of acquiring two i_mutexes within a filesystem?
>

I am not fully sure. I am CC'ing Ted and linux-fs-devel as well for
any input on this question.

>> Neil tried to fix this sometime back:
>> https://www.mail-archive.com/[email protected]/msg623909.html
>> but it was kind of NAK'ed.
>
> So that got nacked because Neil tried to fix it in the vfs core. Also
> not entirely sure that's the same problem.

Yes, a similar fix was proposed internally here, I would say the
signature of the problem reported there is quite similar (its just
that there its nfsd mentioned as doing the reclaim instead of kswapd).

thanks,

- Joel

[1] https://www.mail-archive.com/[email protected]/msg623986.html

2018-02-08 00:37:02

by NeilBrown

[permalink] [raw]

Subject: Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive

On Wed, Feb 07 2018, Joel Fernandes wrote:

> Hi Peter,
>
> On Wed, Feb 7, 2018 at 8:58 AM, Peter Zijlstra <[email protected]> wrote:
> [...]
>>
>>> Lockdep reports this issue when GFP_FS is infact set, and we enter
>>> this path and acquire the lock. So lockdep seems to be doing the right
>>> thing however by design it is reporting a false-positive.
>>
>> So I'm not seeing how its a false positive. fs/inode.c sets a different
>> lock class per filesystem type. So recursing on an i_mutex within a
>> filesystem does sound dodgy.
>
> But directory inodes and file inodes in the same filesystem share the
> same lock class right?

Not since v2.6.24
Commit: 14358e6ddaed ("lockdep: annotate dir vs file i_mutex")

You were using 4.9.60. so they should be separate....

Maybe shmem_get_inode() needs to call unlock_new_inode() or just
lockdep_annotate_inode_mutex_key() after inode_init_owner().

Maybe inode_init_owner() should call lockdep_annotate_inode_mutex_key()
directly.

NeilBrown

Attachments:

signature.asc (847.00 B)

2018-02-08 02:29:58

by Joel Fernandes

[permalink] [raw]

Subject: Re: [PATCH RFC] ashmem: Fix lockdep RECLAIM_FS false positive

On Wed, Feb 7, 2018 at 4:35 PM, NeilBrown <[email protected]> wrote:
>> On Wed, Feb 7, 2018 at 8:58 AM, Peter Zijlstra <[email protected]> wrote:
>> [...]
>>>
>>>> Lockdep reports this issue when GFP_FS is infact set, and we enter
>>>> this path and acquire the lock. So lockdep seems to be doing the right
>>>> thing however by design it is reporting a false-positive.
>>>
>>> So I'm not seeing how its a false positive. fs/inode.c sets a different
>>> lock class per filesystem type. So recursing on an i_mutex within a
>>> filesystem does sound dodgy.
>>
>> But directory inodes and file inodes in the same filesystem share the
>> same lock class right?
>
> Not since v2.6.24
> Commit: 14358e6ddaed ("lockdep: annotate dir vs file i_mutex")
>
> You were using 4.9.60. so they should be separate....
>
> Maybe shmem_get_inode() needs to call unlock_new_inode() or just
> lockdep_annotate_inode_mutex_key() after inode_init_owner().
>
> Maybe inode_init_owner() should call lockdep_annotate_inode_mutex_key()
> directly.

Thanks for the ideas! I will test lockdep_annotate_inode_mutex_key
after inode_init_owner in shmem and let you know if the issue goes
away. It seems hugetlbfs does this too (I think for similar reasons).

- Joel