cache_purge should hold cache_list_lock, as cache_clean does. Otherwise a cache
entry can be cache_put twice, which leads to a use-after-free bug.

To reproduce, run LTP (/opt/ltp/runltp run -f net.nfs). The failure is rare.
[14454.137661] ==================================================================
[14454.138863] BUG: KASAN: use-after-free in cache_purge+0xce/0x160 [sunrpc]
[14454.139822] Read of size 4 at addr ffff8883d484d560 by task nfsd/31993
[14454.140746]
[14454.140995] CPU: 1 PID: 31993 Comm: nfsd Kdump: loaded Not tainted 4.19.91-0.229.git.87bac30.al7.x86_64.debug #1
[14454.141002] Call Trace:
[14454.141014] dump_stack+0xaf/0xfb
[14454.141027] print_address_description+0x6a/0x2a0
[14454.141037] kasan_report+0x166/0x2b0
[14454.141057] ? cache_purge+0xce/0x160 [sunrpc]
[14454.141079] cache_purge+0xce/0x160 [sunrpc]
[14454.141099] nfsd_last_thread+0x267/0x270 [nfsd]
[14454.141109] ? nfsd_last_thread+0x5/0x270 [nfsd]
[14454.141130] nfsd_destroy+0xcb/0x180 [nfsd]
[14454.141140] ? nfsd_destroy+0x5/0x180 [nfsd]
[14454.141153] nfsd+0x1e4/0x2b0 [nfsd]
[14454.141163] ? nfsd+0x5/0x2b0 [nfsd]
[14454.141173] kthread+0x114/0x150
[14454.141183] ? nfsd_destroy+0x180/0x180 [nfsd]
[14454.141187] ? kthread_park+0xb0/0xb0
[14454.141197] ret_from_fork+0x3a/0x50
[14454.141224]
[14454.141475] Allocated by task 20918:
[14454.142011] kmem_cache_alloc_trace+0x9f/0x2e0
[14454.142027] sunrpc_cache_lookup+0xca/0x2f0 [sunrpc]
[14454.142037] svc_export_parse+0x1e7/0x930 [nfsd]
[14454.142051] cache_do_downcall+0x5a/0x80 [sunrpc]
[14454.142064] cache_downcall+0x78/0x180 [sunrpc]
[14454.142078] cache_write_procfs+0x57/0x80 [sunrpc]
[14454.142083] proc_reg_write+0x90/0xd0
[14454.142088] vfs_write+0xc2/0x1c0
[14454.142092] ksys_write+0x4d/0xd0
[14454.142098] do_syscall_64+0x60/0x250
[14454.142103] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[14454.142106]
[14454.142344] Freed by task 19165:
[14454.142804] kfree+0x114/0x300
[14454.142819] cache_clean+0x2a4/0x2e0 [sunrpc]
[14454.142833] cache_flush+0x24/0x60 [sunrpc]
[14454.142845] write_flush.isra.19+0xbe/0x100 [sunrpc]
[14454.142849] proc_reg_write+0x90/0xd0
[14454.142853] vfs_write+0xc2/0x1c0
[14454.142856] ksys_write+0x4d/0xd0
[14454.142860] do_syscall_64+0x60/0x250
[14454.142865] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[14454.142867]
[14454.143095] The buggy address belongs to the object at ffff8883d484d540 which belongs to the cache kmalloc-256 of size 256
[14454.144842] The buggy address is located 32 bytes inside of 256-byte region [ffff8883d484d540, ffff8883d484d640)
[14454.146463] The buggy address belongs to the page:
[14454.147155] page:ffffea000f521300 count:1 mapcount:0 mapping:ffff888107c02e00 index:0xffff8883d484da40 compound_mapcount: 0
[14454.148712] flags: 0x17fffc00010200(slab|head)
[14454.149356] raw: 0017fffc00010200 ffffea000f4baf00 0000000200000002 ffff888107c02e00
[14454.150453] raw: ffff8883d484da40 0000000080190001 00000001ffffffff 0000000000000000
[14454.151557] page dumped because: kasan: bad access detected
[14454.152364]
[14454.152606] Memory state around the buggy address:
[14454.153300] ffff8883d484d400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[14454.154319] ffff8883d484d480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[14454.155324] >ffff8883d484d500: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
[14454.156334] ^
[14454.157237] ffff8883d484d580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[14454.158262] ffff8883d484d600: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[14454.159282] ==================================================================
[14454.160224] Disabling lock debugging due to kernel taint
Fixes: 471a930ad7d1 ("SUNRPC: Drop all entries from cache_detail when cache_purge()")
Cc: [email protected] # v4.11+
Signed-off-by: Yihao Wu <[email protected]>
---
net/sunrpc/cache.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index bd843a81afa0..3e523eefc47f 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -524,9 +524,11 @@ void cache_purge(struct cache_detail *detail)
 	struct hlist_node *tmp = NULL;
 	int i = 0;
 
+	spin_lock(&cache_list_lock);
 	spin_lock(&detail->hash_lock);
 	if (!detail->entries) {
 		spin_unlock(&detail->hash_lock);
+		spin_unlock(&cache_list_lock);
 		return;
 	}
 
@@ -541,6 +543,7 @@ void cache_purge(struct cache_detail *detail)
 		}
 	}
 	spin_unlock(&detail->hash_lock);
+	spin_unlock(&cache_list_lock);
 }
 EXPORT_SYMBOL_GPL(cache_purge);
--
2.20.1.2432.ga663e714
> On Mar 24, 2020, at 5:49 AM, Yihao Wu <[email protected]> wrote:
>
> cache_purge should hold cache_list_lock as cache_clean does. Otherwise a cache
> can be cache_put twice, which leads to a use-after-free bug.
>
> To reproduce, run ltp. It happens rarely. /opt/ltp/runltp run -f net.nfs
>
> [...]
>
> Fixes: 471a930ad7d1(SUNRPC: Drop all entries from cache_detail when cache_purge())
> Cc: [email protected] #v4.11+
> Signed-off-by: Yihao Wu <[email protected]>
Mechanically this looks OK, but I would feel more comfortable
if a domain expert could review this. Neil, Trond, Bruce?
--
Chuck Lever
> On Mar 24, 2020, at 5:49 AM, Yihao Wu <[email protected]> wrote:
>
> cache_purge should hold cache_list_lock as cache_clean does. Otherwise a cache
> can be cache_put twice, which leads to a use-after-free bug.
>
> To reproduce, run ltp. It happens rarely. /opt/ltp/runltp run -f net.nfs
>
> [...]
>
> Fixes: 471a930ad7d1(SUNRPC: Drop all entries from cache_detail when cache_purge())
> Cc: [email protected] #v4.11+
Yihao, I couldn't get this patch to apply to kernels before v5.0.
I don't think we need both a Fixes tag and a Cc: stable, because
stable maintainers will try to apply this patch to any kernel that
has 471a930, and ignore the failures.
So if I apply your fix, I'm going to drop the Cc: stable tag.
--
Chuck Lever
On 2020/3/24 10:18 PM, Chuck Lever wrote:
>> Fixes: 471a930ad7d1(SUNRPC: Drop all entries from cache_detail when cache_purge())
>> Cc: [email protected] #v4.11+
> Yihao, I couldn't get this patch to apply to kernels before v5.0.
>
> I don't think we need both a Fixes tag and a Cc: stable, because
> stable maintainers will try to apply this patch to any kernel that
> has 471a930, and ignore the failures.
>
> So if I apply your fix, I'm going to drop the Cc: stable tag.
>
>
OK. And thanks for reviewing and reminding me of this, Chuck.
Yihao Wu
On Tue, Mar 24, 2020 at 09:38:55AM -0400, Chuck Lever wrote:
> Mechanically this looks OK, but I would feel more comfortable
> if a domain expert could review this. Neil, Trond, Bruce?
Looks right to me.
Reviewed-by: J. Bruce Fields <[email protected]>
--b.
> On Mar 24, 2020, at 10:33 AM, Bruce Fields <[email protected]> wrote:
>
> On Tue, Mar 24, 2020 at 09:38:55AM -0400, Chuck Lever wrote:
>> Mechanically this looks OK, but I would feel more comfortable
>> if a domain expert could review this. Neil, Trond, Bruce?
>
> Looks right to me.
>
> Reviewed-by: J. Bruce Fields <[email protected]>
Thanks, Bruce! I've added your R-b.
--
Chuck Lever
On Tue, 2020-03-24 at 09:38 -0400, Chuck Lever wrote:
> > On Mar 24, 2020, at 5:49 AM, Yihao Wu <[email protected]> wrote:
> >
> > cache_purge should hold cache_list_lock as cache_clean does. Otherwise a cache
> > can be cache_put twice, which leads to a use-after-free bug.
> >
> > [...]
> >
> > ---
> > net/sunrpc/cache.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> > index bd843a81afa0..3e523eefc47f 100644
> > --- a/net/sunrpc/cache.c
> > +++ b/net/sunrpc/cache.c
> > @@ -524,9 +524,11 @@ void cache_purge(struct cache_detail *detail)
> > struct hlist_node *tmp = NULL;
> > int i = 0;
> >
> > + spin_lock(&cache_list_lock);
> > spin_lock(&detail->hash_lock);
> > if (!detail->entries) {
> > spin_unlock(&detail->hash_lock);
> > + spin_unlock(&cache_list_lock);
> > return;
> > }
> >
> > @@ -541,6 +543,7 @@ void cache_purge(struct cache_detail *detail)
> > }
> > }
> > spin_unlock(&detail->hash_lock);
> > + spin_unlock(&cache_list_lock);
> > }
> > EXPORT_SYMBOL_GPL(cache_purge);
Hmm... Shouldn't this patch be dropping cache_list_lock when we call
sunrpc_end_cache_remove_entry()? The latter does call both
cache_revisit_request() and cache_put(), and while they do not
explicitly call anything that holds cache_list_lock, some of those
cd->cache_put callbacks do look as if there is potential for deadlock.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
> On Mar 24, 2020, at 11:24 AM, Trond Myklebust <[email protected]> wrote:
>
> On Tue, 2020-03-24 at 09:38 -0400, Chuck Lever wrote:
>> [...]
>
> Hmm... Shouldn't this patch be dropping cache_list_lock when we call
> sunrpc_end_cache_remove_entry()? The latter does call both
> cache_revisit_request() and cache_put(), and while they do not
> explicitly call anything that holds cache_list_lock, some of those
> cd->cache_put callbacks do look as if there is potential for deadlock.
I see svc_export_put calling dput, eventually, which might_sleep().
--
Chuck Lever
On Tue, Mar 24 2020, Yihao Wu wrote:
> cache_purge should hold cache_list_lock as cache_clean does. Otherwise a cache
> can be cache_put twice, which leads to a use-after-free bug.
>
> To reproduce, run ltp. It happens rarely. /opt/ltp/runltp run -f net.nfs
>
> [...]
>
> Fixes: 471a930ad7d1(SUNRPC: Drop all entries from cache_detail when cache_purge())
> Cc: [email protected] #v4.11+
> Signed-off-by: Yihao Wu <[email protected]>
> ---
> net/sunrpc/cache.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index bd843a81afa0..3e523eefc47f 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -524,9 +524,11 @@ void cache_purge(struct cache_detail *detail)
> struct hlist_node *tmp = NULL;
> int i = 0;
>
> + spin_lock(&cache_list_lock);
> spin_lock(&detail->hash_lock);
> if (!detail->entries) {
> spin_unlock(&detail->hash_lock);
> + spin_unlock(&cache_list_lock);
> return;
> }
>
> @@ -541,6 +543,7 @@ void cache_purge(struct cache_detail *detail)
> }
> }
> spin_unlock(&detail->hash_lock);
> + spin_unlock(&cache_list_lock);
> }
> EXPORT_SYMBOL_GPL(cache_purge);
>
> --
> 2.20.1.2432.ga663e714
I wonder if this is the best solution.
This code:
	hlist_for_each_entry_safe(ch, tmp, head, cache_list) {
		sunrpc_begin_cache_remove_entry(ch, detail);
		spin_unlock(&detail->hash_lock);
		sunrpc_end_cache_remove_entry(ch, detail);
		spin_lock(&detail->hash_lock);
	}
Looks wrong.
Dropping a lock while walking a list is only safe if you hold a
reference to the place-holder - 'tmp' in this case - but we don't.
As this is trying to remove everything in the list it would be safer to
do something like
while (!hlist_empty(head)) {
ch = hlist_entry(head->first, struct cache_head, cache_list);
sunrpc_begin_cache_remove_entry(ch, detail);
spin_unlock(&detail->hash_lock);
sunrpc_end_cache_remove_entry(ch, detail);
spin_lock(&detail->hash_lock);
}
I'm guessing that would fix the problem in a more focused way.
But I'm not 100% sure because there is no analysis given of the cause.
What line is cache_purge+0xce/0x160? ./scripts/faddr2line can tell you.
I suspect it is the hlist_for_each_entry_safe() line.
Can you test the change I suggest and see if it helps?
Thanks,
NeilBrown
Hi
[This is an automated email]
The bot has tested the following trees: v5.5.11, v5.4.27, v4.19.112, v4.14.174, v4.9.217, v4.4.217.
v5.5.11: Build OK!
v5.4.27: Build OK!
v4.19.112: Failed to apply! Possible dependencies:
1863d77f15da ("SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock")
v4.14.174: Failed to apply! Possible dependencies:
1863d77f15da ("SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock")
v4.9.217: Failed to apply! Possible dependencies:
1863d77f15da ("SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock")
2b477c00f3bd ("svcrpc: free contexts immediately on PROC_DESTROY")
471a930ad7d1 ("SUNRPC: Drop all entries from cache_detail when cache_purge()")
v4.4.217: Failed to apply! Possible dependencies:
1863d77f15da ("SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock")
2b477c00f3bd ("svcrpc: free contexts immediately on PROC_DESTROY")
471a930ad7d1 ("SUNRPC: Drop all entries from cache_detail when cache_purge()")
d8d29138b17c ("sunrpc: remove 'inuse' flag from struct cache_detail.")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks
Sasha
On 2020/3/25 1:46 AM, Chuck Lever wrote:
>>>> ---
>>>> net/sunrpc/cache.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
>>>> index bd843a81afa0..3e523eefc47f 100644
>>>> --- a/net/sunrpc/cache.c
>>>> +++ b/net/sunrpc/cache.c
>>>> @@ -524,9 +524,11 @@ void cache_purge(struct cache_detail *detail)
>>>> struct hlist_node *tmp = NULL;
>>>> int i = 0;
>>>>
>>>> + spin_lock(&cache_list_lock);
>>>> spin_lock(&detail->hash_lock);
>>>> if (!detail->entries) {
>>>> spin_unlock(&detail->hash_lock);
>>>> + spin_unlock(&cache_list_lock);
>>>> return;
>>>> }
>>>>
>>>> @@ -541,6 +543,7 @@ void cache_purge(struct cache_detail *detail)
>>>> }
>>>> }
>>>> spin_unlock(&detail->hash_lock);
>>>> + spin_unlock(&cache_list_lock);
>>>> }
>>>> EXPORT_SYMBOL_GPL(cache_purge);
>>
>> Hmm... Shouldn't this patch be dropping cache_list_lock() when we call
>> sunrpc_end_cache_remove_entry()? The latter does call both
>> cache_revisit_request() and cache_put(), and while they do not
>> explicitly call anything that holds cache_list_lock, some of those cd-
>>> cache_put callbacks do look as if there is potential for deadlock.
> I see svc_export_put calling dput, eventually, which might_sleep().
Wow, that's a little strange. If svc_export_put->dput might_sleep, why can we
take spin_lock(&detail->hash_lock) in cache_purge in the first place?
And I agree with Trond those cd->cache_put callbacks are dangerous. I will look
into them today.
But if we drop cache_list_lock when we call sunrpc_end_cache_remove_entry,
cache_put is not protected, and this patch won't work anymore, right?
Thanks,
Yihao Wu
> On Mar 24, 2020, at 9:01 PM, Sasha Levin <[email protected]> wrote:
>
> Hi
>
> [This is an automated email]
>
> The bot has tested the following trees: v5.5.11, v5.4.27, v4.19.112, v4.14.174, v4.9.217, v4.4.217.
>
> v5.5.11: Build OK!
> v5.4.27: Build OK!
> v4.19.112: Failed to apply! Possible dependencies:
> 1863d77f15da ("SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock")
>
> v4.14.174: Failed to apply! Possible dependencies:
> 1863d77f15da ("SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock")
>
> v4.9.217: Failed to apply! Possible dependencies:
> 1863d77f15da ("SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock")
> 2b477c00f3bd ("svcrpc: free contexts immediately on PROC_DESTROY")
> 471a930ad7d1 ("SUNRPC: Drop all entries from cache_detail when cache_purge()")
>
> v4.4.217: Failed to apply! Possible dependencies:
> 1863d77f15da ("SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock")
> 2b477c00f3bd ("svcrpc: free contexts immediately on PROC_DESTROY")
> 471a930ad7d1 ("SUNRPC: Drop all entries from cache_detail when cache_purge()")
> d8d29138b17c ("sunrpc: remove 'inuse' flag from struct cache_detail.")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
Please drop it. Thanks!
--
Chuck Lever
> On Mar 25, 2020, at 2:37 AM, Yihao Wu <[email protected]> wrote:
>
> On 2020/3/25 1:46 AM, Chuck Lever wrote:
>>>>> ---
>>>>> net/sunrpc/cache.c | 3 +++
>>>>> 1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
>>>>> index bd843a81afa0..3e523eefc47f 100644
>>>>> --- a/net/sunrpc/cache.c
>>>>> +++ b/net/sunrpc/cache.c
>>>>> @@ -524,9 +524,11 @@ void cache_purge(struct cache_detail *detail)
>>>>> struct hlist_node *tmp = NULL;
>>>>> int i = 0;
>>>>>
>>>>> + spin_lock(&cache_list_lock);
>>>>> spin_lock(&detail->hash_lock);
>>>>> if (!detail->entries) {
>>>>> spin_unlock(&detail->hash_lock);
>>>>> + spin_unlock(&cache_list_lock);
>>>>> return;
>>>>> }
>>>>>
>>>>> @@ -541,6 +543,7 @@ void cache_purge(struct cache_detail *detail)
>>>>> }
>>>>> }
>>>>> spin_unlock(&detail->hash_lock);
>>>>> + spin_unlock(&cache_list_lock);
>>>>> }
>>>>> EXPORT_SYMBOL_GPL(cache_purge);
>>>
>>> Hmm... Shouldn't this patch be dropping cache_list_lock() when we call
>>> sunrpc_end_cache_remove_entry()? The latter does call both
>>> cache_revisit_request() and cache_put(), and while they do not
>>> explicitly call anything that holds cache_list_lock, some of those cd-
>>>> cache_put callbacks do look as if there is potential for deadlock.
>> I see svc_export_put calling dput, eventually, which might_sleep().
>
> Wow, that's a little strange. If svc_export_put->dput might_sleep, why can we
> take spin_lock(&detail->hash_lock) in cache_purge in the first place?
>
> And I agree with Trond those cd->cache_put callbacks are dangerous. I will look
> into them today.
>
> But if we drop cache_list_lock when we call sunrpc_end_cache_remove_entry,
> cache_put is not protected, and this patch won't work anymore, right?
IMHO Neil's proposed solution seems pretty safe, and follows a well-understood
pattern.
It would be nice (but not 100% necessary) if the race you found were spelled out
in the patch description.
Thanks!
--
Chuck Lever
On 2020/3/25 10:24 PM, Chuck Lever wrote:
>
>
>> On Mar 25, 2020, at 2:37 AM, Yihao Wu <[email protected]> wrote:
>>
>> On 2020/3/25 1:46 AM, Chuck Lever wrote:
>>>>>> ---
>>>>>> net/sunrpc/cache.c | 3 +++
>>>>>> 1 file changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
>>>>>> index bd843a81afa0..3e523eefc47f 100644
>>>>>> --- a/net/sunrpc/cache.c
>>>>>> +++ b/net/sunrpc/cache.c
>>>>>> @@ -524,9 +524,11 @@ void cache_purge(struct cache_detail *detail)
>>>>>> struct hlist_node *tmp = NULL;
>>>>>> int i = 0;
>>>>>>
>>>>>> + spin_lock(&cache_list_lock);
>>>>>> spin_lock(&detail->hash_lock);
>>>>>> if (!detail->entries) {
>>>>>> spin_unlock(&detail->hash_lock);
>>>>>> + spin_unlock(&cache_list_lock);
>>>>>> return;
>>>>>> }
>>>>>>
>>>>>> @@ -541,6 +543,7 @@ void cache_purge(struct cache_detail *detail)
>>>>>> }
>>>>>> }
>>>>>> spin_unlock(&detail->hash_lock);
>>>>>> + spin_unlock(&cache_list_lock);
>>>>>> }
>>>>>> EXPORT_SYMBOL_GPL(cache_purge);
>>>>
>>>> Hmm... Shouldn't this patch be dropping cache_list_lock() when we call
>>>> sunrpc_end_cache_remove_entry()? The latter does call both
>>>> cache_revisit_request() and cache_put(), and while they do not
>>>> explicitly call anything that holds cache_list_lock, some of those cd-
>>>>> cache_put callbacks do look as if there is potential for deadlock.
>>> I see svc_export_put calling dput, eventually, which might_sleep().
>>
>> Wow, that's a little strange. If svc_export_put->dput might_sleep, why can we
>> take spin_lock(&detail->hash_lock) in cache_purge in the first place?
>>
>> And I agree with Trond those cd->cache_put callbacks are dangerous. I will look
>> into them today.
>>
>> But if we drop cache_list_lock when we call sunrpc_end_cache_remove_entry,
>> cache_put is not protected, and this patch won't work anymore, right?
>
> IMHO Neil's proposed solution seems pretty safe, and follows a well-understood
> pattern.
>
> It would be nice (but not 100% necessary) if the race you found were spelled out
> in the patch description.
>
> Thanks!
>
>
> --
> Chuck Lever
>
>
Yeah, I believe Neil's solution is better.
But I'm still studying it, so I haven't replied to him yet.
OK, I'll try to spell out the race more clearly in the next version of the patch.
Thanks,
Yihao Wu
On 2020/3/25 7:07 AM, NeilBrown wrote:
>>
>> @@ -541,6 +543,7 @@ void cache_purge(struct cache_detail *detail)
>> }
>> }
>> spin_unlock(&detail->hash_lock);
>> + spin_unlock(&cache_list_lock);
>> }
>> EXPORT_SYMBOL_GPL(cache_purge);
>>
>> --
>> 2.20.1.2432.ga663e714
> I wonder if this is the best solution.
> This code:
>
> hlist_for_each_entry_safe(ch, tmp, head, cache_list) {
> sunrpc_begin_cache_remove_entry(ch, detail);
> spin_unlock(&detail->hash_lock);
> sunrpc_end_cache_remove_entry(ch, detail);
> spin_lock(&detail->hash_lock);
> }
>
> Looks wrong.
> Dropping a lock while walking a list is only safe if you hold a
> reference to the place-holder - 'tmp' in this case - but we don't.
> As this is trying to remove everything in the list it would be safer to
> do something like
>
> while (!hlist_empty(head)) {
> ch = hlist_entry(head->first, struct cache_head, cache_list);
> sunrpc_begin_cache_remove_entry(ch, detail);
> spin_unlock(&detail->hash_lock);
> sunrpc_end_cache_remove_entry(ch, detail);
> spin_lock(&detail->hash_lock);
> }
>
> I'm guessing that would fix the problem in a more focused way.
> But I'm not 100% sure because there is no analysis given of the cause.
> What line is cache_purge+0xce/0x160? ./scripts/faddr2line can tell you.
> I suspect it is the hlist_for_each_entry_safe() line.
>
> Can you test the change I suggest and see if it helps?
>
> Thanks,
> NeilBrown
Sorry for the late reply. It took me some time to reproduce the bug stably so I
could verify the correctness of the fix.
You definitely pointed out the root cause, and your solution is more elegant.
After applying it, the bug no longer reproduces.
There's no race in the locking design itself: hash_lock is designed to protect
cache_detail at fine granularity, and it already does that job. And yes,
hlist_for_each_entry_safe is where the bug is. It may walk to a deleted entry
(tmp). My v1 solution is a regression considering this.
So I will modify the patch title in v2 too.
BTW, I checked the faddr2line output; it says cache_purge+0xce/0x160 is cache_put.
That makes sense too, and doesn't go against your theory.
Here's my reproduce script:
systemctl enable rpcbind
systemctl enable nfs
systemctl start rpcbind
systemctl start nfs
mkdir /tmp/x /tmp/y
# Create some collision in hash table
for i in `seq 256`; do
mkdir /tmp/x/$i
mkdir /tmp/y/$i
exportfs localhost:/tmp/x/$i
done
for i in `seq 256`; do
mount localhost:/tmp/x/$i /tmp/y/$i
done
END=$(cat /proc/self/net/rpc/nfsd.export/flush)
NOW=$(date +%s)
sleep $((END - NOW))
# Trigger cache_purge
systemctl stop nfs &
usleep 20000
# Trigger cache_clean
echo > /proc/self/net/rpc/nfsd.export/flush
To speed up reproduction, I also added mdelay(500) between acquiring
and releasing hash_lock in cache_purge.
Thank you so much!
Yihao Wu