2019-04-03 02:04:18

by kernel test robot

Subject: 15c8410c67 ("mm/slob.c: respect list_head abstraction layer"): WARNING: CPU: 0 PID: 1 at lib/list_debug.c:28 __list_add_valid

Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master

commit 15c8410c67adefd26ea0df1f1b86e1836051784b
Author: Tobin C. Harding <[email protected]>
AuthorDate: Fri Mar 29 10:01:23 2019 +1100
Commit: Stephen Rothwell <[email protected]>
CommitDate: Sat Mar 30 16:09:41 2019 +1100

mm/slob.c: respect list_head abstraction layer

Currently we reach inside the list_head. This is a violation of the layer
of abstraction provided by the list_head. It makes the code fragile.
More importantly it makes the code wicked hard to understand.

The code logic is based on the page in which an allocation was made, we
want to modify the slob_list we are working on to have this page at the
front. We already have a function to check if an entry is at the front of
the list. Recently a function was added to list.h to do the list
rotation. We can use these two functions to reduce line count, reduce
code fragility, and reduce cognitive load required to read the code.

Use list_head functions to interact with lists thereby maintaining the
abstraction provided by the list_head structure.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tobin C. Harding <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Roman Gushchin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>

2e1f88301e include/linux/list.h: add list_rotate_to_front()
15c8410c67 mm/slob.c: respect list_head abstraction layer
05d08e2995 Add linux-next specific files for 20190402
+-------------------------------------------------------+------------+------------+---------------+
|                                                       | 2e1f88301e | 15c8410c67 | next-20190402 |
+-------------------------------------------------------+------------+------------+---------------+
| boot_successes                                        | 1009       | 198        | 299           |
| boot_failures                                         | 0          | 2          | 44            |
| WARNING:at_lib/list_debug.c:#__list_add_valid         | 0          | 2          | 44            |
| RIP:__list_add_valid                                  | 0          | 2          | 44            |
| WARNING:at_lib/list_debug.c:#__list_del_entry_valid   | 0          | 2          | 25            |
| RIP:__list_del_entry_valid                            | 0          | 2          | 25            |
| WARNING:possible_circular_locking_dependency_detected | 0          | 2          | 44            |
| RIP:_raw_spin_unlock_irqrestore                       | 0          | 2          | 2             |
| BUG:kernel_hang_in_test_stage                         | 0          | 0          | 6             |
| BUG:unable_to_handle_kernel                           | 0          | 0          | 1             |
| Oops:#[##]                                            | 0          | 0          | 1             |
| RIP:slob_page_alloc                                   | 0          | 0          | 1             |
| Kernel_panic-not_syncing:Fatal_exception              | 0          | 0          | 1             |
| RIP:delay_tsc                                         | 0          | 0          | 2             |
+-------------------------------------------------------+------------+------------+---------------+

[ 2.618737] db_root: cannot open: /etc/target
[ 2.620114] mtdoops: mtd device (mtddev=name/number) must be supplied
[ 2.620967] slram: not enough parameters.
[ 2.621614] ------------[ cut here ]------------
[ 2.622254] list_add corruption. prev->next should be next (ffffffffaeeb71b0), but was ffffcee1406d3f70. (prev=ffffcee140422508).
[ 2.623645] WARNING: CPU: 0 PID: 1 at lib/list_debug.c:28 __list_add_valid+0x42/0x79
[ 2.624760] CPU: 0 PID: 1 Comm: swapper Tainted: G T 5.1.0-rc2-00286-g15c8410 #1
[ 2.625498] RIP: 0010:__list_add_valid+0x42/0x79
[ 2.625498] Code: 74 47 48 89 d9 48 89 c2 48 c7 c7 e4 e3 9f ae e8 ad 90 ae ff 0f 0b eb 2d 48 89 c1 48 89 de 48 c7 c7 5a e4 9f ae e8 97 90 ae ff <0f> 0b eb 17 48 89 f2 48 89 d9 48 89 ee 48 c7 c7 aa e4 9f ae e8 7e
[ 2.625498] RSP: 0000:ffff8e630000bc08 EFLAGS: 00010086
[ 2.625498] RAX: 0000000000000075 RBX: ffffffffaeeb71b0 RCX: 0000000000000099
[ 2.625498] RDX: 0000000000000046 RSI: 0000000000000099 RDI: 0000000000000001
[ 2.625498] RBP: ffffffffaeeb71b0 R08: 0000000000000001 R09: 0000000000000001
[ 2.625498] R10: 000000000000004c R11: 0000000000000005 R12: ffff8b9f1f0fc268
[ 2.625498] R13: 0000000000000dc0 R14: ffffffffaeeb71b0 R15: ffffcee140422508
[ 2.625498] FS: 0000000000000000(0000) GS:ffffffffaeca3000(0000) knlGS:0000000000000000
[ 2.625498] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.625498] CR2: 0000000000000000 CR3: 0000000017e79000 CR4: 00000000000006b0
[ 2.625498] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2.625498] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2.625498] Call Trace:
[ 2.625498] slob_alloc+0xc9/0x1e9
[ 2.625498] kmem_cache_alloc+0x3f/0x1d0
[ 2.625498] __kernfs_new_node+0x54/0x196
[ 2.625498] ? kernfs_add_one+0x108/0x13f
[ 2.625498] ? __mutex_unlock_slowpath+0x3c/0x1ef
[ 2.625498] kernfs_new_node+0x4e/0x6e
[ 2.625498] __kernfs_create_file+0x65/0x10c
[ 2.625498] sysfs_add_file_mode_ns+0x14a/0x18f
[ 2.625498] sysfs_create_file_ns+0x5c/0x63
[ 2.625498] bus_add_driver+0x136/0x1a3
[ 2.625498] ? m25p80_driver_init+0x40/0x40
[ 2.625498] driver_register+0x99/0xcb
[ 2.625498] do_one_initcall+0x1e6/0x4d9
[ 2.625498] kernel_init_freeable+0x491/0x5db
[ 2.625498] ? rest_init+0x219/0x219
[ 2.625498] kernel_init+0xa/0xf0
[ 2.625498] ret_from_fork+0x3a/0x50
[ 2.625498] irq event stamp: 884120
[ 2.625498] hardirqs last enabled at (884119): [<ffffffffadf74f16>] _raw_spin_unlock_irqrestore+0x3c/0x5a
[ 2.625498] hardirqs last disabled at (884120): [<ffffffffadf74c74>] _raw_spin_lock_irqsave+0x15/0x75
[ 2.625498] softirqs last enabled at (884092): [<ffffffffae200353>] __do_softirq+0x353/0x393
[ 2.625498] softirqs last disabled at (884069): [<ffffffffad100134>] irq_exit+0x67/0x82
[ 2.625498] ---[ end trace 2b1c6a5e2748f253 ]---
[ 2.651195] ------------[ cut here ]------------
[ 2.651195] ------------[ cut here ]------------
[ 2.651812] list_del corruption. prev->next should be ffffffffaeeb71b0, but was ffffcee1406d3f70
[ 2.652857] WARNING: CPU: 0 PID: 7 at lib/list_debug.c:53 __list_del_entry_valid+0x51/0x8e
[ 2.654047] CPU: 0 PID: 7 Comm: kworker/u2:0 Tainted: G W T 5.1.0-rc2-00286-g15c8410 #1
[ 2.655122] Workqueue: events_unbound async_run_entry_fn
[ 2.655122] RIP: 0010:__list_del_entry_valid+0x51/0x8e
[ 2.655122] Code: 9f ae e8 32 90 ae ff 0f 0b eb 34 48 c7 c7 13 e5 9f ae e8 22 90 ae ff 0f 0b eb 24 48 89 c2 48 c7 c7 49 e5 9f ae e8 0f 90 ae ff <0f> 0b eb 11 48 89 c6 48 c7 c7 85 e5 9f ae e8 fc 8f ae ff 0f 0b 31
[ 2.655122] RSP: 0000:ffff8e630003bcf0 EFLAGS: 00010082
[ 2.655122] RAX: 0000000000000054 RBX: ffffffffaeeb71b0 RCX: 0000000000000090
[ 2.655122] RDX: 0000000000000046 RSI: 0000000000000090 RDI: 0000000000000001
[ 2.655122] RBP: 0000000000000048 R08: 0000000000000001 R09: 0000000000000001
[ 2.655122] R10: 0000000000000002 R11: 0000000000000005 R12: ffff8b9f1f36bfb8
[ 2.655122] R13: 0000000000000cc0 R14: ffffcee1406d3f70 R15: ffffcee1406d3f68
[ 2.655122] FS: 0000000000000000(0000) GS:ffffffffaeca3000(0000) knlGS:0000000000000000
[ 2.655122] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.655122] CR2: 0000000000000000 CR3: 0000000017e79000 CR4: 00000000000006b0
[ 2.655122] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2.655122] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2.655122] Call Trace:
[ 2.655122] slob_alloc+0xa5/0x1e9
[ 2.655122] ? sd_revalidate_disk+0x752/0x17a4
[ 2.655122] __kmalloc+0x60/0x1ab
[ 2.655122] sd_revalidate_disk+0x752/0x17a4
[ 2.655122] sd_probe_async+0xb6/0x1cd
[ 2.655122] async_run_entry_fn+0x3a/0xe5
[ 2.655122] process_one_work+0x2a0/0x491
[ 2.655122] ? worker_thread+0x239/0x2b0
[ 2.655122] worker_thread+0x1df/0x2b0
[ 2.655122] ? process_scheduled_works+0x2c/0x2c
[ 2.655122] kthread+0x11d/0x125
[ 2.655122] ? __kthread_create_on_node+0x169/0x169
[ 2.655122] ret_from_fork+0x3a/0x50
[ 2.655122] irq event stamp: 506
[ 2.655122] hardirqs last enabled at (505): [<ffffffffadf74ebe>] _raw_spin_unlock_irq+0x29/0x45
[ 2.655122] hardirqs last disabled at (506): [<ffffffffadf74c74>] _raw_spin_lock_irqsave+0x15/0x75
[ 2.655122] softirqs last enabled at (484): [<ffffffffae200353>] __do_softirq+0x353/0x393
[ 2.655122] softirqs last disabled at (475): [<ffffffffad100134>] irq_exit+0x67/0x82
[ 2.655122] ---[ end trace 2b1c6a5e2748f254 ]---
[ 2.679270] sd 0:0:0:0: [sda] 16384 512-byte logical blocks: (8.39 MB/8.00 MiB)

# HH:MM RESULT GOOD BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
git bisect start 05d08e2995cbe6efdb993482ee0d38a77040861a 79a3aaa7b82e3106be97842dedfd8429248896e6 --
git bisect good 2dbd2d8f2c2ccd640f9cb6462e23f0a5ac67e1a2 # 06:13 G 909 0 0 0 Merge remote-tracking branch 'net-next/master'
git bisect good d177ed11c13c43e0f5a289727c0237b9141ca458 # 06:32 G 904 0 0 0 Merge remote-tracking branch 'kvm-arm/next'
git bisect good a1a606c7831374d6ef20ed04c16a76b44f79bcab # 06:48 G 900 0 0 0 Merge remote-tracking branch 'rpmsg/for-next'
git bisect good f2ea30d060707080d2d5f8532f0efebfa3a04302 # 07:03 G 903 0 0 0 Merge remote-tracking branch 'nvdimm/libnvdimm-for-next'
git bisect good e006c7613228cfa7abefd1c5175e171e6ae2c4b7 # 07:20 G 902 0 1 1 Merge remote-tracking branch 'xarray/xarray'
git bisect good 046b78627faba9a4b85c9f7a0bba764bbbbe76ff # 07:38 G 906 0 2 2 Merge remote-tracking branch 'devfreq/for-next'
git bisect bad 1999d633921bdbbf76c7f1065d15ec237a977c02 # 07:38 B 15 42 0 0 Merge branch 'akpm-current/current'
git bisect bad 4aa445a97c1da9d169f63377262709254e496f65 # 07:38 B 39 18 0 0 mm: introduce put_user_page*(), placeholder versions
git bisect bad 7a12d85195df96396eb2ba121ff6f4635a5af451 # 07:38 B 902 8 0 0 mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM
git bisect good 2e1f88301e46de5bad7a8342f5bb41f228225462 # 08:24 G 906 0 1 1 include/linux/list.h: add list_rotate_to_front()
git bisect bad 3203d9ca496aeb0a55dbd8d2fc6f821cf6bb105f # 08:36 B 0 2 16 0 mm/cma_debug.c: fix the break condition in cma_maxchunk_get()
git bisect bad f46dc6b6ca0271d51721c2b5b054ef2ffcdcbfa0 # 09:10 B 271 1 0 0 mm/slab.c: use slab_list instead of lru
git bisect bad 179f17e589d7c0ce1433aa967113b71e4db992a5 # 09:21 B 7 2 0 0 mm/slob.c: use slab_list instead of lru
git bisect bad 15c8410c67adefd26ea0df1f1b86e1836051784b # 09:35 B 19 1 0 0 mm/slob.c: respect list_head abstraction layer
# first bad commit: [15c8410c67adefd26ea0df1f1b86e1836051784b] mm/slob.c: respect list_head abstraction layer
git bisect good 2e1f88301e46de5bad7a8342f5bb41f228225462 # 09:50 G 1003 0 0 1 include/linux/list.h: add list_rotate_to_front()
# extra tests with debug options
git bisect bad 15c8410c67adefd26ea0df1f1b86e1836051784b # 09:59 B 2 1 0 0 mm/slob.c: respect list_head abstraction layer
# extra tests on HEAD of linux-next/master
git bisect bad 05d08e2995cbe6efdb993482ee0d38a77040861a # 09:59 B 299 44 0 0 Add linux-next specific files for 20190402
# extra tests on tree/branch linux-next/master
git bisect bad 05d08e2995cbe6efdb993482ee0d38a77040861a # 10:00 B 299 44 0 0 Add linux-next specific files for 20190402

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/lkp Intel Corporation


Attachments:
(No filename) (12.40 kB)
dmesg-quantal-vm-quantal-127:20190403093917:x86_64-randconfig-a0-04021905:5.1.0-rc2-00286-g15c8410:1.gz (24.29 kB)
dmesg-quantal-vm-quantal-222:20190403081045:x86_64-randconfig-a0-04021905:5.1.0-rc2-00285-g2e1f883:1.gz (23.13 kB)
reproduce-quantal-vm-quantal-127:20190403093917:x86_64-randconfig-a0-04021905:5.1.0-rc2-00286-g15c8410:1 (955.00 B)
config-5.1.0-rc2-00286-g15c8410 (136.62 kB)

2019-04-03 04:55:58

by Tobin C. Harding

Subject: Re: 15c8410c67 ("mm/slob.c: respect list_head abstraction layer"): WARNING: CPU: 0 PID: 1 at lib/list_debug.c:28 __list_add_valid

On Wed, Apr 03, 2019 at 10:00:38AM +0800, kernel test robot wrote:
> Greetings,
>
> 0day kernel testing robot got the below dmesg and the first bad commit is
>
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
>
> commit 15c8410c67adefd26ea0df1f1b86e1836051784b
> Author: Tobin C. Harding <[email protected]>
> AuthorDate: Fri Mar 29 10:01:23 2019 +1100
> Commit: Stephen Rothwell <[email protected]>
> CommitDate: Sat Mar 30 16:09:41 2019 +1100
>
> mm/slob.c: respect list_head abstraction layer
>
> Currently we reach inside the list_head. This is a violation of the layer
> of abstraction provided by the list_head. It makes the code fragile.
> More importantly it makes the code wicked hard to understand.
>
> The code logic is based on the page in which an allocation was made, we
> want to modify the slob_list we are working on to have this page at the
> front. We already have a function to check if an entry is at the front of
> the list. Recently a function was added to list.h to do the list
> rotation. We can use these two functions to reduce line count, reduce
> code fragility, and reduce cognitive load required to read the code.
>
> Use list_head functions to interact with lists thereby maintaining the
> abstraction provided by the list_head structure.
>
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Tobin C. Harding <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: David Rientjes <[email protected]>
> Cc: Joonsoo Kim <[email protected]>
> Cc: Pekka Enberg <[email protected]>
> Cc: Roman Gushchin <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> Signed-off-by: Stephen Rothwell <[email protected]>
>
> 2e1f88301e include/linux/list.h: add list_rotate_to_front()
> 15c8410c67 mm/slob.c: respect list_head abstraction layer
> 05d08e2995 Add linux-next specific files for 20190402
> +-------------------------------------------------------+------------+------------+---------------+
> | | 2e1f88301e | 15c8410c67 | next-20190402 |
> +-------------------------------------------------------+------------+------------+---------------+
> | boot_successes | 1009 | 198 | 299 |
> | boot_failures | 0 | 2 | 44 |
> | WARNING:at_lib/list_debug.c:#__list_add_valid | 0 | 2 | 44 |
> | RIP:__list_add_valid | 0 | 2 | 44 |
> | WARNING:at_lib/list_debug.c:#__list_del_entry_valid | 0 | 2 | 25 |
> | RIP:__list_del_entry_valid | 0 | 2 | 25 |
> | WARNING:possible_circular_locking_dependency_detected | 0 | 2 | 44 |
> | RIP:_raw_spin_unlock_irqrestore | 0 | 2 | 2 |
> | BUG:kernel_hang_in_test_stage | 0 | 0 | 6 |
> | BUG:unable_to_handle_kernel | 0 | 0 | 1 |
> | Oops:#[##] | 0 | 0 | 1 |
> | RIP:slob_page_alloc | 0 | 0 | 1 |
> | Kernel_panic-not_syncing:Fatal_exception | 0 | 0 | 1 |
> | RIP:delay_tsc | 0 | 0 | 2 |
> +-------------------------------------------------------+------------+------------+---------------+
>
> [ 2.618737] db_root: cannot open: /etc/target
> [ 2.620114] mtdoops: mtd device (mtddev=name/number) must be supplied
> [ 2.620967] slram: not enough parameters.
> [ 2.621614] ------------[ cut here ]------------
> [ 2.622254] list_add corruption. prev->next should be next (ffffffffaeeb71b0), but was ffffcee1406d3f70. (prev=ffffcee140422508).

Is this perhaps a false positive, caused by the way we hackishly remove
the list_head 'head' and insert it back into the list? Perhaps this is
confusing the validation functions.
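
For reference, here is a minimal userspace sketch of the kind of check
that produced the "prev->next should be next" message in the report. It
is a simplified model, not the kernel code: struct list_head is
redeclared locally, and model_list_add_valid() and
demo_inconsistent_tail() are hypothetical names used only for this
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/*
 * Hypothetical model of the checks made before linking 'entry' between
 * 'prev' and 'next'. Returns false and prints a message on inconsistency.
 */
static bool model_list_add_valid(const struct list_head *entry,
				 const struct list_head *prev,
				 const struct list_head *next)
{
	if (next->prev != prev) {
		fprintf(stderr, "next->prev should be prev (%p), but was %p\n",
			(void *)prev, (void *)next->prev);
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr, "prev->next should be next (%p), but was %p\n",
			(void *)next, (void *)prev->next);
		return false;
	}
	if (entry == prev || entry == next) {
		fprintf(stderr, "double add of %p\n", (void *)entry);
		return false;
	}
	return true;
}

/*
 * Tail insertion into a list whose last entry no longer points back at
 * the head. Returns the result of the validity check (expected: false).
 */
static bool demo_inconsistent_tail(void)
{
	struct list_head head, tail, other, entry;

	head.next = &other;  head.prev = &tail;
	other.prev = &head;  other.next = &tail;
	tail.prev = &other;  tail.next = &other; /* should be &head */

	/* The first check (next->prev == prev) passes by construction. */
	assert(head.prev == &tail);

	return model_list_add_valid(&entry, head.prev, &head);
}
```

In this model the warning fires only when the two neighbours disagree
about each other, so a false positive would require the list to be
observed while it is genuinely inconsistent.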

Tobin

2019-04-04 04:44:11

by Tobin C. Harding

Subject: Re: 15c8410c67 ("mm/slob.c: respect list_head abstraction layer"): WARNING: CPU: 0 PID: 1 at lib/list_debug.c:28 __list_add_valid

On Wed, Apr 03, 2019 at 03:54:17PM +1100, Tobin C. Harding wrote:
> On Wed, Apr 03, 2019 at 10:00:38AM +0800, kernel test robot wrote:
> > Greetings,
> >
> > 0day kernel testing robot got the below dmesg and the first bad commit is
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> >
> > commit 15c8410c67adefd26ea0df1f1b86e1836051784b
> > Author: Tobin C. Harding <[email protected]>
> > AuthorDate: Fri Mar 29 10:01:23 2019 +1100
> > Commit: Stephen Rothwell <[email protected]>
> > CommitDate: Sat Mar 30 16:09:41 2019 +1100
> >
> > mm/slob.c: respect list_head abstraction layer
> >
> > Currently we reach inside the list_head. This is a violation of the layer
> > of abstraction provided by the list_head. It makes the code fragile.
> > More importantly it makes the code wicked hard to understand.
> >
> > The code logic is based on the page in which an allocation was made, we
> > want to modify the slob_list we are working on to have this page at the
> > front. We already have a function to check if an entry is at the front of
> > the list. Recently a function was added to list.h to do the list
> > rotation. We can use these two functions to reduce line count, reduce
> > code fragility, and reduce cognitive load required to read the code.
> >
> > Use list_head functions to interact with lists thereby maintaining the
> > abstraction provided by the list_head structure.
> >
> > Link: http://lkml.kernel.org/r/[email protected]
> > Signed-off-by: Tobin C. Harding <[email protected]>
> > Cc: Christoph Lameter <[email protected]>
> > Cc: David Rientjes <[email protected]>
> > Cc: Joonsoo Kim <[email protected]>
> > Cc: Pekka Enberg <[email protected]>
> > Cc: Roman Gushchin <[email protected]>
> > Signed-off-by: Andrew Morton <[email protected]>
> > Signed-off-by: Stephen Rothwell <[email protected]>
> >
> > 2e1f88301e include/linux/list.h: add list_rotate_to_front()
> > 15c8410c67 mm/slob.c: respect list_head abstraction layer
> > 05d08e2995 Add linux-next specific files for 20190402
> > +-------------------------------------------------------+------------+------------+---------------+
> > | | 2e1f88301e | 15c8410c67 | next-20190402 |
> > +-------------------------------------------------------+------------+------------+---------------+
> > | boot_successes | 1009 | 198 | 299 |
> > | boot_failures | 0 | 2 | 44 |
> > | WARNING:at_lib/list_debug.c:#__list_add_valid | 0 | 2 | 44 |
> > | RIP:__list_add_valid | 0 | 2 | 44 |
> > | WARNING:at_lib/list_debug.c:#__list_del_entry_valid | 0 | 2 | 25 |
> > | RIP:__list_del_entry_valid | 0 | 2 | 25 |
> > | WARNING:possible_circular_locking_dependency_detected | 0 | 2 | 44 |
> > | RIP:_raw_spin_unlock_irqrestore | 0 | 2 | 2 |
> > | BUG:kernel_hang_in_test_stage | 0 | 0 | 6 |
> > | BUG:unable_to_handle_kernel | 0 | 0 | 1 |
> > | Oops:#[##] | 0 | 0 | 1 |
> > | RIP:slob_page_alloc | 0 | 0 | 1 |
> > | Kernel_panic-not_syncing:Fatal_exception | 0 | 0 | 1 |
> > | RIP:delay_tsc | 0 | 0 | 2 |
> > +-------------------------------------------------------+------------+------------+---------------+
> >
> > [ 2.618737] db_root: cannot open: /etc/target
> > [ 2.620114] mtdoops: mtd device (mtddev=name/number) must be supplied
> > [ 2.620967] slram: not enough parameters.
> > [ 2.621614] ------------[ cut here ]------------
> > [ 2.622254] list_add corruption. prev->next should be next (ffffffffaeeb71b0), but was ffffcee1406d3f70. (prev=ffffcee140422508).
>
> Is this perhaps a false positive, caused by the way we hackishly remove
> the list_head 'head' and insert it back into the list? Perhaps this is
> confusing the validation functions.

This has me stumped. I cannot create a test case where manipulating a
list with list_rotate_to_front() causes the list validation functions to
emit an error, and I cannot work out on paper how it could happen
either.
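
A test along these lines can be sketched in userspace. This is a
minimal model under stated assumptions, not the kernel code: the list
helpers are reimplemented locally, the validity checks cover only
neighbour-pointer consistency (no poison checks), and all function names
(model_rotate_to_front(), checked_add_tail(), checked_del(),
demo_rotate()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Number of validity checks that failed since the last reset. */
static int validation_failures;

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Model of the add-time check: neighbours must agree with each other. */
static bool add_valid(struct list_head *entry, struct list_head *prev,
		      struct list_head *next)
{
	if (next->prev != prev || prev->next != next ||
	    entry == prev || entry == next) {
		validation_failures++;
		return false;
	}
	return true;
}

/* Model of the delete-time check: both neighbours must point at 'entry'. */
static bool del_valid(struct list_head *entry)
{
	if (entry->prev->next != entry || entry->next->prev != entry) {
		validation_failures++;
		return false;
	}
	return true;
}

/* Insert 'entry' just before 'head'; skipped if the check fails. */
static void checked_add_tail(struct list_head *entry, struct list_head *head)
{
	struct list_head *prev = head->prev;

	if (!add_valid(entry, prev, head))
		return;
	head->prev = entry;
	entry->next = head;
	entry->prev = prev;
	prev->next = entry;
}

/* Unlink 'entry' from its neighbours; skipped if the check fails. */
static void checked_del(struct list_head *entry)
{
	if (!del_valid(entry))
		return;
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
}

/* Rotate so that 'first' follows 'head': unlink head, re-add before first. */
static void model_rotate_to_front(struct list_head *first,
				  struct list_head *head)
{
	checked_del(head);
	checked_add_tail(head, first);
}

/* Build head -> a -> b -> c, rotate b to the front, return failure count. */
static int demo_rotate(void)
{
	struct list_head head, a, b, c;

	validation_failures = 0;
	list_init(&head);
	checked_add_tail(&a, &head);
	checked_add_tail(&b, &head);
	checked_add_tail(&c, &head);

	model_rotate_to_front(&b, &head);

	/* Expected order after rotation: head -> b -> c -> a -> head. */
	assert(head.next == &b && b.next == &c);
	assert(c.next == &a && a.next == &head);
	return validation_failures;
}
```

On a list that is consistent before the call, this model reports no
validation failures, which matches what I see when working it through on
paper.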

I don't really know how to move forward from here. I'll sleep on it and
see if something comes to me. Any ideas on what to look into?

thanks,
Tobin.