2019-12-01 15:02:16

by kernel test robot

[permalink] [raw]
Subject: 5e6669387e ("of/platform: Pause/resume sync state during init .."): [ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume

Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

commit 5e6669387e2287f25f09fd0abd279dae104cfa7e
Author: Saravana Kannan <[email protected]>
AuthorDate: Wed Sep 4 14:11:24 2019 -0700
Commit: Greg Kroah-Hartman <[email protected]>
CommitDate: Fri Oct 4 17:30:19 2019 +0200

of/platform: Pause/resume sync state during init and of_platform_populate()

When all the top level devices are populated from DT during kernel
init, the supplier devices could be added and probed before the
consumer devices are added and linked to the suppliers. To avoid the
sync_state() callback from being called prematurely, pause the
sync_state() callbacks before populating the devices and resume them
at late_initcall_sync().

Similarly, when children devices are populated from a module using
of_platform_populate(), there could be supplier-consumer dependencies
between the children devices that are populated. To avoid the same
problem with sync_state() being called prematurely, pause and resume
sync_state() callbacks across of_platform_populate().

Signed-off-by: Saravana Kannan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>

fc5a251d0f driver core: Add sync_state driver/bus callback
5e6669387e of/platform: Pause/resume sync state during init and of_platform_populate()
81b6b96475 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux; tag 'dma-mapping-5.5' of git://git.infradead.org/users/hch/dma-mapping
+-------------------------------------------------------------------------+------------+------------+------------+
| | fc5a251d0f | 5e6669387e | 81b6b96475 |
+-------------------------------------------------------------------------+------------+------------+------------+
| boot_successes | 30 | 0 | 0 |
| boot_failures | 1 | 11 | 22 |
| Oops:#[##] | 1 | | |
| EIP:unmap_vmas | 1 | | |
| PANIC:double_fault | 1 | | |
| Kernel_panic-not_syncing:Fatal_exception | 1 | | |
| WARNING:at_drivers/base/core.c:#device_links_supplier_sync_state_resume | 0 | 11 | 22 |
| EIP:device_links_supplier_sync_state_resume | 0 | 11 | 22 |
+-------------------------------------------------------------------------+------------+------------+------------+

If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>

[ 3.186107] OF: /testcase-data/phandle-tests/consumer-b: #phandle-cells = 2 found -1
[ 3.188595] platform testcase-data:testcase-device2: IRQ index 0 not found
[ 3.191047] ### dt-test ### end of unittest - 199 passed, 0 failed
[ 3.191932] ------------[ cut here ]------------
[ 3.192571] Unmatched sync_state pause/resume!
[ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume+0x27/0xc0
[ 3.195084] Modules linked in:
[ 3.195494] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G T 5.4.0-rc1-00005-g5e6669387e228 #1
[ 3.196674] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 3.197693] EIP: device_links_supplier_sync_state_resume+0x27/0xc0
[ 3.198680] Code: 00 00 00 3e 8d 74 26 00 57 56 31 d2 53 b8 a0 d0 d9 c1 e8 6c b6 38 00 a1 e4 d0 d9 c1 85 c0 75 13 68 84 ba c4 c1 e8 29 30 b1 ff <0f> 0b 58 eb 7f 8d 74 26 00 83 e8 01 85 c0 a3 e4 d0 d9 c1 75 6f 8b
[ 3.201560] EAX: 00000022 EBX: 00000000 ECX: 00000000 EDX: 00000000
[ 3.202466] ESI: 000001ab EDI: c02c7f80 EBP: c1e87d27 ESP: c02c7f20
[ 3.203301] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010282
[ 3.204258] CR0: 80050033 CR2: bfa1bf98 CR3: 01f28000 CR4: 00140690
[ 3.205022] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 3.205919] DR6: fffe0ff0 DR7: 00000400
[ 3.206529] Call Trace:
[ 3.207011] ? of_platform_sync_state_init+0x13/0x16
[ 3.207719] ? do_one_initcall+0xda/0x260
[ 3.208247] ? kernel_init_freeable+0x110/0x197
[ 3.208906] ? rest_init+0x120/0x120
[ 3.209369] ? kernel_init+0xa/0x100
[ 3.209775] ? ret_from_fork+0x19/0x24
[ 3.210283] ---[ end trace 81d0f2d2ee65199b ]---
[ 3.210955] ALSA device list:

# HH:MM RESULT GOOD BAD GOOD_BUT_DIRTY DIRTY_NOT_BAD
git bisect start 1251b72273bfd2d17832055d93713797f027bd7e 219d54332a09e8d8741c1e1982f5eae56099de85 --
git bisect bad 5850da4ca99731a66a1b54fcd4e937f2ce09d422 # 07:14 B 0 3 19 0 Merge 'frank-w-bpi-r2-4.14/5.4-r64-pcie' into devel-catchup-201911300217
git bisect bad a9ff9d736bfbcad09085e7b370eff28200691aae # 07:38 B 0 3 19 0 Merge 'linux-review/Jaroslav-Kysela/ALSA-hda-fixup-for-the-bass-speaker-on-Lenovo-Carbon-X1-7th-gen/20191130-013305' into devel-catchup-201911300217
git bisect good ff7d78108fd9afb5e90dcdbf84c1857702452473 # 08:37 G 10 0 0 0 0day base guard for 'devel-catchup-201911300217'
git bisect bad 95a324fbf220358a41e082098d16489e7efabb05 # 08:59 B 0 2 18 0 Merge 'linux-review/Vincenzo-Frascino/mips-Fix-gettimeofday-in-the-vdso-library/20191130-011805' into devel-catchup-201911300217
git bisect good 361b0d286afea0d867537536977a695b5557d133 # 09:55 G 11 0 0 0 Merge tag 'devprop-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect good 0dd09bc02c1bad55e92306ca83b38b3cf48b9f40 # 11:06 G 11 0 1 1 Merge tag 'staging-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good 3275a71e76fac5bc276f0d60e027b18c2e8d7a5b # 12:07 G 11 0 0 0 Merge tag 'drm-next-5.5-2019-10-09' of git://people.freedesktop.org/~agd5f/linux into drm-next
git bisect good 2ef4144d1ea8b181d377d0783c43032cb44889f7 # 15:08 G 11 0 0 0 Merge tag 'drm-intel-next-2019-11-01-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
git bisect good acc61b8929365e63a3e8c8c8913177795aa45594 # 15:34 G 11 0 0 1 Merge tag 'drm-next-5.5-2019-11-22' of git://people.freedesktop.org/~agd5f/linux into drm-next
git bisect bad 95f1fa9e3418d50ce099e67280b5497b9c93843b # 16:06 B 0 2 19 1 Merge tag 'trace-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
git bisect bad 80eb5fea3c14fb171facb5242a1555b3aafea4d0 # 16:31 B 0 1 17 0 Merge tag 'powerpc-spectre-rsb' of powerpc-CVE-2019-18660.bundle
git bisect bad 0c40c1be2512abc99ea27df83f882dd61b5437bc # 17:30 B 0 11 27 0 mmc: dw_mmc: Fix debugfs on 64-bit platforms
git bisect bad f5cb0a7e64f41b6f1c5cacc64a476962f5e97f91 # 18:00 B 0 2 18 0 debugfs: remove return value of debugfs_create_x32()
git bisect bad c31e73121f4c1ec45a3e523ac6ce3ce6dafdcec1 # 18:35 B 0 1 17 0 base: soc: Handle custom soc information sysfs entries
git bisect good fc5a251d0fd7ca9038bab78a8c97932c8c6ca23b # 19:07 G 11 0 0 0 driver core: Add sync_state driver/bus callback
git bisect bad d4387cd117414ba80230f27a514be5ca4a09cfcc # 19:34 B 0 9 25 0 of: property: Create device links for all child-supplier depencencies
git bisect bad 5e6669387e2287f25f09fd0abd279dae104cfa7e # 19:53 B 0 11 27 0 of/platform: Pause/resume sync state during init and of_platform_populate()
# first bad commit: [5e6669387e2287f25f09fd0abd279dae104cfa7e] of/platform: Pause/resume sync state during init and of_platform_populate()
git bisect good fc5a251d0fd7ca9038bab78a8c97932c8c6ca23b # 20:17 G 31 0 1 1 driver core: Add sync_state driver/bus callback
# extra tests with debug options
git bisect good 5e6669387e2287f25f09fd0abd279dae104cfa7e # 20:29 G 10 0 0 0 of/platform: Pause/resume sync state during init and of_platform_populate()
# extra tests on head commit of linus/master
git bisect bad 81b6b96475ac7a4ebfceae9f16fb3758327adbfe # 20:47 B 0 22 61 0 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux; tag 'dma-mapping-5.5' of git://git.infradead.org/users/hch/dma-mapping
# bad: [81b6b96475ac7a4ebfceae9f16fb3758327adbfe] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux; tag 'dma-mapping-5.5' of git://git.infradead.org/users/hch/dma-mapping
# extra tests on revert first bad commit
git bisect good c1a48460693b4421be494a524e06e57bdb9fc9bf # 21:19 G 10 0 0 0 Revert "of/platform: Pause/resume sync state during init and of_platform_populate()"
# good: [c1a48460693b4421be494a524e06e57bdb9fc9bf] Revert "of/platform: Pause/resume sync state during init and of_platform_populate()"
# extra tests on linus/master
# duplicated: [81b6b96475ac7a4ebfceae9f16fb3758327adbfe] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux; tag 'dma-mapping-5.5' of git://git.infradead.org/users/hch/dma-mapping
# extra tests on linux-next/master
# 119: [419593dad8439007452bb6f267861863b572c520] Add linux-next specific files for 20191129

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation


Attachments:
(No filename) (9.90 kB)
dmesg-yocto-vm-yocto-af60b987e7d2:20191130195207:i386-randconfig-b003-20191129:5.4.0-rc1-00005-g5e6669387e228:1.gz (18.85 kB)
dmesg-yocto-vm-yocto-2102271fd292:20191130201615:i386-randconfig-b003-20191129:5.4.0-rc1-00004-gfc5a251d0fd7c:1.gz (14.88 kB)
reproduce-yocto-vm-yocto-af60b987e7d2:20191130195207:i386-randconfig-b003-20191129:5.4.0-rc1-00005-g5e6669387e228:1 (987.00 B)
1251b72273bfd2d17832055d93713797f027bd7e:gcc-7:i386-randconfig-b003-20191129:EIP:device_links_supplier_sync_state_resume.xz (8.48 kB)
config-5.4.0-rc1-00005-g5e6669387e228 (124.59 kB)
Download all attachments

2019-12-02 21:20:56

by Saravana Kannan

[permalink] [raw]
Subject: Re: 5e6669387e ("of/platform: Pause/resume sync state during init .."): [ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume

On Sun, Dec 1, 2019 at 7:00 AM kernel test robot <[email protected]> wrote:
>
> Greetings,
>
> 0day kernel testing robot got the below dmesg and the first bad commit is
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>
> commit 5e6669387e2287f25f09fd0abd279dae104cfa7e
> Author: Saravana Kannan <[email protected]>
> AuthorDate: Wed Sep 4 14:11:24 2019 -0700
> Commit: Greg Kroah-Hartman <[email protected]>
> CommitDate: Fri Oct 4 17:30:19 2019 +0200
>
> of/platform: Pause/resume sync state during init and of_platform_populate()
>
> When all the top level devices are populated from DT during kernel
> init, the supplier devices could be added and probed before the
> consumer devices are added and linked to the suppliers. To avoid the
> sync_state() callback from being called prematurely, pause the
> sync_state() callbacks before populating the devices and resume them
> at late_initcall_sync().
>
> Similarly, when children devices are populated from a module using
> of_platform_populate(), there could be supplier-consumer dependencies
> between the children devices that are populated. To avoid the same
> problem with sync_state() being called prematurely, pause and resume
> sync_state() callbacks across of_platform_populate().
>
> Signed-off-by: Saravana Kannan <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
> fc5a251d0f driver core: Add sync_state driver/bus callback
> 5e6669387e of/platform: Pause/resume sync state during init and of_platform_populate()
> 81b6b96475 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux; tag 'dma-mapping-5.5' of git://git.infradead.org/users/hch/dma-mapping
> +-------------------------------------------------------------------------+------------+------------+------------+
> | | fc5a251d0f | 5e6669387e | 81b6b96475 |
> +-------------------------------------------------------------------------+------------+------------+------------+
> | boot_successes | 30 | 0 | 0 |
> | boot_failures | 1 | 11 | 22 |
> | Oops:#[##] | 1 | | |
> | EIP:unmap_vmas | 1 | | |
> | PANIC:double_fault | 1 | | |
> | Kernel_panic-not_syncing:Fatal_exception | 1 | | |
> | WARNING:at_drivers/base/core.c:#device_links_supplier_sync_state_resume | 0 | 11 | 22 |
> | EIP:device_links_supplier_sync_state_resume | 0 | 11 | 22 |
> +-------------------------------------------------------------------------+------------+------------+------------+
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <[email protected]>
>
> [ 3.186107] OF: /testcase-data/phandle-tests/consumer-b: #phandle-cells = 2 found -1
> [ 3.188595] platform testcase-data:testcase-device2: IRQ index 0 not found
> [ 3.191047] ### dt-test ### end of unittest - 199 passed, 0 failed
> [ 3.191932] ------------[ cut here ]------------
> [ 3.192571] Unmatched sync_state pause/resume!
> [ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume+0x27/0xc0
> [ 3.195084] Modules linked in:
> [ 3.195494] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G T 5.4.0-rc1-00005-g5e6669387e228 #1
> [ 3.196674] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [ 3.197693] EIP: device_links_supplier_sync_state_resume+0x27/0xc0
> [ 3.198680] Code: 00 00 00 3e 8d 74 26 00 57 56 31 d2 53 b8 a0 d0 d9 c1 e8 6c b6 38 00 a1 e4 d0 d9 c1 85 c0 75 13 68 84 ba c4 c1 e8 29 30 b1 ff <0f> 0b 58 eb 7f 8d 74 26 00 83 e8 01 85 c0 a3 e4 d0 d9 c1 75 6f 8b
> [ 3.201560] EAX: 00000022 EBX: 00000000 ECX: 00000000 EDX: 00000000
> [ 3.202466] ESI: 000001ab EDI: c02c7f80 EBP: c1e87d27 ESP: c02c7f20
> [ 3.203301] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010282
> [ 3.204258] CR0: 80050033 CR2: bfa1bf98 CR3: 01f28000 CR4: 00140690
> [ 3.205022] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 3.205919] DR6: fffe0ff0 DR7: 00000400
> [ 3.206529] Call Trace:
> [ 3.207011] ? of_platform_sync_state_init+0x13/0x16
> [ 3.207719] ? do_one_initcall+0xda/0x260
> [ 3.208247] ? kernel_init_freeable+0x110/0x197
> [ 3.208906] ? rest_init+0x120/0x120
> [ 3.209369] ? kernel_init+0xa/0x100
> [ 3.209775] ? ret_from_fork+0x19/0x24
> [ 3.210283] ---[ end trace 81d0f2d2ee65199b ]---
> [ 3.210955] ALSA device list:

Rob/Frank,

This seems to be an issue with the unit test code not properly
cleaning up the state after it's done.

Specifically, unittest_data_add() setting up of_root on systems where
there's no device tree (of_root == NULL). It doesn't clean up of_root
after the tests are done. This affects the of_have_populated_dt() API
that in turn affects calls to
device_links_supplier_sync_state_pause/resume(). I think unittests
shouldn't affect the of_have_populated_dt() API.

I was looking into writing a unittest patch to fix this, but I don't
know enough about the FDT parsing code to make sure I don't leak any
memory or free stuff that's in use. I'm not sure I can simply set
of_root = NULL if it was NULL before the unittest started. Let me know
how I should proceed or if you plan to write up a patch for this.

Thanks,
Saravana

2019-12-03 09:02:00

by Frank Rowand

[permalink] [raw]
Subject: Re: 5e6669387e ("of/platform: Pause/resume sync state during init .."): [ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume

On 12/2/19 3:19 PM, Saravana Kannan wrote:
> On Sun, Dec 1, 2019 at 7:00 AM kernel test robot <[email protected]> wrote:
>>
>> Greetings,
>>
>> 0day kernel testing robot got the below dmesg and the first bad commit is
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>
>> commit 5e6669387e2287f25f09fd0abd279dae104cfa7e
>> Author: Saravana Kannan <[email protected]>
>> AuthorDate: Wed Sep 4 14:11:24 2019 -0700
>> Commit: Greg Kroah-Hartman <[email protected]>
>> CommitDate: Fri Oct 4 17:30:19 2019 +0200
>>
>> of/platform: Pause/resume sync state during init and of_platform_populate()
>>
>> When all the top level devices are populated from DT during kernel
>> init, the supplier devices could be added and probed before the
>> consumer devices are added and linked to the suppliers. To avoid the
>> sync_state() callback from being called prematurely, pause the
>> sync_state() callbacks before populating the devices and resume them
>> at late_initcall_sync().
>>
>> Similarly, when children devices are populated from a module using
>> of_platform_populate(), there could be supplier-consumer dependencies
>> between the children devices that are populated. To avoid the same
>> problem with sync_state() being called prematurely, pause and resume
>> sync_state() callbacks across of_platform_populate().
>>
>> Signed-off-by: Saravana Kannan <[email protected]>
>> Link: https://lore.kernel.org/r/[email protected]
>> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>>
>> fc5a251d0f driver core: Add sync_state driver/bus callback
>> 5e6669387e of/platform: Pause/resume sync state during init and of_platform_populate()
>> 81b6b96475 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux; tag 'dma-mapping-5.5' of git://git.infradead.org/users/hch/dma-mapping
>> +-------------------------------------------------------------------------+------------+------------+------------+
>> | | fc5a251d0f | 5e6669387e | 81b6b96475 |
>> +-------------------------------------------------------------------------+------------+------------+------------+
>> | boot_successes | 30 | 0 | 0 |
>> | boot_failures | 1 | 11 | 22 |
>> | Oops:#[##] | 1 | | |
>> | EIP:unmap_vmas | 1 | | |
>> | PANIC:double_fault | 1 | | |
>> | Kernel_panic-not_syncing:Fatal_exception | 1 | | |
>> | WARNING:at_drivers/base/core.c:#device_links_supplier_sync_state_resume | 0 | 11 | 22 |
>> | EIP:device_links_supplier_sync_state_resume | 0 | 11 | 22 |
>> +-------------------------------------------------------------------------+------------+------------+------------+
>>
>> If you fix the issue, kindly add following tag
>> Reported-by: kernel test robot <[email protected]>
>>
>> [ 3.186107] OF: /testcase-data/phandle-tests/consumer-b: #phandle-cells = 2 found -1
>> [ 3.188595] platform testcase-data:testcase-device2: IRQ index 0 not found
>> [ 3.191047] ### dt-test ### end of unittest - 199 passed, 0 failed
>> [ 3.191932] ------------[ cut here ]------------
>> [ 3.192571] Unmatched sync_state pause/resume!
>> [ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume+0x27/0xc0
>> [ 3.195084] Modules linked in:
>> [ 3.195494] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G T 5.4.0-rc1-00005-g5e6669387e228 #1
>> [ 3.196674] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>> [ 3.197693] EIP: device_links_supplier_sync_state_resume+0x27/0xc0
>> [ 3.198680] Code: 00 00 00 3e 8d 74 26 00 57 56 31 d2 53 b8 a0 d0 d9 c1 e8 6c b6 38 00 a1 e4 d0 d9 c1 85 c0 75 13 68 84 ba c4 c1 e8 29 30 b1 ff <0f> 0b 58 eb 7f 8d 74 26 00 83 e8 01 85 c0 a3 e4 d0 d9 c1 75 6f 8b
>> [ 3.201560] EAX: 00000022 EBX: 00000000 ECX: 00000000 EDX: 00000000
>> [ 3.202466] ESI: 000001ab EDI: c02c7f80 EBP: c1e87d27 ESP: c02c7f20
>> [ 3.203301] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010282
>> [ 3.204258] CR0: 80050033 CR2: bfa1bf98 CR3: 01f28000 CR4: 00140690
>> [ 3.205022] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>> [ 3.205919] DR6: fffe0ff0 DR7: 00000400
>> [ 3.206529] Call Trace:
>> [ 3.207011] ? of_platform_sync_state_init+0x13/0x16
>> [ 3.207719] ? do_one_initcall+0xda/0x260
>> [ 3.208247] ? kernel_init_freeable+0x110/0x197
>> [ 3.208906] ? rest_init+0x120/0x120
>> [ 3.209369] ? kernel_init+0xa/0x100
>> [ 3.209775] ? ret_from_fork+0x19/0x24
>> [ 3.210283] ---[ end trace 81d0f2d2ee65199b ]---
>> [ 3.210955] ALSA device list:
>
> Rob/Frank,
>
> This seems to be an issue with the unit test code not properly
> cleaning up the state after it's done.
>
> Specifically, unittest_data_add() setting up of_root on systems where
> there's no device tree (of_root == NULL). It doesn't clean up of_root
> after the tests are done. This affects the of_have_populated_dt() API
> that in turn affects calls to
> device_links_supplier_sync_state_pause/resume(). I think unittests
> shouldn't affect the of_have_populated_dt() API.
There are at least a couple of reasons why the unittest devicetree data
needs to remain after the point where devicetree unittests currently
complete. So cleaning up (removing the data) is not an option.

I depend on the unittest devicetree entries still existing after the system
boots and I can log into a shell for some validation of the final result of
the devicetree data.

There is also a desire for the devicetree unittests to be able to be loaded
as a module. That work is not yet scheduled, but I do not want to preclude
the possibility. If unittests are loaded from a module then they will
need some devicetree data to exist that is created in early boot. That
data will be in the devicetree when of_platform_sync_state_init() is
invoked.


> I was looking into writing a unittest patch to fix this, but I don't
> know enough about the FDT parsing code to make sure I don't leak any
> memory or free stuff that's in use. I'm not sure I can simply set
> of_root = NULL if it was NULL before the unittest started. Let me know
> how I should proceed or if you plan to write up a patch for this.

Based on the above, "clean up" of the unittest data is not the solution.

I haven't looked at the mechanism in device_links_supplier_sync_state_resume()
that leads to the WARN yet. But is does not seem reasonable for that code
to be so sensitive to what valid data is in the devicetree that a WARN results.

-Frank

>
> Thanks,
> Saravana
>

2019-12-03 20:07:01

by Saravana Kannan

[permalink] [raw]
Subject: Re: 5e6669387e ("of/platform: Pause/resume sync state during init .."): [ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume

On Tue, Dec 3, 2019 at 1:01 AM Frank Rowand <[email protected]> wrote:
>
> On 12/2/19 3:19 PM, Saravana Kannan wrote:
> > On Sun, Dec 1, 2019 at 7:00 AM kernel test robot <[email protected]> wrote:
> >>
> >> Greetings,
> >>
> >> 0day kernel testing robot got the below dmesg and the first bad commit is
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> >>
> >> commit 5e6669387e2287f25f09fd0abd279dae104cfa7e
> >> Author: Saravana Kannan <[email protected]>
> >> AuthorDate: Wed Sep 4 14:11:24 2019 -0700
> >> Commit: Greg Kroah-Hartman <[email protected]>
> >> CommitDate: Fri Oct 4 17:30:19 2019 +0200
> >>
> >> of/platform: Pause/resume sync state during init and of_platform_populate()
> >>
> >> When all the top level devices are populated from DT during kernel
> >> init, the supplier devices could be added and probed before the
> >> consumer devices are added and linked to the suppliers. To avoid the
> >> sync_state() callback from being called prematurely, pause the
> >> sync_state() callbacks before populating the devices and resume them
> >> at late_initcall_sync().
> >>
> >> Similarly, when children devices are populated from a module using
> >> of_platform_populate(), there could be supplier-consumer dependencies
> >> between the children devices that are populated. To avoid the same
> >> problem with sync_state() being called prematurely, pause and resume
> >> sync_state() callbacks across of_platform_populate().
> >>
> >> Signed-off-by: Saravana Kannan <[email protected]>
> >> Link: https://lore.kernel.org/r/[email protected]
> >> Signed-off-by: Greg Kroah-Hartman <[email protected]>
> >>
> >> fc5a251d0f driver core: Add sync_state driver/bus callback
> >> 5e6669387e of/platform: Pause/resume sync state during init and of_platform_populate()
> >> 81b6b96475 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux; tag 'dma-mapping-5.5' of git://git.infradead.org/users/hch/dma-mapping
> >> +-------------------------------------------------------------------------+------------+------------+------------+
> >> | | fc5a251d0f | 5e6669387e | 81b6b96475 |
> >> +-------------------------------------------------------------------------+------------+------------+------------+
> >> | boot_successes | 30 | 0 | 0 |
> >> | boot_failures | 1 | 11 | 22 |
> >> | Oops:#[##] | 1 | | |
> >> | EIP:unmap_vmas | 1 | | |
> >> | PANIC:double_fault | 1 | | |
> >> | Kernel_panic-not_syncing:Fatal_exception | 1 | | |
> >> | WARNING:at_drivers/base/core.c:#device_links_supplier_sync_state_resume | 0 | 11 | 22 |
> >> | EIP:device_links_supplier_sync_state_resume | 0 | 11 | 22 |
> >> +-------------------------------------------------------------------------+------------+------------+------------+
> >>
> >> If you fix the issue, kindly add following tag
> >> Reported-by: kernel test robot <[email protected]>
> >>
> >> [ 3.186107] OF: /testcase-data/phandle-tests/consumer-b: #phandle-cells = 2 found -1
> >> [ 3.188595] platform testcase-data:testcase-device2: IRQ index 0 not found
> >> [ 3.191047] ### dt-test ### end of unittest - 199 passed, 0 failed
> >> [ 3.191932] ------------[ cut here ]------------
> >> [ 3.192571] Unmatched sync_state pause/resume!
> >> [ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume+0x27/0xc0
> >> [ 3.195084] Modules linked in:
> >> [ 3.195494] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G T 5.4.0-rc1-00005-g5e6669387e228 #1
> >> [ 3.196674] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> >> [ 3.197693] EIP: device_links_supplier_sync_state_resume+0x27/0xc0
> >> [ 3.198680] Code: 00 00 00 3e 8d 74 26 00 57 56 31 d2 53 b8 a0 d0 d9 c1 e8 6c b6 38 00 a1 e4 d0 d9 c1 85 c0 75 13 68 84 ba c4 c1 e8 29 30 b1 ff <0f> 0b 58 eb 7f 8d 74 26 00 83 e8 01 85 c0 a3 e4 d0 d9 c1 75 6f 8b
> >> [ 3.201560] EAX: 00000022 EBX: 00000000 ECX: 00000000 EDX: 00000000
> >> [ 3.202466] ESI: 000001ab EDI: c02c7f80 EBP: c1e87d27 ESP: c02c7f20
> >> [ 3.203301] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010282
> >> [ 3.204258] CR0: 80050033 CR2: bfa1bf98 CR3: 01f28000 CR4: 00140690
> >> [ 3.205022] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> >> [ 3.205919] DR6: fffe0ff0 DR7: 00000400
> >> [ 3.206529] Call Trace:
> >> [ 3.207011] ? of_platform_sync_state_init+0x13/0x16
> >> [ 3.207719] ? do_one_initcall+0xda/0x260
> >> [ 3.208247] ? kernel_init_freeable+0x110/0x197
> >> [ 3.208906] ? rest_init+0x120/0x120
> >> [ 3.209369] ? kernel_init+0xa/0x100
> >> [ 3.209775] ? ret_from_fork+0x19/0x24
> >> [ 3.210283] ---[ end trace 81d0f2d2ee65199b ]---
> >> [ 3.210955] ALSA device list:
> >
> > Rob/Frank,
> >
> > This seems to be an issue with the unit test code not properly
> > cleaning up the state after it's done.
> >
> > Specifically, unittest_data_add() setting up of_root on systems where
> > there's no device tree (of_root == NULL). It doesn't clean up of_root
> > after the tests are done. This affects the of_have_populated_dt() API
> > that in turn affects calls to
> > device_links_supplier_sync_state_pause/resume(). I think unittests
> > shouldn't affect the of_have_populated_dt() API.
> There are at least a couple of reasons why the unittest devicetree data
> needs to remain after the point where devicetree unittests currently
> complete. So cleaning up (removing the data) is not an option.
>
> I depend on the unittest devicetree entries still existing after the system
> boots and I can log into a shell for some validation of the final result of
> the devicetree data.

IMHO unittests shouldn't have a residual impact on the system after
they are done. So, I'll agree to disagree on this one.

> There is also a desire for the devicetree unittests to be able to be loaded
> as a module. That work is not yet scheduled, but I do not want to preclude
> the possibility. If unittests are loaded from a module then they will
> need some devicetree data to exist that is created in early boot. That
> data will be in the devicetree when of_platform_sync_state_init() is
> invoked.

On a normal system, FDT is parsed and of_root is set (or not set) very
early on during setup_arch() before any of the initcall levels are
run. The return value of of_have_populated_dt() isn't expected to
change across initcall levels. But because of the way the unittest is
written (the of_root is changed at late_initcall() level) the return
value of of_have_populated_dt() changes across initcall levels. I
think that's a real problem with the unittest -- it's breaking API
semantics.

of_have_populated_dt() is being used to check if DT is present in the
system and different things are done based on that. We can't have that
value change across initcall levels.

Couple of thoughts:
1. Don't run unit test if there is no live DT in the system?
2. If you don't want to do (1), then at least set up the unit test
data during setup_arch() instead of doing it at some initcall level?
3. Can you use overlays for the unit tests if they are loaded as a module?

> > I was looking into writing a unittest patch to fix this, but I don't
> > know enough about the FDT parsing code to make sure I don't leak any
> > memory or free stuff that's in use. I'm not sure I can simply set
> > of_root = NULL if it was NULL before the unittest started. Let me know
> > how I should proceed or if you plan to write up a patch for this.
>
> Based on the above, "clean up" of the unittest data is not the solution.
>
> I haven't looked at the mechanism in device_links_supplier_sync_state_resume()
> that leads to the WARN yet. But is does not seem reasonable for that code
> to be so sensitive to what valid data is in the devicetree that a WARN results.

Sure, I could easily fix it to work around this. But this seems to be
a genuine problem with the unittest setup IMO.

-Saravana

2019-12-03 21:11:18

by Rob Herring

[permalink] [raw]
Subject: Re: 5e6669387e ("of/platform: Pause/resume sync state during init .."): [ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume

On Tue, Dec 3, 2019 at 2:05 PM Saravana Kannan <[email protected]> wrote:
>
> On Tue, Dec 3, 2019 at 1:01 AM Frank Rowand <[email protected]> wrote:
> >
> > On 12/2/19 3:19 PM, Saravana Kannan wrote:
> > > On Sun, Dec 1, 2019 at 7:00 AM kernel test robot <[email protected]> wrote:
> > >>
> > >> Greetings,
> > >>
> > >> 0day kernel testing robot got the below dmesg and the first bad commit is
> > >>
> > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > >>
> > >> commit 5e6669387e2287f25f09fd0abd279dae104cfa7e
> > >> Author: Saravana Kannan <[email protected]>
> > >> AuthorDate: Wed Sep 4 14:11:24 2019 -0700
> > >> Commit: Greg Kroah-Hartman <[email protected]>
> > >> CommitDate: Fri Oct 4 17:30:19 2019 +0200
> > >>
> > >> of/platform: Pause/resume sync state during init and of_platform_populate()
> > >>
> > >> When all the top level devices are populated from DT during kernel
> > >> init, the supplier devices could be added and probed before the
> > >> consumer devices are added and linked to the suppliers. To avoid the
> > >> sync_state() callback from being called prematurely, pause the
> > >> sync_state() callbacks before populating the devices and resume them
> > >> at late_initcall_sync().
> > >>
> > >> Similarly, when children devices are populated from a module using
> > >> of_platform_populate(), there could be supplier-consumer dependencies
> > >> between the children devices that are populated. To avoid the same
> > >> problem with sync_state() being called prematurely, pause and resume
> > >> sync_state() callbacks across of_platform_populate().
> > >>
> > >> Signed-off-by: Saravana Kannan <[email protected]>
> > >> Link: https://lore.kernel.org/r/[email protected]
> > >> Signed-off-by: Greg Kroah-Hartman <[email protected]>
> > >>
> > >> fc5a251d0f driver core: Add sync_state driver/bus callback
> > >> 5e6669387e of/platform: Pause/resume sync state during init and of_platform_populate()
> > >> 81b6b96475 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux; tag 'dma-mapping-5.5' of git://git.infradead.org/users/hch/dma-mapping
> > >> +-------------------------------------------------------------------------+------------+------------+------------+
> > >> | | fc5a251d0f | 5e6669387e | 81b6b96475 |
> > >> +-------------------------------------------------------------------------+------------+------------+------------+
> > >> | boot_successes | 30 | 0 | 0 |
> > >> | boot_failures | 1 | 11 | 22 |
> > >> | Oops:#[##] | 1 | | |
> > >> | EIP:unmap_vmas | 1 | | |
> > >> | PANIC:double_fault | 1 | | |
> > >> | Kernel_panic-not_syncing:Fatal_exception | 1 | | |
> > >> | WARNING:at_drivers/base/core.c:#device_links_supplier_sync_state_resume | 0 | 11 | 22 |
> > >> | EIP:device_links_supplier_sync_state_resume | 0 | 11 | 22 |
> > >> +-------------------------------------------------------------------------+------------+------------+------------+
> > >>
> > >> If you fix the issue, kindly add following tag
> > >> Reported-by: kernel test robot <[email protected]>
> > >>
> > >> [ 3.186107] OF: /testcase-data/phandle-tests/consumer-b: #phandle-cells = 2 found -1
> > >> [ 3.188595] platform testcase-data:testcase-device2: IRQ index 0 not found
> > >> [ 3.191047] ### dt-test ### end of unittest - 199 passed, 0 failed
> > >> [ 3.191932] ------------[ cut here ]------------
> > >> [ 3.192571] Unmatched sync_state pause/resume!
> > >> [ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume+0x27/0xc0
> > >> [ 3.195084] Modules linked in:
> > >> [ 3.195494] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G T 5.4.0-rc1-00005-g5e6669387e228 #1
> > >> [ 3.196674] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> > >> [ 3.197693] EIP: device_links_supplier_sync_state_resume+0x27/0xc0
> > >> [ 3.198680] Code: 00 00 00 3e 8d 74 26 00 57 56 31 d2 53 b8 a0 d0 d9 c1 e8 6c b6 38 00 a1 e4 d0 d9 c1 85 c0 75 13 68 84 ba c4 c1 e8 29 30 b1 ff <0f> 0b 58 eb 7f 8d 74 26 00 83 e8 01 85 c0 a3 e4 d0 d9 c1 75 6f 8b
> > >> [ 3.201560] EAX: 00000022 EBX: 00000000 ECX: 00000000 EDX: 00000000
> > >> [ 3.202466] ESI: 000001ab EDI: c02c7f80 EBP: c1e87d27 ESP: c02c7f20
> > >> [ 3.203301] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010282
> > >> [ 3.204258] CR0: 80050033 CR2: bfa1bf98 CR3: 01f28000 CR4: 00140690
> > >> [ 3.205022] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > >> [ 3.205919] DR6: fffe0ff0 DR7: 00000400
> > >> [ 3.206529] Call Trace:
> > >> [ 3.207011] ? of_platform_sync_state_init+0x13/0x16
> > >> [ 3.207719] ? do_one_initcall+0xda/0x260
> > >> [ 3.208247] ? kernel_init_freeable+0x110/0x197
> > >> [ 3.208906] ? rest_init+0x120/0x120
> > >> [ 3.209369] ? kernel_init+0xa/0x100
> > >> [ 3.209775] ? ret_from_fork+0x19/0x24
> > >> [ 3.210283] ---[ end trace 81d0f2d2ee65199b ]---
> > >> [ 3.210955] ALSA device list:
> > >
> > > Rob/Frank,
> > >
> > > This seems to be an issue with the unit test code not properly
> > > cleaning up the state after it's done.
> > >
> > > Specifically, unittest_data_add() setting up of_root on systems where
> > > there's no device tree (of_root == NULL). It doesn't clean up of_root
> > > after the tests are done. This affects the of_have_populated_dt() API
> > > that in turn affects calls to
> > > device_links_supplier_sync_state_pause/resume(). I think unittests
> > > shouldn't affect the of_have_populated_dt() API.
> > There are at least a couple of reasons why the unittest devicetree data
> > needs to remain after the point where devicetree unittests currently
> > complete. So cleaning up (removing the data) is not an option.
> >
> > I depend on the unittest devicetree entries still existing after the system
> > boots and I can log into a shell for some validation of the final result of
> > the devicetree data.
>
> IMHO unittests shouldn't have a residual impact on the system after
> they are done. So, I'll agree to disagree on this one.

They shouldn't be enabled in a production system either. Why would you
want the extra boot time?

> > There is also a desire for the devicetree unittests to be able to be loaded
> > as a module. That work is not yet scheduled, but I do not want to preclude
> > the possibility. If unittests are loaded from a module then they will
> > need some devicetree data to exist that is created in early boot. That
> > data will be in the devicetree when of_platform_sync_state_init() is
> > invoked.
>
> On a normal system, FDT is parsed and of_root is set (or not set) very
> early on during setup_arch() before any of the initcall levels are
> run. The return value of of_have_populated_dt() isn't expected to
> change across initcall levels. But because of the way the unittest is
> written (the of_root is changed at late_initcall() level) the return
> value of of_have_populated_dt() changes across initcall levels. I
> think that's a real problem with the unittest -- it's breaking API
> semantics.

I think what's really desired here is a 'Am I booting using DT' call.

> of_have_populated_dt() is being used to check if DT is present in the
> system and different things are done based on that. We can't have that
> value change across initcall levels.
>
> Couple of thoughts:
> 1. Don't run unit test if there is no live DT in the system?

That's pretty much the only case I do run. I use UML to run the tests.

> 2. If you don't want to do (1), then at least set up the unit test
> data during setup_arch() instead of doing it at some initcall level?

That further breaks making it a module. The plan is also to move to
kunit which probably will preclude some hacky hook into setup_arch().
Side effects may need to be fixed for kunit though.

> 3. Can you use overlays for the unit tests if they are loaded as a module?

That was the idea, yes.


4. Make running the unittests a command line option instead of running
if enabled. Still has side effects, but you have to explicitly run it.

A module would still be my preference. If only there was someone
interested in making everything a module... ;)

> > > I was looking into writing a unittest patch to fix this, but I don't
> > > know enough about the FDT parsing code to make sure I don't leak any
> > > memory or free stuff that's in use. I'm not sure I can simply set
> > > of_root = NULL if it was NULL before the unittest started. Let me know
> > > how I should proceed or if you plan to write up a patch for this.
> >
> > Based on the above, "clean up" of the unittest data is not the solution.
> >
> > I haven't looked at the mechanism in device_links_supplier_sync_state_resume()
> > that leads to the WARN yet. But is does not seem reasonable for that code
> > to be so sensitive to what valid data is in the devicetree that a WARN results.
>
> Sure, I could easily fix it to work around this. But this seems to be
> a genuine problem with the unittest setup IMO.
>
> -Saravana

2019-12-03 23:05:39

by Saravana Kannan

[permalink] [raw]
Subject: Re: 5e6669387e ("of/platform: Pause/resume sync state during init .."): [ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume

On Tue, Dec 3, 2019 at 1:10 PM Rob Herring <[email protected]> wrote:
>
> On Tue, Dec 3, 2019 at 2:05 PM Saravana Kannan <[email protected]> wrote:
> >
> > On Tue, Dec 3, 2019 at 1:01 AM Frank Rowand <[email protected]> wrote:
> > >
> > > On 12/2/19 3:19 PM, Saravana Kannan wrote:
> > > > On Sun, Dec 1, 2019 at 7:00 AM kernel test robot <[email protected]> wrote:
> > > >>
> > > >> Greetings,
> > > >>
> > > >> 0day kernel testing robot got the below dmesg and the first bad commit is
> > > >>
> > > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > > >>
> > > >> commit 5e6669387e2287f25f09fd0abd279dae104cfa7e
> > > >> Author: Saravana Kannan <[email protected]>
> > > >> AuthorDate: Wed Sep 4 14:11:24 2019 -0700
> > > >> Commit: Greg Kroah-Hartman <[email protected]>
> > > >> CommitDate: Fri Oct 4 17:30:19 2019 +0200
> > > >>
> > > >> of/platform: Pause/resume sync state during init and of_platform_populate()
> > > >>
> > > >> When all the top level devices are populated from DT during kernel
> > > >> init, the supplier devices could be added and probed before the
> > > >> consumer devices are added and linked to the suppliers. To avoid the
> > > >> sync_state() callback from being called prematurely, pause the
> > > >> sync_state() callbacks before populating the devices and resume them
> > > >> at late_initcall_sync().
> > > >>
> > > >> Similarly, when children devices are populated from a module using
> > > >> of_platform_populate(), there could be supplier-consumer dependencies
> > > >> between the children devices that are populated. To avoid the same
> > > >> problem with sync_state() being called prematurely, pause and resume
> > > >> sync_state() callbacks across of_platform_populate().
> > > >>
> > > >> Signed-off-by: Saravana Kannan <[email protected]>
> > > >> Link: https://lore.kernel.org/r/[email protected]
> > > >> Signed-off-by: Greg Kroah-Hartman <[email protected]>
> > > >>
> > > >> fc5a251d0f driver core: Add sync_state driver/bus callback
> > > >> 5e6669387e of/platform: Pause/resume sync state during init and of_platform_populate()
> > > >> 81b6b96475 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux; tag 'dma-mapping-5.5' of git://git.infradead.org/users/hch/dma-mapping
> > > >> +-------------------------------------------------------------------------+------------+------------+------------+
> > > >> | | fc5a251d0f | 5e6669387e | 81b6b96475 |
> > > >> +-------------------------------------------------------------------------+------------+------------+------------+
> > > >> | boot_successes | 30 | 0 | 0 |
> > > >> | boot_failures | 1 | 11 | 22 |
> > > >> | Oops:#[##] | 1 | | |
> > > >> | EIP:unmap_vmas | 1 | | |
> > > >> | PANIC:double_fault | 1 | | |
> > > >> | Kernel_panic-not_syncing:Fatal_exception | 1 | | |
> > > >> | WARNING:at_drivers/base/core.c:#device_links_supplier_sync_state_resume | 0 | 11 | 22 |
> > > >> | EIP:device_links_supplier_sync_state_resume | 0 | 11 | 22 |
> > > >> +-------------------------------------------------------------------------+------------+------------+------------+
> > > >>
> > > >> If you fix the issue, kindly add following tag
> > > >> Reported-by: kernel test robot <[email protected]>
> > > >>
> > > >> [ 3.186107] OF: /testcase-data/phandle-tests/consumer-b: #phandle-cells = 2 found -1
> > > >> [ 3.188595] platform testcase-data:testcase-device2: IRQ index 0 not found
> > > >> [ 3.191047] ### dt-test ### end of unittest - 199 passed, 0 failed
> > > >> [ 3.191932] ------------[ cut here ]------------
> > > >> [ 3.192571] Unmatched sync_state pause/resume!
> > > >> [ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume+0x27/0xc0
> > > >> [ 3.195084] Modules linked in:
> > > >> [ 3.195494] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G T 5.4.0-rc1-00005-g5e6669387e228 #1
> > > >> [ 3.196674] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> > > >> [ 3.197693] EIP: device_links_supplier_sync_state_resume+0x27/0xc0
> > > >> [ 3.198680] Code: 00 00 00 3e 8d 74 26 00 57 56 31 d2 53 b8 a0 d0 d9 c1 e8 6c b6 38 00 a1 e4 d0 d9 c1 85 c0 75 13 68 84 ba c4 c1 e8 29 30 b1 ff <0f> 0b 58 eb 7f 8d 74 26 00 83 e8 01 85 c0 a3 e4 d0 d9 c1 75 6f 8b
> > > >> [ 3.201560] EAX: 00000022 EBX: 00000000 ECX: 00000000 EDX: 00000000
> > > >> [ 3.202466] ESI: 000001ab EDI: c02c7f80 EBP: c1e87d27 ESP: c02c7f20
> > > >> [ 3.203301] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010282
> > > >> [ 3.204258] CR0: 80050033 CR2: bfa1bf98 CR3: 01f28000 CR4: 00140690
> > > >> [ 3.205022] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > > >> [ 3.205919] DR6: fffe0ff0 DR7: 00000400
> > > >> [ 3.206529] Call Trace:
> > > >> [ 3.207011] ? of_platform_sync_state_init+0x13/0x16
> > > >> [ 3.207719] ? do_one_initcall+0xda/0x260
> > > >> [ 3.208247] ? kernel_init_freeable+0x110/0x197
> > > >> [ 3.208906] ? rest_init+0x120/0x120
> > > >> [ 3.209369] ? kernel_init+0xa/0x100
> > > >> [ 3.209775] ? ret_from_fork+0x19/0x24
> > > >> [ 3.210283] ---[ end trace 81d0f2d2ee65199b ]---
> > > >> [ 3.210955] ALSA device list:
> > > >
> > > > Rob/Frank,
> > > >
> > > > This seems to be an issue with the unit test code not properly
> > > > cleaning up the state after it's done.
> > > >
> > > > Specifically, unittest_data_add() setting up of_root on systems where
> > > > there's no device tree (of_root == NULL). It doesn't clean up of_root
> > > > after the tests are done. This affects the of_have_populated_dt() API
> > > > that in turn affects calls to
> > > > device_links_supplier_sync_state_pause/resume(). I think unittests
> > > > shouldn't affect the of_have_populated_dt() API.
> > > There are at least a couple of reasons why the unittest devicetree data
> > > needs to remain after the point where devicetree unittests currently
> > > complete. So cleaning up (removing the data) is not an option.
> > >
> > > I depend on the unittest devicetree entries still existing after the system
> > > boots and I can log into a shell for some validation of the final result of
> > > the devicetree data.
> >
> > IMHO unittests shouldn't have a residual impact on the system after
> > they are done. So, I'll agree to disagree on this one.
>
> They shouldn't be enabled in a production system either. Why would you
> want the extra boot time?

Should we ask the kernel test robot folks to not enable OF unittest
then? It broke my patch, but I wouldn't be surprised if it's silently
breaking other stuff too. I think we need to do option 4 below.

>
> > > There is also a desire for the devicetree unittests to be able to be loaded
> > > as a module. That work is not yet scheduled, but I do not want to preclude
> > > the possibility. If unittests are loaded from a module then they will
> > > need some devicetree data to exist that is created in early boot. That
> > > data will be in the devicetree when of_platform_sync_state_init() is
> > > invoked.
> >
> > On a normal system, FDT is parsed and of_root is set (or not set) very
> > early on during setup_arch() before any of the initcall levels are
> > run. The return value of of_have_populated_dt() isn't expected to
> > change across initcall levels. But because of the way the unittest is
> > written (the of_root is changed at late_initcall() level) the return
> > value of of_have_populated_dt() changes across initcall levels. I
> > think that's a real problem with the unittest -- it's breaking API
> > semantics.
>
> I think what's really desired here is a 'Am I booting using DT' call.

I think the community has decided to use of_have_populated_dt() as
that call. So, we shouldn't break it.

>
> > of_have_populated_dt() is being used to check if DT is present in the
> > system and different things are done based on that. We can't have that
> > value change across initcall levels.
> >
> > Couple of thoughts:
> > 1. Don't run unit test if there is no live DT in the system?
>
> That's pretty much the only case I do run. I use UML to run the tests.

Ah, makes sense.

> > 2. If you don't want to do (1), then at least set up the unit test
> > data during setup_arch() instead of doing it at some initcall level?
>
> That further breaks making it a module. The plan is also to move to
> kunit which probably will preclude some hacky hook into setup_arch().
> Side effects may need to be fixed for kunit though.

Yup.

> > 3. Can you use overlays for the unit tests if they are loaded as a module?
>
> That was the idea, yes.
>
>
> 4. Make running the unittests a command line option instead of running
> if enabled. Still has side effects, but you have to explicitly run it.

Hmm... this is another good option. I think this should be done. Do we
have a consensus on this?

> A module would still be my preference. If only there was someone
> interested in making everything a module... ;)

:)

> > > > I was looking into writing a unittest patch to fix this, but I don't
> > > > know enough about the FDT parsing code to make sure I don't leak any
> > > > memory or free stuff that's in use. I'm not sure I can simply set
> > > > of_root = NULL if it was NULL before the unittest started. Let me know
> > > > how I should proceed or if you plan to write up a patch for this.
> > >
> > > Based on the above, "clean up" of the unittest data is not the solution.
> > >
> > > I haven't looked at the mechanism in device_links_supplier_sync_state_resume()
> > > that leads to the WARN yet. But is does not seem reasonable for that code
> > > to be so sensitive to what valid data is in the devicetree that a WARN results.
> >
> > Sure, I could easily fix it to work around this. But this seems to be
> > a genuine problem with the unittest setup IMO.

I'll go ahead and do this (basically always doing it instead of
checking on of_have_populated_dt()) but I don't want us to forget this
unittest issue.

-Saravana

2019-12-04 04:18:25

by Frank Rowand

[permalink] [raw]
Subject: Re: 5e6669387e ("of/platform: Pause/resume sync state during init .."): [ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume

On 12/3/19 4:50 PM, Saravana Kannan wrote:
> On Tue, Dec 3, 2019 at 1:10 PM Rob Herring <[email protected]> wrote:
>>
>> On Tue, Dec 3, 2019 at 2:05 PM Saravana Kannan <[email protected]> wrote:
>>>
>>> On Tue, Dec 3, 2019 at 1:01 AM Frank Rowand <[email protected]> wrote:
>>>>
>>>> On 12/2/19 3:19 PM, Saravana Kannan wrote:
>>>>> On Sun, Dec 1, 2019 at 7:00 AM kernel test robot <[email protected]> wrote:
>>>>>>
>>>>>> Greetings,
>>>>>>
>>>>>> 0day kernel testing robot got the below dmesg and the first bad commit is
>>>>>>
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>>>>>
>>>>>> commit 5e6669387e2287f25f09fd0abd279dae104cfa7e
>>>>>> Author: Saravana Kannan <[email protected]>
>>>>>> AuthorDate: Wed Sep 4 14:11:24 2019 -0700
>>>>>> Commit: Greg Kroah-Hartman <[email protected]>
>>>>>> CommitDate: Fri Oct 4 17:30:19 2019 +0200
>>>>>>
>>>>>> of/platform: Pause/resume sync state during init and of_platform_populate()
>>>>>>
>>>>>> When all the top level devices are populated from DT during kernel
>>>>>> init, the supplier devices could be added and probed before the
>>>>>> consumer devices are added and linked to the suppliers. To avoid the
>>>>>> sync_state() callback from being called prematurely, pause the
>>>>>> sync_state() callbacks before populating the devices and resume them
>>>>>> at late_initcall_sync().
>>>>>>
>>>>>> Similarly, when children devices are populated from a module using
>>>>>> of_platform_populate(), there could be supplier-consumer dependencies
>>>>>> between the children devices that are populated. To avoid the same
>>>>>> problem with sync_state() being called prematurely, pause and resume
>>>>>> sync_state() callbacks across of_platform_populate().
>>>>>>
>>>>>> Signed-off-by: Saravana Kannan <[email protected]>
>>>>>> Link: https://lore.kernel.org/r/[email protected]
>>>>>> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>>>>>>
>>>>>> fc5a251d0f driver core: Add sync_state driver/bus callback
>>>>>> 5e6669387e of/platform: Pause/resume sync state during init and of_platform_populate()
>>>>>> 81b6b96475 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux; tag 'dma-mapping-5.5' of git://git.infradead.org/users/hch/dma-mapping
>>>>>> +-------------------------------------------------------------------------+------------+------------+------------+
>>>>>> | | fc5a251d0f | 5e6669387e | 81b6b96475 |
>>>>>> +-------------------------------------------------------------------------+------------+------------+------------+
>>>>>> | boot_successes | 30 | 0 | 0 |
>>>>>> | boot_failures | 1 | 11 | 22 |
>>>>>> | Oops:#[##] | 1 | | |
>>>>>> | EIP:unmap_vmas | 1 | | |
>>>>>> | PANIC:double_fault | 1 | | |
>>>>>> | Kernel_panic-not_syncing:Fatal_exception | 1 | | |
>>>>>> | WARNING:at_drivers/base/core.c:#device_links_supplier_sync_state_resume | 0 | 11 | 22 |
>>>>>> | EIP:device_links_supplier_sync_state_resume | 0 | 11 | 22 |
>>>>>> +-------------------------------------------------------------------------+------------+------------+------------+
>>>>>>
>>>>>> If you fix the issue, kindly add following tag
>>>>>> Reported-by: kernel test robot <[email protected]>
>>>>>>
>>>>>> [ 3.186107] OF: /testcase-data/phandle-tests/consumer-b: #phandle-cells = 2 found -1
>>>>>> [ 3.188595] platform testcase-data:testcase-device2: IRQ index 0 not found
>>>>>> [ 3.191047] ### dt-test ### end of unittest - 199 passed, 0 failed
>>>>>> [ 3.191932] ------------[ cut here ]------------
>>>>>> [ 3.192571] Unmatched sync_state pause/resume!
>>>>>> [ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume+0x27/0xc0
>>>>>> [ 3.195084] Modules linked in:
>>>>>> [ 3.195494] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G T 5.4.0-rc1-00005-g5e6669387e228 #1
>>>>>> [ 3.196674] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>>>>>> [ 3.197693] EIP: device_links_supplier_sync_state_resume+0x27/0xc0
>>>>>> [ 3.198680] Code: 00 00 00 3e 8d 74 26 00 57 56 31 d2 53 b8 a0 d0 d9 c1 e8 6c b6 38 00 a1 e4 d0 d9 c1 85 c0 75 13 68 84 ba c4 c1 e8 29 30 b1 ff <0f> 0b 58 eb 7f 8d 74 26 00 83 e8 01 85 c0 a3 e4 d0 d9 c1 75 6f 8b
>>>>>> [ 3.201560] EAX: 00000022 EBX: 00000000 ECX: 00000000 EDX: 00000000
>>>>>> [ 3.202466] ESI: 000001ab EDI: c02c7f80 EBP: c1e87d27 ESP: c02c7f20
>>>>>> [ 3.203301] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010282
>>>>>> [ 3.204258] CR0: 80050033 CR2: bfa1bf98 CR3: 01f28000 CR4: 00140690
>>>>>> [ 3.205022] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>>>>>> [ 3.205919] DR6: fffe0ff0 DR7: 00000400
>>>>>> [ 3.206529] Call Trace:
>>>>>> [ 3.207011] ? of_platform_sync_state_init+0x13/0x16
>>>>>> [ 3.207719] ? do_one_initcall+0xda/0x260
>>>>>> [ 3.208247] ? kernel_init_freeable+0x110/0x197
>>>>>> [ 3.208906] ? rest_init+0x120/0x120
>>>>>> [ 3.209369] ? kernel_init+0xa/0x100
>>>>>> [ 3.209775] ? ret_from_fork+0x19/0x24
>>>>>> [ 3.210283] ---[ end trace 81d0f2d2ee65199b ]---
>>>>>> [ 3.210955] ALSA device list:
>>>>>
>>>>> Rob/Frank,
>>>>>
>>>>> This seems to be an issue with the unit test code not properly
>>>>> cleaning up the state after it's done.
>>>>>
>>>>> Specifically, unittest_data_add() setting up of_root on systems where
>>>>> there's no device tree (of_root == NULL). It doesn't clean up of_root
>>>>> after the tests are done. This affects the of_have_populated_dt() API
>>>>> that in turn affects calls to
>>>>> device_links_supplier_sync_state_pause/resume(). I think unittests
>>>>> shouldn't affect the of_have_populated_dt() API.
>>>> There are at least a couple of reasons why the unittest devicetree data
>>>> needs to remain after the point where devicetree unittests currently
>>>> complete. So cleaning up (removing the data) is not an option.
>>>>
>>>> I depend on the unittest devicetree entries still existing after the system
>>>> boots and I can log into a shell for some validation of the final result of
>>>> the devicetree data.
>>>
>>> IMHO unittests shouldn't have a residual impact on the system after
>>> they are done. So, I'll agree to disagree on this one.
>>
>> They shouldn't be enabled in a production system either. Why would you
>> want the extra boot time?
>
> Should we ask the kernel test robot folks to not enable OF unittest

No. If unittests are breaking other code I want to know that.


> then? It broke my patch, but I wouldn't be surprised if it's silently
> breaking other stuff too. I think we need to do option 4 below.
>
>>
>>>> There is also a desire for the devicetree unittests to be able to be loaded
>>>> as a module. That work is not yet scheduled, but I do not want to preclude
>>>> the possibility. If unittests are loaded from a module then they will
>>>> need some devicetree data to exist that is created in early boot. That
>>>> data will be in the devicetree when of_platform_sync_state_init() is
>>>> invoked.
>>>
>>> On a normal system, FDT is parsed and of_root is set (or not set) very
>>> early on during setup_arch() before any of the initcall levels are
>>> run. The return value of of_have_populated_dt() isn't expected to
>>> change across initcall levels. But because of the way the unittest is
>>> written (the of_root is changed at late_initcall() level) the return
>>> value of of_have_populated_dt() changes across initcall levels. I
>>> think that's a real problem with the unittest -- it's breaking API
>>> semantics.
>>
>> I think what's really desired here is a 'Am I booting using DT' call.
>
> I think the community has decided to use of_have_populated_dt() as
> that call. So, we shouldn't break it.

Have you analyzed each and every use of of_have_populated_dt() to verify
that? I have not yet looked at each of them.

The function was created with one user for a specific purpose and the use
of it has grown over the years. I was not going to modify it to have
the specific meaning of "Am I booting using DT" (thus being able to ignore
the existence of the unittest data in the devicetree) without first examining
each of the users of the of_have_populated_dt(). [[ This possible change was
one of the solutions I considered before I examined what the actual problem
leading to the WARNing was. ]]

>
>>
>>> of_have_populated_dt() is being used to check if DT is present in the
>>> system and different things are done based on that. We can't have that
>>> value change across initcall levels.
>>>
>>> Couple of thoughts:
>>> 1. Don't run unit test if there is no live DT in the system?
>>
>> That's pretty much the only case I do run. I use UML to run the tests.
>
> Ah, makes sense.
>
>>> 2. If you don't want to do (1), then at least set up the unit test
>>> data during setup_arch() instead of doing it at some initcall level?
>>
>> That further breaks making it a module. The plan is also to move to
>> kunit which probably will preclude some hacky hook into setup_arch().
>> Side effects may need to be fixed for kunit though.
>
> Yup.
>
>>> 3. Can you use overlays for the unit tests if they are loaded as a module?
>>
>> That was the idea, yes.
>>
>>
>> 4. Make running the unittests a command line option instead of running
>> if enabled. Still has side effects, but you have to explicitly run it.

I am assuming "command line option" means the kernel boot command line, not
a command line interface.

I would prefer not. It is a debug option. There is no need to add the extra
complexity of an additional switch to control it. Configure it in or configure
it out.


>
> Hmm... this is another good option. I think this should be done. Do we
> have a consensus on this?
Why would you even ask if there was consensus on something that has not
even been discussed?


>> A module would still be my preference. If only there was someone
>> interested in making everything a module... ;)>
> :)
>
>>>>> I was looking into writing a unittest patch to fix this, but I don't
>>>>> know enough about the FDT parsing code to make sure I don't leak any
>>>>> memory or free stuff that's in use. I'm not sure I can simply set
>>>>> of_root = NULL if it was NULL before the unittest started. Let me know
>>>>> how I should proceed or if you plan to write up a patch for this.
>>>>
>>>> Based on the above, "clean up" of the unittest data is not the solution.
>>>>
>>>> I haven't looked at the mechanism in device_links_supplier_sync_state_resume()
>>>> that leads to the WARN yet. But is does not seem reasonable for that code
>>>> to be so sensitive to what valid data is in the devicetree that a WARN results.
>>>
>>> Sure, I could easily fix it to work around this. But this seems to be
>>> a genuine problem with the unittest setup IMO.
>
> I'll go ahead and do this (basically always doing it instead of
> checking on of_have_populated_dt()) but I don't want us to forget this
> unittest issue.

Thank you for planning to do this fix.

The unittest issue will not be forgotten. The possible impacts of unittest on
other users of devicetree is something I am very sensitive to and have
thought about quite a bit.

-Frank


>
> -Saravana
>