Hi
The commit 7f99cb5e60392fc3494c610776e733b68784280c ("x86/CPU/AMD: Use
default_groups in kobj_type") causes the following warnings to be printed
during suspend to disk and resume from disk. There are many of these
warnings, 3 for each core.
The machine is two six-core Opterons 8435.
Mikulas
[ 31.349584] PM: hibernation: hibernation entry
[ 31.350319] Filesystems sync: 0.000 seconds
[ 31.350417] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 31.351994] OOM killer disabled.
[ 31.357889] PM: hibernation: Preallocating image memory
[ 34.791852] PM: hibernation: Allocated 735563 pages for snapshot
[ 34.792065] PM: hibernation: Allocated 2942252 kbytes in 3.43 seconds (857.79 MB/s)
[ 34.792296] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
[ 34.793791] printk: Suspending console(s) (use no_console_suspend to debug)
[ 34.795159] serial 00:03: disabled
[ 34.795248] serial 00:02: disabled
[ 34.824316] mptbase: ioc0: pci-suspend: pdev=0x00000000f4bc4e1a, slot=0000:02:06.0, Entering operating state [D3]
[ 35.470390] amdgpu 0000:07:00.0: amdgpu: BACO reset
[ 35.533783] Disabling non-boot CPUs ...
[ 35.535798] smpboot: CPU 1 is now offline
[ 35.537754] ------------[ cut here ]------------
[ 35.537764] kernfs: can not remove 'threshold_limit', no directory
[ 35.537789] WARNING: CPU: 2 PID: 21 at fs/kernfs/dir.c:1555 kernfs_remove_by_name_ns+0xa9/0xc0
[ 35.537812] Modules linked in: ipt_REJECT nf_reject_ipv4 hid_generic nft_chain_nat xt_MASQUERADE xt_tcpudp nft_compat nf_tables libcrc32c crc32c_generic nfnetlink snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_pcm snd_timer usbhid snd hid soundcore cpufreq_userspace cpufreq_ondemand cpufreq_powersave cpufreq_conservative bridge stp llc ohci_pci amd64_edac edac_mce_amd kvm_amd tg3 ohci_hcd ehci_pci ptp ehci_hcd e100 pps_core kvm mii usbcore pata_serverworks k10temp irqbypass libphy i2c_piix4 usb_common pcspkr rtc_cmos floppy acpi_cpufreq processor button mousedev dmi_sysfs nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 lm85 hwmon_vid fuse configfs ip_tables x_tables ipv6 autofs4 spadfs amdgpu drm_ttm_helper ttm hwmon gpu_sched i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm evdev fb font fbdev drm_panel_orientation_quirks backlight
[ 35.537994] CPU: 2 PID: 21 Comm: cpuhp/2 Not tainted 5.17.0-rc2 #1
[ 35.538003] Hardware name: empty empty/S3992-E, BIOS 'V1.06 ' 06/09/2009
[ 35.538008] RIP: 0010:kernfs_remove_by_name_ns+0xa9/0xc0
[ 35.538052] Code: 4c 8b 6c 24 10 4c 8b 74 24 18 48 83 c4 20 c3 4c 89 e7 e8 ba d1 de ff b8 fe ff ff ff eb d9 48 c7 c7 70 6f d6 81 e8 95 30 31 00 <0f> 0b b8 fe ff ff ff eb c4 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
[ 35.538060] RSP: 0018:ffff8881036d7d80 EFLAGS: 00010282
[ 35.538067] RAX: 0000000000000036 RBX: ffffffff820320c8 RCX: ffff88900fc9b3c8
[ 35.538071] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff88900fc9b3c0
[ 35.538076] RBP: 0000000000000000 R08: ffffffff8204da60 R09: 64206f6e202c2774
[ 35.538080] R10: 726964206f6e202c R11: 2774696d696c5f64 R12: ffffffff81c0d820
[ 35.538084] R13: ffffffff81d3b89a R14: ffff888103e04800 R15: ffff88900fd751f0
[ 35.538089] FS: 0000000000000000(0000) GS:ffff88900fc80000(0000) knlGS:0000000000000000
[ 35.538095] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 35.538099] CR2: 00007f8bc5d4f4b0 CR3: 0000000150296000 CR4: 00000000000006e0
[ 35.538104] Call Trace:
[ 35.538110] <TASK>
[ 35.538115] remove_files+0x26/0x60
[ 35.538124] sysfs_remove_group+0x41/0xa0
[ 35.538131] sysfs_remove_groups+0x23/0x40
[ 35.538137] __kobject_del+0x1b/0xa0
[ 35.538146] kobject_del+0xf/0x20
[ 35.538153] mce_threshold_remove_device+0xd4/0x1c0
[ 35.538164] mce_cpu_pre_down+0x5c/0x70
[ 35.538170] ? mce_enable_ce+0x40/0x40
[ 35.538176] cpuhp_invoke_callback+0x2bd/0x460
[ 35.538185] ? __schedule+0x232/0x630
[ 35.538195] ? smpboot_register_percpu_thread+0xd0/0xd0
[ 35.538204] cpuhp_thread_fun+0x75/0x120
[ 35.538212] smpboot_thread_fn+0xc3/0x1c0
[ 35.538220] kthread+0xd1/0x100
[ 35.538229] ? kthread_complete_and_exit+0x20/0x20
[ 35.538238] ret_from_fork+0x1f/0x30
[ 35.538248] </TASK>
[ 35.538250] ---[ end trace 0000000000000000 ]---
[ 35.538254] ------------[ cut here ]------------
[ 35.538256] kernfs: can not remove 'error_count', no directory
[ 35.538271] WARNING: CPU: 2 PID: 21 at fs/kernfs/dir.c:1555 kernfs_remove_by_name_ns+0xa9/0xc0
[ 35.538283] Modules linked in: ipt_REJECT nf_reject_ipv4 hid_generic nft_chain_nat xt_MASQUERADE xt_tcpudp nft_compat nf_tables libcrc32c crc32c_generic nfnetlink snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_pcm snd_timer usbhid snd hid soundcore cpufreq_userspace cpufreq_ondemand cpufreq_powersave cpufreq_conservative bridge stp llc ohci_pci amd64_edac edac_mce_amd kvm_amd tg3 ohci_hcd ehci_pci ptp ehci_hcd e100 pps_core kvm mii usbcore pata_serverworks k10temp irqbypass libphy i2c_piix4 usb_common pcspkr rtc_cmos floppy acpi_cpufreq processor button mousedev dmi_sysfs nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 lm85 hwmon_vid fuse configfs ip_tables x_tables ipv6 autofs4 spadfs amdgpu drm_ttm_helper ttm hwmon gpu_sched i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm evdev fb font fbdev drm_panel_orientation_quirks backlight
[ 35.538429] CPU: 2 PID: 21 Comm: cpuhp/2 Tainted: G W 5.17.0-rc2 #1
[ 35.538436] Hardware name: empty empty/S3992-E, BIOS 'V1.06 ' 06/09/2009
[ 35.538438] RIP: 0010:kernfs_remove_by_name_ns+0xa9/0xc0
[ 35.538447] Code: 4c 8b 6c 24 10 4c 8b 74 24 18 48 83 c4 20 c3 4c 89 e7 e8 ba d1 de ff b8 fe ff ff ff eb d9 48 c7 c7 70 6f d6 81 e8 95 30 31 00 <0f> 0b b8 fe ff ff ff eb c4 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
[ 35.538453] RSP: 0018:ffff8881036d7d80 EFLAGS: 00010282
[ 35.538459] RAX: 0000000000000032 RBX: ffffffff820320d0 RCX: ffff88900fc9b3c8
[ 35.538463] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff88900fc9b3c0
[ 35.538467] RBP: 0000000000000000 R08: ffffffff8204da60 R09: 64206f6e202c2774
[ 35.538471] R10: 6f74636572696420 R11: 6f6e202c27746e75 R12: ffffffff81c0d820
[ 35.538475] R13: ffffffff81d3b8bb R14: ffff888103e04800 R15: ffff88900fd751f0
[ 35.538479] FS: 0000000000000000(0000) GS:ffff88900fc80000(0000) knlGS:0000000000000000
[ 35.538484] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 35.538488] CR2: 00007f8bc5d4f4b0 CR3: 0000000150296000 CR4: 00000000000006e0
[ 35.538493] Call Trace:
[ 35.538494] <TASK>
[ 35.538497] remove_files+0x26/0x60
[ 35.538503] sysfs_remove_group+0x41/0xa0
[ 35.538509] sysfs_remove_groups+0x23/0x40
[ 35.538515] __kobject_del+0x1b/0xa0
[ 35.538521] kobject_del+0xf/0x20
[ 35.538527] mce_threshold_remove_device+0xd4/0x1c0
[ 35.538535] mce_cpu_pre_down+0x5c/0x70
[ 35.538540] ? mce_enable_ce+0x40/0x40
[ 35.538545] cpuhp_invoke_callback+0x2bd/0x460
[ 35.538553] ? __schedule+0x232/0x630
[ 35.538562] ? smpboot_register_percpu_thread+0xd0/0xd0
[ 35.538569] cpuhp_thread_fun+0x75/0x120
[ 35.538577] smpboot_thread_fn+0xc3/0x1c0
[ 35.538585] kthread+0xd1/0x100
[ 35.538592] ? kthread_complete_and_exit+0x20/0x20
[ 35.538601] ret_from_fork+0x1f/0x30
[ 35.538609] </TASK>
[ 35.538611] ---[ end trace 0000000000000000 ]---
[ 35.538614] ------------[ cut here ]------------
[ 35.538615] kernfs: can not remove 'interrupt_enable', no directory
[ 35.538629] WARNING: CPU: 2 PID: 21 at fs/kernfs/dir.c:1555 kernfs_remove_by_name_ns+0xa9/0xc0
[ 35.538641] Modules linked in: ipt_REJECT nf_reject_ipv4 hid_generic nft_chain_nat xt_MASQUERADE xt_tcpudp nft_compat nf_tables libcrc32c crc32c_generic nfnetlink snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_pcm snd_timer usbhid snd hid soundcore cpufreq_userspace cpufreq_ondemand cpufreq_powersave cpufreq_conservative bridge stp llc ohci_pci amd64_edac edac_mce_amd kvm_amd tg3 ohci_hcd ehci_pci ptp ehci_hcd e100 pps_core kvm mii usbcore pata_serverworks k10temp irqbypass libphy i2c_piix4 usb_common pcspkr rtc_cmos floppy acpi_cpufreq processor button mousedev dmi_sysfs nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 lm85 hwmon_vid fuse configfs ip_tables x_tables ipv6 autofs4 spadfs amdgpu drm_ttm_helper ttm hwmon gpu_sched i2c_algo_bit drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm evdev fb font fbdev drm_panel_orientation_quirks backlight
[ 35.538785] CPU: 2 PID: 21 Comm: cpuhp/2 Tainted: G W 5.17.0-rc2 #1
[ 35.538791] Hardware name: empty empty/S3992-E, BIOS 'V1.06 ' 06/09/2009
[ 35.538794] RIP: 0010:kernfs_remove_by_name_ns+0xa9/0xc0
[ 35.538803] Code: 4c 8b 6c 24 10 4c 8b 74 24 18 48 83 c4 20 c3 4c 89 e7 e8 ba d1 de ff b8 fe ff ff ff eb d9 48 c7 c7 70 6f d6 81 e8 95 30 31 00 <0f> 0b b8 fe ff ff ff eb c4 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
[ 35.538808] RSP: 0018:ffff8881036d7d80 EFLAGS: 00010282
[ 35.538813] RAX: 0000000000000037 RBX: ffffffff820320d8 RCX: ffff88900fc9b3c8
[ 35.538817] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff88900fc9b3c0
[ 35.538821] RBP: 0000000000000000 R08: ffffffff8204da60 R09: 64206f6e202c2765
[ 35.538825] R10: 6964206f6e202c27 R11: 656c62616e655f74 R12: ffffffff81c0d820
[ 35.538829] R13: ffffffff81d3b8aa R14: ffff888103e04800 R15: ffff88900fd751f0
[ 35.538833] FS: 0000000000000000(0000) GS:ffff88900fc80000(0000) knlGS:0000000000000000
[ 35.538838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 35.538842] CR2: 00007f8bc5d4f4b0 CR3: 0000000150296000 CR4: 00000000000006e0
[ 35.538846] Call Trace:
[ 35.538848] <TASK>
[ 35.538850] remove_files+0x26/0x60
[ 35.538856] sysfs_remove_group+0x41/0xa0
[ 35.538862] sysfs_remove_groups+0x23/0x40
[ 35.538868] __kobject_del+0x1b/0xa0
[ 35.538874] kobject_del+0xf/0x20
[ 35.538880] mce_threshold_remove_device+0xd4/0x1c0
[ 35.538887] mce_cpu_pre_down+0x5c/0x70
[ 35.538893] ? mce_enable_ce+0x40/0x40
[ 35.538897] cpuhp_invoke_callback+0x2bd/0x460
[ 35.538905] ? __schedule+0x232/0x630
[ 35.538914] ? smpboot_register_percpu_thread+0xd0/0xd0
[ 35.538922] cpuhp_thread_fun+0x75/0x120
[ 35.538929] smpboot_thread_fn+0xc3/0x1c0
[ 35.538937] kthread+0xd1/0x100
[ 35.538944] ? kthread_complete_and_exit+0x20/0x20
[ 35.538953] ret_from_fork+0x1f/0x30
[ 35.538961] </TASK>
[ 35.538963] ---[ end trace 0000000000000000 ]---
On Mon, May 30, 2022 at 12:16:24PM -0400, Mikulas Patocka wrote:
> Hi
>
> The commit 7f99cb5e60392fc3494c610776e733b68784280c ("x86/CPU/AMD: Use
> default_groups in kobj_type") causes the following warnings to be printed
> during suspend to disk and resume from disk. There are many of these
> warnings, 3 for each core.
And if you revert that change it goes back to not warning?
that is odd.
>
> The machine is two six-core Opterons 8435.
>
> Mikulas
>
>
> [ 31.349584] PM: hibernation: hibernation entry
> [ 31.350319] Filesystems sync: 0.000 seconds
> [ 31.350417] Freezing user space processes ... (elapsed 0.001 seconds) done.
> [ 31.351994] OOM killer disabled.
> [ 31.357889] PM: hibernation: Preallocating image memory
> [ 34.791852] PM: hibernation: Allocated 735563 pages for snapshot
> [ 34.792065] PM: hibernation: Allocated 2942252 kbytes in 3.43 seconds (857.79 MB/s)
> [ 34.792296] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
> [ 34.793791] printk: Suspending console(s) (use no_console_suspend to debug)
> [ 34.795159] serial 00:03: disabled
> [ 34.795248] serial 00:02: disabled
> [ 34.824316] mptbase: ioc0: pci-suspend: pdev=0x00000000f4bc4e1a, slot=0000:02:06.0, Entering operating state [D3]
> [ 35.470390] amdgpu 0000:07:00.0: amdgpu: BACO reset
> [ 35.533783] Disabling non-boot CPUs ...
> [ 35.535798] smpboot: CPU 1 is now offline
> [ 35.537754] ------------[ cut here ]------------
> [ 35.537764] kernfs: can not remove 'threshold_limit', no directory
Before you suspend, is this directory (and the other ones) really there?
Are they not getting created now properly somehow? Any warning messages
at boot time?
thanks,
greg k-h
On Mon, 30 May 2022, Greg Kroah-Hartman wrote:
> On Mon, May 30, 2022 at 12:16:24PM -0400, Mikulas Patocka wrote:
> > Hi
> >
> > The commit 7f99cb5e60392fc3494c610776e733b68784280c ("x86/CPU/AMD: Use
> > default_groups in kobj_type") causes the following warnings to be printed
> > during suspend to disk and resume from disk. There are many of these
> > warnings, 3 for each core.
>
> And if you revert that change it goes back to not warning?
>
> that is odd.
If I revert this change on 5.18, I end up with a non-compilable kernel - it
complains that 'struct kobj_type' has no member named 'default_attrs'.
However, I verified that the bug is present on commit
7f99cb5e60392fc3494c610776e733b68784280c and absent on its parent commit
26291c54e111ff6ba87a164d85d4a4e134b7315c.
> >
> > The machine is two six-core Opterons 8435.
> >
> > Mikulas
> >
> >
> > [ 31.349584] PM: hibernation: hibernation entry
> > [ 31.350319] Filesystems sync: 0.000 seconds
> > [ 31.350417] Freezing user space processes ... (elapsed 0.001 seconds) done.
> > [ 31.351994] OOM killer disabled.
> > [ 31.357889] PM: hibernation: Preallocating image memory
> > [ 34.791852] PM: hibernation: Allocated 735563 pages for snapshot
> > [ 34.792065] PM: hibernation: Allocated 2942252 kbytes in 3.43 seconds (857.79 MB/s)
> > [ 34.792296] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
> > [ 34.793791] printk: Suspending console(s) (use no_console_suspend to debug)
> > [ 34.795159] serial 00:03: disabled
> > [ 34.795248] serial 00:02: disabled
> > [ 34.824316] mptbase: ioc0: pci-suspend: pdev=0x00000000f4bc4e1a, slot=0000:02:06.0, Entering operating state [D3]
> > [ 35.470390] amdgpu 0000:07:00.0: amdgpu: BACO reset
> > [ 35.533783] Disabling non-boot CPUs ...
> > [ 35.535798] smpboot: CPU 1 is now offline
> > [ 35.537754] ------------[ cut here ]------------
> > [ 35.537764] kernfs: can not remove 'threshold_limit', no directory
>
> Before you suspend, is this directory (and the other ones) really there?
The files are present both before the suspend and after the
suspend+resume. This is the list of files for one core:
/sys/devices/system/machinecheck/machinecheck0
/sys/devices/system/machinecheck/machinecheck0/bank0
/sys/devices/system/machinecheck/machinecheck0/bank1
/sys/devices/system/machinecheck/machinecheck0/bank2
/sys/devices/system/machinecheck/machinecheck0/bank3
/sys/devices/system/machinecheck/machinecheck0/bank4
/sys/devices/system/machinecheck/machinecheck0/bank5
/sys/devices/system/machinecheck/machinecheck0/cmci_disabled
/sys/devices/system/machinecheck/machinecheck0/dont_log_ce
/sys/devices/system/machinecheck/machinecheck0/check_interval
/sys/devices/system/machinecheck/machinecheck0/ignore_ce
/sys/devices/system/machinecheck/machinecheck0/monarch_timeout
/sys/devices/system/machinecheck/machinecheck0/northbridge
/sys/devices/system/machinecheck/machinecheck0/northbridge/dram
/sys/devices/system/machinecheck/machinecheck0/northbridge/dram/error_count
/sys/devices/system/machinecheck/machinecheck0/northbridge/dram/interrupt_enable
/sys/devices/system/machinecheck/machinecheck0/northbridge/dram/threshold_limit
/sys/devices/system/machinecheck/machinecheck0/northbridge/ht_links
/sys/devices/system/machinecheck/machinecheck0/northbridge/ht_links/error_count
/sys/devices/system/machinecheck/machinecheck0/northbridge/ht_links/interrupt_enable
/sys/devices/system/machinecheck/machinecheck0/northbridge/ht_links/threshold_limit
/sys/devices/system/machinecheck/machinecheck0/northbridge/l3_cache
/sys/devices/system/machinecheck/machinecheck0/northbridge/l3_cache/error_count
/sys/devices/system/machinecheck/machinecheck0/northbridge/l3_cache/interrupt_enable
/sys/devices/system/machinecheck/machinecheck0/northbridge/l3_cache/threshold_limit
/sys/devices/system/machinecheck/machinecheck0/power
/sys/devices/system/machinecheck/machinecheck0/power/autosuspend_delay_ms
/sys/devices/system/machinecheck/machinecheck0/power/control
/sys/devices/system/machinecheck/machinecheck0/power/runtime_active_time
/sys/devices/system/machinecheck/machinecheck0/power/runtime_status
/sys/devices/system/machinecheck/machinecheck0/power/runtime_suspended_time
/sys/devices/system/machinecheck/machinecheck0/print_all
/sys/devices/system/machinecheck/machinecheck0/subsystem
/sys/devices/system/machinecheck/machinecheck0/uevent
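A check like this can also be scripted. Below is a minimal C sketch that
counts how many of the three per-block attribute files exist under a given
directory. The helper name count_block_attrs() is made up for illustration
and is not part of the kernel tree; the attribute names are the ones from
the listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Attribute files that appear under each block directory in the listing. */
static const char *const block_attrs[] = {
	"error_count", "interrupt_enable", "threshold_limit"
};

/*
 * Hypothetical helper: returns how many of the three attribute files
 * exist under dir (0..3). Only tests for existence, never reads them.
 */
static int count_block_attrs(const char *dir)
{
	char path[512];
	int found = 0;

	for (size_t i = 0; i < sizeof(block_attrs) / sizeof(block_attrs[0]); i++) {
		int n = snprintf(path, sizeof(path), "%s/%s", dir, block_attrs[i]);

		if (n < 0 || (size_t)n >= sizeof(path))
			continue;	/* path did not fit; treat as missing */
		if (access(path, F_OK) == 0)
			found++;
	}
	return found;
}
```

Calling it with, for example,
"/sys/devices/system/machinecheck/machinecheck0/northbridge/dram" before
and after a suspend cycle would show whether all three files survive.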
> Are they not getting created now properly somehow? Any warning messages
> at boot time?
There are no warnings on boot.
> thanks,
>
> greg k-h
Mikulas
On Tue, May 31, 2022 at 03:42:12AM -0400, Mikulas Patocka wrote:
...
> > > The machine is two six-core Opterons 8435.
> > >
> > > Mikulas
Hi Mikulas,
I'm not able to reproduce this issue on the systems I have access to. But I
think the following patch may be the solution. Can you please try this?
Thanks,
Yazen
=========
From 811aab8b5eb96d7f62b30d745aeacf74447eeccc Mon Sep 17 00:00:00 2001
From: Yazen Ghannam <[email protected]>
Date: Tue, 31 May 2022 16:29:37 +0000
Subject: [PATCH] x86/MCE/AMD: Delete kobject for block instead of bank

The AMD Thresholding kobjects are laid out such that each "bank" is parent
to one or more "blocks". Systems from Family 10h to 16h have bank4 shared
between logical CPUs that are attached to the same Northbridge. This
sharing behavior is handled when creating and removing kobjects.

During removal, the block kobjects are deleted from each CPU sharing bank4.
The final CPU puts all the block kobjects, which also deletes them, and
then puts the bank kobject.

However, __threshold_remove_blocks() deletes the bank kobject before
deleting the block kobjects. This essentially deletes the parent before all
the children, and this may cause kernel warnings.

Don't delete the bank kobject in __threshold_remove_blocks(). Leave this
for the put at the end of threshold_remove_bank(). Instead delete the block
kobject which is used as the head of the list of blocks, after deleting
all the other blocks in the list. This follows the same behavior seen in
deallocate_threshold_blocks().

Fixes: 019f34fccfd5 ("x86, MCE, AMD: Move shared bank to node descriptor")
Cc: [email protected]
Signed-off-by: Yazen Ghannam <[email protected]>
---
arch/x86/kernel/cpu/mce/amd.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 1c87501e0fa3..cda75aed8ea0 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1258,10 +1258,10 @@ static void __threshold_remove_blocks(struct threshold_bank *b)
 	struct threshold_block *pos = NULL;
 	struct threshold_block *tmp = NULL;
 
-	kobject_del(b->kobj);
-
 	list_for_each_entry_safe(pos, tmp, &b->blocks->miscj, miscj)
 		kobject_del(&pos->kobj);
+
+	kobject_del(&b->blocks->kobj);
 }
 
 static void threshold_remove_bank(struct threshold_bank *bank)
--
2.25.1
On Thu, 2 Jun 2022, Yazen Ghannam wrote:
> On Tue, May 31, 2022 at 03:42:12AM -0400, Mikulas Patocka wrote:
>
> ...
>
> > > > The machine is two six-core Opterons 8435.
> > > >
> > > > Mikulas
>
> Hi Mikulas,
>
> I'm not able to reproduce this issue on the systems I have access to. But I
> think the following patch may be the solution. Can you please try this?
>
> Thanks,
> Yazen
I tried this patch and it doesn't help.
With this patch, it's even worse: before the patch I had 48 warnings per
suspend; with the patch I have 72 warnings.
Mikulas
On Fri, Jun 03, 2022 at 01:34:26PM -0400, Mikulas Patocka wrote:
...
> I tried this patch and it doesn't help.
Thanks Mikulas for testing.
I'm still not able to reproduce the exact issue. But I was able to reproduce
the same symptom by hacking the kernel and doing CPU hotplug.
Can you please try the following patch? This seems to work in my hacked case.
I also tried to write out a detailed description of the issue to the best of
my knowledge.
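To illustrate the double kobject_del() behavior described in the patch
below, here is a rough user-space toy model. It is only a sketch: struct
toy_kobj and toy_kobject_del() are made-up stand-ins, not kernel code. The
only thing it tries to mirror is the order of operations (default groups
are removed first, then the directory), and the attribute names are the
ones from the warnings in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for a kobject; NOT the kernel's struct kobject. */
struct toy_kobj {
	bool has_dir;			/* models kobject->sd != NULL */
	const char *const *attrs;	/* NULL-terminated default attrs, or NULL */
};

/* Attribute names as they appear in the warnings. */
static const char *const threshold_attrs[] = {
	"threshold_limit", "error_count", "interrupt_enable", NULL
};

/*
 * Toy version of kobject_del(): walk the default attributes first and
 * complain about each one if the directory is already gone, then drop
 * the directory. Returns the number of warnings this call printed.
 */
static int toy_kobject_del(struct toy_kobj *k)
{
	int warnings = 0;

	if (k->attrs) {
		for (size_t i = 0; k->attrs[i]; i++) {
			if (!k->has_dir) {
				printf("kernfs: can not remove '%s', no directory\n",
				       k->attrs[i]);
				warnings++;
			}
		}
	}

	k->has_dir = false;
	return warnings;
}
```

In this model, a second toy_kobject_del() on an object with the three
default attributes prints three warnings, while an object without default
attributes stays silent no matter how often it is deleted.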
Thanks,
Yazen
========================
From d1fa5cdc7f29bf810215f0a83f16bc7435e55240 Mon Sep 17 00:00:00 2001
From: Yazen Ghannam <[email protected]>
Date: Mon, 6 Jun 2022 19:45:56 +0000
Subject: [PATCH] x86/MCE/AMD: Decrement threshold_bank refcount when removing
 threshold blocks

AMD systems from Family 10h to 16h share MCA bank 4 across multiple CPUs.
Therefore, the threshold_bank structure for bank 4, and its threshold_block
structures, will be initialized once at boot time. And the kobject for the
shared bank will be added to each of the CPUs that share it. Furthermore,
the threshold_blocks for the shared bank will be added again to the bank's
kobject. These additions will increase the refcount for the bank's kobject.

For example, a shared bank with two blocks and shared across two CPUs will
be set up like this:

CPU0 init
  bank create and add; bank refcount = 1; threshold_create_bank()
    block 0 init and add; bank refcount = 2; allocate_threshold_blocks()
    block 1 init and add; bank refcount = 3; allocate_threshold_blocks()
CPU1 init
  bank add; bank refcount = 3; threshold_create_bank()
    block 0 add; bank refcount = 4; __threshold_add_blocks()
    block 1 add; bank refcount = 5; __threshold_add_blocks()

Currently in threshold_remove_bank(), if the bank is shared then
__threshold_remove_blocks() is called. Here the shared bank's kobject and
the bank's blocks' kobjects are deleted. This is done on the first call
even while the structures are still shared. Subsequent calls from other
CPUs that share the structures will attempt to delete the kobjects.

During kobject_del(), kobject->sd is removed. If the kobject is not part of
a kset with default_groups, then subsequent kobject_del() calls seem safe
even with kobject->sd == NULL.

Originally, the AMD MCA thresholding structures did not use default_groups.
And so the above behavior was not apparent.

However, a recent change implemented default_groups for the thresholding
structures. Therefore, kobject_del() will go down the sysfs_remove_groups()
code path. In this case, the first kobject_del() may succeed and remove
kobject->sd. But subsequent kobject_del() calls will give a WARNing in
kernfs_remove_by_name_ns() since kobject->sd == NULL.

Use kobject_put() on the shared bank's kobject when "removing" blocks. This
decrements the bank's refcount while keeping kobjects enabled until the
bank is no longer shared. At that point, kobject_put() will be called on
the blocks which drives their refcount to 0 and deletes them and also
decrementing the bank's refcount. And finally kobject_put() will be called
on the bank driving its refcount to 0 and deleting it.

With this patch and the example above:

CPU1 shutdown
  bank is shared; bank refcount = 5; threshold_remove_bank()
    block 0 put parent bank; bank refcount = 4; __threshold_remove_blocks()
    block 1 put parent bank; bank refcount = 3; __threshold_remove_blocks()
CPU0 shutdown
  bank is no longer shared; bank refcount = 3; threshold_remove_bank()
    block 0 put block; bank refcount = 2; deallocate_threshold_blocks()
    block 1 put block; bank refcount = 1; deallocate_threshold_blocks()
  put bank; bank refcount = 0; threshold_remove_bank()

Signed-off-by: Yazen Ghannam <[email protected]>
---
arch/x86/kernel/cpu/mce/amd.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 2b7ee4a6c6ba..680b75d23a03 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1260,10 +1260,10 @@ static void __threshold_remove_blocks(struct threshold_bank *b)
 	struct threshold_block *pos = NULL;
 	struct threshold_block *tmp = NULL;
 
-	kobject_del(b->kobj);
+	kobject_put(b->kobj);
 
 	list_for_each_entry_safe(pos, tmp, &b->blocks->miscj, miscj)
-		kobject_del(&pos->kobj);
+		kobject_put(b->kobj);
 }
 
 static void threshold_remove_bank(struct threshold_bank *bank)
--
2.25.1
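
The refcount walk-through in the commit message above can be replayed in
user space. The following is only a toy sketch under the assumptions of
that example (one shared bank, N blocks, one put on the bank per block when
a sharing CPU goes away). struct toy_bank and the toy_*() helpers are
hypothetical names and do not exist in the kernel.

```c
#include <assert.h>

/* Toy refcount for the shared bank's kobject; NOT kernel code. */
struct toy_bank {
	int refcount;
	int nr_blocks;
};

static void toy_get(struct toy_bank *b)
{
	b->refcount++;
}

static void toy_put(struct toy_bank *b)
{
	assert(b->refcount > 0);	/* an underflow would be a model bug */
	b->refcount--;
}

/* First CPU: bank is created (refcount 1), each block pins its parent. */
static void toy_first_cpu_init(struct toy_bank *b, int nr_blocks)
{
	b->refcount = 1;
	b->nr_blocks = nr_blocks;
	for (int i = 0; i < nr_blocks; i++)
		toy_get(b);
}

/* Another CPU sharing the bank: blocks are added again, one ref each. */
static void toy_shared_cpu_init(struct toy_bank *b)
{
	for (int i = 0; i < b->nr_blocks; i++)
		toy_get(b);
}

/* Sharing CPU goes down (patched behavior): one put on the bank per block. */
static void toy_shared_cpu_down(struct toy_bank *b)
{
	for (int i = 0; i < b->nr_blocks; i++)
		toy_put(b);
}

/* Last CPU goes down: each block put drops a parent ref, then the bank is put. */
static void toy_last_cpu_down(struct toy_bank *b)
{
	for (int i = 0; i < b->nr_blocks; i++)
		toy_put(b);
	toy_put(b);
}
```

With two blocks and two CPUs, calling the four helpers in order takes the
counter through 3, 5, 3 and finally 0, matching the tables in the commit
message.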
On Wed, 8 Jun 2022, Yazen Ghannam wrote:
> On Fri, Jun 03, 2022 at 01:34:26PM -0400, Mikulas Patocka wrote:
>
> ...
>
> > I tried this patch and it doesn't help.
>
> Thanks Mikulas for testing.
>
> I'm still not able to reproduce the exact issue. But I was able to reproduce
> the same symptom by hacking the kernel and doing CPU hotplug.
I also see the warnings when disabling cores.
> Can you please try the following patch? This seems to work in my hacked case.
> I also tried to write out a detailed description of the issue to the best of
> my knowledge.
This patch works - there are no longer any warnings on CPU disable or on
suspend to disk.
Mikulas
> Thanks,
> Yazen
>
> ========================
>
> From d1fa5cdc7f29bf810215f0a83f16bc7435e55240 Mon Sep 17 00:00:00 2001
> From: Yazen Ghannam <[email protected]>
> Date: Mon, 6 Jun 2022 19:45:56 +0000
> Subject: [PATCH] x86/MCE/AMD: Decrement threshold_bank refcount when removing
> threshold blocks
>
> AMD systems from Family 10h to 16h share MCA bank 4 across multiple CPUs.
> Therefore, the threshold_bank structure for bank 4, and its threshold_block
> structures, will be initialized once at boot time. And the kobject for the
> shared bank will be added to each of the CPUs that share it. Furthermore,
> the threshold_blocks for the shared bank will be added again to the bank's
> kobject. These additions will increase the refcount for the bank's kobject.
>
> For example, a shared bank with two blocks and shared across two CPUs will
> be set up like this:
>
> CPU0 init
> bank create and add; bank refcount = 1; threshold_create_bank()
> block 0 init and add; bank refcount = 2; allocate_threshold_blocks()
> block 1 init and add; bank refcount = 3; allocate_threshold_blocks()
> CPU1 init
> bank add; bank refcount = 3; threshold_create_bank()
> block 0 add; bank refcount = 4; __threshold_add_blocks()
> block 1 add; bank refcount = 5; __threshold_add_blocks()
>
> Currently in threshold_remove_bank(), if the bank is shared then
> __threshold_remove_blocks() is called. Here the shared bank's kobject and
> the bank's blocks' kobjects are deleted. This is done on the first call
> even while the structures are still shared. Subsequent calls from other
> CPUs that share the structures will attempt to delete the kobjects.
>
> During kobject_del(), kobject->sd is removed. If the kobject is not part of
> a kset with default_groups, then subsequent kobject_del() calls seem safe
> even with kobject->sd == NULL.
>
> Originally, the AMD MCA thresholding structures did not use default_groups.
> And so the above behavior was not apparent.
>
> However, a recent change implemented default_groups for the thresholding
> structures. Therefore, kobject_del() will go down the sysfs_remove_groups()
> code path. In this case, the first kobject_del() may succeed and remove
> kobject->sd. But subsequent kobject_del() calls will give a WARNing in
> kernfs_remove_by_name_ns() since kobject->sd == NULL.
>
> Use kobject_put() on the shared bank's kobject when "removing" blocks. This
> decrements the bank's refcount while keeping kobjects enabled until the
> bank is no longer shared. At that point, kobject_put() will be called on
> the blocks which drives their refcount to 0 and deletes them and also
> decrementing the bank's refcount. And finally kobject_put() will be called
> on the bank driving its refcount to 0 and deleting it.
>
> With this patch and the example above:
>
> CPU1 shutdown
> bank is shared; bank refcount = 5; threshold_remove_bank()
> block 0 put parent bank; bank refcount = 4; __threshold_remove_blocks()
> block 1 put parent bank; bank refcount = 3; __threshold_remove_blocks()
> CPU0 shutdown
> bank is no longer shared; bank refcount = 3; threshold_remove_bank()
> block 0 put block; bank refcount = 2; deallocate_threshold_blocks()
> block 1 put block; bank refcount = 1; deallocate_threshold_blocks()
> put bank; bank refcount = 0; threshold_remove_bank()
>
> Signed-off-by: Yazen Ghannam <[email protected]>
Tested-by: Mikulas Patocka <[email protected]>
> ---
> arch/x86/kernel/cpu/mce/amd.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 2b7ee4a6c6ba..680b75d23a03 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -1260,10 +1260,10 @@ static void __threshold_remove_blocks(struct threshold_bank *b)
> struct threshold_block *pos = NULL;
> struct threshold_block *tmp = NULL;
>
> - kobject_del(b->kobj);
> + kobject_put(b->kobj);
>
> list_for_each_entry_safe(pos, tmp, &b->blocks->miscj, miscj)
> - kobject_del(&pos->kobj);
> + kobject_put(b->kobj);
> }
>
> static void threshold_remove_bank(struct threshold_bank *bank)
> --
> 2.25.1
>
Hi
I upgraded the kernel to 6.4.4 and it still has this bug (see
https://lore.kernel.org/lkml/YqD1YjeovGu28xsP@yaz-fattaah/T/ for the
beginning of this thread).
The patch below fixes these warnings.
I'd like to ask you to submit the patch upstream, with
Fixes: 7f99cb5e6039 ("x86/CPU/AMD: Use default_groups in kobj_type")
Cc: [email protected] # v5.18+
Tested-by: Mikulas Patocka <[email protected]>
Mikulas
On Sat, 11 Jun 2022, Mikulas Patocka wrote:
>
>
> On Wed, 8 Jun 2022, Yazen Ghannam wrote:
>
> > On Fri, Jun 03, 2022 at 01:34:26PM -0400, Mikulas Patocka wrote:
> >
> > ...
> >
> > > I tried this patch and it doesn't help.
> >
> > Thanks Mikulas for testing.
> >
> > I'm still not able to reproduce the exact issue. But I was able to reproduce
> > the same symptom by hacking the kernel and doing CPU hotplug.
>
> I also see the warnings when disabling cores.
>
> > Can you please try the following patch? This seems to work in my hacked case.
> > I also tried to write out a detailed description of the issue to the best of
> > my knowledge.
>
> This patch works - there are no longer any warnings on CPU disable or on
> suspend to disk.
>
> Mikulas
>
> > Thanks,
> > Yazen
> >
> > ========================
> >
> > From d1fa5cdc7f29bf810215f0a83f16bc7435e55240 Mon Sep 17 00:00:00 2001
> > From: Yazen Ghannam <[email protected]>
> > Date: Mon, 6 Jun 2022 19:45:56 +0000
> > Subject: [PATCH] x86/MCE/AMD: Decrement threshold_bank refcount when removing
> > threshold blocks
> >
> > AMD systems from Family 10h to 16h share MCA bank 4 across multiple CPUs.
> > Therefore, the threshold_bank structure for bank 4, and its threshold_block
> > structures, will be initialized once at boot time. And the kobject for the
> > shared bank will be added to each of the CPUs that share it. Furthermore,
> > the threshold_blocks for the shared bank will be added again to the bank's
> > kobject. These additions will increase the refcount for the bank's kobject.
> >
> > For example, a shared bank with two blocks and shared across two CPUs will
> > be set up like this:
> >
> > CPU0 init
> > bank create and add; bank refcount = 1; threshold_create_bank()
> > block 0 init and add; bank refcount = 2; allocate_threshold_blocks()
> > block 1 init and add; bank refcount = 3; allocate_threshold_blocks()
> > CPU1 init
> > bank add; bank refcount = 3; threshold_create_bank()
> > block 0 add; bank refcount = 4; __threshold_add_blocks()
> > block 1 add; bank refcount = 5; __threshold_add_blocks()
> >
> > Currently in threshold_remove_bank(), if the bank is shared then
> > __threshold_remove_blocks() is called. Here the shared bank's kobject and
> > the bank's blocks' kobjects are deleted. This is done on the first call
> > even while the structures are still shared. Subsequent calls from other
> > CPUs that share the structures will attempt to delete the kobjects.
> >
> > During kobject_del(), kobject->sd is removed. If the kobject is not part of
> > a kset with default_groups, then subsequent kobject_del() calls seem safe
> > even with kobject->sd == NULL.
> >
> > Originally, the AMD MCA thresholding structures did not use default_groups.
> > And so the above behavior was not apparent.
> >
> > However, a recent change implemented default_groups for the thresholding
> > structures. Therefore, kobject_del() will go down the sysfs_remove_groups()
> > code path. In this case, the first kobject_del() may succeed and remove
> > kobject->sd. But subsequent kobject_del() calls will give a WARNing in
> > kernfs_remove_by_name_ns() since kobject->sd == NULL.
> >
> > Use kobject_put() on the shared bank's kobject when "removing" blocks. This
> > decrements the bank's refcount while keeping kobjects enabled until the
> > bank is no longer shared. At that point, kobject_put() will be called on
> > the blocks which drives their refcount to 0 and deletes them and also
> > decrementing the bank's refcount. And finally kobject_put() will be called
> > on the bank driving its refcount to 0 and deleting it.
> >
> > With this patch and the example above:
> >
> > CPU1 shutdown
> > bank is shared; bank refcount = 5; threshold_remove_bank()
> > block 0 put parent bank; bank refcount = 4; __threshold_remove_blocks()
> > block 1 put parent bank; bank refcount = 3; __threshold_remove_blocks()
> > CPU0 shutdown
> > bank is no longer shared; bank refcount = 3; threshold_remove_bank()
> > block 0 put block; bank refcount = 2; deallocate_threshold_blocks()
> > block 1 put block; bank refcount = 1; deallocate_threshold_blocks()
> > put bank; bank refcount = 0; threshold_remove_bank()
> >
> > Signed-off-by: Yazen Ghannam <[email protected]>
>
> Tested-by: Mikulas Patocka <[email protected]>
>
> > ---
> > arch/x86/kernel/cpu/mce/amd.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> > index 2b7ee4a6c6ba..680b75d23a03 100644
> > --- a/arch/x86/kernel/cpu/mce/amd.c
> > +++ b/arch/x86/kernel/cpu/mce/amd.c
> > @@ -1260,10 +1260,10 @@ static void __threshold_remove_blocks(struct threshold_bank *b)
> > struct threshold_block *pos = NULL;
> > struct threshold_block *tmp = NULL;
> >
> > - kobject_del(b->kobj);
> > + kobject_put(b->kobj);
> >
> > list_for_each_entry_safe(pos, tmp, &b->blocks->miscj, miscj)
> > - kobject_del(&pos->kobj);
> > + kobject_put(b->kobj);
> > }
> >
> > static void threshold_remove_bank(struct threshold_bank *bank)
> > --
> > 2.25.1
> >
>