Hi,
Munin uses the following command to get sensor-type information out of
SMART-aware disks (e.g. temperature):
/usr/sbin/smartctl -A --nocheck=standby -d ata /dev/sda
This broke following an upgrade from v6.1.76 (as found in Debian 12) to
v6.1.82 (as currently found in the proposed-updates repository for the
next point release of Debian 12), with smartctl's now reporting:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-19-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, http://www.smartmontools.org
Device is in SLEEP mode, exit(2)
This happens on baremetal with 2 pairs of disks:
- 2×ST4000VN008-2DR1 (sda, sdb)
- 2×ST8000VN004-2M21 (sdc, sdd)
and that's an obvious lie with one pair doing system stuff and the other
one doing media stuff.
This also happens within a Debian 12 QEMU VM running on a Debian 12
libvirt host, when using a SATA disk, which is what I've used to test
various builds from the stable/linux-6.1.y branch and associated tags.
Building stable releases, I pinpointed it as a regression between
v6.1.80 and v6.1.81, then pinpointed it to commit cf33e6ca12d8.
#regzbot introduced: v6.1.80..v6.1.81
#regzbot introduced: cf33e6ca12d8
This is also affecting v6.1.84 and v6.1.85 (released during my git
bisect session).
Reported in Debian via: https://bugs.debian.org/1068675 (which included
a trace with the distribution-provided v6.1.82 package).
Most recent trace, with v6.1.85 (mainline, using the distribution's
config but without any patches):
[ 30.547027] ------------[ cut here ]------------
[ 30.547034] WARNING: CPU: 0 PID: 697 at drivers/scsi/scsi_lib.c:214 scsi_execute_cmd+0x42/0x2c0 [scsi_mod]
[ 30.547082] Modules linked in: tls tun intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi aesni_intel snd_hda_codec crypto_simd cryptd rapl snd_hda_core snd_hwdep bochs drm_vram_helper pcspkr drm_ttm_helper snd_pcm iTCO_wdt snd_timer intel_pmc_bxt ttm iTCO_vendor_support snd watchdog soundcore virtio_console virtio_balloon drm_kms_helper button joydev evdev serio_raw sg binfmt_misc fuse loop drm efi_pstore dm_mod configfs qemu_fw_cfg virtio_rng ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid sd_mod t10_pi crc64_rocksoft crc64 crc_t10dif crct10dif_generic ahci libahci virtio_scsi virtio_blk virtio_net net_failover failover xhci_pci crct10dif_pclmul crct10dif_common crc32_pclmul libata crc32c_intel xhci_hcd psmouse i2c_i801 i2c_smbus scsi_mod scsi_common lpc_ich virtio_pci
[ 30.547194] virtio_pci_legacy_dev virtio_pci_modern_dev usbcore usb_common virtio virtio_ring
[ 30.547205] CPU: 0 PID: 697 Comm: smartctl Not tainted 6.1.85 #1
[ 30.547210] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 30.547217] RIP: 0010:scsi_execute_cmd+0x42/0x2c0 [scsi_mod]
[ 30.547247] Code: 55 48 89 fd 53 48 83 ec 10 4c 8b 64 24 50 48 89 0c 24 4d 85 e4 0f 84 02 02 00 00 49 83 3c 24 00 74 24 41 83 7c 24 08 60 74 1c <0f> 0b bd ea ff ff ff 48 83 c4 10 89 e8 5b 5d 41 5c 41 5d 41 5e 41
[ 30.547251] RSP: 0018:ffffa70f80defbd0 EFLAGS: 00010287
[ 30.547256] RAX: ffffa70f80defc30 RBX: ffff9ab18b085000 RCX: 0000000000000000
[ 30.547259] RDX: 0000000000000022 RSI: 0000000000000022 RDI: ffff9ab18b085000
[ 30.547262] RBP: ffff9ab18b085000 R08: 0000000000000000 R09: 00000000000009c4
[ 30.547265] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa70f80defc30
[ 30.547268] R13: 0000000000000000 R14: 00000000000009c4 R15: ffffa70f80defc60
[ 30.547271] FS: 00007f8ee64ad840(0000) GS:ffff9ab1bec00000(0000) knlGS:0000000000000000
[ 30.547275] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 30.547278] CR2: 00007fff08df0bc0 CR3: 000000000439a003 CR4: 0000000000170ef0
[ 30.547291] Call Trace:
[ 30.547296] <TASK>
[ 30.547301] ? __warn+0x7d/0xc0
[ 30.547308] ? scsi_execute_cmd+0x42/0x2c0 [scsi_mod]
[ 30.547338] ? report_bug+0xe2/0x150
[ 30.547348] ? handle_bug+0x41/0x70
[ 30.547354] ? exc_invalid_op+0x13/0x60
[ 30.547358] ? asm_exc_invalid_op+0x16/0x20
[ 30.547368] ? scsi_execute_cmd+0x42/0x2c0 [scsi_mod]
[ 30.547397] ata_cmd_ioctl+0x144/0x2f0 [libata]
[ 30.547448] scsi_ioctl+0x3f5/0x930 [scsi_mod]
[ 30.547477] ? scsi_block_when_processing_errors+0x22/0x100 [scsi_mod]
[ 30.547503] ? __mod_lruvec_page_state+0x93/0x140
[ 30.547508] ? scsi_ioctl_block_when_processing_errors+0x45/0x50 [scsi_mod]
[ 30.547535] blkdev_ioctl+0x133/0x270
[ 30.547553] __x64_sys_ioctl+0x90/0xd0
[ 30.547564] do_syscall_64+0x55/0xb0
[ 30.547574] ? handle_mm_fault+0xdb/0x2d0
[ 30.547582] ? do_user_addr_fault+0x1b0/0x580
[ 30.547589] ? exit_to_user_mode_prepare+0x40/0x1e0
[ 30.547596] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 30.547608] RIP: 0033:0x7f8ee611cc5b
[ 30.547617] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 30.547621] RSP: 002b:00007fff08df0960 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 30.547626] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f8ee611cc5b
[ 30.547629] RDX: 00007fff08df0bc0 RSI: 000000000000031f RDI: 0000000000000003
[ 30.547632] RBP: 00007fff08df1040 R08: 0000000000000000 R09: 0000000000000000
[ 30.547634] R10: e7e85eefeeee1b19 R11: 0000000000000246 R12: 000056348ba28600
[ 30.547637] R13: 00007fff08df0bc0 R14: 00007fff08df12e0 R15: 0000000000000000
[ 30.547642] </TASK>
[ 30.547644] ---[ end trace 0000000000000000 ]---
Cheers,
--
Cyril Brulebois ([email protected]) <https://debamax.com/>
D-I release manager -- Release team member -- Freelance Consultant
On Wed, 2024-04-10 at 21:32 +0200, Cyril Brulebois wrote:
> Hi,
>
> Munin uses the following command to get sensor-type information out
> of SMART-aware disks (e.g. temperature):
>
> /usr/sbin/smartctl -A --nocheck=standby -d ata /dev/sda
>
> This broke following an upgrade from v6.1.76 (as found in Debian 12)
> to v6.1.82 (as currently found in the proposed-updates repository for
> the next point release of Debian 12), with smartctl's now reporting:
>
> smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-19-amd64]
> (local build)
> Copyright (C) 2002-22, Bruce Allen, Christian Franke,
> http://www.smartmontools.org
>
> Device is in SLEEP mode, exit(2)
>
> This happens on baremetal with 2 pairs of disks:
> - 2×ST4000VN008-2DR1 (sda, sdb)
> - 2×ST8000VN004-2M21 (sdc, sdd)
>
> and that's an obvious lie with one pair doing system stuff and the
> other
> one doing media stuff.
>
> This also happens within a Debian 12 QEMU VM running on a Debian 12
> libvirt host, when using a SATA disk, which is what I've used to test
> various builds from the stable/linux-6.1.y branch and associated
> tags.
>
> Building stable releases, I pinpointed it as a regression between
> v6.1.80 and v6.1.81, then pinpointed it to commit cf33e6ca12d8.
>
> #regzbot introduced: v6.1.80..v6.1.81
> #regzbot introduced: cf33e6ca12d8
>
> This is also affecting v6.1.84 and v6.1.85 (released during my git
> bisect session).
>
> Reported in Debian via: https://bugs.debian.org/1068675 (which
> included a trace with the distribution-provided v6.1.82 package).
>
> Most recent trace, with v6.1.85 (mainline, using the distribution's
> config but without any patches):
>
> [ 30.547027] ------------[ cut here ]------------
> [ 30.547034] WARNING: CPU: 0 PID: 697 at
> drivers/scsi/scsi_lib.c:214 scsi_execute_cmd+0x42/0x2c0 [scsi_mod]
> [ 30.547082] Modules linked in: tls tun intel_rapl_msr
> intel_rapl_common kvm_intel kvm irqbypass ghash_clmulni_intel
> sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3
> snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg
> snd_intel_sdw_acpi aesni_intel snd_hda_codec crypto_simd cryptd rapl
> snd_hda_core snd_hwdep bochs drm_vram_helper pcspkr drm_ttm_helper
> snd_pcm iTCO_wdt snd_timer intel_pmc_bxt ttm iTCO_vendor_support snd
> watchdog soundcore virtio_console virtio_balloon drm_kms_helper
> button joydev evdev serio_raw sg binfmt_misc fuse loop drm efi_pstore
> dm_mod configfs qemu_fw_cfg virtio_rng ip_tables x_tables autofs4
> ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid sd_mod
> t10_pi crc64_rocksoft crc64 crc_t10dif crct10dif_generic ahci libahci
> virtio_scsi virtio_blk virtio_net net_failover failover xhci_pci
> crct10dif_pclmul crct10dif_common crc32_pclmul libata crc32c_intel
> xhci_hcd psmouse i2c_i801 i2c_smbus scsi_mod scsi_common lpc_ich
> virtio_pci
> [ 30.547194] virtio_pci_legacy_dev virtio_pci_modern_dev
> usbcore usb_common virtio virtio_ring
> [ 30.547205] CPU: 0 PID: 697 Comm: smartctl Not tainted 6.1.85
> #1
> [ 30.547210] Hardware name: QEMU Standard PC (Q35 + ICH9,
> 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 30.547217] RIP: 0010:scsi_execute_cmd+0x42/0x2c0 [scsi_mod]
This is a different manifestation of the same bug in stable that was
introduced by a backport of scsi_execute_cmd. The proposed fix for the
domain validation problem here will also sort out this problem:
https://lore.kernel.org/linux-scsi/[email protected]/
James
James Bottomley <[email protected]> (2024-04-10):
> This is a different manifestation of the same bug in stable that was
> introduced by a backport of scsi_execute_cmd. The proposed fix for the
> domain validation problem here will also sort out this problem:
>
> https://lore.kernel.org/linux-scsi/[email protected]/
Thanks for the pointer! I've just confirmed this, and I'll follow up
there.
Cheers,
--
Cyril Brulebois ([email protected]) <https://debamax.com/>
D-I release manager -- Release team member -- Freelance Consultant