2023-09-14 00:30:28

by John David Anglin

Subject: Re: [PATCH] linux/export: fix reference to exported functions for parisc64

On 2023-09-13 7:45 p.m., Damien Le Moal wrote:
> On 9/14/23 06:22, John David Anglin wrote:
>> On 2023-09-13 1:58 p.m., John David Anglin wrote:
>>> On 2023-09-12 5:53 p.m., John David Anglin wrote:
>>>> On 2023-09-10 5:30 p.m., John David Anglin wrote:
>>>>> Hi Masahiro,
>>>>>
>>>>> The attached change fixed boot at ddb5cdbafaaa ????
>>>>>
>>>>> However, v6.5.x boot is still broken:
>>>>>
>>>>> Run /init as init process
>>>>> process '/usr/bin/sh' started with executable stack
>>>>> Loading, please wait...
>>>>> Starting systemd-udevd version 254.1-3
>>>>> e1000 alternatives: applied 0 out of 569 patches
>>>>> e1000: Intel(R) PRO/1000 Network Driver
>>>>> e1000: Copyright (c) 1999-2006 Intel Corporation.
>>>>> scsi_mod alternatives: applied 0 out of 7 patches
>>>>> SCSI subsystem initialized
>>>>> usbcore alternatives: applied 0 out of 18 patches
>>>>> usbcore: registered new interface driver usbfs
>>>>> libata alternatives: applied 0 out of 3 patches
>>>>> usbcore: registered new interface driver hub
>>>>> usbcore: registered new device driver usb
>>>>> mptbase alternatives: applied 0 out of 73 patches
>>>>> ehci_hcd alternatives: applied 0 out of 114 patches
>>>>> sata_sil24 alternatives: applied 0 out of 56 patches
>>>>> Fusion MPT base driver 3.04.20
>>>>> Copyright (c) 1999-2008 LSI Corporation
>>>>> sata_sil24 0000:00:01.0: Applying completion IRQ loss on PCI-X errata fix
>>>>> scsi host0: sata_sil24
>>>>> scsi host1: sata_sil24
>>>>> pata_sil680 0000:60:02.0: sil680: 133MHz clock.
>>>>> scsi host2: sata_sil24
>>>>> ehci_pci alternatives: applied 0 out of 2 patches
>>>>> ohci_hcd alternatives: applied 0 out of 144 patches
>>>>> ehci-pci 0000:60:01.2: EHCI Host Controller
>>>>> scsi host3: pata_sil680
>>>>> ehci-pci 0000:60:01.2: new USB bus registered, assigned bus number 1
>>>>> scsi host4: sata_sil24
>>>>> ata1: SATA max UDMA/100 host m128@0xffffffff80088000 port 0xffffffff80080000 ir6
>>>>> ata2: SATA max UDMA/100 host m128@0xffffffff80088000 port 0xffffffff80082000 ir6
>>>>> ata3: SATA max UDMA/100 host m128@0xffffffff80088000 port 0xffffffff80084000 ir6
>>>>> ata4: SATA max UDMA/100 host m128@0xffffffff80088000 port 0xffffffff80086000 ir6
>>>>> e1000 0000:60:03.0 eth0: (PCI:33MHz:32-bit) 00:11:0a:31:8a:77
>>>>> ehci-pci 0000:60:01.2: irq 71, io mem 0xffffffffb00a1000
>>>>> scsi host5: pata_sil680
>>>>> ata5: PATA max UDMA/133 cmd 0x26058 ctl 0x26064 bmdma 0x26040 irq 72
>>>>> ata6: PATA max UDMA/133 cmd 0x26050 ctl 0x26060 bmdma 0x26048 irq 72
>>>>> e1000 0000:60:03.0 eth0: Intel(R) PRO/1000 Network Connection
>>>>> ehci-pci 0000:60:01.2: USB 2.0 started, EHCI 0.95
>>>>> usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.05
>>>>> usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
>>>>> usb usb1: Product: EHCI Host Controller
>>>>> usb usb1: Manufacturer: Linux 6.5.2-dirty ehci_hcd
>>>>> usb usb1: SerialNumber: 0000:60:01.2
>>>>> hub 1-0:1.0: USB hub found
>>>>> hub 1-0:1.0: 5 ports detected
>>>>> ata1: SATA link down (SStatus 0 SControl 0)
>>>>> ata2: SATA link down (SStatus 0 SControl 0)
>>>>> ata3: SATA link down (SStatus 0 SControl 0)
>>>>> ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
>>>>> ata4.00: ATA-10: ST4000VN008-2DR166, SC60, max UDMA/133
>>>>> ata4.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>>>>> ata4.00: configured for UDMA/100
>>>>> scsi 4:0:0:0: Direct-Access     ATA      ST4000VN008-2DR1 SC60 PQ: 0 ANSI: 5
>>>>> ata6.00: ATAPI: HL-DT-STDVD+-RW GSA-H21L, 1.04, max UDMA/44
>>>>> scsi 5:0:0:0: CD-ROM            HL-DT-ST DVD+-RW GSA-H21L 1.04 PQ: 0 ANSI: 5
>>>>> random: crng init done
>>>>> Timed out for waiting the udev queue being empty.
>>>>> Begin: Loading essential drivers ... done.
>>>>> Begin: Running /scripts/init-premount ... done.
>>>>> Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
>>>>> Begin: Running /scripts/local-premount ... done.
>>>>> Timed out for waiting the udev queue being empty.
>>>>> Begin: Waiting for root file system ... Begin: Running /scripts/local-block ....
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> Begin: Running /scripts/local-block ... done.
>>>>> done.
>>>>> Gave up waiting for root file system device.  Common problems:
>>>>>  - Boot args (cat /proc/cmdline)
>>>>>    - Check rootdelay= (did the system wait long enough?)
>>>>>  - Missing modules (cat /proc/modules; ls /dev)
>>>>> ALERT!  LABEL=ROOT does not exist.  Dropping to a shell!
>>>>> Rebooting automatically due to panic= boot argument
>>>>>
>>>>> I'll see if I can find the commit that breaks 6.5.
>>>> I've traced this to the following merge commit:
>>>>
>>>> dave@atlas:~/linux/linux$ git bisect good
>>>> ca7ce08d6a063e0ccb91dc57f9bc213120d0d1a7 is the first bad commit
>>>> commit ca7ce08d6a063e0ccb91dc57f9bc213120d0d1a7
>>>> Merge: 1546cd4bfda4 af92c02fb209
>>>> Author: Linus Torvalds <[email protected]>
>>>> Date:   Fri Jun 30 11:57:07 2023 -0700
>>>>
>>>>     Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>>>>
>>>>     Pull SCSI updates from James Bottomley:
>>>>      "Updates to the usual drivers (ufs, pm80xx, libata-scsi, smartpqi,
>>>>       lpfc, qla2xxx).
>>>>
>>>>       We have a couple of major core changes impacting other systems:
>>>>
>>>>        - Command Duration Limits, which spills into block and ATA
>>>>
>>>>        - block level Persistent Reservation Operations, which touches block,
>>>>          nvme, target and dm
>>>>
>>>>       Both of these are added with merge commits containing a cover letter
>>>>       explaining what's going on"
>>>>
>>>>     * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (187 commits)
>>>>       scsi: core: Improve warning message in scsi_device_block()
>>>>       scsi: core: Replace scsi_target_block() with scsi_block_targets()
>>>>       scsi: core: Don't wait for quiesce in scsi_device_block()
>>>>       scsi: core: Don't wait for quiesce in scsi_stop_queue()
>>>>       scsi: core: Merge scsi_internal_device_block() and device_block()
>>>>       scsi: sg: Increase number of devices
>>>>       scsi: bsg: Increase number of devices
>>>>       scsi: qla2xxx: Remove unused nvme_ls_waitq wait queue
>>>>       scsi: ufs: ufs-pci: Add support for Intel Arrow Lake
>>>>       scsi: sd: sd_zbc: Use PAGE_SECTORS_SHIFT
>>>>       scsi: ufs: wb: Add explicit flush_threshold sysfs attribute
>>>>       scsi: ufs: ufs-qcom: Switch to the new ICE API
>>>>       scsi: ufs: dt-bindings: qcom: Add ICE phandle
>>>>       scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_RTC quirk
>>>>       scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_INTR quirk
>>>>       scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_RTC
>>>>       scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_INTR
>>>>       scsi: ufs: core: Remove dedicated hwq for dev command
>>>>       scsi: ufs: core: mcq: Fix the incorrect OCS value for the device command
>>>>       scsi: ufs: dt-bindings: samsung,exynos: Drop unneeded quotes
>>>>       ...
>>>>
>>>> dave@atlas:~/linux/linux$ lspci
>>>> 00:01.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)
>>>> 40:01.0 SCSI storage controller: Broadcom / LSI 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
>>>> 40:01.1 SCSI storage controller: Broadcom / LSI 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
>>>> 60:01.0 USB controller: NEC Corporation OHCI USB Controller (rev 41)
>>>> 60:01.1 USB controller: NEC Corporation OHCI USB Controller (rev 41)
>>>> 60:01.2 USB controller: NEC Corporation uPD72010x USB 2.0 Controller (rev 02)
>>>> 60:02.0 IDE interface: Silicon Image, Inc. PCI0680 Ultra ATA-133 Host Controller (rev 02)
>>>> 60:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
>>> This was introduced by the following commit:
>>>
>>> dave@atlas:~/linux/linux$ git bisect good
>>> 624885209f31eb9985bf51abe204ecbffe2fdeea is the first bad commit
>>> commit 624885209f31eb9985bf51abe204ecbffe2fdeea
>>> Author: Damien Le Moal <[email protected]>
>>> Date:   Thu May 11 03:13:41 2023 +0200
>>>
>>>     scsi: core: Detect support for command duration limits
>>>
>>>     Introduce the function scsi_cdl_check() to detect if a device supports
>>>     command duration limits (CDL). Support for the READ 16, WRITE 16, READ 32
>>>     and WRITE 32 commands are checked using the function scsi_report_opcode()
>>>     to probe the rwcdlp and cdlp bits as they indicate the mode page defining
>>>     the command duration limits descriptors that apply to the command being
>>>     tested.
>>>
>>>     If any of these commands support CDL, the field cdl_supported of struct
>>>     scsi_device is set to 1 to indicate that the device supports CDL.
>>>
>>>     Support for CDL for a device is advertizes through sysfs using the new
>>>     cdl_supported device attribute. This attribute value is 1 for a device
>>>     supporting CDL and 0 otherwise.
>>>
>>>     Signed-off-by: Damien Le Moal <[email protected]>
>>>     Reviewed-by: Hannes Reinecke <[email protected]>
>>>     Co-developed-by: Niklas Cassel <[email protected]>
>>>     Signed-off-by: Niklas Cassel <[email protected]>
>>>     Link: https://lore.kernel.org/r/[email protected]
>>>     Signed-off-by: Martin K. Petersen <[email protected]>
>>>
>>>  Documentation/ABI/testing/sysfs-block-device |  9 ++++
>>>  drivers/scsi/scsi.c                          | 81 ++++++++++++++++++++++++++++
>>>  drivers/scsi/scsi_scan.c                     |  3 ++
>>>  drivers/scsi/scsi_sysfs.c                    |  2 +
>>>  include/scsi/scsi_device.h                   |  3 ++
>>>  5 files changed, 98 insertions(+)
>>>
>>> Sometimes I see when booting a bad commit:
>>> [...]
>>> Begin: Running /scripts/local-block ... done.
>>> Begin: Running /scripts/local-block ... done.
>>> Begin: Running /scripts/local-block ... done.
>>> done.
>>> Gave up waiting for root file system device.  Common problems:
>>>  - Boot args (cat /proc/cmdline)
>>>    - Check rootdelay= (did the system wait long enough?)
>>>  - Missing modules (cat /proc/modules; ls /dev)
>>> ALERT!  LABEL=ROOT does not exist.  Dropping to a shell!
>>> Rebooting automatically due to panic= boot argument
>>> ata4: SATA link down (SStatus 0 SControl 0)
>>> ata5: SATA link down (SStatus 0 SControl 0)
>>> ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
>>> ata6.00: ATA-10: ST4000VN008-2DR166, SC60, max UDMA/133
>>> ata6.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>>> ata6.00: configured for UDMA/100
>>> scsi 5:0:0:0: Direct-Access     ATA      ST4000VN008-2DR1 SC60 PQ: 0 ANSI: 5
>> System boots master at e56b2b605799 if I disable CDL:
>>
>> dave@atlas:~/linux/linux$ git diff drivers/scsi/scsi.c
>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>> index d0911bc28663..dc3a283ebd75 100644
>> --- a/drivers/scsi/scsi.c
>> +++ b/drivers/scsi/scsi.c
>> @@ -578,6 +578,8 @@ static bool scsi_cdl_check_cmd(struct scsi_device *sdev, u8 opcode, u16 sa,
>>         int ret;
>>         u8 cdlp;
>>
>> +       return false;
>> +
>>         /* Check operation code */
>>         ret = scsi_report_opcode(sdev, buf, SCSI_CDL_CHECK_BUF_LEN, opcode, sa);
>>         if (ret <= 0)
> It is weird that this solves anything... the MAINTENANCE_IN command issued by
> scsi_report_opcode() ends up being emulated in libata with
> ata_scsiop_maint_in(). There are no actual commands issued to the drive, so
> nothing that could actually fail/cause issues. By the time this is issued, the
> ATA drive is also fully probed...
>
> Or is the drive connected to the Broadcom HBA you have ? In that case, libata is
> not used and the HBA FW SAT (scsi-ata-translation) is likely to blame.
/boot, / and swap partitions reside on a ST373207LW drive connected to a Broadcom HBA.  A
ST4000VN008-2DR1 drive is connected to the Silicon Image, Inc. SiI 3124 PCI-X Serial
ATA Controller.  It mounts on /home.  There's also a cdrom connected to the Silicon
Image, Inc. PCI0680 Ultra ATA-133 Host Controller and another ST4000VN008-2DR1 drive
connected to a Broadcom HBA.  There are two Broadcom HBAs.

I think the issue is with the root ST373207LW drive.  The console output indicates that the
ROOT drive doesn't exist when the boot fails.

Your change only appeared to affect actual SCSI drives.  That's why I tried disabling CDL.
>
> Could you send a full dmesg output for a clean boot and for a failed one so that
> I can compare ?
I'll try to get this together tomorrow.

Dave

--
John David Anglin [email protected]


2023-09-14 01:17:30

by Damien Le Moal

Subject: Re: [PATCH] linux/export: fix reference to exported functions for parisc64

On 9/14/23 09:29, John David Anglin wrote:
> On 2023-09-13 7:45 p.m., Damien Le Moal wrote:
>> On 9/14/23 06:22, John David Anglin wrote:
>>> On 2023-09-13 1:58 p.m., John David Anglin wrote:
>>>> On 2023-09-12 5:53 p.m., John David Anglin wrote:
>>>>> On 2023-09-10 5:30 p.m., John David Anglin wrote:
>>>>>> Hi Masahiro,
>>>>>>
>>>>>> The attached change fixed boot at ddb5cdbafaaa ????
>>>>>>
>>>>>> However, v6.5.x boot is still broken:
>>>>>>
>>>>>> Run /init as init process
>>>>>> process '/usr/bin/sh' started with executable stack
>>>>>> Loading, please wait...
>>>>>> Starting systemd-udevd version 254.1-3
>>>>>> e1000 alternatives: applied 0 out of 569 patches
>>>>>> e1000: Intel(R) PRO/1000 Network Driver
>>>>>> e1000: Copyright (c) 1999-2006 Intel Corporation.
>>>>>> scsi_mod alternatives: applied 0 out of 7 patches
>>>>>> SCSI subsystem initialized
>>>>>> usbcore alternatives: applied 0 out of 18 patches
>>>>>> usbcore: registered new interface driver usbfs
>>>>>> libata alternatives: applied 0 out of 3 patches
>>>>>> usbcore: registered new interface driver hub
>>>>>> usbcore: registered new device driver usb
>>>>>> mptbase alternatives: applied 0 out of 73 patches
>>>>>> ehci_hcd alternatives: applied 0 out of 114 patches
>>>>>> sata_sil24 alternatives: applied 0 out of 56 patches
>>>>>> Fusion MPT base driver 3.04.20
>>>>>> Copyright (c) 1999-2008 LSI Corporation
>>>>>> sata_sil24 0000:00:01.0: Applying completion IRQ loss on PCI-X errata fix
>>>>>> scsi host0: sata_sil24
>>>>>> scsi host1: sata_sil24
>>>>>> pata_sil680 0000:60:02.0: sil680: 133MHz clock.
>>>>>> scsi host2: sata_sil24
>>>>>> ehci_pci alternatives: applied 0 out of 2 patches
>>>>>> ohci_hcd alternatives: applied 0 out of 144 patches
>>>>>> ehci-pci 0000:60:01.2: EHCI Host Controller
>>>>>> scsi host3: pata_sil680
>>>>>> ehci-pci 0000:60:01.2: new USB bus registered, assigned bus number 1
>>>>>> scsi host4: sata_sil24
>>>>>> ata1: SATA max UDMA/100 host m128@0xffffffff80088000 port 0xffffffff80080000 ir6
>>>>>> ata2: SATA max UDMA/100 host m128@0xffffffff80088000 port 0xffffffff80082000 ir6
>>>>>> ata3: SATA max UDMA/100 host m128@0xffffffff80088000 port 0xffffffff80084000 ir6
>>>>>> ata4: SATA max UDMA/100 host m128@0xffffffff80088000 port 0xffffffff80086000 ir6
>>>>>> e1000 0000:60:03.0 eth0: (PCI:33MHz:32-bit) 00:11:0a:31:8a:77
>>>>>> ehci-pci 0000:60:01.2: irq 71, io mem 0xffffffffb00a1000
>>>>>> scsi host5: pata_sil680
>>>>>> ata5: PATA max UDMA/133 cmd 0x26058 ctl 0x26064 bmdma 0x26040 irq 72
>>>>>> ata6: PATA max UDMA/133 cmd 0x26050 ctl 0x26060 bmdma 0x26048 irq 72
>>>>>> e1000 0000:60:03.0 eth0: Intel(R) PRO/1000 Network Connection
>>>>>> ehci-pci 0000:60:01.2: USB 2.0 started, EHCI 0.95
>>>>>> usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.05
>>>>>> usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
>>>>>> usb usb1: Product: EHCI Host Controller
>>>>>> usb usb1: Manufacturer: Linux 6.5.2-dirty ehci_hcd
>>>>>> usb usb1: SerialNumber: 0000:60:01.2
>>>>>> hub 1-0:1.0: USB hub found
>>>>>> hub 1-0:1.0: 5 ports detected
>>>>>> ata1: SATA link down (SStatus 0 SControl 0)
>>>>>> ata2: SATA link down (SStatus 0 SControl 0)
>>>>>> ata3: SATA link down (SStatus 0 SControl 0)
>>>>>> ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
>>>>>> ata4.00: ATA-10: ST4000VN008-2DR166, SC60, max UDMA/133
>>>>>> ata4.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>>>>>> ata4.00: configured for UDMA/100
>>>>>> scsi 4:0:0:0: Direct-Access     ATA      ST4000VN008-2DR1 SC60 PQ: 0 ANSI: 5
>>>>>> ata6.00: ATAPI: HL-DT-STDVD+-RW GSA-H21L, 1.04, max UDMA/44
>>>>>> scsi 5:0:0:0: CD-ROM            HL-DT-ST DVD+-RW GSA-H21L 1.04 PQ: 0 ANSI: 5
>>>>>> random: crng init done
>>>>>> Timed out for waiting the udev queue being empty.
>>>>>> Begin: Loading essential drivers ... done.
>>>>>> Begin: Running /scripts/init-premount ... done.
>>>>>> Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
>>>>>> Begin: Running /scripts/local-premount ... done.
>>>>>> Timed out for waiting the udev queue being empty.
>>>>>> Begin: Waiting for root file system ... Begin: Running /scripts/local-block ....
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> Begin: Running /scripts/local-block ... done.
>>>>>> done.
>>>>>> Gave up waiting for root file system device.  Common problems:
>>>>>>  - Boot args (cat /proc/cmdline)
>>>>>>    - Check rootdelay= (did the system wait long enough?)
>>>>>>  - Missing modules (cat /proc/modules; ls /dev)
>>>>>> ALERT!  LABEL=ROOT does not exist.  Dropping to a shell!
>>>>>> Rebooting automatically due to panic= boot argument
>>>>>>
>>>>>> I'll see if I can find the commit that breaks 6.5.
>>>>> I've traced this to the following merge commit:
>>>>>
>>>>> dave@atlas:~/linux/linux$ git bisect good
>>>>> ca7ce08d6a063e0ccb91dc57f9bc213120d0d1a7 is the first bad commit
>>>>> commit ca7ce08d6a063e0ccb91dc57f9bc213120d0d1a7
>>>>> Merge: 1546cd4bfda4 af92c02fb209
>>>>> Author: Linus Torvalds <[email protected]>
>>>>> Date:   Fri Jun 30 11:57:07 2023 -0700
>>>>>
>>>>>     Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>>>>>
>>>>>     Pull SCSI updates from James Bottomley:
>>>>>      "Updates to the usual drivers (ufs, pm80xx, libata-scsi, smartpqi,
>>>>>       lpfc, qla2xxx).
>>>>>
>>>>>       We have a couple of major core changes impacting other systems:
>>>>>
>>>>>        - Command Duration Limits, which spills into block and ATA
>>>>>
>>>>>        - block level Persistent Reservation Operations, which touches block,
>>>>>          nvme, target and dm
>>>>>
>>>>>       Both of these are added with merge commits containing a cover letter
>>>>>       explaining what's going on"
>>>>>
>>>>>     * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (187 commits)
>>>>>       scsi: core: Improve warning message in scsi_device_block()
>>>>>       scsi: core: Replace scsi_target_block() with scsi_block_targets()
>>>>>       scsi: core: Don't wait for quiesce in scsi_device_block()
>>>>>       scsi: core: Don't wait for quiesce in scsi_stop_queue()
>>>>>       scsi: core: Merge scsi_internal_device_block() and device_block()
>>>>>       scsi: sg: Increase number of devices
>>>>>       scsi: bsg: Increase number of devices
>>>>>       scsi: qla2xxx: Remove unused nvme_ls_waitq wait queue
>>>>>       scsi: ufs: ufs-pci: Add support for Intel Arrow Lake
>>>>>       scsi: sd: sd_zbc: Use PAGE_SECTORS_SHIFT
>>>>>       scsi: ufs: wb: Add explicit flush_threshold sysfs attribute
>>>>>       scsi: ufs: ufs-qcom: Switch to the new ICE API
>>>>>       scsi: ufs: dt-bindings: qcom: Add ICE phandle
>>>>>       scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_RTC quirk
>>>>>       scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_INTR quirk
>>>>>       scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_RTC
>>>>>       scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_INTR
>>>>>       scsi: ufs: core: Remove dedicated hwq for dev command
>>>>>       scsi: ufs: core: mcq: Fix the incorrect OCS value for the device command
>>>>>       scsi: ufs: dt-bindings: samsung,exynos: Drop unneeded quotes
>>>>>       ...
>>>>>
>>>>> dave@atlas:~/linux/linux$ lspci
>>>>> 00:01.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 02)
>>>>> 40:01.0 SCSI storage controller: Broadcom / LSI 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
>>>>> 40:01.1 SCSI storage controller: Broadcom / LSI 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
>>>>> 60:01.0 USB controller: NEC Corporation OHCI USB Controller (rev 41)
>>>>> 60:01.1 USB controller: NEC Corporation OHCI USB Controller (rev 41)
>>>>> 60:01.2 USB controller: NEC Corporation uPD72010x USB 2.0 Controller (rev 02)
>>>>> 60:02.0 IDE interface: Silicon Image, Inc. PCI0680 Ultra ATA-133 Host Controller (rev 02)
>>>>> 60:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
>>>> This was introduced by the following commit:
>>>>
>>>> dave@atlas:~/linux/linux$ git bisect good
>>>> 624885209f31eb9985bf51abe204ecbffe2fdeea is the first bad commit
>>>> commit 624885209f31eb9985bf51abe204ecbffe2fdeea
>>>> Author: Damien Le Moal <[email protected]>
>>>> Date:   Thu May 11 03:13:41 2023 +0200
>>>>
>>>>     scsi: core: Detect support for command duration limits
>>>>
>>>>     Introduce the function scsi_cdl_check() to detect if a device supports
>>>>     command duration limits (CDL). Support for the READ 16, WRITE 16, READ 32
>>>>     and WRITE 32 commands are checked using the function scsi_report_opcode()
>>>>     to probe the rwcdlp and cdlp bits as they indicate the mode page defining
>>>>     the command duration limits descriptors that apply to the command being
>>>>     tested.
>>>>
>>>>     If any of these commands support CDL, the field cdl_supported of struct
>>>>     scsi_device is set to 1 to indicate that the device supports CDL.
>>>>
>>>>     Support for CDL for a device is advertizes through sysfs using the new
>>>>     cdl_supported device attribute. This attribute value is 1 for a device
>>>>     supporting CDL and 0 otherwise.
>>>>
>>>>     Signed-off-by: Damien Le Moal <[email protected]>
>>>>     Reviewed-by: Hannes Reinecke <[email protected]>
>>>>     Co-developed-by: Niklas Cassel <[email protected]>
>>>>     Signed-off-by: Niklas Cassel <[email protected]>
>>>>     Link: https://lore.kernel.org/r/[email protected]
>>>>     Signed-off-by: Martin K. Petersen <[email protected]>
>>>>
>>>>  Documentation/ABI/testing/sysfs-block-device |  9 ++++
>>>>  drivers/scsi/scsi.c                          | 81 ++++++++++++++++++++++++++++
>>>>  drivers/scsi/scsi_scan.c                     |  3 ++
>>>>  drivers/scsi/scsi_sysfs.c                    |  2 +
>>>>  include/scsi/scsi_device.h                   |  3 ++
>>>>  5 files changed, 98 insertions(+)
>>>>
>>>> Sometimes I see when booting a bad commit:
>>>> [...]
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> Begin: Running /scripts/local-block ... done.
>>>> done.
>>>> Gave up waiting for root file system device.  Common problems:
>>>>  - Boot args (cat /proc/cmdline)
>>>>    - Check rootdelay= (did the system wait long enough?)
>>>>  - Missing modules (cat /proc/modules; ls /dev)
>>>> ALERT!  LABEL=ROOT does not exist.  Dropping to a shell!
>>>> Rebooting automatically due to panic= boot argument
>>>> ata4: SATA link down (SStatus 0 SControl 0)
>>>> ata5: SATA link down (SStatus 0 SControl 0)
>>>> ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
>>>> ata6.00: ATA-10: ST4000VN008-2DR166, SC60, max UDMA/133
>>>> ata6.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>>>> ata6.00: configured for UDMA/100
>>>> scsi 5:0:0:0: Direct-Access     ATA      ST4000VN008-2DR1 SC60 PQ: 0 ANSI: 5
>>> System boots master at e56b2b605799 if I disable CDL:
>>>
>>> dave@atlas:~/linux/linux$ git diff drivers/scsi/scsi.c
>>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>>> index d0911bc28663..dc3a283ebd75 100644
>>> --- a/drivers/scsi/scsi.c
>>> +++ b/drivers/scsi/scsi.c
>>> @@ -578,6 +578,8 @@ static bool scsi_cdl_check_cmd(struct scsi_device *sdev, u8 opcode, u16 sa,
>>>         int ret;
>>>         u8 cdlp;
>>>
>>> +       return false;
>>> +
>>>         /* Check operation code */
>>>         ret = scsi_report_opcode(sdev, buf, SCSI_CDL_CHECK_BUF_LEN, opcode, sa);
>>>         if (ret <= 0)
>> It is weird that this solves anything... the MAINTENANCE_IN command issued by
>> scsi_report_opcode() ends up being emulated in libata with
>> ata_scsiop_maint_in(). There are no actual commands issued to the drive, so
>> nothing that could actually fail/cause issues. By the time this is issued, the
>> ATA drive is also fully probed...
>>
>> Or is the drive connected to the Broadcom HBA you have ? In that case, libata is
>> not used and the HBA FW SAT (scsi-ata-translation) is likely to blame.
> /boot, / and swap partitions reside on a ST373207LW drive connected to a Broadcom HBA.  A
> ST4000VN008-2DR1 drive is connected to the  Silicon Image, Inc. SiI 3124 PCI-X Serial
> ATA Controller.  It mounts on /home.  There's also a cdrom connected to the Silicon
> Image, Inc. PCI0680 Ultra ATA-133 Host Controller and another ST4000VN008-2DR1 drive
> connected to a Broadcom HBA.  There are two Broadcom HBAs.
>
> I think the issue is with the root ST373207LW drive.  The console output indicates that the
> ROOT drive doesn't exist when the boot fails.
>
> Your change only appeared to affect actual SCSI drives.  That's why I tried disabling CDL.
>>
>> Could you send a full dmesg output for a clean boot and for a failed one so that
>> I can compare ?
> I'll try to get this together tomorrow.

Please also tell me the scsi_level reported for that drive (cat
/sys/block/sdX/device/scsi_level and output of sg_inq /dev/sdX).
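
For scripting the check above, here is a minimal C sketch that reads the
attribute. It assumes the sysfs file holds a single small decimal integer; the
function names are hypothetical and the path is supplied by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text of a scsi_level attribute (assumed to be a small decimal
 * integer followed by optional whitespace). Returns the value, or -1 if
 * the text does not match that assumption.
 */
int parse_scsi_level(const char *text)
{
	char *end;
	long val;

	assert(text != NULL);

	errno = 0;
	val = strtol(text, &end, 10);
	if (errno != 0 || end == text || val < 0 || val > 255)
		return -1;

	/* Accept only trailing whitespace after the number. */
	while (*end == '\n' || *end == ' ' || *end == '\t')
		end++;
	if (*end != '\0')
		return -1;

	return (int)val;
}

/*
 * Read the attribute from a caller-supplied path, for example
 * /sys/block/sda/device/scsi_level. Returns -1 on any error.
 */
int read_scsi_level(const char *path)
{
	char buf[32];
	FILE *f;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	return parse_scsi_level(buf);
}
```

Calling read_scsi_level() with the path for the root drive returns the reported
level, or -1 if the file is missing or malformed.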

Thanks !

>
> Dave
>

--
Damien Le Moal
Western Digital Research

2023-09-14 11:25:59

by Damien Le Moal

Subject: Re: [PATCH] linux/export: fix reference to exported functions for parisc64

On 9/14/23 09:29, John David Anglin wrote:
>>> dave@atlas:~/linux/linux$ git diff drivers/scsi/scsi.c
>>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>>> index d0911bc28663..dc3a283ebd75 100644
>>> --- a/drivers/scsi/scsi.c
>>> +++ b/drivers/scsi/scsi.c
>>> @@ -578,6 +578,8 @@ static bool scsi_cdl_check_cmd(struct scsi_device *sdev, u8 opcode, u16 sa,
>>>         int ret;
>>>         u8 cdlp;
>>>
>>> +       return false;
>>> +
>>>         /* Check operation code */
>>>         ret = scsi_report_opcode(sdev, buf, SCSI_CDL_CHECK_BUF_LEN, opcode, sa);
>>>         if (ret <= 0)
>> It is weird that this solves anything... the MAINTENANCE_IN command issued by
>> scsi_report_opcode() ends up being emulated in libata with
>> ata_scsiop_maint_in(). There are no actual commands issued to the drive, so
>> nothing that could actually fail/cause issues. By the time this is issued, the
>> ATA drive is also fully probed...
>>
>> Or is the drive connected to the Broadcom HBA you have ? In that case, libata is
>> not used and the HBA FW SAT (scsi-ata-translation) is likely to blame.
> /boot, / and swap partitions reside on a ST373207LW drive connected to a Broadcom HBA.  A
> ST4000VN008-2DR1 drive is connected to the  Silicon Image, Inc. SiI 3124 PCI-X Serial
> ATA Controller.  It mounts on /home.  There's also a cdrom connected to the Silicon
> Image, Inc. PCI0680 Ultra ATA-133 Host Controller and another ST4000VN008-2DR1 drive
> connected to a Broadcom HBA.  There are two Broadcom HBAs.
>
> I think the issue is with the root ST373207LW drive.  The console output indicates that the
> ROOT drive doesn't exist when the boot fails.
>
> Your change only appeared to affect actual SCSI drives.  That's why I tried disabling CDL.

OK. I can see from the dmesg snippets you sent that the drives on the ATA ports
seem OK. A quick search tells me that the ST373207LW drive is an Ultra320 SCSI
drive, not ATA. So the MAINTENANCE_IN command issued by scsi_report_opcode()
will be sent straight to the drive as-is.

This command has been issued to devices for a long time, and given that
your system was working, the drive is probably fine with it in its simplest form
(one command format). The CDL changes, however, added probing of command support
with the service action field (one command format with service action). What may
be happening is that the drive does not like or does not support that format and
chokes on it.
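
To make the difference between the two request formats concrete, here is a
standalone C sketch that builds the 12-byte CDB for each. This is not the kernel
code: the demo_* names are hypothetical, and the byte layout and reporting
option values (1 for one command format, 2 for one command format with service
action) are assumptions based on my reading of SPC. The value the kernel
actually uses may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAINTENANCE_IN       0xa3 /* MAINTENANCE IN opcode */
#define DEMO_MI_REPORT_SUPPORTED  0x0c /* REPORT SUPPORTED OPERATION CODES */
#define DEMO_RSOC_CDB_LEN         12

/*
 * Fill in a REPORT SUPPORTED OPERATION CODES CDB asking about one command.
 * With sa == 0 the simple "one command" format is requested; otherwise the
 * format that also carries the requested service action is used.
 */
void demo_build_rsoc_cdb(uint8_t cdb[DEMO_RSOC_CDB_LEN], uint8_t opcode,
			 uint16_t sa, uint32_t alloc_len)
{
	assert(cdb != NULL);

	memset(cdb, 0, DEMO_RSOC_CDB_LEN);
	cdb[0] = DEMO_MAINTENANCE_IN;
	cdb[1] = DEMO_MI_REPORT_SUPPORTED;
	cdb[2] = sa ? 2 : 1;		/* reporting options (assumed values) */
	cdb[3] = opcode;		/* requested operation code */
	if (sa) {
		/* requested service action, big endian */
		cdb[4] = (uint8_t)(sa >> 8);
		cdb[5] = (uint8_t)(sa & 0xff);
	}
	/* allocation length, big endian */
	cdb[6] = (uint8_t)(alloc_len >> 24);
	cdb[7] = (uint8_t)(alloc_len >> 16);
	cdb[8] = (uint8_t)(alloc_len >> 8);
	cdb[9] = (uint8_t)(alloc_len & 0xff);
	/* cdb[10] is reserved and cdb[11] is the control byte; both stay 0 */
}
```

An old drive that only understands the first format would see an unfamiliar
value in byte 2 whenever a service action is given, which fits the theory
above.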

Let me check the specs to see which SCSI level supports this format. What is
certain is that Ultra320 SCSI disks will definitely *not* support CDL, so we
could exit early in scsi_cdl_check_cmd(), returning false for drives that report
an old SCSI level.
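
The early-exit idea can be sketched in standalone C as below. This is only an
illustration, not the actual patch: the demo_* types, the level numbering and
the cutoff (DEMO_CDL_MIN_LEVEL) are all hypothetical, and the right threshold
still has to be confirmed against the specs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical level numbering, for illustration only. */
enum demo_scsi_level {
	DEMO_SCSI_2 = 3,
	DEMO_SCSI_SPC_3 = 6,
	DEMO_SCSI_SPC_5 = 8,
};

/* Assumed cutoff: devices below this level are never probed for CDL. */
#define DEMO_CDL_MIN_LEVEL DEMO_SCSI_SPC_5

struct demo_scsi_device {
	int scsi_level;
	bool cdl_supported;
};

/* Stand-in for the code that sends the probing commands to the device. */
typedef bool (*demo_probe_fn)(struct demo_scsi_device *dev);

/* Counts how many times the stub probe below has been called. */
int demo_probe_calls;

/* Stub probe that always reports CDL support. */
bool demo_probe_always_yes(struct demo_scsi_device *dev)
{
	(void)dev;
	demo_probe_calls++;
	return true;
}

/*
 * Skip the probe entirely for old devices so that they never see the
 * newer request format; only newer devices are actually probed.
 */
void demo_cdl_check(struct demo_scsi_device *dev, demo_probe_fn probe)
{
	assert(dev != NULL && probe != NULL);

	if (dev->scsi_level < DEMO_CDL_MIN_LEVEL) {
		dev->cdl_supported = false;
		return;
	}
	dev->cdl_supported = probe(dev);
}
```

The point of the design is that the decision is made from information the
device has already reported, so no new command reaches a drive that predates
the feature.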

Let me send something along these lines.

>>
>> Could you send a full dmesg output for a clean boot and for a failed one so that
>> I can compare ?
> I'll try to get this together tomorrow.
>
> Dave
>

--
Damien Le Moal
Western Digital Research

2023-09-14 12:12:59

by Damien Le Moal

Subject: Re: [PATCH] linux/export: fix reference to exported functions for parisc64

On 9/14/23 09:29, John David Anglin wrote:
> I think the issue is with the root ST373207LW drive.  The console output indicates that the
> ROOT drive doesn't exist when the boot fails.
>
> Your change only appeared to affect actual SCSI drives.  That's why I tried disabling CDL.
>>
>> Could you send a full dmesg output for a clean boot and for a failed one so that
>> I can compare ?
> I'll try to get this together tomorrow.

Please try the attached patch. That should address the issue with your drive.


--
Damien Le Moal
Western Digital Research


Attachments:
0001-scsi-Do-no-try-to-probe-for-CDL-on-old-drives.patch (3.16 kB)

2023-09-14 15:11:02

by John David Anglin

Subject: Re: [PATCH] linux/export: fix reference to exported functions for parisc64

On 2023-09-13 10:24 p.m., Damien Le Moal wrote:
> On 9/14/23 09:29, John David Anglin wrote:
>> I think the issue is with the root ST373207LW drive.  The console output indicates that the
>> ROOT drive doesn't exist when the boot fails.
>>
>> Your change only appeared to affect actual SCSI drives.  That's why I tried disabling CDL.
>>> Could you send a full dmesg output for a clean boot and for a failed one so that
>>> I can compare ?
>> I'll try to get this together tomorrow.
> Please try the attached patch. That should address the issue with your drive.
Mainline and v6.5.3 both booted successfully with the attached patch.

Thanks,
Dave

--
John David Anglin [email protected]

2023-09-14 22:00:00

by Damien Le Moal

Subject: Re: [PATCH] linux/export: fix reference to exported functions for parisc64

On 9/15/23 00:07, John David Anglin wrote:
> On 2023-09-13 10:24 p.m., Damien Le Moal wrote:
>> On 9/14/23 09:29, John David Anglin wrote:
>>> I think the issue is with the root ST373207LW drive.  The console output indicates that the
>>> ROOT drive doesn't exist when the boot fails.
>>>
>>> Your change only appeared to affect actual SCSI drives.  That's why I tried disabling CDL.
>>>> Could you send a full dmesg output for a clean boot and for a failed one so that
>>>> I can compare ?
>>> I'll try to get this together tomorrow.
>> Please try the attached patch. That should address the issue with your drive.
> Mainline and v6.5.3 both booted successfully with the attached patch.

Great ! Thanks for testing. Posting the patch.

>
> Thanks,
> Dave
>

--
Damien Le Moal
Western Digital Research