2020-10-15 07:54:10

by Paul Menzel

Subject: Linux 5.9: smartpqi: controller is offline: status code 0x6100c

Dear Linux folks,


With Linux 5.9 and


    $ lspci -nn -s 89:
    89:00.0 Serial Attached SCSI controller [0107]: Adaptec Smart Storage PQI 12G SAS/PCIe 3 [9005:028f] (rev 01)
    $ more /sys/devices/pci0000:88/0000:88:00.0/0000:89:00.0/host15/scsi_host/host15/driver_version
    1.2.8-026
    $ more /sys/devices/pci0000:88/0000:88:00.0/0000:89:00.0/host15/scsi_host/host15/firmware_version
    2.62-0

the controller went offline with status code 0x6100c.
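
The driver and firmware versions shown above can also be collected for every SCSI host in one pass. The following is a minimal sketch, assuming the hosts are reachable through /sys/class/scsi_host and expose the same driver_version and firmware_version attributes as host15 does here; the function name read_host_versions is hypothetical.

```python
from pathlib import Path


def read_host_versions(root="/sys/class/scsi_host"):
    """Return {host name: {attribute: value}} for hosts with version files."""
    root = Path(root)
    if not root.is_dir():
        return {}
    versions = {}
    for host in sorted(root.glob("host*")):
        attrs = {}
        # Attribute names are taken from the sysfs paths quoted in this mail.
        for name in ("driver_version", "firmware_version"):
            attr = host / name
            if attr.is_file():
                attrs[name] = attr.read_text().strip()
        if attrs:  # skip hosts whose driver exposes neither attribute
            versions[host.name] = attrs
    return versions


if __name__ == "__main__":
    for host, attrs in read_host_versions().items():
        print(host, attrs)
```

Hosts driven by other drivers may not provide these files and are simply left out of the result.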

> Oct 14 14:54:01 done.molgen.mpg.de kernel: smartpqi 0000:89:00.0: controller is offline: status code 0x6100c
> Oct 14 14:54:01 done.molgen.mpg.de kernel: smartpqi 0000:89:00.0: controller offline
> Oct 14 14:54:01 done.molgen.mpg.de kernel: sd 15:0:2:0: [sdu] tag#709 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=6s
> Oct 14 14:54:01 done.molgen.mpg.de kernel: sd 15:0:15:0: [sdah] tag#274 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=6s
> Oct 14 14:54:01 done.molgen.mpg.de kernel: sd 15:0:4:0: [sdw] tag#516 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=6s
> Oct 14 14:54:01 done.molgen.mpg.de kernel: sd 15:0:4:0: [sdw] tag#516 CDB: Write(10) 2a 00 0d e6 9e 88 00 00 01 00
> Oct 14 14:54:01 done.molgen.mpg.de kernel: blk_update_request: I/O error, dev sdw, sector 1865741376 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
> Oct 14 14:54:01 done.molgen.mpg.de kernel: sd 15:0:0:0: [sds] tag#529 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=6s
> Oct 14 14:54:01 done.molgen.mpg.de kernel: sd 15:0:0:0: [sds] tag#529 CDB: Write(10) 2a 00 29 4e e8 ff 00 00 01 00
> Oct 14 14:54:01 done.molgen.mpg.de kernel: blk_update_request: I/O error, dev sds, sector 5544298488 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
> Oct 14 14:54:01 done.molgen.mpg.de kernel: sd 15:0:0:0: [sds] tag#627 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=6s
> Oct 14 14:54:01 done.molgen.mpg.de kernel: sd 15:0:0:0: [sds] tag#627 CDB: Read(10) 28 00 5d df 2c 04 00 00 04 00
> Oct 14 14:54:01 done.molgen.mpg.de kernel: blk_update_request: I/O error, dev sds, sector 12599255072 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class
> Oct 14 14:54:01 done.molgen.mpg.de kernel: sd 15:0:5:0: [sdx] tag#567 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=6s
> Oct 14 14:54:01 done.molgen.mpg.de kernel: sd 15:0:5:0: [sdx] tag#567 CDB: Write(10) 2a 00 21 4e ce 04 00 00 04 00
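
The CDB bytes in the log lines above can be decoded to see which blocks the failed commands addressed. This is a sketch that only handles the 10-byte Read(10) and Write(10) layout (opcode, flags, 4-byte big-endian LBA, group number, 2-byte transfer length, control); the function name decode_rw10 is hypothetical.

```python
def decode_rw10(cdb_hex):
    """Decode a Read(10)/Write(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a Read(10)/Write(10) CDB")
    return {
        "op": "Read(10)" if cdb[0] == 0x28 else "Write(10)",
        "lba": int.from_bytes(cdb[2:6], "big"),     # logical block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # transfer length in blocks
    }


# CDB of tag#516 on sdw, copied from the log above.
info = decode_rw10("2a 00 0d e6 9e 88 00 00 01 00")
print(info["op"], info["lba"], info["blocks"])
# The kernel reports sectors in 512-byte units; compare with the logged sector.
print(info["lba"] * 8)
```

For this command the LBA is 233217672, and multiplying it by 8 gives 1865741376, the sector in the matching blk_update_request line. That would be consistent with 4096-byte logical blocks on the drive, although the block size is not stated anywhere in the log.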

How can the status code 0x6100c be deciphered?


Kind regards,

Paul


2020-10-15 12:07:15

by Thomas Kreitler

Subject: Re: Linux 5.9: smartpqi: controller is offline: status code 0x6100c

Hello Paul,

The meaning behind 0x6100c can be found rather easily.

The main part comes from drivers/scsi/smartpqi/smartpqi.h:

#define PQI_DATA_IN_OUT_PCIE_COMPLETION_TIMEOUT 0x61

The rest looks like additional status bytes reported while the error is
processed.

My conclusion is that something happened on the PCIe bus.
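
The reading above amounts to a little bit arithmetic, sketched below. The split into a leading byte and 12 trailing bits is an assumption made for illustration, not a documented layout, and split_status_code is a hypothetical helper. Don Brace's reply further down in this thread explains the code differently, so a match against the header constant is suggestive at best.

```python
# Constant quoted above from drivers/scsi/smartpqi/smartpqi.h.
PQI_DATA_IN_OUT_PCIE_COMPLETION_TIMEOUT = 0x61


def split_status_code(code):
    """Split a status code into (leading byte, low 12 bits).

    The 8/12-bit split is assumed for illustration only.
    """
    return (code >> 12) & 0xFF, code & 0xFFF


high, low = split_status_code(0x6100C)
print(hex(high), hex(low))
print(high == PQI_DATA_IN_OUT_PCIE_COMPLETION_TIMEOUT)
```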

Best,
Thomas


P.S. It may be worth mentioning that the machine in question is
fitted with two Microsemi HBA-1100 controllers.

--
Thomas Kreitler - Information Retrieval
[email protected]
49/30/8413 1702

2020-10-16 22:39:42

by Don Brace

Subject: RE: Linux 5.9: smartpqi: controller is offline: status code 0x6100c

The 6100C lockup is the result of the controller running out of commands to process new incoming requests from the driver.

We are actively looking into this issue.

We will keep you posted,
Thanks,
Don

2020-11-08 08:13:48

by Paul Menzel

Subject: Re: Linux 5.9: smartpqi: controller is offline: status code 0x6100c

Dear Don,


On 17.10.20 at 00:31, [email protected] wrote:
> The 6100C lockup is the result of the controller running out of
> commands to process new incoming requests from the driver.
>
> We are actively looking into this issue.

Unfortunately, there has been no further reply from Microsemi
support, and the driver 1.2.14-016 does not build against Linux 5.9 or
master.

Were you able to reproduce the issue? What is the timeline for getting
this fixed?

Linux 5.10 is going to be a long-term support release, so it would be
great to have the problems fixed as soon as possible.


Kind regards,

Paul