2022-05-26 03:54:22

by Tony Battersby

[permalink] [raw]
Subject: [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable

#regzbot introduced: 44c57f205876

I have several different QLogic FC HBAs (8, 16, 32 Gbps) and several
different FC LTO tape drives (IBM Ultrium 8 & 9).  When I plug in the FC
cable, the tape drive shows up as a SCSI device as expected.  With older
kernels, when I unplug the FC cable, the tape drive SCSI device would
disappear after about 30 seconds.  But with newer kernels (including
5.18), when I unplug the FC cable, the tape drive SCSI device never
disappears.  I have bisected the change in behavior to the following
commit in kernel 5.15:

44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target")

This commit has been backported to various -stable kernels, so they are
also affected.

When testing with two different tape drives:
1) Plug FC cable into tape drive A.  Tape drive A shows up as a SCSI device.
2) Unplug FC cable; wait 60 seconds.  Tape drive A does not disappear.
3) Plug FC cable into tape drive B.  Tape drive A disappears 30 seconds
later, but tape drive B does not show up.
4) Unplug FC cable and plug it back into tape drive B.  Tape drive B
shows up as a SCSI device.

So I can actually make a tape drive disappear by plugging the cable into
a different tape drive, but then I have to reseat the cable again to
make the new tape drive show up.

lspci -n
83:00.0 0c04: 1077:2031 (rev 02)
83:00.1 0c04: 1077:2031 (rev 02)

When plugging in cable:
qla2xxx [0000:83:00.1]-500a:7: LOOP UP detected (8 Gbps).

When unplugging cable with old kernel:
qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
rport-7:0-2: blocked FC remote port time out: removing target and saving binding

When unplugging cable with new kernel:
qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).

/sys/class/fc_remote_ports/rport-*/
dev_loss_tmo: 30
supported_classes: Class 3
port_state: Online
(port_state remains Online even when FC cable unplugged)

/proc/scsi/scsi
Host: scsi7 Channel: 00 Id: 01 Lun: 00
Vendor: IBM Model: ULTRIUM-HH8 Rev: K4K1
Type: Sequential-Access ANSI SCSI revision: 06

Tony Battersby
Cybernetics



2022-05-28 20:39:45

by Arun Easi

[permalink] [raw]
Subject: Re: [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable

Hi Tony,

Thanks for reporting the issue. We are trying to recreate this issue in
house. I will reach out to you for logs, if we cannot repro. Typically, we
get sufficient context to the issue when the problem is reproduced with
module parameter "ql2xextended_error_logging=1".

Anyway, I will let you know the status.

Regards,
-Arun

On Wed, 25 May 2022, 3:03pm, Tony Battersby wrote:

> #regzbot introduced: 44c57f205876
>
> I have several different QLogic FC HBAs (8, 16, 32 Gbps) and several
> different FC LTO tape drives (IBM Ultrium 8 & 9).? When I plug in the FC
> cable, the tape drive shows up as a SCSI device as expected.? With older
> kernels, when I unplug the FC cable, the tape drive SCSI device would
> disappear after about 30 seconds.? But with newer kernels (including
> 5.18), when I unplug the FC cable, the tape drive SCSI device never
> disappears.? I have bisected the change in behavior to the following
> commit in kernel 5.15:
>
> 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target")
>
> This commit has been backported to various -stable kernels, so they are
> also affected.
>
> When testing with two different tape drives:
> 1) Plug FC cable into tape drive A.? Tape drive A shows up as a SCSI device.
> 2) Unplug FC cable; wait 60 seconds.? Tape drive A does not disappear.
> 3) Plug FC cable into tape drive B.? Tape drive A disappears 30 seconds
> later, but tape drive B does not show up.
> 4) Unplug FC cable and plug it back into tape drive B.? Tape drive B
> shows up as a SCSI device.
>
> So I can actually make a tape drive disappear by plugging the cable into
> a different tape drive, but then I have to reseat the cable again to
> make the new tape drive show up.
>
> lspci -n
> 83:00.0 0c04: 1077:2031 (rev 02)
> 83:00.1 0c04: 1077:2031 (rev 02)
>
> When plugging in cable:
> qla2xxx [0000:83:00.1]-500a:7: LOOP UP detected (8 Gbps).
>
> When unplugging cable with old kernel:
> qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
> rport-7:0-2: blocked FC remote port time out: removing target and saving binding
>
> When unplugging cable with new kernel:
> qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
>
> /sys/class/fc_remote_ports/rport-*/
> dev_loss_tmo: 30
> supported_classes: Class 3
> port_state: Online
> (port_state remains Online even when FC cable unplugged)
>
> /proc/scsi/scsi
> Host: scsi7 Channel: 00 Id: 01 Lun: 00
> Vendor: IBM Model: ULTRIUM-HH8 Rev: K4K1
> Type: Sequential-Access ANSI SCSI revision: 06
>
> Tony Battersby
> Cybernetics
>
>

Subject: Re: [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable

Hi!

On 28.05.22 02:27, Arun Easi wrote:
> Hi Tony,
>
> Thanks for reporting the issue. We are trying to recreate this issue in
> house. I will reach out to you for logs, if we cannot repro. Typically, we
> get sufficient context to the issue when the problem is reproduced with
> module parameter "ql2xextended_error_logging=1".
>
> Anyway, I will let you know the status.

What's the status here? Tony, did you provide the info Arun asked for=
Or was some progress made somehow without it?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

> On Wed, 25 May 2022, 3:03pm, Tony Battersby wrote:
>
>> #regzbot introduced: 44c57f205876
>>
>> I have several different QLogic FC HBAs (8, 16, 32 Gbps) and several
>> different FC LTO tape drives (IBM Ultrium 8 & 9).  When I plug in the FC
>> cable, the tape drive shows up as a SCSI device as expected.  With older
>> kernels, when I unplug the FC cable, the tape drive SCSI device would
>> disappear after about 30 seconds.  But with newer kernels (including
>> 5.18), when I unplug the FC cable, the tape drive SCSI device never
>> disappears.  I have bisected the change in behavior to the following
>> commit in kernel 5.15:
>>
>> 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target")
>>
>> This commit has been backported to various -stable kernels, so they are
>> also affected.
>>
>> When testing with two different tape drives:
>> 1) Plug FC cable into tape drive A.  Tape drive A shows up as a SCSI device.
>> 2) Unplug FC cable; wait 60 seconds.  Tape drive A does not disappear.
>> 3) Plug FC cable into tape drive B.  Tape drive A disappears 30 seconds
>> later, but tape drive B does not show up.
>> 4) Unplug FC cable and plug it back into tape drive B.  Tape drive B
>> shows up as a SCSI device.
>>
>> So I can actually make a tape drive disappear by plugging the cable into
>> a different tape drive, but then I have to reseat the cable again to
>> make the new tape drive show up.
>>
>> lspci -n
>> 83:00.0 0c04: 1077:2031 (rev 02)
>> 83:00.1 0c04: 1077:2031 (rev 02)
>>
>> When plugging in cable:
>> qla2xxx [0000:83:00.1]-500a:7: LOOP UP detected (8 Gbps).
>>
>> When unplugging cable with old kernel:
>> qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
>> rport-7:0-2: blocked FC remote port time out: removing target and saving binding
>>
>> When unplugging cable with new kernel:
>> qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
>>
>> /sys/class/fc_remote_ports/rport-*/
>> dev_loss_tmo: 30
>> supported_classes: Class 3
>> port_state: Online
>> (port_state remains Online even when FC cable unplugged)
>>
>> /proc/scsi/scsi
>> Host: scsi7 Channel: 00 Id: 01 Lun: 00
>> Vendor: IBM Model: ULTRIUM-HH8 Rev: K4K1
>> Type: Sequential-Access ANSI SCSI revision: 06
>>
>> Tony Battersby
>> Cybernetics
>>

2022-06-20 15:43:31

by Tony Battersby

[permalink] [raw]
Subject: Re: [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable

On 6/20/22 02:56, Thorsten Leemhuis wrote:
> Hi!
>
> On 28.05.22 02:27, Arun Easi wrote:
>> Hi Tony,
>>
>> Thanks for reporting the issue. We are trying to recreate this issue in
>> house. I will reach out to you for logs, if we cannot repro. Typically, we
>> get sufficient context to the issue when the problem is reproduced with
>> module parameter "ql2xextended_error_logging=1".
>>
>> Anyway, I will let you know the status.
> What's the status here? Tony, did you provide the info Arun asked for=
> Or was some progress made somehow without it?
>
I was waiting for them to reproduce the problem, or ask for logs. 
Anyway, here are the logs they said they might want:

modprobe qla2xxx ql2xextended_error_logging=1

Jun 20 13:51:46 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.02.07.400-k-debug.
Jun 20 13:51:46 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-011c: : MSI-X vector count: 32.
Jun 20 13:51:46 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-001d: : Found an ISP2031 irq 37 iobase 0x00000000875338f3.
Jun 20 13:51:46 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00c6:6: MSI-X: Using 26 vectors
Jun 20 13:51:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-0075:6: ZIO mode 6 enabled; timer delay (200 us).
Jun 20 13:51:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-ffff:6: FC4 priority set to NVMe
Jun 20 13:51:48 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-507b:6: SFP detect: Short-Range SFP (nvr=0 ll=40 lr=0 lrd=0).
Jun 20 13:51:50 tony13 kern.info kernel: scsi host6: qla2xxx
Jun 20 13:51:50 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4000:6: DPC handler sleeping.
Jun 20 13:51:51 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-400f:6: Loop resync scheduled.
Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00fb:6: QLogic QLE2672 - QLE2672 QLogic 2-port 16Gb Fibre Channel Adapter.
Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00fc:6: ISP2031: PCIe (8.0GT/s x8) @ 0000:83:00.0 hdma- host#=6 fw=8.07.12 (d0d5).
Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-011c: : MSI-X vector count: 32.
Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-001d: : Found an ISP2031 irq 135 iobase 0x000000007cfbf1eb.
Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00c6:7: MSI-X: Using 26 vectors
Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-0075:7: ZIO mode 6 enabled; timer delay (200 us).
Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-ffff:7: FC4 priority set to NVMe
Jun 20 13:51:57 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-507b:7: SFP detect: Short-Range SFP (nvr=0 ll=40 lr=0 lrd=0).
Jun 20 13:51:58 tony13 kern.info kernel: scsi host7: qla2xxx
Jun 20 13:51:58 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
Jun 20 13:51:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400f:7: Loop resync scheduled.
Jun 20 13:52:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00fb:7: QLogic QLE2672 - QLE2672 QLogic 2-port 16Gb Fibre Channel Adapter.
Jun 20 13:52:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00fc:7: ISP2031: PCIe (8.0GT/s x8) @ 0000:83:00.1 hdma- host#=7 fw=8.07.12 (d0d5).
Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-8038:6: Cable is unplugged...
Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-803a:6: fw_state=4 (7, 141, 0, 800 0) curr time=ffffc764.
Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-206c:6: qla2x00_loop_resync *** FAILED ***.
Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4010:6: Loop resync end.
Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4000:6: DPC handler sleeping.
Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-8038:7: Cable is unplugged...
Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-803a:7: fw_state=4 (7, 141, 0, 800 0) curr time=ffffcaac.
Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-206c:7: qla2x00_loop_resync *** FAILED ***.
Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4010:7: Loop resync end.
Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.

Plug in cable

Jun 20 13:52:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5009:7: LIP occurred (0).
Jun 20 13:52:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b7:7: Format 0 : Number of VPs setup 254, number of VPs acquired 1.
Jun 20 13:52:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b8:7: Primary port id 0000ef.
Jun 20 13:52:58 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500a:7: LOOP UP detected (8 Gbps).
Jun 20 13:52:58 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5012:7: Port database changed ffff 0006 0000.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400f:7: Loop resync scheduled.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-8037:7: F/W Ready - OK.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-803a:7: fw_state=3 (7, 141, 0, 800 0) curr time=ffffd9e8.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-200b:7: HBA in NL topology.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2014:7: Configure loop -- dpc flags = 0x112e0.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2011:7: Entries in ID list (1).
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20d8:7: qla24xx_fcport_handle_login 50:00:e1:11:c5:27:f0:70 DS 0 LS 7 P 0 fl 0 confl 0000000000000000 rscn 0|0 login 0 lid 0 scan 2 fc4type 0
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-307b:7: qla_chk_n2n_b4_login 50:00:e1:11:c5:27:f0:70 DS 0 LS 7 lid 0 retries=30
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20bf:7: qla_chk_n2n_b4_login 1628 50:00:e1:11:c5:27:f0:70 post login
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2069:7: LOOP READY.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-206b:7: qla2x00_configure_loop: exiting normally. local port wwpn 2100000e1e2227a1 id 0000ef)
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4010:7: Loop resync end.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: DELETED to LOGIN_PEND - portid=000026.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2072:7: Async-login - 50:00:e1:11:c5:27:f0:70 hdl=2, loopid=0 portid=000026 retries=29 .
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5036:7: Async-login complete: handle=2 pid=000026 wwpn=50:00:e1:11:c5:27:f0:70 iop0=312
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20dd:7: qla2x00_async_login_sp_done 50:00:e1:11:c5:27:f0:70 res 0
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-ffff:7: qla24xx_handle_plogi_done_event 50:00:e1:11:c5:27:f0:70 DS 3 LS 7 rc 0 login 0|0 rscn 0|0 data 4000|0 iop 0|0
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ea:7: qla24xx_handle_plogi_done_event 2232 50:00:e1:11:c5:27:f0:70 LoopID 0x0 in use with 000026. post gpdb
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: LOGIN_PEND to GPDB - portid=000026.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20dc:7: Async-gpdb 50:00:e1:11:c5:27:f0:70 hndl 0 opt 0
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20db:7: Async done-gpdb res 0, WWPN 50:00:e1:11:c5:27:f0:70 mb[1]=0 mb[2]=ffdd
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20d2:7: qla24xx_handle_gpdb_event 50:00:e1:11:c5:27:f0:70 DS 5 LS 6 fc4_type 0 rc 0
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2062:7: 50:00:e1:11:c5:27:f0:70 SVC Param w3 0312
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: GPDB to UPD_FCPORT - portid=000026.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ef:7: qla2x00_update_fcport 50:00:e1:11:c5:27:f0:70
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: UPD_FCPORT to UPD_FCPORT - portid=000026.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:70 state transitioned from UNCONFIGURED to ONLINE - portid=000026.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ee:7: qla2x00_reg_remote_port: 5000e111c527f070. rport 7:0:0 (00000000bd175424) is tgt mode
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: UPD_FCPORT to LOGIN_COMPLETE - portid=000026.
Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-911e:7: qla_register_fcport_fn rscn gen 0/0 next DS 0
Jun 20 13:52:59 tony13 kern.notice kernel: scsi 7:0:0:0: Sequential-Access IBM ULTRIUM-HH8 K4K1 PQ: 0 ANSI: 6
Jun 20 13:52:59 tony13 kern.notice kernel: scsi 7:0:0:0: Attached scsi generic sg1 type 1
Jun 20 13:52:59 tony13 kern.info kernel: st: Version 20160209, fixed bufsize 32768, s/g segs 256
Jun 20 13:52:59 tony13 kern.notice kernel: st 7:0:0:0: Attached scsi tape st0
Jun 20 13:52:59 tony13 kern.info kernel: st 7:0:0:0: st0: try direct i/o: yes (alignment 8 B)

Unplug cable

Jun 20 13:54:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
Jun 20 13:54:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20f1:7: Mark all dev lost

sleep 180

cat /proc/scsi/scsi
Attached devices:
Host: scsi7 Channel: 00 Id: 00 Lun: 00
Vendor: IBM Model: ULTRIUM-HH8 Rev: K4K1
Type: Sequential-Access ANSI SCSI revision: 06

rmmod qla2xxx

Jun 20 14:00:18 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-211a:7: Delaying session delete for FCP2 flags 0x4 port_type = 0x4 port_id=000026 50
Jun 20 14:00:18 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-b079:7: Removing driver
Jun 20 14:00:18 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00af:7: Performing ISP error recovery - ha=00000000c584070c.
Jun 20 14:00:18 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20f1:7: Mark all dev lost
Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-211a:7: Delaying session delete for FCP2 flags 0x4 port_type = 0x4 port_id=000026 50
Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4011:7: DPC handler exiting.
Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:70 state transitioned from ONLINE to DEAD - portid=000026.
Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-b079:6: Removing driver
Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00af:6: Performing ISP error recovery - ha=000000008a8560d1.
Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-20f1:6: Mark all dev lost
Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4011:6: DPC handler exiting.

Tony Battersby
Cybernetics

> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>
> P.S.: As the Linux kernel's regression tracker I deal with a lot of
> reports and sometimes miss something important when writing mails like
> this. If that's the case here, don't hesitate to tell me in a public
> reply, it's in everyone's interest to set the public record straight.
>
>> On Wed, 25 May 2022, 3:03pm, Tony Battersby wrote:
>>
>>> #regzbot introduced: 44c57f205876
>>>
>>> I have several different QLogic FC HBAs (8, 16, 32 Gbps) and several
>>> different FC LTO tape drives (IBM Ultrium 8 & 9).  When I plug in the FC
>>> cable, the tape drive shows up as a SCSI device as expected.  With older
>>> kernels, when I unplug the FC cable, the tape drive SCSI device would
>>> disappear after about 30 seconds.  But with newer kernels (including
>>> 5.18), when I unplug the FC cable, the tape drive SCSI device never
>>> disappears.  I have bisected the change in behavior to the following
>>> commit in kernel 5.15:
>>>
>>> 44c57f205876 ("scsi: qla2xxx: Changes to support FCP2 Target")
>>>
>>> This commit has been backported to various -stable kernels, so they are
>>> also affected.
>>>
>>> When testing with two different tape drives:
>>> 1) Plug FC cable into tape drive A.  Tape drive A shows up as a SCSI device.
>>> 2) Unplug FC cable; wait 60 seconds.  Tape drive A does not disappear.
>>> 3) Plug FC cable into tape drive B.  Tape drive A disappears 30 seconds
>>> later, but tape drive B does not show up.
>>> 4) Unplug FC cable and plug it back into tape drive B.  Tape drive B
>>> shows up as a SCSI device.
>>>
>>> So I can actually make a tape drive disappear by plugging the cable into
>>> a different tape drive, but then I have to reseat the cable again to
>>> make the new tape drive show up.
>>>
>>> lspci -n
>>> 83:00.0 0c04: 1077:2031 (rev 02)
>>> 83:00.1 0c04: 1077:2031 (rev 02)
>>>
>>> When plugging in cable:
>>> qla2xxx [0000:83:00.1]-500a:7: LOOP UP detected (8 Gbps).
>>>
>>> When unplugging cable with old kernel:
>>> qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
>>> rport-7:0-2: blocked FC remote port time out: removing target and saving binding
>>>
>>> When unplugging cable with new kernel:
>>> qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
>>>
>>> /sys/class/fc_remote_ports/rport-*/
>>> dev_loss_tmo: 30
>>> supported_classes: Class 3
>>> port_state: Online
>>> (port_state remains Online even when FC cable unplugged)
>>>
>>> /proc/scsi/scsi
>>> Host: scsi7 Channel: 00 Id: 01 Lun: 00
>>> Vendor: IBM Model: ULTRIUM-HH8 Rev: K4K1
>>> Type: Sequential-Access ANSI SCSI revision: 06
>>>
>>> Tony Battersby
>>> Cybernetics
>>>

2022-06-21 22:18:14

by Arun Easi

[permalink] [raw]
Subject: Re: [EXT] Re: [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable

Hi Tony,

Apologies for the delay. This was not forgotten, there were repro attempts
and discussions happening internally.

See further comments inline..

On Mon, 20 Jun 2022, 7:33am, Tony Battersby wrote:

>
> ----------------------------------------------------------------------
> On 6/20/22 02:56, Thorsten Leemhuis wrote:
> > Hi!
> >
> > On 28.05.22 02:27, Arun Easi wrote:
> >> Hi Tony,
> >>
> >> Thanks for reporting the issue. We are trying to recreate this issue in
> >> house. I will reach out to you for logs, if we cannot repro. Typically, we
> >> get sufficient context to the issue when the problem is reproduced with
> >> module parameter "ql2xextended_error_logging=1".
> >>
> >> Anyway, I will let you know the status.
> > What's the status here? Tony, did you provide the info Arun asked for=
> > Or was some progress made somehow without it?
> >
> I was waiting for them to reproduce the problem, or ask for logs. 
> Anyway, here are the logs they said they might want:
>
> modprobe qla2xxx ql2xextended_error_logging=1
>
> Jun 20 13:51:46 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.02.07.400-k-debug.
> Jun 20 13:51:46 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-011c: : MSI-X vector count: 32.
> Jun 20 13:51:46 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-001d: : Found an ISP2031 irq 37 iobase 0x00000000875338f3.
> Jun 20 13:51:46 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00c6:6: MSI-X: Using 26 vectors
> Jun 20 13:51:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-0075:6: ZIO mode 6 enabled; timer delay (200 us).
> Jun 20 13:51:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-ffff:6: FC4 priority set to NVMe
> Jun 20 13:51:48 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-507b:6: SFP detect: Short-Range SFP (nvr=0 ll=40 lr=0 lrd=0).
> Jun 20 13:51:50 tony13 kern.info kernel: scsi host6: qla2xxx
> Jun 20 13:51:50 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4000:6: DPC handler sleeping.
> Jun 20 13:51:51 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-400f:6: Loop resync scheduled.
> Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00fb:6: QLogic QLE2672 - QLE2672 QLogic 2-port 16Gb Fibre Channel Adapter.
> Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00fc:6: ISP2031: PCIe (8.0GT/s x8) @ 0000:83:00.0 hdma- host#=6 fw=8.07.12 (d0d5).
> Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-011c: : MSI-X vector count: 32.
> Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-001d: : Found an ISP2031 irq 135 iobase 0x000000007cfbf1eb.
> Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00c6:7: MSI-X: Using 26 vectors
> Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-0075:7: ZIO mode 6 enabled; timer delay (200 us).
> Jun 20 13:51:55 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-ffff:7: FC4 priority set to NVMe
> Jun 20 13:51:57 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-507b:7: SFP detect: Short-Range SFP (nvr=0 ll=40 lr=0 lrd=0).
> Jun 20 13:51:58 tony13 kern.info kernel: scsi host7: qla2xxx
> Jun 20 13:51:58 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
> Jun 20 13:51:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400f:7: Loop resync scheduled.
> Jun 20 13:52:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00fb:7: QLogic QLE2672 - QLE2672 QLogic 2-port 16Gb Fibre Channel Adapter.
> Jun 20 13:52:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00fc:7: ISP2031: PCIe (8.0GT/s x8) @ 0000:83:00.1 hdma- host#=7 fw=8.07.12 (d0d5).
> Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-8038:6: Cable is unplugged...
> Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-803a:6: fw_state=4 (7, 141, 0, 800 0) curr time=ffffc764.
> Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-206c:6: qla2x00_loop_resync *** FAILED ***.
> Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4010:6: Loop resync end.
> Jun 20 13:52:11 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4000:6: DPC handler sleeping.
> Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-8038:7: Cable is unplugged...
> Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-803a:7: fw_state=4 (7, 141, 0, 800 0) curr time=ffffcaac.
> Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-206c:7: qla2x00_loop_resync *** FAILED ***.
> Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4010:7: Loop resync end.
> Jun 20 13:52:20 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
>
> Plug in cable
>
> Jun 20 13:52:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5009:7: LIP occurred (0).
> Jun 20 13:52:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b7:7: Format 0 : Number of VPs setup 254, number of VPs acquired 1.
> Jun 20 13:52:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b8:7: Primary port id 0000ef.
> Jun 20 13:52:58 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500a:7: LOOP UP detected (8 Gbps).
> Jun 20 13:52:58 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5012:7: Port database changed ffff 0006 0000.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400f:7: Loop resync scheduled.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-8037:7: F/W Ready - OK.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-803a:7: fw_state=3 (7, 141, 0, 800 0) curr time=ffffd9e8.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-200b:7: HBA in NL topology.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2014:7: Configure loop -- dpc flags = 0x112e0.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2011:7: Entries in ID list (1).
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20d8:7: qla24xx_fcport_handle_login 50:00:e1:11:c5:27:f0:70 DS 0 LS 7 P 0 fl 0 confl 0000000000000000 rscn 0|0 login 0 lid 0 scan 2 fc4type 0
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-307b:7: qla_chk_n2n_b4_login 50:00:e1:11:c5:27:f0:70 DS 0 LS 7 lid 0 retries=30
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20bf:7: qla_chk_n2n_b4_login 1628 50:00:e1:11:c5:27:f0:70 post login
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2069:7: LOOP READY.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-206b:7: qla2x00_configure_loop: exiting normally. local port wwpn 2100000e1e2227a1 id 0000ef)
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4010:7: Loop resync end.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: DELETED to LOGIN_PEND - portid=000026.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2072:7: Async-login - 50:00:e1:11:c5:27:f0:70 hdl=2, loopid=0 portid=000026 retries=29 .
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5036:7: Async-login complete: handle=2 pid=000026 wwpn=50:00:e1:11:c5:27:f0:70 iop0=312
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20dd:7: qla2x00_async_login_sp_done 50:00:e1:11:c5:27:f0:70 res 0
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-ffff:7: qla24xx_handle_plogi_done_event 50:00:e1:11:c5:27:f0:70 DS 3 LS 7 rc 0 login 0|0 rscn 0|0 data 4000|0 iop 0|0
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ea:7: qla24xx_handle_plogi_done_event 2232 50:00:e1:11:c5:27:f0:70 LoopID 0x0 in use with 000026. post gpdb
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: LOGIN_PEND to GPDB - portid=000026.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20dc:7: Async-gpdb 50:00:e1:11:c5:27:f0:70 hndl 0 opt 0
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20db:7: Async done-gpdb res 0, WWPN 50:00:e1:11:c5:27:f0:70 mb[1]=0 mb[2]=ffdd
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20d2:7: qla24xx_handle_gpdb_event 50:00:e1:11:c5:27:f0:70 DS 5 LS 6 fc4_type 0 rc 0
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2062:7: 50:00:e1:11:c5:27:f0:70 SVC Param w3 0312
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: GPDB to UPD_FCPORT - portid=000026.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ef:7: qla2x00_update_fcport 50:00:e1:11:c5:27:f0:70
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: UPD_FCPORT to UPD_FCPORT - portid=000026.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:70 state transitioned from UNCONFIGURED to ONLINE - portid=000026.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ee:7: qla2x00_reg_remote_port: 5000e111c527f070. rport 7:0:0 (00000000bd175424) is tgt mode
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: UPD_FCPORT to LOGIN_COMPLETE - portid=000026.
> Jun 20 13:52:59 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-911e:7: qla_register_fcport_fn rscn gen 0/0 next DS 0
> Jun 20 13:52:59 tony13 kern.notice kernel: scsi 7:0:0:0: Sequential-Access IBM ULTRIUM-HH8 K4K1 PQ: 0 ANSI: 6
> Jun 20 13:52:59 tony13 kern.notice kernel: scsi 7:0:0:0: Attached scsi generic sg1 type 1
> Jun 20 13:52:59 tony13 kern.info kernel: st: Version 20160209, fixed bufsize 32768, s/g segs 256
> Jun 20 13:52:59 tony13 kern.notice kernel: st 7:0:0:0: Attached scsi tape st0
> Jun 20 13:52:59 tony13 kern.info kernel: st 7:0:0:0: st0: try direct i/o: yes (alignment 8 B)
>
> Unplug cable
>
> Jun 20 13:54:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
> Jun 20 13:54:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20f1:7: Mark all dev lost
>
> sleep 180
>
> cat /proc/scsi/scsi
> Attached devices:
> Host: scsi7 Channel: 00 Id: 00 Lun: 00
> Vendor: IBM Model: ULTRIUM-HH8 Rev: K4K1
> Type: Sequential-Access ANSI SCSI revision: 06

Thanks for the info. Just to reiterate, you've reported two issues (though
this log was showing only 1 of them).

Issue 1 - Tape device never disappears when removed
Issue 2 - When a direct connected tape 1 was replaced with tape 2, tape 2
was not discovered.

For Issue-2, please try the attached patch. This may not be the final fix,
but wanted to check if that would fix the issue for you.

For Issue-1, the behavior was intentional, though that behavior needs
refinement. These tape drives support something called FC sequence level
error recovery (added in FCP-2), which can make tape I/Os survive even
across a short cable pull. This is not a simple retry of the I/O, rather a
retry done at the FC sequence level that gives the IO a better chance of
revival. In other words, the said patch that caused regression, while
introduces an incorrect reporting of the state of the device, makes backup
more resilient.

Now, onto the behavior when device state is reported immediately. What we
have observed, at least with one tape drive from a major vendor, is that,
across a device loss and device back case with both the events reported to
upper layers, the backup operation was getting failed. This is due to a
REPORT LUNS command being issued during device reappearance reporting
(fc_remote_port_add -> SCSI scan), which the tape drive was not expecting
and caused the backup to fail.

I know that some tape drives do not support multiple commands to it at the
same time, but not sure if that is still the norm these days.

So, perhaps one way to make the behavior better, is to either report the
disappearing device a bit delayed or have intelligence added in SCSI scan
to detect ongoing tape IO operations and delay/avoid the REPORT LUNs.
Former is a more contained (in the LLD) fix.

Regards,
-Arun

>
> rmmod qla2xxx
>
> Jun 20 14:00:18 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-211a:7: Delaying session delete for FCP2 flags 0x4 port_type = 0x4 port_id=000026 50
> Jun 20 14:00:18 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-b079:7: Removing driver
> Jun 20 14:00:18 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00af:7: Performing ISP error recovery - ha=00000000c584070c.
> Jun 20 14:00:18 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20f1:7: Mark all dev lost
> Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-211a:7: Delaying session delete for FCP2 flags 0x4 port_type = 0x4 port_id=000026 50
> Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4011:7: DPC handler exiting.
> Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:70 state transitioned from ONLINE to DEAD - portid=000026.
> Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-b079:6: Removing driver
> Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00af:6: Performing ISP error recovery - ha=000000008a8560d1.
> Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-20f1:6: Mark all dev lost
> Jun 20 14:00:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4011:6: DPC handler exiting.
>


Attachments:
0001-qla2xxx-Fix-discovery-issues-in-FC-AL-topology.patch (3.69 kB)

2022-06-22 15:05:34

by Tony Battersby

[permalink] [raw]
Subject: Re: [EXT] Re: [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable

On 6/21/22 18:05, Arun Easi wrote:
> Thanks for the info. Just to reiterate, you've reported two issues (though
> this log was showing only 1 of them).
>
> Issue 1 - Tape device never disappears when removed
> Issue 2 - When a direct connected tape 1 was replaced with tape 2, tape 2
> was not discovered.
>
> For Issue-2, please try the attached patch. This may not be the final fix,
> but wanted to check if that would fix the issue for you.
>
> For Issue-1, the behavior was intentional, though that behavior needs
> refinement. These tape drives support something called FC sequence level
> error recovery (added in FCP-2), which can make tape I/Os survive even
> across a short cable pull. This is not a simple retry of the I/O, rather a
> retry done at the FC sequence level that gives the IO a better chance of
> revival. In other words, the said patch that caused regression, while
> introduces an incorrect reporting of the state of the device, makes backup
> more resilient.
>
> Now, onto the behavior when device state is reported immediately. What we
> have observed, at least with one tape drive from a major vendor, is that,
> across a device loss and device back case with both the events reported to
> upper layers, the backup operation was getting failed. This is due to a
> REPORT LUNS command being issued during device reappearance reporting
> (fc_remote_port_add -> SCSI scan), which the tape drive was not expecting
> and caused the backup to fail.
>
> I know that some tape drives do not support multiple commands to it at the
> same time, but not sure if that is still the norm these days.
>
> So, perhaps one way to make the behavior better, is to either report the
> disappearing device a bit delayed or have intelligence added in SCSI scan
> to detect ongoing tape IO operations and delay/avoid the REPORT LUNs.
> Former is a more contained (in the LLD) fix.
>
> Regards,
> -Arun

Your patch does fix Issue-2 for me.  For Issue-1, it would be fine with
me if qla2xxx reported device removal to the upper level a bit delayed,
as you said.

Below is a log of the test for Issue-2 with your patch applied with
kernel 5.18.2:

modprobe qla2xxx ql2xextended_error_logging=1

Jun 22 14:08:23 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.02.07.400-k-debug.
Jun 22 14:08:23 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-011c: : MSI-X vector count: 32.
Jun 22 14:08:23 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-001d: : Found an ISP2031 irq 37 iobase 0x00000000d7dfad2d.
Jun 22 14:08:23 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00c6:6: MSI-X: Using 26 vectors
Jun 22 14:08:23 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-0075:6: ZIO mode 6 enabled; timer delay (200 us).
Jun 22 14:08:23 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-ffff:6: FC4 priority set to NVMe
Jun 22 14:08:24 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-507b:6: SFP detect: Short-Range SFP (nvr=0 ll=40 lr=0 lrd=0).
Jun 22 14:08:26 tony13 kern.info kernel: scsi host6: qla2xxx
Jun 22 14:08:26 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4000:6: DPC handler sleeping.
Jun 22 14:08:27 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-400f:6: Loop resync scheduled.
Jun 22 14:08:31 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00fb:6: QLogic QLE2672 - QLE2672 QLogic 2-port 16Gb Fibre Channel Adapter.
Jun 22 14:08:31 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00fc:6: ISP2031: PCIe (8.0GT/s x8) @ 0000:83:00.0 hdma- host#=6 fw=8.07.12 (d0d5).
Jun 22 14:08:31 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-011c: : MSI-X vector count: 32.
Jun 22 14:08:31 tony13 kern.warn kernel: qla2xxx [0000:00:00.0]-001d: : Found an ISP2031 irq 135 iobase 0x0000000083e02c30.
Jun 22 14:08:31 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00c6:7: MSI-X: Using 26 vectors
Jun 22 14:08:31 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-0075:7: ZIO mode 6 enabled; timer delay (200 us).
Jun 22 14:08:31 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-ffff:7: FC4 priority set to NVMe
Jun 22 14:08:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-507b:7: SFP detect: Short-Range SFP (nvr=0 ll=40 lr=0 lrd=0).
Jun 22 14:08:34 tony13 kern.info kernel: scsi host7: qla2xxx
Jun 22 14:08:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
Jun 22 14:08:36 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400f:7: Loop resync scheduled.
Jun 22 14:08:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00fb:7: QLogic QLE2672 - QLE2672 QLogic 2-port 16Gb Fibre Channel Adapter.
Jun 22 14:08:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00fc:7: ISP2031: PCIe (8.0GT/s x8) @ 0000:83:00.1 hdma- host#=7 fw=8.07.12 (d0d5).
Jun 22 14:08:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-8038:6: Cable is unplugged...
Jun 22 14:08:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-803a:6: fw_state=4 (7, 141, 0, 800 0) curr time=10000dd24.
Jun 22 14:08:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-206c:6: qla2x00_loop_resync *** FAILED ***.
Jun 22 14:08:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4010:6: Loop resync end.
Jun 22 14:08:47 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4000:6: DPC handler sleeping.
Jun 22 14:08:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-8038:7: Cable is unplugged...
Jun 22 14:08:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-803a:7: fw_state=4 (7, 141, 0, 800 0) curr time=10000e075.
Jun 22 14:08:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-206c:7: qla2x00_loop_resync *** FAILED ***.
Jun 22 14:08:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4010:7: Loop resync end.
Jun 22 14:08:56 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.

Plug in cable to tape drive A

Jun 22 14:09:32 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5009:7: LIP occurred (0).
Jun 22 14:09:32 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b7:7: Format 0 : Number of VPs setup 254, number of VPs acquired 1.
Jun 22 14:09:32 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b8:7: Primary port id 0000ef.
Jun 22 14:09:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500a:7: LOOP UP detected (8 Gbps).
Jun 22 14:09:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5012:7: Port database changed ffff 0006 0000.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400f:7: Loop resync scheduled.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-8037:7: F/W Ready - OK.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-803a:7: fw_state=3 (7, 141, 0, 800 0) curr time=10000efb0.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-200b:7: HBA in NL topology.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2014:7: Configure loop -- dpc flags = 0x112e0.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2011:7: Entries in ID list (1).
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20d8:7: qla24xx_fcport_handle_login 50:00:e1:11:c5:27:f0:70 DS 0 LS 7 P 0 fl 0 confl 0000000000000000 rscn 0|0 login 0 lid 0 scan 2 fc4type 0
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-307b:7: qla_chk_n2n_b4_login 50:00:e1:11:c5:27:f0:70 DS 0 LS 7 lid 0 retries=30
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20bf:7: qla_chk_n2n_b4_login 1628 50:00:e1:11:c5:27:f0:70 post login
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2069:7: LOOP READY.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-206b:7: qla2x00_configure_loop: exiting normally. local port wwpn 2100000e1e2227a1 id 0000ef)
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4010:7: Loop resync end.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: DELETED to LOGIN_PEND - portid=000026.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2072:7: Async-login - 50:00:e1:11:c5:27:f0:70 hdl=2, loopid=0 portid=000026 retries=29 .
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5036:7: Async-login complete: handle=2 pid=000026 wwpn=50:00:e1:11:c5:27:f0:70 iop0=312
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20dd:7: qla2x00_async_login_sp_done 50:00:e1:11:c5:27:f0:70 res 0
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-ffff:7: qla24xx_handle_plogi_done_event 50:00:e1:11:c5:27:f0:70 DS 3 LS 7 rc 0 login 0|0 rscn 0|0 data 4000|0 iop 0|0
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ea:7: qla24xx_handle_plogi_done_event 2232 50:00:e1:11:c5:27:f0:70 LoopID 0x0 in use with 000026. post gpdb
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: LOGIN_PEND to GPDB - portid=000026.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20dc:7: Async-gpdb 50:00:e1:11:c5:27:f0:70 hndl 0 opt 0
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20db:7: Async done-gpdb res 0, WWPN 50:00:e1:11:c5:27:f0:70 mb[1]=0 mb[2]=ffd3
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20d2:7: qla24xx_handle_gpdb_event 50:00:e1:11:c5:27:f0:70 DS 5 LS 6 fc4_type 0 rc 0
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2062:7: 50:00:e1:11:c5:27:f0:70 SVC Param w3 0312
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: GPDB to UPD_FCPORT - portid=000026.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ef:7: qla2x00_update_fcport 50:00:e1:11:c5:27:f0:70
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: UPD_FCPORT to UPD_FCPORT - portid=000026.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:70 state transitioned from UNCONFIGURED to ONLINE - portid=000026.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ee:7: qla2x00_reg_remote_port: 5000e111c527f070. rport 7:0:0 (00000000380d4127) is tgt mode
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: UPD_FCPORT to LOGIN_COMPLETE - portid=000026.
Jun 22 14:09:35 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-911e:7: qla_register_fcport_fn rscn gen 0/0 next DS 0
Jun 22 14:09:35 tony13 kern.notice kernel: scsi 7:0:0:0: Sequential-Access IBM ULTRIUM-HH8 K4K1 PQ: 0 ANSI: 6
Jun 22 14:09:35 tony13 kern.notice kernel: st 7:0:0:0: Attached scsi tape st0
Jun 22 14:09:35 tony13 kern.info kernel: st 7:0:0:0: st0: try direct i/o: yes (alignment 8 B)

cat /proc/scsi/scsi

Attached devices:
Host: scsi7 Channel: 00 Id: 00 Lun: 00
Vendor: IBM Model: ULTRIUM-HH8 Rev: K4K1
Type: Sequential-Access ANSI SCSI revision: 06

COMMENT: Tape drive shows up as expected after cable plugged in.

Unplug cable

Jun 22 14:10:54 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
Jun 22 14:10:54 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20f1:7: Mark all dev lost

sleep 120

cat /proc/scsi/scsi

Attached devices:
Host: scsi7 Channel: 00 Id: 00 Lun: 00
Vendor: IBM Model: ULTRIUM-HH8 Rev: K4K1
Type: Sequential-Access ANSI SCSI revision: 06

COMMENT: Tape drive does not disappear after cable unplugged.

Plug in cable to tape drive B

Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-211a:7: Delaying session delete for FCP2 flags 0x4 port_type = 0x4 port_id=000026 50
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5009:7: LIP occurred (0).
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b7:7: Format 0 : Number of VPs setup 254, number of VPs acquired 1.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b8:7: Primary port id 0000ef.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5010:7: Port logout 0000 0007 0300.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-508a:7: Marking port lost loopid=0000 portid=000026.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: LOGIN_COMPLETE to DELETE_PEND - portid=000026.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-210a:7: qlt_unreg_sess sess 00000000fd4cb6a1 for deletion 50:00:e1:11:c5:27:f0:70
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: DELETE_PEND to DELETE_PEND - portid=000026.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-f084:7: qlt_free_session_done: se_sess 0000000000000000 / sess 00000000fd4cb6a1 from port 50:00:e1:11:c5:27:f0:70 loop_id 0x00 s_id 00:00:26 logout 0 keep 1 els_logo 0
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:70 state transitioned from ONLINE to LOST - portid=000026.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2109:7: qla2x00_schedule_rport_del 5000e111c527f070. rport 00000000380d4127 roles 1
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500a:7: LOOP UP detected (8 Gbps).
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5012:7: Port database changed ffff 0006 0000.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:70 disc_state transition: DELETE_PEND to DELETED - portid=000026.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-f001:7: Unregistration of sess 00000000fd4cb6a1 50:00:e1:11:c5:27:f0:70 finished fcp_cnt 0
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400f:7: Loop resync scheduled.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-8037:7: F/W Ready - OK.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-803a:7: fw_state=3 (7, 141, 0, 4 0) curr time=100016448.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-200b:7: HBA in NL topology.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2014:7: Configure loop -- dpc flags = 0x1260.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2011:7: Entries in ID list (0).
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-d050:7: Link reinitialized
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-206a:7: qla2x00_configure_loop *** FAILED ***.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4010:7: Loop resync end.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500c:7: LIP reset occurred (f7f7).
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5009:7: LIP occurred (f7f7).
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b7:7: Format 0 : Number of VPs setup 254, number of VPs acquired 1.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-10b8:7: Primary port id 0000ef.
Jun 22 14:14:33 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5012:7: Port database changed ffff 0006 0000.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400b:7: Reset marker scheduled.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400c:7: Reset marker end.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-400f:7: Loop resync scheduled.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-8037:7: F/W Ready - OK.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-803a:7: fw_state=3 (7, 141, 0, 4 0) curr time=1000164b0.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-200b:7: HBA in NL topology.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2014:7: Configure loop -- dpc flags = 0x1260.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2011:7: Entries in ID list (1).
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20d8:7: qla24xx_fcport_handle_login 50:00:e1:11:c5:27:f0:5c DS 0 LS 7 P 0 fl 0 confl 0000000000000000 rscn 0|0 login 0 lid 0 scan 2 fc4type 0
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-307b:7: qla_chk_n2n_b4_login 50:00:e1:11:c5:27:f0:5c DS 0 LS 7 lid 0 retries=30
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20bf:7: qla_chk_n2n_b4_login 1628 50:00:e1:11:c5:27:f0:5c post login
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2069:7: LOOP READY.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-206b:7: qla2x00_configure_loop: exiting normally. local port wwpn 2100000e1e2227a1 id 0000ef)
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4010:7: Loop resync end.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4000:7: DPC handler sleeping.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:5c disc_state transition: DELETED to LOGIN_PEND - portid=000026.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2072:7: Async-login - 50:00:e1:11:c5:27:f0:5c hdl=4, loopid=0 portid=000026 retries=29 .
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-5036:7: Async-login complete: handle=4 pid=000026 wwpn=50:00:e1:11:c5:27:f0:5c iop0=312
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20dd:7: qla2x00_async_login_sp_done 50:00:e1:11:c5:27:f0:5c res 0
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-ffff:7: qla24xx_handle_plogi_done_event 50:00:e1:11:c5:27:f0:5c DS 3 LS 7 rc 0 login 0|0 rscn 0|0 data 4000|0 iop 0|0
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ea:7: qla24xx_handle_plogi_done_event 2232 50:00:e1:11:c5:27:f0:5c LoopID 0x0 in use with 000026. post gpdb
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:5c disc_state transition: LOGIN_PEND to GPDB - portid=000026.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20dc:7: Async-gpdb 50:00:e1:11:c5:27:f0:5c hndl 0 opt 0
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20db:7: Async done-gpdb res 0, WWPN 50:00:e1:11:c5:27:f0:5c mb[1]=0 mb[2]=ffd3
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20d2:7: qla24xx_handle_gpdb_event 50:00:e1:11:c5:27:f0:5c DS 5 LS 6 fc4_type 0 rc 0
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2062:7: 50:00:e1:11:c5:27:f0:5c SVC Param w3 0312
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:5c disc_state transition: GPDB to UPD_FCPORT - portid=000026.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ef:7: qla2x00_update_fcport 50:00:e1:11:c5:27:f0:5c
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:5c disc_state transition: UPD_FCPORT to UPD_FCPORT - portid=000026.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:5c state transitioned from UNCONFIGURED to ONLINE - portid=000026.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20ee:7: qla2x00_reg_remote_port: 5000e111c527f05c. rport 7:0:1 (0000000091a61914) is tgt mode
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-2134:7: FCPort 50:00:e1:11:c5:27:f0:5c disc_state transition: UPD_FCPORT to LOGIN_COMPLETE - portid=000026.
Jun 22 14:14:34 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-911e:7: qla_register_fcport_fn rscn gen 0/0 next DS 0
Jun 22 14:14:34 tony13 kern.notice kernel: scsi 7:0:1:0: Sequential-Access IBM ULTRIUM-HH9 NCA1 PQ: 0 ANSI: 6
Jun 22 14:14:34 tony13 kern.notice kernel: st 7:0:1:0: Attached scsi tape st1
Jun 22 14:14:34 tony13 kern.info kernel: st 7:0:1:0: st1: try direct i/o: yes (alignment 8 B)
Jun 22 14:14:34 tony13 kern.notice kernel: scsi 7:0:1:1: Medium Changer CYBERNET CY-LTOLIB 1.40 PQ: 0 ANSI: 5

cat /proc/scsi/scsi

Attached devices:
Host: scsi7 Channel: 00 Id: 00 Lun: 00
Vendor: IBM Model: ULTRIUM-HH8 Rev: K4K1
Type: Sequential-Access ANSI SCSI revision: 06
Host: scsi7 Channel: 00 Id: 01 Lun: 00
Vendor: IBM Model: ULTRIUM-HH9 Rev: NCA1
Type: Sequential-Access ANSI SCSI revision: 06
Host: scsi7 Channel: 00 Id: 01 Lun: 01
Vendor: CYBERNET Model: CY-LTOLIB Rev: 1.40
Type: Medium Changer ANSI SCSI revision: 05

COMMENT: Tape drive B shows up as expected; tape drive A disappears 30 seconds later:

Jun 22 14:15:03 tony13 kern.err kernel: rport-7:0-0: blocked FC remote port time out: removing target and saving binding
Jun 22 14:15:03 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:70 state transitioned from LOST to DEAD - portid=000026.

cat /proc/scsi/scsi

Attached devices:
Host: scsi7 Channel: 00 Id: 01 Lun: 00
Vendor: IBM Model: ULTRIUM-HH9 Rev: NCA1
Type: Sequential-Access ANSI SCSI revision: 06
Host: scsi7 Channel: 00 Id: 01 Lun: 01
Vendor: CYBERNET Model: CY-LTOLIB Rev: 1.40
Type: Medium Changer ANSI SCSI revision: 05

Unplug cable

Jun 22 14:15:31 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-500b:7: LOOP DOWN detected (2 7 0 0).
Jun 22 14:15:31 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20f1:7: Mark all dev lost

cat /proc/scsi/scsi

Attached devices:
Host: scsi7 Channel: 00 Id: 01 Lun: 00
Vendor: IBM Model: ULTRIUM-HH9 Rev: NCA1
Type: Sequential-Access ANSI SCSI revision: 06
Host: scsi7 Channel: 00 Id: 01 Lun: 01
Vendor: CYBERNET Model: CY-LTOLIB Rev: 1.40
Type: Medium Changer ANSI SCSI revision: 05

COMMENT: Tape drive does not disappear after cable unplugged.

rmmod qla2xxx

Jun 22 14:16:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-211a:7: Delaying session delete for FCP2 flags 0x4 port_type = 0x4 port_id=000026 50
Jun 22 14:16:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-b079:7: Removing driver
Jun 22 14:16:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-00af:7: Performing ISP error recovery - ha=0000000070375427.
Jun 22 14:16:28 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-20f1:7: Mark all dev lost
Jun 22 14:16:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-211a:7: Delaying session delete for FCP2 flags 0x4 port_type = 0x4 port_id=000026 50
Jun 22 14:16:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-4011:7: DPC handler exiting.
Jun 22 14:16:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.1]-207d:7: FCPort 50:00:e1:11:c5:27:f0:5c state transitioned from ONLINE to DEAD - portid=000026.
Jun 22 14:16:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-b079:6: Removing driver
Jun 22 14:16:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-00af:6: Performing ISP error recovery - ha=000000006f17f9d2.
Jun 22 14:16:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-20f1:6: Mark all dev lost
Jun 22 14:16:39 tony13 kern.warn kernel: qla2xxx [0000:83:00.0]-4011:6: DPC handler exiting.

Tony Battersby
Cybernetics

2022-06-22 23:07:20

by Arun Easi

[permalink] [raw]
Subject: Re: [EXT] Re: [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable

Hi Tony,

On Wed, 22 Jun 2022, 7:56am, Tony Battersby wrote:

> On 6/21/22 18:05, Arun Easi wrote:
> > Thanks for the info. Just to reiterate, you've reported two issues (though
> > this log was showing only 1 of them).
> >
> > Issue 1 - Tape device never disappears when removed
> > Issue 2 - When a direct connected tape 1 was replaced with tape 2, tape 2
> > was not discovered.
> >
> > For Issue-2, please try the attached patch. This may not be the final fix,
> > but wanted to check if that would fix the issue for you.
> >
> > For Issue-1, the behavior was intentional, though that behavior needs
> > refinement. These tape drives support something called FC sequence level
> > error recovery (added in FCP-2), which can make tape I/Os survive even
> > across a short cable pull. This is not a simple retry of the I/O, rather a
> > retry done at the FC sequence level that gives the IO a better chance of
> > revival. In other words, the said patch that caused regression, while
> > introduces an incorrect reporting of the state of the device, makes backup
> > more resilient.
> >
> > Now, onto the behavior when device state is reported immediately. What we
> > have observed, at least with one tape drive from a major vendor, is that,
> > across a device loss and device back case with both the events reported to
> > upper layers, the backup operation was getting failed. This is due to a
> > REPORT LUNS command being issued during device reappearance reporting
> > (fc_remote_port_add -> SCSI scan), which the tape drive was not expecting
> > and caused the backup to fail.
> >
> > I know that some tape drives do not support multiple commands to it at the
> > same time, but not sure if that is still the norm these days.
> >
> > So, perhaps one way to make the behavior better, is to either report the
> > disappearing device a bit delayed or have intelligence added in SCSI scan
> > to detect ongoing tape IO operations and delay/avoid the REPORT LUNs.
> > Former is a more contained (in the LLD) fix.
> >
> > Regards,
> > -Arun
>
> Your patch does fix Issue-2 for me.  For Issue-1, it would be fine with
> me if qla2xxx reported device removal to the upper level a bit delayed,
> as you said.
>

Thanks for testing and verifying the patch. We will post the patch
upstream after due testing.

Regards,
-Arun

Subject: Re: [EXT] Re: [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable

On 23.06.22 01:03, Arun Easi wrote:
> On Wed, 22 Jun 2022, 7:56am, Tony Battersby wrote:
>
>> On 6/21/22 18:05, Arun Easi wrote:
>>> Thanks for the info. Just to reiterate, you've reported two issues (though
>>> this log was showing only 1 of them).
>>>
>>> Issue 1 - Tape device never disappears when removed
>>> Issue 2 - When a direct connected tape 1 was replaced with tape 2, tape 2
>>> was not discovered.
>>>
>>> For Issue-2, please try the attached patch. This may not be the final fix,
>>> but wanted to check if that would fix the issue for you.
>>>
>>> For Issue-1, the behavior was intentional, though that behavior needs
>>> refinement. These tape drives support something called FC sequence level
>>> error recovery (added in FCP-2), which can make tape I/Os survive even
>>> across a short cable pull. This is not a simple retry of the I/O, rather a
>>> retry done at the FC sequence level that gives the IO a better chance of
>>> revival. In other words, the said patch that caused regression, while
>>> introduces an incorrect reporting of the state of the device, makes backup
>>> more resilient.
>>>
>>> Now, onto the behavior when device state is reported immediately. What we
>>> have observed, at least with one tape drive from a major vendor, is that,
>>> across a device loss and device back case with both the events reported to
>>> upper layers, the backup operation was getting failed. This is due to a
>>> REPORT LUNS command being issued during device reappearance reporting
>>> (fc_remote_port_add -> SCSI scan), which the tape drive was not expecting
>>> and caused the backup to fail.
>>>
>>> I know that some tape drives do not support multiple commands to it at the
>>> same time, but not sure if that is still the norm these days.
>>>
>>> So, perhaps one way to make the behavior better, is to either report the
>>> disappearing device a bit delayed or have intelligence added in SCSI scan
>>> to detect ongoing tape IO operations and delay/avoid the REPORT LUNs.
>>> Former is a more contained (in the LLD) fix.
>>>
>>> Regards,
>>> -Arun
>>
>> Your patch does fix Issue-2 for me.  For Issue-1, it would be fine with
>> me if qla2xxx reported device removal to the upper level a bit delayed,
>> as you said.
>>
>
> Thanks for testing and verifying the patch.

BTW, that patch should have 'Link:' tags pointing to all reports about
this issue, e.g. the start of this thread.

These tags are important, as they allow others to look into the
backstory now and years from now. That is why they should be placed in
cases like this, as Documentation/process/submitting-patches.rst and
Documentation/process/5.Posting.rst explain in more detail.
Additionally, my regression tracking bot ‘regzbot’ relies on these tags
to automatically connect reports with patches that are posted or
committed to fix the reported issue. BTW, let me tell regzbot to monitor
this thread:

> We will post the patch upstream after due testing.
That was more than two weeks ago now and I didn't see any progress. Or
did I miss it?

Reminder, things should take this long. For details see the section
"Prioritize work on fixing regressions" in this document:
https://docs.kernel.org/process/handling-regressions.html

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

#regzbot poke

2022-07-06 18:08:05

by Arun Easi

[permalink] [raw]
Subject: Re: [EXT] Re: [REGRESSION] qla2xxx: tape drive not removed after unplug FC cable

Hi Thorsten,

On Mon, 4 Jul 2022, 5:06am, Thorsten Leemhuis wrote:

> On 23.06.22 01:03, Arun Easi wrote:
> > On Wed, 22 Jun 2022, 7:56am, Tony Battersby wrote:
> >
> >> On 6/21/22 18:05, Arun Easi wrote:
> >>> Thanks for the info. Just to reiterate, you've reported two issues (though
> >>> this log was showing only 1 of them).
> >>>
> >>> Issue 1 - Tape device never disappears when removed
> >>> Issue 2 - When a direct connected tape 1 was replaced with tape 2, tape 2
> >>> was not discovered.
> >>>
> >>> For Issue-2, please try the attached patch. This may not be the final fix,
> >>> but wanted to check if that would fix the issue for you.
> >>>
> >>> For Issue-1, the behavior was intentional, though that behavior needs
> >>> refinement. These tape drives support something called FC sequence level
> >>> error recovery (added in FCP-2), which can make tape I/Os survive even
> >>> across a short cable pull. This is not a simple retry of the I/O, rather a
> >>> retry done at the FC sequence level that gives the IO a better chance of
> >>> revival. In other words, the said patch that caused regression, while
> >>> introduces an incorrect reporting of the state of the device, makes backup
> >>> more resilient.
> >>>
> >>> Now, onto the behavior when device state is reported immediately. What we
> >>> have observed, at least with one tape drive from a major vendor, is that,
> >>> across a device loss and device back case with both the events reported to
> >>> upper layers, the backup operation was getting failed. This is due to a
> >>> REPORT LUNS command being issued during device reappearance reporting
> >>> (fc_remote_port_add -> SCSI scan), which the tape drive was not expecting
> >>> and caused the backup to fail.
> >>>
> >>> I know that some tape drives do not support multiple commands to it at the
> >>> same time, but not sure if that is still the norm these days.
> >>>
> >>> So, perhaps one way to make the behavior better, is to either report the
> >>> disappearing device a bit delayed or have intelligence added in SCSI scan
> >>> to detect ongoing tape IO operations and delay/avoid the REPORT LUNs.
> >>> Former is a more contained (in the LLD) fix.
> >>>
> >>> Regards,
> >>> -Arun
> >>
> >> Your patch does fix Issue-2 for me.? For Issue-1, it would be fine with
> >> me if qla2xxx reported device removal to the upper level a bit delayed,
> >> as you said.
> >>
> >
> > Thanks for testing and verifying the patch.
>
> BTW, that patch should have 'Link:' tags pointing to all reports about
> this issue, e.g. the start of this thread.

Thanks, will add, I was not aware of this.

>
> These tags are important, as they allow others to look into the
> backstory now and years from now. That is why they should be placed in
> cases like this, as Documentation/process/submitting-patches.rst and
> Documentation/process/5.Posting.rst explain in more detail.
> Additionally, my regression tracking bot ?regzbot? relies on these tags
> to automatically connect reports with patches that are posted or
> committed to fix the reported issue. BTW, let me tell regzbot to monitor
> this thread:
>
> > We will post the patch upstream after due testing.
> That was more than two weeks ago now and I didn't see any progress. Or
> did I miss it?

No, the fix is being prepared to be posted.

Regards,
-Arun