2018-03-31 02:00:35

by Wakko Warner

[permalink] [raw]
Subject: 4.15.14 crash with iscsi target and dvd

I reported this before but noone responded.

I have an iscsi target setup with /dev/sr[012] using pscsi. On the
initiator, I mount only 1 disc. Then I issue find -type f | xargs cat >
/dev/null Then after a few seconds, I get 2 oops and the system has to be
hard reset.

I noticed if I cat /dev/sr1 from the initiator, it doesn't crash. I was
also able to burn without crashing.

Here's the OOPS:
[2692.733468] WARNING: CPU: 8 PID: 0 at /usr/src/linux/dist/4.15.14-nobklcd/drivers/scsi/scsi_lib.c:1068 scsi_init_io+0x111/0x1a0 [scsi_mod]
[2692.734154] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic md_mod dm_crypt algif_skcipher af_alg dm_mod dax ext4 crc16 mbcache jbd2 af_packet iscsi_target_mod tcm_loop vhost_scsi vhost target_core_file target_core_iblock target_core_pscsi target_core_mod nfsd exportfs dummy bridge stp llc ib_iser rdma_cm iw_cm ib_cm ib_core ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi netconsole configfs sr_mod cdrom adt7475 hwmon_vid sd_mod sg coretemp x86_pkg_temp_thermal kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc snd_hda_codec_realtek snd_hda_codec_generic nouveau video led_class drm_kms_helper cfbfillrect syscopyarea cfbimgblt
[2692.737388] sysfillrect sysimgblt snd_hda_intel fb_sys_fops cfbcopyarea snd_hda_codec ttm snd_hda_core snd_pcm_oss drm snd_mixer_oss agpgart snd_pcm igb snd_timer snd aesni_intel soundcore hwmon aes_x86_64 i2c_algo_bit ahci mpt3sas crypto_simd i2c_core libahci glue_helper mptsas raid_class libata mptscsih scsi_transport_sas mptbase scsi_mod wmi button unix
[2692.737388] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.15.14 #1
[2692.737388] Hardware name: Dell Inc. Precision T5610/0WN7Y6, BIOS A16 02/05/2018
[2692.737388] RIP: 0010:scsi_init_io+0x111/0x1a0 [scsi_mod]
[2692.737388] RSP: 0018:ffff8806b7a03d78 EFLAGS: 00010046
[2692.737388] RAX: 0000000000000000 RBX: ffff8806aa4a9c00 RCX: 0000000000000000
[2692.737388] RDX: 0000000000000000 RSI: ffff8806aa4a9c00 RDI: ffff8806aa4a9d48
[2692.737388] RBP: ffff8806aa4a9d48 R08: 0000000000000000 R09: ffff8806aa4a9d80
[2692.737388] R10: ffff8806ab949088 R11: 0000000000000000 R12: ffff8806b29bb000
[2692.737388] R13: 0000000000000000 R14: ffff8806b29bb000 R15: ffff8806ac4f6950
[2692.737388] FS: 0000000000000000(0000) GS:ffff8806b7a00000(0000) knlGS:0000000000000000
[2692.737388] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2692.737388] CR2: 00007f1359a8b756 CR3: 0000000001c09003 CR4: 00000000001606e0
[2692.737388] Call Trace:
[2692.737388] <IRQ>
[2692.737388] ? scsi_setup_cmnd+0xb3/0x140 [scsi_mod]
[2692.737388] ? scsi_prep_fn+0x53/0x130 [scsi_mod]
[2692.737388] ? blk_peek_request+0x136/0x220
[2692.737388] ? scsi_request_fn+0x2b/0x510 [scsi_mod]
[2692.737388] ? __blk_run_queue+0x34/0x50
[2692.737388] ? blk_run_queue+0x26/0x40
[2692.737388] ? scsi_run_queue+0x229/0x2b0 [scsi_mod]
[2692.737388] ? scsi_io_completion+0x3ce/0x5a0 [scsi_mod]
[2692.737388] ? blk_done_softirq+0x67/0x80
[2692.737388] ? __do_softirq+0xdb/0x1dd
[2692.737388] ? irq_exit+0xa3/0xb0
[2692.737388] ? do_IRQ+0x45/0xc0
[2692.737388] ? common_interrupt+0x77/0x77
[2692.737388] </IRQ>
[2692.737388] ? cpuidle_enter_state+0x124/0x200
[2692.737388] ? cpuidle_enter_state+0x119/0x200
[2692.737388] ? do_idle+0xdc/0x180
[2692.737388] ? cpu_startup_entry+0x14/0x20
[2692.737388] ? secondary_startup_64+0xa5/0xb0
[2692.737388] Code: 8b 7b 30 e8 d2 6b 20 e1 49 8b 17 4c 89 ff 89 c6 89 44 24 04 e8 81 81 22 e1 85 c0 41 89 c4 74 55 41 bc 02 00 00 00 e9 39 ff ff ff <0f> 0b 41 bc 01 00 00 00 e9 2c ff ff ff 48 8b 3d 6b dc 00 00 be
[2692.737388] ---[ end trace 9801970f9b249e10 ]---
[2692.737388] ------------[ cut here ]------------
[2692.737388] kernel BUG at /usr/src/linux/dist/4.15.14-nobklcd/block/blk-core.c:3235!
[2692.737388] invalid opcode: 0000 [#1] PREEMPT SMP
[2692.737388] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic md_mod dm_crypt algif_skcipher af_alg dm_mod dax ext4 crc16 mbcache jbd2 af_packet iscsi_target_mod tcm_loop vhost_scsi vhost target_core_file target_core_iblock target_core_pscsi target_core_mod nfsd exportfs dummy bridge stp llc ib_iser rdma_cm iw_cm ib_cm ib_core ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi netconsole configfs sr_mod cdrom adt7475 hwmon_vid sd_mod sg coretemp x86_pkg_temp_thermal kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc snd_hda_codec_realtek snd_hda_codec_generic nouveau video led_class drm_kms_helper cfbfillrect syscopyarea cfbimgblt
[2692.737388] sysfillrect sysimgblt snd_hda_intel fb_sys_fops cfbcopyarea snd_hda_codec ttm snd_hda_core snd_pcm_oss drm snd_mixer_oss agpgart snd_pcm igb snd_timer snd aesni_intel soundcore hwmon aes_x86_64 i2c_algo_bit ahci mpt3sas crypto_simd i2c_core libahci glue_helper mptsas raid_class libata mptscsih scsi_transport_sas mptbase scsi_mod wmi button unix
[2692.737388] CPU: 8 PID: 0 Comm: swapper/8 Tainted: G W 4.15.14 #1
[2692.737388] Hardware name: Dell Inc. Precision T5610/0WN7Y6, BIOS A16 02/05/2018
[2692.737388] RIP: 0010:__blk_end_request_all+0x50/0x60
[2692.737388] RSP: 0018:ffff8806b7a03df8 EFLAGS: 00010002
[2692.737388] RAX: 0000000000000001 RBX: ffff8806ac4f6950 RCX: 0000000000000000
[2692.737388] RDX: 0000000000000001 RSI: ffff8806ac36a480 RDI: 0000000000000000
[2692.737388] RBP: 0000000000000001 R08: 0000000000000000 R09: ffff8806aa4a9d80
[2692.737388] R10: ffff8806ab949088 R11: 0000000000000000 R12: ffff8806aa4a9c00
[2692.737388] R13: ffff8806ac4f6950 R14: 0000000000000246 R15: ffff8806ac4f6950
[2692.737388] FS: 0000000000000000(0000) GS:ffff8806b7a00000(0000) knlGS:0000000000000000
[2692.737388] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2692.737388] CR2: 00007f1359a8b756 CR3: 0000000001c09003 CR4: 00000000001606e0
[2692.737388] Call Trace:
[2692.737388] <IRQ>
[2692.737388] ? blk_peek_request+0x173/0x220
[2692.737388] ? scsi_request_fn+0x2b/0x510 [scsi_mod]
[2692.737388] ? __blk_run_queue+0x34/0x50
[2692.737388] ? blk_run_queue+0x26/0x40
[2692.737388] ? scsi_run_queue+0x229/0x2b0 [scsi_mod]
[2692.737388] ? scsi_io_completion+0x3ce/0x5a0 [scsi_mod]
[2692.737388] ? blk_done_softirq+0x67/0x80
[2692.737388] ? __do_softirq+0xdb/0x1dd
[2692.737388] ? irq_exit+0xa3/0xb0
[2692.737388] ? do_IRQ+0x45/0xc0
[2692.737388] ? common_interrupt+0x77/0x77
[2692.737388] </IRQ>
[2692.737388] ? cpuidle_enter_state+0x124/0x200
[2692.737388] ? cpuidle_enter_state+0x119/0x200
[2692.737388] ? do_idle+0xdc/0x180
[2692.737388] ? cpu_startup_entry+0x14/0x20
[2692.737388] ? secondary_startup_64+0xa5/0xb0
[2692.737388] Code: ff ff ff 84 c0 75 24 c3 0f 0b 48 8b 87 40 01 00 00 31 c9 48 85 c0 74 df 8b 48 58 40 0f b6 f6 8b 57 58 e8 04 ff ff ff 84 c0 74 dc <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 41 54 55 53 48
[2692.737388] RIP: __blk_end_request_all+0x50/0x60 RSP: ffff8806b7a03df8
[2692.737388] ---[ end trace 9801970f9b249e11 ]---
[2692.737388] Kernel panic - not syncing: Fatal exception in interrupt
[2692.737388] Kernel Offset: disabled
[2692.737388] ---[ end Kernel panic - not syncing: Fatal exception in interrupt


2018-03-31 21:01:28

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Wakko Warner wrote:
> I reported this before but noone responded.
>
> I have an iscsi target setup with /dev/sr[012] using pscsi. On the
> initiator, I mount only 1 disc. Then I issue find -type f | xargs cat >
> /dev/null Then after a few seconds, I get 2 oops and the system has to be
> hard reset.
>
> I noticed if I cat /dev/sr1 from the initiator, it doesn't crash. I was
> also able to burn without crashing.

I'd like to add that 4.9.91 does not crash. I haven't tested newer kernels
yet.

2018-03-31 21:10:36

by Richard Weinberger

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner <[email protected]> wrote:
> I reported this before but noone responded.

Because you're sending only to LKML.
CC'ing storage folks.

> I have an iscsi target setup with /dev/sr[012] using pscsi. On the
> initiator, I mount only 1 disc. Then I issue find -type f | xargs cat >
> /dev/null Then after a few seconds, I get 2 oops and the system has to be
> hard reset.
>
> I noticed if I cat /dev/sr1 from the initiator, it doesn't crash. I was
> also able to burn without crashing.
>
> Here's the OOPS:
> [2692.733468] WARNING: CPU: 8 PID: 0 at /usr/src/linux/dist/4.15.14-nobklcd/drivers/scsi/scsi_lib.c:1068 scsi_init_io+0x111/0x1a0 [scsi_mod]
> [2692.734154] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic md_mod dm_crypt algif_skcipher af_alg dm_mod dax ext4 crc16 mbcache jbd2 af_packet iscsi_target_mod tcm_loop vhost_scsi vhost target_core_file target_core_iblock target_core_pscsi target_core_mod nfsd exportfs dummy bridge stp llc ib_iser rdma_cm iw_cm ib_cm ib_core ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi netconsole configfs sr_mod cdrom adt7475 hwmon_vid sd_mod sg coretemp x86_pkg_temp_thermal kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc snd_hda_codec_realtek snd_hda_codec_generic nouveau video led_class drm_kms_helper cfbfillrect syscopyarea cfbimgblt
> [2692.737388] sysfillrect sysimgblt snd_hda_intel fb_sys_fops cfbcopyarea snd_hda_codec ttm snd_hda_core snd_pcm_oss drm snd_mixer_oss agpgart snd_pcm igb snd_timer snd aesni_intel soundcore hwmon aes_x86_64 i2c_algo_bit ahci mpt3sas crypto_simd i2c_core libahci glue_helper mptsas raid_class libata mptscsih scsi_transport_sas mptbase scsi_mod wmi button unix
> [2692.737388] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.15.14 #1
> [2692.737388] Hardware name: Dell Inc. Precision T5610/0WN7Y6, BIOS A16 02/05/2018
> [2692.737388] RIP: 0010:scsi_init_io+0x111/0x1a0 [scsi_mod]
> [2692.737388] RSP: 0018:ffff8806b7a03d78 EFLAGS: 00010046
> [2692.737388] RAX: 0000000000000000 RBX: ffff8806aa4a9c00 RCX: 0000000000000000
> [2692.737388] RDX: 0000000000000000 RSI: ffff8806aa4a9c00 RDI: ffff8806aa4a9d48
> [2692.737388] RBP: ffff8806aa4a9d48 R08: 0000000000000000 R09: ffff8806aa4a9d80
> [2692.737388] R10: ffff8806ab949088 R11: 0000000000000000 R12: ffff8806b29bb000
> [2692.737388] R13: 0000000000000000 R14: ffff8806b29bb000 R15: ffff8806ac4f6950
> [2692.737388] FS: 0000000000000000(0000) GS:ffff8806b7a00000(0000) knlGS:0000000000000000
> [2692.737388] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [2692.737388] CR2: 00007f1359a8b756 CR3: 0000000001c09003 CR4: 00000000001606e0
> [2692.737388] Call Trace:
> [2692.737388] <IRQ>
> [2692.737388] ? scsi_setup_cmnd+0xb3/0x140 [scsi_mod]
> [2692.737388] ? scsi_prep_fn+0x53/0x130 [scsi_mod]
> [2692.737388] ? blk_peek_request+0x136/0x220
> [2692.737388] ? scsi_request_fn+0x2b/0x510 [scsi_mod]
> [2692.737388] ? __blk_run_queue+0x34/0x50
> [2692.737388] ? blk_run_queue+0x26/0x40
> [2692.737388] ? scsi_run_queue+0x229/0x2b0 [scsi_mod]
> [2692.737388] ? scsi_io_completion+0x3ce/0x5a0 [scsi_mod]
> [2692.737388] ? blk_done_softirq+0x67/0x80
> [2692.737388] ? __do_softirq+0xdb/0x1dd
> [2692.737388] ? irq_exit+0xa3/0xb0
> [2692.737388] ? do_IRQ+0x45/0xc0
> [2692.737388] ? common_interrupt+0x77/0x77
> [2692.737388] </IRQ>
> [2692.737388] ? cpuidle_enter_state+0x124/0x200
> [2692.737388] ? cpuidle_enter_state+0x119/0x200
> [2692.737388] ? do_idle+0xdc/0x180
> [2692.737388] ? cpu_startup_entry+0x14/0x20
> [2692.737388] ? secondary_startup_64+0xa5/0xb0
> [2692.737388] Code: 8b 7b 30 e8 d2 6b 20 e1 49 8b 17 4c 89 ff 89 c6 89 44 24 04 e8 81 81 22 e1 85 c0 41 89 c4 74 55 41 bc 02 00 00 00 e9 39 ff ff ff <0f> 0b 41 bc 01 00 00 00 e9 2c ff ff ff 48 8b 3d 6b dc 00 00 be
> [2692.737388] ---[ end trace 9801970f9b249e10 ]---
> [2692.737388] ------------[ cut here ]------------
> [2692.737388] kernel BUG at /usr/src/linux/dist/4.15.14-nobklcd/block/blk-core.c:3235!
> [2692.737388] invalid opcode: 0000 [#1] PREEMPT SMP
> [2692.737388] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic md_mod dm_crypt algif_skcipher af_alg dm_mod dax ext4 crc16 mbcache jbd2 af_packet iscsi_target_mod tcm_loop vhost_scsi vhost target_core_file target_core_iblock target_core_pscsi target_core_mod nfsd exportfs dummy bridge stp llc ib_iser rdma_cm iw_cm ib_cm ib_core ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi netconsole configfs sr_mod cdrom adt7475 hwmon_vid sd_mod sg coretemp x86_pkg_temp_thermal kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc snd_hda_codec_realtek snd_hda_codec_generic nouveau video led_class drm_kms_helper cfbfillrect syscopyarea cfbimgblt
> [2692.737388] sysfillrect sysimgblt snd_hda_intel fb_sys_fops cfbcopyarea snd_hda_codec ttm snd_hda_core snd_pcm_oss drm snd_mixer_oss agpgart snd_pcm igb snd_timer snd aesni_intel soundcore hwmon aes_x86_64 i2c_algo_bit ahci mpt3sas crypto_simd i2c_core libahci glue_helper mptsas raid_class libata mptscsih scsi_transport_sas mptbase scsi_mod wmi button unix
> [2692.737388] CPU: 8 PID: 0 Comm: swapper/8 Tainted: G W 4.15.14 #1
> [2692.737388] Hardware name: Dell Inc. Precision T5610/0WN7Y6, BIOS A16 02/05/2018
> [2692.737388] RIP: 0010:__blk_end_request_all+0x50/0x60
> [2692.737388] RSP: 0018:ffff8806b7a03df8 EFLAGS: 00010002
> [2692.737388] RAX: 0000000000000001 RBX: ffff8806ac4f6950 RCX: 0000000000000000
> [2692.737388] RDX: 0000000000000001 RSI: ffff8806ac36a480 RDI: 0000000000000000
> [2692.737388] RBP: 0000000000000001 R08: 0000000000000000 R09: ffff8806aa4a9d80
> [2692.737388] R10: ffff8806ab949088 R11: 0000000000000000 R12: ffff8806aa4a9c00
> [2692.737388] R13: ffff8806ac4f6950 R14: 0000000000000246 R15: ffff8806ac4f6950
> [2692.737388] FS: 0000000000000000(0000) GS:ffff8806b7a00000(0000) knlGS:0000000000000000
> [2692.737388] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [2692.737388] CR2: 00007f1359a8b756 CR3: 0000000001c09003 CR4: 00000000001606e0
> [2692.737388] Call Trace:
> [2692.737388] <IRQ>
> [2692.737388] ? blk_peek_request+0x173/0x220
> [2692.737388] ? scsi_request_fn+0x2b/0x510 [scsi_mod]
> [2692.737388] ? __blk_run_queue+0x34/0x50
> [2692.737388] ? blk_run_queue+0x26/0x40
> [2692.737388] ? scsi_run_queue+0x229/0x2b0 [scsi_mod]
> [2692.737388] ? scsi_io_completion+0x3ce/0x5a0 [scsi_mod]
> [2692.737388] ? blk_done_softirq+0x67/0x80
> [2692.737388] ? __do_softirq+0xdb/0x1dd
> [2692.737388] ? irq_exit+0xa3/0xb0
> [2692.737388] ? do_IRQ+0x45/0xc0
> [2692.737388] ? common_interrupt+0x77/0x77
> [2692.737388] </IRQ>
> [2692.737388] ? cpuidle_enter_state+0x124/0x200
> [2692.737388] ? cpuidle_enter_state+0x119/0x200
> [2692.737388] ? do_idle+0xdc/0x180
> [2692.737388] ? cpu_startup_entry+0x14/0x20
> [2692.737388] ? secondary_startup_64+0xa5/0xb0
> [2692.737388] Code: ff ff ff 84 c0 75 24 c3 0f 0b 48 8b 87 40 01 00 00 31 c9 48 85 c0 74 df 8b 48 58 40 0f b6 f6 8b 57 58 e8 04 ff ff ff 84 c0 74 dc <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 41 54 55 53 48
> [2692.737388] RIP: __blk_end_request_all+0x50/0x60 RSP: ffff8806b7a03df8
> [2692.737388] ---[ end trace 9801970f9b249e11 ]---
> [2692.737388] Kernel panic - not syncing: Fatal exception in interrupt
> [2692.737388] Kernel Offset: disabled
> [2692.737388] ---[ end Kernel panic - not syncing: Fatal exception in interrupt



--
Thanks,
//richard

2018-03-31 22:14:17

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Richard Weinberger wrote:
> On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner <[email protected]> wrote:
> > I reported this before but noone responded.
>
> Because you're sending only to LKML.
> CC'ing storage folks.

Thank you. I wasn't sure who I needed to send it to.

2018-04-01 11:41:15

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Bart Van Assche wrote:
> On Sat, 2018-03-31 at 18:12 -0400, Wakko Warner wrote:
> > Richard Weinberger wrote:
> > > On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner <[email protected]> wrote:
> > > > I reported this before but noone responded.
> > >
> > > Because you're sending only to LKML.
> > > CC'ing storage folks.
> >
> > Thank you. I wasn't sure who I needed to send it to.
>
> Can you share the output of lsscsi? I would like to know whether or not you
> are using a (S)ATA CDROM.

From the target:
[4:0:0:0] cd/dvd ATAPI iHAS224 B GL05 /dev/sr0
[5:0:0:0] cd/dvd ATAPI iHAS422 8 4L11 /dev/sr1
[6:0:0:0] cd/dvd PBDS DVD+-RW DH-16W1S 2D14 /dev/sr2

From the initiator:
[19:0:0:0] cd/dvd ATAPI iHAS224 B GL05 /dev/sr1
[19:0:0:1] cd/dvd ATAPI iHAS422 8 4L11 /dev/sr2
[19:0:0:2] cd/dvd PBDS DVD+-RW DH-16W1S 2D14 /dev/sr3


I tested 4.14.32 last night with the same oops. 4.9.91 works fine.
From the initiator, if I do cat /dev/sr1 > /dev/null it works. If I mount
/dev/sr1 and then do find -type f | xargs cat > /dev/null the target
crashes. I'm using the builtin iscsi target with pscsi. I can burn from
the initiator with out problems. I'll test other kernels between 4.9 and
4.14.

2018-04-01 15:05:03

by Bart Van Assche

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

On Sun, 2018-04-01 at 07:37 -0400, Wakko Warner wrote:
> Bart Van Assche wrote:
> > On Sat, 2018-03-31 at 18:12 -0400, Wakko Warner wrote:
> > > Richard Weinberger wrote:
> > > > On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner <[email protected]> wrote:
> > > > > I reported this before but noone responded.
> > > >
> > > > Because you're sending only to LKML.
> > > > CC'ing storage folks.
> > >
> > > Thank you. I wasn't sure who I needed to send it to.
> >
> > Can you share the output of lsscsi? I would like to know whether or not you
> > are using a (S)ATA CDROM.
>
> From the target:
> [4:0:0:0] cd/dvd ATAPI iHAS224 B GL05 /dev/sr0
> [5:0:0:0] cd/dvd ATAPI iHAS422 8 4L11 /dev/sr1
> [6:0:0:0] cd/dvd PBDS DVD+-RW DH-16W1S 2D14 /dev/sr2
>
> From the initiator:
> [19:0:0:0] cd/dvd ATAPI iHAS224 B GL05 /dev/sr1
> [19:0:0:1] cd/dvd ATAPI iHAS422 8 4L11 /dev/sr2
> [19:0:0:2] cd/dvd PBDS DVD+-RW DH-16W1S 2D14 /dev/sr3
>
> I tested 4.14.32 last night with the same oops. 4.9.91 works fine.
> From the initiator, if I do cat /dev/sr1 > /dev/null it works. If I mount
> /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> crashes. I'm using the builtin iscsi target with pscsi. I can burn from
> the initiator with out problems. I'll test other kernels between 4.9 and
> 4.14.

(+Lee and Chris)

Hello Wakko,

Although I'm not sure that what I ran into is exactly the same as what you
ran into, there is definitely something wrong with what I encountered. What
I ran into with Linus' latest master branch indicates two issues - one in
the iSCSI initiator and one in the block layer:

scsi 3:0:0:1: Direct-Access LIO-ORG FILEIO 4.0 PQ: 0 ANSI: 5
sd 2:0:0:1: [sdd] Attached SCSI disk
sd 3:0:0:1: Warning! Received an indication that the LUN assignments on this
target have changed. The Linux SCSI layer does not automatical
sd 3:0:0:1: Attached scsi generic sg8 type 0
sd 3:0:0:1: [sdf] 128 512-byte logical blocks: (65.5 kB/64.0 KiB)
sd 3:0:0:1: [sdf] Write Protect is off
sd 3:0:0:1: [sdf] Mode Sense: 43 00 00 08
sd 3:0:0:1: [sdf] Write cache: disabled, read cache: enabled, doesn't
support DPO or FUA
iSCSI/iqn.1993-08.org.debian:01:3b68b1b3d2eb: Unsupported SCSI Opcode 0xa3,
sending CHECK_CONDITION.
sd 3:0:0:2: [sde] Attached SCSI disk
sd 3:0:0:1: [sdf] Attached SCSI disk

=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
4.16.0-rc7-dbg+ #3 Not tainted
-----------------------------------------------------
kworker/6:1H/155 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
(&(&session->frwd_lock)->rlock){+.-.}, at: [<000000007eb678ec>]
iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]

and this task is already holding:
(&(&q->__queue_lock)->rlock){-.-.}, at: [<000000003c5841ec>]
blk_timeout_work+0x45/0x220
which would create a new lock dependency:
(&(&q->__queue_lock)->rlock){-.-.} -> (&(&session->frwd_lock)->rlock){+.-.}

but this new dependency connects a HARDIRQ-irq-safe lock:
(&(&q->__queue_lock)->rlock){-.-.}

... which became HARDIRQ-irq-safe at:
_raw_spin_lock_irqsave+0x46/0x60
ata_qc_schedule_eh+0xb9/0x110 [libata]
ata_sff_hsm_move+0x114/0xb10 [libata]
__ata_sff_port_intr+0xec/0x1a0 [libata]
ata_bmdma_port_intr+0xef/0x160 [libata]
ata_bmdma_interrupt+0x160/0x2e0 [libata]
__handle_irq_event_percpu+0x72/0x460
handle_irq_event_percpu+0x66/0xd0
handle_irq_event+0x54/0x90
handle_edge_irq+0xe9/0x2d0
handle_irq+0x17b/0x210
do_IRQ+0x6c/0x140
ret_from_intr+0x0/0x1e
native_safe_halt+0x2/0x10
default_idle+0x1f/0x200
do_idle+0x1bc/0x1f0
cpu_startup_entry+0x19/0x20
start_kernel+0x47f/0x4a1
secondary_startup_64+0xa5/0xb0

to a HARDIRQ-irq-unsafe lock:
(&(&session->frwd_lock)->rlock){+.-.}

... which became HARDIRQ-irq-unsafe at:
...
_raw_spin_lock_bh+0x34/0x40
iscsi_conn_setup+0x239/0x320 [libiscsi]
iscsi_tcp_conn_setup+0x14/0x80 [libiscsi_tcp]
iscsi_sw_tcp_conn_create+0x1f/0x24a [iscsi_tcp]
iscsi_if_rx+0x13c9/0x20d0 [scsi_transport_iscsi]
netlink_unicast+0x299/0x330
netlink_sendmsg+0x435/0x580
___sys_sendmsg+0x447/0x4d0
__sys_sendmsg+0xad/0x120
do_syscall_64+0xf3/0x2c0
entry_SYSCALL_64_after_hwframe+0x42/0xb7

other info that might help us debug this:

Possible interrupt unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&(&session->frwd_lock)->rlock);
local_irq_disable();
lock(&(&q->__queue_lock)->rlock);
lock(&(&session->frwd_lock)->rlock);
<Interrupt>
lock(&(&q->__queue_lock)->rlock);

*** DEADLOCK ***

3 locks held by kworker/6:1H/155:
#0: ((wq_completion)"kblockd"){+.+.}, at: [<00000000116fed84>]
process_one_work+0x387/0xa50
#1: ((work_completion)(&q->timeout_work)){+.+.}, at: [<00000000116fed84>]
process_one_work+0x387/0xa50
#2: (&(&q->__queue_lock)->rlock){-.-.}, at: [<000000003c5841ec>]
blk_timeout_work+0x45/0x220

the dependencies between HARDIRQ-irq-safe lock and the holding lock:
-> (&(&q->__queue_lock)->rlock){-.-.} ops: 5585 {
IN-HARDIRQ-W at:
_raw_spin_lock_irqsave+0x46/0x60
ata_qc_schedule_eh+0xb9/0x110 [libata]
ata_sff_hsm_move+0x114/0xb10 [libata]
__ata_sff_port_intr+0xec/0x1a0 [libata]
ata_bmdma_port_intr+0xef/0x160 [libata]
ata_bmdma_interrupt+0x160/0x2e0 [libata]
__handle_irq_event_percpu+0x72/0x460
handle_irq_event_percpu+0x66/0xd0
handle_irq_event+0x54/0x90
handle_edge_irq+0xe9/0x2d0
handle_irq+0x17b/0x210
do_IRQ+0x6c/0x140
ret_from_intr+0x0/0x1e
native_safe_halt+0x2/0x10
default_idle+0x1f/0x200
do_idle+0x1bc/0x1f0
cpu_startup_entry+0x19/0x20
start_kernel+0x47f/0x4a1
secondary_startup_64+0xa5/0xb0
IN-SOFTIRQ-W at:
_raw_spin_lock_irqsave+0x46/0x60
scsi_end_request+0x199/0x310 [scsi_mod]
scsi_io_completion+0x3be/0x980 [scsi_mod]
blk_done_softirq+0x177/0x1c0
__do_softirq+0x117/0x5f5
irq_exit+0xe8/0xf0
do_IRQ+0xb6/0x140
ret_from_intr+0x0/0x1e
native_safe_halt+0x2/0x10
default_idle+0x1f/0x200
do_idle+0x1bc/0x1f0
cpu_startup_entry+0x19/0x20
start_kernel+0x47f/0x4a1
secondary_startup_64+0xa5/0xb0
INITIAL USE at:
_raw_spin_lock_irq+0x3b/0x50
blkcg_init_queue+0x97/0x1c0
blk_alloc_queue_node+0x41d/0x4b0
blk_mq_init_queue+0x24/0x60
virtblk_probe+0x633/0xfef [virtio_blk]
virtio_dev_probe+0x259/0x380 [virtio]
driver_probe_device+0x469/0x680
__driver_attach+0xef/0x120
bus_for_each_dev+0xe4/0x130
bus_add_driver+0x24c/0x380
driver_register+0xc6/0x170
ata_generic_init_one+0x5b/0x260 [ata_generic]
do_one_initcall+0x79/0x1b7
do_init_module+0xde/0x32d
load_module+0x3964/0x47d0
SYSC_finit_module+0x176/0x1a0
do_syscall_64+0xf3/0x2c0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
}
... key at: [<00000000da510125>] __key.53375+0x0/0x40
... acquired at:
_raw_spin_lock+0x2f/0x40
iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]
scsi_times_out+0xcc/0x3f0 [scsi_mod]
blk_rq_timed_out+0x2f/0x70
blk_timeout_work+0x1b0/0x220
process_one_work+0x446/0xa50
worker_thread+0x7b/0x6d0
kthread+0x1b7/0x1e0
ret_from_fork+0x24/0x30


the dependencies between the lock to be acquired
and HARDIRQ-irq-unsafe lock:
-> (&(&session->frwd_lock)->rlock){+.-.} ops: 1985 {
HARDIRQ-ON-W at:
_raw_spin_lock_bh+0x34/0x40
iscsi_conn_setup+0x239/0x320 [libiscsi]
iscsi_tcp_conn_setup+0x14/0x80 [libiscsi_tcp]
iscsi_sw_tcp_conn_create+0x1f/0x24a [iscsi_tcp]
iscsi_if_rx+0x13c9/0x20d0 [scsi_transport_iscsi]
netlink_unicast+0x299/0x330
netlink_sendmsg+0x435/0x580
___sys_sendmsg+0x447/0x4d0
__sys_sendmsg+0xad/0x120
do_syscall_64+0xf3/0x2c0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
IN-SOFTIRQ-W at:
_raw_spin_lock_bh+0x34/0x40
iscsi_queuecommand+0xef/0x960 [libiscsi]
scsi_dispatch_cmd+0x1c7/0x550 [scsi_mod]
scsi_request_fn+0x823/0xaf0 [scsi_mod]
__blk_run_queue+0xc5/0x160
blk_run_queue+0x48/0x70
scsi_run_queue+0x475/0x5d0 [scsi_mod]
scsi_end_request+0x1cd/0x310 [scsi_mod]
scsi_io_completion+0x3be/0x980 [scsi_mod]
blk_done_softirq+0x177/0x1c0
__do_softirq+0x117/0x5f5
run_ksoftirqd+0x2e/0x50
smpboot_thread_fn+0x314/0x410
kthread+0x1b7/0x1e0
ret_from_fork+0x24/0x30
INITIAL USE at:
_raw_spin_lock_bh+0x34/0x40
iscsi_conn_setup+0x239/0x320 [libiscsi]
iscsi_tcp_conn_setup+0x14/0x80 [libiscsi_tcp]
iscsi_sw_tcp_conn_create+0x1f/0x24a [iscsi_tcp]
iscsi_if_rx+0x13c9/0x20d0 [scsi_transport_iscsi]
netlink_unicast+0x299/0x330
netlink_sendmsg+0x435/0x580
___sys_sendmsg+0x447/0x4d0
__sys_sendmsg+0xad/0x120
do_syscall_64+0xf3/0x2c0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
}
... key at: [<00000000df8f47b5>] __key.68290+0x0/0xffffffffffff6300
[libiscsi]
... acquired at:
_raw_spin_lock+0x2f/0x40
iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]
scsi_times_out+0xcc/0x3f0 [scsi_mod]
blk_rq_timed_out+0x2f/0x70
blk_timeout_work+0x1b0/0x220
process_one_work+0x446/0xa50
worker_thread+0x7b/0x6d0
kthread+0x1b7/0x1e0
ret_from_fork+0x24/0x30


stack backtrace:
CPU: 6 PID: 155 Comm: kworker/6:1H Not tainted 4.16.0-rc7-dbg+ #3
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: kblockd blk_timeout_work
Call Trace:
dump_stack+0x85/0xc5
check_usage+0x6e7/0x700
? check_usage_forwards+0x220/0x220
? find_next_and_bit+0x51/0xe0
? cpumask_next_and+0x20/0x30
? find_busiest_group+0xc94/0x1010
? class_equal+0x11/0x20
? __bfs+0x62/0x2e0
? class_equal+0x11/0x20
? __bfs+0xfb/0x2e0
? __lock_acquire+0x17aa/0x1af0
__lock_acquire+0x17aa/0x1af0
? mark_lock+0xc7/0x770
? debug_check_no_locks_freed+0x1b0/0x1b0
? __lock_acquire+0x583/0x1af0
? mark_lock+0xc7/0x770
? lock_pin_lock+0x160/0x160
? debug_check_no_locks_freed+0x1b0/0x1b0
? lock_acquire+0xc9/0x260
lock_acquire+0xc9/0x260
? iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]
_raw_spin_lock+0x2f/0x40
? iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]
iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]
scsi_times_out+0xcc/0x3f0 [scsi_mod]
blk_rq_timed_out+0x2f/0x70
blk_timeout_work+0x1b0/0x220
process_one_work+0x446/0xa50
? pwq_dec_nr_in_flight+0x100/0x100
? worker_thread+0x177/0x6d0
worker_thread+0x7b/0x6d0
? process_one_work+0xa50/0xa50
kthread+0x1b7/0x1e0
? kthread_create_worker_on_cpu+0xc0/0xc0
ret_from_fork+0x24/0x30

[ ... ]

------------[ cut here ]------------
kernel BUG at block/blk-core.c:3267!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
Modules linked in: sd_mod crc32c_generic target_core_pscsi
target_core_iblock target_core_file iscsi_target_mod target_core_mod brd
i2c_piix4 sg virtio_balloon i2c_core af_packet button ib_iser rdma_cm iw_cm
ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
ip_tables x_tables autofs4 hid_generic usbhid hid ext4 crc16 mbcache jbd2
sr_mod cdrom ata_generic pata_acpi virtio_blk virtio_net ata_piix xhci_pci
xhci_hcd libata virtio_pci usbcore scsi_mod virtio_ring intel_agp usb_common
intel_gtt virtio agpgart
CPU: 2 PID: 151 Comm: scsi_eh_1 Tainted: G W 4.16.0-rc7-dbg+
#3
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
RIP: 0010:__blk_end_request_all+0xda/0xe0
RSP: 0018:ffff88006192f980 EFLAGS: 00010002
sr 2:0:0:3: rejecting I/O to offline device
sr 3:0:0:3: rejecting I/O to offline device
RAX: 0000000000000001 RBX: ffff88006818e780 RCX: ffffffff814065a6
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88006818e838
RBP: 000000000000000a R08: 0000000000000000 R09: 0000000000000012
R10: ffff88006192f588 R11: 000000005e4786a3 R12: 0000000000000000
R13: 0000000000000000 R14: ffff880061280160 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff88006d280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1fcc53f010 CR3: 00000000666fc000 CR4: 00000000000006e0
Call Trace:
blk_peek_request+0x1ff/0x5f0
scsi_request_fn+0x48/0xaf0 [scsi_mod]
? lock_acquire+0xc9/0x260
__blk_run_queue+0xc5/0x160
blk_run_queue+0x48/0x70
scsi_run_queue+0x475/0x5d0 [scsi_mod]
? scsi_io_completion+0x69e/0x980 [scsi_mod]
? sdev_evt_alloc+0x80/0x80 [scsi_mod]
? blk_queue_end_tag+0x10a/0x210
? __list_add_valid+0x29/0xa0
? do_raw_spin_unlock+0x91/0x120
scsi_io_completion+0x6a6/0x980 [scsi_mod]
? lock_downgrade+0x200/0x2b0
? scsi_end_request+0x310/0x310 [scsi_mod]
? scsi_device_unbusy+0x3b/0x60 [scsi_mod]
scsi_eh_flush_done_q+0x1a6/0x210 [scsi_mod]
ata_scsi_port_error_handler+0x794/0xb00 [libata]
ata_scsi_error+0x168/0x1a0 [libata]
? ata_scsi_port_error_handler+0xb00/0xb00 [libata]
? _raw_spin_unlock_irqrestore+0x59/0x70
scsi_error_handler+0x1b5/0xa40 [scsi_mod]
? scsi_eh_get_sense+0x3b0/0x3b0 [scsi_mod]
? __sched_text_start+0x8/0x8
? __wake_up_common+0x9c/0x230
? mark_held_locks+0x1c/0x90
? _raw_spin_unlock_irqrestore+0x59/0x70
? scsi_eh_get_sense+0x3b0/0x3b0 [scsi_mod]
kthread+0x1b7/0x1e0
? kthread_create_worker_on_cpu+0xc0/0xc0
ret_from_fork+0x24/0x30
Code: 85 c0 75 02 0f 0b 48 89 df e8 b3 99 eb ff 4c 8b 23 e9 6e ff ff ff 0f
0b eb 82 49 8d 7c 24 20 e8 9d 98 eb ff 45 8b 6c 24 20 eb 8c <0f> 0b 0f 1f 40
00 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 48
RIP: __blk_end_request_all+0xda/0xe0 RSP: ffff88006192f980
---[ end trace b9c2429e31acedb4 ]---

2018-04-01 16:25:46

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Bart Van Assche wrote:
> On Sun, 2018-04-01 at 07:37 -0400, Wakko Warner wrote:
> > Bart Van Assche wrote:
> > > On Sat, 2018-03-31 at 18:12 -0400, Wakko Warner wrote:
> > > > Richard Weinberger wrote:
> > > > > On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner <[email protected]> wrote:
> > > > > > I reported this before but noone responded.
> > > > >
> > > > > Because you're sending only to LKML.
> > > > > CC'ing storage folks.
> > > >
> > > > Thank you. I wasn't sure who I needed to send it to.
> > >
> > > Can you share the output of lsscsi? I would like to know whether or not you
> > > are using a (S)ATA CDROM.
> >
> > From the target:
> > [4:0:0:0] cd/dvd ATAPI iHAS224 B GL05 /dev/sr0
> > [5:0:0:0] cd/dvd ATAPI iHAS422 8 4L11 /dev/sr1
> > [6:0:0:0] cd/dvd PBDS DVD+-RW DH-16W1S 2D14 /dev/sr2
> >
> > From the initiator:
> > [19:0:0:0] cd/dvd ATAPI iHAS224 B GL05 /dev/sr1
> > [19:0:0:1] cd/dvd ATAPI iHAS422 8 4L11 /dev/sr2
> > [19:0:0:2] cd/dvd PBDS DVD+-RW DH-16W1S 2D14 /dev/sr3
> >
> > I tested 4.14.32 last night with the same oops. 4.9.91 works fine.
> > From the initiator, if I do cat /dev/sr1 > /dev/null it works. If I mount
> > /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> > crashes. I'm using the builtin iscsi target with pscsi. I can burn from
> > the initiator with out problems. I'll test other kernels between 4.9 and
> > 4.14.
>
> (+Lee and Chris)
>
> Hello Wakko,
>
> Although I'm not sure that what I ran into is exactly the same as what you
> ran into, there is definitely something wrong with what I encountered. What
> I ran into with Linus' latest master branch indicates two issues - one in
> the iSCSI initiator and one in the block layer:
>
> scsi 3:0:0:1: Direct-Access LIO-ORG FILEIO 4.0 PQ: 0 ANSI: 5
> sd 2:0:0:1: [sdd] Attached SCSI disk
> sd 3:0:0:1: Warning! Received an indication that the LUN assignments on this
> target have changed. The Linux SCSI layer does not automatical
> sd 3:0:0:1: Attached scsi generic sg8 type 0
> sd 3:0:0:1: [sdf] 128 512-byte logical blocks: (65.5 kB/64.0 KiB)
> sd 3:0:0:1: [sdf] Write Protect is off
> sd 3:0:0:1: [sdf] Mode Sense: 43 00 00 08
> sd 3:0:0:1: [sdf] Write cache: disabled, read cache: enabled, doesn't
> support DPO or FUA
> iSCSI/iqn.1993-08.org.debian:01:3b68b1b3d2eb: Unsupported SCSI Opcode 0xa3,
> sending CHECK_CONDITION.
> sd 3:0:0:2: [sde] Attached SCSI disk
> sd 3:0:0:1: [sdf] Attached SCSI disk
>
> =====================================================
> WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> 4.16.0-rc7-dbg+ #3 Not tainted
> -----------------------------------------------------
> kworker/6:1H/155 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> (&(&session->frwd_lock)->rlock){+.-.}, at: [<000000007eb678ec>]
> iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]

[trimmed]

I'm not sure. Mine happens as 2 oopses. Both have <IRQ> </IRQ> lines.
The files mine happen in are drivers/scsi/scsi_lib.c followed by
block/blk-core.c

The first one, the stack trace began with <IRQ> then scsi_setup_cmnd. I
tested 4.10.x, 4.11.x 4.12.x 4.14.x 4.15.x where x is the latest patch
(except for 4.15). ALL crash. 4.9.91 doesn't. 4.10 added dump_stack
__warn scsi_init_io after <IRQ> and before scsi_setup_cmnd. Within seconds
of issueing the command to read files, it crashes. On 4.15, if I just do a
sequential read from the raw device, it doesn't crash.

What do you enable in the kernel to get those locking messages?

> stack backtrace:
> CPU: 6 PID: 155 Comm: kworker/6:1H Not tainted 4.16.0-rc7-dbg+ #3
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Workqueue: kblockd blk_timeout_work
> Call Trace:
> dump_stack+0x85/0xc5
> check_usage+0x6e7/0x700
> ? check_usage_forwards+0x220/0x220
> ? find_next_and_bit+0x51/0xe0
> ? cpumask_next_and+0x20/0x30
> ? find_busiest_group+0xc94/0x1010
> ? class_equal+0x11/0x20
> ? __bfs+0x62/0x2e0
> ? class_equal+0x11/0x20
> ? __bfs+0xfb/0x2e0
> ? __lock_acquire+0x17aa/0x1af0
> __lock_acquire+0x17aa/0x1af0
> ? mark_lock+0xc7/0x770
> ? debug_check_no_locks_freed+0x1b0/0x1b0
> ? __lock_acquire+0x583/0x1af0
> ? mark_lock+0xc7/0x770
> ? lock_pin_lock+0x160/0x160
> ? debug_check_no_locks_freed+0x1b0/0x1b0
> ? lock_acquire+0xc9/0x260
> lock_acquire+0xc9/0x260
> ? iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]
> _raw_spin_lock+0x2f/0x40
> ? iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]
> iscsi_eh_cmd_timed_out+0x6b/0x5a0 [libiscsi]
> scsi_times_out+0xcc/0x3f0 [scsi_mod]
> blk_rq_timed_out+0x2f/0x70
> blk_timeout_work+0x1b0/0x220
> process_one_work+0x446/0xa50
> ? pwq_dec_nr_in_flight+0x100/0x100
> ? worker_thread+0x177/0x6d0
> worker_thread+0x7b/0x6d0
> ? process_one_work+0xa50/0xa50
> kthread+0x1b7/0x1e0
> ? kthread_create_worker_on_cpu+0xc0/0xc0
> ret_from_fork+0x24/0x30
>
> [ ... ]
>
> ------------[ cut here ]------------
> kernel BUG at block/blk-core.c:3267!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> Modules linked in: sd_mod crc32c_generic target_core_pscsi
> target_core_iblock target_core_file iscsi_target_mod target_core_mod brd
> i2c_piix4 sg virtio_balloon i2c_core af_packet button ib_iser rdma_cm iw_cm
> ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
> ip_tables x_tables autofs4 hid_generic usbhid hid ext4 crc16 mbcache jbd2
> sr_mod cdrom ata_generic pata_acpi virtio_blk virtio_net ata_piix xhci_pci
> xhci_hcd libata virtio_pci usbcore scsi_mod virtio_ring intel_agp usb_common
> intel_gtt virtio agpgart
> CPU: 2 PID: 151 Comm: scsi_eh_1 Tainted: G W 4.16.0-rc7-dbg+
> #3
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> RIP: 0010:__blk_end_request_all+0xda/0xe0
> RSP: 0018:ffff88006192f980 EFLAGS: 00010002
> sr 2:0:0:3: rejecting I/O to offline device
> sr 3:0:0:3: rejecting I/O to offline device
> RAX: 0000000000000001 RBX: ffff88006818e780 RCX: ffffffff814065a6
> RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88006818e838
> RBP: 000000000000000a R08: 0000000000000000 R09: 0000000000000012
> R10: ffff88006192f588 R11: 000000005e4786a3 R12: 0000000000000000
> R13: 0000000000000000 R14: ffff880061280160 R15: 0000000000000001
> FS: 0000000000000000(0000) GS:ffff88006d280000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f1fcc53f010 CR3: 00000000666fc000 CR4: 00000000000006e0
> Call Trace:
> blk_peek_request+0x1ff/0x5f0
> scsi_request_fn+0x48/0xaf0 [scsi_mod]
> ? lock_acquire+0xc9/0x260
> __blk_run_queue+0xc5/0x160
> blk_run_queue+0x48/0x70
> scsi_run_queue+0x475/0x5d0 [scsi_mod]
> ? scsi_io_completion+0x69e/0x980 [scsi_mod]
> ? sdev_evt_alloc+0x80/0x80 [scsi_mod]
> ? blk_queue_end_tag+0x10a/0x210
> ? __list_add_valid+0x29/0xa0
> ? do_raw_spin_unlock+0x91/0x120
> scsi_io_completion+0x6a6/0x980 [scsi_mod]
> ? lock_downgrade+0x200/0x2b0
> ? scsi_end_request+0x310/0x310 [scsi_mod]
> ? scsi_device_unbusy+0x3b/0x60 [scsi_mod]
> scsi_eh_flush_done_q+0x1a6/0x210 [scsi_mod]
> ata_scsi_port_error_handler+0x794/0xb00 [libata]
> ata_scsi_error+0x168/0x1a0 [libata]
> ? ata_scsi_port_error_handler+0xb00/0xb00 [libata]
> ? _raw_spin_unlock_irqrestore+0x59/0x70
> scsi_error_handler+0x1b5/0xa40 [scsi_mod]
> ? scsi_eh_get_sense+0x3b0/0x3b0 [scsi_mod]
> ? __sched_text_start+0x8/0x8
> ? __wake_up_common+0x9c/0x230
> ? mark_held_locks+0x1c/0x90
> ? _raw_spin_unlock_irqrestore+0x59/0x70
> ? scsi_eh_get_sense+0x3b0/0x3b0 [scsi_mod]
> kthread+0x1b7/0x1e0
> ? kthread_create_worker_on_cpu+0xc0/0xc0
> ret_from_fork+0x24/0x30
> Code: 85 c0 75 02 0f 0b 48 89 df e8 b3 99 eb ff 4c 8b 23 e9 6e ff ff ff 0f
> 0b eb 82 49 8d 7c 24 20 e8 9d 98 eb ff 45 8b 6c 24 20 eb 8c <0f> 0b 0f 1f 40
> 00 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 48
> RIP: __blk_end_request_all+0xda/0xe0 RSP: ffff88006192f980
> ---[ end trace b9c2429e31acedb4 ]---
--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.

2018-04-01 16:37:40

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Wakko Warner wrote:
> Bart Van Assche wrote:
> > On Sat, 2018-03-31 at 18:12 -0400, Wakko Warner wrote:
> > > Richard Weinberger wrote:
> > > > On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner <[email protected]> wrote:
> > > > > I reported this before but noone responded.
> > > >
> > > > Because you're sending only to LKML.
> > > > CC'ing storage folks.
> > >
> > > Thank you. I wasn't sure who I needed to send it to.
> >
> > Can you share the output of lsscsi? I would like to know whether or not you
> > are using a (S)ATA CDROM.
>
> >From the target:
> [4:0:0:0] cd/dvd ATAPI iHAS224 B GL05 /dev/sr0
> [5:0:0:0] cd/dvd ATAPI iHAS422 8 4L11 /dev/sr1
> [6:0:0:0] cd/dvd PBDS DVD+-RW DH-16W1S 2D14 /dev/sr2
>
> >From the initiator:
> [19:0:0:0] cd/dvd ATAPI iHAS224 B GL05 /dev/sr1
> [19:0:0:1] cd/dvd ATAPI iHAS422 8 4L11 /dev/sr2
> [19:0:0:2] cd/dvd PBDS DVD+-RW DH-16W1S 2D14 /dev/sr3
>
>
> I tested 4.14.32 last night with the same oops. 4.9.91 works fine.
> >From the initiator, if I do cat /dev/sr1 > /dev/null it works. If I mount
> /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> crashes. I'm using the builtin iscsi target with pscsi. I can burn from
> the initiator with out problems. I'll test other kernels between 4.9 and
> 4.14.

So I've tested 4.x.y where x one of 10 11 12 14 15 and y is the latest patch
(except for 4.15 which was 1 behind)
Each of these kernels crash within seconds or immediate of doing find -type
f | xargs cat > /dev/null from the initiator.

I did a diff between 4.9.91 and 4.10.17 on scsi_lib.c. Here's the
difference around the line reported (in this case 1043). I've added the
4.10.17 oops at the end:

@@ -1029,10 +1038,10 @@ int scsi_init_io(struct scsi_cmnd *cmd)
struct scsi_device *sdev = cmd->device;
struct request *rq = cmd->request;
bool is_mq = (rq->mq_ctx != NULL);
- int error;
+ int error = BLKPREP_KILL;

- if (WARN_ON_ONCE(!rq->nr_phys_segments))
- return -EINVAL;
+ if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq)))
+ goto err_exit;

error = scsi_init_sgtable(rq, &cmd->sdb);
if (error)

Oops:
[ 158.157590] ------------[ cut here ]------------
[ 158.157601] WARNING: CPU: 0 PID: 0 at /usr/src/linux/dist/4.10.17-nobklcd/drivers/scsi/scsi_lib.c:1043 scsi_init_io+0x1d7/0x1e0 [scsi_mod]
[ 158.157603] Modules linked in: iscsi_target_mod tcm_loop af_packet vhost_scsi vhost target_core_file target_core_iblock target_core_pscsi target_core_mod nfsd exportfs dummy bridge stp llc ib_iser rdma_cm iw_cm ib_cm ib_core ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi netconsole configfs sr_mod cdrom sd_mod sg adt7475 hwmon_vid coretemp x86_pkg_temp_thermal kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc snd_hda_codec_realtek snd_hda_codec_generic nouveau video led_class drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea ttm drm agpgart snd_hda_intel snd_hda_codec snd_hda_core mptsas snd_pcm_oss snd_mixer_oss mptscsih mpt3sas snd_pcm mptbase snd_timer raid_class aesni_intel snd scsi_transport_sas
[ 158.157634] igb soundcore aes_x86_64 crypto_simd ahci glue_helper libahci hwmon libata i2c_algo_bit i2c_core scsi_mod wmi hed button unix
[ 158.157642] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.17 #1
[ 158.157644] Hardware name: Dell Inc. Precision T5610/0WN7Y6, BIOS A16 02/05/2018
[ 158.157645] Call Trace:
[ 158.157647] <IRQ>
[ 158.157651] ? dump_stack+0x46/0x5a
[ 158.157653] ? __warn+0xb4/0xd0
[ 158.157656] ? scsi_init_io+0x1d7/0x1e0 [scsi_mod]
[ 158.157658] ? scsi_setup_cmnd+0x4c/0x140 [scsi_mod]
[ 158.157661] ? scsi_prep_fn+0xe3/0x170 [scsi_mod]
[ 158.157663] ? swiotlb_unmap_sg_attrs+0x44/0x60
[ 158.157665] ? blk_peek_request+0x130/0x200
[ 158.157668] ? scsi_request_fn+0x2b/0x510 [scsi_mod]
[ 158.157670] ? __blk_run_queue+0x2a/0x40
[ 158.157672] ? blk_run_queue+0x1c/0x30
[ 158.157675] ? scsi_run_queue+0x229/0x2b0 [scsi_mod]
[ 158.157677] ? scsi_io_completion+0x3d6/0x5c0 [scsi_mod]
[ 158.157680] ? blk_done_softirq+0x67/0x80
[ 158.157682] ? __do_softirq+0xdb/0x200
[ 158.157683] ? irq_exit+0xa3/0xb0
[ 158.157686] ? do_IRQ+0x45/0xc0
[ 158.157689] ? common_interrupt+0x7c/0x7c
[ 158.157690] </IRQ>
[ 158.157693] ? cpuidle_enter_state+0x144/0x1f0
[ 158.157694] ? cpuidle_enter_state+0x139/0x1f0
[ 158.157696] ? do_idle+0xd3/0x190
[ 158.157698] ? cpu_startup_entry+0x14/0x20
[ 158.157700] ? start_kernel+0x391/0x399
[ 158.157701] ? start_cpu+0x14/0x14
[ 158.157703] ---[ end trace 8d60c2e92fac2697 ]---
[ 158.157711] ------------[ cut here ]------------
[ 158.157732] kernel BUG at /usr/src/linux/dist/4.10.17-nobklcd/block/blk-core.c:2916!
[ 158.157755] invalid opcode: 0000 [#1] PREEMPT SMP
[ 158.157770] Modules linked in: iscsi_target_mod tcm_loop af_packet vhost_scsi vhost target_core_file target_core_iblock target_core_pscsi target_core_mod nfsd exportfs dummy bridge stp llc ib_iser rdma_cm iw_cm ib_cm ib_core ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi netconsole configfs sr_mod cdrom sd_mod sg adt7475 hwmon_vid coretemp x86_pkg_temp_thermal kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc snd_hda_codec_realtek snd_hda_codec_generic nouveau video led_class drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea ttm drm agpgart snd_hda_intel snd_hda_codec snd_hda_core mptsas snd_pcm_oss snd_mixer_oss mptscsih mpt3sas snd_pcm mptbase snd_timer raid_class aesni_intel snd scsi_transport_sas
[ 158.157968] igb soundcore aes_x86_64 crypto_simd ahci glue_helper libahci hwmon libata i2c_algo_bit i2c_core scsi_mod wmi hed button unix
[ 158.158005] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.10.17 #1
[ 158.158024] Hardware name: Dell Inc. Precision T5610/0WN7Y6, BIOS A16 02/05/2018
[ 158.158045] task: ffffffff8180e4c0 task.stack: ffffffff81800000
[ 158.158063] RIP: 0010:__blk_end_request_all+0x2a/0x30
[ 158.158077] RSP: 0018:ffff8806b7803df0 EFLAGS: 00010002
[ 158.158093] RAX: 0000000000000001 RBX: ffff8806abfdb2f0 RCX: 0000000000000000
[ 158.158113] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8806abfdb2f0
[ 158.158134] RBP: ffff8806accb28d0 R08: 0000000000000000 R09: 0000000000000000
[ 158.158153] R10: ffffffff81806a40 R11: 0000000000000000 R12: 00000000ffffff87
[ 158.158173] R13: 00000000fffffffb R14: 00000000fffffffb R15: 0000000000000000
[ 158.158193] FS: 0000000000000000(0000) GS:ffff8806b7800000(0000) knlGS:0000000000000000
[ 158.158215] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 158.158231] CR2: 00007ffdeb1091b8 CR3: 0000000001809000 CR4: 00000000001406f0
[ 158.158250] Call Trace:
[ 158.158258] <IRQ>
[ 158.158265] ? blk_peek_request+0x16b/0x200
[ 158.158279] ? scsi_request_fn+0x2b/0x510 [scsi_mod]
[ 158.158294] ? __blk_run_queue+0x2a/0x40
[ 158.158306] ? blk_run_queue+0x1c/0x30
[ 158.158319] ? scsi_run_queue+0x229/0x2b0 [scsi_mod]
[ 158.158334] ? scsi_io_completion+0x3d6/0x5c0 [scsi_mod]
[ 158.158350] ? blk_done_softirq+0x67/0x80
[ 158.158362] ? __do_softirq+0xdb/0x200
[ 158.158374] ? irq_exit+0xa3/0xb0
[ 158.158384] ? do_IRQ+0x45/0xc0
[ 158.158394] ? common_interrupt+0x7c/0x7c
[ 158.158407] </IRQ>
[ 158.158415] ? cpuidle_enter_state+0x144/0x1f0
[ 158.158429] ? cpuidle_enter_state+0x139/0x1f0
[ 158.158443] ? do_idle+0xd3/0x190
[ 158.158453] ? cpu_startup_entry+0x14/0x20
[ 158.158466] ? start_kernel+0x391/0x399
[ 158.158478] ? start_cpu+0x14/0x14
[ 158.158488] Code: 00 48 8b 87 70 01 00 00 31 c9 48 85 c0 75 0d 8b 57 58 e8 1a ff ff ff 84 c0 75 10 c3 8b 48 58 8b 57 58 e8 0a ff ff ff 84 c0 74 f0 <0f> 0b 0f 1f 40 00 41 56 41 55 41 bd fb ff ff ff 41 54 41 bc 87
[ 158.158550] RIP: __blk_end_request_all+0x2a/0x30 RSP: ffff8806b7803df0
[ 158.161579] ---[ end trace 8d60c2e92fac2698 ]---
[ 158.161579] Kernel panic - not syncing: Fatal exception in interrupt
[ 158.161579] Kernel Offset: disabled
[ 158.161579] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.

2018-04-01 16:53:26

by Bart Van Assche

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

On Sun, 2018-04-01 at 12:24 -0400, Wakko Warner wrote:
> What do you enable in the kernel to get those locking messages?

Hello Wakko,

I have attached the script to this e-mail that I use to enable a bunch of
kernel debugging options. Please note that enabling these options,
especially lockdep and kasan, will slow down the kernel.

Bart.




Attachments:
enable-kernel-debugging-options (1.56 kB)
enable-kernel-debugging-options

2018-04-01 18:28:52

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Wakko Warner wrote:
> Wakko Warner wrote:
> > Bart Van Assche wrote:
> > > On Sat, 2018-03-31 at 18:12 -0400, Wakko Warner wrote:
> > > > Richard Weinberger wrote:
> > > > > On Sat, Mar 31, 2018 at 3:59 AM, Wakko Warner <[email protected]> wrote:
> > > > > > I reported this before but noone responded.
> > > > >
> > > > > Because you're sending only to LKML.
> > > > > CC'ing storage folks.
> > > >
> > > > Thank you. I wasn't sure who I needed to send it to.
> > >
> > > Can you share the output of lsscsi? I would like to know whether or not you
> > > are using a (S)ATA CDROM.
> >
> > >From the target:
> > [4:0:0:0] cd/dvd ATAPI iHAS224 B GL05 /dev/sr0
> > [5:0:0:0] cd/dvd ATAPI iHAS422 8 4L11 /dev/sr1
> > [6:0:0:0] cd/dvd PBDS DVD+-RW DH-16W1S 2D14 /dev/sr2
> >
> > >From the initiator:
> > [19:0:0:0] cd/dvd ATAPI iHAS224 B GL05 /dev/sr1
> > [19:0:0:1] cd/dvd ATAPI iHAS422 8 4L11 /dev/sr2
> > [19:0:0:2] cd/dvd PBDS DVD+-RW DH-16W1S 2D14 /dev/sr3
> >
> >
> > I tested 4.14.32 last night with the same oops. 4.9.91 works fine.
> > >From the initiator, if I do cat /dev/sr1 > /dev/null it works. If I mount
> > /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> > crashes. I'm using the builtin iscsi target with pscsi. I can burn from
> > the initiator with out problems. I'll test other kernels between 4.9 and
> > 4.14.
>
> So I've tested 4.x.y where x one of 10 11 12 14 15 and y is the latest patch
> (except for 4.15 which was 1 behind)
> Each of these kernels crash within seconds or immediate of doing find -type
> f | xargs cat > /dev/null from the initiator.

I tried 4.10.0. It doesn't completely lockup the system, but the device
that was used hangs. So from the initiator, it's /dev/sr1 and from the
target it's /dev/sr0. Attempting to read /dev/sr0 after the oops causes the
process to hang in D state.

Here's the oops. There was also another line that was not seen in the newer
kernels.
[ 323.105044] ------------[ cut here ]------------
[ 323.105057] WARNING: CPU: 0 PID: 0 at /usr/src/linux/dist/4.10/drivers/scsi/scsi_lib.c:1043 scsi_init_io+0x143/0x1f0 [scsi_mod]
[ 323.105058] Modules linked in: iscsi_target_mod af_packet tcm_loop vhost_scsi vhost target_core_file target_core_iblock target_core_pscsi target_core_mod nfsd exportfs dummy bridge stp llc ib_iser rdma_cm iw_cm ib_cm ib_core ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi netconsole configfs sr_mod cdrom sd_mod sg adt7475 hwmon_vid coretemp x86_pkg_temp_thermal kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc snd_hda_codec_realtek snd_hda_codec_generic nouveau video led_class drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea ttm drm snd_hda_intel agpgart snd_hda_codec snd_hda_core snd_pcm_oss igb snd_mixer_oss aesni_intel snd_pcm aes_x86_64 hwmon snd_timer crypto_simd i2c_algo_bit mptsas snd glue_helper
[ 323.105089] mpt3sas i2c_core mptscsih soundcore ahci mptbase raid_class libahci scsi_transport_sas libata scsi_mod button wmi hed unix
[ 323.105097] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0 #1
[ 323.105098] Hardware name: Dell Inc. Precision T5610/0WN7Y6, BIOS A16 02/05/2018
[ 323.105100] Call Trace:
[ 323.105101] <IRQ>
[ 323.105105] ? dump_stack+0x46/0x5a
[ 323.105107] ? __warn+0xb4/0xd0
[ 323.105110] ? scsi_init_io+0x143/0x1f0 [scsi_mod]
[ 323.105113] ? scsi_setup_cmnd+0x4c/0x140 [scsi_mod]
[ 323.105115] ? scsi_prep_fn+0xe3/0x170 [scsi_mod]
[ 323.105118] ? swiotlb_unmap_sg_attrs+0x44/0x60
[ 323.105119] ? blk_peek_request+0x130/0x200
[ 323.105122] ? scsi_request_fn+0x2b/0x510 [scsi_mod]
[ 323.105124] ? __blk_run_queue+0x2a/0x40
[ 323.105126] ? blk_run_queue+0x1c/0x30
[ 323.105129] ? scsi_run_queue+0x229/0x2b0 [scsi_mod]
[ 323.105131] ? scsi_io_completion+0x3d6/0x5c0 [scsi_mod]
[ 323.105133] ? blk_done_softirq+0x67/0x80
[ 323.105135] ? __do_softirq+0xdb/0x200
[ 323.105137] ? irq_exit+0xa3/0xb0
[ 323.105139] ? do_IRQ+0x45/0xc0
[ 323.105141] ? common_interrupt+0x7c/0x7c
[ 323.105142] </IRQ>
[ 323.105145] ? cpuidle_enter_state+0x144/0x1f0
[ 323.105146] ? cpuidle_enter_state+0x139/0x1f0
[ 323.105148] ? do_idle+0xd3/0x190
[ 323.105150] ? cpu_startup_entry+0x14/0x20
[ 323.105152] ? start_kernel+0x391/0x399
[ 323.105154] ? start_cpu+0x14/0x14
[ 323.105155] ---[ end trace f38cc734e4921bdc ]---
[ 323.105157] blk_peek_request: bad return=-22

2018-04-03 17:06:15

by Bart Van Assche

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

On Sun, 2018-04-01 at 14:27 -0400, Wakko Warner wrote:
> Wakko Warner wrote:
> > Wakko Warner wrote:
> > > I tested 4.14.32 last night with the same oops. 4.9.91 works fine.
> > > From the initiator, if I do cat /dev/sr1 > /dev/null it works. If I mount
> > > /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> > > crashes. I'm using the builtin iscsi target with pscsi. I can burn from
> > > the initiator with out problems. I'll test other kernels between 4.9 and
> > > 4.14.
> >
> > So I've tested 4.x.y where x one of 10 11 12 14 15 and y is the latest patch
> > (except for 4.15 which was 1 behind)
> > Each of these kernels crash within seconds or immediate of doing find -type
> > f | xargs cat > /dev/null from the initiator.
>
> I tried 4.10.0. It doesn't completely lockup the system, but the device
> that was used hangs. So from the initiator, it's /dev/sr1 and from the
> target it's /dev/sr0. Attempting to read /dev/sr0 after the oops causes the
> process to hang in D state.

Hello Wakko,

Thank you for having narrowed down this further. I think that you encountered
a regression either in the block layer core or in the SCSI core. Unfortunately
the number of changes between kernel versions v4.9 and v4.10 in these two
subsystems is huge. I see two possible ways forward:
- Either that you perform a bisect to identify the patch that introduced this
regression. However, I'm not sure whether you are familiar with the bisect
process.
- Or that you identify the command that triggers this crash such that others
can reproduce this issue without needing access to your setup.

How about reproducing this crash with the below patch applied on top of
kernel v4.15.x? The additional output sent by this patch to the system log
should allow us to reproduce this issue by submitting the same SCSI command
with sg_raw.

Thanks,

Bart.


Subject: [PATCH] Report commands with no physical segments in the system log

---
drivers/scsi/scsi_lib.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6b6a6705f6e5..74a39db57d49 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1093,8 +1093,10 @@ int scsi_init_io(struct scsi_cmnd *cmd)
bool is_mq = (rq->mq_ctx != NULL);
int error = BLKPREP_KILL;

- if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq)))
+ if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq))) {
+ scsi_print_command(cmd);
goto err_exit;
+ }

error = scsi_init_sgtable(rq, &cmd->sdb);
if (error)

2018-04-05 00:45:13

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Bart Van Assche wrote:
> On Sun, 2018-04-01 at 14:27 -0400, Wakko Warner wrote:
> > Wakko Warner wrote:
> > > Wakko Warner wrote:
> > > > I tested 4.14.32 last night with the same oops. 4.9.91 works fine.
> > > > From the initiator, if I do cat /dev/sr1 > /dev/null it works. If I mount
> > > > /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> > > > crashes. I'm using the builtin iscsi target with pscsi. I can burn from
> > > > the initiator with out problems. I'll test other kernels between 4.9 and
> > > > 4.14.
> > >
> > > So I've tested 4.x.y where x one of 10 11 12 14 15 and y is the latest patch
> > > (except for 4.15 which was 1 behind)
> > > Each of these kernels crash within seconds or immediate of doing find -type
> > > f | xargs cat > /dev/null from the initiator.
> >
> > I tried 4.10.0. It doesn't completely lockup the system, but the device
> > that was used hangs. So from the initiator, it's /dev/sr1 and from the
> > target it's /dev/sr0. Attempting to read /dev/sr0 after the oops causes the
> > process to hang in D state.
>
> Hello Wakko,
>
> Thank you for having narrowed down this further. I think that you encountered
> a regression either in the block layer core or in the SCSI core. Unfortunately
> the number of changes between kernel versions v4.9 and v4.10 in these two
> subsystems is huge. I see two possible ways forward:
> - Either that you perform a bisect to identify the patch that introduced this
> regression. However, I'm not sure whether you are familiar with the bisect
> process.
> - Or that you identify the command that triggers this crash such that others
> can reproduce this issue without needing access to your setup.
>
> How about reproducing this crash with the below patch applied on top of
> kernel v4.15.x? The additional output sent by this patch to the system log
> should allow us to reproduce this issue by submitting the same SCSI command
> with sg_raw.

Sorry for not getting back in touch. My internet was down. I haven't tried
the patch yet. I'll try to get to that tomorrow. The system with the issue
is busy and I can't reboot it right now.

2018-04-06 01:48:10

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Bart Van Assche wrote:
> On Sun, 2018-04-01 at 14:27 -0400, Wakko Warner wrote:
> > Wakko Warner wrote:
> > > Wakko Warner wrote:
> > > > I tested 4.14.32 last night with the same oops. 4.9.91 works fine.
> > > > From the initiator, if I do cat /dev/sr1 > /dev/null it works. If I mount
> > > > /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> > > > crashes. I'm using the builtin iscsi target with pscsi. I can burn from
> > > > the initiator with out problems. I'll test other kernels between 4.9 and
> > > > 4.14.
> > >
> > > So I've tested 4.x.y where x one of 10 11 12 14 15 and y is the latest patch
> > > (except for 4.15 which was 1 behind)
> > > Each of these kernels crash within seconds or immediate of doing find -type
> > > f | xargs cat > /dev/null from the initiator.
> >
> > I tried 4.10.0. It doesn't completely lockup the system, but the device
> > that was used hangs. So from the initiator, it's /dev/sr1 and from the
> > target it's /dev/sr0. Attempting to read /dev/sr0 after the oops causes the
> > process to hang in D state.
>
> Hello Wakko,
>
> Thank you for having narrowed down this further. I think that you encountered
> a regression either in the block layer core or in the SCSI core. Unfortunately
> the number of changes between kernel versions v4.9 and v4.10 in these two
> subsystems is huge. I see two possible ways forward:
> - Either that you perform a bisect to identify the patch that introduced this
> regression. However, I'm not sure whether you are familiar with the bisect
> process.
> - Or that you identify the command that triggers this crash such that others
> can reproduce this issue without needing access to your setup.
>
> How about reproducing this crash with the below patch applied on top of
> kernel v4.15.x? The additional output sent by this patch to the system log
> should allow us to reproduce this issue by submitting the same SCSI command
> with sg_raw.

Ok, so I tried this, but scsi_print_command doesn't print anything. I added
a check for !rq and the same thing that blk_rq_nr_phys_segments does in an
if statement above this thinking it might have crashed during WARN_ON_ONCE.
It still didn't print anything. My printk shows this:
[ 36.263193] sr 3:0:0:0: cmd->request->nr_phys_segments is 0

I also had scsi_print_command in the same if block which again didn't print
anything. Is there some debug option I need to turn on to make it print? I
tried looking through the code for this and following some of the function
calls but didn't see any config options.

> Subject: [PATCH] Report commands with no physical segments in the system log
>
> ---
> drivers/scsi/scsi_lib.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 6b6a6705f6e5..74a39db57d49 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1093,8 +1093,10 @@ int scsi_init_io(struct scsi_cmnd *cmd)
> bool is_mq = (rq->mq_ctx != NULL);
> int error = BLKPREP_KILL;
>
> - if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq)))
> + if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq))) {
> + scsi_print_command(cmd);
> goto err_exit;
> + }
>
> error = scsi_init_sgtable(rq, &cmd->sdb);
> if (error)
--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.

2018-04-06 02:08:22

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Wakko Warner wrote:
> Bart Van Assche wrote:
> > On Sun, 2018-04-01 at 14:27 -0400, Wakko Warner wrote:
> > > Wakko Warner wrote:
> > > > Wakko Warner wrote:
> > > > > I tested 4.14.32 last night with the same oops. 4.9.91 works fine.
> > > > > From the initiator, if I do cat /dev/sr1 > /dev/null it works. If I mount
> > > > > /dev/sr1 and then do find -type f | xargs cat > /dev/null the target
> > > > > crashes. I'm using the builtin iscsi target with pscsi. I can burn from
> > > > > the initiator with out problems. I'll test other kernels between 4.9 and
> > > > > 4.14.
> > > >
> > > > So I've tested 4.x.y where x one of 10 11 12 14 15 and y is the latest patch
> > > > (except for 4.15 which was 1 behind)
> > > > Each of these kernels crash within seconds or immediate of doing find -type
> > > > f | xargs cat > /dev/null from the initiator.
> > >
> > > I tried 4.10.0. It doesn't completely lockup the system, but the device
> > > that was used hangs. So from the initiator, it's /dev/sr1 and from the
> > > target it's /dev/sr0. Attempting to read /dev/sr0 after the oops causes the
> > > process to hang in D state.
> >
> > Hello Wakko,
> >
> > Thank you for having narrowed down this further. I think that you encountered
> > a regression either in the block layer core or in the SCSI core. Unfortunately
> > the number of changes between kernel versions v4.9 and v4.10 in these two
> > subsystems is huge. I see two possible ways forward:
> > - Either that you perform a bisect to identify the patch that introduced this
> > regression. However, I'm not sure whether you are familiar with the bisect
> > process.
> > - Or that you identify the command that triggers this crash such that others
> > can reproduce this issue without needing access to your setup.
> >
> > How about reproducing this crash with the below patch applied on top of
> > kernel v4.15.x? The additional output sent by this patch to the system log
> > should allow us to reproduce this issue by submitting the same SCSI command
> > with sg_raw.
>
> Ok, so I tried this, but scsi_print_command doesn't print anything. I added
> a check for !rq and the same thing that blk_rq_nr_phys_segments does in an
> if statement above this thinking it might have crashed during WARN_ON_ONCE.
> It still didn't print anything. My printk shows this:
> [ 36.263193] sr 3:0:0:0: cmd->request->nr_phys_segments is 0
>
> I also had scsi_print_command in the same if block which again didn't print
> anything. Is there some debug option I need to turn on to make it print? I
> tried looking through the code for this and following some of the function
> calls but didn't see any config options.

I know now why scsi_print_command isn't doing anything. cmd->cmnd is null.
I added a dev_printk in scsi_print_command where the 2 if statements return.
Logs:
[ 29.866415] sr 3:0:0:0: cmd->cmnd is NULL

> > Subject: [PATCH] Report commands with no physical segments in the system log
> >
> > ---
> > drivers/scsi/scsi_lib.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 6b6a6705f6e5..74a39db57d49 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1093,8 +1093,10 @@ int scsi_init_io(struct scsi_cmnd *cmd)
> > bool is_mq = (rq->mq_ctx != NULL);
> > int error = BLKPREP_KILL;
> >
> > - if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq)))
> > + if (WARN_ON_ONCE(!blk_rq_nr_phys_segments(rq))) {
> > + scsi_print_command(cmd);
> > goto err_exit;
> > + }
> >
> > error = scsi_init_sgtable(rq, &cmd->sdb);
> > if (error)
> --
> Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
> million bugs.
--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.

2018-04-06 02:21:33

by Bart Van Assche

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote:
> I know now why scsi_print_command isn't doing anything. cmd->cmnd is null.
> I added a dev_printk in scsi_print_command where the 2 if statements return.
> Logs:
> [ 29.866415] sr 3:0:0:0: cmd->cmnd is NULL

That's something that should never happen. As one can see in
scsi_setup_scsi_cmnd() and scsi_setup_fs_cmnd() both functions initialize
that pointer. Since I have not yet been able to reproduce myself what you
reported, would it be possible for you to bisect this issue? You will need
to follow something like the following procedure (see also
https://git-scm.com/docs/git-bisect):

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git bisect start
git bisect bad v4.10
git bisect good v4.9

and then build the kernel, install it, boot the kernel and test it.
Depending on the result, run either git bisect bad or git bisect good and
keep going until git bisect comes to a conclusion. This can take an hour or
more.

Bart.



2018-04-06 23:47:02

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Bart Van Assche wrote:
> On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote:
> > I know now why scsi_print_command isn't doing anything. cmd->cmnd is null.
> > I added a dev_printk in scsi_print_command where the 2 if statements return.
> > Logs:
> > [ 29.866415] sr 3:0:0:0: cmd->cmnd is NULL
>
> That's something that should never happen. As one can see in
> scsi_setup_scsi_cmnd() and scsi_setup_fs_cmnd() both functions initialize
> that pointer. Since I have not yet been able to reproduce myself what you
> reported, would it be possible for you to bisect this issue? You will need
> to follow something like the following procedure (see also
> https://git-scm.com/docs/git-bisect):

I don't know how relevent it is, but this machine boots nfs and exports it's
dvd drives over iscsi with the target modules. My scsi_target.lio is at the
end. I removed the iqn name. The options are default except for a few.
Non default options I tabbed over.
eth0 is the nfs/localnet nic and eth1 is the
nic that iscsi goes over.
eth0 is onboard pci 8086:1502 (subsystem 1028:05d3)
eth1 is pci 8086:107d (subsystem 8086:1084)
Both use the e1000e driver

The initiator is running 4.4.107.
When running on the initiator, /dev/sr1 is the target /dev/sr0. Therefor
cat /dev/sr1 > /dev/null seems to work.
mount /dev/sr1 /cdrom works
find /cdrom -type f | xargs cat > /dev/null immediately crashes the target.
Burning to /dev/sr1 seems to work.

I have another nic that uses igb instead, I'll see if that makes a
difference.

> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> git bisect start
> git bisect bad v4.10
> git bisect good v4.9
>
> and then build the kernel, install it, boot the kernel and test it.
> Depending on the result, run either git bisect bad or git bisect good and
> keep going until git bisect comes to a conclusion. This can take an hour or
> more.

I'll try this.

Here's my scsi_target.lio:
storage pscsi {
disk dvd0 {
path /dev/sr0
attribute {
emulate_3pc yes
emulate_caw yes
emulate_dpo no
emulate_fua_read no
emulate_model_alias no
emulate_rest_reord no
emulate_tas yes
emulate_tpu no
emulate_tpws no
emulate_ua_intlck_ctrl no
emulate_write_cache no
enforce_pr_isids yes
fabric_max_sectors 8192
is_nonrot yes
max_unmap_block_desc_count 0
max_unmap_lba_count 0
max_write_same_len 65535
queue_depth 128
unmap_granularity 0
unmap_granularity_alignment 0
}
}
disk dvd1 {
path /dev/sr1
attribute {
emulate_3pc yes
emulate_caw yes
emulate_dpo no
emulate_fua_read no
emulate_model_alias no
emulate_rest_reord no
emulate_tas yes
emulate_tpu no
emulate_tpws no
emulate_ua_intlck_ctrl no
emulate_write_cache no
enforce_pr_isids yes
fabric_max_sectors 8192
is_nonrot yes
max_unmap_block_desc_count 0
max_unmap_lba_count 0
max_write_same_len 65535
queue_depth 128
unmap_granularity 0
unmap_granularity_alignment 0
}
}
disk dvd2 {
path /dev/sr2
attribute {
emulate_3pc yes
emulate_caw yes
emulate_dpo no
emulate_fua_read no
emulate_model_alias no
emulate_rest_reord no
emulate_tas yes
emulate_tpu no
emulate_tpws no
emulate_ua_intlck_ctrl no
emulate_write_cache no
enforce_pr_isids yes
fabric_max_sectors 8192
is_nonrot yes
max_unmap_block_desc_count 0
max_unmap_lba_count 0
max_write_same_len 65535
queue_depth 128
unmap_granularity 0
unmap_granularity_alignment 0
}
}
}
fabric iscsi {
discovery_auth {
enable no
mutual_password ""
mutual_userid ""
password ""
userid ""
}
target iqn.<myiqn>:dvd tpgt 1 {
enable yes
attribute {
authentication no
cache_dynamic_acls yes
default_cmdsn_depth 64
default_erl 0
demo_mode_discovery yes
demo_mode_write_protect no
fabric_prot_type 0
generate_node_acls yes
login_timeout 15
netif_timeout 2
prod_mode_write_protect no
t10_pi 0
tpg_enabled_sendtargets 1
}
auth {
password ""
password_mutual ""
userid ""
userid_mutual ""
}
parameter {
AuthMethod "CHAP,None"
DataDigest "CRC32C,None"
DataPDUInOrder yes
DataSequenceInOrder yes
DefaultTime2Retain 20
DefaultTime2Wait 2
ErrorRecoveryLevel no
FirstBurstLength 65536
HeaderDigest "CRC32C,None"
IFMarkInt Reject
IFMarker no
ImmediateData yes
InitialR2T yes
MaxBurstLength 262144
MaxConnections 1
MaxOutstandingR2T 1
MaxRecvDataSegmentLength 8192
MaxXmitDataSegmentLength 262144
OFMarkInt Reject
OFMarker no
TargetAlias "LIO Target"
}
lun 0 backend pscsi:dvd0
lun 1 backend pscsi:dvd1
lun 2 backend pscsi:dvd2
portal 0.0.0.0:3260
}
}


--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.

2018-04-07 01:10:34

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Bart Van Assche wrote:
> On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote:
> > I know now why scsi_print_command isn't doing anything. cmd->cmnd is null.
> > I added a dev_printk in scsi_print_command where the 2 if statements return.
> > Logs:
> > [ 29.866415] sr 3:0:0:0: cmd->cmnd is NULL
>
> That's something that should never happen. As one can see in
> scsi_setup_scsi_cmnd() and scsi_setup_fs_cmnd() both functions initialize
> that pointer. Since I have not yet been able to reproduce myself what you
> reported, would it be possible for you to bisect this issue? You will need
> to follow something like the following procedure (see also
> https://git-scm.com/docs/git-bisect):
>
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> git bisect start
> git bisect bad v4.10
> git bisect good v4.9
>
> and then build the kernel, install it, boot the kernel and test it.
> Depending on the result, run either git bisect bad or git bisect good and
> keep going until git bisect comes to a conclusion. This can take an hour or
> more.

I have 1 question. Should make clean be done between tests? My box
compiles the whole kernel in 2 minutes.

--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.

2018-04-07 02:10:08

by Bart Van Assche

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

On Fri, 2018-04-06 at 21:03 -0400, Wakko Warner wrote:
> Bart Van Assche wrote:
> > On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote:
> > > I know now why scsi_print_command isn't doing anything. cmd->cmnd is null.
> > > I added a dev_printk in scsi_print_command where the 2 if statements return.
> > > Logs:
> > > [ 29.866415] sr 3:0:0:0: cmd->cmnd is NULL
> >
> > That's something that should never happen. As one can see in
> > scsi_setup_scsi_cmnd() and scsi_setup_fs_cmnd() both functions initialize
> > that pointer. Since I have not yet been able to reproduce myself what you
> > reported, would it be possible for you to bisect this issue? You will need
> > to follow something like the following procedure (see also
> > https://git-scm.com/docs/git-bisect):
> >
> > git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > git bisect start
> > git bisect bad v4.10
> > git bisect good v4.9
> >
> > and then build the kernel, install it, boot the kernel and test it.
> > Depending on the result, run either git bisect bad or git bisect good and
> > keep going until git bisect comes to a conclusion. This can take an hour or
> > more.
>
> I have 1 question. Should make clean be done between tests? My box
> compiles the whole kernel in 2 minutes.

If you trust that the build system will figure out all dependencies then
running make clean is not necessary. Personally I always run make clean
during a bisect before rebuilding the kernel because if a header file has
changed in e.g. the block layer a huge number of files has to be rebuilt
anyway.

Bart.


2018-04-07 16:56:59

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Bart Van Assche wrote:
> On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote:
> > I know now why scsi_print_command isn't doing anything. cmd->cmnd is null.
> > I added a dev_printk in scsi_print_command where the 2 if statements return.
> > Logs:
> > [ 29.866415] sr 3:0:0:0: cmd->cmnd is NULL
>
> That's something that should never happen. As one can see in
> scsi_setup_scsi_cmnd() and scsi_setup_fs_cmnd() both functions initialize
> that pointer. Since I have not yet been able to reproduce myself what you
> reported, would it be possible for you to bisect this issue? You will need
> to follow something like the following procedure (see also
> https://git-scm.com/docs/git-bisect):

After doing 3 successful compiles with good/bad, I got this error and was
not able to compile any more kernels:
CC scripts/mod/devicetable-offsets.s
scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
/* empty file to figure out endianness / word size */

scripts/mod/devicetable-offsets.c:1:0: error: code model kernel does not support PIC mode
#include <linux/kbuild.h>

scripts/Makefile.build:153: recipe for target 'scripts/mod/devicetable-offsets.s' failed

I don't think it found the bad commit.

--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.

2018-04-07 17:12:15

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Wakko Warner wrote:
> Bart Van Assche wrote:
> > On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote:
> > > I know now why scsi_print_command isn't doing anything. cmd->cmnd is null.
> > > I added a dev_printk in scsi_print_command where the 2 if statements return.
> > > Logs:
> > > [ 29.866415] sr 3:0:0:0: cmd->cmnd is NULL
> >
> > That's something that should never happen. As one can see in
> > scsi_setup_scsi_cmnd() and scsi_setup_fs_cmnd() both functions initialize
> > that pointer. Since I have not yet been able to reproduce myself what you
> > reported, would it be possible for you to bisect this issue? You will need
> > to follow something like the following procedure (see also
> > https://git-scm.com/docs/git-bisect):
>
> After doing 3 successful compiles with good/bad, I got this error and was
> not able to compile any more kernels:
> CC scripts/mod/devicetable-offsets.s
> scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
> /* empty file to figure out endianness / word size */
>
> scripts/mod/devicetable-offsets.c:1:0: error: code model kernel does not support PIC mode
> #include <linux/kbuild.h>
>
> scripts/Makefile.build:153: recipe for target 'scripts/mod/devicetable-offsets.s' failed
>
> I don't think it found the bad commit.

I forgot to mention my gcc version.
gcc (Debian 6.2.1-7) 6.2.1 20161215

--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.

2018-04-07 17:13:55

by Bart Van Assche

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

On Sat, 2018-04-07 at 12:53 -0400, Wakko Warner wrote:
> Bart Van Assche wrote:
> > On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote:
> > > I know now why scsi_print_command isn't doing anything. cmd->cmnd is null.
> > > I added a dev_printk in scsi_print_command where the 2 if statements return.
> > > Logs:
> > > [ 29.866415] sr 3:0:0:0: cmd->cmnd is NULL
> >
> > That's something that should never happen. As one can see in
> > scsi_setup_scsi_cmnd() and scsi_setup_fs_cmnd() both functions initialize
> > that pointer. Since I have not yet been able to reproduce myself what you
> > reported, would it be possible for you to bisect this issue? You will need
> > to follow something like the following procedure (see also
> > https://git-scm.com/docs/git-bisect):
>
> After doing 3 successful compiles with good/bad, I got this error and was
> not able to compile any more kernels:
> CC scripts/mod/devicetable-offsets.s
> scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
> /* empty file to figure out endianness / word size */
>
> scripts/mod/devicetable-offsets.c:1:0: error: code model kernel does not support PIC mode
> #include <linux/kbuild.h>
>
> scripts/Makefile.build:153: recipe for target 'scripts/mod/devicetable-offsets.s' failed
>
> I don't think it found the bad commit.

Have you tried to modify the kernel Makefile as indicated in the following
e-mail? This should make the kernel build:

https://lists.ubuntu.com/archives/kernel-team/2016-May/077178.html

Thanks,

Bart.

2018-04-08 16:32:44

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Wakko Warner wrote:
> Bart Van Assche wrote:
> > Have you tried to modify the kernel Makefile as indicated in the following
> > e-mail? This should make the kernel build:
> >
> > https://lists.ubuntu.com/archives/kernel-team/2016-May/077178.html
>
> Thanks. That helped.
>
> I finished with git bisect. Here's the output:
> 84c8590646d5b35804bac60eb58b145839b5893e is the first bad commit
> commit 84c8590646d5b35804bac60eb58b145839b5893e
> Author: Ming Lei <[email protected]>
> Date: Fri Nov 11 20:05:32 2016 +0800
>
> target: avoid accessing .bi_vcnt directly
>
> When the bio is full, bio_add_pc_page() will return zero,
> so use this information tell when the bio is full.
>
> Also replace access to .bi_vcnt for pr_debug() with bio_segments().
>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Ming Lei <[email protected]>
> Reviewed-by: Sagi Grimberg <[email protected]>
> Signed-off-by: Jens Axboe <[email protected]>
>
> :040000 040000 a3ebbb71c52ee4eb8c3be4d033b81179211bf704 de39a328dbd1b18519946b3ad46d9302886e0dd0 M drivers
>
> I did a diff between HEAD^ and HEAD and manually patched the file from
> 4.15.14. It's not an exact revert. I'm running it now and it's working.
> I'll do a better test later on. Here's the patch:
>
> --- a/drivers/target/target_core_pscsi.c 2018-02-04 14:31:31.077316617 -0500
> +++ b/drivers/target/target_core_pscsi.c 2018-04-08 11:43:49.588641374 -0400
> @@ -915,7 +915,9 @@
> bio, page, bytes, off);
> pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
> bio_segments(bio), nr_vecs);
> - if (rc != bytes) {
> + if (rc != bytes)
> + goto fail;
> + if (bio->bi_vcnt > nr_vecs) {
> pr_debug("PSCSI: Reached bio->bi_vcnt max:"
> " %d i: %d bio: %p, allocating another"
> " bio\n", bio->bi_vcnt, i, bio);
>
> I really appreciate your time and assistance with this.

One thing I noticed after doing this is errors in the kernel log on the
initiator:
[9072625.181744] sr 26:0:0:0: [sr1] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[9072625.181802] sr 26:0:0:0: [sr1] tag#0 Sense Key : 0x2 [current]
[9072625.181835] sr 26:0:0:0: [sr1] tag#0 ASC=0x8 ASCQ=0x0
[9072625.181866] sr 26:0:0:0: [sr1] tag#0 CDB: opcode=0x28 28 00 00 0a 81 22 00 00 80 00
[9072625.181919] blk_update_request: I/O error, dev sr1, sector 2753672

When doing the exact same thing on the target, no mention. My patch may not
be right, but it doesn't cause an oops.

I'm going to try 4.16.1 and see what happens.

--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.

2018-04-08 17:08:37

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Bart Van Assche wrote:
> On Sat, 2018-04-07 at 12:53 -0400, Wakko Warner wrote:
> > Bart Van Assche wrote:
> > > On Thu, 2018-04-05 at 22:06 -0400, Wakko Warner wrote:
> > > > I know now why scsi_print_command isn't doing anything. cmd->cmnd is null.
> > > > I added a dev_printk in scsi_print_command where the 2 if statements return.
> > > > Logs:
> > > > [ 29.866415] sr 3:0:0:0: cmd->cmnd is NULL
> > >
> > > That's something that should never happen. As one can see in
> > > scsi_setup_scsi_cmnd() and scsi_setup_fs_cmnd() both functions initialize
> > > that pointer. Since I have not yet been able to reproduce myself what you
> > > reported, would it be possible for you to bisect this issue? You will need
> > > to follow something like the following procedure (see also
> > > https://git-scm.com/docs/git-bisect):
> >
> > After doing 3 successful compiles with good/bad, I got this error and was
> > not able to compile any more kernels:
> > CC scripts/mod/devicetable-offsets.s
> > scripts/mod/empty.c:1:0: error: code model kernel does not support PIC mode
> > /* empty file to figure out endianness / word size */
> >
> > scripts/mod/devicetable-offsets.c:1:0: error: code model kernel does not support PIC mode
> > #include <linux/kbuild.h>
> >
> > scripts/Makefile.build:153: recipe for target 'scripts/mod/devicetable-offsets.s' failed
> >
> > I don't think it found the bad commit.
>
> Have you tried to modify the kernel Makefile as indicated in the following
> e-mail? This should make the kernel build:
>
> https://lists.ubuntu.com/archives/kernel-team/2016-May/077178.html

Thanks. That helped.

I finished with git bisect. Here's the output:
84c8590646d5b35804bac60eb58b145839b5893e is the first bad commit
commit 84c8590646d5b35804bac60eb58b145839b5893e
Author: Ming Lei <[email protected]>
Date: Fri Nov 11 20:05:32 2016 +0800

target: avoid accessing .bi_vcnt directly

When the bio is full, bio_add_pc_page() will return zero,
so use this information tell when the bio is full.

Also replace access to .bi_vcnt for pr_debug() with bio_segments().

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

:040000 040000 a3ebbb71c52ee4eb8c3be4d033b81179211bf704 de39a328dbd1b18519946b3ad46d9302886e0dd0 M drivers

I did a diff between HEAD^ and HEAD and manually patched the file from
4.15.14. It's not an exact revert. I'm running it now and it's working.
I'll do a better test later on. Here's the patch:

--- a/drivers/target/target_core_pscsi.c 2018-02-04 14:31:31.077316617 -0500
+++ b/drivers/target/target_core_pscsi.c 2018-04-08 11:43:49.588641374 -0400
@@ -915,7 +915,9 @@
bio, page, bytes, off);
pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
bio_segments(bio), nr_vecs);
- if (rc != bytes) {
+ if (rc != bytes)
+ goto fail;
+ if (bio->bi_vcnt > nr_vecs) {
pr_debug("PSCSI: Reached bio->bi_vcnt max:"
" %d i: %d bio: %p, allocating another"
" bio\n", bio->bi_vcnt, i, bio);

I really appreciate your time and assistance with this.

--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.

2018-04-09 21:33:56

by Bart Van Assche

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

On Sun, 2018-04-08 at 12:02 -0400, Wakko Warner wrote:
> I finished with git bisect. Here's the output:
> 84c8590646d5b35804bac60eb58b145839b5893e is the first bad commit
> commit 84c8590646d5b35804bac60eb58b145839b5893e
> Author: Ming Lei <[email protected]>
> Date: Fri Nov 11 20:05:32 2016 +0800
>
> target: avoid accessing .bi_vcnt directly
>
> When the bio is full, bio_add_pc_page() will return zero,
> so use this information tell when the bio is full.
>
> Also replace access to .bi_vcnt for pr_debug() with bio_segments().
>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Ming Lei <[email protected]>
> Reviewed-by: Sagi Grimberg <[email protected]>
> Signed-off-by: Jens Axboe <[email protected]>
>
> :040000 040000 a3ebbb71c52ee4eb8c3be4d033b81179211bf704 de39a328dbd1b18519946b3ad46d9302886e0dd0 M drivers
>
> I did a diff between HEAD^ and HEAD and manually patched the file from
> 4.15.14. It's not an exact revert. I'm running it now and it's working.
> I'll do a better test later on. Here's the patch:
>
> --- a/drivers/target/target_core_pscsi.c 2018-02-04 14:31:31.077316617 -0500
> +++ b/drivers/target/target_core_pscsi.c 2018-04-08 11:43:49.588641374 -0400
> @@ -915,7 +915,9 @@
> bio, page, bytes, off);
> pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
> bio_segments(bio), nr_vecs);
> - if (rc != bytes) {
> + if (rc != bytes)
> + goto fail;
> + if (bio->bi_vcnt > nr_vecs) {
> pr_debug("PSCSI: Reached bio->bi_vcnt max:"
> " %d i: %d bio: %p, allocating another"
> " bio\n", bio->bi_vcnt, i, bio);

Hello Ming,

Can you have a look at this? The start of this e-mail thread is available at
https://www.mail-archive.com/[email protected]/msg72574.html.

Thanks,

Bart.



2018-04-09 23:46:45

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Ming Lei wrote:
> On Mon, Apr 09, 2018 at 09:30:11PM +0000, Bart Van Assche wrote:
> > Hello Ming,
> >
> > Can you have a look at this? The start of this e-mail thread is available at
> > https://www.mail-archive.com/[email protected]/msg72574.html.
>
> Sure, thanks for your sharing.
>
> Wakko, could you test the following patch and see if there is any
> difference?

Sure, one question, is this against 4.15 or does it matter. Last I looked,
4.16 hasn't changed from 4.15 for that file.

> diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
> index 0d99b242e82e..6147178f1f37 100644
> --- a/drivers/target/target_core_pscsi.c
> +++ b/drivers/target/target_core_pscsi.c
> @@ -888,7 +888,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> if (len > 0 && data_len > 0) {
> bytes = min_t(unsigned int, len, PAGE_SIZE - off);
> bytes = min(bytes, data_len);
> -
> + new_bio:
> if (!bio) {
> nr_vecs = min_t(int, BIO_MAX_PAGES, nr_pages);
> nr_pages -= nr_vecs;
> @@ -931,6 +931,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> * be allocated with pscsi_get_bio() above.
> */
> bio = NULL;
> + goto new_bio;
> }
>
> data_len -= bytes;
>
> --
> Ming
--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.

2018-04-09 23:55:06

by Ming Lei

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

On Mon, Apr 09, 2018 at 07:43:01PM -0400, Wakko Warner wrote:
> Ming Lei wrote:
> > On Mon, Apr 09, 2018 at 09:30:11PM +0000, Bart Van Assche wrote:
> > > Hello Ming,
> > >
> > > Can you have a look at this? The start of this e-mail thread is available at
> > > https://www.mail-archive.com/[email protected]/msg72574.html.
> >
> > Sure, thanks for your sharing.
> >
> > Wakko, could you test the following patch and see if there is any
> > difference?
>
> Sure, one question, is this against 4.15 or does it matter. Last I looked,
> 4.16 hasn't changed from 4.15 for that file.

It is two-line change, and I am sure it can be applied on 4.15 cleanly.

Thanks,
Ming

2018-04-10 00:03:37

by Ming Lei

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

On Mon, Apr 09, 2018 at 09:30:11PM +0000, Bart Van Assche wrote:
> On Sun, 2018-04-08 at 12:02 -0400, Wakko Warner wrote:
> > I finished with git bisect. Here's the output:
> > 84c8590646d5b35804bac60eb58b145839b5893e is the first bad commit
> > commit 84c8590646d5b35804bac60eb58b145839b5893e
> > Author: Ming Lei <[email protected]>
> > Date: Fri Nov 11 20:05:32 2016 +0800
> >
> > target: avoid accessing .bi_vcnt directly
> >
> > When the bio is full, bio_add_pc_page() will return zero,
> > so use this information tell when the bio is full.
> >
> > Also replace access to .bi_vcnt for pr_debug() with bio_segments().
> >
> > Reviewed-by: Christoph Hellwig <[email protected]>
> > Signed-off-by: Ming Lei <[email protected]>
> > Reviewed-by: Sagi Grimberg <[email protected]>
> > Signed-off-by: Jens Axboe <[email protected]>
> >
> > :040000 040000 a3ebbb71c52ee4eb8c3be4d033b81179211bf704 de39a328dbd1b18519946b3ad46d9302886e0dd0 M drivers
> >
> > I did a diff between HEAD^ and HEAD and manually patched the file from
> > 4.15.14. It's not an exact revert. I'm running it now and it's working.
> > I'll do a better test later on. Here's the patch:
> >
> > --- a/drivers/target/target_core_pscsi.c 2018-02-04 14:31:31.077316617 -0500
> > +++ b/drivers/target/target_core_pscsi.c 2018-04-08 11:43:49.588641374 -0400
> > @@ -915,7 +915,9 @@
> > bio, page, bytes, off);
> > pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
> > bio_segments(bio), nr_vecs);
> > - if (rc != bytes) {
> > + if (rc != bytes)
> > + goto fail;
> > + if (bio->bi_vcnt > nr_vecs) {
> > pr_debug("PSCSI: Reached bio->bi_vcnt max:"
> > " %d i: %d bio: %p, allocating another"
> > " bio\n", bio->bi_vcnt, i, bio);
>
> Hello Ming,
>
> Can you have a look at this? The start of this e-mail thread is available at
> https://www.mail-archive.com/[email protected]/msg72574.html.

Sure, thanks for your sharing.

Wakko, could you test the following patch and see if there is any
difference?

--
diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
index 0d99b242e82e..6147178f1f37 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -888,7 +888,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
if (len > 0 && data_len > 0) {
bytes = min_t(unsigned int, len, PAGE_SIZE - off);
bytes = min(bytes, data_len);
-
+ new_bio:
if (!bio) {
nr_vecs = min_t(int, BIO_MAX_PAGES, nr_pages);
nr_pages -= nr_vecs;
@@ -931,6 +931,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
* be allocated with pscsi_get_bio() above.
*/
bio = NULL;
+ goto new_bio;
}

data_len -= bytes;

--
Ming

2018-04-11 03:38:44

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Ming Lei wrote:
> Sure, thanks for your sharing.
>
> Wakko, could you test the following patch and see if there is any
> difference?
>
> --
> diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
> index 0d99b242e82e..6147178f1f37 100644
> --- a/drivers/target/target_core_pscsi.c
> +++ b/drivers/target/target_core_pscsi.c
> @@ -888,7 +888,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> if (len > 0 && data_len > 0) {
> bytes = min_t(unsigned int, len, PAGE_SIZE - off);
> bytes = min(bytes, data_len);
> -
> + new_bio:
> if (!bio) {
> nr_vecs = min_t(int, BIO_MAX_PAGES, nr_pages);
> nr_pages -= nr_vecs;
> @@ -931,6 +931,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> * be allocated with pscsi_get_bio() above.
> */
> bio = NULL;
> + goto new_bio;
> }
>
> data_len -= bytes;

Sorry for the delay. I reverted my change, added this one. I didn't
reboot, I just unloaded and loaded this one.
Note: /dev/sr1 as seen from the initiator is /dev/sr0 (physical disc) on the
target.

Doesn't crash, however on the initiator I see this:
[9273849.707777] ISO 9660 Extensions: RRIP_1991A
[9273863.359718] scsi_io_completion: 13 callbacks suppressed
[9273863.359788] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[9273863.359909] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current]
[9273863.359974] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0
[9273863.360036] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f6 96 00 00 80 00
[9273863.360116] blk_update_request: 13 callbacks suppressed
[9273863.360177] blk_update_request: I/O error, dev sr1, sector 9165400
[9273875.864648] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[9273875.864738] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current]
[9273875.864801] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0
[9273875.864890] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f7 16 00 00 80 00
[9273875.864971] blk_update_request: I/O error, dev sr1, sector 9165912

To cause this, I mounted the dvd as seen in the first line and ran this
command: find /cdrom2 -type f | xargs -tn1 cat > /dev/null
I did some various tests. Each test was done after umount and mount to
clear the cache.
cat <file> > /dev/null causes the message.
dd if=<file> of=/dev/null bs=2048 doesn't
using bs=4096 doesn't
using bs=64k doesn't
using bs=128k does
cat uses a blocksize of 128k.

The following was done without being mounted.
ddrescue -f -f /dev/sr1 /dev/null
doesn't cause the message
dd if=/dev/sr1 of=/dev/null bs=128k
doesn't cause the message
using bs=256k causes the message once:
[9275916.857409] sr 27:0:0:0: [sr1] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[9275916.857482] sr 27:0:0:0: [sr1] tag#0 Sense Key : 0x2 [current]
[9275916.857520] sr 27:0:0:0: [sr1] tag#0 ASC=0x8 ASCQ=0x0
[9275916.857556] sr 27:0:0:0: [sr1] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 80 00
[9275916.857614] blk_update_request: I/O error, dev sr1, sector 0

If I access the disc from the target natively either by mounting and
accessing files or working with the device directly (ie dd) no errors are
logged on the target.

--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.

2018-04-12 00:57:01

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Wakko Warner wrote:
> Ming Lei wrote:
> > Sure, thanks for your sharing.
> >
> > Wakko, could you test the following patch and see if there is any
> > difference?
> >
> > --
> > diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
> > index 0d99b242e82e..6147178f1f37 100644
> > --- a/drivers/target/target_core_pscsi.c
> > +++ b/drivers/target/target_core_pscsi.c
> > @@ -888,7 +888,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> > if (len > 0 && data_len > 0) {
> > bytes = min_t(unsigned int, len, PAGE_SIZE - off);
> > bytes = min(bytes, data_len);
> > -
> > + new_bio:
> > if (!bio) {
> > nr_vecs = min_t(int, BIO_MAX_PAGES, nr_pages);
> > nr_pages -= nr_vecs;
> > @@ -931,6 +931,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> > * be allocated with pscsi_get_bio() above.
> > */
> > bio = NULL;
> > + goto new_bio;
> > }
> >
> > data_len -= bytes;
>
> Sorry for the delay. I reverted my change, added this one. I didn't
> reboot, I just unloaded and loaded this one.
> Note: /dev/sr1 as seen from the initiator is /dev/sr0 (physical disc) on the
> target.
>
> Doesn't crash, however on the initiator I see this:
> [9273849.707777] ISO 9660 Extensions: RRIP_1991A
> [9273863.359718] scsi_io_completion: 13 callbacks suppressed
> [9273863.359788] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> [9273863.359909] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current]
> [9273863.359974] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0
> [9273863.360036] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f6 96 00 00 80 00
> [9273863.360116] blk_update_request: 13 callbacks suppressed
> [9273863.360177] blk_update_request: I/O error, dev sr1, sector 9165400
> [9273875.864648] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> [9273875.864738] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current]
> [9273875.864801] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0
> [9273875.864890] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f7 16 00 00 80 00
> [9273875.864971] blk_update_request: I/O error, dev sr1, sector 9165912

Just FYI: The jobs that I do that uses the disc over iscsi didn't cause any
kernel messages on either system (except for the informational when the disc
was mounted)

I have a dumb question though. Could the label be placed just after the
'if' statement instead of before it? bio is set to null and the 'if'
statement checks if it's null, which it always would be after the goto.

--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.

2018-04-12 10:11:25

by Ming Lei

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

On Tue, Apr 10, 2018 at 08:45:25PM -0400, Wakko Warner wrote:
> Ming Lei wrote:
> > Sure, thanks for your sharing.
> >
> > Wakko, could you test the following patch and see if there is any
> > difference?
> >
> > --
> > diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
> > index 0d99b242e82e..6147178f1f37 100644
> > --- a/drivers/target/target_core_pscsi.c
> > +++ b/drivers/target/target_core_pscsi.c
> > @@ -888,7 +888,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> > if (len > 0 && data_len > 0) {
> > bytes = min_t(unsigned int, len, PAGE_SIZE - off);
> > bytes = min(bytes, data_len);
> > -
> > + new_bio:
> > if (!bio) {
> > nr_vecs = min_t(int, BIO_MAX_PAGES, nr_pages);
> > nr_pages -= nr_vecs;
> > @@ -931,6 +931,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> > * be allocated with pscsi_get_bio() above.
> > */
> > bio = NULL;
> > + goto new_bio;
> > }
> >
> > data_len -= bytes;
>
> Sorry for the delay. I reverted my change, added this one. I didn't
> reboot, I just unloaded and loaded this one.
> Note: /dev/sr1 as seen from the initiator is /dev/sr0 (physical disc) on the
> target.
>
> Doesn't crash, however on the initiator I see this:
> [9273849.707777] ISO 9660 Extensions: RRIP_1991A
> [9273863.359718] scsi_io_completion: 13 callbacks suppressed
> [9273863.359788] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> [9273863.359909] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current]
> [9273863.359974] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0
> [9273863.360036] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f6 96 00 00 80 00
> [9273863.360116] blk_update_request: 13 callbacks suppressed
> [9273863.360177] blk_update_request: I/O error, dev sr1, sector 9165400
> [9273875.864648] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> [9273875.864738] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current]
> [9273875.864801] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0
> [9273875.864890] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f7 16 00 00 80 00
> [9273875.864971] blk_update_request: I/O error, dev sr1, sector 9165912
>
> To cause this, I mounted the dvd as seen in the first line and ran this
> command: find /cdrom2 -type f | xargs -tn1 cat > /dev/null
> I did some various tests. Each test was done after umount and mount to
> clear the cache.
> cat <file> > /dev/null causes the message.
> dd if=<file> of=/dev/null bs=2048 doesn't
> using bs=4096 doesn't
> using bs=64k doesn't
> using bs=128k does
> cat uses a blocksize of 128k.
>
> The following was done without being mounted.
> ddrescue -f -f /dev/sr1 /dev/null
> doesn't cause the message
> dd if=/dev/sr1 of=/dev/null bs=128k
> doesn't cause the message
> using bs=256k causes the message once:
> [9275916.857409] sr 27:0:0:0: [sr1] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> [9275916.857482] sr 27:0:0:0: [sr1] tag#0 Sense Key : 0x2 [current]
> [9275916.857520] sr 27:0:0:0: [sr1] tag#0 ASC=0x8 ASCQ=0x0
> [9275916.857556] sr 27:0:0:0: [sr1] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 80 00
> [9275916.857614] blk_update_request: I/O error, dev sr1, sector 0
>
> If I access the disc from the target natively either by mounting and
> accessing files or working with the device directly (ie dd) no errors are
> logged on the target.

OK, thanks for your test.

Could you test the following patch and see if there is still the failure
message?

diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
index 0d99b242e82e..6137287b52fb 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -913,9 +913,11 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,

rc = bio_add_pc_page(pdv->pdv_sd->request_queue,
bio, page, bytes, off);
+ if (rc != bytes)
+ goto fail;
pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
bio_segments(bio), nr_vecs);
- if (rc != bytes) {
+ if (/*rc != bytes*/0) {
pr_debug("PSCSI: Reached bio->bi_vcnt max:"
" %d i: %d bio: %p, allocating another"
" bio\n", bio->bi_vcnt, i, bio);

--
Ming

2018-04-13 02:11:09

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Ming Lei wrote:
> On Tue, Apr 10, 2018 at 08:45:25PM -0400, Wakko Warner wrote:
> > Sorry for the delay. I reverted my change, added this one. I didn't
> > reboot, I just unloaded and loaded this one.
> > Note: /dev/sr1 as seen from the initiator is /dev/sr0 (physical disc) on the
> > target.
> >
> > Doesn't crash, however on the initiator I see this:
> > [9273849.707777] ISO 9660 Extensions: RRIP_1991A
> > [9273863.359718] scsi_io_completion: 13 callbacks suppressed
> > [9273863.359788] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> > [9273863.359909] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current]
> > [9273863.359974] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0
> > [9273863.360036] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f6 96 00 00 80 00
> > [9273863.360116] blk_update_request: 13 callbacks suppressed
> > [9273863.360177] blk_update_request: I/O error, dev sr1, sector 9165400
> > [9273875.864648] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> > [9273875.864738] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current]
> > [9273875.864801] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0
> > [9273875.864890] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f7 16 00 00 80 00
> > [9273875.864971] blk_update_request: I/O error, dev sr1, sector 9165912
> >
> > To cause this, I mounted the dvd as seen in the first line and ran this
> > command: find /cdrom2 -type f | xargs -tn1 cat > /dev/null
> > I did some various tests. Each test was done after umount and mount to
> > clear the cache.
> > cat <file> > /dev/null causes the message.
> > dd if=<file> of=/dev/null bs=2048 doesn't
> > using bs=4096 doesn't
> > using bs=64k doesn't
> > using bs=128k does
> > cat uses a blocksize of 128k.
> >
> > The following was done without being mounted.
> > ddrescue -f -f /dev/sr1 /dev/null
> > doesn't cause the message
> > dd if=/dev/sr1 of=/dev/null bs=128k
> > doesn't cause the message
> > using bs=256k causes the message once:
> > [9275916.857409] sr 27:0:0:0: [sr1] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> > [9275916.857482] sr 27:0:0:0: [sr1] tag#0 Sense Key : 0x2 [current]
> > [9275916.857520] sr 27:0:0:0: [sr1] tag#0 ASC=0x8 ASCQ=0x0
> > [9275916.857556] sr 27:0:0:0: [sr1] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 80 00
> > [9275916.857614] blk_update_request: I/O error, dev sr1, sector 0
> >
> > If I access the disc from the target natively either by mounting and
> > accessing files or working with the device directly (ie dd) no errors are
> > logged on the target.
>
> OK, thanks for your test.
>
> Could you test the following patch and see if there is still the failure
> message?
>
> diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
> index 0d99b242e82e..6137287b52fb 100644
> --- a/drivers/target/target_core_pscsi.c
> +++ b/drivers/target/target_core_pscsi.c
> @@ -913,9 +913,11 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
>
> rc = bio_add_pc_page(pdv->pdv_sd->request_queue,
> bio, page, bytes, off);
> + if (rc != bytes)
> + goto fail;
> pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
> bio_segments(bio), nr_vecs);
> - if (rc != bytes) {
> + if (/*rc != bytes*/0) {
> pr_debug("PSCSI: Reached bio->bi_vcnt max:"
> " %d i: %d bio: %p, allocating another"
> " bio\n", bio->bi_vcnt, i, bio);

Target doesn't crash but the errors on the initiator are still there.

Seems that if I do large transfers, I see this in the initiator's logs.
With the previous patch, I burned 3 dvds at the same time, compared the
files to the originals and I have a script that catalogs the files. The
files consist of debian packages and source files. The 3 operations did not
show any errors in the kernel log on either end.

I did this test:
initiator: dd if=/dev/sr1 bs=512k count=1024 | md5sum
target: dd if=/dev/sr0 bs=512k count=1024 | md5sum

Result: the same. It's OK even with the i/o errors shown on the initiator.

The above patch was added on top of the one you gave me before, but I don't
believe that that would be an issue.

... Now if someone could help me with a kvm virtualization problem I'm
having with 4.16 that wasn't there with 4.15...

--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.

2018-04-13 02:21:57

by Ming Lei

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

On Thu, Apr 12, 2018 at 09:43:02PM -0400, Wakko Warner wrote:
> Ming Lei wrote:
> > On Tue, Apr 10, 2018 at 08:45:25PM -0400, Wakko Warner wrote:
> > > Sorry for the delay. I reverted my change, added this one. I didn't
> > > reboot, I just unloaded and loaded this one.
> > > Note: /dev/sr1 as seen from the initiator is /dev/sr0 (physical disc) on the
> > > target.
> > >
> > > Doesn't crash, however on the initiator I see this:
> > > [9273849.707777] ISO 9660 Extensions: RRIP_1991A
> > > [9273863.359718] scsi_io_completion: 13 callbacks suppressed
> > > [9273863.359788] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> > > [9273863.359909] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current]
> > > [9273863.359974] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0
> > > [9273863.360036] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f6 96 00 00 80 00
> > > [9273863.360116] blk_update_request: 13 callbacks suppressed
> > > [9273863.360177] blk_update_request: I/O error, dev sr1, sector 9165400
> > > [9273875.864648] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> > > [9273875.864738] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current]
> > > [9273875.864801] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0
> > > [9273875.864890] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f7 16 00 00 80 00
> > > [9273875.864971] blk_update_request: I/O error, dev sr1, sector 9165912
> > >
> > > To cause this, I mounted the dvd as seen in the first line and ran this
> > > command: find /cdrom2 -type f | xargs -tn1 cat > /dev/null
> > > I did some various tests. Each test was done after umount and mount to
> > > clear the cache.
> > > cat <file> > /dev/null causes the message.
> > > dd if=<file> of=/dev/null bs=2048 doesn't
> > > using bs=4096 doesn't
> > > using bs=64k doesn't
> > > using bs=128k does
> > > cat uses a blocksize of 128k.
> > >
> > > The following was done without being mounted.
> > > ddrescue -f -f /dev/sr1 /dev/null
> > > doesn't cause the message
> > > dd if=/dev/sr1 of=/dev/null bs=128k
> > > doesn't cause the message
> > > using bs=256k causes the message once:
> > > [9275916.857409] sr 27:0:0:0: [sr1] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> > > [9275916.857482] sr 27:0:0:0: [sr1] tag#0 Sense Key : 0x2 [current]
> > > [9275916.857520] sr 27:0:0:0: [sr1] tag#0 ASC=0x8 ASCQ=0x0
> > > [9275916.857556] sr 27:0:0:0: [sr1] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 80 00
> > > [9275916.857614] blk_update_request: I/O error, dev sr1, sector 0
> > >
> > > If I access the disc from the target natively either by mounting and
> > > accessing files or working with the device directly (ie dd) no errors are
> > > logged on the target.
> >
> > OK, thanks for your test.
> >
> > Could you test the following patch and see if there is still the failure
> > message?
> >
> > diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
> > index 0d99b242e82e..6137287b52fb 100644
> > --- a/drivers/target/target_core_pscsi.c
> > +++ b/drivers/target/target_core_pscsi.c
> > @@ -913,9 +913,11 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> >
> > rc = bio_add_pc_page(pdv->pdv_sd->request_queue,
> > bio, page, bytes, off);
> > + if (rc != bytes)
> > + goto fail;
> > pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
> > bio_segments(bio), nr_vecs);
> > - if (rc != bytes) {
> > + if (/*rc != bytes*/0) {
> > pr_debug("PSCSI: Reached bio->bi_vcnt max:"
> > " %d i: %d bio: %p, allocating another"
> > " bio\n", bio->bi_vcnt, i, bio);
>
> Target doesn't crash but the errors on the initiator are still there.

OK, then this error log isn't related with my commit, because the patch
I sent to you in last email is to revert my commit simply.

But the following patch is one correct fix for your crash.

https://marc.info/?l=linux-kernel&m=152331690727052&w=2


Thanks,
Ming

2018-04-14 21:36:31

by Wakko Warner

[permalink] [raw]
Subject: Re: 4.15.14 crash with iscsi target and dvd

Ming Lei wrote:
> On Thu, Apr 12, 2018 at 09:43:02PM -0400, Wakko Warner wrote:
> > Ming Lei wrote:
> > > On Tue, Apr 10, 2018 at 08:45:25PM -0400, Wakko Warner wrote:
> > > > Sorry for the delay. I reverted my change, added this one. I didn't
> > > > reboot, I just unloaded and loaded this one.
> > > > Note: /dev/sr1 as seen from the initiator is /dev/sr0 (physical disc) on the
> > > > target.
> > > >
> > > > Doesn't crash, however on the initiator I see this:
> > > > [9273849.707777] ISO 9660 Extensions: RRIP_1991A
> > > > [9273863.359718] scsi_io_completion: 13 callbacks suppressed
> > > > [9273863.359788] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> > > > [9273863.359909] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current]
> > > > [9273863.359974] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0
> > > > [9273863.360036] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f6 96 00 00 80 00
> > > > [9273863.360116] blk_update_request: 13 callbacks suppressed
> > > > [9273863.360177] blk_update_request: I/O error, dev sr1, sector 9165400
> > > > [9273875.864648] sr 26:0:0:0: [sr1] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> > > > [9273875.864738] sr 26:0:0:0: [sr1] tag#1 Sense Key : 0x2 [current]
> > > > [9273875.864801] sr 26:0:0:0: [sr1] tag#1 ASC=0x8 ASCQ=0x0
> > > > [9273875.864890] sr 26:0:0:0: [sr1] tag#1 CDB: opcode=0x28 28 00 00 22 f7 16 00 00 80 00
> > > > [9273875.864971] blk_update_request: I/O error, dev sr1, sector 9165912
> > > >
> > > > To cause this, I mounted the dvd as seen in the first line and ran this
> > > > command: find /cdrom2 -type f | xargs -tn1 cat > /dev/null
> > > > I did some various tests. Each test was done after umount and mount to
> > > > clear the cache.
> > > > cat <file> > /dev/null causes the message.
> > > > dd if=<file> of=/dev/null bs=2048 doesn't
> > > > using bs=4096 doesn't
> > > > using bs=64k doesn't
> > > > using bs=128k does
> > > > cat uses a blocksize of 128k.
> > > >
> > > > The following was done without being mounted.
> > > > ddrescue -f -f /dev/sr1 /dev/null
> > > > doesn't cause the message
> > > > dd if=/dev/sr1 of=/dev/null bs=128k
> > > > doesn't cause the message
> > > > using bs=256k causes the message once:
> > > > [9275916.857409] sr 27:0:0:0: [sr1] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
> > > > [9275916.857482] sr 27:0:0:0: [sr1] tag#0 Sense Key : 0x2 [current]
> > > > [9275916.857520] sr 27:0:0:0: [sr1] tag#0 ASC=0x8 ASCQ=0x0
> > > > [9275916.857556] sr 27:0:0:0: [sr1] tag#0 CDB: opcode=0x28 28 00 00 00 00 00 00 00 80 00
> > > > [9275916.857614] blk_update_request: I/O error, dev sr1, sector 0
> > > >
> > > > If I access the disc from the target natively either by mounting and
> > > > accessing files or working with the device directly (ie dd) no errors are
> > > > logged on the target.
> > >
> > > OK, thanks for your test.
> > >
> > > Could you test the following patch and see if there is still the failure
> > > message?
> > >
> > > diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
> > > index 0d99b242e82e..6137287b52fb 100644
> > > --- a/drivers/target/target_core_pscsi.c
> > > +++ b/drivers/target/target_core_pscsi.c
> > > @@ -913,9 +913,11 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
> > >
> > > rc = bio_add_pc_page(pdv->pdv_sd->request_queue,
> > > bio, page, bytes, off);
> > > + if (rc != bytes)
> > > + goto fail;
> > > pr_debug("PSCSI: bio->bi_vcnt: %d nr_vecs: %d\n",
> > > bio_segments(bio), nr_vecs);
> > > - if (rc != bytes) {
> > > + if (/*rc != bytes*/0) {
> > > pr_debug("PSCSI: Reached bio->bi_vcnt max:"
> > > " %d i: %d bio: %p, allocating another"
> > > " bio\n", bio->bi_vcnt, i, bio);
> >
> > Target doesn't crash but the errors on the initiator are still there.
>
> OK, then this error log isn't related with my commit, because the patch
> I sent to you in last email is to revert my commit simply.
>
> But the following patch is one correct fix for your crash.
>
> https://marc.info/?l=linux-kernel&m=152331690727052&w=2

Ok, that'll be the one I used. Do you know when it'll go upstream?

--
Microsoft has beaten Volkswagen's world record. Volkswagen only created 22
million bugs.