2019-06-17 12:20:33

by Christoph Hellwig

Subject: [PATCH 2/8] scsi: take the DMA max mapping size into account

We need to limit the device's max_sectors to what the DMA mapping
implementation can support. If not, we risk running out of swiotlb
buffers easily.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/scsi/scsi_lib.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d333bb6b1c59..f233bfd84cd7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1768,6 +1768,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
 		blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
 	}

+	shost->max_sectors = min_t(unsigned int, shost->max_sectors,
+			dma_max_mapping_size(dev) << SECTOR_SHIFT);
 	blk_queue_max_hw_sectors(q, shost->max_sectors);
 	if (shost->unchecked_isa_dma)
 		blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);
--
2.20.1


2019-06-17 20:57:02

by Bart Van Assche

Subject: Re: [PATCH 2/8] scsi: take the DMA max mapping size into account

On 6/17/19 5:19 AM, Christoph Hellwig wrote:
> We need to limit the device's max_sectors to what the DMA mapping
> implementation can support. If not, we risk running out of swiotlb
> buffers easily.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/scsi/scsi_lib.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index d333bb6b1c59..f233bfd84cd7 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1768,6 +1768,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
> blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
> }
>
> + shost->max_sectors = min_t(unsigned int, shost->max_sectors,
> + dma_max_mapping_size(dev) << SECTOR_SHIFT);
> blk_queue_max_hw_sectors(q, shost->max_sectors);
> if (shost->unchecked_isa_dma)
> blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);

Does dma_max_mapping_size() return a value in bytes? Is
shost->max_sectors a number of sectors? If so, are you sure that "<<
SECTOR_SHIFT" is the proper conversion? Shouldn't that be ">>
SECTOR_SHIFT" instead?

Additionally, how about adding a comment above dma_max_mapping_size()
that documents the unit of the returned number?
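
For illustration, the unit question can be checked with a small user-space
sketch. It assumes that dma_max_mapping_size() reports bytes and that a
sector is 512 bytes (SECTOR_SHIFT of 9). The helper names below are made up
for the example and are not kernel code, and the helpers saturate where the
kernel's min_t() would truncate.

```c
#include <limits.h>
#include <stddef.h>

#define SECTOR_SHIFT 9	/* assumed: 512-byte sectors */

/* Saturating size_t -> unsigned int conversion (hypothetical helper). */
static unsigned int to_uint(size_t v)
{
	return v > UINT_MAX ? UINT_MAX : (unsigned int)v;
}

/* Bytes -> sectors is a division by 512, i.e. a right shift. */
static unsigned int clamp_max_sectors(unsigned int max_sectors,
				      size_t max_map_bytes)
{
	unsigned int limit = to_uint(max_map_bytes >> SECTOR_SHIFT);

	return limit < max_sectors ? limit : max_sectors;
}

/* The left shift multiplies by 512 instead, inflating the limit. */
static unsigned int clamp_max_sectors_lshift(unsigned int max_sectors,
					     size_t max_map_bytes)
{
	unsigned int limit = to_uint(max_map_bytes << SECTOR_SHIFT);

	return limit < max_sectors ? limit : max_sectors;
}
```

With an example mapping limit of 256 KiB (262144 bytes) and max_sectors of
1024, the right shift gives 512 sectors, while the left shift gives a value
far above 1024, so the minimum never lowers max_sectors.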

Thanks,

Bart.

2019-07-22 06:01:40

by Ming Lei

Subject: Re: [PATCH 2/8] scsi: take the DMA max mapping size into account

On Tue, Jun 18, 2019 at 4:57 AM Bart Van Assche <[email protected]> wrote:
>
> On 6/17/19 5:19 AM, Christoph Hellwig wrote:
> > We need to limit the device's max_sectors to what the DMA mapping
> > implementation can support. If not, we risk running out of swiotlb
> > buffers easily.
> >
> > Signed-off-by: Christoph Hellwig <[email protected]>
> > ---
> > drivers/scsi/scsi_lib.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index d333bb6b1c59..f233bfd84cd7 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1768,6 +1768,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
> > blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
> > }
> >
> > + shost->max_sectors = min_t(unsigned int, shost->max_sectors,
> > + dma_max_mapping_size(dev) << SECTOR_SHIFT);
> > blk_queue_max_hw_sectors(q, shost->max_sectors);
> > if (shost->unchecked_isa_dma)
> > blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);
>
> Does dma_max_mapping_size() return a value in bytes? Is
> shost->max_sectors a number of sectors? If so, are you sure that "<<
> SECTOR_SHIFT" is the proper conversion? Shouldn't that be ">>
> SECTOR_SHIFT" instead?

Now that the patch has been committed, '<< SECTOR_SHIFT' needs to be fixed.

Also, the following kernel oops is triggered on QEMU, and it looks like
device->dma_mask is NULL.

[ 5.826483] scsi host0: Virtio SCSI HBA
[ 5.829302] st: Version 20160209, fixed bufsize 32768, s/g segs 256
[ 5.831042] SCSI Media Changer driver v0.25
[ 5.832491] ==================================================================
[ 5.833332] BUG: KASAN: null-ptr-deref in dma_direct_max_mapping_size+0x30/0x94
[ 5.833332] Read of size 8 at addr 0000000000000000 by task kworker/u17:0/7
[ 5.835506] nvme nvme0: pci function 0000:00:07.0
[ 5.833332]
[ 5.833332] CPU: 2 PID: 7 Comm: kworker/u17:0 Not tainted 5.3.0-rc1 #1328
[ 5.836999] ahci 0000:00:1f.2: version 3.0
[ 5.833332] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx4
[ 5.833332] Workqueue: events_unbound async_run_entry_fn
[ 5.833332] Call Trace:
[ 5.833332] dump_stack+0x6f/0x9d
[ 5.833332] ? dma_direct_max_mapping_size+0x30/0x94
[ 5.833332] __kasan_report+0x161/0x189
[ 5.833332] ? dma_direct_max_mapping_size+0x30/0x94
[ 5.833332] kasan_report+0xe/0x12
[ 5.833332] dma_direct_max_mapping_size+0x30/0x94
[ 5.833332] __scsi_init_queue+0xd8/0x1f3
[ 5.833332] scsi_mq_alloc_queue+0x62/0x89
[ 5.833332] scsi_alloc_sdev+0x38c/0x479
[ 5.833332] scsi_probe_and_add_lun+0x22d/0x1093
[ 5.833332] ? kobject_set_name_vargs+0xa4/0xb2
[ 5.833332] ? mutex_lock+0x88/0xc4
[ 5.833332] ? scsi_free_host_dev+0x4a/0x4a
[ 5.833332] ? _raw_spin_lock_irqsave+0x8c/0xde
[ 5.833332] ? _raw_write_unlock_irqrestore+0x23/0x23
[ 5.833332] ? ata_tdev_match+0x22/0x45
[ 5.833332] ? attribute_container_add_device+0x160/0x17e
[ 5.833332] ? rpm_resume+0x26a/0x7c0
[ 5.833332] ? kobject_get+0x12/0x43
[ 5.833332] ? rpm_put_suppliers+0x7e/0x7e
[ 5.833332] ? _raw_spin_lock_irqsave+0x8c/0xde
[ 5.833332] ? _raw_write_unlock_irqrestore+0x23/0x23
[ 5.833332] ? scsi_target_destroy+0x135/0x135
[ 5.833332] __scsi_scan_target+0x14b/0x6aa
[ 5.833332] ? pvclock_clocksource_read+0xc0/0x14e
[ 5.833332] ? scsi_add_device+0x20/0x20
[ 5.833332] ? rpm_resume+0x1ae/0x7c0
[ 5.833332] ? rpm_put_suppliers+0x7e/0x7e
[ 5.833332] ? _raw_spin_lock_irqsave+0x8c/0xde
[ 5.833332] ? _raw_write_unlock_irqrestore+0x23/0x23
[ 5.833332] ? pick_next_task_fair+0x976/0xa3d
[ 5.833332] ? mutex_lock+0x88/0xc4
[ 5.833332] scsi_scan_channel+0x76/0x9e
[ 5.833332] scsi_scan_host_selected+0x131/0x176
[ 5.833332] ? scsi_scan_host+0x241/0x241
[ 5.833332] do_scan_async+0x27/0x219
[ 5.833332] ? scsi_scan_host+0x241/0x241
[ 5.833332] async_run_entry_fn+0xdc/0x23d
[ 5.833332] process_one_work+0x327/0x539
[ 5.833332] worker_thread+0x330/0x492
[ 5.833332] ? rescuer_thread+0x41f/0x41f
[ 5.833332] kthread+0x1c6/0x1d5
[ 5.833332] ? kthread_park+0xd3/0xd3
[ 5.833332] ret_from_fork+0x1f/0x30
[ 5.833332] ==================================================================



Thanks,
Ming Lei

2019-07-22 06:28:48

by Dexuan-Linux Cui

Subject: Re: [PATCH 2/8] scsi: take the DMA max mapping size into account

On Sun, Jul 21, 2019 at 11:01 PM Ming Lei <[email protected]> wrote:
>
> On Tue, Jun 18, 2019 at 4:57 AM Bart Van Assche <[email protected]> wrote:
> >
> > On 6/17/19 5:19 AM, Christoph Hellwig wrote:
> > > We need to limit the device's max_sectors to what the DMA mapping
> > > implementation can support. If not, we risk running out of swiotlb
> > > buffers easily.
> > >
> > > Signed-off-by: Christoph Hellwig <[email protected]>
> > > ---
> > > drivers/scsi/scsi_lib.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > > index d333bb6b1c59..f233bfd84cd7 100644
> > > --- a/drivers/scsi/scsi_lib.c
> > > +++ b/drivers/scsi/scsi_lib.c
> > > @@ -1768,6 +1768,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
> > > blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
> > > }
> > >
> > > + shost->max_sectors = min_t(unsigned int, shost->max_sectors,
> > > + dma_max_mapping_size(dev) << SECTOR_SHIFT);
> > > blk_queue_max_hw_sectors(q, shost->max_sectors);
> > > if (shost->unchecked_isa_dma)
> > > blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);
> >
> > Does dma_max_mapping_size() return a value in bytes? Is
> > shost->max_sectors a number of sectors? If so, are you sure that "<<
> > SECTOR_SHIFT" is the proper conversion? Shouldn't that be ">>
> > SECTOR_SHIFT" instead?
>
> Now that the patch has been committed, '<< SECTOR_SHIFT' needs to be fixed.
>
> Also, the following kernel oops is triggered on QEMU, and it looks like
> device->dma_mask is NULL.
>
> Ming Lei

FYI: we also see the panic with a Linux kernel 5.2.0-next-20190719
running on Hyper-V:

[ 7.429053] RIP: 0010:dma_direct_max_mapping_size+0x26/0x80
[ 7.429053] Code: 0f b6 c0 c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 4c 14 00 00 84 c0 74 45 48 8b 83 28 02 00 00 4c 8b a3 38 02 00 00 <48> 8b 00 48 85 c0 74 0c 4d 85 e4 74 36 49 39 c4 4c 0f 47 e0 48 89
[ 7.429053] RSP: 0018:ffffc1d5005efbc0 EFLAGS: 00010202
[ 7.429053] RAX: 0000000000000000 RBX: ffff9cf86d24c428 RCX: 0000000000000000
[ 7.429053] RDX: ffff9cf86d12dd00 RSI: 0000000000000200 RDI: ffff9cf86d24c428
[ 7.429053] RBP: ffffc1d5005efbd0 R08: ffff9cf86fcaf0e0 R09: ffff9cf86e0072c0
[ 7.429053] R10: ffffc1d5005efa70 R11: 00000000000301a0 R12: 0000000000000000
[ 7.429053] R13: ffff9cf86d24c428 R14: 0000000000000400 R15: ffff9cf825cff000
[ 7.429053] FS: 0000000000000000(0000) GS:ffff9cf86fc80000(0000) knlGS:0000000000000000
[ 7.429053] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7.429053] CR2: 0000000000000000 CR3: 00000003c700a001 CR4: 00000000003606e0
[ 7.456569] NET: Registered protocol family 17
[ 7.429053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7.469803] Key type dns_resolver registered
[ 7.429053] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7.429053] Call Trace:
[ 7.429053] dma_max_mapping_size+0x39/0x50
[ 7.429053] __scsi_init_queue+0x7f/0x140
[ 7.429053] scsi_mq_alloc_queue+0x38/0x60
[ 7.429053] scsi_alloc_sdev+0x1da/0x2b0
[ 7.429053] scsi_probe_and_add_lun+0x471/0xe60
[ 7.429053] __scsi_scan_target+0xfc/0x610
[ 7.429053] scsi_scan_channel+0x66/0xa0
[ 7.429053] scsi_scan_host_selected+0xf3/0x160
[ 7.429053] do_scsi_scan_host+0x93/0xa0
[ 7.429053] do_scan_async+0x1c/0x190
[ 7.429053] async_run_entry_fn+0x3c/0x150
[ 7.429053] process_one_work+0x1f7/0x3f0
[ 7.429053] worker_thread+0x34/0x400
[ 7.429053] kthread+0x121/0x140
[ 7.429053] ret_from_fork+0x35/0x40
[ 7.429053] Modules linked in:
[ 7.429053] CR2: 0000000000000000
[ 7.766122] BUG: kernel NULL pointer dereference, address: 0000000000000000

Thanks,
-- Dexuan

2019-07-22 07:46:34

by Damien Le Moal

Subject: Re: [PATCH 2/8] scsi: take the DMA max mapping size into account

On 2019/07/22 15:01, Ming Lei wrote:
> On Tue, Jun 18, 2019 at 4:57 AM Bart Van Assche <[email protected]> wrote:
>>
>> On 6/17/19 5:19 AM, Christoph Hellwig wrote:
>>> We need to limit the device's max_sectors to what the DMA mapping
>>> implementation can support. If not, we risk running out of swiotlb
>>> buffers easily.
>>>
>>> Signed-off-by: Christoph Hellwig <[email protected]>
>>> ---
>>> drivers/scsi/scsi_lib.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>> index d333bb6b1c59..f233bfd84cd7 100644
>>> --- a/drivers/scsi/scsi_lib.c
>>> +++ b/drivers/scsi/scsi_lib.c
>>> @@ -1768,6 +1768,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
>>> blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
>>> }
>>>
>>> + shost->max_sectors = min_t(unsigned int, shost->max_sectors,
>>> + dma_max_mapping_size(dev) << SECTOR_SHIFT);
>>> blk_queue_max_hw_sectors(q, shost->max_sectors);
>>> if (shost->unchecked_isa_dma)
>>> blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);
>>
>> Does dma_max_mapping_size() return a value in bytes? Is
>> shost->max_sectors a number of sectors? If so, are you sure that "<<
>> SECTOR_SHIFT" is the proper conversion? Shouldn't that be ">>
>> SECTOR_SHIFT" instead?
>
> Now that the patch has been committed, '<< SECTOR_SHIFT' needs to be fixed.
>
> Also, the following kernel oops is triggered on QEMU, and it looks like
> device->dma_mask is NULL.

Just hit the exact same problem using tcmu-runner (ZBC file handler) on bare
metal (no QEMU). dev->dma_mask is NULL. No problem with real disks though.
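
One possible defensive pattern for hosts whose device has no DMA mask can be
sketched in user space as below. This is only a model under stated
assumptions, not the kernel implementation and not necessarily the fix that
gets merged: struct model_device, its fields, and both functions are
hypothetical stand-ins, and the mapping limit is assumed to be in bytes.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* assumed: 512-byte sectors */

/* Hypothetical stand-in for struct device: only what the sketch needs. */
struct model_device {
	const uint64_t *dma_mask;	/* NULL for devices that never do DMA */
	size_t bounce_limit_bytes;	/* made-up per-mapping bounce limit */
};

/*
 * Models a helper that dereferences dev->dma_mask without checking it,
 * which is where a NULL mask would fault. Callers must pass a non-NULL mask.
 */
static size_t model_max_mapping_size(const struct model_device *dev)
{
	if (*dev->dma_mask < UINT64_MAX)
		return dev->bounce_limit_bytes;	/* limited addressing */
	return SIZE_MAX;			/* no limit */
}

/* Only consult the mapping limit when the device has a DMA mask at all. */
static unsigned int model_limit_max_sectors(const struct model_device *dev,
					    unsigned int max_sectors)
{
	size_t limit;

	if (!dev->dma_mask)
		return max_sectors;	/* no DMA, so nothing to clamp */

	limit = model_max_mapping_size(dev) >> SECTOR_SHIFT;
	return limit < max_sectors ? (unsigned int)limit : max_sectors;
}
```

In this model a device with a NULL mask keeps its max_sectors unchanged,
while a device with a 32-bit mask and a 256 KiB bounce limit is clamped to
512 sectors.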

>
> [ 5.826483] scsi host0: Virtio SCSI HBA
> [ 5.829302] st: Version 20160209, fixed bufsize 32768, s/g segs 256
> [ 5.831042] SCSI Media Changer driver v0.25
> [ 5.832491] ==================================================================
> [ 5.833332] BUG: KASAN: null-ptr-deref in dma_direct_max_mapping_size+0x30/0x94
> [ 5.833332] Read of size 8 at addr 0000000000000000 by task kworker/u17:0/7
> [ 5.835506] nvme nvme0: pci function 0000:00:07.0
> [ 5.833332]
> [ 5.833332] CPU: 2 PID: 7 Comm: kworker/u17:0 Not tainted 5.3.0-rc1 #1328
> [ 5.836999] ahci 0000:00:1f.2: version 3.0
> [ 5.833332] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx4
> [ 5.833332] Workqueue: events_unbound async_run_entry_fn
> [ 5.833332] Call Trace:
> [ 5.833332] dump_stack+0x6f/0x9d
> [ 5.833332] ? dma_direct_max_mapping_size+0x30/0x94
> [ 5.833332] __kasan_report+0x161/0x189
> [ 5.833332] ? dma_direct_max_mapping_size+0x30/0x94
> [ 5.833332] kasan_report+0xe/0x12
> [ 5.833332] dma_direct_max_mapping_size+0x30/0x94
> [ 5.833332] __scsi_init_queue+0xd8/0x1f3
> [ 5.833332] scsi_mq_alloc_queue+0x62/0x89
> [ 5.833332] scsi_alloc_sdev+0x38c/0x479
> [ 5.833332] scsi_probe_and_add_lun+0x22d/0x1093
> [ 5.833332] ? kobject_set_name_vargs+0xa4/0xb2
> [ 5.833332] ? mutex_lock+0x88/0xc4
> [ 5.833332] ? scsi_free_host_dev+0x4a/0x4a
> [ 5.833332] ? _raw_spin_lock_irqsave+0x8c/0xde
> [ 5.833332] ? _raw_write_unlock_irqrestore+0x23/0x23
> [ 5.833332] ? ata_tdev_match+0x22/0x45
> [ 5.833332] ? attribute_container_add_device+0x160/0x17e
> [ 5.833332] ? rpm_resume+0x26a/0x7c0
> [ 5.833332] ? kobject_get+0x12/0x43
> [ 5.833332] ? rpm_put_suppliers+0x7e/0x7e
> [ 5.833332] ? _raw_spin_lock_irqsave+0x8c/0xde
> [ 5.833332] ? _raw_write_unlock_irqrestore+0x23/0x23
> [ 5.833332] ? scsi_target_destroy+0x135/0x135
> [ 5.833332] __scsi_scan_target+0x14b/0x6aa
> [ 5.833332] ? pvclock_clocksource_read+0xc0/0x14e
> [ 5.833332] ? scsi_add_device+0x20/0x20
> [ 5.833332] ? rpm_resume+0x1ae/0x7c0
> [ 5.833332] ? rpm_put_suppliers+0x7e/0x7e
> [ 5.833332] ? _raw_spin_lock_irqsave+0x8c/0xde
> [ 5.833332] ? _raw_write_unlock_irqrestore+0x23/0x23
> [ 5.833332] ? pick_next_task_fair+0x976/0xa3d
> [ 5.833332] ? mutex_lock+0x88/0xc4
> [ 5.833332] scsi_scan_channel+0x76/0x9e
> [ 5.833332] scsi_scan_host_selected+0x131/0x176
> [ 5.833332] ? scsi_scan_host+0x241/0x241
> [ 5.833332] do_scan_async+0x27/0x219
> [ 5.833332] ? scsi_scan_host+0x241/0x241
> [ 5.833332] async_run_entry_fn+0xdc/0x23d
> [ 5.833332] process_one_work+0x327/0x539
> [ 5.833332] worker_thread+0x330/0x492
> [ 5.833332] ? rescuer_thread+0x41f/0x41f
> [ 5.833332] kthread+0x1c6/0x1d5
> [ 5.833332] ? kthread_park+0xd3/0xd3
> [ 5.833332] ret_from_fork+0x1f/0x30
> [ 5.833332] ==================================================================
>
>
>
> Thanks,
> Ming Lei
>


--
Damien Le Moal
Western Digital Research