2019-06-05 09:15:33

by Sebastian Ott

Subject: too large sg segments with commit 09324d32d2a08

Hi,

this warning turned up on s390:

[ 7.041512] ------------[ cut here ]------------
[ 7.041518] DMA-API: nvme 0000:00:00.0: mapping sg segment longer than device claims to support [len=106496] [max=65536]
[ 7.041531] WARNING: CPU: 1 PID: 229 at kernel/dma/debug.c:1233 debug_dma_map_sg+0x21e/0x350
[ 7.041537] Modules linked in: scm_block(+) eadm_sch sch_fq_codel autofs4
[ 7.041547] CPU: 1 PID: 229 Comm: systemd-udevd Not tainted 5.2.0-rc3-00002-g112d38aa4733-dirty #146
[ 7.041552] Hardware name: IBM 3906 M03 703 (LPAR)
[ 7.041558] Krnl PSW : 0704d00180000000 00000000af580b6e (debug_dma_map_sg+0x21e/0x350)
[ 7.041566] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
[ 7.041572] Krnl GPRS: 0000000095969122 0000000080000000 000000000000006c 00000000af5624bc
[ 7.041578] 0000000000000007 0000000000000001 0000000000010000 0000000000000001
[ 7.041583] 00000000b081d278 00000000fea06888 000000008fa43400 00000000f255b418
[ 7.041589] 00000000f4a28100 ffffffff00000000 00000000af580b6a 000003e0004a36e0
[ 7.041599] Krnl Code: 00000000af580b5e: c02000566fd7 larl %r2,b004eb0c
00000000af580b64: c0e5fffad1fe brasl %r14,af4daf60
#00000000af580b6a: a7f40001 brc 15,af580b6c
>00000000af580b6e: c010005f45f5 larl %r1,b0169758
00000000af580b74: e31010000012 lt %r1,0(%r1)
00000000af580b7a: a774000f brc 7,af580b98
00000000af580b7e: c010005fac23 larl %r1,b01763c4
00000000af580b84: e31010000012 lt %r1,0(%r1)
[ 7.041620] Call Trace:
[ 7.041626] ([<00000000af580b6a>] debug_dma_map_sg+0x21a/0x350)
[ 7.041633] [<00000000afbe2152>] nvme_queue_rq+0x49a/0xd18
[ 7.041639] [<00000000afa178d0>] __blk_mq_try_issue_directly+0x108/0x1f0
[ 7.041645] [<00000000afa18e96>] blk_mq_request_issue_directly+0x4e/0x70
[ 7.041651] [<00000000afa18f42>] blk_mq_try_issue_list_directly+0x8a/0x118
[ 7.041657] [<00000000afa1e42e>] blk_mq_sched_insert_requests+0x1c6/0x350
[ 7.041663] [<00000000afa18e40>] blk_mq_flush_plug_list+0x4f8/0x500
[ 7.041669] [<00000000afa0bb3e>] blk_flush_plug_list+0x106/0x110
[ 7.041674] [<00000000afa0bb7c>] blk_finish_plug+0x34/0x50
[ 7.041680] [<00000000af6938c2>] read_pages+0x152/0x160
[ 7.041687] [<00000000af693b06>] __do_page_cache_readahead+0x236/0x268
[ 7.041693] [<00000000af694458>] force_page_cache_readahead+0x110/0x120
[ 7.041699] [<00000000af683fa4>] generic_file_buffered_read+0x144/0x968
[ 7.041706] [<00000000af74d14c>] new_sync_read+0x13c/0x1b8
[ 7.041712] [<00000000af74f9fa>] vfs_read+0x82/0x138
[ 7.041717] [<00000000af74fd92>] ksys_read+0x62/0xd8
[ 7.041724] [<00000000afe60e00>] system_call+0x2b0/0x2d0
[ 7.041729] 1 lock held by systemd-udevd/229:
[ 7.041734] #0: 00000000f715c4f3 (rcu_read_lock){....}, at: hctx_lock+0x28/0xf8
[ 7.041743] Last Breaking-Event-Address:
[ 7.041749] [<00000000af580b6a>] debug_dma_map_sg+0x21a/0x350
[ 7.041754] irq event stamp: 14457
[ 7.041760] hardirqs last enabled at (14465): [<00000000af560a2c>] console_unlock+0x63c/0x6a8
[ 7.041766] hardirqs last disabled at (14472): [<00000000af5604ba>] console_unlock+0xca/0x6a8
[ 7.041773] softirqs last enabled at (11488): [<00000000afa20ed4>] get_gendisk+0xf4/0x148
[ 7.041779] softirqs last disabled at (11486): [<00000000afa20e48>] get_gendisk+0x68/0x148
[ 7.041784] ---[ end trace 9142fc6f63a22c6e ]---

The length of the sg entry created by blk_rq_map_sg is indeed larger than
the dma max_segment_size.

Bisect points to 09324d32d2a0 ("block: force an unlimited segment size on queues with a virt boundary")

Regards,
Sebastian


2019-06-05 10:12:33

by Christoph Hellwig

Subject: Re: too large sg segments with commit 09324d32d2a08

The problem is that we don't communicate the block level max_segment
size to the iommu, and the commit really is just the messenger.

I'll cook up a series to fix this as papering over it in every driver
does not seem sustainable.

2019-06-05 13:32:12

by Christoph Hellwig

Subject: Re: too large sg segments with commit 09324d32d2a08

Actually, it looks like something completely general isn't
easily doable without some major dma API work. Here is what
should fix nvme, but a few other drivers will need fixes as well:

---
From 745541130409bc837a3416300f529b16eded8513 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <[email protected]>
Date: Wed, 5 Jun 2019 14:55:26 +0200
Subject: nvme-pci: don't limit DMA segment size

NVMe uses PRPs (or optionally unlimited SGLs) for data transfers and
has no specific limit for a single DMA segment. Limiting the size
will cause problems because the block layer assumes PRP-ish devices
using a virt boundary mask don't have a segment limit. And while this
is true, we also really need to tell the DMA mapping layer about it,
otherwise dma-debug will trip over it.

Signed-off-by: Christoph Hellwig <[email protected]>
Reported-by: Sebastian Ott <[email protected]>
---
drivers/nvme/host/pci.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f562154551ce..524d6bd6d095 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2513,6 +2513,12 @@ static void nvme_reset_work(struct work_struct *work)
*/
dev->ctrl.max_hw_sectors = NVME_MAX_KB_SZ << 1;
dev->ctrl.max_segments = NVME_MAX_SEGS;
+
+ /*
+ * Don't limit the IOMMU merged segment size.
+ */
+ dma_set_max_seg_size(dev->dev, 0xffffffff);
+
mutex_unlock(&dev->shutdown_lock);

/*
--
2.20.1

2019-06-05 13:59:28

by Sebastian Ott

Subject: Re: too large sg segments with commit 09324d32d2a08

On Wed, 5 Jun 2019, Christoph Hellwig wrote:
> Actually, it looks like something completely general isn't
> easily doable without some major dma API work. Here is what
> should fix nvme, but a few other drivers will need fixes as well:
>
> ---
> From 745541130409bc837a3416300f529b16eded8513 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <[email protected]>
> Date: Wed, 5 Jun 2019 14:55:26 +0200
> Subject: nvme-pci: don't limit DMA segment size
>
> NVMe uses PRPs (or optionally unlimited SGLs) for data transfers and
> has no specific limit for a single DMA segment. Limiting the size
> will cause problems because the block layer assumes PRP-ish devices
> using a virt boundary mask don't have a segment limit. And while this
> is true, we also really need to tell the DMA mapping layer about it,
> otherwise dma-debug will trip over it.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> Reported-by: Sebastian Ott <[email protected]>

Works for me. Thanks!