From: Ming Lei
Date: Wed, 28 Jul 2021 09:32:17 +0800
Subject: Re: [bug report] iommu_dma_unmap_sg() is very slow then running IO from remote numa node
To: John Garry
Cc: Robin Murphy, linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org, iommu@lists.linux-foundation.org, Will Deacon, linux-arm-kernel@lists.infradead.org

On Mon, Jul 26, 2021 at 3:51 PM John Garry wrote:
>
> On 23/07/2021 11:21, Ming Lei wrote:
> >> Thanks, I was also going to suggest the latter, since it's what
> >> arm_smmu_cmdq_issue_cmdlist() does with IRQs masked that should be most
> >> indicative of where the slowness most likely stems from.
> > The improvement from 'iommu.strict=0' is very small:
> >
>
> Have you tried turning off the IOMMU to ensure that this is really just
> an IOMMU problem?
>
> You can try setting CONFIG_ARM_SMMU_V3=n in the defconfig or passing
> cmdline param iommu.passthrough=1 to bypass the SMMU (equivalent to
> disabling for kernel drivers).

Bypassing SMMU via iommu.passthrough=1 basically doesn't make a difference
on this issue.

From the fio log, submission latency is good but completion latency
is pretty bad. Maybe writes to PCI memory aren't being committed to the
HW in time?

BTW, adding one mb() at the exit of nvme_queue_rq() doesn't make a difference.

The fio log after passing iommu.passthrough=1 follows:

[root@ampere-mtjade-04 ~]# taskset -c 0 ~/git/tools/test/nvme/io_uring 10 1 /dev/nvme1n1 4k
+ fio --bs=4k --ioengine=io_uring --fixedbufs --registerfiles --hipri --iodepth=64 --iodepth_batch_submit=16 --iodepth_batch_complete_min=16 --filename=/dev/nvme1n1 --direct=1 --runtime=10 --numjobs=1 --rw=randread --name=test --group_reporting
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=64
fio-3.27
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=1538MiB/s][r=394k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3053: Tue Jul 27 20:57:04 2021
  read: IOPS=393k, BW=1536MiB/s (1611MB/s)(15.0GiB/10001msec)
    slat (usec): min=12, max=343, avg=18.54, stdev= 3.47
    clat (usec): min=46, max=487, avg=140.15, stdev=22.72
     lat (usec): min=63, max=508, avg=158.72, stdev=22.29
    clat percentiles (usec):
     |  1.00th=[   87],  5.00th=[  104], 10.00th=[  113], 20.00th=[  123],
     | 30.00th=[  130], 40.00th=[  135], 50.00th=[  141], 60.00th=[  145],
     | 70.00th=[  151], 80.00th=[  159], 90.00th=[  167], 95.00th=[  176],
     | 99.00th=[  196], 99.50th=[  206], 99.90th=[  233], 99.95th=[  326],
     | 99.99th=[  392]
   bw (  MiB/s): min= 1533, max= 1539, per=100.00%, avg=1537.99, stdev= 1.36, samples=19
   iops        : min=392672, max=394176, avg=393724.63, stdev=348.25, samples=19
  lat (usec)   : 50=0.01%, 100=3.64%, 250=96.30%, 500=0.06%
  cpu          : usr=17.58%, sys=82.03%, ctx=1113, majf=0, minf=5
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=3933712,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=1536MiB/s (1611MB/s), 1536MiB/s-1536MiB/s (1611MB/s-1611MB/s), io=15.0GiB (16.1GB), run=10001-10001msec

Disk stats (read/write):
  nvme1n1: ios=3890950/0, merge=0/0, ticks=529137/0, in_queue=529137, util=99.15%
[root@ampere-mtjade-04 ~]#
[root@ampere-mtjade-04 ~]# taskset -c 80 ~/git/tools/test/nvme/io_uring 10 1 /dev/nvme1n1 4k
+ fio --bs=4k --ioengine=io_uring --fixedbufs --registerfiles --hipri --iodepth=64 --iodepth_batch_submit=16 --iodepth_batch_complete_min=16 --filename=/dev/nvme1n1 --direct=1 --runtime=10 --numjobs=1 --rw=randread --name=test --group_reporting
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=64
fio-3.27
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=150MiB/s][r=38.4k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3062: Tue Jul 27 20:57:23 2021
  read: IOPS=38.4k, BW=150MiB/s (157MB/s)(1501MiB/10002msec)
    slat (usec): min=14, max=376, avg=20.21, stdev= 4.66
    clat (usec): min=439, max=2457, avg=1640.85, stdev=17.01
     lat (usec): min=559, max=2494, avg=1661.09, stdev=15.67
    clat percentiles (usec):
     |  1.00th=[ 1614],  5.00th=[ 1631], 10.00th=[ 1647], 20.00th=[ 1647],
     | 30.00th=[ 1647], 40.00th=[ 1647], 50.00th=[ 1647], 60.00th=[ 1647],
     | 70.00th=[ 1647], 80.00th=[ 1647], 90.00th=[ 1647], 95.00th=[ 1647],
     | 99.00th=[ 1647], 99.50th=[ 1663], 99.90th=[ 1729], 99.95th=[ 1827],
     | 99.99th=[ 2057]
   bw (  KiB/s): min=153600, max=153984, per=100.00%, avg=153876.21, stdev=88.10, samples=19
   iops        : min=38400, max=38496, avg=38469.05, stdev=22.02, samples=19
  lat (usec)   : 500=0.01%, 1000=0.01%
  lat (msec)   : 2=99.96%, 4=0.03%
  cpu          : usr=2.00%, sys=97.65%, ctx=1056, majf=0, minf=5
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=384288,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=150MiB/s (157MB/s), 150MiB/s-150MiB/s (157MB/s-157MB/s), io=1501MiB (1574MB), run=10002-10002msec

Disk stats (read/write):
  nvme1n1: ios=380266/0, merge=0/0, ticks=554940/0, in_queue=554940, util=99.22%
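
As a rough sanity check on the two runs above (CPU 0 vs. CPU 80), the reported IOPS can be compared against what Little's law predicts from the mean total latency. This is only a sketch: it assumes the queue stays full at the configured iodepth of 64 for the whole run, which the "IO depths" line suggests but does not guarantee, and the helper name littles_law_iops is made up for illustration. The input numbers are copied from the fio output.

```python
# Figures copied from the two fio runs (taskset -c 0 and taskset -c 80).
# Assumption: the queue is kept full at iodepth=64 throughout the run.
IODEPTH = 64

runs = {
    "cpu0": {"iops": 393_000, "lat_usec": 158.72},
    "cpu80": {"iops": 38_400, "lat_usec": 1661.09},
}


def littles_law_iops(iodepth, lat_usec):
    """IOPS predicted by Little's law: outstanding requests / mean latency."""
    return iodepth / (lat_usec * 1e-6)


for name, run in runs.items():
    predicted = littles_law_iops(IODEPTH, run["lat_usec"])
    print(f"{name}: reported {run['iops']} IOPS, predicted {predicted:.0f} IOPS")

# How much slower the CPU 80 run is, measured by reported IOPS.
slowdown = runs["cpu0"]["iops"] / runs["cpu80"]["iops"]
print(f"slowdown on cpu80: {slowdown:.1f}x")
```

Under that assumption the prediction is about 403k IOPS for CPU 0 and about 38.5k IOPS for CPU 80, close to what fio reports in both cases, so the roughly 10x throughput drop is accounted for by the higher per-IO latency (mostly clat) rather than by a shallower queue.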