Subject: Re: [PATCH 0/6] RFC v2: mm: gup/dma tracking
To: John Hubbard, john.hubbard@gmail.com, Andrew Morton, linux-mm@kvack.org
Cc: Al Viro, Christian Benvenuti, Christoph Hellwig, Christopher Lameter,
        Dan Williams, Dave Chinner, Dennis Dalessandro, Doug Ledford,
        Jan Kara, Jason Gunthorpe, Jerome Glisse, Matthew Wilcox,
        Michal Hocko, Mike Rapoport, Mike Marciniszyn, Ralph Campbell,
        LKML, linux-fsdevel@vger.kernel.org
References: <20190204052135.25784-1-jhubbard@nvidia.com>
        <80d503f5-038b-7f0b-90d5-e5b9537ae1df@nvidia.com>
From: Tom Talpey
Message-ID: <303ab506-62b7-ee6d-27a0-a818c7ff6473@talpey.com>
Date: Tue, 5 Feb 2019 08:38:10 -0500
In-Reply-To: <80d503f5-038b-7f0b-90d5-e5b9537ae1df@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2/5/2019 3:22 AM, John Hubbard wrote:
> On 2/4/19 5:41 PM, Tom Talpey wrote:
>> On 2/4/2019 12:21 AM, john.hubbard@gmail.com wrote:
>>> From: John Hubbard
>>>
>>>
>>> Performance: here is an fio run on an NVMe drive, using this for the fio
>>> configuration file:
>>>
>>>      [reader]
>>>      direct=1
>>>      ioengine=libaio
>>>      blocksize=4096
>>>      size=1g
>>>      numjobs=1
>>>      rw=read
>>>      iodepth=64
>>>
>>> reader: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
>>> fio-3.3
>>> Starting 1 process
>>> Jobs: 1 (f=1)
>>> reader: (groupid=0, jobs=1): err= 0: pid=7011: Sun Feb  3 20:36:51 2019
>>>     read: IOPS=190k, BW=741MiB/s (778MB/s)(1024MiB/1381msec)
>>>      slat (nsec): min=2716, max=57255, avg=4048.14, stdev=1084.10
>>>      clat (usec): min=20, max=12485, avg=332.63, stdev=191.77
>>>       lat (usec): min=22, max=12498, avg=336.72, stdev=192.07
>>>      clat percentiles (usec):
>>>       |  1.00th=[  322],  5.00th=[  322], 10.00th=[  322], 20.00th=[  326],
>>>       | 30.00th=[  326], 40.00th=[  326], 50.00th=[  326], 60.00th=[  326],
>>>       | 70.00th=[  326], 80.00th=[  330], 90.00th=[  330], 95.00th=[  330],
>>>       | 99.00th=[  478], 99.50th=[  717], 99.90th=[ 1074], 99.95th=[ 1090],
>>>       | 99.99th=[12256]
>>
>> These latencies are concerning. The best results we saw at the end of
>> November (previous approach) were MUCH flatter. These really start
>> spiking at three 9's, and are sky-high at four 9's. The "stdev" values
>> for clat and lat are about 10 times the previous. There's some kind
>> of serious queuing contention here, that wasn't there in November.
>
> Hi Tom,
>
> I think this latency problem is also there in the baseline kernel, but...
>
>>
>>>     bw (  KiB/s): min=730152, max=776512, per=99.22%, avg=753332.00, stdev=32781.47, samples=2
>>>     iops        : min=182538, max=194128, avg=188333.00, stdev=8195.37, samples=2
>>>    lat (usec)   : 50=0.01%, 100=0.01%, 250=0.07%, 500=99.26%, 750=0.38%
>>>    lat (usec)   : 1000=0.02%
>>>    lat (msec)   : 2=0.24%, 20=0.02%
>>>    cpu          : usr=15.07%, sys=84.13%, ctx=10, majf=0, minf=74
>>
>> System CPU 84% is roughly double the November results of 45%. Ouch.
>
> That's my fault. First of all, I had a few extra, supposedly minor debug
> settings in the .config, which I'm removing now--I'm doing a proper run
> with the original .config file from November, below. Second, I'm not
> sure I controlled the run carefully enough.
>
>>
>> Did you re-run the baseline on the new unpatched base kernel and can
>> we see the before/after?
>
> Doing that now, I see:
>
> -- No significant perf difference between before and after, but
> -- Still high clat in the 99.99th
>
> =======================================================================
> Before: using commit 8834f5600cf3 ("Linux 5.0-rc5")
> ===================================================
> reader: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
> fio-3.3
> Starting 1 process
> Jobs: 1 (f=1)
> reader: (groupid=0, jobs=1): err= 0: pid=1829: Tue Feb  5 00:08:08 2019
>    read: IOPS=193k, BW=753MiB/s (790MB/s)(1024MiB/1359msec)
>     slat (nsec): min=1269, max=40309, avg=1493.66, stdev=534.83
>     clat (usec): min=127, max=12249, avg=329.83, stdev=184.92
>      lat (usec): min=129, max=12256, avg=331.35, stdev=185.06
>     clat percentiles (usec):
>      |  1.00th=[  326],  5.00th=[  326], 10.00th=[  326], 20.00th=[  326],
>      | 30.00th=[  326], 40.00th=[  326], 50.00th=[  326], 60.00th=[  326],
>      | 70.00th=[  326], 80.00th=[  326], 90.00th=[  326], 95.00th=[  326],
>      | 99.00th=[  347], 99.50th=[  519], 99.90th=[  529], 99.95th=[  537],
>      | 99.99th=[12125]
>    bw (  KiB/s): min=755032, max=781472, per=99.57%, avg=768252.00, stdev=18695.90, samples=2
>    iops        : min=188758, max=195368, avg=192063.00, stdev=4673.98, samples=2
>   lat (usec)   : 250=0.08%, 500=99.18%, 750=0.72%
>   lat (msec)   : 20=0.02%
>   cpu          : usr=12.30%, sys=46.83%, ctx=253554, majf=0, minf=74
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
>      issued rwts: total=262144,0,0,0 short=0,0,0,0 dropped=0,0,0,0
>      latency   : target=0, window=0, percentile=100.00%, depth=64
>
> Run status group 0 (all jobs):
>    READ: bw=753MiB/s (790MB/s), 753MiB/s-753MiB/s (790MB/s-790MB/s),
>    io=1024MiB (1074MB), run=1359-1359msec
>
> Disk stats (read/write):
>   nvme0n1: ios=221246/0, merge=0/0, ticks=71556/0, in_queue=704, util=91.35%
>
> =======================================================================
> After:
> =======================================================================
> reader: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
> fio-3.3
> Starting 1 process
> Jobs: 1 (f=1)
> reader: (groupid=0, jobs=1): err= 0: pid=1803: Mon Feb  4 23:58:07 2019
>    read: IOPS=193k, BW=753MiB/s (790MB/s)(1024MiB/1359msec)
>     slat (nsec): min=1276, max=41900, avg=1505.36, stdev=565.26
>     clat (usec): min=177, max=12186, avg=329.88, stdev=184.03
>      lat (usec): min=178, max=12192, avg=331.42, stdev=184.16
>     clat percentiles (usec):
>      |  1.00th=[  326],  5.00th=[  326], 10.00th=[  326], 20.00th=[  326],
>      | 30.00th=[  326], 40.00th=[  326], 50.00th=[  326], 60.00th=[  326],
>      | 70.00th=[  326], 80.00th=[  326], 90.00th=[  326], 95.00th=[  326],
>      | 99.00th=[  359], 99.50th=[  498], 99.90th=[  537], 99.95th=[  627],
>      | 99.99th=[12125]
>    bw (  KiB/s): min=754656, max=781504, per=99.55%, avg=768080.00, stdev=18984.40, samples=2
>    iops        : min=188664, max=195378, avg=192021.00, stdev=4747.51, samples=2
>   lat (usec)   : 250=0.12%, 500=99.40%, 750=0.46%
>   lat (msec)   : 20=0.02%
>   cpu          : usr=12.44%, sys=47.05%, ctx=252127, majf=0, minf=73
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
>      issued rwts: total=262144,0,0,0 short=0,0,0,0 dropped=0,0,0,0
>      latency   : target=0, window=0, percentile=100.00%, depth=64
>
> Run status group 0 (all jobs):
>    READ: bw=753MiB/s (790MB/s), 753MiB/s-753MiB/s (790MB/s-790MB/s),
>    io=1024MiB (1074MB), run=1359-1359msec
>
> Disk stats (read/write):
>   nvme0n1: ios=221203/0, merge=0/0, ticks=71291/0, in_queue=704, util=91.19%
>
> How's this look to you?

Ok, I'm satisfied the four-9's latency spike is not in your code. :-)
Results look good relative to baseline. Thanks for double-checking!

Tom.
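[The tail-latency shape debated in the thread above -- a distribution that is dead flat through p99 yet spikes to ~12 ms at p99.99 while the mean barely moves -- can be sketched with a short percentile computation. This is an illustrative Python sketch using synthetic latency values chosen to mimic the fio clat percentiles quoted above; it is not data from those runs, and `percentile` is a hypothetical nearest-rank helper, not part of fio.]

```python
import statistics

# Synthetic completion latencies (usec): overwhelmingly ~326 usec, a small
# band near 500 usec, and two ~12.2 ms outliers -- roughly the shape of the
# clat percentile table in the fio output quoted above.
latencies = [326.0] * 9985 + [500.0] * 13 + [12200.0] * 2

def percentile(data, pct):
    """Nearest-rank percentile: the value at position ceil(pct% of n) in sorted order."""
    s = sorted(data)
    k = max(0, min(len(s) - 1, int(round(pct / 100.0 * len(s))) - 1))
    return s[k]

print("mean   :", round(statistics.fmean(latencies), 1))  # -> 328.6
print("p99    :", percentile(latencies, 99))              # -> 326.0
print("p99.9  :", percentile(latencies, 99.9))            # -> 500.0
print("p99.99 :", percentile(latencies, 99.99))           # -> 12200.0
```

[The point of the sketch: just 2 outliers in 10,000 samples put the four-9's percentile near 12 ms while leaving the mean within a few usec of the mode, which is why the thread compares before/after at p99.99 rather than trusting the averages.]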