From: Dan Williams
Date: Thu, 16 Apr 2020 11:28:08 -0700
Subject: Re: [PATCH] memcpy_flushcache: use cache flusing for larger lengths
To: Mikulas Patocka
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, "H. Peter Anvin",
    Peter Zijlstra, X86 ML, Linux Kernel Mailing List,
    device-mapper development
Peter Anvin" , Peter Zijlstra , X86 ML , Linux Kernel Mailing List , device-mapper development Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Apr 16, 2020 at 1:24 AM Mikulas Patocka wrote: > > > > On Thu, 9 Apr 2020, Mikulas Patocka wrote: > > > With dm-writecache on emulated pmem (with the memmap argument), we get > > > > With the original kernel: > > 8508 - 11378 > > real 0m4.960s > > user 0m0.638s > > sys 0m4.312s > > > > With dm-writecache hacked to use cached writes + clflushopt: > > 8505 - 11378 > > real 0m4.151s > > user 0m0.560s > > sys 0m3.582s > > I did some multithreaded tests: > http://people.redhat.com/~mpatocka/testcases/pmem/microbenchmarks/pmem-multithreaded.txt > > And it turns out that for singlethreaded access, write+clwb performs > better, while for multithreaded access, non-temporal stores perform > better. > > 1 sequential write-nt 8 bytes 1.3 GB/s > 2 sequential write-nt 8 bytes 2.5 GB/s > 3 sequential write-nt 8 bytes 2.8 GB/s > 4 sequential write-nt 8 bytes 2.8 GB/s > 5 sequential write-nt 8 bytes 2.5 GB/s > > 1 sequential write 8 bytes + clwb 1.6 GB/s > 2 sequential write 8 bytes + clwb 2.4 GB/s > 3 sequential write 8 bytes + clwb 1.7 GB/s > 4 sequential write 8 bytes + clwb 1.2 GB/s > 5 sequential write 8 bytes + clwb 0.8 GB/s > > For one thread, we can see that write-nt 8 bytes has 1.3 GB/s and write > 8+clwb has 1.6 GB/s, but for multiple threads, write-nt has better > throughput. > > The dm-writecache target is singlethreaded (all the copying is done while > holding the writecache lock), so it benefits from clwb. > > Should memcpy_flushcache be changed to write+clwb? Or are there some > multithreaded users of memcpy_flushcache that would be hurt by this > change? Maybe this is asking for a specific memcpy_flushcache_inatomic() implementation for your use case, but leave nt-writes for the general case?