Message-Id: <4c634d08-c658-44cf-ac92-92097eeb8532@www.fastmail.com>
References: <20210715033704.692967-1-willy@infradead.org> <1e48f7edcb6d9a67e8b78823660939007e14bae1.camel@HansenPartnership.com> <17a9d8bf-cd52-4e6c-9b3e-2fbc1e4592d9@www.fastmail.com>
Date: Sat, 24 Jul 2021 12:12:36 -0700
From: "Andres Freund"
To: "Matthew Wilcox"
Cc: "James Bottomley", linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, "Linus Torvalds", "Andrew Morton", "Darrick J. Wong", "Christoph Hellwig", "Michael Larabel"
Subject: Re: Folios give an 80% performance win
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On Sat, Jul 24, 2021, at 12:01, Matthew Wilcox wrote:
> On Sat, Jul 24, 2021 at 11:45:26AM -0700, Andres Freund wrote:
> > On Sat, Jul 24, 2021, at 11:23, James Bottomley wrote:
> > > Well, I cut the previous question deliberately, but if you're going to
> > > force me to answer, my experience with storage tells me that one test
> > > being 10x different from all the others usually indicates a problem
> > > with the benchmark test itself rather than a baseline improvement, so
> > > I'd wait for more data.
> >
> > I have a similar reaction - the large improvements are for a read/write pgbench benchmark at a scale that fits in memory. That's typically purely bound by the speed at which the WAL can be synced to disk. As far as I recall mariadb also uses buffered IO for WAL (but there was recent work in the area).
> >
> > Is there a reason for fdatasync() of 16MB files to have got a lot faster? Or a chance that could be broken?
> >
> > Some improvement for read-only wouldn't surprise me, particularly if the os/pg weren't configured for explicit huge pages. Pgbench has a uniform distribution so it's *very* tlb miss heavy with 4k pages.
>
> It's going to depend substantially on the access pattern. If the 16MB
> file (oof, that's tiny!) was read in in large chunks or even in small
> chunks, but consecutively, the folio changes will allocate larger pages
> (16k, 64k, 256k, ...). Theoretically it might get up to 2MB pages and
> start using PMDs, but I've never seen that in my testing.

The 16MB files are just for the WAL/journal, and are write only in a benchmark like this. With pgbench it'll be written in small consecutive chunks (a few pages at a time, for each group commit).
Each page is only written once, until after a checkpoint the entire file is "recycled" (renamed into the future of the WAL stream) and reused from start.

The data files are 1GB.

> fdatasync() could indeed have got much faster. If we're writing back a
> 256kB page as a unit, we're handling 64 times less metadata than writing
> back 64x4kB pages. We'll track 64x less dirty bits. We'll find only
> 64 dirty pages per 16MB instead of 4096 dirty pages.

The dirty writes will be 8-32k or so in this workload - the constant commits require the WAL to constantly be flushed.

> It's always possible I just broke something. The xfstests aren't
> exhaustive, and no regressions doesn't mean no problems.
>
> Can you guide Michael towards parameters for pgbench that might give
> an indication of performance on a more realistic workload that doesn't
> entirely fit in memory?

Fitting in memory isn't bad - that's a large part of real workloads. It just makes it hard to believe the performance improvement, given that we expect to be bound by disk sync speed...

Michael, where do I find more details about the configuration used during the run?

Regards,

Andres
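
PS: For anyone wanting to check the numbers in the thread, here is a rough sketch of the arithmetic. It only counts how many page-cache units cover a 16MB WAL file, and how many units a single 8-32k flush overlaps, for a few unit sizes mentioned above. The function names are made up for illustration, and the sketch assumes nothing about how the kernel actually tracks or writes back dirty folios.

```python
# Illustrative arithmetic only: this counts units, it does not model
# kernel writeback. Function names are hypothetical.

SEGMENT_BYTES = 16 * 1024 * 1024  # one 16MB WAL file, as in the thread


def units_per_segment(unit_bytes: int, segment_bytes: int = SEGMENT_BYTES) -> int:
    """Number of units of unit_bytes needed to cover segment_bytes."""
    return -(-segment_bytes // unit_bytes)  # ceiling division


def units_touched(offset: int, length: int, unit_bytes: int) -> int:
    """Number of units overlapped by the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // unit_bytes
    last = (offset + length - 1) // unit_bytes
    return last - first + 1


if __name__ == "__main__":
    sizes = [("4kB", 4 * 1024), ("16kB", 16 * 1024),
             ("64kB", 64 * 1024), ("256kB", 256 * 1024)]
    for label, size in sizes:
        print(f"{label:>6}: {units_per_segment(size):5d} units per 16MB, "
              f"8k flush overlaps {units_touched(0, 8 * 1024, size)}, "
              f"32k flush overlaps {units_touched(0, 32 * 1024, size)}")
```

The 4kB and 256kB rows correspond to the 4096 and 64 figures quoted above. The per-flush columns show why the flush size matters here: an aligned 8-32k flush overlaps only 2 to 8 units even at 4kB granularity.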