Date: Sat, 24 Jul 2021 20:01:02 +0100
From: Matthew Wilcox
To: Andres Freund
Cc: James
Bottomley, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Linus Torvalds, Andrew Morton, "Darrick J. Wong", Christoph Hellwig, Michael Larabel
Subject: Re: Folios give an 80% performance win
Message-ID:
References: <20210715033704.692967-1-willy@infradead.org>
 <1e48f7edcb6d9a67e8b78823660939007e14bae1.camel@HansenPartnership.com>
 <17a9d8bf-cd52-4e6c-9b3e-2fbc1e4592d9@www.fastmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <17a9d8bf-cd52-4e6c-9b3e-2fbc1e4592d9@www.fastmail.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Jul 24, 2021 at 11:45:26AM -0700, Andres Freund wrote:
> On Sat, Jul 24, 2021, at 11:23, James Bottomley wrote:
> > Well, I cut the previous question deliberately, but if you're going to
> > force me to answer, my experience with storage tells me that one test
> > being 10x different from all the others usually indicates a problem
> > with the benchmark test itself rather than a baseline improvement, so
> > I'd wait for more data.
>
> I have a similar reaction - the large improvements are for a read/write pgbench benchmark at a scale that fits in memory. That's typically purely bound by the speed at which the WAL can be synced to disk. As far as I recall mariadb also uses buffered IO for WAL (but there was recent work in the area).
>
> Is there a reason fdatasync() of 16MB files to have got a lot faster? Or a chance that could be broken?
>
> Some improvement for read-only wouldn't surprise me, particularly if the os/pg weren't configured for explicit huge pages. Pgbench has a uniform distribution so its *very* tlb miss heavy with 4k pages.

It's going to depend substantially on the access pattern. If the 16MB file (oof, that's tiny!) was read in large chunks, or even in small chunks but consecutively, the folio changes will allocate larger pages (16k, 64k, 256k, ...).
Theoretically it might get up to 2MB pages and start using PMDs, but I've never seen that in my testing.

fdatasync() could indeed have got much faster. If we're writing back a 256kB page as a unit, we're handling 1/64th of the metadata compared with writing back 64x4kB pages. We'll track 64x fewer dirty bits. We'll find only 64 dirty pages per 16MB instead of 4096 dirty pages.

It's always possible I just broke something. The xfstests aren't exhaustive, and no regressions doesn't mean no problems.

Can you guide Michael towards parameters for pgbench that might give an indication of performance on a more realistic workload that doesn't entirely fit in memory?
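
For anyone who wants to check the fdatasync() numbers above, here is a back-of-the-envelope sketch in Python. It only does the division; it does not model the kernel's writeback path. The helper names (`dirty_units`, `metadata_reduction`) are made up for illustration, and it assumes the whole 16MB file is dirty and covered by pages of a single size, which a real page cache need not satisfy.

```python
# Back-of-the-envelope arithmetic only. This is NOT kernel code and the
# helper names are hypothetical; it assumes a fully dirty file that is
# covered by pages of one uniform size.
KIB = 1024
MIB = 1024 * KIB


def dirty_units(file_size: int, unit_size: int) -> int:
    """Number of unit_size pages needed to cover file_size bytes, rounded up."""
    return -(-file_size // unit_size)


def metadata_reduction(small_unit: int, large_unit: int) -> int:
    """How many small pages one large page replaces."""
    return large_unit // small_unit


wal_size = 16 * MIB  # the 16MB file size mentioned in the thread

# Page sizes mentioned in the thread: 4k base pages, then 16k, 64k, 256k,
# and the theoretical 2MB (PMD-sized) case.
for unit in (4 * KIB, 16 * KIB, 64 * KIB, 256 * KIB, 2 * MIB):
    pages = dirty_units(wal_size, unit)
    ratio = metadata_reduction(4 * KIB, unit)
    print(f"{unit // KIB:>5}kB pages: {pages:>4} dirty pages per 16MB "
          f"({ratio}x fewer than with 4kB pages)")
```

With 256kB pages this gives 64 dirty pages per 16MB instead of 4096, which is the 64x figure quoted above. Whether that translates into a proportional fdatasync() speedup is a separate question that only measurement can answer.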