Date: Thu, 21 Sep 2023 00:18:13 -0700
From: Luis Chamberlain
To: Dave Chinner
Cc: Pankaj Raghav, Pankaj Raghav, linux-xfs@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, da.gomez@samsung.com,
    akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
    willy@infradead.org, djwong@kernel.org, linux-mm@kvack.org,
    chandan.babu@oracle.com, gost.dev@samsung.com, riteshh@linux.ibm.com
Subject: Re: [RFC 00/23] Enable block size > page size in XFS
References: <20230915183848.1018717-1-kernel@pankajraghav.com>
    <806df723-78cf-c7eb-66a6-1442c02126b3@samsung.com>

On Thu, Sep 21, 2023 at 04:03:56PM +1000, Dave Chinner wrote:
> On Wed, Sep 20, 2023 at 09:57:56PM -0700, Luis Chamberlain wrote:
> > On Wed, Sep 20, 2023 at 08:00:12PM -0700, Luis Chamberlain wrote:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=large-block-linus
> > >
> > > I haven't tested yet the second branch I pushed though but it applied without any changes
> > > so it should be good (usual
> > > famous last words).
> >
> > I have run some preliminary tests on that branch as well above using fsx
> > with larger LBA formats running them all on the *same* system at the
> > same time. Kernel is happy.

<-- snip -->

> So I just pulled this, built it and run generic/091 as the very
> first test on this:
>
> # ./run_check.sh --mkfs-opts "-m rmapbt=1 -b size=64k" --run-opts "-s xfs_64k generic/091"

The cover letter for this patch series acknowledged failures in fstests.
For kdevops now, we borrow the same last linux-next baseline:

git grep "generic/091" workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev
workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev/xfs/unassigned/xfs_reflink_1024.txt:generic/091 # possible regression
workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev/xfs/unassigned/xfs_reflink_16k.txt:generic/091
workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev/xfs/unassigned/xfs_reflink_32k.txt:generic/091
workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev/xfs/unassigned/xfs_reflink_64k_4ks.txt:generic/091

So well, we already know this fails.

> For all these assertions about how none of your testing is finding
> bugs in this code, It's taken me *4 seconds* of test runtime to find
> the first failure.

Because you know what to look for and this is not yet perfect.

> And, well, it's the same failure as I reported for the previous
> version of this code:

And we haven't made *any* new changes to the patch series, so no surprise
either.

> Guess what? The fsx parameters being used means it is testing things you
> aren't.

I actually found quite a few issues with -W. And it was useful.

> Yes, the '-Z -R -W' mean it is using direct IO for reads and writes,
> mmap() is disabled. Other parameters indicate that using 4k aligned reads and
> 512 byte aligned writes and truncates.

Thanks! This will help for sure!

> There is a reason there are multiple different fsx tests in fstests;

You made it clear, and I documented the goal to ensure we get to the
point where we pass all of those:

https://kernelnewbies.org/KernelProjects/large-block-size#fsx

> they all exercise different sets of IO behaviours and alignments,
> and they exercise the IO paths differently.
>
> So there's clearly something wrong here - it's likely that the
> filesystem IO alignment parameters pulled from the underlying block
> device (4k physical, 512 byte logical sector sizes) are improperly
> interpreted. i.e. for a filesystem with a sector size of 4kB,
> direct IO with an alignment of 512 bytes should be rejected......

So yes, this is not yet complete.

But now let's step back. I want you to realize where we started and why
we decided to post; I in particular was suggesting we post now, instead
of waiting for us to resolve *it all*.

When we first started this work we simply thought it was impossible.
Unless of course you are Matthew and you believed hard in your work.
The progress, which you don't see, is that steps towards fixing fsx
issues have been logarithmic. It took days, weeks, months before decent
progress, but the progress was steady...

Getting to where we are today shows that this is actually not
impossible, that Matthew did the right thing with the right data
structure, and that the changes to the page cache with the multi-index
array stuff seem to be usable for LBS as well.

At this point, from a logarithmic perspective, we have made huge
progress, and I don't think it will stop. It gives us confidence that
Matthew was right and that LBS is indeed possible with the multi-index
stuff.

It's not about whether this can crash. Yes, we know, it can crash. It's
about in how many different ways, and how many fixes are left. Because
clearly the multi-index stuff is working well.

The code feedback so far on this patch series has mostly been "I don't
think this patch is needed" or "perhaps this way is better", and that's
the kind of feedback we're looking for. Because *each* new patch adds a
huge milestone. And it seems the progress has been logarithmic.

It is exactly why this series went out with a few patches which ... we
felt safer with than without. For instance the batch delete.. I am
still suspicious about us not needing it, as Hannes' patches also seem
to rely on similar rounding on the wait stuff, and it seems to bring
back memories of issues found on permissions.

But anyway, the point is that this is clearly not ready. But try to
think of progress here as logarithmic, and any *dent* we make on the
page cache to fix the last corner cases will be huge, not small.

If you want to try, you can see for yourself: what's the next fix? :)
And if found, was it logarithmic?

How do we polish this? That's the goal of this patch series.

  Luis