Date: Fri, 12 Apr 2024 11:28:39 -0700
From: Luis Chamberlain
To: John Garry, Dan Helmick
Cc: Matthew Wilcox, Pankaj Raghav, Daniel Gomez, Javier González,
 axboe@kernel.dk, kbusch@kernel.org, hch@lst.de, sagi@grimberg.me,
 jejb@linux.ibm.com, martin.petersen@oracle.com, djwong@kernel.org,
 viro@zeniv.linux.org.uk, brauner@kernel.org, dchinner@redhat.com,
 jack@suse.cz, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-nvme@lists.infradead.org, linux-fsdevel@vger.kernel.org,
 tytso@mit.edu, jbongio@google.com, linux-scsi@vger.kernel.org,
 ojaswin@linux.ibm.com, linux-aio@kvack.org, linux-btrfs@vger.kernel.org,
 io-uring@vger.kernel.org, nilay@linux.ibm.com, ritesh.list@gmail.com
Subject: Re: [PATCH v6 00/10] block atomic writes
References: <20240326133813.3224593-1-john.g.garry@oracle.com>
 <6d8e98bb-24d1-49be-8965-b6afa97dfdaa@oracle.com>
In-Reply-To: <6d8e98bb-24d1-49be-8965-b6afa97dfdaa@oracle.com>

+ Dan,

On Fri, Apr 12, 2024 at 09:15:57AM +0100, John Garry wrote:
> On 11/04/2024 20:07, Luis Chamberlain wrote:
> > > So if you
> > > have a 4K PBS and 512B LBS, then WRITE_ATOMIC_16 would be required to write
> > > 16KB atomically.
> > Ugh. Why does SCSI requires a special command for this?
>
> The actual question from others is why does NVMe not have a dedicated
> command for this, like:
> https://lore.kernel.org/linux-nvme/20240129062035.GB19796@lst.de/

Because we don't really need it for the hardware that supports it if the
host does the respective topology checks.
For instance, the respective checks for NVMe are that atomic writes
respect AWUN as the cap, since the drive can already go up to AWUN, and
the power-fail limit is implied by checking AWUPF / NAWUPF. The
alignment constraints can be dealt with by the host software.

> It's a data integrity feature, and we want to know if it works properly.

For drives which already support this, integrity is already ensured for
you. An NVMe-specific atomic write command could be useful for existing
drives for other reasons or future uses, but it's not a requirement with
the existing use cases if the NVMe alignment / atomic limits are
respected by the host.

> > Now we know what would be needed to bump the physical block size, it is
> > certainly a different feature, however I think it would be good to
> > evaluate that world too. For NVMe we don't have such special write
> > requirements.
> >
> > I put together this kludge with the last patches series of LBS + the
> > bdev cache aops stuff (which as I said before needs an alternative
> > solution) and just the scsi atomics topology + physical block size
> > change to easily experiment to see what would break:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20240408-lbs-scsi-kludge
> >
> > Using a larger sector size works but it does not use the special scsi
> > atomic write.
>
> If you are using scsi_debug driver, then you can just pass the desired
> physblk_exp and sector_size args - they both default to 512B. Then you don't
> need bother with sd.c atomic stuff, which I think is what you want.
>
> > > > > To me, O_ATOMIC would be required for buffered atomic writes IO, as we want
> > > > > a fixed-sized IO, so that would mean no mixing of atomic and non-atomic IO.
> > > > Would using the same min and max order for the inode work instead?
> > > Maybe, I would need to check further.
> > I'd be happy to help review too.
>
> Yeah, I'm starting to think that min and max inode would make life easier,
> as we don't need to deal with the scenario of an atomic write to a folio >
> atomic write size.

And alignment constraints could be dealt with as well.

  Luis
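
As a rough illustration of the host-side NVMe checks mentioned in the
reply above, here is a minimal sketch and not kernel code. The function
names are made up for this example. It assumes AWUPF / NAWUPF are 0's
based counts of logical blocks and that a reported NAWUPF takes
precedence over AWUPF. The power-of-two and natural-alignment rules are
an assumed host policy for the sketch, not something this thread
specifies.

```python
def atomic_limit_bytes(awupf, nawupf, lba_size):
    """Power-fail atomic write limit in bytes.

    Assumption: awupf / nawupf are 0's based logical block counts, and
    nawupf (namespace level) wins when the namespace reports it
    (pass None when it does not).
    """
    blocks = (nawupf if nawupf is not None else awupf) + 1
    return blocks * lba_size


def can_write_atomically(offset, length, limit):
    """Hypothetical host-side check before issuing a plain write."""
    if length <= 0 or length > limit:
        return False            # exceeds what the drive guarantees
    if length & (length - 1):
        return False            # assumed rule: power-of-two length
    return offset % length == 0  # assumed rule: naturally aligned
```

Under these assumptions, a namespace reporting NAWUPF = 31 with a 512B
logical block size gives a 16KB limit, so a 16KB write at offset 0 or
16KB passes the check, while the same write at offset 4KB does not.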