Date: Wed, 14 Feb 2024 08:45:59 +0100
From: Christoph Hellwig
To: "Darrick J. Wong"
Cc: Christoph Hellwig, John Garry, viro@zeniv.linux.org.uk,
	brauner@kernel.org, dchinner@redhat.com, jack@suse.cz,
	chandan.babu@oracle.com, martin.petersen@oracle.com,
	linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, tytso@mit.edu, jbongio@google.com,
	ojaswin@linux.ibm.com
Subject: Re: [PATCH 0/6] block atomic writes for XFS
Message-ID: <20240214074559.GB10006@lst.de>
References: <20240124142645.9334-1-john.g.garry@oracle.com>
	<20240213072237.GA24218@lst.de>
	<20240213175549.GU616564@frogsfrogsfrogs>
In-Reply-To: <20240213175549.GU616564@frogsfrogsfrogs>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.17 (2007-11-01)

On Tue, Feb 13, 2024 at 09:55:49AM -0800, Darrick J. Wong wrote:
> On Tue, Feb 13, 2024 at 08:22:37AM +0100, Christoph Hellwig wrote:
> > From reading the series and the discussions with Darrick and Dave
> > I'm coming more and more back to my initial position that tying this
> > user visible feature to hardware limits is wrong and will just keep
> > on creating ever more pain points in the future.
> >
> > Based on that I suspect that doing proper software only atomic writes
> > using the swapext log item and selective always COW mode
>
> Er, what are you thinking w.r.t. swapext and sometimescow?

What do you mean by sometimescow?  Just normal reflinked inodes?

> swapext
> doesn't currently handle COW forks at all, and it can only exchange
> between two of the same type of fork (e.g. both data forks or both attr
> forks, no mixing).
>
> Or will that be your next suggestion whenever I get back to fiddling
> with the online fsck patches? ;)

Let's take a step back.  If we want atomic write semantics without
hardware offload, what we need is to allocate new blocks and atomically
swap them into the data fork.
Basically an atomic version of xfs_reflink_end_cow.  But yes, the
details of the current swapext item might not be an exact fit; maybe
it's just shared infrastructure and concepts.

I'm not planning to make you do it, because such a log item would
generally be pretty useful for always COW mode.

> > and making that
> > work should be the first step.  We can then avoid that overhead for
> > properly aligned writes if the hardware supports it.  For your Oracle
> > DB loads you'll set the alignment hints and maybe even check with
> > fiemap that everything is fine and will get the offload, but we also
> > provide a nice and useful API for less performance critical applications
> > that don't have to care about all these details.
>
> I suspect they might want to fail-fast (back to standard WAL mode or
> whatever) if the hardware support isn't available.

Maybe for your particular DB use case.  But there are plenty of
applications that just want atomic writes without building their own
infrastructure, including some that want pretty large chunks.  Also if a
file system supports logging data (I have an early XFS prototype for
that which I plan to finish), we can even do the small double writes
more efficiently than the application, all through the same interface.
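
To make the "allocate new blocks and atomically swap them into the data
fork" idea concrete, here is a rough userspace toy model.  It is not XFS
code and not anything from the patch series: ToyFile, atomic_write,
pick_path, fail_after and hw_unit_max are all hypothetical names made up
for illustration, and the alignment rule in pick_path (power-of-two
length, naturally aligned, within a device limit) is an assumption about
what the hardware offload would require.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFile:
    """Toy stand-in for an inode: data_fork maps file block -> disk block."""
    disk: dict = field(default_factory=dict)       # disk block number -> bytes
    data_fork: dict = field(default_factory=dict)  # file block index -> disk block
    next_free: int = 0

    def _alloc(self) -> int:
        blk = self.next_free
        self.next_free += 1
        return blk

    def read(self, index: int) -> bytes:
        return self.disk[self.data_fork[index]]

    def atomic_write(self, start: int, blocks: list, fail_after=None) -> None:
        """Stage every block out of place, then swap all mappings at once.

        fail_after (hypothetical test knob) simulates a failure after that
        many blocks were staged; the data fork is left untouched.
        """
        staged = {}
        for i, payload in enumerate(blocks):
            if fail_after is not None and i >= fail_after:
                for blk in staged.values():   # drop the staged blocks
                    del self.disk[blk]
                raise IOError("simulated failure before commit")
            blk = self._alloc()
            self.disk[blk] = payload
            staged[start + i] = blk

        # Commit point: in a real file system this remap would be one
        # logged transaction; here it is a single dict update.
        old = [self.data_fork[idx] for idx in staged if idx in self.data_fork]
        self.data_fork.update(staged)
        for blk in old:                       # free the replaced blocks
            del self.disk[blk]


def pick_path(offset: int, length: int, hw_unit_max: int) -> str:
    """Assumed policy: use the hardware only for naturally aligned,
    power-of-two sized writes within the device limit, else fall back."""
    pow2 = length > 0 and (length & (length - 1)) == 0
    if hw_unit_max and pow2 and offset % length == 0 and length <= hw_unit_max:
        return "hardware"
    return "software-cow"
```

A reader sees either all old blocks or all new blocks, never a mix,
because the only step that changes the mapping is the final swap.  The
caller uses the same atomic_write() interface regardless of which path
pick_path() would select.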