Date: Wed, 21 Feb 2024 08:56:15 -0800
From: "Darrick J. Wong"
To: Christoph Hellwig
Cc: John Garry, viro@zeniv.linux.org.uk, brauner@kernel.org,
	dchinner@redhat.com, jack@suse.cz, chandan.babu@oracle.com,
	martin.petersen@oracle.com, linux-kernel@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	tytso@mit.edu, jbongio@google.com, ojaswin@linux.ibm.com
Subject: Re: [PATCH 0/6] block atomic writes for XFS
Message-ID: <20240221165615.GH6184@frogsfrogsfrogs>
References: <20240124142645.9334-1-john.g.garry@oracle.com>
	<20240213072237.GA24218@lst.de>
	<20240213175549.GU616564@frogsfrogsfrogs>
	<20240214074559.GB10006@lst.de>
In-Reply-To: <20240214074559.GB10006@lst.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Feb 14, 2024 at 08:45:59AM +0100, Christoph Hellwig wrote:
> On Tue, Feb 13, 2024 at 09:55:49AM -0800, Darrick J. Wong wrote:
> > On Tue, Feb 13, 2024 at 08:22:37AM +0100, Christoph Hellwig wrote:
> > > From reading the series and the discussions with Darrick and Dave
> > > I'm coming more and more back to my initial position that tying this
> > > user visible feature to hardware limits is wrong and will just keep
> > > on creating ever more painpoints in the future.
> > >
> > > Based on that I suspect that doing proper software only atomic writes
> > > using the swapext log item and selective always COW mode
> >
> > Er, what are you thinking w.r.t. swapext and sometimescow?
>
> What do you mean with sometimescow? Just normal reflinked inodes?
>
> > swapext
> > doesn't currently handle COW forks at all, and it can only exchange
> > between two of the same type of fork (e.g. both data forks or both attr
> > forks, no mixing).
> >
> > Or will that be your next suggestion whenever I get back to fiddling
> > with the online fsck patches? ;)
>
> Let's take a step back.
> If we want atomic write semantics without
> hardware offload, what we need is to allocate new blocks and atomically
> swap them into the data fork. Basically an atomic version of
> xfs_reflink_end_cow. But yes, the details of the current swapext
> item might not be an exact fit, maybe it's just shared infrastructure
> and concepts.

Hmm. For rt reflink (whenever I get back to that, ha) I've been
starting to think that yes, we actually /do/ want to have a log item
that tracks the progress of remap and cow operations. That would solve
the problem of someone wanting to reflink a semi-written rtx.

That said, it might complicate the reflink code quite a bit since right
now it writes zeroes to the unwritten parts of an rt file's rtx so that
there's only one mapping record for the whole rtx, and then it remaps
them. That's most of why I haven't bothered to implement that solution.

> I'm not planning to make you do it, because such a log item would
> generally be pretty useful for always COW mode.

One other thing -- while I was refactoring the swapext code into
exch{range,maps}, it occurred to me that doing an exchange between the
cow and data forks isn't possible because log recovery won't be able to
do anything. There's no ondisk metadata to map a cow staging extent
back to the file it came from, which means we can't generally resume an
exchange operation.

However for a small write I guess you could simply queue all the log
intent items for all the changes needed and commit that.

> > > and making that
> > > work should be the first step. We can then avoid that overhead for
> > > properly aligned writes if the hardware supports it. For your Oracle
> > > DB loads you'll set the alignment hints and maybe even check with
> > > fiemap that everything is fine and will get the offload, but we also
> > > provide a nice and useful API for less performance critical applications
> > > that don't have to care about all these details.
> >
> > I suspect they might want to fail-fast (back to standard WAL mode or
> > whatever) if the hardware support isn't available.
>
> Maybe for your particular DB use case. But there's plenty of
> applications that just want atomic writes without building their
> own infrastructure, including some that want pretty large chunks.
>
> Also if a file system supports logging data (which I have an
> XFS early prototype for that I plan to finish), we can even do
> the small double writes more efficiently than the application,
> all through the same interface.

Heh. Ted's been trying to kill data=journal. Now we've found a use for
it after all. :)

--D
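
The software-only scheme discussed in this thread (write the new data
into freshly allocated blocks, then swap them into the data fork in a
single step) can be illustrated with a toy in-memory model. This is
only a sketch under loud assumptions: ToyFile, data_fork, storage and
the crash_before_commit flag are hypothetical names invented for the
illustration, nothing here is XFS code, and the single dictionary
replacement merely stands in for whatever log item would make the swap
atomic and recoverable on disk.

```python
class ToyFile:
    """Toy model (hypothetical, not XFS): logical blocks map to physical blocks."""

    def __init__(self, nblocks, block_size=4):
        self.block_size = block_size
        self.storage = {}      # physical block id -> bytes
        self.data_fork = {}    # logical block number -> physical block id
        self._next_phys = 0
        for lblk in range(nblocks):
            self.data_fork[lblk] = self._alloc(b"\0" * block_size)

    def _alloc(self, payload):
        # Hand out a new physical block and store the payload in it.
        phys = self._next_phys
        self._next_phys += 1
        self.storage[phys] = payload
        return phys

    def read(self, lblk):
        return self.storage[self.data_fork[lblk]]

    def atomic_write(self, first_lblk, payload, crash_before_commit=False):
        bs = self.block_size
        if len(payload) % bs:
            raise ValueError("payload must be a whole number of blocks")
        count = len(payload) // bs
        if first_lblk < 0 or first_lblk + count > len(self.data_fork):
            raise ValueError("write is outside the file")

        # Step 1: write the new data out of place, into staging blocks.
        staged = {}
        for i in range(count):
            staged[first_lblk + i] = self._alloc(payload[i * bs:(i + 1) * bs])

        if crash_before_commit:
            # Simulated crash: staging blocks are discarded and the
            # mapping still points at the old data, all of it.
            for phys in staged.values():
                del self.storage[phys]
            return False

        # Step 2: the single commit point. Every mapping switches at once.
        old = [self.data_fork[lblk] for lblk in staged]
        self.data_fork = {**self.data_fork, **staged}

        # Step 3: the old blocks are no longer referenced; free them.
        for phys in old:
            del self.storage[phys]
        return True
```

A reader of the file sees either all of the old blocks or all of the
new ones, never a mix, which is the property being asked for. The hard
part raised above, resuming the swap after a real crash, is exactly
what this toy leaves out.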