Date: Sat, 1 Jun 2024 11:33:25 +0200
From: "Theodore Ts'o"
To: John Garry
Cc: Luis Chamberlain, David Bueso, lsf-pc@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-mm, "Martin K. Petersen",
    Matthew Wilcox, Dave Chinner, linux-kernel@vger.kernel.org,
    catherine.hoang@oracle.com
Subject: Re: [LSF/MM/BPF TOPIC] untorn buffered writes
Message-ID: <20240601093325.GC247052@mit.edu>
References: <20240228061257.GA106651@mit.edu>
    <9e230104-4fb8-44f1-ae5a-a940f69b8d45@oracle.com>

On Thu, May 23, 2024 at 12:59:57PM +0100, John Garry wrote:
>
> That's my point really. There were some positive discussion. I put across
> the idea of implementing buffered atomic writes, and now I want to ensure
> that everyone is satisfied with that going forward. I think that a LWN
> report is now being written.

I checked in with some PostgreSQL developers after LSF/MM, and
unfortunately, the idea of immediately sending atomic buffered I/O
directly to the storage device is going to be problematic for them.
The problem is that they depend on the kernel to coalesce writes for
them. So if they are doing a large database commit that involves
touching hundreds or thousands of 16k database pages, today they issue
a separate buffered write request for each database page. If we turn
each one of those into an immediate SCSI/NVMe write request, that
would be disastrous for performance.

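To make the coalescing concern concrete, here is a minimal sketch (it
is not PostgreSQL or kernel code) of what merging per-page writes into
contiguous runs looks like. The function name coalesce_pages and the
list of dirty page numbers are hypothetical and chosen only for
illustration; the 16k page size is the figure used above.

```python
PAGE_SIZE = 16 * 1024  # 16k database page, the size used in the example above


def coalesce_pages(dirty_pages, page_size=PAGE_SIZE):
    """Merge dirty page numbers into contiguous (offset, length) byte runs.

    Hypothetical helper: each returned run stands for one write request
    that covers several adjacent pages.
    """
    runs = []
    for page in sorted(set(dirty_pages)):
        offset = page * page_size
        if runs and runs[-1][0] + runs[-1][1] == offset:
            # Page is adjacent to the previous run: extend that run.
            start, length = runs[-1]
            runs[-1] = (start, length + page_size)
        else:
            # There is a gap before this page: start a new run.
            runs.append((offset, page_size))
    return runs


if __name__ == "__main__":
    pages = [3, 1, 2, 7, 8, 10]
    runs = coalesce_pages(pages)
    print(f"{len(pages)} page writes -> {len(runs)} coalesced writes")
    for offset, length in runs:
        print(f"  offset={offset} length={length}")
```

With buffered I/O this kind of merging happens as a side effect of
writeback; if every page-sized write were sent to the device
immediately, each page would become its own request.
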
Yes, when they migrate to using Direct I/O, the database is going to
have to figure out how to coalesce write requests; but this is why
it's going to take at least 3 years to make this migration (and some
will call this hopelessly optimistic), and then users will probably
wait another 3 to 5 years before they trust that the database's
Direct I/O rewrite gets it right and are willing to run their
enterprise workloads on it....

So I think this goes back to either (a) trying to track which writes
we've promised atomic write semantics, or (b) using a completely
different API that only promises "untorn writes with a specified
granularity" for the untorn buffered writes I/O interface, in addition
to, or instead of, the current "atomic write" interface which we are
currently trying to promulgate for Direct I/O.

Personally, I'd advocate for two separate interfaces: one for "atomic"
I/Os, and a different one for "untorn writes with a specified
guaranteed granularity". And if XFS folks want to turn the atomic I/O
interface into something where you can do a multi-megabyte atomic
write, one that requires allocating new blocks and atomically mutating
the file system metadata to get this kind of atomicity --- even though
the Database folks Don't Care --- God bless.

But let's have something which *just* promises the guarantee requested
by the primary requesters of this interface, at least for the buffered
I/O case.

Cheers,

					- Ted