Date: Mon, 4 Dec 2023 23:55:39 -0500
From: "Theodore Ts'o"
To: John Garry
Cc: Christoph Hellwig, axboe@kernel.dk, kbusch@kernel.org,
    sagi@grimberg.me, jejb@linux.ibm.com, martin.petersen@oracle.com,
    djwong@kernel.org, viro@zeniv.linux.org.uk, brauner@kernel.org,
    chandan.babu@oracle.com, dchinner@redhat.com,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-nvme@lists.infradead.org, linux-xfs@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, jbongio@google.com,
    linux-api@vger.kernel.org
Subject: Re: [PATCH 17/21] fs: xfs: iomap atomic write support
Message-ID: <20231205045539.GH509422@mit.edu>
References: <20230929102726.2985188-1-john.g.garry@oracle.com>
    <20230929102726.2985188-18-john.g.garry@oracle.com>
    <20231109152615.GB1521@lst.de> <20231128135619.GA12202@lst.de>
    <20231204134509.GA25834@lst.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Dec 04, 2023 at
03:19:15PM +0000, John Garry wrote:
> > > What is the 'dubious amazon torn-write prevention'?
> >
> > https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/storage-twp.html
>
> AFAICS, this is without any kernel changes, so no guarantee of unwanted
> splitting or merging of bios.

Well, more than one company has audited the kernel paths.  It turns
out that, after desk-check verification of the relevant kernel paths
and experimental testing to try to find torn writes in the kernel, we
can make it safe for specific kernel versions which might be used in
hosted MySQL instances where we control the kernel, the MySQL server,
and the emulated block device (and we know the database is doing
Direct I/O writes --- this won't work for PostgreSQL).  I gave a talk
about this at Google I/O Next '18, five years ago[1].

[1] https://www.youtube.com/watch?v=gIeuiGg-_iw

The performance gains are quite compelling (see the comparison in the
talk at 19:31 and at 29:57).  Of course, I wouldn't recommend this
approach for a naive sysadmin, since most database administrators
won't know how to audit kernel code (see the discussion at 35:10 of
the video) and reverify the entire software stack before every kernel
upgrade.  The challenge is how to do this safely.

The fact remains that both Amazon's EBS and Google's Persistent Disk
products are implemented in such a way that writes will not be torn
below the virtual machine, and the guarantees are in fact quite a bit
stronger than what we will probably end up advertising via NVMe and/or
SCSI.  It wouldn't surprise me if this is the case (or could be made
to be the case) for Oracle Cloud as well.

The question is how to make this guarantee so that the kernel knows
when various cloud-provided block devices do provide these greater
guarantees, and then how to make it an architected feature, as
opposed to a happy implementation detail that has to be verified at
every kernel upgrade.

Cheers,

					- Ted