Date: Wed, 22 May 2024 14:56:57 -0700
From: Luis Chamberlain
To: John Garry, David Bueso
Cc: Theodore Ts'o, lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm, "Martin K. Petersen", Matthew Wilcox, Dave Chinner, linux-kernel@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] untorn buffered writes
References: <20240228061257.GA106651@mit.edu> <9e230104-4fb8-44f1-ae5a-a940f69b8d45@oracle.com>

On Wed, May 15, 2024 at 01:54:39PM -0600, John Garry wrote:
> On 27/02/2024 23:12, Theodore Ts'o wrote:
> > Last year, I talked about an interest to provide database such as
> > MySQL with the ability to issue writes that would not be torn as they
> > write 16k database pages[1].
> >
> > [1] https://urldefense.com/v3/__https://lwn.net/Articles/932900/__;!!ACWV5N9M2RV99hQ!Ij_ZeSZrJ4uPL94Im73udLMjqpkcZwHmuNnznogL68ehu6TDTXqbMsC4xLUqh18hq2Ib77p1D8_4mV5Q$
> >
>
> After discussing this topic earlier this week, I would like to know if there
> are still objections or concerns with the untorn-writes userspace API
> proposed in https://lore.kernel.org/linux-block/20240326133813.3224593-1-john.g.garry@oracle.com/
>
> I feel that the series for supporting direct-IO only, above, is stuck
> because of this topic of buffered IO.

I think it was good that we had the discussions at LSFMM about it. I
personally don't perceive it as stuck; however, without any consensus
being made obvious or written down anywhere, it would not be clear to
anyone that we reached any consensus at all.
The hope is that LWN captures whatever consensus was reached, if any,
as you're not making it clear that any was.

In case it helps, as we did with the LBS effort, it may also be useful
to put together bi-monthly cabals to follow up on progress and to
divide and conquer any pending work items.

> So I sent an RFC for buffered untorn-writes last month in https://lore.kernel.org/linux-fsdevel/20240422143923.3927601-1-john.g.garry@oracle.com/,
> which did leverage the bs > ps effort. Maybe it did not get noticed due to
> being an RFC. It works on the following principles:
>
> - A buffered atomic write requires RWF_ATOMIC flag be set, same as
>   direct IO. The same other atomic writes rules apply.
> - For an inode, only a single size of buffered write is allowed. So for
>   statx, atomic_write_unit_min = atomic_write_unit_max always for
>   buffered atomic writes.
> - A single folio maps to an atomic write in the pagecache. So inode
>   address_space folio min order = max order = atomic_write_unit_min/max
> - A folio is tagged as "atomic" when atomically written and written back
>   to storage "atomically", same as direct-IO method would do for an
>   atomic write.
> - If userspace wants to guarantee a buffered atomic write is written to
>   storage atomically after the write syscall returns, it must use
>   RWF_SYNC or similar (along with RWF_ATOMIC).

From my perspective the above just needs the IOCB atomic support, and
the pending long-term work item there is the near-write-through
buffered IO support. We could just hold off on buffered-IO support
until we have that. I can't think of anything blocking DIO support
though, now that we at least have a mental model of how buffered IO
*should* work.

What about testing? Are you extending fstests, blktests?

  Luis
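
For anyone following along, the principles quoted above could be sketched
roughly as below. This is only an illustration, not code from either patch
series: the function names and parameters are hypothetical, the 4096-byte
page size is an assumption, and the power-of-two and natural-alignment checks
are assumed to be the "same other atomic writes rules" carried over from the
direct-IO proposal.

```python
def is_valid_buffered_atomic_write(offset: int, length: int, unit: int) -> bool:
    """Check a write against the quoted rules (hypothetical helper).

    `unit` stands for the single per-inode size, i.e. the case where
    atomic_write_unit_min == atomic_write_unit_max.
    """
    # The unit itself is assumed to be a power of two.
    if unit <= 0 or unit & (unit - 1):
        return False
    # Only one write size is allowed per inode for buffered atomic writes.
    if length != unit:
        return False
    # Assumed rule from the direct-IO series: naturally aligned offset.
    return offset >= 0 and offset % length == 0


def folio_order(unit: int, page_size: int = 4096) -> int:
    """Folio order at which one folio covers exactly one atomic write unit.

    page_size defaults to 4096 purely as an assumption for illustration.
    """
    if unit < page_size or unit % page_size:
        raise ValueError("unit must be a multiple of the page size")
    pages = unit // page_size
    if pages & (pages - 1):
        raise ValueError("unit must be a power-of-two number of pages")
    # order = log2(pages); min order == max order == this value.
    return pages.bit_length() - 1
```

With 16k database pages on 4k system pages, this gives a single order-2 folio
per atomic write, and a 16k write at offset 4096 is rejected as misaligned.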