Date: Fri, 5 Nov 2021 03:30:27 +0000
From: Matthew Wilcox
To: Theodore Ts'o
Cc: "Darrick J. Wong", Dan Williams, Christoph Hellwig, Eric Sandeen,
 Mike Snitzer, Ira Weiny, device-mapper development, linux-xfs,
 Linux NVDIMM, linux-s390, linux-fsdevel, linux-erofs@lists.ozlabs.org,
 linux-ext4, virtualization@lists.linux-foundation.org
Subject: Re: futher decouple DAX from block devices
References: <20211018044054.1779424-1-hch@lst.de>
 <21ff4333-e567-2819-3ae0-6a2e83ec7ce6@sandeen.net>
 <20211104081740.GA23111@lst.de>
 <20211104173417.GJ2237511@magnolia>
 <20211104173559.GB31740@lst.de>
 <20211104190443.GK24333@magnolia>
X-Mailing-List: linux-ext4@vger.kernel.org

On Thu, Nov 04, 2021 at 11:09:19PM -0400, Theodore Ts'o wrote:
> On Thu, Nov 04, 2021 at 12:04:43PM -0700, Darrick J. Wong wrote:
> > > Note that I've avoided implementing read/write fops for dax devices
> > > partly out of concern for not wanting to figure out shared-mmap vs
> > > write coherence issues, but also because of a bet with Dave Hansen
> > > that device-dax not grow features like what happened to hugetlbfs. So
> > > it would seem mkfs would need to switch to mmap I/O, or bite the
> > > bullet and implement read/write fops in the driver.
> >
> > That ... would require a fair amount of userspace changes, though at
> > least e2fsprogs has pluggable io drivers, which would make mmapping a
> > character device not too awful.
> >
> > xfsprogs would be another story -- porting the buffer cache might not be
> > too bad, but mkfs and repair seem to issue pread/pwrite calls directly.
> > Note that xfsprogs explicitly screens out chardevs.
>
> It's not just e2fsprogs and xfsprogs.  There's also udev, blkid,
> potentially systemd unit generators to kick off fsck runs, etc.
> There are probably any number of user scripts which assume that file
> systems are mounted on block devices --- for example, by looking at
> the output of lsblk, etc.
>
> Also note that block devices have O_EXCL support to provide locking
> against attempts to run mkfs on a mounted file system.  If you move
> dax file systems to be mounted on a character mode device, that would
> have to be replicated as well, etc.  So I suspect that a large number
> of subtle things would break, and I'd strongly recommend against going
> down that path.

Agreed.  There were reasons we decided to present pmem as "block
device with extra functionality" rather than try to cram all the block
layer functionality (eg submitting BIOs for filesystem metadata) into
a character device.  Some of those assumptions might be worth
re-examining, but I haven't seen anything that makes me say "this is
obviously better than what we did at the time".
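
As a rough illustration of the O_EXCL behaviour quoted above (this
sketch is not from the thread): on Linux, opening a block device with
O_EXCL and without O_CREAT fails with EBUSY when the device is in use
by the system, for example when it is mounted.  The Python below is a
minimal sketch under that assumption.  The helper names device_kind
and try_exclusive_open are hypothetical, and it does not claim to
reproduce how mkfs, e2fsprogs or xfsprogs actually perform their
checks.

```python
import errno
import os
import stat


def device_kind(path: str) -> str:
    """Classify a path as 'block', 'char', or 'other' from its st_mode."""
    mode = os.stat(path).st_mode
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISCHR(mode):
        return "char"
    return "other"


def try_exclusive_open(path: str) -> bool:
    """Try to claim a block device exclusively (hypothetical helper).

    Returns True if the O_EXCL open succeeded, False if the kernel
    reported EBUSY (device already in use, e.g. mounted).  Anything
    that is not a block device is rejected up front, since O_EXCL
    without O_CREAT is only defined for block devices.
    """
    if device_kind(path) != "block":
        raise ValueError(f"{path} is not a block device")
    try:
        fd = os.open(path, os.O_RDONLY | os.O_EXCL)
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            return False
        raise  # permission problems etc. are not "busy"
    os.close(fd)
    return True
```

A tool that wanted the same protection on a character device would
have nothing equivalent to call, which is the gap Ted points at: the
exclusive-open semantics would have to be replicated for that device
type.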