From: Dan Williams
Date: Tue, 7 Jan 2020 10:07:18 -0800
Subject: Re: [PATCH 01/19] dax: remove block device dependencies
To: Vivek Goyal
Cc: "Darrick J. Wong", Christoph Hellwig, Dave Chinner, Miklos Szeredi,
    linux-nvdimm, Linux Kernel Mailing List, "Dr. David Alan Gilbert",
    virtio-fs@redhat.com, Stefan Hajnoczi, linux-fsdevel
In-Reply-To: <20200107180101.GC15920@redhat.com>

On Tue, Jan 7, 2020 at 10:02 AM Vivek Goyal wrote:
>
> On Tue, Jan 07, 2020 at 09:29:17AM -0800, Dan Williams wrote:
> > On Tue, Jan 7, 2020 at 9:08 AM Darrick J. Wong wrote:
> > >
> > > On Tue, Jan 07, 2020 at 06:22:54AM -0800, Dan Williams wrote:
> > > > On Tue, Jan 7, 2020 at 4:52 AM Christoph Hellwig wrote:
> > > > >
> > > > > On Mon, Dec 16, 2019 at 01:10:14PM -0500, Vivek Goyal wrote:
> > > > > > > Agree. In retrospect it was my laziness in the dax-device
> > > > > > > implementation to expect the block-device to be available.
> > > > > > >
> > > > > > > It looks like fs_dax_get_by_bdev() is an intercept point where a
> > > > > > > dax_device could be dynamically created to represent the subset range
> > > > > > > indicated by the block-device partition. That would open up more
> > > > > > > cleanup opportunities.
> > > > > >
> > > > > > Hi Dan,
> > > > > >
> > > > > > After a long time I got time to look at it again. Want to work on this
> > > > > > cleanup so that I can make progress with virtiofs DAX patches.
> > > > > >
> > > > > > I am not sure I understand the requirements fully. I see that right now
> > > > > > dax_device is created per device and all block partitions refer to it. If
> > > > > > we want to create one dax_device per partition, then it looks like this
> > > > > > will be structured more along the lines how block layer handles disk and
> > > > > > partitions. (One gendisk for disk and block_devices for partitions,
> > > > > > including partition 0). That probably means state belong to whole device
> > > > > > will be in common structure say dax_device_common, and per partition state
> > > > > > will be in dax_device and dax_device can carry a pointer to
> > > > > > dax_device_common.
> > > > > >
> > > > > > I am also not sure what does it mean to partition dax devices. How will
> > > > > > partitions be exported to user space.
> > > > >
> > > > > Dan, last time we talked you agreed that partitioned dax devices are
> > > > > rather pointless IIRC. Should we just deprecate partitions on DAX
> > > > > devices and then remove them after a cycle or two?
> > > >
> > > > That does seem a better plan than trying to force partition support
> > > > where it is not needed.
> > >
> > > Question: if one /did/ have a partitioned DAX device and used kpartx to
> > > create dm-linear devices for each partition, will DAX still work through
> > > that?
> >
> > The device-mapper support will continue, but it will be limited to
> > whole device sub-components. I.e. you could use kpartx to carve up
> > /dev/pmem0 and still have dax, but not partitions of /dev/pmem0.
>
> So we can't use fdisk/parted to partition /dev/pmem0. Given /dev/pmem0
> is a block device, I thought tools will expect it to be partitioned.
> Sometimes I create those partitions and use /dev/pmem0. So what's
> the replacement for this. People often have tools/scripts which might
> want to partition the device and these will start failing.

Partitioning will still work, but dax operation will be declined and
fall back to page-cache.

> IOW, I do not understand that why being able to partition /dev/pmem0
> (which is a block device from user space point of view), is pointless.

How about s/pointless/redundant/. Persistent memory can already be
"partitioned" via namespace boundaries. Block device partitioning is
then redundant and needlessly complicates, as you have found, the
kernel implementation.

The problem will be people that were on dax+ext4 on partitions. Those
people will see a hard failure at mount whereas XFS will fall back to
page cache with a warning in the log. I think ext4 must convert to the
xfs dax handling model before partition support is dropped.
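
The difference between the two mount-time behaviours described above can be
sketched in plain userspace C. This is only an illustrative model, not kernel
code: every name in it (toy_bdev, toy_mount, the DAX_POLICY_* values) is
hypothetical, and it assumes, as the thread proposes, that dax is declined on
partitions and offered only on whole devices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mount outcomes; these are not kernel identifiers. */
enum mount_result { MOUNT_DAX, MOUNT_PAGE_CACHE, MOUNT_FAILED };

/* Hypothetical policy for a dax request that cannot be honoured:
 * FALLBACK models the XFS behaviour described in the mail,
 * STRICT models the hard failure described for ext4. */
enum dax_policy { DAX_POLICY_FALLBACK, DAX_POLICY_STRICT };

/* Toy stand-in for a block device; not the kernel's struct block_device. */
struct toy_bdev {
    const char *name;
    bool is_partition; /* true for a partition of e.g. /dev/pmem0 */
};

/* Assumption taken from the thread: once partition support is dropped,
 * dax is available only on whole devices. */
static bool toy_dax_supported(const struct toy_bdev *bdev)
{
    return !bdev->is_partition;
}

/* Decide how a mount proceeds when the user asked (or did not ask) for dax. */
static enum mount_result toy_mount(const struct toy_bdev *bdev,
                                   bool want_dax, enum dax_policy policy)
{
    assert(bdev != NULL);

    if (!want_dax)
        return MOUNT_PAGE_CACHE;
    if (toy_dax_supported(bdev))
        return MOUNT_DAX;
    if (policy == DAX_POLICY_FALLBACK) {
        /* Warn in the log and continue without dax. */
        fprintf(stderr, "%s: dax unsupported, falling back to page cache\n",
                bdev->name);
        return MOUNT_PAGE_CACHE;
    }
    return MOUNT_FAILED; /* strict policy: the mount is refused */
}
```

Under this model, a dax mount of a partition succeeds with a log warning under
the fallback policy and fails outright under the strict one, which is why the
mail suggests converting ext4 to the fallback model before partition support
is dropped.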