Date: Tue, 14 Jan 2020 16:28:05 -0500
From: Vivek Goyal
To: Dan Williams
Cc: Jan Kara, "Darrick J. Wong", Christoph Hellwig, Dave Chinner,
    Miklos Szeredi, linux-nvdimm, Linux Kernel Mailing List,
    "Dr. David Alan Gilbert", virtio-fs@redhat.com, Stefan Hajnoczi,
    linux-fsdevel, Jeff Moyer
Subject: Re: [PATCH 01/19] dax: remove block device dependencies
Message-ID: <20200114212805.GB3145@redhat.com>
References: <20200107170731.GA472641@magnolia>
    <20200107180101.GC15920@redhat.com>
    <20200107183307.GD15920@redhat.com>
    <20200109112447.GG27035@quack2.suse.cz>
    <20200114203138.GA3145@redhat.com>

On Tue, Jan 14, 2020 at 12:39:00PM -0800, Dan Williams wrote:
> On Tue, Jan 14, 2020 at 12:31 PM Vivek Goyal wrote:
> >
> > On Thu, Jan 09, 2020 at 12:03:01PM -0800, Dan Williams wrote:
> > > On Thu, Jan 9, 2020 at 3:27 AM Jan Kara wrote:
> > > >
> > > > On Tue 07-01-20 10:49:55, Dan Williams wrote:
> > > > > On Tue, Jan 7, 2020 at 10:33 AM Vivek Goyal wrote:
> > > > > > W.r.t partitioning, bdev_dax_pgoff() seems to be the pain point where
> > > > > > dax code refers back to block device to figure out partition offset in
> > > > > > dax device. If we create a dax object corresponding to "struct block_device"
> > > > > > and store sector offset in that, then we could pass that object to dax
> > > > > > code and not worry about referring back to bdev.
> > > > > > I have written some
> > > > > > proof of concept code and called that object "dax_handle". I can post
> > > > > > that code if there is interest.
> > > > >
> > > > > I don't think it's worth it in the end especially considering
> > > > > filesystems are looking to operate on /dev/dax devices directly and
> > > > > remove block entanglements entirely.
> > > > >
> > > > > > IMHO, it feels useful to be able to partition and use a dax capable
> > > > > > block device in same way as non-dax block device. It will be really
> > > > > > odd to think that if filesystem is on /dev/pmem0p1, then dax can't
> > > > > > be enabled but if filesystem is on /dev/mapper/pmem0p1, then dax
> > > > > > will work.
> > > > >
> > > > > That can already happen today. If you do not properly align the
> > > > > partition then dax operations will be disabled. This proposal just
> > > > > extends that existing failure domain to make all partitions fail to
> > > > > support dax.
> > > >
> > > > Well, I have some sympathy with the sysadmin that has /dev/pmem0 device,
> > > > decides to create partitions on it for whatever (possibly misguided)
> > > > reason and then ponders why the hell DAX is not working? And PAGE_SIZE
> > > > partition alignment is so obvious and widespread that I don't count it as a
> > > > realistic error case sysadmins would be pondering about currently.
> > > >
> > > > So I'd find two options reasonably consistent:
> > > > 1) Keep status quo where partitions are created and support DAX.
> > > > 2) Stop partition creation altogether, if anyones wants to split pmem
> > > >    device further, he can use dm-linear for that (i.e., kpartx).
> > > >
> > > > But I'm not sure if the ship hasn't already sailed for option 2) to be
> > > > feasible without angry users and Linus reverting the change.
> > >
> > > Christoph? I feel myself leaning more and more to the "keep pmem
> > > partitions" camp.
> > >
> > > I don't see "drop partition support" effort ending well given the long
> > > standing "ext4 fails to mount when dax is not available" precedent.
> > >
> > > I think the next least bad option is to have a dax_get_by_host()
> > > variant that passes an offset and length pair rather than requiring a
> > > later bdev_dax_pgoff() to recall the offset. This also prevents
> > > needing to add another dax-device object representation.
> >
> > I am wondering what's the conclusion on this. I want to this to make
> > progress in some direction so that I can make progress on virtiofs DAX
> > support.
>
> I think we should at least try to delete the partition support and see
> if anyone screams. Have a module option to revert the behavior so
> people are not stuck waiting for the revert to land, but if it stays
> quiet then we're in a better place with that support pushed out of the
> dax core.

Hi Dan,

So basically keep the partition support code but disable it by default,
and enable it through some knob, say a kernel command line option or a
module option.

At what point in time will we remove that code completely? I mean, what
if people scream after two kernel releases, after we have removed the
code?

Also, from a distribution's perspective, we might not hear from our
customers for a very long time (till we backport that code into existing
releases or release this new code in the next major release). From that
viewpoint I would not like to break existing user-visible behavior.

How bad is it to keep partition support around? To me it feels reasonably
simple: we just have to store the offset into the dax device in another
dax object and pass that object around (instead of dax_device). If that's
the case, I am not sure why we would even venture into a direction where
some users' setups might be broken.

Also, from an application perspective, /dev/pmem is a block device, so it
should behave like a block device (including kernel partition table
support).
From that view, dax looks like just an additional feature of that
device which can be enabled by passing the option "-o dax".

IOW, can we reconsider the idea of not supporting kernel partition tables
for dax capable block devices? I can only see downsides to removing kernel
partition table support, and the only upside seems to be a little cleanup
of dax core code.

Thanks
Vivek
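
For illustration only, here is a minimal userspace sketch of the "store
the offset in another dax object" idea discussed above. It is not the
proof-of-concept code mentioned in the thread and not kernel code: the
struct layout, the function name dax_handle_pgoff(), the 512-byte sector
size and the 4 KiB page size are all assumptions made for the example.
It only models the arithmetic that bdev_dax_pgoff() is described as
doing (partition start plus sector, converted to a page offset, with an
alignment check), with the partition start carried in the handle instead
of being looked up from the block device.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE     512u   /* assumption: 512-byte sectors */
#define PAGE_SIZE_BYTES 4096u  /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the "dax_handle" object named in the thread. */
struct dax_handle {
	uint64_t start_sect; /* partition start, in sectors, within the dax device */
};

/*
 * Translate a partition-relative sector into a page offset within the
 * whole dax device, using only the offset stored in the handle.
 * Returns 0 on success, or -EINVAL when the resulting byte offset or
 * the requested size is not page aligned (the case in which dax cannot
 * be used on a misaligned partition).
 */
static int dax_handle_pgoff(const struct dax_handle *h, uint64_t sector,
			    size_t size, uint64_t *pgoff)
{
	uint64_t phys_off = (h->start_sect + sector) * SECTOR_SIZE;

	if (pgoff)
		*pgoff = phys_off / PAGE_SIZE_BYTES;
	if (phys_off % PAGE_SIZE_BYTES || size % PAGE_SIZE_BYTES)
		return -EINVAL;
	return 0;
}
```

With a partition starting at sector 2048, sector 8 of the partition maps
to page 257 of the device; a partition starting at sector 2049 fails the
alignment check, which is the existing "misaligned partition disables
dax" case mentioned earlier in the thread.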