Date: Fri, 1 Jun 2018 18:04:43 -0400
From: Mike Snitzer
To: Ross Zwisler
Cc: Toshi Kani, dm-devel@redhat.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH v2 5/7] dm: remove DM_TYPE_DAX_BIO_BASED dm_queue_mode
Message-ID: <20180601220443.GB18712@redhat.com>
References: <20180529195106.14268-1-ross.zwisler@linux.intel.com>
	<20180529195106.14268-6-ross.zwisler@linux.intel.com>
In-Reply-To: <20180529195106.14268-6-ross.zwisler@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 29 2018 at  3:51pm -0400,
Ross Zwisler wrote:

> The DM_TYPE_DAX_BIO_BASED dm_queue_mode was introduced to prevent DM
> devices that could possibly support DAX from transitioning into DM devices
> that cannot support DAX.
>
> For example, the following transition will currently fail:
>
>  dm-linear: [fsdax pmem][fsdax pmem] => [fsdax pmem][fsdax raw]
>               DM_TYPE_DAX_BIO_BASED       DM_TYPE_BIO_BASED
>
> but these will both succeed:
>
>  dm-linear: [fsdax pmem][brd ramdisk] => [fsdax pmem][fsdax raw]
>               DM_TYPE_DAX_BIO_BASED       DM_TYPE_BIO_BASED
>

I fail to see how this succeeds given
drivers/md/dm-ioctl.c:is_valid_type() only allows transitions from:

DM_TYPE_BIO_BASED => DM_TYPE_DAX_BIO_BASED

>  dm-linear: [fsdax pmem][fsdax raw] => [fsdax pmem][fsdax pmem]
>                 DM_TYPE_BIO_BASED       DM_TYPE_DAX_BIO_BASED
>
> This seems arbitrary, as really the choice on whether to use DAX happens at
> filesystem mount time.  There's no guarantee that in the first case
> (double fsdax pmem) we were using the dax mount option with our file
> system.
>
> Instead, get rid of DM_TYPE_DAX_BIO_BASED and all the special casing around
> it, and instead make the request queue's QUEUE_FLAG_DAX be our one source
> of truth.  If this is set, we can use DAX, and if not, not.  We keep this
> up to date in table_load() as the table changes.
> As with regular block devices the filesystem will then know at mount time
> whether DAX is a supported mount option or not.

If you don't think you need this specialization, that is fine.. but DM
devices support suspending (as part of table reloads), so is there any
risk that there will be in-flight IO (say if someone did 'dmsetup
suspend --noflush').. and then upon reload the device type changes out
from under us?

Anyway, I don't have all the PMEM DAX stuff paged back into my head yet.

But it just seems like we really shouldn't be allowing the transition
from what was DM_TYPE_DAX_BIO_BASED back to DM_TYPE_BIO_BASED.

Mike

> Signed-off-by: Ross Zwisler
> ---
>  drivers/md/dm-ioctl.c         | 16 ++++++----------
>  drivers/md/dm-table.c         | 25 ++++++++++---------------
>  drivers/md/dm.c               |  2 --
>  include/linux/device-mapper.h |  8 ++++++--
>  4 files changed, 22 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
> index 5acf77de5945..d1f86d0bb2d0 100644
> --- a/drivers/md/dm-ioctl.c
> +++ b/drivers/md/dm-ioctl.c
> @@ -1292,15 +1292,6 @@ static int populate_table(struct dm_table *table,
>  	return dm_table_complete(table);
>  }
>  
> -static bool is_valid_type(enum dm_queue_mode cur, enum dm_queue_mode new)
> -{
> -	if (cur == new ||
> -	    (cur == DM_TYPE_BIO_BASED && new == DM_TYPE_DAX_BIO_BASED))
> -		return true;
> -
> -	return false;
> -}
> -
>  static int table_load(struct file *filp, struct dm_ioctl *param, size_t param_size)
>  {
>  	int r;
> @@ -1343,12 +1334,17 @@ static int table_load(struct file *filp, struct dm_ioctl *param, size_t param_si
>  			DMWARN("unable to set up device queue for new table.");
>  			goto err_unlock_md_type;
>  		}
> -	} else if (!is_valid_type(dm_get_md_type(md), dm_table_get_type(t))) {
> +	} else if (dm_get_md_type(md) != dm_table_get_type(t)) {
>  		DMWARN("can't change device type after initial table load.");
>  		r = -EINVAL;
>  		goto err_unlock_md_type;
>  	}
>  
> +	if (dm_table_supports_dax(t))
> +		blk_queue_flag_set(QUEUE_FLAG_DAX, md->queue);
> +	else
> +		blk_queue_flag_clear(QUEUE_FLAG_DAX, md->queue);
> +
>  	dm_unlock_md_type(md);
>  
>  	/* stage inactive table */
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 5bb994b012ca..ea5c4a1e6f5b 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -866,7 +866,6 @@ EXPORT_SYMBOL(dm_consume_args);
>  static bool __table_type_bio_based(enum dm_queue_mode table_type)
>  {
>  	return (table_type == DM_TYPE_BIO_BASED ||
> -		table_type == DM_TYPE_DAX_BIO_BASED ||
>  		table_type == DM_TYPE_NVME_BIO_BASED);
>  }
>  
> @@ -888,7 +887,7 @@ static int device_supports_dax(struct dm_target *ti, struct dm_dev *dev,
>  	return bdev_dax_supported(dev->bdev, PAGE_SIZE);
>  }
>  
> -static bool dm_table_supports_dax(struct dm_table *t)
> +bool dm_table_supports_dax(struct dm_table *t)
>  {
>  	struct dm_target *ti;
>  	unsigned i;
> @@ -907,6 +906,7 @@ static bool dm_table_supports_dax(struct dm_table *t)
>  
>  	return true;
>  }
> +EXPORT_SYMBOL_GPL(dm_table_supports_dax);
>  
>  static bool dm_table_does_not_support_partial_completion(struct dm_table *t);
>  
> @@ -944,7 +944,6 @@ static int dm_table_determine_type(struct dm_table *t)
>  			/* possibly upgrade to a variant of bio-based */
>  			goto verify_bio_based;
>  		}
> -		BUG_ON(t->type == DM_TYPE_DAX_BIO_BASED);
>  		BUG_ON(t->type == DM_TYPE_NVME_BIO_BASED);
>  		goto verify_rq_based;
>  	}
> @@ -981,18 +980,14 @@ static int dm_table_determine_type(struct dm_table *t)
>  verify_bio_based:
>  		/* We must use this table as bio-based */
>  		t->type = DM_TYPE_BIO_BASED;
> -		if (dm_table_supports_dax(t) ||
> -		    (list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
> -			t->type = DM_TYPE_DAX_BIO_BASED;
> -		} else {
> -			/* Check if upgrading to NVMe bio-based is valid or required */
> -			tgt = dm_table_get_immutable_target(t);
> -			if (tgt && !tgt->max_io_len && dm_table_does_not_support_partial_completion(t)) {
> -				t->type = DM_TYPE_NVME_BIO_BASED;
> -				goto verify_rq_based; /* must be stacked directly on NVMe (blk-mq) */
> -			} else if (list_empty(devices) && live_md_type == DM_TYPE_NVME_BIO_BASED) {
> -				t->type = DM_TYPE_NVME_BIO_BASED;
> -			}
> +
> +		/* Check if upgrading to NVMe bio-based is valid or required */
> +		tgt = dm_table_get_immutable_target(t);
> +		if (tgt && !tgt->max_io_len && dm_table_does_not_support_partial_completion(t)) {
> +			t->type = DM_TYPE_NVME_BIO_BASED;
> +			goto verify_rq_based; /* must be stacked directly on NVMe (blk-mq) */
> +		} else if (list_empty(devices) && live_md_type == DM_TYPE_NVME_BIO_BASED) {
> +			t->type = DM_TYPE_NVME_BIO_BASED;
>  		}
>  		return 0;
>  	}
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 9728433362d1..0ce06fa292fd 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -2192,7 +2192,6 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
>  		}
>  		break;
>  	case DM_TYPE_BIO_BASED:
> -	case DM_TYPE_DAX_BIO_BASED:
>  		dm_init_normal_md_queue(md);
>  		blk_queue_make_request(md->queue, dm_make_request);
>  		break;
> @@ -2910,7 +2909,6 @@ struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_qu
>  
>  	switch (type) {
>  	case DM_TYPE_BIO_BASED:
> -	case DM_TYPE_DAX_BIO_BASED:
>  	case DM_TYPE_NVME_BIO_BASED:
>  		pool_size = max(dm_get_reserved_bio_based_ios(), min_pool_size);
>  		front_pad = roundup(per_io_data_size, __alignof__(struct dm_target_io)) + offsetof(struct dm_target_io, clone);
> diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
> index 31fef7c34185..cbf3d7e7ed33 100644
> --- a/include/linux/device-mapper.h
> +++ b/include/linux/device-mapper.h
> @@ -27,8 +27,7 @@ enum dm_queue_mode {
>  	DM_TYPE_BIO_BASED = 1,
>  	DM_TYPE_REQUEST_BASED = 2,
>  	DM_TYPE_MQ_REQUEST_BASED = 3,
> -	DM_TYPE_DAX_BIO_BASED = 4,
> -	DM_TYPE_NVME_BIO_BASED = 5,
> +	DM_TYPE_NVME_BIO_BASED = 4,
>  };
>  
>  typedef enum { STATUSTYPE_INFO, STATUSTYPE_TABLE } status_type_t;
> @@ -460,6 +459,11 @@ void dm_table_add_target_callbacks(struct dm_table *t, struct dm_target_callback
>   */
>  void dm_table_set_type(struct dm_table *t, enum dm_queue_mode type);
>  
> +/*
> + * Check to see if this target type and all table devices support DAX.
> + */
> +bool dm_table_supports_dax(struct dm_table *t);
> +
>  /*
>   * Finally call this to make the table ready for use.
>   */
> -- 
> 2.14.3
>
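
For readers following the thread, here is a minimal standalone sketch of the
two rules under discussion. It is not kernel code. Only the body of
old_is_valid_type() mirrors the is_valid_type() helper removed in the quoted
diff; the model_* names, struct model_queue, new_table_load() and
model_check() are hypothetical stand-ins made up for illustration, and the
sketch assumes that a reload is the only path of interest.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the pre-patch enum dm_queue_mode (two values only). */
enum model_queue_mode {
	MODEL_BIO_BASED = 1,
	MODEL_DAX_BIO_BASED = 4,
};

/* Same logic as the is_valid_type() helper removed by the quoted patch. */
static bool old_is_valid_type(enum model_queue_mode cur,
			      enum model_queue_mode next)
{
	return cur == next ||
	       (cur == MODEL_BIO_BASED && next == MODEL_DAX_BIO_BASED);
}

/* Hypothetical model of a request queue: only the DAX flag is tracked. */
struct model_queue {
	bool dax;
};

/*
 * Hypothetical model of the post-patch reload path: the queue mode must
 * match exactly, and the DAX flag is recomputed from the new table.
 * Returns 0 on success, -1 where the kernel code would return -EINVAL.
 */
static int new_table_load(struct model_queue *q,
			  enum model_queue_mode cur,
			  enum model_queue_mode next,
			  bool table_supports_dax)
{
	if (cur != next)
		return -1;
	q->dax = table_supports_dax;
	return 0;
}

/* Exercises both rules; returns 0 if every expectation holds. */
int model_check(void)
{
	struct model_queue q = { .dax = true };

	/* Old rule: the upgrade is allowed, the downgrade is rejected. */
	assert(old_is_valid_type(MODEL_BIO_BASED, MODEL_DAX_BIO_BASED));
	assert(!old_is_valid_type(MODEL_DAX_BIO_BASED, MODEL_BIO_BASED));

	/* New rule: a same-mode reload succeeds and may clear the DAX flag. */
	assert(new_table_load(&q, MODEL_BIO_BASED, MODEL_BIO_BASED, false) == 0);
	assert(!q.dax);
	return 0;
}
```

Calling model_check() from a small main() is enough to run it. The last two
assertions illustrate the concern raised above: under the modelled new rule,
a device that had DAX enabled can lose it on reload without the load being
rejected.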