2017-12-11 16:22:16

by Scott Bauer

Subject: [PATCH v2 0/2] dm-unstripe




V1->v2 Changes:
1) Fixed up some spelling errors in documentation.
2) Cleaned up some variable names to something more appropriate.



2017-12-11 16:22:19

by Scott Bauer

Subject: [PATCH v2 1/2] dm-unstripe: unstripe of IO across RAID 0

This device mapper module remaps and unstripes IO so it lands
solely on a single drive in a RAID 0. In a 4 drive RAID 0 the
mapper exposes 1/4th of the LBA range as a virtual drive.
Each IO to that virtual drive will land on only one of the 4
drives, selected by the user.

As an example:

Intel NVMe drives contain two cores on the physical device.
Each core of the drive has segregated access to its LBA range.
The current LBA model has a RAID 0 128k stripe across the two cores:

Core 0: Core 1:
__________ __________
| LBA 511| | LBA 768|
| LBA 0 | | LBA 256|

The purpose of this unstriping is to provide better QoS in noisy
neighbor environments. When two partitions are created on the
aggregate drive without this unstriping, reads on one partition
can affect writes on another partition. With the unstriping, concurrent
reads and writes on opposite cores have lower completion times
and better tail latencies.

Signed-off-by: Scott Bauer <[email protected]>
---
drivers/md/Kconfig | 10 +++
drivers/md/Makefile | 1 +
drivers/md/dm-unstripe.c | 197 +++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 208 insertions(+)
create mode 100644 drivers/md/dm-unstripe.c

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 83b9362be09c..948874fcc67c 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -269,6 +269,16 @@ config DM_BIO_PRISON

source "drivers/md/persistent-data/Kconfig"

+config DM_UN_STRIPE
+ tristate "Transpose IO to individual drives on a raid device"
+ depends on BLK_DEV_DM
+ ---help---
+ Enable this feature if you wish to unstripe I/O on a RAID 0
+ device to the respective drive. If your hardware has physical
+ RAID 0 this module can unstripe the I/O to respective sides.
+
+ If unsure say N.
+
config DM_CRYPT
tristate "Crypt target support"
depends on BLK_DEV_DM
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index f701bb211783..2cc380b71319 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_BCACHE) += bcache/
obj-$(CONFIG_BLK_DEV_MD) += md-mod.o
obj-$(CONFIG_BLK_DEV_DM) += dm-mod.o
obj-$(CONFIG_BLK_DEV_DM_BUILTIN) += dm-builtin.o
+obj-$(CONFIG_DM_UN_STRIPE) += dm-unstripe.o
obj-$(CONFIG_DM_BUFIO) += dm-bufio.o
obj-$(CONFIG_DM_BIO_PRISON) += dm-bio-prison.o
obj-$(CONFIG_DM_CRYPT) += dm-crypt.o
diff --git a/drivers/md/dm-unstripe.c b/drivers/md/dm-unstripe.c
new file mode 100644
index 000000000000..cca91108688f
--- /dev/null
+++ b/drivers/md/dm-unstripe.c
@@ -0,0 +1,197 @@
+/*
+ * Copyright © 2017 Intel Corporation
+ *
+ * Authors:
+ * Scott Bauer <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+
+#include "dm.h"
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/blkdev.h>
+#include <linux/bio.h>
+#include <linux/slab.h>
+#include <linux/bitops.h>
+#include <linux/device-mapper.h>
+
+
+struct unstripe {
+ struct dm_dev *ddisk;
+ unsigned int max_hw_sectors;
+ unsigned int chunk_sector;
+ u64 stripe_shift;
+ u8 cur_stripe;
+};
+
+
+#define DM_MSG_PREFIX "dm-unstripe"
+static const char *parse_err = "Please provide the necessary information:"
+ "<drive> <set (0 indexed)> <total_sets>"
+ " <stripe size in 512B sectors || 0 to use max hw sector size>";
+
+/*
+ * Argument layout:
+ * <drive> <set> <total_sets> <stripe size in 512B sectors, or 0 for max hw sectors>
+ */
+static int set_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+{
+ struct block_device *bbdev;
+ struct unstripe *target;
+ unsigned int stripe_size;
+ u64 tot_sec, mod;
+ u8 set, num_sets;
+ char dummy;
+ int ret;
+
+ if (argc != 4) {
+ DMERR("%s", parse_err);
+ return -EINVAL;
+ }
+
+ if (sscanf(argv[1], "%hhu%c", &set, &dummy) != 1 ||
+ sscanf(argv[2], "%hhu%c", &num_sets, &dummy) != 1 ||
+ sscanf(argv[3], "%u%c", &stripe_size, &dummy) != 1) {
+ DMERR("%s", parse_err);
+ return -EINVAL;
+ }
+
+ if (num_sets == 0 || set >= num_sets) {
+ DMERR("Please provide a set between [0,%hhu)", num_sets);
+ return -EINVAL;
+ }
+
+ target = kzalloc(sizeof(*target), GFP_KERNEL);
+
+ if (!target) {
+ DMERR("Failed to allocate space for DM unstripe!");
+ return -ENOMEM;
+ }
+
+ ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
+ &target->ddisk);
+ if (ret) {
+ kfree(target);
+ DMERR("dm-unstripe dev lookup failure! for drive %s", argv[0]);
+ return ret;
+ }
+
+ bbdev = target->ddisk->bdev;
+
+ target->cur_stripe = set;
+ if (stripe_size)
+ target->max_hw_sectors = stripe_size;
+ else
+ target->max_hw_sectors =
+ queue_max_hw_sectors(bdev_get_queue(bbdev));
+
+ target->chunk_sector = (num_sets - 1) * target->max_hw_sectors;
+ target->stripe_shift = fls(target->max_hw_sectors) - 1;
+
+ dm_set_target_max_io_len(ti, target->max_hw_sectors);
+ ti->private = target;
+
+ tot_sec = i_size_read(bbdev->bd_inode) >> 9;
+ mod = tot_sec % target->max_hw_sectors;
+
+ if (ti->len == 1)
+ ti->len = (tot_sec / num_sets) - mod;
+ ti->begin = 0;
+ return 0;
+}
+
+static void set_dtr(struct dm_target *ti)
+{
+ struct unstripe *target = ti->private;
+
+ dm_put_device(ti, target->ddisk);
+ kfree(target);
+}
+
+
+static sector_t map_to_core(struct dm_target *ti, struct bio *bio)
+{
+ struct unstripe *target = ti->private;
+ unsigned long long sec = bio->bi_iter.bi_sector;
+ unsigned long long group;
+
+ group = (sec >> target->stripe_shift);
+ /* Account for what drive we're operating on */
+ sec += (target->cur_stripe * target->max_hw_sectors);
+ /* Shift us up to the right "row" on the drive*/
+ sec += target->chunk_sector * group;
+ return sec;
+}
+
+static int set_map_bio(struct dm_target *ti, struct bio *bio)
+{
+ struct unstripe *target = ti->private;
+
+ if (bio_sectors(bio))
+ bio->bi_iter.bi_sector = map_to_core(ti, bio);
+
+ bio_set_dev(bio, target->ddisk->bdev);
+ submit_bio(bio);
+ return DM_MAPIO_SUBMITTED;
+}
+
+static void set_iohints(struct dm_target *ti,
+ struct queue_limits *limits)
+{
+ struct unstripe *target = ti->private;
+ struct queue_limits *lim = &bdev_get_queue(target->ddisk->bdev)->limits;
+
+ blk_limits_io_min(limits, lim->io_min);
+ blk_limits_io_opt(limits, lim->io_opt);
+ limits->chunk_sectors = target->max_hw_sectors;
+}
+
+static int set_iterate(struct dm_target *ti, iterate_devices_callout_fn fn,
+ void *data)
+{
+ struct unstripe *target = ti->private;
+
+ return fn(ti, target->ddisk, 0, ti->len, data);
+}
+
+static struct target_type iset_target = {
+ .name = "dm-unstripe",
+ .version = {1, 0, 0},
+ .module = THIS_MODULE,
+ .ctr = set_ctr,
+ .dtr = set_dtr,
+ .map = set_map_bio,
+ .iterate_devices = set_iterate,
+ .io_hints = set_iohints,
+};
+
+static int __init dm_unstripe_init(void)
+{
+ int r = dm_register_target(&iset_target);
+
+ if (r < 0)
+ DMERR("register failed %d", r);
+
+ return r;
+}
+
+static void __exit dm_unstripe_exit(void)
+{
+ dm_unregister_target(&iset_target);
+}
+
+module_init(dm_unstripe_init);
+module_exit(dm_unstripe_exit);
+
+MODULE_DESCRIPTION(DM_NAME " DM unstripe");
+MODULE_ALIAS("dm-unstripe");
+MODULE_AUTHOR("Scott Bauer <[email protected]>");
+MODULE_LICENSE("GPL");
--
2.11.0

2017-12-11 16:22:53

by Scott Bauer

Subject: [PATCH v2 2/2] dm unstripe: Add documentation for unstripe target

Signed-off-by: Scott Bauer <[email protected]>
---
Documentation/device-mapper/dm-unstripe.txt | 82 +++++++++++++++++++++++++++++
1 file changed, 82 insertions(+)
create mode 100644 Documentation/device-mapper/dm-unstripe.txt

diff --git a/Documentation/device-mapper/dm-unstripe.txt b/Documentation/device-mapper/dm-unstripe.txt
new file mode 100644
index 000000000000..4e1a0a39a689
--- /dev/null
+++ b/Documentation/device-mapper/dm-unstripe.txt
@@ -0,0 +1,82 @@
+Device-Mapper Unstripe
+=====================
+
+The device-mapper Unstripe (dm-unstripe) target provides a transparent
+mechanism to unstripe a RAID 0 striping to access segregated disks.
+
+This module should be used by users who understand what the underlying
+disks look like behind the software/hardware RAID.
+
+Parameters:
+<drive (ex: /dev/nvme0n1)> <drive #> <# of drives> <stripe sectors>
+
+
+<drive>
+ The block device you wish to unstripe.
+
+<drive #>
+ The physical drive you wish to expose via this "virtual" device
+ mapper target. This must be 0 indexed.
+
+<# of drives>
+ The number of drives in the RAID 0.
+
+<stripe sectors>
+ The number of 512B sectors in the raid striping, or zero, if you
+ wish to use max_hw_sector_size.
+
+
+Why use this module?
+=====================
+
+As a use case:
+
+
+ As an example:
+
+ Intel NVMe drives contain two cores on the physical device.
+ Each core of the drive has segregated access to its LBA range.
+ The current LBA model has a RAID 0 128k stripe across the two cores:
+
+ Core 0: Core 1:
+ __________ __________
+ | LBA 511| | LBA 768|
+ | LBA 0 | | LBA 256|
+ ⎻⎻⎻⎻⎻⎻⎻⎻⎻⎻ ⎻⎻⎻⎻⎻⎻⎻⎻⎻⎻
+
+ The purpose of this unstriping is to provide better QoS in noisy
+ neighbor environments. When two partitions are created on the
+ aggregate drive without this unstriping, reads on one partition
+ can affect writes on another partition. With the unstriping, concurrent
+ reads and writes on opposite cores have lower completion times
+ and better tail latencies.
+
+ With the module we were able to segregate a fio script that has read and
+ write jobs that are independent of each other. Compared to when we run
+ the test on a combined drive with partitions, we were able to get a 92%
+ reduction in five-nines (99.999th percentile) read latency using this device mapper target.
+
+
+ One could use the module to Logical de-pop a HDD if you have sufficient
+ geometry information regarding the drive.
+
+
+Example scripts:
+====================
+
+dmsetup create nvmset1 --table '0 1 dm-unstripe /dev/nvme0n1 1 2 0'
+dmsetup create nvmset0 --table '0 1 dm-unstripe /dev/nvme0n1 0 2 0'
+
+There will now be two mappers:
+/dev/mapper/nvmset1
+/dev/mapper/nvmset0
+
+that will expose core 0 and core 1.
+
+
+In a Raid 0 with 4 drives of stripe size 128K:
+dmsetup create raid_disk0 --table '0 1 dm-unstripe /dev/nvme0n1 0 4 256'
+dmsetup create raid_disk1 --table '0 1 dm-unstripe /dev/nvme0n1 1 4 256'
+dmsetup create raid_disk2 --table '0 1 dm-unstripe /dev/nvme0n1 2 4 256'
+dmsetup create raid_disk3 --table '0 1 dm-unstripe /dev/nvme0n1 3 4 256'
+
--
2.11.0

2017-12-11 23:17:43

by Keith Busch

Subject: Re: [PATCH v2 2/2] dm unstripe: Add documentation for unstripe target

On Mon, Dec 11, 2017 at 09:00:19AM -0700, Scott Bauer wrote:
> +Example scripts:
> +====================
> +
> +dmsetup create nvmset1 --table '0 1 dm-unstripe /dev/nvme0n1 1 2 0'
> +dmsetup create nvmset0 --table '0 1 dm-unstripe /dev/nvme0n1 0 2 0'
> +
> +There will now be two mappers:
> +/dev/mapper/nvmset1
> +/dev/mapper/nvmset0
> +
> +that will expose core 0 and core 1.
> +
> +
> +In a Raid 0 with 4 drives of stripe size 128K:
> +dmsetup create raid_disk0 --table '0 1 dm-unstripe /dev/nvme0n1 0 4 256'
> +dmsetup create raid_disk1 --table '0 1 dm-unstripe /dev/nvme0n1 1 4 256'
> +dmsetup create raid_disk2 --table '0 1 dm-unstripe /dev/nvme0n1 2 4 256'
> +dmsetup create raid_disk3 --table '0 1 dm-unstripe /dev/nvme0n1 3 4 256'

While this device mapper is intended for H/W RAID where the member disks
are hidden, we can test it using DM software striping so we don't need
any particular hardware.

Here's a little test script I wrote for that. It sets up a striped
device backed by files, unstripes it into different sets, then compares
each to its original backing file after writing random data to it,
cleaning up the test artifacts before exiting. The parameters at the
top can be modified to test different striping scenarios.

---
#!/bin/bash

MEMBER_SIZE=$((128 * 1024 * 1024))
NUM=4
SEQ_END=$((${NUM}-1))
CHUNK=256
BS=4096

RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512))
DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}"
COUNT=$((${MEMBER_SIZE} / ${BS}))

for i in $(seq 0 ${SEQ_END}); do
dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct
losetup /dev/loop${i} member-${i}
DM_PARMS+=" /dev/loop${i} 0"
done

echo $DM_PARMS | dmsetup create raid0
for i in $(seq 0 ${SEQ_END}); do
echo "0 1 dm-unstripe /dev/mapper/raid0 ${i} ${NUM} ${CHUNK}" | dmsetup create set-${i}
done;

for i in $(seq 0 ${SEQ_END}); do
dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} oflag=direct
diff /dev/mapper/set-${i} member-${i}
done;

for i in $(seq 0 ${SEQ_END}); do
dmsetup remove set-${i}
done
dmsetup remove raid0

for i in $(seq 0 ${SEQ_END}); do
losetup -d /dev/loop${i}
rm -f member-${i}
done
--

2017-12-11 23:22:15

by Keith Busch

Subject: Re: [PATCH v2 1/2] dm-unstripe: unstripe of IO across RAID 0

On Mon, Dec 11, 2017 at 09:00:18AM -0700, Scott Bauer wrote:
> +
> + dm_set_target_max_io_len(ti, target->max_hw_sectors);

The return value of this function is marked "__must_check", so ignoring
it currently triggers a compiler warning.
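
As a rough user-space illustration of the pattern being asked for (not the kernel code itself): `__must_check` corresponds to the compiler's `warn_unused_result` attribute, and the warning goes away once the caller tests the return value. Both `set_max_io_len_model` and `ctr_model` below are hypothetical stand-ins invented for this sketch, and the zero-length failure case is an assumption made only so there is an error to propagate.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a function marked __must_check.
 * The failure condition (len == 0) is an assumption for this sketch.
 */
__attribute__((warn_unused_result))
int set_max_io_len_model(unsigned int *max_io_len, unsigned int len)
{
	if (len == 0)
		return -EINVAL;
	*max_io_len = len;
	return 0;
}

/* Constructor-style caller that propagates the error instead of dropping it. */
int ctr_model(unsigned int *max_io_len, unsigned int chunk_sectors)
{
	int ret = set_max_io_len_model(max_io_len, chunk_sectors);

	if (ret)
		return ret;	/* a real ctr would also release what it acquired */
	return 0;
}
```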

Otherwise, this looks like it's doing what you want, and tests
successfully on my synthetic workloads.

Acked-by: Keith Busch <[email protected]>

2017-12-12 11:35:26

by Nikolay Borisov

Subject: Re: [PATCH v2 2/2] dm unstripe: Add documentation for unstripe target



On 11.12.2017 18:00, Scott Bauer wrote:
> Signed-off-by: Scott Bauer <[email protected]>
> ---
> Documentation/device-mapper/dm-unstripe.txt | 82 +++++++++++++++++++++++++++++
> 1 file changed, 82 insertions(+)
> create mode 100644 Documentation/device-mapper/dm-unstripe.txt
>
> diff --git a/Documentation/device-mapper/dm-unstripe.txt b/Documentation/device-mapper/dm-unstripe.txt
> new file mode 100644
> index 000000000000..4e1a0a39a689
> --- /dev/null
> +++ b/Documentation/device-mapper/dm-unstripe.txt
> @@ -0,0 +1,82 @@
> +Device-Mapper Unstripe
> +=====================
> +
> +The device-mapper Unstripe (dm-unstripe) target provides a transparent
> +mechanism to unstripe a RAID 0 striping to access segregated disks.
> +
> +This module should be used by users who understand what the underlying
> +disks look like behind the software/hardware RAID.
> +
> +Parameters:
> +<drive (ex: /dev/nvme0n1)> <drive #> <# of drives> <stripe sectors>
> +
> +
> +<drive>
> + The block device you wish to unstripe.
> +
> +<drive #>
> + The physical drive you wish to expose via this "virtual" device
> + mapper target. This must be 0 indexed.
> +
> +<# of drives>
> + The number of drives in the RAID 0.
> +
> +<stripe sectors>
> + The amount of 512B sectors in the raid striping, or zero, if you
> + wish you use max_hw_sector_size.
> +
> +
> +Why use this module?
> +=====================
> +
> +As a use case:
> +
> +
> + As an example:
> +
> + Intel NVMe drives contain two cores on the physical device.
> + Each core of the drive has segregated access to its LBA range.
> + The current LBA model has a RAID 0 128k stripe across the two cores:
> +
> + Core 0: Core 1:
> + __________ __________
> + | LBA 511| | LBA 768|
> + | LBA 0 | | LBA 256|
> + ⎻⎻⎻⎻⎻⎻⎻⎻⎻⎻ ⎻⎻⎻⎻⎻⎻⎻⎻⎻⎻

If it's 128k stripe shouldn't it be LBAs 0/256 on core0 and LBAs 128/511
on core1?

> +
> + The purpose of this unstriping is to provide better QoS in noisy
> + neighbor environments. When two partitions are created on the
> + aggregate drive without this unstriping, reads on one partition
> + can affect writes on another partition. With the striping concurrent
> + reads and writes and I/O on opposite cores have lower completion times,
> + and better tail latencies.
> +
> + With the module we were able to segregate a fio script that has read and
> + write jobs that are independent of each other. Compared to when we run
> + the test on a combined drive with partitions, we were able to get a 92%
> + reduction in five-9ths read latency using this device mapper target.
> +
> +
> + One could use the module to Logical de-pop a HDD if you have sufficient
> + geometry information regarding the drive.
> +
> +
> +Example scripts:
> +====================
> +
> +dmsetup create nvmset1 --table '0 1 dm-unstripe /dev/nvme0n1 1 2 0'
> +dmsetup create nvmset0 --table '0 1 dm-unstripe /dev/nvme0n1 0 2 0'
> +
> +There will now be two mappers:
> +/dev/mapper/nvmset1
> +/dev/mapper/nvmset0
> +
> +that will expose core 0 and core 1.
> +
> +
> +In a Raid 0 with 4 drives of stripe size 128K:
> +dmsetup create raid_disk0 --table '0 1 dm-unstripe /dev/nvme0n1 0 4 256'
> +dmsetup create raid_disk1 --table '0 1 dm-unstripe /dev/nvme0n1 1 4 256'
> +dmsetup create raid_disk2 --table '0 1 dm-unstripe /dev/nvme0n1 2 4 256'
> +dmsetup create raid_disk3 --table '0 1 dm-unstripe /dev/nvme0n1 3 4 256'
> +
>

2017-12-12 14:42:19

by Keith Busch

Subject: Re: [PATCH v2 2/2] dm unstripe: Add documentation for unstripe target

On Tue, Dec 12, 2017 at 01:35:13PM +0200, Nikolay Borisov wrote:
> On 11.12.2017 18:00, Scott Bauer wrote:
> > + As an example:
> > +
> > + Intel NVMe drives contain two cores on the physical device.
> > + Each core of the drive has segregated access to its LBA range.
> > + The current LBA model has a RAID 0 128k stripe across the two cores:
> > +
> > + Core 0: Core 1:
> > + __________ __________
> > + | LBA 511| | LBA 768|
> > + | LBA 0 | | LBA 256|
> > + ⎻⎻⎻⎻⎻⎻⎻⎻⎻⎻ ⎻⎻⎻⎻⎻⎻⎻⎻⎻⎻
>
> If it's 128k stripe shouldn't it be LBAs 0/256 on core0 and LBAs 128/511
> on core1?

Ah, this device's makers call the "stripe" size what should be called
"chunk". This device has a 128k chunk per core with two cores, so the
full stripe is 256k. The above should have core 0 owning LBA 512 rather
than 511 (assuming 512b LBA format).
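
To make the chunk/stripe distinction concrete, here is a minimal user-space sketch (not part of the patch) of which core owns a given LBA, assuming 512-byte LBAs, a 128k chunk (256 sectors) and chunks laid out round-robin across the cores. The helper name `core_for_lba` is made up for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: return the index of the core
 * (RAID 0 member) that owns a given LBA when chunks of chunk_sectors
 * sectors are laid out round-robin across num_cores members.
 */
unsigned int core_for_lba(uint64_t lba, uint32_t chunk_sectors,
			  unsigned int num_cores)
{
	/* Which chunk the LBA falls in, counted across the whole device. */
	uint64_t chunk = lba / chunk_sectors;

	/* Chunks alternate between members, so the owner is chunk mod N. */
	return (unsigned int)(chunk % num_cores);
}
```

With chunk_sectors = 256 and num_cores = 2, LBAs 0-255 and 512-767 belong to core 0, while LBAs 256-511 and 768-1023 belong to core 1.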

2017-12-12 14:56:20

by Alasdair G Kergon

Subject: Re: [PATCH v2 2/2] dm unstripe: Add documentation for unstripe target

On Tue, Dec 12, 2017 at 07:45:56AM -0700, Keith Busch wrote:
> Ah, this device's makers call the "stripe" size what should be called
> "chunk".

If this target is to go anywhere, let's try to define it as 'undoing'
the existing dm-stripe target using primary terminology, field names
etc. as close as possible to our existing target. We can still mention
any alternative terminology encountered in the documentation. The
first (simplest) documented example could be a dm-stripe target being
'unstriped'.
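
One way to picture 'undoing' a stripe is as a round trip: unstriping a sector and then striping it again should land on the chosen member at the same offset. The sketch below is a user-space model only. `stripe_map` follows the usual round-robin-by-chunk striping arithmetic, which is an assumption about the layout rather than code taken from dm-stripe, and all names in it are illustrative.

```c
#include <stdint.h>

struct stripe_loc {
	unsigned int member;	/* which striped member the sector lands on */
	uint64_t offset;	/* sector offset within that member */
};

/* Forward striping: sector on the striped device -> (member, offset). */
struct stripe_loc stripe_map(uint64_t sector, uint32_t chunk_sectors,
			     unsigned int stripes)
{
	uint64_t chunk = sector / chunk_sectors;
	struct stripe_loc loc;

	loc.member = (unsigned int)(chunk % stripes);
	loc.offset = (chunk / stripes) * chunk_sectors + sector % chunk_sectors;
	return loc;
}

/* Unstripe: sector on one member's virtual device -> sector on the striped device. */
uint64_t unstripe_map(uint64_t sector, unsigned int member,
		      uint32_t chunk_sectors, unsigned int stripes)
{
	uint64_t row = sector / chunk_sectors;

	return (row * stripes + member) * chunk_sectors + sector % chunk_sectors;
}

/* Returns 1 if unstriping and then striping lands back on (member, sector). */
int roundtrip_ok(uint64_t sector, unsigned int member,
		 uint32_t chunk_sectors, unsigned int stripes)
{
	uint64_t striped = unstripe_map(sector, member, chunk_sectors, stripes);
	struct stripe_loc loc = stripe_map(striped, chunk_sectors, stripes);

	return loc.member == member && loc.offset == sector;
}
```

Unlike a shift-based mapping, the division here does not require the chunk size to be a power of two.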

Alasdair

2017-12-12 18:10:18

by Mike Snitzer

Subject: Re: [PATCH v2 2/2] dm unstripe: Add documentation for unstripe target

On Mon, Dec 11 2017 at 11:00am -0500,
Scott Bauer <[email protected]> wrote:

> Signed-off-by: Scott Bauer <[email protected]>
> ---
> Documentation/device-mapper/dm-unstripe.txt | 82 +++++++++++++++++++++++++++++
> 1 file changed, 82 insertions(+)
> create mode 100644 Documentation/device-mapper/dm-unstripe.txt
>
> diff --git a/Documentation/device-mapper/dm-unstripe.txt b/Documentation/device-mapper/dm-unstripe.txt
> new file mode 100644
> index 000000000000..4e1a0a39a689
> --- /dev/null
> +++ b/Documentation/device-mapper/dm-unstripe.txt
> @@ -0,0 +1,82 @@
> +Device-Mapper Unstripe
> +=====================
> +
> +The device-mapper Unstripe (dm-unstripe) target provides a transparent
> +mechanism to unstripe a RAID 0 striping to access segregated disks.
> +
> +This module should be used by users who understand what the underlying
> +disks look like behind the software/hardware RAID.
> +
> +Parameters:
> +<drive (ex: /dev/nvme0n1)> <drive #> <# of drives> <stripe sectors>
> +
> +
> +<drive>
> + The block device you wish to unstripe.
> +
> +<drive #>
> + The physical drive you wish to expose via this "virtual" device
> + mapper target. This must be 0 indexed.
> +
> +<# of drives>
> + The number of drives in the RAID 0.
> +
> +<stripe sectors>
> + The amount of 512B sectors in the raid striping, or zero, if you
> + wish you use max_hw_sector_size.
> +
> +
> +Why use this module?
> +=====================
> +
> +As a use case:
> +
> +
> + As an example:
> +
> + Intel NVMe drives contain two cores on the physical device.
> + Each core of the drive has segregated access to its LBA range.
> + The current LBA model has a RAID 0 128k stripe across the two cores:
> +
> + Core 0: Core 1:
> + __________ __________
> + | LBA 511| | LBA 768|
> + | LBA 0 | | LBA 256|
> + ⎻⎻⎻⎻⎻⎻⎻⎻⎻⎻ ⎻⎻⎻⎻⎻⎻⎻⎻⎻⎻
> +
> + The purpose of this unstriping is to provide better QoS in noisy
> + neighbor environments. When two partitions are created on the
> + aggregate drive without this unstriping, reads on one partition
> + can affect writes on another partition. With the striping concurrent
> + reads and writes and I/O on opposite cores have lower completion times,
> + and better tail latencies.
> +
> + With the module we were able to segregate a fio script that has read and
> + write jobs that are independent of each other. Compared to when we run
> + the test on a combined drive with partitions, we were able to get a 92%
> + reduction in five-9ths read latency using this device mapper target.
> +
> +
> + One could use the module to Logical de-pop a HDD if you have sufficient
> + geometry information regarding the drive.

OK, but I'm left wondering: why doesn't the user avoid striping across
the cores?

Do the Intel NVMe drives not provide the ability to present 1 device per
NVMe core?

This DM target seems like a pretty nasty workaround for what should be
fixed in the NVMe drive's firmware.

Mainly because there is no opportunity to use both striped and unstriped
access to the same NVMe drive. So why impose striped on the user in the
first place?

Mike

2017-12-12 19:24:47

by Scott Bauer

Subject: Re: [PATCH v2 2/2] dm unstripe: Add documentation for unstripe target

On Tue, Dec 12, 2017 at 01:10:13PM -0500, Mike Snitzer wrote:
> On Mon, Dec 11 2017 at 11:00am -0500,
> Scott Bauer <[email protected]> wrote:
>
> OK, but I'm left wondering: why doesn't the user avoid striping across
> the cores?
>
> Do the Intel NVMe drives not provide the ability to present 1 device per
> NVMe core?
>
> This DM target seems like a pretty nasty workaround for what should be
> fixed in the NVMe drive's firmware.
>
> Mainly because there is no opportunity to use both striped and unstriped
> access to the same NVMe drive. So why impose striped on the user in the
> first place?
>
> Mike

Unfortunately, the NVMe drives do not currently support exposing each core
as separate drives or namespaces. While it would be preferable if the
controllers did expose such features, firmware development informed us there
are sufficient reasons why it isn't possible for the existing generation of
drives.

The NVMe working group just finalized a standard that presents isolated storage
sets for users who wish to use them. Because the standard was finalized only
recently, that use case was defined well after the targeted drives were created. The Intel
drives just so happen to have independent back-ends that align with the isolated
storage sets use case. We would like to provide a way to exploit the independent
back-ends to expose isolated storage in a way that is generic across all the Intel
NVMe drive families. The implementation is generic enough that it can be applied
to any storage that has physical separation at known block boundaries.
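
For readers who want to check the block-boundary arithmetic by hand, the following is a user-space model of the remapping done in map_to_core() in patch 1/2. It is a sketch under assumptions: the chunk size is taken to be a power of two (the patch derives a shift from it), and the function and parameter names are illustrative rather than the kernel's.

```c
#include <stdint.h>

/*
 * User-space model of the remapping arithmetic in map_to_core().
 * Assumes chunk_sectors is a non-zero power of two.
 */
uint64_t map_to_core_model(uint64_t sec, unsigned int cur_stripe,
			   unsigned int num_sets, uint32_t chunk_sectors)
{
	unsigned int shift = 0;
	uint32_t v = chunk_sectors;
	uint64_t group;

	/* Equivalent of fls(chunk_sectors) - 1: index of the highest set bit. */
	while (v >>= 1)
		shift++;

	/* Which chunk-sized "row" of the virtual device this sector is in. */
	group = sec >> shift;
	/* Skip over the chunks that belong to lower-numbered sets. */
	sec += (uint64_t)cur_stripe * chunk_sectors;
	/* Skip the other sets' chunks in every preceding row. */
	sec += (uint64_t)(num_sets - 1) * chunk_sectors * group;
	return sec;
}
```

With a 256-sector chunk and two sets, sector 256 of set 0 maps to sector 512 of the underlying device, and sector 0 of set 1 maps to sector 256.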