2012-10-02 01:59:18

by Ed L. Cashin

[permalink] [raw]
Subject: [PATCH 1/7] aoe: support more AoE addresses with dynamic block device minor numbers

The ATA over Ethernet protocol uses a major (shelf) and
minor (slot) address to identify a particular storage target.
These changes remove an artificial limitation the aoe driver
imposes on the use of AoE addresses. For example, without these
changes, the slot address has a maximum of 15, but users commonly
use slot numbers much greater than that.

The AoE shelf and slot address space is often used sparsely.
Instead of using a static mapping between AoE addresses and the
block device minor number, the block device minor numbers are now
allocated on demand.

Signed-off-by: Ed Cashin <[email protected]>
---
drivers/block/aoe/aoe.h | 6 ++--
drivers/block/aoe/aoeblk.c | 2 +-
drivers/block/aoe/aoechr.c | 2 +-
drivers/block/aoe/aoecmd.c | 25 ++++---------
drivers/block/aoe/aoedev.c | 86 ++++++++++++++++++++++++++++++--------------
5 files changed, 72 insertions(+), 49 deletions(-)

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index 27d0a21..7b694f7 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -49,6 +49,8 @@ struct aoe_hdr {
__be32 tag;
};

+#define AOE_MAXSHELF (0xffff-1) /* one less than the broadcast shelf address */
+
struct aoe_atahdr {
unsigned char aflags;
unsigned char errfeat;
@@ -211,8 +213,7 @@ void aoe_ktstop(struct ktstate *k);

int aoedev_init(void);
void aoedev_exit(void);
-struct aoedev *aoedev_by_aoeaddr(int maj, int min);
-struct aoedev *aoedev_by_sysminor_m(ulong sysminor);
+struct aoedev *aoedev_by_aoeaddr(ulong maj, int min, int do_alloc);
void aoedev_downdev(struct aoedev *d);
int aoedev_flush(const char __user *str, size_t size);
void aoe_failbuf(struct aoedev *, struct buf *);
@@ -223,4 +224,3 @@ void aoenet_exit(void);
void aoenet_xmit(struct sk_buff_head *);
int is_aoe_netif(struct net_device *ifp);
int set_aoe_iflist(const char __user *str, size_t size);
-
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 83160ab..00dfc50 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -249,7 +249,7 @@ aoeblk_gdalloc(void *vp)
q->queuedata = d;
d->gd = gd;
gd->major = AOE_MAJOR;
- gd->first_minor = d->sysminor * AOE_PARTITIONS;
+ gd->first_minor = d->sysminor;
gd->fops = &aoe_bdops;
gd->private_data = d;
set_capacity(gd, d->ssize);
diff --git a/drivers/block/aoe/aoechr.c b/drivers/block/aoe/aoechr.c
index deb30c1..ed57a89 100644
--- a/drivers/block/aoe/aoechr.c
+++ b/drivers/block/aoe/aoechr.c
@@ -91,7 +91,7 @@ revalidate(const char __user *str, size_t size)
pr_err("aoe: invalid device specification %s\n", buf);
return -EINVAL;
}
- d = aoedev_by_aoeaddr(major, minor);
+ d = aoedev_by_aoeaddr(major, minor, 0);
if (!d)
return -EINVAL;
spin_lock_irqsave(&d->lock, flags);
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 39dacdb..94e810c 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -1149,7 +1149,7 @@ aoecmd_ata_rsp(struct sk_buff *skb)

h = (struct aoe_hdr *) skb->data;
aoemajor = be16_to_cpu(get_unaligned(&h->major));
- d = aoedev_by_aoeaddr(aoemajor, h->minor);
+ d = aoedev_by_aoeaddr(aoemajor, h->minor, 0);
if (d == NULL) {
snprintf(ebuf, sizeof ebuf, "aoecmd_ata_rsp: ata response "
"for unknown device %d.%d\n",
@@ -1330,7 +1330,7 @@ aoecmd_cfg_rsp(struct sk_buff *skb)
struct aoe_hdr *h;
struct aoe_cfghdr *ch;
struct aoetgt *t;
- ulong flags, sysminor, aoemajor;
+ ulong flags, aoemajor;
struct sk_buff *sl;
struct sk_buff_head queue;
u16 n;
@@ -1349,18 +1349,15 @@ aoecmd_cfg_rsp(struct sk_buff *skb)
"Check shelf dip switches.\n");
return;
}
- if (h->minor >= NPERSHELF) {
- pr_err("aoe: e%ld.%d %s, %d\n",
- aoemajor, h->minor,
- "slot number larger than the maximum",
- NPERSHELF-1);
+ if (aoemajor > AOE_MAXSHELF) {
+ pr_info("aoe: e%ld.%d: shelf number too large\n",
+ aoemajor, (int) h->minor);
return;
}

- sysminor = SYSMINOR(aoemajor, h->minor);
- if (sysminor * AOE_PARTITIONS + AOE_PARTITIONS > MINORMASK) {
- printk(KERN_INFO "aoe: e%ld.%d: minor number too large\n",
- aoemajor, (int) h->minor);
+ d = aoedev_by_aoeaddr(aoemajor, h->minor, 1);
+ if (d == NULL) {
+ pr_info("aoe: device allocation failure\n");
return;
}

@@ -1368,12 +1365,6 @@ aoecmd_cfg_rsp(struct sk_buff *skb)
if (n > aoe_maxout) /* keep it reasonable */
n = aoe_maxout;

- d = aoedev_by_sysminor_m(sysminor);
- if (d == NULL) {
- printk(KERN_INFO "aoe: device sysminor_m failure\n");
- return;
- }
-
spin_lock_irqsave(&d->lock, flags);

t = gettgt(d, h->src);
diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index ccaecff..68a7a5a 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -9,6 +9,8 @@
#include <linux/netdevice.h>
#include <linux/delay.h>
#include <linux/slab.h>
+#include <linux/bitmap.h>
+#include <linux/kdev_t.h>
#include "aoe.h"

static void dummy_timer(ulong);
@@ -19,35 +21,63 @@ static void skbpoolfree(struct aoedev *d);
static struct aoedev *devlist;
static DEFINE_SPINLOCK(devlist_lock);

-/*
- * Users who grab a pointer to the device with aoedev_by_aoeaddr or
- * aoedev_by_sysminor_m automatically get a reference count and must
- * be responsible for performing a aoedev_put. With the addition of
- * async kthread processing I'm no longer confident that we can
- * guarantee consistency in the face of device flushes.
- *
- * For the time being, we only bother to add extra references for
- * frames sitting on the iocq. When the kthreads finish processing
- * these frames, they will aoedev_put the device.
+/* Because some systems will have one, many, or no
+ * - partitions,
+ * - slots per shelf,
+ * - or shelves,
+ * we need some flexibility in the way the minor numbers
+ * are allocated. So they are dynamic.
*/
-struct aoedev *
-aoedev_by_aoeaddr(int maj, int min)
+#define N_DEVS ((1U<<MINORBITS)/AOE_PARTITIONS)
+
+static DEFINE_SPINLOCK(used_minors_lock);
+static DECLARE_BITMAP(used_minors, N_DEVS);
+
+static int
+minor_get(ulong *minor)
{
- struct aoedev *d;
ulong flags;
+ ulong n;
+ int error = 0;
+
+ spin_lock_irqsave(&used_minors_lock, flags);
+ n = find_first_zero_bit(used_minors, N_DEVS);
+ if (n < N_DEVS)
+ set_bit(n, used_minors);
+ else
+ error = -1;
+ spin_unlock_irqrestore(&used_minors_lock, flags);
+
+ *minor = n * AOE_PARTITIONS;
+ return error;
+}

- spin_lock_irqsave(&devlist_lock, flags);
+static void
+minor_free(ulong minor)
+{
+ ulong flags;

- for (d=devlist; d; d=d->next)
- if (d->aoemajor == maj && d->aoeminor == min) {
- d->ref++;
- break;
- }
+ minor /= AOE_PARTITIONS;
+ BUG_ON(minor >= N_DEVS);

- spin_unlock_irqrestore(&devlist_lock, flags);
- return d;
+ spin_lock_irqsave(&used_minors_lock, flags);
+ BUG_ON(!test_bit(minor, used_minors));
+ clear_bit(minor, used_minors);
+ spin_unlock_irqrestore(&used_minors_lock, flags);
}

+/*
+ * Users who grab a pointer to the device with aoedev_by_aoeaddr
+ * automatically get a reference count and must be responsible
+ * for performing a aoedev_put. With the addition of async
+ * kthread processing I'm no longer confident that we can
+ * guarantee consistency in the face of device flushes.
+ *
+ * For the time being, we only bother to add extra references for
+ * frames sitting on the iocq. When the kthreads finish processing
+ * these frames, they will aoedev_put the device.
+ */
+
void
aoedev_put(struct aoedev *d)
{
@@ -159,6 +189,7 @@ aoedev_freedev(struct aoedev *d)
if (d->bufpool)
mempool_destroy(d->bufpool);
skbpoolfree(d);
+ minor_free(d->sysminor);
kfree(d);
}

@@ -246,22 +277,23 @@ skbpoolfree(struct aoedev *d)
__skb_queue_head_init(&d->skbpool);
}

-/* find it or malloc it */
+/* find it or allocate it */
struct aoedev *
-aoedev_by_sysminor_m(ulong sysminor)
+aoedev_by_aoeaddr(ulong maj, int min, int do_alloc)
{
struct aoedev *d;
int i;
ulong flags;
+ ulong sysminor;

spin_lock_irqsave(&devlist_lock, flags);

for (d=devlist; d; d=d->next)
- if (d->sysminor == sysminor) {
+ if (d->aoemajor == maj && d->aoeminor == min) {
d->ref++;
break;
}
- if (d)
+ if (d || !do_alloc || minor_get(&sysminor) < 0)
goto out;
d = kcalloc(1, sizeof *d, GFP_ATOMIC);
if (!d)
@@ -280,8 +312,8 @@ aoedev_by_sysminor_m(ulong sysminor)
for (i = 0; i < NFACTIVE; i++)
INIT_LIST_HEAD(&d->factive[i]);
d->sysminor = sysminor;
- d->aoemajor = AOEMAJOR(sysminor);
- d->aoeminor = AOEMINOR(sysminor);
+ d->aoemajor = maj;
+ d->aoeminor = min;
d->mintimer = MINTIMER;
d->next = devlist;
devlist = d;
--
1.7.1


2012-10-02 02:01:21

by Ed L. Cashin

[permalink] [raw]
Subject: [PATCH 2/7] aoe: retain static block device numbers for backwards compatibility

The old mapping between AoE target shelf and slot addresses and
the block device minor number is retained as a
backwards-compatible feature, with a new "aoe_dyndevs" module
parameter available for enabling dynamic block device minor
numbers.

Signed-off-by: Ed Cashin <[email protected]>
---
drivers/block/aoe/aoedev.c | 54 +++++++++++++++++++++++++++++++++++++++++--
1 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index 68a7a5a..3d494fd 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -11,6 +11,7 @@
#include <linux/slab.h>
#include <linux/bitmap.h>
#include <linux/kdev_t.h>
+#include <linux/moduleparam.h>
#include "aoe.h"

static void dummy_timer(ulong);
@@ -18,6 +19,10 @@ static void aoedev_freedev(struct aoedev *);
static void freetgt(struct aoedev *d, struct aoetgt *t);
static void skbpoolfree(struct aoedev *d);

+static int aoe_dyndevs;
+module_param(aoe_dyndevs, int, 0644);
+MODULE_PARM_DESC(aoe_dyndevs, "Use dynamic minor numbers for devices.");
+
static struct aoedev *devlist;
static DEFINE_SPINLOCK(devlist_lock);

@@ -34,7 +39,7 @@ static DEFINE_SPINLOCK(used_minors_lock);
static DECLARE_BITMAP(used_minors, N_DEVS);

static int
-minor_get(ulong *minor)
+minor_get_dyn(ulong *sysminor)
{
ulong flags;
ulong n;
@@ -48,10 +53,53 @@ minor_get(ulong *minor)
error = -1;
spin_unlock_irqrestore(&used_minors_lock, flags);

- *minor = n * AOE_PARTITIONS;
+ *sysminor = n * AOE_PARTITIONS;
return error;
}

+static int
+minor_get_static(ulong *sysminor, ulong aoemaj, int aoemin)
+{
+ ulong flags;
+ ulong n;
+ int error = 0;
+ enum {
+ /* for backwards compatibility when !aoe_dyndevs,
+ * a static number of supported slots per shelf */
+ NPERSHELF = 16,
+ };
+
+ n = aoemaj * NPERSHELF + aoemin;
+ if (aoemin >= NPERSHELF || n >= N_DEVS) {
+ pr_err("aoe: %s with e%ld.%d\n",
+ "cannot use static minor device numbers",
+ aoemaj, aoemin);
+ error = -1;
+ } else {
+ spin_lock_irqsave(&used_minors_lock, flags);
+ if (test_bit(n, used_minors)) {
+ pr_err("aoe: %s %lu\n",
+ "existing device already has static minor number",
+ n);
+ error = -1;
+ } else
+ set_bit(n, used_minors);
+ spin_unlock_irqrestore(&used_minors_lock, flags);
+ }
+
+ *sysminor = n;
+ return error;
+}
+
+static int
+minor_get(ulong *sysminor, ulong aoemaj, int aoemin)
+{
+ if (aoe_dyndevs)
+ return minor_get_dyn(sysminor);
+ else
+ return minor_get_static(sysminor, aoemaj, aoemin);
+}
+
static void
minor_free(ulong minor)
{
@@ -293,7 +341,7 @@ aoedev_by_aoeaddr(ulong maj, int min, int do_alloc)
d->ref++;
break;
}
- if (d || !do_alloc || minor_get(&sysminor) < 0)
+ if (d || !do_alloc || minor_get(&sysminor, maj, min) < 0)
goto out;
d = kcalloc(1, sizeof *d, GFP_ATOMIC);
if (!d)
--
1.7.1

2012-10-02 02:03:18

by Ed L. Cashin

[permalink] [raw]
Subject: [PATCH 3/7] aoe: update and specify AoE address guards and error messages

In general, specific is better when it comes to messages about
AoE usage problems. Also, explicit checks for the AoE broadcast
addresses are added.

Signed-off-by: Ed Cashin <[email protected]>
---
drivers/block/aoe/aoe.h | 2 --
drivers/block/aoe/aoecmd.c | 17 +++++++++++------
2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index 7b694f7..4ae2468 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -49,8 +49,6 @@ struct aoe_hdr {
__be32 tag;
};

-#define AOE_MAXSHELF (0xffff-1) /* one less than the broadcast shelf address */
-
struct aoe_atahdr {
unsigned char aflags;
unsigned char errfeat;
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 94e810c..3804a0a 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -1349,15 +1349,14 @@ aoecmd_cfg_rsp(struct sk_buff *skb)
"Check shelf dip switches.\n");
return;
}
- if (aoemajor > AOE_MAXSHELF) {
- pr_info("aoe: e%ld.%d: shelf number too large\n",
+ if (aoemajor == 0xffff) {
+ pr_info("aoe: e%ld.%d: broadcast shelf number invalid\n",
aoemajor, (int) h->minor);
return;
}
-
- d = aoedev_by_aoeaddr(aoemajor, h->minor, 1);
- if (d == NULL) {
- pr_info("aoe: device allocation failure\n");
+ if (h->minor == 0xff) {
+ pr_info("aoe: e%ld.%d: broadcast slot number invalid\n",
+ aoemajor, (int) h->minor);
return;
}

@@ -1365,6 +1364,12 @@ aoecmd_cfg_rsp(struct sk_buff *skb)
if (n > aoe_maxout) /* keep it reasonable */
n = aoe_maxout;

+ d = aoedev_by_aoeaddr(aoemajor, h->minor, 1);
+ if (d == NULL) {
+ pr_info("aoe: device allocation failure\n");
+ return;
+ }
+
spin_lock_irqsave(&d->lock, flags);

t = gettgt(d, h->src);
--
1.7.1

2012-10-02 02:05:19

by Ed L. Cashin

[permalink] [raw]
Subject: [PATCH 4/7] aoe: make dynamic block minor numbers the default

Because udev use is so widespread, making the old static mapping
the default is too conservative, given the severe limitations it
places on usable AoE addresses. Storage virtualization and
larger shelves have made the old limitations too confining.

These changes make the dynamic block device minor numbers the
default, removing the limitations on usable AoE addresses.

The static arrangement is still available with aoe_dyndevs=0, and
the aoe-stat tool from the userland aoetools package, the user
space counterpart to the aoe driver, recognizes the case where
there is a mismatch between the minor number in sysfs and the
minor number in a special device file.

Signed-off-by: Ed Cashin <[email protected]>
---
drivers/block/aoe/aoedev.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index 3d494fd..90e5b53 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -19,7 +19,7 @@ static void aoedev_freedev(struct aoedev *);
static void freetgt(struct aoedev *d, struct aoetgt *t);
static void skbpoolfree(struct aoedev *d);

-static int aoe_dyndevs;
+static int aoe_dyndevs = 1;
module_param(aoe_dyndevs, int, 0644);
MODULE_PARM_DESC(aoe_dyndevs, "Use dynamic minor numbers for devices.");

--
1.7.1

2012-10-02 02:07:21

by Ed L. Cashin

[permalink] [raw]
Subject: [PATCH 5/7] aoe: remove unused code

Signed-off-by: Ed Cashin <[email protected]>
---
drivers/block/aoe/aoe.h | 4 ----
1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index 4ae2468..c2bf797 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -10,9 +10,6 @@
#define AOE_PARTITIONS (16)
#endif

-#define SYSMINOR(aoemajor, aoeminor) ((aoemajor) * NPERSHELF + (aoeminor))
-#define AOEMAJOR(sysminor) ((sysminor) / NPERSHELF)
-#define AOEMINOR(sysminor) ((sysminor) % NPERSHELF)
#define WHITESPACE " \t\v\f\n"

enum {
@@ -82,7 +79,6 @@ enum {

enum {
DEFAULTBCNT = 2 * 512, /* 2 sectors */
- NPERSHELF = 16, /* number of slots per shelf address */
MIN_BUFS = 16,
NTARGETS = 8,
NAOEIFS = 8,
--
1.7.1

2012-10-02 02:09:21

by Ed L. Cashin

[permalink] [raw]
Subject: [PATCH 6/7] aoe: update documentation to better reflect aoe-plus-udev usage

Signed-off-by: Ed Cashin <[email protected]>
---
Documentation/aoe/aoe.txt | 49 +++++++++++++++++++++++++++--------------
Documentation/aoe/mkdevs.sh | 41 -----------------------------------
Documentation/aoe/mkshelf.sh | 28 ------------------------
Documentation/aoe/status.sh | 3 ++
4 files changed, 35 insertions(+), 86 deletions(-)
delete mode 100644 Documentation/aoe/mkdevs.sh
delete mode 100644 Documentation/aoe/mkshelf.sh

diff --git a/Documentation/aoe/aoe.txt b/Documentation/aoe/aoe.txt
index b3e4756..bfc9cb1 100644
--- a/Documentation/aoe/aoe.txt
+++ b/Documentation/aoe/aoe.txt
@@ -1,3 +1,8 @@
+ATA over Ethernet is a network protocol that provides simple access to
+block storage on the LAN.
+
+ http://support.coraid.com/documents/AoEr11.txt
+
The EtherDrive (R) HOWTO for 2.6 and 3.x kernels is found at ...

http://support.coraid.com/support/linux/EtherDrive-2.6-HOWTO.html
@@ -26,20 +31,12 @@ CREATING DEVICE NODES
There is a udev-install.sh script that shows how to install these
rules on your system.

- If you are not using udev, two scripts are provided in
- Documentation/aoe as examples of static device node creation for
- using the aoe driver.
-
- rm -rf /dev/etherd
- sh Documentation/aoe/mkdevs.sh /dev/etherd
-
- ... or to make just one shelf's worth of block device nodes ...
-
- sh Documentation/aoe/mkshelf.sh /dev/etherd 0
-
There is also an autoload script that shows how to edit
/etc/modprobe.d/aoe.conf to ensure that the aoe module is loaded when
- necessary.
+ necessary. Preloading the aoe module is preferable to autoloading,
+ however, because AoE discovery takes a few seconds. It can be
+ confusing when an AoE device is not present the first time the a
+ command is run but appears a second later.

USING DEVICE NODES

@@ -54,9 +51,9 @@ USING DEVICE NODES
"echo > /dev/etherd/discover" tells the driver to find out what AoE
devices are available.

- These character devices may disappear and be replaced by sysfs
- counterparts. Using the commands in aoetools insulates users from
- these implementation details.
+ In the future these character devices may disappear and be replaced
+ by sysfs counterparts. Using the commands in aoetools insulates
+ users from these implementation details.

The block devices are named like this:

@@ -79,8 +76,8 @@ USING SYSFS
The netif attribute is the network interface on the localhost
through which we are communicating with the remote AoE device.

- There is a script in this directory that formats this information
- in a convenient way. Users with aoetools can use the aoe-stat
+ There is a script in this directory that formats this information in
+ a convenient way. Users with aoetools should use the aoe-stat
command.

root@makki root# sh Documentation/aoe/status.sh
@@ -124,3 +121,21 @@ DRIVER OPTIONS
usage example for the module parameter.

modprobe aoe_iflist="eth1 eth3"
+
+ The aoe_deadsecs module parameter determines the maximum number of
+ seconds that the driver will wait for an AoE device to provide a
+ response to an AoE command. After aoe_deadsecs seconds have
+ elapsed, the AoE device will be marked as "down".
+
+ The aoe_maxout module parameter has a default of 128. This is the
+ maximum number of unresponded packets that will be sent to an AoE
+ target at one time.
+
+ The aoe_dyndevs module parameter defaults to 1, meaning that the
+ driver will assign a block device minor number to a discovered AoE
+ target based on the order of its discovery. With dynamic minor
+ device numbers in use, a greater range of AoE shelf and slot
+ addresses can be supported. Users with udev will never have to
+ think about minor numbers. Using aoe_dyndevs=0 allows device nodes
+ to be pre-created using a static minor-number scheme with the
+ aoe-mkshelf script in the aoetools.
diff --git a/Documentation/aoe/mkdevs.sh b/Documentation/aoe/mkdevs.sh
deleted file mode 100644
index 44c0ab7..0000000
--- a/Documentation/aoe/mkdevs.sh
+++ /dev/null
@@ -1,41 +0,0 @@
-#!/bin/sh
-
-n_shelves=${n_shelves:-10}
-n_partitions=${n_partitions:-16}
-
-if test "$#" != "1"; then
- echo "Usage: sh `basename $0` {dir}" 1>&2
- echo " n_partitions=16 sh `basename $0` {dir}" 1>&2
- exit 1
-fi
-dir=$1
-
-MAJOR=152
-
-echo "Creating AoE devnode files in $dir ..."
-
-set -e
-
-mkdir -p $dir
-
-# (Status info is in sysfs. See status.sh.)
-# rm -f $dir/stat
-# mknod -m 0400 $dir/stat c $MAJOR 1
-rm -f $dir/err
-mknod -m 0400 $dir/err c $MAJOR 2
-rm -f $dir/discover
-mknod -m 0200 $dir/discover c $MAJOR 3
-rm -f $dir/interfaces
-mknod -m 0200 $dir/interfaces c $MAJOR 4
-rm -f $dir/revalidate
-mknod -m 0200 $dir/revalidate c $MAJOR 5
-rm -f $dir/flush
-mknod -m 0200 $dir/flush c $MAJOR 6
-
-export n_partitions
-mkshelf=`echo $0 | sed 's!mkdevs!mkshelf!'`
-i=0
-while test $i -lt $n_shelves; do
- sh -xc "sh $mkshelf $dir $i"
- i=`expr $i + 1`
-done
diff --git a/Documentation/aoe/mkshelf.sh b/Documentation/aoe/mkshelf.sh
deleted file mode 100644
index 3261581..0000000
--- a/Documentation/aoe/mkshelf.sh
+++ /dev/null
@@ -1,28 +0,0 @@
-#! /bin/sh
-
-if test "$#" != "2"; then
- echo "Usage: sh `basename $0` {dir} {shelfaddress}" 1>&2
- echo " n_partitions=16 sh `basename $0` {dir} {shelfaddress}" 1>&2
- exit 1
-fi
-n_partitions=${n_partitions:-16}
-dir=$1
-shelf=$2
-nslots=16
-maxslot=`echo $nslots 1 - p | dc`
-MAJOR=152
-
-set -e
-
-minor=`echo $nslots \* $shelf \* $n_partitions | bc`
-endp=`echo $n_partitions - 1 | bc`
-for slot in `seq 0 $maxslot`; do
- for part in `seq 0 $endp`; do
- name=e$shelf.$slot
- test "$part" != "0" && name=${name}p$part
- rm -f $dir/$name
- mknod -m 0660 $dir/$name b $MAJOR $minor
-
- minor=`expr $minor + 1`
- done
-done
diff --git a/Documentation/aoe/status.sh b/Documentation/aoe/status.sh
index 751f3be..eeec7ba 100644
--- a/Documentation/aoe/status.sh
+++ b/Documentation/aoe/status.sh
@@ -1,5 +1,8 @@
#! /bin/sh
# collate and present sysfs information about AoE storage
+#
+# A more complete version of this script is aoe-stat, in the
+# aoetools.

set -e
format="%8s\t%8s\t%8s\n"
--
1.7.1

2012-10-02 02:11:23

by Ed L. Cashin

[permalink] [raw]
Subject: [PATCH 7/7] aoe: update aoe-internal version number to 50

Signed-off-by: Ed Cashin <[email protected]>
---
drivers/block/aoe/aoe.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index c2bf797..d2ed7f1 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -1,5 +1,5 @@
/* Copyright (c) 2012 Coraid, Inc. See COPYING for GPL terms. */
-#define VERSION "49"
+#define VERSION "50"
#define AOE_MAJOR 152
#define DEVICE_NAME "aoe"

--
1.7.1

2012-10-02 10:30:26

by Ed L. Cashin

[permalink] [raw]
Subject: Re: [PATCH 1/7] aoe: support more AoE addresses with dynamic block device minor numbers

I should have mentioned that these seven patches were made for linux-next/akpm, which contains the last 14-patch series for aoe.

On Oct 1, 2012, at 9:59 PM, "Ed Cashin" <[email protected]> wrote:

> The ATA over Ethernet protocol uses a major (shelf) and
> minor (slot) address to identify a particular storage target.
> These changes remove an artificial limitation the aoe driver
> imposes on the use of AoE addresses. For example, without these
> changes, the slot address has a maximum of 15, but users commonly
> use slot numbers much greater than that.
>
> The AoE shelf and slot address space is often used sparsely.
> Instead of using a static mapping between AoE addresses and the
> block device minor number, the block device minor numbers are now
> allocated on demand.
>
> Signed-off-by: Ed Cashin <[email protected]>
> ---
> drivers/block/aoe/aoe.h | 6 ++--
> drivers/block/aoe/aoeblk.c | 2 +-
> drivers/block/aoe/aoechr.c | 2 +-
> drivers/block/aoe/aoecmd.c | 25 ++++---------
> drivers/block/aoe/aoedev.c | 86 ++++++++++++++++++++++++++++++--------------
> 5 files changed, 72 insertions(+), 49 deletions(-)
>
> diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
> index 27d0a21..7b694f7 100644
> --- a/drivers/block/aoe/aoe.h
> +++ b/drivers/block/aoe/aoe.h
> @@ -49,6 +49,8 @@ struct aoe_hdr {
> __be32 tag;
> };
>
> +#define AOE_MAXSHELF (0xffff-1) /* one less than the broadcast shelf address */
> +
> struct aoe_atahdr {
> unsigned char aflags;
> unsigned char errfeat;
> @@ -211,8 +213,7 @@ void aoe_ktstop(struct ktstate *k);
>
> int aoedev_init(void);
> void aoedev_exit(void);
> -struct aoedev *aoedev_by_aoeaddr(int maj, int min);
> -struct aoedev *aoedev_by_sysminor_m(ulong sysminor);
> +struct aoedev *aoedev_by_aoeaddr(ulong maj, int min, int do_alloc);
> void aoedev_downdev(struct aoedev *d);
> int aoedev_flush(const char __user *str, size_t size);
> void aoe_failbuf(struct aoedev *, struct buf *);
> @@ -223,4 +224,3 @@ void aoenet_exit(void);
> void aoenet_xmit(struct sk_buff_head *);
> int is_aoe_netif(struct net_device *ifp);
> int set_aoe_iflist(const char __user *str, size_t size);
> -
> diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
> index 83160ab..00dfc50 100644
> --- a/drivers/block/aoe/aoeblk.c
> +++ b/drivers/block/aoe/aoeblk.c
> @@ -249,7 +249,7 @@ aoeblk_gdalloc(void *vp)
> q->queuedata = d;
> d->gd = gd;
> gd->major = AOE_MAJOR;
> - gd->first_minor = d->sysminor * AOE_PARTITIONS;
> + gd->first_minor = d->sysminor;
> gd->fops = &aoe_bdops;
> gd->private_data = d;
> set_capacity(gd, d->ssize);
> diff --git a/drivers/block/aoe/aoechr.c b/drivers/block/aoe/aoechr.c
> index deb30c1..ed57a89 100644
> --- a/drivers/block/aoe/aoechr.c
> +++ b/drivers/block/aoe/aoechr.c
> @@ -91,7 +91,7 @@ revalidate(const char __user *str, size_t size)
> pr_err("aoe: invalid device specification %s\n", buf);
> return -EINVAL;
> }
> - d = aoedev_by_aoeaddr(major, minor);
> + d = aoedev_by_aoeaddr(major, minor, 0);
> if (!d)
> return -EINVAL;
> spin_lock_irqsave(&d->lock, flags);
> diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
> index 39dacdb..94e810c 100644
> --- a/drivers/block/aoe/aoecmd.c
> +++ b/drivers/block/aoe/aoecmd.c
> @@ -1149,7 +1149,7 @@ aoecmd_ata_rsp(struct sk_buff *skb)
>
> h = (struct aoe_hdr *) skb->data;
> aoemajor = be16_to_cpu(get_unaligned(&h->major));
> - d = aoedev_by_aoeaddr(aoemajor, h->minor);
> + d = aoedev_by_aoeaddr(aoemajor, h->minor, 0);
> if (d == NULL) {
> snprintf(ebuf, sizeof ebuf, "aoecmd_ata_rsp: ata response "
> "for unknown device %d.%d\n",
> @@ -1330,7 +1330,7 @@ aoecmd_cfg_rsp(struct sk_buff *skb)
> struct aoe_hdr *h;
> struct aoe_cfghdr *ch;
> struct aoetgt *t;
> - ulong flags, sysminor, aoemajor;
> + ulong flags, aoemajor;
> struct sk_buff *sl;
> struct sk_buff_head queue;
> u16 n;
> @@ -1349,18 +1349,15 @@ aoecmd_cfg_rsp(struct sk_buff *skb)
> "Check shelf dip switches.\n");
> return;
> }
> - if (h->minor >= NPERSHELF) {
> - pr_err("aoe: e%ld.%d %s, %d\n",
> - aoemajor, h->minor,
> - "slot number larger than the maximum",
> - NPERSHELF-1);
> + if (aoemajor > AOE_MAXSHELF) {
> + pr_info("aoe: e%ld.%d: shelf number too large\n",
> + aoemajor, (int) h->minor);
> return;
> }
>
> - sysminor = SYSMINOR(aoemajor, h->minor);
> - if (sysminor * AOE_PARTITIONS + AOE_PARTITIONS > MINORMASK) {
> - printk(KERN_INFO "aoe: e%ld.%d: minor number too large\n",
> - aoemajor, (int) h->minor);
> + d = aoedev_by_aoeaddr(aoemajor, h->minor, 1);
> + if (d == NULL) {
> + pr_info("aoe: device allocation failure\n");
> return;
> }
>
> @@ -1368,12 +1365,6 @@ aoecmd_cfg_rsp(struct sk_buff *skb)
> if (n > aoe_maxout) /* keep it reasonable */
> n = aoe_maxout;
>
> - d = aoedev_by_sysminor_m(sysminor);
> - if (d == NULL) {
> - printk(KERN_INFO "aoe: device sysminor_m failure\n");
> - return;
> - }
> -
> spin_lock_irqsave(&d->lock, flags);
>
> t = gettgt(d, h->src);
> diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
> index ccaecff..68a7a5a 100644
> --- a/drivers/block/aoe/aoedev.c
> +++ b/drivers/block/aoe/aoedev.c
> @@ -9,6 +9,8 @@
> #include <linux/netdevice.h>
> #include <linux/delay.h>
> #include <linux/slab.h>
> +#include <linux/bitmap.h>
> +#include <linux/kdev_t.h>
> #include "aoe.h"
>
> static void dummy_timer(ulong);
> @@ -19,35 +21,63 @@ static void skbpoolfree(struct aoedev *d);
> static struct aoedev *devlist;
> static DEFINE_SPINLOCK(devlist_lock);
>
> -/*
> - * Users who grab a pointer to the device with aoedev_by_aoeaddr or
> - * aoedev_by_sysminor_m automatically get a reference count and must
> - * be responsible for performing a aoedev_put. With the addition of
> - * async kthread processing I'm no longer confident that we can
> - * guarantee consistency in the face of device flushes.
> - *
> - * For the time being, we only bother to add extra references for
> - * frames sitting on the iocq. When the kthreads finish processing
> - * these frames, they will aoedev_put the device.
> +/* Because some systems will have one, many, or no
> + * - partitions,
> + * - slots per shelf,
> + * - or shelves,
> + * we need some flexibility in the way the minor numbers
> + * are allocated. So they are dynamic.
> */
> -struct aoedev *
> -aoedev_by_aoeaddr(int maj, int min)
> +#define N_DEVS ((1U<<MINORBITS)/AOE_PARTITIONS)
> +
> +static DEFINE_SPINLOCK(used_minors_lock);
> +static DECLARE_BITMAP(used_minors, N_DEVS);
> +
> +static int
> +minor_get(ulong *minor)
> {
> - struct aoedev *d;
> ulong flags;
> + ulong n;
> + int error = 0;
> +
> + spin_lock_irqsave(&used_minors_lock, flags);
> + n = find_first_zero_bit(used_minors, N_DEVS);
> + if (n < N_DEVS)
> + set_bit(n, used_minors);
> + else
> + error = -1;
> + spin_unlock_irqrestore(&used_minors_lock, flags);
> +
> + *minor = n * AOE_PARTITIONS;
> + return error;
> +}
>
> - spin_lock_irqsave(&devlist_lock, flags);
> +static void
> +minor_free(ulong minor)
> +{
> + ulong flags;
>
> - for (d=devlist; d; d=d->next)
> - if (d->aoemajor == maj && d->aoeminor == min) {
> - d->ref++;
> - break;
> - }
> + minor /= AOE_PARTITIONS;
> + BUG_ON(minor >= N_DEVS);
>
> - spin_unlock_irqrestore(&devlist_lock, flags);
> - return d;
> + spin_lock_irqsave(&used_minors_lock, flags);
> + BUG_ON(!test_bit(minor, used_minors));
> + clear_bit(minor, used_minors);
> + spin_unlock_irqrestore(&used_minors_lock, flags);
> }
>
> +/*
> + * Users who grab a pointer to the device with aoedev_by_aoeaddr
> + * automatically get a reference count and must be responsible
> + * for performing a aoedev_put. With the addition of async
> + * kthread processing I'm no longer confident that we can
> + * guarantee consistency in the face of device flushes.
> + *
> + * For the time being, we only bother to add extra references for
> + * frames sitting on the iocq. When the kthreads finish processing
> + * these frames, they will aoedev_put the device.
> + */
> +
> void
> aoedev_put(struct aoedev *d)
> {
> @@ -159,6 +189,7 @@ aoedev_freedev(struct aoedev *d)
> if (d->bufpool)
> mempool_destroy(d->bufpool);
> skbpoolfree(d);
> + minor_free(d->sysminor);
> kfree(d);
> }
>
> @@ -246,22 +277,23 @@ skbpoolfree(struct aoedev *d)
> __skb_queue_head_init(&d->skbpool);
> }
>
> -/* find it or malloc it */
> +/* find it or allocate it */
> struct aoedev *
> -aoedev_by_sysminor_m(ulong sysminor)
> +aoedev_by_aoeaddr(ulong maj, int min, int do_alloc)
> {
> struct aoedev *d;
> int i;
> ulong flags;
> + ulong sysminor;
>
> spin_lock_irqsave(&devlist_lock, flags);
>
> for (d=devlist; d; d=d->next)
> - if (d->sysminor == sysminor) {
> + if (d->aoemajor == maj && d->aoeminor == min) {
> d->ref++;
> break;
> }
> - if (d)
> + if (d || !do_alloc || minor_get(&sysminor) < 0)
> goto out;
> d = kcalloc(1, sizeof *d, GFP_ATOMIC);
> if (!d)
> @@ -280,8 +312,8 @@ aoedev_by_sysminor_m(ulong sysminor)
> for (i = 0; i < NFACTIVE; i++)
> INIT_LIST_HEAD(&d->factive[i]);
> d->sysminor = sysminor;
> - d->aoemajor = AOEMAJOR(sysminor);
> - d->aoeminor = AOEMINOR(sysminor);
> + d->aoemajor = maj;
> + d->aoeminor = min;
> d->mintimer = MINTIMER;
> d->next = devlist;
> devlist = d;
> --
> 1.7.1
>

2012-10-02 21:12:33

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/7] aoe: support more AoE addresses with dynamic block device minor numbers

On Mon, 1 Oct 2012 18:59:12 -0700
Ed Cashin <[email protected]> wrote:

> The ATA over Ethernet protocol uses a major (shelf) and
> minor (slot) address to identify a particular storage target.
> These changes remove an artificial limitation the aoe driver
> imposes on the use of AoE addresses. For example, without these
> changes, the slot address has a maximum of 15, but users commonly
> use slot numbers much greater than that.
>
> The AoE shelf and slot address space is often used sparsely.
> Instead of using a static mapping between AoE addresses and the
> block device minor number, the block device minor numbers are now
> allocated on demand.
>
> ...

Very minor things...

>
> ...
>
> +static int
> +minor_get(ulong *minor)
> {
> - struct aoedev *d;
> ulong flags;
> + ulong n;
> + int error = 0;
> +
> + spin_lock_irqsave(&used_minors_lock, flags);
> + n = find_first_zero_bit(used_minors, N_DEVS);
> + if (n < N_DEVS)
> + set_bit(n, used_minors);
> + else
> + error = -1;
> + spin_unlock_irqrestore(&used_minors_lock, flags);
> +
> + *minor = n * AOE_PARTITIONS;
> + return error;
> +}

- can use the more efficient __set_bit() inside that spinlock.

- could avoid setting *minor if we're returning an error.

> - spin_lock_irqsave(&devlist_lock, flags);
> +static void
> +minor_free(ulong minor)
> +{
> + ulong flags;
>
> - for (d=devlist; d; d=d->next)
> - if (d->aoemajor == maj && d->aoeminor == min) {
> - d->ref++;
> - break;
> - }
> + minor /= AOE_PARTITIONS;
> + BUG_ON(minor >= N_DEVS);
>
> - spin_unlock_irqrestore(&devlist_lock, flags);
> - return d;
> + spin_lock_irqsave(&used_minors_lock, flags);
> + BUG_ON(!test_bit(minor, used_minors));
> + clear_bit(minor, used_minors);
> + spin_unlock_irqrestore(&used_minors_lock, flags);
> }

Could use

BUG_ON(!__test_and_clear_bit(...));

This will work, but I think it's bad form to put an expression with
side-effects inside an assert macro.

2012-10-02 21:16:28

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2/7] aoe: retain static block device numbers for backwards compatibility

On Mon, 1 Oct 2012 19:01:15 -0700
Ed Cashin <[email protected]> wrote:

> The old mapping between AoE target shelf and slot addresses and
> the block device minor number is retained as a
> backwards-compatible feature, with a new "aoe_dyndevs" module
> parameter available for enabling dynamic block device minor
> numbers.

OK, so the default behaviour is unchanged and users opt into the new
behaviour by providing this module parameter?

Why is this worth doing? What behaviour changes would users see if we
defaulted to dynamic minors?


I'm surprised we don't have some library code somewhere for managing a
map of minor numbers - it's an operation which is common to so many
drivers. It would be bitmap or IDR based, I expect.

2012-10-02 21:17:37

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 4/7] aoe: make dynamic block minor numbers the default

On Mon, 1 Oct 2012 19:05:15 -0700
Ed Cashin <[email protected]> wrote:

> Because udev use is so widespread, making the old static mapping
> the default is too conservative, given the severe limitations it
> places on usable AoE addresses. Storage virtualization and
> larger shelves have made the old limitations too confining.
>
> These changes make the dynamic block device minor numbers the
> default, removing the limitations on usable AoE addresses.
>
> The static arrangement is still available with aoe_dyndevs=0, and
> the aoe-stat tool from the userland aoetools package, the user
> space counterpart to the aoe driver, recognizes the case where
> there is a mismatch between the minor number in sysfs and the
> minor number in a special device file.

Oh. There we are.

Again, what are the back-(in)compatibility issues which users might
encounter with this?

2012-10-02 22:51:00

by Ed L. Cashin

[permalink] [raw]
Subject: Re: [PATCH 4/7] aoe: make dynamic block minor numbers the default


On Oct 2, 2012, at 5:17 PM, Andrew Morton wrote:
...
>> The static arrangement is still available with aoe_dyndevs=0, and
>> the aoe-stat tool from the userland aoetools package, the user
>> space counterpart to the aoe driver, recognizes the case where
>> there is a mismatch between the minor number in sysfs and the
>> minor number in a special device file.
>
> Oh. There we are.
>
> Again, what are the back-(in)compatibility issues which users might
> encounter with this?

I think the people who care about aoe_dyndevs=0 are the people creating
an initramfs without udev and with AoE targets that have low shelf and
slot numbers for addresses.

They might want a tiny bzImage that can boot off AoE storage using
device nodes prepared in advance with mknod. The old static mapping
is a nice fit for that use case.

There could be others with a reason not to use udev, but none of those
others have ever communicated with me, if they exist.

It's probably confusing the way I did the commits, making static the
default before explicitly changing it to dynamic, but usually the people
who actually care about this are so technical that I expect them to read
the git logs, and I wanted to leave a paper trail there for anyone reading
later.

--
Ed Cashin
[email protected]

2012-10-03 00:03:53

by Ed L. Cashin

[permalink] [raw]
Subject: Re: [PATCH 1/7] aoe: support more AoE addresses with dynamic block device minor numbers

On Oct 2, 2012, at 5:12 PM, Andrew Morton wrote:
...
>> +static int
>> +minor_get(ulong *minor)
>> {
>> - struct aoedev *d;
>> ulong flags;
>> + ulong n;
>> + int error = 0;
>> +
>> + spin_lock_irqsave(&used_minors_lock, flags);
>> + n = find_first_zero_bit(used_minors, N_DEVS);
>> + if (n < N_DEVS)
>> + set_bit(n, used_minors);
>> + else
>> + error = -1;
>> + spin_unlock_irqrestore(&used_minors_lock, flags);
>> +
>> + *minor = n * AOE_PARTITIONS;
>> + return error;
>> +}
>
> - can use the more efficient __set_bit() inside that spinlock.

Thanks for that observation. Because this operation occurs on target discovery, which is expected to be relatively infrequent, my inclination is to leave it in its atomic form, though, and leave the __set_bit() for another time when optimization is needed. Like you said, this is a minor point. I wouldn't mind changing it, though, if you think it's worth me resubmitting the patch. Just let me know.

> - could avoid setting *minor if we're returning an error.

Yes. The only caller of aoedev.c:minor_get() handles that correctly. Again, just let me know if you think this is worth a resubmission of the patch. Otherwise I'll just make a note to myself to try to avoid setting output parameters on error in the future.

--
Ed Cashin
[email protected]