Hello, all.
This patchset implements extended devt for block devices. This is
mainly to work around sd limitations (16 minors per device) but can
easily be used to allow more partitions or more devices.
With it turned on, a libata disk w/ 60 partitions looks like the
following.
# cat /proc/partitions
major minor #blocks name
3 0 78184008 hda
3 1 26218048 hda1
3 2 1052257 hda2
8 0 156290904 sda
8 1 8001 sda1
8 2 8032 sda2
8 3 8032 sda3
8 4 1 sda4
...
8 15 8001 sda15
259 0 8001 sda16
259 1 8001 sda17
259 2 8001 sda18
...
259 43 8001 sda59
259 44 8001 sda60
As you can see, partitions beyond the genhd->minors limit get assigned
under major 259, which breaks the assumption of predetermined
contiguous minors. I've tested a number of things on it and everything
seems to work just fine, including mounting as root.
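The numbering in the listing above can be modelled in a few lines of
userland C. This is only a sketch: model_devt() and its parameters are
hypothetical names, the extended index is passed in explicitly instead
of coming from an allocator, and the macros mirror the kernel's 20-bit
minor layout.

```c
#include <assert.h>

#define MINORBITS	20
#define MKDEV(ma, mi)	(((unsigned int)(ma) << MINORBITS) | (unsigned int)(mi))
#define MAJOR(dev)	((unsigned int)(dev) >> MINORBITS)
#define MINOR(dev)	((unsigned int)(dev) & ((1U << MINORBITS) - 1))

#define EXT_BLOCK_MAJOR	259

/*
 * Hypothetical model: partitions below @minors keep the driver's major
 * and a minor next to the whole disk; the rest take @ext_idx under
 * EXT_BLOCK_MAJOR.
 */
static unsigned int model_devt(int major, int first_minor, int minors,
			       int partno, int ext_idx)
{
	assert(minors >= 1 && partno >= 0);
	if (partno < minors)
		return MKDEV(major, first_minor + partno);

	assert(ext_idx >= 0 && ext_idx < (1 << MINORBITS));
	return MKDEV(EXT_BLOCK_MAJOR, ext_idx);
}
```

For sda (major 8, 16 conventional minors) this gives 8:15 for sda15 and,
with extended indices handed out in order, 259:0 for sda16 and 259:44
for sda60.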
A debug option, CONFIG_DEBUG_BLOCK_EXT_DEVT, is also implemented. When
enabled, /proc/partitions looks like the following.
# cat /proc/partitions
major minor #blocks name
3 0 78184008 hda
259 0 26218048 hda1
259 524288 1052257 hda2
8 0 156290904 sda
259 262144 8001 sda1
259 786432 8032 sda2
259 131072 8032 sda3
259 655360 1 sda4
259 393216 8001 sda5
259 917504 8001 sda6
259 65536 8001 sda7
...
259 114688 8001 sda55
259 638976 8001 sda56
259 376832 8001 sda57
259 901120 8001 sda58
259 245760 8001 sda59
259 770048 8001 sda60
The option forces all partitions to be allocated from the extended
region and spreads the minors as far apart as possible to achieve two
goals.
* Detect kernel or userland code which assumes pre-determined
consecutive block devts.
* Prevent such code from accessing the wrong partition and corrupting
it by making devts far apart from each other.
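The spread minors in the debug listing are consistent with reversing
the 20 bits of a sequentially allocated index. A minimal userland
sketch, assuming indices are handed out as 0, 1, 2, ... in the order
hda1, hda2, sda1, ... (spread_minor() is a hypothetical name, not the
kernel function):

```c
#include <assert.h>

#define MINORBITS 20

/* Reverse the low MINORBITS bits of @idx; higher bits must be zero. */
static unsigned int spread_minor(unsigned int idx)
{
	unsigned int out = 0;
	int i;

	assert(idx < (1U << MINORBITS));
	for (i = 0; i < MINORBITS; i++)
		if (idx & (1U << i))
			out |= 1U << (MINORBITS - 1 - i);
	return out;
}
```

Index 1 maps to 524288 (hda2), index 2 to 262144 (sda1) and index 56 to
114688 (sda55), matching the listing. Applying the function twice
returns the original index, so the same helper converts in both
directions.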
I thought about making the spread allocation the default for the
extended area while keeping the conventional minors as they are, but
that seemed like overkill, especially because the only interface which
reveals how many consecutive minors are allocated is the sysfs
attribute "range", which doesn't change after this patchset. A new
attribute "ext_range", which indicates the total limit of minors
including the extended ones, is added instead.
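Userland can combine the two attributes to see how many minors of a
disk can only come from the extended area. The sketch below assumes the
attributes appear as /sys/block/<disk>/range and
/sys/block/<disk>/ext_range; read_block_attr() and ext_only_minors()
are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

/* Read an integer attribute such as "range" or "ext_range"; -1 on error. */
static int read_block_attr(const char *disk, const char *attr)
{
	char path[256];
	FILE *fp;
	int val;

	assert(disk && attr);
	if (snprintf(path, sizeof(path), "/sys/block/%s/%s", disk, attr) >=
	    (int)sizeof(path))
		return -1;

	fp = fopen(path, "r");
	if (!fp)
		return -1;
	if (fscanf(fp, "%d", &val) != 1)
		val = -1;
	fclose(fp);
	return val;
}

/* Minors that can only come from the extended area; -1 if inconsistent. */
static int ext_only_minors(int range, int ext_range)
{
	if (range < 1 || ext_range < range)
		return -1;
	return ext_range - range;
}
```

With sd as changed by this series (range 16, ext_range 64) the
difference is 48. A kernel without the attribute makes
read_block_attr() return -1, which callers can treat as "no extended
minors".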
I chose major 259 on a whim. Is it okay to use this value? As
extended devts don't care what MAJ:MIN is used, we can also create a
pool of MAJ:MINs from which any driver can allocate.
Switching to such a mechanism shouldn't be too difficult (some kobj_map
adjustments will be necessary though), so no need to worry about it now.
This patchset is also available in the following git tree.
http://git.kernel.org/?p=linux/kernel/git/tj/misc.git;a=shortlog;h=block-extended-devt
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git block-extended-devt
Thanks.
block/genhd.c | 268 ++++++++++++++++++++++++++++++++++++++++---------
block/ioctl.c | 6 -
drivers/ide/ide-disk.c | 17 ++-
drivers/scsi/sd.c | 15 ++
fs/block_dev.c | 2
fs/partitions/check.c | 44 ++++++--
include/linux/fs.h | 1
include/linux/genhd.h | 36 ++++++
include/linux/major.h | 2
lib/Kconfig.debug | 16 ++
10 files changed, 340 insertions(+), 67 deletions(-)
--
tejun
This patch makes the following misc updates in preparation for
extended block devt support.
* make add_partition report on failures
* add hd_struct->disk which points to the containing gendisk
* fix comment for gendisk->part
Signed-off-by: Tejun Heo <[email protected]>
---
fs/partitions/check.c | 11 ++++++++++-
include/linux/genhd.h | 3 ++-
2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/fs/partitions/check.c b/fs/partitions/check.c
index 6149e4b..b915ac2 100644
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -346,19 +346,28 @@ static DEVICE_ATTR(whole_disk, S_IRUSR | S_IRGRP | S_IROTH,
void add_partition(struct gendisk *disk, int part, sector_t start, sector_t len, int flags)
{
+ char name[BDEVNAME_SIZE];
struct hd_struct *p;
int err;
+ disk_name(disk, part, name);
+
p = kzalloc(sizeof(*p), GFP_KERNEL);
- if (!p)
+ if (!p) {
+ printk(KERN_WARNING "%s: failed to allocate partition "
+ "structure (part=%d)\n", name, part);
return;
+ }
if (!init_part_stats(p)) {
+ printk(KERN_WARNING "%s: failed to initialize partition stats "
+ "structure (part=%d)\n", name, part);
kfree(p);
return;
}
p->start_sect = start;
p->nr_sects = len;
+ p->disk = disk;
p->partno = part;
p->policy = disk->policy;
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index ae7aec3..1f06681 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -88,6 +88,7 @@ struct hd_struct {
sector_t start_sect;
sector_t nr_sects;
struct device dev;
+ struct gendisk *disk;
struct kobject *holder_dir;
int policy, partno;
#ifdef CONFIG_FAIL_MAKE_REQUEST
@@ -117,7 +118,7 @@ struct gendisk {
int minors; /* maximum number of minors, =1 for
* disks that can't be partitioned. */
char disk_name[32]; /* name of major driver */
- struct hd_struct **part; /* [indexed by minor] */
+ struct hd_struct **part; /* [indexed by minor - 1] */
struct block_device_operations *fops;
struct request_queue *queue;
void *private_data;
--
1.5.4.5
Update sd and ide-disk such that they can take advantage of extended
minors.
ide-disk already has 64 minors per device and currently doesn't use
extended minors, although after this patch they can be turned on by
simply tweaking constants.
sd only had 16 minors per device, causing problems on certain peculiar
configurations. This patch lifts the restriction and enables it to
use up to 64 minors.
Signed-off-by: Tejun Heo <[email protected]>
---
drivers/ide/ide-disk.c | 11 ++++++++---
drivers/scsi/sd.c | 9 +++++++--
2 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
index 8e08d08..f8b091a 100644
--- a/drivers/ide/ide-disk.c
+++ b/drivers/ide/ide-disk.c
@@ -41,6 +41,10 @@
#include <asm/io.h>
#include <asm/div64.h>
+#define IDE_DISK_PARTS (1 << PARTN_BITS)
+#define IDE_DISK_MINORS IDE_DISK_PARTS
+#define IDE_DISK_EXT_MINORS (IDE_DISK_PARTS - IDE_DISK_MINORS)
+
struct ide_disk_obj {
ide_drive_t *drive;
ide_driver_t *driver;
@@ -1158,8 +1162,8 @@ static int ide_disk_probe(ide_drive_t *drive)
if (!idkp)
goto failed;
- g = alloc_disk_node(1 << PARTN_BITS,
- hwif_to_node(drive->hwif));
+ g = alloc_disk_ext_node(IDE_DISK_MINORS, IDE_DISK_EXT_MINORS,
+ hwif_to_node(drive->hwif));
if (!g)
goto out_free_idkp;
@@ -1185,7 +1189,8 @@ static int ide_disk_probe(ide_drive_t *drive)
} else
drive->attach = 1;
- g->minors = 1 << PARTN_BITS;
+ g->minors = IDE_DISK_MINORS;
+ g->ext_minors = IDE_DISK_EXT_MINORS;
g->driverfs_dev = &drive->gendev;
g->flags = drive->removable ? GENHD_FL_REMOVABLE : 0;
set_capacity(g, idedisk_capacity(drive));
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 01cefbb..8879c98 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -86,6 +86,10 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_DISK);
MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
+#define SD_PARTS 64
+#define SD_MINORS 16
+#define SD_EXT_MINORS (SD_PARTS - SD_MINORS)
+
static int sd_revalidate_disk(struct gendisk *);
static int sd_probe(struct device *);
static int sd_remove(struct device *);
@@ -1642,7 +1646,7 @@ static int sd_probe(struct device *dev)
if (!sdkp)
goto out;
- gd = alloc_disk(16);
+ gd = alloc_disk_ext(SD_MINORS, SD_EXT_MINORS);
if (!gd)
goto out_free;
@@ -1684,7 +1688,8 @@ static int sd_probe(struct device *dev)
gd->major = sd_major((index & 0xf0) >> 4);
gd->first_minor = ((index & 0xf) << 4) | (index & 0xfff00);
- gd->minors = 16;
+ gd->minors = SD_MINORS;
+ gd->ext_minors = SD_EXT_MINORS;
gd->fops = &sd_fops;
if (index < 26) {
--
1.5.4.5
Extended devt introduces non-contiguous device numbers. This patch
implements a debug option which forces most devt allocations to be
from the extended area and spreads them out. It is enabled by
default if DEBUG_KERNEL is set and achieves the following.
1. Detects code paths in kernel or userland which expect predetermined
consecutive device numbers.
2. When something goes wrong, avoids corruption, as adding to the minor
of an earlier partition won't lead to a wrong but valid device.
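The second point follows from how bit reversal behaves. A userland
sketch of the same pairwise bit swap (model_mangle_minor() and
neighbour_index() are hypothetical names) shows which allocation index
a naive "minor + 1" would have to correspond to:

```c
#include <assert.h>

#define MINORBITS 20

/* Swap bit i with bit (MINORBITS - 1 - i) for every pair. */
static unsigned int model_mangle_minor(unsigned int minor)
{
	int i;

	for (i = 0; i < MINORBITS / 2; i++) {
		unsigned int low = minor & (1U << i);
		unsigned int high = minor & (1U << (MINORBITS - 1 - i));
		int distance = MINORBITS - 1 - 2 * i;

		minor ^= low | high;	/* clear both bits */
		low <<= distance;	/* swap the positions */
		high >>= distance;
		minor |= low | high;	/* and set */
	}
	return minor;
}

/* Index that would own the minor right after the one given to @idx. */
static unsigned int neighbour_index(unsigned int idx)
{
	assert(idx < (1U << (MINORBITS - 1)));
	return model_mangle_minor(model_mangle_minor(idx) + 1);
}
```

For any index below 1 << 19 the mangled minor is even, so the next
minor maps back to an index of at least 1 << 19. Unless more than half
a million extended devts are in use, code that adds one to a
partition's minor hits an unallocated number instead of a neighbouring
partition.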
Signed-off-by: Tejun Heo <[email protected]>
---
block/genhd.c | 40 ++++++++++++++++++++++++++++++++++++----
drivers/ide/ide-disk.c | 6 ++++++
drivers/scsi/sd.c | 6 ++++++
lib/Kconfig.debug | 16 ++++++++++++++++
4 files changed, 64 insertions(+), 4 deletions(-)
diff --git a/block/genhd.c b/block/genhd.c
index 615e0de..7fd17a0 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -142,6 +142,38 @@ EXPORT_SYMBOL(unregister_blkdev);
static struct kobj_map *bdev_map;
/**
+ * blk_mangle_minor - scatter minor numbers apart
+ * @minor: minor number to mangle
+ *
+ * Scatter consecutively allocated @minor numbers apart if DEBUG_BLOCK_EXT_DEVT
+ * is enabled. Mangling twice gives the original value.
+ *
+ * RETURNS:
+ * Mangled value.
+ *
+ * CONTEXT:
+ * Don't care.
+ */
+static int blk_mangle_minor(int minor)
+{
+#ifdef CONFIG_DEBUG_BLOCK_EXT_DEVT
+ int i;
+
+ for (i = 0; i < MINORBITS / 2; i++) {
+ int low = minor & (1 << i);
+ int high = minor & (1 << (MINORBITS - 1 - i));
+ int distance = MINORBITS - 1 - 2 * i;
+
+ minor ^= low | high; /* clear both bits */
+ low <<= distance; /* swap the positions */
+ high >>= distance;
+ minor |= low | high; /* and set */
+ }
+#endif
+ return minor;
+}
+
+/**
* blk_alloc_devt - allocate a dev_t for a partition
* @part: partition to allocate dev_t for
* @gfp_mask: memory allocation flag
@@ -181,7 +213,7 @@ int blk_alloc_devt(struct hd_struct *part, gfp_t gfp_mask, dev_t *devt)
return -EBUSY;
}
- *devt = MKDEV(EXT_BLOCK_MAJOR, idx);
+ *devt = MKDEV(EXT_BLOCK_MAJOR, blk_mangle_minor(idx));
return 0;
}
@@ -197,7 +229,7 @@ int blk_alloc_devt(struct hd_struct *part, gfp_t gfp_mask, dev_t *devt)
void blk_free_devt(dev_t devt)
{
if (MAJOR(devt) == EXT_BLOCK_MAJOR)
- idr_remove(&ext_devt_idr, MINOR(devt));
+ idr_remove(&ext_devt_idr, blk_mangle_minor(MINOR(devt)));
}
/*
@@ -450,7 +482,7 @@ static struct kobject *ext_probe(dev_t devt, int *idx, void *data)
{
struct hd_struct *part;
- part = idr_find(&ext_devt_idr, MINOR(devt));
+ part = idr_find(&ext_devt_idr, blk_mangle_minor(MINOR(devt)));
if (unlikely(!part))
return NULL;
@@ -462,7 +494,7 @@ static int ext_lock(dev_t devt, void *data)
{
struct hd_struct *part;
- part = idr_find(&ext_devt_idr, MINOR(devt));
+ part = idr_find(&ext_devt_idr, blk_mangle_minor(MINOR(devt)));
if (likely(part && get_disk(part->disk)))
return 0;
return -1;
diff --git a/drivers/ide/ide-disk.c b/drivers/ide/ide-disk.c
index f8b091a..3c6f0fa 100644
--- a/drivers/ide/ide-disk.c
+++ b/drivers/ide/ide-disk.c
@@ -42,7 +42,13 @@
#include <asm/div64.h>
#define IDE_DISK_PARTS (1 << PARTN_BITS)
+
+#if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
#define IDE_DISK_MINORS IDE_DISK_PARTS
+#else
+#define IDE_DISK_MINORS 1
+#endif
+
#define IDE_DISK_EXT_MINORS (IDE_DISK_PARTS - IDE_DISK_MINORS)
struct ide_disk_obj {
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 8879c98..d49605c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -87,7 +87,13 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
#define SD_PARTS 64
+
+#if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
#define SD_MINORS 16
+#else
+#define SD_MINORS 1
+#endif
+
#define SD_EXT_MINORS (SD_PARTS - SD_MINORS)
static int sd_revalidate_disk(struct gendisk *);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index d2099f4..46bc380 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -565,6 +565,22 @@ config BACKTRACE_SELF_TEST
Say N if you are unsure.
+config DEBUG_BLOCK_EXT_DEVT
+ bool "Force extended block device numbers and spread them"
+ depends on DEBUG_KERNEL
+ depends on BLOCK
+ default y
+ help
+ Conventionally, block device numbers are allocated from a
+ predetermined contiguous area. However, the extended block
+ area may introduce non-contiguous block device numbers. This
+ option forces most block device numbers to be allocated from
+ the extended space and spreads them to discover kernel or
+ userland code paths which assume predetermined contiguous
+ device number allocation.
+
+ Say N if you are unsure.
+
config LKDTM
tristate "Linux Kernel Dump Test Tool Module"
depends on DEBUG_KERNEL
--
1.5.4.5
Implement extended minors. A block driver can tell the block layer
that it wants to use extended minors. After the usual minor space is
used up, the block layer automatically allocates devt from
EXT_BLOCK_MAJOR. Currently only one major number is allocated for
this, but as the allocation is on-demand, the ~1mil minor space under
it should suffice for most cases.
For internal implementation simplicity, the first partition can't be
allocated in the extended area. In other words, genhd->minors must be
at least 1. Lifting this restriction shouldn't be too difficult.
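The on-demand behaviour can be pictured with a toy pool. This sketch
assumes indices are handed out lowest-free-first and become reusable
once freed; the kernel code uses an idr for this, whereas struct
ext_pool and its helpers here are hypothetical stand-ins with a
deliberately tiny limit.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_EXT	8	/* tiny stand-in for 1 << MINORBITS */

struct ext_pool {
	bool used[MODEL_MAX_EXT];
};

/* Hand out the lowest free index, or -1 when the pool is exhausted. */
static int ext_pool_alloc(struct ext_pool *pool)
{
	int idx;

	assert(pool);
	for (idx = 0; idx < MODEL_MAX_EXT; idx++) {
		if (!pool->used[idx]) {
			pool->used[idx] = true;
			return idx;
		}
	}
	return -1;
}

/* Return @idx to the pool so a later allocation can reuse it. */
static void ext_pool_free(struct ext_pool *pool, int idx)
{
	assert(pool && idx >= 0 && idx < MODEL_MAX_EXT);
	pool->used[idx] = false;
}
```

Because a minor is consumed only when a partition beyond the
conventional range actually exists, and is returned when that partition
is deleted, a single major is shared by every disk in the system.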
Signed-off-by: Tejun Heo <[email protected]>
---
block/genhd.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++--
fs/partitions/check.c | 11 +++++
include/linux/genhd.h | 8 +++-
include/linux/major.h | 2 +
4 files changed, 134 insertions(+), 5 deletions(-)
diff --git a/block/genhd.c b/block/genhd.c
index e7310ba..97cc5e4 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -16,6 +16,7 @@
#include <linux/kobj_map.h>
#include <linux/buffer_head.h>
#include <linux/mutex.h>
+#include <linux/idr.h>
#include "blk.h"
@@ -24,6 +25,10 @@ static DEFINE_MUTEX(block_class_lock);
struct kobject *block_depr;
#endif
+/* for extended dynamic devt allocation, currently only one major is used */
+#define MAX_EXT_DEVT (1 << MINORBITS)
+static DEFINE_IDR(ext_devt_idr);
+
static struct device_type disk_type;
/*
@@ -136,6 +141,65 @@ EXPORT_SYMBOL(unregister_blkdev);
static struct kobj_map *bdev_map;
+/**
+ * blk_alloc_devt - allocate a dev_t for a partition
+ * @part: partition to allocate dev_t for
+ * @gfp_mask: memory allocation flag
+ * @devt: out parameter for resulting dev_t
+ *
+ * Allocate a dev_t for block device.
+ *
+ * RETURNS:
+ * 0 on success, allocated dev_t is returned in *@devt. -errno on
+ * failure.
+ *
+ * CONTEXT:
+ * Determined by @gfp_mask.
+ */
+int blk_alloc_devt(struct hd_struct *part, gfp_t gfp_mask, dev_t *devt)
+{
+ int idx, rc;
+
+ if (part->partno < part->disk->minors) {
+ *devt = MKDEV(part->disk->major, part->disk->first_minor + part->partno);
+ return 0;
+ }
+
+ while (true) {
+ if (!idr_pre_get(&ext_devt_idr, gfp_mask))
+ return -ENOMEM;
+
+ rc = idr_get_new(&ext_devt_idr, part, &idx);
+ if (rc == 0)
+ break;
+ if (rc && rc != -EAGAIN)
+ return rc;
+ }
+
+ if (idx >= MAX_EXT_DEVT) {
+ idr_remove(&ext_devt_idr, idx);
+ return -EBUSY;
+ }
+
+ *devt = MKDEV(EXT_BLOCK_MAJOR, idx);
+ return 0;
+}
+
+/**
+ * blk_free_devt - free a dev_t
+ * @devt: dev_t to free
+ *
+ * Free @devt which was allocated using blk_alloc_devt().
+ *
+ * CONTEXT:
+ * Don't care.
+ */
+void blk_free_devt(dev_t devt)
+{
+ if (MAJOR(devt) == EXT_BLOCK_MAJOR)
+ idr_remove(&ext_devt_idr, MINOR(devt));
+}
+
/*
* Register device numbers dev..(dev+range-1)
* range must be nonzero
@@ -368,12 +432,43 @@ static struct kobject *base_probe(dev_t devt, int *part, void *data)
return NULL;
}
+static struct kobject *ext_probe(dev_t devt, int *idx, void *data)
+{
+ struct hd_struct *part;
+
+ part = idr_find(&ext_devt_idr, MINOR(devt));
+ if (unlikely(!part))
+ return NULL;
+
+ *idx = part->partno;
+ return &part->disk->dev.kobj;
+}
+
+static int ext_lock(dev_t devt, void *data)
+{
+ struct hd_struct *part;
+
+ part = idr_find(&ext_devt_idr, MINOR(devt));
+ if (likely(part && get_disk(part->disk)))
+ return 0;
+ return -1;
+}
+
static int __init genhd_device_init(void)
{
- int error = class_register(&block_class);
+ int error;
+
+ error = class_register(&block_class);
if (unlikely(error))
return error;
+
bdev_map = kobj_map_init(base_probe, &block_class_lock);
+ if (!bdev_map)
+ return -ENOMEM;
+
+ blk_register_region(MKDEV(EXT_BLOCK_MAJOR, 0), 1 << MINORBITS, NULL,
+ ext_probe, ext_lock, NULL);
+
blk_dev_init();
#ifndef CONFIG_SYSFS_DEPRECATED
@@ -691,22 +786,34 @@ EXPORT_SYMBOL(blk_lookup_devt);
struct gendisk *alloc_disk(int minors)
{
- return alloc_disk_node(minors, -1);
+ return alloc_disk_ext(minors, 0);
}
struct gendisk *alloc_disk_node(int minors, int node_id)
{
+ return alloc_disk_ext_node(minors, 0, node_id);
+}
+
+struct gendisk *alloc_disk_ext(int minors, int ext_minors)
+{
+ return alloc_disk_ext_node(minors, ext_minors, -1);
+}
+
+struct gendisk *alloc_disk_ext_node(int minors, int ext_minors, int node_id)
+{
struct gendisk *disk;
disk = kmalloc_node(sizeof(struct gendisk),
GFP_KERNEL | __GFP_ZERO, node_id);
if (disk) {
+ int tot_minors = minors + ext_minors;
+
if (!init_disk_stats(disk)) {
kfree(disk);
return NULL;
}
- if (minors > 1) {
- int size = (minors - 1) * sizeof(struct hd_struct *);
+ if (tot_minors > 1) {
+ int size = (tot_minors - 1) * sizeof(struct hd_struct *);
disk->part = kmalloc_node(size,
GFP_KERNEL | __GFP_ZERO, node_id);
if (!disk->part) {
@@ -716,6 +823,7 @@ struct gendisk *alloc_disk_node(int minors, int node_id)
}
}
disk->minors = minors;
+ disk->ext_minors = ext_minors;
rand_initialize_disk(disk);
disk->dev.class = &block_class;
disk->dev.type = &disk_type;
@@ -728,6 +836,8 @@ struct gendisk *alloc_disk_node(int minors, int node_id)
EXPORT_SYMBOL(alloc_disk);
EXPORT_SYMBOL(alloc_disk_node);
+EXPORT_SYMBOL(alloc_disk_ext);
+EXPORT_SYMBOL(alloc_disk_ext_node);
struct kobject *get_disk(struct gendisk *disk)
{
diff --git a/fs/partitions/check.c b/fs/partitions/check.c
index 994a621..15d231f 100644
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -331,6 +331,7 @@ void delete_partition(struct gendisk *disk, int part)
return;
if (!p->nr_sects)
return;
+ blk_free_devt(p->dev.devt);
disk->part[part-1] = NULL;
p->start_sect = 0;
p->nr_sects = 0;
@@ -386,6 +387,16 @@ void add_partition(struct gendisk *disk, int part, sector_t start, sector_t len,
p->dev.class = &block_class;
p->dev.type = &part_type;
p->dev.parent = &disk->dev;
+
+ err = blk_alloc_devt(p, GFP_KERNEL, &p->dev.devt);
+ if (err) {
+ printk(KERN_WARNING "%s: failed to allocate MAJOR:MINOR "
+ "(part=%d, err=%d)\n", name, part, err);
+ free_part_stats(p);
+ kfree(p);
+ return;
+ }
+
disk->part[part-1] = p;
/* delay uevent until 'holders' subdir is created */
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 1db5740..a1843e6 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -117,6 +117,7 @@ struct gendisk {
int first_minor;
int minors; /* maximum number of minors, =1 for
* disks that can't be partitioned. */
+ int ext_minors; /* number of extended dynamic minors */
char disk_name[32]; /* name of major driver */
struct hd_struct **part; /* [indexed by minor - 1] */
struct block_device_operations *fops;
@@ -146,7 +147,7 @@ struct gendisk {
static inline int disk_max_parts(struct gendisk *disk)
{
- return disk->minors - 1;
+ return disk->minors + disk->ext_minors - 1;
}
static inline int disk_major(struct gendisk *disk)
@@ -551,6 +552,8 @@ struct unixware_disklabel {
#define ADDPART_FLAG_RAID 1
#define ADDPART_FLAG_WHOLEDISK 2
+extern int blk_alloc_devt(struct hd_struct *part, gfp_t gfp_mask, dev_t *devt);
+extern void blk_free_devt(dev_t devt);
extern dev_t blk_lookup_devt(const char *name, int part);
extern char *disk_name (struct gendisk *hd, int part, char *buf);
@@ -561,6 +564,9 @@ extern void printk_all_partitions(void);
extern struct gendisk *alloc_disk_node(int minors, int node_id);
extern struct gendisk *alloc_disk(int minors);
+extern struct gendisk *alloc_disk_ext_node(int minors, int ext_minors,
+ int node_id);
+extern struct gendisk *alloc_disk_ext(int minors, int ext_minors);
extern struct kobject *get_disk(struct gendisk *disk);
extern void put_disk(struct gendisk *disk);
extern void blk_register_region(dev_t devt, unsigned long range,
diff --git a/include/linux/major.h b/include/linux/major.h
index 0cb9805..e7fa573 100644
--- a/include/linux/major.h
+++ b/include/linux/major.h
@@ -170,4 +170,6 @@
#define VIOTAPE_MAJOR 230
+#define EXT_BLOCK_MAJOR 259
+
#endif
--
1.5.4.5
With extended minors and the soon-to-follow debug feature, large minor
numbers for block devices will be common. This patch does the
following.
* Adapt print formats such that large minors don't break the
formatting.
* For extended MAJ:MIN, %02x%02x for MAJ:MIN used in
printk_all_partitions() doesn't cut it anymore. Update it such that
%03x:%05x is used if either MAJ or MIN doesn't fit in %02x.
* Implement ext_range sysfs attribute which shows total minors the
device can use including both conventional minor space and the
extended one.
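The two boot-time formats can be tried out in userland. This is a
sketch modelled on the description above; format_devt() and
devt_is_extended_format() are hypothetical names, and the buffer size
of 10 follows the BDEVT_SIZE value in this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BDEVT_SIZE 10	/* nine characters plus the terminating NUL */

/* Format MAJ:MIN the short way when both fit in two hex digits. */
static char *format_devt(unsigned int major, unsigned int minor, char *buf)
{
	assert(buf);
	if (major <= 0xff && minor <= 0xff) {
		char tbuf[BDEVT_SIZE];

		snprintf(tbuf, sizeof(tbuf), "%02x%02x", major, minor);
		/* pad to nine columns so both formats line up */
		snprintf(buf, BDEVT_SIZE, "%-9s", tbuf);
	} else {
		snprintf(buf, BDEVT_SIZE, "%03x:%05x", major, minor);
	}
	return buf;
}

/* The long form is the only one that contains a colon. */
static int devt_is_extended_format(const char *str)
{
	return strchr(str, ':') != NULL;
}
```

A conventional 8:1 comes out as "0801" padded to nine columns, while
259:524288 becomes "103:80000". Both are nine characters wide, so the
columns printed after them stay aligned.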
Signed-off-by: Tejun Heo <[email protected]>
---
block/genhd.c | 48 ++++++++++++++++++++++++++++++++++++------------
include/linux/fs.h | 1 +
2 files changed, 37 insertions(+), 12 deletions(-)
diff --git a/block/genhd.c b/block/genhd.c
index 97cc5e4..615e0de 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -286,6 +286,18 @@ struct gendisk *get_gendisk(dev_t devt, int *part)
return kobj ? dev_to_disk(dev) : NULL;
}
+static char *bdevt_str(int major, int minor, char *buf)
+{
+ if (major <= 0xff && minor <= 0xff) {
+ char tbuf[BDEVT_SIZE];
+ snprintf(tbuf, BDEVT_SIZE, "%02x%02x", major, minor);
+ snprintf(buf, BDEVT_SIZE, "%-9s", tbuf);
+ } else
+ snprintf(buf, BDEVT_SIZE, "%03x:%05x", major, minor);
+
+ return buf;
+}
+
/*
* print a full list of all partitions - intended for places where the root
* filesystem can't be mounted and thus to give the victim some idea of what
@@ -295,7 +307,8 @@ void __init printk_all_partitions(void)
{
struct device *dev;
struct gendisk *sgp;
- char buf[BDEVNAME_SIZE];
+ char devt_buf[BDEVT_SIZE];
+ char name_buf[BDEVNAME_SIZE];
int n;
mutex_lock(&block_class_lock);
@@ -315,10 +328,10 @@ void __init printk_all_partitions(void)
* Note, unlike /proc/partitions, I am showing the numbers in
* hex - the same format as the root= option takes.
*/
- printk("%02x%02x %10llu %s",
- disk_major(sgp), disk_minor(sgp),
+ printk("%s %10llu %s",
+ bdevt_str(disk_major(sgp), disk_minor(sgp), devt_buf),
(unsigned long long)get_capacity(sgp) >> 1,
- disk_name(sgp, 0, buf));
+ disk_name(sgp, 0, name_buf));
if (sgp->driverfs_dev != NULL &&
sgp->driverfs_dev->driver != NULL)
printk(" driver: %s\n",
@@ -333,10 +346,11 @@ void __init printk_all_partitions(void)
if (!part || !part->nr_sects)
continue;
- printk(" %02x%02x %10llu %s\n",
- part_major(part), part_minor(part),
+ printk(" %s %10llu %s\n",
+ bdevt_str(part_major(part), part_minor(part),
+ devt_buf),
(unsigned long long)sgp->part[n]->nr_sects >> 1,
- disk_name(sgp, n + 1, buf));
+ disk_name(sgp, n + 1, name_buf));
}
}
@@ -386,7 +400,7 @@ static int show_partition(struct seq_file *seqf, void *v)
char buf[BDEVNAME_SIZE];
if (&sgp->dev.node == block_class.devices.next)
- seq_puts(seqf, "major minor #blocks name\n\n");
+ seq_puts(seqf, "major minor #blocks name\n\n");
/* Don't show non-partitionable removeable devices or empty devices */
if (!get_capacity(sgp) || (!disk_max_parts(sgp) &&
@@ -396,7 +410,7 @@ static int show_partition(struct seq_file *seqf, void *v)
return 0;
/* show the full disk and all non-0 size partitions of it */
- seq_printf(seqf, "%4d %4d %10llu %s\n",
+ seq_printf(seqf, "%4d %7d %10llu %s\n",
disk_major(sgp), disk_minor(sgp),
(unsigned long long)get_capacity(sgp) >> 1,
disk_name(sgp, 0, buf));
@@ -406,7 +420,7 @@ static int show_partition(struct seq_file *seqf, void *v)
continue;
if (part->nr_sects == 0)
continue;
- seq_printf(seqf, "%4d %4d %10llu %s\n",
+ seq_printf(seqf, "%4d %7d %10llu %s\n",
part_major(part), part_minor(part),
(unsigned long long)part->nr_sects >> 1,
disk_name(sgp, n + 1, buf));
@@ -488,6 +502,14 @@ static ssize_t disk_range_show(struct device *dev,
return sprintf(buf, "%d\n", disk->minors);
}
+static ssize_t disk_ext_range_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct gendisk *disk = dev_to_disk(dev);
+
+ return sprintf(buf, "%d\n", disk_max_parts(disk) + 1);
+}
+
static ssize_t disk_removable_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -568,6 +590,7 @@ static ssize_t disk_fail_store(struct device *dev,
#endif
static DEVICE_ATTR(range, S_IRUGO, disk_range_show, NULL);
+static DEVICE_ATTR(ext_range, S_IRUGO, disk_ext_range_show, NULL);
static DEVICE_ATTR(removable, S_IRUGO, disk_removable_show, NULL);
static DEVICE_ATTR(size, S_IRUGO, disk_size_show, NULL);
static DEVICE_ATTR(capability, S_IRUGO, disk_capability_show, NULL);
@@ -579,6 +602,7 @@ static struct device_attribute dev_attr_fail =
static struct attribute *disk_attrs[] = {
&dev_attr_range.attr,
+ &dev_attr_ext_range.attr,
&dev_attr_removable.attr,
&dev_attr_size.attr,
&dev_attr_capability.attr,
@@ -677,7 +701,7 @@ static int diskstats_show(struct seq_file *s, void *v)
preempt_disable();
disk_round_stats(gp);
preempt_enable();
- seq_printf(s, "%4d %4d %s %lu %lu %llu %u %lu %lu %llu %u %u %u %u\n",
+ seq_printf(s, "%4d %7d %s %lu %lu %llu %u %lu %lu %llu %u %u %u %u\n",
disk_major(gp), disk_minor(gp), disk_name(gp, n, buf),
disk_stat_read(gp, ios[0]), disk_stat_read(gp, merges[0]),
(unsigned long long)disk_stat_read(gp, sectors[0]),
@@ -699,7 +723,7 @@ static int diskstats_show(struct seq_file *s, void *v)
preempt_disable();
part_round_stats(hd);
preempt_enable();
- seq_printf(s, "%4d %4d %s %lu %lu %llu "
+ seq_printf(s, "%4d %7d %s %lu %lu %llu "
"%u %lu %lu %llu %u %u %u %u\n",
part_major(hd), part_minor(hd),
disk_name(gp, n + 1, buf),
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d8e2762..8337243 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1667,6 +1667,7 @@ extern void chrdev_show(struct seq_file *,off_t);
/* fs/block_dev.c */
#define BDEVNAME_SIZE 32 /* Largest string for a blockdev identifier */
+#define BDEVT_SIZE 10 /* Largest string for MAJ:MIN for blkdev */
#ifdef CONFIG_BLOCK
#define BLKDEV_MAJOR_HASH_SIZE 255
--
1.5.4.5
Implement disk_major(), disk_minor(), part_major() and part_minor()
and use them to directly access devt instead of computing it from
first_minor.
While at it, implement disk_max_parts() to avoid directly dereferencing
genhd->minors.
These changes are to enable extended block minors.
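The idea behind the accessors can be sketched in userland as follows.
The structures and helper bodies are assumptions made for illustration
(model_disk, model_part and the model_* functions are hypothetical):
each device stores its own devt and the helpers read it back instead of
adding an offset to first_minor.

```c
#include <assert.h>

#define MINORBITS	20
#define MINORMASK	((1U << MINORBITS) - 1)
#define MAJOR(dev)	((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev)	((unsigned int)((dev) & MINORMASK))
#define MKDEV(ma, mi)	(((unsigned int)(ma) << MINORBITS) | (unsigned int)(mi))

struct model_part {
	unsigned int devt;	/* stored when the partition is added */
	int partno;
};

struct model_disk {
	unsigned int devt;
	int minors;
};

/* Number of partitions the disk can hold; minor 0 is the disk itself. */
static int model_disk_max_parts(const struct model_disk *disk)
{
	assert(disk->minors >= 1);
	return disk->minors - 1;
}

static unsigned int model_part_major(const struct model_part *part)
{
	return MAJOR(part->devt);
}

static unsigned int model_part_minor(const struct model_part *part)
{
	return MINOR(part->devt);
}
```

Callers that go through such helpers keep working when a partition's
devt no longer sits next to the whole disk's: a partition stored as
259:44 reports exactly that, whatever first_minor is.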
Signed-off-by: Tejun Heo <[email protected]>
---
block/genhd.c | 80 ++++++++++++++++++++++++++++---------------------
block/ioctl.c | 6 ++--
fs/block_dev.c | 2 +-
fs/partitions/check.c | 22 +++++++------
include/linux/genhd.h | 27 ++++++++++++++++-
5 files changed, 88 insertions(+), 49 deletions(-)
diff --git a/block/genhd.c b/block/genhd.c
index b922d48..e7310ba 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -185,13 +185,14 @@ void add_disk(struct gendisk *disk)
struct backing_dev_info *bdi;
disk->flags |= GENHD_FL_UP;
- blk_register_region(MKDEV(disk->major, disk->first_minor),
- disk->minors, NULL, exact_match, exact_lock, disk);
+ disk->dev.devt = MKDEV(disk->major, disk->first_minor);
+ blk_register_region(disk->dev.devt, disk->minors, NULL,
+ exact_match, exact_lock, disk);
register_disk(disk);
blk_register_queue(disk);
bdi = &disk->queue->backing_dev_info;
- bdi_register_dev(bdi, MKDEV(disk->major, disk->first_minor));
+ bdi_register_dev(bdi, disk->dev.devt);
sysfs_create_link(&disk->dev.kobj, &bdi->dev->kobj, "bdi");
}
@@ -203,8 +204,7 @@ void unlink_gendisk(struct gendisk *disk)
sysfs_remove_link(&disk->dev.kobj, "bdi");
bdi_unregister(&disk->queue->backing_dev_info);
blk_unregister_queue(disk);
- blk_unregister_region(MKDEV(disk->major, disk->first_minor),
- disk->minors);
+ blk_unregister_region(disk->dev.devt, disk->minors);
}
/**
@@ -252,7 +252,7 @@ void __init printk_all_partitions(void)
* hex - the same format as the root= option takes.
*/
printk("%02x%02x %10llu %s",
- sgp->major, sgp->first_minor,
+ disk_major(sgp), disk_minor(sgp),
(unsigned long long)get_capacity(sgp) >> 1,
disk_name(sgp, 0, buf));
if (sgp->driverfs_dev != NULL &&
@@ -263,13 +263,14 @@ void __init printk_all_partitions(void)
printk(" (driver?)\n");
/* now show the partitions */
- for (n = 0; n < sgp->minors - 1; ++n) {
- if (sgp->part[n] == NULL)
- continue;
- if (sgp->part[n]->nr_sects == 0)
+ for (n = 0; n < disk_max_parts(sgp); ++n) {
+ struct hd_struct *part = sgp->part[n];
+
+ if (!part || !part->nr_sects)
continue;
+
printk(" %02x%02x %10llu %s\n",
- sgp->major, n + 1 + sgp->first_minor,
+ part_major(part), part_minor(part),
(unsigned long long)sgp->part[n]->nr_sects >> 1,
disk_name(sgp, n + 1, buf));
}
@@ -314,35 +315,36 @@ static void part_stop(struct seq_file *part, void *v)
mutex_unlock(&block_class_lock);
}
-static int show_partition(struct seq_file *part, void *v)
+static int show_partition(struct seq_file *seqf, void *v)
{
struct gendisk *sgp = v;
int n;
char buf[BDEVNAME_SIZE];
if (&sgp->dev.node == block_class.devices.next)
- seq_puts(part, "major minor #blocks name\n\n");
+ seq_puts(seqf, "major minor #blocks name\n\n");
/* Don't show non-partitionable removeable devices or empty devices */
- if (!get_capacity(sgp) ||
- (sgp->minors == 1 && (sgp->flags & GENHD_FL_REMOVABLE)))
+ if (!get_capacity(sgp) || (!disk_max_parts(sgp) &&
+ (sgp->flags & GENHD_FL_REMOVABLE)))
return 0;
if (sgp->flags & GENHD_FL_SUPPRESS_PARTITION_INFO)
return 0;
/* show the full disk and all non-0 size partitions of it */
- seq_printf(part, "%4d %4d %10llu %s\n",
- sgp->major, sgp->first_minor,
+ seq_printf(seqf, "%4d %4d %10llu %s\n",
+ disk_major(sgp), disk_minor(sgp),
(unsigned long long)get_capacity(sgp) >> 1,
disk_name(sgp, 0, buf));
- for (n = 0; n < sgp->minors - 1; n++) {
- if (!sgp->part[n])
+ for (n = 0; n < disk_max_parts(sgp); n++) {
+ struct hd_struct *part = sgp->part[n];
+ if (!part)
continue;
- if (sgp->part[n]->nr_sects == 0)
+ if (part->nr_sects == 0)
continue;
- seq_printf(part, "%4d %4d %10llu %s\n",
- sgp->major, n + 1 + sgp->first_minor,
- (unsigned long long)sgp->part[n]->nr_sects >> 1 ,
+ seq_printf(seqf, "%4d %4d %10llu %s\n",
+ part_major(part), part_minor(part),
+ (unsigned long long)part->nr_sects >> 1,
disk_name(sgp, n + 1, buf));
}
@@ -581,7 +583,7 @@ static int diskstats_show(struct seq_file *s, void *v)
disk_round_stats(gp);
preempt_enable();
seq_printf(s, "%4d %4d %s %lu %lu %llu %u %lu %lu %llu %u %u %u %u\n",
- gp->major, n + gp->first_minor, disk_name(gp, n, buf),
+ disk_major(gp), disk_minor(gp), disk_name(gp, n, buf),
disk_stat_read(gp, ios[0]), disk_stat_read(gp, merges[0]),
(unsigned long long)disk_stat_read(gp, sectors[0]),
jiffies_to_msecs(disk_stat_read(gp, ticks[0])),
@@ -593,7 +595,7 @@ static int diskstats_show(struct seq_file *s, void *v)
jiffies_to_msecs(disk_stat_read(gp, time_in_queue)));
/* now show all non-0 size partitions of it */
- for (n = 0; n < gp->minors - 1; n++) {
+ for (n = 0; n < disk_max_parts(gp); n++) {
struct hd_struct *hd = gp->part[n];
if (!hd || !hd->nr_sects)
@@ -604,7 +606,7 @@ static int diskstats_show(struct seq_file *s, void *v)
preempt_enable();
seq_printf(s, "%4d %4d %s %lu %lu %llu "
"%u %lu %lu %llu %u %u %u %u\n",
- gp->major, n + gp->first_minor + 1,
+ part_major(hd), part_minor(hd),
disk_name(gp, n + 1, buf),
part_stat_read(hd, ios[0]),
part_stat_read(hd, merges[0]),
@@ -653,23 +655,33 @@ void genhd_media_change_notify(struct gendisk *disk)
EXPORT_SYMBOL_GPL(genhd_media_change_notify);
#endif /* 0 */
-dev_t blk_lookup_devt(const char *name, int part)
+dev_t blk_lookup_devt(const char *name, int partno)
{
struct device *dev;
dev_t devt = MKDEV(0, 0);
mutex_lock(&block_class_lock);
list_for_each_entry(dev, &block_class.devices, node) {
+ struct gendisk *disk = dev_to_disk(dev);
+
if (dev->type != &disk_type)
continue;
- if (strcmp(dev->bus_id, name) == 0) {
- struct gendisk *disk = dev_to_disk(dev);
+ if (strcmp(dev->bus_id, name))
+ continue;
+ if (partno < 0 || partno > disk_max_parts(disk))
+ continue;
- if (part < disk->minors)
- devt = MKDEV(MAJOR(dev->devt),
- MINOR(dev->devt) + part);
- break;
+ if (partno == 0)
+ devt = disk->dev.devt;
+ else {
+ struct hd_struct *part = disk->part[partno - 1];
+
+ if (!part || !part->nr_sects)
+ continue;
+
+ devt = part->dev.devt;
}
+ break;
}
mutex_unlock(&block_class_lock);
@@ -760,7 +772,7 @@ void set_disk_ro(struct gendisk *disk, int flag)
{
int i;
disk->policy = flag;
- for (i = 0; i < disk->minors - 1; i++)
+ for (i = 0; i < disk_max_parts(disk); i++)
if (disk->part[i]) disk->part[i]->policy = flag;
}
diff --git a/block/ioctl.c b/block/ioctl.c
index 52d6385..9b008a2 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -28,7 +28,7 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
if (bdev != bdev->bd_contains)
return -EINVAL;
part = p.pno;
- if (part <= 0 || part >= disk->minors)
+ if (part <= 0 || part > disk_max_parts(disk))
return -EINVAL;
switch (a.op) {
case BLKPG_ADD_PARTITION:
@@ -49,7 +49,7 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
return -EBUSY;
}
/* overlap? */
- for (i = 0; i < disk->minors - 1; i++) {
+ for (i = 0; i < disk_max_parts(disk); i++) {
struct hd_struct *s = disk->part[i];
if (!s)
@@ -99,7 +99,7 @@ static int blkdev_reread_part(struct block_device *bdev)
struct gendisk *disk = bdev->bd_disk;
int res;
- if (disk->minors == 1 || bdev != bdev->bd_contains)
+ if (!disk_max_parts(disk) || bdev != bdev->bd_contains)
return -EINVAL;
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 10d8a0a..215e4be 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -892,7 +892,7 @@ int check_disk_change(struct block_device *bdev)
if (bdops->revalidate_disk)
bdops->revalidate_disk(bdev->bd_disk);
- if (bdev->bd_disk->minors > 1)
+ if (disk_max_parts(bdev->bd_disk))
bdev->bd_invalidated = 1;
return 1;
}
diff --git a/fs/partitions/check.c b/fs/partitions/check.c
index b915ac2..994a621 100644
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -134,8 +134,12 @@ char *disk_name(struct gendisk *hd, int part, char *buf)
const char *bdevname(struct block_device *bdev, char *buf)
{
- int part = MINOR(bdev->bd_dev) - bdev->bd_disk->first_minor;
- return disk_name(bdev->bd_disk, part, buf);
+ int partno = 0;
+
+ if (bdev->bd_part)
+ partno = bdev->bd_part->partno;
+
+ return disk_name(bdev->bd_disk, partno, buf);
}
EXPORT_SYMBOL(bdevname);
@@ -169,7 +173,7 @@ check_partition(struct gendisk *hd, struct block_device *bdev)
if (isdigit(state->name[strlen(state->name)-1]))
sprintf(state->name, "p");
- state->limit = hd->minors;
+ state->limit = disk_max_parts(hd) + 1;
i = res = err = 0;
while (!res && check_part[i]) {
memset(&state->parts, 0, sizeof(state->parts));
@@ -379,7 +383,6 @@ void add_partition(struct gendisk *disk, int part, sector_t start, sector_t len,
"%s%d", disk->dev.bus_id, part);
device_initialize(&p->dev);
- p->dev.devt = MKDEV(disk->major, disk->first_minor + part);
p->dev.class = &block_class;
p->dev.type = &part_type;
p->dev.parent = &disk->dev;
@@ -408,7 +411,6 @@ void register_disk(struct gendisk *disk)
int err;
disk->dev.parent = disk->driverfs_dev;
- disk->dev.devt = MKDEV(disk->major, disk->first_minor);
strlcpy(disk->dev.bus_id, disk->disk_name, KOBJ_NAME_LEN);
/* ewww... some of these buggers have / in the name... */
@@ -432,7 +434,7 @@ void register_disk(struct gendisk *disk)
disk_sysfs_add_subdirs(disk);
/* No minors to use for partitions */
- if (disk->minors == 1)
+ if (!disk_max_parts(disk))
goto exit;
/* No such device (e.g., media were just removed) */
@@ -455,8 +457,8 @@ exit:
kobject_uevent(&disk->dev.kobj, KOBJ_ADD);
/* announce possible partitions */
- for (i = 1; i < disk->minors; i++) {
- p = disk->part[i-1];
+ for (i = 0; i < disk_max_parts(disk); i++) {
+ p = disk->part[i];
if (!p || !p->nr_sects)
continue;
kobject_uevent(&p->dev.kobj, KOBJ_ADD);
@@ -474,7 +476,7 @@ int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
if (res)
return res;
bdev->bd_invalidated = 0;
- for (p = 1; p < disk->minors; p++)
+ for (p = 1; p <= disk_max_parts(disk); p++)
delete_partition(disk, p);
if (disk->fops->revalidate_disk)
disk->fops->revalidate_disk(disk);
@@ -531,7 +533,7 @@ void del_gendisk(struct gendisk *disk)
int p;
/* invalidate stuff */
- for (p = disk->minors - 1; p > 0; p--) {
+ for (p = disk_max_parts(disk); p > 0; p--) {
invalidate_partition(disk, p);
delete_partition(disk, p);
}
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 1f06681..1db5740 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -144,6 +144,31 @@ struct gendisk {
struct work_struct async_notify;
};
+static inline int disk_max_parts(struct gendisk *disk)
+{
+ return disk->minors - 1;
+}
+
+static inline int disk_major(struct gendisk *disk)
+{
+ return MAJOR(disk->dev.devt);
+}
+
+static inline int disk_minor(struct gendisk *disk)
+{
+ return MINOR(disk->dev.devt);
+}
+
+static inline int part_major(struct hd_struct *part)
+{
+ return MAJOR(part->dev.devt);
+}
+
+static inline int part_minor(struct hd_struct *part)
+{
+ return MINOR(part->dev.devt);
+}
+
/*
* Macros to operate on percpu disk statistics:
*
@@ -155,7 +180,7 @@ static inline struct hd_struct *get_part(struct gendisk *gendiskp,
{
struct hd_struct *part;
int i;
- for (i = 0; i < gendiskp->minors - 1; i++) {
+ for (i = 0; i < disk_max_parts(gendiskp); i++) {
part = gendiskp->part[i];
if (part && part->start_sect <= sector
&& sector < part->start_sect + part->nr_sects)
--
1.5.4.5
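A note on the bound conversions above: most hunks replace an open-coded "disk->minors - 1" with disk_max_parts(), and the blkpg_ioctl() range check changes from "part >= disk->minors" to "part > disk_max_parts(disk)". The following userspace sketch checks that the old and new range checks reject the same partition numbers. It is only an illustration, not kernel code: struct mock_gendisk and bounds_agree() are hypothetical names made up for this example, and only the body of disk_max_parts() is taken from the patch.

```c
#include <assert.h>	/* only needed by callers that assert on bounds_agree() */

/* Hypothetical stand-in for struct gendisk; only the field used here. */
struct mock_gendisk {
	int minors;	/* minors reserved: one for the whole disk, rest for partitions */
};

/* Same body as the helper the patch adds to include/linux/genhd.h. */
static inline int disk_max_parts(const struct mock_gendisk *disk)
{
	return disk->minors - 1;
}

/* Range check as written before the patch in blkpg_ioctl(). */
static int old_partno_invalid(const struct mock_gendisk *disk, int partno)
{
	return partno <= 0 || partno >= disk->minors;
}

/* Range check as written after the patch in blkpg_ioctl(). */
static int new_partno_invalid(const struct mock_gendisk *disk, int partno)
{
	return partno <= 0 || partno > disk_max_parts(disk);
}

/* Returns 1 if both checks agree for every partno in [-1, minors + 1]. */
static int bounds_agree(int minors)
{
	struct mock_gendisk disk = { .minors = minors };
	int partno;

	for (partno = -1; partno <= minors + 1; partno++)
		if (old_partno_invalid(&disk, partno) !=
		    new_partno_invalid(&disk, partno))
			return 0;
	return 1;
}
```

Calling bounds_agree() with, say, 1 (no partition minors) or 16 (the sd case mentioned in the cover letter) should return 1, since "partno >= minors" and "partno > minors - 1" are the same condition for integers.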
On Thu, Jul 03, 2008 at 05:33:00PM +0900, Tejun Heo wrote:
> I chose major 259 at my own whim. Is it okay to use this value? As
No. You need to get it assigned by LANANA (Torben Mathiasen
<[email protected]>). See Documentation/devices.txt.
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
Matthew Wilcox wrote:
> On Thu, Jul 03, 2008 at 05:33:00PM +0900, Tejun Heo wrote:
>> I chose major 259 at my own whim. Is it okay to use this value? As
>
> No. You need to get it assigned by LANANA (Torben Mathiasen
> <[email protected]>). See Documentation/devices.txt.
Yep, already cc'd && the question was sorta directed to LANANA. :-)
--
tejun
Tejun Heo wrote:
> Hello, all.
>
> This patchset implements extended devt for block devices. This is
> mainly to work around sd limitations (16 minors per device) but can
> easily be used to allow more partitions or more devices.
I'm updating the patchset. Please stand by for a bit.
Thanks.
--
tejun