The 'max_part' parameter controls the number of maximum partition
a loop block device can have. However if a user specifies very
large value it would exceed the limitation of device minor number
and can cause a kernel panic (or, at least, produce invalid
device nodes in some cases).
On my desktop system, following command kills the kernel. On qemu,
it triggers similar oops but the kernel was alive:
$ sudo modprobe loop max_part=200000
------------[ cut here ]------------
kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65!
invalid opcode: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in: loop(+)
Pid: 43, comm: insmod Tainted: G W 2.6.39-qemu+ #155 Bochs Bochs
RIP: 0010:[<ffffffff8113ce61>] [<ffffffff8113ce61>] internal_create_group+0x2a/0x170
RSP: 0018:ffff880007b3fde8 EFLAGS: 00000246
RAX: 00000000ffffffef RBX: ffff880007b3d878 RCX: 00000000000007b4
RDX: ffffffff8152da50 RSI: 0000000000000000 RDI: ffff880007b3d878
RBP: ffff880007b3fe38 R08: ffff880007b3fde8 R09: 0000000000000000
R10: ffff88000783b4a8 R11: ffff880007b3d878 R12: ffffffff8152da50
R13: ffff880007b3d868 R14: 0000000000000000 R15: ffff880007b3d800
FS: 0000000002137880(0063) GS:ffff880007c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000422680 CR3: 0000000007b50000 CR4: 00000000000006b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
Process insmod (pid: 43, threadinfo ffff880007b3e000, task ffff880007afb9c0)
Stack:
ffff880007b3fe58 ffffffff811e66dd ffff880007b3fe58 ffffffff811e570b
0000000000000010 ffff880007b3d800 ffff880007a7b390 ffff880007b3d868
0000000000400920 ffff880007b3d800 ffff880007b3fe48 ffffffff8113cfc8
Call Trace:
[<ffffffff811e66dd>] ? device_add+0x4bc/0x5af
[<ffffffff811e570b>] ? dev_set_name+0x3c/0x3e
[<ffffffff8113cfc8>] sysfs_create_group+0xe/0x12
[<ffffffff810b420e>] blk_trace_init_sysfs+0x14/0x16
[<ffffffff8116a090>] blk_register_queue+0x47/0xf7
[<ffffffff8116f527>] add_disk+0xdf/0x290
[<ffffffffa00060eb>] loop_init+0xeb/0x1b8 [loop]
[<ffffffffa0006000>] ? 0xffffffffa0005fff
[<ffffffff8100020a>] do_one_initcall+0x7a/0x12e
[<ffffffff81096804>] sys_init_module+0x9c/0x1e0
[<ffffffff813329bb>] system_call_fastpath+0x16/0x1b
Code: c3 55 48 89 e5 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 53 48 89 fb 48 83 ec 28 48 85 ff 74 0b 85 f6 75 0b 48 83 7f 30 00 75 14 <0f> 0b eb fe 48 83 7f 30 00 b9 ea ff ff ff 0f 84 18 01 00 00 49
RIP [<ffffffff8113ce61>] internal_create_group+0x2a/0x170
RSP <ffff880007b3fde8>
---[ end trace a123eb592043acad ]---
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Laurent Vivier <[email protected]>
---
drivers/block/loop.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index a076a14ca72d..cbf7052d1dd5 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1691,6 +1691,9 @@ static int __init loop_init(void)
if (max_part > 0)
part_shift = fls(max_part);
+ if ((1UL << part_shift) > DISK_MAX_PARTS)
+ return -EINVAL;
+
if (max_loop > 1UL << (MINORBITS - part_shift))
return -EINVAL;
--
1.7.5.2
When finding or allocating a loop device, loop_probe() did not take
partition numbers into account so that it can result to a different
device. Consider following example:
$ sudo modprobe loop max_part=15
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
$ sudo mknod /dev/loop8 b 7 128
$ sudo losetup /dev/loop8 ~/temp/disk-with-3-parts.img
$ sudo losetup -a
/dev/loop128: [0805]:278201 (/home/namhyung/temp/disk-with-3-parts.img)
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7, 2048 2011-05-24 22:18 /dev/loop128
brw-rw---- 1 root disk 7, 2049 2011-05-24 22:18 /dev/loop128p1
brw-rw---- 1 root disk 7, 2050 2011-05-24 22:18 /dev/loop128p2
brw-rw---- 1 root disk 7, 2051 2011-05-24 22:18 /dev/loop128p3
brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
brw-r--r-- 1 root root 7, 128 2011-05-24 22:17 /dev/loop8
After this patch, /dev/loop8 - instead of /dev/loop128 - was
accessed correctly.
In addition, 'range' passed to blk_register_region() should
include all range of dev_t that LOOP_MAJOR can address. It does
not need to be limited by partition numbers unless 'max_loop'
param was specified.
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Laurent Vivier <[email protected]>
---
drivers/block/loop.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index cbf7052d1dd5..c59a672a3de0 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1658,7 +1658,7 @@ static struct kobject *loop_probe(dev_t dev, int *part, void *data)
struct kobject *kobj;
mutex_lock(&loop_devices_mutex);
- lo = loop_init_one(dev & MINORMASK);
+ lo = loop_init_one(MINOR(dev) >> part_shift);
kobj = lo ? get_disk(lo->lo_disk) : ERR_PTR(-ENOMEM);
mutex_unlock(&loop_devices_mutex);
@@ -1699,10 +1699,10 @@ static int __init loop_init(void)
if (max_loop) {
nr = max_loop;
- range = max_loop;
+ range = max_loop << part_shift;
} else {
nr = 8;
- range = 1UL << (MINORBITS - part_shift);
+ range = 1UL << MINORBITS;
}
if (register_blkdev(LOOP_MAJOR, "loop"))
@@ -1741,7 +1741,7 @@ static void __exit loop_exit(void)
unsigned long range;
struct loop_device *lo, *next;
- range = max_loop ? max_loop : 1UL << (MINORBITS - part_shift);
+ range = max_loop ? max_loop << part_shift : 1UL << MINORBITS;
list_for_each_entry_safe(lo, next, &loop_devices, lo_list)
loop_del_one(lo);
--
1.7.5.2
On 2011-05-24 16:36, Namhyung Kim wrote:
> When finding or allocating a loop device, loop_probe() did not take
> partition numbers into account so that it can result to a different
> device. Consider following example:
>
> $ sudo modprobe loop max_part=15
> $ ls -l /dev/loop*
> brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0
> brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1
> brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2
> brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3
> brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4
> brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5
> brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6
> brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
> $ sudo mknod /dev/loop8 b 7 128
> $ sudo losetup /dev/loop8 ~/temp/disk-with-3-parts.img
> $ sudo losetup -a
> /dev/loop128: [0805]:278201 (/home/namhyung/temp/disk-with-3-parts.img)
> $ ls -l /dev/loop*
> brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0
> brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1
> brw-rw---- 1 root disk 7, 2048 2011-05-24 22:18 /dev/loop128
> brw-rw---- 1 root disk 7, 2049 2011-05-24 22:18 /dev/loop128p1
> brw-rw---- 1 root disk 7, 2050 2011-05-24 22:18 /dev/loop128p2
> brw-rw---- 1 root disk 7, 2051 2011-05-24 22:18 /dev/loop128p3
> brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2
> brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3
> brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4
> brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5
> brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6
> brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
> brw-r--r-- 1 root root 7, 128 2011-05-24 22:17 /dev/loop8
>
> After this patch, /dev/loop8 - instead of /dev/loop128 - was
> accessed correctly.
>
> In addition, 'range' passed to blk_register_region() should
> include all range of dev_t that LOOP_MAJOR can address. It does
> not need to be limited by partition numbers unless 'max_loop'
> param was specified.
Thanks, applied. Seems we should mark this for stable.
--
Jens Axboe
On 2011-05-24 16:36, Namhyung Kim wrote:
> The 'max_part' parameter controls the number of maximum partition
> a loop block device can have. However if a user specifies very
> large value it would exceed the limitation of device minor number
> and can cause a kernel panic (or, at least, produce invalid
> device nodes in some cases).
Applied as well, ditto in the stable comment.
--
Jens Axboe
2011-05-24 (화), 16:45 +0200, Jens Axboe:
> Thanks, applied. Seems we should mark this for stable.
>
Hi Jens,
The partition handling was introduced by commit 476a4813cfdd ("loop:
manage partitions in disk image") so that we need these patches for all
stable releases, AFAICS.
$ git name-rev --tags 476a4813cfdd
476a4813cfdd tags/v2.6.26-rc1~1115^2~9
Now I can see that brd (ramdisk) module seems to have same problems with
loop. I'll work on that, too.
Thanks.
--
Regards,
Namhyung Kim
On 2011-05-24 16:57, Namhyung Kim wrote:
> 2011-05-24 (화), 16:45 +0200, Jens Axboe:
>> Thanks, applied. Seems we should mark this for stable.
>>
>
> Hi Jens,
>
> The partition handling was introduced by commit 476a4813cfdd ("loop:
> manage partitions in disk image") so that we need these patches for all
> stable releases, AFAICS.
>
> $ git name-rev --tags 476a4813cfdd
> 476a4813cfdd tags/v2.6.26-rc1~1115^2~9
Thanks, I have committed with a stable marker.
> Now I can see that brd (ramdisk) module seems to have same problems with
> loop. I'll work on that, too.
Good catch, btw!
--
Jens Axboe
Le mardi 24 mai 2011 à 23:57 +0900, Namhyung Kim a écrit :
> 2011-05-24 (화), 16:45 +0200, Jens Axboe:
> > Thanks, applied. Seems we should mark this for stable.
> >
>
> Hi Jens,
>
> The partition handling was introduced by commit 476a4813cfdd ("loop:
> manage partitions in disk image") so that we need these patches for all
> stable releases, AFAICS.
>
> $ git name-rev --tags 476a4813cfdd
> 476a4813cfdd tags/v2.6.26-rc1~1115^2~9
>
> Now I can see that brd (ramdisk) module seems to have same problems with
> loop. I'll work on that, too.
Have a look to nbd.c too.
Regards,
Laurent
--
------------ [email protected] ------------
"We are the Borg. You will be assimilated. Your
biological and technological distinctiveness
will be added to our own. Resistance is futile."
Le mardi 24 mai 2011 à 23:36 +0900, Namhyung Kim a écrit :
> The 'max_part' parameter controls the number of maximum partition
> a loop block device can have. However if a user specifies very
> large value it would exceed the limitation of device minor number
> and can cause a kernel panic (or, at least, produce invalid
> device nodes in some cases).
>
> On my desktop system, following command kills the kernel. On qemu,
> it triggers similar oops but the kernel was alive:
>
> $ sudo modprobe loop max_part=200000
> ------------[ cut here ]------------
> kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65!
> invalid opcode: 0000 [#1] SMP
> last sysfs file:
> CPU 0
> Modules linked in: loop(+)
>
> Pid: 43, comm: insmod Tainted: G W 2.6.39-qemu+ #155 Bochs Bochs
> RIP: 0010:[<ffffffff8113ce61>] [<ffffffff8113ce61>] internal_create_group+0x2a/0x170
> RSP: 0018:ffff880007b3fde8 EFLAGS: 00000246
> RAX: 00000000ffffffef RBX: ffff880007b3d878 RCX: 00000000000007b4
> RDX: ffffffff8152da50 RSI: 0000000000000000 RDI: ffff880007b3d878
> RBP: ffff880007b3fe38 R08: ffff880007b3fde8 R09: 0000000000000000
> R10: ffff88000783b4a8 R11: ffff880007b3d878 R12: ffffffff8152da50
> R13: ffff880007b3d868 R14: 0000000000000000 R15: ffff880007b3d800
> FS: 0000000002137880(0063) GS:ffff880007c00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000422680 CR3: 0000000007b50000 CR4: 00000000000006b0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
> Process insmod (pid: 43, threadinfo ffff880007b3e000, task ffff880007afb9c0)
> Stack:
> ffff880007b3fe58 ffffffff811e66dd ffff880007b3fe58 ffffffff811e570b
> 0000000000000010 ffff880007b3d800 ffff880007a7b390 ffff880007b3d868
> 0000000000400920 ffff880007b3d800 ffff880007b3fe48 ffffffff8113cfc8
> Call Trace:
> [<ffffffff811e66dd>] ? device_add+0x4bc/0x5af
> [<ffffffff811e570b>] ? dev_set_name+0x3c/0x3e
> [<ffffffff8113cfc8>] sysfs_create_group+0xe/0x12
> [<ffffffff810b420e>] blk_trace_init_sysfs+0x14/0x16
> [<ffffffff8116a090>] blk_register_queue+0x47/0xf7
> [<ffffffff8116f527>] add_disk+0xdf/0x290
> [<ffffffffa00060eb>] loop_init+0xeb/0x1b8 [loop]
> [<ffffffffa0006000>] ? 0xffffffffa0005fff
> [<ffffffff8100020a>] do_one_initcall+0x7a/0x12e
> [<ffffffff81096804>] sys_init_module+0x9c/0x1e0
> [<ffffffff813329bb>] system_call_fastpath+0x16/0x1b
> Code: c3 55 48 89 e5 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 53 48 89 fb 48 83 ec 28 48 85 ff 74 0b 85 f6 75 0b 48 83 7f 30 00 75 14 <0f> 0b eb fe 48 83 7f 30 00 b9 ea ff ff ff 0f 84 18 01 00 00 49
> RIP [<ffffffff8113ce61>] internal_create_group+0x2a/0x170
> RSP <ffff880007b3fde8>
> ---[ end trace a123eb592043acad ]---
>
> Signed-off-by: Namhyung Kim <[email protected]>
> Cc: Laurent Vivier <[email protected]>
> ---
> drivers/block/loop.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index a076a14ca72d..cbf7052d1dd5 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1691,6 +1691,9 @@ static int __init loop_init(void)
> if (max_part > 0)
> part_shift = fls(max_part);
>
> + if ((1UL << part_shift) > DISK_MAX_PARTS)
> + return -EINVAL;
> +
looks fine.
Reviewed-by: Laurent Vivier <[email protected]>
> if (max_loop > 1UL << (MINORBITS - part_shift))
> return -EINVAL;
>
--
------------ [email protected] ------------
"We are the Borg. You will be assimilated. Your
biological and technological distinctiveness
will be added to our own. Resistance is futile."
2011-05-24 (화), 17:01 +0200, Laurent Vivier:
> Le mardi 24 mai 2011 à 23:57 +0900, Namhyung Kim a écrit :
> > 2011-05-24 (화), 16:45 +0200, Jens Axboe:
> > > Thanks, applied. Seems we should mark this for stable.
> > >
> >
> > Hi Jens,
> >
> > The partition handling was introduced by commit 476a4813cfdd ("loop:
> > manage partitions in disk image") so that we need these patches for all
> > stable releases, AFAICS.
> >
> > $ git name-rev --tags 476a4813cfdd
> > 476a4813cfdd tags/v2.6.26-rc1~1115^2~9
> >
> > Now I can see that brd (ramdisk) module seems to have same problems with
> > loop. I'll work on that, too.
>
> Have a look to nbd.c too.
>
> Regards,
> Laurent
>
Hi Laurent,
I'll take a look on that too after looking at brd. Thanks for the
comment.
--
Regards,
Namhyung Kim