struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".
But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
bd_set_size+0x10/0xa0
loop_clr_fd+0x1f8/0x420 [loop]
lo_ioctl+0x200/0x7e0 [loop]
lo_compat_ioctl+0x47/0xe0 [loop]
compat_blkdev_ioctl+0x341/0x1290
do_filp_open+0x42/0xa0
compat_sys_ioctl+0xc1/0xf20
do_sys_open+0x16e/0x1d0
sysenter_dispatch+0x7/0x1a
To prevent use-after-free we need to hold device inode in loop_set_fd()
and put it later in loop_clr_fd().
The issue is reprodusible on current Linus head and v3.3. Here is the test:
dd if=/dev/zero of=loop.file bs=1M count=1
while [ true ]; do
losetup /dev/loop0 loop.file
echo 2 > /proc/sys/vm/drop_caches
losetup -d /dev/loop0
done
Signed-off-by: Anatol Pomozov <[email protected]>
---
drivers/block/loop.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index fe5f640..7ab9520 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -922,6 +922,14 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
lo->lo_flags |= LO_FLAGS_PARTSCAN;
if (lo->lo_flags & LO_FLAGS_PARTSCAN)
ioctl_by_bdev(bdev, BLKRRPART, 0);
+
+ /* bdev lifecycle is defined by its bd_inode (see
+ * struct bdev_inode usage). In case of loop device we need to make
+ * sure that bdev deallocation will not happen between loop_set_fd()
+ * and loop_clr_fd() invocations. To do this we need to hold
+ * bdev inode here and put it later in loop_clr_fd().
+ */
+ ihold(bdev->bd_inode);
return 0;
out_clr:
@@ -1031,8 +1039,11 @@ static int loop_clr_fd(struct loop_device *lo)
memset(lo->lo_encrypt_key, 0, LO_KEY_SIZE);
memset(lo->lo_crypt_name, 0, LO_NAME_SIZE);
memset(lo->lo_file_name, 0, LO_NAME_SIZE);
- if (bdev)
+ if (bdev) {
+ BUG_ON(atomic_read(&bdev->bd_inode->i_count) < 2);
+ iput(bdev->bd_inode);
invalidate_bdev(bdev);
+ }
set_capacity(lo->lo_disk, 0);
loop_sysfs_exit(lo);
if (bdev) {
--
1.8.1.3
On Mon, Apr 1, 2013 at 4:58 AM, Anatol Pomozov <[email protected]> wrote:
>
> To prevent use-after-free we need to hold device inode in loop_set_fd()
> and put it later in loop_clr_fd().
Is there something that guarantees that there's only one loop_set_fd()
and one paired loop_clr_fd()?
IOW, what protects us from somebody doing loop_clr_fd() on a device
that hasn't been set up yet, or doing multiple loop_set_fd calls?
I suspect the "lo->lo_state" is part of the answer, but it's very much
not obvious, and I'd want this to be explicitly explained.
Linus
On Mon, Apr 01, 2013 at 04:58:05AM -0700, Anatol Pomozov wrote:
> lo->lo_flags |= LO_FLAGS_PARTSCAN;
> if (lo->lo_flags & LO_FLAGS_PARTSCAN)
> ioctl_by_bdev(bdev, BLKRRPART, 0);
> +
> + /* bdev lifecycle is defined by its bd_inode (see
> + * struct bdev_inode usage). In case of loop device we need to make
> + * sure that bdev deallocation will not happen between loop_set_fd()
> + * and loop_clr_fd() invocations. To do this we need to hold
> + * bdev inode here and put it later in loop_clr_fd().
> + */
> + ihold(bdev->bd_inode);
That's open-coded bdgrab()
> + if (bdev) {
> + BUG_ON(atomic_read(&bdev->bd_inode->i_count) < 2);
> + iput(bdev->bd_inode);
... and that - bdput() (I'd drop that BUG_ON())
Hi
On Mon, Apr 1, 2013 at 9:28 AM, Al Viro <[email protected]> wrote:
> On Mon, Apr 01, 2013 at 04:58:05AM -0700, Anatol Pomozov wrote:
>> lo->lo_flags |= LO_FLAGS_PARTSCAN;
>> if (lo->lo_flags & LO_FLAGS_PARTSCAN)
>> ioctl_by_bdev(bdev, BLKRRPART, 0);
>> +
>> + /* bdev lifecycle is defined by its bd_inode (see
>> + * struct bdev_inode usage). In case of loop device we need to make
>> + * sure that bdev deallocation will not happen between loop_set_fd()
>> + * and loop_clr_fd() invocations. To do this we need to hold
>> + * bdev inode here and put it later in loop_clr_fd().
>> + */
>> + ihold(bdev->bd_inode);
>
> That's open-coded bdgrab()
Done. It also requires EXPORTING bdgrab.
The test still passes.
>
>> + if (bdev) {
>> + BUG_ON(atomic_read(&bdev->bd_inode->i_count) < 2);
>> + iput(bdev->bd_inode);
>
> ... and that - bdput() (I'd drop that BUG_ON())
struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".
But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
bd_set_size+0x10/0xa0
loop_clr_fd+0x1f8/0x420 [loop]
lo_ioctl+0x200/0x7e0 [loop]
lo_compat_ioctl+0x47/0xe0 [loop]
compat_blkdev_ioctl+0x341/0x1290
do_filp_open+0x42/0xa0
compat_sys_ioctl+0xc1/0xf20
do_sys_open+0x16e/0x1d0
sysenter_dispatch+0x7/0x1a
To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().
The issue is reprodusible on current Linus head and v3.3. Here is the test:
dd if=/dev/zero of=loop.file bs=1M count=1
while [ true ]; do
losetup /dev/loop0 loop.file
echo 2 > /proc/sys/vm/drop_caches
losetup -d /dev/loop0
done
Signed-off-by: Anatol Pomozov <[email protected]>
---
drivers/block/loop.c | 9 ++++++++-
fs/block_dev.c | 1 +
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index fe5f640..2c127f9 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -922,6 +922,11 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
lo->lo_flags |= LO_FLAGS_PARTSCAN;
if (lo->lo_flags & LO_FLAGS_PARTSCAN)
ioctl_by_bdev(bdev, BLKRRPART, 0);
+
+ /* Grab the block_device to prevent its destruction after we
+ * put /dev/loopXX inode. Later in loop_clr_fd() we bdput(bdev).
+ */
+ bdgrab(bdev);
return 0;
out_clr:
@@ -1031,8 +1036,10 @@ static int loop_clr_fd(struct loop_device *lo)
memset(lo->lo_encrypt_key, 0, LO_KEY_SIZE);
memset(lo->lo_crypt_name, 0, LO_NAME_SIZE);
memset(lo->lo_file_name, 0, LO_NAME_SIZE);
- if (bdev)
+ if (bdev) {
+ bdput(bdev);
invalidate_bdev(bdev);
+ }
set_capacity(lo->lo_disk, 0);
loop_sysfs_exit(lo);
if (bdev) {
diff --git a/fs/block_dev.c b/fs/block_dev.c
index aea605c..aae187a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -551,6 +551,7 @@ struct block_device *bdgrab(struct block_device *bdev)
ihold(bdev->bd_inode);
return bdev;
}
+EXPORT_SYMBOL(bdgrab);
long nr_blockdev_pages(void)
{
--
1.8.1.3
Hi
On Mon, Apr 1, 2013 at 8:16 AM, Linus Torvalds
<[email protected]> wrote:
> On Mon, Apr 1, 2013 at 4:58 AM, Anatol Pomozov <[email protected]> wrote:
>>
>> To prevent use-after-free we need to hold device inode in loop_set_fd()
>> and put it later in loop_clr_fd().
>
> Is there something that guarantees that there's only one loop_set_fd()
> and one paired loop_clr_fd()?
Yes there is such guarantee.
Every time we call loop_set_fd() we check that loop_device->lo_state
is Lo_unbound and set it to Lo_bound If somebody will try to set_fd
again it will get EBUSY. And if we try to loop_clr_fd() on unbound
loop device we'll get ENXIO.
loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
loop_device->lo_ctl_mutex.
>
> IOW, what protects us from somebody doing loop_clr_fd() on a device
> that hasn't been set up yet, or doing multiple loop_set_fd calls?
> I suspect the "lo->lo_state" is part of the answer, but it's very much
> not obvious, and I'd want this to be explicitly explained.
>
> Linus
On Mon, Apr 1, 2013 at 10:00 AM, Anatol Pomozov
<[email protected]> wrote:
> Hi
>
> On Mon, Apr 1, 2013 at 8:16 AM, Linus Torvalds
> <[email protected]> wrote:
>> On Mon, Apr 1, 2013 at 4:58 AM, Anatol Pomozov <[email protected]> wrote:
>>>
>>> To prevent use-after-free we need to hold device inode in loop_set_fd()
>>> and put it later in loop_clr_fd().
>>
>> Is there something that guarantees that there's only one loop_set_fd()
>> and one paired loop_clr_fd()?
>
> Yes there is such guarantee.
>
> Every time we call loop_set_fd() we check that loop_device->lo_state
> is Lo_unbound and set it to Lo_bound If somebody will try to set_fd
> again it will get EBUSY. And if we try to loop_clr_fd() on unbound
> loop device we'll get ENXIO.
>
> loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
> loop_device->lo_ctl_mutex.
Ok, good enough for me, I applied it, and it's commit
c1681bf8a7b1b98edee8b862a42c19c4e53205fd in my tree.
I assume it should go to stable too, because none of this is new, is
it? Did you check how far back this applies? I assume this goes back
pretty much forever, no?
Linus
Hi
On Mon, Apr 1, 2013 at 3:53 PM, Linus Torvalds
<[email protected]> wrote:
> On Mon, Apr 1, 2013 at 10:00 AM, Anatol Pomozov
> <[email protected]> wrote:
>> Hi
>>
>> On Mon, Apr 1, 2013 at 8:16 AM, Linus Torvalds
>> <[email protected]> wrote:
>>> On Mon, Apr 1, 2013 at 4:58 AM, Anatol Pomozov <[email protected]> wrote:
>>>>
>>>> To prevent use-after-free we need to hold device inode in loop_set_fd()
>>>> and put it later in loop_clr_fd().
>>>
>>> Is there something that guarantees that there's only one loop_set_fd()
>>> and one paired loop_clr_fd()?
>>
>> Yes there is such guarantee.
>>
>> Every time we call loop_set_fd() we check that loop_device->lo_state
>> is Lo_unbound and set it to Lo_bound If somebody will try to set_fd
>> again it will get EBUSY. And if we try to loop_clr_fd() on unbound
>> loop device we'll get ENXIO.
>>
>> loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
>> loop_device->lo_ctl_mutex.
>
> Ok, good enough for me, I applied it, and it's commit
> c1681bf8a7b1b98edee8b862a42c19c4e53205fd in my tree.
>
> I assume it should go to stable too, because none of this is new, is
> it? Did you check how far back this applies? I assume this goes back
> pretty much forever, no?
I bisected kernel using test from my commit and it points to
4c823cc3d568277aa6340d8df6981e34f4c4dee5 (appeared in kernel 3.2).
But even despite i cannot repro the crash on 3.0-stable, the
underlying issue (block_device is not locked) still exists there. So I
think patch should go to stable as well.
On Mon, Apr 01, 2013 at 10:58:55PM -0700, Anatol Pomozov wrote:
> >>
> >> loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
> >> loop_device->lo_ctl_mutex.
> >
> > Ok, good enough for me, I applied it, and it's commit
> > c1681bf8a7b1b98edee8b862a42c19c4e53205fd in my tree.
> >
> > I assume it should go to stable too, because none of this is new, is
> > it? Did you check how far back this applies? I assume this goes back
> > pretty much forever, no?
>
> I bisected kernel using test from my commit and it points to
> 4c823cc3d568277aa6340d8df6981e34f4c4dee5 (appeared in kernel 3.2).
>
> But even despite i cannot repro the crash on 3.0-stable, the
> underlying issue (block_device is not locked) still exists there. So I
> think patch should go to stable as well.
... except that you are doing invalidate *after* having done bdput. Which
is probably valid (we have the same bdev pinned down by opened file used
to issue the ioclt), but it's a really bad style; this should be in opposite
order.