2011-02-13 02:02:56

by Milan Broz

[permalink] [raw]
Subject: block device read-only handling regression in 2.6.38-rc4 (bisected)

Hi Tejun,

Seems this commit cause regressions:

commit 75f1dc0d076d1c1168f2115f1941ea627d38bd5a
Author: Tejun Heo <[email protected]>
Date: Sat Nov 13 11:55:17 2010 +0100

block: check bdev_read_only() from blkdev_get()

bdev read-only status can be queried using bdev_read_only() and may
change while the device is being opened. Enforce it by checking it
from blkdev_get() after open succeeds.


1) loop device once set read-only is not able to be used read-write afterward

touch /x1.img
losetup -r /dev/loop0 /x1.img
losetup -d /dev/loop0
losetup /dev/loop0 /x1.img
/dev/loop0: Permission denied


2) it breaks read-only dm-snapshots
(Fedora LiveCD operations is broken by this as well.)

(x.img is backing device, xs.img is prepared COW, you can simply run it
once in read-write to create dm-snap header and then re-run this commands)

losetup -r /dev/loop0 /x.img
losetup -r /dev/loop1 /xs.img
dmsetup create x --readonly --table '0 131072 snapshot /dev/loop0 /dev/loop1 p 8'
device-mapper: reload ioctl failed: Permission denied

Milan


2011-02-13 10:59:27

by Tao Ma

[permalink] [raw]
Subject: [PATCH] loop: clear read-only flag in loop_clr_fd.

From: Tao Ma <[email protected]>

In 75f1dc0, we check bdev_read_only() from blkdev_get().
But the loop_clr_fd doesn't clear the read only flag.
What cause a error if we changing a loop device from
read only to writable.

A simple test to reproduce the error reported by Milan[1]:
touch /x1.img
losetup -r /dev/loop0 /x1.img
losetup -d /dev/loop0
losetup /dev/loop0 /x1.img
/dev/loop0: Permission denied

1: http://marc.info/?l=linux-kernel&m=129756258222642&w=2

Reported-by: Milan Broz <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Tao Ma <[email protected]>
---
drivers/block/loop.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 44e18c0..0d24579 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1036,8 +1036,10 @@ static int loop_clr_fd(struct loop_device *lo, struct block_device *bdev)
memset(lo->lo_encrypt_key, 0, LO_KEY_SIZE);
memset(lo->lo_crypt_name, 0, LO_NAME_SIZE);
memset(lo->lo_file_name, 0, LO_NAME_SIZE);
- if (bdev)
+ if (bdev) {
+ set_device_ro(bdev, 0);
invalidate_bdev(bdev);
+ }
set_capacity(lo->lo_disk, 0);
loop_sysfs_exit(lo);
if (bdev) {
--
1.6.3.GIT

2011-02-13 14:11:26

by Milan Broz

[permalink] [raw]
Subject: Re: [PATCH] loop: clear read-only flag in loop_clr_fd.

On 02/13/2011 11:58 AM, Tao Ma wrote:
> From: Tao Ma <[email protected]>
>
> In 75f1dc0, we check bdev_read_only() from blkdev_get().
> But the loop_clr_fd doesn't clear the read only flag.
> What cause a error if we changing a loop device from
> read only to writable.

No, sorry, this is not proper/complete fix. It fixes it for loop
(and even not completely - you are missing some error
paths and ignoring autoclear mode where you have bdev NULL.)
(And yes, I tried the same as workaround.)

And it will not help for DM case (and possibly others).

Milan

2011-02-13 15:06:19

by Tao Ma

[permalink] [raw]
Subject: Re: [PATCH] loop: clear read-only flag in loop_clr_fd.

On 02/13/2011 10:11 PM, Milan Broz wrote:
> On 02/13/2011 11:58 AM, Tao Ma wrote:
>
>> From: Tao Ma<[email protected]>
>>
>> In 75f1dc0, we check bdev_read_only() from blkdev_get().
>> But the loop_clr_fd doesn't clear the read only flag.
>> What cause a error if we changing a loop device from
>> read only to writable.
>>
> No, sorry, this is not proper/complete fix. It fixes it for loop
> (and even not completely - you are missing some error
> paths and ignoring autoclear mode where you have bdev NULL.)
> (And yes, I tried the same as workaround.)
>
aha, sorry, I don't know you are more familiar with loop than me. ;)
I just did a quick test and sent the patch. So could you please tell me
a little more about how we use autoclear mode? I just googled but can't
find some helpful information. I will try to update my patch with a V2 when
I get familiar with the whole stuff.
> And it will not help for DM case (and possibly others).
>
Actually I don't think it is Tejun's patch that causes the bug. What his
patch do
is to expose some bugs that already exist. Say loop, it sets ro when it
get read
only flags, but doesn't clear it when it is detached. It should be
loop's problem,
not blkdev's. As for the DM case, I guess it should also be related to
DM part.

Regards,
Tao

2011-02-13 16:45:10

by Milan Broz

[permalink] [raw]
Subject: Re: [PATCH] loop: clear read-only flag in loop_clr_fd.

On 02/13/2011 04:05 PM, Tao Ma wrote:
> On 02/13/2011 10:11 PM, Milan Broz wrote:
>> On 02/13/2011 11:58 AM, Tao Ma wrote:
>>
>>> From: Tao Ma<[email protected]>
>>>
>>> In 75f1dc0, we check bdev_read_only() from blkdev_get().
>>> But the loop_clr_fd doesn't clear the read only flag.
>>> What cause a error if we changing a loop device from
>>> read only to writable.
>>>
>> No, sorry, this is not proper/complete fix. It fixes it for loop
>> (and even not completely - you are missing some error
>> paths and ignoring autoclear mode where you have bdev NULL.)
>> (And yes, I tried the same as workaround.)
>>
> aha, sorry, I don't know you are more familiar with loop than me. ;)
> I just did a quick test and sent the patch. So could you please tell me
> a little more about how we use autoclear mode?

When the autoclear flag is set, the loop device is deallocated with
the last close. So you can mount device over loop and after umount
the loop is automatically cleared (no need for losetup -d).
(I think this flag was not exported to losetup yet, so you need to use
ioctl flag yourself.)

> I will try to update my patch with a V2 when I get familiar with the whole stuff.

I would like Tejun tell us what the intention was here.
There are some paths which are not so clear (this one in loop device is easy),
so that code need to be audited.

> Actually I don't think it is Tejun's patch that causes the bug.

It is quite possible. But it worked before and this patch did not
fix these problems, so it is regression.

> Say loop, it sets ro when it get read only flags, but doesn't clear it when it is detached.

You can very easily create another bug here if you set device read-write
too early (while udev is still processing change/remove events).

Milan

2011-02-13 20:42:17

by Bodo Eggert

[permalink] [raw]
Subject: Re: [PATCH] loop: clear read-only flag in loop_clr_fd.

Tao Ma <[email protected]> wrote:

> --- a/drivers/block/loop.c

> - if (bdev)
> + if (bdev) {
> + set_device_ro(bdev, 0);
> invalidate_bdev(bdev);
> + }
> set_capacity(lo->lo_disk, 0);
> loop_sysfs_exit(lo);
> if (bdev) {

This looks like set_device_ro() should imply invalidate_bdev().
--
E.G.G.E.R.T.: Electronic Guardian Generated for
Efficient Repair and Troubleshooting
-- http://www.brunching.com/toys/toy-cyborger.html (down)
Friß, Spammer: [email protected] [email protected]

2011-02-14 10:30:33

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] loop: clear read-only flag in loop_clr_fd.

On Sun, Feb 13, 2011 at 05:44:59PM +0100, Milan Broz wrote:
> On 02/13/2011 04:05 PM, Tao Ma wrote:
> > On 02/13/2011 10:11 PM, Milan Broz wrote:
> >> On 02/13/2011 11:58 AM, Tao Ma wrote:
> >>
> >>> From: Tao Ma<[email protected]>
> >>>
> >>> In 75f1dc0, we check bdev_read_only() from blkdev_get().
> >>> But the loop_clr_fd doesn't clear the read only flag.
> >>> What cause a error if we changing a loop device from
> >>> read only to writable.
> >>>
> >> No, sorry, this is not proper/complete fix. It fixes it for loop
> >> (and even not completely - you are missing some error
> >> paths and ignoring autoclear mode where you have bdev NULL.)
> >> (And yes, I tried the same as workaround.)
> >>
> > aha, sorry, I don't know you are more familiar with loop than me. ;)
> > I just did a quick test and sent the patch. So could you please tell me
> > a little more about how we use autoclear mode?

Umm... This was reported some time ago and patches were already
posted. Milan, can you test whether the following two patches fix the
problems you're seeing? Jens, what's the status of these patches?

http://thread.gmane.org/gmane.linux.kernel/1090211/focus=1090666

Thanks.

--
tejun

2011-02-14 11:48:03

by Milan Broz

[permalink] [raw]
Subject: Re: [PATCH] loop: clear read-only flag in loop_clr_fd.


On 02/14/2011 11:30 AM, Tejun Heo wrote:
> Umm... This was reported some time ago and patches were already
> posted. Milan, can you test whether the following two patches fix the
> problems you're seeing? Jens, what's the status of these patches?
>
> http://thread.gmane.org/gmane.linux.kernel/1090211/focus=1090666
>
With patch below (loop cannot be built as module) it fixes the loop problem.

But it doesn't fix the read-only snapshot issue and I guess there will be
the same problem with read-only MD code too.
(so the 2) issue here https://lkml.org/lkml/2011/2/12/209).

If the call is changed intentionally, we have to fix unconditional blkdev open
calls with read-write flag in this code.
Before doing that I would like to know if it was intentional change or not...

You can simple try this reproducer (works on older kernel, second readonly
snapshot create fails now with permission denied)
+ dmsetup create x --readonly --table '0 131072 snapshot /dev/loop0 /dev/loop1 p 8'
device-mapper: reload ioctl failed: Permission denied

#!/bin/bash -x
modprobe loop

dd if=/dev/zero of=/x.img bs=1M count=64
dd if=/dev/zero of=/xs.img bs=1M count=64
losetup /dev/loop0 /x.img
losetup /dev/loop1 /xs.img
sync
dmsetup create x --table "0 131072 snapshot /dev/loop0 /dev/loop1 p 8"
udevadm settle
dmsetup remove x

losetup -d /dev/loop0
losetup -d /dev/loop1
losetup -r /dev/loop0 /x.img
losetup -r /dev/loop1 /xs.img
dmsetup create x --readonly --table "0 131072 snapshot /dev/loop0 /dev/loop1 p 8"
dmsetup table

dmsetup remove x
losetup -d /dev/loop0
losetup -d /dev/loop1

Milan

--
Export bdgrap to allow loop module build

Signed-off-by: Milan Broz <[email protected]>
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 333a7bb..c9cf9f7 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -574,6 +574,7 @@ struct block_device *bdgrab(struct block_device *bdev)
ihold(bdev->bd_inode);
return bdev;
}
+EXPORT_SYMBOL(bdgrab);

long nr_blockdev_pages(void)
{

2011-02-14 13:15:12

by Milan Broz

[permalink] [raw]
Subject: [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

> But it doesn't fix the read-only snapshot issue and I guess there will be
> the same problem with read-only MD code too.
> (so the 2) issue here https://lkml.org/lkml/2011/2/12/209).
I am not sure if this is complete fix... note that:
- what happens during mirror resync and read-only log?
- for COW, it there situation we need to update header in read-oly mode? (invalidated snap?)
Milan
--

[RFC] Do not open log and cow device read-write for read-only mappings

Signed-off-by: Milan Broz <[email protected]>

diff --git a/drivers/md/dm-log.c b/drivers/md/dm-log.c
index 6951536..8e8a868 100644
--- a/drivers/md/dm-log.c
+++ b/drivers/md/dm-log.c
@@ -543,7 +543,7 @@ static int disk_ctr(struct dm_dirty_log *log, struct dm_target *ti,
return -EINVAL;
}

- r = dm_get_device(ti, argv[0], FMODE_READ | FMODE_WRITE, &dev);
+ r = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &dev);
if (r)
return r;

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index fdde53c..a2d3309 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1080,7 +1080,7 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
argv++;
argc--;

- r = dm_get_device(ti, cow_path, FMODE_READ | FMODE_WRITE, &s->cow);
+ r = dm_get_device(ti, cow_path, dm_table_get_mode(ti->table), &s->cow);
if (r) {
ti->error = "Cannot get COW device";
goto bad_cow;

2011-02-14 14:08:06

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] loop: clear read-only flag in loop_clr_fd.

Hello, Milan.

On Mon, Feb 14, 2011 at 12:47:48PM +0100, Milan Broz wrote:
> With patch below (loop cannot be built as module) it fixes the loop problem.

Okay.

> But it doesn't fix the read-only snapshot issue and I guess there will be
> the same problem with read-only MD code too.
> (so the 2) issue here https://lkml.org/lkml/2011/2/12/209).
>
> If the call is changed intentionally, we have to fix unconditional blkdev open
> calls with read-write flag in this code.
> Before doing that I would like to know if it was intentional change or not...

Yeap, the change was intentional. It was part of effort to make bdev
usages more consistent as before there was no mechanism enforcing ro.
It's still problematic as bdev users can clear ro without consulting
the actual device driver. Device driver's ->open() is called w/ ro
flag but the resulting bdev can be used rw. I wanted to remove that
too but filesystems depend on it so maybe later.

Thanks.

--
tejun

2011-02-14 14:09:48

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

Hello,

On Mon, Feb 14, 2011 at 02:14:57PM +0100, Milan Broz wrote:
> > But it doesn't fix the read-only snapshot issue and I guess there will be
> > the same problem with read-only MD code too.
> > (so the 2) issue here https://lkml.org/lkml/2011/2/12/209).

So, the problem is caused by dm opening members rw even for ro
devices, right?

> I am not sure if this is complete fix... note that:
> - what happens during mirror resync and read-only log?
> - for COW, it there situation we need to update header in read-oly mode? (invalidated snap?)

But if the underlying device is marked ro, dm shouldn't update it at
all. The device should be opened ro and ro policy should be enforced.

Thanks.

--
tejun

2011-02-14 14:23:35

by Milan Broz

[permalink] [raw]
Subject: Re: [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

On 02/14/2011 03:09 PM, Tejun Heo wrote:
> On Mon, Feb 14, 2011 at 02:14:57PM +0100, Milan Broz wrote:
>>> But it doesn't fix the read-only snapshot issue and I guess there will be
>>> the same problem with read-only MD code too.
>>> (so the 2) issue here https://lkml.org/lkml/2011/2/12/209).
>
> So, the problem is caused by dm opening members rw even for ro
> devices, right?

The patch uncover these shortcomings in code.
(Unfortunately quite late in RC...)

>> I am not sure if this is complete fix... note that:
>> - what happens during mirror resync and read-only log?
>> - for COW, it there situation we need to update header in read-oly mode? (invalidated snap?)
>
> But if the underlying device is marked ro, dm shouldn't update it at
> all. The device should be opened ro and ro policy should be enforced.

Sure. So we need to check these situations I described.


Btw the same pattern is in MD code in lock_rdev() ...

Milan

2011-02-14 14:39:20

by Alasdair G Kergon

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

On Mon, Feb 14, 2011 at 03:09:40PM +0100, Tejun Heo wrote:
> But if the underlying device is marked ro, dm shouldn't update it at
> all. The device should be opened ro and ro policy should be enforced.

Indeed, but dm isn't tracking this today because it didn't need to up-to
now.

I can think of a few scenarios where dm can have the underlying device open
read-write when it only needs read-only. (E.g. we track and cater for
read-only->read-write transitions but not the opposite and don't propagate
changes through the stack.)

We'll do an audit...

Alasdair

2011-02-14 15:44:39

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

Hello,

On Mon, Feb 14, 2011 at 03:23:20PM +0100, Milan Broz wrote:
> >> I am not sure if this is complete fix... note that:
> >> - what happens during mirror resync and read-only log?
> >> - for COW, it there situation we need to update header in read-oly mode? (invalidated snap?)
> >
> > But if the underlying device is marked ro, dm shouldn't update it at
> > all. The device should be opened ro and ro policy should be enforced.
>
> Sure. So we need to check these situations I described.

Yeap, it seems dm folks are gonna take care of dm part.

> Btw the same pattern is in MD code in lock_rdev() ...

Indeed, cc'ing Neil. Hi, the whole thread can be read from the
following URL.

http://thread.gmane.org/gmane.linux.kernel/1099399/focus=1099735

blkdev_get() now rejects rw open of devices which are marked
read-only. I think the right thing to do would be opening the member
devices ro if the array is assembled for ro access (similar to Milan's
patch for dm). How does that sound?

Thanks.

--
tejun

2011-02-14 23:15:21

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

On Mon, 14 Feb 2011 16:44:30 +0100 Tejun Heo <[email protected]> wrote:

> Hello,
>
> On Mon, Feb 14, 2011 at 03:23:20PM +0100, Milan Broz wrote:
> > >> I am not sure if this is complete fix... note that:
> > >> - what happens during mirror resync and read-only log?
> > >> - for COW, it there situation we need to update header in read-oly mode? (invalidated snap?)
> > >
> > > But if the underlying device is marked ro, dm shouldn't update it at
> > > all. The device should be opened ro and ro policy should be enforced.
> >
> > Sure. So we need to check these situations I described.
>
> Yeap, it seems dm folks are gonna take care of dm part.
>
> > Btw the same pattern is in MD code in lock_rdev() ...
>
> Indeed, cc'ing Neil. Hi, the whole thread can be read from the
> following URL.
>
> http://thread.gmane.org/gmane.linux.kernel/1099399/focus=1099735
>
> blkdev_get() now rejects rw open of devices which are marked
> read-only. I think the right thing to do would be opening the member
> devices ro if the array is assembled for ro access (similar to Milan's
> patch for dm). How does that sound?
>
> Thanks.
>

Sounds sensible ... though it is not all that easy to assemble an
array as 'read-only'.... it is possible though.

When the array is switched to read-write, do I have to call blkdev_get again
asking for rw access, then close the old blkdev, or can I 'upgrade'?

If a device has multiple opens: some read-only and some read-write, can I
find out when the last read-write close is gone? That would be really useful,
especially if a filesystem down-graded its open to read-only when it is
remounted read-only..

[[And if filesystems could be convinced to open the device read-only when the
fs is mounted read-only (and just do journal replay to internal data
structures) that would be really awesome!!]]

NeilBrown

2011-02-15 02:04:10

by Alasdair G Kergon

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

On Tue, Feb 15, 2011 at 10:15:06AM +1100, Neil Brown wrote:
> Sounds sensible ... though it is not all that easy to assemble an
> array as 'read-only'.... it is possible though.

For dm, it is looking like this change will *require* an upgrade to
userspace LVM tools: some people who just update their kernels without
updating their LVM tools too may find booting their machine fails. We
are still evaluating exactly which parts of LVM and which classes of
users are affected and how best to deal with this, but I know from
experience that changes where a kernel update requires an associated
userspace update are never well-received and we would normally try to
give plenty of lead time by updating the userspace tools and starting to
get them into the distributions before imposing the kernel change.

> When the array is switched to read-write, do I have to call blkdev_get again
> asking for rw access, then close the old blkdev, or can I 'upgrade'?

In dm we upgrade_mode() as you describe.

> If a device has multiple opens: some read-only and some read-write, can I
> find out when the last read-write close is gone? That would be really useful,
> especially if a filesystem down-graded its open to read-only when it is
> remounted read-only..

Likewise, dm has no mechanism for downgrading as yet.

Alasdair

2011-02-15 12:18:11

by Milan Broz

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

On 02/15/2011 03:03 AM, Alasdair G Kergon wrote:
> On Tue, Feb 15, 2011 at 10:15:06AM +1100, Neil Brown wrote:
>> Sounds sensible ... though it is not all that easy to assemble an
>> array as 'read-only'.... it is possible though.
>
> For dm, it is looking like this change will *require* an upgrade to
> userspace LVM tools: some people who just update their kernels without
> updating their LVM tools too may find booting their machine fails. We
> are still evaluating exactly which parts of LVM and which classes of
> users are affected and how best to deal with this, but I know from
> experience that changes where a kernel update requires an associated
> userspace update are never well-received and we would normally try to
> give plenty of lead time by updating the userspace tools and starting to
> get them into the distributions before imposing the kernel change.

This little change is really problematic.

Not only lvm userspace has problems here, also cryptsetup is broken
for read-only mappings.

Of course this it can be easily fixed, but it take some time until
the new userspace is propagated to distros.

Seems it changed userspace API here. For example, this is no longer
working now:

fd = open(device, O_RDWR | flags);
if (fd == -1 && errno == EROFS) {
*readonly = 1;
fd = open(device, O_RDONLY | flags);
}


Milan

2011-02-15 12:46:44

by Alasdair G Kergon

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

On Tue, Feb 15, 2011 at 01:17:56PM +0100, Milan Broz wrote:
> fd = open(device, O_RDWR | flags);
> if (fd == -1 && errno == EROFS) {
> *readonly = 1;
> fd = open(device, O_RDONLY | flags);
> }

Would it be reasonable for your patch to return EROFS rather than
EACCES?

man 2 open:
int open(const char *pathname, int flags, mode_t mode);

EROFS pathname refers to a file on a read-only file system and write
access was requested.

EACCES The requested access to the file is not allowed, or search per‐
mission is denied for one of the directories in the path prefix
of pathname, or the file did not exist yet and write access to
the parent directory is not allowed. (See also path_resolu‐
tion(7).)

Alasdair

2011-02-15 15:16:37

by Milan Broz

[permalink] [raw]
Subject: [PATCH] Return EROFS if read-only detected on block device

This allows userspace to distinguish what is going on.

EACCES is returned when user lacks required capability,
not that device is read-only.

Signed-off-by: Milan Broz <[email protected]>
---
fs/block_dev.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index c9cf9f7..db2c8db 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1219,7 +1219,7 @@ int blkdev_get(struct block_device *bdev, fmode_t mode, void *holder)
/* __blkdev_get() may alter read only status, check it afterwards */
if (!res && (mode & FMODE_WRITE) && bdev_read_only(bdev)) {
__blkdev_put(bdev, mode, 0);
- res = -EACCES;
+ res = -EROFS;
}

if (whole) {


2011-02-15 15:25:50

by Tejun Heo

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

Hello,

On Tue, Feb 15, 2011 at 12:46:29PM +0000, Alasdair G Kergon wrote:
> On Tue, Feb 15, 2011 at 01:17:56PM +0100, Milan Broz wrote:
> > fd = open(device, O_RDWR | flags);
> > if (fd == -1 && errno == EROFS) {
> > *readonly = 1;
> > fd = open(device, O_RDONLY | flags);
> > }
>
> Would it be reasonable for your patch to return EROFS rather than
> EACCES?
>
> man 2 open:
> int open(const char *pathname, int flags, mode_t mode);
>
> EROFS pathname refers to a file on a read-only file system and write
> access was requested.
>
> EACCES The requested access to the file is not allowed, or search per‐
> mission is denied for one of the directories in the path prefix
> of pathname, or the file did not exist yet and write access to
> the parent directory is not allowed. (See also path_resolu‐
> tion(7).)

Hmmm... but -EACCES is the correct one here. The device node itself
is rejecting RW access. There's no FS which is enforcing RO. But if
the userland is already expecting that, dm can simply change the error
code, no?

Thanks.

--
tejun

2011-02-15 15:46:38

by Alasdair G Kergon

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

On Tue, Feb 15, 2011 at 04:20:33PM +0100, Tejun Heo wrote:
> Hmmm... but -EACCES is the correct one here. The device node itself
> is rejecting RW access. There's no FS which is enforcing RO.

Exactly:) If the filesystem permissions were what was blocking this
(say r--) then I'd agree with EACCES. Interpret those man pages in the
context of 'pathname refers to a block device not a file'.

If it's EACCES, I just need to gain more privilege/capabilities and then
repeat the system call and it could succeed.

But EROFS tells me however much extra privilege I get it's going to make
no difference.

That's why I'm arguing EACCES is not a good error to return and EROFS is
more appropriate.

Alasdair

2011-02-15 15:50:26

by Tejun Heo

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

Hello,

On Tue, Feb 15, 2011 at 03:46:25PM +0000, Alasdair G Kergon wrote:
> Exactly:) If the filesystem permissions were what was blocking this
> (say r--) then I'd agree with EACCES. Interpret those man pages in the
> context of 'pathname refers to a block device not a file'.
>
> If it's EACCES, I just need to gain more privilege/capabilities and then
> repeat the system call and it could succeed.
>
> But EROFS tells me however much extra privilege I get it's going to make
> no difference.
>
> That's why I'm arguing EACCES is not a good error to return and EROFS is
> more appropriate.

Frankly, I don't really mind one way or the other but EROFS isn't
usually used in those areas. It might make sense for this use case
and then there will be cases it just feels awkward. This being a dm
thing, wouldn't it be just better to let dm massage the return value?

Thanks.

--
tejun

2011-02-15 16:06:06

by Milan Broz

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

On 02/15/2011 04:50 PM, Tejun Heo wrote:
>> That's why I'm arguing EACCES is not a good error to return and EROFS is
>> more appropriate.
>
> Frankly, I don't really mind one way or the other but EROFS isn't
> usually used in those areas. It might make sense for this use case
> and then there will be cases it just feels awkward. This being a dm
> thing, wouldn't it be just better to let dm massage the return value?

It is not DM thing. That code was checking for generic block device.
No DM there (it was from cryptsetup code but not related to DM part).

Yes, code is not perfect but it worked for >5 years. How many userspace
programs it breaks now?

Milan

2011-02-15 16:13:13

by Tejun Heo

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

On Tue, Feb 15, 2011 at 05:05:48PM +0100, Milan Broz wrote:
> On 02/15/2011 04:50 PM, Tejun Heo wrote:
> >> That's why I'm arguing EACCES is not a good error to return and EROFS is
> >> more appropriate.
> >
> > Frankly, I don't really mind one way or the other but EROFS isn't
> > usually used in those areas. It might make sense for this use case
> > and then there will be cases it just feels awkward. This being a dm
> > thing, wouldn't it be just better to let dm massage the return value?
>
> It is not DM thing. That code was checking for generic block device.
> No DM there (it was from cryptsetup code but not related to DM part).

Hmmm... I'm confused now. Where was that -EROFS from then? I don't
recall changing -EROFS to -EACCES. What did I miss?

--
tejun

2011-02-15 16:36:44

by Milan Broz

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

On 02/15/2011 05:12 PM, Tejun Heo wrote:
> On Tue, Feb 15, 2011 at 05:05:48PM +0100, Milan Broz wrote:
>> On 02/15/2011 04:50 PM, Tejun Heo wrote:
>>>> That's why I'm arguing EACCES is not a good error to return and EROFS is
>>>> more appropriate.
>>>
>>> Frankly, I don't really mind one way or the other but EROFS isn't
>>> usually used in those areas. It might make sense for this use case
>>> and then there will be cases it just feels awkward. This being a dm
>>> thing, wouldn't it be just better to let dm massage the return value?
>>
>> It is not DM thing. That code was checking for generic block device.
>> No DM there (it was from cryptsetup code but not related to DM part).
>
> Hmmm... I'm confused now. Where was that -EROFS from then? I don't
> recall changing -EROFS to -EACCES. What did I miss?

Well, I am also not sure about that.

But the problem is that read-write open fails now while it worked before.
(TBH I have no idea when that EROFS fallback worked - because the code
opened device RW, issued EROGET ioctl and set read-only... for years.)

Anyway I think EROFS is used on block devices, just grep kernel source.

Milan

2011-02-15 16:41:36

by Tejun Heo

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

Hello,

On Tue, Feb 15, 2011 at 05:36:31PM +0100, Milan Broz wrote:
> Well, I am also not sure about that.
>
> But the problem is that read-write open fails now while it worked before.
> (TBH I have no idea when that EROFS fallback worked - because the code
> opened device RW, issued EROGET ioctl and set read-only... for years.)
>
> Anyway I think EROFS is used on block devices, just grep kernel source.

Ah, okay, so the fallback was there just in case. It didn't really
trigger and right it wouldn't have triggered until now, so your
assertion about how many programs would break is kinda bogus. You
just have single isolated case which hasn't been excercised till now.
There may as well be code pieces which check against EACCES or what
not.

That said, maybe -EROFS is the better fit. I really have no idea.
Maybe we should just revert and leave rw accesses to ro block devices
alone. Jens, what do you think?

Thanks.

--
tejun

2011-02-15 16:56:32

by Alasdair G Kergon

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

On Tue, Feb 15, 2011 at 05:41:28PM +0100, Tejun Heo wrote:
> That said, maybe -EROFS is the better fit. I really have no idea.
> Maybe we should just revert and leave rw accesses to ro block devices
> alone.

I'd agree that the change to enforce readonly devs is a desirable change
to make. But we're still discovering exactly what things it breaks and
as well as further kernel patches some userspace changes seem
unavoidable. Personally I'd prefer it if this change went back into
linux-next to give us more time to prepare for it cleanly. It's
unfortunate none of us picked up on these problems sooner.

Alasdair

2011-02-15 16:58:59

by Milan Broz

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings


On 02/15/2011 05:41 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Feb 15, 2011 at 05:36:31PM +0100, Milan Broz wrote:
>> Well, I am also not sure about that.
>>
>> But the problem is that read-write open fails now while it worked before.
>> (TBH I have no idea when that EROFS fallback worked - because the code
>> opened device RW, issued EROGET ioctl and set read-only... for years.)
>>
>> Anyway I think EROFS is used on block devices, just grep kernel source.
>
> Ah, okay, so the fallback was there just in case. It didn't really
> trigger and right it wouldn't have triggered until now, so your
> assertion about how many programs would break is kinda bogus. You
> just have single isolated case which hasn't been excercised till now.
> There may as well be code pieces which check against EACCES or what
> not.

If you want another example, here is MD one.

# blockdev --setrw /dev/sd[bcde]
# mdadm -A /dev/md0 /dev/sd[bcde]
mdadm: /dev/md0 has been started with 4 drives.
# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
# blockdev --setro /dev/sd[bcde]
# mdadm -A /dev/md0 /dev/sd[bcde]
mdadm: cannot re-read metadata from /dev/sdb - aborting

Works on 2.6.36.

Thanks,
Milan

2011-02-16 08:40:04

by Tejun Heo

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

On Tue, Feb 15, 2011 at 05:58:45PM +0100, Milan Broz wrote:
> # blockdev --setrw /dev/sd[bcde]
> # mdadm -A /dev/md0 /dev/sd[bcde]
> mdadm: /dev/md0 has been started with 4 drives.
> # mdadm --stop /dev/md0
> mdadm: stopped /dev/md0
> # blockdev --setro /dev/sd[bcde]
> # mdadm -A /dev/md0 /dev/sd[bcde]
> mdadm: cannot re-read metadata from /dev/sdb - aborting
>
> Works on 2.6.36.

Isn't failing to assemble rw array from ro devices the expected
behavior? I think I'd want that to fail. Neil, what do you think?

--
tejun

2011-02-16 08:46:24

by Tejun Heo

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH][RFC] dm: Do not open log and cow device read-write for read-only mappings

Hello, Alasdair.

On Tue, Feb 15, 2011 at 04:56:18PM +0000, Alasdair G Kergon wrote:
> On Tue, Feb 15, 2011 at 05:41:28PM +0100, Tejun Heo wrote:
> > That said, maybe -EROFS is the better fit. I really have no idea.
> > Maybe we should just revert and leave rw accesses to ro block devices
> > alone.
>
> I'd agree that the change to enforce readonly devs is a desirable change
> to make. But we're still discovering exactly what things it breaks and
> as well as further kernel patches some userspace changes seem
> unavoidable. Personally I'd prefer it if this change went back into
> linux-next to give us more time to prepare for it cleanly. It's
> unfortunate none of us picked up on these problems sooner.

Yeah, yeah, I think we need revert. It's a bit too late to sort this
out in this cycle, but, then again, can we ever change this? Given
that the 'ro' part is hardly used with md, I think dm probably is the
only one which has noticeable problem with this change.

No matter when we change it, it's gonna be visible to userland. To
me, most of the changes seem like bug fixes and pretty important ones
at that. As it currently stands, the kernel goes behind the user's
back and issues writes to devices which the user believes to have
allowed only ro access.

So, if dm can figure out how to deal with this, it would be great.

Thanks.

--
tejun