2020-12-21 22:54:07

by Stephen Rothwell

[permalink] [raw]
Subject: linux-next: manual merge of the device-mapper tree with Linus' tree

Hi all,

Today's linux-next merge of the device-mapper tree got a conflict in:

drivers/md/dm-table.c

between commit:

4e7b5671c6a8 ("block: remove i_bdev")

from Linus' tree and commit:

05a876e9a2a8 ("dm table: avoid filesystem lookup in dm_get_dev_t()")

from the device-mapper tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging. You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

--
Cheers,
Stephen Rothwell

diff --cc drivers/md/dm-table.c
index 188f41287f18,299a256fb409..000000000000
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@@ -347,9 -347,22 +347,15 @@@ static int upgrade_mode(struct dm_dev_i
dev_t dm_get_dev_t(const char *path)
{
dev_t dev;
- struct block_device *bdev;
+ unsigned int maj, min;

+ if (sscanf(path, "%u:%u", &maj, &min) == 2) {
+ dev = MKDEV(maj, min);
+ if (maj == MAJOR(dev) && min == MINOR(dev))
+ return dev;
+ }
- bdev = lookup_bdev(path);
- if (IS_ERR(bdev))
+ if (lookup_bdev(path, &dev))
dev = name_to_dev_t(path);
- else {
- dev = bdev->bd_dev;
- bdput(bdev);
- }
-
return dev;
}
EXPORT_SYMBOL_GPL(dm_get_dev_t);


Attachments:
(No filename) (499.00 B)
OpenPGP digital signature

2020-12-22 13:18:36

by Christoph Hellwig

[permalink] [raw]
Subject: Re: linux-next: manual merge of the device-mapper tree with Linus' tree

Mike, Hannes,

I think this patch is rather harmful. Why does device mapper even
mix file system path with a dev_t and all the other weird forms
parsed by name_to_dev_t, which was supposed to be be for the early
init code where no file system is available.

Can we please kick off a proper discussion for this on the linux-block
list?

2020-12-22 14:57:38

by Mike Snitzer

[permalink] [raw]
Subject: DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]

[added linux-block and dm-devel, if someone replies to this email to
continue "proper discussion" _please_ at least drop sfr and linux-next
from Cc]

On Tue, Dec 22 2020 at 8:15am -0500,
Christoph Hellwig <[email protected]> wrote:

> Mike, Hannes,
>
> I think this patch is rather harmful. Why does device mapper even
> mix file system path with a dev_t and all the other weird forms
> parsed by name_to_dev_t, which was supposed to be be for the early
> init code where no file system is available.

OK, I'll need to revisit (unless someone beats me to it) because this
could've easily been a blind-spot for me when the dm-init code went in.
Any dm-init specific enabling interface shouldn't be used by more
traditional DM interfaces. So Hannes' change might be treating symptom
rather than the core problem (which would be better treated by factoring
out dm-init requirements for a name_to_dev_t()-like interface?).

DM has supported passing maj:min and blockdev names on DM table lines
forever... so we'll need to be very specific about where/why things
regressed.

> Can we please kick off a proper discussion for this on the linux-block
> list?

Sure, done. But I won't drive that discussion in the near-term. I need
to take some time off for a few weeks.

In the meantime I'll drop Hannes' patch for 5.11; I'm open to an
alternative fix that I'd pickup during 5.11-rcX.

Thanks,
Mike

2020-12-22 17:25:46

by Hannes Reinecke

[permalink] [raw]
Subject: Re: DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]

On 12/22/20 3:53 PM, Mike Snitzer wrote:
> [added linux-block and dm-devel, if someone replies to this email to
> continue "proper discussion" _please_ at least drop sfr and linux-next
> from Cc]
>
> On Tue, Dec 22 2020 at 8:15am -0500,
> Christoph Hellwig <[email protected]> wrote:
>
>> Mike, Hannes,
>>
>> I think this patch is rather harmful. Why does device mapper even
>> mix file system path with a dev_t and all the other weird forms
>> parsed by name_to_dev_t, which was supposed to be be for the early
>> init code where no file system is available.
>
> OK, I'll need to revisit (unless someone beats me to it) because this
> could've easily been a blind-spot for me when the dm-init code went in.
> Any dm-init specific enabling interface shouldn't be used by more
> traditional DM interfaces. So Hannes' change might be treating symptom
> rather than the core problem (which would be better treated by factoring
> out dm-init requirements for a name_to_dev_t()-like interface?).
>
> DM has supported passing maj:min and blockdev names on DM table lines
> forever... so we'll need to be very specific about where/why things
> regressed.
>

Ok. The problem from my perspective is that device-mapper needs to
a) ensure that the arbitrary string passed in with the table definition
refers to a valid block device
and
b) the block device can be opened with O_EXCL, so that device-mapper can
then use it.

Originally (ie prior to commit 644bda6f3460) dm_get_device() just
converted the string to a 'dev_t' representation, and then the block
device itself was checked and opened in dm_get_table_device().
'lookup_bdev' was just being used to convert the path if the string was
not in the canonical major:minor format, as then it was assumed that it
referred to a block device node, and then lookup_bdev kinda makes sense.

However, lookup_bdev() now always recurses into the filesystem, causing
multipath to stall in an all-paths-down scenario.

So, the real issue is the table definiton; as it also accepts a device
to be specified by the block device _node_ name, we need to have a way
of converting that into a dev_t.

If lookup_bdev() is the wrong interface for that, by all means, please,
do tell me. I'd be happy to draft up a patch.

Alternatively, if Mike says that only major:minor is the valid format
for a table definition we can kill that code completely. But clearly _I_
cannot make the call here.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

2020-12-22 21:09:10

by Alasdair G Kergon

[permalink] [raw]
Subject: Re: [dm-devel] DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]

On Tue, Dec 22, 2020 at 06:24:09PM +0100, Hannes Reinecke wrote:
> However, lookup_bdev() now always recurses into the filesystem, causing
> multipath to stall in an all-paths-down scenario.

I have not read the background about whatever the new problem is - I'm
jumping in cold seeing this message - but from the very beginning of
device-mapper we have strongly recommended that userspace supplies the
block device in the form MAJOR:MINOR and all our own tools do that. We
guarantee not to deadlock in these places when this is done.

We also accept the device in the form of a path name as we know there
are times when this is safe and convenient, but then we offer no
guarantees - we place the responsibility upon userspace only to do this
when it knows it is safe to do so i.e. no race and no deadlock.

Alasdair

2020-12-23 08:08:14

by Christoph Hellwig

[permalink] [raw]
Subject: Re: DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]

On Tue, Dec 22, 2020 at 06:24:09PM +0100, Hannes Reinecke wrote:
> Ok. The problem from my perspective is that device-mapper needs to
> a) ensure that the arbitrary string passed in with the table definition
> refers to a valid block device
> and
> b) the block device can be opened with O_EXCL, so that device-mapper can
> then use it.
>
> Originally (ie prior to commit 644bda6f3460) dm_get_device() just converted
> the string to a 'dev_t' representation, and then the block device itself
> was checked and opened in dm_get_table_device().
> 'lookup_bdev' was just being used to convert the path if the string was not
> in the canonical major:minor format, as then it was assumed that it
> referred to a block device node, and then lookup_bdev kinda makes sense.

Yes, 644bda6f3460 is the cause of all evil, as it uses an API in a place
where it should not be used. It and the prep patch
(e6e20a7a5f3f49bfee518d5c6849107398d83912) which did the grave mistake
of making name_to_dev_t available outside of the early init code really
need to be reverted.

> However, lookup_bdev() now always recurses into the filesystem, causing
> multipath to stall in an all-paths-down scenario.

lookup_bdev always did a file system lookup, and always only accepted
a valid name in the file system.

> Alternatively, if Mike says that only major:minor is the valid format for a
> table definition we can kill that code completely. But clearly _I_ cannot
> make the call here.

Before 644bda6f3460 the table definitions only accepted a valid name in
the file system. Which is the proper interface.

2020-12-23 08:09:34

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [dm-devel] DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]

On Tue, Dec 22, 2020 at 09:06:04PM +0000, Alasdair G Kergon wrote:
> I have not read the background about whatever the new problem is - I'm
> jumping in cold seeing this message - but from the very beginning of
> device-mapper we have strongly recommended that userspace supplies the
> block device in the form MAJOR:MINOR and all our own tools do that. We
> guarantee not to deadlock in these places when this is done.
>
> We also accept the device in the form of a path name as we know there
> are times when this is safe and convenient, but then we offer no
> guarantees - we place the responsibility upon userspace only to do this
> when it knows it is safe to do so i.e. no race and no deadlock.

644bda6f346038bce7ad3ed48f7044c10dde6d47 changes that by accepting all
kinds of weirdo formats :(