Subject: Re: API break, sysfs "capability" file

[adding the culprit's author to the loop; also CCing everyone else in
the Signed-off-by chain and a few lists that should be in the loop, too]

On 08.04.24 17:13, Lennart Poettering wrote:
>
> So this broke systemd:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e81cd5a983bb35dabd38ee472cf3fea1c63e0f23

FWIW, that is e81cd5a983bb35 ("block: stub out and deprecated the
capability attribute on the gendisk") [v6.3-rc1]

> We use the "capability" sysfs attr to figure out if a block device has
> part scanning enabled or not. There seems to be no other API for
> this. (We also use it in our test suite to see if devices match are
> expectations, and older systemd/udev versions used to match agains it
> from udev rules.)
>
> The interface was part of sysfs, and documented:
>
> https://www.kernel.org/doc/html/v5.5/block/capability.html
>
> While it doesn't list the partscan bit it actually does document that
> one is supposed to look into include/linux/genhd.h for the various
> bits and their meanings. I'd argue that makes them API to some level.
>
> Could this please be reverted? Just keeping the relevant bits (i.e. at
> least the media change feature bit, and the part scanning bit) is
> enough for retaining userspace compat.
>
> (Please consider googling or a github code search or so before removing
> a public API like this. This compat breakage was very much avoidable
> with a tiny bit of googling.)

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced e81cd5a983bb3
#regzbot title block: sysfs "capability" file broke systemd's checking
for part scanning
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


2024-04-08 18:43:29

by Keith Busch

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On Mon, Apr 08, 2024 at 07:43:04PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> [adding the culprit's author to the loop; also CCing everyone else in
> the Signed-off-by chain and a few lists that should be in the loop, too]
>
> On 08.04.24 17:13, Lennart Poettering wrote:
> >
> > So this broke systemd:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e81cd5a983bb35dabd38ee472cf3fea1c63e0f23
>>
> > We use the "capability" sysfs attr to figure out if a block device has
> > part scanning enabled or not. There seems to be no other API for
> > this. (We also use it in our test suite to see if devices match are
> > expectations, and older systemd/udev versions used to match agains it
> > from udev rules.)
> >
> > The interface was part of sysfs, and documented:
> >
> > https://www.kernel.org/doc/html/v5.5/block/capability.html
> >
> > While it doesn't list the partscan bit it actually does document that
> > one is supposed to look into include/linux/genhd.h for the various
> > bits and their meanings. I'd argue that makes them API to some level.
> >
> > Could this please be reverted? Just keeping the relevant bits (i.e. at
> > least the media change feature bit, and the part scanning bit) is
> > enough for retaining userspace compat.
> >
> > (Please consider googling or a github code search or so before removing
> > a public API like this. This compat breakage was very much avoidable
> > with a tiny bit of googling.)

That is unfortunate wording in the sysfs description.

How were keeping in sync with the changing values before? The setting
you seem to care about is now defined in a different file, with a
different name, and with a different value. Or are you suggesting all
those things should have been stable API?

2024-04-08 20:24:05

by Lennart Poettering

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On Mo, 08.04.24 12:41, Keith Busch ([email protected]) wrote:

> On Mon, Apr 08, 2024 at 07:43:04PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> > [adding the culprit's author to the loop; also CCing everyone else in
> > the Signed-off-by chain and a few lists that should be in the loop, too]
> >
> > On 08.04.24 17:13, Lennart Poettering wrote:
> > >
> > > So this broke systemd:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e81cd5a983bb35dabd38ee472cf3fea1c63e0f23
> >>
> > > We use the "capability" sysfs attr to figure out if a block device has
> > > part scanning enabled or not. There seems to be no other API for
> > > this. (We also use it in our test suite to see if devices match are
> > > expectations, and older systemd/udev versions used to match agains it
> > > from udev rules.)
> > >
> > > The interface was part of sysfs, and documented:
> > >
> > > https://www.kernel.org/doc/html/v5.5/block/capability.html

I linked to the wrong docs btw: here is the right one:

https://www.kernel.org/doc/html/v5.15/block/capability.html

Quoting:

This file documents the sysfs file block/<disk>/capability.

capability is a bitfield, printed in hexadecimal, indicating
which capabilities a specific block device supports:



GENHD_FL_NO_PART_SCAN (0x0200): partition scanning is disabled. Used
for loop devices in their default settings and some MMC devices.

We used the flag 0x200 in systemd. i.e. followed the docs, and it
worked back when we added this.

> > > While it doesn't list the partscan bit it actually does document that
> > > one is supposed to look into include/linux/genhd.h for the various
> > > bits and their meanings. I'd argue that makes them API to some level.
> > >
> > > Could this please be reverted? Just keeping the relevant bits (i.e. at
> > > least the media change feature bit, and the part scanning bit) is
> > > enough for retaining userspace compat.
> > >
> > > (Please consider googling or a github code search or so before removing
> > > a public API like this. This compat breakage was very much avoidable
> > > with a tiny bit of googling.)
>
> That is unfortunate wording in the sysfs description.
>
> How were keeping in sync with the changing values before? The setting
> you seem to care about is now defined in a different file, with a
> different name, and with a different value. Or are you suggesting all
> those things should have been stable API?

Ah, so this ws already broken once
before... 430cc5d3ab4d0ba0bd011cfbb0035e46ba92920c

So the documented API was not just broken completely once but twice?

Good to know.

1. GENHD_FL_NO_PART(_SCAN) was originally documented as 0x200.

2. GENHD_FL_MEDIA_CHANGE_NOTIFY was originally documented as 4.

3. And then GENHD_FL_NO_PART(_SCAN) got redefined to 4 in
430cc5d3ab4d0ba0bd011cfbb0035e46ba92920c.

All that even though this is documented API which is still up on
kernel.org?

Not sure how this is salvageable. This is just seriously fucked
up. What now?

It has been proposed to use the "range_ext" sysfs attr instead as a
hint if partition scanning is available or not. But it's entirely
undocumented. Is this something that will remain stable? (I mean,
whether something is documented or not apparently has no effect on the
stability of an API anyway, so I guess it's equally shaky as the
capability sysattr? Is any of the block device sysfs interfaces
actually stable or can they change any time?)

Lennart

--
Lennart Poettering, Berlin

2024-04-08 22:41:30

by Keith Busch

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On Mon, Apr 08, 2024 at 10:23:49PM +0200, Lennart Poettering wrote:
> Not sure how this is salvageable. This is just seriously fucked
> up. What now?
>
> It has been proposed to use the "range_ext" sysfs attr instead as a
> hint if partition scanning is available or not. But it's entirely
> undocumented. Is this something that will remain stable? (I mean,
> whether something is documented or not apparently has no effect on the
> stability of an API anyway, so I guess it's equally shaky as the
> capability sysattr? Is any of the block device sysfs interfaces
> actually stable or can they change any time?)

The "ext_range" attribute does look like an appropriate proxy for the
attribute, but indeed, it's not well documented.

Looking at the history of the documentation you had been relying on, it
appears that was submitted with good intentions (9243c6f3e012a92d), but
it itself changed values, acknowledging the instability of this
interface.

So what to do? If documentation is all that's preventing "ext_range"
from replacing you're previous usage, then let's add it in the
Documentation/ABI/stable/sysfs-block. It's been there since 2008, so
that seems like a reliable attribute to put there.

2024-04-09 06:10:10

by Hannes Reinecke

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On 4/9/24 00:41, Keith Busch wrote:
> On Mon, Apr 08, 2024 at 10:23:49PM +0200, Lennart Poettering wrote:
>> Not sure how this is salvageable. This is just seriously fucked
>> up. What now?
>>
>> It has been proposed to use the "range_ext" sysfs attr instead as a
>> hint if partition scanning is available or not. But it's entirely
>> undocumented. Is this something that will remain stable? (I mean,
>> whether something is documented or not apparently has no effect on the
>> stability of an API anyway, so I guess it's equally shaky as the
>> capability sysattr? Is any of the block device sysfs interfaces
>> actually stable or can they change any time?)
>
> The "ext_range" attribute does look like an appropriate proxy for the
> attribute, but indeed, it's not well documented.
>
> Looking at the history of the documentation you had been relying on, it
> appears that was submitted with good intentions (9243c6f3e012a92d), but
> it itself changed values, acknowledging the instability of this
> interface.
>
> So what to do? If documentation is all that's preventing "ext_range"
> from replacing you're previous usage, then let's add it in the
> Documentation/ABI/stable/sysfs-block. It's been there since 2008, so
> that seems like a reliable attribute to put there.
>
I'll side with Keith. Our management tools use 'ext_range' to find
if a device is partitionable, and we've done that since the very
beginning of sysfs.

Cheers,

Hannes


2024-04-09 08:23:29

by Lennart Poettering

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On Mo, 08.04.24 16:41, Keith Busch ([email protected]) wrote:

> On Mon, Apr 08, 2024 at 10:23:49PM +0200, Lennart Poettering wrote:
> > Not sure how this is salvageable. This is just seriously fucked
> > up. What now?
> >
> > It has been proposed to use the "range_ext" sysfs attr instead as a
> > hint if partition scanning is available or not. But it's entirely
> > undocumented. Is this something that will remain stable? (I mean,
> > whether something is documented or not apparently has no effect on the
> > stability of an API anyway, so I guess it's equally shaky as the
> > capability sysattr? Is any of the block device sysfs interfaces
> > actually stable or can they change any time?)
>
> The "ext_range" attribute does look like an appropriate proxy for the
> attribute, but indeed, it's not well documented.
>
> Looking at the history of the documentation you had been relying on, it
> appears that was submitted with good intentions (9243c6f3e012a92d), but
> it itself changed values, acknowledging the instability of this
> interface.
>
> So what to do? If documentation is all that's preventing "ext_range"
> from replacing you're previous usage, then let's add it in the
> Documentation/ABI/stable/sysfs-block. It's been there since 2008, so
> that seems like a reliable attribute to put there.

Well, history so far is telling us that this doesn't stop the block layer
to change it anyway...

AFAICS "ext_range" is kinda messy to use for this since it changed
behaviour – only since
https://github.com/torvalds/linux/commit/1ebe2e5f9d68e94c524aba876f27b945669a7879
it actually directly exposes GENHD_FL_NO_PART, before it it did some
more complex stuff which did *not* take GENHD_FL_NO_PART into
consideration. It's nasty to hack against that from userspace, since
we never know on what kernel we are on, and how it has been patched.

Also "ext_range" is only available on whole block devices afaics. Partition
block devices do not have it at all, which makes the check userspace
has to do even more complex.

All I am looking for is a very simple test that returns me a boolean:
is there kernel-level partition scanning enabled on this device or
not. At this point it's not clear to me if I can write this at all in
a way that works reasonably correctly on any kernel since let's say
4.15 (which is systemd's "recommended baseline" right now).

I am really not sure how to salvage this mess at all. AFAICS there's
currently no way to write such a test correctly.

1. "ext_range" does not work on older kernels, and not on partition
block devices
2. "capabilities" does not work on newer kernels, because it changed
meaning and then was amputated to be zero.
3. There's no way to know if we are on an old or new kernel, as
apparently various distros backported the amputation.

So, what now?

I think it would be nice if the "capabilities" thing would be brought
back in a limited form. For example, if it would be changed to start
to return 0x200|0x1000 for part scanning is off, 0x1000 when it is on.

That would then mean we return to compatibility with Linux <= 5.15,
but the new 0x1000 bit would tell us that the information is
reliable. i.e. if userspace sees 0x1000 being set we know that the
0x200 bit is definitely correct. That would then just mean that
kernels >= 5.16 until today are left in the cold...

That would then allow userspace to implement:

1. if "capabilities" has 0x200 set → definitely no partition scanning
2. if "capabilities" has 0x1000 set → bit 0x200 reliably tells is
whether partition scanning on or off
3. if DEVTYPE=partition → definitely no partition scanning
4. if "ext_range" is 1 → definitely no partition scanning
5. if LOOP_GET_STATUS64 works, then .lo_flags' LO_FLAGS_PARTSCAN flag
indicates partition scanning on or off.
6. otherwise: ??? (probably we should assume partition scanning is on?)

Lennart

--
Lennart Poettering, Berlin

2024-04-09 14:16:20

by Christoph Hellwig

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On Tue, Apr 09, 2024 at 10:19:09AM +0200, Lennart Poettering wrote:
> All I am looking for is a very simple test that returns me a boolean:
> is there kernel-level partition scanning enabled on this device or
> not.

And we can add a trivial sysfs attribute for that.

> At this point it's not clear to me if I can write this at all in
> a way that works reasonably correctly on any kernel since let's say
> 4.15 (which is systemd's "recommended baseline" right now).
>
> I am really not sure how to salvage this mess at all. AFAICS there's
> currently no way to write such a test correctly.

You can't. Maybe that's a lesson to not depend on undocumented internal
flags exposed by accident by a weirdo interface. Just talk to people.

> I think it would be nice if the "capabilities" thing would be brought
> back in a limited form. For example, if it would be changed to start
> to return 0x200|0x1000 for part scanning is off, 0x1000 when it is on.
>
> That would then mean we return to compatibility with Linux <= 5.15,
> but the new 0x1000 bit would tell us that the information is
> reliable. i.e. if userspace sees 0x1000 being set we know that the
> 0x200 bit is definitely correct. That would then just mean that
> kernels >= 5.16 until today are left in the cold...

At this point we're just better off with a clean new interface.
And you can use the old hack for < 5.15 if you care strongly enough
or just talk distros into backporting it to make their lives easier.

2024-04-09 15:19:49

by Jens Axboe

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On 4/9/24 8:15 AM, Christoph Hellwig wrote:
> On Tue, Apr 09, 2024 at 10:19:09AM +0200, Lennart Poettering wrote:
>> All I am looking for is a very simple test that returns me a boolean:
>> is there kernel-level partition scanning enabled on this device or
>> not.
>
> And we can add a trivial sysfs attribute for that.

And I think we should. I don't know what was being smoked adding a sysfs
interface that exposed internal flag values - and honestly what was
being smoked to rely on that, but I think it's fair to say that the
majority of the fuckup here is on the kernel side.

> At this point we're just better off with a clean new interface.
> And you can use the old hack for < 5.15 if you care strongly enough
> or just talk distros into backporting it to make their lives easier.

We should arguably just stable mark the patch adding the above simple
interface.

--
Jens Axboe


Subject: Re: API break, sysfs "capability" file

On 09.04.24 17:17, Jens Axboe wrote:
> On 4/9/24 8:15 AM, Christoph Hellwig wrote:
>> On Tue, Apr 09, 2024 at 10:19:09AM +0200, Lennart Poettering wrote:
>>> All I am looking for is a very simple test that returns me a boolean:
>>> is there kernel-level partition scanning enabled on this device or
>>> not.
>>
>> And we can add a trivial sysfs attribute for that.
>
> And I think we should. I don't know what was being smoked adding a sysfs
> interface that exposed internal flag values - and honestly what was
> being smoked to rely on that, but I think it's fair to say that the
> majority of the fuckup here is on the kernel side.
>
>> At this point we're just better off with a clean new interface.
>> And you can use the old hack for < 5.15 if you care strongly enough
>> or just talk distros into backporting it to make their lives easier.
>
> We should arguably just stable mark the patch adding the above simple
> interface.

I might have missed something, but it seems nothing has happened since a
week. Sure, this is hardly a new regression, so it's not that urgent;
still it would be good to see this fixed rather sooner than later after
all the publicity this got. So allow me to quickly ask:

Is anyone (Christoph?) already working on such a patch or is it at least
somewhat high on somebody's todo list?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

2024-04-16 14:20:24

by Lennart Poettering

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On Di, 09.04.24 09:17, Jens Axboe ([email protected]) wrote:

> On 4/9/24 8:15 AM, Christoph Hellwig wrote:
> > On Tue, Apr 09, 2024 at 10:19:09AM +0200, Lennart Poettering wrote:
> >> All I am looking for is a very simple test that returns me a boolean:
> >> is there kernel-level partition scanning enabled on this device or
> >> not.
> >
> > And we can add a trivial sysfs attribute for that.
>
> And I think we should. I don't know what was being smoked adding a sysfs
> interface that exposed internal flag values - and honestly what was
> being smoked to rely on that, but I think it's fair to say that the
> majority of the fuckup here is on the kernel side.

Yeah, it's a shitty interface, the kernel is rich in that. But it was
excessively well documented, better in fact than almost all other
kernel interfaces:

https://www.kernel.org/doc/html/v5.16/block/capability.html

If you document something on so much detail in the API docs, how do
you expect this *not* to be relied on by userspace.

Lennart

2024-04-16 14:27:25

by Lennart Poettering

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On Di, 09.04.24 16:15, Christoph Hellwig ([email protected]) wrote:
11;rgb:1717/1414/2121
> On Tue, Apr 09, 2024 at 10:19:09AM +0200, Lennart Poettering wrote:
> > All I am looking for is a very simple test that returns me a boolean:
> > is there kernel-level partition scanning enabled on this device or
> > not.
>
> And we can add a trivial sysfs attribute for that.
>
> > At this point it's not clear to me if I can write this at all in
> > a way that works reasonably correctly on any kernel since let's say
> > 4.15 (which is systemd's "recommended baseline" right now).
> >
> > I am really not sure how to salvage this mess at all. AFAICS there's
> > currently no way to write such a test correctly.
>
> You can't. Maybe that's a lesson to not depend on undocumented internal
> flags exposed by accident by a weirdo interface. Just talk to
> people.

Undocumented? Internal?

It's was actually one of the *best* documented kernel *public* APIs I
ever came across:

https://www.kernel.org/doc/html/v5.16/block/capability.html

So much detail, I love it!

I mean, you did good work here, documented it, with all flags in all
details. I think that's great work! You should take pride in this, not
try to deny its existance!

> > I think it would be nice if the "capabilities" thing would be brought
> > back in a limited form. For example, if it would be changed to start
> > to return 0x200|0x1000 for part scanning is off, 0x1000 when it is on.
> >
> > That would then mean we return to compatibility with Linux <= 5.15,
> > but the new 0x1000 bit would tell us that the information is
> > reliable. i.e. if userspace sees 0x1000 being set we know that the
> > 0x200 bit is definitely correct. That would then just mean that
> > kernels >= 5.16 until today are left in the cold...
>
> At this point we're just better off with a clean new interface.
> And you can use the old hack for < 5.15 if you care strongly enough
> or just talk distros into backporting it to make their lives easier.

I'll take what I can get. If API compatibility is not coming back,
then sure, a new sysattr is better than nothing.

Lennart

--
Lennart Poettering, Berlin

2024-04-16 14:50:44

by Jens Axboe

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On 4/16/24 8:18 AM, Lennart Poettering wrote:
> On Di, 09.04.24 09:17, Jens Axboe ([email protected]) wrote:
>
>> On 4/9/24 8:15 AM, Christoph Hellwig wrote:
>>> On Tue, Apr 09, 2024 at 10:19:09AM +0200, Lennart Poettering wrote:
>>>> All I am looking for is a very simple test that returns me a boolean:
>>>> is there kernel-level partition scanning enabled on this device or
>>>> not.
>>>
>>> And we can add a trivial sysfs attribute for that.
>>
>> And I think we should. I don't know what was being smoked adding a sysfs
>> interface that exposed internal flag values - and honestly what was
>> being smoked to rely on that, but I think it's fair to say that the
>> majority of the fuckup here is on the kernel side.
>
> Yeah, it's a shitty interface, the kernel is rich in that. But it was
> excessively well documented, better in fact than almost all other
> kernel interfaces:
>
> ? https://www.kernel.org/doc/html/v5.16/block/capability.html ?
>
> If you document something on so much detail in the API docs, how do
> you expect this *not* to be relied on by userspace.

This is _internal_ documentation, not user ABI documentation. The fact
that it's talking about internal flag values should make that clear,
though I can definitely see how that's just badly exposed along with
other things that document things that users/admins could care about.

--
Jens Axboe


2024-04-16 15:12:20

by Keith Busch

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On Tue, Apr 16, 2024 at 04:23:43PM +0200, Lennart Poettering wrote:
> On Di, 09.04.24 16:15, Christoph Hellwig ([email protected]) wrote:
> 11;rgb:1717/1414/2121
> > On Tue, Apr 09, 2024 at 10:19:09AM +0200, Lennart Poettering wrote:
> > > All I am looking for is a very simple test that returns me a boolean:
> > > is there kernel-level partition scanning enabled on this device or
> > > not.
> >
> > And we can add a trivial sysfs attribute for that.
> >
> > > At this point it's not clear to me if I can write this at all in
> > > a way that works reasonably correctly on any kernel since let's say
> > > 4.15 (which is systemd's "recommended baseline" right now).
> > >
> > > I am really not sure how to salvage this mess at all. AFAICS there's
> > > currently no way to write such a test correctly.
> >
> > You can't. Maybe that's a lesson to not depend on undocumented internal
> > flags exposed by accident by a weirdo interface. Just talk to
> > people.
>
> Undocumented? Internal?
>
> It's was actually one of the *best* documented kernel *public* APIs I
> ever came across:
>
> https://www.kernel.org/doc/html/v5.16/block/capability.html
>
> So much detail, I love it!
>
> I mean, you did good work here, documented it, with all flags in all
> details. I think that's great work! You should take pride in this, not
> try to deny its existance!

The patch that introduced this was submitted not because the API was
stable; it was committed to encourage developers to update it as it
changed because it is *not* stable. That's not the kind of interface you
want exported for users to rely on, but no one should have to search
commit logs to understand why the doc exists, so I think exporting it
was just a mistake on the kernel side. To say this doc is "good"
misunderstands what it was trying to accomplish, and what it ultimately
created instead: technical debt.

The block interfaces documented in Documetation/ABI/stable/ are reliably
stable, though.

2024-04-17 15:09:25

by Christoph Hellwig

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On Tue, Apr 16, 2024 at 11:26:40AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> I might have missed something, but it seems nothing has happened since a
> week. Sure, this is hardly a new regression, so it's not that urgent;

It is not a regression at all. Userspace poked into completely internal
bits that changed frequently before and now it changed again.

2024-04-17 15:14:17

by Christoph Hellwig

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On Tue, Apr 16, 2024 at 08:44:55AM -0600, Keith Busch wrote:
> The patch that introduced this was submitted not because the API was
> stable; it was committed to encourage developers to update it as it
> changed because it is *not* stable. That's not the kind of interface you
> want exported for users to rely on, but no one should have to search
> commit logs to understand why the doc exists, so I think exporting it
> was just a mistake on the kernel side. To say this doc is "good"
> misunderstands what it was trying to accomplish, and what it ultimately
> created instead: technical debt.

Yes. It might be a problem with the documentation generation mess,
but something that is generated from a random code comment really
can't be an API document.

Anyway, instead of bickering about this, what does systemd actually
want to known? Because all these flags we talked about did slightly
different things all vaguely related to partition scanning.
We also have another GD_SUPPRESS_PART_SCAN bit that is used to
temporarily suppress partition scanning (and ublk now also abuses
it in really whacky way, sigh). I'm not really sure why userspace
would even care about any of this, but I'm open to come up with
something sane if we can actually define the semantics.

2024-04-17 15:53:43

by Lennart Poettering

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On Mi, 17.04.24 17:13, Christoph Hellwig ([email protected]) wrote:

> On Tue, Apr 16, 2024 at 08:44:55AM -0600, Keith Busch wrote:
> > The patch that introduced this was submitted not because the API was
> > stable; it was committed to encourage developers to update it as it
> > changed because it is *not* stable. That's not the kind of interface you
> > want exported for users to rely on, but no one should have to search
> > commit logs to understand why the doc exists, so I think exporting it
> > was just a mistake on the kernel side. To say this doc is "good"
> > misunderstands what it was trying to accomplish, and what it ultimately
> > created instead: technical debt.
>
> Yes. It might be a problem with the documentation generation mess,
> but something that is generated from a random code comment really
> can't be an API document.
>
> Anyway, instead of bickering about this, what does systemd actually
> want to known? Because all these flags we talked about did slightly
> different things all vaguely related to partition scanning.
> We also have another GD_SUPPRESS_PART_SCAN bit that is used to
> temporarily suppress partition scanning (and ublk now also abuses
> it in really whacky way, sigh). I'm not really sure why userspace
> would even care about any of this, but I'm open to come up with
> something sane if we can actually define the semantics.

systemd works a lot with partitioned GPT disk images, and many of our
tools you can point to such images, either by referencing a file or by
referencing a block device. We generally handle that
transparently. Typically we then look at the GPT partition table from
userspace, and then do something with the associated partition block
devices (i.e. mount them or so). But those will only be synthesized by
the kernel if we are operating on a block device with partition
scanning on. Since kernel part scanning is async if a partition block
device doesn#t exist yet could have two reasons: part scanning is off
for the device, or the part table is still read and processed by the
kernel. By being able to read the part scan flag off the block device
we know which case it is, and can avoid a potential time-out.

If we are operating on a regular file or on a block device with
partition scanning off, we first have to generate an intermediary
loopback block device off it, whith scanning on.

Hence it's very useful to be able to determine if we a block device
already supports partition scans, simply so that we can do all this
entirely generically: you give us anything and we can handle it.

Block devices with part scanning off are quite common after all,
i.e. "losetup" creates them by default like that, and partition block
devices themselves have no part scanning on and so on, hence we have
to be ablet to operate sanely with them.

Lennart

--
Lennart Poettering, Berlin

2024-04-17 15:59:37

by Christoph Hellwig

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On Wed, Apr 17, 2024 at 05:48:16PM +0200, Lennart Poettering wrote:
> Block devices with part scanning off are quite common after all,
> i.e. "losetup" creates them by default like that, and partition block
> devices themselves have no part scanning on and so on, hence we have
> to be ablet to operate sanely with them.

Maybe and ioctl to turn on partition scanning if it is currently disabled
or return an error otherwise would be the better thing? It would
do the right thing for the most common loop case, and with a bit more
work could do the right thing for those that more or less disable it
graciously (ubiblock, drbd, zram) and would just fail for those who are
so grotty old code and slow devices that we never want to do a partition
scan (basically old floppy drivers and the Nintendo N64 cartridge driver)


2024-04-17 16:23:01

by Lennart Poettering

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On Mi, 17.04.24 17:59, Christoph Hellwig ([email protected]) wrote:

> On Wed, Apr 17, 2024 at 05:48:16PM +0200, Lennart Poettering wrote:
> > Block devices with part scanning off are quite common after all,
> > i.e. "losetup" creates them by default like that, and partition block
> > devices themselves have no part scanning on and so on, hence we have
> > to be ablet to operate sanely with them.
>
> Maybe and ioctl to turn on partition scanning if it is currently disabled
> or return an error otherwise would be the better thing? It would
> do the right thing for the most common loop case, and with a bit more
> work could do the right thing for those that more or less disable it
> graciously (ubiblock, drbd, zram) and would just fail for those who are
> so grotty old code and slow devices that we never want to do a partition
> scan (basically old floppy drivers and the Nintendo N64 cartridge driver)

Well, there are plenty of other block devices with part scanning off,
such as DM, including dm-crypt, dm-integrity and so on. And that's
certainly stuff we want to cover for this.

Lennart

--
Lennart Poettering, Berlin

2024-04-17 16:40:58

by Christoph Hellwig

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On Wed, Apr 17, 2024 at 06:10:21PM +0200, Lennart Poettering wrote:
> Well, there are plenty of other block devices with part scanning off,
> such as DM, including dm-crypt, dm-integrity and so on. And that's
> certainly stuff we want to cover for this.

But there is no good reason to prohibit scanning for them. We can't
scan by default as that would probably break a few expectations in
userspace, but we can trivially allow manual scanning.


2024-04-18 06:28:16

by Hannes Reinecke

[permalink] [raw]
Subject: Re: API break, sysfs "capability" file

On 4/17/24 17:59, Christoph Hellwig wrote:
> On Wed, Apr 17, 2024 at 05:48:16PM +0200, Lennart Poettering wrote:
>> Block devices with part scanning off are quite common after all,
>> i.e. "losetup" creates them by default like that, and partition block
>> devices themselves have no part scanning on and so on, hence we have
>> to be ablet to operate sanely with them.
>
> Maybe and ioctl to turn on partition scanning if it is currently disabled
> or return an error otherwise would be the better thing? It would
> do the right thing for the most common loop case, and with a bit more
> work could do the right thing for those that more or less disable it
> graciously (ubiblock, drbd, zram) and would just fail for those who are
> so grotty old code and slow devices that we never want to do a partition
> scan (basically old floppy drivers and the Nintendo N64 cartridge driver)
>
>
The world is going to end.
hch suggests an ioctl.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich