2024-05-09 15:53:39

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem

On May 9, 2024 11:19:39 AM GMT+02:00, Jonathan Cameron <[email protected]> wrote:
>Many subsystem core drivers will probe and create subsystem specific
>sysfs directories on on systems that don't have any hardware needing
>drivers from that subsystem (if someone manually inserts them rather
>than relying on automatic module dependency handling.)
>I don't see why this class driver should be different and have to jump
>through hoops to satisfy this requirement.

You mean it should load because "Look ma, the others do it this way". Does it make any sense? Of course not.

Are you arguing for the nonsensical "it should load" case because it is simply easier this way? How hard is that "jump through hoops" thing anyway?

You mean it should load so that when booting an allmodconfig kernel there are not enough modules which are loading so lemme load one more. And then I need to go and rmmod them all before I need to do localmodconfig and build a tailored kernel for the machine.

Or is there some other reason to load silly modules, use up resources for no good reason whatsoever and bloat the machine?

You mean, f*ck it, right? Who cares...

Geez.


--
Sent from a small device: formatting sucks and brevity is inevitable.


2024-05-09 20:26:17

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem

On Thu, May 09, 2024 at 05:52:18PM +0200, Borislav Petkov wrote:
> Are you arguing for the nonsensical "it should load" case because it
> is simply easier this way? How hard is that "jump through hoops" thing
> anyway?

Let's see: the following patches add something called
GET_SUPPORTED_FEATURES which is used to detect whether the system has
patrol scrub functionality etc.

Then there's ras2_acpi_init() which checks for a RAS2 ACPI table.

Are you saying that checking for those two things in the init function
is jumping through hoops?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-05-09 21:30:07

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem

Borislav Petkov wrote:
> On Thu, May 09, 2024 at 05:52:18PM +0200, Borislav Petkov wrote:
> > Are you arguing for the nonsensical "it should load" case because it
> > is simply easier this way? How hard is that "jump through hoops" thing
> > anyway?
>
> Let's see: the following patches add something called
> GET_SUPPORTED_FEATURES which is used to detect whether the system has
> patrol scrub functionality etc.
>
> Then there's ras2_acpi_init() which checks for a RAS2 ACPI table.
>
> Are you saying that checking for those two things in the init function
> is jumping through hoops?

Wait, this discussion has gone off the rails.

Recall that there are 461 usages of module_pci_driver() in the kernel.
Every one of those arranges for just registering a PCI driver when the
module is loaded regardless of whether any devices that driver cares
about are present.

Consider the case of the PCI subsystem that allows dynamically updating
the device ids that a driver attaches. I.e.

echo $id > /sys/bus/pci/drivers/$driver/new_id

..the fundamentally does not work if the module unloads itself and the
driver(s) it registers when the PCI bus core finds no devices to attach
at runtime.

The mitigation for keeping unneeded modules unloaded is that udev does
not autoload modules unless a PCI bus udev event matches a module's
published PCI device table.

Don't want to waste module memory? Don't load the module.

2024-05-09 21:53:10

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem

On Thu, May 09, 2024 at 02:21:28PM -0700, Dan Williams wrote:
> Recall that there are 461 usages of module_pci_driver() in the kernel.
> Every one of those arranges for just registering a PCI driver when the
> module is loaded regardless of whether any devices that driver cares
> about are present.

Sorry, I read your text a bunch of times but I still have no clue what
you're trying to tell me.

All *I* am saying is since this is a new subsystem and the methods for
detecting the scrub functionality are two - either an ACPI table or
a GET_SUPPORTED_FEATURES command, then the init function of the
subsystem:

+static int __init memory_scrub_control_init(void)
+{
+ return class_register(&scrub_class);
+}
+subsys_initcall(memory_scrub_control_init);

can check for those two things before initializing.

If there is no scrubbing functionality, then it can return an error and
not load.

The same as when we don't load x86 drivers on the wrong vendor and so
on.

If the check is easy, why not do it?

Make more sense?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-05-09 22:59:53

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem

Borislav Petkov wrote:
> On Thu, May 09, 2024 at 02:21:28PM -0700, Dan Williams wrote:
> > Recall that there are 461 usages of module_pci_driver() in the kernel.
> > Every one of those arranges for just registering a PCI driver when the
> > module is loaded regardless of whether any devices that driver cares
> > about are present.
>
> Sorry, I read your text a bunch of times but I still have no clue what
> you're trying to tell me.

Because taking this proposal to its logical end of "if a simple check is
possible, why not do it in module_init()" has wide implications like the
module_pci_driver() example.

> All *I* am saying is since this is a new subsystem and the methods for
> detecting the scrub functionality are two - either an ACPI table or
> a GET_SUPPORTED_FEATURES command, then the init function of the
> subsystem:

No, at a minimum that's a layering violation. This is a generic library
facility that should not care if it is being called for a CXL device or
an ACPI device. It also causes functional issues, see below:

> +static int __init memory_scrub_control_init(void)
> +{
> + return class_register(&scrub_class);
> +}
> +subsys_initcall(memory_scrub_control_init);
>
> can check for those two things before initializing.
>
> If there is no scrubbing functionality, then it can return an error and
> not load.
>
> The same as when we don't load x86 drivers on the wrong vendor and so
> on.

I think it works for x86 drivers because the functionality in those
modules is wholly contained within that one module. This scrub module is
a service library for other modules.

> If the check is easy, why not do it?

It is functionally the wrong place to do the check. When module_init()
fails it causes not only the current module to be unloaded but any
dependent modules will also fail to load.

Let's take an example of the CXL driver wanting to register with this
scrub interface to support the capability that *might* be available on
some CXL devices. The cxl_pci.ko module, that houses cxl_pci_driver,
grows a call to scrub_device_register(). That scrub_device_register()
call is statically present in cxl_pci.ko so that when cxl_pci.ko loads
symbol resolution requires scrub.ko to load.

Neither of those modules (cxl_pci.ko or scrub.ko) load automatically.
Either udev loads cxl_pci.ko because it sees a device that matches
cxl_mem_pci_tbl, or the user manually insmods those modules because they
think they know better. No memory wasted unless the user explicitly asks
for memory to be wasted.

If no CXL devices in the system have scrub capabilities, great, then
scrub_device_register() will never be called.

Now, if memory_scrub_control_init() did its own awkward and redundant
CXL scan, and fails with "no CXL scrub capable devices found" it would
not only block scrub.ko from loading, but also cxl_pci.ko since
cxl_pci.ko needs to resolve that symbol to load.

All of that said, you are right that there is still a scenario where
memory is wasted. I.e. the case where a subsystem like CXL or ACPI wants
the runtime *option* of calling scrub_device_register(), but never does.
That will inflict the cost of registering a vestigial scrub_class. That
can be mitigated with another layer of module indirection where
cxl_pci_driver registers a cxl_scrub_device and then a cxl_scrub_driver
in its own module calls scrub_device_register() with the scrub core.

I would entertain that extra indirection long before I would entertain
memory_scrub_control_init() growing scrub device enumeration that
belongs to the *caller* of scrub_device_register().

> Make more sense?

It is a reasonable question, but all module libraries incur init costs
just by being linked by another module. You can walk /sys/class to see
how many other subsystems are registering class devices but never using
them.

I would not say "no" to a generic facility that patches out module
dependencies until the first call, just not sure the return on
investment would be worth it.

Lastly I think drivers based on ACPI tables are awkward. They really
need to have an ACPI device to attach so that typical automatic Linux
module loading machinery can be used. The fact this function is a
subsys_initcall() is a red-flag since nothing should be depending on the
load order of a little driver to control scrub parameters.

2024-05-10 09:26:23

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem

On Thu, May 09, 2024 at 03:59:29PM -0700, Dan Williams wrote:
> No, at a minimum that's a layering violation. This is a generic library
> facility that should not care if it is being called for a CXL device or
> an ACPI device.

Really?

Because this looks like creating a subsystem for those two newfangled
scrubbing functionalities which will be present in CXL devices and
by that ACPI RAS2 thing.

Because we have a *lot* of hw scrubbing functionality already. Just do:

git grep "scrub"

Some of it controls hw scrubbing. If this is a generic facility, does
this mean that all those older scrubbers should be converted to it?

Or is this thing going to support only new stuff? I.e., we want to
disable our scrubbers when doing performance-sensitive workloads and
reenable them after that but we don't care about old systems?

And all that other bla about controlling scrubbers from userspace.

So which is it?

> I think it works for x86 drivers because the functionality in those
> modules is wholly contained within that one module. This scrub module is
> a service library for other modules.

Well, you have that thing in EDAC. edac_core.ko is that service module
and the chipset-specific drivers - at least on x86 - use a match_id to
load only on the systems they should load on.

If I had a BIOS table which had "EDAC" in it, I won't load edac_core.ko
either but there isn't one.

> It is functionally the wrong place to do the check. When module_init()
> fails it causes not only the current module to be unloaded but any
> dependent modules will also fail to load.

See above. I'm under the assumption that this is using two methods to
detect scrubbing functionality.

> Let's take an example of the CXL driver wanting to register with this
> scrub interface to support the capability that *might* be available on
> some CXL devices. The cxl_pci.ko module, that houses cxl_pci_driver,
> grows a call to scrub_device_register(). That scrub_device_register()
> call is statically present in cxl_pci.ko so that when cxl_pci.ko loads
> symbol resolution requires scrub.ko to load.
>
> Neither of those modules (cxl_pci.ko or scrub.ko) load automatically.
> Either udev loads cxl_pci.ko because it sees a device that matches
> cxl_mem_pci_tbl, or the user manually insmods those modules because they
> think they know better. No memory wasted unless the user explicitly asks
> for memory to be wasted.

The scrub.ko goes and asks the system: "Do you have a CXL device with
scrubbing functionality?" "Yes: load." "No: ok, won't."

> If no CXL devices in the system have scrub capabilities, great, then
> scrub_device_register() will never be called.
>
> Now, if memory_scrub_control_init() did its own awkward and redundant
> CXL scan, and fails with "no CXL scrub capable devices found" it would
> not only block scrub.ko from loading, but also cxl_pci.ko since
> cxl_pci.ko needs to resolve that symbol to load.

Why would it fail the scan?

Isn't this fancy GET_SUPPORTED_FEATURES command giving you all info you
need?

> I would not say "no" to a generic facility that patches out module
> dependencies until the first call, just not sure the return on
> investment would be worth it.

From the discussion so far yeah, I think you're right, it is not worth
the effort.

> Lastly I think drivers based on ACPI tables are awkward. They really
> need to have an ACPI device to attach so that typical automatic Linux
> module loading machinery can be used. The fact this function is a
> subsys_initcall() is a red-flag since nothing should be depending on the
> load order of a little driver to control scrub parameters.

Yeah, it is becoming a mess before it has even started.

So I don't mind if such drivers get loaded as long as doing this is the
best we can do given the situation. What gets me up the palms, as they
say in .de, is "just because" and "look, the others do it too."

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-05-10 17:14:05

by Dan Williams

[permalink] [raw]
Subject: Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem

Borislav Petkov wrote:
> On Thu, May 09, 2024 at 03:59:29PM -0700, Dan Williams wrote:
> > No, at a minimum that's a layering violation. This is a generic library
> > facility that should not care if it is being called for a CXL device or
> > an ACPI device.
>
> Really?

Yes.

> Because this looks like creating a subsystem for those two newfangled
> scrubbing functionalities which will be present in CXL devices and
> by that ACPI RAS2 thing.
>
> Because we have a *lot* of hw scrubbing functionality already. Just do:
>
> git grep "scrub"
>
> Some of it controls hw scrubbing. If this is a generic facility, does
> this mean that all those older scrubbers should be converted to it?
>
> Or is this thing going to support only new stuff? I.e., we want to
> disable our scrubbers when doing performance-sensitive workloads and
> reenable them after that but we don't care about old systems?
>
> And all that other bla about controlling scrubbers from userspace.
>
> So which is it?

In fact this question matches my reaction to the last posting [1], and
led to a much improved cover letter and the "Comparison of scrubbing
features". To your point there are scrub capabilities already in the
kernel and we would need to make a decision about what to do about them.
I called out NVDIMM ARS as one example and am open to exploring
converting that to the common scrub ABI, but not block this proposal in
the meantime.

For me the proposal can be boiled down to, "here we (kernel community)
are again with 2 new scrub interfaces to add to the kernel. Lets step
back, define a common ABI for ACPI RAS 2 and CXL to stop the
proliferation of scrub ABIs, and then make a decision about when/whether
to integrate legacy scrub facilities into this new interface".

[1]: http://lore.kernel.org/r/[email protected]

> > I think it works for x86 drivers because the functionality in those
> > modules is wholly contained within that one module. This scrub module is
> > a service library for other modules.
>
> Well, you have that thing in EDAC. edac_core.ko is that service module
> and the chipset-specific drivers - at least on x86 - use a match_id to
> load only on the systems they should load on.

Which is exactly the same mechanism being defined here. scrub_core.ko is
a service module that would only be requested by an ACPI module or a CXL
module after one of those loads based on their match_id.

> If I had a BIOS table which had "EDAC" in it, I won't load edac_core.ko
> either but there isn't one.
>
> > It is functionally the wrong place to do the check. When module_init()
> > fails it causes not only the current module to be unloaded but any
> > dependent modules will also fail to load.
>
> See above. I'm under the assumption that this is using two methods to
> detect scrubbing functionality.

The scrub_core, like edac_core, has no method to detect scrubbing
facility, it is simply a passive library waiting for the first
scrub_device_register() call.

> > Let's take an example of the CXL driver wanting to register with this
> > scrub interface to support the capability that *might* be available on
> > some CXL devices. The cxl_pci.ko module, that houses cxl_pci_driver,
> > grows a call to scrub_device_register(). That scrub_device_register()
> > call is statically present in cxl_pci.ko so that when cxl_pci.ko loads
> > symbol resolution requires scrub.ko to load.
> >
> > Neither of those modules (cxl_pci.ko or scrub.ko) load automatically.
> > Either udev loads cxl_pci.ko because it sees a device that matches
> > cxl_mem_pci_tbl, or the user manually insmods those modules because they
> > think they know better. No memory wasted unless the user explicitly asks
> > for memory to be wasted.
>
> The scrub.ko goes and asks the system: "Do you have a CXL device with
> scrubbing functionality?" "Yes: load." "No: ok, won't."

Yeah, that's backwards. CXL enumeration belongs in the CXL driver and
the CXL driver is fully responsible for deciding when to incur the costs
of loading scrub_core.

> > If no CXL devices in the system have scrub capabilities, great, then
> > scrub_device_register() will never be called.
> >
> > Now, if memory_scrub_control_init() did its own awkward and redundant
> > CXL scan, and fails with "no CXL scrub capable devices found" it would
> > not only block scrub.ko from loading, but also cxl_pci.ko since
> > cxl_pci.ko needs to resolve that symbol to load.
>
> Why would it fail the scan?

cxl_pci.ko loads based on match_id and cxl_pci_probe() enumerates device
capabilities. My interpretation of your feedback is that
memory_scrub_control_init() should duplicate that cxl_pci_probe()
enumeration?

Assume that it does and memory_scrub_control_init() finds no scrub
facilities in any CXL devices and fails memory_scrub_control_init(). Any
module that links to scrub_device_register() will also fail to load
because module symbol resolution depends on all modules completing init.

> Isn't this fancy GET_SUPPORTED_FEATURES command giving you all info you
> need?

Sure, but that's a driver-probe-time facility, not a module_init-time
facility.

> > Lastly I think drivers based on ACPI tables are awkward. They really
> > need to have an ACPI device to attach so that typical automatic Linux
> > module loading machinery can be used. The fact this function is a
> > subsys_initcall() is a red-flag since nothing should be depending on the
> > load order of a little driver to control scrub parameters.
>
> Yeah, it is becoming a mess before it has even started.

I assume you do not consider edac_core a mess?

Now, the question of how many legacy scrub interfaces should be
considered in this design out of the gate is a worthwhile discussion. I
am encouraged that this ABI is at least trying to handle more than 1
backend, which makes me feel better that adding a 3rd and 4th might not
be prohibitive.

> So I don't mind if such drivers get loaded as long as doing this is the
> best we can do given the situation. What gets me up the palms, as they
> say in .de, is "just because" and "look, the others do it too."

Which matches what I reacted to on the last posting:

"Maybe it is self evident to others, but for me there is little in these
changelogs besides 'mechanism exists, enable it'"

..and to me that feedback was taken to heart with much improved
changelogs in this new posting.

This init time feature probing discussion feels like it was born from a
micommunication / misunderstanding.

2024-05-11 10:18:30

by Borislav Petkov

[permalink] [raw]
Subject: Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem

On Fri, May 10, 2024 at 10:13:41AM -0700, Dan Williams wrote:
> In fact this question matches my reaction to the last posting [1], and
> led to a much improved cover letter and the "Comparison of scrubbing
> features". To your point there are scrub capabilities already in the
> kernel and we would need to make a decision about what to do about them.

The answer to that question is whether this new userspace usage is going
to want to control those too.

So

"Use case of scrub control feature"

from the cover letter is giving two short sentences about what one would
do but I'm still meh. A whole subsystem needing a bunch of effort would
need a lot more justification.

So can anyone please elaborate more on the use cases and why this new
thing is needed?

> I called out NVDIMM ARS as one example and am open to exploring
> converting that to the common scrub ABI, but not block this proposal
> in the meantime.
>
> For me the proposal can be boiled down to, "here we (kernel community)
> are again with 2 new scrub interfaces to add to the kernel. Lets step
> back, define a common ABI for ACPI RAS 2 and CXL to stop the
> proliferation of scrub ABIs, and then make a decision about when/whether
> to integrate legacy scrub facilities into this new interface".

Fully agreed as long as there's valid users for it and we don't end up
designing and supporting an interface which people are not sure if
anyone uses. ras_userspace_consumers() from the other thread
case-in-point.

> [1]: http://lore.kernel.org/r/[email protected]
^^^^^

Ha, you're speaking what I'm thinking here. :-)

> The scrub_core, like edac_core, has no method to detect scrubbing
> facility, it is simply a passive library waiting for the first
> scrub_device_register() call.

Well, those scrub things still have methods which are better than
nothing. EDAC is ancient. But ok, let's just say they're the same for
the sake of simplicity.

> Yeah, that's backwards. CXL enumeration belongs in the CXL driver and
> the CXL driver is fully responsible for deciding when to incur the costs
> of loading scrub_core.

Ok, fair enough.

> Assume that it does and memory_scrub_control_init() finds no scrub
> facilities in any CXL devices and fails memory_scrub_control_init(). Any
> module that links to scrub_device_register() will also fail to load
> because module symbol resolution depends on all modules completing init.

My angle was: scan the system for *all* possible scrub functionalities
and if none present, then fail. And since they're only two...

> Sure, but that's a driver-probe-time facility, not a module_init-time
> facility.

Oh well.

> I assume you do not consider edac_core a mess?

The whole EDAC is a mess but that's a whole another story. :-)

> Now, the question of how many legacy scrub interfaces should be
> considered in this design out of the gate is a worthwhile discussion. I
> am encouraged that this ABI is at least trying to handle more than 1
> backend, which makes me feel better that adding a 3rd and 4th might not
> be prohibitive.

See above.

I'm perfectly fine with: "hey, we have a new scrub API interfacing to
RAS scrub capability and it is *the* thing to use and all other hw scrub
functionality should be shoehorned into it.

So this thing's design should at least try to anticipate supporting
other scrub hw.

Because there's EDAC too. Why isn't this scrub thing part of EDAC? Why
isn't this scrub API part of edac_core? I mean, this is all RAS so why
design a whole new thing when the required glue is already there?

We can just as well have a

/sys/devices/system/edac/scrub/

node hierarchy and have everything there.

Why does it have to be yet another thing?

And if it needs to be separate, who's going to maintain it?

> Which matches what I reacted to on the last posting:
>
> "Maybe it is self evident to others, but for me there is little in these
> changelogs besides 'mechanism exists, enable it'"
>
> ...and to me that feedback was taken to heart with much improved
> changelogs in this new posting.

Ok.

> This init time feature probing discussion feels like it was born from a
> micommunication / misunderstanding.

Yes, it seems so, thanks for clarifying things.

I still am unclear on the usecases and how this is supposed to be used
and also, as mentioned above, we have a *lot* of RAS functionality
spread around the kernel. Perhaps we should start unifying it instead of
adding more...

So the big picture and where we're headed to, needs to be clarified first.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette