2021-10-22 02:02:56

by Zev Weiss

[permalink] [raw]
Subject: [PATCH 0/5] driver core, of: support for reserved devices

Hello all,

This series is another incarnation of a couple other patchsets I've
posted recently [0, 1], but again different enough in overall
structure that I'm not sure it's exactly a v2 (or v3).

As compared to [1], it abandons the writable binary sysfs files and at
Frank's suggestion returns to an approach more akin to [0], though
without any driver-specific (aspeed-smc) changes, which I figure might
as well be done later in a separate series once appropriate
infrastructure is in place.

The basic idea is to implement support for a status property value
that's documented in the DT spec [2], but thus far not used at all in
the kernel (or anywhere else I'm aware of): "reserved". According to
the spec (section 2.3.4, Table 2.4), this status:

Indicates that the device is operational, but should not be used.
Typically this is used for devices that are controlled by another
software component, such as platform firmware.

With these changes, devices marked as reserved are (at least in some
cases, more on this later) instantiated, but will not have drivers
bound to them unless and until userspace explicitly requests it by
writing the device's name to the driver's sysfs 'bind' file. This
enables appropriate handling of hardware arrangements that can arise
in contexts like OpenBMC, where a device may be shared with another
external controller not under the kernel's control (for example, the
flash chip storing the host CPU's firmware, shared by the BMC and the
host CPU and exclusively under the control of the latter by default).
Such a device can be marked as reserved so that the kernel refrains
from touching it until appropriate preparatory steps have been taken
(e.g. BMC userspace coordinating with the host CPU to arbitrate which
processor has control of the firmware flash).

Patches 1-3 provide some basic plumbing for checking the "reserved"
status of a device, patch 4 is the main driver-core change, and patch
5 tweaks the OF platform code to not skip reserved devices so that
they can actually be instantiated.

One shortcoming of this series is that it doesn't automatically apply
universally across all busses and drivers -- patch 5 enables support
for platform devices, but similar changes would be required for
support in other busses (e.g. in of_register_spi_devices(),
of_i2c_register_devices(), etc.) and drivers that instantiate DT
devices. Since at present a "reserved" status is treated as
equivalent to "disabled" and this series preserves that status quo in
those cases I'd hope this wouldn't be considered a deal-breaker, but
a thing to be aware of at least.

Greg: I know on [1] you had commented nack-ing the addition of boolean
function parameters; patch 4 adds a flags mask instead in an analogous
situation. I'm not certain how much of an improvement you'd consider
that (hopefully at least slightly better, in that the arguments passed
at the call site are more self-explanatory); if that's still
unsatisfactory I'd welcome any suggested alternatives.


Thanks,
Zev

[0] https://lore.kernel.org/openbmc/[email protected]/
[1] https://lore.kernel.org/openbmc/[email protected]/
[2] https://github.com/devicetree-org/devicetree-specification/releases/download/v0.3/devicetree-specification-v0.3.pdf

Zev Weiss (5):
of: base: add function to check for status = "reserved"
device property: add fwnode_device_is_reserved()
of: property: add support for fwnode_device_is_reserved()
driver core: inhibit automatic driver binding on reserved devices
of: platform: instantiate reserved devices

drivers/base/bus.c | 2 +-
drivers/base/dd.c | 13 ++++++----
drivers/base/property.c | 16 +++++++++++++
drivers/dma/idxd/compat.c | 3 +--
drivers/of/base.c | 56 ++++++++++++++++++++++++++++++++++++-------
drivers/of/platform.c | 2 +-
drivers/of/property.c | 6 +++++
drivers/vfio/mdev/mdev_core.c | 2 +-
include/linux/device.h | 14 ++++++++++-
include/linux/fwnode.h | 2 ++
include/linux/of.h | 6 +++++
include/linux/property.h | 1 +
12 files changed, 104 insertions(+), 19 deletions(-)

--
2.33.1


2021-10-22 02:03:28

by Zev Weiss

[permalink] [raw]
Subject: [PATCH 2/5] device property: add fwnode_device_is_reserved()

This is the fwnode API corresponding to of_device_is_reserved(), which
indicates that a device exists but may not be immediately accessible.

Signed-off-by: Zev Weiss <[email protected]>
---
drivers/base/property.c | 16 ++++++++++++++++
include/linux/fwnode.h | 2 ++
include/linux/property.h | 1 +
3 files changed, 19 insertions(+)

diff --git a/drivers/base/property.c b/drivers/base/property.c
index 453918eb7390..7b70cb04531c 100644
--- a/drivers/base/property.c
+++ b/drivers/base/property.c
@@ -845,6 +845,22 @@ bool fwnode_device_is_available(const struct fwnode_handle *fwnode)
}
EXPORT_SYMBOL_GPL(fwnode_device_is_available);

+/**
+ * fwnode_device_is_reserved - check if a device is reserved for use
+ * @fwnode: Pointer to the fwnode of the device.
+ *
+ * For fwnode node types that don't implement the .device_is_reserved()
+ * operation, this function returns false.
+ */
+bool fwnode_device_is_reserved(const struct fwnode_handle *fwnode)
+{
+ if (!fwnode_has_op(fwnode, device_is_reserved))
+ return false;
+
+ return fwnode_call_bool_op(fwnode, device_is_reserved);
+}
+EXPORT_SYMBOL_GPL(fwnode_device_is_reserved);
+
/**
* device_get_child_node_count - return the number of child nodes for device
* @dev: Device to cound the child nodes for
diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
index 9f4ad719bfe3..1a9aefbe12f1 100644
--- a/include/linux/fwnode.h
+++ b/include/linux/fwnode.h
@@ -86,6 +86,7 @@ struct fwnode_reference_args {
* @get: Get a reference to an fwnode.
* @put: Put a reference to an fwnode.
* @device_is_available: Return true if the device is available.
+ * @device_is_reserved: Return true if the device is reserved.
* @device_get_match_data: Return the device driver match data.
* @property_present: Return true if a property is present.
* @property_read_int_array: Read an array of integer properties. Return zero on
@@ -110,6 +111,7 @@ struct fwnode_operations {
struct fwnode_handle *(*get)(struct fwnode_handle *fwnode);
void (*put)(struct fwnode_handle *fwnode);
bool (*device_is_available)(const struct fwnode_handle *fwnode);
+ bool (*device_is_reserved)(const struct fwnode_handle *fwnode);
const void *(*device_get_match_data)(const struct fwnode_handle *fwnode,
const struct device *dev);
bool (*property_present)(const struct fwnode_handle *fwnode,
diff --git a/include/linux/property.h b/include/linux/property.h
index 357513a977e5..455b022fa612 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -50,6 +50,7 @@ int device_property_match_string(struct device *dev,
const char *propname, const char *string);

bool fwnode_device_is_available(const struct fwnode_handle *fwnode);
+bool fwnode_device_is_reserved(const struct fwnode_handle *fwnode);
bool fwnode_property_present(const struct fwnode_handle *fwnode,
const char *propname);
int fwnode_property_read_u8_array(const struct fwnode_handle *fwnode,
--
2.33.1

2021-10-22 02:03:39

by Zev Weiss

[permalink] [raw]
Subject: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

Devices whose fwnodes are marked as reserved are instantiated, but
will not have a driver bound to them unless userspace explicitly
requests it by writing to a 'bind' sysfs file. This is to enable
devices that may require special (userspace-mediated) preparation
before a driver can safely probe them.

Signed-off-by: Zev Weiss <[email protected]>
---
drivers/base/bus.c | 2 +-
drivers/base/dd.c | 13 ++++++++-----
drivers/dma/idxd/compat.c | 3 +--
drivers/vfio/mdev/mdev_core.c | 2 +-
include/linux/device.h | 14 +++++++++++++-
5 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index bdc98c5713d5..8a30f9a02de0 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -211,7 +211,7 @@ static ssize_t bind_store(struct device_driver *drv, const char *buf,

dev = bus_find_device_by_name(bus, NULL, buf);
if (dev && driver_match_device(drv, dev)) {
- err = device_driver_attach(drv, dev);
+ err = device_driver_attach(drv, dev, DRV_BIND_EXPLICIT);
if (!err) {
/* success */
err = count;
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 68ea1f949daa..ee64740a63d9 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -22,6 +22,7 @@
#include <linux/dma-map-ops.h>
#include <linux/init.h>
#include <linux/module.h>
+#include <linux/property.h>
#include <linux/kthread.h>
#include <linux/wait.h>
#include <linux/async.h>
@@ -727,13 +728,14 @@ void wait_for_device_probe(void)
}
EXPORT_SYMBOL_GPL(wait_for_device_probe);

-static int __driver_probe_device(struct device_driver *drv, struct device *dev)
+static int __driver_probe_device(struct device_driver *drv, struct device *dev, u32 flags)
{
int ret = 0;

if (dev->p->dead || !device_is_registered(dev))
return -ENODEV;
- if (dev->driver)
+ if (dev->driver ||
+ (fwnode_device_is_reserved(dev->fwnode) && !(flags & DRV_BIND_EXPLICIT)))
return -EBUSY;

dev->can_match = true;
@@ -778,7 +780,7 @@ static int driver_probe_device(struct device_driver *drv, struct device *dev)
int ret;

atomic_inc(&probe_count);
- ret = __driver_probe_device(drv, dev);
+ ret = __driver_probe_device(drv, dev, DRV_BIND_DEFAULT);
if (ret == -EPROBE_DEFER || ret == EPROBE_DEFER) {
driver_deferred_probe_add(dev);

@@ -1052,16 +1054,17 @@ static void __device_driver_unlock(struct device *dev, struct device *parent)
* device_driver_attach - attach a specific driver to a specific device
* @drv: Driver to attach
* @dev: Device to attach it to
+ * @flags: Bitmask of DRV_BIND_* flags
*
* Manually attach driver to a device. Will acquire both @dev lock and
* @dev->parent lock if needed. Returns 0 on success, -ERR on failure.
*/
-int device_driver_attach(struct device_driver *drv, struct device *dev)
+int device_driver_attach(struct device_driver *drv, struct device *dev, u32 flags)
{
int ret;

__device_driver_lock(dev, dev->parent);
- ret = __driver_probe_device(drv, dev);
+ ret = __driver_probe_device(drv, dev, flags);
__device_driver_unlock(dev, dev->parent);

/* also return probe errors as normal negative errnos */
diff --git a/drivers/dma/idxd/compat.c b/drivers/dma/idxd/compat.c
index 3df21615f888..51df38dea15a 100644
--- a/drivers/dma/idxd/compat.c
+++ b/drivers/dma/idxd/compat.c
@@ -7,7 +7,6 @@
#include <linux/device/bus.h>
#include "idxd.h"

-extern int device_driver_attach(struct device_driver *drv, struct device *dev);
extern void device_driver_detach(struct device *dev);

#define DRIVER_ATTR_IGNORE_LOCKDEP(_name, _mode, _show, _store) \
@@ -56,7 +55,7 @@ static ssize_t bind_store(struct device_driver *drv, const char *buf, size_t cou
if (!alt_drv)
return -ENODEV;

- rc = device_driver_attach(alt_drv, dev);
+ rc = device_driver_attach(alt_drv, dev, DRV_BIND_EXPLICIT);
if (rc < 0)
return rc;

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b314101237fe..f42c6ec543c8 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -309,7 +309,7 @@ int mdev_device_create(struct mdev_type *type, const guid_t *uuid)

if (!drv)
drv = &vfio_mdev_driver;
- ret = device_driver_attach(&drv->driver, &mdev->dev);
+ ret = device_driver_attach(&drv->driver, &mdev->dev, DRV_BIND_DEFAULT);
if (ret)
goto out_del;

diff --git a/include/linux/device.h b/include/linux/device.h
index e270cb740b9e..1ada1093799b 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -876,12 +876,24 @@ static inline void *dev_get_platdata(const struct device *dev)
return dev->platform_data;
}

+/*
+ * Driver-binding flags (for passing to device_driver_attach())
+ *
+ * DRV_BIND_DEFAULT: a default, automatic bind, e.g. as a result of a device
+ * being added for which we already have a driver, or vice
+ * versa.
+ * DRV_BIND_EXPLICIT: an explicit, userspace-requested driver bind, e.g. as a
+ * result of a write to /sys/bus/.../drivers/.../bind
+ */
+#define DRV_BIND_DEFAULT 0
+#define DRV_BIND_EXPLICIT BIT(0)
+
/*
* Manual binding of a device to driver. See drivers/base/bus.c
* for information on use.
*/
int __must_check device_driver_attach(struct device_driver *drv,
- struct device *dev);
+ struct device *dev, u32 flags);
int __must_check device_bind_driver(struct device *dev);
void device_release_driver(struct device *dev);
int __must_check device_attach(struct device *dev);
--
2.33.1

2021-10-22 02:03:40

by Zev Weiss

[permalink] [raw]
Subject: [PATCH 3/5] of: property: add support for fwnode_device_is_reserved()

This is just a simple pass-through to of_device_is_reserved().

Signed-off-by: Zev Weiss <[email protected]>
---
drivers/of/property.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/of/property.c b/drivers/of/property.c
index a3483484a5a2..67a8cb7ac462 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -872,6 +872,11 @@ static bool of_fwnode_device_is_available(const struct fwnode_handle *fwnode)
return of_device_is_available(to_of_node(fwnode));
}

+static bool of_fwnode_device_is_reserved(const struct fwnode_handle *fwnode)
+{
+ return of_device_is_reserved(to_of_node(fwnode));
+}
+
static bool of_fwnode_property_present(const struct fwnode_handle *fwnode,
const char *propname)
{
@@ -1458,6 +1463,7 @@ const struct fwnode_operations of_fwnode_ops = {
.get = of_fwnode_get,
.put = of_fwnode_put,
.device_is_available = of_fwnode_device_is_available,
+ .device_is_reserved = of_fwnode_device_is_reserved,
.device_get_match_data = of_fwnode_device_get_match_data,
.property_present = of_fwnode_property_present,
.property_read_int_array = of_fwnode_property_read_int_array,
--
2.33.1

2021-10-22 02:04:06

by Zev Weiss

[permalink] [raw]
Subject: [PATCH 5/5] of: platform: instantiate reserved devices

Devices with a "reserved" status will now be passed through to
of_device_add() along with "okay" ones so that the driver core is
aware of their existence and drivers can be explicitly bound to them
when appropriate.

Signed-off-by: Zev Weiss <[email protected]>
---
drivers/of/platform.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 74afbb7a4f5e..060e1e9058c2 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -170,7 +170,7 @@ static struct platform_device *of_platform_device_create_pdata(
{
struct platform_device *dev;

- if (!of_device_is_available(np) ||
+ if ((!of_device_is_available(np) && !of_device_is_reserved(np)) ||
of_node_test_and_set_flag(np, OF_POPULATED))
return NULL;

--
2.33.1

2021-10-22 03:00:50

by Rob Herring

[permalink] [raw]
Subject: Re: [PATCH 0/5] driver core, of: support for reserved devices

On Thu, Oct 21, 2021 at 9:00 PM Zev Weiss <[email protected]> wrote:
>
> Hello all,
>
> This series is another incarnation of a couple other patchsets I've
> posted recently [0, 1], but again different enough in overall
> structure that I'm not sure it's exactly a v2 (or v3).
>
> As compared to [1], it abandons the writable binary sysfs files and at
> Frank's suggestion returns to an approach more akin to [0], though
> without any driver-specific (aspeed-smc) changes, which I figure might
> as well be done later in a separate series once appropriate
> infrastructure is in place.

I skimmed this, and overall I like the approach.

> The basic idea is to implement support for a status property value
> that's documented in the DT spec [2], but thus far not used at all in
> the kernel (or anywhere else I'm aware of): "reserved". According to
> the spec (section 2.3.4, Table 2.4), this status:
>
> Indicates that the device is operational, but should not be used.
> Typically this is used for devices that are controlled by another
> software component, such as platform firmware.
>
> With these changes, devices marked as reserved are (at least in some
> cases, more on this later) instantiated, but will not have drivers
> bound to them unless and until userspace explicitly requests it by
> writing the device's name to the driver's sysfs 'bind' file. This
> enables appropriate handling of hardware arrangements that can arise
> in contexts like OpenBMC, where a device may be shared with another
> external controller not under the kernel's control (for example, the
> flash chip storing the host CPU's firmware, shared by the BMC and the
> host CPU and exclusively under the control of the latter by default).
> Such a device can be marked as reserved so that the kernel refrains
> from touching it until appropriate preparatory steps have been taken
> (e.g. BMC userspace coordinating with the host CPU to arbitrate which
> processor has control of the firmware flash).
>
> Patches 1-3 provide some basic plumbing for checking the "reserved"
> status of a device, patch 4 is the main driver-core change, and patch
> 5 tweaks the OF platform code to not skip reserved devices so that
> they can actually be instantiated.
>
> One shortcoming of this series is that it doesn't automatically apply
> universally across all busses and drivers -- patch 5 enables support
> for platform devices, but similar changes would be required for
> support in other busses (e.g. in of_register_spi_devices(),
> of_i2c_register_devices(), etc.) and drivers that instantiate DT
> devices. Since at present a "reserved" status is treated as
> equivalent to "disabled" and this series preserves that status quo in
> those cases I'd hope this wouldn't be considered a deal-breaker, but
> a thing to be aware of at least.
>
> Greg: I know on [1] you had commented nack-ing the addition of boolean
> function parameters; patch 4 adds a flags mask instead in an analogous
> situation. I'm not certain how much of an improvement you'd consider
> that (hopefully at least slightly better, in that the arguments passed
> at the call site are more self-explanatory); if that's still
> unsatisfactory I'd welcome any suggested alternatives.

Can't we add a flag bit in struct device to reflect manual binding?
bind will set it and unbind clears it.

Rob

2021-10-22 03:15:58

by Zev Weiss

[permalink] [raw]
Subject: Re: [PATCH 0/5] driver core, of: support for reserved devices

On Thu, Oct 21, 2021 at 07:58:46PM PDT, Rob Herring wrote:
>On Thu, Oct 21, 2021 at 9:00 PM Zev Weiss <[email protected]> wrote:
>>
>> Hello all,
>>
>> This series is another incarnation of a couple other patchsets I've
>> posted recently [0, 1], but again different enough in overall
>> structure that I'm not sure it's exactly a v2 (or v3).
>>
>> As compared to [1], it abandons the writable binary sysfs files and at
>> Frank's suggestion returns to an approach more akin to [0], though
>> without any driver-specific (aspeed-smc) changes, which I figure might
>> as well be done later in a separate series once appropriate
>> infrastructure is in place.
>
>I skimmed this, and overall I like the approach.
>
>> The basic idea is to implement support for a status property value
>> that's documented in the DT spec [2], but thus far not used at all in
>> the kernel (or anywhere else I'm aware of): "reserved". According to
>> the spec (section 2.3.4, Table 2.4), this status:
>>
>> Indicates that the device is operational, but should not be used.
>> Typically this is used for devices that are controlled by another
>> software component, such as platform firmware.
>>
>> With these changes, devices marked as reserved are (at least in some
>> cases, more on this later) instantiated, but will not have drivers
>> bound to them unless and until userspace explicitly requests it by
>> writing the device's name to the driver's sysfs 'bind' file. This
>> enables appropriate handling of hardware arrangements that can arise
>> in contexts like OpenBMC, where a device may be shared with another
>> external controller not under the kernel's control (for example, the
>> flash chip storing the host CPU's firmware, shared by the BMC and the
>> host CPU and exclusively under the control of the latter by default).
>> Such a device can be marked as reserved so that the kernel refrains
>> from touching it until appropriate preparatory steps have been taken
>> (e.g. BMC userspace coordinating with the host CPU to arbitrate which
>> processor has control of the firmware flash).
>>
>> Patches 1-3 provide some basic plumbing for checking the "reserved"
>> status of a device, patch 4 is the main driver-core change, and patch
>> 5 tweaks the OF platform code to not skip reserved devices so that
>> they can actually be instantiated.
>>
>> One shortcoming of this series is that it doesn't automatically apply
>> universally across all busses and drivers -- patch 5 enables support
>> for platform devices, but similar changes would be required for
>> support in other busses (e.g. in of_register_spi_devices(),
>> of_i2c_register_devices(), etc.) and drivers that instantiate DT
>> devices. Since at present a "reserved" status is treated as
>> equivalent to "disabled" and this series preserves that status quo in
>> those cases I'd hope this wouldn't be considered a deal-breaker, but
>> a thing to be aware of at least.
>>
>> Greg: I know on [1] you had commented nack-ing the addition of boolean
>> function parameters; patch 4 adds a flags mask instead in an analogous
>> situation. I'm not certain how much of an improvement you'd consider
>> that (hopefully at least slightly better, in that the arguments passed
>> at the call site are more self-explanatory); if that's still
>> unsatisfactory I'd welcome any suggested alternatives.
>
>Can't we add a flag bit in struct device to reflect manual binding?
>bind will set it and unbind clears it.
>

I considered this (and actually drafted up a version that did exactly
that), but it seemed like turning a parameter-passing problem into a
state-maintenance problem (finding all the places that flag would need
to be cleared and ensuring newly-added ones don't get missed, which
unlike a function parameter the compiler can't really check for us).
Given that the live range (definition to use) of that value is entirely
within the codepath of a single call-chain (bind_store() ->
device_driver_attach() -> __driver_probe_device()), continuing to
maintain that state beyond that call chain seemed like unnecessary
complexity to me, but if there's a consensus that that would actually be
preferable I can certainly do it that way instead.


Zev

2021-10-22 06:48:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:
> Devices whose fwnodes are marked as reserved are instantiated, but
> will not have a driver bound to them unless userspace explicitly
> requests it by writing to a 'bind' sysfs file. This is to enable
> devices that may require special (userspace-mediated) preparation
> before a driver can safely probe them.
>
> Signed-off-by: Zev Weiss <[email protected]>
> ---
> drivers/base/bus.c | 2 +-
> drivers/base/dd.c | 13 ++++++++-----
> drivers/dma/idxd/compat.c | 3 +--
> drivers/vfio/mdev/mdev_core.c | 2 +-
> include/linux/device.h | 14 +++++++++++++-
> 5 files changed, 24 insertions(+), 10 deletions(-)

Ugh, no, I don't really want to add yet-another-state to the driver core
like this. Why are these devices even in the kernel with a driver that
wants to bind to them registered if the driver somehow should NOT be
bound to it? Shouldn't all of that logic be in the crazy driver itself
as that is a very rare and odd thing to do that the driver core should
not care about at all.

And why does a device need userspace interaction at all? Again, why
would the driver not know about this and handle it all directly?

thanks,

greg k-h

2021-10-22 06:52:53

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 0/5] driver core, of: support for reserved devices

On Thu, Oct 21, 2021 at 09:58:46PM -0500, Rob Herring wrote:
> On Thu, Oct 21, 2021 at 9:00 PM Zev Weiss <[email protected]> wrote:
> >
> > Hello all,
> >
> > This series is another incarnation of a couple other patchsets I've
> > posted recently [0, 1], but again different enough in overall
> > structure that I'm not sure it's exactly a v2 (or v3).
> >
> > As compared to [1], it abandons the writable binary sysfs files and at
> > Frank's suggestion returns to an approach more akin to [0], though
> > without any driver-specific (aspeed-smc) changes, which I figure might
> > as well be done later in a separate series once appropriate
> > infrastructure is in place.
>
> I skimmed this, and overall I like the approach.
>
> > The basic idea is to implement support for a status property value
> > that's documented in the DT spec [2], but thus far not used at all in
> > the kernel (or anywhere else I'm aware of): "reserved". According to
> > the spec (section 2.3.4, Table 2.4), this status:
> >
> > Indicates that the device is operational, but should not be used.
> > Typically this is used for devices that are controlled by another
> > software component, such as platform firmware.
> >
> > With these changes, devices marked as reserved are (at least in some
> > cases, more on this later) instantiated, but will not have drivers
> > bound to them unless and until userspace explicitly requests it by
> > writing the device's name to the driver's sysfs 'bind' file. This
> > enables appropriate handling of hardware arrangements that can arise
> > in contexts like OpenBMC, where a device may be shared with another
> > external controller not under the kernel's control (for example, the
> > flash chip storing the host CPU's firmware, shared by the BMC and the
> > host CPU and exclusively under the control of the latter by default).
> > Such a device can be marked as reserved so that the kernel refrains
> > from touching it until appropriate preparatory steps have been taken
> > (e.g. BMC userspace coordinating with the host CPU to arbitrate which
> > processor has control of the firmware flash).
> >
> > Patches 1-3 provide some basic plumbing for checking the "reserved"
> > status of a device, patch 4 is the main driver-core change, and patch
> > 5 tweaks the OF platform code to not skip reserved devices so that
> > they can actually be instantiated.
> >
> > One shortcoming of this series is that it doesn't automatically apply
> > universally across all busses and drivers -- patch 5 enables support
> > for platform devices, but similar changes would be required for
> > support in other busses (e.g. in of_register_spi_devices(),
> > of_i2c_register_devices(), etc.) and drivers that instantiate DT
> > devices. Since at present a "reserved" status is treated as
> > equivalent to "disabled" and this series preserves that status quo in
> > those cases I'd hope this wouldn't be considered a deal-breaker, but
> > a thing to be aware of at least.
> >
> > Greg: I know on [1] you had commented nack-ing the addition of boolean
> > function parameters; patch 4 adds a flags mask instead in an analogous
> > situation. I'm not certain how much of an improvement you'd consider
> > that (hopefully at least slightly better, in that the arguments passed
> > at the call site are more self-explanatory); if that's still
> > unsatisfactory I'd welcome any suggested alternatives.
>
> Can't we add a flag bit in struct device to reflect manual binding?
> bind will set it and unbind clears it.

The driver core does not "know" the difference betwen manual and not
manual binding, nor should it. That's up to the bus.

thanks,

greg k-h

2021-10-22 06:52:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 0/5] driver core, of: support for reserved devices

On Thu, Oct 21, 2021 at 07:00:27PM -0700, Zev Weiss wrote:
> Hello all,
>
> This series is another incarnation of a couple other patchsets I've
> posted recently [0, 1], but again different enough in overall
> structure that I'm not sure it's exactly a v2 (or v3).
>
> As compared to [1], it abandons the writable binary sysfs files and at
> Frank's suggestion returns to an approach more akin to [0], though
> without any driver-specific (aspeed-smc) changes, which I figure might
> as well be done later in a separate series once appropriate
> infrastructure is in place.
>
> The basic idea is to implement support for a status property value
> that's documented in the DT spec [2], but thus far not used at all in
> the kernel (or anywhere else I'm aware of): "reserved". According to
> the spec (section 2.3.4, Table 2.4), this status:
>
> Indicates that the device is operational, but should not be used.
> Typically this is used for devices that are controlled by another
> software component, such as platform firmware.
>
> With these changes, devices marked as reserved are (at least in some
> cases, more on this later) instantiated, but will not have drivers
> bound to them unless and until userspace explicitly requests it by
> writing the device's name to the driver's sysfs 'bind' file. This
> enables appropriate handling of hardware arrangements that can arise
> in contexts like OpenBMC, where a device may be shared with another
> external controller not under the kernel's control (for example, the
> flash chip storing the host CPU's firmware, shared by the BMC and the
> host CPU and exclusively under the control of the latter by default).
> Such a device can be marked as reserved so that the kernel refrains
> from touching it until appropriate preparatory steps have been taken
> (e.g. BMC userspace coordinating with the host CPU to arbitrate which
> processor has control of the firmware flash).
>
> Patches 1-3 provide some basic plumbing for checking the "reserved"
> status of a device, patch 4 is the main driver-core change, and patch
> 5 tweaks the OF platform code to not skip reserved devices so that
> they can actually be instantiated.

Again, the driver core should not care about this, that is up to the bus
that wants to read these "reserved" values and do something with them or
not (remember the bus is the thing that does the binding, not the driver
core).

But are you sure you are using the "reserved" field properly? You are
wanting to do "something" to the device to later on be able to then have
the kernel touch the device, while it seems that the reason for this
field is for the kernel to NEVER touch the device at all. What will
break if you change this logic?

> One shortcoming of this series is that it doesn't automatically apply
> universally across all busses and drivers -- patch 5 enables support
> for platform devices, but similar changes would be required for
> support in other busses (e.g. in of_register_spi_devices(),
> of_i2c_register_devices(), etc.) and drivers that instantiate DT
> devices. Since at present a "reserved" status is treated as
> equivalent to "disabled" and this series preserves that status quo in
> those cases I'd hope this wouldn't be considered a deal-breaker, but
> a thing to be aware of at least.
>
> Greg: I know on [1] you had commented nack-ing the addition of boolean
> function parameters; patch 4 adds a flags mask instead in an analogous
> situation. I'm not certain how much of an improvement you'd consider
> that (hopefully at least slightly better, in that the arguments passed
> at the call site are more self-explanatory); if that's still
> unsatisfactory I'd welcome any suggested alternatives.

Flags are a bit better, yes, but still I do not think this is the right
way to go here, see my comments on the patches...

thanks,

greg k-h

2021-10-22 08:35:07

by Zev Weiss

[permalink] [raw]
Subject: Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

On Thu, Oct 21, 2021 at 11:46:56PM PDT, Greg Kroah-Hartman wrote:
>On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:
>> Devices whose fwnodes are marked as reserved are instantiated, but
>> will not have a driver bound to them unless userspace explicitly
>> requests it by writing to a 'bind' sysfs file. This is to enable
>> devices that may require special (userspace-mediated) preparation
>> before a driver can safely probe them.
>>
>> Signed-off-by: Zev Weiss <[email protected]>
>> ---
>> drivers/base/bus.c | 2 +-
>> drivers/base/dd.c | 13 ++++++++-----
>> drivers/dma/idxd/compat.c | 3 +--
>> drivers/vfio/mdev/mdev_core.c | 2 +-
>> include/linux/device.h | 14 +++++++++++++-
>> 5 files changed, 24 insertions(+), 10 deletions(-)
>
>Ugh, no, I don't really want to add yet-another-state to the driver core
>like this. Why are these devices even in the kernel with a driver that
>wants to bind to them registered if the driver somehow should NOT be
>bound to it? Shouldn't all of that logic be in the crazy driver itself
>as that is a very rare and odd thing to do that the driver core should
>not care about at all.
>
>And why does a device need userspace interaction at all? Again, why
>would the driver not know about this and handle it all directly?
>

Let me expand a bit more on the details of the specific situation I'm
dealing with...

On a server motherboard we've got a host CPU (Xeon, Epyc, POWER, etc.)
and a baseboard management controller, or BMC (typically an ARM SoC, an
ASPEED AST2500 in my case). The host CPU's firmware (BIOS/UEFI, ME
firmware, etc.) lives in a SPI flash chip. Because it's the host's
firmware, that flash chip is connected to and generally (by default)
under the control of the host CPU.

But we also want the BMC to be able to perform out-of-band updates to
the host's firmware, so the flash is *also* connected to the BMC.
There's an external mux (controlled by a GPIO output driven by the BMC)
that switches which processor (host or BMC) is actually driving the SPI
signals to the flash chip, but there's a bunch of other stuff that's
also required before the BMC can flip that switch and take control of
the SPI interface:

- the BMC needs to track (and potentially alter) the host's power state
to ensure it's not running (in OpenBMC the existing logic for this is
an entire non-trivial userspace daemon unto itself)

- it needs to twiddle some other GPIOs to put the ME into recovery mode

- it needs to exchange some IPMI messages with the ME to confirm it got
into recovery mode

(Some of the details here are specific to the particular motherboard I'm
working with, but I'd guess other systems probably have broadly similar
requirements.)

The firmware flash (or at least the BMC's side of the mux in front of
it) is attached to a spi-nor controller that's well supported by an
existing MTD driver (aspeed-smc), but that driver can't safely probe the
chip until all the stuff described above has been done. In particular,
this means we can't reasonably bind the driver to that device during the
normal device-discovery/driver-binding done in the BMC's boot process
(nor do we want to, as that would pull the rug out from under the
running host). We basically only ever want to touch that SPI interface
when a user (sysadmin using the BMC, let's say) has explicitly initiated
an out-of-band firmware update.

So we want the kernel to be aware of the device's existence (so that we
*can* bind a driver to it when needed), but we don't want it touching
the device unless we really ask for it.

Does that help clarify the motivation for wanting this functionality?


Thanks,
Zev

2021-10-22 09:00:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

On Fri, Oct 22, 2021 at 01:32:32AM -0700, Zev Weiss wrote:
> On Thu, Oct 21, 2021 at 11:46:56PM PDT, Greg Kroah-Hartman wrote:
> > On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:
> > > Devices whose fwnodes are marked as reserved are instantiated, but
> > > will not have a driver bound to them unless userspace explicitly
> > > requests it by writing to a 'bind' sysfs file. This is to enable
> > > devices that may require special (userspace-mediated) preparation
> > > before a driver can safely probe them.
> > >
> > > Signed-off-by: Zev Weiss <[email protected]>
> > > ---
> > > drivers/base/bus.c | 2 +-
> > > drivers/base/dd.c | 13 ++++++++-----
> > > drivers/dma/idxd/compat.c | 3 +--
> > > drivers/vfio/mdev/mdev_core.c | 2 +-
> > > include/linux/device.h | 14 +++++++++++++-
> > > 5 files changed, 24 insertions(+), 10 deletions(-)
> >
> > Ugh, no, I don't really want to add yet-another-state to the driver core
> > like this. Why are these devices even in the kernel with a driver that
> > wants to bind to them registered if the driver somehow should NOT be
> > bound to it? Shouldn't all of that logic be in the crazy driver itself
> > as that is a very rare and odd thing to do that the driver core should
> > not care about at all.
> >
> > And why does a device need userspace interaction at all? Again, why
> > would the driver not know about this and handle it all directly?
> >
>
> Let me expand a bit more on the details of the specific situation I'm
> dealing with...
>
> On a server motherboard we've got a host CPU (Xeon, Epyc, POWER, etc.) and a
> baseboard management controller, or BMC (typically an ARM SoC, an ASPEED
> AST2500 in my case). The host CPU's firmware (BIOS/UEFI, ME firmware, etc.)
> lives in a SPI flash chip. Because it's the host's firmware, that flash
> chip is connected to and generally (by default) under the control of the
> host CPU.
>
> But we also want the BMC to be able to perform out-of-band updates to the
> host's firmware, so the flash is *also* connected to the BMC. There's an
> external mux (controlled by a GPIO output driven by the BMC) that switches
> which processor (host or BMC) is actually driving the SPI signals to the
> flash chip, but there's a bunch of other stuff that's also required before
> the BMC can flip that switch and take control of the SPI interface:
>
> - the BMC needs to track (and potentially alter) the host's power state
> to ensure it's not running (in OpenBMC the existing logic for this is an
> entire non-trivial userspace daemon unto itself)
>
> - it needs to twiddle some other GPIOs to put the ME into recovery mode
>
> - it needs to exchange some IPMI messages with the ME to confirm it got
> into recovery mode
>
> (Some of the details here are specific to the particular motherboard I'm
> working with, but I'd guess other systems probably have broadly similar
> requirements.)
>
> The firmware flash (or at least the BMC's side of the mux in front of it) is
> attached to a spi-nor controller that's well supported by an existing MTD
> driver (aspeed-smc), but that driver can't safely probe the chip until all
> the stuff described above has been done. In particular, this means we can't
> reasonably bind the driver to that device during the normal
> device-discovery/driver-binding done in the BMC's boot process (nor do we
> want to, as that would pull the rug out from under the running host). We
> basically only ever want to touch that SPI interface when a user (sysadmin
> using the BMC, let's say) has explicitly initiated an out-of-band firmware
> update.
>
> So we want the kernel to be aware of the device's existence (so that we
> *can* bind a driver to it when needed), but we don't want it touching the
> device unless we really ask for it.
>
> Does that help clarify the motivation for wanting this functionality?

Sure, then just do this type of thing in the driver itself. Do not have
any matching "ids" for this hardware it so that the bus will never call
the probe function for this hardware _until_ a manual write happens to
the driver's "bind" sysfs file.

Then when userspace is done, do a "unbind" write.

No driver core changes should be needed at all here.

thanks,

greg k-h

2021-10-22 09:03:23

by Zev Weiss

[permalink] [raw]
Subject: Re: [PATCH 0/5] driver core, of: support for reserved devices

On Thu, Oct 21, 2021 at 11:50:07PM PDT, Greg Kroah-Hartman wrote:
>On Thu, Oct 21, 2021 at 07:00:27PM -0700, Zev Weiss wrote:
>> Hello all,
>>
>> This series is another incarnation of a couple other patchsets I've
>> posted recently [0, 1], but again different enough in overall
>> structure that I'm not sure it's exactly a v2 (or v3).
>>
>> As compared to [1], it abandons the writable binary sysfs files and at
>> Frank's suggestion returns to an approach more akin to [0], though
>> without any driver-specific (aspeed-smc) changes, which I figure might
>> as well be done later in a separate series once appropriate
>> infrastructure is in place.
>>
>> The basic idea is to implement support for a status property value
>> that's documented in the DT spec [2], but thus far not used at all in
>> the kernel (or anywhere else I'm aware of): "reserved". According to
>> the spec (section 2.3.4, Table 2.4), this status:
>>
>> Indicates that the device is operational, but should not be used.
>> Typically this is used for devices that are controlled by another
>> software component, such as platform firmware.
>>
>> With these changes, devices marked as reserved are (at least in some
>> cases, more on this later) instantiated, but will not have drivers
>> bound to them unless and until userspace explicitly requests it by
>> writing the device's name to the driver's sysfs 'bind' file. This
>> enables appropriate handling of hardware arrangements that can arise
>> in contexts like OpenBMC, where a device may be shared with another
>> external controller not under the kernel's control (for example, the
>> flash chip storing the host CPU's firmware, shared by the BMC and the
>> host CPU and exclusively under the control of the latter by default).
>> Such a device can be marked as reserved so that the kernel refrains
>> from touching it until appropriate preparatory steps have been taken
>> (e.g. BMC userspace coordinating with the host CPU to arbitrate which
>> processor has control of the firmware flash).
>>
>> Patches 1-3 provide some basic plumbing for checking the "reserved"
>> status of a device, patch 4 is the main driver-core change, and patch
>> 5 tweaks the OF platform code to not skip reserved devices so that
>> they can actually be instantiated.
>
>Again, the driver core should not care about this, that is up to the bus
>that wants to read these "reserved" values and do something with them or
>not (remember the bus is the thing that does the binding, not the driver
>core).
>
>But are you sure you are using the "reserved" field properly?

Well, thus far both Rob Herring and Oliver O'Halloran (originator of the
"reserved" status in the DT spec, whom I probably should have CCed
earlier, sorry) have seemed receptive to this interpretation of it,
which I'd hope would lend it some credence.

>You are
>wanting to do "something" to the device to later on be able to then have
>the kernel touch the device, while it seems that the reason for this
>field is for the kernel to NEVER touch the device at all. What will
>break if you change this logic?

Given that there's no existing usage of or support for this status value
anywhere I can see in the kernel, and that Oliver has indicated that it
should be compatible with usage in OpenPower platform firmware, my
expectation would certainly be that nothing would break, but if there
are examples of things that could I'd be interested to see them.


Thanks,
Zev

2021-10-22 09:23:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 0/5] driver core, of: support for reserved devices

On Fri, Oct 22, 2021 at 02:00:57AM -0700, Zev Weiss wrote:
> On Thu, Oct 21, 2021 at 11:50:07PM PDT, Greg Kroah-Hartman wrote:
> > On Thu, Oct 21, 2021 at 07:00:27PM -0700, Zev Weiss wrote:
> > > Hello all,
> > >
> > > This series is another incarnation of a couple other patchsets I've
> > > posted recently [0, 1], but again different enough in overall
> > > structure that I'm not sure it's exactly a v2 (or v3).
> > >
> > > As compared to [1], it abandons the writable binary sysfs files and at
> > > Frank's suggestion returns to an approach more akin to [0], though
> > > without any driver-specific (aspeed-smc) changes, which I figure might
> > > as well be done later in a separate series once appropriate
> > > infrastructure is in place.
> > >
> > > The basic idea is to implement support for a status property value
> > > that's documented in the DT spec [2], but thus far not used at all in
> > > the kernel (or anywhere else I'm aware of): "reserved". According to
> > > the spec (section 2.3.4, Table 2.4), this status:
> > >
> > > Indicates that the device is operational, but should not be used.
> > > Typically this is used for devices that are controlled by another
> > > software component, such as platform firmware.
> > >
> > > With these changes, devices marked as reserved are (at least in some
> > > cases, more on this later) instantiated, but will not have drivers
> > > bound to them unless and until userspace explicitly requests it by
> > > writing the device's name to the driver's sysfs 'bind' file. This
> > > enables appropriate handling of hardware arrangements that can arise
> > > in contexts like OpenBMC, where a device may be shared with another
> > > external controller not under the kernel's control (for example, the
> > > flash chip storing the host CPU's firmware, shared by the BMC and the
> > > host CPU and exclusively under the control of the latter by default).
> > > Such a device can be marked as reserved so that the kernel refrains
> > > from touching it until appropriate preparatory steps have been taken
> > > (e.g. BMC userspace coordinating with the host CPU to arbitrate which
> > > processor has control of the firmware flash).
> > >
> > > Patches 1-3 provide some basic plumbing for checking the "reserved"
> > > status of a device, patch 4 is the main driver-core change, and patch
> > > 5 tweaks the OF platform code to not skip reserved devices so that
> > > they can actually be instantiated.
> >
> > Again, the driver core should not care about this, that is up to the bus
> > that wants to read these "reserved" values and do something with them or
> > not (remember the bus is the thing that does the binding, not the driver
> > core).
> >
> > But are you sure you are using the "reserved" field properly?
>
> Well, thus far both Rob Herring and Oliver O'Halloran (originator of the
> "reserved" status in the DT spec, whom I probably should have CCed earlier,
> sorry) have seemed receptive to this interpretation of it, which I'd hope
> would lend it some credence.

Ok, that's up to the DT people, I'll let you all fight it out with the
platform creators :)

Good luck!

greg k-h

2021-10-22 15:21:07

by Patrick Williams

[permalink] [raw]
Subject: Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

Hi Greg,

On Fri, Oct 22, 2021 at 10:57:21AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Oct 22, 2021 at 01:32:32AM -0700, Zev Weiss wrote:
> > On Thu, Oct 21, 2021 at 11:46:56PM PDT, Greg Kroah-Hartman wrote:
> > > On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:

> > So we want the kernel to be aware of the device's existence (so that we
> > *can* bind a driver to it when needed), but we don't want it touching the
> > device unless we really ask for it.
> >
> > Does that help clarify the motivation for wanting this functionality?
>
> Sure, then just do this type of thing in the driver itself. Do not have
> any matching "ids" for this hardware it so that the bus will never call
> the probe function for this hardware _until_ a manual write happens to
> the driver's "bind" sysfs file.

It sounds like you're suggesting a change to one particular driver to satisfy
this one particular case (and maybe I'm just not understanding your suggestion).
For a BMC, this is a pretty regular situation and not just as one-off as Zev's
example.

Another good example is where a system can have optional riser cards with a
whole tree of devices that might be on that riser card (and there might be
different variants of a riser card that could go in the same slot). Usually
there is an EEPROM of some sort at a well-known address that can be parsed to
identify which kind of riser card it is and then the appropriate sub-devices can
be enumerated. That EEPROM parsing is something that is currently done in
userspace due to the complexity and often vendor-specific nature of it.

Many of these devices require quite a bit more configuration information than
can be passed along a `bind` call. I believe it has been suggested previously
that this riser-card scenario could also be solved with dynamic loading of DT
snippets, but that support seems simple pretty far from being merged.

--
Patrick Williams


Attachments:
(No filename) (1.91 kB)
signature.asc (849.00 B)
Download all attachments

2021-10-22 16:29:01

by Zev Weiss

[permalink] [raw]
Subject: Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

On Fri, Oct 22, 2021 at 01:57:21AM PDT, Greg Kroah-Hartman wrote:
>On Fri, Oct 22, 2021 at 01:32:32AM -0700, Zev Weiss wrote:
>> On Thu, Oct 21, 2021 at 11:46:56PM PDT, Greg Kroah-Hartman wrote:
>> > On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:
>> > > Devices whose fwnodes are marked as reserved are instantiated, but
>> > > will not have a driver bound to them unless userspace explicitly
>> > > requests it by writing to a 'bind' sysfs file. This is to enable
>> > > devices that may require special (userspace-mediated) preparation
>> > > before a driver can safely probe them.
>> > >
>> > > Signed-off-by: Zev Weiss <[email protected]>
>> > > ---
>> > > drivers/base/bus.c | 2 +-
>> > > drivers/base/dd.c | 13 ++++++++-----
>> > > drivers/dma/idxd/compat.c | 3 +--
>> > > drivers/vfio/mdev/mdev_core.c | 2 +-
>> > > include/linux/device.h | 14 +++++++++++++-
>> > > 5 files changed, 24 insertions(+), 10 deletions(-)
>> >
>> > Ugh, no, I don't really want to add yet-another-state to the driver core
>> > like this. Why are these devices even in the kernel with a driver that
>> > wants to bind to them registered if the driver somehow should NOT be
>> > bound to it? Shouldn't all of that logic be in the crazy driver itself
>> > as that is a very rare and odd thing to do that the driver core should
>> > not care about at all.
>> >
>> > And why does a device need userspace interaction at all? Again, why
>> > would the driver not know about this and handle it all directly?
>> >
>>
>> Let me expand a bit more on the details of the specific situation I'm
>> dealing with...
>>
>> On a server motherboard we've got a host CPU (Xeon, Epyc, POWER, etc.) and a
>> baseboard management controller, or BMC (typically an ARM SoC, an ASPEED
>> AST2500 in my case). The host CPU's firmware (BIOS/UEFI, ME firmware, etc.)
>> lives in a SPI flash chip. Because it's the host's firmware, that flash
>> chip is connected to and generally (by default) under the control of the
>> host CPU.
>>
>> But we also want the BMC to be able to perform out-of-band updates to the
>> host's firmware, so the flash is *also* connected to the BMC. There's an
>> external mux (controlled by a GPIO output driven by the BMC) that switches
>> which processor (host or BMC) is actually driving the SPI signals to the
>> flash chip, but there's a bunch of other stuff that's also required before
>> the BMC can flip that switch and take control of the SPI interface:
>>
>> - the BMC needs to track (and potentially alter) the host's power state
>> to ensure it's not running (in OpenBMC the existing logic for this is an
>> entire non-trivial userspace daemon unto itself)
>>
>> - it needs to twiddle some other GPIOs to put the ME into recovery mode
>>
>> - it needs to exchange some IPMI messages with the ME to confirm it got
>> into recovery mode
>>
>> (Some of the details here are specific to the particular motherboard I'm
>> working with, but I'd guess other systems probably have broadly similar
>> requirements.)
>>
>> The firmware flash (or at least the BMC's side of the mux in front of it) is
>> attached to a spi-nor controller that's well supported by an existing MTD
>> driver (aspeed-smc), but that driver can't safely probe the chip until all
>> the stuff described above has been done. In particular, this means we can't
>> reasonably bind the driver to that device during the normal
>> device-discovery/driver-binding done in the BMC's boot process (nor do we
>> want to, as that would pull the rug out from under the running host). We
>> basically only ever want to touch that SPI interface when a user (sysadmin
>> using the BMC, let's say) has explicitly initiated an out-of-band firmware
>> update.
>>
>> So we want the kernel to be aware of the device's existence (so that we
>> *can* bind a driver to it when needed), but we don't want it touching the
>> device unless we really ask for it.
>>
>> Does that help clarify the motivation for wanting this functionality?
>
>Sure, then just do this type of thing in the driver itself. Do not have
>any matching "ids" for this hardware it so that the bus will never call
>the probe function for this hardware _until_ a manual write happens to
>the driver's "bind" sysfs file.
>

Perhaps I'm misunderstanding what you're suggesting, but if I just
change the DT "compatible" string so that the device doesn't match the
driver and then try to manually bind it, the driver_match_device() check
in bind_store() prevents that manual bind from actually happening.


Thanks,
Zev

2021-10-23 08:57:36

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

On Fri, Oct 22, 2021 at 09:27:41AM -0700, Zev Weiss wrote:
> On Fri, Oct 22, 2021 at 01:57:21AM PDT, Greg Kroah-Hartman wrote:
> > On Fri, Oct 22, 2021 at 01:32:32AM -0700, Zev Weiss wrote:
> > > On Thu, Oct 21, 2021 at 11:46:56PM PDT, Greg Kroah-Hartman wrote:
> > > > On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:
> > > > > Devices whose fwnodes are marked as reserved are instantiated, but
> > > > > will not have a driver bound to them unless userspace explicitly
> > > > > requests it by writing to a 'bind' sysfs file. This is to enable
> > > > > devices that may require special (userspace-mediated) preparation
> > > > > before a driver can safely probe them.
> > > > >
> > > > > Signed-off-by: Zev Weiss <[email protected]>
> > > > > ---
> > > > > drivers/base/bus.c | 2 +-
> > > > > drivers/base/dd.c | 13 ++++++++-----
> > > > > drivers/dma/idxd/compat.c | 3 +--
> > > > > drivers/vfio/mdev/mdev_core.c | 2 +-
> > > > > include/linux/device.h | 14 +++++++++++++-
> > > > > 5 files changed, 24 insertions(+), 10 deletions(-)
> > > >
> > > > Ugh, no, I don't really want to add yet-another-state to the driver core
> > > > like this. Why are these devices even in the kernel with a driver that
> > > > wants to bind to them registered if the driver somehow should NOT be
> > > > bound to it? Shouldn't all of that logic be in the crazy driver itself
> > > > as that is a very rare and odd thing to do that the driver core should
> > > > not care about at all.
> > > >
> > > > And why does a device need userspace interaction at all? Again, why
> > > > would the driver not know about this and handle it all directly?
> > > >
> > >
> > > Let me expand a bit more on the details of the specific situation I'm
> > > dealing with...
> > >
> > > On a server motherboard we've got a host CPU (Xeon, Epyc, POWER, etc.) and a
> > > baseboard management controller, or BMC (typically an ARM SoC, an ASPEED
> > > AST2500 in my case). The host CPU's firmware (BIOS/UEFI, ME firmware, etc.)
> > > lives in a SPI flash chip. Because it's the host's firmware, that flash
> > > chip is connected to and generally (by default) under the control of the
> > > host CPU.
> > >
> > > But we also want the BMC to be able to perform out-of-band updates to the
> > > host's firmware, so the flash is *also* connected to the BMC. There's an
> > > external mux (controlled by a GPIO output driven by the BMC) that switches
> > > which processor (host or BMC) is actually driving the SPI signals to the
> > > flash chip, but there's a bunch of other stuff that's also required before
> > > the BMC can flip that switch and take control of the SPI interface:
> > >
> > > - the BMC needs to track (and potentially alter) the host's power state
> > > to ensure it's not running (in OpenBMC the existing logic for this is an
> > > entire non-trivial userspace daemon unto itself)
> > >
> > > - it needs to twiddle some other GPIOs to put the ME into recovery mode
> > >
> > > - it needs to exchange some IPMI messages with the ME to confirm it got
> > > into recovery mode
> > >
> > > (Some of the details here are specific to the particular motherboard I'm
> > > working with, but I'd guess other systems probably have broadly similar
> > > requirements.)
> > >
> > > The firmware flash (or at least the BMC's side of the mux in front of it) is
> > > attached to a spi-nor controller that's well supported by an existing MTD
> > > driver (aspeed-smc), but that driver can't safely probe the chip until all
> > > the stuff described above has been done. In particular, this means we can't
> > > reasonably bind the driver to that device during the normal
> > > device-discovery/driver-binding done in the BMC's boot process (nor do we
> > > want to, as that would pull the rug out from under the running host). We
> > > basically only ever want to touch that SPI interface when a user (sysadmin
> > > using the BMC, let's say) has explicitly initiated an out-of-band firmware
> > > update.
> > >
> > > So we want the kernel to be aware of the device's existence (so that we
> > > *can* bind a driver to it when needed), but we don't want it touching the
> > > device unless we really ask for it.
> > >
> > > Does that help clarify the motivation for wanting this functionality?
> >
> > Sure, then just do this type of thing in the driver itself. Do not have
> > any matching "ids" for this hardware it so that the bus will never call
> > the probe function for this hardware _until_ a manual write happens to
> > the driver's "bind" sysfs file.
> >
>
> Perhaps I'm misunderstanding what you're suggesting, but if I just change
> the DT "compatible" string so that the device doesn't match the driver and
> then try to manually bind it, the driver_match_device() check in
> bind_store() prevents that manual bind from actually happening.

Hm, I thought the bus had the ability to 'override' this somehow. The
bus does get the callback in driver_match_device() so maybe do the logic
in there? Somehow this works for other devices and busses, so there
must be a way it happens...

thanks,

greg k-h

2021-10-23 08:58:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

On Fri, Oct 22, 2021 at 10:18:11AM -0500, Patrick Williams wrote:
> Hi Greg,
>
> On Fri, Oct 22, 2021 at 10:57:21AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Oct 22, 2021 at 01:32:32AM -0700, Zev Weiss wrote:
> > > On Thu, Oct 21, 2021 at 11:46:56PM PDT, Greg Kroah-Hartman wrote:
> > > > On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:
>
> > > So we want the kernel to be aware of the device's existence (so that we
> > > *can* bind a driver to it when needed), but we don't want it touching the
> > > device unless we really ask for it.
> > >
> > > Does that help clarify the motivation for wanting this functionality?
> >
> > Sure, then just do this type of thing in the driver itself. Do not have
> > any matching "ids" for this hardware it so that the bus will never call
> > the probe function for this hardware _until_ a manual write happens to
> > the driver's "bind" sysfs file.
>
> It sounds like you're suggesting a change to one particular driver to satisfy
> this one particular case (and maybe I'm just not understanding your suggestion).
> For a BMC, this is a pretty regular situation and not just as one-off as Zev's
> example.
>
> Another good example is where a system can have optional riser cards with a
> whole tree of devices that might be on that riser card (and there might be
> different variants of a riser card that could go in the same slot). Usually
> there is an EEPROM of some sort at a well-known address that can be parsed to
> identify which kind of riser card it is and then the appropriate sub-devices can
> be enumerated. That EEPROM parsing is something that is currently done in
> userspace due to the complexity and often vendor-specific nature of it.
>
> Many of these devices require quite a bit more configuration information than
> can be passed along a `bind` call. I believe it has been suggested previously
> that this riser-card scenario could also be solved with dynamic loading of DT
> snippets, but that support seems simple pretty far from being merged.

Then work to get the DT code merged! Do not try to create
yet-another-way of doing things here if DT overlays is the correct
solution here (and it seems like it is.)

thanks,

greg k-h

2021-10-25 05:39:28

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

On 10/23/21 3:56 AM, Greg Kroah-Hartman wrote:
> On Fri, Oct 22, 2021 at 10:18:11AM -0500, Patrick Williams wrote:
>> Hi Greg,
>>
>> On Fri, Oct 22, 2021 at 10:57:21AM +0200, Greg Kroah-Hartman wrote:
>>> On Fri, Oct 22, 2021 at 01:32:32AM -0700, Zev Weiss wrote:
>>>> On Thu, Oct 21, 2021 at 11:46:56PM PDT, Greg Kroah-Hartman wrote:
>>>>> On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:
>>
>>>> So we want the kernel to be aware of the device's existence (so that we
>>>> *can* bind a driver to it when needed), but we don't want it touching the
>>>> device unless we really ask for it.
>>>>
>>>> Does that help clarify the motivation for wanting this functionality?
>>>
>>> Sure, then just do this type of thing in the driver itself. Do not have
>>> any matching "ids" for this hardware it so that the bus will never call
>>> the probe function for this hardware _until_ a manual write happens to
>>> the driver's "bind" sysfs file.
>>
>> It sounds like you're suggesting a change to one particular driver to satisfy
>> this one particular case (and maybe I'm just not understanding your suggestion).
>> For a BMC, this is a pretty regular situation and not just as one-off as Zev's
>> example.
>>
>> Another good example is where a system can have optional riser cards with a
>> whole tree of devices that might be on that riser card (and there might be
>> different variants of a riser card that could go in the same slot). Usually
>> there is an EEPROM of some sort at a well-known address that can be parsed to
>> identify which kind of riser card it is and then the appropriate sub-devices can
>> be enumerated. That EEPROM parsing is something that is currently done in
>> userspace due to the complexity and often vendor-specific nature of it.
>>
>> Many of these devices require quite a bit more configuration information than
>> can be passed along a `bind` call. I believe it has been suggested previously
>> that this riser-card scenario could also be solved with dynamic loading of DT
>> snippets, but that support seems simple pretty far from being merged.
>
> Then work to get the DT code merged! Do not try to create
> yet-another-way of doing things here if DT overlays is the correct
> solution here (and it seems like it is.)

I don't think this is a case that fits the overlay model.

We know what the description of the device is (which is what devicetree
is all about), but the device is to be shared between the Linux kernel
and some other entity, such as the firmware or another OS. The issue
to be resolved is how to describe that the device is to be shared (in
this case exclusively by the kernel _or_ by the other entity at any
given moment), and how to move ownership of the device between the
Linux kernel and the other entity.

In the scenario presented by Zev, it is suggested that a user space
agent will be involved in deciding which entity owns the device and
to tell the Linux kernel when to take ownership of the device (and
presumably when to relinquish ownership, although we haven't seen
the implementation of relinquishing ownership yet). One could
imagine direct communication between the driver and the other
entity to mediate ownership. That seems like a driver specific
defined choice to me, though if there are enough different drivers
facing this situation then eventually a shared framework would
make sense.

So to step back and think architecture, it seems to me that the
devicetree needs to specify in the node describing the shared
device that the device must be (1) owned exclusively by either
the Linux kernel or some other entity, with a signalling method
between the Linux kernel and the other entity being defined
(possibly by information in the node or possibly by the definition
of the driver) or (2) actively shared by both the Linux
kernel and the other entity. Actively shared may or may not be
possible, based on the specific hardware. For example, if a single
contains some bits controlled by the Linux kernel and other bits
controlled by the other entity, then it can be difficult for one
of the two entities to atomically modify the register in coordination
with the other entity. Obviously case 1 is much simpler than case 2,
I'm just trying to create a picture of the potential cases. In a
simpler version of case 2, each entity would have control of their
own set of registers.

Diverging away from the overlay question, to comment on the
"status" property mentioned elsewhere in this thread, I do not
think that a status value of "reserved" is an adequate method
of conveying all of the information needed by the above range
of scenarios.

-Frank

>
> thanks,
>
> greg k-h
>

2021-10-25 05:57:21

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH 0/5] driver core, of: support for reserved devices

On 10/22/21 4:00 AM, Zev Weiss wrote:
> On Thu, Oct 21, 2021 at 11:50:07PM PDT, Greg Kroah-Hartman wrote:
>> On Thu, Oct 21, 2021 at 07:00:27PM -0700, Zev Weiss wrote:
>>> Hello all,
>>>
>>> This series is another incarnation of a couple other patchsets I've
>>> posted recently [0, 1], but again different enough in overall
>>> structure that I'm not sure it's exactly a v2 (or v3).
>>>
>>> As compared to [1], it abandons the writable binary sysfs files and at
>>> Frank's suggestion returns to an approach more akin to [0], though
>>> without any driver-specific (aspeed-smc) changes, which I figure might
>>> as well be done later in a separate series once appropriate
>>> infrastructure is in place.
>>>
>>> The basic idea is to implement support for a status property value
>>> that's documented in the DT spec [2], but thus far not used at all in
>>> the kernel (or anywhere else I'm aware of): "reserved".  According to
>>> the spec (section 2.3.4, Table 2.4), this status:
>>>
>>>   Indicates that the device is operational, but should not be used.
>>>   Typically this is used for devices that are controlled by another
>>>   software component, such as platform firmware.
>>>
>>> With these changes, devices marked as reserved are (at least in some
>>> cases, more on this later) instantiated, but will not have drivers
>>> bound to them unless and until userspace explicitly requests it by
>>> writing the device's name to the driver's sysfs 'bind' file.  This
>>> enables appropriate handling of hardware arrangements that can arise
>>> in contexts like OpenBMC, where a device may be shared with another
>>> external controller not under the kernel's control (for example, the
>>> flash chip storing the host CPU's firmware, shared by the BMC and the
>>> host CPU and exclusively under the control of the latter by default).
>>> Such a device can be marked as reserved so that the kernel refrains
>>> from touching it until appropriate preparatory steps have been taken
>>> (e.g. BMC userspace coordinating with the host CPU to arbitrate which
>>> processor has control of the firmware flash).
>>>
>>> Patches 1-3 provide some basic plumbing for checking the "reserved"
>>> status of a device, patch 4 is the main driver-core change, and patch
>>> 5 tweaks the OF platform code to not skip reserved devices so that
>>> they can actually be instantiated.
>>
>> Again, the driver core should not care about this, that is up to the bus
>> that wants to read these "reserved" values and do something with them or
>> not (remember the bus is the thing that does the binding, not the driver
>> core).
>>
>> But are you sure you are using the "reserved" field properly?
>
> Well, thus far both Rob Herring and Oliver O'Halloran (originator of the "reserved" status in the DT spec, whom I probably should have CCed earlier, sorry) have seemed receptive to this interpretation of it, which I'd hope would lend it some credence.

I am not on board with this interpretation. To me, if the value of
status is "reserved", then the Linux kernel should _never_ use the
device described by the node.

If a "reserved" node is usable by the Linux kernel, then the specification
should be updated to allow this. And the specification should probably
be expanded to either discuss how to describe the coordination between
multiple entities or state that the coordination is outside of the
specification and will be implemention dependent.

I am wary of the complexity of the operating system treating a node as
reserved at initial boot, then at some point via coordination with
some other entity starting to use the node. It is not too complex if
the node is a leaf node with no links (phandles) to or from any other node,
but as soon as links to or from other nodes exist, then other drivers or
subsystems may need to be aware of when the node is available to the
operating system or given back to the other entity then any part of the
operating system has to coordinate in that state transition. This is
driving a lot of my caution that we be careful to create architecture
and not an ad hoc hack.

-Frank

>
>> You are
>> wanting to do "something" to the device to later on be able to then have
>> the kernel touch the device, while it seems that the reason for this
>> field is for the kernel to NEVER touch the device at all.  What will
>> break if you change this logic?
>
> Given that there's no existing usage of or support for this status value anywhere I can see in the kernel, and that Oliver has indicated that it should be compatible with usage in OpenPower platform firmware, my expectation would certainly be that nothing would break, but if there are examples of things that could I'd be interested to see them.
>
>
> Thanks,
> Zev
>

2021-10-25 06:42:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

On Mon, Oct 25, 2021 at 12:38:08AM -0500, Frank Rowand wrote:
> On 10/23/21 3:56 AM, Greg Kroah-Hartman wrote:
> > On Fri, Oct 22, 2021 at 10:18:11AM -0500, Patrick Williams wrote:
> >> Hi Greg,
> >>
> >> On Fri, Oct 22, 2021 at 10:57:21AM +0200, Greg Kroah-Hartman wrote:
> >>> On Fri, Oct 22, 2021 at 01:32:32AM -0700, Zev Weiss wrote:
> >>>> On Thu, Oct 21, 2021 at 11:46:56PM PDT, Greg Kroah-Hartman wrote:
> >>>>> On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:
> >>
> >>>> So we want the kernel to be aware of the device's existence (so that we
> >>>> *can* bind a driver to it when needed), but we don't want it touching the
> >>>> device unless we really ask for it.
> >>>>
> >>>> Does that help clarify the motivation for wanting this functionality?
> >>>
> >>> Sure, then just do this type of thing in the driver itself. Do not have
> >>> any matching "ids" for this hardware it so that the bus will never call
> >>> the probe function for this hardware _until_ a manual write happens to
> >>> the driver's "bind" sysfs file.
> >>
> >> It sounds like you're suggesting a change to one particular driver to satisfy
> >> this one particular case (and maybe I'm just not understanding your suggestion).
> >> For a BMC, this is a pretty regular situation and not just as one-off as Zev's
> >> example.
> >>
> >> Another good example is where a system can have optional riser cards with a
> >> whole tree of devices that might be on that riser card (and there might be
> >> different variants of a riser card that could go in the same slot). Usually
> >> there is an EEPROM of some sort at a well-known address that can be parsed to
> >> identify which kind of riser card it is and then the appropriate sub-devices can
> >> be enumerated. That EEPROM parsing is something that is currently done in
> >> userspace due to the complexity and often vendor-specific nature of it.
> >>
> >> Many of these devices require quite a bit more configuration information than
> >> can be passed along a `bind` call. I believe it has been suggested previously
> >> that this riser-card scenario could also be solved with dynamic loading of DT
> >> snippets, but that support seems simple pretty far from being merged.
> >
> > Then work to get the DT code merged! Do not try to create
> > yet-another-way of doing things here if DT overlays is the correct
> > solution here (and it seems like it is.)
>
> I don't think this is a case that fits the overlay model.
>
> We know what the description of the device is (which is what devicetree
> is all about), but the device is to be shared between the Linux kernel
> and some other entity, such as the firmware or another OS. The issue
> to be resolved is how to describe that the device is to be shared (in
> this case exclusively by the kernel _or_ by the other entity at any
> given moment), and how to move ownership of the device between the
> Linux kernel and the other entity.
>
> In the scenario presented by Zev, it is suggested that a user space
> agent will be involved in deciding which entity owns the device and
> to tell the Linux kernel when to take ownership of the device (and
> presumably when to relinquish ownership, although we haven't seen
> the implementation of relinquishing ownership yet). One could
> imagine direct communication between the driver and the other
> entity to mediate ownership. That seems like a driver specific
> defined choice to me, though if there are enough different drivers
> facing this situation then eventually a shared framework would
> make sense.

We have the bind/unbind ability today, from userspace, that can control
this. Why not just have Linux grab the device when it boots, and then
when userspace wants to "give the device up", it writes to "unbind" in
sysfs, and then when all is done, it writes to the "bind" file and then
Linux takes back over.

Unless for some reason Linux should _not_ grab the device when booting,
then things get messier, as we have seen in this thread.

thanks,

greg k-h

2021-10-25 13:01:14

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

On Mon, Oct 25, 2021 at 06:44:26AM -0500, Patrick Williams wrote:
> On Mon, Oct 25, 2021 at 08:15:41AM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Oct 25, 2021 at 12:38:08AM -0500, Frank Rowand wrote:
> > > On 10/23/21 3:56 AM, Greg Kroah-Hartman wrote:
>
> > We have the bind/unbind ability today, from userspace, that can control
> > this. Why not just have Linux grab the device when it boots, and then
> > when userspace wants to "give the device up", it writes to "unbind" in
> > sysfs, and then when all is done, it writes to the "bind" file and then
> > Linux takes back over.
> >
> > Unless for some reason Linux should _not_ grab the device when booting,
> > then things get messier, as we have seen in this thread.
>
> This is probably more typical on a BMC than atypical. The systems often require
> the BMC (running Linux) to be able to reboot independently from the managed host
> (running anything). In the example Zev gave, the BMC rebooting would rip away
> the BIOS chip from the running host.
>
> The BMC almost always needs to come up in a "I don't know what could possibly be
> going on in the system" state and re-discover where the system was left off.

Isn't it an architectural issue then?

--
With Best Regards,
Andy Shevchenko


2021-10-25 14:33:27

by Patrick Williams

[permalink] [raw]
Subject: Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

On Mon, Oct 25, 2021 at 03:34:05PM +0200, Greg Kroah-Hartman wrote:
> On Mon, Oct 25, 2021 at 08:20:05AM -0500, Patrick Williams wrote:
> > On Mon, Oct 25, 2021 at 03:58:25PM +0300, Andy Shevchenko wrote:
> > > On Mon, Oct 25, 2021 at 06:44:26AM -0500, Patrick Williams wrote:
> > > > On Mon, Oct 25, 2021 at 08:15:41AM +0200, Greg Kroah-Hartman wrote:
> > > > > On Mon, Oct 25, 2021 at 12:38:08AM -0500, Frank Rowand wrote:
> > > > > > On 10/23/21 3:56 AM, Greg Kroah-Hartman wrote:
> > > >
> > > > > We have the bind/unbind ability today, from userspace, that can control
> > > > > this. Why not just have Linux grab the device when it boots, and then
> > > > > when userspace wants to "give the device up", it writes to "unbind" in
> > > > > sysfs, and then when all is done, it writes to the "bind" file and then
> > > > > Linux takes back over.
> > > > >
> > > > > Unless for some reason Linux should _not_ grab the device when booting,
> > > > > then things get messier, as we have seen in this thread.
> > > >
> > > > This is probably more typical on a BMC than atypical. The systems often require
> > > > the BMC (running Linux) to be able to reboot independently from the managed host
> > > > (running anything). In the example Zev gave, the BMC rebooting would rip away
> > > > the BIOS chip from the running host.
> > > >
> > > > The BMC almost always needs to come up in a "I don't know what could possibly be
> > > > going on in the system" state and re-discover where the system was left off.
> > >
> > > Isn't it an architectural issue then?
> >
> > I'm not sure what "it" you are referring to here.
> >
> > I was trying to explain why starting in "bind" state is not a good idea for a
> > BMC in most of these cases where we want to be able to dynamically add a device.
>
> I think "it" is "something needs to be the moderator between the two
> operating systems". What is the external entity that handles the
> switching between the two?

Ah, ok.

Those usually end up being system / device specific. In the case of the BIOS
flash, most designs I've seen use a SPI mux between the BMC and the host
processor or IO hub (PCH on Xeons). The BMC has a GPIO to control the mux.

As far as state, the BMC on start-up will go through a set of discovery code to
figure out where it left the system prior to getting reset. That involves
looking at the power subsystem and usually doing some kind of query to the host
to see if it is alive. These queries are mostly system / host-processor design
specific. I've seen anything from an IPMI/IPMB message alert from the BMC to
the BIOS to ask "are you alive" to reading host processor state over JTAG to
figure out if the processors are "making progress".

--
Patrick Williams


Attachments:
(No filename) (2.74 kB)
signature.asc (849.00 B)
Download all attachments

2021-10-25 14:34:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

On Mon, Oct 25, 2021 at 09:02:40AM -0500, Patrick Williams wrote:
> On Mon, Oct 25, 2021 at 03:34:05PM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Oct 25, 2021 at 08:20:05AM -0500, Patrick Williams wrote:
> > > On Mon, Oct 25, 2021 at 03:58:25PM +0300, Andy Shevchenko wrote:
> > > > On Mon, Oct 25, 2021 at 06:44:26AM -0500, Patrick Williams wrote:
> > > > > On Mon, Oct 25, 2021 at 08:15:41AM +0200, Greg Kroah-Hartman wrote:
> > > > > > On Mon, Oct 25, 2021 at 12:38:08AM -0500, Frank Rowand wrote:
> > > > > > > On 10/23/21 3:56 AM, Greg Kroah-Hartman wrote:
> > > > >
> > > > > > We have the bind/unbind ability today, from userspace, that can control
> > > > > > this. Why not just have Linux grab the device when it boots, and then
> > > > > > when userspace wants to "give the device up", it writes to "unbind" in
> > > > > > sysfs, and then when all is done, it writes to the "bind" file and then
> > > > > > Linux takes back over.
> > > > > >
> > > > > > Unless for some reason Linux should _not_ grab the device when booting,
> > > > > > then things get messier, as we have seen in this thread.
> > > > >
> > > > > This is probably more typical on a BMC than atypical. The systems often require
> > > > > the BMC (running Linux) to be able to reboot independently from the managed host
> > > > > (running anything). In the example Zev gave, the BMC rebooting would rip away
> > > > > the BIOS chip from the running host.
> > > > >
> > > > > The BMC almost always needs to come up in a "I don't know what could possibly be
> > > > > going on in the system" state and re-discover where the system was left off.
> > > >
> > > > Isn't it an architectural issue then?
> > >
> > > I'm not sure what "it" you are referring to here.
> > >
> > > I was trying to explain why starting in "bind" state is not a good idea for a
> > > BMC in most of these cases where we want to be able to dynamically add a device.
> >
> > I think "it" is "something needs to be the moderator between the two
> > operating systems". What is the external entity that handles the
> > switching between the two?
>
> Ah, ok.
>
> Those usually end up being system / device specific. In the case of the BIOS
> flash, most designs I've seen use a SPI mux between the BMC and the host
> processor or IO hub (PCH on Xeons). The BMC has a GPIO to control the mux.
>
> As far as state, the BMC on start-up will go through a set of discovery code to
> figure out where it left the system prior to getting reset. That involves
> looking at the power subsystem and usually doing some kind of query to the host
> to see if it is alive. These queries are mostly system / host-processor design
> specific. I've seen anything from an IPMI/IPMB message alert from the BMC to
> the BIOS to ask "are you alive" to reading host processor state over JTAG to
> figure out if the processors are "making progress".

But which processor is "in control" here over the hardware? What method
is used to pass the device from one CPU to another from a logical point
of view? Sounds like it is another driver that needs to handle all of
this, so why not have that be the one that adds/removes the devices
under control here?

thanks,

greg k-h

2021-10-25 15:57:12

by Patrick Williams

[permalink] [raw]
Subject: Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

On Mon, Oct 25, 2021 at 04:09:59PM +0200, Greg Kroah-Hartman wrote:
> On Mon, Oct 25, 2021 at 09:02:40AM -0500, Patrick Williams wrote:
> > On Mon, Oct 25, 2021 at 03:34:05PM +0200, Greg Kroah-Hartman wrote:
> > > On Mon, Oct 25, 2021 at 08:20:05AM -0500, Patrick Williams wrote:
> > > I think "it" is "something needs to be the moderator between the two
> > > operating systems". What is the external entity that handles the
> > > switching between the two?
> >
> > Ah, ok.
> >
> > Those usually end up being system / device specific. In the case of the BIOS
> > flash, most designs I've seen use a SPI mux between the BMC and the host
> > processor or IO hub (PCH on Xeons). The BMC has a GPIO to control the mux.
> >
> > As far as state, the BMC on start-up will go through a set of discovery code to
> > figure out where it left the system prior to getting reset. That involves
> > looking at the power subsystem and usually doing some kind of query to the host
> > to see if it is alive. These queries are mostly system / host-processor design
> > specific. I've seen anything from an IPMI/IPMB message alert from the BMC to
> > the BIOS to ask "are you alive" to reading host processor state over JTAG to
> > figure out if the processors are "making progress".
>
> But which processor is "in control" here over the hardware?

The BMC. It owns the GPIO that controls the SPI mux.

But, the BMC is responsible for doing all operations in a way that doesn't mess
up the running host processor(s). Pulling away the SPI flash containing the
BIOS code at an incorrect time might do that.

> What method
> is used to pass the device from one CPU to another from a logical point
> of view?

The state of the server as a whole is determined and maintained by the BMC. I'm
simplifying here a bit but the operation "turn on the host processors" implies
"the host processors will access the BIOS" so the BMC must ensure "SPI mux is
switched towards the host" before "turn on the host processors".

> Sounds like it is another driver that needs to handle all of
> this, so why not have that be the one that adds/removes the devices
> under control here?

If what you're describing is moving all of the state control logic into the
kernel, I don't think that is feasible. For some systems it would mean moving
yet another entire IPMI stack into the kernel tree. On others it might be
somewhat simpler, but it is still a good amount of code. We could probably
write up more details on the scope of this.

If what you're describing is a small driver, similar to the board support
drivers that were used before the device tree, that instantiates subordinate
devices it doesn't seem like an unreasonable alternative to DT overlays to me
(for whatever my limited kernel contribution experience counts for).

--
Patrick Williams


Attachments:
(No filename) (2.83 kB)
signature.asc (849.00 B)
Download all attachments

2021-10-25 18:07:03

by Patrick Williams

[permalink] [raw]
Subject: Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

On Mon, Oct 25, 2021 at 08:15:41AM +0200, Greg Kroah-Hartman wrote:
> On Mon, Oct 25, 2021 at 12:38:08AM -0500, Frank Rowand wrote:
> > On 10/23/21 3:56 AM, Greg Kroah-Hartman wrote:

> We have the bind/unbind ability today, from userspace, that can control
> this. Why not just have Linux grab the device when it boots, and then
> when userspace wants to "give the device up", it writes to "unbind" in
> sysfs, and then when all is done, it writes to the "bind" file and then
> Linux takes back over.
>
> Unless for some reason Linux should _not_ grab the device when booting,
> then things get messier, as we have seen in this thread.

This is probably more typical on a BMC than atypical. The systems often require
the BMC (running Linux) to be able to reboot independently from the managed host
(running anything). In the example Zev gave, the BMC rebooting would rip away
the BIOS chip from the running host.

The BMC almost always needs to come up in a "I don't know what could possibly be
going on in the system" state and re-discover where the system was left off.

--
Patrick Williams


Attachments:
(No filename) (1.10 kB)
signature.asc (849.00 B)
Download all attachments

2021-10-25 18:09:34

by Patrick Williams

[permalink] [raw]
Subject: Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

On Mon, Oct 25, 2021 at 03:58:25PM +0300, Andy Shevchenko wrote:
> On Mon, Oct 25, 2021 at 06:44:26AM -0500, Patrick Williams wrote:
> > On Mon, Oct 25, 2021 at 08:15:41AM +0200, Greg Kroah-Hartman wrote:
> > > On Mon, Oct 25, 2021 at 12:38:08AM -0500, Frank Rowand wrote:
> > > > On 10/23/21 3:56 AM, Greg Kroah-Hartman wrote:
> >
> > > We have the bind/unbind ability today, from userspace, that can control
> > > this. Why not just have Linux grab the device when it boots, and then
> > > when userspace wants to "give the device up", it writes to "unbind" in
> > > sysfs, and then when all is done, it writes to the "bind" file and then
> > > Linux takes back over.
> > >
> > > Unless for some reason Linux should _not_ grab the device when booting,
> > > then things get messier, as we have seen in this thread.
> >
> > This is probably more typical on a BMC than atypical. The systems often require
> > the BMC (running Linux) to be able to reboot independently from the managed host
> > (running anything). In the example Zev gave, the BMC rebooting would rip away
> > the BIOS chip from the running host.
> >
> > The BMC almost always needs to come up in a "I don't know what could possibly be
> > going on in the system" state and re-discover where the system was left off.
>
> Isn't it an architectural issue then?

I'm not sure what "it" you are referring to here.

I was trying to explain why starting in "bind" state is not a good idea for a
BMC in most of these cases where we want to be able to dynamically add a device.


--
Patrick Williams


Attachments:
(No filename) (1.56 kB)
signature.asc (849.00 B)
Download all attachments

2021-10-25 18:10:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

On Mon, Oct 25, 2021 at 08:20:05AM -0500, Patrick Williams wrote:
> On Mon, Oct 25, 2021 at 03:58:25PM +0300, Andy Shevchenko wrote:
> > On Mon, Oct 25, 2021 at 06:44:26AM -0500, Patrick Williams wrote:
> > > On Mon, Oct 25, 2021 at 08:15:41AM +0200, Greg Kroah-Hartman wrote:
> > > > On Mon, Oct 25, 2021 at 12:38:08AM -0500, Frank Rowand wrote:
> > > > > On 10/23/21 3:56 AM, Greg Kroah-Hartman wrote:
> > >
> > > > We have the bind/unbind ability today, from userspace, that can control
> > > > this. Why not just have Linux grab the device when it boots, and then
> > > > when userspace wants to "give the device up", it writes to "unbind" in
> > > > sysfs, and then when all is done, it writes to the "bind" file and then
> > > > Linux takes back over.
> > > >
> > > > Unless for some reason Linux should _not_ grab the device when booting,
> > > > then things get messier, as we have seen in this thread.
> > >
> > > This is probably more typical on a BMC than atypical. The systems often require
> > > the BMC (running Linux) to be able to reboot independently from the managed host
> > > (running anything). In the example Zev gave, the BMC rebooting would rip away
> > > the BIOS chip from the running host.
> > >
> > > The BMC almost always needs to come up in a "I don't know what could possibly be
> > > going on in the system" state and re-discover where the system was left off.
> >
> > Isn't it an architectural issue then?
>
> I'm not sure what "it" you are referring to here.
>
> I was trying to explain why starting in "bind" state is not a good idea for a
> BMC in most of these cases where we want to be able to dynamically add a device.

I think "it" is "something needs to be the moderator between the two
operating systems". What is the external entity that handles the
switching between the two?

thanks,

greg k-h

2021-10-25 18:12:45

by Frank Rowand

[permalink] [raw]
Subject: Re: [PATCH 0/5] driver core, of: support for reserved devices

On 10/25/21 12:53 AM, Frank Rowand wrote:
> On 10/22/21 4:00 AM, Zev Weiss wrote:
>> On Thu, Oct 21, 2021 at 11:50:07PM PDT, Greg Kroah-Hartman wrote:
>>> On Thu, Oct 21, 2021 at 07:00:27PM -0700, Zev Weiss wrote:
>>>> Hello all,
>>>>
>>>> This series is another incarnation of a couple other patchsets I've
>>>> posted recently [0, 1], but again different enough in overall
>>>> structure that I'm not sure it's exactly a v2 (or v3).
>>>>
>>>> As compared to [1], it abandons the writable binary sysfs files and at
>>>> Frank's suggestion returns to an approach more akin to [0], though
>>>> without any driver-specific (aspeed-smc) changes, which I figure might
>>>> as well be done later in a separate series once appropriate
>>>> infrastructure is in place.
>>>>
>>>> The basic idea is to implement support for a status property value
>>>> that's documented in the DT spec [2], but thus far not used at all in
>>>> the kernel (or anywhere else I'm aware of): "reserved".  According to
>>>> the spec (section 2.3.4, Table 2.4), this status:
>>>>
>>>>   Indicates that the device is operational, but should not be used.
>>>>   Typically this is used for devices that are controlled by another
>>>>   software component, such as platform firmware.
>>>>
>>>> With these changes, devices marked as reserved are (at least in some
>>>> cases, more on this later) instantiated, but will not have drivers
>>>> bound to them unless and until userspace explicitly requests it by
>>>> writing the device's name to the driver's sysfs 'bind' file.  This
>>>> enables appropriate handling of hardware arrangements that can arise
>>>> in contexts like OpenBMC, where a device may be shared with another
>>>> external controller not under the kernel's control (for example, the
>>>> flash chip storing the host CPU's firmware, shared by the BMC and the
>>>> host CPU and exclusively under the control of the latter by default).
>>>> Such a device can be marked as reserved so that the kernel refrains
>>>> from touching it until appropriate preparatory steps have been taken
>>>> (e.g. BMC userspace coordinating with the host CPU to arbitrate which
>>>> processor has control of the firmware flash).
>>>>
>>>> Patches 1-3 provide some basic plumbing for checking the "reserved"
>>>> status of a device, patch 4 is the main driver-core change, and patch
>>>> 5 tweaks the OF platform code to not skip reserved devices so that
>>>> they can actually be instantiated.
>>>
>>> Again, the driver core should not care about this, that is up to the bus
>>> that wants to read these "reserved" values and do something with them or
>>> not (remember the bus is the thing that does the binding, not the driver
>>> core).
>>>
>>> But are you sure you are using the "reserved" field properly?
>>
>> Well, thus far both Rob Herring and Oliver O'Halloran (originator of the "reserved" status in the DT spec, whom I probably should have CCed earlier, sorry) have seemed receptive to this interpretation of it, which I'd hope would lend it some credence.
>
> I am not on board with this interpretation. To me, if the value of
> status is "reserved", then the Linux kernel should _never_ use the
> device described by the node.
>
> If a "reserved" node is usable by the Linux kernel, then the specification
> should be updated to allow this. And the specification should probably
> be expanded to either discuss how to describe the coordination between
> multiple entities or state that the coordination is outside of the
> specification and will be implemention dependent.

Maybe a value of "reserved-sharable" should be added to the standard.
This would indicate that the node is operational and controlled by
another software component, but is available to the operating system
after requesting permission or being granted permission from the other
software component.

The exact method of requesting permission or being granted permission
could either be driver specific, or the driver binding could
include one or more additional properties to describe the method.

One thing that I am wary of is the possibility of a proliferation of
status checks changing from "okay" to "okay" || ("reserved" and the
state of the driver is that permission has been granted).

From a simplicity of coding view, it is really tempting to dynamically
change the value of the status property from "reserved-sharable"
to "okay" when the other software component grants permission to
use the device, so that status checks will magically allow use
instead of blocking use. I do not like changing the value of the
status property dynamically because the devicetree is supposed to
describe hardware (and communicate information from the firmware
to the operating system), not to actively maintain state.

-Frank

>
> I am wary of the complexity of the operating system treating a node as
> reserved at initial boot, then at some point via coordination with
> some other entity starting to use the node. It is not too complex if
> the node is a leaf node with no links (phandles) to or from any other node,
> but as soon as links to or from other nodes exist, then other drivers or
> subsystems may need to be aware of when the node is available to the
> operating system or given back to the other entity then any part of the
> operating system has to coordinate in that state transition. This is
> driving a lot of my caution that we be careful to create architecture
> and not an ad hoc hack.
>
> -Frank
>
>>
>>> You are
>>> wanting to do "something" to the device to later on be able to then have
>>> the kernel touch the device, while it seems that the reason for this
>>> field is for the kernel to NEVER touch the device at all.  What will
>>> break if you change this logic?
>>
>> Given that there's no existing usage of or support for this status value anywhere I can see in the kernel, and that Oliver has indicated that it should be compatible with usage in OpenPower platform firmware, my expectation would certainly be that nothing would break, but if there are examples of things that could I'd be interested to see them.
>>
>>
>> Thanks,
>> Zev
>>
>

2021-10-25 19:20:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices

On Mon, Oct 25, 2021 at 10:54:21AM -0500, Patrick Williams wrote:
> On Mon, Oct 25, 2021 at 04:09:59PM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Oct 25, 2021 at 09:02:40AM -0500, Patrick Williams wrote:
> > > On Mon, Oct 25, 2021 at 03:34:05PM +0200, Greg Kroah-Hartman wrote:
> > > > On Mon, Oct 25, 2021 at 08:20:05AM -0500, Patrick Williams wrote:
> > > > I think "it" is "something needs to be the moderator between the two
> > > > operating systems". What is the external entity that handles the
> > > > switching between the two?
> > >
> > > Ah, ok.
> > >
> > > Those usually end up being system / device specific. In the case of the BIOS
> > > flash, most designs I've seen use a SPI mux between the BMC and the host
> > > processor or IO hub (PCH on Xeons). The BMC has a GPIO to control the mux.
> > >
> > > As far as state, the BMC on start-up will go through a set of discovery code to
> > > figure out where it left the system prior to getting reset. That involves
> > > looking at the power subsystem and usually doing some kind of query to the host
> > > to see if it is alive. These queries are mostly system / host-processor design
> > > specific. I've seen anything from an IPMI/IPMB message alert from the BMC to
> > > the BIOS to ask "are you alive" to reading host processor state over JTAG to
> > > figure out if the processors are "making progress".
> >
> > But which processor is "in control" here over the hardware?
>
> The BMC. It owns the GPIO that controls the SPI mux.
>
> But, the BMC is responsible for doing all operations in a way that doesn't mess
> up the running host processor(s). Pulling away the SPI flash containing the
> BIOS code at an incorrect time might do that.
>
> > What method
> > is used to pass the device from one CPU to another from a logical point
> > of view?
>
> The state of the server as a whole is determined and maintained by the BMC. I'm
> simplifying here a bit but the operation "turn on the host processors" implies
> "the host processors will access the BIOS" so the BMC must ensure "SPI mux is
> switched towards the host" before "turn on the host processors".
>
> > Sounds like it is another driver that needs to handle all of
> > this, so why not have that be the one that adds/removes the devices
> > under control here?
>
> If what you're describing is moving all of the state control logic into the
> kernel, I don't think that is feasible. For some systems it would mean moving
> yet another entire IPMI stack into the kernel tree. On others it might be
> somewhat simpler, but it is still a good amount of code. We could probably
> write up more details on the scope of this.
>
> If what you're describing is a small driver, similar to the board support
> drivers that were used before the device tree, that instantiates subordinate
> devices it doesn't seem like an unreasonable alternative to DT overlays to me
> (for whatever my limited kernel contribution experience counts for).
>

Something has to be here doing the mediation between the two processors
and keeping things straight as to what processor is handling the
hardware when. I suggest you focus on that first...

Good luck!

greg k-h