2020-07-27 11:08:33

by Moshe Shemesh

[permalink] [raw]
Subject: [PATCH net-next RFC 09/13] devlink: Add enable_remote_dev_reset generic parameter

The enable_remote_dev_reset devlink param flags that the host admin
allows device resets that can be initiated by other hosts. This
parameter is useful for setups where a device is shared by different
hosts, such as multi-host setup. Once the user set this parameter to
false, the driver should NACK any attempt to reset the device while the
driver is loaded.

Signed-off-by: Moshe Shemesh <[email protected]>
---
Documentation/networking/devlink/devlink-params.rst | 6 ++++++
include/net/devlink.h | 4 ++++
net/core/devlink.c | 5 +++++
3 files changed, 15 insertions(+)

diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
index d075fd090b3d..54c9f107c4b0 100644
--- a/Documentation/networking/devlink/devlink-params.rst
+++ b/Documentation/networking/devlink/devlink-params.rst
@@ -108,3 +108,9 @@ own name.
* - ``region_snapshot_enable``
- Boolean
- Enable capture of ``devlink-region`` snapshots.
+ * - ``enable_remote_dev_reset``
+ - Boolean
+ - Enable device reset by remote host. When cleared, the device driver
+ will NACK any attempt of other host to reset the device. This parameter
+ is useful for setups where a device is shared by different hosts, such
+ as multi-host setup.
diff --git a/include/net/devlink.h b/include/net/devlink.h
index b291cd8d6be6..125e7bb9bb82 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -420,6 +420,7 @@ enum devlink_param_generic_id {
DEVLINK_PARAM_GENERIC_ID_FW_LOAD_POLICY,
DEVLINK_PARAM_GENERIC_ID_RESET_DEV_ON_DRV_PROBE,
DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE,
+ DEVLINK_PARAM_GENERIC_ID_ENABLE_REMOTE_DEV_RESET,

/* add new param generic ids above here*/
__DEVLINK_PARAM_GENERIC_ID_MAX,
@@ -457,6 +458,9 @@ enum devlink_param_generic_id {
#define DEVLINK_PARAM_GENERIC_ENABLE_ROCE_NAME "enable_roce"
#define DEVLINK_PARAM_GENERIC_ENABLE_ROCE_TYPE DEVLINK_PARAM_TYPE_BOOL

+#define DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_NAME "enable_remote_dev_reset"
+#define DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_TYPE DEVLINK_PARAM_TYPE_BOOL
+
#define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate) \
{ \
.id = DEVLINK_PARAM_GENERIC_ID_##_id, \
diff --git a/net/core/devlink.c b/net/core/devlink.c
index f1812fc620d4..4d7b0f2a6f7b 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3230,6 +3230,11 @@ static const struct devlink_param devlink_param_generic[] = {
.name = DEVLINK_PARAM_GENERIC_ENABLE_ROCE_NAME,
.type = DEVLINK_PARAM_GENERIC_ENABLE_ROCE_TYPE,
},
+ {
+ .id = DEVLINK_PARAM_GENERIC_ID_ENABLE_REMOTE_DEV_RESET,
+ .name = DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_NAME,
+ .type = DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_TYPE,
+ },
};

static int devlink_param_generic_verify(const struct devlink_param *param)
--
2.17.1


2020-07-28 01:00:22

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 09/13] devlink: Add enable_remote_dev_reset generic parameter

On Mon, 27 Jul 2020 14:02:29 +0300 Moshe Shemesh wrote:
> The enable_remote_dev_reset devlink param flags that the host admin
> allows device resets that can be initiated by other hosts. This
> parameter is useful for setups where a device is shared by different
> hosts, such as multi-host setup. Once the user set this parameter to
> false, the driver should NACK any attempt to reset the device while the
> driver is loaded.
>
> Signed-off-by: Moshe Shemesh <[email protected]>

There needs to be a devlink event generated when reset is triggered
(remotely or not).

You're also missing failure reasons. Users need to know if the reset
request was clearly nacked by some host, not supported, etc. vs
unexpected failure.

2020-07-29 14:43:23

by Moshe Shemesh

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 09/13] devlink: Add enable_remote_dev_reset generic parameter


On 7/28/2020 3:59 AM, Jakub Kicinski wrote:
> On Mon, 27 Jul 2020 14:02:29 +0300 Moshe Shemesh wrote:
>> The enable_remote_dev_reset devlink param flags that the host admin
>> allows device resets that can be initiated by other hosts. This
>> parameter is useful for setups where a device is shared by different
>> hosts, such as multi-host setup. Once the user set this parameter to
>> false, the driver should NACK any attempt to reset the device while the
>> driver is loaded.
>>
>> Signed-off-by: Moshe Shemesh <[email protected]>
> There needs to be a devlink event generated when reset is triggered
> (remotely or not).
>
> You're also missing failure reasons. Users need to know if the reset
> request was clearly nacked by some host, not supported, etc. vs
> unexpected failure.


I will fix and send extack message to the user accordingly.

2020-07-29 20:57:56

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 09/13] devlink: Add enable_remote_dev_reset generic parameter

On Wed, 29 Jul 2020 17:42:12 +0300 Moshe Shemesh wrote:
> On 7/28/2020 3:59 AM, Jakub Kicinski wrote:
> > On Mon, 27 Jul 2020 14:02:29 +0300 Moshe Shemesh wrote:
> >> The enable_remote_dev_reset devlink param flags that the host admin
> >> allows device resets that can be initiated by other hosts. This
> >> parameter is useful for setups where a device is shared by different
> >> hosts, such as multi-host setup. Once the user set this parameter to
> >> false, the driver should NACK any attempt to reset the device while the
> >> driver is loaded.
> >>
> >> Signed-off-by: Moshe Shemesh <[email protected]>
> > There needs to be a devlink event generated when reset is triggered
> > (remotely or not).
> >
> > You're also missing failure reasons. Users need to know if the reset
> > request was clearly nacked by some host, not supported, etc. vs
> > unexpected failure.
>
> I will fix and send extack message to the user accordingly.

I'd suggest the reason codes to be somewhat standard.

The groups I can think of:
- timeout - device did not respond to the reset request
- device reject - FW or else has nacked the activation req
- host incapable - one of the participating hosts (in MH) is not
capable of handling live activation
- host denied - one of the participating hosts has NACKed
- host timeout - one of the p. hosts did not ack or done the procedure
in time (e.g. has not toggled the link)
- failed reset - the activation itself had failed
- failed reinit - one of p. hosts was not able to cleanly come back up

2020-07-30 12:09:38

by Moshe Shemesh

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 09/13] devlink: Add enable_remote_dev_reset generic parameter


On 7/29/2020 11:57 PM, Jakub Kicinski wrote:
> On Wed, 29 Jul 2020 17:42:12 +0300 Moshe Shemesh wrote:
>> On 7/28/2020 3:59 AM, Jakub Kicinski wrote:
>>> On Mon, 27 Jul 2020 14:02:29 +0300 Moshe Shemesh wrote:
>>>> The enable_remote_dev_reset devlink param flags that the host admin
>>>> allows device resets that can be initiated by other hosts. This
>>>> parameter is useful for setups where a device is shared by different
>>>> hosts, such as multi-host setup. Once the user set this parameter to
>>>> false, the driver should NACK any attempt to reset the device while the
>>>> driver is loaded.
>>>>
>>>> Signed-off-by: Moshe Shemesh <[email protected]>
>>> There needs to be a devlink event generated when reset is triggered
>>> (remotely or not).
>>>
>>> You're also missing failure reasons. Users need to know if the reset
>>> request was clearly nacked by some host, not supported, etc. vs
>>> unexpected failure.
>> I will fix and send extack message to the user accordingly.
> I'd suggest the reason codes to be somewhat standard.
>
> The groups I can think of:
> - timeout - device did not respond to the reset request
> - device reject - FW or else has nacked the activation req
> - host incapable - one of the participating hosts (in MH) is not
> capable of handling live activation
> - host denied - one of the participating hosts has NACKed
> - host timeout - one of the p. hosts did not ack or done the procedure
> in time (e.g. has not toggled the link)
> - failed reset - the activation itself had failed
> - failed reinit - one of p. hosts was not able to cleanly come back up


Sounds good, that seems to cover all options of fw_reset process to fail.