Currently, an SF device has all of its auxiliary devices enabled by
default. Once loaded, a user who wants to disable some of them needs to
perform a devlink reload. This operation helps to reclaim memory that was
not supposed to be used, but the time lost in enabling and then disabling
the devices cannot be recovered by this approach[1].
Therefore, introduce a new devlink generic parameter for the PCI PF which
controls the creation of SFs. This parameter sets a flag that disables
all auxiliary devices of the SF, i.e., all child auxiliary devices of
the SF for RDMA, eth and vdpa-net are disabled by default and hence no
device initialization is done at probe stage.
$ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
value false cmode runtime
Create SF:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
$ devlink port function set pci/0000:08:00.0/32768 \
hw_addr 00:00:00:00:00:11 state active
Now depending on the use case, the user can enable specific auxiliary
device(s). For example:
$ devlink dev param set auxiliary/mlx5_core.sf.1 \
name enable_vnet value true cmode driverinit
Afterwards, the user needs to reload the SF so that it comes up
with the specific configuration:
$ devlink dev reload auxiliary/mlx5_core.sf.1
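The sequence above can be sketched as a small dry-run helper that only
prints the commands for one SF, which may be handy when the same steps are
repeated for many SFs. This is an illustration, not part of the series:
sf_bringup_cmds is a hypothetical name, and the port index and auxiliary
device name are passed in by the caller because they are assigned at
runtime.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the devlink commands for one SF.
# sf_bringup_cmds is a hypothetical helper, not part of devlink or mlx5.
sf_bringup_cmds() {
    pf=$1        # PF devlink handle, e.g. pci/0000:08:00.0
    sfnum=$2     # SF number chosen by the user
    port=$3      # port index reported after "devlink port add"
    hw_addr=$4   # MAC address for the port function
    aux=$5       # SF auxiliary device, e.g. auxiliary/mlx5_core.sf.1
    feature=$6   # generic param to enable, e.g. enable_vnet

    echo "devlink port add $pf flavour pcisf pfnum 0 sfnum $sfnum"
    echo "devlink port function set $pf/$port hw_addr $hw_addr state active"
    echo "devlink dev param set $aux name $feature value true cmode driverinit"
    echo "devlink dev reload $aux"
}

# One-time PF-level setting proposed by this series, then one SF.
echo "devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs value false cmode runtime"
sf_bringup_cmds pci/0000:08:00.0 11 32768 00:00:00:00:00:11 \
    auxiliary/mlx5_core.sf.1 enable_vnet
```

Piping the output to sh would run the commands, assuming the port index
and auxiliary device name passed in match what the system actually
assigned.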
[1]
An mlx5 devlink reload takes about 2 seconds, which means that with
256 SFs the reloads add up to ~8.5 minutes.
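The estimate in [1] is simple arithmetic and can be reproduced as below,
assuming the reloads run strictly one after another at a constant 2
seconds each. reload_total_seconds is a hypothetical helper used only for
this illustration.

```shell
#!/bin/sh
# Estimate the total time of serial devlink reloads.
# Assumption: reloads are not parallelized and each takes the same time.
reload_total_seconds() {
    sf_count=$1         # number of SFs to reload
    secs_per_reload=$2  # measured time of one reload, in seconds
    echo $((sf_count * secs_per_reload))
}

total=$(reload_total_seconds 256 2)
echo "${total} seconds ($((total / 60)) min $((total % 60)) s)"
```

With the numbers from [1] this gives 512 seconds, i.e. 8 min 32 s.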
Shay Drory (4):
net/mlx5: Split function_setup() to enable and open functions
net/mlx5: Delete redundant default assignment of runtime devlink
params
devlink: Add new "enable_sfs_aux_devs" generic device param
net/mlx5: Support enable_sfs_aux_devs devlink param
.../networking/devlink/devlink-params.rst | 5 +
drivers/net/ethernet/mellanox/mlx5/core/dev.c | 16 ++
.../net/ethernet/mellanox/mlx5/core/devlink.c | 51 ++---
.../net/ethernet/mellanox/mlx5/core/eswitch.c | 3 +
.../net/ethernet/mellanox/mlx5/core/health.c | 5 +-
.../net/ethernet/mellanox/mlx5/core/main.c | 183 +++++++++++++++---
.../ethernet/mellanox/mlx5/core/mlx5_core.h | 6 +
.../mellanox/mlx5/core/sf/dev/driver.c | 13 +-
.../ethernet/mellanox/mlx5/core/sf/devlink.c | 40 ++++
.../ethernet/mellanox/mlx5/core/sf/hw_table.c | 7 +
.../net/ethernet/mellanox/mlx5/core/sf/priv.h | 2 +
include/linux/mlx5/driver.h | 1 +
include/net/devlink.h | 4 +
net/core/devlink.c | 5 +
14 files changed, 284 insertions(+), 57 deletions(-)
--
2.26.3
On Tue, 8 Feb 2022 19:14:02 +0200 Moshe Shemesh wrote:
> $ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
> value false cmode runtime
>
> Create SF:
> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
> $ devlink port function set pci/0000:08:00.0/32768 \
> hw_addr 00:00:00:00:00:11 state active
>
> Now depending on the use case, the user can enable specific auxiliary
> device(s). For example:
>
> $ devlink dev param set auxiliary/mlx5_core.sf.1 \
> name enable_vnet value true cmode driverinit
>
> Afterwards, user needs to reload the SF in order for the SF to come up
> with the specific configuration:
>
> $ devlink dev reload auxiliary/mlx5_core.sf.1
If the user just wants vnet why not add an API which tells the driver
which functionality the user wants when the "port" is "spawned"?
On 2/9/2022 7:23 AM, Jakub Kicinski wrote:
> On Tue, 8 Feb 2022 19:14:02 +0200 Moshe Shemesh wrote:
>> $ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
>> value false cmode runtime
>>
>> Create SF:
>> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
>> $ devlink port function set pci/0000:08:00.0/32768 \
>> hw_addr 00:00:00:00:00:11 state active
>>
>> Now depending on the use case, the user can enable specific auxiliary
>> device(s). For example:
>>
>> $ devlink dev param set auxiliary/mlx5_core.sf.1 \
>> name enable_vnet value true cmode driverinit
>>
>> Afterwards, user needs to reload the SF in order for the SF to come up
>> with the specific configuration:
>>
>> $ devlink dev reload auxiliary/mlx5_core.sf.1
> If the user just wants vnet why not add an API which tells the driver
> which functionality the user wants when the "port" is "spawned"?
Well, we don't have the SFs at that stage, so how can we tell which SF
will use vnet and which SF will use eth?
Wed, Feb 09, 2022 at 06:23:41AM CET, [email protected] wrote:
>On Tue, 8 Feb 2022 19:14:02 +0200 Moshe Shemesh wrote:
>> $ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
>> value false cmode runtime
>>
>> Create SF:
>> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
>> $ devlink port function set pci/0000:08:00.0/32768 \
>> hw_addr 00:00:00:00:00:11 state active
>>
>> Now depending on the use case, the user can enable specific auxiliary
>> device(s). For example:
>>
>> $ devlink dev param set auxiliary/mlx5_core.sf.1 \
>> name enable_vnet value true cmode driverinit
>>
>> Afterwards, user needs to reload the SF in order for the SF to come up
>> with the specific configuration:
>>
>> $ devlink dev reload auxiliary/mlx5_core.sf.1
>
>If the user just wants vnet why not add an API which tells the driver
>which functionality the user wants when the "port" is "spawned"?
It's a different user. One works with the eswitch and creates the port
function. The other one takes the created instance and works with it.
Note that it may be on a different host.
On Wed, 9 Feb 2022 09:39:54 +0200 Moshe Shemesh wrote:
> Well we don't have the SFs at that stage, how can we tell which SF will
> use vnet and which SF will use eth ?
On Wed, 9 Feb 2022 10:57:21 +0100 Jiri Pirko wrote:
> It's a different user. One works with the eswitch and creates the port
> function. The other one takes the created instance and works with it.
> Note that it may be on a different host.
It is a little confusing, so I may well be misunderstanding but the
cover letter says:
$ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
value false cmode runtime
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
So both of these run on the same side, no?
What I meant is make the former part of the latter:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11 noprobe
Maybe worth clarifying - pci/0000:08:00.0 is the eswitch side and
auxiliary/mlx5_core.sf.1 is the... "customer" side, correct?
Thu, Feb 10, 2022 at 02:25:25AM CET, [email protected] wrote:
>On Wed, 9 Feb 2022 09:39:54 +0200 Moshe Shemesh wrote:
>> Well we don't have the SFs at that stage, how can we tell which SF will
>> use vnet and which SF will use eth ?
>
>On Wed, 9 Feb 2022 10:57:21 +0100 Jiri Pirko wrote:
>> It's a different user. One works with the eswitch and creates the port
>> function. The other one takes the created instance and works with it.
>> Note that it may be on a different host.
>
>It is a little confusing, so I may well be misunderstanding but the
>cover letter says:
>
>$ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
> value false cmode runtime
>
>$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
>
>So both of these run on the same side, no?
>
>What I meant is make the former part of the latter:
>
>$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11 noprobe
I see. So it would not be "global policy" but per-instance option during
creation. That makes sense. I wonder if the HW is capable of such flow,
Moshe, Saeed?
>
>
>Maybe worth clarifying - pci/0000:08:00.0 is the eswitch side and
>auxiliary/mlx5_core.sf.1 is the... "customer" side, correct?
Yep.
On 2/10/2022 9:02 AM, Jiri Pirko wrote:
> Thu, Feb 10, 2022 at 02:25:25AM CET, [email protected] wrote:
>> On Wed, 9 Feb 2022 09:39:54 +0200 Moshe Shemesh wrote:
>>> Well we don't have the SFs at that stage, how can we tell which SF will
>>> use vnet and which SF will use eth ?
>> On Wed, 9 Feb 2022 10:57:21 +0100 Jiri Pirko wrote:
>>> It's a different user. One works with the eswitch and creates the port
>>> function. The other one takes the created instance and works with it.
>>> Note that it may be on a different host.
>> It is a little confusing, so I may well be misunderstanding but the
>> cover letter says:
>>
>> $ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
>> value false cmode runtime
>>
>> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
>>
>> So both of these run on the same side, no?
Yes.
>> What I meant is make the former part of the latter:
>>
>> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11 noprobe
> I see. So it would not be "global policy" but per-instance option during
> creation. That makes sense. I wonder if the HW is capable of such flow,
> Moshe, Saeed?
LGTM. Thanks.
>
>>
>> Maybe worth clarifying - pci/0000:08:00.0 is the eswitch side and
>> auxiliary/mlx5_core.sf.1 is the... "customer" side, correct?
> Yep.
> From: Moshe Shemesh <[email protected]>
> Sent: Thursday, February 10, 2022 3:58 PM
>
> On 2/10/2022 9:02 AM, Jiri Pirko wrote:
> > Thu, Feb 10, 2022 at 02:25:25AM CET, [email protected] wrote:
> >> On Wed, 9 Feb 2022 09:39:54 +0200 Moshe Shemesh wrote:
> >>> Well we don't have the SFs at that stage, how can we tell which SF
> >>> will use vnet and which SF will use eth ?
> >> On Wed, 9 Feb 2022 10:57:21 +0100 Jiri Pirko wrote:
> >>> It's a different user. One works with the eswitch and creates the
> >>> port function. The other one takes the created instance and works with it.
> >>> Note that it may be on a different host.
> >> It is a little confusing, so I may well be misunderstanding but the
> >> cover letter says:
> >>
> >> $ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
> >> value false cmode runtime
> >>
> >> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
> >>
> >> So both of these run on the same side, no?
> Yes.
In this cover letter example both are on the same side.
But as Jiri explained, they can be on different hosts.
> >> What I meant is make the former part of the latter:
> >>
> >> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
> >> noprobe
> > I see. So it would not be "global policy" but per-instance option
> > during creation. That makes sense. I wonder if the HW is capable of
> > such flow, Moshe, Saeed?
At present the device isn't capable of propagating this hint.
Moreover, the probe option applies to the auxiliary devices of the SF (net, vdpa, rdma).
We still need to probe the SF's main auxiliary device so that a devlink instance of the SF is present to control the SF parameters [1] used to compose it.
One very good advantage I see in Jakub's per-SF suggestion is the ability to compose most properties of an SF in one place, on the eswitch side.
However, even with the per-SF approach on the eswitch side, the hurdle is assigning the CPU affinity of the SF, which is preferably done on the host where the actual workload is running.
So per-SF CPU affinity assignment on the host side requires a devlink reload.
With that in mind, it is better to control the rest of the parameters [1] on the customer side (auxiliary/mlx5_core.sf.1) as well.
[1] https://www.kernel.org/doc/html/latest/networking/devlink/devlink-params.html
>
> LGTM. Thanks.
>
> >
> >>
> >> Maybe worth clarifying - pci/0000:08:00.0 is the eswitch side and
> >> auxiliary/mlx5_core.sf.1 is the... "customer" side, correct?
> > Yep.
It is important to describe both use cases in the cover letter, with an example: the customer side and the eswitch side can be on the same host or on different hosts.
Moshe,
Can you please revise the cover letter?
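
As a rough illustration of the split discussed above (not taken from the
patch set), the commands from the cover letter can be grouped by the side
that would issue them. on_side is a hypothetical helper that only prints;
when the customer side is a different host, the auxiliary device name seen
there is an assumption and may differ from the one shown.

```shell
#!/bin/sh
# Dry-run sketch: tag each command with the side that would run it.
# on_side is a hypothetical helper; nothing is executed.
on_side() {
    side=$1
    shift
    echo "[$side] $*"
}

PF=pci/0000:08:00.0          # eswitch side handle from the cover letter
SF=auxiliary/mlx5_core.sf.1  # customer side handle from the cover letter

# eswitch side: create and activate the SF port function
on_side eswitch devlink port add "$PF" flavour pcisf pfnum 0 sfnum 11
on_side eswitch devlink port function set "$PF/32768" \
    hw_addr 00:00:00:00:00:11 state active

# customer side (same or different host): compose the SF, then reload it
on_side customer devlink dev param set "$SF" \
    name enable_vnet value true cmode driverinit
on_side customer devlink dev reload "$SF"
```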
On 2/10/2022 9:09 PM, Parav Pandit wrote:
>> From: Moshe Shemesh <[email protected]>
>> Sent: Thursday, February 10, 2022 3:58 PM
>>
>> On 2/10/2022 9:02 AM, Jiri Pirko wrote:
>>> Thu, Feb 10, 2022 at 02:25:25AM CET, [email protected] wrote:
>>>> On Wed, 9 Feb 2022 09:39:54 +0200 Moshe Shemesh wrote:
>>>>> Well we don't have the SFs at that stage, how can we tell which SF
>>>>> will use vnet and which SF will use eth ?
>>>> On Wed, 9 Feb 2022 10:57:21 +0100 Jiri Pirko wrote:
>>>>> It's a different user. One works with the eswitch and creates the
>>>>> port function. The other one takes the created instance and works with it.
>>>>> Note that it may be on a different host.
>>>> It is a little confusing, so I may well be misunderstanding but the
>>>> cover letter says:
>>>>
>>>> $ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
>>>> value false cmode runtime
>>>>
>>>> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
>>>>
>>>> So both of these run on the same side, no?
>> Yes.
> In this cover letter example it is on same side.
> But as Jiri explained, both can be on different host.
>
>>>> What I meant is make the former part of the latter:
>>>>
>>>> $ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
>>>> noprobe
>>> I see. So it would not be "global policy" but per-instance option
>>> during creation. That makes sense. I wonder if the HW is capable of
>>> such flow, Moshe, Saeed?
> At present the device isn't capable of propagating this hint.
> Moreover, the probe option is for the auxiliary devices of the SF (net, vdpa, rdma).
> We still need to probe the SF's main auxiliary device so that a devlink instance of the SF is present to control the SF parameters [1] to compose it.
>
> The one very good advantage I see of the per SF suggestion of Jakub is, the ability to compose most properties of a SF at one place on eswitch side.
>
> However, even with per SF approach on eswitch side, the hurdle was in assigning the cpu affinity of the SF, which is something preferable to do on the host, where the actual workload is running.
> So cpu affinity assignment per SF on host side requires devlink reload.
> With that consideration it is better to control rest of the other parameters [1] too on customer side auxiliary/mlx5_core.sf.1 side.
>
> [1] https://www.kernel.org/doc/html/latest/networking/devlink/devlink-params.html
>
>> LGTM. Thanks.
>>
>>>> Maybe worth clarifying - pci/0000:08:00.0 is the eswitch side and
>>>> auxiliary/mlx5_core.sf.1 is the... "customer" side, correct?
>>> Yep.
> It is important to describe both use cases in the cover letter where customer side and eswitch side can be in same/different host with example.
>
> Moshe,
> Can you please revise the cover letter?
Yes, I will send a revised version.