2015-04-10 16:27:58

by Benjamin Poirier

Subject: [PATCH] mlx4: Fix tx ring affinity_mask creation

By default, the number of tx queues is limited by the number of online cpus in
mlx4_en_get_profile(). However, this limit no longer holds after the ethtool
.set_channels method has been called. In that situation, the driver may access
invalid bits of certain cpumask variables when queue_index > nr_cpu_ids.

Signed-off-by: Benjamin Poirier <[email protected]>
---
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 55f9f5c..8c234ec 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -143,8 +143,10 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
ring->hwtstamp_tx_type = priv->hwtstamp_config.tx_type;
ring->queue_index = queue_index;

- if (queue_index < priv->num_tx_rings_p_up && cpu_online(queue_index))
- cpumask_set_cpu(queue_index, &ring->affinity_mask);
+ if (queue_index < priv->num_tx_rings_p_up)
+ cpumask_set_cpu_local_first(queue_index,
+ priv->mdev->dev->numa_node,
+ &ring->affinity_mask);

*pring = ring;
return 0;
@@ -213,7 +215,7 @@ int mlx4_en_activate_tx_ring(struct mlx4_en_priv *priv,

err = mlx4_qp_to_ready(mdev->dev, &ring->wqres.mtt, &ring->context,
&ring->qp, &ring->qp_state);
- if (!user_prio && cpu_online(ring->queue_index))
+ if (!cpumask_empty(&ring->affinity_mask))
netif_set_xps_queue(priv->dev, &ring->affinity_mask,
ring->queue_index);

--
2.3.3
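
For readers skimming the two hunks, the resulting logic after the patch,
reconstructed from the diff itself with the surrounding code omitted,
reads roughly as follows:

	/* mlx4_en_create_tx_ring(): only queues below num_tx_rings_p_up get
	 * an affinity hint, and the CPU is now chosen NUMA-locally instead
	 * of through the identity queue_index -> cpu mapping. */
	if (queue_index < priv->num_tx_rings_p_up)
		cpumask_set_cpu_local_first(queue_index,
					    priv->mdev->dev->numa_node,
					    &ring->affinity_mask);

	/* mlx4_en_activate_tx_ring(): XPS is configured only when an
	 * affinity hint was actually set, which keeps out-of-range queue
	 * indexes away from the cpumask API. */
	if (!cpumask_empty(&ring->affinity_mask))
		netif_set_xps_queue(priv->dev, &ring->affinity_mask,
				    ring->queue_index);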


2015-04-12 07:03:11

by Ido Shamay

Subject: Re: [PATCH] mlx4: Fix tx ring affinity_mask creation

Hi Benjamin,

On 4/10/2015 7:27 PM, Benjamin Poirier wrote:
> By default, the number of tx queues is limited by the number of online cpus in
> mlx4_en_get_profile(). However, this limit no longer holds after the ethtool
> .set_channels method has been called. In that situation, the driver may access
> invalid bits of certain cpumask variables when queue_index > nr_cpu_ids.

I must say I don't see the above issue with the current code.
Whatever the modified value of priv->num_tx_rings_p_up, XPS is set only
on queues that have been given a CPU affinity mask, so there is no
access to invalid bits.

It's true that when priv->num_tx_rings_p_up > nr_cpus, not all queues
will be set with XPS. This is because the code tries to preserve a 1:1
mapping of queues to cores, to avoid mapping two queues to the same
core. I guess it's ok to break the 1:1 mapping in this condition, but
the commit message should say that instead of mentioning invalid bits.
Please correct me if I'm wrong.

> Signed-off-by: Benjamin Poirier <[email protected]>
> ---
> drivers/net/ethernet/mellanox/mlx4/en_tx.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index 55f9f5c..8c234ec 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> @@ -143,8 +143,10 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
> ring->hwtstamp_tx_type = priv->hwtstamp_config.tx_type;
> ring->queue_index = queue_index;
>
> - if (queue_index < priv->num_tx_rings_p_up && cpu_online(queue_index))
> - cpumask_set_cpu(queue_index, &ring->affinity_mask);
> + if (queue_index < priv->num_tx_rings_p_up)
> + cpumask_set_cpu_local_first(queue_index,
> + priv->mdev->dev->numa_node,
> + &ring->affinity_mask);
Moving from cpumask_set_cpu to cpumask_set_cpu_local_first is great,
but it should come in a different commit, since it changes the XPS
behavior (xps_cpus[tx_ring[queue_index]] != queue_index from now on; a
sketch of this enumeration order follows the quoted diff below).
The commit message should state this behavior change.
Thanks a lot Benjamin.
>
> *pring = ring;
> return 0;
> @@ -213,7 +215,7 @@ int mlx4_en_activate_tx_ring(struct mlx4_en_priv *priv,
>
> err = mlx4_qp_to_ready(mdev->dev, &ring->wqres.mtt, &ring->context,
> &ring->qp, &ring->qp_state);
> - if (!user_prio && cpu_online(ring->queue_index))
> + if (!cpumask_empty(&ring->affinity_mask))
> netif_set_xps_queue(priv->dev, &ring->affinity_mask,
> ring->queue_index);
>
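
For context on the helper discussed above: cpumask_set_cpu_local_first()
sets the i-th CPU counting the device's NUMA-node-local CPUs first, so the
queue-to-CPU mapping is no longer the identity. A rough standalone model of
that enumeration order, written as plain userspace C with a made-up
topology rather than the kernel implementation, might look like this:

	/* Illustrative model of local-first CPU enumeration; the topology
	 * and names are made up, not taken from the kernel. */
	#include <stdio.h>

	#define NR_CPUS 8
	/* Hypothetical topology: CPUs 0..3 on node 0, CPUs 4..7 on node 1. */
	static const int cpu_node[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

	static int local_first(int i, int node)
	{
		int pass, cpu, n = 0;

		/* pass 0 walks the node-local CPUs, pass 1 all the others */
		for (pass = 0; pass < 2; pass++)
			for (cpu = 0; cpu < NR_CPUS; cpu++)
				if ((cpu_node[cpu] == node) == (pass == 0) &&
				    n++ == i)
					return cpu;
		return -1; /* index out of range */
	}

	int main(void)
	{
		/* With the device on node 1, queue 0 maps to CPU 4, not 0. */
		for (int q = 0; q < NR_CPUS; q++)
			printf("queue %d -> cpu %d\n", q, local_first(q, 1));
		return 0;
	}

With such a mapping, the XPS mask for queue 0 contains CPU 4 on a node-1
device, which is exactly the behavior change Ido asks to have documented.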

2015-04-14 00:23:07

by Benjamin Poirier

Subject: Re: [PATCH] mlx4: Fix tx ring affinity_mask creation

On 2015/04/12 10:03, Ido Shamay wrote:
> Hi Benjamin,
>
> On 4/10/2015 7:27 PM, Benjamin Poirier wrote:
> >By default, the number of tx queues is limited by the number of online cpus in
> >mlx4_en_get_profile(). However, this limit no longer holds after the ethtool
> >.set_channels method has been called. In that situation, the driver may access
> >invalid bits of certain cpumask variables when queue_index > nr_cpu_ids.
>
> I must say I don't see the above issue with the current code.
> Whatever the modified value of priv->num_tx_rings_p_up, XPS is set only
> on queues that have been given a CPU affinity mask, so there is no
> access to invalid bits.

The problem is not with the call to netif_set_xps_queue() it is with the
calls to cpu_online() and cpumask_set_cpu().

For example, if the user calls `ethtool -L ethX tx 32`, queue_index in
mlx4_en_create_tx_ring() can be up to 255. Depending on CONFIG_NR_CPUS
and CONFIG_CPUMASK_OFFSTACK this may result in calls to cpu_online() and
cpumask_set_cpu() with cpu >= nr_cpumask_bits which is an invalid usage
of the cpumask api. The driver will potentially read or write beyond the
end of the bitmap. With CONFIG_CPUMASK_OFFSTACK=y and
CONFIG_DEBUG_PER_CPU_MAPS=y, the aforementioned ethtool call on a system
with <32 cpus triggers the warning in cpumask_check().
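
To make the overflow concrete, here is a small standalone sketch, in plain
userspace C rather than kernel code, of the bug class described above: a
fixed-capacity CPU bitmap indexed by a queue number that can exceed the
number of valid bits. The sizes are illustrative (a 4-bit "cpumask" and
queue indexes up to 255, matching the ethtool example); the bounds check
plays the role of cpumask_check(), and its print-once behavior mimics
WARN_ON_ONCE.

	#include <stdio.h>

	#define NR_CPU_BITS 4		/* stands in for nr_cpumask_bits */
	static unsigned long mask;	/* only bits 0..NR_CPU_BITS-1 valid */

	static void set_cpu_checked(unsigned int cpu)
	{
		static int warned;

		if (cpu >= NR_CPU_BITS) {   /* cpumask_check() would WARN here */
			if (!warned++)
				fprintf(stderr, "invalid cpu %u >= %d\n",
					cpu, NR_CPU_BITS);
			return;
		}
		mask |= 1UL << cpu;
	}

	int main(void)
	{
		/* queue_index can reach 255 after `ethtool -L ethX tx 32` */
		for (unsigned int queue_index = 0; queue_index < 256;
		     queue_index++)
			set_cpu_checked(queue_index); /* rejects index >= 4 */
		printf("mask = %#lx\n", mask);
		return 0;
	}

Without the bounds check, the shift and store would touch bits past the
end of the bitmap, which is the out-of-bounds read/write the patch avoids
by gating on num_tx_rings_p_up and cpumask_empty().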

>
> It's true that when priv->num_tx_rings_p_up > nr_cpus, not all queues
> will be set with XPS. This is because the code tries to preserve a 1:1
> mapping of queues to cores, to avoid mapping two queues to the same
> core. I guess it's ok to break the 1:1 mapping in this condition, but
> the commit message should say that instead of mentioning invalid bits.
> Please correct me if I'm wrong.
>
> >Signed-off-by: Benjamin Poirier <[email protected]>
> >---
> > drivers/net/ethernet/mellanox/mlx4/en_tx.c | 8 +++++---
> > 1 file changed, 5 insertions(+), 3 deletions(-)
> >
> >diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> >index 55f9f5c..8c234ec 100644
> >--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> >+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> >@@ -143,8 +143,10 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
> > ring->hwtstamp_tx_type = priv->hwtstamp_config.tx_type;
> > ring->queue_index = queue_index;
> >- if (queue_index < priv->num_tx_rings_p_up && cpu_online(queue_index))
> >- cpumask_set_cpu(queue_index, &ring->affinity_mask);
> >+ if (queue_index < priv->num_tx_rings_p_up)
> >+ cpumask_set_cpu_local_first(queue_index,
> >+ priv->mdev->dev->numa_node,
> >+ &ring->affinity_mask);
> Moving from cpumask_set_cpu to cpumask_set_cpu_local_first is great,
> but it should come in a different commit, since it changes the XPS
> behavior (xps_cpus[tx_ring[queue_index]] != queue_index from now on).
> The commit message should state this behavior change.
> Thanks a lot Benjamin.
> > *pring = ring;
> > return 0;
> >@@ -213,7 +215,7 @@ int mlx4_en_activate_tx_ring(struct mlx4_en_priv *priv,
> > err = mlx4_qp_to_ready(mdev->dev, &ring->wqres.mtt, &ring->context,
> > &ring->qp, &ring->qp_state);
> >- if (!user_prio && cpu_online(ring->queue_index))
> >+ if (!cpumask_empty(&ring->affinity_mask))
> > netif_set_xps_queue(priv->dev, &ring->affinity_mask,
> > ring->queue_index);
>

2015-04-28 03:26:43

by Benjamin Poirier

Subject: Re: [PATCH] mlx4: Fix tx ring affinity_mask creation

On 2015/04/13 17:22, Benjamin Poirier wrote:
> On 2015/04/12 10:03, Ido Shamay wrote:
> > Hi Benjamin,
> >
> > On 4/10/2015 7:27 PM, Benjamin Poirier wrote:
> > >By default, the number of tx queues is limited by the number of online cpus in
> > >mlx4_en_get_profile(). However, this limit no longer holds after the ethtool
> > >.set_channels method has been called. In that situation, the driver may access
> > >invalid bits of certain cpumask variables when queue_index > nr_cpu_ids.
> >
> > I must say I don't see the above issue with the current code.
> > Whatever the modified value of priv->num_tx_rings_p_up, XPS is set
> > only on queues that have been given a CPU affinity mask, so there
> > is no access to invalid bits.
>
> The problem is not with the call to netif_set_xps_queue() it is with the
> calls to cpu_online() and cpumask_set_cpu().
>
> For example, if the user calls `ethtool -L ethX tx 32`, queue_index in
> mlx4_en_create_tx_ring() can be up to 255. Depending on CONFIG_NR_CPUS
> and CONFIG_CPUMASK_OFFSTACK this may result in calls to cpu_online() and
> cpumask_set_cpu() with cpu >= nr_cpumask_bits which is an invalid usage
> of the cpumask api. The driver will potentially read or write beyond the
> end of the bitmap. With CONFIG_CPUMASK_OFFSTACK=y and
> CONFIG_DEBUG_PER_CPU_MAPS=y, the aforementioned ethtool call on a system
> with <32 cpus triggers the warning in cpumask_check().
>

Mellanox, can you please either:
- ack the patch as submitted,
- clarify what changes you'd like to see given my reply above, or
- submit a fix of your own for this problem?

Thanks,
-Benjamin

> >
> > It's true that when priv->num_tx_rings_p_up > nr_cpus. not all queues will
> > be set with XPS.
> > This is because the code tries to preserve 1:1 mapping of queues to cores,
> > to avoid a double mapping
> > of queues to cores.
> > I guess it's ok to break the 1:1 mapping in this condition, but the commit
> > message should say that instead
> > of invalid bits. Please fix me if I'm wrong.
> >
> > >Signed-off-by: Benjamin Poirier <[email protected]>
> > >---
> > > drivers/net/ethernet/mellanox/mlx4/en_tx.c | 8 +++++---
> > > 1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > >diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > >index 55f9f5c..8c234ec 100644
> > >--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > >+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > >@@ -143,8 +143,10 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
> > > ring->hwtstamp_tx_type = priv->hwtstamp_config.tx_type;
> > > ring->queue_index = queue_index;
> > >- if (queue_index < priv->num_tx_rings_p_up && cpu_online(queue_index))
> > >- cpumask_set_cpu(queue_index, &ring->affinity_mask);
> > >+ if (queue_index < priv->num_tx_rings_p_up)
> > >+ cpumask_set_cpu_local_first(queue_index,
> > >+ priv->mdev->dev->numa_node,
> > >+ &ring->affinity_mask);
> > Moving from cpumask_set_cpu to cpumask_set_cpu_local_first is great,
> > but it should come in a different commit, since it changes the XPS
> > behavior (xps_cpus[tx_ring[queue_index]] != queue_index from now on).
> > The commit message should state this behavior change.
> > Thanks a lot Benjamin.
> > > *pring = ring;
> > > return 0;
> > >@@ -213,7 +215,7 @@ int mlx4_en_activate_tx_ring(struct mlx4_en_priv *priv,
> > > err = mlx4_qp_to_ready(mdev->dev, &ring->wqres.mtt, &ring->context,
> > > &ring->qp, &ring->qp_state);
> > >- if (!user_prio && cpu_online(ring->queue_index))
> > >+ if (!cpumask_empty(&ring->affinity_mask))
> > > netif_set_xps_queue(priv->dev, &ring->affinity_mask,
> > > ring->queue_index);
> >

2015-04-28 13:37:44

by Ido Shamay

Subject: Re: [PATCH] mlx4: Fix tx ring affinity_mask creation

On 4/28/2015 6:26 AM, Benjamin Poirier wrote:
> On 2015/04/13 17:22, Benjamin Poirier wrote:
>> On 2015/04/12 10:03, Ido Shamay wrote:
>>> Hi Benjamin,
>>>
>>> On 4/10/2015 7:27 PM, Benjamin Poirier wrote:
>>>> By default, the number of tx queues is limited by the number of online cpus in
>>>> mlx4_en_get_profile(). However, this limit no longer holds after the ethtool
>>>> .set_channels method has been called. In that situation, the driver may access
>>>> invalid bits of certain cpumask variables when queue_index > nr_cpu_ids.
>>> I must say I don't see the above issue with the current code.
>>> Whatever the modified value of priv->num_tx_rings_p_up, XPS is set
>>> only on queues that have been given a CPU affinity mask, so there
>>> is no access to invalid bits.
>> The problem is not with the call to netif_set_xps_queue() it is with the
>> calls to cpu_online() and cpumask_set_cpu().
>>
>> For example, if the user calls `ethtool -L ethX tx 32`, queue_index in
>> mlx4_en_create_tx_ring() can be up to 255. Depending on CONFIG_NR_CPUS
>> and CONFIG_CPUMASK_OFFSTACK this may result in calls to cpu_online() and
>> cpumask_set_cpu() with cpu >= nr_cpumask_bits which is an invalid usage
>> of the cpumask api. The driver will potentially read or write beyond the
>> end of the bitmap. With CONFIG_CPUMASK_OFFSTACK=y and
>> CONFIG_DEBUG_PER_CPU_MAPS=y, the aforementioned ethtool call on a system
>> with <32 cpus triggers the warning in cpumask_check().
>>
> Mellanox, can you please
> ack the patch as submitted, or
> clarify what changes you'd like to see given my reply above, or
> submit a fix of your own for this problem
>
> Thanks,
> -Benjamin
Hi Benjamin,

After further review and a better understanding of the issue, we are
okay with your patch as is. Thanks for the good work.

Acked-by: Ido Shamay <[email protected]>

2015-04-28 18:24:05

by Or Gerlitz

Subject: Re: [PATCH] mlx4: Fix tx ring affinity_mask creation

On Fri, Apr 10, 2015 at 7:27 PM, Benjamin Poirier <[email protected]> wrote:
> By default, the number of tx queues is limited by the number of online cpus in
> mlx4_en_get_profile(). However, this limit no longer holds after the ethtool
> .set_channels method has been called. In that situation, the driver may access
> invalid bits of certain cpumask variables when queue_index > nr_cpu_ids.
>

Hi Benjamin,

Can this fix be tied to a specific commit? If yes, it would be good to
add a Fixes: line here, so it is easier to work out which stable
kernels the fix should go to.
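
For reference, the conventional tag format is the abbreviated 12-character
commit hash followed by the commit's quoted subject line. With a
placeholder hash and subject, since the offending commit isn't identified
in this thread:

	Fixes: 123456789abc ("mlx4_en: subject of the offending commit")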

Or.

> Signed-off-by: Benjamin Poirier <[email protected]>