2020-06-10 17:21:09

by Nitesh Narayan Lal

Subject: [PATCH v1 0/3] Preventing job distribution to isolated CPUs

This patch set originated from one of the patches posted earlier as
part of the "Task_isolation" mode [1] patch series by Alex Belits
<[email protected]>. There are only a couple of changes that I am
proposing in this patch set compared to what Alex posted earlier.


Context
=======
On a broad level, all three patches included in this patch set make
the affected driver/library respect isolated CPUs by not pinning any
jobs to them. Not doing so can hurt latency in RT use cases.


Patches
=======
* Patch1:
The first patch makes cpumask_local_spread() aware of the
isolated CPUs. It ensures that the CPUs returned by this API
include only housekeeping CPUs.

* Patch2:
This patch ensures that a probe function called via
work_on_cpu() doesn't run on an isolated CPU.

* Patch3:
This patch makes store_rps_map() aware of the isolated
CPUs so that rps don't queue any jobs on an isolated CPU.


Changes
=======
To fix the above-mentioned issues, Alex used housekeeping_cpumask().
The only changes that I am proposing here are:
- Removing the dependency on CONFIG_TASK_ISOLATION that Alex proposed,
  as it should be safe to rely on housekeeping_cpumask() even when no
  CPUs are isolated and we want to fall back to using all available
  CPUs in any of the above scenarios.
- Using both HK_FLAG_DOMAIN and HK_FLAG_WQ in all three patches,
  because we want the above fixes not only with isolcpus but also with
  something like systemd's CPUAffinity.


Testing
=======
* Patch 1:
The fix for cpumask_local_spread() was tested by creating VFs, loading
the iavf module, and adding a tracepoint to confirm that only
housekeeping CPUs are picked when an appropriate isolation profile is
set up, and that all remaining CPUs are picked when no CPU isolation
is required/configured.

* Patch 2:
To test the PCI fix, I hotplugged a virtio-net-pci device from the
QEMU console and forced its addition to a specific node to trigger the
code path that includes the proposed fix, then verified via a
tracepoint that only housekeeping CPUs are used. I understand that
this may not be the best way to test it, so I am open to any
suggestions for testing this fix in a better way if required.

* Patch 3:
To test the fix in store_rps_map(), I tried configuring an isolated
CPU by writing it to /sys/class/net/en*/queues/rx*/rps_cpus, which
resulted in a 'write error: Invalid argument'. Writing a non-isolated
CPU to rps_cpus succeeded without any error.

[1] https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/

Alex Belits (3):
lib: restricting cpumask_local_spread to only housekeeping CPUs
PCI: prevent work_on_cpu's probe to execute on isolated CPUs
net: restrict queuing of receive packets to housekeeping CPUs

drivers/pci/pci-driver.c |  5 ++++-
lib/cpumask.c            | 43 +++++++++++++++++++++++-----------------
net/core/net-sysfs.c     | 10 +++++++++-
3 files changed, 38 insertions(+), 20 deletions(-)

--



2020-06-10 17:21:11

by Nitesh Narayan Lal

Subject: [Patch v1 1/3] lib: restricting cpumask_local_spread to only housekeeping CPUs

From: Alex Belits <[email protected]>

The current implementation of cpumask_local_spread() does not respect
isolated CPUs: even if a CPU has been isolated for a real-time task,
it is still returned to the caller for pinning of its IRQ threads.
Having these unwanted IRQ threads on an isolated CPU adds latency
overhead.

Restrict the CPUs returned for spreading IRQs to the available
housekeeping CPUs.

Signed-off-by: Alex Belits <[email protected]>
Signed-off-by: Nitesh Narayan Lal <[email protected]>
---
lib/cpumask.c | 43 +++++++++++++++++++++++++------------------
1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/lib/cpumask.c b/lib/cpumask.c
index fb22fb266f93..cc4311a8c079 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -6,6 +6,7 @@
#include <linux/export.h>
#include <linux/memblock.h>
#include <linux/numa.h>
+#include <linux/sched/isolation.h>

/**
* cpumask_next - get the next cpu in a cpumask
@@ -205,28 +206,34 @@ void __init free_bootmem_cpumask_var(cpumask_var_t mask)
*/
unsigned int cpumask_local_spread(unsigned int i, int node)
{
- int cpu;
+ int cpu, m, n, hk_flags;
+ const struct cpumask *mask;

+ hk_flags = HK_FLAG_DOMAIN | HK_FLAG_WQ;
+ mask = housekeeping_cpumask(hk_flags);
+ m = cpumask_weight(mask);
/* Wrap: we always want a cpu. */
- i %= num_online_cpus();
+ n = i % m;
+ while (m-- > 0) {
+ if (node == NUMA_NO_NODE) {
+ for_each_cpu(cpu, mask)
+ if (n-- == 0)
+ return cpu;
+ } else {
+ /* NUMA first. */
+ for_each_cpu_and(cpu, cpumask_of_node(node), mask)
+ if (n-- == 0)
+ return cpu;

- if (node == NUMA_NO_NODE) {
- for_each_cpu(cpu, cpu_online_mask)
- if (i-- == 0)
- return cpu;
- } else {
- /* NUMA first. */
- for_each_cpu_and(cpu, cpumask_of_node(node), cpu_online_mask)
- if (i-- == 0)
- return cpu;
+ for_each_cpu(cpu, mask) {
+ /* Skip NUMA nodes, done above. */
+ if (cpumask_test_cpu(cpu,
+ cpumask_of_node(node)))
+ continue;

- for_each_cpu(cpu, cpu_online_mask) {
- /* Skip NUMA nodes, done above. */
- if (cpumask_test_cpu(cpu, cpumask_of_node(node)))
- continue;
-
- if (i-- == 0)
- return cpu;
+ if (n-- == 0)
+ return cpu;
+ }
}
}
BUG();
--
2.18.4

2020-06-10 17:21:59

by Nitesh Narayan Lal

Subject: [Patch v1 3/3] net: restrict queuing of receive packets to housekeeping CPUs

From: Alex Belits <[email protected]>

With the existing implementation of store_rps_map(), packets are
queued in the receive path on the backlog queues of other CPUs
irrespective of whether they are isolated or not. This can add latency
overhead to any RT workload that is running on the same CPU.

Ensure that store_rps_map() uses only the available housekeeping CPUs
for storing the rps_map.

Signed-off-by: Alex Belits <[email protected]>
Signed-off-by: Nitesh Narayan Lal <[email protected]>
---
net/core/net-sysfs.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index e353b822bb15..16e433287191 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -11,6 +11,7 @@
#include <linux/if_arp.h>
#include <linux/slab.h>
#include <linux/sched/signal.h>
+#include <linux/sched/isolation.h>
#include <linux/nsproxy.h>
#include <net/sock.h>
#include <net/net_namespace.h>
@@ -741,7 +742,7 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
{
struct rps_map *old_map, *map;
cpumask_var_t mask;
- int err, cpu, i;
+ int err, cpu, i, hk_flags;
static DEFINE_MUTEX(rps_map_mutex);

if (!capable(CAP_NET_ADMIN))
@@ -756,6 +757,13 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
return err;
}

+ hk_flags = HK_FLAG_DOMAIN | HK_FLAG_WQ;
+ cpumask_and(mask, mask, housekeeping_cpumask(hk_flags));
+ if (cpumask_weight(mask) == 0) {
+ free_cpumask_var(mask);
+ return -EINVAL;
+ }
+
map = kzalloc(max_t(unsigned int,
RPS_MAP_SIZE(cpumask_weight(mask)), L1_CACHE_BYTES),
GFP_KERNEL);
--
2.18.4

2020-06-10 20:05:44

by Nitesh Narayan Lal

Subject: [Patch v1 2/3] PCI: prevent work_on_cpu's probe to execute on isolated CPUs

From: Alex Belits <[email protected]>

pci_call_probe() prevents the nesting of work_on_cpu() for a scenario
where a VF device is probed from the work_on_cpu() of the PF device.

Replace the cpumask used in pci_call_probe() from all online CPUs to
only housekeeping CPUs. This ensures that there are no additional
latency overheads caused by pinning jobs on isolated CPUs.

Signed-off-by: Alex Belits <[email protected]>
Signed-off-by: Nitesh Narayan Lal <[email protected]>
---
drivers/pci/pci-driver.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index da6510af1221..449466f71040 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -12,6 +12,7 @@
#include <linux/string.h>
#include <linux/slab.h>
#include <linux/sched.h>
+#include <linux/sched/isolation.h>
#include <linux/cpu.h>
#include <linux/pm_runtime.h>
#include <linux/suspend.h>
@@ -333,6 +334,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
const struct pci_device_id *id)
{
int error, node, cpu;
+ int hk_flags = HK_FLAG_DOMAIN | HK_FLAG_WQ;
struct drv_dev_and_id ddi = { drv, dev, id };

/*
@@ -353,7 +355,8 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
pci_physfn_is_probed(dev))
cpu = nr_cpu_ids;
else
- cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);
+ cpu = cpumask_any_and(cpumask_of_node(node),
+ housekeeping_cpumask(hk_flags));

if (cpu < nr_cpu_ids)
error = work_on_cpu(cpu, local_pci_probe, &ddi);
--
2.18.4

2020-06-16 17:28:55

by Marcelo Tosatti

Subject: Re: [PATCH v1 0/3] Preventing job distribution to isolated CPUs

Hi Nitesh,

On Wed, Jun 10, 2020 at 12:12:23PM -0400, Nitesh Narayan Lal wrote:
> This patch-set is originated from one of the patches that have been
> posted earlier as a part of "Task_isolation" mode [1] patch series
> by Alex Belits <[email protected]>. There are only a couple of
> changes that I am proposing in this patch-set compared to what Alex
> has posted earlier.
>
>
> Context
> =======
> On a broad level, all three patches that are included in this patch
> set are meant to improve the driver/library to respect isolated
> CPUs by not pinning any job on it. Not doing so could impact
> the latency values in RT use-cases.
>
>
> Patches
> =======
> * Patch1:
> The first patch is meant to make cpumask_local_spread()
> aware of the isolated CPUs. It ensures that the CPUs that
> are returned by this API only includes housekeeping CPUs.
>
> * Patch2:
> This patch ensures that a probe function that is called
> using work_on_cpu() doesn't run any task on an isolated CPU.
>
> * Patch3:
> This patch makes store_rps_map() aware of the isolated
> CPUs so that rps don't queue any jobs on an isolated CPU.
>
>
> Changes
> =======
> To fix the above-mentioned issues Alex has used housekeeping_cpumask().
> The only changes that I am proposing here are:
> - Removing the dependency on CONFIG_TASK_ISOLATION that was proposed by Alex.
>   As it should be safe to rely on housekeeping_cpumask()
>   even when we don't have any isolated CPUs and we want
>   to fall back to using all available CPUs in any of the above scenarios.
> - Using both HK_FLAG_DOMAIN and HK_FLAG_WQ in all three patches, this is
> because we would want the above fixes not only when we have isolcpus but
> also with something like systemd's CPU affinity.
>
>
> Testing
> =======
> * Patch 1:
> Fix for cpumask_local_spread() is tested by creating VFs, loading
> iavf module and by adding a tracepoint to confirm that only housekeeping
> CPUs are picked when an appropriate profile is set up and all remaining CPUs
> when no CPU isolation is required/configured.
>
> * Patch 2:
> To test the PCI fix, I hotplugged a virtio-net-pci from qemu console
> and forced its addition to a specific node to trigger the code path that
> includes the proposed fix and verified that only housekeeping CPUs
> are included via tracepoint. I understand that this may not be the
> best way to test it, hence, I am open to any suggestion to test this
> fix in a better way if required.
>
> * Patch 3:
> To test the fix in store_rps_map(), I tried configuring an isolated
> CPU by writing to /sys/class/net/en*/queues/rx*/rps_cpus which
> resulted in 'write error: Invalid argument' error. For the case
> where a non-isolated CPU is writing in rps_cpus the above operation
> succeeded without any error.
>
> [1] https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/
>
> Alex Belits (3):
> lib: restricting cpumask_local_spread to only houskeeping CPUs
> PCI: prevent work_on_cpu's probe to execute on isolated CPUs
> net: restrict queuing of receive packets to housekeeping CPUs
>
> drivers/pci/pci-driver.c | 5 ++++-
> lib/cpumask.c | 43 +++++++++++++++++++++++-----------------
> net/core/net-sysfs.c | 10 +++++++++-
> 3 files changed, 38 insertions(+), 20 deletions(-)
>
> --
>

Looks good to me.

The flags mechanism is not well organized: this is using HK_FLAG_WQ to
infer nohz_full is being set (while HK_FLAG_WQ should indicate that
non-affined workqueue threads should not run on certain CPUs).

But this is a problem of the flags (which apparently Frederic wants
to fix by exposing a limited number of options to users), and not
of this patch.


2020-06-16 20:08:21

by Bjorn Helgaas

Subject: Re: [Patch v1 2/3] PCI: prevent work_on_cpu's probe to execute on isolated CPUs

"git log --oneline drivers/pci/pci-driver.c" tells you that the
subject should be something like:

PCI: Restrict probe functions to housekeeping CPUs

On Wed, Jun 10, 2020 at 12:12:25PM -0400, Nitesh Narayan Lal wrote:
> From: Alex Belits <[email protected]>
>
> pci_call_probe() prevents the nesting of work_on_cpu()
> for a scenario where a VF device is probed from work_on_cpu()
> of the Physical device.
> This patch replaces the cpumask used in pci_call_probe()
> from all online CPUs to only housekeeping CPUs. This is to
> ensure that there are no additional latency overheads
> caused due to the pinning of jobs on isolated CPUs.

s/Physical/PF/ (since you used "VF" earlier, this should match that)

s/This patch replaces/Replace the/

Please rewrap this to fill a 75 column line (so it doesn't overflow 80
columns when "git log" adds 4 spaces).

This should be two paragraphs; add a blank line between them.

> Signed-off-by: Alex Belits <[email protected]>
> Signed-off-by: Nitesh Narayan Lal <[email protected]>

Acked-by: Bjorn Helgaas <[email protected]>

> ---
> drivers/pci/pci-driver.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index da6510af1221..449466f71040 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -12,6 +12,7 @@
> #include <linux/string.h>
> #include <linux/slab.h>
> #include <linux/sched.h>
> +#include <linux/sched/isolation.h>
> #include <linux/cpu.h>
> #include <linux/pm_runtime.h>
> #include <linux/suspend.h>
> @@ -333,6 +334,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
> const struct pci_device_id *id)
> {
> int error, node, cpu;
> + int hk_flags = HK_FLAG_DOMAIN | HK_FLAG_WQ;
> struct drv_dev_and_id ddi = { drv, dev, id };
>
> /*
> @@ -353,7 +355,8 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
> pci_physfn_is_probed(dev))
> cpu = nr_cpu_ids;
> else
> - cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);
> + cpu = cpumask_any_and(cpumask_of_node(node),
> + housekeeping_cpumask(hk_flags));
>
> if (cpu < nr_cpu_ids)
> error = work_on_cpu(cpu, local_pci_probe, &ddi);
> --
> 2.18.4
>

2020-06-16 22:05:42

by Nitesh Narayan Lal

Subject: Re: [Patch v1 2/3] PCI: prevent work_on_cpu's probe to execute on isolated CPUs


On 6/16/20 4:05 PM, Bjorn Helgaas wrote:
> "git log --oneline drivers/pci/pci-driver.c" tells you that the
> subject should be something like:
>
> PCI: Restrict probe functions to housekeeping CPUs
>
> On Wed, Jun 10, 2020 at 12:12:25PM -0400, Nitesh Narayan Lal wrote:
>> From: Alex Belits <[email protected]>
>>
>> pci_call_probe() prevents the nesting of work_on_cpu()
>> for a scenario where a VF device is probed from work_on_cpu()
>> of the Physical device.
>> This patch replaces the cpumask used in pci_call_probe()
>> from all online CPUs to only housekeeping CPUs. This is to
>> ensure that there are no additional latency overheads
>> caused due to the pinning of jobs on isolated CPUs.
> s/Physical/PF/ (since you used "VF" earlier, this should match that)
>
> s/This patch replaces/Replace the/
>
> Please rewrap this to fill a 75 column line (so it doesn't overflow 80
> columns when "git log" adds 4 spaces).
>
> This should be two paragraphs; add a blank line between them.

Thanks for pointing these out.
I will correct them in the next posting; before that, I will wait for
any comments on the other patches.

>
>> Signed-off-by: Alex Belits <[email protected]>
>> Signed-off-by: Nitesh Narayan Lal <[email protected]>
> Acked-by: Bjorn Helgaas <[email protected]>
>
>> ---
>> drivers/pci/pci-driver.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>> index da6510af1221..449466f71040 100644
>> --- a/drivers/pci/pci-driver.c
>> +++ b/drivers/pci/pci-driver.c
>> @@ -12,6 +12,7 @@
>> #include <linux/string.h>
>> #include <linux/slab.h>
>> #include <linux/sched.h>
>> +#include <linux/sched/isolation.h>
>> #include <linux/cpu.h>
>> #include <linux/pm_runtime.h>
>> #include <linux/suspend.h>
>> @@ -333,6 +334,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
>> const struct pci_device_id *id)
>> {
>> int error, node, cpu;
>> + int hk_flags = HK_FLAG_DOMAIN | HK_FLAG_WQ;
>> struct drv_dev_and_id ddi = { drv, dev, id };
>>
>> /*
>> @@ -353,7 +355,8 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
>> pci_physfn_is_probed(dev))
>> cpu = nr_cpu_ids;
>> else
>> - cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);
>> + cpu = cpumask_any_and(cpumask_of_node(node),
>> + housekeeping_cpumask(hk_flags));
>>
>> if (cpu < nr_cpu_ids)
>> error = work_on_cpu(cpu, local_pci_probe, &ddi);
>> --
>> 2.18.4
>>
--
Nitesh



2020-06-16 23:24:51

by Frederic Weisbecker

Subject: Re: [Patch v1 2/3] PCI: prevent work_on_cpu's probe to execute on isolated CPUs

On Wed, Jun 10, 2020 at 12:12:25PM -0400, Nitesh Narayan Lal wrote:
> From: Alex Belits <[email protected]>
>
> pci_call_probe() prevents the nesting of work_on_cpu()
> for a scenario where a VF device is probed from work_on_cpu()
> of the Physical device.
> This patch replaces the cpumask used in pci_call_probe()
> from all online CPUs to only housekeeping CPUs. This is to
> ensure that there are no additional latency overheads
> caused due to the pinning of jobs on isolated CPUs.
>
> Signed-off-by: Alex Belits <[email protected]>
> Signed-off-by: Nitesh Narayan Lal <[email protected]>
> ---
> drivers/pci/pci-driver.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index da6510af1221..449466f71040 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -12,6 +12,7 @@
> #include <linux/string.h>
> #include <linux/slab.h>
> #include <linux/sched.h>
> +#include <linux/sched/isolation.h>
> #include <linux/cpu.h>
> #include <linux/pm_runtime.h>
> #include <linux/suspend.h>
> @@ -333,6 +334,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
> const struct pci_device_id *id)
> {
> int error, node, cpu;
> + int hk_flags = HK_FLAG_DOMAIN | HK_FLAG_WQ;
> struct drv_dev_and_id ddi = { drv, dev, id };
>
> /*
> @@ -353,7 +355,8 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
> pci_physfn_is_probed(dev))
> cpu = nr_cpu_ids;
> else
> - cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);
> + cpu = cpumask_any_and(cpumask_of_node(node),
> + housekeeping_cpumask(hk_flags));

Looks like cpumask_of_node() is based on online CPUs. So that all
looks good. Thanks!

Reviewed-by: Frederic Weisbecker <[email protected]>


>
> if (cpu < nr_cpu_ids)
> error = work_on_cpu(cpu, local_pci_probe, &ddi);
> --
> 2.18.4
>