2013-06-24 20:08:03

by Duyck, Alexander H

Subject: [PATCH] pci: Avoid unnecessary calls to work_on_cpu

This patch is meant to address the fact that we are making unnecessary calls
to work_on_cpu. To resolve this I have added a check to see if the current
node is the correct node for the device before we decide to assign the probe
task to another CPU.

The advantage of this approach is that we can avoid reentrant calls to
work_on_cpu. In addition we should not make any calls to setup the work
remotely in the case of a single node system that has NUMA enabled.

Signed-off-by: Alexander Duyck <[email protected]>
---

This patch is based off of work I submitted in an earlier patch that I never
heard back on. The change was originally submitted in:
pci: Avoid reentrant calls to work_on_cpu

I'm not sure what ever happened with that patch, however after reviewing it
some myself I decided I could do without the change to the comments since they
were unneeded. As such I am resubmitting this as a much simpler patch that
only adds the line of code needed to avoid calling work_on_cpu for every call
to probe on a NUMA-node-specific device.

drivers/pci/pci-driver.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 79277fb..7d81713 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -282,7 +282,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
 	   its local memory on the right node without any need to
 	   change it. */
 	node = dev_to_node(&dev->dev);
-	if (node >= 0) {
+	if ((node >= 0) && (node != numa_node_id())) {
 		int cpu;

 		get_online_cpus();
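
For readers skimming the one-line change, the new condition can be modeled
as a small userspace predicate. This is only a sketch, not kernel code:
probe_needs_remote_cpu() and NO_NODE are hypothetical names introduced for
illustration, and current_node stands in for whatever numa_node_id() would
return at the time of the check.

```c
#include <stdbool.h>

/* Hypothetical stand-in for "device has no NUMA node assigned". */
#define NO_NODE (-1)

/*
 * Model of the patched test in pci_call_probe(): hand the probe off to
 * another CPU only when the device has a node AND the caller is not
 * already running on that node.
 */
static bool probe_needs_remote_cpu(int dev_node, int current_node)
{
	return dev_node >= 0 && dev_node != current_node;
}
```

Under this model a device without a node, or a caller already on the
device's node, probes directly, so a single-node system never takes the
remote path.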


2013-07-05 23:36:23

by Bjorn Helgaas

Subject: Re: [PATCH] pci: Avoid unnecessary calls to work_on_cpu

[+cc Rusty]

On Mon, Jun 24, 2013 at 2:05 PM, Alexander Duyck
<[email protected]> wrote:
> This patch is meant to address the fact that we are making unnecessary calls
> to work_on_cpu. To resolve this I have added a check to see if the current
> node is the correct node for the device before we decide to assign the probe
> task to another CPU.
>
> The advantage of this approach is that we can avoid reentrant calls to
> work_on_cpu. In addition we should not make any calls to setup the work
> remotely in the case of a single node system that has NUMA enabled.

The description above makes it sound like this is just a minor
performance enhancement, but I think the real reason you want this is
to resolve the lockdep warning mentioned at [1]. That thread is long
and confusing, so I'd like to see a bugzilla that distills out the
useful details, and a synopsis in this changelog.

[1] https://lkml.kernel.org/r/[email protected]

> Signed-off-by: Alexander Duyck <[email protected]>
> ---
>
> This patch is based off of work I submitted in an earlier patch that I never
> heard back on. The change was originally submitted in:
> pci: Avoid reentrant calls to work_on_cpu
>
> I'm not sure what ever happened with that patch, however after reviewing it
> some myself I decided I could do without the change to the comments since they
> were unneeded. As such I am resubmitting this as a much simpler patch that
> only adds the line of code needed to avoid calling work_on_cpu for every call
> to probe on a NUMA-node-specific device.
>
> drivers/pci/pci-driver.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 79277fb..7d81713 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -282,7 +282,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
>  	   its local memory on the right node without any need to
>  	   change it. */
>  	node = dev_to_node(&dev->dev);
> -	if (node >= 0) {
> +	if ((node >= 0) && (node != numa_node_id())) {
>  		int cpu;
>
>  		get_online_cpus();

I think it's theoretically unsafe to use numa_node_id() while
preemption is enabled.
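
The concern can be illustrated with a small userspace model. It is a
sketch under stated assumptions, not kernel code: the four-CPU topology,
struct task, and the helpers task_node(), on_device_node(), and migrate()
are all hypothetical. The point is only that a node check reflects the CPU
the task is on at the moment of the check, and a preemptible task may be
moved to a different CPU afterwards.

```c
#include <stdbool.h>

/* Hypothetical machine: CPUs 0-1 sit on node 0, CPUs 2-3 on node 1. */
static const int cpu_to_node_map[4] = { 0, 0, 1, 1 };

/* Minimal model of a task: only the CPU it currently runs on. */
struct task {
	int cpu;
};

/* Node of the CPU the task is on right now (models numa_node_id()). */
static int task_node(const struct task *t)
{
	return cpu_to_node_map[t->cpu];
}

/* The "already on the device's node?" check, taken at one instant. */
static bool on_device_node(const struct task *t, int dev_node)
{
	return dev_node >= 0 && task_node(t) == dev_node;
}

/* Models the scheduler moving a preemptible task to another CPU. */
static void migrate(struct task *t, int new_cpu)
{
	t->cpu = new_cpu;
}
```

If a task on CPU 2 passes on_device_node(&t, 1) and is then migrated to
CPU 0 before the probe runs, the probe executes on node 0 even though the
check succeeded. In this model the cost is lost locality rather than a
crash.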

It seems a little strange to me that this "run the driver probe method
on the correct node" code is in PCI. I would think this behavior
would be desirable for *all* bus types, not just PCI, so maybe it
would make sense to do this up in device_attach() or somewhere
similar.

But Rusty added this (in 873392ca51), and he knows way more about this
stuff than I do.

Bjorn

2013-07-06 00:29:44

by Benjamin Herrenschmidt

Subject: Re: [PATCH] pci: Avoid unnecessary calls to work_on_cpu

On Fri, 2013-07-05 at 17:36 -0600, Bjorn Helgaas wrote:
> It seems a little strange to me that this "run the driver probe method
> on the correct node" code is in PCI. I would think this behavior
> would be desirable for *all* bus types, not just PCI, so maybe it
> would make sense to do this up in device_attach() or somewhere
> similar.
>
> But Rusty added this (in 873392ca51), and he knows way more about this
> stuff than I do.

I tend to agree... I can see this being useful on some of our non-PCI
devices on power as well in fact.

Cheers,
Ben.

2013-07-08 02:22:18

by Rusty Russell

Subject: Re: [PATCH] pci: Avoid unnecessary calls to work_on_cpu

Bjorn Helgaas <[email protected]> writes:
> [+cc Rusty]
>
> On Mon, Jun 24, 2013 at 2:05 PM, Alexander Duyck
> <[email protected]> wrote:
>> This patch is meant to address the fact that we are making unnecessary calls
>> to work_on_cpu. To resolve this I have added a check to see if the current
>> node is the correct node for the device before we decide to assign the probe
>> task to another CPU.
>>
>> The advantage of this approach is that we can avoid reentrant calls to
>> work_on_cpu. In addition we should not make any calls to setup the work
>> remotely in the case of a single node system that has NUMA enabled.
>
> The description above makes it sound like this is just a minor
> performance enhancement, but I think the real reason you want this is
> to resolve the lockdep warning mentioned at [1]. That thread is long
> and confusing, so I'd like to see a bugzilla that distills out the
> useful details, and a synopsis in this changelog.
>
> [1] https://lkml.kernel.org/r/[email protected]
>
>> Signed-off-by: Alexander Duyck <[email protected]>
>> ---
>>
>> This patch is based off of work I submitted in an earlier patch that I never
>> heard back on. The change was originally submitted in:
>> pci: Avoid reentrant calls to work_on_cpu
>>
>> I'm not sure what ever happened with that patch, however after reviewing it
>> some myself I decided I could do without the change to the comments since they
>> were unneeded. As such I am resubmitting this as a much simpler patch that
>> only adds the line of code needed to avoid calling work_on_cpu for every call
>> to probe on a NUMA-node-specific device.
>>
>> drivers/pci/pci-driver.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
>> index 79277fb..7d81713 100644
>> --- a/drivers/pci/pci-driver.c
>> +++ b/drivers/pci/pci-driver.c
>> @@ -282,7 +282,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
>>  	   its local memory on the right node without any need to
>>  	   change it. */
>>  	node = dev_to_node(&dev->dev);
>> -	if (node >= 0) {
>> +	if ((node >= 0) && (node != numa_node_id())) {
>>  		int cpu;
>>
>>  		get_online_cpus();
>
> I think it's theoretically unsafe to use numa_node_id() while
> preemption is enabled.
>
> It seems a little strange to me that this "run the driver probe method
> on the correct node" code is in PCI. I would think this behavior
> would be desirable for *all* bus types, not just PCI, so maybe it
> would make sense to do this up in device_attach() or somewhere
> similar.
>
> But Rusty added this (in 873392ca51), and he knows way more about this
> stuff than I do.

Actually, I just stopped the code from playing cpumask games, which is
what it used to do.

You want Andi and/or Greg KH:

commit d42c69972b853fd33a26c8c7405624be41a22136
Author: Andi Kleen <[email protected]>
Date: Wed Jul 6 19:56:03 2005 +0200

[PATCH] PCI: Run PCI driver initialization on local node

Run PCI driver initialization on local node

Instead of adding messy kmalloc_node()s everywhere run the
PCI driver probe on the node local to the device.

This would not have helped for IDE, but should for
other more clean drivers that do more initialization in probe().
It won't help for drivers that do most of the work
on first open (like many network drivers)

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Cheers,
Rusty.

2013-07-08 04:37:08

by Andi Kleen

Subject: Re: [PATCH] pci: Avoid unnecessary calls to work_on_cpu

> > But Rusty added this (in 873392ca51), and he knows way more about this
> > stuff than I do.
>
> Actually, I just stopped the code from playing cpumask games, which is
> what it used to do.

You're right, the numa_node_id() check optimization is not 100% safe on
preempt kernels and should probably be removed. I also agree it would
probably make sense to move it up to the generic device layer (although I'm
not sure that other bus types really care that much about NUMA locality).

None of it seems to be a fatal problem though, so it boils down to
"if someone cares enough to write a patch"

-Andi