2010-01-01 19:51:28

by David John

[permalink] [raw]
Subject: [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS

Hi All,

Commit e0cd516 causes an null pointer dereference when reading from the
sysfs attributes local_cpu* on Intel machines with no ACPI NUMA
proximity info, since dev->numa_node gets set to -1 for all PCI devices,
which then gets passed to cpumask_of_node.

The patch following this mail fixes the problem for x86. Perhaps a more
thorough solution would be to fix the PCI layer to set the node
information for devices to zero rather than -1 (Since if CONFIG_NUMA=y
we have node 0)? I don't know whether it is safe / correct to do this.

Regards,
David.


2010-01-01 22:22:40

by Yinghai Lu

[permalink] [raw]
Subject: Re: [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS

On Fri, Jan 1, 2010 at 11:50 AM, David John <[email protected]> wrote:
> Hi All,
>
> Commit e0cd516 causes an null pointer dereference when reading from the
> sysfs attributes local_cpu* on Intel machines with no ACPI NUMA
> proximity info, since dev->numa_node gets set to -1 for all PCI devices,
> which then gets passed to cpumask_of_node.
>
> The patch following this mail fixes the problem for x86. Perhaps a more
> thorough solution would be to fix the PCI layer to set the node
> information for devices to zero rather than -1 (Since if CONFIG_NUMA=y
> we have node 0)? I don't know whether it is safe / correct to do this.

no.

1. -1, mean calling code will use node that code is running on.
2. the system that have two or more nodes, and more peer root buses.
if the first node doesn't have RAM installed, no node0 then.

YH

2010-01-01 21:55:24

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS

On Friday 01 January 2010, David John wrote:
> Hi All,

Hi,

> Commit e0cd516 causes an null pointer dereference when reading from the
> sysfs attributes local_cpu* on Intel machines with no ACPI NUMA
> proximity info, since dev->numa_node gets set to -1 for all PCI devices,
> which then gets passed to cpumask_of_node.
>
> The patch following this mail fixes the problem for x86. Perhaps a more
> thorough solution would be to fix the PCI layer to set the node
> information for devices to zero rather than -1 (Since if CONFIG_NUMA=y
> we have node 0)? I don't know whether it is safe / correct to do this.

First, it would be nicer if you said what the commit subject was in addition
to providing its hash.

Second, why don't you put the information above into the changelog of your
patch?

Rafael

2010-01-01 19:56:07

by David John

[permalink] [raw]
Subject: [PATCH] Check the node argument passed to cpumask_of_node

Ensure that the node value is valid.

Signed-off-by: David John <[email protected]>

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index c5087d7..1141a6e 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -99,7 +99,8 @@ extern const struct cpumask *cpumask_of_node(int node);
/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
static inline const struct cpumask *cpumask_of_node(int node)
{
- return node_to_cpumask_map[node];
+ return (node < 0 || node >= nr_node_ids) ? cpu_online_mask :
+ node_to_cpumask_map[node];
}
#endif

2010-01-02 06:16:48

by David John

[permalink] [raw]
Subject: [PATCH v2] Check the node argument passed to cpumask_of_node

Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus' NUMA information"
causes an null pointer dereference when reading from the sysfs attributes local_cpu*
on Intel machines with no ACPI NUMA proximity info, since dev->numa_node gets set to -1
for all PCI devices, which then gets passed to cpumask_of_node.

Ensure that the node value is valid.

Signed-off-by: David John <[email protected]>

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index c5087d7..1141a6e 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -99,7 +99,8 @@ extern const struct cpumask *cpumask_of_node(int node);
/* Returns a pointer to the cpumask of CPUs on Node 'node'. */
static inline const struct cpumask *cpumask_of_node(int node)
{
- return node_to_cpumask_map[node];
+ return (node < 0 || node >= nr_node_ids) ? cpu_online_mask :
+ node_to_cpumask_map[node];
}
#endif

2010-01-02 06:18:55

by David John

[permalink] [raw]
Subject: Re: [Regression] 2.6.33-rc2 - pci: Commit e0cd516 causes OOPS

On 01/02/2010 03:52 AM, Yinghai Lu wrote:
> On Fri, Jan 1, 2010 at 11:50 AM, David John <[email protected]> wrote:
>> Hi All,
>>
>> Commit e0cd516 causes an null pointer dereference when reading from the
>> sysfs attributes local_cpu* on Intel machines with no ACPI NUMA
>> proximity info, since dev->numa_node gets set to -1 for all PCI devices,
>> which then gets passed to cpumask_of_node.
>>
>> The patch following this mail fixes the problem for x86. Perhaps a more
>> thorough solution would be to fix the PCI layer to set the node
>> information for devices to zero rather than -1 (Since if CONFIG_NUMA=y
>> we have node 0)? I don't know whether it is safe / correct to do this.
>
> no.
>
> 1. -1, mean calling code will use node that code is running on.
> 2. the system that have two or more nodes, and more peer root buses.
> if the first node doesn't have RAM installed, no node0 then.
>
> YH
>

Oh I see. Thanks.

Regards,
David.

2010-01-04 04:28:40

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH v2] Check the node argument passed to cpumask_of_node

On Sat, 2 Jan 2010 04:45:47 pm David John wrote:
> Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus' NUMA information"
> causes an null pointer dereference when reading from the sysfs attributes local_cpu*
> on Intel machines with no ACPI NUMA proximity info, since dev->numa_node gets set to -1
> for all PCI devices, which then gets passed to cpumask_of_node.
>
> Ensure that the node value is valid.

This only works for x86, and only for !CONFIG_DEBUG_PER_CPU_MAPS.

I suggest fixing the callers introduced in e0cd516 for the moment,
then if you feel enthused, change the semantics of cpumask_of_node and
remove the checks from the various callers.

(And please only check for -1: it has a special meaning, unlike other
invalid numbers which indicate a bug).

Thanks,
Rusty.

2010-01-04 05:14:41

by Jin Dongming

[permalink] [raw]
Subject: Re: [PATCH v2] Check the node argument passed to cpumask_of_node

Hi, David

This problem also happened when the CONFIG_DEBUG_PER_CPU_MAPS was used,
so how about modifying the code for it working well?

Best Regards,
Jin Dongming

David John wrote:
> Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus' NUMA information"
> causes an null pointer dereference when reading from the sysfs attributes local_cpu*
> on Intel machines with no ACPI NUMA proximity info, since dev->numa_node gets set to -1
> for all PCI devices, which then gets passed to cpumask_of_node.
>
> Ensure that the node value is valid.
>
> Signed-off-by: David John <[email protected]>
>
> diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
> index c5087d7..1141a6e 100644
> --- a/arch/x86/include/asm/topology.h
> +++ b/arch/x86/include/asm/topology.h
> @@ -99,7 +99,8 @@ extern const struct cpumask *cpumask_of_node(int node);
> /* Returns a pointer to the cpumask of CPUs on Node 'node'. */
> static inline const struct cpumask *cpumask_of_node(int node)
> {
> - return node_to_cpumask_map[node];
> + return (node < 0 || node >= nr_node_ids) ? cpu_online_mask :
> + node_to_cpumask_map[node];
> }
> #endif
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>

2010-01-04 11:43:14

by David John

[permalink] [raw]
Subject: Re: [PATCH v2] Check the node argument passed to cpumask_of_node

On 01/04/2010 09:58 AM, Rusty Russell wrote:
> On Sat, 2 Jan 2010 04:45:47 pm David John wrote:
>> Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus' NUMA information"
>> causes an null pointer dereference when reading from the sysfs attributes local_cpu*
>> on Intel machines with no ACPI NUMA proximity info, since dev->numa_node gets set to -1
>> for all PCI devices, which then gets passed to cpumask_of_node.
>>
>> Ensure that the node value is valid.
>
> This only works for x86, and only for !CONFIG_DEBUG_PER_CPU_MAPS.
>
> I suggest fixing the callers introduced in e0cd516 for the moment,
> then if you feel enthused, change the semantics of cpumask_of_node and
> remove the checks from the various callers.

Okay, I'll do as you suggested.

Regards,
David.

2010-01-04 14:59:36

by David John

[permalink] [raw]
Subject: [PATCH v3] Check the node argument passed to cpumask_of_node

Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus' NUMA information"
causes an null pointer dereference when reading from the sysfs attributes local_cpu*
on Intel machines with no ACPI NUMA proximity info, since dev->numa_node gets set to -1
for all PCI devices, which then gets passed to cpumask_of_node.

Add a check to prevent this.

Signed-off-by: David John <[email protected]>

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index c5df94e..807224e 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -75,7 +75,8 @@ static ssize_t local_cpus_show(struct device *dev,
int len;

#ifdef CONFIG_NUMA
- mask = cpumask_of_node(dev_to_node(dev));
+ mask = (dev_to_node(dev) == -1) ? cpu_online_mask :
+ cpumask_of_node(dev_to_node(dev));
#else
mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
#endif
@@ -93,7 +94,8 @@ static ssize_t local_cpulist_show(struct device *dev,
int len;

#ifdef CONFIG_NUMA
- mask = cpumask_of_node(dev_to_node(dev));
+ mask = (dev_to_node(dev) == -1) ? cpu_online_mask :
+ cpumask_of_node(dev_to_node(dev));
#else
mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
#endif

2010-01-04 17:18:53

by Jesse Barnes

[permalink] [raw]
Subject: Re: [PATCH v3] Check the node argument passed to cpumask_of_node

On Mon, 4 Jan 2010 20:28:57 +0530
David John <[email protected]> wrote:

> Commit e0cd516 "PCI: derive nearby CPUs from device's instead of bus'
> NUMA information" causes an null pointer dereference when reading
> from the sysfs attributes local_cpu* on Intel machines with no ACPI
> NUMA proximity info, since dev->numa_node gets set to -1 for all PCI
> devices, which then gets passed to cpumask_of_node.
>
> Add a check to prevent this.
>
> Signed-off-by: David John <[email protected]>
>
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index c5df94e..807224e 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -75,7 +75,8 @@ static ssize_t local_cpus_show(struct device *dev,
> int len;
>
> #ifdef CONFIG_NUMA
> - mask = cpumask_of_node(dev_to_node(dev));
> + mask = (dev_to_node(dev) == -1) ? cpu_online_mask :
> +
> cpumask_of_node(dev_to_node(dev)); #else
> mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
> #endif
> @@ -93,7 +94,8 @@ static ssize_t local_cpulist_show(struct device
> *dev, int len;
>
> #ifdef CONFIG_NUMA
> - mask = cpumask_of_node(dev_to_node(dev));
> + mask = (dev_to_node(dev) == -1) ? cpu_online_mask :
> +
> cpumask_of_node(dev_to_node(dev)); #else
> mask = cpumask_of_pcibus(to_pci_dev(dev)->bus);
> #endif
>

Applied to my for-linus branch, thanks David.

--
Jesse Barnes, Intel Open Source Technology Center