2013-07-07 22:26:07

by L A Walsh

Subject: Fwd: BUG: key ffff880c1148c478 not in .data! (V3.10.0)

I am also seeing this for the first time:

(I don't know, but it seems unlikely to be related to
https://patchwork.kernel.org/patch/87359/
Yet it is the only hit I found for the same message.)


Looks like it's back to a more stable 3.9.8...
(*sigh*)


BUG: key ffff880c1148c478 not in .data!
[ 4.429474] ------------[ cut here ]------------
[ 4.434236] WARNING: at kernel/lockdep.c:2987
lockdep_init_map+0x45e/0x490()
[ 4.441414] DEBUG_LOCKS_WARN_ON(1)
[ 4.444684] Modules linked in:
[ 4.448168] usb 1-3.2: new low-speed USB device number 3 using ehci-pci
[ 4.454975] CPU: 10 PID: 1 Comm: swapper/0 Tainted: G I
3.10.0-Isht-Van #1
[ 4.462862] Hardware name: Dell Inc. PowerEdge T610/0CX0R0, BIOS
6.3.0 07/24/2012
[ 4.470475] 0000000000000009 ffff880c13175a70 ffffffff815bb279
ffff880c13175aa8
[ 4.478221] ffffffff8104641c ffff880c11c12130 ffff880c1148c478
0000000000000000
[ 4.485988] ffff880c11c12058 ffff880c12386180 ffff880c13175b08
ffffffff81046487
[ 4.493800] Call Trace:
[ 4.496472] [<ffffffff815bb279>] dump_stack+0x19/0x1b
[ 4.501776] [<ffffffff8104641c>] warn_slowpath_common+0x5c/0x80
[ 4.507917] [<ffffffff81046487>] warn_slowpath_fmt+0x47/0x50
[ 4.513790] [<ffffffff8109c1fe>] lockdep_init_map+0x45e/0x490
[ 4.519775] [<ffffffff8109b12d>] debug_mutex_init+0x2d/0x40
[ 4.525567] [<ffffffff8106ef61>] __mutex_init+0x51/0x60
[ 4.531017] [<ffffffff813a1618>] bus_register+0x158/0x2c0
[ 4.536646] [<ffffffff814c6dc3>] edac_create_sysfs_mci_device+0x53/0x540
[ 4.542512] usb 1-3.2: New USB device found, idVendor=413c,
idProduct=2003
[ 4.542513] usb 1-3.2: New USB device strings: Mfr=1, Product=2,
SerialNumber=0
[ 4.542514] usb 1-3.2: Product: Dell USB Keyboard
[ 4.542515] usb 1-3.2: Manufacturer: Dell
[ 4.567013] [<ffffffff814c4f13>] edac_mc_add_mc+0x103/0x270
[ 4.572804] [<ffffffff814cac30>] i7core_probe+0x530/0xe70
[ 4.578435] [<ffffffff81313479>] local_pci_probe+0x39/0x70
[ 4.584136] [<ffffffff81313ca1>] pci_device_probe+0x111/0x120
[ 4.590107] [<ffffffff813a27b1>] driver_probe_device+0x71/0x230
[ 4.596251] [<ffffffff813a2a3b>] __driver_attach+0x8b/0x90
[ 4.601955] [<ffffffff813a29b0>] ? __device_attach+0x40/0x40
[ 4.607836] [<ffffffff813a09a3>] bus_for_each_dev+0x63/0xa0
[ 4.613625] [<ffffffff813a2309>] driver_attach+0x19/0x20
[ 4.619167] [<ffffffff813a1f28>] bus_add_driver+0x1d8/0x270
[ 4.624963] [<ffffffff813a2e4c>] driver_register+0x6c/0x150
[ 4.630756] [<ffffffff81313308>] __pci_register_driver+0x58/0x60
[ 4.636990] [<ffffffff81cddf00>] ? edac_init+0x67/0x67
[ 4.642343] [<ffffffff81cddf38>] i7core_init+0x38/0xb7
[ 4.647702] [<ffffffff81cddf00>] ? edac_init+0x67/0x67
[ 4.653065] [<ffffffff810002ca>] do_one_initcall+0xfa/0x150
[ 4.658859] [<ffffffff81ca6eee>] kernel_init_freeable+0x15a/0x1da
[ 4.665176] [<ffffffff81ca6807>] ? do_early_param+0x88/0x88
[ 4.673917] [<ffffffff815a95b0>] ? rest_init+0xd0/0xd0
[ 4.679278] [<ffffffff815a95b9>] kernel_init+0x9/0x180
[ 4.684640] [<ffffffff815c962c>] ret_from_fork+0x7c/0xb0
[ 4.690174] [<ffffffff815a95b0>] ? rest_init+0xd0/0xd0
[ 4.695534] ---[ end trace 9ddab1480c5d91dc ]---
[ 4.700444] EDAC MC1: Giving out device to 'i7core_edac.c' 'i7 core
#1': DEV 0000:fe:03.0
[ 4.708791] EDAC PCI0: Giving out device to module 'i7core_edac'
controller 'EDAC PCI controller': DEV '0000:fe:03.0' (POLLED)
[ 4.720714] BUG: key ffff880c11a1c478 not in .data!




2013-07-12 02:19:20

by Ming Lei

Subject: Re: BUG: key ffff880c1148c478 not in .data! (V3.10.0)

On Mon, Jul 8, 2013 at 6:25 AM, Linda Walsh <[email protected]> wrote:
> Also am seeing this for the first time:
>
> (don't know, but seems unlikely to be related to
> https://patchwork.kernel.org/patch/87359/
> Yet it is the only hit I found for the same message.
>
>
> Looks like it's back to a more stable 3.9.8...
> (*sigh*)
>
>
> BUG: key ffff880c1148c478 not in .data!
> [ 4.429474] ------------[ cut here ]------------
> [ 4.434236] WARNING: at kernel/lockdep.c:2987
> lockdep_init_map+0x45e/0x490()
> [ 4.441414] DEBUG_LOCKS_WARN_ON(1)
> [ 4.444684] Modules linked in:
> [ 4.448168] usb 1-3.2: new low-speed USB device number 3 using ehci-pci
> [ 4.454975] CPU: 10 PID: 1 Comm: swapper/0 Tainted: G I
> 3.10.0-Isht-Van #1
> [ 4.462862] Hardware name: Dell Inc. PowerEdge T610/0CX0R0, BIOS 6.3.0
> 07/24/2012
> [ 4.470475] 0000000000000009 ffff880c13175a70 ffffffff815bb279
> ffff880c13175aa8
> [ 4.478221] ffffffff8104641c ffff880c11c12130 ffff880c1148c478
> 0000000000000000
> [ 4.485988] ffff880c11c12058 ffff880c12386180 ffff880c13175b08
> ffffffff81046487
> [ 4.493800] Call Trace:
> [ 4.496472] [<ffffffff815bb279>] dump_stack+0x19/0x1b
> [ 4.501776] [<ffffffff8104641c>] warn_slowpath_common+0x5c/0x80
> [ 4.507917] [<ffffffff81046487>] warn_slowpath_fmt+0x47/0x50
> [ 4.513790] [<ffffffff8109c1fe>] lockdep_init_map+0x45e/0x490
> [ 4.519775] [<ffffffff8109b12d>] debug_mutex_init+0x2d/0x40
> [ 4.525567] [<ffffffff8106ef61>] __mutex_init+0x51/0x60
> [ 4.531017] [<ffffffff813a1618>] bus_register+0x158/0x2c0
> [ 4.536646] [<ffffffff814c6dc3>] edac_create_sysfs_mci_device+0x53/0x540

It looks like this is because the bus_type in 'struct mem_ctl_info' is allocated
dynamically instead of being kept statically in .data.

Thanks,
--
Ming Lei
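
The diagnosis above can be illustrated with a small userspace sketch. It is only a toy model under stated assumptions, not the kernel's lockdep code: every name in it (toy_lock_key, toy_controller, toy_static_keys, toy_is_static_key, toy_register_key, toy_register_embedded) is hypothetical, and the "static region" is reduced to a single static array. The idea it shows is that a registry which accepts only keys at statically allocated addresses will reject a key embedded in a heap-allocated structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lock class key; only its address matters. */
struct toy_lock_key {
	char dummy;
};

/*
 * Hypothetical container that embeds the key by value, the way
 * struct mem_ctl_info embeds struct bus_type in the report above.
 */
struct toy_controller {
	int idx;
	struct toy_lock_key key;
};

/* Keys with static storage duration: the toy "static region". */
#define TOY_MAX_KEYS 4
static struct toy_lock_key toy_static_keys[TOY_MAX_KEYS];

/* Return true if addr lies inside the toy static region. */
static bool toy_is_static_key(const void *addr)
{
	uintptr_t a = (uintptr_t)addr;
	uintptr_t start = (uintptr_t)&toy_static_keys[0];
	uintptr_t end = (uintptr_t)&toy_static_keys[TOY_MAX_KEYS];

	return a >= start && a < end;
}

/* Return 0 if the key is accepted, -1 if it is rejected. */
static int toy_register_key(const struct toy_lock_key *key)
{
	return toy_is_static_key(key) ? 0 : -1;
}

/*
 * Allocate a controller on the heap and try to register its embedded
 * key. The key's address is a heap address, so it is rejected.
 */
static int toy_register_embedded(void)
{
	struct toy_controller *c = malloc(sizeof(*c));
	int ret;

	if (!c)
		return -1;

	ret = toy_register_key(&c->key);
	free(c);
	return ret;
}
```

In this model, toy_register_key(&toy_static_keys[0]) is accepted, while toy_register_embedded() is rejected because the embedded key lives wherever malloc() placed its container.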

2013-07-12 08:04:36

by Markus Trippelsdorf

Subject: Re: BUG: key ffff880c1148c478 not in .data! (V3.10.0)

On 2013.07.12 at 10:19 +0800, Ming Lei wrote:
> On Mon, Jul 8, 2013 at 6:25 AM, Linda Walsh <[email protected]> wrote:
> > Also am seeing this for the first time:
> >
> > (don't know, but seems unlikely to be related to
> > https://patchwork.kernel.org/patch/87359/
> > Yet it is the only hit I found for the same message.
> >
> >
> > Looks like it's back to a more stable 3.9.8...
> > (*sigh*)
> >
> >
> > BUG: key ffff880c1148c478 not in .data!
> > [ 4.429474] ------------[ cut here ]------------
> > [ 4.434236] WARNING: at kernel/lockdep.c:2987
> > lockdep_init_map+0x45e/0x490()
> > [ 4.441414] DEBUG_LOCKS_WARN_ON(1)
> > [ 4.444684] Modules linked in:
> > [ 4.448168] usb 1-3.2: new low-speed USB device number 3 using ehci-pci
> > [ 4.454975] CPU: 10 PID: 1 Comm: swapper/0 Tainted: G I
> > 3.10.0-Isht-Van #1
> > [ 4.462862] Hardware name: Dell Inc. PowerEdge T610/0CX0R0, BIOS 6.3.0
> > 07/24/2012
> > [ 4.470475] 0000000000000009 ffff880c13175a70 ffffffff815bb279
> > ffff880c13175aa8
> > [ 4.478221] ffffffff8104641c ffff880c11c12130 ffff880c1148c478
> > 0000000000000000
> > [ 4.485988] ffff880c11c12058 ffff880c12386180 ffff880c13175b08
> > ffffffff81046487
> > [ 4.493800] Call Trace:
> > [ 4.496472] [<ffffffff815bb279>] dump_stack+0x19/0x1b
> > [ 4.501776] [<ffffffff8104641c>] warn_slowpath_common+0x5c/0x80
> > [ 4.507917] [<ffffffff81046487>] warn_slowpath_fmt+0x47/0x50
> > [ 4.513790] [<ffffffff8109c1fe>] lockdep_init_map+0x45e/0x490
> > [ 4.519775] [<ffffffff8109b12d>] debug_mutex_init+0x2d/0x40
> > [ 4.525567] [<ffffffff8106ef61>] __mutex_init+0x51/0x60
> > [ 4.531017] [<ffffffff813a1618>] bus_register+0x158/0x2c0
> > [ 4.536646] [<ffffffff814c6dc3>] edac_create_sysfs_mci_device+0x53/0x540
>
> It looks like this is because the bus_type in 'struct mem_ctl_info' is allocated
> dynamically instead of being kept statically in .data.

Mauro said he will fix this in the coming weeks:

http://article.gmane.org/gmane.linux.kernel/1522719

--
Markus

2013-07-12 13:41:39

by Borislav Petkov

Subject: Re: BUG: key ffff880c1148c478 not in .data! (V3.10.0)

On Fri, Jul 12, 2013 at 10:04:28AM +0200, Markus Trippelsdorf wrote:
> Mauro said he will fix this in the coming weeks:
>
> http://article.gmane.org/gmane.linux.kernel/1522719

Here's a possible fix which works fine here. Markus, if you could verify
please...

I probably should also tag it for stable since the issue is in 3.10.
I'll leave it in -next a bit though, to have some coverage.

--
From: Borislav Petkov <[email protected]>
Date: Fri, 12 Jul 2013 10:53:38 +0200
Subject: [PATCH] EDAC: Fix lockdep splat

Fix the following:

BUG: key ffff88043bdd0330 not in .data!
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
Call Trace:
dump_stack
warn_slowpath_common
warn_slowpath_fmt
lockdep_init_map
? trace_hardirqs_on_caller
? trace_hardirqs_on
debug_mutex_init
__mutex_init
bus_register
edac_create_sysfs_mci_device
edac_mc_add_mc
sbridge_probe
pci_device_probe
driver_probe_device
__driver_attach
? driver_probe_device
bus_for_each_dev
driver_attach
bus_add_driver
driver_register
__pci_register_driver
? 0xffffffffa0010fff
sbridge_init
? 0xffffffffa0010fff
do_one_initcall
load_module
? unset_module_init_ro_nx
SyS_init_module
tracesys
---[ end trace d24a70b0d3ddf733 ]---
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
EDAC sbridge: Driver loaded.

What happens is that bus_register needs a statically allocated lock_key
because it is handed in to lockdep. However, struct mem_ctl_info embeds
struct bus_type (the whole struct, not a pointer to it) which gets
dynamically allocated.

Fix this by using a statically allocated struct bus_type for the MC bus.

Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Markus Trippelsdorf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
---
drivers/edac/edac_mc.c | 6 ++++++
drivers/edac/edac_mc_sysfs.c | 28 +++++++++++++++-------------
drivers/edac/i5100_edac.c | 2 +-
include/linux/edac.h | 2 +-
4 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 27e86d938262..2179f48cfe16 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -48,6 +48,10 @@ static LIST_HEAD(mc_devices);
*/
static void const *edac_mc_owner;

+static struct bus_type mc_bus = {
+ .dev_name = "edac_mc",
+};
+
unsigned edac_dimm_info_location(struct dimm_info *dimm, char *buf,
unsigned len)
{
@@ -762,6 +766,8 @@ int edac_mc_add_mc(struct mem_ctl_info *mci)
/* set load time so that error rate can be tracked */
mci->start_time = jiffies;

+ mci->bus = &mc_bus;
+
if (edac_create_sysfs_mci_device(mci)) {
edac_mc_printk(mci, KERN_WARNING,
"failed to create sysfs device\n");
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index 67610a6ebf87..c4d700a577d2 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -370,7 +370,7 @@ static int edac_create_csrow_object(struct mem_ctl_info *mci,
return -ENODEV;

csrow->dev.type = &csrow_attr_type;
- csrow->dev.bus = &mci->bus;
+ csrow->dev.bus = mci->bus;
device_initialize(&csrow->dev);
csrow->dev.parent = &mci->dev;
csrow->mci = mci;
@@ -605,7 +605,7 @@ static int edac_create_dimm_object(struct mem_ctl_info *mci,
dimm->mci = mci;

dimm->dev.type = &dimm_attr_type;
- dimm->dev.bus = &mci->bus;
+ dimm->dev.bus = mci->bus;
device_initialize(&dimm->dev);

dimm->dev.parent = &mci->dev;
@@ -975,11 +975,13 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
* The memory controller needs its own bus, in order to avoid
* namespace conflicts at /sys/bus/edac.
*/
- mci->bus.name = kasprintf(GFP_KERNEL, "mc%d", mci->mc_idx);
- if (!mci->bus.name)
+ mci->bus->name = kasprintf(GFP_KERNEL, "mc%d", mci->mc_idx);
+ if (!mci->bus->name)
return -ENOMEM;
- edac_dbg(0, "creating bus %s\n", mci->bus.name);
- err = bus_register(&mci->bus);
+
+ edac_dbg(0, "creating bus %s\n", mci->bus->name);
+
+ err = bus_register(mci->bus);
if (err < 0)
return err;

@@ -988,7 +990,7 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
device_initialize(&mci->dev);

mci->dev.parent = mci_pdev;
- mci->dev.bus = &mci->bus;
+ mci->dev.bus = mci->bus;
dev_set_name(&mci->dev, "mc%d", mci->mc_idx);
dev_set_drvdata(&mci->dev, mci);
pm_runtime_forbid(&mci->dev);
@@ -997,8 +999,8 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
err = device_add(&mci->dev);
if (err < 0) {
edac_dbg(1, "failure: create device %s\n", dev_name(&mci->dev));
- bus_unregister(&mci->bus);
- kfree(mci->bus.name);
+ bus_unregister(mci->bus);
+ kfree(mci->bus->name);
return err;
}

@@ -1064,8 +1066,8 @@ fail:
}
fail2:
device_unregister(&mci->dev);
- bus_unregister(&mci->bus);
- kfree(mci->bus.name);
+ bus_unregister(mci->bus);
+ kfree(mci->bus->name);
return err;
}

@@ -1098,8 +1100,8 @@ void edac_unregister_sysfs(struct mem_ctl_info *mci)
{
edac_dbg(1, "Unregistering device %s\n", dev_name(&mci->dev));
device_unregister(&mci->dev);
- bus_unregister(&mci->bus);
- kfree(mci->bus.name);
+ bus_unregister(mci->bus);
+ kfree(mci->bus->name);
}

static void mc_attr_release(struct device *dev)
diff --git a/drivers/edac/i5100_edac.c b/drivers/edac/i5100_edac.c
index 1b635178cc44..157b934e8ce3 100644
--- a/drivers/edac/i5100_edac.c
+++ b/drivers/edac/i5100_edac.c
@@ -974,7 +974,7 @@ static int i5100_setup_debugfs(struct mem_ctl_info *mci)
if (!i5100_debugfs)
return -ENODEV;

- priv->debugfs = debugfs_create_dir(mci->bus.name, i5100_debugfs);
+ priv->debugfs = debugfs_create_dir(mci->bus->name, i5100_debugfs);

if (!priv->debugfs)
return -ENOMEM;
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 0b763276f619..a9cc845f9762 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -622,7 +622,7 @@ struct edac_raw_error_desc {
*/
struct mem_ctl_info {
struct device dev;
- struct bus_type bus;
+ struct bus_type *bus;

struct list_head link; /* for global list of mem_ctl_info structs */

--
1.8.3


--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-07-12 13:58:07

by Mauro Carvalho Chehab

Subject: Re: BUG: key ffff880c1148c478 not in .data! (V3.10.0)

On 12-07-2013 10:41, Borislav Petkov wrote:
> On Fri, Jul 12, 2013 at 10:04:28AM +0200, Markus Trippelsdorf wrote:
>> Mauro said he will fix this in the coming weeks:
>>
>> http://article.gmane.org/gmane.linux.kernel/1522719
> Here's a possible fix which works fine here. Markus, if you could verify
> please...
>
> I probably should also tag it for stable since the issue is in 3.10.
> I'll leave it in -next a bit though, to have some coverage.
>
> --
> From: Borislav Petkov <[email protected]>
> Date: Fri, 12 Jul 2013 10:53:38 +0200
> Subject: [PATCH] EDAC: Fix lockdep splat
>
> Fix the following:
>
> BUG: key ffff88043bdd0330 not in .data!
> ------------[ cut here ]------------
> WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
> DEBUG_LOCKS_WARN_ON(1)
> Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
> CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
> Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
> 0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
> ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
> ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
> Call Trace:
> dump_stack
> warn_slowpath_common
> warn_slowpath_fmt
> lockdep_init_map
> ? trace_hardirqs_on_caller
> ? trace_hardirqs_on
> debug_mutex_init
> __mutex_init
> bus_register
> edac_create_sysfs_mci_device
> edac_mc_add_mc
> sbridge_probe
> pci_device_probe
> driver_probe_device
> __driver_attach
> ? driver_probe_device
> bus_for_each_dev
> driver_attach
> bus_add_driver
> driver_register
> __pci_register_driver
> ? 0xffffffffa0010fff
> sbridge_init
> ? 0xffffffffa0010fff
> do_one_initcall
> load_module
> ? unset_module_init_ro_nx
> SyS_init_module
> tracesys
> ---[ end trace d24a70b0d3ddf733 ]---
> EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
> EDAC sbridge: Driver loaded.
>
> What happens is that bus_register needs a statically allocated lock_key
> because it is handed in to lockdep. However, struct mem_ctl_info embeds
> struct bus_type (the whole struct, not a pointer to it) which gets
> dynamically allocated.
>
> Fix this by using a statically allocated struct bus_type for the MC bus.
>
> Cc: Mauro Carvalho Chehab <[email protected]>
> Cc: Markus Trippelsdorf <[email protected]>
> Signed-off-by: Borislav Petkov <[email protected]>
> ---
> drivers/edac/edac_mc.c | 6 ++++++
> drivers/edac/edac_mc_sysfs.c | 28 +++++++++++++++-------------
> drivers/edac/i5100_edac.c | 2 +-
> include/linux/edac.h | 2 +-
> 4 files changed, 23 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index 27e86d938262..2179f48cfe16 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -48,6 +48,10 @@ static LIST_HEAD(mc_devices);
> */
> static void const *edac_mc_owner;
>
> +static struct bus_type mc_bus = {
> + .dev_name = "edac_mc",
> +};
> +
> unsigned edac_dimm_info_location(struct dimm_info *dimm, char *buf,
> unsigned len)
> {
> @@ -762,6 +766,8 @@ int edac_mc_add_mc(struct mem_ctl_info *mci)
> /* set load time so that error rate can be tracked */
> mci->start_time = jiffies;
>
> + mci->bus = &mc_bus;
> +
> if (edac_create_sysfs_mci_device(mci)) {
> edac_mc_printk(mci, KERN_WARNING,
> "failed to create sysfs device\n");
> diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
> index 67610a6ebf87..c4d700a577d2 100644
> --- a/drivers/edac/edac_mc_sysfs.c
> +++ b/drivers/edac/edac_mc_sysfs.c
> @@ -370,7 +370,7 @@ static int edac_create_csrow_object(struct mem_ctl_info *mci,
> return -ENODEV;
>
> csrow->dev.type = &csrow_attr_type;
> - csrow->dev.bus = &mci->bus;
> + csrow->dev.bus = mci->bus;
> device_initialize(&csrow->dev);
> csrow->dev.parent = &mci->dev;
> csrow->mci = mci;
> @@ -605,7 +605,7 @@ static int edac_create_dimm_object(struct mem_ctl_info *mci,
> dimm->mci = mci;
>
> dimm->dev.type = &dimm_attr_type;
> - dimm->dev.bus = &mci->bus;
> + dimm->dev.bus = mci->bus;
> device_initialize(&dimm->dev);
>
> dimm->dev.parent = &mci->dev;
> @@ -975,11 +975,13 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
> * The memory controller needs its own bus, in order to avoid
> * namespace conflicts at /sys/bus/edac.
> */
> - mci->bus.name = kasprintf(GFP_KERNEL, "mc%d", mci->mc_idx);
> - if (!mci->bus.name)
> + mci->bus->name = kasprintf(GFP_KERNEL, "mc%d", mci->mc_idx);
> + if (!mci->bus->name)
> return -ENOMEM;

This will overwrite the contents of the static variable mc_bus for every
new memory controller. Are you sure that bus.name is only used at registration,
or are its contents stored somewhere?

Otherwise, you may have trouble at module removal and/or in other places.

Regards,
Mauro
> - edac_dbg(0, "creating bus %s\n", mci->bus.name);
> - err = bus_register(&mci->bus);
> +
> + edac_dbg(0, "creating bus %s\n", mci->bus->name);
> +
> + err = bus_register(mci->bus);
> if (err < 0)
> return err;
>
> @@ -988,7 +990,7 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
> device_initialize(&mci->dev);
>
> mci->dev.parent = mci_pdev;
> - mci->dev.bus = &mci->bus;
> + mci->dev.bus = mci->bus;
> dev_set_name(&mci->dev, "mc%d", mci->mc_idx);
> dev_set_drvdata(&mci->dev, mci);
> pm_runtime_forbid(&mci->dev);
> @@ -997,8 +999,8 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
> err = device_add(&mci->dev);
> if (err < 0) {
> edac_dbg(1, "failure: create device %s\n", dev_name(&mci->dev));
> - bus_unregister(&mci->bus);
> - kfree(mci->bus.name);
> + bus_unregister(mci->bus);
> + kfree(mci->bus->name);
> return err;
> }
>
> @@ -1064,8 +1066,8 @@ fail:
> }
> fail2:
> device_unregister(&mci->dev);
> - bus_unregister(&mci->bus);
> - kfree(mci->bus.name);
> + bus_unregister(mci->bus);
> + kfree(mci->bus->name);
> return err;
> }
>
> @@ -1098,8 +1100,8 @@ void edac_unregister_sysfs(struct mem_ctl_info *mci)
> {
> edac_dbg(1, "Unregistering device %s\n", dev_name(&mci->dev));
> device_unregister(&mci->dev);
> - bus_unregister(&mci->bus);
> - kfree(mci->bus.name);
> + bus_unregister(mci->bus);
> + kfree(mci->bus->name);
> }
>
> static void mc_attr_release(struct device *dev)
> diff --git a/drivers/edac/i5100_edac.c b/drivers/edac/i5100_edac.c
> index 1b635178cc44..157b934e8ce3 100644
> --- a/drivers/edac/i5100_edac.c
> +++ b/drivers/edac/i5100_edac.c
> @@ -974,7 +974,7 @@ static int i5100_setup_debugfs(struct mem_ctl_info *mci)
> if (!i5100_debugfs)
> return -ENODEV;
>
> - priv->debugfs = debugfs_create_dir(mci->bus.name, i5100_debugfs);
> + priv->debugfs = debugfs_create_dir(mci->bus->name, i5100_debugfs);
>
> if (!priv->debugfs)
> return -ENOMEM;
> diff --git a/include/linux/edac.h b/include/linux/edac.h
> index 0b763276f619..a9cc845f9762 100644
> --- a/include/linux/edac.h
> +++ b/include/linux/edac.h
> @@ -622,7 +622,7 @@ struct edac_raw_error_desc {
> */
> struct mem_ctl_info {
> struct device dev;
> - struct bus_type bus;
> + struct bus_type *bus;
>
> struct list_head link; /* for global list of mem_ctl_info structs */
>

2013-07-12 14:21:29

by Borislav Petkov

Subject: Re: BUG: key ffff880c1148c478 not in .data! (V3.10.0)

On Fri, Jul 12, 2013 at 10:57:41AM -0300, Mauro Carvalho Chehab wrote:
> This will overwrite the contents of the static variable mc_bus for
> every new memory controller. Are you sure that bus.name is only used
> at registration, or are its contents stored somewhere?

bus_register does kobject_set_name which copies bus->name, for example,
but I didn't look exhaustively.

Just to be on the safe side, I should probably do a

static const char *bus_names[] = { "mc0", "mc1", ..., "mc7" };

and use it. Are 8 enough for your edac drivers too?

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
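
The question discussed in this exchange can be sketched in userspace C. This is a toy model under stated assumptions, not the driver core: all names (toy_bus, shared_bus, per_mc_bus, registered, toy_bus_register, toy_add_shared, toy_add_per_mc) are hypothetical, and the only behaviour borrowed from the thread is that registration copies the name. With one shared static structure, the copy taken at registration survives, but the structure's own name field only ever reflects the last controller; with one static structure per controller, each keeps its own name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for a bus description. */
struct toy_bus {
	const char *name;
};

#define TOY_NUM_MCS  2
#define TOY_NAME_LEN 8

/* Variant A: a single static bus shared by all memory controllers. */
static struct toy_bus shared_bus;

/* Variant B: one static bus per memory controller. */
static struct toy_bus per_mc_bus[TOY_NUM_MCS];

/* Toy registry that copies the name at registration time. */
static char registered[TOY_NUM_MCS][TOY_NAME_LEN];

static void toy_bus_register(int slot, const struct toy_bus *bus)
{
	/* snprintf() truncates if needed and always NUL-terminates. */
	snprintf(registered[slot], TOY_NAME_LEN, "%s", bus->name);
}

/* Variant A: overwrite the shared name, then register. */
static void toy_add_shared(int idx, const char *name)
{
	shared_bus.name = name;
	toy_bus_register(idx, &shared_bus);
}

/* Variant B: each controller writes only to its own slot. */
static void toy_add_per_mc(int idx, const char *name)
{
	per_mc_bus[idx].name = name;
	toy_bus_register(idx, &per_mc_bus[idx]);
}
```

After toy_add_shared(0, "mc0") and toy_add_shared(1, "mc1"), the registry still holds "mc0" in slot 0 because it copied the string, but shared_bus.name points at "mc1". Any code that later reads the name through the shared structure for controller 0 would see the wrong value, which is the case the per-controller array avoids.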

2013-07-12 14:28:48

by Markus Trippelsdorf

Subject: Re: BUG: key ffff880c1148c478 not in .data! (V3.10.0)

On 2013.07.12 at 15:41 +0200, Borislav Petkov wrote:
> On Fri, Jul 12, 2013 at 10:04:28AM +0200, Markus Trippelsdorf wrote:
> > Mauro said he will fix this in the coming weeks:
> >
> > http://article.gmane.org/gmane.linux.kernel/1522719
>
> Here's a possible fix which works fine here. Markus, if you could verify
> please...

Yes, it's working fine here, too. Thanks Boris.

--
Markus

2013-07-12 14:36:56

by Borislav Petkov

Subject: Re: BUG: key ffff880c1148c478 not in .data! (V3.10.0)

On Fri, Jul 12, 2013 at 04:28:44PM +0200, Markus Trippelsdorf wrote:
> Yes, it's working fine here, too. Thanks Boris.

Thanks Markus!

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-07-12 16:14:01

by Mauro Carvalho Chehab

Subject: Re: BUG: key ffff880c1148c478 not in .data! (V3.10.0)

On Fri, 12 Jul 2013 16:21:06 +0200
Borislav Petkov <[email protected]> wrote:

> On Fri, Jul 12, 2013 at 10:57:41AM -0300, Mauro Carvalho Chehab wrote:
> > This will overwrite the contents of the static variable mc_bus for
> > every new memory controller. Are you sure that bus.name is only used
> > at registration, or are its contents stored somewhere?
>
> bus_register does kobject_set_name which copies bus->name, for example,

Ok, so, it could be safe.

> but I didn't look exhaustively.

Did you try to remove and reinsert the edac driver a few times, on a
multi-memory controller machine? Were the bus nodes created properly?
>
> Just to be on the safe side, I should probably do a
>
> static const char *bus_names[] = { "mc0", "mc1", ..., "mc7" };

You would likely need to use an array for the bus_type too, if reusing
the static one is an issue.

> and use it. Are 8 enough for your edac drivers too?

With edac_ghes, I suspect that the worst case, on the Intel side, is the
Nehalem/Sandy Bridge/Ivy Bridge EX machines.

Tony,

What would be a reasonable maximum limit for the number of memory
controllers, on a -EX machine?

Cheers,
Mauro

2013-07-12 16:19:30

by Tony Luck

Subject: RE: BUG: key ffff880c1148c478 not in .data! (V3.10.0)

> What would be a reasonable maximum limit for the number of memory
> controllers, on a -EX machine?

Westmere-EX has one memory controller per socket ... and there are glueless systems up to 8 sockets. So 8 there. Not sure if any OEM is building larger machines with a node controller (SGI? Not sure if they build their behemoths from -EP or -EX parts).

Ivy Bridge ups the ante with two memory controllers on a socket. So plan on doubling soon.

-Tony

2013-07-18 16:42:23

by Borislav Petkov

Subject: Re: BUG: key ffff880c1148c478 not in .data! (V3.10.0)

On Fri, Jul 12, 2013 at 04:19:24PM +0000, Luck, Tony wrote:
> > What would be a reasonable maximum limit for the number of memory
> > controllers, on a -EX machine?
>
> Westmere-EX has one memory controller per socket ... and there are
> glueless systems up to 8 sockets. So 8 there. Not sure if any OEM is
> building larger machines with a node controller (SGI? Not sure if they
> build their behemoths from -EP or -EX parts).
>
> Ivy Bridge ups the ante with two memory controllers on a socket. So
> plan on doubling soon.

Let's give it a second try, 16 memory controllers max:

---
From 18fec2fd4279640b9f471c28aa3a5dc8be104273 Mon Sep 17 00:00:00 2001
From: Borislav Petkov <[email protected]>
Date: Fri, 12 Jul 2013 10:53:38 +0200
Subject: [PATCH] EDAC: Fix lockdep splat

Fix the following:

BUG: key ffff88043bdd0330 not in .data!
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
Call Trace:
dump_stack
warn_slowpath_common
warn_slowpath_fmt
lockdep_init_map
? trace_hardirqs_on_caller
? trace_hardirqs_on
debug_mutex_init
__mutex_init
bus_register
edac_create_sysfs_mci_device
edac_mc_add_mc
sbridge_probe
pci_device_probe
driver_probe_device
__driver_attach
? driver_probe_device
bus_for_each_dev
driver_attach
bus_add_driver
driver_register
__pci_register_driver
? 0xffffffffa0010fff
sbridge_init
? 0xffffffffa0010fff
do_one_initcall
load_module
? unset_module_init_ro_nx
SyS_init_module
tracesys
---[ end trace d24a70b0d3ddf733 ]---
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
EDAC sbridge: Driver loaded.

What happens is that bus_register needs a statically allocated lock_key
because it is handed in to lockdep. However, struct mem_ctl_info embeds
struct bus_type (the whole struct, not a pointer to it) which gets
dynamically allocated.

Fix this by using a statically allocated struct bus_type for the MC bus.

Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Markus Trippelsdorf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
---
drivers/edac/edac_mc.c | 6 ++++++
drivers/edac/edac_mc_sysfs.c | 28 +++++++++++++++-------------
drivers/edac/i5100_edac.c | 2 +-
include/linux/edac.h | 7 ++++++-
4 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 27e86d938262..429e971e02d7 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -48,6 +48,8 @@ static LIST_HEAD(mc_devices);
*/
static void const *edac_mc_owner;

+static struct bus_type mc_bus[EDAC_MAX_MCS];
+
unsigned edac_dimm_info_location(struct dimm_info *dimm, char *buf,
unsigned len)
{
@@ -762,6 +764,10 @@ int edac_mc_add_mc(struct mem_ctl_info *mci)
/* set load time so that error rate can be tracked */
mci->start_time = jiffies;

+ BUG_ON(mci->mc_idx >= EDAC_MAX_MCS);
+
+ mci->bus = &mc_bus[mci->mc_idx];
+
if (edac_create_sysfs_mci_device(mci)) {
edac_mc_printk(mci, KERN_WARNING,
"failed to create sysfs device\n");
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index ef15a7e613bc..e7c32c4f7837 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -370,7 +370,7 @@ static int edac_create_csrow_object(struct mem_ctl_info *mci,
return -ENODEV;

csrow->dev.type = &csrow_attr_type;
- csrow->dev.bus = &mci->bus;
+ csrow->dev.bus = mci->bus;
device_initialize(&csrow->dev);
csrow->dev.parent = &mci->dev;
csrow->mci = mci;
@@ -605,7 +605,7 @@ static int edac_create_dimm_object(struct mem_ctl_info *mci,
dimm->mci = mci;

dimm->dev.type = &dimm_attr_type;
- dimm->dev.bus = &mci->bus;
+ dimm->dev.bus = mci->bus;
device_initialize(&dimm->dev);

dimm->dev.parent = &mci->dev;
@@ -975,11 +975,13 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
* The memory controller needs its own bus, in order to avoid
* namespace conflicts at /sys/bus/edac.
*/
- mci->bus.name = kasprintf(GFP_KERNEL, "mc%d", mci->mc_idx);
- if (!mci->bus.name)
+ mci->bus->name = kasprintf(GFP_KERNEL, "mc%d", mci->mc_idx);
+ if (!mci->bus->name)
return -ENOMEM;
- edac_dbg(0, "creating bus %s\n", mci->bus.name);
- err = bus_register(&mci->bus);
+
+ edac_dbg(0, "creating bus %s\n", mci->bus->name);
+
+ err = bus_register(mci->bus);
if (err < 0)
return err;

@@ -988,7 +990,7 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
device_initialize(&mci->dev);

mci->dev.parent = mci_pdev;
- mci->dev.bus = &mci->bus;
+ mci->dev.bus = mci->bus;
dev_set_name(&mci->dev, "mc%d", mci->mc_idx);
dev_set_drvdata(&mci->dev, mci);
pm_runtime_forbid(&mci->dev);
@@ -997,8 +999,8 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
err = device_add(&mci->dev);
if (err < 0) {
edac_dbg(1, "failure: create device %s\n", dev_name(&mci->dev));
- bus_unregister(&mci->bus);
- kfree(mci->bus.name);
+ bus_unregister(mci->bus);
+ kfree(mci->bus->name);
return err;
}

@@ -1064,8 +1066,8 @@ fail:
}
fail2:
device_unregister(&mci->dev);
- bus_unregister(&mci->bus);
- kfree(mci->bus.name);
+ bus_unregister(mci->bus);
+ kfree(mci->bus->name);
return err;
}

@@ -1098,8 +1100,8 @@ void edac_unregister_sysfs(struct mem_ctl_info *mci)
{
edac_dbg(1, "Unregistering device %s\n", dev_name(&mci->dev));
device_unregister(&mci->dev);
- bus_unregister(&mci->bus);
- kfree(mci->bus.name);
+ bus_unregister(mci->bus);
+ kfree(mci->bus->name);
}

static void mc_attr_release(struct device *dev)
diff --git a/drivers/edac/i5100_edac.c b/drivers/edac/i5100_edac.c
index 1b635178cc44..157b934e8ce3 100644
--- a/drivers/edac/i5100_edac.c
+++ b/drivers/edac/i5100_edac.c
@@ -974,7 +974,7 @@ static int i5100_setup_debugfs(struct mem_ctl_info *mci)
if (!i5100_debugfs)
return -ENODEV;

- priv->debugfs = debugfs_create_dir(mci->bus.name, i5100_debugfs);
+ priv->debugfs = debugfs_create_dir(mci->bus->name, i5100_debugfs);

if (!priv->debugfs)
return -ENOMEM;
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 0b763276f619..5c6d7fbaf89e 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -622,7 +622,7 @@ struct edac_raw_error_desc {
*/
struct mem_ctl_info {
struct device dev;
- struct bus_type bus;
+ struct bus_type *bus;

struct list_head link; /* for global list of mem_ctl_info structs */

@@ -742,4 +742,9 @@ struct mem_ctl_info {
#endif
};

+/*
+ * Maximum number of memory controllers in the coherent fabric.
+ */
+#define EDAC_MAX_MCS 16
+
#endif
--
1.8.3

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-07-18 16:51:52

by Tony Luck

Subject: RE: BUG: key ffff880c1148c478 not in .data! (V3.10.0)

+ BUG_ON(mci->mc_idx >= EDAC_MAX_MCS);

Do we have to "BUG_ON()" here? Couldn't we be gentler with something like:

if (mci->mc_idx >= EDAC_MAX_MCS) {
printk_once(KERN_WARNING "Too many memory controllers\n");
return; /* probably need to make sure caller copes with this ... so more stuff there */
}

-Tony

2013-07-18 23:27:25

by Borislav Petkov

Subject: Re: BUG: key ffff880c1148c478 not in .data! (V3.10.0)

On Thu, Jul 18, 2013 at 04:51:48PM +0000, Luck, Tony wrote:
> + BUG_ON(mci->mc_idx >= EDAC_MAX_MCS);
>
> Do we have to "BUG_ON()" here? Couldn't we be gentler with something like:
>
> if (mci->mc_idx >= EDAC_MAX_MCS) {
> printk_once(KERN_WARNING "Too many memory controllers\n");
> return; /* probably need to make sure caller copes with this ... so more stuff there */

Yeah, we can do something like this:

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 429e971e02d7..c55ad285c285 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -725,6 +725,11 @@ int edac_mc_add_mc(struct mem_ctl_info *mci)
int ret = -EINVAL;
edac_dbg(0, "\n");

+ if (mci->mc_idx >= EDAC_MAX_MCS) {
+ pr_warn_once("Too many memory controllers: %d\n", mci->mc_idx);
+ return ret;
+ }
+
#ifdef CONFIG_EDAC_DEBUG
if (edac_debug_level >= 3)
edac_mc_dump_mci(mci);
--

right near the beginning of the function, so that we save ourselves the
unwinding.
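
To illustrate why the placement matters, here is a minimal userspace sketch, not the kernel code. `add_controller()`, `struct controller`, `MAX_MCS` and `warn_once()` are hypothetical stand-ins for edac_mc_add_mc(), struct mem_ctl_info, EDAC_MAX_MCS and pr_warn_once(). Because the index is rejected before anything is allocated, the error path is a plain return with nothing to undo.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_MCS 16	/* hypothetical stand-in for EDAC_MAX_MCS */

struct controller {
	unsigned int idx;
	char *name;	/* allocated by add_controller() on success */
};

/* Print the message only on the first call, roughly like pr_warn_once(). */
static void warn_once(unsigned int idx)
{
	static bool warned;

	if (!warned) {
		warned = true;
		fprintf(stderr, "Too many memory controllers: %u\n", idx);
	}
}

/* Returns 0 on success or a negative errno value. */
int add_controller(struct controller *c)
{
	assert(c != NULL);

	/* Validate first: nothing is held yet, so a plain return suffices. */
	if (c->idx >= MAX_MCS) {
		warn_once(c->idx);
		return -EINVAL;
	}

	/* Resources are acquired only after the cheap checks have passed. */
	c->name = malloc(16);
	if (!c->name)
		return -ENOMEM;
	snprintf(c->name, 16, "mc%u", c->idx);

	return 0;
}
```

Had the check come after the allocation, the failure branch would also have to free `c->name`, which is the kind of unwinding the early return avoids.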

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-07-19 10:28:35

by Borislav Petkov

Subject: [PATCH -v2] EDAC: Fix lockdep splat

From: Borislav Petkov <[email protected]>

Fix the following:

BUG: key ffff88043bdd0330 not in .data!
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
Call Trace:
dump_stack
warn_slowpath_common
warn_slowpath_fmt
lockdep_init_map
? trace_hardirqs_on_caller
? trace_hardirqs_on
debug_mutex_init
__mutex_init
bus_register
edac_create_sysfs_mci_device
edac_mc_add_mc
sbridge_probe
pci_device_probe
driver_probe_device
__driver_attach
? driver_probe_device
bus_for_each_dev
driver_attach
bus_add_driver
driver_register
__pci_register_driver
? 0xffffffffa0010fff
sbridge_init
? 0xffffffffa0010fff
do_one_initcall
load_module
? unset_module_init_ro_nx
SyS_init_module
tracesys
---[ end trace d24a70b0d3ddf733 ]---
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
EDAC sbridge: Driver loaded.

What happens is that bus_register() needs a statically allocated lock_key
because the latter is handed to lockdep. However, struct mem_ctl_info
embeds struct bus_type (the whole struct, not a pointer to it) and the
whole thing gets dynamically allocated.

Fix this by using a statically allocated struct bus_type for the MC bus.

Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Markus Trippelsdorf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
---
drivers/edac/edac_mc.c | 9 +++++++++
drivers/edac/edac_mc_sysfs.c | 28 +++++++++++++++-------------
drivers/edac/i5100_edac.c | 2 +-
include/linux/edac.h | 7 ++++++-
4 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 27e86d938262..c55ad285c285 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -48,6 +48,8 @@ static LIST_HEAD(mc_devices);
*/
static void const *edac_mc_owner;

+static struct bus_type mc_bus[EDAC_MAX_MCS];
+
unsigned edac_dimm_info_location(struct dimm_info *dimm, char *buf,
unsigned len)
{
@@ -723,6 +725,11 @@ int edac_mc_add_mc(struct mem_ctl_info *mci)
int ret = -EINVAL;
edac_dbg(0, "\n");

+ if (mci->mc_idx >= EDAC_MAX_MCS) {
+ pr_warn_once("Too many memory controllers: %d\n", mci->mc_idx);
+ return ret;
+ }
+
#ifdef CONFIG_EDAC_DEBUG
if (edac_debug_level >= 3)
edac_mc_dump_mci(mci);
@@ -762,6 +769,8 @@ int edac_mc_add_mc(struct mem_ctl_info *mci)
/* set load time so that error rate can be tracked */
mci->start_time = jiffies;

+ mci->bus = &mc_bus[mci->mc_idx];
+
if (edac_create_sysfs_mci_device(mci)) {
edac_mc_printk(mci, KERN_WARNING,
"failed to create sysfs device\n");
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index ef15a7e613bc..e7c32c4f7837 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -370,7 +370,7 @@ static int edac_create_csrow_object(struct mem_ctl_info *mci,
return -ENODEV;

csrow->dev.type = &csrow_attr_type;
- csrow->dev.bus = &mci->bus;
+ csrow->dev.bus = mci->bus;
device_initialize(&csrow->dev);
csrow->dev.parent = &mci->dev;
csrow->mci = mci;
@@ -605,7 +605,7 @@ static int edac_create_dimm_object(struct mem_ctl_info *mci,
dimm->mci = mci;

dimm->dev.type = &dimm_attr_type;
- dimm->dev.bus = &mci->bus;
+ dimm->dev.bus = mci->bus;
device_initialize(&dimm->dev);

dimm->dev.parent = &mci->dev;
@@ -975,11 +975,13 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
* The memory controller needs its own bus, in order to avoid
* namespace conflicts at /sys/bus/edac.
*/
- mci->bus.name = kasprintf(GFP_KERNEL, "mc%d", mci->mc_idx);
- if (!mci->bus.name)
+ mci->bus->name = kasprintf(GFP_KERNEL, "mc%d", mci->mc_idx);
+ if (!mci->bus->name)
return -ENOMEM;
- edac_dbg(0, "creating bus %s\n", mci->bus.name);
- err = bus_register(&mci->bus);
+
+ edac_dbg(0, "creating bus %s\n", mci->bus->name);
+
+ err = bus_register(mci->bus);
if (err < 0)
return err;

@@ -988,7 +990,7 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
device_initialize(&mci->dev);

mci->dev.parent = mci_pdev;
- mci->dev.bus = &mci->bus;
+ mci->dev.bus = mci->bus;
dev_set_name(&mci->dev, "mc%d", mci->mc_idx);
dev_set_drvdata(&mci->dev, mci);
pm_runtime_forbid(&mci->dev);
@@ -997,8 +999,8 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
err = device_add(&mci->dev);
if (err < 0) {
edac_dbg(1, "failure: create device %s\n", dev_name(&mci->dev));
- bus_unregister(&mci->bus);
- kfree(mci->bus.name);
+ bus_unregister(mci->bus);
+ kfree(mci->bus->name);
return err;
}

@@ -1064,8 +1066,8 @@ fail:
}
fail2:
device_unregister(&mci->dev);
- bus_unregister(&mci->bus);
- kfree(mci->bus.name);
+ bus_unregister(mci->bus);
+ kfree(mci->bus->name);
return err;
}

@@ -1098,8 +1100,8 @@ void edac_unregister_sysfs(struct mem_ctl_info *mci)
{
edac_dbg(1, "Unregistering device %s\n", dev_name(&mci->dev));
device_unregister(&mci->dev);
- bus_unregister(&mci->bus);
- kfree(mci->bus.name);
+ bus_unregister(mci->bus);
+ kfree(mci->bus->name);
}

static void mc_attr_release(struct device *dev)
diff --git a/drivers/edac/i5100_edac.c b/drivers/edac/i5100_edac.c
index 1b635178cc44..157b934e8ce3 100644
--- a/drivers/edac/i5100_edac.c
+++ b/drivers/edac/i5100_edac.c
@@ -974,7 +974,7 @@ static int i5100_setup_debugfs(struct mem_ctl_info *mci)
if (!i5100_debugfs)
return -ENODEV;

- priv->debugfs = debugfs_create_dir(mci->bus.name, i5100_debugfs);
+ priv->debugfs = debugfs_create_dir(mci->bus->name, i5100_debugfs);

if (!priv->debugfs)
return -ENOMEM;
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 0b763276f619..5c6d7fbaf89e 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -622,7 +622,7 @@ struct edac_raw_error_desc {
*/
struct mem_ctl_info {
struct device dev;
- struct bus_type bus;
+ struct bus_type *bus;

struct list_head link; /* for global list of mem_ctl_info structs */

@@ -742,4 +742,9 @@ struct mem_ctl_info {
#endif
};

+/*
+ * Maximum number of memory controllers in the coherent fabric.
+ */
+#define EDAC_MAX_MCS 16
+
#endif
--
1.8.3
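
As a rough userspace illustration of the approach taken in the patch above: the dynamically allocated controller keeps only a pointer, and the bus object it points to lives in a file-scope array, i.e. in static storage. All names here (`struct bus`, `struct controller`, `bus_pool`, `attach_bus()`, `bus_is_static()`) are hypothetical and none of this is kernel API; the `lock_key` member is just a placeholder for the key that lockdep wants to find in static storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_MCS 16	/* hypothetical stand-in for EDAC_MAX_MCS */

struct bus {
	const char *name;
	int lock_key;	/* placeholder for the key lockdep checks */
};

struct controller {
	unsigned int idx;
	struct bus *bus;	/* pointer, not an embedded struct */
};

/* File-scope array: static storage duration, one slot per controller. */
static struct bus bus_pool[MAX_MCS];

/* Point c->bus at its pool slot; returns 0, or -1 if idx is out of range. */
int attach_bus(struct controller *c)
{
	assert(c != NULL);
	if (c->idx >= MAX_MCS)
		return -1;
	c->bus = &bus_pool[c->idx];
	return 0;
}

/* Nonzero if b points into the static pool rather than somewhere else. */
int bus_is_static(const struct bus *b)
{
	uintptr_t p = (uintptr_t)b;

	return p >= (uintptr_t)bus_pool &&
	       p < (uintptr_t)(bus_pool + MAX_MCS);
}
```

The controller itself can still come from the heap; only the object carrying the key has to be static. The cost of this design is the fixed upper bound, which is why an index at or above the limit has to be rejected.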

2013-07-20 02:56:19

by Mauro Carvalho Chehab

Subject: Re: BUG: key ffff880c1148c478 not in .data! (V3.10.0)

On Fri, 19 Jul 2013 01:27:18 +0200
Borislav Petkov <[email protected]> wrote:

> On Thu, Jul 18, 2013 at 04:51:48PM +0000, Luck, Tony wrote:
> > + BUG_ON(mci->mc_idx >= EDAC_MAX_MCS);
> >
> > Do we have to "BUG_ON()" here? Couldn't we be gentler with something like:
> >
> > if (mci->mc_idx >= EDAC_MAX_MCS) {
> > printk_once(KERN_WARNING "Too many memory controllers\n");
> > return; /* probably need to make sure caller copes with this ... so more stuff there */
>
> Yeah, we can do something like this:

With this change, the patch looks OK to me.

Acked-by: Mauro Carvalho Chehab <[email protected]>
>
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index 429e971e02d7..c55ad285c285 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -725,6 +725,11 @@ int edac_mc_add_mc(struct mem_ctl_info *mci)
> int ret = -EINVAL;
> edac_dbg(0, "\n");
>
> + if (mci->mc_idx >= EDAC_MAX_MCS) {
> + pr_warn_once("Too many memory controllers: %d\n", mci->mc_idx);
> + return ret;
> + }
> +
> #ifdef CONFIG_EDAC_DEBUG
> if (edac_debug_level >= 3)
> edac_mc_dump_mci(mci);
> --
>
> right near the beginning of the function, so that we save ourselves the
> unwinding.
>




Cheers,
Mauro

2013-07-20 03:55:59

by Mauro Carvalho Chehab

Subject: Re: [PATCH -v2] EDAC: Fix lockdep splat

On Fri, 19 Jul 2013 12:28:25 +0200
Borislav Petkov <[email protected]> wrote:

> From: Borislav Petkov <[email protected]>
>
> Fix the following:
>
> BUG: key ffff88043bdd0330 not in .data!
> ------------[ cut here ]------------
> WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
> DEBUG_LOCKS_WARN_ON(1)
> Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
> CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
> Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
> 0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
> ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
> ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
> Call Trace:
> dump_stack
> warn_slowpath_common
> warn_slowpath_fmt
> lockdep_init_map
> ? trace_hardirqs_on_caller
> ? trace_hardirqs_on
> debug_mutex_init
> __mutex_init
> bus_register
> edac_create_sysfs_mci_device
> edac_mc_add_mc
> sbridge_probe
> pci_device_probe
> driver_probe_device
> __driver_attach
> ? driver_probe_device
> bus_for_each_dev
> driver_attach
> bus_add_driver
> driver_register
> __pci_register_driver
> ? 0xffffffffa0010fff
> sbridge_init
> ? 0xffffffffa0010fff
> do_one_initcall
> load_module
> ? unset_module_init_ro_nx
> SyS_init_module
> tracesys
> ---[ end trace d24a70b0d3ddf733 ]---
> EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
> EDAC sbridge: Driver loaded.
>
> What happens is that bus_register() needs a statically allocated lock_key
> because the latter is handed to lockdep. However, struct mem_ctl_info
> embeds struct bus_type (the whole struct, not a pointer to it) and the
> whole thing gets dynamically allocated.
>
> Fix this by using a statically allocated struct bus_type for the MC bus.
>
> Cc: Mauro Carvalho Chehab <[email protected]>

Acked-by: Mauro Carvalho Chehab <[email protected]>

But see below.

> Cc: Markus Trippelsdorf <[email protected]>
> Signed-off-by: Borislav Petkov <[email protected]>
> ---
> drivers/edac/edac_mc.c | 9 +++++++++
> drivers/edac/edac_mc_sysfs.c | 28 +++++++++++++++-------------
> drivers/edac/i5100_edac.c | 2 +-
> include/linux/edac.h | 7 ++++++-
> 4 files changed, 31 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
> index 27e86d938262..c55ad285c285 100644
> --- a/drivers/edac/edac_mc.c
> +++ b/drivers/edac/edac_mc.c
> @@ -48,6 +48,8 @@ static LIST_HEAD(mc_devices);
> */
> static void const *edac_mc_owner;
>
> +static struct bus_type mc_bus[EDAC_MAX_MCS];
> +
> unsigned edac_dimm_info_location(struct dimm_info *dimm, char *buf,
> unsigned len)
> {
> @@ -723,6 +725,11 @@ int edac_mc_add_mc(struct mem_ctl_info *mci)
> int ret = -EINVAL;
> edac_dbg(0, "\n");
>
> + if (mci->mc_idx >= EDAC_MAX_MCS) {
> + pr_warn_once("Too many memory controllers: %d\n", mci->mc_idx);
> + return ret;

Hmm... while I'm OK with returning -EINVAL, maybe it could instead
return a more meaningful error code (ENODEV?).
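
For context on the two candidates, a small userspace sketch. The helpers `check_idx()` and `describe()` and the `MAX_MCS` constant are hypothetical, and the sketch does not settle which code is the better fit. Both are standard errno values, and a caller only notices the difference if it inspects or logs the return code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_MCS 16	/* hypothetical stand-in for EDAC_MAX_MCS */

/*
 * Returns 0 if idx is usable, otherwise -err. The errno to report is a
 * parameter only so that the sketch can try either candidate.
 */
int check_idx(unsigned int idx, int err)
{
	assert(err > 0);
	return idx < MAX_MCS ? 0 : -err;
}

/* Print how a return code would read to whoever logs it. */
void describe(int ret)
{
	if (ret < 0)
		printf("failed: %d (%s)\n", ret, strerror(-ret));
	else
		printf("ok\n");
}
```

Calling `describe(check_idx(16, EINVAL))` and `describe(check_idx(16, ENODEV))` side by side shows the two messages a log reader would get; the exact strings come from the C library's strerror().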

> + }
> +
> #ifdef CONFIG_EDAC_DEBUG
> if (edac_debug_level >= 3)
> edac_mc_dump_mci(mci);
> @@ -762,6 +769,8 @@ int edac_mc_add_mc(struct mem_ctl_info *mci)
> /* set load time so that error rate can be tracked */
> mci->start_time = jiffies;
>
> + mci->bus = &mc_bus[mci->mc_idx];
> +
> if (edac_create_sysfs_mci_device(mci)) {
> edac_mc_printk(mci, KERN_WARNING,
> "failed to create sysfs device\n");
> diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
> index ef15a7e613bc..e7c32c4f7837 100644
> --- a/drivers/edac/edac_mc_sysfs.c
> +++ b/drivers/edac/edac_mc_sysfs.c
> @@ -370,7 +370,7 @@ static int edac_create_csrow_object(struct mem_ctl_info *mci,
> return -ENODEV;
>
> csrow->dev.type = &csrow_attr_type;
> - csrow->dev.bus = &mci->bus;
> + csrow->dev.bus = mci->bus;
> device_initialize(&csrow->dev);
> csrow->dev.parent = &mci->dev;
> csrow->mci = mci;
> @@ -605,7 +605,7 @@ static int edac_create_dimm_object(struct mem_ctl_info *mci,
> dimm->mci = mci;
>
> dimm->dev.type = &dimm_attr_type;
> - dimm->dev.bus = &mci->bus;
> + dimm->dev.bus = mci->bus;
> device_initialize(&dimm->dev);
>
> dimm->dev.parent = &mci->dev;
> @@ -975,11 +975,13 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
> * The memory controller needs its own bus, in order to avoid
> * namespace conflicts at /sys/bus/edac.
> */
> - mci->bus.name = kasprintf(GFP_KERNEL, "mc%d", mci->mc_idx);
> - if (!mci->bus.name)
> + mci->bus->name = kasprintf(GFP_KERNEL, "mc%d", mci->mc_idx);
> + if (!mci->bus->name)
> return -ENOMEM;
> - edac_dbg(0, "creating bus %s\n", mci->bus.name);
> - err = bus_register(&mci->bus);
> +
> + edac_dbg(0, "creating bus %s\n", mci->bus->name);
> +
> + err = bus_register(mci->bus);
> if (err < 0)
> return err;
>
> @@ -988,7 +990,7 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
> device_initialize(&mci->dev);
>
> mci->dev.parent = mci_pdev;
> - mci->dev.bus = &mci->bus;
> + mci->dev.bus = mci->bus;
> dev_set_name(&mci->dev, "mc%d", mci->mc_idx);
> dev_set_drvdata(&mci->dev, mci);
> pm_runtime_forbid(&mci->dev);
> @@ -997,8 +999,8 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
> err = device_add(&mci->dev);
> if (err < 0) {
> edac_dbg(1, "failure: create device %s\n", dev_name(&mci->dev));
> - bus_unregister(&mci->bus);
> - kfree(mci->bus.name);
> + bus_unregister(mci->bus);
> + kfree(mci->bus->name);
> return err;
> }
>
> @@ -1064,8 +1066,8 @@ fail:
> }
> fail2:
> device_unregister(&mci->dev);
> - bus_unregister(&mci->bus);
> - kfree(mci->bus.name);
> + bus_unregister(mci->bus);
> + kfree(mci->bus->name);
> return err;
> }
>
> @@ -1098,8 +1100,8 @@ void edac_unregister_sysfs(struct mem_ctl_info *mci)
> {
> edac_dbg(1, "Unregistering device %s\n", dev_name(&mci->dev));
> device_unregister(&mci->dev);
> - bus_unregister(&mci->bus);
> - kfree(mci->bus.name);
> + bus_unregister(mci->bus);
> + kfree(mci->bus->name);
> }
>
> static void mc_attr_release(struct device *dev)
> diff --git a/drivers/edac/i5100_edac.c b/drivers/edac/i5100_edac.c
> index 1b635178cc44..157b934e8ce3 100644
> --- a/drivers/edac/i5100_edac.c
> +++ b/drivers/edac/i5100_edac.c
> @@ -974,7 +974,7 @@ static int i5100_setup_debugfs(struct mem_ctl_info *mci)
> if (!i5100_debugfs)
> return -ENODEV;
>
> - priv->debugfs = debugfs_create_dir(mci->bus.name, i5100_debugfs);
> + priv->debugfs = debugfs_create_dir(mci->bus->name, i5100_debugfs);
>
> if (!priv->debugfs)
> return -ENOMEM;
> diff --git a/include/linux/edac.h b/include/linux/edac.h
> index 0b763276f619..5c6d7fbaf89e 100644
> --- a/include/linux/edac.h
> +++ b/include/linux/edac.h
> @@ -622,7 +622,7 @@ struct edac_raw_error_desc {
> */
> struct mem_ctl_info {
> struct device dev;
> - struct bus_type bus;
> + struct bus_type *bus;
>
> struct list_head link; /* for global list of mem_ctl_info structs */
>
> @@ -742,4 +742,9 @@ struct mem_ctl_info {
> #endif
> };
>
> +/*
> + * Maximum number of memory controllers in the coherent fabric.
> + */
> +#define EDAC_MAX_MCS 16
> +
> #endif




Cheers,
Mauro