2010-07-01 15:44:23

by Thomas Renninger

[permalink] [raw]
Subject: Dynamic Debug broken on 2.6.35-rc3?

Hi,

Doing:
echo "file ec.c +p" >/sys/kernel/debug/dynamic_debug/control

I got x
RIP: 0010:[<ffffffff81251267>] [<ffffffff81251267>]
ddebug_change+0x87/0x240

More output is in the attachment.

Thanks for any hints,

Thomas


Attachments:
screen-exchange (20.46 kB)

2010-07-01 16:26:14

by Jason Baron

[permalink] [raw]
Subject: Re: Dynamic Debug broken on 2.6.35-rc3?

On Thu, Jul 01, 2010 at 05:44:19PM +0200, Thomas Renninger wrote:
>
> Hi,
>
> Doing:
> echo "file ec.c +p" >/sys/kernel/debug/dynamic_debug/control
>
> I got x
> RIP: 0010:[<ffffffff81251267>] [<ffffffff81251267>]
> ddebug_change+0x87/0x240
>
> More output is in the attachment.
>
> Thanks for any hints,
>
> Thomas

Hi Thomas,

I just tried the same command on 2.6.35-rc3, and it worked fine. I
noticed that the kernel your running is: "2.6.35-rc3-0.0.10.4cae135-default",
so perhaps there are some other changes there causing this problem? Can
you re-produce the bug on a purely upstream kernel?

thanks,

-Jason

2010-07-02 16:56:12

by Thomas Renninger

[permalink] [raw]
Subject: Re: Dynamic Debug broken on 2.6.35-rc3?

On Thursday 01 July 2010 18:26:01 Jason Baron wrote:
> On Thu, Jul 01, 2010 at 05:44:19PM +0200, Thomas Renninger wrote:
> >
> > Hi,
> >
> > Doing:
> > echo "file ec.c +p" >/sys/kernel/debug/dynamic_debug/control
> >
> > I got x
> > RIP: 0010:[<ffffffff81251267>] [<ffffffff81251267>]
> > ddebug_change+0x87/0x240
...
> I just tried the same command on 2.6.35-rc3, and it worked fine. I
> noticed that the kernel your running is: "2.6.35-rc3-0.0.10.4cae135-default",
> so perhaps there are some other changes there causing this problem? Can
> you re-produce the bug on a purely upstream kernel?
I am able to create another crash with plain 2.6.35-rc3 kernel.
Find attached my .config.
I again initiated the crash by:
echo "file ec.c +p" >/sys/kernel/debug/dynamic_debug/control

First I got a crash in strcmp without a backtrace.

BUG: unable to handle kernel NULL pointer dereference at (null)
[ 252.832036] IP: [<ffffffff8122b4f4>] strcmp+0x14/0x30
[ 252.832036] PGD 37a76067 PUD 7a60c067 PMD 0
[ 252.832036] Oops: 0000 [#1] SMP
[ 252.832036] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map

Thus I clustered lib/dynamic_debug.c with NULL pointer checks before
strcmp(), but couldn't find anything -> got other crashs.

Hmm, with attached config I can easily produce crashs, e.g.:
cat /sys/kernel/debug/dynamic_debug/control |wc
622 5683 64665

BUG: unable to handle kernel paging request at ffffffffa01c480f
[ 412.226289] IP: [<ffffffff8123b31b>] ddebug_proc_show+0x1b/0xa0
[ 412.226289] PGD 1806067 PUD 180a063 PMD 376e8067 PTE 0
[ 412.226289] Oops: 0000 [#1] SMP
[ 412.226289] last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
[ 412.226289] CPU 5
[ 412.226289] Modules linked in: autofs4 edd nfs lockd fscache nfs_acl auth_rpcgss sunrpc af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave
acpi_cpufreq mperf fuse reiserfs loop dm_mod igb i5400_edac ioatdma edac_core floppy sr_mod shpchp iTCO_wdt iTCO_vendor_support cdrom container button
pci_hotplug i2c_i801 i5k_amb serio_raw dca sg pcspkr ext4 jbd2 crc16 uhci_hcd radeon ttm drm_kms_helper ehci_hcd drm usbcore sd_mod i2c_algo_bit fan thermal
processor thermal_sys ata_generic ata_piix ahci libahci libata scsi_mod
[ 412.226289]
[ 412.226289] Pid: 3451, comm: cat Not tainted 2.6.35-rc3-vanilla #5 X7DWN/X7DW3
[ 412.226289] RIP: 0010:[<ffffffff8123b31b>] [<ffffffff8123b31b>] ddebug_proc_show+0x1b/0xa0
[ 412.226289] RSP: 0018:ffff880078761e68 EFLAGS: 00010286
[ 412.226289] RAX: ffffffff8163fe20 RBX: ffffffffa01c47e8 RCX: ffff8800370ec3e0
[ 412.226289] RDX: ffff8800370ec3e0 RSI: ffffffffa01c47e8 RDI: ffff880036e24740
[ 412.226289] RBP: ffff880078761e98 R08: 0000000000000001 R09: 000000000000ffff
[ 412.226289] R10: 0000000000000003 R11: 000000000000000a R12: ffff880036e24740
[ 412.226289] R13: 0000000000000ec7 R14: ffff880078761ed0 R15: 0000000000000000
[ 412.226289] FS: 00007f9f0982c700(0000) GS:ffff880001d40000(0000) knlGS:0000000000000000
[ 412.226289] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 412.226289] CR2: ffffffffa01c480f CR3: 0000000037a08000 CR4: 00000000000006e0
[ 412.226289] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 412.226289] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 412.226289] Process cat (pid: 3451, threadinfo ffff880078760000, task ffff880036536480)
[ 412.226289] Stack:
[ 412.226289] ffff880078761e78 ffffffff8123b3e7 000000000000002d ffff8800379cae40
[ 412.226289] <0> ffffffffa01c47e8 ffff880036e24740 ffff880078761f08 ffffffff8114eca9
[ 412.226289] <0> ffff880078761f48 0000000000001000 0000000000611000 ffff8800379cae40
[ 412.226289] Call Trace:
[ 412.226289] [<ffffffff8123b3e7>] ? ddebug_proc_start+0x27/0x100
[ 412.226289] [<ffffffff8114eca9>] seq_read+0x269/0x430
[ 412.226289] [<ffffffff81131ec8>] vfs_read+0xc8/0x1a0
[ 412.226289] [<ffffffff81132090>] sys_read+0x50/0x90
[ 412.226289] [<ffffffff81002e2b>] system_call_fastpath+0x16/0x1b
[ 412.226289] Code: ee ff 48 83 c4 08 5b c9 c3 0f 1f 80 00 00 00 00 55 48 89 e5 41 54 49 89 fc 53 48 89 f3 48 83 ec 20 48 83 fe 01 48 8b 57 60 74 75 <0f> b6 46
27 c6 45 e1 00 83 e0 01 83 f8 01 19 c0 83 e0 bd 83 c0
[ 412.226289] RIP [<ffffffff8123b31b>] ddebug_proc_show+0x1b/0xa0
[ 412.226289] RSP <ffff880078761e68>
[ 412.226289] CR2: ffffffffa01c480f


Here another try with verbose=1 (in dynamic_debug.c).
This time no backtrace again:

[ 222.788975] ddebug_proc_open: called
[ 222.797015] ddebug_proc_start: called m=ffff88015a100340 *pos=0
[ 222.809390] ddebug_proc_show: called m=ffff88015a100340 p=0000000000000001
[ 222.823733] ddebug_proc_next: called m=ffff88015a100340 p=0000000000000001 *pos=0
[ 222.839181] ddebug_proc_show: called m=ffff88015a100340 p=ffffffff81909820
[ 222.853423] ddebug_proc_next: called m=ffff88015a100340 p=ffffffff81909820 *pos=1
[ 222.868866] ddebug_proc_show: called m=ffff88015a100340 p=ffffffff81909848
[ 222.868879] ddebug_proc_next: called m=ffff88015a100340 p=ffffffff81909848 *pos=2
[ 222.868887] ddebug_proc_show: called m=ffff88015a100340 p=ffffffff81909870
[ 222.868896] ddebug_proc_next: called m=ffff88015a100340 p=ffffffff81909870 *pos=3
[ 222.868904] ddebug_proc_show: called m=ffff88015a100340 p=ffffffff81909898
[ 222.868913] ddebug_proc_next: called m=ffff88015a100340 p=ffffffff81909898 *pos=4
[ 222.868920] ddebug_proc_show: called m=ffff88015a100340 p=ffffffff819098c0
[ 222.868930] ddebug_proc_next: called m=ffff88015a100340 p=ffffffff819098c0 *pos=5
[ 222.868937] ddebug_proc_show: called m=ffff88015a100340 p=ffffffff819098e8
[ 222.868947] ddebug_proc_next: called m=ffff88015a10034

<- KDUMP/CRASH kernel gets loaded ->

[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 2.6.35-rc3-vanilla (trenn@ett) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #6 SMP Fri Jul 2 18:39:34 CEST 2010
[ 0.000000] Command line: root=/dev/sda7 console=tty0 console=ttyS0,57600 sysrq_always_enabled panic=100 ignore_loglevel resume=/dev/disk/by-id/ata-
WDC_WD1200JS-00NCB1_WD-WCANM5535606-part2 quiet vga=normal elevator=deadline sysrq=1 reset_devices irqpoll maxcpus=1 memmap=exactmap
memmap=640K@0K memmap=261484K@33408K elfco


Attachments:
.config (111.43 kB)

2010-07-08 21:40:32

by Jason Baron

[permalink] [raw]
Subject: Re: Dynamic Debug broken on 2.6.35-rc3?

On Fri, Jul 02, 2010 at 06:55:59PM +0200, Thomas Renninger wrote:
> On Thursday 01 July 2010 18:26:01 Jason Baron wrote:
> > On Thu, Jul 01, 2010 at 05:44:19PM +0200, Thomas Renninger wrote:
> > >
> > > Hi,
> > >
> > > Doing:
> > > echo "file ec.c +p" >/sys/kernel/debug/dynamic_debug/control
> > >
> > > I got x
> > > RIP: 0010:[<ffffffff81251267>] [<ffffffff81251267>]
> > > ddebug_change+0x87/0x240
> ...
> > I just tried the same command on 2.6.35-rc3, and it worked fine. I
> > noticed that the kernel your running is: "2.6.35-rc3-0.0.10.4cae135-default",
> > so perhaps there are some other changes there causing this problem? Can
> > you re-produce the bug on a purely upstream kernel?
> I am able to create another crash with plain 2.6.35-rc3 kernel.

Hi,

So dynamic debug was not properly unregistering debug calls when modules
were removed. Specifically, we were leaving around references to debug
statements in modules that were no longer loaded. Thus, we would end up
touching invalid parts of memory, leading to these panics. The missing
unregister calls, were during module loading races, and error
conditions, and that probably explains why we haven't seen them before.
Also, I didn't see this problem initially b/c I was not using modules.
Anyways, please try the patch below to verify that it resolves this
issue.

thanks,

-Jason

Make sure we properly call ddebug_remove_module() when a module fails to
load. In addition, pass the pointer to the "debug table", to both
ddebug_add_module(), and ddebug_remove_module() so that we can uniquely
identify each set of debug statements. In this way even modules with the
same name can be properly identified and removed.


Signed-off-by: Jason Baron <[email protected]>
---
include/linux/dynamic_debug.h | 7 ++-----
include/linux/module.h | 4 ++++
kernel/module.c | 10 +++++++---
lib/dynamic_debug.c | 4 ++--
4 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index b3cd4de..a5c133e 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -40,7 +40,7 @@ int ddebug_add_module(struct _ddebug *tab, unsigned int n,
const char *modname);

#if defined(CONFIG_DYNAMIC_DEBUG)
-extern int ddebug_remove_module(char *mod_name);
+extern int ddebug_remove_module(struct _ddebug *tab, char *mod_name);

#define __dynamic_dbg_enabled(dd) ({ \
int __ret = 0; \
@@ -73,10 +73,7 @@ extern int ddebug_remove_module(char *mod_name);

#else

-static inline int ddebug_remove_module(char *mod)
-{
- return 0;
-}
+#define ddebug_remove_module(tab, name) do {} while (0)

#define dynamic_pr_debug(fmt, ...) \
do { if (0) printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__); } while (0)
diff --git a/include/linux/module.h b/include/linux/module.h
index 8a6b9fd..97ce090 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -387,6 +387,10 @@ struct module
ctor_fn_t *ctors;
unsigned int num_ctors;
#endif
+
+#ifdef CONFIG_DYNAMIC_DEBUG
+ struct _ddebug *ddebug;
+#endif
};
#ifndef MODULE_ARCH_INIT
#define MODULE_ARCH_INIT {}
diff --git a/kernel/module.c b/kernel/module.c
index 8c6b428..16bb044 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -787,7 +787,6 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,

/* Store the name of the last unloaded module for diagnostic purposes */
strlcpy(last_unloaded_module, mod->name, sizeof(last_unloaded_module));
- ddebug_remove_module(mod->name);

free_module(mod);
return 0;
@@ -1550,6 +1549,8 @@ static void free_module(struct module *mod)
remove_sect_attrs(mod);
mod_kobject_remove(mod);

+ ddebug_remove_module(mod->ddebug, mod->name);
+
/* Arch-specific cleanup. */
module_arch_cleanup(mod);

@@ -2053,12 +2054,14 @@ static inline void add_kallsyms(struct module *mod,
}
#endif /* CONFIG_KALLSYMS */

-static void dynamic_debug_setup(struct _ddebug *debug, unsigned int num)
+static void dynamic_debug_setup(struct _ddebug *debug, unsigned int num,
+ struct module *mod)
{
#ifdef CONFIG_DYNAMIC_DEBUG
if (ddebug_add_module(debug, num, debug->modname))
printk(KERN_ERR "dynamic debug error adding module: %s\n",
debug->modname);
+ mod->ddebug = debug;
#endif
}

@@ -2483,7 +2486,7 @@ static noinline struct module *load_module(void __user *umod,
debug = section_objs(hdr, sechdrs, secstrings, "__verbose",
sizeof(*debug), &num_debug);
if (debug)
- dynamic_debug_setup(debug, num_debug);
+ dynamic_debug_setup(debug, num_debug, mod);
}

err = module_finalize(hdr, sechdrs, mod);
@@ -2562,6 +2565,7 @@ static noinline struct module *load_module(void __user *umod,
synchronize_sched();
module_arch_cleanup(mod);
cleanup:
+ ddebug_remove_module(mod->ddebug, mod->name);
free_modinfo(mod);
module_unload_free(mod);
#if defined(CONFIG_MODULE_UNLOAD)
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 3df8eb1..7d66180 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -692,7 +692,7 @@ static void ddebug_table_free(struct ddebug_table *dt)
* Called in response to a module being unloaded. Removes
* any ddebug_table's which point at the module.
*/
-int ddebug_remove_module(char *mod_name)
+int ddebug_remove_module(struct _ddebug *tab, char *mod_name)
{
struct ddebug_table *dt, *nextdt;
int ret = -ENOENT;
@@ -703,7 +703,7 @@ int ddebug_remove_module(char *mod_name)

mutex_lock(&ddebug_lock);
list_for_each_entry_safe(dt, nextdt, &ddebug_tables, link) {
- if (!strcmp(dt->mod_name, mod_name)) {
+ if (!strcmp(dt->mod_name, mod_name) && (tab == dt->ddebugs)) {
ddebug_table_free(dt);
ret = 0;
}
--
1.7.1

2010-07-08 21:54:03

by Andrew Morton

[permalink] [raw]
Subject: Re: Dynamic Debug broken on 2.6.35-rc3?

On Thu, 8 Jul 2010 17:39:28 -0400
Jason Baron <[email protected]> wrote:

> Make sure we properly call ddebug_remove_module() when a module fails to
> load. In addition, pass the pointer to the "debug table", to both
> ddebug_add_module(), and ddebug_remove_module() so that we can uniquely
> identify each set of debug statements. In this way even modules with the
> same name can be properly identified and removed.
>
>
> Signed-off-by: Jason Baron <[email protected]>

It'd be nice to track the Reported-by:s. And the Tested-by:s if/when
they arrive. SighIllDoIt.

The patch (almost) applies to 2.6.34. So are we missing a Cc:stable tag
as well?

Also, this patch's title is now "Re: Dynamic Debug broken on
2.6.35-rc3?" which isn't very good. I can invent a new title for it,
but that means we don't have a well-understood handle to refer to this
patch, and to perform searches with, etc.

Maybe a resend would be best..

2010-07-09 11:03:07

by Thomas Renninger

[permalink] [raw]
Subject: Re: Dynamic Debug broken on 2.6.35-rc3?

Hi,

I can confirm that this patch fixes the issue for me.

On Thursday 08 July 2010 23:53:00 Andrew Morton wrote:
> On Thu, 8 Jul 2010 17:39:28 -0400
> Jason Baron <[email protected]> wrote:
>
> > Make sure we properly call ddebug_remove_module() when a module fails to
> > load. In addition, pass the pointer to the "debug table", to both
> > ddebug_add_module(), and ddebug_remove_module() so that we can uniquely
> > identify each set of debug statements. In this way even modules with the
> > same name can be properly identified and removed.
> >
> >
> > Signed-off-by: Jason Baron <[email protected]>
>
> It'd be nice to track the Reported-by:s. And the Tested-by:s if/when
> they arrive. SighIllDoIt.
>
> The patch (almost) applies to 2.6.34. So are we missing a Cc:stable tag
> as well?
I'll resubmit with some more meta info and will include [email protected].

Could it be that this isn't a regression, but a bug that was always present,
but only gets exposed if you add modules with a specific implementation,
e.g. specific declarations of functions missing, etc.?

I tried to patch this into a 2.6.32.X kernel. While some hunks did not
succeed, it looks like an adjusted patch should get submitted for older
stable kernels as well?:

/dev/shm/linux-2.6.32> patch --dry-run -p1 -i ../linux-2.6.35-rc3/dynamic_debug_broken.patch
patching file include/linux/dynamic_debug.h
Hunk #2 succeeded at 76 with fuzz 2 (offset 3 lines).
patching file include/linux/module.h
Hunk #1 succeeded at 376 (offset -11 lines).
patching file kernel/module.c
Hunk #1 FAILED at 787.
Hunk #2 succeeded at 1596 with fuzz 2 (offset 47 lines).
Hunk #3 succeeded at 2098 (offset 44 lines).
Hunk #4 succeeded at 2548 (offset 62 lines).
Hunk #5 succeeded at 2638 with fuzz 2 (offset 73 lines).
1 out of 5 hunks FAILED -- saving rejects to file kernel/module.c.rej
patching file lib/dynamic_debug.c
Hunk #1 succeeded at 691 (offset -1 lines).
Hunk #2 succeeded at 702 (offset -1 lines).

Thanks Jason for this quick fix!

Thomas

2010-07-09 13:31:05

by Jason Baron

[permalink] [raw]
Subject: Re: Dynamic Debug broken on 2.6.35-rc3?

On Fri, Jul 09, 2010 at 01:03:08PM +0200, Thomas Renninger wrote:
> Hi,
>
> I can confirm that this patch fixes the issue for me.
>
> On Thursday 08 July 2010 23:53:00 Andrew Morton wrote:
> > On Thu, 8 Jul 2010 17:39:28 -0400
> > Jason Baron <[email protected]> wrote:
> >
> > > Make sure we properly call ddebug_remove_module() when a module fails to
> > > load. In addition, pass the pointer to the "debug table", to both
> > > ddebug_add_module(), and ddebug_remove_module() so that we can uniquely
> > > identify each set of debug statements. In this way even modules with the
> > > same name can be properly identified and removed.
> > >
> > >
> > > Signed-off-by: Jason Baron <[email protected]>
> >
> > It'd be nice to track the Reported-by:s. And the Tested-by:s if/when
> > they arrive. SighIllDoIt.
> >
> > The patch (almost) applies to 2.6.34. So are we missing a Cc:stable tag
> > as well?
> I'll resubmit with some more meta info and will include [email protected].
>
> Could it be that this isn't a regression, but a bug that was always present,
> but only gets exposed if you add modules with a specific implementation,
> e.g. specific declarations of functions missing, etc.?
>

Hi Thomas,

yes, this race has likely been present for a while (i'd have to look at
specific kernel versions to verify). I suspect its getting exposed now
due to more usage of this feature, and the proliferation of kernel
modules...

> I tried to patch this into a 2.6.32.X kernel. While some hunks did not
> succeed, it looks like an adjusted patch should get submitted for older
> stable kernels as well?:
>

if nobody else has done the 2.6.32 stable patch, I can do it, just let
me know.

thanks again for reporting this to me.

thanks,

-Jason

2010-07-12 14:24:15

by Thomas Renninger

[permalink] [raw]
Subject: Re: Dynamic Debug broken on 2.6.35-rc3?

Hi,

it's this one:
commit ff49d74ad383f54041378144ca1a229ee9aeaa59
Author: Yehuda Sadeh <[email protected]>
Date: Sat Jul 3 13:07:35 2010 +1000

which touches same code than Jason's fix does.
Possibly this patch also addresses (only parts of?) this problem?
Jason: Do you mind having a look at the latest git version and review
Yehuda's and adjust your patch if still necessary.
If Yehuda's patch is fixing this already, we still need it backported for
2.6.34 stable kernel and further?

One question about dynamic debug (unrelated to the mem corruption
issue):
Would it make sense to initialize dynamic debug earlier, somewhen shortly
after __setup is called.
Then a boot param ddebug_enable="xy" could be added.
The param could be in /sys/../control format or just be "all"?
My idea is to be able to track all the pr_debug calls (as) early (as possible)
at boot up.
One example is ec.c. Currently it is not possible to see the pr_debug messages
when EC accesses are done when the ACPI interpreter is started, there
is no userspace and no sysfs yet.
Same for PCI related pr_debug messages at early PCI(e) initialization time?
Would that be possible or do I miss something?

Thanks,

Thomas

On Friday 09 July 2010 03:30:19 pm Jason Baron wrote:
> On Fri, Jul 09, 2010 at 01:03:08PM +0200, Thomas Renninger wrote:
> > Hi,
> >
> > I can confirm that this patch fixes the issue for me.
> >
> > On Thursday 08 July 2010 23:53:00 Andrew Morton wrote:
> > > On Thu, 8 Jul 2010 17:39:28 -0400
> > >
> > > Jason Baron <[email protected]> wrote:
> > > > Make sure we properly call ddebug_remove_module() when a module fails
> > > > to load. In addition, pass the pointer to the "debug table", to both
> > > > ddebug_add_module(), and ddebug_remove_module() so that we can
> > > > uniquely identify each set of debug statements. In this way even
> > > > modules with the same name can be properly identified and removed.
> > > >
> > > >
> > > > Signed-off-by: Jason Baron <[email protected]>
> > >
> > > It'd be nice to track the Reported-by:s. And the Tested-by:s if/when
> > > they arrive. SighIllDoIt.
> > >
> > > The patch (almost) applies to 2.6.34. So are we missing a Cc:stable
> > > tag as well?
> >
> > I'll resubmit with some more meta info and will include
> > [email protected].
> >
> > Could it be that this isn't a regression, but a bug that was always
> > present, but only gets exposed if you add modules with a specific
> > implementation, e.g. specific declarations of functions missing, etc.?
>
> Hi Thomas,
>
> yes, this race has likely been present for a while (i'd have to look at
> specific kernel versions to verify). I suspect its getting exposed now
> due to more usage of this feature, and the proliferation of kernel
> modules...
>
> > I tried to patch this into a 2.6.32.X kernel. While some hunks did not
> > succeed, it looks like an adjusted patch should get submitted for older
> > stable kernels as well?:
>
> if nobody else has done the 2.6.32 stable patch, I can do it, just let
> me know.
>
> thanks again for reporting this to me.
>
> thanks,
>
> -Jason

2010-07-12 16:22:08

by Jason Baron

[permalink] [raw]
Subject: Re: Dynamic Debug broken on 2.6.35-rc3?

On Mon, Jul 12, 2010 at 04:24:16PM +0200, Thomas Renninger wrote:
> Hi,
>
> it's this one:
> commit ff49d74ad383f54041378144ca1a229ee9aeaa59
> Author: Yehuda Sadeh <[email protected]>
> Date: Sat Jul 3 13:07:35 2010 +1000
>
> which touches same code than Jason's fix does.
> Possibly this patch also addresses (only parts of?) this problem?
> Jason: Do you mind having a look at the latest git version and review
> Yehuda's and adjust your patch if still necessary.

So Yehuda's patch is meant to address this issue. I would still move the
remove code to free_module(), for the case that load_module() fails as
follows (on top of Yehuda's patch):

Signed-off-by: Jason Baron <[email protected]>
---
kernel/module.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 5d2d281..6c56282 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -787,7 +787,6 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,

/* Store the name of the last unloaded module for diagnostic purposes */
strlcpy(last_unloaded_module, mod->name, sizeof(last_unloaded_module));
- ddebug_remove_module(mod->name);

free_module(mod);
return 0;
@@ -1550,6 +1549,9 @@ static void free_module(struct module *mod)
remove_sect_attrs(mod);
mod_kobject_remove(mod);

+ /* Remove dynamic debug info */
+ ddebug_remove_module(mod->name);
+
/* Arch-specific cleanup. */
module_arch_cleanup(mod);

--
1.7.1

Also, the patch i wrote addressed the issue of duplicate "module" names.
Although you can't have duplicate names of loadable modules. Dynamic
debug also refers to built in vmlinux sections by using
"KBUILD_MODNAME". These are not necessarily globally unique. Thus, we
probably should be using a better globally unique identifier in the code.
However, that improvement can probably wait...

I will re-post the above as a separate thread. Also, I should probably
post a patch to add myself to the MAINTAINERS file, since I didn't see
Yehuda's patch until now...

> If Yehuda's patch is fixing this already, we still need it backported for
> 2.6.34 stable kernel and further?
>

Yes, patch might need to be adjusted a bit but is still relevant for
2.6.34 stable. The situation was exacerbated by:

commit 3bafeb6247042dcbb72b0141ec7c7107de9f0b99
Author: Linus Torvalds <[email protected]>
Date: Sat Jun 5 11:17:36 2010 -0600

module: move find_module check to end

during the 2.6.35 time frame.

> One question about dynamic debug (unrelated to the mem corruption
> issue):
> Would it make sense to initialize dynamic debug earlier, somewhen shortly
> after __setup is called.
> Then a boot param ddebug_enable="xy" could be added.
> The param could be in /sys/../control format or just be "all"?
> My idea is to be able to track all the pr_debug calls (as) early (as possible)
> at boot up.
> One example is ec.c. Currently it is not possible to see the pr_debug messages
> when EC accesses are done when the ACPI interpreter is started, there
> is no userspace and no sysfs yet.
> Same for PCI related pr_debug messages at early PCI(e) initialization time?
> Would that be possible or do I miss something?
>
> Thanks,
>
> Thomas

definitely a good idea, and something I just had not yet done
yet...after we resolve this issue, I'd be more than happy to review
and/or write a patch to do this. I'm glad ppl are interested in this :)

thanks,

-Jason

2010-07-12 21:19:53

by Yehuda Sadeh

[permalink] [raw]
Subject: Re: Dynamic Debug broken on 2.6.35-rc3?



On Mon, 12 Jul 2010, Jason Baron wrote:
...
> I will re-post the above as a separate thread. Also, I should probably
> post a patch to add myself to the MAINTAINERS file, since I didn't see
> Yehuda's patch until now...

Yeah, I wasn't sure who was responsible for this module. Wasn't clear
from the log.

>
>> If Yehuda's patch is fixing this already, we still need it backported for
>> 2.6.34 stable kernel and further?
>>
>
> Yes, patch might need to be adjusted a bit but is still relevant for
> 2.6.34 stable. The situation was exacerbated by:
>
> commit 3bafeb6247042dcbb72b0141ec7c7107de9f0b99
> Author: Linus Torvalds <[email protected]>
> Date: Sat Jun 5 11:17:36 2010 -0600
>
> module: move find_module check to end
>
> during the 2.6.35 time frame.
>

Right. Looking at the code, pre .35-rc2 loads the dynamic debug code after
verifying that the module is loadable. So it's a bit harder to hit
it, although it can still fail at the following module_finalize() or at
the parse_args(), so the problem is there.

Yehuda

2010-07-12 21:48:01

by Andi Kleen

[permalink] [raw]
Subject: Re: Dynamic Debug broken on 2.6.35-rc3?

On Mon, Jul 12, 2010 at 12:21:40PM -0400, Jason Baron wrote:
> On Mon, Jul 12, 2010 at 04:24:16PM +0200, Thomas Renninger wrote:
> > Hi,
> >
> > it's this one:
> > commit ff49d74ad383f54041378144ca1a229ee9aeaa59
> > Author: Yehuda Sadeh <[email protected]>
> > Date: Sat Jul 3 13:07:35 2010 +1000
> >
> > which touches same code than Jason's fix does.
> > Possibly this patch also addresses (only parts of?) this problem?
> > Jason: Do you mind having a look at the latest git version and review
> > Yehuda's and adjust your patch if still necessary.
>
> So Yehuda's patch is meant to address this issue. I would still move the
> remove code to free_module(), for the case that load_module() fails as
> follows (on top of Yehuda's patch):

One of the known failures with this was a module that was just missing its exit
function, but otherwise working. Is the patch likely to address this
too?

-Andi

2010-07-13 20:36:32

by Yehuda Sadeh

[permalink] [raw]
Subject: Re: Dynamic Debug broken on 2.6.35-rc3?

On Mon, 2010-07-12 at 23:47 +0200, Andi Kleen wrote:
> On Mon, Jul 12, 2010 at 12:21:40PM -0400, Jason Baron wrote:
> > On Mon, Jul 12, 2010 at 04:24:16PM +0200, Thomas Renninger wrote:
> > > Hi,
> > >
> > > it's this one:
> > > commit ff49d74ad383f54041378144ca1a229ee9aeaa59
> > > Author: Yehuda Sadeh <[email protected]>
> > > Date: Sat Jul 3 13:07:35 2010 +1000
> > >
> > > which touches same code than Jason's fix does.
> > > Possibly this patch also addresses (only parts of?) this problem?
> > > Jason: Do you mind having a look at the latest git version and review
> > > Yehuda's and adjust your patch if still necessary.
> >
> > So Yehuda's patch is meant to address this issue. I would still move the
> > remove code to free_module(), for the case that load_module() fails as
> > follows (on top of Yehuda's patch):
>
> One of the known failures with this was a module that was just missing its exit
> function, but otherwise working. Is the patch likely to address this
> too?
>
I'm not sure about a known issue with a missing exit function. Looking
at the latest git, it doesn't seem there is a problem there. However,
there is an issue with the module init function returning an error.
Jason's fix on top of my patch solves it.

Yehuda