2009-07-07 11:50:18

by Sergey Senozhatsky

[permalink] [raw]
Subject: kmemleak not tainted

Hello,

Just noticed:
kernel: [ 1917.133154] INFO: RCU detected CPU 0 stall (t=485140/3000 jiffies)
kernel: [ 1917.133154] Pid: 511, comm: kmemleak Not tainted 2.6.31-rc2-nv-git1-00254-ga4e2f0a-dirty #2
kernel: [ 1917.133154] Call Trace:
kernel: [ 1917.133154] [<c141784d>] ? printk+0x23/0x36
kernel: [ 1917.133154] [<c10a81e0>] __rcu_pending+0x140/0x210
kernel: [ 1917.133154] [<c10a82da>] rcu_pending+0x2a/0x70
kernel: [ 1917.133154] [<c1051b8f>] update_process_times+0x3f/0x80
kernel: [ 1917.133154] [<c107148f>] tick_sched_timer+0x6f/0xf0
kernel: [ 1917.133154] [<c10640a6>] __run_hrtimer+0x56/0xe0
kernel: [ 1917.133154] [<c1071420>] ? tick_sched_timer+0x0/0xf0
kernel: [ 1917.133154] [<c1071420>] ? tick_sched_timer+0x0/0xf0
kernel: [ 1917.133154] [<c1064aa5>] hrtimer_interrupt+0x145/0x270
kernel: [ 1917.133154] [<c101c48c>] smp_apic_timer_interrupt+0x5c/0xb0
kernel: [ 1917.133154] [<c12582b8>] ? trace_hardirqs_off_thunk+0xc/0x14
kernel: [ 1917.133154] [<c1003e36>] apic_timer_interrupt+0x36/0x3c
kernel: [ 1917.133154] [<c141ad61>] ? _read_unlock_irqrestore+0x41/0x70
kernel: [ 1917.133154] [<c10f7415>] find_and_get_object+0x75/0xe0
kernel: [ 1917.133154] [<c10f73a0>] ? find_and_get_object+0x0/0xe0
kernel: [ 1917.133154] [<c10f7577>] scan_block+0x87/0x110
kernel: [ 1917.133154] [<c10f7880>] kmemleak_scan+0x280/0x420
kernel: [ 1917.133154] [<c10f7600>] ? kmemleak_scan+0x0/0x420
kernel: [ 1917.133154] [<c10f80b0>] ? kmemleak_scan_thread+0x0/0xc0
kernel: [ 1917.133154] [<c10f8100>] kmemleak_scan_thread+0x50/0xc0
kernel: [ 1917.133154] [<c105ff54>] kthread+0x84/0x90
kernel: [ 1917.133154] [<c105fed0>] ? kthread+0x0/0x90
kernel: [ 1917.133154] [<c100401b>] kernel_thread_helper+0x7/0x1c


static struct kmemleak_object *find_and_get_object(unsigned long ptr, int alias)
{
unsigned long flags;
struct kmemleak_object *object = NULL;

rcu_read_lock();
read_lock_irqsave(&kmemleak_lock, flags);
if (ptr >= min_addr && ptr < max_addr)
object = lookup_object(ptr, alias);
>> read_unlock_irqrestore(&kmemleak_lock, flags);

/* check whether the object is still available */
if (object && !get_object(object))
object = NULL;
rcu_read_unlock();

return object;
}

I'm not sure where this is kmemleak's problem, since with 31 I see lots of 'not tainted' reports on my laptop.

Sergey


Attachments:
(No filename) (2.27 kB)
signature.asc (315.00 B)
Digital signature
Download all attachments

2009-07-07 12:48:11

by Catalin Marinas

[permalink] [raw]
Subject: Re: kmemleak not tainted

On Tue, 2009-07-07 at 14:51 +0300, Sergey Senozhatsky wrote:
> kernel: [ 1917.133154] INFO: RCU detected CPU 0 stall (t=485140/3000 jiffies)

That's the relevant message. With CONFIG_RCU_CPU_STALL_DETECTOR you may
get these messages.

> static struct kmemleak_object *find_and_get_object(unsigned long ptr, int alias)
> {
> unsigned long flags;
> struct kmemleak_object *object = NULL;
>
> rcu_read_lock();
> read_lock_irqsave(&kmemleak_lock, flags);
> if (ptr >= min_addr && ptr < max_addr)
> object = lookup_object(ptr, alias);
> >> read_unlock_irqrestore(&kmemleak_lock, flags);
>
> /* check whether the object is still available */
> if (object && !get_object(object))
> object = NULL;
> rcu_read_unlock();
>
> return object;
> }

It just happened here because that's where the interrupts were enabled
and the timer routine invoked. The rcu-locked region above should be
pretty short (just a tree look-up).

What I think happens is that the kmemleak thread runs for several
seconds for scanning the memory and there may not be any context
switches. I have a patch to add more cond_resched() calls throughout the
kmemleak_scan() function which I hope will get merged. I don't get any
of these messages with CONFIG_PREEMPT enabled.

--
Catalin

2009-07-07 13:15:44

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: kmemleak not tainted

> On Tue, 2009-07-07 at 14:51 +0300, Sergey Senozhatsky wrote:
> > kernel: [ 1917.133154] INFO: RCU detected CPU 0 stall (t=485140/3000 jiffies)
>
> That's the relevant message. With CONFIG_RCU_CPU_STALL_DETECTOR you may
> get these messages.
>
> > static struct kmemleak_object *find_and_get_object(unsigned long ptr, int alias)
> > {
> > unsigned long flags;
> > struct kmemleak_object *object = NULL;
> >
> > rcu_read_lock();
> > read_lock_irqsave(&kmemleak_lock, flags);
> > if (ptr >= min_addr && ptr < max_addr)
> > object = lookup_object(ptr, alias);
> > >> read_unlock_irqrestore(&kmemleak_lock, flags);
> >
> > /* check whether the object is still available */
> > if (object && !get_object(object))
> > object = NULL;
> > rcu_read_unlock();
> >
> > return object;
> > }
>
> It just happened here because that's where the interrupts were enabled
> and the timer routine invoked. The rcu-locked region above should be
> pretty short (just a tree look-up).
>
> What I think happens is that the kmemleak thread runs for several
> seconds for scanning the memory and there may not be any context
> switches. I have a patch to add more cond_resched() calls throughout the
> kmemleak_scan() function which I hope will get merged.

Cc me please.
Thanks.


> I don't get any of these messages with CONFIG_PREEMPT enabled.
>

It started with rc2-git1 (may be). Almost every scan ends with RCU pending.

[ 1917.133154] INFO: RCU detected CPU 0 stall (t=485140/3000 jiffies)
[ 1917.133154] Pid: 511, comm: kmemleak Not tainted 2.6.31-rc2-nv-git1-00254-ga4e2f0a-dirty #2
[ 1917.133154] Call Trace:
[ 1917.133154] [<c141784d>] ? printk+0x23/0x36
[ 1917.133154] [<c10a81e0>] __rcu_pending+0x140/0x210
[ 1917.133154] [<c10a82da>] rcu_pending+0x2a/0x70
[ 1917.133154] [<c1051b8f>] update_process_times+0x3f/0x80
[ 1917.133154] [<c107148f>] tick_sched_timer+0x6f/0xf0
[ 1917.133154] [<c10640a6>] __run_hrtimer+0x56/0xe0
[ 1917.133154] [<c1071420>] ? tick_sched_timer+0x0/0xf0
[ 1917.133154] [<c1071420>] ? tick_sched_timer+0x0/0xf0
[ 1917.133154] [<c1064aa5>] hrtimer_interrupt+0x145/0x270
[ 1917.133154] [<c101c48c>] smp_apic_timer_interrupt+0x5c/0xb0
[ 1917.133154] [<c12582b8>] ? trace_hardirqs_off_thunk+0xc/0x14
[ 1917.133154] [<c1003e36>] apic_timer_interrupt+0x36/0x3c
[ 1917.133154] [<c141ad61>] ? _read_unlock_irqrestore+0x41/0x70
[ 1917.133154] [<c10f7415>] find_and_get_object+0x75/0xe0
[ 1917.133154] [<c10f73a0>] ? find_and_get_object+0x0/0xe0
[ 1917.133154] [<c10f7577>] scan_block+0x87/0x110
[ 1917.133154] [<c10f7880>] kmemleak_scan+0x280/0x420
[ 1917.133154] [<c10f7600>] ? kmemleak_scan+0x0/0x420
[ 1917.133154] [<c10f80b0>] ? kmemleak_scan_thread+0x0/0xc0
[ 1917.133154] [<c10f8100>] kmemleak_scan_thread+0x50/0xc0
[ 1917.133154] [<c105ff54>] kthread+0x84/0x90
[ 1917.133154] [<c105fed0>] ? kthread+0x0/0x90
[ 1917.133154] [<c100401b>] kernel_thread_helper+0x7/0x1c
[ 1979.742347] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
[ 2589.860586] INFO: RCU detected CPU 1 stall (t=686958/3000 jiffies)
[ 2589.860586] Pid: 511, comm: kmemleak Not tainted 2.6.31-rc2-nv-git1-00254-ga4e2f0a-dirty #2
[ 2589.860586] Call Trace:
[ 2589.860586] [<c141784d>] ? printk+0x23/0x36
[ 2589.860586] [<c10a81e0>] __rcu_pending+0x140/0x210
[ 2589.860586] [<c10a82da>] rcu_pending+0x2a/0x70
[ 2589.860586] [<c1051b8f>] update_process_times+0x3f/0x80
[ 2589.860586] [<c107148f>] tick_sched_timer+0x6f/0xf0
[ 2589.860586] [<c10640a6>] __run_hrtimer+0x56/0xe0
[ 2589.860586] [<c1071420>] ? tick_sched_timer+0x0/0xf0
[ 2589.860586] [<c1071420>] ? tick_sched_timer+0x0/0xf0
[ 2589.860586] [<c1064aa5>] hrtimer_interrupt+0x145/0x270
[ 2589.860586] [<c104b6c8>] ? _local_bh_enable+0x68/0xd0
[ 2589.860586] [<c101c48c>] smp_apic_timer_interrupt+0x5c/0xb0
[ 2589.860586] [<c12582b8>] ? trace_hardirqs_off_thunk+0xc/0x14
[ 2589.860586] [<c1003e36>] apic_timer_interrupt+0x36/0x3c
[ 2589.860586] [<c10795d5>] ? lock_acquire+0xb5/0x120
[ 2589.860586] [<c10f73a0>] ? find_and_get_object+0x0/0xe0
[ 2589.860586] [<c10f73a0>] ? find_and_get_object+0x0/0xe0
[ 2589.860586] [<c10f73eb>] find_and_get_object+0x4b/0xe0
[ 2589.860586] [<c10f73a0>] ? find_and_get_object+0x0/0xe0
[ 2589.860586] [<c10f7577>] scan_block+0x87/0x110
[ 2589.860586] [<c10f7880>] kmemleak_scan+0x280/0x420
[ 2589.860586] [<c10f7600>] ? kmemleak_scan+0x0/0x420
[ 2589.860586] [<c10f80b0>] ? kmemleak_scan_thread+0x0/0xc0
[ 2589.860586] [<c10f8100>] kmemleak_scan_thread+0x50/0xc0
[ 2589.860586] [<c105ff54>] kthread+0x84/0x90
[ 2589.860586] [<c105fed0>] ? kthread+0x0/0x90
[ 2589.860586] [<c100401b>] kernel_thread_helper+0x7/0x1c
[ 3089.007168] r8169: eth1: link down
[ 3245.897188] INFO: RCU detected CPU 1 stall (t=883769/3000 jiffies)
[ 3245.897188] Pid: 511, comm: kmemleak Not tainted 2.6.31-rc2-nv-git1-00254-ga4e2f0a-dirty #2
[ 3245.897188] Call Trace:
[ 3245.897188] [<c141784d>] ? printk+0x23/0x36
[ 3245.897188] [<c10a81e0>] __rcu_pending+0x140/0x210
[ 3245.897188] [<c10a82da>] rcu_pending+0x2a/0x70
[ 3245.897188] [<c1051b8f>] update_process_times+0x3f/0x80
[ 3245.897188] [<c107148f>] tick_sched_timer+0x6f/0xf0
[ 3245.897188] [<c10640a6>] __run_hrtimer+0x56/0xe0
[ 3245.897188] [<c1071420>] ? tick_sched_timer+0x0/0xf0
[ 3245.897188] [<c1071420>] ? tick_sched_timer+0x0/0xf0
[ 3245.897188] [<c1064aa5>] hrtimer_interrupt+0x145/0x270
[ 3245.897188] [<c101c48c>] smp_apic_timer_interrupt+0x5c/0xb0
[ 3245.897188] [<c12582b8>] ? trace_hardirqs_off_thunk+0xc/0x14
[ 3245.897188] [<c1003e36>] apic_timer_interrupt+0x36/0x3c
[ 3245.897188] [<c141ad61>] ? _read_unlock_irqrestore+0x41/0x70
[ 3245.897188] [<c10f745f>] find_and_get_object+0xbf/0xe0
[ 3245.897188] [<c10f73a0>] ? find_and_get_object+0x0/0xe0
[ 3245.897188] [<c10f7577>] scan_block+0x87/0x110
[ 3245.897188] [<c10f7880>] kmemleak_scan+0x280/0x420
[ 3245.897188] [<c10f7600>] ? kmemleak_scan+0x0/0x420
[ 3245.897188] [<c10f80b0>] ? kmemleak_scan_thread+0x0/0xc0
[ 3245.897188] [<c10f8100>] kmemleak_scan_thread+0x50/0xc0
[ 3245.897188] [<c105ff54>] kthread+0x84/0x90
[ 3245.897188] [<c105fed0>] ? kthread+0x0/0x90
[ 3245.897188] [<c100401b>] kernel_thread_helper+0x7/0x1c
[ 3290.435108] kmemleak: 5 new suspected memory leaks (see /sys/kernel/debug/kmemleak)


Hm.. Something is broken...
cat /.../kmemleak
[ 7933.537868] ================================================
[ 7933.537873] [ BUG: lock held when returning to user space! ]
[ 7933.537876] ------------------------------------------------
[ 7933.537880] cat/2897 is leaving the kernel with locks still held!
[ 7933.537884] 1 lock held by cat/2897:
[ 7933.537887] #0: (scan_mutex){+.+.+.}, at: [<c10f717c>] kmemleak_open+0x4c/0x80


> --
> Catalin
>

Sergey


Attachments:
(No filename) (6.60 kB)
signature.asc (315.00 B)
Digital signature
Download all attachments

2009-07-07 13:22:59

by Catalin Marinas

[permalink] [raw]
Subject: Re: kmemleak not tainted

On Tue, 2009-07-07 at 16:17 +0300, Sergey Senozhatsky wrote:
> > On Tue, 2009-07-07 at 14:51 +0300, Sergey Senozhatsky wrote:
> > > kernel: [ 1917.133154] INFO: RCU detected CPU 0 stall (t=485140/3000 jiffies)
[...]
> > What I think happens is that the kmemleak thread runs for several
> > seconds for scanning the memory and there may not be any context
> > switches. I have a patch to add more cond_resched() calls throughout the
> > kmemleak_scan() function which I hope will get merged.
[...]
> > I don't get any of these messages with CONFIG_PREEMPT enabled.
>
> It started with rc2-git1 (may be). Almost every scan ends with RCU pending.

Should I assume that CONFIG_PREEMPT is disabled on your system?

The branch with the pending kmemleak patches is below (I sent Linus a
pull request):

http://www.linux-arm.org/git?p=linux-2.6.git;a=shortlog;h=kmemleak

You can try the "Add more cond_resched() calls..." patch and see if it
makes any difference.

> Hm.. Something is broken...
> cat /.../kmemleak
> [ 7933.537868] ================================================
> [ 7933.537873] [ BUG: lock held when returning to user space! ]
> [ 7933.537876] ------------------------------------------------
> [ 7933.537880] cat/2897 is leaving the kernel with locks still held!
> [ 7933.537884] 1 lock held by cat/2897:
> [ 7933.537887] #0: (scan_mutex){+.+.+.}, at: [<c10f717c>] kmemleak_open+0x4c/0x80

That the "Do not acquire scan_mutex in kmemleak_open()" patch in the
same branch.

--
Catalin

2009-07-07 13:43:28

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: kmemleak not tainted

On (07/07/09 14:22), Catalin Marinas wrote:
> > > > kernel: [ 1917.133154] INFO: RCU detected CPU 0 stall (t=485140/3000 jiffies)
> [...]
> > > What I think happens is that the kmemleak thread runs for several
> > > seconds for scanning the memory and there may not be any context
> > > switches. I have a patch to add more cond_resched() calls throughout the
> > > kmemleak_scan() function which I hope will get merged.
> [...]
> > > I don't get any of these messages with CONFIG_PREEMPT enabled.
> >
> > It started with rc2-git1 (may be). Almost every scan ends with RCU pending.
>
> Should I assume that CONFIG_PREEMPT is disabled on your system?
Yeah, sorry.
# CONFIG_PREEMPT is not set
/* My .config is in attach. */

>
> The branch with the pending kmemleak patches is below (I sent Linus a
> pull request):
>
> http://www.linux-arm.org/git?p=linux-2.6.git;a=shortlog;h=kmemleak
>
> You can try the "Add more cond_resched() calls..." patch and see if it
> makes any difference.
>
Ok. I'll try.

> > Hm.. Something is broken...
> > cat /.../kmemleak
> > [ 7933.537868] ================================================
> > [ 7933.537873] [ BUG: lock held when returning to user space! ]
> > [ 7933.537876] ------------------------------------------------
> > [ 7933.537880] cat/2897 is leaving the kernel with locks still held!
> > [ 7933.537884] 1 lock held by cat/2897:
> > [ 7933.537887] #0: (scan_mutex){+.+.+.}, at: [<c10f717c>] kmemleak_open+0x4c/0x80
>
> That the "Do not acquire scan_mutex in kmemleak_open()" patch in the
> same branch.
>
> --
> Catalin
>


Thanks,
Sergey


Attachments:
(No filename) (0.00 B)
signature.asc (315.00 B)
Digital signature
Download all attachments