2011-03-02 20:19:55

by Nadolski, Edmund

[permalink] [raw]
Subject: inconsistent lock state message on 2.6.36.4 stable release

With v2.6.36.4 stable release and CONFIG_PROVE_LOCKING set, I notice the below message during boot (full dmesg attached):

...
[ 7.201343] =================================
[ 7.201344] [ INFO: inconsistent lock state ]
[ 7.201345] 2.6.36.4 #2
[ 7.201346] ---------------------------------
[ 7.201347] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 7.201349] kswapd1/94 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 7.201350] (cpu_hotplug.lock){+.+.?.}, at: [<ffffffff81061f4e>] get_online_cpus+0x2e/0x50
[ 7.201356] {RECLAIM_FS-ON-W} state was registered at:
[ 7.201357] [<ffffffff81097b8c>] mark_held_locks+0x6c/0xa0
[ 7.201361] [<ffffffff81097c61>] lockdep_trace_alloc+0xa1/0xe0
[ 7.201363] [<ffffffff811433f1>] kmem_cache_alloc+0x41/0x180
[ 7.201366] [<ffffffff812c0b60>] idr_pre_get+0x60/0x90
[ 7.201369] [<ffffffff812c0bb3>] ida_pre_get+0x23/0xe0
[ 7.201371] [<ffffffff8107d435>] create_worker+0x55/0x190
[ 7.201374] [<ffffffff8156972a>] workqueue_cpu_callback+0xb9/0x232
[ 7.201377] [<ffffffff81582b46>] notifier_call_chain+0x56/0x80
[ 7.201381] [<ffffffff8108828e>] __raw_notifier_call_chain+0xe/0x10
[ 7.201384] [<ffffffff81061dc0>] __cpu_notify+0x20/0x40
[ 7.201385] [<ffffffff815787f3>] _cpu_up+0x73/0x113
[ 7.201388] [<ffffffff815788e4>] cpu_up+0x51/0x61
[ 7.201389] [<ffffffff81acf6d7>] kernel_init+0xc9/0x1e8
[ 7.201393] [<ffffffff8100bf24>] kernel_thread_helper+0x4/0x10
[ 7.201396] irq event stamp: 33
[ 7.201396] hardirqs last enabled at (33): [<ffffffff8157e9d0>] _raw_spin_unlock_irqrestore+0x40/0x70
[ 7.201400] hardirqs last disabled at (32): [<ffffffff8157e2ce>] _raw_spin_lock_irqsave+0x2e/0x70
[ 7.201402] softirqs last enabled at (0): [<ffffffff8105e4f3>] copy_process+0x5c3/0x1310
[ 7.201404] softirqs last disabled at (0): [<(null)>] (null)
[ 7.201405]
[ 7.201405] other info that might help us debug this:
[ 7.201406] no locks held by kswapd1/94.
[ 7.201407]
[ 7.201407] stack backtrace:
[ 7.201409] Pid: 94, comm: kswapd1 Not tainted 2.6.36.4 #2
[ 7.201410] Call Trace:
[ 7.201412] [<ffffffff81096c2a>] print_usage_bug+0x18a/0x190
[ 7.201415] [<ffffffff81018b0f>] ? save_stack_trace+0x2f/0x50
[ 7.201417] [<ffffffff81096f90>] ? check_usage_forwards+0x0/0x100
[ 7.201419] [<ffffffff81097a54>] mark_lock+0x344/0x410
[ 7.201421] [<ffffffff810985e1>] __lock_acquire+0x431/0x14c0
[ 7.201424] [<ffffffff8106f67c>] ? lock_timer_base+0x3c/0x70
[ 7.201427] [<ffffffff812cf884>] ? do_raw_spin_lock+0x54/0x150
[ 7.201429] [<ffffffff81099710>] lock_acquire+0xa0/0x150
[ 7.201430] [<ffffffff81061f4e>] ? get_online_cpus+0x2e/0x50
[ 7.201432] [<ffffffff8157c8d7>] __mutex_lock_common+0x47/0x3d0
[ 7.201434] [<ffffffff81061f4e>] ? get_online_cpus+0x2e/0x50
[ 7.201436] [<ffffffff81061f4e>] ? get_online_cpus+0x2e/0x50
[ 7.201438] [<ffffffff8108290a>] ? prepare_to_wait+0x2a/0x90
[ 7.201440] [<ffffffff812cf884>] ? do_raw_spin_lock+0x54/0x150
[ 7.201442] [<ffffffff812cf884>] ? do_raw_spin_lock+0x54/0x150
[ 7.201444] [<ffffffff8157e9d0>] ? _raw_spin_unlock_irqrestore+0x40/0x70
[ 7.201445] [<ffffffff8157cd15>] mutex_lock_nested+0x35/0x40
[ 7.201447] [<ffffffff81061f4e>] get_online_cpus+0x2e/0x50
[ 7.201450] [<ffffffff81118d8e>] restore_pgdat_percpu_threshold+0x1e/0x110
[ 7.201453] [<ffffffff8111051f>] kswapd+0x34f/0x410
[ 7.201455] [<ffffffff81082690>] ? autoremove_wake_function+0x0/0x40
[ 7.201456] [<ffffffff811101d0>] ? kswapd+0x0/0x410
[ 7.201458] [<ffffffff81082126>] kthread+0xa6/0xb0
[ 7.201460] [<ffffffff8100bf24>] kernel_thread_helper+0x4/0x10
[ 7.201462] [<ffffffff8157eed0>] ? restore_args+0x0/0x30
[ 7.201464] [<ffffffff81082080>] ? kthread+0x0/0xb0
[ 7.201465] [<ffffffff8100bf20>] ? kernel_thread_helper+0x0/0x10
[ 7.582914] pcieport 0000:00:1c.5: setting latency timer to 64
[ 7.589538] alloc irq_desc for 69 on node -1
[ 7.594577] alloc kstat_irqs on node -1
[ 7.599139] pcieport 0000:00:1c.5: irq 69 for MSI/MSI-X
...


Per git bisect, it seems to have originated with this commit:

$ git bisect good
c04eb9683fbb6374275309b859fcbf02e1db2c78 is the first bad commit
commit c04eb9683fbb6374275309b859fcbf02e1db2c78
Author: Mel Gorman <[email protected]>
Date: Thu Jan 13 15:45:41 2011 -0800

mm: page allocator: adjust the per-cpu counter threshold when memory is low

commit 88f5acf88ae6a9778f6d25d0d5d7ec2d57764a97 upstream.

The message goes away after I do a git revert on this commit. I've also seen this message on 2.6.37.2, but not on 2.6.38-rc7.

Regards,
Ed Nadolski
[email protected]







Attachments:
dmesg.2.6.36.4 (77.55 kB)
dmesg.2.6.36.4

2011-03-02 20:24:52

by Greg KH

[permalink] [raw]
Subject: Re: inconsistent lock state message on 2.6.36.4 stable release

On Wed, Mar 02, 2011 at 01:19:48PM -0700, Nadolski, Edmund wrote:
> With v2.6.36.4 stable release and CONFIG_PROVE_LOCKING set, I notice
> the below message during boot (full dmesg attached):

2.6.36 is now "end of life" so there's really no need to test, debug, or
report problems here as there's nothing we can do about it.

> Per git bisect, it seems to have originated with this commit:
>
> $ git bisect good
> c04eb9683fbb6374275309b859fcbf02e1db2c78 is the first bad commit
> commit c04eb9683fbb6374275309b859fcbf02e1db2c78
> Author: Mel Gorman <[email protected]>
> Date: Thu Jan 13 15:45:41 2011 -0800
>
> mm: page allocator: adjust the per-cpu counter threshold when memory is low
>
> commit 88f5acf88ae6a9778f6d25d0d5d7ec2d57764a97 upstream.
>
> The message goes away after I do a git revert on this commit. I've
> also seen this message on 2.6.37.2, but not on 2.6.38-rc7.

Care to figure out what commit fixed this so that we can backport it to
the .37-stable kernel?

git bisect can be used to do this as well.

thanks,

greg k-h

2011-03-02 21:04:10

by Mel Gorman

[permalink] [raw]
Subject: Re: inconsistent lock state message on 2.6.36.4 stable release

On Wed, Mar 02, 2011 at 03:25:15PM -0500, Greg KH wrote:
> On Wed, Mar 02, 2011 at 01:19:48PM -0700, Nadolski, Edmund wrote:
> > With v2.6.36.4 stable release and CONFIG_PROVE_LOCKING set, I notice
> > the below message during boot (full dmesg attached):
>
> 2.6.36 is now "end of life" so there's really no need to test, debug, or
> report problems here as there's nothing we can do about it.
>
> > Per git bisect, it seems to have originated with this commit:
> >
> > $ git bisect good
> > c04eb9683fbb6374275309b859fcbf02e1db2c78 is the first bad commit
> > commit c04eb9683fbb6374275309b859fcbf02e1db2c78
> > Author: Mel Gorman <[email protected]>
> > Date: Thu Jan 13 15:45:41 2011 -0800
> >
> > mm: page allocator: adjust the per-cpu counter threshold when memory is low
> >
> > commit 88f5acf88ae6a9778f6d25d0d5d7ec2d57764a97 upstream.
> >
> > The message goes away after I do a git revert on this commit. I've
> > also seen this message on 2.6.37.2, but not on 2.6.38-rc7.
>
> Care to figure out what commit fixed this so that we can backport it to
> the .37-stable kernel?
>
> git bisect can be used to do this as well.
>

It looks like the call to get_online_cpus() that is causing the problem. If
so, I believe it to be fixed by the commit [b44129b3: mm: vmstat: use a
single setter function and callback for adjusting percpu thresholds].
Edmund, care to test with that patch applied?

Thanks.

--
Mel Gorman
SUSE Labs

2011-03-02 23:23:13

by Nadolski, Edmund

[permalink] [raw]
Subject: RE: inconsistent lock state message on 2.6.36.4 stable release

> -----Original Message-----
> From: Mel Gorman [mailto:[email protected]]
> Sent: Wednesday, March 02, 2011 2:04 PM
> To: Greg KH
> Cc: Nadolski, Edmund; [email protected]; Williams, Dan J
> Subject: Re: inconsistent lock state message on 2.6.36.4 stable release
>
> On Wed, Mar 02, 2011 at 03:25:15PM -0500, Greg KH wrote:
> > On Wed, Mar 02, 2011 at 01:19:48PM -0700, Nadolski, Edmund wrote:
> > > With v2.6.36.4 stable release and CONFIG_PROVE_LOCKING set, I
> notice
> > > the below message during boot (full dmesg attached):
> >
> > 2.6.36 is now "end of life" so there's really no need to test, debug,
> or
> > report problems here as there's nothing we can do about it.
> >
> > > Per git bisect, it seems to have originated with this commit:
> > >
> > > $ git bisect good
> > > c04eb9683fbb6374275309b859fcbf02e1db2c78 is the first bad commit
> > > commit c04eb9683fbb6374275309b859fcbf02e1db2c78
> > > Author: Mel Gorman <[email protected]>
> > > Date: Thu Jan 13 15:45:41 2011 -0800
> > >
> > > mm: page allocator: adjust the per-cpu counter threshold when
> memory is low
> > >
> > > commit 88f5acf88ae6a9778f6d25d0d5d7ec2d57764a97 upstream.
> > >
> > > The message goes away after I do a git revert on this commit. I've
> > > also seen this message on 2.6.37.2, but not on 2.6.38-rc7.
> >
> > Care to figure out what commit fixed this so that we can backport it
> to
> > the .37-stable kernel?
> >
> > git bisect can be used to do this as well.
> >
>
> It looks like the call to get_online_cpus() that is causing the
> problem. If
> so, I believe it to be fixed by the commit [b44129b3: mm: vmstat: use a
> single setter function and callback for adjusting percpu thresholds].
> Edmund, care to test with that patch applied?

This patch looks good, I've applied it onto 2.6.37.2 and I no longer see the message.

Thanks!
Ed

2011-03-03 10:54:53

by Mel Gorman

[permalink] [raw]
Subject: Re: inconsistent lock state message on 2.6.36.4 stable release

On Wed, Mar 02, 2011 at 04:22:53PM -0700, Nadolski, Edmund wrote:
> >
> > On Wed, Mar 02, 2011 at 03:25:15PM -0500, Greg KH wrote:
> > > On Wed, Mar 02, 2011 at 01:19:48PM -0700, Nadolski, Edmund wrote:
> > > > With v2.6.36.4 stable release and CONFIG_PROVE_LOCKING set, I
> > notice
> > > > the below message during boot (full dmesg attached):
> > >
> > > 2.6.36 is now "end of life" so there's really no need to test, debug,
> > or
> > > report problems here as there's nothing we can do about it.
> > >
> > > > Per git bisect, it seems to have originated with this commit:
> > > >
> > > > $ git bisect good
> > > > c04eb9683fbb6374275309b859fcbf02e1db2c78 is the first bad commit
> > > > commit c04eb9683fbb6374275309b859fcbf02e1db2c78
> > > > Author: Mel Gorman <[email protected]>
> > > > Date: Thu Jan 13 15:45:41 2011 -0800
> > > >
> > > > mm: page allocator: adjust the per-cpu counter threshold when
> > > > memory is low
> > > >
> > > > commit 88f5acf88ae6a9778f6d25d0d5d7ec2d57764a97 upstream.
> > > >
> > > > The message goes away after I do a git revert on this commit. I've
> > > > also seen this message on 2.6.37.2, but not on 2.6.38-rc7.
> > >
> > > Care to figure out what commit fixed this so that we can backport it
> > to
> > > the .37-stable kernel?
> > >
> > > git bisect can be used to do this as well.
> > >
> >
> > It looks like the call to get_online_cpus() that is causing the
> > problem. If
> > so, I believe it to be fixed by the commit [b44129b3: mm: vmstat: use a
> > single setter function and callback for adjusting percpu thresholds].
> > Edmund, care to test with that patch applied?
>
> This patch looks good, I've applied it onto 2.6.37.2 and I no longer see the message.
>

Good stuff, thanks.

I'll send the patch to the original reviewers and cc stable to ask for
it to be picked up.

--
Mel Gorman
SUSE Labs