2014-07-06 17:28:14

by Sasha Levin

Subject: sched: spinlock recursion in sched_rr_get_interval

Hi all,

While fuzzing with trinity inside a KVM tools guest running the latest -next
kernel I've stumbled on the following spew:

[10062.200152] BUG: spinlock recursion on CPU#11, trinity-c194/2414
[10062.201897] lock: 0xffff88010cfd7740, .magic: dead4ead, .owner: trinity-c194/2414, .owner_cpu: -1
[10062.204432] CPU: 11 PID: 2414 Comm: trinity-c194 Not tainted 3.16.0-rc3-next-20140703-sasha-00024-g2ad7668-dirty #763
[10062.207517] ffff88010cfd7740 ffff8803a429fe48 ffffffffaa4897e4 0000000000000000
[10062.209810] ffff8803c35f0000 ffff8803a429fe68 ffffffffaa47df58 ffff88010cfd7740
[10062.210024] ffffffffab845c77 ffff8803a429fe88 ffffffffaa47df83 ffff88010cfd7740
[10062.210024] Call Trace:
[10062.210024] dump_stack (lib/dump_stack.c:52)
[10062.210024] spin_dump (kernel/locking/spinlock_debug.c:68 (discriminator 6))
[10062.210024] spin_bug (kernel/locking/spinlock_debug.c:76)
[10062.210024] do_raw_spin_lock (kernel/locking/spinlock_debug.c:84 kernel/locking/spinlock_debug.c:135)
[10062.210024] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[10062.210024] ? task_rq_lock (include/linux/sched.h:2885 kernel/sched/core.c:348)
[10062.210024] task_rq_lock (include/linux/sched.h:2885 kernel/sched/core.c:348)
[10062.210024] SyS_sched_rr_get_interval (kernel/sched/core.c:4429 kernel/sched/core.c:4407)
[10062.210024] ? tracesys (arch/x86/kernel/entry_64.S:531)
[10062.210024] tracesys (arch/x86/kernel/entry_64.S:542)


Thanks,
Sasha


2014-07-07 08:30:33

by Peter Zijlstra

Subject: Re: sched: spinlock recursion in sched_rr_get_interval

On Sun, Jul 06, 2014 at 01:27:37PM -0400, Sasha Levin wrote:
> Hi all,
>
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel I've stumbled on the following spew:
>
> [10062.200152] BUG: spinlock recursion on CPU#11, trinity-c194/2414
> [10062.201897] lock: 0xffff88010cfd7740, .magic: dead4ead, .owner: trinity-c194/2414, .owner_cpu: -1
> [10062.204432] CPU: 11 PID: 2414 Comm: trinity-c194 Not tainted 3.16.0-rc3-next-20140703-sasha-00024-g2ad7668-dirty #763
> [10062.207517] ffff88010cfd7740 ffff8803a429fe48 ffffffffaa4897e4 0000000000000000
> [10062.209810] ffff8803c35f0000 ffff8803a429fe68 ffffffffaa47df58 ffff88010cfd7740
> [10062.210024] ffffffffab845c77 ffff8803a429fe88 ffffffffaa47df83 ffff88010cfd7740
> [10062.210024] Call Trace:
> [10062.210024] dump_stack (lib/dump_stack.c:52)
> [10062.210024] spin_dump (kernel/locking/spinlock_debug.c:68 (discriminator 6))
> [10062.210024] spin_bug (kernel/locking/spinlock_debug.c:76)
> [10062.210024] do_raw_spin_lock (kernel/locking/spinlock_debug.c:84 kernel/locking/spinlock_debug.c:135)
> [10062.210024] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
> [10062.210024] ? task_rq_lock (include/linux/sched.h:2885 kernel/sched/core.c:348)
> [10062.210024] task_rq_lock (include/linux/sched.h:2885 kernel/sched/core.c:348)
> [10062.210024] SyS_sched_rr_get_interval (kernel/sched/core.c:4429 kernel/sched/core.c:4407)
> [10062.210024] ? tracesys (arch/x86/kernel/entry_64.S:531)
> [10062.210024] tracesys (arch/x86/kernel/entry_64.S:542)

Cute, but I'm utterly failing to see how that happened :/

All we should be holding there is the rcu_read_lock(); task_rq_lock()
will then take ->pi_lock and then rq->lock.
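
For reference, that path, paraphrased from kernel/sched/core.c around v3.16
(a condensed sketch, not a verbatim copy): sys_sched_rr_get_interval() holds
only rcu_read_lock() around the call, and task_rq_lock() nests ->pi_lock
outside rq->lock:

static struct rq *task_rq_lock(struct task_struct *p, unsigned long *flags)
        __acquires(p->pi_lock)
        __acquires(rq->lock)
{
        struct rq *rq;

        for (;;) {
                /* 1st: the task's ->pi_lock */
                raw_spin_lock_irqsave(&p->pi_lock, *flags);
                rq = task_rq(p);
                /* 2nd: the runqueue lock the splat is about */
                raw_spin_lock(&rq->lock);
                if (likely(rq == task_rq(p)))
                        return rq;
                /* raced with a migration: drop both and retry */
                raw_spin_unlock(&rq->lock);
                raw_spin_unlock_irqrestore(&p->pi_lock, *flags);
        }
}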

Maybe after more tea, when the brain wakes up more or so.. weird.



2014-07-07 13:56:27

by Sasha Levin

Subject: Re: sched: spinlock recursion in sched_rr_get_interval

On 07/07/2014 04:30 AM, Peter Zijlstra wrote:
> On Sun, Jul 06, 2014 at 01:27:37PM -0400, Sasha Levin wrote:
>> Hi all,
>>
>> While fuzzing with trinity inside a KVM tools guest running the latest -next kernel I've stumbled on the following spew:
>>
>> [10062.200152] BUG: spinlock recursion on CPU#11, trinity-c194/2414
>> [10062.201897] lock: 0xffff88010cfd7740, .magic: dead4ead, .owner: trinity-c194/2414, .owner_cpu: -1
>> [10062.204432] CPU: 11 PID: 2414 Comm: trinity-c194 Not tainted 3.16.0-rc3-next-20140703-sasha-00024-g2ad7668-dirty #763
>> [10062.207517] ffff88010cfd7740 ffff8803a429fe48 ffffffffaa4897e4 0000000000000000
>> [10062.209810] ffff8803c35f0000 ffff8803a429fe68 ffffffffaa47df58 ffff88010cfd7740
>> [10062.210024] ffffffffab845c77 ffff8803a429fe88 ffffffffaa47df83 ffff88010cfd7740
>> [10062.210024] Call Trace:
>> [10062.210024] dump_stack (lib/dump_stack.c:52)
>> [10062.210024] spin_dump (kernel/locking/spinlock_debug.c:68 (discriminator 6))
>> [10062.210024] spin_bug (kernel/locking/spinlock_debug.c:76)
>> [10062.210024] do_raw_spin_lock (kernel/locking/spinlock_debug.c:84 kernel/locking/spinlock_debug.c:135)
>> [10062.210024] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
>> [10062.210024] ? task_rq_lock (include/linux/sched.h:2885 kernel/sched/core.c:348)
>> [10062.210024] task_rq_lock (include/linux/sched.h:2885 kernel/sched/core.c:348)
>> [10062.210024] SyS_sched_rr_get_interval (kernel/sched/core.c:4429 kernel/sched/core.c:4407)
>> [10062.210024] ? tracesys (arch/x86/kernel/entry_64.S:531)
>> [10062.210024] tracesys (arch/x86/kernel/entry_64.S:542)
>
> Cute, but I'm utterly failing to see how that happened :/
>
> All we should be holding there is the rcu_read_lock(), task_rq_lock() there will take ->pi_lock and then rq->lock.
>
> Maybe after more tea, when the brain wakes up more or so.. weird.
>

I've also had this one, which looks similar:

[10375.005884] BUG: spinlock recursion on CPU#0, modprobe/10965
[10375.006573] lock: 0xffff8803a0fd7740, .magic: dead4ead, .owner: modprobe/10965, .owner_cpu: 15
[10375.007412] CPU: 0 PID: 10965 Comm: modprobe Tainted: G W 3.16.0-rc3-next-20140704-sasha-00023-g26c0906-dirty #765
[10375.008591] ffff8803a0fd7740 ffff88015507f6d8 ffffffff9d48f5f4 0000000000000000
[10375.009660] ffff880122f90000 ffff88015507f6f8 ffffffff9d483d68 ffff8803a0fd7740
[10375.010323] ffffffff9e8476a1 ffff88015507f718 ffffffff9d483d93 ffff8803a0fd7740
[10375.010323] Call Trace:
[10375.010323] dump_stack (lib/dump_stack.c:52)
[10375.010323] spin_dump (kernel/locking/spinlock_debug.c:68 (discriminator 6))
[10375.010323] spin_bug (kernel/locking/spinlock_debug.c:76)
[10375.010323] do_raw_spin_lock (kernel/locking/spinlock_debug.c:84 kernel/locking/spinlock_debug.c:135)
[10375.010323] _raw_spin_lock_nested (kernel/locking/spinlock.c:362 (discriminator 2))
[10375.010323] ? load_balance (kernel/sched/sched.h:1427 kernel/sched/fair.c:6556)
[10375.010323] ? _raw_spin_lock (kernel/locking/spinlock.c:152)
[10375.010323] ? load_balance (kernel/sched/sched.h:1427 kernel/sched/fair.c:6556)
[10375.010323] load_balance (kernel/sched/sched.h:1427 kernel/sched/fair.c:6556)
[10375.010323] ? put_lock_stats.isra.12 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
[10375.010323] ? _raw_spin_unlock_irqrestore (include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:191)
[10375.010323] pick_next_task_fair (kernel/sched/fair.c:6808 kernel/sched/fair.c:4851)
[10375.010323] __schedule (kernel/sched/core.c:2675 kernel/sched/core.c:2789)
[10375.010323] schedule (kernel/sched/core.c:2836)
[10375.010323] p9_client_rpc (net/9p/client.c:756 (discriminator 9))
[10375.010323] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/paravirt.h:809 include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:191)
[10375.010323] ? bit_waitqueue (kernel/sched/wait.c:291)
[10375.010323] p9_client_walk (net/9p/client.c:1188 (discriminator 3))
[10375.010323] v9fs_vfs_lookup (fs/9p/vfs_inode.c:808)
[10375.010323] ? d_alloc (fs/dcache.c:1475)
[10375.010323] lookup_real (fs/namei.c:1332)
[10375.010323] __lookup_hash (fs/namei.c:1350)
[10375.010323] lookup_slow (fs/namei.c:1460)
[10375.010323] link_path_walk (fs/namei.c:1540 fs/namei.c:1796)
[10375.010323] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
[10375.010323] path_lookupat (fs/namei.c:1943)
[10375.010323] ? kmem_cache_alloc (include/linux/kmemleak.h:44 mm/slub.c:1256 mm/slub.c:2451 mm/slub.c:2459 mm/slub.c:2464)
[10375.010323] ? getname_flags (fs/namei.c:145)
[10375.010323] ? getname_flags (fs/namei.c:145)
[10375.010323] filename_lookup (fs/namei.c:1984)
[10375.010323] user_path_at_empty (fs/namei.c:2135)
[10375.010323] ? debug_smp_processor_id (lib/smp_processor_id.c:57)
[10375.010323] ? put_lock_stats.isra.12 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
[10375.010323] user_path_at (fs/namei.c:2146)
[10375.010323] vfs_fstatat (fs/stat.c:107)
[10375.010323] ? context_tracking_user_exit (./arch/x86/include/asm/paravirt.h:809 (discriminator 2) kernel/context_tracking.c:184 (discriminator 2))
[10375.010323] vfs_stat (fs/stat.c:124)
[10375.010323] SYSC_newstat (fs/stat.c:272)
[10375.010323] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
[10375.010323] ? syscall_trace_enter (include/linux/context_tracking.h:27 arch/x86/kernel/ptrace.c:1461)
[10375.010323] ? tracesys (arch/x86/kernel/entry_64.S:531)
[10375.010323] SyS_newstat (fs/stat.c:267)
[10375.010323] tracesys (arch/x86/kernel/entry_64.S:542)


Thanks,
Sasha

2014-07-07 20:06:01

by Peter Zijlstra

Subject: Re: sched: spinlock recursion in sched_rr_get_interval

On Mon, Jul 07, 2014 at 09:55:43AM -0400, Sasha Levin wrote:
> I've also had this one, which looks similar:
>
> [10375.005884] BUG: spinlock recursion on CPU#0, modprobe/10965
> [10375.006573] lock: 0xffff8803a0fd7740, .magic: dead4ead, .owner: modprobe/10965, .owner_cpu: 15
> [10375.007412] CPU: 0 PID: 10965 Comm: modprobe Tainted: G W 3.16.0-rc3-next-20140704-sasha-00023-g26c0906-dirty #765

Something's fucked; so we have:

debug_spin_lock_before()
SPIN_BUG_ON(lock->owner == current, "recursion");

That check is what produces the report. _HOWEVER_, look at .owner_cpu and
the reporting CPU!! How can the lock owner own the lock on CPU 15 and at
the same time be contending for it on CPU 0? That's impossible.
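
For context, the check in question and its surroundings, from
kernel/locking/spinlock_debug.c of that era (the recursion test itself only
compares ->owner; ->owner_cpu is merely read later for the report):

#define SPIN_BUG_ON(cond, lock, msg) if (unlikely(cond)) spin_bug(lock, msg)

static inline void
debug_spin_lock_before(raw_spinlock_t *lock)
{
        SPIN_BUG_ON(lock->magic != SPINLOCK_MAGIC, lock, "bad magic");
        SPIN_BUG_ON(lock->owner == current, lock, "recursion");
        SPIN_BUG_ON(lock->owner_cpu == raw_smp_processor_id(),
                                                lock, "cpu recursion");
}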

About when-ish did you start seeing things like this? Lemme go stare
hard at recent changes.



2014-07-07 22:48:27

by Sasha Levin

Subject: Re: sched: spinlock recursion in sched_rr_get_interval

On 07/07/2014 04:05 PM, Peter Zijlstra wrote:
> On Mon, Jul 07, 2014 at 09:55:43AM -0400, Sasha Levin wrote:
>> I've also had this one, which looks similar:
>>
>> [10375.005884] BUG: spinlock recursion on CPU#0, modprobe/10965
>> [10375.006573] lock: 0xffff8803a0fd7740, .magic: dead4ead, .owner: modprobe/10965, .owner_cpu: 15
>> [10375.007412] CPU: 0 PID: 10965 Comm: modprobe Tainted: G W 3.16.0-rc3-next-20140704-sasha-00023-g26c0906-dirty #765
>
> Something's fucked; so we have:
>
> debug_spin_lock_before()
> SPIN_BUG_ON(lock->owner == current, "recursion");
>
> Causing that, _HOWEVER_ look at .owner_cpu and the reporting cpu!! How can the lock owner, own the lock on cpu 15 and again contend with it on CPU 0. That's impossible.
>
> About when-ish did you start seeing things like this? Lemme go stare hard at recent changes.
>

~next-20140704 I guess, about when I reported the original issue.


Thanks,
Sasha

2014-07-28 23:08:57

by Sasha Levin

Subject: Re: sched: spinlock recursion in sched_rr_get_interval

On 07/07/2014 06:47 PM, Sasha Levin wrote:
> On 07/07/2014 04:05 PM, Peter Zijlstra wrote:
>> > On Mon, Jul 07, 2014 at 09:55:43AM -0400, Sasha Levin wrote:
>>> >> I've also had this one, which looks similar:
>>> >>
>>> >> [10375.005884] BUG: spinlock recursion on CPU#0, modprobe/10965
>>> >> [10375.006573] lock: 0xffff8803a0fd7740, .magic: dead4ead, .owner: modprobe/10965, .owner_cpu: 15
>>> >> [10375.007412] CPU: 0 PID: 10965 Comm: modprobe Tainted: G W 3.16.0-rc3-next-20140704-sasha-00023-g26c0906-dirty #765
>> >
>> > Something's fucked; so we have:
>> >
>> > debug_spin_lock_before()
>> > SPIN_BUG_ON(lock->owner == current, "recursion");
>> >
>> > Causing that, _HOWEVER_ look at .owner_cpu and the reporting cpu!! How can the lock owner, own the lock on cpu 15 and again contend with it on CPU 0. That's impossible.
>> >
>> > About when-ish did you start seeing things like this? Lemme go stare hard at recent changes.
>> >
> ~next-20140704 I guess, about when I reported the original issue.

Just wanted to add that this is still going on in -next:

[ 860.050433] BUG: spinlock recursion on CPU#33, trinity-subchil/21438
[ 860.051572] lock: 0xffff8805fee10080, .magic: dead4ead, .owner: trinity-subchil/21438, .owner_cpu: -1
[ 860.052943] CPU: 33 PID: 21438 Comm: trinity-subchil Not tainted 3.16.0-rc7-next-20140728-sasha-00029-ge067ff9 #976
[ 860.053998] ffff8805fee10080 ffff8805fe72bab0 ffffffffad464226 ffff8805ba163000
[ 860.054820] ffff8805fe72bad0 ffffffffaa1d7e76 ffff8805fee10080 ffffffffae88d599
[ 860.055641] ffff8805fe72baf0 ffffffffaa1d7ef6 ffff8805fee10080 ffff8805fee10080
[ 860.056485] Call Trace:
[ 860.056818] [<ffffffffad464226>] dump_stack+0x4e/0x7a
[ 860.057788] [<ffffffffaa1d7e76>] spin_dump+0x86/0xe0
[ 860.058620] [<ffffffffaa1d7ef6>] spin_bug+0x26/0x30
[ 860.059487] [<ffffffffaa1d80bf>] do_raw_spin_lock+0x14f/0x1b0
[ 860.060318] [<ffffffffad4a9f01>] _raw_spin_lock+0x61/0x80
[ 860.060318] [<ffffffffaa1b4832>] ? load_balance+0x3a2/0xa50
[ 860.060318] [<ffffffffaa1b4832>] load_balance+0x3a2/0xa50
[ 860.060318] [<ffffffffaa1b541f>] pick_next_task_fair+0x53f/0xb00
[ 860.060318] [<ffffffffaa1b5300>] ? pick_next_task_fair+0x420/0xb00
[ 860.060318] [<ffffffffad4a89ab>] __schedule+0x16b/0x8c0
[ 860.060318] [<ffffffffaa2dbf18>] ? unlink_file_vma+0x38/0x60
[ 860.060318] [<ffffffffaa1a1903>] schedule_preempt_disabled+0x33/0x80
[ 860.060318] [<ffffffffaa1c633e>] mutex_lock_nested+0x1ae/0x620
[ 860.060318] [<ffffffffaa2dbf18>] ? unlink_file_vma+0x38/0x60
[ 860.060318] [<ffffffffaa2dbf18>] unlink_file_vma+0x38/0x60
[ 860.060318] [<ffffffffaa2d2b70>] free_pgtables+0xb0/0x130
[ 860.060318] [<ffffffffaa2df0d4>] exit_mmap+0xc4/0x180
[ 860.060318] [<ffffffffaa168aa3>] mmput+0x73/0x110
[ 860.060318] [<ffffffffaa16fa9a>] do_exit+0x2ca/0xc80
[ 860.060318] [<ffffffffaa1cecdb>] ? trace_hardirqs_on_caller+0xfb/0x280
[ 860.060318] [<ffffffffaa1cee6d>] ? trace_hardirqs_on+0xd/0x10
[ 860.060318] [<ffffffffaa1704de>] do_group_exit+0x4e/0xe0
[ 860.060318] [<ffffffffaa170584>] SyS_exit_group+0x14/0x20
[ 860.060318] [<ffffffffad4ab593>] tracesys+0xe1/0xe6


Thanks,
Sasha

2014-12-26 06:45:32

by Li Bin

Subject: Re: sched: spinlock recursion in sched_rr_get_interval

On 2014/7/8 4:05, Peter Zijlstra wrote:
> On Mon, Jul 07, 2014 at 09:55:43AM -0400, Sasha Levin wrote:
>> I've also had this one, which looks similar:
>>
>> [10375.005884] BUG: spinlock recursion on CPU#0, modprobe/10965
>> [10375.006573] lock: 0xffff8803a0fd7740, .magic: dead4ead, .owner: modprobe/10965, .owner_cpu: 15
>> [10375.007412] CPU: 0 PID: 10965 Comm: modprobe Tainted: G W 3.16.0-rc3-next-20140704-sasha-00023-g26c0906-dirty #765
>
> Something's fucked; so we have:
>
> debug_spin_lock_before()
> SPIN_BUG_ON(lock->owner == current, "recursion");
>

Hello,
Can ACCESS_ONCE() help with this issue? I have no evidence that its absence
is responsible for the issue, but I think it is needed here. Is that right?

SPIN_BUG_ON(ACCESS_ONCE(lock->owner) == current, "recursion");
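
For reference, ACCESS_ONCE() in that era's include/linux/compiler.h is just a
volatile cast, so it only forces a single read of ->owner at the check itself;
it adds no locking or ordering:

#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))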

Thanks,
Li Bin

> Causing that, _HOWEVER_ look at .owner_cpu and the reporting cpu!! How
> can the lock owner, own the lock on cpu 15 and again contend with it on
> CPU 0. That's impossible.
>
> About when-ish did you start seeing things like this? Lemme go stare
> hard at recent changes.
>

2014-12-26 07:03:05

by Sasha Levin

Subject: Re: sched: spinlock recursion in sched_rr_get_interval

On 12/26/2014 01:45 AM, Li Bin wrote:
> On 2014/7/8 4:05, Peter Zijlstra wrote:
>> > On Mon, Jul 07, 2014 at 09:55:43AM -0400, Sasha Levin wrote:
>>> >> I've also had this one, which looks similar:
>>> >>
>>> >> [10375.005884] BUG: spinlock recursion on CPU#0, modprobe/10965
>>> >> [10375.006573] lock: 0xffff8803a0fd7740, .magic: dead4ead, .owner: modprobe/10965, .owner_cpu: 15
>>> >> [10375.007412] CPU: 0 PID: 10965 Comm: modprobe Tainted: G W 3.16.0-rc3-next-20140704-sasha-00023-g26c0906-dirty #765
>> >
>> > Something's fucked; so we have:
>> >
>> > debug_spin_lock_before()
>> > SPIN_BUG_ON(lock->owner == current, "recursion");
>> >
> Hello,
> Does ACCESS_ONCE() can help this issue? I have no evidence that its lack is
> responsible for the issue, but I think here need it indeed. Is that right?
>
> SPIN_BUG_ON(ACCESS_ONCE(lock->owner) == current, "recursion");

Could you explain a bit more why you think it's needed?


Thanks,
Sasha

2014-12-27 09:03:36

by Li Bin

Subject: Re: sched: spinlock recursion in sched_rr_get_interval

On 2014/12/26 15:01, Sasha Levin wrote:
> On 12/26/2014 01:45 AM, Li Bin wrote:
>> On 2014/7/8 4:05, Peter Zijlstra wrote:
>>>> On Mon, Jul 07, 2014 at 09:55:43AM -0400, Sasha Levin wrote:
>>>>>> I've also had this one, which looks similar:
>>>>>>
>>>>>> [10375.005884] BUG: spinlock recursion on CPU#0, modprobe/10965
>>>>>> [10375.006573] lock: 0xffff8803a0fd7740, .magic: dead4ead, .owner: modprobe/10965, .owner_cpu: 15
>>>>>> [10375.007412] CPU: 0 PID: 10965 Comm: modprobe Tainted: G W 3.16.0-rc3-next-20140704-sasha-00023-g26c0906-dirty #765
>>>>
>>>> Something's fucked; so we have:
>>>>
>>>> debug_spin_lock_before()
>>>> SPIN_BUG_ON(lock->owner == current, "recursion");
>>>>
>> Hello,
>> Does ACCESS_ONCE() can help this issue? I have no evidence that its lack is
>> responsible for the issue, but I think here need it indeed. Is that right?
>>
>> SPIN_BUG_ON(ACCESS_ONCE(lock->owner) == current, "recursion");
>
> Could you explain a bit more why do you think it's needed?
>

Oh, just adding ACCESS_ONCE() may not be enough; I think reading lock->owner
without any lock protection is a risk in itself. In short, the issue looks
more like a problem in the spinlock debug mechanism than a real spinlock
recursion.

...
//under no lock protection
if (lock->owner == current)                            //access lock->owner
  |- spin_dump(lock, "recursion");
       |- if (lock->owner && lock->owner != SPINLOCK_OWNER_INIT)  //access lock->owner again
              owner = lock->owner;
...
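
In full, that second read is inside spin_dump(); roughly, paraphrased from
kernel/locking/spinlock_debug.c of that era (condensed, not verbatim):

static void spin_dump(raw_spinlock_t *lock, const char *msg)
{
        struct task_struct *owner = NULL;

        /* ->owner and ->owner_cpu are read again here, still with no lock held */
        if (lock->owner && lock->owner != SPINLOCK_OWNER_INIT)
                owner = lock->owner;
        printk(KERN_EMERG "BUG: spinlock %s on CPU#%d, %s/%d\n",
                msg, raw_smp_processor_id(),
                current->comm, task_pid_nr(current));
        printk(KERN_EMERG " lock: %pS, .magic: %08x, .owner: %s/%d, "
                        ".owner_cpu: %d\n",
                lock, lock->magic,
                owner ? owner->comm : "<none>",
                owner ? task_pid_nr(owner) : -1,
                lock->owner_cpu);
        dump_stack();
}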

Right, or am I missing something?
Thanks,
Li Bin

>
> Thanks,
> Sasha
>
>

2014-12-27 09:52:16

by Davidlohr Bueso

Subject: Re: sched: spinlock recursion in sched_rr_get_interval

On Fri, 2014-12-26 at 14:45 +0800, Li Bin wrote:
> On 2014/7/8 4:05, Peter Zijlstra wrote:
> > On Mon, Jul 07, 2014 at 09:55:43AM -0400, Sasha Levin wrote:
> >> I've also had this one, which looks similar:
> >>
> >> [10375.005884] BUG: spinlock recursion on CPU#0, modprobe/10965
> >> [10375.006573] lock: 0xffff8803a0fd7740, .magic: dead4ead, .owner: modprobe/10965, .owner_cpu: 15
> >> [10375.007412] CPU: 0 PID: 10965 Comm: modprobe Tainted: G W 3.16.0-rc3-next-20140704-sasha-00023-g26c0906-dirty #765
> >
> > Something's fucked; so we have:
> >
> > debug_spin_lock_before()
> > SPIN_BUG_ON(lock->owner == current, "recursion");
> >
>
> Hello,
> Does ACCESS_ONCE() can help this issue? I have no evidence that its lack is
> responsible for the issue, but I think here need it indeed. Is that right?
>
> SPIN_BUG_ON(ACCESS_ONCE(lock->owner) == current, "recursion");

Hmm, I guess on a contended spinlock there's a chance that lock->owner
can change, if the contended lock is acquired, right between the 'cond'
and spin_dump(), which would explain the bogus ->owner related
messages. Of course the same applies to ->owner_cpu. Your ACCESS_ONCE,
however, doesn't really change anything since we still read ->owner
again in spin_dump(). How about something like this (untested)?

diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index 0374a59..86c199a 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -75,15 +75,58 @@ static void spin_bug(raw_spinlock_t *lock, const char *msg)
 	spin_dump(lock, msg);
 }
 
+static void spin_dump_owner(raw_spinlock_t *lock, struct task_struct *owner,
+			    int owner_cpu, const char *msg)
+{
+	printk(KERN_EMERG "BUG: spinlock %s on CPU#%d, %s/%d\n",
+	       msg, raw_smp_processor_id(),
+	       current->comm, task_pid_nr(current));
+	printk(KERN_EMERG " lock: %pS, .magic: %08x, .owner: %s/%d, "
+	       ".owner_cpu: %d\n", lock, lock->magic, owner->comm,
+	       task_pid_nr(owner), owner_cpu);
+
+	dump_stack();
+}
+
+static void spin_bug_owner_lock(raw_spinlock_t *lock)
+{
+	int owner_cpu;
+	struct task_struct *owner;
+
+	if (!debug_locks_off())
+		return;
+
+	owner = ACCESS_ONCE(lock->owner);
+	owner_cpu = ACCESS_ONCE(lock->owner_cpu);
+	if (owner == current)
+		spin_dump_owner(lock, owner, owner_cpu, "recursion");
+	if (owner_cpu == raw_smp_processor_id())
+		spin_dump_owner(lock, owner, owner_cpu, "cpu recursion");
+}
+
+static void spin_bug_owner_unlock(raw_spinlock_t *lock)
+{
+	int owner_cpu;
+	struct task_struct *owner;
+
+	if (!debug_locks_off())
+		return;
+
+	owner = ACCESS_ONCE(lock->owner);
+	owner_cpu = ACCESS_ONCE(lock->owner_cpu);
+	if (owner != current)
+		spin_dump_owner(lock, owner, owner_cpu, "wrong owner");
+	if (owner_cpu != raw_smp_processor_id())
+		spin_dump_owner(lock, owner, owner_cpu, "wrong CPU");
+}
+
 #define SPIN_BUG_ON(cond, lock, msg) if (unlikely(cond)) spin_bug(lock, msg)
 
 static inline void
 debug_spin_lock_before(raw_spinlock_t *lock)
 {
 	SPIN_BUG_ON(lock->magic != SPINLOCK_MAGIC, lock, "bad magic");
-	SPIN_BUG_ON(lock->owner == current, lock, "recursion");
-	SPIN_BUG_ON(lock->owner_cpu == raw_smp_processor_id(),
-							lock, "cpu recursion");
+	spin_bug_owner_lock(lock);
 }
 
 static inline void debug_spin_lock_after(raw_spinlock_t *lock)
@@ -96,9 +139,8 @@ static inline void debug_spin_unlock(raw_spinlock_t *lock)
 {
 	SPIN_BUG_ON(lock->magic != SPINLOCK_MAGIC, lock, "bad magic");
 	SPIN_BUG_ON(!raw_spin_is_locked(lock), lock, "already unlocked");
-	SPIN_BUG_ON(lock->owner != current, lock, "wrong owner");
-	SPIN_BUG_ON(lock->owner_cpu != raw_smp_processor_id(),
-							lock, "wrong CPU");
+	spin_bug_owner_unlock(lock);
+
 	lock->owner = SPINLOCK_OWNER_INIT;
 	lock->owner_cpu = -1;
 }


Thanks,
Davidlohr

2014-12-27 15:54:28

by Sasha Levin

Subject: Re: sched: spinlock recursion in sched_rr_get_interval

On 12/27/2014 04:52 AM, Davidlohr Bueso wrote:
>> Hello,
>> > Does ACCESS_ONCE() can help this issue? I have no evidence that its lack is
>> > responsible for the issue, but I think here need it indeed. Is that right?
>> >
>> > SPIN_BUG_ON(ACCESS_ONCE(lock->owner) == current, "recursion");
> Hmm I guess on a contended spinlock, there's a chance that lock->owner
> can change, if the contended lock is acquired, right between the 'cond'
> and spin_debug(), which would explain the bogus ->owner related
> messages. Of course the same applies to ->owner_cpu. Your ACCESS_ONCE,
> however, doesn't really change anything since we still read ->owner
> again in spin_debug; How about something like this (untested)?

There's a chance that lock->owner would change, but how would you explain
it changing to 'current'?

That is, what race condition specifically creates the
'lock->owner == current' situation in the debug check?

Thanks,
Sasha

2014-12-28 20:17:26

by Davidlohr Bueso

Subject: Re: sched: spinlock recursion in sched_rr_get_interval

On Sat, 2014-12-27 at 10:52 -0500, Sasha Levin wrote:
> On 12/27/2014 04:52 AM, Davidlohr Bueso wrote:
> >> Hello,
> >> > Does ACCESS_ONCE() can help this issue? I have no evidence that its lack is
> >> > responsible for the issue, but I think here need it indeed. Is that right?
> >> >
> >> > SPIN_BUG_ON(ACCESS_ONCE(lock->owner) == current, "recursion");
> > Hmm I guess on a contended spinlock, there's a chance that lock->owner
> > can change, if the contended lock is acquired, right between the 'cond'
> > and spin_debug(), which would explain the bogus ->owner related
> > messages. Of course the same applies to ->owner_cpu. Your ACCESS_ONCE,
> > however, doesn't really change anything since we still read ->owner
> > again in spin_debug; How about something like this (untested)?

I guess we'd need a writer rwlock counterpart too.

> There's a chance that lock->owner would change, but how would you explain
> it changing to 'current'?

So yeah, the above only deals with the weird printk values, not the
actual issue that triggers the BUG_ON. Let's sort this out first and at
least get correct data.

> That is, what race condition specifically creates the
> 'lock->owner == current' situation in the debug check?

Why do you suspect a race as opposed to a legitimate recursion issue?
Although after staring at the code for a while, I cannot see foul play
in sched_rr_get_interval.

Given that all reports show bogus contending CPU and .owner_cpu, I do
wonder if this is actually a symptom of the BUG_ON where something fishy
is going on.. although I have no evidence to support that. I also ran
into this https://lkml.org/lkml/2014/11/7/762 which shows the same bogus
values yet a totally different stack.

Sasha, I ran trinity with CONFIG_DEBUG_SPINLOCK=y all night without
triggering anything. How are you hitting this?

Thanks,
Davidlohr

2014-12-29 14:23:45

by Sasha Levin

Subject: Re: sched: spinlock recursion in sched_rr_get_interval

On 12/28/2014 03:17 PM, Davidlohr Bueso wrote:
> On Sat, 2014-12-27 at 10:52 -0500, Sasha Levin wrote:
>> > There's a chance that lock->owner would change, but how would you explain
>> > it changing to 'current'?
> So yeah, the above only deals with the weird printk values, not the
> actual issue that triggers the BUG_ON. Lets sort this out first and at
> least get correct data.

Is there an issue with weird printk values? I haven't seen a report of
something like that, nor have I seen it myself.

>> > That is, what race condition specifically creates the
>> > 'lock->owner == current' situation in the debug check?
> Why do you suspect a race as opposed to a legitimate recursion issue?
> Although after staring at the code for a while, I cannot see foul play
> in sched_rr_get_interval.
>
> Given that all reports show bogus contending CPU and .owner_cpu, I do
> wonder if this is actually a symptom of the BUG_ON where something fishy
> is going on.. although I have no evidence to support that. I also ran
> into this https://lkml.org/lkml/2014/11/7/762 which shows the same bogus
> values yet a totally different stack.
>
> Sasha, I ran trinity with CONFIG_DEBUG_SPINLOCK=y all night without
> triggering anything. How are you hitting this?

I don't have any reliable way of reproducing it. The only two things I
can think of are:

- Try running as root in a disposable vm
- Try running with really high load (I use ~800 children on 16 vcpu
guests).


Thanks,
Sasha

2014-12-30 01:05:16

by Sasha Levin

Subject: Re: sched: spinlock recursion in sched_rr_get_interval

On 12/28/2014 03:17 PM, Davidlohr Bueso wrote:
>> That is, what race condition specifically creates the
>> > 'lock->owner == current' situation in the debug check?
> Why do you suspect a race as opposed to a legitimate recursion issue?
> Although after staring at the code for a while, I cannot see foul play
> in sched_rr_get_interval.

Because it's not specific to sched_rr_get_interval. I've seen the same
error with different traces, and when the only common thing is the
spinlock debug output looking off, that's what I'm going to blame.

Here's an example of a completely sched-unrelated trace:

[ 1971.009744] BUG: spinlock lockup suspected on CPU#7, trinity-c436/29017
[ 1971.013170] lock: 0xffff88016e0d8af0, .magic: dead4ead, .owner: trinity-c404/541, .owner_cpu: 12
[ 1971.017630] CPU: 7 PID: 29017 Comm: trinity-c436 Not tainted 3.19.0-rc1-next-20141226-sasha-00051-g2dd3d73-dirty #1639
[ 1971.023642] 0000000000000000 0000000000000000 ffff880102fe3000 ffff88014e923658
[ 1971.027654] ffffffffb13501de 0000000000000055 ffff88016e0d8af0 ffff88014e923698
[ 1971.031716] ffffffffa1588205 ffff88016e0d8af0 ffff88016e0d8b00 ffff88016e0d8af0
[ 1971.035695] Call Trace:
[ 1971.037081] dump_stack (lib/dump_stack.c:52)
[ 1971.040175] spin_dump (kernel/locking/spinlock_debug.c:68 (discriminator 8))
[ 1971.043138] do_raw_spin_lock (include/linux/nmi.h:48 kernel/locking/spinlock_debug.c:119 kernel/locking/spinlock_debug.c:137)
[ 1971.046155] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 1971.048801] ? __page_check_address (include/linux/spinlock.h:309 mm/rmap.c:633)
[ 1971.052152] __page_check_address (include/linux/spinlock.h:309 mm/rmap.c:633)
[ 1971.055129] try_to_unmap_one (include/linux/rmap.h:204 mm/rmap.c:1176)
[ 1971.057738] ? vma_interval_tree_iter_next (mm/interval_tree.c:24 (discriminator 4))
[ 1971.061181] rmap_walk (mm/rmap.c:1747 mm/rmap.c:1772)
[ 1971.062582] try_to_munlock (mm/rmap.c:1631)
[ 1971.064829] ? try_to_unmap_nonlinear (mm/rmap.c:1167)
[ 1971.068741] ? SyS_msync (mm/rmap.c:1546)
[ 1971.072252] ? page_get_anon_vma (mm/rmap.c:450)
[ 1971.074321] __munlock_isolated_page (mm/mlock.c:132)
[ 1971.075431] __munlock_pagevec (mm/mlock.c:388)
[ 1971.076345] ? munlock_vma_pages_range (include/linux/mm.h:906 mm/mlock.c:521)
[ 1971.077371] munlock_vma_pages_range (mm/mlock.c:533)
[ 1971.078339] exit_mmap (mm/internal.h:227 mm/mmap.c:2827)
[ 1971.079153] ? retint_restore_args (arch/x86/kernel/entry_64.S:844)
[ 1971.080197] ? __khugepaged_exit (./arch/x86/include/asm/atomic.h:118 include/linux/sched.h:2463 mm/huge_memory.c:2151)
[ 1971.081055] ? __khugepaged_exit (./arch/x86/include/asm/atomic.h:118 include/linux/sched.h:2463 mm/huge_memory.c:2151)
[ 1971.081915] mmput (kernel/fork.c:659)
[ 1971.082578] do_exit (./arch/x86/include/asm/thread_info.h:164 kernel/exit.c:438 kernel/exit.c:732)
[ 1971.083360] ? sched_clock_cpu (kernel/sched/clock.c:311)
[ 1971.084191] ? get_signal (kernel/signal.c:2338)
[ 1971.084984] ? _raw_spin_unlock_irq (./arch/x86/include/asm/paravirt.h:819 include/linux/spinlock_api_smp.h:168 kernel/locking/spinlock.c:199)
[ 1971.085862] do_group_exit (include/linux/sched.h:775 kernel/exit.c:858)
[ 1971.086659] get_signal (kernel/signal.c:2358)
[ 1971.087486] ? sched_clock_local (kernel/sched/clock.c:202)
[ 1971.088359] ? sched_clock (./arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:304)
[ 1971.089142] do_signal (arch/x86/kernel/signal.c:703)
[ 1971.089896] ? vtime_account_user (kernel/sched/cputime.c:701)
[ 1971.090853] ? context_tracking_user_exit (./arch/x86/include/asm/paravirt.h:809 (discriminator 2) kernel/context_tracking.c:144 (discriminator 2))
[ 1971.091950] ? trace_hardirqs_on (kernel/locking/lockdep.c:2609)
[ 1971.092806] do_notify_resume (arch/x86/kernel/signal.c:756)
[ 1971.093618] int_signal (arch/x86/kernel/entry_64.S:587)


Thanks,
Sasha