2013-04-19 21:24:14

by Linus Torvalds

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On Fri, Apr 19, 2013 at 2:09 PM, Sedat Dilek <[email protected]> wrote:
>
> I have applied all three patches and still see call traces.
> There are also new AppArmor-related messages.

Can you try the crazy rcu double-free debug hack?

See

https://lkml.org/lkml/2013/3/30/113

and I'm re-attaching the ugly-ass crazy hack patch here too..

Linus


Attachments:
rcu-double-free-hack.patch (2.20 kB)

2013-04-19 21:40:10

by Paul E. McKenney

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On Fri, Apr 19, 2013 at 02:24:10PM -0700, Linus Torvalds wrote:
> On Fri, Apr 19, 2013 at 2:09 PM, Sedat Dilek <[email protected]> wrote:
> >
> > I have applied all three patches and still see call traces.
> > There are also new AppArmor-related messages.
>
> Can you try the crazy rcu double-free debug hack?
>
> See
>
> https://lkml.org/lkml/2013/3/30/113
>
> and I'm re-attaching the ugly-ass crazy hack patch here too..
>
> Linus

For whatever it is worth, CONFIG_DEBUG_OBJECTS_RCU_HEAD=y is intended to
detect RCU double-freeing. But if it isn't helping for whatever reason,
please let me know!

Thanx, Paul

2013-04-19 22:12:38

by Sedat Dilek

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On Fri, Apr 19, 2013 at 11:24 PM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Apr 19, 2013 at 2:09 PM, Sedat Dilek <[email protected]> wrote:
>>
>> I have applied all three patches and still see call traces.
>> There are also new AppArmor-related messages.
>
> Can you try the crazy rcu double-free debug hack?
>
> See
>
> https://lkml.org/lkml/2013/3/30/113
>
> and I'm re-attaching the ugly-ass crazy hack patch here too..
>
> Linus

Attached is the dmesg, which contains some call traces.

I am building a -9 kernel with Paul's kconfig recommendations...

CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1

...and will report.

- Sedat -


Attachments:
dmesg_3.9.0-rc7-next20130419-8-iniza-small_RCU-DOUBLE-FREE-HACK.txt (65.27 kB)

2013-04-19 22:35:03

by Sedat Dilek

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On Sat, Apr 20, 2013 at 12:12 AM, Sedat Dilek <[email protected]> wrote:
> On Fri, Apr 19, 2013 at 11:24 PM, Linus Torvalds
> <[email protected]> wrote:
>> On Fri, Apr 19, 2013 at 2:09 PM, Sedat Dilek <[email protected]> wrote:
>>>
>>> I have applied all three patches and still see call traces.
>>> There are also new AppArmor-related messages.
>>
>> Can you try the crazy rcu double-free debug hack?
>>
>> See
>>
>> https://lkml.org/lkml/2013/3/30/113
>>
>> and I'm re-attaching the ugly-ass crazy hack patch here too..
>>
>> Linus
>
> Attached is the dmesg, which contains some call traces.
>
> I am building a -9 kernel with Paul's kconfig recommendations...
>
> CONFIG_DEBUG_OBJECTS=y
> CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
> CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
>
> ...and will report.
>

See the attached dmesg.
AFAICS, there is no additional debug output.

# CONFIG_DEBUG_SPINLOCK is not set <--- Should I enable it?

- Sedat -

> - Sedat -


Attachments:
dmesg_3.9.0-rc7-next20130419-9-iniza-small_RCU-DOUBLE-FREE-HACK_CONFIG_DEBUG_OBJECTS_RCU_HEAD-y.txt (66.64 kB)

2013-04-19 22:50:25

by Linus Torvalds

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On Fri, Apr 19, 2013 at 3:34 PM, Sedat Dilek <[email protected]> wrote:
>
> See attached dmesg.

This still has the bug Davidlohr pointed at:

>> This looks like what Emmanuel was/is running into:
>> https://lkml.org/lkml/2013/3/30/1

you need to move the "IS_ERR()" check before the sem_lock.

Linus

2013-04-19 22:55:59

by Sedat Dilek

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On Sat, Apr 20, 2013 at 12:50 AM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Apr 19, 2013 at 3:34 PM, Sedat Dilek <[email protected]> wrote:
>>
>> See attached dmesg.
>
> This still has the bug Davidlohr pointed at:
>
>>> This looks like what Emmanuel was/is running into:
>>> https://lkml.org/lkml/2013/3/30/1
>
> you need to move the "IS_ERR()" check before the sem_lock.
>

Davidlohr pointed to this patch (tested the triplet):

ipc, sem: do not call sem_lock when bogus sma:
https://lkml.org/lkml/2013/3/31/12

Is that what you mean?

- Sedat -

> Linus

2013-04-19 23:02:50

by Linus Torvalds

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On Fri, Apr 19, 2013 at 3:55 PM, Sedat Dilek <[email protected]> wrote:
>
> Davidlohr pointed to this patch (tested the triplet):
>
> ipc, sem: do not call sem_lock when bogus sma:
> https://lkml.org/lkml/2013/3/31/12
>
> Is that what you mean?

Yup.

Linus

2013-04-20 00:06:32

by Sedat Dilek

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On Sat, Apr 20, 2013 at 1:02 AM, Linus Torvalds
<[email protected]> wrote:
> On Fri, Apr 19, 2013 at 3:55 PM, Sedat Dilek <[email protected]> wrote:
>>
>> Davidlohr pointed to this patch (tested the triplet):
>>
>> ipc, sem: do not call sem_lock when bogus sma:
>> https://lkml.org/lkml/2013/3/31/12
>>
>> Is that what you mean?
>
> Yup.
>

Davidlohr Bueso (1):
ipc, sem: do not call sem_lock when bogus sma

Linus Torvalds (1):
crazy rcu double free debug hack

With ***both*** patches applied, I am able to build a Linux kernel with
4 parallel make jobs again.
Neither Davidlohr's patch nor yours alone is sufficient!

- Sedat -

> Linus


Attachments:
dmesg_3.9.0-rc7-next20130419-12-iniza-small.txt (52.46 kB)
3.9.0-rc7-next20130419-12-iniza-small.patch (5.17 kB)
config-3.9.0-rc7-next20130419-12-iniza-small (109.74 kB)

2013-04-20 00:19:57

by Sedat Dilek

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On Sat, Apr 20, 2013 at 2:06 AM, Sedat Dilek <[email protected]> wrote:
> On Sat, Apr 20, 2013 at 1:02 AM, Linus Torvalds
> <[email protected]> wrote:
>> On Fri, Apr 19, 2013 at 3:55 PM, Sedat Dilek <[email protected]> wrote:
>>>
>>> Davidlohr pointed to this patch (tested the triplet):
>>>
>>> ipc, sem: do not call sem_lock when bogus sma:
>>> https://lkml.org/lkml/2013/3/31/12
>>>
>>> Is that what you mean?
>>
>> Yup.
>>
>
> Davidlohr Bueso (1):
> ipc, sem: do not call sem_lock when bogus sma
>
> Linus Torvalds (1):
> crazy rcu double free debug hack
>
> With ***both*** patches applied, I am able to build a Linux kernel with
> 4 parallel make jobs again.
> Neither Davidlohr's patch nor yours alone is sufficient!
>

[ Still both patches applied ]

To correct myself... The 1st run was OK.

The 2nd run shows a NULL-pointer-deref (excerpt):

[ 178.490583] BUG: spinlock bad magic on CPU#1, sh/8066
[ 178.490595] lock: 0xffff88008b53ea18, .magic: 6b6b6b6b, .owner: make/8068, .owner_cpu: 3
[ 178.490599] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 178.490608] IP: [<ffffffff812bacd0>] update_queue+0x70/0x210
[ 178.490610] PGD 0
[ 178.490612] Oops: 0000 [#1] SMP
...

See attached full dmesg!

Hope this helps.

- Sedat -


> - Sedat -
>
>> Linus


Attachments:
dmesg_3.9.0-rc7-next20130419-12-iniza-small_BROKEN.txt (68.00 kB)

2013-04-20 12:46:25

by Sedat Dilek

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On Sat, Apr 20, 2013 at 2:19 AM, Sedat Dilek <[email protected]> wrote:
> On Sat, Apr 20, 2013 at 2:06 AM, Sedat Dilek <[email protected]> wrote:
>> On Sat, Apr 20, 2013 at 1:02 AM, Linus Torvalds
>> <[email protected]> wrote:
>>> On Fri, Apr 19, 2013 at 3:55 PM, Sedat Dilek <[email protected]> wrote:
>>>>
>>>> Davidlohr pointed to this patch (tested the triplet):
>>>>
>>>> ipc, sem: do not call sem_lock when bogus sma:
>>>> https://lkml.org/lkml/2013/3/31/12
>>>>
>>>> Is that what you mean?
>>>
>>> Yup.
>>>
>>
>> Davidlohr Bueso (1):
>> ipc, sem: do not call sem_lock when bogus sma
>>
>> Linus Torvalds (1):
>> crazy rcu double free debug hack
>>
>> With ***both*** patches applied, I am able to build a Linux kernel with
>> 4 parallel make jobs again.
>> Neither Davidlohr's patch nor yours alone is sufficient!
>>
>
> [ Still both patches applied ]
>
> To correct myself... The 1st run was OK.
>
> The 2nd run shows a NULL-pointer-deref (excerpt):
>
> [ 178.490583] BUG: spinlock bad magic on CPU#1, sh/8066
> [ 178.490595] lock: 0xffff88008b53ea18, .magic: 6b6b6b6b, .owner: make/8068, .owner_cpu: 3
> [ 178.490599] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 178.490608] IP: [<ffffffff812bacd0>] update_queue+0x70/0x210
> [ 178.490610] PGD 0
> [ 178.490612] Oops: 0000 [#1] SMP
> ...
>
> See attached full dmesg!
>
> Hope this helps.
>
> - Sedat -
>
>
>> - Sedat -
>>
>>> Linus

I have started a new thread, "[next-20130419] ipc: sem: BROKEN" [1];
please use that one!

Thanks for all your feedback!

- Sedat -

[1] http://marc.info/?l=linux-next&m=136646172915261&w=2

2013-04-20 15:59:56

by Rik van Riel

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On 04/20/2013 08:46 AM, Sedat Dilek wrote:

> I have started a new thread, "[next-20130419] ipc: sem: BROKEN" [1];
> please use that one!
>
> Thanks for all your feedback!
>
> - Sedat -
>
> [1] http://marc.info/?l=linux-next&m=136646172915261&w=2

I suspect most of us are not subscribed to the linux-next mailing list.
I know I'm not.

If you need someone on some email thread, you need to CC them on the
email.

--
All rights reversed.

2013-04-20 17:04:08

by Sedat Dilek

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On Sat, Apr 20, 2013 at 5:59 PM, Rik van Riel <[email protected]> wrote:
> On 04/20/2013 08:46 AM, Sedat Dilek wrote:
>
>> I have started a new thread, "[next-20130419] ipc: sem: BROKEN" [1];
>> please use that one!
>>
>> Thanks for all your feedback!
>>
>> - Sedat -
>>
>> [1] http://marc.info/?l=linux-next&m=136646172915261&w=2
>
>
> I suspect most of us are not subscribed to the linux-next mailing list.
> I know I'm not.
>
> If you need someone on some email thread, you need to CC them on the
> email.
>

In the new thread I dropped some CCs from this one that were irrelevant
(at the time I did not know what the root cause was).

LKML and the linux-next and linux-mm mailing lists should be OK, or am I
missing someone or another list?

- Sedat -

> --
> All rights reversed.

2013-04-20 20:00:42

by Davidlohr Bueso

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On Sat, 2013-04-20 at 02:19 +0200, Sedat Dilek wrote:
> On Sat, Apr 20, 2013 at 2:06 AM, Sedat Dilek <[email protected]> wrote:
> > On Sat, Apr 20, 2013 at 1:02 AM, Linus Torvalds
> > <[email protected]> wrote:
> >> On Fri, Apr 19, 2013 at 3:55 PM, Sedat Dilek <[email protected]> wrote:
> >>>
> >>> Davidlohr pointed to this patch (tested the triplet):
> >>>
> >>> ipc, sem: do not call sem_lock when bogus sma:
> >>> https://lkml.org/lkml/2013/3/31/12
> >>>
> >>> Is that what you mean?
> >>
> >> Yup.
> >>
> >
> > Davidlohr Bueso (1):
> > ipc, sem: do not call sem_lock when bogus sma
> >
> > Linus Torvalds (1):
> > crazy rcu double free debug hack
> >
> > With ***both*** patches applied, I am able to build a Linux kernel with
> > 4 parallel make jobs again.
> > Neither Davidlohr's patch nor yours alone is sufficient!
> >
>
> [ Still both patches applied ]
>
> To correct myself... The 1st run was OK.
>
> The 2nd run shows a NULL-pointer-deref (excerpt):
>
> [ 178.490583] BUG: spinlock bad magic on CPU#1, sh/8066
> [ 178.490595] lock: 0xffff88008b53ea18, .magic: 6b6b6b6b, .owner: make/8068, .owner_cpu: 3
> [ 178.490599] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 178.490608] IP: [<ffffffff812bacd0>] update_queue+0x70/0x210
> [ 178.490610] PGD 0
> [ 178.490612] Oops: 0000 [#1] SMP
> ...

The exit_sem() -> do_smart_update() -> update_queue() call chain seems
pretty well protected. Furthermore, we're asserting that sma->sem_perm.lock
is taken. This could just be a consequence of another issue. Earlier this
week Andrew pointed out a potential race in semctl_main() where
sma->sem_perm.deleted could be changed when cmd == GETALL.

Sedat, could you try the attached patch to keep the ipc lock acquired
(on top of the three patches you're already using) and let us know how
it goes? We could also just hold the RCU read lock instead of
sma->sem_perm.lock for GETALL, but let's play it safe for now.

Thanks,
Davidlohr


Attachments:
ipc-fix.patch (506.00 B)

2013-04-20 21:03:21

by Sedat Dilek

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On Sat, Apr 20, 2013 at 10:00 PM, Davidlohr Bueso
<[email protected]> wrote:
> On Sat, 2013-04-20 at 02:19 +0200, Sedat Dilek wrote:
>> On Sat, Apr 20, 2013 at 2:06 AM, Sedat Dilek <[email protected]> wrote:
>> > On Sat, Apr 20, 2013 at 1:02 AM, Linus Torvalds
>> > <[email protected]> wrote:
>> >> On Fri, Apr 19, 2013 at 3:55 PM, Sedat Dilek <[email protected]> wrote:
>> >>>
>> >>> Davidlohr pointed to this patch (tested the triplet):
>> >>>
>> >>> ipc, sem: do not call sem_lock when bogus sma:
>> >>> https://lkml.org/lkml/2013/3/31/12
>> >>>
>> >>> Is that what you mean?
>> >>
>> >> Yup.
>> >>
>> >
>> > Davidlohr Bueso (1):
>> > ipc, sem: do not call sem_lock when bogus sma
>> >
>> > Linus Torvalds (1):
>> > crazy rcu double free debug hack
>> >
>> > With ***both*** patches applied, I am able to build a Linux kernel with
>> > 4 parallel make jobs again.
>> > Neither Davidlohr's patch nor yours alone is sufficient!
>> >
>>
>> [ Still both patches applied ]
>>
>> To correct myself... The 1st run was OK.
>>
>> The 2nd run shows a NULL-pointer-deref (excerpt):
>>
>> [ 178.490583] BUG: spinlock bad magic on CPU#1, sh/8066
>> [ 178.490595] lock: 0xffff88008b53ea18, .magic: 6b6b6b6b, .owner: make/8068, .owner_cpu: 3
>> [ 178.490599] BUG: unable to handle kernel NULL pointer dereference at (null)
>> [ 178.490608] IP: [<ffffffff812bacd0>] update_queue+0x70/0x210
>> [ 178.490610] PGD 0
>> [ 178.490612] Oops: 0000 [#1] SMP
>> ...
>
> The exit_sem() -> do_smart_update() -> update_queue() call chain seems
> pretty well protected. Furthermore, we're asserting that sma->sem_perm.lock
> is taken. This could just be a consequence of another issue. Earlier this
> week Andrew pointed out a potential race in semctl_main() where
> sma->sem_perm.deleted could be changed when cmd == GETALL.
>
> Sedat, could you try the attached patch to keep the ipc lock acquired
> (on top of the three patches you're already using) and let us know how
> it goes? We could also just hold the RCU read lock instead of
> sma->sem_perm.lock for GETALL, but let's play it safe for now.
>

I had to refresh your patch; it also contained stray ^M (carriage-return) characters.

NOPE, the machine FREEZES (a cold reboot was needed).

- Sedat -

P.S.: I have attached all 4 patches against next-20130419 in case
someone wants to follow along.

> Thanks,
> Davidlohr
>


Attachments:
0001-ipc-sem-untangle-RCU-locking-with-find_alloc_undo.patch (3.14 kB)
0002-ipc-sem-fix-lockdep-false-positive.patch (4.62 kB)
0003-ipc-sem-do-not-call-sem_lock-when-bogus-sma.patch (1.41 kB)
0004-ipc-sem-check-if-the-ipc-lock-has-been-already-taken.patch (1.08 kB)