2013-04-19 16:43:26

by Sedat Dilek

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On Fri, Apr 19, 2013 at 12:27 PM, Sedat Dilek <[email protected]> wrote:
> On Fri, Apr 19, 2013 at 12:19 PM, Sedat Dilek <[email protected]> wrote:
>> On Thu, Apr 18, 2013 at 11:59 PM, Sedat Dilek <[email protected]> wrote:
>>> On Thu, Apr 18, 2013 at 9:48 PM, Daniel Vetter <[email protected]> wrote:
>>>> On Thu, Apr 18, 2013 at 3:05 PM, Sedat Dilek <[email protected]> wrote:
>>>>> On Thu, Apr 18, 2013 at 10:28 AM, Stephen Rothwell <[email protected]> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Changes since 20130417:
>>>>>>
>>>>>> New Trees: rpmsg (actually added yesterday)
>>>>>> ppc-temp (replacing powerpc for this week)
>>>>>>
>>>>>> The ceph tree gained a conflict against Linus' tree.
>>>>>>
>>>>>> The net-next tree gained a conflict against the infiniband tree.
>>>>>>
>>>>>> The usb tree gained a build failure so I used the version from
>>>>>> next-20130417.
>>>>>>
>>>>>> I added two merge fix patches after the gen-gpio tree.
>>>>>>
>>>>>> The ppc-temp tree gained a conflict against the metag tree.
>>>>>>
>>>>>> The akpm tree lost a patch that turned up elsewhere.
>>>>>>
>>>>>> ----------------------------------------------------------------------------
>>>>>>
>>>>>
>>>>> Not sure what the root-cause for this call-trace is (see screenshot).
>>>>>
>>>>> This is reproducible when running my kernel build-script (4 parallel-make-jobs).
>>>>>
>>>>> Any hints welcome!
>>>>
>>>> The panic handlers in our modeset code are pretty decent fubar - they
>>>> take mutexes all over the place. So I think the backtrace you see
>>>> there is actually a secondary effect. I've looked into fixing this up,
>>>> but the issue is that drivers themselves have tons of state protected
>>>> with mutexes, which all potentially affects the panic handler. So I've
>>>> given up on that for now ...
>>>
>>> Thanks for taking care.
>>>
>>> On suspicion [1] I have reverted [2]... NOPE.
>>>
>>> - Sedat -
>>>
>>> [1] http://marc.info/?l=linux-kernel&m=136631921208895&w=2
>>> [2] http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=5a90d1a95356de7a32acb2e5309ac579a891af8f
>>>
>>
>> Hmmm, the issue seems to be gone with today's Linux-Next (next-20130419).
>> My kernel-build is still running with no call-trace...
>>
>
> NO, It's no good.
>

I tried to switch from SLUB to SLAB...

...and also from VIRT_CPU_ACCOUNTING_GEN to TICK_CPU_ACCOUNTING.

Neither change helped.

In one kernel-build I saw in my console...

semop(1): encountered an error: Identifier removed

...in case this means anything to you.

- Sedat -

> - Sedat -
>
> [1] http://www.youtube.com/watch?v=LwicaralvS0
>
>> - Sedat -
>>
>>>> -Daniel
>>>> --
>>>> Daniel Vetter
>>>> Software Engineer, Intel Corporation
>>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


2013-04-19 18:53:06

by Sedat Dilek

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On Fri, Apr 19, 2013 at 6:43 PM, Sedat Dilek <[email protected]> wrote:
> On Fri, Apr 19, 2013 at 12:27 PM, Sedat Dilek <[email protected]> wrote:
>> On Fri, Apr 19, 2013 at 12:19 PM, Sedat Dilek <[email protected]> wrote:
>>> On Thu, Apr 18, 2013 at 11:59 PM, Sedat Dilek <[email protected]> wrote:
>>>> On Thu, Apr 18, 2013 at 9:48 PM, Daniel Vetter <[email protected]> wrote:
>>>>> On Thu, Apr 18, 2013 at 3:05 PM, Sedat Dilek <[email protected]> wrote:
>>>>>> On Thu, Apr 18, 2013 at 10:28 AM, Stephen Rothwell <[email protected]> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Changes since 20130417:
>>>>>>>
>>>>>>> New Trees: rpmsg (actually added yesterday)
>>>>>>> ppc-temp (replacing powerpc for this week)
>>>>>>>
>>>>>>> The ceph tree gained a conflict against Linus' tree.
>>>>>>>
>>>>>>> The net-next tree gained a conflict against the infiniband tree.
>>>>>>>
>>>>>>> The usb tree gained a build failure so I used the version from
>>>>>>> next-20130417.
>>>>>>>
>>>>>>> I added two merge fix patches after the gen-gpio tree.
>>>>>>>
>>>>>>> The ppc-temp tree gained a conflict against the metag tree.
>>>>>>>
>>>>>>> The akpm tree lost a patch that turned up elsewhere.
>>>>>>>
>>>>>>> ----------------------------------------------------------------------------
>>>>>>>
>>>>>>
>>>>>> Not sure what the root-cause for this call-trace is (see screenshot).
>>>>>>
>>>>>> This is reproducible when running my kernel build-script (4 parallel-make-jobs).
>>>>>>
>>>>>> Any hints welcome!
>>>>>
>>>>> The panic handlers in our modeset code are pretty decent fubar - they
>>>>> take mutexes all over the place. So I think the backtrace you see
>>>>> there is actually a secondary effect. I've looked into fixing this up,
>>>>> but the issue is that drivers themselves have tons of state protected
>>>>> with mutexes, which all potentially affects the panic handler. So I've
>>>>> given up on that for now ...
>>>>
>>>> Thanks for taking care.
>>>>
>>>> On suspicion [1] I have reverted [2]... NOPE.
>>>>
>>>> - Sedat -
>>>>
>>>> [1] http://marc.info/?l=linux-kernel&m=136631921208895&w=2
>>>> [2] http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=5a90d1a95356de7a32acb2e5309ac579a891af8f
>>>>
>>>
>>> Hmmm, the issue seems to be gone with today's Linux-Next (next-20130419).
>>> My kernel-build is still running with no call-trace...
>>>
>>
>> NO, It's no good.
>>
>
> I tried to switch from SLUB to SLAB...
>
> ...and also from VIRT_CPU_ACCOUNTING_GEN to TICK_CPU_ACCOUNTING.
>
> 2x NOPE.
>
> In one kernel-build I saw in my console...
>
> semop(1): encountered an error: Identifier removed
>
> ...if this says sth. to you.
>

[ CC folks from below thread ]

I have found a thread called "Re: ipc,sem: sysv semaphore scalability" [1]
on LKML with a screenshot [2] that shows the same call-trace.
I followed it a bit.
There is a patch in [3], but it is unconfirmed.

Linus commented on the RCU read lock and on "sem_lock()" vs. "sem_unlock()" [4].

What's the status of this discussion?

- Sedat -

[1] https://lkml.org/lkml/2013/3/30/6
[2] http://i.imgur.com/uk6gmq1.jpg
[3] https://lkml.org/lkml/2013/3/31/12
[4] https://lkml.org/lkml/2013/3/31/77

> - Sedat -
>
> - Sedat -
>
>> - Sedat -
>>
>> [1] http://www.youtube.com/watch?v=LwicaralvS0
>>
>>> - Sedat -
>>>
>>>>> -Daniel
>>>>> --
>>>>> Daniel Vetter
>>>>> Software Engineer, Intel Corporation
>>>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

2013-04-19 19:19:34

by Rik van Riel

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On 04/19/2013 02:53 PM, Sedat Dilek wrote:
> On Fri, Apr 19, 2013 at 6:43 PM, Sedat Dilek <[email protected]> wrote:
>
>> I tried to switch from SLUB to SLAB...
>>
>> ...and also from VIRT_CPU_ACCOUNTING_GEN to TICK_CPU_ACCOUNTING.
>>
>> 2x NOPE.
>>
>> In one kernel-build I saw in my console...
>>
>> semop(1): encountered an error: Identifier removed
>>
>> ...if this says sth. to you.
>>
> [ CC folks from below thread ]
>
> I have found a thread called "Re: ipc,sem: sysv semaphore scalability"
> on LKML with a screenshot that shows the same call-trace.
> I followed it a bit.
> There is a patch in [3]... unconfirmed.
>
> Comments on the rcu read-lock and "sem_lock()" vs "sem_unlock()" from Linus.
>
> What's the status of this discussion?
>
> - Sedat -
>
> [1] https://lkml.org/lkml/2013/3/30/6
> [2] http://i.imgur.com/uk6gmq1.jpg
> [3] https://lkml.org/lkml/2013/3/31/12
> [4] https://lkml.org/lkml/2013/3/31/77
>
I am at a conference right now, but when I get
back I will check linux-next vs. all the fixes from
the semaphore scalability email thread.

I will gather up the not-yet-in-linux-next fixes,
and will send them to Andrew for inclusion.

No guarantee that those fixes are related to
your problem, of course :)

2013-04-19 20:12:14

by Davidlohr Bueso

Subject: Re: linux-next: Tree for Apr 18 [ call-trace: drm | x86 | smp | rcu related? ]

On Fri, 2013-04-19 at 15:19 -0400, Rik van Riel wrote:
> On 04/19/2013 02:53 PM, Sedat Dilek wrote:
> > On Fri, Apr 19, 2013 at 6:43 PM, Sedat Dilek <[email protected]> wrote:
> >
> >> I tried to switch from SLUB to SLAB...
> >>
> >> ...and also from VIRT_CPU_ACCOUNTING_GEN to TICK_CPU_ACCOUNTING.
> >>
> >> 2x NOPE.
> >>
> >> In one kernel-build I saw in my console...
> >>
> >> semop(1): encountered an error: Identifier removed

This looks like what Emmanuel was/is running into:
https://lkml.org/lkml/2013/3/30/1
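
For reference, "Identifier removed" is the strerror() text for EIDRM,
which semop(2) returns when the semaphore set is removed while the
caller is sleeping on it. A minimal user-space sketch of that expected
case follows. It is only an illustration, not code from the build
script or from the kernel: the helper name blocked_semop_errno() is
made up, and it relies on Linux initialising new semaphores to 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the errno seen by a semop() call that
 * was blocked when its semaphore set was removed, or -1 if the setup
 * itself failed.
 */
int blocked_semop_errno(void)
{
    /* One private semaphore. Linux initialises its value to 0
     * (POSIX leaves the initial value unspecified). */
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    if (pid == 0) {
        /* Child: decrementing a semaphore whose value is 0 makes
         * semop() sleep until the value rises or the set goes away. */
        struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
        int rc = semop(semid, &op, 1);
        /* Report the errno to the parent through the exit status. */
        _exit(rc == -1 ? errno : 0);
    }

    /* Parent: poll for up to about 5 seconds until the child is
     * counted as a waiter on semaphore 0. */
    struct timespec one_ms = { 0, 1000000 };
    for (int i = 0; i < 5000 && semctl(semid, 0, GETNCNT) < 1; i++)
        nanosleep(&one_ms, NULL);

    /* Removing the set wakes every blocked semop() caller with EIDRM. */
    semctl(semid, 0, IPC_RMID);

    int status = 0;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    assert(WIFEXITED(status));
    return WEXITSTATUS(status);
}
```

Called from a small main(), strerror() of the return value should read
"Identifier removed" on Linux. Whether the message in the build log
comes from this ordinary path or from the locking problem discussed in
the semaphore scalability thread is not established here.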

> >>
> >> ...if this says sth. to you.
> >>
> > [ CC folks from below thread ]
> >
> > I have found a thread called "Re: ipc,sem: sysv semaphore scalability"
> > on LKML with a screenshot that shows the same call-trace.
> > I followed it a bit.
> > There is a patch in [3]... unconfirmed.
> >
> > Comments on the rcu read-lock and "sem_lock()" vs "sem_unlock()" from Linus.
> >
> > What's the status of this discussion?
> >
> > - Sedat -
> >
> > [1] https://lkml.org/lkml/2013/3/30/6
> > [2] http://i.imgur.com/uk6gmq1.jpg
> > [3] https://lkml.org/lkml/2013/3/31/12
> > [4] https://lkml.org/lkml/2013/3/31/77
> >
> I am at a conference right now, but when I get
> back I will check linux-next vs. all the fixes from
> the semaphore scalability email thread.

I'm back from the collab. summit, so AFAICT these still need to go in
linux-next:

ipc,sem: untangle RCU locking with find_alloc_undo:
https://lkml.org/lkml/2013/3/28/275

ipc,sem: fix lockdep false positive:
https://lkml.org/lkml/2013/3/29/119

ipc, sem: do not call sem_lock when bogus sma:
https://lkml.org/lkml/2013/3/31/12

Thanks,
Davidlohr