2019-05-20 12:18:34

by Christian König

Subject: Re: Confusing lockdep message

Please ignore this mail,

I've fixed the double unlock and lockdep is still complaining about the
nested locking, so I'm actually facing multiple issues here.

Sorry to waste your time,
Christian.

On 20.05.19 at 13:19, Christian König wrote:
> Hi guys,
>
> writing the usual suspects about locking/lockdep stuff and also Daniel
> in CC because he might have stumbled over this as well.
>
> It took me a while to figure out what the heck lockdep was
> complaining about. The relevant dmesg was the following:
>> [  145.623005] ==================================
>> [  145.623094] WARNING: Nested lock was not taken
>> [  145.623184] 5.0.0-rc1+ #144 Not tainted
>> [  145.623261] ----------------------------------
>> [  145.623351] amdgpu_test/1411 is trying to lock:
>> [  145.623442] 0000000098a1c4d3 (reservation_ww_class_mutex){+.+.},
>> at: ttm_eu_reserve_buffers+0x46e/0x910 [ttm]
>> [  145.623651]
>>                but this task is not holding:
>> [  145.623758] reservation_ww_class_acquire
>> [  145.623836]
>>                stack backtrace:
>> [  145.623924] CPU: 4 PID: 1411 Comm: amdgpu_test Not tainted
>> 5.0.0-rc1+ #144
>> [  145.624058] Hardware name: System manufacturer System Product
>> Name/PRIME X399-A, BIOS 0808 10/12/2018
>> [  145.624234] Call Trace:
>> ...
>
> The problem is that the message is very confusing, because the
> issue was *not* that I tried to acquire a lock, but rather that I
> accidentally released a lock twice.
>
> Now releasing a lock twice is a rather common mistake and I'm really
> surprised that I didn't get that pointed out by lockdep immediately.
>
> In addition, I'm pretty sure that this used to work correctly
> at some point in the past, so I'm either hitting a rare corner case or
> this broke just recently.
>
> Anyway can somebody take a look? I can try to provide a test case if
> required.
>
> Thanks in advance,
> Christian.


2019-05-20 18:17:16

by Daniel Vetter

Subject: Re: Confusing lockdep message

On Mon, May 20, 2019 at 1:38 PM Koenig, Christian
<[email protected]> wrote:
>
> Please ignore this mail,
>
> I've fixed the double unlock and lockdep is still complaining about the
> nested locking, so I'm actually facing multiple issues here.

The way we model the ww-mutex stuff is that the acquire-ctx is treated
as a lockdep lock, and we require you to hold it whenever you take two
or more ww-mutexes (the nested locking stuff lockdep complains about).

So you already hold a ww-mutex while trying to get a 2nd one, without
holding a ww-mutex acquire ctx ticket. Could be a ww-mutex you forgot
to unlock somewhere.
-Daniel
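
[Editor's illustration, not part of Daniel's mail.] The rule described above can be sketched as a small userspace toy model. This is not kernel or lockdep code: `toy_task`, `toy_lock`, `toy_unlock`, and the result codes are all hypothetical names made up for this sketch, and real lockdep tracks far more state. The sketch only assumes what the mail states: the acquire context is treated like a held lock, and taking a further ww-mutex while already holding one requires that context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model only. Every name here is hypothetical. */
#define TOY_MAX_HELD 8

enum toy_result {
    TOY_OK,
    TOY_WARN_NESTED_NOT_TAKEN, /* models "Nested lock was not taken" */
    TOY_WARN_NOT_HELD          /* models unlocking a lock that is not held */
};

struct toy_task {
    bool holds_acquire_ctx;          /* models holding the acquire context */
    const void *held[TOY_MAX_HELD];  /* ww-mutexes currently held */
    size_t n_held;
};

/* Model of starting and ending an acquire context. */
static void toy_ctx_init(struct toy_task *t) { t->holds_acquire_ctx = true; }
static void toy_ctx_fini(struct toy_task *t) { t->holds_acquire_ctx = false; }

/* Taking a ww-mutex while another one is already held requires the context. */
static enum toy_result toy_lock(struct toy_task *t, const void *lock)
{
    if (t->n_held > 0 && !t->holds_acquire_ctx)
        return TOY_WARN_NESTED_NOT_TAKEN;
    assert(t->n_held < TOY_MAX_HELD);
    t->held[t->n_held++] = lock;
    return TOY_OK;
}

/* Releasing a lock removes it from the held set; a second release is flagged. */
static enum toy_result toy_unlock(struct toy_task *t, const void *lock)
{
    for (size_t i = 0; i < t->n_held; i++) {
        if (t->held[i] == lock) {
            t->held[i] = t->held[--t->n_held];
            return TOY_OK;
        }
    }
    return TOY_WARN_NOT_HELD;
}
```

In this model, a task that forgets to unlock one ww-mutex and later takes another without a context gets `TOY_WARN_NESTED_NOT_TAKEN` on the second `toy_lock()` call, even though that call looks like an ordinary single-lock acquisition. That matches the scenario suggested above.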
>
> Sorry to waste your time,
> Christian.
>
> On 20.05.19 at 13:19, Christian König wrote:
> > Hi guys,
> >
> > writing the usual suspects about locking/lockdep stuff and also Daniel
> > in CC because he might have stumbled over this as well.
> >
> > It took me a while to figure out what the heck lockdep was
> > complaining about. The relevant dmesg was the following:
> >> [ 145.623005] ==================================
> >> [ 145.623094] WARNING: Nested lock was not taken
> >> [ 145.623184] 5.0.0-rc1+ #144 Not tainted
> >> [ 145.623261] ----------------------------------
> >> [ 145.623351] amdgpu_test/1411 is trying to lock:
> >> [ 145.623442] 0000000098a1c4d3 (reservation_ww_class_mutex){+.+.},
> >> at: ttm_eu_reserve_buffers+0x46e/0x910 [ttm]
> >> [ 145.623651]
> >> but this task is not holding:
> >> [ 145.623758] reservation_ww_class_acquire
> >> [ 145.623836]
> >> stack backtrace:
> >> [ 145.623924] CPU: 4 PID: 1411 Comm: amdgpu_test Not tainted
> >> 5.0.0-rc1+ #144
> >> [ 145.624058] Hardware name: System manufacturer System Product
> >> Name/PRIME X399-A, BIOS 0808 10/12/2018
> >> [ 145.624234] Call Trace:
> >> ...
> >
> > The problem is that the message is very confusing, because the
> > issue was *not* that I tried to acquire a lock, but rather that I
> > accidentally released a lock twice.
> >
> > Now releasing a lock twice is a rather common mistake and I'm really
> > surprised that I didn't get that pointed out by lockdep immediately.
> >
> > In addition, I'm pretty sure that this used to work correctly
> > at some point in the past, so I'm either hitting a rare corner case or
> > this broke just recently.
> >
> > Anyway can somebody take a look? I can try to provide a test case if
> > required.
> >
> > Thanks in advance,
> > Christian.
>


--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch