2022-03-03 14:41:12

by Theodore Ts'o

Subject: Re: Report 2 in ext4 and journal based on v5.17-rc1

On Thu, Mar 03, 2022 at 02:23:33PM +0900, Byungchul Park wrote:
> I totally agree with you. *They aren't really locks but it's just waits
> and wakeups.* That's exactly why I decided to develop Dept. Dept is not
> interested in locks unlike Lockdep, but focuses on waits and wakeup
> sources itself. I think you get Dept wrong a lot. Please ask me more if
> you have things you doubt about Dept.

So the question is this --- do you now understand why, even though
there is a circular dependency, nothing gets stalled in the
interactions between the two wait channels?

- Ted


2022-03-04 06:51:58

by Byungchul Park

Subject: Re: Report 2 in ext4 and journal based on v5.17-rc1

On Thu, Mar 03, 2022 at 09:36:25AM -0500, Theodore Ts'o wrote:
> On Thu, Mar 03, 2022 at 02:23:33PM +0900, Byungchul Park wrote:
> > I totally agree with you. *They aren't really locks but it's just waits
> > and wakeups.* That's exactly why I decided to develop Dept. Dept is not
> > interested in locks unlike Lockdep, but focuses on waits and wakeup
> > sources itself. I think you get Dept wrong a lot. Please ask me more if
> > you have things you doubt about Dept.
>
> So the question is this --- do you now understand why, even though
> there is a circular dependency, nothing gets stalled in the
> interactions between the two wait channels?

I found a point that the two wait channels don't lead to a deadlock in
some cases thanks to Jan Kara. I will fix it so that Dept won't
complain about it.

Thanks,
Byungchul

>
> - Ted

2022-03-05 05:23:38

by Theodore Ts'o

Subject: Re: Report 2 in ext4 and journal based on v5.17-rc1

On Fri, Mar 04, 2022 at 12:20:02PM +0900, Byungchul Park wrote:
>
> I found a point that the two wait channels don't lead to a deadlock in
> some cases thanks to Jan Kara. I will fix it so that Dept won't
> complain about it.

I sent my last (admittedly cranky) message before you sent this. I'm
glad you finally understood Jan's explanation. I was trying to tell
you the same thing, but apparently I failed to communicate in a
sufficiently clear manner. In any case, what Jan described is a
fundamental part of how wait queues work, and I'm kind of amazed that
you were able to implement DEPT without understanding it. (But maybe
that is why some of the DEPT reports were completely incomprehensible
to me; I couldn't interpret why in the world DEPT was saying there was
a problem.)

In any case, the thing I would ask is a little humility. We regularly
use lockdep, and we run a huge number of stress tests, throughout each
development cycle.

So if DEPT is issuing lots of reports about apparently circular
dependencies, please try to be open to the thought that the fault is
in DEPT. Don't try to argue with maintainers that their code MUST be
buggy, and that since you don't understand our code, and DEPT must be
theoretically perfect, it is up to the maintainers to prove to you
that their code is correct.

I am going to gently suggest that it is at least as likely, if not
more likely, that the failure is in DEPT or your understanding of how
kernel wait channels and locking work. After all, why would it
be that we haven't found these problems via our other QA practices?

Cheers,

- Ted