2023-07-03 15:52:20

by Steven Rostedt

[permalink] [raw]
Subject: Buggy rwsem locking code in fs/smb/client/file.c

I just reviewed a patch that copied a solution from
fs/smb/client/file.c (original was fs/cifs/file.c), which is really
just hiding a bug. And because this code could have possibly caused
this buggy solution to be repeated, I believe it should be fixed,
before others use it as precedent in other areas of the kernel.

Commit d46b0da7a33dd ("cifs: Fix cifsInodeInfo lock_sem deadlock when
reconnect occurs") has in its change log:

There's a deadlock that is possible and can easily be seen with
a test where multiple readers open/read/close of the same file
and a disruption occurs causing reconnect. The deadlock is due
a reader thread inside cifs_strict_readv calling down_read and
obtaining lock_sem, and then after reconnect inside
cifs_reopen_file calling down_read a second time. If in
between the two down_read calls, a down_write comes from
another process, deadlock occurs.

CPU0 CPU1
---- ----
cifs_strict_readv()
down_read(&cifsi->lock_sem);
_cifsFileInfo_put
OR
cifs_new_fileinfo
down_write(&cifsi->lock_sem);
cifs_reopen_file()
down_read(&cifsi->lock_sem);

Fix the above by changing all down_write(lock_sem) calls to
down_write_trylock(lock_sem)/msleep() loop, which in turn
makes the second down_read call benign since it will never
block behind the writer while holding lock_sem.

And hides the bug by wrapping the down_write() with:

+void
+cifs_down_write(struct rw_semaphore *sem)
+{
+ while (!down_write_trylock(sem))
+ msleep(10);
+}
+

The comment above down_read_nested() has:

/*
* nested locking. NOTE: rwsems are not allowed to recurse
* (which occurs if the same task tries to acquire the same
* lock instance multiple times), but multiple locks of the
* same lock class might be taken, if the order of the locks
* is always the same. This ordering rule can be expressed
* to lockdep via the _nested() APIs, but enumerating the
* subclasses that are used. (If the nesting relationship is
* static then another method for expressing nested locking is
* the explicit definition of lock class keys and the use of
* lockdep_set_class() at lock initialization time.
* See Documentation/locking/lockdep-design.rst for more details.)
*/

As the NOTE above states, down_read() is not a recursive lock, which
appears to be what cifs is using it for. I wonder if it could be
converted to using RCU instead.

I'm just bringing this to everyone's attention because that code really
needs to be fixed.

-- Steve


2023-07-03 17:14:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: Buggy rwsem locking code in fs/smb/client/file.c

On Mon, 3 Jul 2023 at 08:43, Steven Rostedt <[email protected]> wrote:
>
> And hides the bug by wrapping the down_write() with:
>
> +void
> +cifs_down_write(struct rw_semaphore *sem)
> +{
> + while (!down_write_trylock(sem))
> + msleep(10);
> +}

That is indeed disgusting.

It may *work* - because as the commit message says, it means that
writers are now never queued up and thus never block recursive
readers.

And in the process it now becomes absolutely horribly unfair to
writers, who will easily get starved by readers.

This is absolutely not acceptable in any sane situation. Are writers
*so* rare and special that starving them is ok?

Because starvation can be just as deadly as a deadlock. You're just
hiding the problem from lockdep and yourself.

This is very much a "head in the sand" solution.

Linus

2023-07-03 18:44:36

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Buggy rwsem locking code in fs/smb/client/file.c

On Mon, Jul 03, 2023 at 11:43:18AM -0400, Steven Rostedt wrote:
> I just reviewed a patch that copied a solution from
> fs/smb/client/file.c (original was fs/cifs/file.c), which is really
> just hiding a bug. And because this code could have possibly caused
> this buggy solution to be repeated, I believe it should be fixed,
> before others use it as precedent in other areas of the kernel.
>
> Commit d46b0da7a33dd ("cifs: Fix cifsInodeInfo lock_sem deadlock when
> reconnect occurs") has in its change log:

Oh man, that's .... I gotta go buy a new WTF'o'meter again :/

2023-07-03 18:56:50

by David Wysochanski

[permalink] [raw]
Subject: Re: Buggy rwsem locking code in fs/smb/client/file.c

On Mon, Jul 3, 2023 at 1:00 PM Linus Torvalds
<[email protected]> wrote:
>
> On Mon, 3 Jul 2023 at 08:43, Steven Rostedt <[email protected]> wrote:
> >
> > And hides the bug by wrapping the down_write() with:
> >
> > +void
> > +cifs_down_write(struct rw_semaphore *sem)
> > +{
> > + while (!down_write_trylock(sem))
> > + msleep(10);
> > +}
>
> That is indeed disgusting.
>
> It may *work* - because as the commit message says, it means that
> writers are now never queued up and thus never block recursive
> readers.
>
> And in the process it now becomes absolutely horribly unfair to
> writers, who will easily get starved by readers.
>
> This is absolutely not acceptable in any sane situation. Are writers
> *so* rare and special that starving them is ok?
>
> Because starvation can be just as deadly as a deadlock. You're just
> hiding the problem from lockdep and yourself.
>
> This is very much a "head in the sand" solution.
>
> Linus
>

Steve and Linus,

Thank you for pointing this out. I'll have to get with Ronnie on this
and come up with something better.