LinuxLists.cc - net: BUG still has locks held in unix_stream_splice

2016-10-09 10:06:06

Subject: net: BUG still has locks held in unix_stream_splice_read

Hello,

While running syzkaller fuzzer on commit
b66484cd74706fa8681d051840fe4b18a3da40ff (Oct 7), I am getting:

[ BUG: syz-executor/15138 still has locks held! ]
4.8.0+ #29 Not tainted
-------------------------------------
1 lock held by syz-executor/15138:
#0: (&pipe->mutex/1){+.+.+.}, at: [< inline >]
pipe_lock_nested fs/pipe.c:66
#0: (&pipe->mutex/1){+.+.+.}, at: [<ffffffff81844c8b>]
pipe_lock+0x5b/0x70 fs/pipe.c:74

stack backtrace:
CPU: 1 PID: 15138 Comm: syz-executor Not tainted 4.8.0+ #29
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
ffff880044d4fa38 ffffffff82d383c9 ffffffff00000000 fffffbfff1097248
ffff88005a44a3c0 ffff88005a44a3c0 dffffc0000000000 ffff88005a44a3c0
ffff8800541fb9b8 ffff880044d4fa58 ffffffff81463cd5 0000000000000000
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff82d383c9>] dump_stack+0x12e/0x185 lib/dump_stack.c:51
[< inline >] print_held_locks_bug kernel/locking/lockdep.c:4296
[<ffffffff81463cd5>] debug_check_no_locks_held+0x125/0x140
kernel/locking/lockdep.c:4302
[< inline >] try_to_freeze include/linux/freezer.h:65
[< inline >] freezer_count include/linux/freezer.h:127
[< inline >] freezable_schedule_timeout include/linux/freezer.h:192
[< inline >] unix_stream_data_wait net/unix/af_unix.c:2223
[<ffffffff860bae67>] unix_stream_read_generic+0x1317/0x1b70
net/unix/af_unix.c:2332
[<ffffffff860bb81b>] unix_stream_splice_read+0x15b/0x1d0
net/unix/af_unix.c:2506
[<ffffffff85afc56e>] sock_splice_read+0xbe/0x100 net/socket.c:775
[<ffffffff818d121f>] do_splice_to+0x10f/0x170 fs/splice.c:908
[< inline >] do_splice fs/splice.c:1196
[< inline >] SYSC_splice fs/splice.c:1420
[<ffffffff818d6aec>] SyS_splice+0x114c/0x15b0 fs/splice.c:1403
[<ffffffff86da6d05>] entry_SYSCALL_64_fastpath+0x23/0xc6

I suspect this is:

commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4
Author: Al Viro <[email protected]>
Date: Sat Sep 17 21:02:10 2016 -0400
skb_splice_bits(): get rid of callback
since pipe_lock is the outermost now, we don't need to drop/regain
socket locks around the call of splice_to_pipe() from skb_splice_bits(),
which kills the need to have a socket-specific callback; we can just
call splice_to_pipe() and be done with that.

2016-10-09 10:06:38

by Dmitry Vyukov

[permalink] [raw]

Subject: Re: net: BUG still has locks held in unix_stream_splice_read

2016-10-10 02:47:12

by Al Viro

[permalink] [raw]

Subject: Re: net: BUG still has locks held in unix_stream_splice_read

On Sun, Oct 09, 2016 at 12:06:14PM +0200, Dmitry Vyukov wrote:
> I suspect this is:
>
> commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4
> Author: Al Viro <[email protected]>
> Date: Sat Sep 17 21:02:10 2016 -0400
> skb_splice_bits(): get rid of callback
> since pipe_lock is the outermost now, we don't need to drop/regain
> socket locks around the call of splice_to_pipe() from skb_splice_bits(),
> which kills the need to have a socket-specific callback; we can just
> call splice_to_pipe() and be done with that.

Unlikely, since that particular commit removes unlocking/relocking ->iolock
around the call of splice_to_pipe(). Original would've retaken the same
lock on the way out; it's not as if we could leave the syscall there.

It might be splice-related, but I don't believe that you've got the right
commit here.

2016-10-10 03:14:58

by Al Viro

[permalink] [raw]

Subject: Re: net: BUG still has locks held in unix_stream_splice_read

On Mon, Oct 10, 2016 at 03:46:07AM +0100, Al Viro wrote:
> On Sun, Oct 09, 2016 at 12:06:14PM +0200, Dmitry Vyukov wrote:
> > I suspect this is:
> >
> > commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4
> > Author: Al Viro <[email protected]>
> > Date: Sat Sep 17 21:02:10 2016 -0400
> > skb_splice_bits(): get rid of callback
> > since pipe_lock is the outermost now, we don't need to drop/regain
> > socket locks around the call of splice_to_pipe() from skb_splice_bits(),
> > which kills the need to have a socket-specific callback; we can just
> > call splice_to_pipe() and be done with that.
>
> Unlikely, since that particular commit removes unlocking/relocking ->iolock
> around the call of splice_to_pipe(). Original would've retaken the same
> lock on the way out; it's not as if we could leave the syscall there.
>
> It might be splice-related, but I don't believe that you've got the right
> commit here.

It's not that commit, all right - it's "can't call unix_stream_read_generic()
with any locks held" stepped onto a couple of commits prior by
"splice: lift pipe_lock out of splice_to_pipe()". Could somebody explain
what is that about?

E.g what will happen if some code does a read on AF_UNIX socket with
some local mutex held? AFAICS, there are exactly two callers of
freezable_schedule_timeout() - this one and one in XFS; the latter is
in a kernel thread where we do have good warranties about the locking
environment, but here it's in the bleeding ->recvmsg/->splice_read and
for those assumption that caller doesn't hold any locks is pretty
strong, especially since it's not documented anywhere.

What's going on there?

2016-10-10 08:08:52

by Dmitry Vyukov

[permalink] [raw]

Subject: Re: net: BUG still has locks held in unix_stream_splice_read

On Mon, Oct 10, 2016 at 5:14 AM, Al Viro <[email protected]> wrote:
> On Mon, Oct 10, 2016 at 03:46:07AM +0100, Al Viro wrote:
>> On Sun, Oct 09, 2016 at 12:06:14PM +0200, Dmitry Vyukov wrote:
>> > I suspect this is:
>> >
>> > commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4
>> > Author: Al Viro <[email protected]>
>> > Date: Sat Sep 17 21:02:10 2016 -0400
>> > skb_splice_bits(): get rid of callback
>> > since pipe_lock is the outermost now, we don't need to drop/regain
>> > socket locks around the call of splice_to_pipe() from skb_splice_bits(),
>> > which kills the need to have a socket-specific callback; we can just
>> > call splice_to_pipe() and be done with that.
>>
>> Unlikely, since that particular commit removes unlocking/relocking ->iolock
>> around the call of splice_to_pipe(). Original would've retaken the same
>> lock on the way out; it's not as if we could leave the syscall there.
>>
>> It might be splice-related, but I don't believe that you've got the right
>> commit here.
>
> It's not that commit

It's highly likely. Sorry for falsely pointing to your commit.

> , all right - it's "can't call unix_stream_read_generic()
> with any locks held" stepped onto a couple of commits prior by
> "splice: lift pipe_lock out of splice_to_pipe()". Could somebody explain
> what is that about?
>
> E.g what will happen if some code does a read on AF_UNIX socket with
> some local mutex held? AFAICS, there are exactly two callers of
> freezable_schedule_timeout() - this one and one in XFS; the latter is
> in a kernel thread where we do have good warranties about the locking
> environment, but here it's in the bleeding ->recvmsg/->splice_read and
> for those assumption that caller doesn't hold any locks is pretty
> strong, especially since it's not documented anywhere.
>
> What's going on there?

I never saw that warning before. There is some possibility that fuzzer
has discovered some new paths, but it's much more likely that
something has changed recently (the stack looks quite simple -- just a
splice from unix socket). And my previous pull was like a week ago.

2016-11-17 18:02:59

by Dmitry Vyukov

[permalink] [raw]

Subject: Re: net: BUG still has locks held in unix_stream_splice_read

On Mon, Oct 10, 2016 at 10:01 AM, Dmitry Vyukov <[email protected]> wrote:
> On Mon, Oct 10, 2016 at 5:14 AM, Al Viro <[email protected]> wrote:
>> On Mon, Oct 10, 2016 at 03:46:07AM +0100, Al Viro wrote:
>>> On Sun, Oct 09, 2016 at 12:06:14PM +0200, Dmitry Vyukov wrote:
>>> > I suspect this is:
>>> >
>>> > commit 25869262ef7af24ccde988867ac3eb1c3d4b88d4
>>> > Author: Al Viro <[email protected]>
>>> > Date: Sat Sep 17 21:02:10 2016 -0400
>>> > skb_splice_bits(): get rid of callback
>>> > since pipe_lock is the outermost now, we don't need to drop/regain
>>> > socket locks around the call of splice_to_pipe() from skb_splice_bits(),
>>> > which kills the need to have a socket-specific callback; we can just
>>> > call splice_to_pipe() and be done with that.
>>>
>>> Unlikely, since that particular commit removes unlocking/relocking ->iolock
>>> around the call of splice_to_pipe(). Original would've retaken the same
>>> lock on the way out; it's not as if we could leave the syscall there.
>>>
>>> It might be splice-related, but I don't believe that you've got the right
>>> commit here.
>>
>> It's not that commit
>
> It's highly likely. Sorry for falsely pointing to your commit.
>
>
>> , all right - it's "can't call unix_stream_read_generic()
>> with any locks held" stepped onto a couple of commits prior by
>> "splice: lift pipe_lock out of splice_to_pipe()". Could somebody explain
>> what is that about?
>>
>> E.g what will happen if some code does a read on AF_UNIX socket with
>> some local mutex held? AFAICS, there are exactly two callers of
>> freezable_schedule_timeout() - this one and one in XFS; the latter is
>> in a kernel thread where we do have good warranties about the locking
>> environment, but here it's in the bleeding ->recvmsg/->splice_read and
>> for those assumption that caller doesn't hold any locks is pretty
>> strong, especially since it's not documented anywhere.
>>
>> What's going on there?
>
> I never saw that warning before. There is some possibility that fuzzer
> has discovered some new paths, but it's much more likely that
> something has changed recently (the stack looks quite simple -- just a
> splice from unix socket). And my previous pull was like a week ago.

Ping. Just hit it again on 4.9-rc5

[ BUG: syz-executor/15922 still has locks held! ]
4.9.0-rc5+ #43 Not tainted
-------------------------------------
1 lock held by syz-executor/15922:
#0: [ 1441.143288] (
[< inline >] pipe_lock_nested fs/pipe.c:66
[<ffffffff81a8d72b>] pipe_lock+0x5b/0x70 fs/pipe.c:74

stack backtrace:
CPU: 3 PID: 15922 Comm: syz-executor Not tainted 4.9.0-rc5+ #43
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffff88004a98f568 ffffffff834c2a19 ffffffff00000003 1ffff10009531e40
ffffed0009531e38 0000000041b58ab3 ffffffff895758b0 ffffffff834c272b
0000000000000003 ffff880035256640 0000000000000003 ffff88006d122cd8
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff834c2a19>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
[< inline >] print_held_locks_bug kernel/locking/lockdep.c:4296
[<ffffffff81560245>] debug_check_no_locks_held+0x125/0x140
kernel/locking/lockdep.c:4302
[< inline >] try_to_freeze include/linux/freezer.h:65
[< inline >] freezer_count include/linux/freezer.h:127
[< inline >] freezable_schedule_timeout include/linux/freezer.h:192
[<ffffffff8716ed5d>] unix_stream_data_wait+0x4fd/0x910 net/unix/af_unix.c:2223
[<ffffffff871749a2>] unix_stream_read_generic+0x11e2/0x2240
net/unix/af_unix.c:2332
[<ffffffff87175c7f>] unix_stream_splice_read+0x27f/0x400
net/unix/af_unix.c:2506
[<ffffffff86a6578e>] sock_splice_read+0xbe/0x100 net/socket.c:772
[<ffffffff81b3f90f>] do_splice_to+0x10f/0x170 fs/splice.c:897
[< inline >] do_splice fs/splice.c:1185
[< inline >] SYSC_splice fs/splice.c:1409
[<ffffffff81b4683a>] SyS_splice+0xfaa/0x16a0 fs/splice.c:1392
[<ffffffff88147905>] entry_SYSCALL_64_fastpath+0x23/0xc6

2016-11-17 21:44:59

by Cong Wang

[permalink] [raw]

Subject: Re: net: BUG still has locks held in unix_stream_splice_read

On Sun, Oct 9, 2016 at 8:14 PM, Al Viro <[email protected]> wrote:
> E.g what will happen if some code does a read on AF_UNIX socket with
> some local mutex held? AFAICS, there are exactly two callers of
> freezable_schedule_timeout() - this one and one in XFS; the latter is
> in a kernel thread where we do have good warranties about the locking
> environment, but here it's in the bleeding ->recvmsg/->splice_read and
> for those assumption that caller doesn't hold any locks is pretty
> strong, especially since it's not documented anywhere.
>
> What's going on there?

Commit 2b15af6f95 ("af_unix: use freezable blocking calls in read")
converts schedule_timeout() to its freezable version, it was probably correct
at that time, but later, commit 2b514574f7e88c8498027ee366
("net: af_unix: implement splice for stream af_unix sockets") breaks its
requirement for a freezable sleep:

commit 0f9548ca10916dec166eaf74c816bded7d8e611d

lockdep: check that no locks held at freeze time

We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
deadlock if the lock is later acquired in the suspend or hibernate path
(e.g. by dpm). Holding a lock can also cause a deadlock in the case of
cgroup_freezer if a lock is held inside a frozen cgroup that is later
acquired by a process outside that group.

So probably we just need to revert commit 2b15af6f95 now.

I am going to send a revert for at least -net and -stable, since Dmitry
saw this warning again.

2016-11-17 22:27:24

by Hannes Frederic Sowa

[permalink] [raw]

Subject: Re: net: BUG still has locks held in unix_stream_splice_read

On 17.11.2016 22:44, Cong Wang wrote:
> On Sun, Oct 9, 2016 at 8:14 PM, Al Viro <[email protected]> wrote:
>> E.g what will happen if some code does a read on AF_UNIX socket with
>> some local mutex held? AFAICS, there are exactly two callers of
>> freezable_schedule_timeout() - this one and one in XFS; the latter is
>> in a kernel thread where we do have good warranties about the locking
>> environment, but here it's in the bleeding ->recvmsg/->splice_read and
>> for those assumption that caller doesn't hold any locks is pretty
>> strong, especially since it's not documented anywhere.
>>
>> What's going on there?
>
> Commit 2b15af6f95 ("af_unix: use freezable blocking calls in read")
> converts schedule_timeout() to its freezable version, it was probably correct
> at that time, but later, commit 2b514574f7e88c8498027ee366
> ("net: af_unix: implement splice for stream af_unix sockets") breaks its
> requirement for a freezable sleep:
>
> commit 0f9548ca10916dec166eaf74c816bded7d8e611d
>
> lockdep: check that no locks held at freeze time
>
> We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
> deadlock if the lock is later acquired in the suspend or hibernate path
> (e.g. by dpm). Holding a lock can also cause a deadlock in the case of
> cgroup_freezer if a lock is held inside a frozen cgroup that is later
> acquired by a process outside that group.
>
> So probably we just need to revert commit 2b15af6f95 now.
>
> I am going to send a revert for at least -net and -stable, since Dmitry
> saw this warning again.

I am not an expert on freezing but this looks around right from the
freezer code. Awesome, thanks a lot for spotting this one!