2009-01-16 18:56:22

by Vegard Nossum

Subject: 2.6.27.9: splice_to_pipe() hung (blocked for more than 120 seconds)

Hi,

Seeing some recent splice() discussions, I decided to explore this
system call. I have written a program which might look, well, not very
useful, but the fact is that it creates an unkillable zombie process.
Another funny side effect is that system load continually rises, even
though the system seems to stay fully interactive and functional.

After a while, I also get some messages like this:

Jan 15 20:11:37 localhost kernel: INFO: task a.out:7149 blocked for
more than 120 seconds.
Jan 15 20:11:37 localhost kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 15 20:11:37 localhost kernel: a.out D ec6e2610 0 7149 1
Jan 15 20:11:37 localhost kernel: ec5aad44 00000082 c042451f
ec6e2610 00989680 c07da67c c07ddb80 c07ddb80
Jan 15 20:11:37 localhost kernel: c07ddb80 ec6e4c20 ec6e4e7c
c201db80 00000001 c201db80 470fed45 0000036b
Jan 15 20:11:37 localhost kernel: ec5aad38 c0421027 ec6e263c
ec6e4e7c ec6e3fa8 85c129f4 ec6e4c20 ec6e4c20
Jan 15 20:11:37 localhost kernel: Call Trace:
Jan 15 20:11:37 localhost kernel: [<c064420f>] __mutex_lock_common+0x8a/0xd9
Jan 15 20:11:37 localhost kernel: [<c0644302>] __mutex_lock_slowpath+0x12/0x15
Jan 15 20:11:37 localhost kernel: [<c0644181>] mutex_lock+0x29/0x2d
Jan 15 20:11:37 localhost kernel: [<c04aa8f1>] splice_to_pipe+0x23/0x1f5
Jan 15 20:11:37 localhost kernel: [<c04ab290>]
__generic_file_splice_read+0x3ff/0x413
Jan 15 20:11:37 localhost kernel: [<c04ab324>]
generic_file_splice_read+0x80/0x9a
Jan 15 20:11:37 localhost kernel: [<c04a9e95>] do_splice_to+0x4e/0x5f
Jan 15 20:11:37 localhost kernel: [<c04aa010>] sys_splice+0x16a/0x1c8
Jan 15 20:11:37 localhost kernel: [<c0403cca>] syscall_call+0x7/0xb
Jan 15 20:11:37 localhost kernel: =======================

(But this was from a system with 6 zombies and a load of ~80. See the
attachments for a SysRq report with processes in the blocked state; it
has similar info, but for just one zombie.)

This happens with the 2.6.27.9-73.fc9.i686 kernel. Maybe it was fixed
recently? (In any case, I don't think it is a regression.)

It doesn't seem to be 100% reproducible. Sometimes it works, sometimes
not. Start the program, then after a while hit Ctrl-C. If it doesn't
exit, the zombie count will rise and the system state will be as described.
Compile with -lpthread.


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036


Attachments:
socket-pipe.c (2.08 kB)
splice-hang.txt (11.05 kB)

2009-01-16 20:49:24

by Eric Dumazet

Subject: Re: 2.6.27.9: splice_to_pipe() hung (blocked for more than 120 seconds)

CCed to netdev

Vegard Nossum wrote:
> Hi,
>
> Seeing some recent splice() discussions, I decided to explore this
> system call. I have written a program which might look, well, not very
> useful, but the fact is that it creates an unkillable zombie process.
> Another funny side effect is that system load continually rises, even
> though the system seems to stay fully interactive and functional.
>
> After a while, I also get some messages like this:
>
> Jan 15 20:11:37 localhost kernel: INFO: task a.out:7149 blocked for
> more than 120 seconds.
> Jan 15 20:11:37 localhost kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jan 15 20:11:37 localhost kernel: a.out D ec6e2610 0 7149 1
> Jan 15 20:11:37 localhost kernel: ec5aad44 00000082 c042451f
> ec6e2610 00989680 c07da67c c07ddb80 c07ddb80
> Jan 15 20:11:37 localhost kernel: c07ddb80 ec6e4c20 ec6e4e7c
> c201db80 00000001 c201db80 470fed45 0000036b
> Jan 15 20:11:37 localhost kernel: ec5aad38 c0421027 ec6e263c
> ec6e4e7c ec6e3fa8 85c129f4 ec6e4c20 ec6e4c20
> Jan 15 20:11:37 localhost kernel: Call Trace:
> Jan 15 20:11:37 localhost kernel: [<c064420f>] __mutex_lock_common+0x8a/0xd9
> Jan 15 20:11:37 localhost kernel: [<c0644302>] __mutex_lock_slowpath+0x12/0x15
> Jan 15 20:11:37 localhost kernel: [<c0644181>] mutex_lock+0x29/0x2d
> Jan 15 20:11:37 localhost kernel: [<c04aa8f1>] splice_to_pipe+0x23/0x1f5
> Jan 15 20:11:37 localhost kernel: [<c04ab290>]
> __generic_file_splice_read+0x3ff/0x413
> Jan 15 20:11:37 localhost kernel: [<c04ab324>]
> generic_file_splice_read+0x80/0x9a
> Jan 15 20:11:37 localhost kernel: [<c04a9e95>] do_splice_to+0x4e/0x5f
> Jan 15 20:11:37 localhost kernel: [<c04aa010>] sys_splice+0x16a/0x1c8
> Jan 15 20:11:37 localhost kernel: [<c0403cca>] syscall_call+0x7/0xb
> Jan 15 20:11:37 localhost kernel: =======================
>
> (but this was from such a system with 6 zombies and ~80 load. See
> attachments for SysRq report with processes in blocked state, it has
> similar info but for just one zombie.)
>
> This happens with 2.6.27.9-73.fc9.i686 kernel. Maybe it was fixed
> recently? (In any case, I don't think it is a regression.)
>
> It seems to be not 100% reproducible. Sometimes it works, sometimes
> not. Start the program, then after a while hit Ctrl-C. If it doesn't
> exit, zombie count will rise and system state will be as described.
> Compile with -lpthread.
>

I tried your program on the latest git tree and could not reproduce any problem.

(I changed it to 9 threads since I have 8 CPUs.)

The problem might be that your threads all fight over the same pipe, with
a mutex protecting its inode.


So mutex_lock() could possibly starve for more than 120 seconds?

Maybe you can reproduce the problem using standard read()/write() syscalls...
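
Something like this in place of the splice() loop, perhaps (untested
sketch, fd and macro names just for illustration):

static void *do_copy(void *unused)
{
	unsigned int i;
	char c;

	/* copy byte by byte with plain read()/write() instead of splice();
	 * pipe_fd, sock_fd and N as in the test program */
	for (i = 0; i < N; ++i) {
		if (read(pipe_fd[0], &c, 1) != 1)
			break;
		if (write(sock_fd[1], &c, 1) != 1)
			break;
	}

	return NULL;
}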

2009-01-18 12:12:56

by Vegard Nossum

Subject: Re: 2.6.27.9: splice_to_pipe() hung (blocked for more than 120 seconds)

On Fri, Jan 16, 2009 at 9:48 PM, Eric Dumazet <[email protected]> wrote:
>
> I tried your program on latest git tree and could not reproduce any problem.
>
> (changed to 9 threads since I have 8 cpus)

Hm. My machine has 2 CPUs. I just reproduced it on a more recent
kernel, this time from:

commit a6525042bfdfcab128bd91fad264de10fd24a55e
Date: Tue Jan 13 14:53:16 2009 -0800

with lockdep enabled, and no bad messages. So it seems that it is not
a deadlock at least...

> Problem might be that your threads all fight on the same pipe, with
> a mutex protecting its inode.
>
>
> So mutex_lock() could possibly starve for more than 120 second ?

Much longer. Last time this happened, the zombies stayed for many
hours (until I rebooted the machine).

> Maybe you can reproduce the problem using standard read()/write() syscalls...

I can try...

Tasks: 7 total, 0 running, 6 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.4%us, 6.0%sy, 0.0%ni, 92.6%id, 1.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1016180k total, 54596k used, 961584k free, 4084k buffers
Swap: 104380k total, 0k used, 104380k free, 20412k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3808 500 20 0 0 0 0 Z 0 0.0 0:00.00 a.out <defunct>
3809 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out
3810 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out
3813 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out
3815 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out
3817 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out
3821 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out

root@ubuntu:/home/debian# strace -p 3808
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted

root@ubuntu:/home/debian# strace -p 3809
Process 3809 attached - interrupt to quit
<nothing>
^C^C^C^C^C^C^C^C
<doesn't quit>

root@ubuntu:/home/debian# cat /proc/3808/syscall
0 0xbfa1f5b4 0xbfa1f5b4 0xc8bff4 0xbfa1f5b4 0x0 0xbfa1f5c8 0xbfa1f3f8 0xb8020424
root@ubuntu:/home/debian# cat /proc/3809/syscall
313 0x9 0x0 0x7 0x0 0x3 0x0 0xb80012cc 0xb8020424
root@ubuntu:/home/debian# cat /proc/3810/syscall
313 0x6 0x0 0x5 0x0 0x17f 0x0 0xb780037c 0xb8020424
root@ubuntu:/home/debian# cat /proc/3813/syscall
313 0xa 0x0 0x7 0x0 0x3 0x0 0xb67fe2cc 0xb8020424
root@ubuntu:/home/debian# cat /proc/3815/syscall
313 0x6 0x0 0x5 0x0 0x17f 0x0 0xb5ffd37c 0xb8020424
root@ubuntu:/home/debian# cat /proc/3817/syscall
313 0x8 0x0 0x7 0x0 0x3 0x0 0xb4ffb2cc 0xb8020424
root@ubuntu:/home/debian# cat /proc/3821/syscall
313 0x6 0x0 0x5 0x0 0x17f 0x0 0xb47fa37c 0xb8020424
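
If I read /proc/*/syscall right, the first field is the syscall number,
and 313 should be __NR_splice on this 32-bit box; i.e. all the D-state
threads are sitting inside sys_splice(). Quick check:

#include <stdio.h>
#include <sys/syscall.h>

int main(void)
{
	/* prints the splice() syscall number for this ABI; 313 here (i386) */
	printf("__NR_splice = %d\n", __NR_splice);
	return 0;
}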

This time I also managed to grab this:

SysRq : Show Locks Held

Showing all locks held in the system:
1 lock held by getty/2130:
#0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
1 lock held by getty/2131:
#0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
1 lock held by getty/2134:
#0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
1 lock held by getty/2138:
#0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
1 lock held by getty/2142:
#0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
1 lock held by getty/2143:
#0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
1 lock held by a.out/3809:
#0: (&sb->s_type->i_mutex_key#4){--..}, at: [<c10d5515>]
splice_to_pipe+0x25/0x260
2 locks held by a.out/3810:
#0: (&sb->s_type->i_mutex_key#11/2){--..}, at: [<c10d548c>]
splice_from_pipe+0x5c/0x90
#1: (&sb->s_type->i_mutex_key#4){--..}, at: [<c10be5ac>] pipe_wait+0x6c/0x80
1 lock held by a.out/3813:
#0: (&sb->s_type->i_mutex_key#4){--..}, at: [<c10d5515>]
splice_to_pipe+0x25/0x260
2 locks held by a.out/3815:
#0: (&sb->s_type->i_mutex_key#4/1){--..}, at: [<c10c97a2>]
inode_double_lock+0x32/0xb0
#1: (&sb->s_type->i_mutex_key#11/2){--..}, at: [<c10d548c>]
splice_from_pipe+0x5c/0x90
1 lock held by a.out/3817:
#0: (&sb->s_type->i_mutex_key#4){--..}, at: [<c10d5515>]
splice_to_pipe+0x25/0x260
1 lock held by a.out/3821:
#0: (&sb->s_type->i_mutex_key#4/1){--..}, at: [<c10c97a2>]
inode_double_lock+0x32/0xb0
2 locks held by bash/3916:
#0: (sysrq_key_table_lock){....}, at: [<c12297f7>] __handle_sysrq+0x17/0x140
#1: (tasklist_lock){..--}, at: [<c1059354>] debug_show_all_locks+0x34/0x180

=============================================

Anything else that can help debug this? Machine is still running, so...

Thanks :-)


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2009-01-18 13:44:34

by Vegard Nossum

Subject: Re: 2.6.27.9: splice_to_pipe() hung (blocked for more than 120 seconds)

On Sun, Jan 18, 2009 at 1:12 PM, Vegard Nossum <[email protected]> wrote:
> On Fri, Jan 16, 2009 at 9:48 PM, Eric Dumazet <[email protected]> wrote:
>>
>> I tried your program on latest git tree and could not reproduce any problem.
>>
>> (changed to 9 threads since I have 8 cpus)
>
> Hm. My machine has 2 CPUs. I just reproduced it on a more recent
> kernel, this time from:
>
> commit a6525042bfdfcab128bd91fad264de10fd24a55e
> Date: Tue Jan 13 14:53:16 2009 -0800
>
> with lockdep enabled, and no bad messages. So it seems that it is not
> a deadlock at least...
>
>> Problem might be that your threads all fight on the same pipe, with
>> a mutex protecting its inode.
>>
>>
>> So mutex_lock() could possibly starve for more than 120 second ?
>
> Much longer. Last time this happened, the zombies stayed for many
> hours (until I rebooted the machine).
>
>> Maybe you can reproduce the problem using standard read()/write() syscalls...
>
> I can try...
>
> Tasks: 7 total, 0 running, 6 sleeping, 0 stopped, 1 zombie
> Cpu(s): 0.4%us, 6.0%sy, 0.0%ni, 92.6%id, 1.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 1016180k total, 54596k used, 961584k free, 4084k buffers
> Swap: 104380k total, 0k used, 104380k free, 20412k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 3808 500 20 0 0 0 0 Z 0 0.0 0:00.00 a.out <defunct>
> 3809 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out
> 3810 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out
> 3813 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out
> 3815 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out
> 3817 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out
> 3821 500 20 0 0 0 0 D 0 0.0 0:00.00 a.out
>
> root@ubuntu:/home/debian# strace -p 3808
> attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
>
> root@ubuntu:/home/debian# strace -p 3809
> Process 3809 attached - interrupt to quit
> <nothing>
> ^C^C^C^C^C^C^C^C
> <doesn't quit>
>
> root@ubuntu:/home/debian# cat /proc/3808/syscall
> 0 0xbfa1f5b4 0xbfa1f5b4 0xc8bff4 0xbfa1f5b4 0x0 0xbfa1f5c8 0xbfa1f3f8 0xb8020424
> root@ubuntu:/home/debian# cat /proc/3809/syscall
> 313 0x9 0x0 0x7 0x0 0x3 0x0 0xb80012cc 0xb8020424
> root@ubuntu:/home/debian# cat /proc/3810/syscall
> 313 0x6 0x0 0x5 0x0 0x17f 0x0 0xb780037c 0xb8020424
> root@ubuntu:/home/debian# cat /proc/3813/syscall
> 313 0xa 0x0 0x7 0x0 0x3 0x0 0xb67fe2cc 0xb8020424
> root@ubuntu:/home/debian# cat /proc/3815/syscall
> 313 0x6 0x0 0x5 0x0 0x17f 0x0 0xb5ffd37c 0xb8020424
> root@ubuntu:/home/debian# cat /proc/3817/syscall
> 313 0x8 0x0 0x7 0x0 0x3 0x0 0xb4ffb2cc 0xb8020424
> root@ubuntu:/home/debian# cat /proc/3821/syscall
> 313 0x6 0x0 0x5 0x0 0x17f 0x0 0xb47fa37c 0xb8020424
>
> Also managed to grab this this time:
>
> SysRq : Show Locks Held
>
> Showing all locks held in the system:
> 1 lock held by getty/2130:
> #0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
> 1 lock held by getty/2131:
> #0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
> 1 lock held by getty/2134:
> #0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
> 1 lock held by getty/2138:
> #0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
> 1 lock held by getty/2142:
> #0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
> 1 lock held by getty/2143:
> #0: (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
> 1 lock held by a.out/3809:
> #0: (&sb->s_type->i_mutex_key#4){--..}, at: [<c10d5515>]
> splice_to_pipe+0x25/0x260
> 2 locks held by a.out/3810:
> #0: (&sb->s_type->i_mutex_key#11/2){--..}, at: [<c10d548c>]
> splice_from_pipe+0x5c/0x90
> #1: (&sb->s_type->i_mutex_key#4){--..}, at: [<c10be5ac>] pipe_wait+0x6c/0x80
> 1 lock held by a.out/3813:
> #0: (&sb->s_type->i_mutex_key#4){--..}, at: [<c10d5515>]
> splice_to_pipe+0x25/0x260
> 2 locks held by a.out/3815:
> #0: (&sb->s_type->i_mutex_key#4/1){--..}, at: [<c10c97a2>]
> inode_double_lock+0x32/0xb0
> #1: (&sb->s_type->i_mutex_key#11/2){--..}, at: [<c10d548c>]
> splice_from_pipe+0x5c/0x90
> 1 lock held by a.out/3817:
> #0: (&sb->s_type->i_mutex_key#4){--..}, at: [<c10d5515>]
> splice_to_pipe+0x25/0x260
> 1 lock held by a.out/3821:
> #0: (&sb->s_type->i_mutex_key#4/1){--..}, at: [<c10c97a2>]
> inode_double_lock+0x32/0xb0
> 2 locks held by bash/3916:
> #0: (sysrq_key_table_lock){....}, at: [<c12297f7>] __handle_sysrq+0x17/0x140
> #1: (tasklist_lock){..--}, at: [<c1059354>] debug_show_all_locks+0x34/0x180
>
> =============================================
>

I have one theory. We have this skeleton:

ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out,
                         loff_t *ppos, size_t len, unsigned int flags,
                         splice_actor *actor)
{
	...
	inode_double_lock(inode, pipe->inode);
	ret = __splice_from_pipe(pipe, &sd, actor);
	inode_double_unlock(inode, pipe->inode);
	...
}

ssize_t __splice_from_pipe(struct pipe_inode_info *pipe, struct splice_desc *sd,
                           splice_actor *actor)
{
	...
	pipe_wait(pipe);
	...
}

void pipe_wait(struct pipe_inode_info *pipe)
{
	if (pipe->inode)
		mutex_unlock(&pipe->inode->i_mutex);
	...
	if (pipe->inode)
		mutex_lock(&pipe->inode->i_mutex);
}

So in short: Is it possible that inode_double_lock() in
splice_from_pipe() first locks the pipe mutex, THEN locks the
file/socket mutex? In that case, there should be a lock imbalance,
because pipe_wait() would unlock the pipe while the file/socket mutex
is held.

That would possibly explain the sporadicity of the lockup; it depends
on the actual order of the double lock.
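
For reference (quoting inode_double_lock() from memory, so treat this as
a sketch rather than the exact source), it picks the lock order by
comparing the two inode pointers, so which mutex goes first depends on
which inode happens to have the lower address:

void inode_double_lock(struct inode *inode1, struct inode *inode2)
{
	if (inode1 == NULL || inode2 == NULL || inode1 == inode2) {
		if (inode1)
			mutex_lock(&inode1->i_mutex);
		else if (inode2)
			mutex_lock(&inode2->i_mutex);
		return;
	}

	if (inode1 < inode2) {
		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_PARENT);
		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_CHILD);
	} else {
		mutex_lock_nested(&inode2->i_mutex, I_MUTEX_PARENT);
		mutex_lock_nested(&inode1->i_mutex, I_MUTEX_CHILD);
	}
}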

Why doesn't lockdep report that? Hm. I guess it is because these are
both inode mutexes and lockdep can't detect a locking imbalance within
the same lock class?

Anyway, that's just a theory. :-) Will try to confirm by simplifying
the test-case.
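
In the meantime, here is a userspace analogy of the interleaving I have
in mind (my own sketch; two plain pthread mutexes stand in for the pipe
and socket i_mutex, compile with -lpthread, and the program is expected
to hang if the theory holds):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t pipe_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sock_mtx = PTHREAD_MUTEX_INITIALIZER;

/* plays splice_from_pipe(): double lock, then the pipe_wait() dance */
static void *thread_a(void *unused)
{
	pthread_mutex_lock(&pipe_mtx);     /* inode_double_lock(): pipe ... */
	pthread_mutex_lock(&sock_mtx);     /* ... then socket */
	pthread_mutex_unlock(&pipe_mtx);   /* pipe_wait(): drop the pipe mutex */
	sleep(1);                          /* "waiting", sock_mtx still held */
	pthread_mutex_lock(&pipe_mtx);     /* pipe_wait(): re-take it -> blocks on B */
	pthread_mutex_unlock(&pipe_mtx);
	pthread_mutex_unlock(&sock_mtx);
	return NULL;
}

/* plays a second splicer arriving while A is inside its "pipe_wait" */
static void *thread_b(void *unused)
{
	usleep(100000);                    /* let A reach the sleep above */
	pthread_mutex_lock(&pipe_mtx);     /* double lock: pipe ... */
	pthread_mutex_lock(&sock_mtx);     /* ... then socket -> blocks on A */
	pthread_mutex_unlock(&sock_mtx);
	pthread_mutex_unlock(&pipe_mtx);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, thread_a, NULL);
	pthread_create(&b, NULL, thread_b, NULL);
	pthread_join(a, NULL);             /* never returns if the two threads deadlock */
	pthread_join(b, NULL);
	printf("no deadlock this time\n");
	return 0;
}

Thread A ends up waiting for pipe_mtx while holding sock_mtx, and thread
B waits for sock_mtx while holding pipe_mtx, which is exactly the ABBA
situation described above.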


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2009-01-18 14:10:23

by Vegard Nossum

Subject: Re: 2.6.27.9: splice_to_pipe() hung (blocked for more than 120 seconds)

On Sun, Jan 18, 2009 at 2:44 PM, Vegard Nossum <[email protected]> wrote:
> So in short: Is it possible that inode_double_lock() in
> splice_from_pipe() first locks the pipe mutex, THEN locks the
> file/socket mutex? In that case, there should be a lock imbalance,
> because pipe_wait() would unlock the pipe while the file/socket mutex
> is held.
>
> That would possibly explain the sporadicity of the lockup; it depends
> on the actual order of the double lock.
>
> Why doesn't lockdep report that? Hm. I guess it is because these are
> both inode mutexes and lockdep can't detect a locking imbalance within
> the same lock class?
>
> Anyway, that's just a theory. :-) Will try to confirm by simplifying
> the test-case.

Hm, I do believe this _is_ evidence in favour of the theory:

top - 09:03:57 up 2:16, 2 users, load average: 129.27, 49.28, 21.57
Tasks: 161 total, 1 running, 95 sleeping, 1 stopped, 64 zombie

:-)

#define _GNU_SOURCE

#include <sys/socket.h>
#include <sys/types.h>

#include <fcntl.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int sock_fd[2];
static int pipe_fd[2];

#define N 16384

static void *do_write(void *unused)
{
	unsigned int i;

	for (i = 0; i < N; ++i)
		write(pipe_fd[1], "x", 1);

	return NULL;
}

static void *do_read(void *unused)
{
	unsigned int i;
	char c;

	for (i = 0; i < N; ++i)
		read(sock_fd[0], &c, 1);

	return NULL;
}

static void *do_splice(void *unused)
{
	unsigned int i;

	for (i = 0; i < N; ++i)
		splice(pipe_fd[0], NULL, sock_fd[1], NULL, 1, 0);

	return NULL;
}

int main(int argc, char *argv[])
{
	pthread_t writer;
	pthread_t reader;
	pthread_t splicer[2];

	while (1) {
		if (socketpair(AF_UNIX, SOCK_STREAM, 0, sock_fd) == -1)
			exit(EXIT_FAILURE);

		if (pipe(pipe_fd) == -1)
			exit(EXIT_FAILURE);

		pthread_create(&writer, NULL, &do_write, NULL);
		pthread_create(&reader, NULL, &do_read, NULL);
		pthread_create(&splicer[0], NULL, &do_splice, NULL);
		pthread_create(&splicer[1], NULL, &do_splice, NULL);

		pthread_join(writer, NULL);
		pthread_join(reader, NULL);
		pthread_join(splicer[0], NULL);
		pthread_join(splicer[1], NULL);

		printf("failed to deadlock, retrying...\n");
	}

	return EXIT_SUCCESS;
}

$ gcc splice.c -lpthread
$ ./a.out &
$ ./a.out &
$ ./a.out &
(as many as you want; then wait for a bit -- ten seconds works for me)
$ killall -9 a.out
(not all will die -- those are now zombies)


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2009-01-19 13:59:19

by Jarek Poplawski

[permalink] [raw]
Subject: Re: 2.6.27.9: splice_to_pipe() hung (blocked for more than 120 seconds)

On 18-01-2009 14:44, Vegard Nossum wrote:
...
>
> I have one theory. We have this skeleton:
>
> ssize_t splice_from_pipe(struct pipe_inode_info *pipe, struct file *out,
>                          loff_t *ppos, size_t len, unsigned int flags,
>                          splice_actor *actor)
> {
> 	...
> 	inode_double_lock(inode, pipe->inode);
> 	ret = __splice_from_pipe(pipe, &sd, actor);
> 	inode_double_unlock(inode, pipe->inode);
> 	...
> }
>
> ssize_t __splice_from_pipe(struct pipe_inode_info *pipe, struct splice_desc *sd,
>                            splice_actor *actor)
> {
> 	...
> 	pipe_wait(pipe);
> 	...
> }
>
> void pipe_wait(struct pipe_inode_info *pipe)
> {
> 	if (pipe->inode)
> 		mutex_unlock(&pipe->inode->i_mutex);
> 	...
> 	if (pipe->inode)
> 		mutex_lock(&pipe->inode->i_mutex);
> }
>
> So in short: Is it possible that inode_double_lock() in
> splice_from_pipe() first locks the pipe mutex, THEN locks the
> file/socket mutex? In that case, there should be a lock imbalance,
> because pipe_wait() would unlock the pipe while the file/socket mutex
> is held.

I guess you mean a lock inversion.

>
> That would possibly explain the sporadicity of the lockup; it depends
> on the actual order of the double lock.
>
> Why doesn't lockdep report that? Hm. I guess it is because these are
> both inode mutexes and lockdep can't detect a locking imbalance within
> the same lock class?

Looks like you are right. Since mutex_lock_nested() is used for these
locks in inode_double_lock(), lockdep could be misled by the plain
mutex_lock() later (but I didn't check this too closely).

Jarek P.