2013-10-01 14:58:10

by Dave Jones

[permalink] [raw]
Subject: pipe/cred lockdep warning

I seem to recall hitting this quite a while ago. Does it look familiar ?
Either it didn't get fixed, or it's back..

Dave

[ 2836.628351] ======================================================
[ 2836.628392] [ INFO: possible circular locking dependency detected ]
[ 2836.628436] 3.12.0-rc3+ #90 Not tainted
[ 2836.628464] -------------------------------------------------------
[ 2836.628506] trinity-child0/30759 is trying to acquire lock:
[ 2836.628544] (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff81239045>] proc_pid_attr_write+0xf5/0x140
[ 2836.628624]
but task is already holding lock:
[ 2836.628664] (&pipe->mutex/1){+.+.+.}, at: [<ffffffff811ccf66>] pipe_lock+0x26/0x30
[ 2836.628734]
which lock already depends on the new lock.

[ 2836.628787]
the existing dependency chain (in reverse order) is:
[ 2836.628837]
-> #2 (&pipe->mutex/1){+.+.+.}:
[ 2836.628888] [<ffffffff810cc1e3>] lock_acquire+0x93/0x200
[ 2836.628933] [<ffffffff8172de5c>] mutex_lock_nested+0x7c/0x420
[ 2836.628981] [<ffffffff811ccf66>] pipe_lock+0x26/0x30
[ 2836.629023] [<ffffffff811f6e94>] generic_file_splice_write+0x64/0x170
[ 2836.629076] [<ffffffffa02dc710>] xfs_file_splice_write+0xb0/0x230 [xfs]
[ 2836.629152] [<ffffffff811f8dca>] SyS_splice+0x24a/0x7e0
[ 2836.629196] [<ffffffff8173ce24>] tracesys+0xdd/0xe2
[ 2836.629238]
-> #1 (&(&ip->i_iolock)->mr_lock){++++++}:
[ 2836.629291] [<ffffffff810cc1e3>] lock_acquire+0x93/0x200
[ 2836.629335] [<ffffffff81089562>] down_read_nested+0x52/0xa0
[ 2836.630980] [<ffffffffa033b7a0>] xfs_ilock+0x1d0/0x280 [xfs]
[ 2836.632649] [<ffffffffa02dc161>] xfs_file_aio_read+0x111/0x3e0 [xfs]
[ 2836.634313] [<ffffffff811c3f1a>] do_sync_read+0x5a/0x90
[ 2836.635957] [<ffffffff811c4551>] vfs_read+0xa1/0x170
[ 2836.637581] [<ffffffff811ca911>] kernel_read+0x41/0x60
[ 2836.639184] [<ffffffff811cab33>] prepare_binprm+0xb3/0x170
[ 2836.640768] [<ffffffff811cc733>] do_execve_common.isra.28+0x5a3/0x990
[ 2836.642344] [<ffffffff811cce06>] SyS_execve+0x36/0x50
[ 2836.643904] [<ffffffff8173d209>] stub_execve+0x69/0xa0
[ 2836.645442]
-> #0 (&sig->cred_guard_mutex){+.+.+.}:
[ 2836.648483] [<ffffffff810cb76a>] __lock_acquire+0x191a/0x1be0
[ 2836.650025] [<ffffffff810cc1e3>] lock_acquire+0x93/0x200
[ 2836.651559] [<ffffffff8172eba7>] mutex_lock_interruptible_nested+0x77/0x510
[ 2836.653110] [<ffffffff81239045>] proc_pid_attr_write+0xf5/0x140
[ 2836.654659] [<ffffffff811c4f92>] __kernel_write+0x72/0x150
[ 2836.656208] [<ffffffff811f7269>] write_pipe_buf+0x49/0x70
[ 2836.657750] [<ffffffff811f6894>] splice_from_pipe_feed+0x84/0x140
[ 2836.659310] [<ffffffff811f6c3e>] __splice_from_pipe+0x6e/0x90
[ 2836.660868] [<ffffffff811f8731>] splice_from_pipe+0x51/0x70
[ 2836.662423] [<ffffffff811f8789>] default_file_splice_write+0x19/0x30
[ 2836.663979] [<ffffffff811f8dca>] SyS_splice+0x24a/0x7e0
[ 2836.665518] [<ffffffff8173ce24>] tracesys+0xdd/0xe2
[ 2836.666999]
other info that might help us debug this:

[ 2836.671264] Chain exists of:
&sig->cred_guard_mutex --> &(&ip->i_iolock)->mr_lock --> &pipe->mutex/1

[ 2836.675587] Possible unsafe locking scenario:

[ 2836.678443] CPU0 CPU1
[ 2836.679832] ---- ----
[ 2836.681187] lock(&pipe->mutex/1);
[ 2836.682548] lock(&(&ip->i_iolock)->mr_lock);
[ 2836.683938] lock(&pipe->mutex/1);
[ 2836.685309] lock(&sig->cred_guard_mutex);
[ 2836.686670]
*** DEADLOCK ***

[ 2836.690616] 2 locks held by trinity-child0/30759:
[ 2836.691921] #0: (sb_writers#4){.+.+.+}, at: [<ffffffff811f92b1>] SyS_splice+0x731/0x7e0
[ 2836.693264] #1: (&pipe->mutex/1){+.+.+.}, at: [<ffffffff811ccf66>] pipe_lock+0x26/0x30
[ 2836.694597]
stack backtrace:
[ 2836.697165] CPU: 0 PID: 30759 Comm: trinity-child0 Not tainted 3.12.0-rc3+ #90
[ 2836.699886] ffffffff825329d0 ffff88009cb61b88 ffffffff81729d13 ffffffff82504b80
[ 2836.701295] ffff88009cb61bc8 ffffffff817260f1 ffff88009cb61c20 ffff88009caa5cb8
[ 2836.702723] ffff88009caa5580 0000000000000001 ffff88009caa5c80 ffff88009caa5cb8
[ 2836.704149] Call Trace:
[ 2836.705547] [<ffffffff81729d13>] dump_stack+0x4e/0x82
[ 2836.706958] [<ffffffff817260f1>] print_circular_bug+0x200/0x20f
[ 2836.708365] [<ffffffff810cb76a>] __lock_acquire+0x191a/0x1be0
[ 2836.709778] [<ffffffff810cc1e3>] lock_acquire+0x93/0x200
[ 2836.711184] [<ffffffff81239045>] ? proc_pid_attr_write+0xf5/0x140
[ 2836.712596] [<ffffffff81239045>] ? proc_pid_attr_write+0xf5/0x140
[ 2836.713983] [<ffffffff8172eba7>] mutex_lock_interruptible_nested+0x77/0x510
[ 2836.715382] [<ffffffff81239045>] ? proc_pid_attr_write+0xf5/0x140
[ 2836.716785] [<ffffffff81239045>] ? proc_pid_attr_write+0xf5/0x140
[ 2836.718171] [<ffffffff811a5b99>] ? alloc_pages_current+0xa9/0x170
[ 2836.719554] [<ffffffff81239045>] proc_pid_attr_write+0xf5/0x140
[ 2836.720926] [<ffffffff811f7220>] ? splice_direct_to_actor+0x1f0/0x1f0
[ 2836.722303] [<ffffffff811c4f92>] __kernel_write+0x72/0x150
[ 2836.723680] [<ffffffff811f7220>] ? splice_direct_to_actor+0x1f0/0x1f0
[ 2836.725052] [<ffffffff811f7269>] write_pipe_buf+0x49/0x70
[ 2836.726420] [<ffffffff811f6894>] splice_from_pipe_feed+0x84/0x140
[ 2836.727805] [<ffffffff811f7220>] ? splice_direct_to_actor+0x1f0/0x1f0
[ 2836.729199] [<ffffffff811f6c3e>] __splice_from_pipe+0x6e/0x90
[ 2836.730589] [<ffffffff811f7220>] ? splice_direct_to_actor+0x1f0/0x1f0
[ 2836.731983] [<ffffffff811f8731>] splice_from_pipe+0x51/0x70
[ 2836.733373] [<ffffffff811f8789>] default_file_splice_write+0x19/0x30
[ 2836.734771] [<ffffffff811f8dca>] SyS_splice+0x24a/0x7e0
[ 2836.736161] [<ffffffff8173ce24>] tracesys+0xdd/0xe2


2013-10-03 18:56:34

by Al Viro

[permalink] [raw]
Subject: Re: pipe/cred lockdep warning

On Tue, Oct 01, 2013 at 10:57:55AM -0400, Dave Jones wrote:
> I seem to recall hitting this quite a while ago. Does it look familiar ?
> Either it didn't get fixed, or it's back..

> [ 2836.628506] trinity-child0/30759 is trying to acquire lock:
> [ 2836.628544] (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff81239045>] proc_pid_attr_write+0xf5/0x140
> [ 2836.628624]
> but task is already holding lock:
> [ 2836.628664] (&pipe->mutex/1){+.+.+.}, at: [<ffffffff811ccf66>] pipe_lock+0x26/0x30
> [ 2836.628734]

splice to /proc/<pid>/attr/<something> [cred_guard shite under pipe->mutex]

vs.

splice from XFS [pipe->mutex under exclusive iolock]

vs.

read from XFS in execve() [shared iolock under cred_guard]

Note, BTW, that splice to /proc/<pid>/attr/<something> is broken.
proc_pid_attr_write() is *not* supposed to allow partial writes at all.
Frankly, I'd consider adding a ->splice_write() instance that would
simply return -EINVAL there...

2013-10-04 23:27:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: pipe/cred lockdep warning

On Thu, Oct 3, 2013 at 11:56 AM, Al Viro <[email protected]> wrote:
>
> Note, BTW, that splice to /proc/<pid>/attr/<something> is broken.
> proc_pid_attr_write() is *not* supposed to allow partial writes at all.
> Frankly, I'd consider adding a ->splice_write() instance that would
> simply return -EINVAL there...

That sounds like the right thing to do.

Or possibly go even further, and say that the default is to return
-EINVAL, and files and filesystems that actually want the
"default_file_splice_write()" semantics have to say so in their d_op
structure.

Hmm?

Linus

2013-10-05 00:46:16

by Al Viro

[permalink] [raw]
Subject: Re: pipe/cred lockdep warning

On Fri, Oct 04, 2013 at 04:27:35PM -0700, Linus Torvalds wrote:
> On Thu, Oct 3, 2013 at 11:56 AM, Al Viro <[email protected]> wrote:
> >
> > Note, BTW, that splice to /proc/<pid>/attr/<something> is broken.
> > proc_pid_attr_write() is *not* supposed to allow partial writes at all.
> > Frankly, I'd consider adding a ->splice_write() instance that would
> > simply return -EINVAL there...
>
> That sounds like the right thing to do.
>
> Or possibly go even further, and say that the default is to return
> -EINVAL, and files and filesystems that actually want the
> "default_file_splice_write()" semantics have to say so in their d_op
> structure.
>
> Hmm?

f_op, unfortunately... That's going to be a _lot_ of churn, I'm afraid ;-/
We have 2002 instances of file_operations in the tree. 774 of those
have ->write. 60 have both ->aio_write and ->write, 4 - only ->aio_write.

->splice_write is present only for 25. Which leaves 753 instances of
file_operations to review and decide whether we keep the current behaviour
or start giving -EINVAL. In a lot of those the answer will be EINVAL,
but which ones those are?

We could, I suppose, generate a commit that would add default_file_splice_write
to those 753 instances and then get rid of them one by one (e.g. provide
the same thing under different name and use that name in already reviewed
cases if behaviour is right, so that we could keep track of what's left
to do), but...
a) if we go that way, I would suggest doing the first commit as
right-after-rc1 special
b) I'd expect that review and removal to take at least a full cycle
and contain quite a few mistaken decisions ;-/

2013-10-05 01:42:36

by Al Viro

[permalink] [raw]
Subject: Re: pipe/cred lockdep warning

On Fri, Oct 04, 2013 at 06:25:00PM -0700, Linus Torvalds wrote:

> Any splice user has better have a fallback to a read/write loop anyway, so
> I think the default of trying to splice was likely a bad decision.

Ummm... Point.

> That said, you're right that it's a big change. And maybe we don't really
> care enough, and we should just make sure sysfs and proc get the EINVAL
> return. So it's not like I have a really strong preference.
>
> But I *definitely* don't want something that just sets every f_op to use
> the default splice write. That's just equivalent to our current default to
> possibly bad behavior..

Well, I meant something like this: at 3.13-rc1 we add two exported
functions calling default_file_splice_write() - fallback_file_splice_write()
and this_fucker_will_be_gone_in_3_14_rc1_splice_write(). And do a single
commit slapping the latter in those file_operations. After that NULL means
-EINVAL. At which point maintainers are welcome to review the damn thing and
either remove the initializer if for this instance it makes no sense, or
switch it to something that will stay. Maybe fallback_file_splice_write(),
maybe something else - up to them. I wouldn't be surprised if in some cases
generic_file_splice_write() would turn out to be the right answer. In any
case, in 3.14-rc1 the function gets treatment according to its name and
those who hadn't bothered to switch are left with obvious build errors.
Should be a sufficient incentive to get off their arses and deal with that by
the time 3.14 comes...

It would work and it would avoid incompatible changes, but I agree that
anything using splice in userland ought to have a fallback... Dunno.
It's definitely not something to be done before .13-rc1, so we have time
to discuss that anyway.