2012-05-01 12:04:52

by Sasha Levin

[permalink] [raw]
Subject: WARNING: at include/linux/iocontext.h:140 copy_io+0xb9/0x130()

Hi all,

I've stumbled on the warning at the bottom with today's linux-next within a KVM guest doing syscall fuzzing using trinity.

I haven't encountered this warning before, and it looks like it's not easy to reproduce it within the guest. Looking at the code in that area I guess it's some sorts of a rare race condition.

The only recent commits in that area are the work done on blk cgroup. It is enabled in the test kernel.

[ 418.101833] WARNING: at include/linux/iocontext.h:140 copy_io+0xb9/0x130()
[ 418.108177] Pid: 29267, comm: trinity Tainted: G W 3.4.0-rc5-next-20120501-sasha #104
[ 418.108180] Call Trace:
[ 418.108186] [<ffffffff810b6c67>] warn_slowpath_common+0x87/0xb0
[ 418.108189] [<ffffffff810b6ca5>] warn_slowpath_null+0x15/0x20
[ 418.108192] [<ffffffff810b44d9>] copy_io+0xb9/0x130
[ 418.108196] [<ffffffff810b5ae4>] copy_process+0x884/0xf90
[ 418.108200] [<ffffffff810f17c5>] ? sched_clock_local+0x25/0x90
[ 418.108203] [<ffffffff810b65b7>] do_fork+0x137/0x240
[ 418.108208] [<ffffffff810e808e>] ? sub_preempt_count+0xae/0xf0
[ 418.108212] [<ffffffff82d92cf9>] ? _raw_spin_unlock_irq+0x59/0x80
[ 418.108216] [<ffffffff82d93be5>] ? sysret_check+0x22/0x5d
[ 418.108219] [<ffffffff81057b13>] sys_clone+0x23/0x30
[ 418.108222] [<ffffffff82d93f43>] stub_clone+0x13/0x20
[ 418.108225] [<ffffffff82d93bb9>] ? system_call_fastpath+0x16/0x1b
[ 418.108228] ---[ end trace 8f6ca168297608bb ]---


2012-05-01 16:17:36

by Tejun Heo

[permalink] [raw]
Subject: [PATCH v3.4-rc5] block: iocontext->nr_tasks should be initialized to one

create_task_io_context() left ioc->nr_tasks at zero; however, a newly
created ioc should have its nr_tasks initialized to one as it begins
attached to the task creating it.

This affects only CLONE_IO which currently doesn't seem to have any
actual user. Sasha triggered WARN_ON_ONCE() in ioc_task_link() using
syscall fuzzer. Even when it happens, the failure mode isn't critical
(blk-cgroup may allow attaching a CLONE_IO'd task to a cgroup when it
shouldn't and blkcg limits may behave weirdly).

Fix it by initializing it to one in create_task_io_context().

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Sasha Levin <[email protected]>
LKML-Reference: <1335873936.16988.148.camel@lappy>
Cc: [email protected]
---
block/blk-ioc.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 1e2d53b..c942409 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -244,6 +244,7 @@ int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node)
/* initialize */
atomic_long_set(&ioc->refcount, 1);
atomic_set(&ioc->active_ref, 1);
+ atomic_set(&ioc->nr_tasks, 1);
spin_lock_init(&ioc->lock);
INIT_RADIX_TREE(&ioc->icq_tree, GFP_ATOMIC | __GFP_HIGH);
INIT_HLIST_HEAD(&ioc->icq_list);

2012-05-01 18:02:46

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v3.4-rc5] block: iocontext->nr_tasks should be initialized to one

On 2012-05-01 18:17, Tejun Heo wrote:
> create_task_io_context() left ioc->nr_tasks at zero; however, a newly
> created ioc should have its nr_tasks initialized to one as it begins
> attached to the task creating it.
>
> This affects only CLONE_IO which currently doesn't seem to have any
> actual user. Sasha triggered WARN_ON_ONCE() in ioc_task_link() using
> syscall fuzzer. Even when it happens, the failure mode isn't critical
> (blk-cgroup may allow attaching a CLONE_IO'd task to a cgroup when it
> shouldn't and blkcg limits may behave weirdly).

CLONE_IO is an exported interface, it can be set from clone(2).
Otherwise Sasha would not have hit this :-)

Thanks, applied.

--
Jens Axboe

2012-05-01 18:05:04

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v3.4-rc5] block: iocontext->nr_tasks should be initialized to one

On 2012-05-01 18:17, Tejun Heo wrote:
> create_task_io_context() left ioc->nr_tasks at zero; however, a newly
> created ioc should have its nr_tasks initialized to one as it begins
> attached to the task creating it.
>
> This affects only CLONE_IO which currently doesn't seem to have any
> actual user. Sasha triggered WARN_ON_ONCE() in ioc_task_link() using
> syscall fuzzer. Even when it happens, the failure mode isn't critical
> (blk-cgroup may allow attaching a CLONE_IO'd task to a cgroup when it
> shouldn't and blkcg limits may behave weirdly).
>
> Fix it by initializing it to one in create_task_io_context().
>
> Signed-off-by: Tejun Heo <[email protected]>
> Reported-by: Sasha Levin <[email protected]>
> LKML-Reference: <1335873936.16988.148.camel@lappy>
> Cc: [email protected]

BTW, this only affects for-3.5/core, it's not a mainline bug. So I've
dropped the stable CC.

--
Jens Axboe

2012-05-01 18:06:31

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v3.4-rc5] block: iocontext->nr_tasks should be initialized to one

On Tue, May 01, 2012 at 08:04:58PM +0200, Jens Axboe wrote:
> On 2012-05-01 18:17, Tejun Heo wrote:
> > create_task_io_context() left ioc->nr_tasks at zero; however, a newly
> > created ioc should have its nr_tasks initialized to one as it begins
> > attached to the task creating it.
> >
> > This affects only CLONE_IO which currently doesn't seem to have any
> > actual user. Sasha triggered WARN_ON_ONCE() in ioc_task_link() using
> > syscall fuzzer. Even when it happens, the failure mode isn't critical
> > (blk-cgroup may allow attaching a CLONE_IO'd task to a cgroup when it
> > shouldn't and blkcg limits may behave weirdly).
> >
> > Fix it by initializing it to one in create_task_io_context().
> >
> > Signed-off-by: Tejun Heo <[email protected]>
> > Reported-by: Sasha Levin <[email protected]>
> > LKML-Reference: <1335873936.16988.148.camel@lappy>
> > Cc: [email protected]
>
> BTW, this only affects for-3.5/core, it's not a mainline bug. So I've
> dropped the stable CC.

Ah, sorry about that. Got confused which one got in when.

Thanks.

--
tejun

2012-05-01 18:09:13

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v3.4-rc5] block: iocontext->nr_tasks should be initialized to one

On Tue, May 01, 2012 at 08:02:39PM +0200, Jens Axboe wrote:
> On 2012-05-01 18:17, Tejun Heo wrote:
> > create_task_io_context() left ioc->nr_tasks at zero; however, a newly
> > created ioc should have its nr_tasks initialized to one as it begins
> > attached to the task creating it.
> >
> > This affects only CLONE_IO which currently doesn't seem to have any
> > actual user. Sasha triggered WARN_ON_ONCE() in ioc_task_link() using
> > syscall fuzzer. Even when it happens, the failure mode isn't critical
> > (blk-cgroup may allow attaching a CLONE_IO'd task to a cgroup when it
> > shouldn't and blkcg limits may behave weirdly).
>
> CLONE_IO is an exported interface, it can be set from clone(2).
> Otherwise Sasha would not have hit this :-)

Yeah, but with pthread not exposing it, I'm very skeptical how much,
if any, use it's getting. With its incompatibility with blk-cgroup
and cfq being able to merge coop request streams, I'm not sure how
much we need it. Maybe we can just make it noop?

Thanks.

--
tejun

2012-05-01 18:18:27

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v3.4-rc5] block: iocontext->nr_tasks should be initialized to one

On 2012-05-01 20:09, Tejun Heo wrote:
> On Tue, May 01, 2012 at 08:02:39PM +0200, Jens Axboe wrote:
>> On 2012-05-01 18:17, Tejun Heo wrote:
>>> create_task_io_context() left ioc->nr_tasks at zero; however, a newly
>>> created ioc should have its nr_tasks initialized to one as it begins
>>> attached to the task creating it.
>>>
>>> This affects only CLONE_IO which currently doesn't seem to have any
>>> actual user. Sasha triggered WARN_ON_ONCE() in ioc_task_link() using
>>> syscall fuzzer. Even when it happens, the failure mode isn't critical
>>> (blk-cgroup may allow attaching a CLONE_IO'd task to a cgroup when it
>>> shouldn't and blkcg limits may behave weirdly).
>>
>> CLONE_IO is an exported interface, it can be set from clone(2).
>> Otherwise Sasha would not have hit this :-)
>
> Yeah, but with pthread not exposing it, I'm very skeptical how much,
> if any, use it's getting. With its incompatibility with blk-cgroup
> and cfq being able to merge coop request streams, I'm not sure how
> much we need it. Maybe we can just make it noop?

It's a lot more robust and specific than hoping to get coop merging. For
cfq, it also implies that multiple threads sharing an io context should
be accounted as one.

But as to actual users, I really don't know. I agree it's probably not
that widely used. If google still had that code search, we could get a
better idea :-)

--
Jens Axboe

2012-05-01 18:31:13

by Jeff Moyer

[permalink] [raw]
Subject: Re: [PATCH v3.4-rc5] block: iocontext->nr_tasks should be initialized to one

Jens Axboe <[email protected]> writes:

> On 2012-05-01 20:09, Tejun Heo wrote:
>> On Tue, May 01, 2012 at 08:02:39PM +0200, Jens Axboe wrote:
>>> On 2012-05-01 18:17, Tejun Heo wrote:
>>>> create_task_io_context() left ioc->nr_tasks at zero; however, a newly
>>>> created ioc should have its nr_tasks initialized to one as it begins
>>>> attached to the task creating it.
>>>>
>>>> This affects only CLONE_IO which currently doesn't seem to have any
>>>> actual user. Sasha triggered WARN_ON_ONCE() in ioc_task_link() using
>>>> syscall fuzzer. Even when it happens, the failure mode isn't critical
>>>> (blk-cgroup may allow attaching a CLONE_IO'd task to a cgroup when it
>>>> shouldn't and blkcg limits may behave weirdly).
>>>
>>> CLONE_IO is an exported interface, it can be set from clone(2).
>>> Otherwise Sasha would not have hit this :-)
>>
>> Yeah, but with pthread not exposing it, I'm very skeptical how much,
>> if any, use it's getting. With its incompatibility with blk-cgroup
>> and cfq being able to merge coop request streams, I'm not sure how
>> much we need it. Maybe we can just make it noop?
>
> It's a lot more robust and specific than hoping to get coop merging. For
> cfq, it also implies that multiple threads sharing an io context should
> be accounted as one.
>
> But as to actual users, I really don't know. I agree it's probably not
> that widely used. If google still had that code search, we could get a
> better idea :-)

I know of one project: the venerable dump/restore utility uses CLONE_IO.

Cheers,
Jeff

2012-05-01 18:36:15

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH v3.4-rc5] block: iocontext->nr_tasks should be initialized to one

On Tue, May 01, 2012 at 02:31:07PM -0400, Jeff Moyer wrote:

[..]
> > But as to actual users, I really don't know. I agree it's probably not
> > that widely used. If google still had that code search, we could get a
> > better idea :-)
>
> I know of one project: the venerable dump/restore utility uses CLONE_IO.

I thought you wrote cooperating queue logic to fix dump as it was not
using CLONE_IO and IO from multiple threads was going in separate
queues.

Thanks
Vivek

2012-05-01 18:37:42

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v3.4-rc5] block: iocontext->nr_tasks should be initialized to one

On 2012-05-01 20:31, Jeff Moyer wrote:
> Jens Axboe <[email protected]> writes:
>
>> On 2012-05-01 20:09, Tejun Heo wrote:
>>> On Tue, May 01, 2012 at 08:02:39PM +0200, Jens Axboe wrote:
>>>> On 2012-05-01 18:17, Tejun Heo wrote:
>>>>> create_task_io_context() left ioc->nr_tasks at zero; however, a newly
>>>>> created ioc should have its nr_tasks initialized to one as it begins
>>>>> attached to the task creating it.
>>>>>
>>>>> This affects only CLONE_IO which currently doesn't seem to have any
>>>>> actual user. Sasha triggered WARN_ON_ONCE() in ioc_task_link() using
>>>>> syscall fuzzer. Even when it happens, the failure mode isn't critical
>>>>> (blk-cgroup may allow attaching a CLONE_IO'd task to a cgroup when it
>>>>> shouldn't and blkcg limits may behave weirdly).
>>>>
>>>> CLONE_IO is an exported interface, it can be set from clone(2).
>>>> Otherwise Sasha would not have hit this :-)
>>>
>>> Yeah, but with pthread not exposing it, I'm very skeptical how much,
>>> if any, use it's getting. With its incompatibility with blk-cgroup
>>> and cfq being able to merge coop request streams, I'm not sure how
>>> much we need it. Maybe we can just make it noop?
>>
>> It's a lot more robust and specific than hoping to get coop merging. For
>> cfq, it also implies that multiple threads sharing an io context should
>> be accounted as one.
>>
>> But as to actual users, I really don't know. I agree it's probably not
>> that widely used. If google still had that code search, we could get a
>> better idea :-)
>
> I know of one project: the venerable dump/restore utility uses CLONE_IO.

Thanks Jeff, now I remember the specifics of what we tested. IIRC, we
also did numbers back then comparing the coop merging vs specifically
using CLONE_IO.

--
Jens Axboe

2012-05-01 18:50:07

by Jeff Moyer

[permalink] [raw]
Subject: Re: [PATCH v3.4-rc5] block: iocontext->nr_tasks should be initialized to one

Jens Axboe <[email protected]> writes:

> On 2012-05-01 20:31, Jeff Moyer wrote:
>> Jens Axboe <[email protected]> writes:
>> I know of one project: the venerable dump/restore utility uses CLONE_IO.
>
> Thanks Jeff, now I remember the specifics of what we tested. IIRC, we
> also did numbers back then comparing the coop merging vs specifically
> using CLONE_IO.

Maybe? I picked through some old emails but couldn't find that exact
comparison. It would have been a smart thing to do, though!

-Jeff

2012-05-01 18:59:13

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH v3.4-rc5] block: iocontext->nr_tasks should be initialized to one

On Tue, May 01, 2012 at 02:48:41PM -0400, Jeff Moyer wrote:
> Vivek Goyal <[email protected]> writes:
>
> > On Tue, May 01, 2012 at 02:31:07PM -0400, Jeff Moyer wrote:
> >
> > [..]
> >> > But as to actual users, I really don't know. I agree it's probably not
> >> > that widely used. If google still had that code search, we could get a
> >> > better idea :-)
> >>
> >> I know of one project: the venerable dump/restore utility uses CLONE_IO.
> >
> > I thought you wrote cooperating queue logic to fix dump as it was not
> > using CLONE_IO and IO from multiple threads was going in separate
> > queues.
>
> That's correct. I believe I sent the patch for dump before the kernel
> patch was accepted. Plus, it can't hurt, right?

Ok, so now you have fixed dump to use CLONE_IO.

So only other user of coop thing remaining potentially is qemu. I was
doing some qemu testing where threads were doing IO to nearby area
but no coop merging was taking place. So not sure in practice how well
does it work. Thought, that's irrlevant for this discussion. Thought of
mentioning this observation.

Thanks
Vivek

2012-05-01 19:03:50

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v3.4-rc5] block: iocontext->nr_tasks should be initialized to one

On 2012-05-01 20:50, Jeff Moyer wrote:
> Jens Axboe <[email protected]> writes:
>
>> On 2012-05-01 20:31, Jeff Moyer wrote:
>>> Jens Axboe <[email protected]> writes:
>>> I know of one project: the venerable dump/restore utility uses CLONE_IO.
>>
>> Thanks Jeff, now I remember the specifics of what we tested. IIRC, we
>> also did numbers back then comparing the coop merging vs specifically
>> using CLONE_IO.
>
> Maybe? I picked through some old emails but couldn't find that exact
> comparison. It would have been a smart thing to do, though!

It might have been in a bugzilla thing, unfortunately. But yes, it'd be
the smart thing to do, hence I was pretty sure that you did it!

--
Jens Axboe

2012-05-01 19:04:55

by Jeff Moyer

[permalink] [raw]
Subject: Re: [PATCH v3.4-rc5] block: iocontext->nr_tasks should be initialized to one

Vivek Goyal <[email protected]> writes:

> On Tue, May 01, 2012 at 02:48:41PM -0400, Jeff Moyer wrote:
>> Vivek Goyal <[email protected]> writes:
>>
>> > On Tue, May 01, 2012 at 02:31:07PM -0400, Jeff Moyer wrote:
>> >
>> > [..]
>> >> > But as to actual users, I really don't know. I agree it's probably not
>> >> > that widely used. If google still had that code search, we could get a
>> >> > better idea :-)
>> >>
>> >> I know of one project: the venerable dump/restore utility uses CLONE_IO.
>> >
>> > I thought you wrote cooperating queue logic to fix dump as it was not
>> > using CLONE_IO and IO from multiple threads was going in separate
>> > queues.
>>
>> That's correct. I believe I sent the patch for dump before the kernel
>> patch was accepted. Plus, it can't hurt, right?
>
> Ok, so now you have fixed dump to use CLONE_IO.
>
> So only other user of coop thing remaining potentially is qemu. I was

No, that's not the *only* other potential user. ;-) I wouldn't be
surprised if nfsd benefitted from the merging. I also wouldn't be
surprised if other 3rd party apps did. Are you trying to make a case to
get rid of the queue merging logic?

> doing some qemu testing where threads were doing IO to nearby area
> but no coop merging was taking place. So not sure in practice how well
> does it work.

Well, that sounds like it warrants further investigation.

> Thought, that's irrlevant for this discussion. Thought of mentioning
> this observation.

If you can provide a reproducer, I'll be happy to take a look.

Cheers,
Jeff

2012-05-01 19:08:54

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH v3.4-rc5] block: iocontext->nr_tasks should be initialized to one

On Tue, May 01, 2012 at 03:04:49PM -0400, Jeff Moyer wrote:
> Vivek Goyal <[email protected]> writes:
>
> > On Tue, May 01, 2012 at 02:48:41PM -0400, Jeff Moyer wrote:
> >> Vivek Goyal <[email protected]> writes:
> >>
> >> > On Tue, May 01, 2012 at 02:31:07PM -0400, Jeff Moyer wrote:
> >> >
> >> > [..]
> >> >> > But as to actual users, I really don't know. I agree it's probably not
> >> >> > that widely used. If google still had that code search, we could get a
> >> >> > better idea :-)
> >> >>
> >> >> I know of one project: the venerable dump/restore utility uses CLONE_IO.
> >> >
> >> > I thought you wrote cooperating queue logic to fix dump as it was not
> >> > using CLONE_IO and IO from multiple threads was going in separate
> >> > queues.
> >>
> >> That's correct. I believe I sent the patch for dump before the kernel
> >> patch was accepted. Plus, it can't hurt, right?
> >
> > Ok, so now you have fixed dump to use CLONE_IO.
> >
> > So only other user of coop thing remaining potentially is qemu. I was
>
> No, that's not the *only* other potential user. ;-) I wouldn't be
> surprised if nfsd benefitted from the merging. I also wouldn't be
> surprised if other 3rd party apps did. Are you trying to make a case to
> get rid of the queue merging logic?

No, just counting who benefits from coop logic.

>
> > doing some qemu testing where threads were doing IO to nearby area
> > but no coop merging was taking place. So not sure in practice how well
> > does it work.
>
> Well, that sounds like it warrants further investigation.
>
> > Thought, that's irrlevant for this discussion. Thought of mentioning
> > this observation.
>
> If you can provide a reproducer, I'll be happy to take a look.

I will dig. This was in the context of trying to solve qemu related
IO slowdown with CFQ. Will let you know if I find a concrete reproducer.

Thanks
Vivek

2012-05-01 19:20:38

by Jeff Moyer

[permalink] [raw]
Subject: Re: [PATCH v3.4-rc5] block: iocontext->nr_tasks should be initialized to one

Vivek Goyal <[email protected]> writes:

> On Tue, May 01, 2012 at 02:31:07PM -0400, Jeff Moyer wrote:
>
> [..]
>> > But as to actual users, I really don't know. I agree it's probably not
>> > that widely used. If google still had that code search, we could get a
>> > better idea :-)
>>
>> I know of one project: the venerable dump/restore utility uses CLONE_IO.
>
> I thought you wrote cooperating queue logic to fix dump as it was not
> using CLONE_IO and IO from multiple threads was going in separate
> queues.

That's correct. I believe I sent the patch for dump before the kernel
patch was accepted. Plus, it can't hurt, right?

Cheers,
Jeff