On Wed, 26 May 2021 at 18:29, Pavel Begunkov <[email protected]> wrote:
> On 5/26/21 4:52 PM, Marco Elver wrote:
> > Due to some moving around of code, the patch lost the actual fix (using
> > atomically read io_wq) -- so here it is again ... hopefully as intended.
> > :-)
>
> "fortify" damn it... It was synchronised with &ctx->uring_lock
> before, see io_uring_try_cancel_iowq() and io_uring_del_tctx_node(),
> so it should not be cleared before *del_tctx_node().
Ah, so if I understand right, the property stated by the comment in
io_uring_try_cancel_iowq() was broken, and your patch below would fix
that, right?
> The fix should just move it after this sync point. Will you send
> it out as a patch?
Do you mean your move of the write to io_wq goes on top of the patch I
proposed? (If so, please also leave your Signed-off-by so I can squash
it.)
So if I understand right, we do in fact have 2 problems:
1. the data race as I noted in my patch, and
2. the fact that io_wq does not live long enough.
Did I get it right?
Thanks,
-- Marco
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 7db6aaf31080..b76ba26b4c6c 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -9075,11 +9075,12 @@ static void io_uring_clean_tctx(struct io_uring_task *tctx)
> struct io_tctx_node *node;
> unsigned long index;
>
> - tctx->io_wq = NULL;
> xa_for_each(&tctx->xa, index, node)
> io_uring_del_tctx_node(index);
> - if (wq)
> + if (wq) {
> + tctx->io_wq = NULL;
> io_wq_put_and_exit(wq);
> + }
> }
>
> static s64 tctx_inflight(struct io_uring_task *tctx, bool tracked)
>
>
> --
> Pavel Begunkov
On 5/26/21 5:36 PM, Marco Elver wrote:
> On Wed, 26 May 2021 at 18:29, Pavel Begunkov <[email protected]> wrote:
>> On 5/26/21 4:52 PM, Marco Elver wrote:
>>> Due to some moving around of code, the patch lost the actual fix (using
>>> atomically read io_wq) -- so here it is again ... hopefully as intended.
>>> :-)
>>
>> "fortify" damn it... It was synchronised with &ctx->uring_lock
>> before, see io_uring_try_cancel_iowq() and io_uring_del_tctx_node(),
>> so it should not be cleared before *del_tctx_node().
>
> Ah, so if I understand right, the property stated by the comment in
> io_uring_try_cancel_iowq() was broken, and your patch below would fix
> that, right?
"io_uring: fortify tctx/io_wq cleanup" broke it and the diff
should fix it.
>> The fix should just move it after this sync point. Will you send
>> it out as a patch?
>
> Do you mean your move of the write to io_wq goes on top of the patch I
> proposed? (If so, please also leave your Signed-off-by so I can squash
> it.)
No, only my diff, but you hinted at what has happened, so I would
prefer you to take care of the patch. If you want to, of course.
To be entirely fair, assuming that aligned ptr
reads can't be torn, I don't see any _real_ problem. But surely
the report is very helpful and the current state is too wonky, so
it should be patched.
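For reference, a sketch of what "torn" would mean and the marking that
rules it out (wq_ptr is a made-up stand-in; WRITE_ONCE()/READ_ONCE()
are the real kernel macros):

struct io_wq *wq_ptr;	/* stand-in for tctx->io_wq */

void writer(void)
{
	/* A plain "wq_ptr = NULL;" may in principle be split into
	 * several smaller stores; WRITE_ONCE() pins it to one store. */
	WRITE_ONCE(wq_ptr, NULL);
}

void reader(void)
{
	/* Likewise one single, non-torn load on the reader side. */
	struct io_wq *wq = READ_ONCE(wq_ptr);

	(void)wq;
}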
TL;DR:
The synchronisation goes like this: io_wq is usually used by the owner
task, and the owner task deletes it, so it is mostly naturally
synchronised. An exception is a worker (though not only workers) that
accesses it for cancellation purposes, but it does so only under
->uring_lock, so as long as removal also takes the lock it should be
fine. See the io_uring_del_tctx_node() locking.
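A rough userspace analogue of that pattern, if it helps (pthreads; all
names are made up, this is not the actual io_uring code):

#include <pthread.h>
#include <stddef.h>

struct wq { int unused; };

struct tctx {
	pthread_mutex_t lock;	/* stands in for ctx->uring_lock */
	struct wq *io_wq;
};

/* Cancellation path (e.g. a worker): touches io_wq only under the
 * lock. */
void try_cancel(struct tctx *t)
{
	pthread_mutex_lock(&t->lock);
	if (t->io_wq) {
		/* ... cancel work on t->io_wq ... */
	}
	pthread_mutex_unlock(&t->lock);
}

/* Owner teardown: node removal under the lock is the sync point;
 * after it no cancellation path can reach this tctx any more, so
 * clearing io_wq afterwards cannot race. */
void clean_tctx(struct tctx *t)
{
	struct wq *wq = t->io_wq;	/* owner task, no concurrent writer */

	pthread_mutex_lock(&t->lock);
	/* ... delete the tctx nodes ... */
	pthread_mutex_unlock(&t->lock);

	t->io_wq = NULL;
	/* ... put and exit wq ... */
	(void)wq;
}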
>
> So if I understand right, we do in fact have 2 problems:
> 1. the data race as I noted in my patch, and
Yes, and the diff deals with it.
> 2. the fact that io_wq does not live long enough.
Nope, io_wq outlives its users fine.
> Did I get it right?
>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 7db6aaf31080..b76ba26b4c6c 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -9075,11 +9075,12 @@ static void io_uring_clean_tctx(struct io_uring_task *tctx)
>> struct io_tctx_node *node;
>> unsigned long index;
>>
>> - tctx->io_wq = NULL;
>> xa_for_each(&tctx->xa, index, node)
>> io_uring_del_tctx_node(index);
>> - if (wq)
>> + if (wq) {
>> + tctx->io_wq = NULL;
>> io_wq_put_and_exit(wq);
>> + }
>> }
>>
>> static s64 tctx_inflight(struct io_uring_task *tctx, bool tracked)
--
Pavel Begunkov
On Wed, May 26, 2021 at 09:31PM +0100, Pavel Begunkov wrote:
> On 5/26/21 5:36 PM, Marco Elver wrote:
> > On Wed, 26 May 2021 at 18:29, Pavel Begunkov <[email protected]> wrote:
> >> On 5/26/21 4:52 PM, Marco Elver wrote:
> >>> Due to some moving around of code, the patch lost the actual fix (using
> >>> atomically read io_wq) -- so here it is again ... hopefully as intended.
> >>> :-)
> >>
> >> "fortify" damn it... It was synchronised with &ctx->uring_lock
> >> before, see io_uring_try_cancel_iowq() and io_uring_del_tctx_node(),
> >> so it should not be cleared before *del_tctx_node().
> >
> > Ah, so if I understand right, the property stated by the comment in
> > io_uring_try_cancel_iowq() was broken, and your patch below would fix
> > that, right?
>
> "io_uring: fortify tctx/io_wq cleanup" broke it and the diff
> should fix it.
>
> >> The fix should just move it after this sync point. Will you send
> >> it out as a patch?
> >
> > Do you mean your move of the write to io_wq goes on top of the patch I
> > proposed? (If so, please also leave your Signed-off-by so I can squash
> > it.)
>
> No, only my diff, but you hinted at what has happened, so I would
> prefer you to take care of the patch. If you want to, of course.
>
> To be entirely fair, assuming that aligned ptr
> reads can't be torn, I don't see any _real_ problem. But surely
> the report is very helpful and the current state is too wonky, so
> it should be patched.
In the current version it is a problem if we end up with a double read,
which is what the C code as written permits. The compiler might of
course optimize it into a single read into a register.
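A minimal sketch of that hazard (shared_wq and use() are made up;
READ_ONCE() is the real kernel macro):

struct io_wq *shared_wq;	/* stand-in for tctx->io_wq */

void racy(void)
{
	if (shared_wq)		/* read #1: non-NULL */
		use(shared_wq);	/* read #2: may reload, now NULL */
}

void fixed(void)
{
	struct io_wq *wq = READ_ONCE(shared_wq);	/* one load */

	if (wq)
		use(wq);	/* the local copy cannot change under us */
}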
Tangent: I avoid reasoning in terms of compiler optimizations where
I can. :-) It's a slippery slope if the code in question isn't
tolerant to data races by design (examples are stats counting or other
heuristics -- which is certainly not the case here).
Therefore, my wish is that we resolve as many data races as we can
(and mark intentional ones appropriately), so that we're left with
only the interesting cases, like the one here. (More background if
you're interested: https://lwn.net/Articles/816850/)
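For example, an intentional race can be marked so the race detector
accepts it (a sketch; data_race() is the real kernel annotation, the
counter is made up):

unsigned long stats_hits;	/* approximate counter, races tolerated */

void count_hit(void)
{
	/* Tell KCSAN this racy plain access is intentional. */
	data_race(stats_hits++);
}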
The problem here, however, has a nicer resolution as you suggested.
> TL;DR:
> The synchronisation goes like this: io_wq is usually used by the owner
> task, and the owner task deletes it, so it is mostly naturally
> synchronised. An exception is a worker (though not only workers) that
> accesses it for cancellation purposes, but it does so only under
> ->uring_lock, so as long as removal also takes the lock it should be
> fine. See the io_uring_del_tctx_node() locking.
Did you mean io_uring_del_task_file()? There is no
io_uring_del_tctx_node().
> > So if I understand right, we do in fact have 2 problems:
> > 1. the data race as I noted in my patch, and
>
> Yes, and the diff deals with it.
>
> > 2. the fact that io_wq does not live long enough.
>
> Nope, io_wq outlives its users fine.
I've sent:
https://lkml.kernel.org/r/[email protected]
Thanks,
-- Marco