2022-07-20 23:47:47

by Ammar Faizi

[permalink] [raw]
Subject: Linux 5.19-rc7 liburing test `poll-mshot-overflow.t` and `read-write.t` fail

Hello Jens,

Kernel version:

commit ff6992735ade75aae3e35d16b17da1008d753d28
Author: Linus Torvalds <[email protected]>
Date: Sun Jul 17 13:30:22 2022 -0700

Linux 5.19-rc7

liburing version:

commit 4e6eec8bdea906fe5341c97aef96986d605004e9 (HEAD, origin/master, origin/HEAD)
Author: Dylan Yudaken <[email protected]>
Date: Mon Jul 18 06:34:29 2022 -0700

fix io_uring_recvmsg_cmsg_nexthdr logic

io_uring_recvmsg_cmsg_nexthdr was using the payload to delineate the end
of the cmsg list, but really it needs to use whatever was returned by the
kernel.

Reported-and-tested-by: Jens Axboe <[email protected]>
Fixes: 874406f7fb09 ("add multishot recvmsg API")
Signed-off-by: Dylan Yudaken <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

Two liburing tests fail:

Tests failed: <poll-mshot-overflow.t> <read-write.t>
make[1]: *** [Makefile:237: runtests] Error 1
make[1]: Leaving directory '/home/ammarfaizi2/app/liburing/test'
make: *** [Makefile:21: runtests] Error 2


ammarfaizi2@integral2:~/app/liburing$ uname -a
Linux integral2 5.19.0-rc7-2022-07-18 #1 SMP PREEMPT_DYNAMIC Mon Jul 18 15:42:27 WIB 2022 x86_64 x86_64 x86_64 GNU/Linux
ammarfaizi2@integral2:~/app/liburing$ test/read-write.t
cqe res -22, wanted 8192
test_buf_select vec failed
ammarfaizi2@integral2:~/app/liburing$ test/poll-mshot-overflow.t
signalled no more!
ammarfaizi2@integral2:~/app/liburing$

JFYI, -22 is -EINVAL.

read-write.t call trace when calling fprintf(..., "cqe res %d, wanted %d\n", ...):

#0 ___fprintf_chk (./debug/fprintf_chk.c:25)
#1 fprintf (/usr/include/x86_64-linux-gnu/bits/stdio2.h:105)
#2 __test_io (read-write.c:181)
#3 test_buf_select (read-write.c:577)
#4 main (read-write.c:849)

poll-mshot-overflow.t call trace should be trivial.

--
Ammar Faizi


2022-07-21 10:13:42

by Dylan Yudaken

[permalink] [raw]
Subject: Re: Linux 5.19-rc7 liburing test `poll-mshot-overflow.t` and `read-write.t` fail

On Thu, 2022-07-21 at 06:21 +0700, Ammar Faizi wrote:
> Hello Jens,
>
> Kernel version:
>
>    commit ff6992735ade75aae3e35d16b17da1008d753d28
>    Author: Linus Torvalds <[email protected]>
>    Date:   Sun Jul 17 13:30:22 2022 -0700
>
>        Linux 5.19-rc7
>
> liburing version:
>
>    commit 4e6eec8bdea906fe5341c97aef96986d605004e9 (HEAD,
> origin/master, origin/HEAD)
>    Author: Dylan Yudaken <[email protected]>
>    Date:   Mon Jul 18 06:34:29 2022 -0700
>
>        fix io_uring_recvmsg_cmsg_nexthdr logic
>       
>        io_uring_recvmsg_cmsg_nexthdr was using the payload to
> delineate the end
>        of the cmsg list, but really it needs to use whatever was
> returned by the
>        kernel.
>       
>        Reported-and-tested-by: Jens Axboe <[email protected]>
>        Fixes: 874406f7fb09 ("add multishot recvmsg API")
>        Signed-off-by: Dylan Yudaken <[email protected]>
>        Link:
> https://lore.kernel.org/r/[email protected]
>        Signed-off-by: Jens Axboe <[email protected]>
>
> Two liburing tests fail:
>
>    Tests failed:  <poll-mshot-overflow.t> <read-write.t>
>    make[1]: *** [Makefile:237: runtests] Error 1
>    make[1]: Leaving directory '/home/ammarfaizi2/app/liburing/test'
>    make: *** [Makefile:21: runtests] Error 2
>
>
>    ammarfaizi2@integral2:~/app/liburing$ uname -a
>    Linux integral2 5.19.0-rc7-2022-07-18 #1 SMP PREEMPT_DYNAMIC Mon
> Jul 18 15:42:27 WIB 2022 x86_64 x86_64 x86_64 GNU/Linux
>    ammarfaizi2@integral2:~/app/liburing$ test/read-write.t
>    cqe res -22, wanted 8192
>    test_buf_select vec failed

What fs are you using? testing on a fresh XFS fs read-write.t works for
me

>    ammarfaizi2@integral2:~/app/liburing$ test/poll-mshot-overflow.t
>    signalled no more!
>    ammarfaizi2@integral2:~/app/liburing$
>
> JFYI, -22 is -EINVAL.
>
> read-write.t call trace when calling fprintf(..., "cqe res %d, wanted
> %d\n", ...):
>
>    #0  ___fprintf_chk (./debug/fprintf_chk.c:25)
>    #1  fprintf (/usr/include/x86_64-linux-gnu/bits/stdio2.h:105)
>    #2  __test_io (read-write.c:181)
>    #3  test_buf_select (read-write.c:577)
>    #4  main (read-write.c:849)
>
> poll-mshot-overflow.t call trace should be trivial.
>


poll-mshot-overflow.t tests something that I changed in 5.20, but
actually I do not know if the fix should be backported. Do people have
an opinion here? The backport unfortunately looks like it might be
complex.

The test tests an edge condition with overflow and multishot polls.
Overflow will actually change the ordering of CQEs, such that you might
get a CQE without IORING_CQE_F_MORE and then later receive one with
IORING_CQE_F_MORE set.

This is a real problem for strict ordered API's like recv (which is why
I fixed it), but for poll it's unclear to me if it is a big enough
problem and needs backporting. Certainly I think it has been this way
for a long time and no one has complained?


2022-07-21 10:48:45

by Ammar Faizi

[permalink] [raw]
Subject: Re: Linux 5.19-rc7 liburing test `poll-mshot-overflow.t` and `read-write.t` fail

On 7/21/22 4:48 PM, Dylan Yudaken wrote:
> What fs are you using? testing on a fresh XFS fs read-write.t works for
> me

I am using btrfs.

After I got your email, I tried to run the test on an ext4 directory and
it works fine. But fails on a btrfs directory.

Any idea why does the test fail on a btrfs fs?

--
Ammar Faizi

2022-07-21 12:17:06

by Dylan Yudaken

[permalink] [raw]
Subject: Re: Linux 5.19-rc7 liburing test `poll-mshot-overflow.t` and `read-write.t` fail

On Thu, 2022-07-21 at 17:41 +0700, Ammar Faizi wrote:
> On 7/21/22 4:48 PM, Dylan Yudaken wrote:
> > What fs are you using? testing on a fresh XFS fs read-write.t works
> > for
> > me
>
> I am using btrfs.
>
> After I got your email, I tried to run the test on an ext4 directory
> and
> it works fine. But fails on a btrfs directory.
>
> Any idea why does the test fail on a btrfs fs?
>

It seems to be a problem with blocking reads, buffer select and READV.
My guess is that ext4/xfs are not blocking.

in b66e65f41426 ("io_uring: never call io_buffer_select() for a buffer
re-select"), this line was added in __io_iov_buffer_select

- iov[0].iov_len = len;
+ req->rw.len = iov[0].iov_len = len;

Basically stashing the buffer length in rw.len. The problem is that the
next time around that breaks at

if (req->rw.len != 1)
return -EINVAL;


The below fixes it as an example, but it's not great. Maybe someone can
figure out a better patch? Otherwise I can try tomorrow:

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 2b7bb62c7805..d9fa226f8e30 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -647,6 +647,8 @@ struct io_rw {
u64 addr;
u32 len;
rwf_t flags;
+ u64 bufaddr;
+ u32 buflen;
};

struct io_connect {
@@ -3899,7 +3901,7 @@ static ssize_t io_compat_import(struct io_kiocb
*req, struct iovec *iov,
return -ENOBUFS;
req->rw.addr = (unsigned long) buf;
iov[0].iov_base = buf;
- req->rw.len = iov[0].iov_len = (compat_size_t) len;
+ req->rw.buflen = iov[0].iov_len = (compat_size_t) len;
return 0;
}
#endif
@@ -3920,9 +3922,9 @@ static ssize_t __io_iov_buffer_select(struct
io_kiocb *req, struct iovec *iov,
buf = io_buffer_select(req, &len, issue_flags);
if (!buf)
return -ENOBUFS;
- req->rw.addr = (unsigned long) buf;
+ req->rw.bufaddr = (unsigned long) buf;
iov[0].iov_base = buf;
- req->rw.len = iov[0].iov_len = len;
+ req->rw.buflen = iov[0].iov_len = len;
return 0;
}

@@ -3930,8 +3932,8 @@ static ssize_t io_iov_buffer_select(struct
io_kiocb *req, struct iovec *iov,
unsigned int issue_flags)
{
if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) {
- iov[0].iov_base = u64_to_user_ptr(req->rw.addr);
- iov[0].iov_len = req->rw.len;
+ iov[0].iov_base = u64_to_user_ptr(req->rw.bufaddr);
+ iov[0].iov_len = req->rw.buflen;
return 0;
}
if (req->rw.len != 1)

2022-07-21 13:32:50

by Dylan Yudaken

[permalink] [raw]
Subject: Re: Linux 5.19-rc7 liburing test `poll-mshot-overflow.t` and `read-write.t` fail

On Thu, 2022-07-21 at 20:08 +0700, Ammar Faizi wrote:
> On 7/21/22 7:05 PM, Dylan Yudaken wrote:
> > It seems to be a problem with blocking reads, buffer select and
> > READV.
> > My guess is that ext4/xfs are not blocking.
> >
> > in b66e65f41426 ("io_uring: never call io_buffer_select() for a
> > buffer
> > re-select"), this line was added in __io_iov_buffer_select
> >
> > -       iov[0].iov_len = len;
> > +       req->rw.len = iov[0].iov_len = len;
> >
> > Basically stashing the buffer length in rw.len. The problem is that
> > the
> > next time around that breaks at
> >
> >          if (req->rw.len != 1)
> >                  return -EINVAL;
> >
> >
> > The below fixes it as an example, but it's not great. Maybe someone
> > can
> > figure out a better patch? Otherwise I can try tomorrow:
>
> It's 8:05 PM from my end. I'll try to play with your patch after
> dinner
> while waiting for others say something.
>

I've just sent the below actually which is a bit simpler. I reran all
the tests on btrfs and xfs and it seems to work now:

diff --git a/fs/io_uring.c b/fs/io_uring.c
index a01ea49f3017..b0180679584f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1737,6 +1737,14 @@ static void io_kbuf_recycle(struct io_kiocb
*req, unsigned issue_flags)
(req->flags & REQ_F_PARTIAL_IO))
return;

+ /*
+ * READV uses fields in `struct io_rw` (len/addr) to stash the
selected
+ * buffer data. However if that buffer is recycled the original
request
+ * data stored in addr is lost. Therefore forbid recycling for
now.
+ */
+ if (req->opcode == IORING_OP_READV)
+ return;
+
/*
* We don't need to recycle for REQ_F_BUFFER_RING, we can just
clear
* the flag and hence ensure that bl->head doesn't get
incremented.


2022-07-21 14:09:54

by Ammar Faizi

[permalink] [raw]
Subject: Re: Linux 5.19-rc7 liburing test `poll-mshot-overflow.t` and `read-write.t` fail

On 7/21/22 7:05 PM, Dylan Yudaken wrote:
> It seems to be a problem with blocking reads, buffer select and READV.
> My guess is that ext4/xfs are not blocking.
>
> in b66e65f41426 ("io_uring: never call io_buffer_select() for a buffer
> re-select"), this line was added in __io_iov_buffer_select
>
> - iov[0].iov_len = len;
> + req->rw.len = iov[0].iov_len = len;
>
> Basically stashing the buffer length in rw.len. The problem is that the
> next time around that breaks at
>
> if (req->rw.len != 1)
> return -EINVAL;
>
>
> The below fixes it as an example, but it's not great. Maybe someone can
> figure out a better patch? Otherwise I can try tomorrow:

It's 8:05 PM from my end. I'll try to play with your patch after dinner
while waiting for others say something.

--
Ammar Faizi