2023-06-30 16:59:00

by Maxime Jayat

[permalink] [raw]
Subject: can: isotp: epoll breaks isotp_sendmsg

Hi,

There is something not clear happening with the non-blocking behavior
of ISO-TP sockets in the TX path, but more importantly, using epoll now
completely breaks isotp_sendmsg.
I believe it is related to
79e19fa79c ("can: isotp: isotp_ops: fix poll() to not report false
EPOLLOUT events"),
but actually is probably deeper than that.

I don't completely understand what is exactly going on, so I am sharing
the problem I face:

With an ISO-TP socket in non-blocking mode, using epoll seems to make
isotp_sendmsg always return -EAGAIN.

I have a non-blocking socket + epoll version of can-utils isotpsend
available for
testing at https://gist.github.com/MJayat/4857da43ab154e4ba644d2446b5fa46d
With this version I do the following test:

isotprecv -l -m 0x80 -s 7e8 -d 7e0 vcan0 &
echo "01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f" | strace ./isotpsend
-l10 -s 7E0 -d 7E8 vcan0

I get:
...
15:37:02.456849 epoll_ctl(4, EPOLL_CTL_ADD, 3,
{events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=0, u64=0}}) = 0 <0.000249>
15:37:02.457839 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}],
1, 2000) = 1 <0.000200>
15:37:02.458838 write(3,
"\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = -1
EAGAIN (Resource temporarily unavailable) <0.000278>
15:37:02.459908 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}],
1, 2000) = 1 <0.000250>
15:37:02.460879 write(3,
"\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = -1
EAGAIN (Resource temporarily unavailable) <0.000272>
15:37:02.461831 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}],
1, 2000) = 1 <0.000199>
...
impossible to write on the socket, and busy-looping.

With a change to epoll_ctl flags, now including EPOLLET, I get:
...
15:36:22.443689 epoll_ctl(4, EPOLL_CTL_ADD, 3,
{events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=0, u64=0}}) = 0
<0.000171>
15:36:22.444514 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}],
1, 2000) = 1 <0.000188>
15:36:22.445413 write(3,
"\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = -1
EAGAIN (Resource temporarily unavailable) <0.000175>
15:36:22.446335 epoll_wait(4, [], 1, 2000) = 0 <2.026006>
...
epoll_wait now blocks indefinitely.

By reverting 79e19fa79c, I get better results but still incorrect:
...
15:41:43.890880 epoll_ctl(4, EPOLL_CTL_ADD, 3,
{events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=0, u64=0}}) = 0 <0.000200>
15:41:43.892011 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}],
1, 2000) = 1 <0.000408>
15:41:43.893148 write(3,
"\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = 15
<0.000458>
15:41:43.894405 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}],
1, 2000) = 1 <0.000257>
15:41:43.895385 write(3,
"\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = 15
<0.128429>
15:41:44.028757 epoll_wait(4, [{events=EPOLLOUT, data={u32=0, u64=0}}],
1, 2000) = 1 <0.001886>
15:41:44.040858 write(3,
"\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 15) = 15
<0.108069>
...
It is then possible to write on the socket but the write is blocking,
which is
not the expected behavior for a non-blocking socket.

I don't know how to solve the problem. To me, using wq_has_sleeper seems
weird.
The implementation of isotp_poll feels weird too (calling both
datagram_poll and
poll_wait?). But I am not sure what would be the correct implementation.

Note that this behavior is currently on all linux-stable branches. I am
currently testing on v6.1.36 but I know it is failing on v6.3 too.

My actual use-case is in Async Rust using tokio. Not using epoll is not an
option and a non-blocking socket that sometimes blocks can be terrible for
the performance of the whole application. Our embedded target runs with the
out-of-tree module on an older linux version, but my colleagues with
up-to-date
linux stable kernels have recently been prevented from running tests on
their
PC.

Is there someone with a good idea of how to proceed to fix this?
I'll probably keep digging but I don't think I can spend so much time on
this,
so any help is appreciated.

Thanks,
Maxime




2023-06-30 23:13:17

by Michal Sojka

[permalink] [raw]
Subject: Re: can: isotp: epoll breaks isotp_sendmsg

Hi Maxime,

On Fri, Jun 30 2023, Maxime Jayat wrote:
> Hi,
>
> There is something not clear happening with the non-blocking behavior
> of ISO-TP sockets in the TX path, but more importantly, using epoll now
> completely breaks isotp_sendmsg.
> I believe it is related to
> 79e19fa79c ("can: isotp: isotp_ops: fix poll() to not report false
> EPOLLOUT events"),
> but actually is probably deeper than that.
>
> I don't completely understand what is exactly going on, so I am sharing
> the problem I face:
>
> With an ISO-TP socket in non-blocking mode, using epoll seems to make
> isotp_sendmsg always return -EAGAIN.

That's definitely not expected behavior. I tested the patch only with
poll, hoping that epoll would behave the same.

[...]

>
> By reverting 79e19fa79c, I get better results but still incorrect:

[...]

> It is then possible to write on the socket but the write is blocking,
> which is not the expected behavior for a non-blocking socket.

Yes, incorrect behavior was why we made the commit in question, however
we saw write() returning -EAGAIN when it shouldn't.

> I don't know how to solve the problem. To me, using wq_has_sleeper seems
> weird.

Agreed. I've never tried to understand how synchronization works here.
Hopefully, Oliver knows more.

> The implementation of isotp_poll feels weird too (calling both
> datagram_poll and
> poll_wait?). But I am not sure what would be the correct
> implementation.

I understand it as follows (which might be wrong - someone, please
correct me), isotp_poll() should register the file with all waitqueues
it can wait on. so->wait is one and sock->sq.wait (used by
datagram_poll) is another. The former is definitely used for TX, the
latter is probably used because skb_recv_datagram() is called for RX.
But so->wait is also used for RX and there might proabbly be be some
inconsistency between those.

> My actual use-case is in Async Rust using tokio.

Our initial motivation was also Rust and tokio however than I did
testing only with simple C programs. I'm definitely interested in having
this working.

I'll try to look at this in more detail during the weekend. It's too
late for me today.

Best regards,
-Michal

2023-07-17 08:40:36

by Marc Kleine-Budde

[permalink] [raw]
Subject: Re: can: isotp: epoll breaks isotp_sendmsg

On 01.07.2023 00:35:18, Michal Sojka wrote:
> Hi Maxime,
>
> On Fri, Jun 30 2023, Maxime Jayat wrote:
> > Hi,
> >
> > There is something not clear happening with the non-blocking behavior
> > of ISO-TP sockets in the TX path, but more importantly, using epoll now
> > completely breaks isotp_sendmsg.
> > I believe it is related to
> > 79e19fa79c ("can: isotp: isotp_ops: fix poll() to not report false
> > EPOLLOUT events"),
> > but actually is probably deeper than that.
> >
> > I don't completely understand what is exactly going on, so I am sharing
> > the problem I face:
> >
> > With an ISO-TP socket in non-blocking mode, using epoll seems to make
> > isotp_sendmsg always return -EAGAIN.
>
> That's definitely not expected behavior. I tested the patch only with
> poll, hoping that epoll would behave the same.
>
> [...]
>
> >
> > By reverting 79e19fa79c, I get better results but still incorrect:
>
> [...]
>
> > It is then possible to write on the socket but the write is blocking,
> > which is not the expected behavior for a non-blocking socket.
>
> Yes, incorrect behavior was why we made the commit in question, however
> we saw write() returning -EAGAIN when it shouldn't.
>
> > I don't know how to solve the problem. To me, using wq_has_sleeper seems
> > weird.
>
> Agreed. I've never tried to understand how synchronization works here.
> Hopefully, Oliver knows more.
>
> > The implementation of isotp_poll feels weird too (calling both
> > datagram_poll and
> > poll_wait?). But I am not sure what would be the correct
> > implementation.
>
> I understand it as follows (which might be wrong - someone, please
> correct me), isotp_poll() should register the file with all waitqueues
> it can wait on. so->wait is one and sock->sq.wait (used by
> datagram_poll) is another. The former is definitely used for TX, the
> latter is probably used because skb_recv_datagram() is called for RX.
> But so->wait is also used for RX and there might proabbly be be some
> inconsistency between those.
>
> > My actual use-case is in Async Rust using tokio.
>
> Our initial motivation was also Rust and tokio however than I did
> testing only with simple C programs. I'm definitely interested in having
> this working.
>
> I'll try to look at this in more detail during the weekend. It's too
> late for me today.

Any progress on this issue?

Marc

--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung Nürnberg | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-9 |


Attachments:
(No filename) (2.64 kB)
signature.asc (499.00 B)
Download all attachments

2023-07-17 13:17:30

by Michal Sojka

[permalink] [raw]
Subject: Re: can: isotp: epoll breaks isotp_sendmsg

On Mon, Jul 17 2023, Marc Kleine-Budde wrote:
> Any progress on this issue?

I was on the vacation until today. Hope to look at it soon.

-Michal

2023-08-13 13:42:10

by Lukas Magel

[permalink] [raw]
Subject: Re: can: isotp: epoll breaks isotp_sendmsg

Hi Maxime, hi Michal,

On 01.07.23 00:35, Michal Sojka wrote:
> Hi Maxime,
>
> On Fri, Jun 30 2023, Maxime Jayat wrote:
>> Hi,
>>
>> There is something not clear happening with the non-blocking behavior
>> of ISO-TP sockets in the TX path, but more importantly, using epoll now
>> completely breaks isotp_sendmsg.
>> I believe it is related to
>> 79e19fa79c ("can: isotp: isotp_ops: fix poll() to not report false
>> EPOLLOUT events"),
>> but actually is probably deeper than that.
>>
>> I don't completely understand what is exactly going on, so I am sharing
>> the problem I face:
>>
>> With an ISO-TP socket in non-blocking mode, using epoll seems to make
>> isotp_sendmsg always return -EAGAIN.
> That's definitely not expected behavior. I tested the patch only with
> poll, hoping that epoll would behave the same.
>
> [...]


I am writing to report that we have been witnessing a behavior very similar
to what you describe. ISO-TP send breaks with EAGAIN if a poll (for a read)
occurs at the same time.

Our Python stack uses two threads to do concurrent, blocking reads & writes
from and to the ISO-TP socket. The socket has a timeout of 0.1s to facilitate
shutdown when requested by the application. Notably, the blocking semantics
are handled by CPython, i.e. the underlying kernel socket is non-blocking.
CPython polls until the requested operation (read or write) can be executed
or the timeout occurs.

What happens during execution is that the socket is continuously being
polled by the read thread, i.e. so->wait is always filled with one task.
This process repeats until the socket receives a frame from the bus and the
poll returns successsfully. The app reads the data from the socket and
sends a response. Since the send occurs in a different thread, the reader
thread will have already returned to its poll loop and to populating
so->wait. When the send occurs, isotp_sendmsg checks so->wait for sleepers
and returns EAGAIN because the socket is non-blocking although there is no
concurrent send operation. This dance continues until the timeout occurs for
either the read or the write operation. If the write times out first, a
timeout error causes the Python app to break. If the read times out first,
there is a race that the write goes through or the reader puts in
another poll.

This behavior can be seen rather nicely in strace:
110580 poll([{fd=5, events=POLLIN}], 1, 100) = 0 (Timeout)
110580 poll([{fd=5, events=POLLIN}], 1, 100) = 0 (Timeout)
110580 poll([{fd=5, events=POLLIN}], 1, 100) = 1 ([{fd=5, revents=POLLIN}])
110580 recvfrom(5, ">\0", 4095, 0, NULL, NULL) = 2
110580 poll([{fd=5, events=POLLIN}], 1, 100 <unfinished ...>
110569 poll([{fd=5, events=POLLOUT}], 1, 100) = 1 ([{fd=5, revents=POLLOUT}])
110569 sendto(5, "~\0", 2, 0, NULL, 0)  = -1 EAGAIN (Resource temporarily unavailable)
110569 poll([{fd=5, events=POLLOUT}], 1, 100) = 1 ([{fd=5, revents=POLLOUT}])
110569 sendto(5, "~\0", 2, 0, NULL, 0)  = -1 EAGAIN (Resource temporarily unavailable)
110569 poll([{fd=5, events=POLLOUT}], 1, 100) = 1 ([{fd=5, revents=POLLOUT}])
110569 sendto(5, "~\0", 2, 0, NULL, 0)  = -1 EAGAIN (Resource temporarily unavailable)
[....]
110569 poll([{fd=5, events=POLLOUT}], 1, 6) = 1 ([{fd=5, revents=POLLOUT}])
110569 sendto(5, "~\0", 2, 0, NULL, 0)  = -1 EAGAIN (Resource temporarily unavailable)
110569 poll([{fd=5, events=POLLOUT}], 1, 6 <unfinished ...>
110580 <... poll resumed>)              = 0 (Timeout)
110569 <... poll resumed>)              = 1 ([{fd=5, revents=POLLOUT}])
110580 poll([{fd=5, events=POLLIN}], 1, 100 <unfinished ...>
110569 sendto(5, "~\0", 2, 0, NULL, 0)  = -1 EAGAIN (Resource temporarily unavailable)
110569 poll([{fd=5, events=POLLOUT}], 1, 5) = 1 ([{fd=5, revents=POLLOUT}])
110569 sendto(5, "~\0", 2, 0, NULL, 0)  = -1 EAGAIN (Resource temporarily unavailable)

I believe this is consistent to the behavior you're witnessing with epoll
because epoll also places a sleeper in so->wait that is left in the queue
until the epoll descriptor is closed.


>
>> By reverting 79e19fa79c, I get better results but still incorrect:
> [...]
>
>> It is then possible to write on the socket but the write is blocking,
>> which is not the expected behavior for a non-blocking socket.
> Yes, incorrect behavior was why we made the commit in question, however
> we saw write() returning -EAGAIN when it shouldn't.
>
>> I don't know how to solve the problem. To me, using wq_has_sleeper seems
>> weird.
> Agreed. I've never tried to understand how synchronization works here.
> Hopefully, Oliver knows more.
>
>> The implementation of isotp_poll feels weird too (calling both
>> datagram_poll and
>> poll_wait?). But I am not sure what would be the correct
>> implementation.
> I understand it as follows (which might be wrong - someone, please
> correct me), isotp_poll() should register the file with all waitqueues
> it can wait on. so->wait is one and sock->sq.wait (used by
> datagram_poll) is another. The former is definitely used for TX, the
> latter is probably used because skb_recv_datagram() is called for RX.
> But so->wait is also used for RX and there might proabbly be be some
> inconsistency between those.


AFAIK, it is correct behavior for isotp_poll to register all wait queues
with the poller.

Before 79e19fa79c, I assume datagram_poll always returned
EPOLLOUT because its corresponding send buffer is unused and empty. This
return value can be incorrect if so->tx.state is not IDLE and a send would
block. With the patch, this behavior is now suppressed. I believe that
the inconsistency could have rather been introduced with:
can: isotp: fix race between isotp_sendsmg() and isotp_release() 0517374

With this patch, the behavior of isotp_sendmsg was changed to only check
so->wait for sleepers instead of consulting so->tx.state to see if the
socket is busy. Since the wait queue can also have sleepers only interested
in read operations, I believe the return value is not a valid indicator
of send readiness. Additionally, in this state, the behavior in isotp_sendmsg
is inconsistent with isotp_poll. I will try to test next week if reverting this
part of the patch could fix the race condition.


>
>> My actual use-case is in Async Rust using tokio.
> Our initial motivation was also Rust and tokio however than I did
> testing only with simple C programs. I'm definitely interested in having
> this working.
>
> I'll try to look at this in more detail during the weekend. It's too
> late for me today.
>
> Best regards,
> -Michal
>
Regards,
Lukas