LinuxLists.cc - Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

2014-06-16 09:59:15

Subject: Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

Hi Arnaldo,

Things have gone quiet ;-). What's the current state of this patch?

Thanks,

Michael

On Thu, May 29, 2014 at 4:17 PM, Arnaldo Carvalho de Melo
<[email protected]> wrote:
> On Thu, May 29, 2014 at 02:06:04PM +0000, David Laight wrote:
>> From: 'Arnaldo Carvalho de Melo'
>> ...
>> > > I remember some discussions from an XNET standards meeting (I've forgotten
>> > > exactly which errors on which calls were being discussed).
>> > > My recollection is that you return success with a partial transfer
>> > > count for ANY error that happens after some data has been transferred.
>> > > The actual error will be returned when it happens again on the next
>> > > system call - Note the AGAIN, not a saved error.
>
>> > A saved error, for the right entity, in the recvmmsg case, that
>> > basically is batching multiple recvmsg syscalls, doesn't sound like a
>> > problem, i.e. the idea is to, as much as possible, mimic what multiple
>> > recvmsg calls would do, but reduce its in/out kernel (and inside kernel
>> > subsystems) overhead.
>
>> > Perhaps we can have something in between, i.e. for things like EFAULT,
>> > we should report straight away, effectively dropping whatever datagrams
>> > successfully received in the current batch, do you agree?
>
>> Not unreasonable - EFAULT shouldn't happen unless the application
>> is buggy.
>
> Ok.
>
>> > For transient errors the existing mechanism, fixed so that only per
>> > socket errors are saved for later, as today, could be kept?
>
>> I don't think it is ever necessary to save an errno value for the
>> next system call at all.
>> Just process the next system call and see what happens.
>
>> If the call returns with less than the maximum number of datagrams
>> and with a non-zero timeout left - then the application can infer
>> that it was terminated by an abnormal event of some kind.
>> This might be a signal.
>
> Then it could use getsockopt(SO_ERROR) perhaps? I.e. we don't return the
> error on the next call, but we provide a way for the app to retrieve the
> reason for the smaller than expected batch?
>
>> I'm not sure if an ICMP error on a connected datagram socket could
>> generate a 'disconnect'. It might happen if the interface is being
>> used for something like SCTP.
>> In either case the next call will detect the error.
>
> - Arnaldo

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
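
For readers following the quoted discussion, here is a minimal sketch of the
application-side check that David and Arnaldo describe: call recvmmsg(), and
if the batch comes back smaller than requested, ask the socket for a pending
error with getsockopt(SO_ERROR). The helper names (pending_socket_error,
recv_batch), the BATCH and DGRAM_MAX sizes, and the calling convention are
assumptions made for illustration; none of them come from the patch under
discussion. Whether the kernel leaves anything to report via SO_ERROR after a
short batch is exactly the point being debated in this thread, so treat the
result as a hint rather than a guarantee.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* recvmmsg() and struct mmsghdr are GNU extensions */
#endif
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <time.h>

#define BATCH     8            /* illustrative batch size (assumption) */
#define DGRAM_MAX 1500         /* illustrative per-datagram buffer size (assumption) */

/*
 * Hypothetical helper: read the socket's pending error with SO_ERROR.
 * Returns the pending errno value (0 if none), or -1 if getsockopt() fails.
 * Note that reading SO_ERROR also clears the pending error.
 */
static int pending_socket_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;
}

/*
 * Hypothetical helper: receive up to BATCH datagrams in one recvmmsg() call.
 * On success returns the number of datagrams received and fills lens[] with
 * their lengths. If the batch is short, *so_error holds whatever SO_ERROR
 * reported (0 means the socket has nothing pending, e.g. the call simply ran
 * out of data or time, or was interrupted). Returns -1 with errno set if
 * recvmmsg() itself failed before receiving anything.
 */
static int recv_batch(int fd, char bufs[BATCH][DGRAM_MAX],
                      unsigned int lens[BATCH], int flags,
                      struct timespec *timeout, int *so_error)
{
    struct mmsghdr msgs[BATCH];
    struct iovec iov[BATCH];
    int i, n;

    memset(msgs, 0, sizeof(msgs));
    for (i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = DGRAM_MAX;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    *so_error = 0;
    n = recvmmsg(fd, msgs, BATCH, flags, timeout);
    if (n < 0)
        return -1;

    for (i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;

    /* Short batch: look for a reason, as suggested in the thread. */
    if (n < BATCH)
        *so_error = pending_socket_error(fd);

    return n;
}
```

A caller would pass MSG_DONTWAIT or MSG_WAITFORONE in flags if it must not
block waiting for further datagrams; with flags set to 0 on a blocking socket
the call can wait for the next datagram regardless of the timeout, which is
the behavior this thread started from.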

2014-06-24 20:25:50

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

On Mon, Jun 16, 2014 at 11:58:51AM +0200, Michael Kerrisk (man-pages) wrote:
> Hi Arnaldo,
>
> Things have gone quiet ;-). What's the current state of this patch?

Yeah, I kept meaning to prod the other people on this thread about what
they thought about my last messages, patches, etc. :-)

Can I have acked-by or even tested-by on those? Is it ok?

- Arnaldo


2014-06-27 11:29:38

by Michael Kerrisk (man-pages)

Subject: Re: [PATCH/RFC] Re: recvmmsg() timeout behavior strangeness [RESEND]

On 06/24/2014 10:25 PM, Arnaldo Carvalho de Melo wrote:
> On Mon, Jun 16, 2014 at 11:58:51AM +0200, Michael Kerrisk (man-pages) wrote:
>> Hi Arnaldo,
>>
>> Things have gone quiet ;-). What's the current state of this patch?
>
> Yeah, I kept meaning to prod the other people on this thread about what
> they thought about my last messages, patches, etc. :-)
>
> Can I have acked-by or even tested-by on those? Is it ok?

I just need to go back and test one point that sounds like it might still be
broken.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/