2010-11-22 18:36:44

by Alban Crequy

Subject: [PATCH 0/9] RFC v2: Multicast and filtering features on AF_UNIX

Hi,

This is a new series of patches, following my first request for comments
here: http://marc.info/?l=linux-netdev&m=128534977610124

It implements new multicast features on AF_UNIX datagram and
seqpacket sockets.

My motivation is to use it for D-Bus. The kernel code here does not
contain anything specific to D-Bus, so it could be used for other IPC
mechanisms too.

The patches apply on top of linux-next-20101122 and can be pulled from:

git://git.collabora.co.uk/git/user/alban/linux-2.6.35.y/.git unix-multicast5

Comments & questions welcome! I would appreciate a review of the design
and to know whether it is going in the right direction.

Regards,
Alban Crequy


2010-11-22 18:39:25

by Alban Crequy

Subject: [PATCH 1/9] AF_UNIX: Add constant for Unix socket options level

Assign the next free socket options level to be used by the Unix
protocol and address family.

Signed-off-by: Alban Crequy <[email protected]>
---
include/linux/socket.h | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 86b652f..7c5a4da 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -307,6 +307,7 @@ struct ucred {
#define SOL_RDS 276
#define SOL_IUCV 277
#define SOL_CAIF 278
+#define SOL_UNIX 279

/* IPX options */
#define IPX_TYPE 1
--
1.7.1

2010-11-22 18:39:37

by Alban Crequy

Subject: [PATCH 2/9] AF_UNIX: add setsockopt on Unix sockets

Signed-off-by: Alban Crequy <[email protected]>
---
net/unix/af_unix.c | 15 ++++++++++++---
1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 7ff31c6..6eca106 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -512,6 +512,8 @@ static unsigned int unix_dgram_poll(struct file *, struct socket *,
poll_table *);
static int unix_ioctl(struct socket *, unsigned int, unsigned long);
static int unix_shutdown(struct socket *, int);
+static int unix_setsockopt(struct socket *, int, int,
+ char __user *, unsigned int);
static int unix_stream_sendmsg(struct kiocb *, struct socket *,
struct msghdr *, size_t);
static int unix_stream_recvmsg(struct kiocb *, struct socket *,
@@ -538,7 +540,7 @@ static const struct proto_ops unix_stream_ops = {
.ioctl = unix_ioctl,
.listen = unix_listen,
.shutdown = unix_shutdown,
- .setsockopt = sock_no_setsockopt,
+ .setsockopt = unix_setsockopt,
.getsockopt = sock_no_getsockopt,
.sendmsg = unix_stream_sendmsg,
.recvmsg = unix_stream_recvmsg,
@@ -559,7 +561,7 @@ static const struct proto_ops unix_dgram_ops = {
.ioctl = unix_ioctl,
.listen = sock_no_listen,
.shutdown = unix_shutdown,
- .setsockopt = sock_no_setsockopt,
+ .setsockopt = unix_setsockopt,
.getsockopt = sock_no_getsockopt,
.sendmsg = unix_dgram_sendmsg,
.recvmsg = unix_dgram_recvmsg,
@@ -580,7 +582,7 @@ static const struct proto_ops unix_seqpacket_ops = {
.ioctl = unix_ioctl,
.listen = unix_listen,
.shutdown = unix_shutdown,
- .setsockopt = sock_no_setsockopt,
+ .setsockopt = unix_setsockopt,
.getsockopt = sock_no_getsockopt,
.sendmsg = unix_seqpacket_sendmsg,
.recvmsg = unix_dgram_recvmsg,
@@ -1533,6 +1535,13 @@ out:
}


+static int unix_setsockopt(struct socket *sock, int level, int optname,
+ char __user *optval, unsigned int optlen)
+{
+ return -EOPNOTSUPP;
+}
+
+
static int unix_stream_sendmsg(struct kiocb *kiocb, struct socket *sock,
struct msghdr *msg, size_t len)
{
--
1.7.1

2010-11-22 18:39:42

by Alban Crequy

Subject: [PATCH 4/9] AF_UNIX: find the recipients for multicast messages

unix_find_multicast_recipients() builds an array of recipients. It can either
find the peers of a specific multicast address, or find all the peers of all
multicast groups the sender is part of.

Signed-off-by: Alban Crequy <[email protected]>
---
net/unix/af_unix.c | 144 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 144 insertions(+), 0 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 2278829..3cc9695 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -114,15 +114,48 @@
#include <linux/mount.h>
#include <net/checksum.h>
#include <linux/security.h>
+#include <linux/sort.h>

static struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1];
static DEFINE_SPINLOCK(unix_table_lock);
+static DEFINE_SPINLOCK(unix_multicast_lock);
static atomic_long_t unix_nr_socks;

#define unix_sockets_unbound (&unix_socket_table[UNIX_HASH_SIZE])

#define UNIX_ABSTRACT(sk) (unix_sk(sk)->addr->hash != UNIX_HASH_SIZE)

+struct sock_item {
+ struct sock *s;
+ struct sk_buff *skb;
+ int to_deliver;
+};
+
+struct sock_set {
+ int cnt;
+ struct sock_item items[0];
+};
+
+static void kfree_sock_set(struct sock_set *set)
+{
+ int i;
+ for (i = 0 ; i < set->cnt ; i++)
+ sock_put(set->items[i].s);
+ kfree(set);
+}
+
+static int sock_item_compare(const void *_a, const void *_b)
+{
+ const struct sock_item *a = _a;
+ const struct sock_item *b = _b;
+ if (a->s > b->s)
+ return 1;
+ else if (a->s < b->s)
+ return -1;
+ else
+ return 0;
+}
+
#ifdef CONFIG_SECURITY_NETWORK
static void unix_get_secdata(struct scm_cookie *scm, struct sk_buff *skb)
{
@@ -824,6 +857,117 @@ fail:
return NULL;
}

+static int unix_find_multicast_members(struct sock_set *set,
+ int recipient_cnt,
+ struct sock *sender,
+ struct hlist_head *list)
+{
+ struct unix_mcast *node;
+ struct hlist_node *pos;
+ hlist_for_each_entry(node, pos, list,
+ member_node) {
+ if (set->cnt + 1 > recipient_cnt)
+ return -ENOMEM;
+ if (node->member == unix_sk(sender) &&
+ !(node->flags & UNIX_MREQ_LOOPBACK))
+ continue;
+
+ sock_hold(&node->member->sk);
+ set->items[set->cnt].s = &node->member->sk;
+ set->items[set->cnt].skb = NULL;
+ set->items[set->cnt].to_deliver = 1;
+ set->cnt++;
+ }
+ return 0;
+}
+
+/* Find the recipients for a message sent by 'sender' to 'addr'. If 'dest' is
+ * NULL, the recipients are peers of all subscribed groups.
+ */
+static struct sock_set *unix_find_multicast_recipients(struct sock *sender,
+ struct sock *dest,
+ int *err)
+{
+ struct unix_sock *u = unix_sk(sender);
+ struct unix_mcast *node;
+ struct hlist_node *pos;
+ struct sock_set *set;
+ int recipient_cnt;
+
+ /* We cannot allocate in the spin lock. First, count the recipients */
+try_again:
+ spin_lock(&unix_multicast_lock);
+ if (dest != NULL) {
+ if (unix_sk(dest)->is_mcast_addr) {
+ recipient_cnt = unix_sk(dest)->mcast_members_cnt;
+ } else {
+ recipient_cnt = 1;
+ }
+ } else {
+ recipient_cnt = 0;
+ hlist_for_each_entry(node, pos, &u->mcast_subscriptions,
+ subscription_node) {
+ recipient_cnt += node->addr->mcast_members_cnt;
+ }
+ }
+ spin_unlock(&unix_multicast_lock);
+
+ /* Allocate for the set and hope the number of recipients does not
+ * change while the lock is released. If it changes, we have to try
+ * again... We allocate a bit more than needed, so if a _few_ members
+ * are added in a multicast group meanwhile, we don't always need to
+ * try again. */
+ recipient_cnt += 5;
+
+ set = kmalloc(sizeof(struct sock_set)
+ + sizeof(struct sock_item) * recipient_cnt,
+ GFP_KERNEL);
+ if (!set) {
+ *err = -ENOMEM;
+ return NULL;
+ }
+ set->cnt = 0;
+
+ spin_lock(&unix_multicast_lock);
+ if (dest && unix_sk(dest)->is_mcast_addr) {
+ /* Message sent to a multicast address */
+ if (unix_find_multicast_members(set, recipient_cnt,
+ sender,
+ &unix_sk(dest)->mcast_members)) {
+ spin_unlock(&unix_multicast_lock);
+ kfree_sock_set(set);
+ goto try_again;
+ }
+ } else if (!dest) {
+ /* Destination not specified, sending to all peers of
+ * subscribed groups */
+ hlist_for_each_entry(node, pos, &u->mcast_subscriptions,
+ subscription_node) {
+ if (unix_find_multicast_members(set, recipient_cnt,
+ sender,
+ &node->addr->mcast_members)) {
+ spin_unlock(&unix_multicast_lock);
+ kfree_sock_set(set);
+ goto try_again;
+ }
+ }
+ } else {
+ /* Message sent to a non-multicast address */
+ BUG_ON(recipient_cnt < 1);
+ set->cnt = 1;
+ sock_hold(dest);
+ set->items[0].s = dest;
+ set->items[0].skb = NULL;
+ set->items[0].to_deliver = 1;
+ }
+ spin_unlock(&unix_multicast_lock);
+
+ /* Keep the array ordered to prevent deadlocks on circular waits */
+ sort(set->items, set->cnt, sizeof(struct sock_item),
+ sock_item_compare, NULL);
+ return set;
+}
+

static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
{
--
1.7.1

2010-11-22 18:39:54

by Alban Crequy

Subject: [PATCH 5/9] AF_UNIX: Deliver message to several recipients in case of multicast

unix_dgram_sendmsg() implements delivery for both SOCK_DGRAM and
SOCK_SEQPACKET Unix sockets.

The delivery is done atomically: the message is delivered either to all
recipients or to none, even in case of interruptions or errors.

Signed-off-by: Alban Crequy <[email protected]>
---
net/unix/af_unix.c | 247 +++++++++++++++++++++++++++++++++++++++-------------
1 files changed, 188 insertions(+), 59 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 3cc9695..9207393 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1553,16 +1553,17 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
{
struct sock_iocb *siocb = kiocb_to_siocb(kiocb);
struct sock *sk = sock->sk;
- struct net *net = sock_net(sk);
struct unix_sock *u = unix_sk(sk);
struct sockaddr_un *sunaddr = msg->msg_name;
- struct sock *other = NULL;
+ struct sock_set *others_set = NULL;
int namelen = 0; /* fake GCC */
int err;
unsigned hash;
struct sk_buff *skb;
+ int i;
long timeo;
struct scm_cookie tmp_scm;
+ int multicast_delivery = !!u->mcast_subscriptions_cnt;

if (NULL == siocb->scm)
siocb->scm = &tmp_scm;
@@ -1580,12 +1581,30 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
if (err < 0)
goto out;
namelen = err;
- } else {
+ } else if (!multicast_delivery) {
+ struct sock *other;
sunaddr = NULL;
err = -ENOTCONN;
other = unix_peer_get(sk);
if (!other)
goto out;
+ err = -ENOMEM;
+ others_set = kmalloc(sizeof(struct sock_set)
+ + sizeof(struct sock_item),
+ GFP_KERNEL);
+ if (!others_set)
+ goto out;
+ others_set->cnt = 1;
+ sock_hold(other);
+ others_set->items[0].s = other;
+ others_set->items[0].skb = NULL;
+ others_set->items[0].to_deliver = 1;
+ } else {
+ sunaddr = NULL;
+ err = -ENOTCONN;
+ others_set = unix_find_multicast_recipients(sk, NULL, &err);
+ if (!others_set)
+ goto out;
}

if (test_bit(SOCK_PASSCRED, &sock->flags) && !u->addr
@@ -1613,90 +1632,200 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);

restart:
- if (!other) {
+ if (!others_set) {
+ struct sock *other;
+ struct unix_sock *otheru;
err = -ECONNRESET;
if (sunaddr == NULL)
goto out_free;

- other = unix_find_other(net, sunaddr, namelen, sk->sk_type,
- hash, &err);
- if (other == NULL)
+ other = unix_find_other(sock_net(sk), sunaddr, namelen,
+ sk->sk_type, hash, &err);
+ if (!other)
goto out_free;
+ otheru = unix_sk(other);
+
+ if (otheru->is_mcast_addr) {
+ /* FIXME: we should send to the requested recipient
+ * specified in sendto(...dest_addr) instead of the
+ * recipient specified by setsockopt... */
+ sock_put(other);
+ others_set = unix_find_multicast_recipients(sk, other,
+ &err);
+ if (!others_set)
+ goto out_free;
+ } else {
+ others_set = kmalloc(sizeof(struct sock_set)
+ + sizeof(struct sock_item),
+ GFP_KERNEL);
+ if (!others_set)
+ goto out_free;
+ others_set->cnt = 1;
+ others_set->items[0].s = other;
+ others_set->items[0].skb = NULL;
+ others_set->items[0].to_deliver = 1;
+ }
}

- unix_state_lock(other);
- err = -EPERM;
- if (!unix_may_send(sk, other))
- goto out_unlock;
+ for (i = 0 ; i < others_set->cnt ; i++) {
+ struct sock *cur = others_set->items[i].s;

- if (sock_flag(other, SOCK_DEAD)) {
- /*
- * Check with 1003.1g - what should
- * datagram error
- */
- unix_state_unlock(other);
- sock_put(other);
+ others_set->items[i].skb = skb_clone(skb, GFP_KERNEL);
+ if (!others_set->items[i].skb) {
+ err = -ENOMEM;
+ goto out_free;
+ }
+ skb_set_owner_w(others_set->items[i].skb, sk);
+ }

- err = 0;
- unix_state_lock(sk);
- if (unix_peer(sk) == other) {
- unix_peer(sk) = NULL;
- unix_state_unlock(sk);
+ for (i = 0 ; i < others_set->cnt ; i++) {
+ struct sock *cur = others_set->items[i].s;

- unix_dgram_disconnected(sk, other);
- sock_put(other);
- err = -ECONNREFUSED;
- } else {
- unix_state_unlock(sk);
+ if (!others_set->items[i].to_deliver)
+ continue;
+
+ unix_state_lock(cur);
+ err = -EPERM;
+ if (!multicast_delivery && !unix_may_send(sk, cur)) {
+ others_set->items[i].to_deliver = 0;
+ unix_state_unlock(cur);
+ kfree_skb(others_set->items[i].skb);
+ if (multicast_delivery)
+ continue;
+ else
+ goto out_free;
}

- other = NULL;
- if (err)
- goto out_free;
- goto restart;
+ if (sock_flag(cur, SOCK_DEAD)) {
+ /*
+ * Check with 1003.1g - what should
+ * datagram error
+ */
+ unix_state_unlock(cur);
+
+ err = 0;
+ unix_state_lock(sk);
+ if (unix_peer(sk) == cur) {
+ unix_peer(sk) = NULL;
+ unix_state_unlock(sk);
+
+ unix_dgram_disconnected(sk, cur);
+ sock_put(cur);
+ err = -ECONNREFUSED;
+ } else {
+ unix_state_unlock(sk);
+ }
+
+ kfree_skb(others_set->items[i].skb);
+ if (err)
+ goto out_free;
+
+ if (multicast_delivery) {
+ others_set->items[i].to_deliver = 0;
+ continue;
+ } else {
+ kfree_sock_set(others_set);
+ others_set = NULL;
+ goto restart;
+ }
+ }
+
+ err = -EPIPE;
+ if (cur->sk_shutdown & RCV_SHUTDOWN) {
+ unix_state_unlock(cur);
+ kfree_skb(others_set->items[i].skb);
+ if (multicast_delivery) {
+ others_set->items[i].to_deliver = 0;
+ continue;
+ } else {
+ goto out_free;
+ }
+ }
+
+ if (sk->sk_type != SOCK_SEQPACKET) {
+ err = security_unix_may_send(sk->sk_socket,
+ cur->sk_socket);
+ if (err) {
+ unix_state_unlock(cur);
+ kfree_skb(others_set->items[i].skb);
+ if (multicast_delivery) {
+ others_set->items[i].to_deliver = 0;
+ continue;
+ } else {
+ goto out_free;
+ }
+ }
+ }
+
+ if (unix_peer(cur) != sk && unix_recvq_full(cur)) {
+ kfree_skb(others_set->items[i].skb);
+
+ if (multicast_delivery) {
+ unix_state_unlock(cur);
+ others_set->items[i].to_deliver = 0;
+ continue;
+ } else {
+ if (!timeo) {
+ unix_state_unlock(cur);
+ err = -EAGAIN;
+ goto out_free;
+ }
+
+ timeo = unix_wait_for_peer(cur, timeo);
+
+ err = sock_intr_errno(timeo);
+ if (signal_pending(current))
+ goto out_free;
+
+ kfree_sock_set(others_set);
+ others_set = NULL;
+ goto restart;
+ }
+ }
}

- err = -EPIPE;
- if (other->sk_shutdown & RCV_SHUTDOWN)
- goto out_unlock;
+ for (i = 0 ; i < others_set->cnt ; i++) {
+ struct sock *cur = others_set->items[i].s;

- if (sk->sk_type != SOCK_SEQPACKET) {
- err = security_unix_may_send(sk->sk_socket, other->sk_socket);
- if (err)
- goto out_unlock;
+ if (!others_set->items[i].to_deliver)
+ continue;
+
+ if (sock_flag(cur, SOCK_RCVTSTAMP))
+ __net_timestamp(others_set->items[i].skb);
+
+ skb_queue_tail(&cur->sk_receive_queue,
+ others_set->items[i].skb);
}

- if (unix_peer(other) != sk && unix_recvq_full(other)) {
- if (!timeo) {
- err = -EAGAIN;
- goto out_unlock;
- }
+ for (i = 0 ; i < others_set->cnt ; i++) {
+ struct sock *cur = others_set->items[i].s;

- timeo = unix_wait_for_peer(other, timeo);
+ if (!others_set->items[i].to_deliver)
+ continue;

- err = sock_intr_errno(timeo);
- if (signal_pending(current))
- goto out_free;
+ unix_state_unlock(cur);
+ }

- goto restart;
+ for (i = 0 ; i < others_set->cnt ; i++) {
+ struct sock *cur = others_set->items[i].s;
+
+ if (!others_set->items[i].to_deliver)
+ continue;
+
+ cur->sk_data_ready(cur, len);
}

- if (sock_flag(other, SOCK_RCVTSTAMP))
- __net_timestamp(skb);
- skb_queue_tail(&other->sk_receive_queue, skb);
- unix_state_unlock(other);
- other->sk_data_ready(other, len);
- sock_put(other);
+ kfree_skb(skb);
scm_destroy(siocb->scm);
+ if (others_set)
+ kfree_sock_set(others_set);
return len;

-out_unlock:
- unix_state_unlock(other);
out_free:
kfree_skb(skb);
out:
- if (other)
- sock_put(other);
+ if (others_set)
+ kfree_sock_set(others_set);
scm_destroy(siocb->scm);
return err;
}
--
1.7.1

2010-11-22 18:39:57

by Alban Crequy

Subject: [PATCH 6/9] AF_UNIX: Apply Linux Socket Filtering to Unix sockets

Linux Socket Filters can already be attached to Unix sockets with
setsockopt(sockfd, SOL_SOCKET, SO_{ATTACH,DETACH}_FILTER, ...), but the
filter was never consulted on Unix sockets, so it had no effect. This patch
uses sk_filter() to filter buffers before delivery.

Signed-off-by: Alban Crequy <[email protected]>
---
net/unix/af_unix.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 9207393..52e2aa2 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1669,6 +1669,7 @@ restart:

for (i = 0 ; i < others_set->cnt ; i++) {
struct sock *cur = others_set->items[i].s;
+ unsigned int pkt_len;

others_set->items[i].skb = skb_clone(skb, GFP_KERNEL);
if (!others_set->items[i].skb) {
@@ -1676,6 +1677,13 @@ restart:
goto out_free;
}
skb_set_owner_w(others_set->items[i].skb, sk);
+
+ pkt_len = sk_filter(cur, others_set->items[i].skb);
+ if (pkt_len != 0) {
+ others_set->items[i].to_deliver = 0;
+ kfree_skb(others_set->items[i].skb);
+ continue;
+ }
}

for (i = 0 ; i < others_set->cnt ; i++) {
--
1.7.1

2010-11-22 18:40:00

by Alban Crequy

Subject: [PATCH 7/9] AF_UNIX: Documentation on multicast Unix Sockets

Signed-off-by: Alban Crequy <[email protected]>
---
.../networking/multicast-unix-sockets.txt | 76 ++++++++++++++++++++
1 files changed, 76 insertions(+), 0 deletions(-)
create mode 100644 Documentation/networking/multicast-unix-sockets.txt

diff --git a/Documentation/networking/multicast-unix-sockets.txt b/Documentation/networking/multicast-unix-sockets.txt
new file mode 100644
index 0000000..b9882a1
--- /dev/null
+++ b/Documentation/networking/multicast-unix-sockets.txt
@@ -0,0 +1,76 @@
+Multicast Unix sockets
+======================
+
+Multicast group memberships are stored in struct unix_mcast nodes. A Unix
+socket can join several multicast groups. Struct unix_mcast nodes are doubly
+linked:
+- In (struct unix_sock)->mcast_subscriptions
+- In (struct unix_sock)->mcast_members
+
+Example
+=======
+
+ Addr1 Addr2
+ | |
+ v v
+Socket1 ----> mcast node ----> mcast node
+ |
+ v
+Socket2 ----> mcast node
+ |
+ v
+Socket3 ----> mcast node
+
+
+Addr1 and Addr2 are struct unix_sock with is_mcast_addr set to 1. They are
+bound to a multicast address with:
+ setsockopt(sockfd, SOL_UNIX, UNIX_CREATE_GROUP, ...).
+
+Socket1, Socket2 and Socket3 are also struct unix_sock. They are associated to
+a multicast address with:
+ setsockopt(sockfd, SOL_UNIX, UNIX_JOIN_GROUP, ...).
+
+Socket1 joined two multicast groups. Socket2 and Socket3 joined one multicast
+group. The multicast group Addr1 has 3 members. Addr2 has one member.
+
+Atomic delivery and ordering
+============================
+
+Each message sent is delivered atomically to either none of the recipients or
+all the recipients, even with interruptions and errors.
+
+The locking is done to keep the ordering consistent on all recipients. We want
+to avoid the following scenario. Two emitters A and B, and 2 recipients C and
+D:
+
+ C D
+A -------->| | Step 1: A's message is delivered to C
+B -------->| | Step 2: B's message is delivered to C
+B ---------|--->| Step 3: B's message is delivered to D
+A ---------|--->| Step 4: A's message is delivered to D
+
+Although A and B had a list of recipients (C, D) in the same order, C and D
+received the messages in a different order.
+
+
+SOCK_SEQPACKET semantics
+========================
+
+When a connection is performed on a SOCK_SEQPACKET multicast socket, a new
+socket is created and its file descriptor is received with accept(). The new
+socket could join the multicast group from userspace with setsockopt() but
+there would be a race: it could lose the first messages sent by an application
+after connect() returns but before setsockopt() is executed.
+
+To avoid that race, the application should use the flag UNIX_MREQ_AUTOJOIN when
+creating the multicast group.
+
+When several connections are established to a SOCK_SEQPACKET multicast socket,
+the creator of the multicast group using UNIX_MREQ_AUTOJOIN would receive the
+messages several times: one time on each accepted socket. To avoid that, the
+creator of the group may prefer to use UNIX_MREQ_SEND_TO_PEER. Then, the
+accepted socket will not be part of the group but will still receive messages
+from its peer.
+
+
+
--
1.7.1

2010-11-22 18:40:07

by Alban Crequy

Subject: [PATCH 9/9] AF_UNIX: implement poll(POLLOUT) for multicast sockets

When a socket subscribed to a multicast group has its incoming queue full, it
can either block further sends to the multicast group or let the messages be
dropped. The latter is useful for monitoring all messages without slowing down
the traffic.

It is specified with the flag UNIX_MREQ_DROP_WHEN_FULL when the multicast group
is joined.

poll(POLLOUT) is implemented by checking all the receive queues of the
subscribed sockets. If even one of them has its receive queue full and does not
have UNIX_MREQ_DROP_WHEN_FULL, the multicast socket is not writable.
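
For illustration, a minimal userspace sketch (not part of this patch; it
assumes struct unix_mreq and the UNIX_* and SOL_UNIX constants from this
series are visible to userspace, and the abstract group name "example-group"
is made up):

#include <string.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/un.h>

/* A slow monitor joins with UNIX_MREQ_DROP_WHEN_FULL so that its full
 * receive queue never makes the group unwritable for the senders. */
static int join_as_monitor(int sockfd)
{
	struct unix_mreq mreq;

	memset(&mreq, 0, sizeof(mreq));
	mreq.address.sun_family = AF_UNIX;
	strcpy(mreq.address.sun_path + 1, "example-group");
	mreq.flags = UNIX_MREQ_DROP_WHEN_FULL;
	return setsockopt(sockfd, SOL_UNIX, UNIX_JOIN_GROUP,
			  &mreq, sizeof(mreq));
}

/* A sender waits for POLLOUT; poll() blocks while any member without
 * UNIX_MREQ_DROP_WHEN_FULL has a full receive queue. */
static int wait_writable(int sockfd)
{
	struct pollfd pfd = { .fd = sockfd, .events = POLLOUT };

	return poll(&pfd, 1, -1);
}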

Signed-off-by: Alban Crequy <[email protected]>
---
include/net/af_unix.h | 5 +++++
net/unix/af_unix.c | 38 ++++++++++++++++++++++++++++++++++++--
2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index c82b5f8..d18499a 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -59,6 +59,10 @@ struct unix_skb_parms {
/* ON UNIX_JOIN_GROUP: the messages will also be received by the peer */
#define UNIX_MREQ_SEND_TO_PEER 0x04

+/* ON UNIX_JOIN_GROUP: just drop the message instead of blocking if the
+ * receiving queue is full */
+#define UNIX_MREQ_DROP_WHEN_FULL 0x08
+
struct unix_mreq
{
struct sockaddr_un address;
@@ -84,6 +88,7 @@ struct unix_sock {
unsigned int is_mcast_addr : 1;
unsigned int mcast_auto_join : 1;
unsigned int mcast_send_to_peer : 1;
+ unsigned int mcast_drop_when_peer_full : 1;

/* These multicast fields are protected by the global spinlock
* unix_multicast_lock */
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index d3d6270..36ee1fe 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -128,7 +128,8 @@ static atomic_long_t unix_nr_socks;
struct sock_item {
struct sock *s;
struct sk_buff *skb;
- int to_deliver;
+ unsigned int to_deliver : 1;
+ unsigned int drop_when_full : 1;
};

struct sock_set {
@@ -876,6 +877,8 @@ static int unix_find_multicast_members(struct sock_set *set,
set->items[set->cnt].s = &node->member->sk;
set->items[set->cnt].skb = NULL;
set->items[set->cnt].to_deliver = 1;
+ set->items[set->cnt].drop_when_full =
+ !!(node->flags & UNIX_MREQ_DROP_WHEN_FULL);
set->cnt++;
}

@@ -886,6 +889,8 @@ static int unix_find_multicast_members(struct sock_set *set,
set->items[set->cnt].s = unix_peer(sender);
set->items[set->cnt].skb = NULL;
set->items[set->cnt].to_deliver = 1;
+ set->items[set->cnt].drop_when_full =
+ unix_sk(sender)->mcast_drop_when_peer_full;
set->cnt++;
}

@@ -970,6 +975,7 @@ try_again:
set->items[0].s = dest;
set->items[0].skb = NULL;
set->items[0].to_deliver = 1;
+ set->items[0].drop_when_full = 0;
}
spin_unlock(&unix_multicast_lock);

@@ -1805,6 +1811,7 @@ restart:
kfree_skb(others_set->items[i].skb);

if (multicast_delivery) {
+ /* FIXME: check drop_when_full */
unix_state_unlock(cur);
others_set->items[i].to_deliver = 0;
continue;
@@ -1957,7 +1964,10 @@ static int unix_mc_join(struct socket *sock, struct unix_mreq *mreq)
node->flags = mreq->flags;

unix_state_lock(sock->sk);
- unix_sk(sock->sk)->mcast_send_to_peer = !!(mreq->flags & UNIX_MREQ_SEND_TO_PEER);
+ unix_sk(sock->sk)->mcast_send_to_peer =
+ !!(mreq->flags & UNIX_MREQ_SEND_TO_PEER);
+ unix_sk(sock->sk)->mcast_drop_when_peer_full =
+ !!(mreq->flags & UNIX_MREQ_DROP_WHEN_FULL);
unix_state_unlock(sock->sk);

spin_lock(&unix_multicast_lock);
@@ -2258,6 +2268,7 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
goto out_unlock;
}

+ /* FIXME: wake up peers on the multicast group too */
wake_up_interruptible_sync_poll(&u->peer_wait,
POLLOUT | POLLWRNORM | POLLWRBAND);

@@ -2613,6 +2624,9 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
{
struct sock *sk = sock->sk, *other;
unsigned int mask, writable;
+ struct sock_set *others;
+ int err = 0;
+ int i;

sock_poll_wait(file, sk_sleep(sk), wait);
mask = 0;
@@ -2652,6 +2666,26 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
}
sock_put(other);
}
+ /*
+ * On multicast sockets, we need to check if the receiving queue is
+ * full on all peers who don't have UNIX_MREQ_DROP_WHEN_FULL.
+ */
+ others = unix_find_multicast_recipients(sk, NULL, &err);
+ if (!others)
+ goto skip_multicast;
+ for (i = 0 ; i < others->cnt ; i++) {
+ if (others->items[i].drop_when_full)
+ continue;
+ if (unix_peer(others->items[i].s) != sk) {
+ sock_poll_wait(file,
+ &unix_sk(others->items[i].s)->peer_wait, wait);
+ if (unix_recvq_full(others->items[i].s))
+ writable = 0;
+ }
+ }
+ kfree_sock_set(others);
+
+skip_multicast:

if (writable)
mask |= POLLOUT | POLLWRNORM | POLLWRBAND;
--
1.7.1

2010-11-22 18:40:47

by Alban Crequy

Subject: [PATCH 8/9] AF_UNIX: add options on multicast connected socket

This adds the autojoin and send-to-peer options for multicast connected
sockets.
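
For illustration, a minimal daemon-side sketch (not part of this patch; it
assumes struct unix_mreq and the UNIX_* and SOL_UNIX constants from this
series are visible to userspace, and the abstract group name "bus-address"
is made up):

#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

static int run_group_creator(void)
{
	struct unix_mreq mreq;
	int listener, accepted;

	listener = socket(AF_UNIX, SOCK_SEQPACKET, 0);
	if (listener < 0)
		return -1;

	memset(&mreq, 0, sizeof(mreq));
	mreq.address.sun_family = AF_UNIX;
	strcpy(mreq.address.sun_path + 1, "bus-address");
	/* UNIX_MREQ_AUTOJOIN: each accepted socket is a group member as
	 * soon as the peer's connect() succeeds, which closes the
	 * connect()/setsockopt() race described in the documentation. */
	mreq.flags = UNIX_MREQ_AUTOJOIN;
	if (setsockopt(listener, SOL_UNIX, UNIX_CREATE_GROUP,
		       &mreq, sizeof(mreq)) < 0)
		return -1;

	if (listen(listener, 10) < 0)
		return -1;

	/* Peers that join with UNIX_MREQ_SEND_TO_PEER also deliver their
	 * messages to the accepted socket held by this daemon. */
	accepted = accept(listener, NULL, NULL);
	return accepted;
}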

Signed-off-by: Alban Crequy <[email protected]>
---
include/net/af_unix.h | 27 +++++++++++++++++++++------
net/unix/af_unix.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index bf114d5..c82b5f8 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -40,18 +40,31 @@ struct unix_skb_parms {
spin_lock_nested(&unix_sk(s)->lock, \
SINGLE_DEPTH_NESTING)

-#define UNIX_MREQ_LOOPBACK 0x01
+/* UNIX socket options */
+#define UNIX_CREATE_GROUP 1
+#define UNIX_JOIN_GROUP 2
+#define UNIX_LEAVE_GROUP 3
+
+/* Flags on unix_mreq */
+
+/* On UNIX_JOIN_GROUP: the socket will receive its own messages
+ * On UNIX_CREATE_GROUP: the accepted sockets will receive their own messages
+ */
+#define UNIX_MREQ_LOOPBACK 0x01
+
+/* On UNIX_CREATE_GROUP: the accepted socket will be member of the multicast
+ * group */
+#define UNIX_MREQ_AUTOJOIN 0x02
+
+/* ON UNIX_JOIN_GROUP: the messages will also be received by the peer */
+#define UNIX_MREQ_SEND_TO_PEER 0x04
+
struct unix_mreq
{
struct sockaddr_un address;
unsigned int flags;
};

-/* UNIX socket options */
-#define UNIX_CREATE_GROUP 1
-#define UNIX_JOIN_GROUP 2
-#define UNIX_LEAVE_GROUP 3
-
#ifdef __KERNEL__
/* The AF_UNIX socket */
struct unix_sock {
@@ -69,6 +82,8 @@ struct unix_sock {
unsigned int gc_candidate : 1;
unsigned int gc_maybe_cycle : 1;
unsigned int is_mcast_addr : 1;
+ unsigned int mcast_auto_join : 1;
+ unsigned int mcast_send_to_peer : 1;

/* These multicast fields are protected by the global spinlock
* unix_multicast_lock */
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 52e2aa2..d3d6270 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -878,6 +878,17 @@ static int unix_find_multicast_members(struct sock_set *set,
set->items[set->cnt].to_deliver = 1;
set->cnt++;
}
+
+ if (unix_peer(sender) && unix_sk(sender)->mcast_send_to_peer) {
+ if (set->cnt + 1 > recipient_cnt)
+ return -ENOMEM;
+ sock_hold(unix_peer(sender));
+ set->items[set->cnt].s = unix_peer(sender);
+ set->items[set->cnt].skb = NULL;
+ set->items[set->cnt].to_deliver = 1;
+ set->cnt++;
+ }
+
return 0;
}

@@ -1226,6 +1237,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
int st;
int err;
long timeo;
+ struct unix_mcast *node = NULL;

err = unix_mkname(sunaddr, addr_len, &hash);
if (err < 0)
@@ -1245,6 +1257,12 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,

err = -ENOMEM;

+ node = kmalloc(sizeof(struct unix_mcast), GFP_KERNEL);
+ if (!node) {
+ err = -ENOMEM;
+ goto out;
+ }
+
/* create new sock for complete connection */
newsk = unix_create1(sock_net(sk), NULL);
if (newsk == NULL)
@@ -1261,6 +1279,8 @@ restart:
if (!other)
goto out;

+ otheru = unix_sk(other);
+
/* Latch state of peer */
unix_state_lock(other);

@@ -1332,6 +1352,21 @@ restart:
goto out_unlock;
}

+ /* Multicast sockets */
+ spin_lock(&unix_multicast_lock);
+ if (otheru->is_mcast_addr && otheru->mcast_auto_join) {
+ node->member = unix_sk(newsk);
+ node->addr = otheru;
+ node->flags = 0;
+
+ hlist_add_head(&node->member_node, &otheru->mcast_members);
+ hlist_add_head(&node->subscription_node,
+ &unix_sk(newsk)->mcast_subscriptions);
+ otheru->mcast_members_cnt++;
+ u->mcast_subscriptions_cnt++;
+ }
+ spin_unlock(&unix_multicast_lock);
+
/* The way is open! Fastly set all the necessary fields... */

sock_hold(sk);
@@ -1341,7 +1376,6 @@ restart:
init_peercred(newsk);
newu = unix_sk(newsk);
newsk->sk_wq = &newu->peer_wq;
- otheru = unix_sk(other);

/* copy address information from listening to new sock*/
if (otheru->addr) {
@@ -1380,6 +1414,8 @@ out_unlock:

out:
kfree_skb(skb);
+ if (node)
+ kfree(node);
if (newsk)
unix_release_sock(newsk, 0);
if (other)
@@ -1868,6 +1904,8 @@ static int unix_mc_create(struct socket *sock, struct unix_mreq *mreq)

unix_state_lock(sock->sk);
unix_sk(sock->sk)->is_mcast_addr = 1;
+ if (mreq->flags & UNIX_MREQ_AUTOJOIN)
+ unix_sk(sock->sk)->mcast_auto_join = 1;
unix_state_unlock(sock->sk);

return 0;
@@ -1918,6 +1956,10 @@ static int unix_mc_join(struct socket *sock, struct unix_mreq *mreq)
node->addr = otheru;
node->flags = mreq->flags;

+ unix_state_lock(sock->sk);
+ unix_sk(sock->sk)->mcast_send_to_peer = !!(mreq->flags & UNIX_MREQ_SEND_TO_PEER);
+ unix_state_unlock(sock->sk);
+
spin_lock(&unix_multicast_lock);
hlist_add_head(&node->member_node, &otheru->mcast_members);
hlist_add_head(&node->subscription_node, &u->mcast_subscriptions);
--
1.7.1

2010-11-22 18:39:40

by Alban Crequy

Subject: [PATCH 3/9] AF_UNIX: create, join and leave multicast groups with setsockopt

Multicast is implemented on SOCK_DGRAM and SOCK_SEQPACKET Unix sockets.

A userspace application can create a multicast group with:
struct unix_mreq mreq;
mreq.address.sun_family = AF_UNIX;
mreq.address.sun_path[0] = '\0';
strcpy(mreq.address.sun_path + 1, "socket-address");
mreq.flags = 0;

sockfd = socket(AF_UNIX, SOCK_DGRAM, 0);
ret = setsockopt(sockfd, SOL_UNIX, UNIX_CREATE_GROUP, &mreq, sizeof(mreq));

Then a multicast group can be joined and left with:
ret = setsockopt(sockfd, SOL_UNIX, UNIX_JOIN_GROUP, &mreq, sizeof(mreq));
ret = setsockopt(sockfd, SOL_UNIX, UNIX_LEAVE_GROUP, &mreq, sizeof(mreq));

A socket can be a member of several multicast groups.
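
For illustration, a slightly fuller sketch of these calls (assuming struct
unix_mreq and the UNIX_* and SOL_UNIX constants are visible to userspace;
error handling kept minimal):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(void)
{
	struct unix_mreq mreq;
	int creator, member;

	memset(&mreq, 0, sizeof(mreq));
	mreq.address.sun_family = AF_UNIX;
	mreq.address.sun_path[0] = '\0';	/* abstract namespace */
	strcpy(mreq.address.sun_path + 1, "socket-address");
	mreq.flags = 0;

	creator = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (setsockopt(creator, SOL_UNIX, UNIX_CREATE_GROUP,
		       &mreq, sizeof(mreq)) < 0)
		perror("UNIX_CREATE_GROUP");

	member = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (setsockopt(member, SOL_UNIX, UNIX_JOIN_GROUP,
		       &mreq, sizeof(mreq)) < 0)
		perror("UNIX_JOIN_GROUP");

	/* ... exchange datagrams addressed to the group ... */

	if (setsockopt(member, SOL_UNIX, UNIX_LEAVE_GROUP,
		       &mreq, sizeof(mreq)) < 0)
		perror("UNIX_LEAVE_GROUP");
	return 0;
}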

Signed-off-by: Alban Crequy <[email protected]>
---
include/net/af_unix.h | 31 +++++++
net/unix/af_unix.c | 217 ++++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 247 insertions(+), 1 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 90c9e28..bf114d5 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -40,6 +40,18 @@ struct unix_skb_parms {
spin_lock_nested(&unix_sk(s)->lock, \
SINGLE_DEPTH_NESTING)

+#define UNIX_MREQ_LOOPBACK 0x01
+struct unix_mreq
+{
+ struct sockaddr_un address;
+ unsigned int flags;
+};
+
+/* UNIX socket options */
+#define UNIX_CREATE_GROUP 1
+#define UNIX_JOIN_GROUP 2
+#define UNIX_LEAVE_GROUP 3
+
#ifdef __KERNEL__
/* The AF_UNIX socket */
struct unix_sock {
@@ -56,8 +68,27 @@ struct unix_sock {
spinlock_t lock;
unsigned int gc_candidate : 1;
unsigned int gc_maybe_cycle : 1;
+ unsigned int is_mcast_addr : 1;
+
+ /* These multicast fields are protected by the global spinlock
+ * unix_multicast_lock */
+ struct hlist_head mcast_subscriptions;
+ struct hlist_head mcast_members;
+ int mcast_subscriptions_cnt;
+ int mcast_members_cnt;
+
struct socket_wq peer_wq;
};
+
+struct unix_mcast
+{
+ struct unix_sock *member;
+ struct unix_sock *addr;
+ unsigned int flags;
+ struct hlist_node subscription_node;
+ struct hlist_node member_node;
+};
+
#define unix_sk(__sk) ((struct unix_sock *)__sk)

#define peer_wait peer_wq.wait
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 6eca106..2278829 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -379,6 +379,9 @@ static int unix_release_sock(struct sock *sk, int embrion)
struct sock *skpair;
struct sk_buff *skb;
int state;
+ struct unix_mcast *node;
+ struct hlist_node *pos;
+ struct hlist_node *pos_tmp;

unix_remove_socket(sk);

@@ -392,6 +395,24 @@ static int unix_release_sock(struct sock *sk, int embrion)
u->mnt = NULL;
state = sk->sk_state;
sk->sk_state = TCP_CLOSE;
+ spin_lock(&unix_multicast_lock);
+ hlist_for_each_entry_safe(node, pos, pos_tmp, &u->mcast_subscriptions,
+ subscription_node) {
+ hlist_del(&node->member_node);
+ hlist_del(&node->subscription_node);
+ node->addr->mcast_members_cnt--;
+ node->member->mcast_subscriptions_cnt--;
+ kfree(node);
+ }
+ hlist_for_each_entry_safe(node, pos, pos_tmp, &u->mcast_members,
+ member_node) {
+ hlist_del(&node->member_node);
+ hlist_del(&node->subscription_node);
+ node->addr->mcast_members_cnt--;
+ node->member->mcast_subscriptions_cnt--;
+ kfree(node);
+ }
+ spin_unlock(&unix_multicast_lock);
unix_state_unlock(sk);

wake_up_interruptible_all(&u->peer_wait);
@@ -631,6 +652,8 @@ static struct sock *unix_create1(struct net *net, struct socket *sock)
atomic_long_set(&u->inflight, 0);
INIT_LIST_HEAD(&u->link);
mutex_init(&u->readlock); /* single task reading lock */
+ INIT_HLIST_HEAD(&u->mcast_subscriptions);
+ INIT_HLIST_HEAD(&u->mcast_members);
init_waitqueue_head(&u->peer_wait);
unix_insert_socket(unix_sockets_unbound, sk);
out:
@@ -1535,10 +1558,202 @@ out:
}


+static int unix_mc_create(struct socket *sock, struct unix_mreq *mreq)
+{
+ struct sock *other;
+ int err;
+ unsigned hash;
+ int namelen;
+
+ if (mreq->address.sun_family != AF_UNIX ||
+ mreq->address.sun_path[0] != '\0')
+ return -EINVAL;
+
+ err = unix_mkname(&mreq->address, sizeof(struct sockaddr_un), &hash);
+ if (err < 0)
+ return err;
+
+ namelen = err;
+ other = unix_find_other(sock_net(sock->sk), &mreq->address, namelen,
+ sock->type, hash, &err);
+ if (other)
+ return -EADDRINUSE;
+
+ err = sock->ops->bind(sock,
+ (struct sockaddr*)&mreq->address,
+ sizeof(struct sockaddr_un));
+ if (err < 0)
+ return err;
+
+ unix_state_lock(sock->sk);
+ unix_sk(sock->sk)->is_mcast_addr = 1;
+ unix_state_unlock(sock->sk);
+
+ return 0;
+}
+
+
+static int unix_mc_join(struct socket *sock, struct unix_mreq *mreq)
+{
+ struct unix_sock *u = unix_sk(sock->sk);
+ struct sock *other;
+ struct unix_sock *otheru;
+ struct unix_mcast *node;
+ int err;
+ unsigned hash;
+ int namelen;
+
+ if (mreq->address.sun_family != AF_UNIX ||
+ mreq->address.sun_path[0] != '\0')
+ return -EINVAL;
+
+ err = unix_autobind(sock);
+ if (err < 0)
+ return err;
+
+ err = unix_mkname(&mreq->address, sizeof(struct sockaddr_un), &hash);
+ if (err < 0)
+ return err;
+
+ namelen = err;
+ other = unix_find_other(sock_net(sock->sk), &mreq->address, namelen,
+ sock->type, hash, &err);
+ if (!other)
+ return -EINVAL;
+
+ if (other && !unix_sk(other)->is_mcast_addr) {
+ err = -EADDRINUSE;
+ goto sock_put_out;
+ }
+
+ otheru = unix_sk(other);
+
+ node = kmalloc(sizeof(struct unix_mcast), GFP_KERNEL);
+ if (!node) {
+ err = -ENOMEM;
+ goto sock_put_out;
+ }
+ node->member = u;
+ node->addr = otheru;
+ node->flags = mreq->flags;
+
+ spin_lock(&unix_multicast_lock);
+ hlist_add_head(&node->member_node, &otheru->mcast_members);
+ hlist_add_head(&node->subscription_node, &u->mcast_subscriptions);
+ otheru->mcast_members_cnt++;
+ u->mcast_subscriptions_cnt++;
+ spin_unlock(&unix_multicast_lock);
+
+ return 0;
+
+sock_put_out:
+ sock_put(other);
+ return err;
+}
+
+
+static int unix_mc_leave(struct socket *sock, struct unix_mreq *mreq)
+{
+ struct unix_sock *u = unix_sk(sock->sk);
+ struct sock *other;
+ struct unix_sock *otheru;
+ struct unix_mcast *node;
+ struct hlist_node *pos;
+ int err;
+ unsigned hash;
+ int namelen;
+
+ if (mreq->address.sun_family != AF_UNIX ||
+ mreq->address.sun_path[0] != '\0')
+ return -EINVAL;
+
+ err = unix_mkname(&mreq->address, sizeof(struct sockaddr_un), &hash);
+ if (err < 0)
+ return err;
+
+ namelen = err;
+ other = unix_find_other(sock_net(sock->sk), &mreq->address, namelen,
+ sock->type, hash, &err);
+ if (!other)
+ return -EINVAL;
+
+ otheru = unix_sk(other);
+
+ if (!otheru->is_mcast_addr) {
+ err = -EINVAL;
+ goto sock_put_out;
+ }
+
+ spin_lock(&unix_multicast_lock);
+
+ hlist_for_each_entry(node, pos, &u->mcast_subscriptions,
+ subscription_node) {
+ if (node->addr == otheru)
+ break;
+ }
+
+ if (!pos) {
+ spin_unlock(&unix_multicast_lock);
+ err = -EINVAL;
+ goto sock_put_out;
+ }
+
+ hlist_del(&node->member_node);
+ hlist_del(&node->subscription_node);
+ otheru->mcast_members_cnt--;
+ u->mcast_subscriptions_cnt--;
+ spin_unlock(&unix_multicast_lock);
+ kfree(node);
+ err = 0;
+
+sock_put_out:
+ sock_put(other);
+ return err;
+}
+
+
static int unix_setsockopt(struct socket *sock, int level, int optname,
char __user *optval, unsigned int optlen)
{
- return -EOPNOTSUPP;
+ struct unix_mreq mreq;
+ int err = 0;
+
+ if (level != SOL_UNIX)
+ return -ENOPROTOOPT;
+
+ switch (optname) {
+ case UNIX_CREATE_GROUP:
+ case UNIX_JOIN_GROUP:
+ case UNIX_LEAVE_GROUP:
+ if (optlen < sizeof(struct unix_mreq))
+ return -EINVAL;
+ if (copy_from_user(&mreq, optval, sizeof(struct unix_mreq)))
+ return -EFAULT;
+ break;
+
+ default:
+ break;
+ }
+
+ switch (optname) {
+ case UNIX_CREATE_GROUP:
+ err = unix_mc_create(sock, &mreq);
+ break;
+
+ case UNIX_JOIN_GROUP:
+ err = unix_mc_join(sock, &mreq);
+ break;
+
+ case UNIX_LEAVE_GROUP:
+ err = unix_mc_leave(sock, &mreq);
+ break;
+
+ default:
+ err = -ENOPROTOOPT;
+ break;
+ }
+
+ return err;
}


--
1.7.1

2010-11-22 18:59:51

by David Miller

Subject: Re: [PATCH 3/9] AF_UNIX: create, join and leave multicast groups with setsockopt

From: Alban Crequy <[email protected]>
Date: Mon, 22 Nov 2010 18:36:16 +0000

> + other = unix_find_other(sock_net(sock->sk), &mreq->address, namelen,
> + sock->type, hash, &err);
> + if (other)
> + return -EADDRINUSE;

Leaks 'other'.

2010-11-22 19:04:55

by David Miller

Subject: Re: [PATCH 4/9] AF_UNIX: find the recipients for multicast messages

From: Alban Crequy <[email protected]>
Date: Mon, 22 Nov 2010 18:36:17 +0000

> unix_find_multicast_recipients() builds an array of recipients. It can either
> find the peers of a specific multicast address, or find all the peers of all
> multicast group the sender is part of.
>
> Signed-off-by: Alban Crequy <[email protected]>

You really should use RCU to lock this stuff; this way sends run
lockless and have fewer worries wrt. the memory allocation. You'll
also only take a spinlock in the write paths which change the
multicast groups, which ought to be rare.

Although, to be honest, you should optimize the case of small numbers of
recipients, in the same way we optimize small numbers of iovecs on
sends. Have an on-stack array that holds a small number of entries
and use that if the set fits, otherwise fall back to dynamic allocation.
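
Something along these lines (a rough sketch with hypothetical names, not from
the patch series; note it uses a separate 'items' pointer instead of the
patch's flexible array, so a small on-stack buffer can back the common case,
similar to what fs/select.c does for poll entries):

#define UNIX_MCAST_ON_STACK 8

struct sock_set {
	int cnt;
	struct sock_item *items;
	struct sock_item inline_items[UNIX_MCAST_ON_STACK];
};

static int sock_set_init(struct sock_set *set, int recipient_cnt)
{
	set->cnt = 0;
	if (recipient_cnt <= UNIX_MCAST_ON_STACK) {
		set->items = set->inline_items;	/* common case: no allocation */
		return 0;
	}
	set->items = kmalloc(sizeof(struct sock_item) * recipient_cnt,
			     GFP_KERNEL);
	return set->items ? 0 : -ENOMEM;
}

static void sock_set_release(struct sock_set *set)
{
	int i;

	for (i = 0; i < set->cnt; i++)
		sock_put(set->items[i].s);
	if (set->items != set->inline_items)
		kfree(set->items);
}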

2010-11-22 19:14:09

by Rémi Denis-Courmont

Subject: Re: [PATCH 7/9] AF_UNIX: Documentation on multicast Unix Sockets

On Monday 22 November 2010 20:36:20, Alban Crequy wrote:
> +Multicast Unix sockets
> +======================
> +
> +Multicast group memberships are stored in struct unix_mcast nodes. An Unix
> +socket can join several multicast groups. Struct unix_mcast nodes are
> doubly +linked:
> +- In (struct unix_sock)->mcast_subscriptions
> +- In (struct unix_sock)->mcast_members

I may be stupid, but I found this whole documentation very confusing, and
likewise the API it tries to describe. Traditionally:
- Senders may or may not be part of the group and are not kept track of.
- Receivers join the group and then receive messages sent to it.
- Loopback defines whether a sender receives its own echo if it sends to a
group that it has joined.
- If connected to a multicast group, messages from the socket are routed to
the group (in the absence of a contradictory socket address). This has no
effect on membership of the multicast group under any circumstance.

You cannot 'listen' or 'accept' on a multicast group.

So I am not entirely clear on what semantics your patchset is following. But
it does not seem like "multicast" to me and therefore does not seem very well
documented :-(

--
Rémi Denis-Courmont
http://www.remlab.net/
http://fi.linkedin.com/in/remidenis

2010-11-22 20:11:28

by Alban Crequy

Subject: Re: [PATCH 7/9] AF_UNIX: Documentation on multicast Unix Sockets

On Mon, 22 Nov 2010 21:07:40 +0200,
Rémi Denis-Courmont <[email protected]> wrote:

> On Monday 22 November 2010 20:36:20, Alban Crequy wrote:
> > +Multicast Unix sockets
> > +======================
> > +
> > +Multicast group memberships are stored in struct unix_mcast nodes.
> > An Unix +socket can join several multicast groups. Struct
> > unix_mcast nodes are doubly +linked:
> > +- In (struct unix_sock)->mcast_subscriptions
> > +- In (struct unix_sock)->mcast_members
>
> I may be stupid, but I found this whole documentation very confusing,
> and so the API it tries to describe. Traditionally:
> - Senders may or not may be part of the group and are not kept track
> of.
> - Receivers join to the group then receive message sent to it.
> - Loopback defines whether a sender receives its own echo if it sends
> to a group that it has joined.
> - If connected to a multicast group, messages from the socket are
> routed to the group (in absence of a contradictoy socket address).
> This has no effect on membership to the multicast group under any
> circumstance.

I keep these traditional properties for multicast on Unix sockets.

> You cannot 'listen' or 'accept' on a multicast group.

Datagram sockets cannot listen() or accept(), but seqpacket sockets can.
I would like multicast to work on seqpacket sockets too. In this case,
there is a central daemon that calls listen(), and accept() returns a new
socket. The central daemon controls the lifetime of the multicast
group and can receive the messages from the peers on the socket
returned by accept() if UNIX_MREQ_SEND_TO_PEER is set.

The accepted socket could join the multicast group (and then receive
messages addressed to the group) with the setsockopt() call, but then
there would be a race: it might not receive the first messages if a
peer calls connect() and sends a message immediately afterwards. connect()
can return in the peer process before the daemon calls accept() and runs
setsockopt(). I added the flag UNIX_MREQ_AUTOJOIN (to be set when
creating the multicast group) to prevent that race.
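
For illustration, the peer side can then be as simple as this sketch (the
abstract address "bus-address" is made up; no setsockopt() is needed on the
peer because the group creator used UNIX_MREQ_AUTOJOIN, so the accepted
socket is already a group member when connect() returns):

#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

static int connect_and_send(const void *msg, size_t len)
{
	struct sockaddr_un addr;
	int fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path + 1, "bus-address");

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		return -1;

	/* Safe to send immediately after connect(). */
	return send(fd, msg, len, 0);
}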

Using connected sockets (seqpacket) is useful for D-Bus because a
central daemon can know when members are connecting and disconnecting
and then emit the D-Bus signal 'NameOwnerChanged'.

> So I am not entirely clear what semantics your patchset is following.
> But it does not seem like "multicast" to me and therefore seems not
> very well documented :-(

I am willing to improve it.

--
Alban

2010-11-22 20:14:56

by Andi Kleen

Subject: Re: [PATCH 4/9] AF_UNIX: find the recipients for multicast messages

Alban Crequy <[email protected]> writes:

>+static DEFINE_SPINLOCK(unix_multicast_lock);

For D-Bus it's probably OK, but I suspect that for other usages
the global lock in the multicast fast path is going to hurt
sooner or later.

> +
> + /* Allocate for the set and hope the number of recipients does not
> + * change while the lock is released. If it changes, we have to try
> + * again... We allocate a bit more than needed, so if a _few_ members
> + * are added in a multicast group meanwhile, we don't always need to
> + * try again. */
> + recipient_cnt += 5;
> +
> + set = kmalloc(sizeof(struct sock_set)
> + + sizeof(struct sock_item) * recipient_cnt,
> + GFP_KERNEL);

FWIW, for a large number of sockets this will likely run into
memory fragmentation issues. There are various workarounds, like
falling back to vmalloc or using something like flex_arrays.


-Andi
--
[email protected] -- Speaking for myself only.

2010-11-23 15:05:17

by Alban Crequy

Subject: Re: [PATCH 4/9] AF_UNIX: find the recipients for multicast messages

On Mon, 22 Nov 2010 11:05:19 -0800 (PST),
David Miller <[email protected]> wrote:

> From: Alban Crequy <[email protected]>
> Date: Mon, 22 Nov 2010 18:36:17 +0000
>
> > unix_find_multicast_recipients() builds an array of recipients. It
> > can either find the peers of a specific multicast address, or find
> > all the peers of all multicast group the sender is part of.
> >
> > Signed-off-by: Alban Crequy <[email protected]>
>
> You really should use RCU to lock this stuff, this way sends run
> lockless and have less worries wrt. the memory allocation. You'll
> also only take a spinlock in the write paths which change the
> multicast groups, which ought to be rare.

I understand the benefit of using RCU in order to have lockless sends.

But with RCU I will still have worries about the memory allocation:

- I cannot allocate inside a rcu_read_lock()-rcu_read_unlock() block.

- If I iterate locklessly over the multicast group members with
hlist_for_each_entry_rcu(), new members can be added, so the
array can be allocated with the wrong size and I have to try again
("goto try_again") when this rare case occurs.

- Another idea would be to avoid the allocation completely by inlining
unix_find_multicast_recipients() inside unix_dgram_sendmsg() and
delivering the messages to the recipients while the list is
being iterated locklessly. But I want to provide atomicity of
delivery: the message must be delivered with skb_queue_tail() either
to all the recipients or to none of them in case of interruption or
memory pressure. I don't see how I can achieve that without
iterating several times over the list of recipients, hence the
allocation and the copy into the array. I also want to guarantee the
order of delivery as described in multicast-unix-sockets.txt, and for
this I am taking lots of spinlocks anyway. I don't see how to avoid
that, but I would be happy to be wrong and have a better solution.

> Although to be honest you should optimize the case of small numbers of
> recipients, in the same way we optimize small numbers of iovecs on
> sends. Have an on-stack array that holds a small number of entries
> and use that if the set fits, otherwise dynamic allocation.

To give an idea of the number of members in a multicast group for the
D-Bus use case, I have 90 D-Bus connections on my session bus:

$ dbus-send --print-reply --dest=org.freedesktop.DBus \
/org/freedesktop/DBus org.freedesktop.DBus.ListNames | grep '":'|wc -l
90

In common cases, there should be only a few real recipients (1 or 2?)
after the socket filters eliminate most of them, but
unix_find_multicast_recipients() will still allocate an array of
about that size.

2010-11-23 16:08:46

by Eric Dumazet

Subject: Re: [PATCH 4/9] AF_UNIX: find the recipients for multicast messages

On Tuesday 23 November 2010 at 15:03 +0000, Alban Crequy wrote:
> On Mon, 22 Nov 2010 11:05:19 -0800 (PST),
> David Miller <[email protected]> wrote:
>
> > From: Alban Crequy <[email protected]>
> > Date: Mon, 22 Nov 2010 18:36:17 +0000
> >
> > > unix_find_multicast_recipients() builds an array of recipients. It
> > > can either find the peers of a specific multicast address, or find
> > > all the peers of all multicast group the sender is part of.
> > >
> > > Signed-off-by: Alban Crequy <[email protected]>
> >
> > You really should use RCU to lock this stuff, this way sends run
> > lockless and have less worries wrt. the memory allocation. You'll
> > also only take a spinlock in the write paths which change the
> > multicast groups, which ought to be rare.
>
> I understand the benefit to use RCU in order to have lockless sends.
>
> But with RCU I will still have worries about the memory allocation:
>
> - I cannot allocate inside a rcu_read_lock()-rcu_read_unlock() block.
>

That's not true.

The same rules as inside a spin_lock() or write_lock() apply.

We already allocate memory inside rcu_read_lock() in the network stack.

> - If I iterate locklessly over the multicast group members with
> hlist_for_each_entry_rcu(), new members can be added, so the
> array can be allocated with the wrong size and I have to try again
> ("goto try_again") when this rare case occurs.

You are allowed to allocate memory to add items while doing your loop
iteration.

Nothing prevents you from using a chain of items, each item holding up to
128 sockets, for example. If full, allocate a new item.

We have such a scheme in poll()/select(), for example:

fs/select.c, function poll_get_entry()

Use a small embedded struct on the stack, and allocate extra items if the
number of fds is too big.

(If you can't allocate memory to hold pointers, chances are you won't be
able to clone skbs anyway. One skb is about 400 bytes.)

If new members are added to the group while you are iterating the list,
they won't receive a copy of the message.

Or just chain the skbs while you clone them, storing the destination socket
in skb->sk... no need for extra memory allocations.

>
> - Another idea would be to avoid completely the allocation by inlining
> unix_find_multicast_recipients() inside unix_dgram_sendmsg() and
> delivering the messages to the recipients as long as the list is
> being iterated locklessly. But I want to provide atomicity of
> delivery: the message must be delivered with skb_queue_tail() either
> to all the recipients or to none of them in case of interruption or
> memory pressure. I don't see how I can achieve that without
> iterating several times on the list of recipients, hence the
> allocation and the copy in the array. I also want to guarantee the
> order of delivery as described in multicast-unix-sockets.txt and for
> this, I am taking lots of spinlocks anyway. I don't see how to avoid
> that, but I would be happy to be wrong and have a better solution.
>


So if one destination has a full receive queue, you want nobody to receive
the message? That seems a bit risky to me, if someone sends SIGSTOP to
one of your processes...



>
> To give an idea of the number of members in a multicast group for the
> D-Bus use case, I have 90 D-Bus connections on my session bus:
>
> $ dbus-send --print-reply --dest=org.freedesktop.DBus \
> /org/freedesktop/DBus org.freedesktop.DBus.ListNames | grep '":'|wc -l
> 90
>
> In common cases, there should be only a few real recipients (1 or 2?)
> after the socket filters eliminate most of them, but
> unix_find_multicast_recipients() will still allocate an array of
> about that size.
>

I am not sure that doing 90 skb clones and filtering them one by one is
going to be fast :-(




2010-11-23 16:56:49

by Eric Dumazet

Subject: Re: [PATCH 4/9] AF_UNIX: find the recipients for multicast messages


Hmm, I just thought about lockless again, and we had the same multicast
problem on UDP as well.

We are forced to hold a lock (to forbid concurrent deletes), or we might
go through one 'about to be removed' socket and abort the iteration
in the middle. Some sockets would not receive a copy of the message.

(UDP sockets use SLAB_DESTROY_BY_RCU, so we could even have multiple
copies sent to some sockets, if the removed socket is re-inserted at the
front of the chain because of instant reuse.)

To have a truly lockless path, you would need to restart the full scan if
you notice a delete was done during the iteration, possibly using a
sequence number per chain. That would be expensive, because you have to
undo all the socket accumulation (refcounts) done during the lookup
before restarting the whole thing.

To avoid starvation, taking a lock on the second iteration would be
good ;)


2010-11-23 17:49:00

by Alban Crequy

Subject: Re: [PATCH 4/9] AF_UNIX: find the recipients for multicast messages

On Tue, 23 Nov 2010 17:08:37 +0100,
Eric Dumazet <[email protected]> wrote:

> (...)

Thanks for the explanations

> On Tuesday 23 November 2010 at 15:03 +0000, Alban Crequy wrote:
> >
> > - Another idea would be to avoid completely the allocation by
> > inlining unix_find_multicast_recipients() inside
> > unix_dgram_sendmsg() and delivering the messages to the recipients
> > as long as the list is being iterated locklessly. But I want to
> > provide atomicity of delivery: the message must be delivered with
> > skb_queue_tail() either to all the recipients or to none of them in
> > case of interruption or memory pressure. I don't see how I can
> > achieve that without iterating several times on the list of
> > recipients, hence the allocation and the copy in the array. I also
> > want to guarantee the order of delivery as described in
> > multicast-unix-sockets.txt and for this, I am taking lots of
> > spinlocks anyway. I don't see how to avoid that, but I would be
> > happy to be wrong and have a better solution.
> >
>
>
> So if one destination has a full receive queue, you want nobody
> receive the message ? That seems a bit risky to me, if someone sends
> SIGSTOP to one of your process...

Yes. For the D-Bus usage, I want to have this guarantee. If random
remote procedure calls are lost, it will break applications built on
top of D-Bus with multicast Unix sockets. The current implementation of
D-Bus avoids this problem by having an almost infinite receive queue in
the dbus-daemon process: 1GB. But in the kernel,
/proc/sys/net/unix/max_dgram_qlen is 10 messages by default. Increasing
it a bit will not fix the problem, and increasing it to 1GB is not
reasonable in the kernel.

There are different actions the kernel can take when the queue is full:

1. Block the sender. It is useful for RPC; we don't want random RPCs to
disappear unnoticed.
2. Drop the message for recipients with a full queue. It could be
acceptable for some slow monitoring tools that don't want to disturb
the applications.
3. Close the receiving socket as a punishment. At least the problem does
not go unnoticed and the user gets some error feedback.

I was thinking of making it configurable when a socket joins a multicast
group, so different multicast group members would behave differently.
The flag UNIX_MREQ_DROP_WHEN_FULL is there for that (but it is not fully
implemented in the patchset).

It makes things more complex for poll(POLLOUT). Before the buffer
reaches the kernel, the socket filters cannot run, so it is not
possible to know the exact recipients. So poll(POLLOUT) has to block
as soon as even one receive queue is full (unless that multicast
member has the flag UNIX_MREQ_DROP_WHEN_FULL).

When the peers install socket filters and there are two flows of messages,
from A to B and from C to D, a full receive queue on D will
also block the communication from A to B: poll(A, POLLOUT) will
block. This is annoying, but I don't see how to fix it.

> > To give an idea of the number of members in a multicast group for
> > the D-Bus use case, I have 90 D-Bus connections on my session bus:
> >
> > $ dbus-send --print-reply --dest=org.freedesktop.DBus \
> > /org/freedesktop/DBus org.freedesktop.DBus.ListNames | grep '":'|wc
> > -l 90
> >
> > In common cases, there should be only a few real recipients (1 or
> > 2?) after the socket filters eliminate most of them, but
> > unix_find_multicast_recipients() will still allocate an array of
> > about that size.
> >
>
> I am not sure if doing 90 clones of skb and filtering them one by one
> is going to be fast :-(

Yes... I think it can be optimized. Run the socket filter first by
calling sk_run_filter() directly, and then call skb_clone() + pskb_trim()
only for the few remaining sockets.
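
A rough sketch of that optimization (hypothetical helper, not part of the
patchset; it follows the current filter API as I understand it, where
sk_filter() is basically sk_run_filter() plus pskb_trim()):

static struct sk_buff *unix_mcast_filter_then_clone(struct sock *sk,
						    struct sk_buff *skb)
{
	struct sk_filter *filter;
	unsigned int pkt_len = skb->len;
	struct sk_buff *clone;

	/* Run the filter on the shared skb without modifying it. */
	rcu_read_lock_bh();
	filter = rcu_dereference_bh(sk->sk_filter);
	if (filter)
		pkt_len = sk_run_filter(skb, filter->insns, filter->len);
	rcu_read_unlock_bh();

	if (pkt_len == 0)
		return NULL;			/* filter rejected the message */

	/* Clone and trim only for the sockets that accepted it. */
	clone = skb_clone(skb, GFP_KERNEL);
	if (clone && pkt_len < clone->len)
		pskb_trim(clone, pkt_len);
	return clone;
}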

2010-11-23 18:39:25

by David Miller

Subject: Re: [PATCH 4/9] AF_UNIX: find the recipients for multicast messages

From: Alban Crequy <[email protected]>
Date: Tue, 23 Nov 2010 17:47:01 +0000

> On Tue, 23 Nov 2010 17:08:37 +0100,
> Eric Dumazet <[email protected]> wrote:
>> I am not sure if doing 90 clones of skb and filtering them one by one
>> is going to be fast :-(
>
> Yes... I think it can be optimized. Run the socket filter first by
> calling sk_run_filter() directly and then call skb_clone() + pskb_trim()
> only on the few remaining sockets.

BTW, we have, and have talked about, the exact same problem with
AF_PACKET socket users such as DHCP.

We clone and push the packet down into the AF_PACKET protocol
code from the packet_type callback when 99% of the time the socket
filter doesn't match, and thus the clone is completely wasted
work.

If we know the socket, or more specifically the filter, early enough,
we could have a special interface like:

struct sk_buff *skb_filter_or_clone(struct sk_buff *skb, ...)

which returns a non-NULL cloned SKB only if the filter accepts the
packet.