2006-10-14 21:43:30

by Joan Raventos

[permalink] [raw]
Subject: poll problem with PF_PACKET when using PACKET_RX_RING

Hello,

In order to use PF_PACKET/SOCK_RAW with PACKET_RX_RING
one would possibly do (as described in
Documentation/networking/packet_mmap.txt):
1. setup PF_PACKET socket via socket call.
2. use setsockopt to change the PF_PACKET socket into
PACKET_RX_RING mode and alloc the ring.
3. mmap the ring.
4. use poll with the socket descriptor and then
directly access the pkts from the mmaped ring.

However I've observed that if the socket is opened on
a link with substantial traffic, chances are that some
pkts might hit the socket between (1) and (2). At that
point those pkts will make the socket descriptor be
active (packet_poll is built as an OR from
datagram_poll -essentially whether sk_receive_queue is
empty- and the ring status). However after (2) pkts
are no longer processed by packet_rcv but via
tpacket_rcv, and thus no longer queued in the
sk_receive_queue but into the ring. Moreover since the
user-space app is likely to access them directly (4),
no one is going to empty the socket queue and the
descriptor will remain always active, converting the
poll loop into a busy wait (100% cpu occupied by the
user-space process, etc.).

Is this a bug in PF_PACKET? Should the socket queue be
emptied by packet_set_ring (called via setsockopt when
PACKET_RX_RING is used) so the above cannot happen?
Should the user-space app drain the socket queue with
recvfrom prior to (4) -quite unlikely in practice-?

Thx in advance for your comments!

Salu2,
J.


2006-10-15 14:00:48

by Patrick McHardy

[permalink] [raw]
Subject: Re: poll problem with PF_PACKET when using PACKET_RX_RING

Joan Raventos wrote:
> Hello,
>
> In order to use PF_PACKET/SOCK_RAW with PACKET_RX_RING
> one would possibly do (as described in
> Documentation/networking/packet_mmap.txt):
> 1. setup PF_PACKET socket via socket call.
> 2. use setsockopt to change the PF_PACKET socket into
> PACKET_RX_RING mode and alloc the ring.
> 3. mmap the ring.
> 4. use poll with the socket descriptor and then
> directly access the pkts from the mmaped ring.
>
> However I've observed that if the socket is opened on
> a link with substantial traffic, chances are that some
> pkts might hit the socket between (1) and (2). At that
> point those pkts will make the socket descriptor be
> active (packet_poll is built as an OR from
> datagram_poll -essentially whether sk_receive_queue is
> empty- and the ring status). However after (2) pkts
> are no longer processed by packet_rcv but via
> tpacket_rcv, and thus no longer queued in the
> sk_receive_queue but into the ring. Moreover since the
> user-space app is likely to access them directly (4),
> no one is going to empty the socket queue and the
> descriptor will remain always active, converting the
> poll loop into a busy wait (100% cpu occupied by the
> user-space process, etc.).
>
> Is this a bug in PF_PACKET? Should the socket queue be
> emptied by packet_set_ring (called via setsockopt when
> PACKET_RX_RING is used) so the above cannot happen?
> Should the user-space app drain the socket queue with
> recvfrom prior to (4) -quite unlikely in practice-?


I guess the best way is not to bind the socket before having
completed setup. We could still flush the queue to make life
easier for userspace, not sure about that ..

2006-10-15 18:11:04

by Joan Raventos

[permalink] [raw]
Subject: Re: poll problem with PF_PACKET when using PACKET_RX_RING

Hi Patrick,

Thx for your prompt reply! Plz see some comments inline.

>>
>> Is this a bug in PF_PACKET? Should the socket queue be
>> emptied by packet_set_ring (called via setsockopt when
>> PACKET_RX_RING is used) so the above cannot happen?
>> Should the user-space app drain the socket queue with
>> recvfrom prior to (4) -quite unlikely in practice-?


> I guess the best way is not to bind the socket before having
> completed setup. We could still flush the queue to make life
> easier for userspace, not sure about that ..

Even w/o bind, packet_create is doing a dev_add_pack, which I think will make pkts arrive to that socket (ie. in netif_receive_skb one can see the loops over the rcu for both ptype_all and type-specific which seem match whenever !ptype->dev || ptype->dev==skb->dev).

Also the packet_mmap.txt doc does not mention bind, which probably is more a mechanism to closely specify a dev than to signal socket readiness.

Salu2,
J.




2006-10-16 06:18:22

by Patrick McHardy

[permalink] [raw]
Subject: Re: poll problem with PF_PACKET when using PACKET_RX_RING

Joan Raventos wrote:
>>>Is this a bug in PF_PACKET? Should the socket queue be
>>>emptied by packet_set_ring (called via setsockopt when
>>>PACKET_RX_RING is used) so the above cannot happen?
>>>Should the user-space app drain the socket queue with
>>>recvfrom prior to (4) -quite unlikely in practice-?
>
>
>>I guess the best way is not to bind the socket before having
>>completed setup. We could still flush the queue to make life
>>easier for userspace, not sure about that ..
>
>
> Even w/o bind, packet_create is doing a dev_add_pack, which I think will make pkts arrive to that socket (ie. in netif_receive_skb one can see the loops over the rcu for both ptype_all and type-specific which seem match whenever !ptype->dev || ptype->dev==skb->dev).
>
> Also the packet_mmap.txt doc does not mention bind, which probably is more a mechanism to closely specify a dev than to signal socket readiness.

packet_create only calls dev_add_pack if a protocol is given.
You can use a protocol number of 0 and then bind the socket
after setting it up properly.

According to your description, you first used setsockopt(...,
PACKET_RX_RING), then mmap. In that case the receive queue
should already get flushed by packet_set_ring (about line 1710).
How did you verify that the receive queue still contains packets?

2006-10-16 10:57:50

by Pádraig Brady

[permalink] [raw]
Subject: Re: poll problem with PF_PACKET when using PACKET_RX_RING

Joan Raventos wrote:
> Hello,
>
> In order to use PF_PACKET/SOCK_RAW with PACKET_RX_RING
> one would possibly do (as described in
> Documentation/networking/packet_mmap.txt):
> 1. setup PF_PACKET socket via socket call.
> 2. use setsockopt to change the PF_PACKET socket into
> PACKET_RX_RING mode and alloc the ring.
> 3. mmap the ring.
> 4. use poll with the socket descriptor and then
> directly access the pkts from the mmaped ring.

A few years back I developed a network sniffer
on 2.4.20 using PACKET_MMAP supporting very high packet rates.
When testing with high packet rates, invariably if traffic
was present while the buffers were being setup, the buffer data
would be corrupted. I worked around it by ensuring no packets went
into the stack before the userspace process sniffing the packets was started.

P?draig.

2006-10-16 19:39:04

by Joan Raventos

[permalink] [raw]
Subject: Re: poll problem with PF_PACKET when using PACKET_RX_RING

>>>>Is this a bug in PF_PACKET? Should the socket queue be
>>>>emptied by packet_set_ring (called via setsockopt when
>>>>PACKET_RX_RING is used) so the above cannot happen?
>>>>Should the user-space app drain the socket queue with
>>>>recvfrom prior to (4) -quite unlikely in practice-?
>>
>>
>>>I guess the best way is not to bind the socket before having
>>>completed setup. We could still flush the queue to make life
>>>easier for userspace, not sure about that ..
>
>
>> Even w/o bind, packet_create is doing a dev_add_pack, which I think will make pkts arrive to that socket (ie. in netif_receive_skb one can see the loops over the rcu for both ptype_all and type-specific which seem match whenever !ptype->dev || ptype->dev==skb->dev).
>>
>> Also the packet_mmap.txt doc does not mention bind, which probably is more a mechanism to closely specify a dev than to signal socket readiness.

> packet_create only calls dev_add_pack if a protocol is given.
> You can use a protocol number of 0 and then bind the socket
> after setting it up properly.

Currently I'm using ETH_P_ALL on the socket call. If I understand your proposal correctly you suggest to pass 0 on the socket call, so dev_add_pack is not called, and afterwards use a sockaddr_ll with bind to set the sll_protocol to whatever value (ETH_P_ALL in my case). Correct?

> According to your description, you first used setsockopt(...,
> PACKET_RX_RING), then mmap. In that case the receive queue
> should already get flushed by packet_set_ring (about line 1710).

Ok, I see... I guess if mmap has not been issued by the time setsockopt is called then po->mapped == 0 and the code you point out is triggered, specifically skb_queue_purge.

> How did you verify that the receive queue still contains packets?

You are totally right! non-block recv to the descriptor returns EAGAIN, so the queues are empty. After further instrumentation of the ring code, I'm suspecting of an issue with the ring read index at the user-space app...

Nevertheless the whole ring communication between kernel and user-space seems to be based on marking pkts via a flag in each pkt slot in the ring (tp_status). I guess it works fine because the assignments are atomic (like the one on af_packet.c:671). Correct?
BTW what's the purpose of mb() and why is it exactly needed in that position in the code?

Thx again!

Salu2,
J.