2013-03-03 18:35:10

by Eric Dumazet

[permalink] [raw]
Subject: Re: [RFC PATCH 1/5] net: implement support for low latency socket polling

On Wed, 2013-02-27 at 09:55 -0800, Eliezer Tamir wrote:

> index 821c7f4..d1d1016 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -408,6 +408,10 @@ struct sk_buff {
> struct sock *sk;
> struct net_device *dev;
>
> +#ifdef CONFIG_INET_LL_RX_POLL
> + struct napi_struct *dev_ref; /* where this skb came from */
> +#endif
> +
> /*
> * This is the control buffer. It is free to use for every
> * layer. Please put your private variables there. If you

Yes, thats the killer, because :

1) It adds 8 bytes per skb, and we are going to reach the 256 bytes per
sk_buff boundary. cloned skbs will use an extra cache line.

It might make sense to union this on dma_cookie, as dma_cookie is only
used on TX path.

2) We need to reference count napi structs.

For 2) , we would need to add a percpu ref counter (a bit like struct
netdevice -> pcpu_refcnt)

Alternative to 2) would be to use a generation id, incremented every
time a napi used in spin polling enabled driver is dismantled (and freed
after RCU grace period)

And store in sockets not only the pointer to napi_struct, but the
current generation id : If the generation id doesnt match, disable
the spinpoll until next packet rebuilds the cache again.



2013-03-03 19:21:56

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC PATCH 1/5] net: implement support for low latency socket polling

> Alternative to 2) would be to use a generation id, incremented every
> time a napi used in spin polling enabled driver is dismantled (and freed
> after RCU grace period)
>
> And store in sockets not only the pointer to napi_struct, but the
> current generation id : If the generation id doesnt match, disable
> the spinpoll until next packet rebuilds the cache again.

This would require rcu_read_lock, aka preempt off, during polling, right?

-Andi

--
[email protected] -- Speaking for myself only.

2013-03-03 21:20:07

by Eric Dumazet

[permalink] [raw]
Subject: Re: [RFC PATCH 1/5] net: implement support for low latency socket polling

On Sun, 2013-03-03 at 20:21 +0100, Andi Kleen wrote:
> > Alternative to 2) would be to use a generation id, incremented every
> > time a napi used in spin polling enabled driver is dismantled (and freed
> > after RCU grace period)
> >
> > And store in sockets not only the pointer to napi_struct, but the
> > current generation id : If the generation id doesnt match, disable
> > the spinpoll until next packet rebuilds the cache again.
>
> This would require rcu_read_lock, aka preempt off, during polling, right?
>

Of course, polling probably needs BH disabling as well to get the per
napi lock

2013-03-04 03:55:33

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC PATCH 1/5] net: implement support for low latency socket polling

On Sun, Mar 03, 2013 at 01:20:01PM -0800, Eric Dumazet wrote:
> On Sun, 2013-03-03 at 20:21 +0100, Andi Kleen wrote:
> > > Alternative to 2) would be to use a generation id, incremented every
> > > time a napi used in spin polling enabled driver is dismantled (and freed
> > > after RCU grace period)
> > >
> > > And store in sockets not only the pointer to napi_struct, but the
> > > current generation id : If the generation id doesnt match, disable
> > > the spinpoll until next packet rebuilds the cache again.
> >
> > This would require rcu_read_lock, aka preempt off, during polling, right?
> >
>
> Of course, polling probably needs BH disabling as well to get the per
> napi lock

Ok maybe the cond_resched() is good enough.

-Andi

--
[email protected] -- Speaking for myself only.

2013-03-04 08:44:08

by Eliezer Tamir

[permalink] [raw]
Subject: Re: [RFC PATCH 1/5] net: implement support for low latency socket polling

On 03/03/2013 20:35, Eric Dumazet wrote:
> On Wed, 2013-02-27 at 09:55 -0800, Eliezer Tamir wrote:
>
>> index 821c7f4..d1d1016 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -408,6 +408,10 @@ struct sk_buff {
>> struct sock *sk;
>> struct net_device *dev;
>>
>> +#ifdef CONFIG_INET_LL_RX_POLL
>> + struct napi_struct *dev_ref; /* where this skb came from */
>> +#endif
>> +
>> /*
>> * This is the control buffer. It is free to use for every
>> * layer. Please put your private variables there. If you
>
> Yes, thats the killer, because :
>
> 1) It adds 8 bytes per skb, and we are going to reach the 256 bytes per
> sk_buff boundary. cloned skbs will use an extra cache line.
>
> It might make sense to union this on dma_cookie, as dma_cookie is only
> used on TX path.

I will try this out.

> 2) We need to reference count napi structs.
>
> For 2) , we would need to add a percpu ref counter (a bit like struct
> netdevice -> pcpu_refcnt)
>
> Alternative to 2) would be to use a generation id, incremented every
> time a napi used in spin polling enabled driver is dismantled (and freed
> after RCU grace period)

I like this option, because one would assume that the life expectancy of
a napi is rather long. We can just increment the generation id any time
any napi is disabled, which simplifies things.

There could be other configuration changes that would make our notion on
where to poll outdated, for example, someone may have reprogrammed an RX
filter. This is not as catastrophic as a napi going away but still.

Would it make sense to make this a generic mechanism?
One could for example increment the generation id every time the RTNL is
taken. or is this too much?

Thanks,
Eliezer

2013-03-04 14:52:53

by Eric Dumazet

[permalink] [raw]
Subject: Re: [RFC PATCH 1/5] net: implement support for low latency socket polling

On Mon, 2013-03-04 at 10:43 +0200, Eliezer Tamir wrote:

> One could for example increment the generation id every time the RTNL is
> taken. or is this too much?

RTNL is taken for a lot of operations, it would be better to have a
finer grained increment.

2013-03-04 15:28:35

by Eliezer Tamir

[permalink] [raw]
Subject: Re: [RFC PATCH 1/5] net: implement support for low latency socket polling

On 04/03/2013 16:52, Eric Dumazet wrote:
> On Mon, 2013-03-04 at 10:43 +0200, Eliezer Tamir wrote:
>
>> One could for example increment the generation id every time the RTNL is
>> taken. or is this too much?
>
> RTNL is taken for a lot of operations, it would be better to have a
> finer grained increment.

If is taken rarely enough it will still be worth it.

Otherwise it may be hard to know what operations need to invalidate the
napi reference. It can very well be HW dependent, and then you end up
adding a function for drivers to call to do the invalidation.

Or we can decide that we only care about catastrophic events and only
worry about a napi completely going away and not worry about
configuration changes.(Polling the wrong queue will not kill you, it's
just a waste of perfectly good CPU cycles.)

2013-03-04 16:16:09

by Eric Dumazet

[permalink] [raw]
Subject: Re: [RFC PATCH 1/5] net: implement support for low latency socket polling

On Mon, 2013-03-04 at 17:28 +0200, Eliezer Tamir wrote:
> On 04/03/2013 16:52, Eric Dumazet wrote:
> > On Mon, 2013-03-04 at 10:43 +0200, Eliezer Tamir wrote:
> >
> >> One could for example increment the generation id every time the RTNL is
> >> taken. or is this too much?
> >
> > RTNL is taken for a lot of operations, it would be better to have a
> > finer grained increment.
>
> If is taken rarely enough it will still be worth it.
>

Yes, but eventually it makes attempts to get rid of RTNL a nightmare.

When adding new network features, just use the right semantic from the
beginning.

> Otherwise it may be hard to know what operations need to invalidate the
> napi reference. It can very well be HW dependent, and then you end up
> adding a function for drivers to call to do the invalidation.
>
> Or we can decide that we only care about catastrophic events and only
> worry about a napi completely going away and not worry about
> configuration changes.(Polling the wrong queue will not kill you, it's
> just a waste of perfectly good CPU cycles.)

As long as the incoming packets are able to update the information, who
cares if one packet missed the poll ?