From: Eric Dumazet <[email protected]>
Date: Thu, 4 Feb 2021 23:44:17 +0100
> On Thu, Feb 4, 2021 at 11:14 PM Saeed Mahameed <[email protected]> wrote:
> >
> > On Thu, 2021-02-04 at 13:31 -0800, Eric Dumazet wrote:
> > > From: Eric Dumazet <[email protected]>
> > >
> > > Commit c80794323e82 ("net: Fix packet reordering caused by GRO and
> > > listified RX cooperation") had the unfortunate effect of adding
> > > latencies in common workloads.
> > >
> > > Before the patch, GRO packets were immediately passed to
> > > upper stacks.
> > >
> > > After the patch, we can accumulate quite a lot of GRO
> > > packets (depending on NAPI budget).
> > >
> >
> > Why NAPI budget? Looking at the code, it seems to be more related to
> > MAX_GRO_SKBS * gro_normal_batch, since we are counting each GRO SKB as 1.
>
>
> Simply because we call gro_normal_list() from napi_poll(),
>
> So we flush the napi rx_list every 64 packets under stress (assuming the
> NIC driver uses NAPI_POLL_WEIGHT), or more often when napi_complete_done()
> is called because the budget was not exhausted.
Saeed,
Eric means that if we have e.g. 8 GRO packets with 8 segs each, then
rx_list will be flushed only after processing of 64 ingress frames.
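To make the accounting difference concrete, here is a small userspace
model of the batch threshold in gro_normal_one(). It is only a sketch, not
kernel code: struct model_napi, model_normal_one() and count_flushes() are
hypothetical names made up for illustration, and it only models flushes
triggered by the threshold itself, not those done from napi_poll() or
napi_complete_done().

```c
/* Hypothetical userspace model of the rx_count accounting; not kernel code. */
struct model_napi {
	int rx_count;	/* stands in for napi->rx_count */
	int flushes;	/* how many times the list was passed up the stack */
};

/* Stands in for gro_normal_list(): hand the batch up and reset the counter. */
static void model_normal_list(struct model_napi *n)
{
	n->rx_count = 0;
	n->flushes++;
}

/* Stands in for the patched gro_normal_one(): add 'segs', flush at 'batch'. */
static void model_normal_one(struct model_napi *n, int segs, int batch)
{
	n->rx_count += segs;
	if (n->rx_count >= batch)
		model_normal_list(n);
}

/*
 * Count threshold-triggered flushes for 'packets' GRO packets of
 * 'segs_per_packet' segments each. With count_segments == 0 every packet
 * counts as 1 (pre-fix accounting); otherwise it counts as its number of
 * segments (post-fix accounting).
 */
static int count_flushes(int packets, int segs_per_packet, int batch,
			 int count_segments)
{
	struct model_napi n = { 0, 0 };
	int i;

	for (i = 0; i < packets; i++)
		model_normal_one(&n, count_segments ? segs_per_packet : 1,
				 batch);
	return n.flushes;
}
```

For 8 GRO packets of 8 segments each and an assumed batch of 8,
count_flushes(8, 8, 8, 0) returns 1 (one flush after all 64 frames), while
count_flushes(8, 8, 8, 1) returns 8 (one flush per GRO packet). Raising the
batch to 64 with per-segment counting, count_flushes(8, 8, 64, 1), gives 1
again.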
> GRO has always been able to keep MAX_GRO_SKBS in its layer, and no recent
> patch has changed this part.
>
>
> >
> >
> > but maybe i am missing some information about the actual issue you are
> > hitting.
>
>
> Well, the issue is precisely described in the changelog.
>
> >
> >
> > > My fix is counting in napi->rx_count number of segments
> > > instead of number of logical packets.
> > >
> > > Fixes: c80794323e82 ("net: Fix packet reordering caused by GRO and
> > > listified RX cooperation")
> > > Signed-off-by: Eric Dumazet <[email protected]>
> > > Bisected-by: John Sperbeck <[email protected]>
> > > Tested-by: Jian Yang <[email protected]>
> > > Cc: Maxim Mikityanskiy <[email protected]>
> > > Cc: Alexander Lobakin <[email protected]>
Strange that mailmap didn't pick up my active email at pm.me.
Anyway, this fix looks correct to me. It restores Edward's original
logic, but without spurious out-of-order deliveries.
Moreover, the pre-patch behaviour can easily be achieved by increasing
net.core.gro_normal_batch if needed.
Thanks!
Reviewed-by: Alexander Lobakin <[email protected]>
> > > Cc: Saeed Mahameed <[email protected]>
> > > Cc: Edward Cree <[email protected]>
> > > ---
> > > net/core/dev.c | 11 ++++++-----
> > > 1 file changed, 6 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index
> > > a979b86dbacda9dfe31dd8b269024f7f0f5a8ef1..449b45b843d40ece7dd1e2ed6a5
> > > 996ee1db9f591 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -5735,10 +5735,11 @@ static void gro_normal_list(struct
> > > napi_struct *napi)
> > > /* Queue one GRO_NORMAL SKB up for list processing. If batch size
> > > exceeded,
> > > * pass the whole batch up to the stack.
> > > */
> > > -static void gro_normal_one(struct napi_struct *napi, struct sk_buff
> > > *skb)
> > > +static void gro_normal_one(struct napi_struct *napi, struct sk_buff
> > > *skb, int segs)
> > > {
> > > list_add_tail(&skb->list, &napi->rx_list);
> > > - if (++napi->rx_count >= gro_normal_batch)
> > > + napi->rx_count += segs;
> > > + if (napi->rx_count >= gro_normal_batch)
> > > gro_normal_list(napi);
> > > }
> > >
> > > @@ -5777,7 +5778,7 @@ static int napi_gro_complete(struct napi_struct
> > > *napi, struct sk_buff *skb)
> > > }
> > >
> > > out:
> > > - gro_normal_one(napi, skb);
> > > + gro_normal_one(napi, skb, NAPI_GRO_CB(skb)->count);
> >
> > Seems correct to me,
> >
> > Reviewed-by: Saeed Mahameed <[email protected]>
Al