MIME-Version: 1.0
References: <20230311151756.83302-1-kerneljasonxing@gmail.com>
 <CANn89iKWewG7JZXQ=bmab9rSXUs_P5fX-BQ792QjYuH151DV-g@mail.gmail.com>
 <CAL+tcoAchbTk9ibrAVH-bZ-0KHJ8g3XnsQHFWiBosyNgYJtymA@mail.gmail.com> <CANn89i+uS7-mA227g6yJfTK4ugdA82z+PLV9_74f1dBMo_OhEg@mail.gmail.com>
In-Reply-To: <CANn89i+uS7-mA227g6yJfTK4ugdA82z+PLV9_74f1dBMo_OhEg@mail.gmail.com>
From:   Jason Xing <kerneljasonxing@gmail.com>
Date:   Tue, 14 Mar 2023 01:15:39 +0800
Message-ID: <CAL+tcoCsQ18ae+hUwqFigerJQfhrusuOOC63Wc+ZGyGWEvSFBQ@mail.gmail.com>
Subject: Re: [PATCH net-next] net-sysfs: display two backlog queue len separately
To:     Eric Dumazet <edumazet@google.com>
Cc:     davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
        netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
        Jason Xing <kernelxing@tencent.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Precedence: bulk

On Mon, Mar 13, 2023 at 11:59 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Mar 13, 2023 at 6:16 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Mon, Mar 13, 2023 at 8:34 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Sat, Mar 11, 2023 at 7:18 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > >
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > Sometimes we need to know which one of backlog queue can be exactly
> > > > > long enough to cause some latency when debugging this part is needed.
> > > > Thus, we can then separate the display of both.
> > > >
> > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > ---
> > > >  net/core/net-procfs.c | 17 ++++++++++++-----
> > > >  1 file changed, 12 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
> > > > index 1ec23bf8b05c..97a304e1957a 100644
> > > > --- a/net/core/net-procfs.c
> > > > +++ b/net/core/net-procfs.c
> > > > @@ -115,10 +115,14 @@ static int dev_seq_show(struct seq_file *seq,=
 void *v)
> > > >         return 0;
> > > >  }
> > > >
> > > > -static u32 softnet_backlog_len(struct softnet_data *sd)
> > > > +static u32 softnet_input_pkt_queue_len(struct softnet_data *sd)
> > > >  {
> > > > -       return skb_queue_len_lockless(&sd->input_pkt_queue) +
> > > > -              skb_queue_len_lockless(&sd->process_queue);
> > > > +       return skb_queue_len_lockless(&sd->input_pkt_queue);
> > > > +}
> > > > +
> > > > +static u32 softnet_process_queue_len(struct softnet_data *sd)
> > > > +{
> > > > +       return skb_queue_len_lockless(&sd->process_queue);
> > > >  }
> > > >
> > > >  static struct softnet_data *softnet_get_online(loff_t *pos)
> > > > @@ -169,12 +173,15 @@ static int softnet_seq_show(struct seq_file *=
seq, void *v)
> > > >          * mapping the data a specific CPU
> > > >          */
> > > >         seq_printf(seq,
> > > > -                  "%08x %08x %08x %08x %08x %08x %08x %08x %08x %0=
8x %08x %08x %08x\n",
> > > > +                  "%08x %08x %08x %08x %08x %08x %08x %08x %08x %0=
8x %08x %08x %08x "
> > > > +                  "%08x %08x\n",
> > > >                    sd->processed, sd->dropped, sd->time_squeeze, 0,
> > > >                    0, 0, 0, 0, /* was fastroute */
> > > >                    0,   /* was cpu_collision */
> > > >                    sd->received_rps, flow_limit_count,
> > > > -                  softnet_backlog_len(sd), (int)seq->index);
> > > > +                  0,   /* was len of two backlog queues */
> > >
> > > You can not pretend the sum is zero, some user space tools out there
> > > would be fooled.
> > >
> > > > +                  (int)seq->index,
> > > > +                  softnet_input_pkt_queue_len(sd), softnet_process=
_queue_len(sd));
> > > >         return 0;
> > > >  }
> > > >
> > > > --
> > > > 2.37.3
> > > >
> > >
> > > In general I would prefer we no longer change this file.
> >
> > Fine. Since now, let this legacy file be one part of history.
> >
> > >
> > > Perhaps add a tracepoint instead ?
> >
> > Thanks, Eric. It's one good idea. It seems acceptable if we only need
> > to trace two separate backlog queues where it can probably hit the
> > limit, say, in the enqueue_to_backlog().
>
>
[...]
> Note that enqueue_to_backlog() already uses a specific kfree_skb_reason() reason
> (SKB_DROP_REASON_CPU_BACKLOG) so existing infrastructure should work just fine.

Sure, I noticed that. But that tracepoint covers every kfree_skb path,
not only softnet_data. If that is not the right fit, where would you
recommend putting the trace function? For now I'm thinking of falling
back to the legacy file we discussed above :(
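
For reference, here is a minimal userspace sketch of how a tool might
read the combined backlog column that the legacy file exposes today. The
column positions are taken from the 13-field format string in the quoted
patch context; the enum and function names (softnet_parse_line,
softnet_dump) are hypothetical and only for illustration, not an
existing API.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Column positions in one /proc/net/softnet_stat line, following the
 * 13-field format string shown in the quoted patch (pre-patch layout).
 * These names are made up for this sketch. */
enum {
	SOFTNET_COL_PROCESSED    = 0,
	SOFTNET_COL_DROPPED      = 1,
	SOFTNET_COL_TIME_SQUEEZE = 2,
	SOFTNET_COL_RECEIVED_RPS = 9,
	SOFTNET_COL_FLOW_LIMIT   = 10,
	SOFTNET_COL_BACKLOG_LEN  = 11, /* input_pkt_queue + process_queue */
	SOFTNET_COL_CPU_INDEX    = 12,
	SOFTNET_NUM_COLS         = 13,
};

/* Parse the leading hex columns of one line into cols[].
 * Returns how many columns were parsed (at most SOFTNET_NUM_COLS);
 * any extra trailing columns are ignored. */
int softnet_parse_line(const char *line, uint32_t cols[SOFTNET_NUM_COLS])
{
	int n = 0;
	int used = 0;
	unsigned int val;

	while (n < SOFTNET_NUM_COLS &&
	       sscanf(line, "%x%n", &val, &used) == 1) {
		cols[n++] = val;
		line += used;
	}
	return n;
}

/* Print a short per-CPU summary for every well-formed line read from
 * 'in'. Returns the number of lines printed. */
int softnet_dump(FILE *in, FILE *out)
{
	char line[512];
	uint32_t cols[SOFTNET_NUM_COLS];
	int rows = 0;

	while (fgets(line, sizeof(line), in)) {
		if (softnet_parse_line(line, cols) < SOFTNET_NUM_COLS)
			continue; /* skip short or malformed lines */
		fprintf(out,
			"cpu%" PRIu32 " processed=%" PRIu32 " dropped=%" PRIu32
			" time_squeeze=%" PRIu32 " backlog=%" PRIu32 "\n",
			cols[SOFTNET_COL_CPU_INDEX],
			cols[SOFTNET_COL_PROCESSED],
			cols[SOFTNET_COL_DROPPED],
			cols[SOFTNET_COL_TIME_SQUEEZE],
			cols[SOFTNET_COL_BACKLOG_LEN]);
		rows++;
	}
	return rows;
}
```

A caller would open /proc/net/softnet_stat with fopen() and pass the
stream to softnet_dump() together with stdout. The sketch only exposes
the sum of the two queues, which is exactly the limitation discussed in
this thread.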

>
>
> >
> > Similarly I decide to write another two tracepoints of time_squeeze
> > and budget_squeeze which I introduced to distinguish from time_squeeze
> > as the below link shows:
> > https://lore.kernel.org/lkml/CAL+tcoAwodpnE2NjMLPhBbmHUvmKMgSykqx0EQ4YZaQHjrx0Hw@mail.gmail.com/.
> > For that change, any suggestions are deeply welcome :)
> >
>
> For your workloads to hit these limits enough for you to be worried,
> it looks like you are not using any scaling stuff documented
> in Documentation/networking/scaling.rst

Thanks for the guidance. Scaling really is a good way to go. But I
would still like to separate these two kinds of limits so that we can
watch them closely. More often than not, we cannot tell accurately which
one should be adjusted. The time_squeeze counter alone may not make that
clear, and we cannot blindly write a larger number into both proc files,
which may do harm to some external customers, unless we can show them
some proof.

Maybe I got something wrong. If adding some tracepoints for those
limits in softnet_data is not elegant, please enlighten me :)

Thanks,
Jason