We are seeing cases where neigh_cleanup_and_release() is called by
neigh_forced_gc() many times in a row with preemption turned off.
When running on a low powered CPU at a low CPU frequency, this has
been measured to keep preemption off for ~10 ms. That's not great on a
system with HZ=1000 which expects tasks to be able to schedule in
with ~1ms latency.
Suggested-by: Douglas Anderson <[email protected]>
Signed-off-by: Judy Hsiao <[email protected]>
---
net/core/neighbour.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index df81c1f0a570..f7a89c7a7673 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -256,6 +256,8 @@ static int neigh_forced_gc(struct neigh_table *tbl)
 	unsigned long tref = jiffies - 5 * HZ;
 	struct neighbour *n, *tmp;
 	int shrunk = 0;
+	bool finish = true;
+	unsigned long timeout = jiffies + msecs_to_jiffies(1); /* timeout in 1ms */
 
 	NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs);
 
@@ -278,10 +280,14 @@ static int neigh_forced_gc(struct neigh_table *tbl)
 				shrunk++;
 			if (shrunk >= max_clean)
 				break;
+			if (time_after(jiffies, timeout)) {
+				finish = false;
+				break;
+			}
 		}
 	}
 
-
-	WRITE_ONCE(tbl->last_flush, jiffies);
+	if (finish)
+		WRITE_ONCE(tbl->last_flush, jiffies);
 
 	write_unlock_bh(&tbl->lock);
--
2.43.0.rc2.451.g8631bc7472-goog
On Fri, Dec 1, 2023 at 9:39 AM Judy Hsiao <[email protected]> wrote:
>
> We are seeing cases where neigh_cleanup_and_release() is called by
> neigh_forced_gc() many times in a row with preemption turned off.
> When running on a low powered CPU at a low CPU frequency, this has
> been measured to keep preemption off for ~10 ms. That's not great on a
> system with HZ=1000 which expects tasks to be able to schedule in
> with ~1ms latency.
This will not work in general, because this code runs with BH blocked.
jiffies will stay untouched for many more ms on systems with only one CPU.
I would rather not rely on jiffies here but ktime_get_ns() [1]
Also if we break the loop based on time, we might be unable to purge
the last elements in gc_list.
We might need to use a second list to make sure to cycle over all
elements eventually.
[1]
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index df81c1f0a57047e176b7c7e4809d2dae59ba6be5..e2340e6b07735db8cf6e75d23ef09bb4b0db53b4 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -253,9 +253,11 @@ static int neigh_forced_gc(struct neigh_table *tbl)
 {
 	int max_clean = atomic_read(&tbl->gc_entries) -
 			READ_ONCE(tbl->gc_thresh2);
+	u64 tmax = ktime_get_ns() + NSEC_PER_MSEC;
 	unsigned long tref = jiffies - 5 * HZ;
 	struct neighbour *n, *tmp;
 	int shrunk = 0;
+	int loop = 0;
 
 	NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs);
 
@@ -279,10 +281,16 @@ static int neigh_forced_gc(struct neigh_table *tbl)
 			if (shrunk >= max_clean)
 				break;
 		}
+		if (++loop == 16) {
+			if (ktime_get_ns() > tmax)
+				goto unlock;
+			loop = 0;
+		}
 	}
 
 	WRITE_ONCE(tbl->last_flush, jiffies);
 
+unlock:
 	write_unlock_bh(&tbl->lock);
 
 	return shrunk;
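
(A rough, untested sketch of the "second list" idea, only to make that
suggestion concrete. The function name is invented for the sketch and
the refcount/nud_state/age checks are elided; it illustrates the list
handling, not a proposed patch.)

static int neigh_forced_gc_sketch(struct neigh_table *tbl)
{
	u64 tmax = ktime_get_ns() + NSEC_PER_MSEC;
	struct neighbour *n, *tmp;
	LIST_HEAD(kept);		/* examined but not freed this run */
	int shrunk = 0, loop = 0;

	write_lock_bh(&tbl->lock);
	list_for_each_entry_safe(n, tmp, &tbl->gc_list, gc_list) {
		bool removable = false;

		/* real code: refcount, nud_state and age checks plus
		 * neigh_remove_one(), elided in this sketch */
		if (removable)
			shrunk++;
		else
			list_move_tail(&n->gc_list, &kept);

		if (++loop == 16) {
			if (ktime_get_ns() > tmax)
				break;
			loop = 0;
		}
	}
	/* gc_list now holds only unexamined entries; requeue the kept
	 * ones behind them so the next run resumes with the tail this
	 * run never reached */
	list_splice_tail(&kept, &tbl->gc_list);
	write_unlock_bh(&tbl->lock);
	return shrunk;
}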
Hi,
On Fri, Dec 1, 2023 at 1:10 AM Eric Dumazet <[email protected]> wrote:
>
> On Fri, Dec 1, 2023 at 9:39 AM Judy Hsiao <[email protected]> wrote:
> >
> > We are seeing cases where neigh_cleanup_and_release() is called by
> > neigh_forced_gc() many times in a row with preemption turned off.
> > When running on a low powered CPU at a low CPU frequency, this has
> > been measured to keep preemption off for ~10 ms. That's not great on a
> > system with HZ=1000 which expects tasks to be able to schedule in
> > with ~1ms latency.
>
> This will not work in general, because this code runs with BH blocked.
>
> jiffies will stay untouched for many more ms on systems with only one CPU.
>
> I would rather not rely on jiffies here but ktime_get_ns() [1]
>
> Also if we break the loop based on time, we might be unable to purge
> the last elements in gc_list.
> We might need to use a second list to make sure to cycle over all
> elements eventually.
>
>
> [1]
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index df81c1f0a57047e176b7c7e4809d2dae59ba6be5..e2340e6b07735db8cf6e75d23ef09bb4b0db53b4
> 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -253,9 +253,11 @@ static int neigh_forced_gc(struct neigh_table *tbl)
> {
> int max_clean = atomic_read(&tbl->gc_entries) -
> READ_ONCE(tbl->gc_thresh2);
> + u64 tmax = ktime_get_ns() + NSEC_PER_MSEC;
It might be nice to make the above timeout based on jiffies. On a
HZ=100 system it's probably OK to keep preemption disabled for 10 ms
but on a HZ=1000 system you'd want 1 ms. ...so maybe you'd want to use
jiffies_to_nsecs(1)?
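Something like this, untested:

	u64 tmax = ktime_get_ns() + jiffies_to_nsecs(1); /* budget = one tick */

That keeps the budget proportional to the scheduler tick instead of
hard-coding 1 ms.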
One worry might be that we disabled preemption _right before_ we were
supposed to be scheduled out. In that case we'll end up blocking some
other task for another full timeslice, but maybe there's not a lot we
can do there?
-Doug
On Fri, Dec 1, 2023 at 4:16 PM Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Fri, Dec 1, 2023 at 1:10 AM Eric Dumazet <[email protected]> wrote:
> >
> > On Fri, Dec 1, 2023 at 9:39 AM Judy Hsiao <[email protected]> wrote:
> > >
> > > We are seeing cases where neigh_cleanup_and_release() is called by
> > > neigh_forced_gc() many times in a row with preemption turned off.
> > > When running on a low powered CPU at a low CPU frequency, this has
> > > been measured to keep preemption off for ~10 ms. That's not great on a
> > > system with HZ=1000 which expects tasks to be able to schedule in
> > > with ~1ms latency.
> >
> > This will not work in general, because this code runs with BH blocked.
> >
> > jiffies will stay untouched for many more ms on systems with only one CPU.
> >
> > I would rather not rely on jiffies here but ktime_get_ns() [1]
> >
> > Also if we break the loop based on time, we might be unable to purge
> > the last elements in gc_list.
> > We might need to use a second list to make sure to cycle over all
> > elements eventually.
> >
> >
> > [1]
> > diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> > index df81c1f0a57047e176b7c7e4809d2dae59ba6be5..e2340e6b07735db8cf6e75d23ef09bb4b0db53b4
> > 100644
> > --- a/net/core/neighbour.c
> > +++ b/net/core/neighbour.c
> > @@ -253,9 +253,11 @@ static int neigh_forced_gc(struct neigh_table *tbl)
> > {
> > int max_clean = atomic_read(&tbl->gc_entries) -
> > READ_ONCE(tbl->gc_thresh2);
> > + u64 tmax = ktime_get_ns() + NSEC_PER_MSEC;
>
> It might be nice to make the above timeout based on jiffies. On a
> HZ=100 system it's probably OK to keep preemption disabled for 10 ms
> but on a HZ=1000 system you'd want 1 ms. ...so maybe you'd want to use
> jiffies_to_nsecs(1)?
I do not think so. 10ms would be awfully long.
We have nsec based time service, why downgrading to jiffies resolution ???
>
> One worry might be that we disabled preemption _right before_ we were
> supposed to be scheduled out. In that case we'll end up blocking some
> other task for another full timeslice, but maybe there's not a lot we
> can do there?
Can you tell us in which scenario this gc_list can be so big, other
than fuzzers?
Hi,
On Fri, Dec 1, 2023 at 7:58 AM Eric Dumazet <[email protected]> wrote:
>
> On Fri, Dec 1, 2023 at 4:16 PM Doug Anderson <[email protected]> wrote:
> >
> > Hi,
> >
> > On Fri, Dec 1, 2023 at 1:10 AM Eric Dumazet <[email protected]> wrote:
> > >
> > > On Fri, Dec 1, 2023 at 9:39 AM Judy Hsiao <[email protected]> wrote:
> > > >
> > > > We are seeing cases where neigh_cleanup_and_release() is called by
> > > > neigh_forced_gc() many times in a row with preemption turned off.
> > > > When running on a low powered CPU at a low CPU frequency, this has
> > > > been measured to keep preemption off for ~10 ms. That's not great on a
> > > > system with HZ=1000 which expects tasks to be able to schedule in
> > > > with ~1ms latency.
> > >
> > > This will not work in general, because this code runs with BH blocked.
> > >
> > > jiffies will stay untouched for many more ms on systems with only one CPU.
> > >
> > > I would rather not rely on jiffies here but ktime_get_ns() [1]
> > >
> > > Also if we break the loop based on time, we might be unable to purge
> > > the last elements in gc_list.
> > > We might need to use a second list to make sure to cycle over all
> > > elements eventually.
> > >
> > >
> > > [1]
> > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> > > index df81c1f0a57047e176b7c7e4809d2dae59ba6be5..e2340e6b07735db8cf6e75d23ef09bb4b0db53b4
> > > 100644
> > > --- a/net/core/neighbour.c
> > > +++ b/net/core/neighbour.c
> > > @@ -253,9 +253,11 @@ static int neigh_forced_gc(struct neigh_table *tbl)
> > > {
> > > int max_clean = atomic_read(&tbl->gc_entries) -
> > > READ_ONCE(tbl->gc_thresh2);
> > > + u64 tmax = ktime_get_ns() + NSEC_PER_MSEC;
> >
> > It might be nice to make the above timeout based on jiffies. On a
> > HZ=100 system it's probably OK to keep preemption disabled for 10 ms
> > but on a HZ=1000 system you'd want 1 ms. ...so maybe you'd want to use
> > jiffies_to_nsecs(1)?
>
> I do not think so. 10ms would be awfully long.
>
> We have nsec based time service, why downgrading to jiffies resolution ???
Well, the whole issue is that we're disabling preemption, right?
Unless I'm mistaken, on a 1000 HZ system then a task's timeslice is
1ms and on a 100 HZ system then a task's timeslice is 10ms. When we
disable preemption then the problem is that we can keep going past the
end of our timeslice. This is bad for whatever task the system is
trying to schedule instead of us since it will be blocked waiting for
us to re-enable preemption.
So essentially the problem here is really tied to jiffies resolution,
right? Specifically, if jiffies is 100 Hz then it's actually
inefficient to timeout every 1 ms--I think it would be better to use
up our whole timeslice.
> > One worry might be that we disabled preemption _right before_ we were
> > supposed to be scheduled out. In that case we'll end up blocking some
> > other task for another full timeslice, but maybe there's not a lot we
> > can do there?
>
> Can you tell us in which scenario this gc_list can be so big, other
> than fuzzers?
The place we hit this wasn't actually with fuzzers but with normal
usage in our labs. The only case where it was a really big problem was
when neigh_forced_gc() was scheduled on a "little" CPU (in a
big.LITTLE system) and that little CPU happened to be running at the
lowest CPU frequency. Specifically Judy was testing on sc7180-trogdor
and the lowest CPU Frequency of the "little" CPUs was 300 MHz. Since
the littles are less powerful than the bigs, this is roughly the
equivalent processing power of a big core running at 120 MHz.
FWIW, we are apparently no longer seeing the bad latency after
<https://crrev.com/c/4914309>, which does this:
# Increase kernel neighbor table size.
echo 1024 > /proc/sys/net/ipv4/neigh/default/gc_thresh1
echo 4096 > /proc/sys/net/ipv4/neigh/default/gc_thresh2
echo 8192 > /proc/sys/net/ipv4/neigh/default/gc_thresh3
echo 1024 > /proc/sys/net/ipv6/neigh/default/gc_thresh1
echo 4096 > /proc/sys/net/ipv6/neigh/default/gc_thresh2
echo 8192 > /proc/sys/net/ipv6/neigh/default/gc_thresh3
However, I still believe that we should land something like Judy's
patch because, no matter what kernel tunings we have, the kernel
shouldn't be disabling preemption for so long.
I will also note that, while I don't know the code at all, someone on
our networking team commented this: High CPU usage / latency on
neigh_cleanup_and_release is expected naturally because of our
relatively-noisy lab environment: there are often hundreds if not
thousands of devices + virtual devices in a single L2 network.
-Doug
On Fri, Dec 1, 2023 at 6:16 PM Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Fri, Dec 1, 2023 at 7:58 AM Eric Dumazet <[email protected]> wrote:
> >
> > On Fri, Dec 1, 2023 at 4:16 PM Doug Anderson <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > On Fri, Dec 1, 2023 at 1:10 AM Eric Dumazet <[email protected]> wrote:
> > > >
> > > > On Fri, Dec 1, 2023 at 9:39 AM Judy Hsiao <[email protected]> wrote:
> > > > >
> > > > > We are seeing cases where neigh_cleanup_and_release() is called by
> > > > > neigh_forced_gc() many times in a row with preemption turned off.
> > > > > When running on a low powered CPU at a low CPU frequency, this has
> > > > > been measured to keep preemption off for ~10 ms. That's not great on a
> > > > > system with HZ=1000 which expects tasks to be able to schedule in
> > > > > with ~1ms latency.
> > > >
> > > > This will not work in general, because this code runs with BH blocked.
> > > >
> > > > jiffies will stay untouched for many more ms on systems with only one CPU.
> > > >
> > > > I would rather not rely on jiffies here but ktime_get_ns() [1]
> > > >
> > > > Also if we break the loop based on time, we might be unable to purge
> > > > the last elements in gc_list.
> > > > We might need to use a second list to make sure to cycle over all
> > > > elements eventually.
> > > >
> > > >
> > > > [1]
> > > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> > > > index df81c1f0a57047e176b7c7e4809d2dae59ba6be5..e2340e6b07735db8cf6e75d23ef09bb4b0db53b4
> > > > 100644
> > > > --- a/net/core/neighbour.c
> > > > +++ b/net/core/neighbour.c
> > > > @@ -253,9 +253,11 @@ static int neigh_forced_gc(struct neigh_table *tbl)
> > > > {
> > > > int max_clean = atomic_read(&tbl->gc_entries) -
> > > > READ_ONCE(tbl->gc_thresh2);
> > > > + u64 tmax = ktime_get_ns() + NSEC_PER_MSEC;
> > >
> > > It might be nice to make the above timeout based on jiffies. On a
> > > HZ=100 system it's probably OK to keep preemption disabled for 10 ms
> > > but on a HZ=1000 system you'd want 1 ms. ...so maybe you'd want to use
> > > jiffies_to_nsecs(1)?
> >
> > I do not think so. 10ms would be awfully long.
> >
> > We have nsec based time service, why downgrading to jiffies resolution ???
>
> Well, the whole issue is that we're disabling preemption, right?
> Unless I'm mistaken, on a 1000 HZ system then a task's timeslice is
> 1ms and on a 100 HZ system then a task's timeslice is 10ms. When we
> disable preemption then the problem is that we can keep going past the
> end of our timeslice. This is bad for whatever task the system is
> trying to schedule instead of us since it will be blocked waiting for
> us to re-enable preemption.
>
> So essentially the problem here is really tied to jiffies resolution,
> right? Specifically, if jiffies is 100 Hz then it's actually
> inefficient to timeout every 1 ms--I think it would be better to use
> up our whole timeslice.
It is not because a kernel is built with HZ=100 that each thread has
to consume CPU time in 10 ms slices.
The process scheduler does not use jiffies at all, but the high-resolution time service.
Keep in mind this code runs from a soft interrupt, not a dedicated process.
>
>
> > > One worry might be that we disabled preemption _right before_ we were
> > > supposed to be scheduled out. In that case we'll end up blocking some
> > > other task for another full timeslice, but maybe there's not a lot we
> > > can do there?
> >
> > Can you tell us in which scenario this gc_list can be so big, other
> > than fuzzers?
>
> The place we hit this wasn't actually with fuzzers but with normal
> usage in our labs. The only case where it was a really big problem was
> when neigh_forced_gc() was scheduled on a "little" CPU (in a
> big.LITTLE system) and that little CPU happened to be running at the
> lowest CPU frequency. Specifically Judy was testing on sc7180-trogdor
> and the lowest CPU Frequency of the "little" CPUs was 300 MHz. Since
> the littles are less powerful than the bigs, this is roughly the
> equivalent processing power of a big core running at 120 MHz.
>
> FWIW, we are apparently no longer seeing the bad latency after
> <https://crrev.com/c/4914309>, which does this:
>
> # Increase kernel neighbor table size.
> echo 1024 > /proc/sys/net/ipv4/neigh/default/gc_thresh1
> echo 4096 > /proc/sys/net/ipv4/neigh/default/gc_thresh2
> echo 8192 > /proc/sys/net/ipv4/neigh/default/gc_thresh3
> echo 1024 > /proc/sys/net/ipv6/neigh/default/gc_thresh1
> echo 4096 > /proc/sys/net/ipv6/neigh/default/gc_thresh2
> echo 8192 > /proc/sys/net/ipv6/neigh/default/gc_thresh3
>
> However, I still believe that we should land something like Judy's
> patch because, no matter what kernel tunings we have, the kernel
> shouldn't be disabling preemption for so long.
Sure, and I suggested a refinement, because as I said jiffies can
stick to a value.
Not sure why a refinement given by a network maintainer is not an option?
I must be missing something.
>
> I will also note that, while I don't know the code at all, someone on
> our networking team commented this: High CPU usage / latency on
> neigh_cleanup_and_release is expected naturally because of our
> relatively-noisy lab environment: there are often hundreds if not
> thousands of devices + virtual devices in a single L2 network.
Well, I know this code a bit, trust me.
Hello,
On Fri, 1 Dec 2023, Doug Anderson wrote:
> The place we hit this wasn't actually with fuzzers but with normal
> usage in our labs. The only case where it was a really big problem was
> when neigh_forced_gc() was scheduled on a "little" CPU (in a
> big.LITTLE system) and that little CPU happened to be running at the
> lowest CPU frequency. Specifically Judy was testing on sc7180-trogdor
> and the lowest CPU Frequency of the "little" CPUs was 300 MHz. Since
> the littles are less powerful than the bigs, this is roughly the
> equivalent processing power of a big core running at 120 MHz.
If we are talking about 32-bit systems with high HZ value I now
see a little problem in neigh_alloc() where we may start neigh_forced_gc()
later on gc_thresh3, not early on gc_thresh2 as expected. This can happen
after a long idle period when last_flush becomes invalid and 'now'
is 'time_before' last_flush:
-	     time_after(now, tbl->last_flush + 5 * HZ))) {
+	     !time_in_range_open(now, tbl->last_flush, tbl->last_flush + 5 * HZ))) {
With a big gap between gc_thresh2 and gc_thresh3 we
may work on a large gc_list if we react on gc_thresh3 instead of
gc_thresh2. But such storms cannot happen more than once per jiffies wrap.
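A rough illustration of the flip, assuming a 32-bit unsigned long and
HZ=1000 (2^31 ticks is ~24.8 days):

	unsigned long last_flush = 1000;	/* not updated for ages */
	unsigned long now = last_flush + (1UL << 31) + 5 * HZ + 1;

	time_after(now, last_flush + 5 * HZ);
		/* -> false: forced gc is skipped although 5*HZ passed long ago */
	!time_in_range_open(now, last_flush, last_flush + 5 * HZ);
		/* -> true: the range check still catches the stale last_flush */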
> FWIW, we are apparently no longer seeing the bad latency after
> <https://crrev.com/c/4914309>, which does this:
>
> # Increase kernel neighbor table size.
> echo 1024 > /proc/sys/net/ipv4/neigh/default/gc_thresh1
> echo 4096 > /proc/sys/net/ipv4/neigh/default/gc_thresh2
We expect to keep the entries not much above 4096 and
we should see small gc_list.
> echo 8192 > /proc/sys/net/ipv4/neigh/default/gc_thresh3
On invalid last_flush we will free 8192-4096 => 4096 entries
under BH lock.
Regards
--
Julian Anastasov <[email protected]>
Hi,
On Fri, Dec 1, 2023 at 9:35 AM Eric Dumazet <[email protected]> wrote:
>
> > > > > @@ -253,9 +253,11 @@ static int neigh_forced_gc(struct neigh_table *tbl)
> > > > > {
> > > > > int max_clean = atomic_read(&tbl->gc_entries) -
> > > > > READ_ONCE(tbl->gc_thresh2);
> > > > > + u64 tmax = ktime_get_ns() + NSEC_PER_MSEC;
> > > >
> > > > It might be nice to make the above timeout based on jiffies. On a
> > > > HZ=100 system it's probably OK to keep preemption disabled for 10 ms
> > > > but on a HZ=1000 system you'd want 1 ms. ...so maybe you'd want to use
> > > > jiffies_to_nsecs(1)?
> > >
> > > I do not think so. 10ms would be awfully long.
> > >
> > > We have nsec based time service, why downgrading to jiffies resolution ???
> >
> > Well, the whole issue is that we're disabling preemption, right?
> > Unless I'm mistaken, on a 1000 HZ system then a task's timeslice is
> > 1ms and on a 100 HZ system then a task's timeslice is 10ms. When we
> > disable preemption then the problem is that we can keep going past the
> > end of our timeslice. This is bad for whatever task the system is
> > trying to schedule instead of us since it will be blocked waiting for
> > us to re-enable preemption.
> >
> > So essentially the problem here is really tied to jiffies resolution,
> > right? Specifically, if jiffies is 100 Hz then it's actually
> > inefficient to timeout every 1 ms--I think it would be better to use
> > up our whole timeslice.
>
> It is not because a kernel is built with HZ=100 that each thread has
> to consume CPU time in 10 ms slices.
>
> The process scheduler does not use jiffies at all, but the high-resolution time service.
>
> Keep in mind this code runs from a soft interrupt, not a dedicated process.
Fair enough. I guess my mental model is wrong here. Please disregard
my suggestion about using something based on how long "jiffies" is
then. Using a fixed 1 ms timeout sounds great, then.
> > > Can you tell us in which scenario this gc_list can be so big, other
> > > than fuzzers?
> >
> > The place we hit this wasn't actually with fuzzers but with normal
> > usage in our labs. The only case where it was a really big problem was
> > when neigh_forced_gc() was scheduled on a "little" CPU (in a
> > big.LITTLE system) and that little CPU happened to be running at the
> > lowest CPU frequency. Specifically Judy was testing on sc7180-trogdor
> > and the lowest CPU Frequency of the "little" CPUs was 300 MHz. Since
> > the littles are less powerful than the bigs, this is roughly the
> > equivalent processing power of a big core running at 120 MHz.
> >
> > FWIW, we are apparently no longer seeing the bad latency after
> > <https://crrev.com/c/4914309>, which does this:
> >
> > # Increase kernel neighbor table size.
> > echo 1024 > /proc/sys/net/ipv4/neigh/default/gc_thresh1
> > echo 4096 > /proc/sys/net/ipv4/neigh/default/gc_thresh2
> > echo 8192 > /proc/sys/net/ipv4/neigh/default/gc_thresh3
> > echo 1024 > /proc/sys/net/ipv6/neigh/default/gc_thresh1
> > echo 4096 > /proc/sys/net/ipv6/neigh/default/gc_thresh2
> > echo 8192 > /proc/sys/net/ipv6/neigh/default/gc_thresh3
> >
> > However, I still believe that we should land something like Judy's
> > patch because, no matter what kernel tunings we have, the kernel
> > shouldn't be disabling preemption for so long.
>
> Sure, and I suggested a refinement, because as I said jiffies can
> stick to a value.
>
> Not sure why a refinement given by a network maintainer is not an option?
>
> I must be missing something.
Your refinement was good. I was merely trying to answer the question
you asked about how we got into it as completely as possible. Sorry if
I caused confusion.
Hi,
On Fri, Dec 1, 2023 at 1:10 AM Eric Dumazet <[email protected]> wrote:
>
> On Fri, Dec 1, 2023 at 9:39 AM Judy Hsiao <[email protected]> wrote:
> >
> > We are seeing cases where neigh_cleanup_and_release() is called by
> > neigh_forced_gc() many times in a row with preemption turned off.
> > When running on a low powered CPU at a low CPU frequency, this has
> > been measured to keep preemption off for ~10 ms. That's not great on a
> > system with HZ=1000 which expects tasks to be able to schedule in
> > with ~1ms latency.
>
> This will not work in general, because this code runs with BH blocked.
>
> jiffies will stay untouched for many more ms on systems with only one CPU.
>
> I would rather not rely on jiffies here but ktime_get_ns() [1]
>
> Also if we break the loop based on time, we might be unable to purge
> the last elements in gc_list.
> We might need to use a second list to make sure to cycle over all
> elements eventually.
>
>
> [1]
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index df81c1f0a57047e176b7c7e4809d2dae59ba6be5..e2340e6b07735db8cf6e75d23ef09bb4b0db53b4
> 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -253,9 +253,11 @@ static int neigh_forced_gc(struct neigh_table *tbl)
> {
> int max_clean = atomic_read(&tbl->gc_entries) -
> READ_ONCE(tbl->gc_thresh2);
> + u64 tmax = ktime_get_ns() + NSEC_PER_MSEC;
> unsigned long tref = jiffies - 5 * HZ;
> struct neighbour *n, *tmp;
> int shrunk = 0;
> + int loop = 0;
>
> NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs);
>
> @@ -279,10 +281,16 @@ static int neigh_forced_gc(struct neigh_table *tbl)
> if (shrunk >= max_clean)
> break;
> }
> + if (++loop == 16) {
> + if (ktime_get_ns() > tmax)
> + goto unlock;
> + loop = 0;
> + }
> }
>
> WRITE_ONCE(tbl->last_flush, jiffies);
>
> +unlock:
> write_unlock_bh(&tbl->lock);
I'm curious what the plan here is. Your patch looks OK to me and I
could give it a weak Reviewed-by, but I don't know the code well
enough to know if we also need to address your second comment that we
need to "use a second list to make sure to cycle over all elements
eventually". Is that something you'd expect to get resolved before
landing?
Thanks! :-)
-Doug
On 12/4/23 4:40 PM, Doug Anderson wrote:
> Hi,
>
> On Fri, Dec 1, 2023 at 1:10 AM Eric Dumazet <[email protected]> wrote:
>>
>> On Fri, Dec 1, 2023 at 9:39 AM Judy Hsiao <[email protected]> wrote:
>>>
>>> We are seeing cases where neigh_cleanup_and_release() is called by
>>> neigh_forced_gc() many times in a row with preemption turned off.
>>> When running on a low powered CPU at a low CPU frequency, this has
>>> been measured to keep preemption off for ~10 ms. That's not great on a
>>> system with HZ=1000 which expects tasks to be able to schedule in
>>> with ~1ms latency.
>>
>> This will not work in general, because this code runs with BH blocked.
>>
>> jiffies will stay untouched for many more ms on systems with only one CPU.
>>
>> I would rather not rely on jiffies here but ktime_get_ns() [1]
>>
>> Also if we break the loop based on time, we might be unable to purge
>> the last elements in gc_list.
>> We might need to use a second list to make sure to cycle over all
>> elements eventually.
>>
>>
>> [1]
>> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
>> index df81c1f0a57047e176b7c7e4809d2dae59ba6be5..e2340e6b07735db8cf6e75d23ef09bb4b0db53b4
>> 100644
>> --- a/net/core/neighbour.c
>> +++ b/net/core/neighbour.c
>> @@ -253,9 +253,11 @@ static int neigh_forced_gc(struct neigh_table *tbl)
>> {
>> int max_clean = atomic_read(&tbl->gc_entries) -
>> READ_ONCE(tbl->gc_thresh2);
>> + u64 tmax = ktime_get_ns() + NSEC_PER_MSEC;
>> unsigned long tref = jiffies - 5 * HZ;
>> struct neighbour *n, *tmp;
>> int shrunk = 0;
>> + int loop = 0;
>>
>> NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs);
>>
>> @@ -279,10 +281,16 @@ static int neigh_forced_gc(struct neigh_table *tbl)
>> if (shrunk >= max_clean)
>> break;
>> }
>> + if (++loop == 16) {
>> + if (ktime_get_ns() > tmax)
>> + goto unlock;
>> + loop = 0;
>> + }
>> }
>>
>> WRITE_ONCE(tbl->last_flush, jiffies);
>>
>> +unlock:
>> write_unlock_bh(&tbl->lock);
>
> I'm curious what the plan here is. Your patch looks OK to me and I
> could give it a weak Reviewed-by, but I don't know the code well
> enough to know if we also need to address your second comment that we
> need to "use a second list to make sure to cycle over all elements
> eventually". Is that something you'd expect to get resolved before
> landing?
>
> Thanks! :-)
entries are added to the gc_list at the tail, so it should be ok to take
a break. It will pick up at the head on the next trip through.
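(Roughly, paraphrasing net/core/neighbour.c:)

	/* candidates enter at the tail, e.g. from neigh_update_gc_list() */
	list_add_tail(&n->gc_list, &n->tbl->gc_list);

	/* ... and neigh_forced_gc() scans from the head, so entries left
	 * behind by an early bail-out are revisited first next time */
	list_for_each_entry_safe(n, tmp, &tbl->gc_list, gc_list) {
		/* examine and possibly remove */
	}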
On Tue, Dec 5, 2023 at 12:40 AM Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Fri, Dec 1, 2023 at 1:10 AM Eric Dumazet <[email protected]> wrote:
> >
> > On Fri, Dec 1, 2023 at 9:39 AM Judy Hsiao <[email protected]> wrote:
> > >
> > > We are seeing cases where neigh_cleanup_and_release() is called by
> > > neigh_forced_gc() many times in a row with preemption turned off.
> > > When running on a low powered CPU at a low CPU frequency, this has
> > > been measured to keep preemption off for ~10 ms. That's not great on a
> > > system with HZ=1000 which expects tasks to be able to schedule in
> > > with ~1ms latency.
> >
> > This will not work in general, because this code runs with BH blocked.
> >
> > jiffies will stay untouched for many more ms on systems with only one CPU.
> >
> > I would rather not rely on jiffies here but ktime_get_ns() [1]
> >
> > Also if we break the loop based on time, we might be unable to purge
> > the last elements in gc_list.
> > We might need to use a second list to make sure to cycle over all
> > elements eventually.
> >
> >
> > [1]
> > diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> > index df81c1f0a57047e176b7c7e4809d2dae59ba6be5..e2340e6b07735db8cf6e75d23ef09bb4b0db53b4
> > 100644
> > --- a/net/core/neighbour.c
> > +++ b/net/core/neighbour.c
> > @@ -253,9 +253,11 @@ static int neigh_forced_gc(struct neigh_table *tbl)
> > {
> > int max_clean = atomic_read(&tbl->gc_entries) -
> > READ_ONCE(tbl->gc_thresh2);
> > + u64 tmax = ktime_get_ns() + NSEC_PER_MSEC;
> > unsigned long tref = jiffies - 5 * HZ;
> > struct neighbour *n, *tmp;
> > int shrunk = 0;
> > + int loop = 0;
> >
> > NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs);
> >
> > @@ -279,10 +281,16 @@ static int neigh_forced_gc(struct neigh_table *tbl)
> > if (shrunk >= max_clean)
> > break;
> > }
> > + if (++loop == 16) {
> > + if (ktime_get_ns() > tmax)
> > + goto unlock;
> > + loop = 0;
> > + }
> > }
> >
> > WRITE_ONCE(tbl->last_flush, jiffies);
> >
> > +unlock:
> > write_unlock_bh(&tbl->lock);
>
> I'm curious what the plan here is. Your patch looks OK to me and I
> could give it a weak Reviewed-by, but I don't know the code well
> enough to know if we also need to address your second comment that we
> need to "use a second list to make sure to cycle over all elements
> eventually". Is that something you'd expect to get resolved before
> landing?
Please Judy send a V2 of the patch.
I gave feedback, my intention was not to author the patch.
This is standard procedure, I now realize this was Judy's first
contribution, sorry for not making this clear.
I will simply add a "Reviewed-by: ..." tag when I agree with the result.
On Tue, Dec 5, 2023 at 9:00 AM Eric Dumazet <[email protected]> wrote:
>
> Please Judy send a V2 of the patch.
>
> I gave feedback, my intention was not to author the patch.
>
> This is standard procedure, I now realize this was Judy's first
> contribution, sorry for not making this clear.
I meant: "first contribution" in a networking tree, let me clarify that.
>
> I will simply add a "Reviewed-by: ..." tag when I agree with the result.