When initializing a 'struct klp_object' in klp_init_object_loaded(), and
performing relocations in klp_resolve_symbols(), klp_find_object_symbol()
is invoked to look up the address of a symbol in an already-loaded module
(or vmlinux). This, in turn, calls kallsyms_on_each_symbol() or
module_kallsyms_on_each_symbol() to find the address of the symbol that is
being patched.
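For reference, the lookup works by passing a match callback to these
iterators, roughly as in the following simplified sketch (abridged from
kernel/livepatch/core.c; the real klp_find_arg and callback also handle
object names and symbol positions):

/*
 * Every symbol in the table is passed to this callback, so a lookup
 * walks the entire symbol table even when a match is found early
 * (duplicate symbols must be detected).
 */
struct klp_find_arg {
	const char *name;	/* symbol being looked up */
	unsigned long addr;	/* result */
	unsigned long count;	/* matches seen so far */
};

static int klp_find_callback(void *data, const char *name,
			     struct module *mod, unsigned long addr)
{
	struct klp_find_arg *args = data;

	if (strcmp(args->name, name))
		return 0;	/* no match; keep iterating */

	args->addr = addr;
	args->count++;
	return 0;		/* keep going to catch duplicate symbols */
}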
It turns out that symbol lookups often take up the most CPU time when
enabling and disabling a patch, and may hog the CPU and cause other tasks
on that CPU's runqueue to starve -- even in paths where interrupts are
enabled. For example, under certain workloads, enabling a KLP patch with
many objects or functions may cause ksoftirqd to be starved, and thus for
interrupts to be backlogged and delayed. This may end up causing TCP
retransmits on the host where the KLP patch is being applied, and in
general, may cause any softirqs serviced by ksoftirqd to be delayed while
the patch is being applied.
To ensure that kallsyms_on_each_symbol() does not end up hogging the CPU,
this patch adds a call to cond_resched() in kallsyms_on_each_symbol() and
module_kallsyms_on_each_symbol(), which are invoked when doing a symbol
lookup in vmlinux and in a module respectively. Without this patch, if a
livepatch is applied on a 36-core Intel host with heavy TCP traffic, a
~10x spike is observed in TCP retransmits while the patch is being applied.
Additionally, collecting sched events with perf indicates that ksoftirqd is
awakened ~1.3 seconds before it's eventually scheduled. With the patch, no
increase in TCP retransmit events is observed, and ksoftirqd is scheduled
shortly after it's awakened.
Signed-off-by: David Vernet <[email protected]>
---
kernel/kallsyms.c | 1 +
kernel/module.c | 2 ++
2 files changed, 3 insertions(+)
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 0ba87982d017..2a9afe484aec 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -223,6 +223,7 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
ret = fn(data, namebuf, NULL, kallsyms_sym_address(i));
if (ret != 0)
return ret;
+ cond_resched();
}
return 0;
}
diff --git a/kernel/module.c b/kernel/module.c
index 40ec9a030eec..c96160f7f3f5 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -4462,6 +4462,8 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *,
mod, kallsyms_symbol_value(sym));
if (ret != 0)
goto out;
+
+ cond_resched();
}
}
out:
--
2.30.2
Adding modules + BPF list and maintainers to this thread.
David Vernet <[email protected]> wrote on Wed [2021-Dec-29 13:56:47 -0800]:
> [...]
On Thu, Dec 30, 2021 at 09:46:50AM IST, David Vernet wrote:
> Adding modules + BPF list and maintainers to this thread.
>
Thanks for the Cc, but I'm dropping my patch for the next revision, so there
should be no conflicts with this one.
> [...]
--
Kartikeya
On Wed 2021-12-29 20:16:50, David Vernet wrote:
> Adding modules + BPF list and maintainers to this thread.
>
> David Vernet <[email protected]> wrote on Wed [2021-Dec-29 13:56:47 -0800]:
> > [...]
> > It turns out that symbol lookups often take up the most CPU time when
> > enabling and disabling a patch, and may hog the CPU and cause other tasks
> > on that CPU's runqueue to starve -- even in paths where interrupts are
> > enabled. For example, under certain workloads, enabling a KLP patch with
> > many objects or functions may cause ksoftirqd to be starved, and thus for
^^^^^^^^^^^^^^^^^^^^^^^^^
This suggests that a single kallsyms_on_each_symbol() is not a big
problem. cond_resched() might be called unnecessarily often there.
I wonder if it would be enough to add cond_resched() into the two
loops calling klp_find_object_symbol().
That said, kallsyms_on_each_symbol() is a slow path and there might
be many symbols. So, it might be the right place.
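Concretely, the alternative would look something like this sketch (loop
shape abbreviated, not the literal livepatch code):

/*
 * Alternative placement: yield once per patched symbol, in the
 * callers, rather than once per visited symbol in the iterators.
 */
for (i = 0; i < nr_entries; i++) {	/* e.g. in klp_resolve_symbols() */
	ret = klp_find_object_symbol(obj->name, sym_name, sympos, &addr);
	if (ret)
		return ret;
	cond_resched();			/* at most one check per lookup */
}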
I am just thinking loudly. I do not have a strong opinion. I am fine
with any cond_resched() location that solves the problem. Feel free
to use:
Acked-by: Petr Mladek <[email protected]>
Best Regards,
Petr
On Wed, 29 Dec 2021, David Vernet wrote:
> [...]
> Signed-off-by: David Vernet <[email protected]>
Acked-by: Miroslav Benes <[email protected]>
I had ideas similar to the ones Petr mentioned elsewhere, but I, also, have
no strong opinion about it. Especially when livepatch is the only user of
the said interface.
M
On Wed, Dec 29, 2021 at 1:57 PM David Vernet <[email protected]> wrote:
>
> [...]
> Signed-off-by: David Vernet <[email protected]>
Acked-by: Song Liu <[email protected]>
PS: Do we observe livepatch taking a longer time to load after this change?
(I believe a longer time shouldn't be a problem at all. Just curious.)
On Thu 2022-01-06 16:21:18, Song Liu wrote:
> On Wed, Dec 29, 2021 at 1:57 PM David Vernet <[email protected]> wrote:
> >
> > [...]
> > Signed-off-by: David Vernet <[email protected]>
>
> Acked-by: Song Liu <[email protected]>
>
> PS: Do we observe livepatch taking a longer time to load after this change?
> (I believe a longer time shouldn't be a problem at all. Just curious.)
It should depend on the load of the system and the number of patched
symbols. The module is typically loaded by a normal-priority process.

The commit message talks about a 1.3 second delay of ksoftirqd. In
principle, the change means that this 1.3 sec of a single CPU's time is
now interleaved with other tasks scheduled on the same CPU. I would
expect that it prolonged the load by just a couple of seconds in the
described use case.

Note that the change has an effect only with voluntary preemption. Well,
that is typically what is used on the servers where livepatching makes
sense.
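For context, roughly how cond_resched() differs across preemption models
(a simplified sketch; the real definitions in include/linux/sched.h and
kernel/sched/core.c are more involved):

#ifdef CONFIG_PREEMPTION
/*
 * Fully preemptible kernels can reschedule almost anywhere, so the
 * explicit yield point compiles away to nothing.
 */
static inline int cond_resched(void) { return 0; }
#else
/* Voluntary/none: this is an explicit scheduling point. */
int cond_resched(void)
{
	if (should_resched(0)) {	/* another task wants this CPU? */
		preempt_schedule_common();
		return 1;
	}
	return 0;
}
#endif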
Best Regards,
Petr
On Wed 2021-12-29 13:56:47, David Vernet wrote:
> [...]
OK, there was not any strong pushback.
I have committed the patch into livepatch.git, branch for-5.17/kallsyms.
Best Regards,
Petr
On 12/29/21 4:56 PM, David Vernet wrote:
> For example, under certain workloads, enabling a KLP patch with
> many objects or functions may cause ksoftirqd to be starved, and thus for
> interrupts to be backlogged and delayed.
Just curious, approximately how many objects/functions does it take to
hit this condition? While the livepatching kselftests are more about
API and kernel correctness, this sounds like an interesting test case
for some of the other (out of tree) test suites.
Thanks,
--
Joe
On Fri, Jan 7, 2022 at 6:13 AM Joe Lawrence <[email protected]> wrote:
>
> On 12/29/21 4:56 PM, David Vernet wrote:
> > For example, under certain workloads, enabling a KLP patch with
> > many objects or functions may cause ksoftirqd to be starved, and thus for
> > interrupts to be backlogged and delayed.
>
> Just curious, approximately how many objects/functions does it take to
> hit this condition? While the livepatching kselftests are more about
> API and kernel correctness, this sounds like an interesting test case
> for some of the other (out of tree) test suites.
Not many patched functions. We only do small fixes at the moment. In the
recent example, we hit the issue with ~10 patched functions. Another
version with 2 to 3 patched functions seems fine.
Yes, I think this is an important test case.
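For anyone building such a test case: a stress patch is essentially the
upstream livepatch sample with a longer function list. A sketch (modeled
on samples/livepatch/livepatch-sample.c; the old_name/new_func entries
are hypothetical placeholders, and in our case it also took heavy
softirq load to trigger the starvation):

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/livepatch.h>

/*
 * Stand-in replacements; real ones must match the signatures of the
 * functions being patched.
 */
static int new_func_1(void) { return 0; }
static int new_func_2(void) { return 0; }

static struct klp_func funcs[] = {
	{ .old_name = "patched_func_1", .new_func = new_func_1, },
	{ .old_name = "patched_func_2", .new_func = new_func_2, },
	/* ... ~10 entries hit the issue in our report ... */
	{ }
};

static struct klp_object objs[] = {
	{ .funcs = funcs, },	/* NULL name == vmlinux */
	{ }
};

static struct klp_patch patch = {
	.mod = THIS_MODULE,
	.objs = objs,
};

static int test_patch_init(void)
{
	return klp_enable_patch(&patch);
}

static void test_patch_exit(void) { }

module_init(test_patch_init);
module_exit(test_patch_exit);
MODULE_LICENSE("GPL");
MODULE_INFO(livepatch, "Y");

Each entry in funcs[] triggers its own kallsyms walk when the patch is
enabled, which is the multiplier behind the CPU time discussed above.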
Song
Apologies all for the delayed response -- I was still on holiday last week.
Petr Mladek <[email protected]> wrote on Mon [2022-Jan-03 17:04:31 +0100]:
> > > It turns out that symbol lookups often take up the most CPU time when
> > > enabling and disabling a patch, and may hog the CPU and cause other tasks
> > > on that CPU's runqueue to starve -- even in paths where interrupts are
> > > enabled. For example, under certain workloads, enabling a KLP patch with
> > > many objects or functions may cause ksoftirqd to be starved, and thus for
> ^^^^^^^^^^^^^^^^^^^^^^^^^
> This suggests that a single kallsyms_on_each_symbol() is not a big
> problem. cond_resched() might be called unnecessarily often there.
> I wonder if it would be enough to add cond_resched() into the two
> loops calling klp_find_object_symbol().
In the initial version of the patch I was intending to send out, I actually
had the cond_resched() in klp_find_object_symbol(). Having it there did
appear to fix the ksoftirqd starvation issue, but I elected to put it in
kallsyms_on_each_symbol() after Chris (cc'd) suggested it because
cond_resched() is so lightweight, and it didn't affect the runtime for
livepatching in my experiments.
> That said, kallsyms_on_each_symbol() is a slow path and there might
> be many symbols. So, it might be the right place.
Yes, my thinking was that because it didn't seem to affect throughput, and
because it could potentially cause the same issue if these iterators were
ever called from elsewhere, this was the correct place for it.
Petr Mladek <[email protected]> wrote on Fri [2022-Jan-07 09:17:52 +0100]:
> On Thu 2022-01-06 16:21:18, Song Liu wrote:
> > PS: Do we observe livepatch taking a longer time to load after this change?
> > (I believe a longer time shouldn't be a problem at all. Just curious.)
>
> It should depend on the load of the system and the number of patched
> symbols. The module is typically loaded by a normal-priority process.
>
> The commit message talks about a 1.3 second delay of ksoftirqd. In
> principle, the change means that this 1.3 sec of a single CPU's time is
> now interleaved with other tasks scheduled on the same CPU. I would
> expect that it prolonged the load by just a couple of seconds in the
> described use case.
Just to add a data point here for the record, I didn't observe any increase
in latency when applying a livepatch with this change. It took roughly ~2
+/- ~.5 seconds both with and without the change. Obviously that number
doesn't mean much without knowing the specifics of the patch and what
workloads were running on the host, but in general the added cond_resched()
invocations did not seem to materially affect the overhead of livepatching.
The only time I would expect it to prolong a livepatch is when there are a
lot of tasks that need the CPU, which is dependent on system load as Petr
said. I'd argue that in that case, the patch would be working as intended,
as hogging the CPU for 1 or more seconds seems like a recipe for problems.
On 1/7/22 11:46 AM, Song Liu wrote:
> On Fri, Jan 7, 2022 at 6:13 AM Joe Lawrence <[email protected]> wrote:
>>
>> On 12/29/21 4:56 PM, David Vernet wrote:
>>> For example, under certain workloads, enabling a KLP patch with
>>> many objects or functions may cause ksoftirqd to be starved, and thus for
>>> interrupts to be backlogged and delayed.
>>
>> Just curious, approximately how many objects/functions does it take to
>> hit this condition? While the livepatching kselftests are more about
>> API and kernel correctness, this sounds like an interesting test case
>> for some of the other (out of tree) test suites.
>
> Not many patched functions. We only do small fixes at the moment. In the
> recent example, we hit the issue with ~10 patched functions. Another
> version with 2 to 3 patched functions seems fine.
>
> Yes, I think this is an important test case.
>
Thanks, Song. If you can share any test setup details, I'll pass those
along to our internal QE group. And once merged, we'll be adding this
one to the list of backports for our distro.
--
Joe
On Mon, Jan 10, 2022 at 8:17 AM Joe Lawrence <[email protected]> wrote:
>
> On 1/7/22 11:46 AM, Song Liu wrote:
> > [...]
>
> Thanks, Song. If you can share any test setup details, I'll pass those
> along to our internal QE group. And once merged, we'll be adding this
> one to the list of backports for our distro.
We get this on shadow production traffic of our web service, which we
cannot recreate outside of our data centers.

I tried to use iperf to generate traffic (among a few hosts), but it
didn't reproduce the issue.

At the moment, I am not sure what the easiest way to repro this is.
Thanks,
Song