2020-03-26 16:34:06

by Jann Horn

[permalink] [raw]
Subject: [PATCH 0/2] Make printk_deferred() work properly before percpu setup is done

While I was doing some development work, I noticed that if you call
printk_deferred() before percpu setup has finished, stuff breaks, and
e.g. "dmesg -w" fails to print new messages.

This happens because writing to percpu memory before percpu
initialization is done causes the modified percpu memory to be
propagated from the boot CPU to all the secondary CPUs; and both the
printk code as well as the irq_work implementation use percpu memory.

I think that printk_deferred() ought to work even before percpu
initialization, since it is used by things like pr_warn_ratelimited()
and the unwinder infrastructure. I'm not entirely sure though whether
this is the best way to implement that, or whether it would be better to
let printk_deferred() do something different if it is called during
early boot.

Jann Horn (2):
irq_work: Reinitialize list heads for secondary CPUs
printk: Reinitialize klogd percpu state for secondary CPUs

kernel/irq_work.c | 22 ++++++++++++++++++++++
kernel/printk/printk.c | 23 +++++++++++++++++++++++
2 files changed, 45 insertions(+)


base-commit: 76ccd234269bd05debdbc12c96eafe62dd9a6180
--
2.25.1.696.g5e7596f4ac-goog


2020-03-26 16:34:36

by Jann Horn

[permalink] [raw]
Subject: [PATCH 1/2] irq_work: Reinitialize list heads for secondary CPUs

When printk_deferred() is used before percpu initialization, it will queue
up lazy IRQ work on the boot CPU; percpu initialization then copies the
work list head to all secondary CPUs. To ensure that the secondary CPUs
don't re-execute the boot CPU's work and whatever its ->next pointer leads
to, zero out the secondary CPUs' work list heads before bringing up SMP.

Signed-off-by: Jann Horn <[email protected]>
---
kernel/irq_work.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 828cc30774bc..903e5be9aebf 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -152,6 +152,7 @@ static void irq_work_run_list(struct llist_head *list)
* while we are in the middle of the func.
*/
flags = atomic_fetch_andnot(IRQ_WORK_PENDING, &work->flags);
+ WARN_ON_ONCE((flags & IRQ_WORK_PENDING) == 0);

work->func(work);
/*
@@ -195,3 +196,24 @@ void irq_work_sync(struct irq_work *work)
cpu_relax();
}
EXPORT_SYMBOL_GPL(irq_work_sync);
+
+/*
+ * If we queued up IRQ work before fully initializing the percpu subsystem, e.g.
+ * via printk_deferred(), the head pointer of the boot CPU will have been copied
+ * over to all the other CPUs.
+ * To fix that, manually initialize the list heads of all secondary processors
+ * before bringing up SMP.
+ */
+static int __init irq_work_percpu_fixup(void)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ if (cpu == smp_processor_id())
+ continue;
+ per_cpu(raised_list.first, cpu) = NULL;
+ per_cpu(lazy_list.first, cpu) = NULL;
+ }
+ return 0;
+}
+early_initcall(irq_work_percpu_fixup)
--
2.25.1.696.g5e7596f4ac-goog

2020-03-26 16:34:55

by Jann Horn

[permalink] [raw]
Subject: [PATCH 2/2] printk: Reinitialize klogd percpu state for secondary CPUs

When printk_deferred() is used before percpu initialization, it will queue
up wake_up_klogd_work and set printk_pending on the boot CPU; percpu
initialization then copies the the pending work's state and the
printk_pending flag to all secondary CPUs. The end result is that e.g.
if you run "dmesg -w" when the system is up, it won't notice new printk
messages.
To ensure that the secondary CPUs don't think that they already have
pending printk work, reset the secondary CPUs' work items and
printk_pending state.

Signed-off-by: Jann Horn <[email protected]>
---
kernel/printk/printk.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index fada22dc4ab6..e531407188f1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2964,6 +2964,29 @@ static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
.flags = ATOMIC_INIT(IRQ_WORK_LAZY),
};

+/*
+ * If we called defer_console_output() before fully initializing the percpu
+ * subsystem, e.g. via printk_deferred(), the irq_work flags and the
+ * printk_pending flag may have been copied from the boot CPU to the others
+ * while work was pending.
+ * To fix that, manually reset the percpu data for all secondary processors
+ * before bringing up SMP.
+ */
+static int __init klogd_percpu_fixup(void)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ if (cpu == smp_processor_id())
+ continue;
+ atomic_set(&per_cpu(wake_up_klogd_work.flags, cpu),
+ IRQ_WORK_LAZY);
+ per_cpu(printk_pending, cpu) = 0;
+ }
+ return 0;
+}
+early_initcall(klogd_percpu_fixup)
+
void wake_up_klogd(void)
{
preempt_disable();
--
2.25.1.696.g5e7596f4ac-goog

2020-03-27 02:45:43

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH 0/2] Make printk_deferred() work properly before percpu setup is done

On (20/03/26 17:32), Jann Horn wrote:
> While I was doing some development work, I noticed that if you call
> printk_deferred() before percpu setup has finished, stuff breaks, and
> e.g. "dmesg -w" fails to print new messages.
>
> This happens because writing to percpu memory before percpu
> initialization is done causes the modified percpu memory to be
> propagated from the boot CPU to all the secondary CPUs; and both the
> printk code as well as the irq_work implementation use percpu memory.
>
> I think that printk_deferred() ought to work even before percpu
> initialization, since it is used by things like pr_warn_ratelimited()
> and the unwinder infrastructure. I'm not entirely sure though whether
> this is the best way to implement that, or whether it would be better to
> let printk_deferred() do something different if it is called during
> early boot.

Hi Jann,

I believe we have a patch for this issue

https://lore.kernel.org/lkml/[email protected]/

-ss

2020-03-27 03:07:27

by Jann Horn

[permalink] [raw]
Subject: Re: [PATCH 0/2] Make printk_deferred() work properly before percpu setup is done

On Fri, Mar 27, 2020 at 3:45 AM Sergey Senozhatsky
<[email protected]> wrote:
> On (20/03/26 17:32), Jann Horn wrote:
> > While I was doing some development work, I noticed that if you call
> > printk_deferred() before percpu setup has finished, stuff breaks, and
> > e.g. "dmesg -w" fails to print new messages.
> >
> > This happens because writing to percpu memory before percpu
> > initialization is done causes the modified percpu memory to be
> > propagated from the boot CPU to all the secondary CPUs; and both the
> > printk code as well as the irq_work implementation use percpu memory.
> >
> > I think that printk_deferred() ought to work even before percpu
> > initialization, since it is used by things like pr_warn_ratelimited()
> > and the unwinder infrastructure. I'm not entirely sure though whether
> > this is the best way to implement that, or whether it would be better to
> > let printk_deferred() do something different if it is called during
> > early boot.
>
> Hi Jann,
>
> I believe we have a patch for this issue
>
> https://lore.kernel.org/lkml/[email protected]/

Ah, thanks, I didn't think of searching the list archives for an
existing pending patch...

2020-03-27 03:11:52

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH 0/2] Make printk_deferred() work properly before percpu setup is done

On (20/03/27 04:05), Jann Horn wrote:
> > Hi Jann,
> >
> > I believe we have a patch for this issue
> >
> > https://lore.kernel.org/lkml/[email protected]/
>
> Ah, thanks, I didn't think of searching the list archives for an
> existing pending patch...

No worries! Is there any chance you can check if that patch addresses
all of the issues you've been running into?

-ss