2020-02-27 11:10:19

by Lech Perczak

Subject: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

Hello,

After upgrading the kernel on our boards from v4.19.105 to v4.19.106, we found that syslog fails to read any messages beyond the ones read initially after opening /proc/kmsg just after booting.
I also found that the output of 'dmesg --follow' doesn't react to new printks appearing either - to read new messages, reopening /proc/kmsg or /dev/kmsg was needed.
I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a lockdep splat with printk()"), and reverting it on top of v4.19.106 restored correct behaviour.

My test scenario for bisecting was:
1. run 'dmesg --follow' as root
2. run 'echo t > /proc/sysrq-trigger'
3. If the trace appears in the dmesg output -> good; otherwise -> bad. If the trace doesn't appear in the output of 'dmesg --follow', re-running it will show the trace.

I ran my tests on Debian 10.3 with a configuration based directly on the one from 4.19.0-8-amd64 (4.19.98-1), in QEMU.
I could reproduce the same issue on several boards with x86 and ARMv7 CPUs alike, with 100% reproducibility.

I haven't yet dug into why exactly this commit breaks notifications for readers of /proc/kmsg and /dev/kmsg, but as reverting it fixed the issue, I'm pretty sure this is the one. It is possible that the same thing happens in the 5.4 line, but I haven't had a chance to test that yet.

For reference, here is my bisect log:
git bisect start
# bad: [f25804f389846835535db255e7ba80eeed967ed7] Linux 4.19.106
git bisect bad f25804f389846835535db255e7ba80eeed967ed7
# good: [4fccc2503536a564a4ba31a1d50439854201659f] Linux 4.19.105
git bisect good 4fccc2503536a564a4ba31a1d50439854201659f
# bad: [aec48d8d0e6e291f61d0a0741bef2f8cc0712584] clk: uniphier: Add SCSSI clock gate for each channel
git bisect bad aec48d8d0e6e291f61d0a0741bef2f8cc0712584
# bad: [5fe69d2ba08f516ace6496d40ac47fe33d3d4ec7] KVM: s390: ENOTSUPP -> EOPNOTSUPP fixups
git bisect bad 5fe69d2ba08f516ace6496d40ac47fe33d3d4ec7
# bad: [15341b1dd409749fa5625e4b632013b6ba81609b] char/random: silence a lockdep splat with printk()
git bisect bad 15341b1dd409749fa5625e4b632013b6ba81609b
# good: [b9dc4d61b5c2d8ea289087f57898426017431391] cpu/hotplug, stop_machine: Fix stop_machine vs hotplug order
git bisect good b9dc4d61b5c2d8ea289087f57898426017431391
# good: [e5c8d49b9be04080f5aab67ea36ac65a30758442] pinctrl: sh-pfc: sh7264: Fix CAN function GPIOs
git bisect good e5c8d49b9be04080f5aab67ea36ac65a30758442
# good: [67f7f0c7e9f4a6c5215e78728261b1d4ab57dcc6] powerpc/powernv/iov: Ensure the pdn for VFs always contains a valid PE number
git bisect good 67f7f0c7e9f4a6c5215e78728261b1d4ab57dcc6
# good: [4802b257d352e48a0b10886b2d1a12b366a17c48] iommu/vt-d: Fix off-by-one in PASID allocation
git bisect good 4802b257d352e48a0b10886b2d1a12b366a17c48
# first bad commit: [15341b1dd409749fa5625e4b632013b6ba81609b] char/random: silence a lockdep splat with printk()

--
Pozdrawiam/With kind regards,
Lech Perczak

Sr. Software Engineer
Camlin Technologies Poland Limited Sp. z o.o.


2020-02-27 12:38:18

by Greg Kroah-Hartman

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On Thu, Feb 27, 2020 at 11:09:49AM +0000, Lech Perczak wrote:
> Hello,
>
> After upgrading kernel on our boards from v4.19.105 to v4.19.106 we found out that syslog fails to read the messages after ones read initially after opening /proc/kmsg just after booting.
> I also found out, that output of 'dmesg --follow' also doesn't react on new printks appearing for whatever reason - to read new messages, reopening /proc/kmsg or /dev/kmsg was needed.
> I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a lockdep splat with printk()"), and reverting it on top of v4.19.106 restored correct behaviour.

That is really really odd.

> My test scenario for bisecting was:
> 1. run 'dmesg --follow' as root
> 2. run 'echo t > /proc/sysrq-trigger'
> 3. If trace appears in dmesg output -> good, otherwise, bad. If trace doesn't appear in output of 'dmesg --follow', re-running it will show the trace.
>
> I ran my tests on Debian 10.3 with configuration based directly on one from 4.19.0-8-amd64 (4.19.98-1) in Qemu.
> I could reproduce the same issue on several boards with x86 and ARMv7 CPUs alike, with 100% reproducibility.
>
> I haven't yet digged into why exactly this commit breaks notifications for readers of /proc/kmsg and /dev/kmsg, but as reverting it fixed the issue, I'm pretty sure this is the one. It is possible that the same happened in 5.4 line, bu I hadn't had a chance to test this as well yet.

I can revert this, but it feels like there is something else going wrong
here. Can you try the 5.4 tree to see if that too has your same
problem?

thanks,

greg k-h

2020-02-27 12:40:54

by Lech Perczak

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On 27.02.2020 at 13:36, Greg Kroah-Hartman wrote:
> On Thu, Feb 27, 2020 at 11:09:49AM +0000, Lech Perczak wrote:
>> Hello,
>>
>> After upgrading kernel on our boards from v4.19.105 to v4.19.106 we found out that syslog fails to read the messages after ones read initially after opening /proc/kmsg just after booting.
>> I also found out, that output of 'dmesg --follow' also doesn't react on new printks appearing for whatever reason - to read new messages, reopening /proc/kmsg or /dev/kmsg was needed.
>> I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a lockdep splat with printk()"), and reverting it on top of v4.19.106 restored correct behaviour.
> That is really really odd.
Very odd it is indeed.
>
>> My test scenario for bisecting was:
>> 1. run 'dmesg --follow' as root
>> 2. run 'echo t > /proc/sysrq-trigger'
>> 3. If trace appears in dmesg output -> good, otherwise, bad. If trace doesn't appear in output of 'dmesg --follow', re-running it will show the trace.
>>
>> I ran my tests on Debian 10.3 with configuration based directly on one from 4.19.0-8-amd64 (4.19.98-1) in Qemu.
>> I could reproduce the same issue on several boards with x86 and ARMv7 CPUs alike, with 100% reproducibility.
>>
>> I haven't yet digged into why exactly this commit breaks notifications for readers of /proc/kmsg and /dev/kmsg, but as reverting it fixed the issue, I'm pretty sure this is the one. It is possible that the same happened in 5.4 line, bu I hadn't had a chance to test this as well yet.
> I can revert this, but it feels like there is something else going wrong
> here. Can you try the 5.4 tree to see if that too has your same
> problem?
Yes, I'll check it in a short while.
>
> thanks,
>
> greg k-h

--
Pozdrawiam/With kind regards,
Lech Perczak

Sr. Software Engineer
Camlin Technologies Poland Limited Sp. z o.o.

2020-02-27 14:28:12

by Lech Perczak

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On 27.02.2020 at 13:39, Lech Perczak wrote:
> On 27.02.2020 at 13:36, Greg Kroah-Hartman wrote:
>> On Thu, Feb 27, 2020 at 11:09:49AM +0000, Lech Perczak wrote:
>>> Hello,
>>>
>>> After upgrading kernel on our boards from v4.19.105 to v4.19.106 we found out that syslog fails to read the messages after ones read initially after opening /proc/kmsg just after booting.
>>> I also found out, that output of 'dmesg --follow' also doesn't react on new printks appearing for whatever reason - to read new messages, reopening /proc/kmsg or /dev/kmsg was needed.
>>> I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a lockdep splat with printk()"), and reverting it on top of v4.19.106 restored correct behaviour.
>> That is really really odd.
> Very odd it is indeed.
>>> My test scenario for bisecting was:
>>> 1. run 'dmesg --follow' as root
>>> 2. run 'echo t > /proc/sysrq-trigger'
>>> 3. If trace appears in dmesg output -> good, otherwise, bad. If trace doesn't appear in output of 'dmesg --follow', re-running it will show the trace.
>>>
>>> I ran my tests on Debian 10.3 with configuration based directly on one from 4.19.0-8-amd64 (4.19.98-1) in Qemu.
>>> I could reproduce the same issue on several boards with x86 and ARMv7 CPUs alike, with 100% reproducibility.
>>>
>>> I haven't yet digged into why exactly this commit breaks notifications for readers of /proc/kmsg and /dev/kmsg, but as reverting it fixed the issue, I'm pretty sure this is the one. It is possible that the same happened in 5.4 line, bu I hadn't had a chance to test this as well yet.
>> I can revert this, but it feels like there is something else going wrong
>> here. Can you try the 5.4 tree to see if that too has your same
>> problem?
> Yes, I'll check it in a short while.
I checked v5.4.22 just now and didn't observe the issue. Maybe this commit wasn't destined for 4.19.y due to intricacies of locking inside printk() :-)

From my side, there is no need to rush with the revert, as I can apply it locally and drop it with the next rebase to upstream, which we do very often.
OTOH, the issue is likely to affect a lot of users, especially ones using distros that track this branch, like the Debian 10 mentioned earlier,
once they pick up the change, as the kernel log content recorded by syslog will be affected and the 'dmesg --follow' behaviour will be quite surprising.
>> thanks,
>>
>> greg k-h
Also CC'ed Sergey, maybe he will have a clue about this.

--
Pozdrawiam/With kind regards,
Lech Perczak

Sr. Software Engineer
Camlin Technologies Poland Limited Sp. z o.o.

2020-02-28 03:13:39

by Sergey Senozhatsky

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

Cc-ing Petr, Steven, John

https://lore.kernel.org/lkml/[email protected]

On (20/02/27 14:08), Lech Perczak wrote:
> On 27.02.2020 at 13:39, Lech Perczak wrote:
> > On 27.02.2020 at 13:36, Greg Kroah-Hartman wrote:
> >> On Thu, Feb 27, 2020 at 11:09:49AM +0000, Lech Perczak wrote:
> >>> Hello,
> >>>
> >>> After upgrading kernel on our boards from v4.19.105 to v4.19.106 we found out that syslog fails to read the messages after ones read initially after opening /proc/kmsg just after booting.
> >>> I also found out, that output of 'dmesg --follow' also doesn't react on new printks appearing for whatever reason - to read new messages, reopening /proc/kmsg or /dev/kmsg was needed.
> >>> I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a lockdep splat with printk()"), and reverting it on top of v4.19.106 restored correct behaviour.
> >> That is really really odd.
> > Very odd it is indeed.
> >>> My test scenario for bisecting was:
> >>> 1. run 'dmesg --follow' as root
> >>> 2. run 'echo t > /proc/sysrq-trigger'
> >>> 3. If trace appears in dmesg output -> good, otherwise, bad. If trace doesn't appear in output of 'dmesg --follow', re-running it will show the trace.
> >>>
> >>> I ran my tests on Debian 10.3 with configuration based directly on one from 4.19.0-8-amd64 (4.19.98-1) in Qemu.
> >>> I could reproduce the same issue on several boards with x86 and ARMv7 CPUs alike, with 100% reproducibility.
> >>>
> >>> I haven't yet digged into why exactly this commit breaks notifications for readers of /proc/kmsg and /dev/kmsg, but as reverting it fixed the issue, I'm pretty sure this is the one. It is possible that the same happened in 5.4 line, bu I hadn't had a chance to test this as well yet.
> >> I can revert this, but it feels like there is something else going wrong
> >> here. Can you try the 5.4 tree to see if that too has your same
> >> problem?
> > Yes, I'll check it in a short while.
> I checked v5.4.22 just now and didn't observe the issue. Maybe this commit wasn't destined for 4.19.y due to intricacies of locking inside printk() :-)
>
> From my side, there is no need to rush with the revert, as I can do this locally and drop the revert with next rebase-to-upstream, which we do very often.
> OTOH, the issue is likely to affect a lot of users, especially ones using distros tracking this branch like Debian 10 mentioned earlier,
> once they pick it up the change, as the kernel log content recorded by syslog will be affected, and 'dmesg --follow' behaviour will be quite surprising.

This is very-very odd... Hmm.
Just out of curiosity, what happens if you comment out that
printk() entirely?

printk_deferred() should not affect the PRINTK_PENDING_WAKEUP path.

Either we never queue the wakeup irq_work(), e.g. because waitqueue_active()
never lets us do so, or because `(curr_log_seq != log_next_seq)' is
always zero:

void wake_up_klogd(void)
{
	preempt_disable();
	if (waitqueue_active(&log_wait)) {
		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
	}
	preempt_enable();
}

int vprintk_emit()
{
	...
	pending_output = (curr_log_seq != log_next_seq);
	...
	if (pending_output)
		wake_up_klogd()
}

Or we do the wakeup, but then either the `syslog_seq != log_next_seq' or
the `user->seq != log_next_seq' check fails.
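
For reference, the readers that would miss this wakeup block on exactly
those conditions; roughly (paraphrased from the 4.19 sources, so details
may differ slightly):

/* /dev/kmsg reader, devkmsg_read() */
ret = wait_event_interruptible(log_wait, user->seq != log_next_seq);

/* /proc/kmsg and syslog() reader, do_syslog(SYSLOG_ACTION_READ) */
error = wait_event_interruptible(log_wait, syslog_seq != log_next_seq);

So if the irq_work never runs, nothing ever wakes up log_wait and both
kinds of readers keep sleeping until they are reopened.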


Lech, any chance you can trace what's happening in the system?

-ss

2020-02-28 09:12:49

by John Ogness

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On 2020-02-28, Sergey Senozhatsky <[email protected]> wrote:
> Cc-ing Petr, Steven, John

Thanks.

> https://lore.kernel.org/lkml/[email protected]
>
> On (20/02/27 14:08), Lech Perczak wrote:
>> >>> My test scenario for bisecting was:
>> >>> 1. run 'dmesg --follow' as root
>> >>> 2. run 'echo t > /proc/sysrq-trigger'
>> >>> 3. If trace appears in dmesg output -> good, otherwise, bad. If trace doesn't appear in output of 'dmesg --follow', re-running it will show the trace.
>> >>>
>> >>> I ran my tests on Debian 10.3 with configuration based directly on one from 4.19.0-8-amd64 (4.19.98-1) in Qemu.
>> >>> I could reproduce the same issue on several boards with x86 and ARMv7 CPUs alike, with 100% reproducibility.
>
> This is very-very odd... Hmm.
> Just out of curiosity, what happens if you comment out that
> printk() entirely?
>
> printk_deferred() should not affect the PRINTK_PENDING_WAKEUP path.

It is the printk_deferred() causing the issue. This is relatively early,
so perhaps something is not yet properly initialized.

> Either we never queue wakeup irq_work(), e.g. because
> waitqueue_active() never lets us to do so or because `(curr_log_seq !=
> log_next_seq)' is always zero

wake_up_klogd() is called and the waitqueue (@log_wait) is
active. irq_work_queue() is called, but the work function,
wake_up_klogd_work_func(), is never called.

Perhaps @wake_up_klogd_work gets broken somehow. I'm looking into it.
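
For reference, the per-CPU work in question looks roughly like this in
4.19 (quoted from memory, so it may differ in detail); the function that
never runs is wake_up_klogd_work_func():

static void wake_up_klogd_work_func(struct irq_work *irq_work)
{
	int pending = __this_cpu_xchg(printk_pending, 0);

	if (pending & PRINTK_PENDING_OUTPUT) {
		/* If trylock fails, someone else is doing the printing */
		if (console_trylock())
			console_unlock();
	}

	if (pending & PRINTK_PENDING_WAKEUP)
		wake_up_interruptible(&log_wait);
}

static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
	.func = wake_up_klogd_work_func,
	.flags = IRQ_WORK_LAZY,
};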

John Ogness

2020-02-28 10:04:54

by Petr Mladek

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On Fri 2020-02-28 12:13:06, Sergey Senozhatsky wrote:
> Cc-ing Petr, Steven, John
>
> https://lore.kernel.org/lkml/[email protected]
>
> On (20/02/27 14:08), Lech Perczak wrote:
> > On 27.02.2020 at 13:39, Lech Perczak wrote:
> > > On 27.02.2020 at 13:36, Greg Kroah-Hartman wrote:
> > >> On Thu, Feb 27, 2020 at 11:09:49AM +0000, Lech Perczak wrote:
> > >>> Hello,
> > >>>
> > >>> After upgrading kernel on our boards from v4.19.105 to v4.19.106 we found out that syslog fails to read the messages after ones read initially after opening /proc/kmsg just after booting.
> > >>> I also found out, that output of 'dmesg --follow' also doesn't react on new printks appearing for whatever reason - to read new messages, reopening /proc/kmsg or /dev/kmsg was needed.
> > >>> I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a lockdep splat with printk()"), and reverting it on top of v4.19.106 restored correct behaviour.
> > >> That is really really odd.
> > > Very odd it is indeed.
> > >>> My test scenario for bisecting was:
> > >>> 1. run 'dmesg --follow' as root
> > >>> 2. run 'echo t > /proc/sysrq-trigger'
> > >>> 3. If trace appears in dmesg output -> good, otherwise, bad. If trace doesn't appear in output of 'dmesg --follow', re-running it will show the trace.
> > >>>

I have reproduced the problem with a kernel based on v4.19.106
and I see the following in the log:

[ 0.028250] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.028263] random: get_random_bytes called from start_kernel+0x9e/0x4f6 with crng_init=0
[ 0.028268] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:4 nr_cpu_ids:4 nr_node_ids:1
[ 0.028407] percpu: Embedded 44 pages/cpu s142216 r8192 d29816 u524288
[ 0.028411] pcpu-alloc: s142216 r8192 d29816 u524288 alloc=1*2097152
[ 0.028412] pcpu-alloc: [0] 0 1 2 3

Note that the percpu stuff is initialized after the printk_deferred() call. And the
deferred console output is scheduled by:

void defer_console_output(void)
{
	preempt_disable();
	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
	preempt_enable();
}

I am afraid that the patch creates some mess via the non-initialized
per-cpu variable.

I see that x86 has some support for EARLY_PER_CPU stuff but it seems
to be arch-specific.

I do not see a reliable way to detect when per-cpu variables are
initialized. Adding Tejun and PeterZ into CC if they have any
idea.

I suggest reverting the patch until we have an easy and safe solution.

Best Regards,
Petr

2020-02-28 10:58:58

by Greg Kroah-Hartman

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On Fri, Feb 28, 2020 at 11:04:16AM +0100, Petr Mladek wrote:
> On Fri 2020-02-28 12:13:06, Sergey Senozhatsky wrote:
> > Cc-ing Petr, Steven, John
> >
> > https://lore.kernel.org/lkml/[email protected]
> >
> > On (20/02/27 14:08), Lech Perczak wrote:
> > > On 27.02.2020 at 13:39, Lech Perczak wrote:
> > > > On 27.02.2020 at 13:36, Greg Kroah-Hartman wrote:
> > > >> On Thu, Feb 27, 2020 at 11:09:49AM +0000, Lech Perczak wrote:
> > > >>> Hello,
> > > >>>
> > > >>> After upgrading kernel on our boards from v4.19.105 to v4.19.106 we found out that syslog fails to read the messages after ones read initially after opening /proc/kmsg just after booting.
> > > >>> I also found out, that output of 'dmesg --follow' also doesn't react on new printks appearing for whatever reason - to read new messages, reopening /proc/kmsg or /dev/kmsg was needed.
> > > >>> I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a lockdep splat with printk()"), and reverting it on top of v4.19.106 restored correct behaviour.
> > > >> That is really really odd.
> > > > Very odd it is indeed.
> > > >>> My test scenario for bisecting was:
> > > >>> 1. run 'dmesg --follow' as root
> > > >>> 2. run 'echo t > /proc/sysrq-trigger'
> > > >>> 3. If trace appears in dmesg output -> good, otherwise, bad. If trace doesn't appear in output of 'dmesg --follow', re-running it will show the trace.
> > > >>>
>
> I have reproduced the problem with a kernel based on v4.19.106
> and I see the following in the log:
>
> [ 0.028250] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
> [ 0.028263] random: get_random_bytes called from start_kernel+0x9e/0x4f6 with crng_init=0
> [ 0.028268] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:4 nr_cpu_ids:4 nr_node_ids:1
> [ 0.028407] percpu: Embedded 44 pages/cpu s142216 r8192 d29816 u524288
> [ 0.028411] pcpu-alloc: s142216 r8192 d29816 u524288 alloc=1*2097152
> [ 0.028412] pcpu-alloc: [0] 0 1 2 3
>
> Note that percpu stuff is initialized after printk_deferred(). And the
> deferred console is scheduled by:
>
> void defer_console_output(void)
> {
> preempt_disable();
> __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> preempt_enable();
> }
>
> I am afraid that the patch creates some mess via the non-initialized
> per-cpu variable.
>
> I see that x86 has some support for EARLY_PER_CPU stuff but it seems
> to be arch-specific.
>
> I do not see a reliable way to detect when per-cpu variables are
> initialized. Adding Tejun and PeterZ into CC if they have any
> idea.
>
> I suggest to revert the patch until we have some easy and safe solution.

Ok, I'll do so, but why is this not an issue in 5.4.y and newer kernels?

thanks,

greg k-h

2020-02-28 11:32:54

by Petr Mladek

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On Fri 2020-02-28 11:58:36, Greg Kroah-Hartman wrote:
> On Fri, Feb 28, 2020 at 11:04:16AM +0100, Petr Mladek wrote:
> > On Fri 2020-02-28 12:13:06, Sergey Senozhatsky wrote:
> > > Cc-ing Petr, Steven, John
> > >
> > > https://lore.kernel.org/lkml/[email protected]
> > >
> > > On (20/02/27 14:08), Lech Perczak wrote:
> > > > On 27.02.2020 at 13:39, Lech Perczak wrote:
> > > > > On 27.02.2020 at 13:36, Greg Kroah-Hartman wrote:
> > > > >> On Thu, Feb 27, 2020 at 11:09:49AM +0000, Lech Perczak wrote:
> > > > >>> Hello,
> > > > >>>
> > > > >>> After upgrading kernel on our boards from v4.19.105 to v4.19.106 we found out that syslog fails to read the messages after ones read initially after opening /proc/kmsg just after booting.
> > > > >>> I also found out, that output of 'dmesg --follow' also doesn't react on new printks appearing for whatever reason - to read new messages, reopening /proc/kmsg or /dev/kmsg was needed.
> > > > >>> I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a lockdep splat with printk()"), and reverting it on top of v4.19.106 restored correct behaviour.
> > > > >> That is really really odd.
> > > > > Very odd it is indeed.
> > > > >>> My test scenario for bisecting was:
> > > > >>> 1. run 'dmesg --follow' as root
> > > > >>> 2. run 'echo t > /proc/sysrq-trigger'
> > > > >>> 3. If trace appears in dmesg output -> good, otherwise, bad. If trace doesn't appear in output of 'dmesg --follow', re-running it will show the trace.
> > > > >>>
> >
> > I have reproduced the problem with a kernel based on v4.19.106
> > and I see the following in the log:
> >
> > [ 0.028250] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
> > [ 0.028263] random: get_random_bytes called from start_kernel+0x9e/0x4f6 with crng_init=0
> > [ 0.028268] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:4 nr_cpu_ids:4 nr_node_ids:1
> > [ 0.028407] percpu: Embedded 44 pages/cpu s142216 r8192 d29816 u524288
> > [ 0.028411] pcpu-alloc: s142216 r8192 d29816 u524288 alloc=1*2097152
> > [ 0.028412] pcpu-alloc: [0] 0 1 2 3
> >
> > Note that percpu stuff is initialized after printk_deferred(). And the
> > deferred console is scheduled by:
> >
> > void defer_console_output(void)
> > {
> > preempt_disable();
> > __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> > irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > preempt_enable();
> > }
> >
> > I am afraid that the patch creates some mess via the non-initialized
> > per-cpu variable.
> >
> > I see that x86 has some support for EARLY_PER_CPU stuff but it seems
> > to be arch-specific.
> >
> > I do not see a reliable way to detect when per-cpu variables are
> > initialized. Adding Tejun and PeterZ into CC if they have any
> > idea.
> >
> > I suggest to revert the patch until we have some easy and safe solution.
>
> Ok, I'll do so, but why is this not an issue in 5.4.y and newer kernels?

Good question. Well, there have been many changes in the random number
subsystem initialization recently. My bet is that it is much harder to
hit the warning there.

Best Regards,
Petr

2020-02-28 11:41:01

by Greg Kroah-Hartman

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On Fri, Feb 28, 2020 at 12:32:14PM +0100, Petr Mladek wrote:
> On Fri 2020-02-28 11:58:36, Greg Kroah-Hartman wrote:
> > On Fri, Feb 28, 2020 at 11:04:16AM +0100, Petr Mladek wrote:
> > > On Fri 2020-02-28 12:13:06, Sergey Senozhatsky wrote:
> > > > Cc-ing Petr, Steven, John
> > > >
> > > > https://lore.kernel.org/lkml/[email protected]
> > > >
> > > > On (20/02/27 14:08), Lech Perczak wrote:
> > > > > On 27.02.2020 at 13:39, Lech Perczak wrote:
> > > > > > On 27.02.2020 at 13:36, Greg Kroah-Hartman wrote:
> > > > > >> On Thu, Feb 27, 2020 at 11:09:49AM +0000, Lech Perczak wrote:
> > > > > >>> Hello,
> > > > > >>>
> > > > > >>> After upgrading kernel on our boards from v4.19.105 to v4.19.106 we found out that syslog fails to read the messages after ones read initially after opening /proc/kmsg just after booting.
> > > > > >>> I also found out, that output of 'dmesg --follow' also doesn't react on new printks appearing for whatever reason - to read new messages, reopening /proc/kmsg or /dev/kmsg was needed.
> > > > > >>> I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a lockdep splat with printk()"), and reverting it on top of v4.19.106 restored correct behaviour.
> > > > > >> That is really really odd.
> > > > > > Very odd it is indeed.
> > > > > >>> My test scenario for bisecting was:
> > > > > >>> 1. run 'dmesg --follow' as root
> > > > > >>> 2. run 'echo t > /proc/sysrq-trigger'
> > > > > >>> 3. If trace appears in dmesg output -> good, otherwise, bad. If trace doesn't appear in output of 'dmesg --follow', re-running it will show the trace.
> > > > > >>>
> > >
> > > I have reproduced the problem with a kernel based on v4.19.106
> > > and I see the following in the log:
> > >
> > > [ 0.028250] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
> > > [ 0.028263] random: get_random_bytes called from start_kernel+0x9e/0x4f6 with crng_init=0
> > > [ 0.028268] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:4 nr_cpu_ids:4 nr_node_ids:1
> > > [ 0.028407] percpu: Embedded 44 pages/cpu s142216 r8192 d29816 u524288
> > > [ 0.028411] pcpu-alloc: s142216 r8192 d29816 u524288 alloc=1*2097152
> > > [ 0.028412] pcpu-alloc: [0] 0 1 2 3
> > >
> > > Note that percpu stuff is initialized after printk_deferred(). And the
> > > deferred console is scheduled by:
> > >
> > > void defer_console_output(void)
> > > {
> > > preempt_disable();
> > > __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> > > irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > > preempt_enable();
> > > }
> > >
> > > I am afraid that the patch creates some mess via the non-initialized
> > > per-cpu variable.
> > >
> > > I see that x86 has some support for EARLY_PER_CPU stuff but it seems
> > > to be arch-specific.
> > >
> > > I do not see a reliable way to detect when per-cpu variables are
> > > initialized. Adding Tejun and PeterZ into CC if they have any
> > > idea.
> > >
> > > I suggest to revert the patch until we have some easy and safe solution.
> >
> > Ok, I'll do so, but why is this not an issue in 5.4.y and newer kernels?
>
> Good question. Well, there have been many changes in the random number
> subsystem initialization recently. My bet is that it is much harder to
> hit the warning there.

Ah, yeah, very true. Some of those patches were backported to 4.19.y,
but not all of them.

thanks,

greg k-h

2020-02-28 11:51:12

by Sergey Senozhatsky

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On (20/02/28 11:04), Petr Mladek wrote:
> > On (20/02/27 14:08), Lech Perczak wrote:
> > > On 27.02.2020 at 13:39, Lech Perczak wrote:
> > > > On 27.02.2020 at 13:36, Greg Kroah-Hartman wrote:
> > > >> On Thu, Feb 27, 2020 at 11:09:49AM +0000, Lech Perczak wrote:
> > > >>> Hello,
> > > >>>
> > > >>> After upgrading kernel on our boards from v4.19.105 to v4.19.106 we found out that syslog fails to read the messages after ones read initially after opening /proc/kmsg just after booting.
> > > >>> I also found out, that output of 'dmesg --follow' also doesn't react on new printks appearing for whatever reason - to read new messages, reopening /proc/kmsg or /dev/kmsg was needed.
> > > >>> I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a lockdep splat with printk()"), and reverting it on top of v4.19.106 restored correct behaviour.
> > > >> That is really really odd.
> > > > Very odd it is indeed.
> > > >>> My test scenario for bisecting was:
> > > >>> 1. run 'dmesg --follow' as root
> > > >>> 2. run 'echo t > /proc/sysrq-trigger'
> > > >>> 3. If trace appears in dmesg output -> good, otherwise, bad. If trace doesn't appear in output of 'dmesg --follow', re-running it will show the trace.
> > > >>>
>
> I have reproduced the problem with a kernel based on v4.19.106
> and I see the following in the log:
>
> [ 0.028250] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
> [ 0.028263] random: get_random_bytes called from start_kernel+0x9e/0x4f6 with crng_init=0
> [ 0.028268] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:4 nr_cpu_ids:4 nr_node_ids:1
> [ 0.028407] percpu: Embedded 44 pages/cpu s142216 r8192 d29816 u524288
> [ 0.028411] pcpu-alloc: s142216 r8192 d29816 u524288 alloc=1*2097152
> [ 0.028412] pcpu-alloc: [0] 0 1 2 3
>
> Note that percpu stuff is initialized after printk_deferred(). And the
> deferred console is scheduled by:
>
> void defer_console_output(void)
> {
> preempt_disable();
> __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> preempt_enable();
> }

Thanks.
I thought about "per-CPU printk() stuff happening too early", but
couldn't figure out why it would cause any problems later, when we
have everything set up and working.

Note that we test printk_safe_irq_ready only before irq_work_queue(); otherwise
we access the per-CPU printk_context unconditionally. Theoretically we could also
touch the per-CPU printk_context, or even printk() to a per-CPU buffer, "too early".

> I am afraid that the patch creates some mess via the non-initialized
> per-cpu variable.

We have `printk_safe_irq_ready' for the printk()-related irq_work. Maybe
we can use it in printk.c as well. We never know when printk_deferred()
may trigger, so this type of problem can repeat.
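
For reference, the printk_safe side does roughly this in 4.19 (abridged):

/* Get flushed in a more safe context. */
static void queue_flush_work(struct printk_safe_seq_buf *s)
{
	if (printk_safe_irq_ready)
		irq_work_queue(&s->work);
}

printk_safe_irq_ready is only set near the end of printk_safe_init(), once
it is safe to use the per-CPU irq_work. An equivalent check could guard
wake_up_klogd() and defer_console_output().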

-ss

2020-02-28 12:03:07

by Sergey Senozhatsky

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On (20/02/28 10:11), John Ogness wrote:
[..]
> >> >>> My test scenario for bisecting was:
> >> >>> 1. run 'dmesg --follow' as root
> >> >>> 2. run 'echo t > /proc/sysrq-trigger'
> >> >>> 3. If trace appears in dmesg output -> good, otherwise, bad. If trace doesn't appear in output of 'dmesg --follow', re-running it will show the trace.
> >> >>>
> >> >>> I ran my tests on Debian 10.3 with configuration based directly on one from 4.19.0-8-amd64 (4.19.98-1) in Qemu.
> >> >>> I could reproduce the same issue on several boards with x86 and ARMv7 CPUs alike, with 100% reproducibility.
> >
> > This is very-very odd... Hmm.
> > Just out of curiosity, what happens if you comment out that
> > printk() entirely?
> >
> > printk_deferred() should not affect the PRINTK_PENDING_WAKEUP path.
>
> It is the printk_deferred() causing the issue. This is relatively early,
> so perhaps something is not yet properly initialized.
>
> > Either we never queue wakeup irq_work(), e.g. because
> > waitqueue_active() never lets us to do so or because `(curr_log_seq !=
> > log_next_seq)' is always zero
>
> wake_up_klogd() is called and the waitqueue (@log_wait) is
> active. irq_work_queue() is called, but the work function,
> wake_up_klogd_work_func(), is never called.
>
> Perhaps @wake_up_klogd_work gets broken somehow. I'm looking into it.

Thanks.

The interesting part here is that @wake_up_klogd_work is per-CPU. So
while I can imagine that, for instance, the boot CPU would get busted, I'm
not sure I see why all CPUs would experience problems. Maybe we hit
that randomness warning for every CPU during bring-up? Then maybe some
more randomness-related patches need to be backported to 4.19?
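
One guess (not verified here): irq_work_queue() silently refuses to queue
a work item that is already marked pending, roughly (paraphrased from
kernel/irq_work.c in 4.19):

bool irq_work_queue(struct irq_work *work)
{
	/* Only queue if not already pending */
	if (!irq_work_claim(work))	/* atomically sets IRQ_WORK_PENDING */
		return false;
	...
}

If the too-early printk_deferred() claims the boot copy of
@wake_up_klogd_work before setup_per_cpu_areas() runs, and that claimed
state then gets copied into every CPU's per-CPU area while the work
itself never executes, every later irq_work_queue() would bail out on
every CPU. Just a theory, though.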

-ss

2020-02-28 13:04:33

by Petr Mladek

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On Fri 2020-02-28 12:32:14, Petr Mladek wrote:
> On Fri 2020-02-28 11:58:36, Greg Kroah-Hartman wrote:
> > On Fri, Feb 28, 2020 at 11:04:16AM +0100, Petr Mladek wrote:
> > > On Fri 2020-02-28 12:13:06, Sergey Senozhatsky wrote:
> > > > Cc-ing Petr, Steven, John
> > > >
> > > > https://lore.kernel.org/lkml/[email protected]
> > > >
> > > > On (20/02/27 14:08), Lech Perczak wrote:
> > > > > On 27.02.2020 at 13:39, Lech Perczak wrote:
> > > > > > On 27.02.2020 at 13:36, Greg Kroah-Hartman wrote:
> > > > > >> On Thu, Feb 27, 2020 at 11:09:49AM +0000, Lech Perczak wrote:
> > > > > >>> Hello,
> > > > > >>>
> > > > > >>> After upgrading kernel on our boards from v4.19.105 to v4.19.106 we found out that syslog fails to read the messages after ones read initially after opening /proc/kmsg just after booting.
> > > > > >>> I also found out, that output of 'dmesg --follow' also doesn't react on new printks appearing for whatever reason - to read new messages, reopening /proc/kmsg or /dev/kmsg was needed.
> > > > > >>> I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a lockdep splat with printk()"), and reverting it on top of v4.19.106 restored correct behaviour.
> > > > > >> That is really really odd.
> > > > > > Very odd it is indeed.
> > > > > >>> My test scenario for bisecting was:
> > > > > >>> 1. run 'dmesg --follow' as root
> > > > > >>> 2. run 'echo t > /proc/sysrq-trigger'
> > > > > >>> 3. If trace appears in dmesg output -> good, otherwise, bad. If trace doesn't appear in output of 'dmesg --follow', re-running it will show the trace.
> > > > > >>>
> > >
> > > I have reproduced the problem with a kernel based on v4.19.106
> > > and I see the following in the log:
> > >
> > > [ 0.028250] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
> > > [ 0.028263] random: get_random_bytes called from start_kernel+0x9e/0x4f6 with crng_init=0
> > > [ 0.028268] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:4 nr_cpu_ids:4 nr_node_ids:1
> > > [ 0.028407] percpu: Embedded 44 pages/cpu s142216 r8192 d29816 u524288
> > > [ 0.028411] pcpu-alloc: s142216 r8192 d29816 u524288 alloc=1*2097152
> > > [ 0.028412] pcpu-alloc: [0] 0 1 2 3
> > >
> > > Note that percpu stuff is initialized after printk_deferred(). And the
> > > deferred console is scheduled by:
> > >
> > > void defer_console_output(void)
> > > {
> > > preempt_disable();
> > > __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> > > irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > > preempt_enable();
> > > }
> > >
> > > I am afraid that the patch creates some mess via the non-initialized
> > > per-cpu variable.
> > >
> > > I see that x86 has some support for EARLY_PER_CPU stuff but it seems
> > > to be arch-specific.
> > >
> > > I do not see a reliable way to detect when per-cpu variables are
> > > initialized. Adding Tejun and PeterZ into CC if they have any
> > > idea.
> > >
> > > I suggest to revert the patch until we have some easy and safe solution.
> >
> > Ok, I'll do so, but why is this not an issue in 5.4.y and newer kernels?
>
> Good question. Well, there have been many changes in the random number
> subsystem initialization recently. My bet is that it is much harder to
> hit the warning there.

OK, the problem went away after I cherry-picked commit
d55535232c3dbde9a523 ("random: move rand_initialize() earlier").
I still see the warning, but it is printed much later:

[ 0.051846] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.054070] NR_IRQS: 524544, nr_irqs: 456, preallocated irqs: 16
[ 0.054281] random: get_random_bytes called from start_kernel+0x308/0x4fe with crng_init=0
[ 0.054430] Console: colour dummy device 80x25

But I am not sure if it is safe to backport this patch into the old
stable kernel.

Anyway, this fix would not be enough. The commit message mentions:

Note that this warning may still remain for machines that do not have
UEFI RNG support (which initializes the RNG pools during setup_arch()),
or for x86 machines without RDRAND (or booting without "random.trust=on"
or CONFIG_RANDOM_TRUST_CPU=y).


So, I would still prefer to _revert_ the commit 15341b1dd409749f
("char/random: silence a lockdep splat with printk()"). It calmed
down a lockdep report, but the real-life danger is dubious. The warning
is printed early, when the system is running on a single CPU, where
it could not race.

Best Regards,
Petr

2020-02-28 13:42:16

by Greg Kroah-Hartman

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On Fri, Feb 28, 2020 at 02:02:17PM +0100, Petr Mladek wrote:
> On Fri 2020-02-28 12:32:14, Petr Mladek wrote:
> > On Fri 2020-02-28 11:58:36, Greg Kroah-Hartman wrote:
> > > On Fri, Feb 28, 2020 at 11:04:16AM +0100, Petr Mladek wrote:
> > > > On Fri 2020-02-28 12:13:06, Sergey Senozhatsky wrote:
> > > > > Cc-ing Petr, Steven, John
> > > > >
> > > > > https://lore.kernel.org/lkml/[email protected]
> > > > >
> > > > > On (20/02/27 14:08), Lech Perczak wrote:
> > > > > > On 27.02.2020 at 13:39, Lech Perczak wrote:
> > > > > > > On 27.02.2020 at 13:36, Greg Kroah-Hartman wrote:
> > > > > > >> On Thu, Feb 27, 2020 at 11:09:49AM +0000, Lech Perczak wrote:
> > > > > > >>> Hello,
> > > > > > >>>
> > > > > > >>> After upgrading kernel on our boards from v4.19.105 to v4.19.106 we found out that syslog fails to read the messages after ones read initially after opening /proc/kmsg just after booting.
> > > > > > >>> I also found out, that output of 'dmesg --follow' also doesn't react on new printks appearing for whatever reason - to read new messages, reopening /proc/kmsg or /dev/kmsg was needed.
> > > > > > >>> I bisected this down to commit 15341b1dd409749fa5625e4b632013b6ba81609b ("char/random: silence a lockdep splat with printk()"), and reverting it on top of v4.19.106 restored correct behaviour.
> > > > > > >> That is really really odd.
> > > > > > > Very odd it is indeed.
> > > > > > >>> My test scenario for bisecting was:
> > > > > > >>> 1. run 'dmesg --follow' as root
> > > > > > >>> 2. run 'echo t > /proc/sysrq-trigger'
> > > > > > >>> 3. If trace appears in dmesg output -> good, otherwise, bad. If trace doesn't appear in output of 'dmesg --follow', re-running it will show the trace.
> > > > > > >>>
> > > >
> > > > I have reproduced the problem with a kernel based on v4.19.106
> > > > and I see the following in the log:
> > > >
> > > > [ 0.028250] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
> > > > [ 0.028263] random: get_random_bytes called from start_kernel+0x9e/0x4f6 with crng_init=0
> > > > [ 0.028268] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:4 nr_cpu_ids:4 nr_node_ids:1
> > > > [ 0.028407] percpu: Embedded 44 pages/cpu s142216 r8192 d29816 u524288
> > > > [ 0.028411] pcpu-alloc: s142216 r8192 d29816 u524288 alloc=1*2097152
> > > > [ 0.028412] pcpu-alloc: [0] 0 1 2 3
> > > >
> > > > Note that percpu stuff is initialized after printk_deferred(). And the
> > > > deferred console is scheduled by:
> > > >
> > > > void defer_console_output(void)
> > > > {
> > > > preempt_disable();
> > > > __this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
> > > > irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > > > preempt_enable();
> > > > }
> > > >
> > > > I am afraid that the patch creates some mess via the non-initialized
> > > > per-cpu variable.
> > > >
> > > > I see that x86 has some support for EARLY_PER_CPU stuff but it seems
> > > > to be arch-specific.
> > > >
> > > > I do not see a reliable way to detect when per-cpu variables are
> > > > initialized. Adding Tejun and PeterZ into CC if they have any
> > > > idea.
> > > >
> > > > I suggest to revert the patch until we have some easy and safe solution.
> > >
> > > Ok, I'll do so, but why is this not an issue in 5.4.y and newer kernels?
> >
> > Good question. Well, there have been many changes in the random number
> > subsystem initialization recently. My bet is that it is much harder to
> > hit the warning there.
>
> OK, the problem has gone after I cherry-picked the commit
> d55535232c3dbde9a523 ("random: move rand_initialize() earlier").
> I still see the warning but it is printed much later:
>
> [ 0.051846] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
> [ 0.054070] NR_IRQS: 524544, nr_irqs: 456, preallocated irqs: 16
> [ 0.054281] random: get_random_bytes called from start_kernel+0x308/0x4fe with crng_init=0
> [ 0.054430] Console: colour dummy device 80x25
>
> But I am not sure if it is safe to backport this patch into the old
> stable kernel.
>
> Anyway, this fix would not be enough. The commit message mentions:
>
> Note that this warning may still remain for machines that do not have
> UEFI RNG support (which initializes the RNG pools during setup_arch()),
> or for x86 machines without RDRAND (or booting without "random.trust=on"
> or CONFIG_RANDOM_TRUST_CPU=y).
>
>
> So, I would still prefer to _revert_ the commit 15341b1dd409749f
> ("char/random: silence a lockdep splat with printk()"). It calmed
> down lockdep report. The real life danger is dubious. The warning
> is printed early when the system is running on single CPU where
> it could not race.

Yeah, good idea, it's now reverted in my tree and will show up in the
next release.

thanks everyone for finding and working on this.

greg k-h

2020-02-28 20:56:32

by Theodore Ts'o

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On Fri, Feb 28, 2020 at 02:02:17PM +0100, Petr Mladek wrote:
>
> So, I would still prefer to _revert_ the commit 15341b1dd409749f
> ("char/random: silence a lockdep splat with printk()"). It calmed
> down lockdep report. The real life danger is dubious. The warning
> is printed early when the system is running on single CPU where
> it could not race.

I'm wondering now if we should revert this commit before 5.6 comes out
(it landed in 5.6-rc1). "Is much less likely to happen given the
other random initialization patches" is not the same as "guaranteed
not to happen".

What do folks think?

- Ted

2020-02-29 03:33:25

by Sergey Senozhatsky

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On (20/02/28 15:53), Theodore Y. Ts'o wrote:
> > So, I would still prefer to _revert_ the commit 15341b1dd409749f
> > ("char/random: silence a lockdep splat with printk()"). It calmed
> > down lockdep report. The real life danger is dubious. The warning
> > is printed early when the system is running on single CPU where
> > it could not race.
>
> I'm wondering now if we should revert this commit before 5.6 comes out
> (it landed in 5.6-rc1). "Is much less likely to happen given the
> other random initialization patches" is not the same as "guaranteed
> not to happen".
>
> What do folks think?

Well, my 5 cents: there is nothing that prevents "too-early"
printk_deferred() calls in the future. From that POV I'd probably
prefer to "forbid" printk_deferred() from touching the per-CPU deferred
machinery until it's not "too early" anymore, similar to what we
do in printk_safe::queue_flush_work().

-ss

2020-02-29 04:09:57

by Sergey Senozhatsky

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On (20/02/29 12:32), Sergey Senozhatsky wrote:
> > I'm wondering now if we should revert this commit before 5.6 comes out
> > (it landed in 5.6-rc1). "Is much less likely to happen given the
> > other random initialization patches" is not the same as "guaranteed
> > not to happen".
> >
> > What do folks think?
>
> Well, my 5 cents, there is nothing that prevents "too-early"
> printk_deferred() calls in the future. From that POV I'd probably
> prefer to "forbid" printk_deffered() to touch per-CPU deferred
> machinery until it's not "too early" anymore. Similar to what we
> do in printk_safe::queue_flush_work().

Informal patch

=====
Subject: [PATCH] printk: defer printk_deferred() until irq_work is ready

Signed-off-by: Sergey Senozhatsky <[email protected]>
---
 kernel/printk/internal.h    |  3 +++
 kernel/printk/printk.c      |  6 ++++++
 kernel/printk/printk_safe.c | 11 ++++++++---
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index c8e6ab689d42..2f051fb83c00 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -23,6 +23,8 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args);
 void __printk_safe_enter(void);
 void __printk_safe_exit(void);
 
+bool printk_irq_work_ready(void);
+
 #define printk_safe_enter_irqsave(flags)	\
 	do {					\
 		local_irq_save(flags);		\
@@ -64,4 +66,5 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args) { return 0; }
 #define printk_safe_enter_irq() local_irq_disable()
 #define printk_safe_exit_irq() local_irq_enable()
 
+bool printk_irq_work_ready(void) { return false; }
 #endif /* CONFIG_PRINTK */
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ad4606234545..bb545c86124e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3009,6 +3009,9 @@ static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) = {
 
 void wake_up_klogd(void)
 {
+	if (!printk_irq_work_ready())
+		return;
+
 	preempt_disable();
 	if (waitqueue_active(&log_wait)) {
 		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
@@ -3019,6 +3022,9 @@ void wake_up_klogd(void)
 
 void defer_console_output(void)
 {
+	if (!printk_irq_work_ready())
+		return;
+
 	preempt_disable();
 	__this_cpu_or(printk_pending, PRINTK_PENDING_OUTPUT);
 	irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index b4045e782743..5d2c282984d0 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -27,7 +27,7 @@
  * There are situations when we want to make sure that all buffers
  * were handled or when IRQs are blocked.
  */
-static int printk_safe_irq_ready __read_mostly;
+static bool __printk_irq_work_ready __read_mostly;
 
 #define SAFE_LOG_BUF_LEN ((1 << CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT) -	\
 				sizeof(atomic_t) -			\
@@ -48,10 +48,15 @@ static DEFINE_PER_CPU(int, printk_context);
 static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq);
 #endif
 
+bool printk_irq_work_ready(void)
+{
+	return __printk_irq_work_ready;
+}
+
 /* Get flushed in a more safe context. */
 static void queue_flush_work(struct printk_safe_seq_buf *s)
 {
-	if (printk_safe_irq_ready)
+	if (printk_irq_work_ready())
 		irq_work_queue(&s->work);
 }
 
@@ -408,7 +413,7 @@ void __init printk_safe_init(void)
 	 * variable is set.
 	 */
 	barrier();
-	printk_safe_irq_ready = 1;
+	__printk_irq_work_ready = true;
 
 	/* Flush pending messages that did not have scheduled IRQ works. */
 	printk_safe_flush();
--
2.25.0.265.gbab2e86ba0-goog

2020-02-29 23:48:56

by Steven Rostedt

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On Sat, 29 Feb 2020 12:32:53 +0900
Sergey Senozhatsky <[email protected]> wrote:

> > What do folks think?
>
> Well, my 5 cents, there is nothing that prevents "too-early"
> printk_deferred() calls in the future. From that POV I'd probably
> prefer to "forbid" printk_deffered() to touch per-CPU deferred
> machinery until it's not "too early" anymore. Similar to what we
> do in printk_safe::queue_flush_work().

I agree that printk_deferred() should handle being called too early.
But the issue is with per-CPU variables, correct? Not the irq_work itself?

We could add a flag in init/main.c after setup_per_cpu_areas() and then
just have printk_deferred() act like a normal printk(). At that point,
there shouldn't be an issue in calling printk() directly, is there?

-- Steve

2020-03-01 05:23:27

by Sergey Senozhatsky

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On (20/02/29 18:47), Steven Rostedt wrote:
[..]
> > > What do folks think?
> >
> > Well, my 5 cents, there is nothing that prevents "too-early"
> > printk_deferred() calls in the future. From that POV I'd probably
> > prefer to "forbid" printk_deffered() to touch per-CPU deferred
> > machinery until it's not "too early" anymore. Similar to what we
> > do in printk_safe::queue_flush_work().
>
> I agree that printk_deferred() should handle being called too early.
> But the issue is with per_cpu variables correct? Not the irq_work?

Correct. The printk_deferred() and printk_safe()/printk_nmi() irq_works
are per-CPU. We use a special flag in printk_safe()/printk_nmi() to
tell whether it's too early to touch the per-CPU irq_work or not.

I believe that we need to use that flag for all printk-safe/nmi
per-CPU data, including the buffers, not only for the irq_work, just in
case printk_safe or printk_nmi somehow gets called too early.

> We could add a flag in init/main.c after setup_per_cpu_areas() and then
> just have printk_deferred() act like a normal printk(). At that point,
> there shouldn't be an issue in calling printk() directly, is there?

Sure, this will work. I believe we introduced the workaround approach
in printk-safe because no one would ACK a global init/main.c flag for
printk(). If we can land a "per_cpu_areas_ready" flag (I have some doubts
here), then yes (!), let's use it and let's remove the printk-safe workaround.

-ss

2020-03-02 09:51:01

by Petr Mladek

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On Sun 2020-03-01 14:22:19, Sergey Senozhatsky wrote:
> On (20/02/29 18:47), Steven Rostedt wrote:
> [..]
> > > > What do folks think?
> > >
> > > Well, my 5 cents, there is nothing that prevents "too-early"
> > > printk_deferred() calls in the future. From that POV I'd probably
> > > prefer to "forbid" printk_deffered() to touch per-CPU deferred
> > > machinery until it's not "too early" anymore. Similar to what we
> > > do in printk_safe::queue_flush_work().
> >
> > I agree that printk_deferred() should handle being called too early.
> > But the issue is with per_cpu variables correct? Not the irq_work?
>
> Correct. printk_deferred() and printk_safe()/printk_nmi() irq_works
> are per-CPU. We use "a special" flag in printk_safe()/printk_nmi() to
> tell if it's too early to modify per-CPU irq_work or not.
>
> I believe that we need to use that flag for all printk-safe/nmi
> per-CPU data, including buffers, not only for irq_work. Just in
> case if printk_safe or printk_nmi, somehow, are being called too
> early.
>
> > We could add a flag in init/main.c after setup_per_cpu_areas() and then
> > just have printk_deferred() act like a normal printk(). At that point,
> > there shouldn't be an issue in calling printk() directly, is there?
>
> Sure, this will work. I believe we introduced a "work around" approach
> in printk-safe because noone would ACK a global init/main.c flag for
> printk(). If we can land a "per_cpu_areas_ready" flag (I've some doubts
> here), then yes (!), let's use it and let's remove printk-safe workaround.

A compromise might be to set a flag in setup_log_buf(). It is called
much earlier but it seems to be safe enough.

mm_init() is called soon after setup_log_buf(). And it seems to be
using per-cpu variables when creating caches, see:

+ mm_init()
  + kmem_cache_init()
    + create_boot_cache()
      + __kmem_cache_create()
        + setup_cpu_cache()

It is just a detail. But I would make the flag independent
of the existing printk_safe stuff. printk_safe will get removed
with the lockless printk buffer, while the irq_work will still
be needed for the wakeup functions.
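
Concretely, that could look something like the sketch below (illustrative
only; the names are made up here and the real patch may differ):

/* kernel/printk/printk.c */
static bool __printk_percpu_data_ready __read_mostly;

/* called from setup_log_buf(), which runs before mm_init() */
void __init set_printk_percpu_data_ready(void)
{
	__printk_percpu_data_ready = true;
}

void wake_up_klogd(void)
{
	if (!__printk_percpu_data_ready)
		return;

	preempt_disable();
	if (waitqueue_active(&log_wait)) {
		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
	}
	preempt_enable();
}

with the same early return in defer_console_output(), and nothing of it
depending on printk_safe.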

Sergey, do you agree and want to update your patch accordingly?

Best Regards,
Petr

2020-03-02 09:59:51

by Sergey Senozhatsky

Subject: Re: Regression in v4.19.106 breaking waking up of readers of /proc/kmsg and /dev/kmsg

On (20/03/02 10:49), Petr Mladek wrote:
[..]
> It is just a detail. But I would make the flag independent
> on the existing printk_safe stuff. printk_safe will get removed
> with the lockless printk buffer. While the irq_work() will still
> be needed for the wakeup functions.

Yeah, somehow I thought, for a moment, that we would also remove
printk_deferred() once we have the lockless buffer, but it seems that we
can remove it only when console_sem is gone.

> Sergey, do you agree and want to update your patch accordingly?

Yes, I can update the patch.

-ss