2021-02-15 19:24:17

by Paul Menzel

[permalink] [raw]
Subject: smpboot: CPU numbers printed as warning

Dear Linux folks,


Using Linux 5.10.13 (and before), looking at the Linux kernel warnings,
the CPU numbers show up. For example with 12 cpus/threads:

```
$ sudo dmesg --level=warn
[ 0.216103] #2
[ 0.220105] #3
[ 0.224103] #4
[ 0.228104] #5
[ 0.232110] #6
[ 0.236101] #7
[ 0.240102] #8
[ 0.244102] #9
[ 0.248100] #10
[ 0.252098] #11
```

If I am not mistaken, this is from `announce_cpu()` in
`arch/x86/kernel/smpboot.c`, and the `pr_cont()` in their “forgets” the
log level it belongs to, maybe because of SMP and other messages are
printed in between.

```
if (system_state < SYSTEM_RUNNING) {
if (node != current_node) {
if (current_node > (-1))
pr_cont("\n");
current_node = node;

printk(KERN_INFO ".... node %*s#%d, CPUs: ",
node_width - num_digits(node), " ", node);
}

/* Add padding for the BSP */
if (cpu == 1)
pr_cont("%*s", width + 1, " ");

pr_cont("%*s#%d", width - num_digits(cpu), " ", cpu);

} else
pr_info("Booting Node %d Processor %d APIC 0x%x\n",
node, cpu, apicid);
```

Would using `pr_info()` instead be an acceptable fix?


Kind regards,

Paul


2021-02-16 09:53:55

by Petr Mladek

[permalink] [raw]
Subject: Re: smpboot: CPU numbers printed as warning

On Mon 2021-02-15 20:22:34, Paul Menzel wrote:
> Dear Linux folks,
>
>
> Using Linux 5.10.13 (and before), looking at the Linux kernel warnings, the
> CPU numbers show up. For example with 12 cpus/threads:
>
> ```
> $ sudo dmesg --level=warn
> [ 0.216103] #2
> [ 0.220105] #3
> [ 0.224103] #4
> [ 0.228104] #5
> [ 0.232110] #6
> [ 0.236101] #7
> [ 0.240102] #8
> [ 0.244102] #9
> [ 0.248100] #10
> [ 0.252098] #11
> ```

Is this the exact output from sudo dmesg --level=warn?
It is strange that each CPU number is printed on itw own line.

Anyway, it might be affected by the new lockless ringbuffer.
The original code decided whether to connect the lines only by
"current" task pointer. The lockless ring buffer takes into account
also CPU number.

Well, it has never been reliable. For example, I see here:

<6>[ 0.238262][ T1] smp: Bringing up secondary CPUs ...
<6>[ 0.239340][ T1] x86: Booting SMP configuration:
<6>[ 0.239794][ T1] .... node #0, CPUs: #1
<6>[ 0.113946][ T0] kvm-clock: cpu 1, msr 6ba01041, secondary cpu clock
<6>[ 0.113946][ T0] smpboot: CPU 1 Converting physical 0 to logical die 1
<6>[ 0.246056][ T16] kvm-guest: stealtime: cpu 1, msr 17f9f2080
<4>[ 0.246679][ T1] #2
<6>[ 0.113946][ T0] kvm-clock: cpu 2, msr 6ba01081, secondary cpu clock
<6>[ 0.113946][ T0] smpboot: CPU 2 Converting physical 0 to logical die 2
<6>[ 0.250023][ T21] kvm-guest: stealtime: cpu 2, msr 17fbf2080
<4>[ 0.250648][ T1] #3
<6>[ 0.113946][ T0] kvm-clock: cpu 3, msr 6ba010c1, secondary cpu clock
<6>[ 0.113946][ T0] smpboot: CPU 3 Converting physical 0 to logical die 3
<6>[ 0.254026][ T26] kvm-guest: stealtime: cpu 3, msr 17fdf2080
<6>[ 0.254568][ T1] smp: Brought up 1 node, 4 CPUs
<6>[ 0.254597][ T1] smpboot: Max logical packages: 4
<6>[ 0.255097][ T1] smpboot: Total of 4 processors activated (16896.11 BogoMIPS)

There are another messages printed in between that obviously break pr_cont().

> If I am not mistaken, this is from `announce_cpu()` in
> `arch/x86/kernel/smpboot.c`, and the `pr_cont()` in their “forgets” the log
> level it belongs to, maybe because of SMP and other messages are printed in
> between.
>
> ```
> if (system_state < SYSTEM_RUNNING) {
> if (node != current_node) {
> if (current_node > (-1))
> pr_cont("\n");
> current_node = node;
>
> printk(KERN_INFO ".... node %*s#%d, CPUs: ",
> node_width - num_digits(node), " ", node);
> }
>
> /* Add padding for the BSP */
> if (cpu == 1)
> pr_cont("%*s", width + 1, " ");
>
> pr_cont("%*s#%d", width - num_digits(cpu), " ", cpu);
>
> } else
> pr_info("Booting Node %d Processor %d APIC 0x%x\n",
> node, cpu, apicid);
> ```

> Would using `pr_info()` instead be an acceptable fix?

Makes sense to me.

Also you should add '\n' into the previous string to make the behavior
clear. It will always be printed on a new line when pr_info()
is used.

Best Regards,
Petr

2021-02-16 10:20:05

by Borislav Petkov

[permalink] [raw]
Subject: Re: smpboot: CPU numbers printed as warning

On Tue, Feb 16, 2021 at 10:49:04AM +0100, Petr Mladek wrote:
> Also you should add '\n' into the previous string to make the behavior
> clear. It will always be printed on a new line when pr_info()
> is used.

This was made to use pr_cont() on purpose so that the output is compact,
for example:

[ 4.088605] x86: Booting SMP configuration:
[ 4.089511] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63
[ 4.188510] .... node #1, CPUs: #64 #65 #66 #67 #68 #69 #70 #71 #72 #73 #74 #75 #76 #77 #78 #79 #80 #81 #82 #83 #84 #85 #86 #87 #88 #89 #90 #91 #92 #93 #94 #95 #96 #97 #98 #99 #100 #101 #102 #103 #104 #105 #106 #107 #108 #109 #110 #111 #112 #113 #114 #115 #116 #117 #118 #119 #120 #121 #122 #123 #124 #125 #126 #127
[ 4.307511] .... node #0, CPUs: #128 #129 #130 #131 #132 #133 #134 #135 #136 #137 #138 #139 #140 #141 #142 #143 #144 #145 #146 #147 #148 #149 #150 #151 #152 #153 #154 #155 #156 #157 #158 #159 #160 #161 #162 #163 #164 #165 #166 #167 #168 #169 #170 #171 #172 #173 #174 #175 #176 #177 #178 #179 #180 #181 #182 #183 #184 #185 #186 #187 #188 #189 #190 #191
[ 4.416511] .... node #1, CPUs: #192 #193 #194 #195 #196 #197 #198 #199 #200 #201 #202 #203 #204 #205 #206 #207 #208 #209 #210 #211 #212 #213 #214 #215 #216 #217 #218 #219 #220 #221 #222 #223 #224 #225 #226 #227 #228 #229 #230 #231 #232 #233 #234 #235 #236 #237 #238 #239 #240 #241 #242 #243 #244 #245 #246 #247 #248 #249 #250 #251 #252 #253 #254 #255
[ 4.531683] smp: Brought up 2 nodes, 256 CPUs
[ 4.534510] smpboot: Max logical packages: 2
[ 4.535527] smpboot: Total of 256 processors activated (1147449.34 BogoMIPS)

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-02-16 10:42:07

by Paul Menzel

[permalink] [raw]
Subject: Re: smpboot: CPU numbers printed as warning

Dear Petr,


Thank you for the quick reply.


Am 16.02.21 um 10:49 schrieb Petr Mladek:
> On Mon 2021-02-15 20:22:34, Paul Menzel wrote:

>> Using Linux 5.10.13 (and before), looking at the Linux kernel warnings, the
>> CPU numbers show up. For example with 12 cpus/threads:
>>
>> ```
>> $ sudo dmesg --level=warn
>> [ 0.216103] #2
>> [ 0.220105] #3
>> [ 0.224103] #4
>> [ 0.228104] #5
>> [ 0.232110] #6
>> [ 0.236101] #7
>> [ 0.240102] #8
>> [ 0.244102] #9
>> [ 0.248100] #10
>> [ 0.252098] #11
>> ```
>
> Is this the exact output from sudo dmesg --level=warn?

Yes, it is.

> It is strange that each CPU number is printed on its own line.

Should it be put on one line, if `dmesg --level=warn` is executed, even
with other messages in between?

> Anyway, it might be affected by the new lockless ringbuffer.
> The original code decided whether to connect the lines only by
> "current" task pointer. The lockless ring buffer takes into account
> also CPU number.
>
> Well, it has never been reliable. For example, I see here:
>
> <6>[ 0.238262][ T1] smp: Bringing up secondary CPUs ...
> <6>[ 0.239340][ T1] x86: Booting SMP configuration:
> <6>[ 0.239794][ T1] .... node #0, CPUs: #1
> <6>[ 0.113946][ T0] kvm-clock: cpu 1, msr 6ba01041, secondary cpu clock
> <6>[ 0.113946][ T0] smpboot: CPU 1 Converting physical 0 to logical die 1
> <6>[ 0.246056][ T16] kvm-guest: stealtime: cpu 1, msr 17f9f2080
> <4>[ 0.246679][ T1] #2
> <6>[ 0.113946][ T0] kvm-clock: cpu 2, msr 6ba01081, secondary cpu clock
> <6>[ 0.113946][ T0] smpboot: CPU 2 Converting physical 0 to logical die 2
> <6>[ 0.250023][ T21] kvm-guest: stealtime: cpu 2, msr 17fbf2080
> <4>[ 0.250648][ T1] #3
> <6>[ 0.113946][ T0] kvm-clock: cpu 3, msr 6ba010c1, secondary cpu clock
> <6>[ 0.113946][ T0] smpboot: CPU 3 Converting physical 0 to logical die 3
> <6>[ 0.254026][ T26] kvm-guest: stealtime: cpu 3, msr 17fdf2080
> <6>[ 0.254568][ T1] smp: Brought up 1 node, 4 CPUs
> <6>[ 0.254597][ T1] smpboot: Max logical packages: 4
> <6>[ 0.255097][ T1] smpboot: Total of 4 processors activated (16896.11 BogoMIPS)
>
> There are another messages printed in between that obviously break pr_cont().

Yes, that is what I meant.

>> If I am not mistaken, this is from `announce_cpu()` in
>> `arch/x86/kernel/smpboot.c`, and the `pr_cont()` in their “forgets” the log
>> level it belongs to, maybe because of SMP and other messages are printed in
>> between.
>>
>> ```
>> if (system_state < SYSTEM_RUNNING) {
>> if (node != current_node) {
>> if (current_node > (-1))
>> pr_cont("\n");
>> current_node = node;
>>
>> printk(KERN_INFO ".... node %*s#%d, CPUs: ",
>> node_width - num_digits(node), " ", node);
>> }
>>
>> /* Add padding for the BSP */
>> if (cpu == 1)
>> pr_cont("%*s", width + 1, " ");
>>
>> pr_cont("%*s#%d", width - num_digits(cpu), " ", cpu);
>>
>> } else
>> pr_info("Booting Node %d Processor %d APIC 0x%x\n",
>> node, cpu, apicid);
>> ```
>
>> Would using `pr_info()` instead be an acceptable fix?
>
> Makes sense to me.
>
> Also you should add '\n' into the previous string to make the behavior
> clear. It will always be printed on a new line when pr_info()
> is used.

I am going to reply to Borislav’s response.


Kind regards,

Paul

2021-02-16 11:15:49

by John Ogness

[permalink] [raw]
Subject: Re: smpboot: CPU numbers printed as warning

On 2021-02-16, Borislav Petkov <[email protected]> wrote:
>> Also you should add '\n' into the previous string to make the behavior
>> clear. It will always be printed on a new line when pr_info()
>> is used.
>
> This was made to use pr_cont() on purpose so that the output is
> compact,

It is supported to provide loglevels for CONT messages. The loglevel is
then only used if the append fails:

pr_cont(KERN_INFO "message part");

I don't know if we want to go down that path. But it is supported.

John Ogness

2021-02-16 11:49:02

by Borislav Petkov

[permalink] [raw]
Subject: Re: smpboot: CPU numbers printed as warning

On Tue, Feb 16, 2021 at 12:17:24PM +0106, John Ogness wrote:
> It is supported to provide loglevels for CONT messages. The loglevel is
> then only used if the append fails:
>
> pr_cont(KERN_INFO "message part");
>
> I don't know if we want to go down that path. But it is supported.

Sounds ok to me. I leave it up to you guys to decide whether we wanna
change that there.

But I don't care that much about the CPU numbers ending up getting
issued at warning level so we can just as well do nothing and leave it
as is.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2021-02-16 12:00:36

by Paul Menzel

[permalink] [raw]
Subject: Re: smpboot: CPU numbers printed as warning

Dear Borislav, dear Petr,


Am 16.02.21 um 11:14 schrieb Borislav Petkov:
> On Tue, Feb 16, 2021 at 10:49:04AM +0100, Petr Mladek wrote:
>> Also you should add '\n' into the previous string to make the behavior
>> clear. It will always be printed on a new line when pr_info()
>> is used.
>
> This was made to use pr_cont() on purpose so that the output is compact,
> for example:
>
> [ 4.088605] x86: Booting SMP configuration:
> [ 4.089511] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63
> [ 4.188510] .... node #1, CPUs: #64 #65 #66 #67 #68 #69 #70 #71 #72 #73 #74 #75 #76 #77 #78 #79 #80 #81 #82 #83 #84 #85 #86 #87 #88 #89 #90 #91 #92 #93 #94 #95 #96 #97 #98 #99 #100 #101 #102 #103 #104 #105 #106 #107 #108 #109 #110 #111 #112 #113 #114 #115 #116 #117 #118 #119 #120 #121 #122 #123 #124 #125 #126 #127
> [ 4.307511] .... node #0, CPUs: #128 #129 #130 #131 #132 #133 #134 #135 #136 #137 #138 #139 #140 #141 #142 #143 #144 #145 #146 #147 #148 #149 #150 #151 #152 #153 #154 #155 #156 #157 #158 #159 #160 #161 #162 #163 #164 #165 #166 #167 #168 #169 #170 #171 #172 #173 #174 #175 #176 #177 #178 #179 #180 #181 #182 #183 #184 #185 #186 #187 #188 #189 #190 #191
> [ 4.416511] .... node #1, CPUs: #192 #193 #194 #195 #196 #197 #198 #199 #200 #201 #202 #203 #204 #205 #206 #207 #208 #209 #210 #211 #212 #213 #214 #215 #216 #217 #218 #219 #220 #221 #222 #223 #224 #225 #226 #227 #228 #229 #230 #231 #232 #233 #234 #235 #236 #237 #238 #239 #240 #241 #242 #243 #244 #245 #246 #247 #248 #249 #250 #251 #252 #253 #254 #255
> [ 4.531683] smp: Brought up 2 nodes, 256 CPUs
> [ 4.534510] smpboot: Max logical packages: 2
> [ 4.535527] smpboot: Total of 256 processors activated (1147449.34 BogoMIPS)

Yes, the intention is clear, but it’s not working perfectly in all
situations. Any ideas, how to improve that? After reading John’s
response, I’d go with `pr_cont(KERN_INFO "message part");`.

By the way, what are these CPU numbers useful for? Isn’t

smp: Brought up 2 nodes, 256 CPUs

enough information, and nothing else needed for the majority of users?


Kind regards,

Paul

2021-02-16 12:38:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: smpboot: CPU numbers printed as warning

On Tue, Feb 16, 2021 at 12:58:19PM +0100, Paul Menzel wrote:
> Dear Borislav, dear Petr,
>
>
> Am 16.02.21 um 11:14 schrieb Borislav Petkov:
> > On Tue, Feb 16, 2021 at 10:49:04AM +0100, Petr Mladek wrote:
> > > Also you should add '\n' into the previous string to make the behavior
> > > clear. It will always be printed on a new line when pr_info()
> > > is used.
> >
> > This was made to use pr_cont() on purpose so that the output is compact,
> > for example:
> >
> > [ 4.088605] x86: Booting SMP configuration:
> > [ 4.089511] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63
> > [ 4.188510] .... node #1, CPUs: #64 #65 #66 #67 #68 #69 #70 #71 #72 #73 #74 #75 #76 #77 #78 #79 #80 #81 #82 #83 #84 #85 #86 #87 #88 #89 #90 #91 #92 #93 #94 #95 #96 #97 #98 #99 #100 #101 #102 #103 #104 #105 #106 #107 #108 #109 #110 #111 #112 #113 #114 #115 #116 #117 #118 #119 #120 #121 #122 #123 #124 #125 #126 #127
> > [ 4.307511] .... node #0, CPUs: #128 #129 #130 #131 #132 #133 #134 #135 #136 #137 #138 #139 #140 #141 #142 #143 #144 #145 #146 #147 #148 #149 #150 #151 #152 #153 #154 #155 #156 #157 #158 #159 #160 #161 #162 #163 #164 #165 #166 #167 #168 #169 #170 #171 #172 #173 #174 #175 #176 #177 #178 #179 #180 #181 #182 #183 #184 #185 #186 #187 #188 #189 #190 #191
> > [ 4.416511] .... node #1, CPUs: #192 #193 #194 #195 #196 #197 #198 #199 #200 #201 #202 #203 #204 #205 #206 #207 #208 #209 #210 #211 #212 #213 #214 #215 #216 #217 #218 #219 #220 #221 #222 #223 #224 #225 #226 #227 #228 #229 #230 #231 #232 #233 #234 #235 #236 #237 #238 #239 #240 #241 #242 #243 #244 #245 #246 #247 #248 #249 #250 #251 #252 #253 #254 #255
> > [ 4.531683] smp: Brought up 2 nodes, 256 CPUs
> > [ 4.534510] smpboot: Max logical packages: 2
> > [ 4.535527] smpboot: Total of 256 processors activated (1147449.34 BogoMIPS)
>
> Yes, the intention is clear, but it’s not working perfectly in all
> situations. Any ideas, how to improve that? After reading John’s response,
> I’d go with `pr_cont(KERN_INFO "message part");`.
>
> By the way, what are these CPU numbers useful for? Isn’t
>
> smp: Brought up 2 nodes, 256 CPUs
>
> enough information, and nothing else needed for the majority of users?

Majority yes, the primary use case is knowing which CPU is failing to
come up. Hardly ever happens, but when it does, its bloody annoying to
not know :-)

Also, the majority of people only has a hand full of CPUs, so they don't
care either way.