Hi!
While working on a sane topology evaluation mechanism, which addresses the
shortcomings of the existing tragedy held together with duct-tape and
hay-wire, I ran into the issue that quite some of this tragedy is deeply
embedded in the APIC code and uses an impenetrable maze of callbacks which
might or might not be correct at the point where the CPUs are registered
via MPPARSE or ACPI/MADT.
This made me look deeper and the findings were anything but pretty.
Redundant per CPU variables, completely unused code, needless complexity
all over the place. The most amazing gem was:
    physid_mask_t tmp;                      // 32 bytes on stack
    apic->magic(&tmp, bit);                 // Zeros tmp and sets bit
    physids_or(real_map, real_map, tmp);
Definitely hard to come up with a more complex way of setting a bit in a
bitmap. Closely followed by the evaluation of the boot CPU APIC ID, which
consists of more hacks than sensible code.
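
For illustration only, here is a minimal user-space sketch of the contrast
described above. It is not the kernel's physid_mask_t API: the type
demo_mask, the 256-bit size and every demo_* helper are hypothetical names
made up for this example. The roundabout variant builds a zeroed temporary
mask, sets one bit in it and ORs it into the target; the direct variant
just sets the bit.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes: 256 bits, i.e. 32 bytes, as in the snippet above. */
#define DEMO_MAX_BITS		256
#define DEMO_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_LONGS \
	((DEMO_MAX_BITS + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

struct demo_mask {
	unsigned long bits[DEMO_LONGS];
};

/* The direct way: set one bit in place. */
static void demo_set_bit(struct demo_mask *m, unsigned int bit)
{
	m->bits[bit / DEMO_BITS_PER_LONG] |= 1UL << (bit % DEMO_BITS_PER_LONG);
}

static bool demo_test_bit(const struct demo_mask *m, unsigned int bit)
{
	return (m->bits[bit / DEMO_BITS_PER_LONG] >>
		(bit % DEMO_BITS_PER_LONG)) & 1UL;
}

/* The roundabout way: temporary mask on the stack, then OR it in. */
static void demo_set_roundabout(struct demo_mask *map, unsigned int bit)
{
	struct demo_mask tmp;

	memset(&tmp, 0, sizeof(tmp));		/* zero the temporary */
	demo_set_bit(&tmp, bit);		/* set the single bit */
	for (size_t i = 0; i < DEMO_LONGS; i++)	/* OR it into the target */
		map->bits[i] |= tmp.bits[i];
}
```

Both variants leave the target mask in the same state; the second one
merely takes a 32 byte detour over the stack to get there.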
So I stopped working on the topology stuff and decided to do an overhaul of
the APIC code first. Cleaning up old gunk which dates back to the early SMP
days, making the CPU registration halfway understandable and then going
through all APIC callbacks to figure out what they actually do and whether
they are required at all. There is also quite some overhead through the
indirect calls and some of them are actually even pointlessly indirected
twice. At some point Peter yelled static_call() at me and that's what I
finally ended up implementing.
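
To make the "indirected twice" point concrete, here is a small user-space
sketch. It only models the call chains; it does not reproduce static_call()
itself, which patches call sites in the kernel and cannot be shown in
portable C. All names (demo_apic, demo_cur, demo_forward_read and so on)
are hypothetical and do not correspond to functions in the series.

```c
#include <stdint.h>

/* Hypothetical driver structure with a single callback. */
struct demo_apic {
	uint32_t (*read)(uint32_t reg);
};

/* Fake register file standing in for the hardware. */
static uint32_t demo_regs[4];

/* The function which does the actual work. */
static uint32_t demo_mem_read(uint32_t reg)
{
	return demo_regs[reg % 4];
}

/* A forwarding callback which goes through a second function pointer. */
static uint32_t (*demo_backend_read)(uint32_t reg) = demo_mem_read;

static uint32_t demo_forward_read(uint32_t reg)
{
	return demo_backend_read(reg);		/* second indirect call */
}

static struct demo_apic demo_twice = { .read = demo_forward_read };
static struct demo_apic demo_once  = { .read = demo_mem_read };

/* Currently selected driver. */
static struct demo_apic *demo_cur = &demo_twice;

static uint32_t demo_apic_read(uint32_t reg)
{
	return demo_cur->read(reg);		/* first indirect call */
}
```

With demo_twice selected, every demo_apic_read() costs two indirect calls
to reach demo_mem_read(); with demo_once it costs one. The result is the
same either way, which is exactly why the second hop is pointless.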
This builds and boots on 32bit and 64bit, but obviously needs a larger test
base especially on those old 32bit systems which are just museum pieces.
I have not evaluated whether this has a measurable impact either, but
that's something I leave to the performance teams. Fewer indirect calls
in hotpaths are a win by definition.
Talking about those museum pieces and the related historic maze, I really
have to bring up the question again, whether we should finally kill support
for the museum CPUs and move on.
Ideally we remove 32bit support altogether. I know the answer... :(
But what I really want to do is to make x86 SMP only. The amount of
#ifdeffery and hacks to keep the UP support alive is amazing. And we do this
just so that it runs on some 25+ year old hardware for absolutely
zero value. It wouldn't be the first architecture to go SMP=y.
Yes, we "support" Alpha, PARISC, Itanic and other oddballs too, but that's
completely different. They are not getting new hardware every other day and
the main impact on the kernel as a whole is mostly static. They are
sometimes in the way of generalizing things in the core code. Other than
that their architecture code is self contained and they can tinker on it as
they see fit or let it slowly bitrot like Itanic.
But x86 is (still) alive and being extended and expanded. That means that
any refactoring of common infrastructure has to take the broken hardware
museum into account. It's doable, but it's not pretty and of really
questionable value. I wouldn't mind if there were a bunch of museum
attendants actively working on it with taste, but that's obviously wishful
thinking. We are even short of people with taste who work on contemporary
hardware support...
While I cursed myself at some point during this work for having merged
i386/x86_64 back then, I still think that it was the correct decision at
that point in time and saved us a lot of trouble. It admittedly added some
trouble which we would not have now, but it avoided the insanity of having
to maintain two trees with different bugs and "fixes" for the very same
problems. TBH quite some of the horrors which I just removed came out of
the x86/64 side. The oddballs of i386 early SMP support are a horror on
their own of course.
As we made that decision more than 15 years [!] ago, it's about time to make
new decisions.
Vented enough.
I'm sure that I broke things on the way, but we can't just continue with
the current mess and add duct tape over hay-wire over duct-tape over
hay-wire forever. At some point we need to bite the bullet and get rid
of the historical nonsense even if it's painful. That point is now.
So 58 patches and a lot of cursing later:
58 files changed, 744 insertions(+), 1348 deletions(-)
Despite adding the new static call mechanics this endeavour deletes 600
lines of hilarities. There are more of those, but they need to be addressed
separately. Quite some of them in the course of the topology evaluation rework,
which so far sports a negative diffstat too.
Now I need a break and a stiff drink to get rid of the bad taste and the
nightmares caused by this.
The series is also available from git:
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/apic
Thanks,
tglx
---
hyperv/hv_apic.c | 26 +-
hyperv/hv_init.c | 2
hyperv/hv_spinlock.c | 2
hyperv/hv_vtl.c | 2
include/asm/apic.h | 250 +++++++++++++----------
include/asm/io_apic.h | 7
include/asm/mpspec.h | 28 --
include/asm/processor.h | 1
include/asm/smp.h | 11 -
kernel/acpi/boot.c | 12 -
kernel/apic/Makefile | 2
kernel/apic/apic.c | 453 +++++++++++++------------------------------
kernel/apic/apic_common.c | 21 +
kernel/apic/apic_flat_64.c | 80 +------
kernel/apic/apic_noop.c | 91 +-------
kernel/apic/apic_numachip.c | 50 ----
kernel/apic/bigsmp_32.c | 89 +-------
kernel/apic/hw_nmi.c | 4
kernel/apic/init.c | 101 +++++++++
kernel/apic/io_apic.c | 30 +-
kernel/apic/ipi.c | 176 +++++++---------
kernel/apic/local.h | 30 ++
kernel/apic/msi.c | 2
kernel/apic/probe_32.c | 117 ++---------
kernel/apic/probe_64.c | 18 -
kernel/apic/vector.c | 16 -
kernel/apic/x2apic_cluster.c | 23 --
kernel/apic/x2apic_phys.c | 74 ++-----
kernel/apic/x2apic_uv_x.c | 51 ----
kernel/cpu/acrn.c | 2
kernel/cpu/amd.c | 2
kernel/cpu/common.c | 2
kernel/cpu/hygon.c | 3
kernel/cpu/mce/amd.c | 2
kernel/cpu/mce/inject.c | 3
kernel/cpu/mce/threshold.c | 2
kernel/cpu/mshyperv.c | 4
kernel/devicetree.c | 21 -
kernel/irq.c | 14 -
kernel/irq_work.c | 4
kernel/jailhouse.c | 6
kernel/kvm.c | 12 -
kernel/mpparse.c | 6
kernel/nmi_selftest.c | 2
kernel/setup.c | 6
kernel/setup_percpu.c | 10
kernel/sev.c | 2
kernel/smp.c | 10
kernel/smpboot.c | 115 ----------
kernel/vsmp_64.c | 2
kvm/vmx/posted_intr.c | 2
kvm/vmx/vmx.c | 2
mm/srat.c | 5
pci/xen.c | 2
platform/uv/uv_nmi.c | 2
xen/apic.c | 76 ++-----
xen/enlighten_hvm.c | 2
xen/smp_pv.c | 2
58 files changed, 744 insertions(+), 1348 deletions(-)
On Mon, 17 Jul 2023 at 16:14, Thomas Gleixner <[email protected]> wrote:
>
> But what I really want to do is to make x86 SMP only.
I don't hate the notion, but it would make our UP coverage much worse
for other targets.
We already have weak coverage of UP builds anyway, since no sane
developer uses UP. But if we make UP not even be an option on x86,
then that coverage goes from bad to abysmal.
That said, I already floated dropping i486 support a couple of years
ago. If what you *really* want is "unconditional APIC support", then
that would be it, no?
So I don't like "force SMP" from a coverage standpoint. But if the
pain point is "we support machines that don't even have an APIC at
all", *that* I think we could just decide to do.
Hmm?
Anyway, the series looks good to me. I did have one reaction, but that
is probably due to my own confusion.
Linus
On Tue, Jul 18, 2023 at 01:14:33AM +0200, Thomas Gleixner wrote:
> So 58 patches and a lot of cursing later:
Hehe, and you've not even posted the topology bits yet :-)
> 58 files changed, 744 insertions(+), 1348 deletions(-)
(add another 24 lines of comments, and we have 58 patches, 58 files
changed and 580 lines removed)
Acked-by: Peter Zijlstra (Intel) <[email protected]>
On Tue, Jul 18, 2023 at 1:14 AM Thomas Gleixner <[email protected]> wrote:
> This builds and boots on 32bit and 64bit, but obviously needs a larger test
> base especially on those old 32bit systems which are just museum pieces.
These things are indeed museum pieces if you think servers, desktops
and laptops. They will at max be glorified terminals.
What we noticed on ARM32 is that it is used for:
1. Running 32-bit kernels as guests in virtual machines (I don't know if
x86 has this problem, sorry I'm ignorant there)
2. Embedded systems with very long support cycles
For x86 there is PC104, I think William Breathitt Gray knows more about
those, scope and usage etc. The typical usecase is industrial embedded
(I've seen quite a few, e.g. biochemical lab equipment set-ups) which are
running on a "it works don't fix it"-basis but they are network connected
so they may need new kernels for security reasons, or to fix bugs.
https://en.wikipedia.org/wiki/PC/104
These things have lifecycles that easily outspan any server, desktop or
laptop. 30+ years easily. They are just sitting there, making whatever
blood cleaning agent or medical.
I think the automation people have mostly switched over to using
ARM things such as RaspberryXYZ for new plants, but there is some
poor guy with the job of keeping all the PC104 plants running on recent
kernels for the next 20 years or so.
Yours,
Linus Walleij
On 18/07/2023 01:14, Thomas Gleixner wrote:
> Talking about those museums pieces and the related historic maze, I really
> have to bring up the question again, whether we should finally kill support
> for the museum CPUs and move on.
>
> Ideally we remove 32bit support altogether. I know the answer... :(
Hello Thomas,
For what it's worth, there are a few millions of these in the field:
# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 28
model name : Intel(R) Atom(TM) CPU CE4150 @ 1.20GHz
stepping : 10
microcode : 0x106
cpu MHz : 1199.885
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fdiv_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts cpuid aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm tpr_shadow vnmi flexpriority vpid dtherm
vmx flags : vnmi flexpriority tsc_offset vtpr vapic
bugs :
bogomips : 2400.76
clflush size : 64
cache_alignment : 64
address sizes : 32 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 28
model name : Intel(R) Atom(TM) CPU CE4150 @ 1.20GHz
stepping : 10
microcode : 0x106
cpu MHz : 1200.188
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fdiv_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts cpuid aperfmperf pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm
bugs :
bogomips : 2400.76
clflush size : 64
cache_alignment : 64
address sizes : 32 bits physical, 48 bits virtual
power management:
# uname -a
Linux foo 5.15.42+ #182 SMP PREEMPT Mon Jul 17 09:41:27 UTC 2023 i686 GNU/Linux
They will probably be running 6.1 in a few months.
Regards
On Thu, Jul 20, 2023 at 02:43:55PM +0200, Marc Gonzalez wrote:
> On 18/07/2023 01:14, Thomas Gleixner wrote:
> > Ideally we remove 32bit support altogether. I know the answer... :(
>
> Hello Thomas,
>
> For what it's worth, there are a few millions of these in the field:
>
> # cat /proc/cpuinfo
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 28
> model name : Intel(R) Atom(TM) CPU CE4150 @ 1.20GHz
> stepping : 10
> microcode : 0x106
> cpu MHz : 1199.885
> cache size : 512 KB
> physical id : 0
> siblings : 2
> core id : 0
> cpu cores : 1
> apicid : 0
> initial apicid : 0
> fdiv_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 10
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts cpuid aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm tpr_shadow vnmi flexpriority vpid dtherm
> vmx flags : vnmi flexpriority tsc_offset vtpr vapic
> bugs :
> bogomips : 2400.76
> clflush size : 64
> cache_alignment : 64
> address sizes : 32 bits physical, 48 bits virtual
> power management:
But that's a 64bit chip, no? lm, cx16
On Thu, Jul 20 2023 at 15:13, Peter Zijlstra wrote:
> On Thu, Jul 20, 2023 at 02:43:55PM +0200, Marc Gonzalez wrote:
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts cpuid aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm tpr_shadow vnmi flexpriority vpid dtherm
>> vmx flags : vnmi flexpriority tsc_offset vtpr vapic
>> bugs :
>> bogomips : 2400.76
>> clflush size : 64
>> cache_alignment : 64
>> address sizes : 32 bits physical, 48 bits virtual
>> power management:
>
> But that's a 64bit chip, no? lm, cx16
The fun is that this is one of those chips which, per their technical
specification, do not support long mode. They advertise long mode in
CPUID and it actually works. There are quite a few ATOM models out there
which have the same "feature".
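
As a side note for anyone checking their own boxes: the "lm" token in the
/proc/cpuinfo flags line is what reflects the advertised long mode bit. A
naive substring search would also match "lahf_lm", so the token has to be
matched as a whole word. A minimal user-space sketch, where demo_has_flag()
is a hypothetical helper name and not an existing API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if @name appears in @flags as a complete token, where tokens
 * are separated by spaces, tabs or a trailing newline.
 */
static bool demo_has_flag(const char *flags, const char *name)
{
	size_t len = strlen(name);
	const char *p = flags;

	if (len == 0)
		return false;

	while ((p = strstr(p, name)) != NULL) {
		bool start_ok = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
		bool end_ok = p[len] == '\0' || p[len] == ' ' ||
			      p[len] == '\t' || p[len] == '\n';

		if (start_ok && end_ok)
			return true;
		p++;	/* partial match, keep searching */
	}
	return false;
}
```

Feeding it the flags line quoted above with "lm" as the name matches the
standalone token and not the tail of "lahf_lm".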
Thanks,
tglx
On 20/07/2023 15:13, Peter Zijlstra wrote:
> On Thu, Jul 20, 2023 at 02:43:55PM +0200, Marc Gonzalez wrote:
>
>> # cat /proc/cpuinfo
>> processor : 0
>> vendor_id : GenuineIntel
>> cpu family : 6
>> model : 28
>> model name : Intel(R) Atom(TM) CPU CE4150 @ 1.20GHz
>> stepping : 10
>> microcode : 0x106
>> cpu MHz : 1199.885
>> cache size : 512 KB
>> physical id : 0
>> siblings : 2
>> core id : 0
>> cpu cores : 1
>> apicid : 0
>> initial apicid : 0
>> fdiv_bug : no
>> f00f_bug : no
>> coma_bug : no
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 10
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts cpuid aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm tpr_shadow vnmi flexpriority vpid dtherm
>> vmx flags : vnmi flexpriority tsc_offset vtpr vapic
>> bugs :
>> bogomips : 2400.76
>> clflush size : 64
>> cache_alignment : 64
>> address sizes : 32 bits physical, 48 bits virtual
>
> But that's a 64bit chip, no? lm, cx16
Hol'up. A 64b chip with 32b physical addresses?
Only for the additional registers then?
https://www.cpu-world.com/CPUs/Atom/Intel-Atom%20CE4150.html
https://www.techpowerup.com/cpu-specs/atom-ce4150.c1440
I'm 99% sure it's running a 32b kernel.
Are you saying a 64b kernel would work?
(Well, there are several binary blobs, so no way.)
Regards
On Tue, Jul 18, 2023 at 04:29:23PM +0200, Linus Walleij wrote:
> On Tue, Jul 18, 2023 at 1:14 AM Thomas Gleixner <[email protected]> wrote:
>
> > This builds and boots on 32bit and 64bit, but obviously needs a larger test
> > base especially on those old 32bit systems which are just museum pieces.
>
> These things are indeed museum pieces if you think servers, desktops
> and laptops. They will at max be glorified terminals.
>
> What we noticed on ARM32 is that it used for:
> 1. Running 32-bit kernels as guests in virtual machines (I don't know if
> x86 has this problem, sorry I'm ignorant there)
> 2. Embedded systems with very long support cycles
>
> For x86 there is PC104, I think William Breathitt Gray knows more about
> those, scope and usage etc. The typical usecase is industrial embedded
> (I've seen quite a few e.g biochemical lab equipment set-ups) which are
> running on a "it works don't fix it"-basis but they are network connected
> so they may need new kernels for security reasons, or to fix bugs.
> https://en.wikipedia.org/wiki/PC/104
>
> These things have lifecycles that easily outspans any server, desktop or
> laptop. 30+ years easily. They are just sitting there, making whatever
> blood cleaning agent or medical.
>
> I think the automation people have mostly switched over to using
> ARM things such as RaspberryXYZ for new plants, but there is some
> poor guy with the job of keeping all the PC104 plants running on recent
> kernels for the next 20 years or so.
>
> Yours,
> Linus Walleij
It's true that there are still a good number of PC104 setups running
out there in the manufacturing sector. However, it should be noted that
these are typically systems that are configured and set once, left to
run indefinitely doing their specific manufacturing task until the
machines invariably break down from wear a decade or so later.
It's rare for the software of these systems to be updated; where a
machine fails, the owner will usually repair or replace the particular
mechanical component and reload that same ancient software they have
been using for years. The cases where software is updated may be out of
necessity to support a replacement device for a component that is no
longer in production. In these situations, you would find newer PC104
devices to fill that gap: where compatibility is needed with the ancient
core machine featuring only an ISA bus, but which the plant owner
doesn't want to throw away because "it still runs just fine with a
little spit shining."
Perhaps some years ago I would have said there was still demand for
PC104 support, but now with the motherboards of these older systems
finally failing due to age, the owners of these machines are forced to
upgrade to something newer. As mentioned, I've also seen a general trend
in this sector to move towards ARM products, perhaps out of a desire for
lower power consumption or maybe their industrial line of features.
Overall I don't see much future for PC104 in newer kernels because as
the systems using it fail, users are switching to platforms without it.
William Breathitt Gray