2019-08-14 04:01:35

by Bandan Das

[permalink] [raw]
Subject: [PATCH] x86/apic: reset LDR in clear_local_APIC


On a 32 bit RHEL6 guest with greater than 8 cpus, the
kdump kernel hangs when calibrating apic. This happens
because when apic initializes bigsmp, it also initializes LDR
even though it probably wouldn't be used.

When booting into kdump, KVM apic incorrectly reads the stale LDR
values from the guest while building the logical destination map
even for inactive vcpus. While KVM apic can be fixed to ignore apics
that haven't been enabled, a simple guest only change can be to
just clear out the LDR.

Signed-off-by: Bandan Das <[email protected]>
---
arch/x86/kernel/apic/apic.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index f5291362da1a..6b9511dc9157 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1141,6 +1141,10 @@ void clear_local_APIC(void)
apic_write(APIC_LVT0, v | APIC_LVT_MASKED);
v = apic_read(APIC_LVT1);
apic_write(APIC_LVT1, v | APIC_LVT_MASKED);
+ if (!x2apic_enabled()) {
+ v = apic_read(APIC_LDR) & ~APIC_LDR_MASK;
+ apic_write(APIC_LDR, v);
+ }
if (maxlvt >= 4) {
v = apic_read(APIC_LVTPC);
apic_write(APIC_LVTPC, v | APIC_LVT_MASKED);
--
2.20.1


2019-08-19 21:08:36

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/apic: reset LDR in clear_local_APIC

Bandan,

On Wed, 14 Aug 2019, Bandan Das wrote:
> On a 32 bit RHEL6 guest with greater than 8 cpus, the
> kdump kernel hangs when calibrating apic. This happens
> because when apic initializes bigsmp, it also initializes LDR
> even though it probably wouldn't be used.

'It probably wouldn't be used' is a not really a useful technical
statement.

Either it is used, then it needs to be handled. Or it is unused then why is
it written in the first place?

The bigsmp APIC uses physical destination mode because the logical flat
model only supports 8 APICs. So clearly bigsmp_init_apic_ldr() is bogus and
should be empty.

> When booting into kdump, KVM apic incorrectly reads the stale LDR
> values from the guest while building the logical destination map
> even for inactive vcpus. While KVM apic can be fixed to ignore apics
> that haven't been enabled, a simple guest only change can be to
> just clear out the LDR.

This does not make much sense either. What has KVM to do with logical
destination maps while booting the kdump kernel? The kdump kernel is not
going through the regular cold/warm boot process, so KVM does not even know
that the crashing kernel jumped into the kdump one.

What builds the logical destination maps and what has LDR and the KVM APIC
to do with that?

I'm not opposed to the change per se, but I'm not accepting change logs
which have the fairy tale smell.

Thanks,

tglx

2019-08-19 22:07:15

by Bandan Das

[permalink] [raw]
Subject: Re: [PATCH] x86/apic: reset LDR in clear_local_APIC

Hi Thomas,

Thomas Gleixner <[email protected]> writes:

> Bandan,
>
> On Wed, 14 Aug 2019, Bandan Das wrote:
>> On a 32 bit RHEL6 guest with greater than 8 cpus, the
>> kdump kernel hangs when calibrating apic. This happens
>> because when apic initializes bigsmp, it also initializes LDR
>> even though it probably wouldn't be used.
>
> 'It probably wouldn't be used' is a not really a useful technical
> statement.
>
> Either it is used, then it needs to be handled. Or it is unused then why is
> it written in the first place?
>
> The bigsmp APIC uses physical destination mode because the logical flat
> model only supports 8 APICs. So clearly bigsmp_init_apic_ldr() is bogus and
> should be empty.
>

Your note above is what I meant by "it probably wouldn't be used" because
I don't have much insight into the history behind why LDR is being initialized
in the first place. The only evidence I found is a comment in apic.c that states:
/*
* Intel recommends to set DFR, LDR and TPR before enabling
* an APIC. See e.g. "AP-388 82489DX User's Manual" (Intel
* document number 292116). So here it goes...
*/
That said, not initalizing the ldr in bigsmp_init_apic_ldr() should be
enough to fix this. Would you prefer that change instead ?

>> When booting into kdump, KVM apic incorrectly reads the stale LDR
>> values from the guest while building the logical destination map
>> even for inactive vcpus. While KVM apic can be fixed to ignore apics
>> that haven't been enabled, a simple guest only change can be to
>> just clear out the LDR.
>
> This does not make much sense either. What has KVM to do with logical
> destination maps while booting the kdump kernel? The kdump kernel is not

This is the guest kernel and KVM takes care of injecting the interrupt to
the right vcpu (recalculate_apic_map)() in lapic.c).

For the KVM side change, please take a look at
https://lore.kernel.org/kvm/[email protected]/

> going through the regular cold/warm boot process, so KVM does not even know
> that the crashing kernel jumped into the kdump one.
>
> What builds the logical destination maps and what has LDR and the KVM APIC
> to do with that?
>
> I'm not opposed to the change per se, but I'm not accepting change logs
> which have the fairy tale smell.
>
Heh, no it's not.

> Thanks,
>
> tglx

2019-08-19 23:11:31

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/apic: reset LDR in clear_local_APIC

Bandan,

On Mon, 19 Aug 2019, Bandan Das wrote:
> Thomas Gleixner <[email protected]> writes:
> > On Wed, 14 Aug 2019, Bandan Das wrote:
> >> On a 32 bit RHEL6 guest with greater than 8 cpus, the
> >> kdump kernel hangs when calibrating apic. This happens
> >> because when apic initializes bigsmp, it also initializes LDR
> >> even though it probably wouldn't be used.
> >
> > 'It probably wouldn't be used' is a not really a useful technical
> > statement.
> >
> > Either it is used, then it needs to be handled. Or it is unused then why is
> > it written in the first place?
> >
> > The bigsmp APIC uses physical destination mode because the logical flat
> > model only supports 8 APICs. So clearly bigsmp_init_apic_ldr() is bogus and
> > should be empty.
> >
>
> Your note above is what I meant by "it probably wouldn't be used" because
> I don't have much insight into the history behind why LDR is being initialized
> in the first place. The only evidence I found is a comment in apic.c that states:
> /*
> * Intel recommends to set DFR, LDR and TPR before enabling
> * an APIC. See e.g. "AP-388 82489DX User's Manual" (Intel
> * document number 292116). So here it goes...
> */

The physflat stuff is documented in the SDM and in the APIC code
(apic_flat_64.c):

static void physflat_init_apic_ldr(void)
{
/*
* LDR and DFR are not involved in physflat mode, rather:
* "In physical destination mode, the destination processor is
* specified by its local APIC ID [...]." (Intel SDM, 10.6.2.1)
*/
}

Why is LDR initialized in the bigsmp code? Probably histerical raisins and
I'm just too tired to consult the history git trees for an answer.

> That said, not initalizing the ldr in bigsmp_init_apic_ldr() should be
> enough to fix this. Would you prefer that change instead ?

That's surely something we want to get rid off. But for sanity sake we
should clear LDR as well after understanding it completely.

> >> When booting into kdump, KVM apic incorrectly reads the stale LDR
> >> values from the guest while building the logical destination map
> >> even for inactive vcpus. While KVM apic can be fixed to ignore apics
> >> that haven't been enabled, a simple guest only change can be to
> >> just clear out the LDR.
> >
> > This does not make much sense either. What has KVM to do with logical
> > destination maps while booting the kdump kernel? The kdump kernel is not
>
> This is the guest kernel and KVM takes care of injecting the interrupt to
> the right vcpu (recalculate_apic_map)() in lapic.c).

Yeah. I know that KVM injects interrupts. Still that does not explain the
issue properly.

The point is that when the kdump kernel boots in the guest and uses logical
destination mode then it will overwrite LDR _BEFORE_ the local APIC timer
calibration takes place. So no, I'm not bying this. Just because it makes
your problem disappear does not mean it's the proper explanation.

> For the KVM side change, please take a look at
> https://lore.kernel.org/kvm/[email protected]/

That's the same text in diffferent form and not conclusive either.

> > going through the regular cold/warm boot process, so KVM does not even know
> > that the crashing kernel jumped into the kdump one.
> >
> > What builds the logical destination maps and what has LDR and the KVM APIC
> > to do with that?
> >
> > I'm not opposed to the change per se, but I'm not accepting change logs
> > which have the fairy tale smell.
> >
> Heh, no it's not.

Well, it's not an accurate technical description of the root cause either :)

Thanks,

tglx

2019-08-21 15:54:44

by Bandan Das

[permalink] [raw]
Subject: Re: [PATCH] x86/apic: reset LDR in clear_local_APIC

Thomas Gleixner <[email protected]> writes:

> Bandan,
>
> On Mon, 19 Aug 2019, Bandan Das wrote:
>> Thomas Gleixner <[email protected]> writes:
>> > On Wed, 14 Aug 2019, Bandan Das wrote:
>> >> On a 32 bit RHEL6 guest with greater than 8 cpus, the
>> >> kdump kernel hangs when calibrating apic. This happens
>> >> because when apic initializes bigsmp, it also initializes LDR
>> >> even though it probably wouldn't be used.
>> >
>> > 'It probably wouldn't be used' is a not really a useful technical
>> > statement.
>> >
>> > Either it is used, then it needs to be handled. Or it is unused then why is
>> > it written in the first place?
>> >
>> > The bigsmp APIC uses physical destination mode because the logical flat
>> > model only supports 8 APICs. So clearly bigsmp_init_apic_ldr() is bogus and
>> > should be empty.
>> >
>>
>> Your note above is what I meant by "it probably wouldn't be used" because
>> I don't have much insight into the history behind why LDR is being initialized
>> in the first place. The only evidence I found is a comment in apic.c that states:
>> /*
>> * Intel recommends to set DFR, LDR and TPR before enabling
>> * an APIC. See e.g. "AP-388 82489DX User's Manual" (Intel
>> * document number 292116). So here it goes...
>> */
>
> The physflat stuff is documented in the SDM and in the APIC code
> (apic_flat_64.c):
>
> static void physflat_init_apic_ldr(void)
> {
> /*
> * LDR and DFR are not involved in physflat mode, rather:
> * "In physical destination mode, the destination processor is
> * specified by its local APIC ID [...]." (Intel SDM, 10.6.2.1)
> */
> }
>
> Why is LDR initialized in the bigsmp code? Probably histerical raisins and
> I'm just too tired to consult the history git trees for an answer.
>
>> That said, not initalizing the ldr in bigsmp_init_apic_ldr() should be
>> enough to fix this. Would you prefer that change instead ?
>
> That's surely something we want to get rid off. But for sanity sake we
> should clear LDR as well after understanding it completely.
>
>> >> When booting into kdump, KVM apic incorrectly reads the stale LDR
>> >> values from the guest while building the logical destination map
>> >> even for inactive vcpus. While KVM apic can be fixed to ignore apics
>> >> that haven't been enabled, a simple guest only change can be to
>> >> just clear out the LDR.
>> >
>> > This does not make much sense either. What has KVM to do with logical
>> > destination maps while booting the kdump kernel? The kdump kernel is not
>>
>> This is the guest kernel and KVM takes care of injecting the interrupt to
>> the right vcpu (recalculate_apic_map)() in lapic.c).
>
> Yeah. I know that KVM injects interrupts. Still that does not explain the
> issue properly.
>
> The point is that when the kdump kernel boots in the guest and uses logical
> destination mode then it will overwrite LDR _BEFORE_ the local APIC timer
> calibration takes place. So no, I'm not bying this. Just because it makes
It will but only for 1 cpu which is the boot cpu since kdump will usually be run with
nr_cpus=1.

> your problem disappear does not mean it's the proper explanation.
>
Let me try this again -
1. We boot a 16 vcpu guest, the guest kernel writes the LDR for the CPUs but
ofcourse, PhysFlat is always used.

2. We force a kdump crash - the kdump kernel boots with nr_cpus=1 but that does not
make KVM forget about the inactive vcpus. They are still in the vcpu list but not
active.

3. Before jumping into the kdump kernel, the guest kernel does not clear the LDR bits.

4. In the kdump kernel, the guest only overwrites the LDR for the boot cpu i.e from
KVM's point of view, the stale LDR values are still around for the inactive vcpus.

5. recalculate_apic_map() in its previous form (before the kvm patch linked above) did
not check whether the virtual apic is actually enabled; rather, if it finds any value in
the LDR, it will keep the cpu in the mapping table it maintains. However, it makes the
assumption that only one bit in the LDR is set which is not true for bigsmp_init_apic_ldr() -
the way it initializes the LDR, multiple bits can be set! Since recalculate_apic_map uses
ffs() it just checks for the first set bit and assumes that other bits are set and potentially
overwrites and existing entry.

For example, let's assume the kdump kernel boots on CPU 1. So, in recalculate_apic_map(),
mask is 1 and ffs(mask) - 1 is 0 and so, cluster[ffs(mask) - 1] = apic; writes 1 to
cluster[0]. When iterating through all the vcpus, the next in line is vcpu2 which still has
a stale LDR value. So, the function will again incorrectly overwrite cluster[0] = 2. Since 2
is inactive, the timer calibration interrupt ends up nowhere.

So, in KVM: if we make sure that the logical destination map isn't filled up if the virtual
apic is not enabled by software, it really doesn't matter whether the LDR for an inactive CPU
has a stale value.

In x86/apic: if we make sure that the LDR is 0 or reset, recalculate_apic_map() will never consider
including this cpu in the logical map.

In short, as I mentioned in the patch description, this is really a KVM bug but it doesn't hurt
to clear out the LDR in the guest and then, it wouldn't need a hypervisor fix.

Is this better ?


>> For the KVM side change, please take a look at
>> https://lore.kernel.org/kvm/[email protected]/
>
> That's the same text in diffferent form and not conclusive either.
>
>> > going through the regular cold/warm boot process, so KVM does not even know
>> > that the crashing kernel jumped into the kdump one.
>> >
>> > What builds the logical destination maps and what has LDR and the KVM APIC
>> > to do with that?
>> >
>> > I'm not opposed to the change per se, but I'm not accepting change logs
>> > which have the fairy tale smell.
>> >
>> Heh, no it's not.
>
> Well, it's not an accurate technical description of the root cause either :)
>
> Thanks,

>
> tglx

2019-08-21 18:36:19

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/apic: reset LDR in clear_local_APIC

Bandan,

On Wed, 21 Aug 2019, Bandan Das wrote:
> Thomas Gleixner <[email protected]> writes:
> So, in KVM: if we make sure that the logical destination map isn't filled up if the virtual
> apic is not enabled by software, it really doesn't matter whether the LDR for an inactive CPU
> has a stale value.
>
> In x86/apic: if we make sure that the LDR is 0 or reset,
> recalculate_apic_map() will never consider including this cpu in the
> logical map.
?
> In short, as I mentioned in the patch description, this is really a KVM
> bug but it doesn't hurt to clear out the LDR in the guest and then, it
> wouldn't need a hypervisor fix.

I still needs a hypervisor fix. Taking disabled APICs into account is a bug
which has also other consequeces than that particular one. So please don't
claim that. It's wrong.

If that prevents the APIC bug from triggering on unfixed hypervisors, then
this is a nice side effect, but not a solution.

> Is this better ?

That's way better.

So can you please create two patches:

1) Make that bogus bigsmp ldr init empty

That one wants a changelog along these lines:

- Setting LDR for physical destination mode is pointless
- Setting multiple bits in the LDR is wrong

Mention how this was discovered and caused the KVM APIC bug to be
triggered. Also mention that the change is not there to paper over
the KVM APIC bug. The change fixes a bug in the bigsmp APIC code.

2) Clear LDR in in that apic reset function

That one wants a changelog along these lines:

- Except for x2apic the LDR should be cleared as any other APIC
register

Mention how this was discovered. Again the change is not there to
paper over the KVM APIC bug. It's for correctness sake and valid on
its own.

Thanks,

tglx


2019-08-21 19:58:04

by Bandan Das

[permalink] [raw]
Subject: Re: [PATCH] x86/apic: reset LDR in clear_local_APIC

Thomas Gleixner <[email protected]> writes:

> Bandan,
>
> On Wed, 21 Aug 2019, Bandan Das wrote:
>> Thomas Gleixner <[email protected]> writes:
>> So, in KVM: if we make sure that the logical destination map isn't filled up if the virtual
>> apic is not enabled by software, it really doesn't matter whether the LDR for an inactive CPU
>> has a stale value.
>>
>> In x86/apic: if we make sure that the LDR is 0 or reset,
>> recalculate_apic_map() will never consider including this cpu in the
>> logical map.
> ?
>> In short, as I mentioned in the patch description, this is really a KVM
>> bug but it doesn't hurt to clear out the LDR in the guest and then, it
>> wouldn't need a hypervisor fix.
>
> I still needs a hypervisor fix. Taking disabled APICs into account is a bug
> which has also other consequeces than that particular one. So please don't
> claim that. It's wrong.
>
> If that prevents the APIC bug from triggering on unfixed hypervisors, then
> this is a nice side effect, but not a solution.
>
Agreed and fwiw, the kvm fix has been queued already.

>> Is this better ?
>
> That's way better.
>
> So can you please create two patches:
>
> 1) Make that bogus bigsmp ldr init empty
>
> That one wants a changelog along these lines:
>
> - Setting LDR for physical destination mode is pointless
> - Setting multiple bits in the LDR is wrong
>
> Mention how this was discovered and caused the KVM APIC bug to be
> triggered. Also mention that the change is not there to paper over
> the KVM APIC bug. The change fixes a bug in the bigsmp APIC code.
>
> 2) Clear LDR in in that apic reset function
>
> That one wants a changelog along these lines:
>
> - Except for x2apic the LDR should be cleared as any other APIC
> register
>
> Mention how this was discovered. Again the change is not there to
> paper over the KVM APIC bug. It's for correctness sake and valid on
> its own.
>
> Thanks,
>
Will do as you suggested. Thank you for the review.

Bandan
> tglx
>
>