2018-09-06 10:17:45

by Philipp Eppelt

Subject: x86/apic: MSI address malformed for "flat" driver

Hi,

I believe the x86/APIC implementation does not behave according to the
Intel SDM specification when it comes to composing MSI messages for the
"flat" APIC driver as of a31e58e129f73ab5b04016330b13ed51fde7a961.


APIC "flat" driver and MSI address composing from the current master
(2018-09-02, 60c1f89241d49bacf71035470684a8d7b4bb46ea):

static struct apic apic_flat __ro_after_init = {
	...
	.irq_delivery_mode = dest_Fixed,
	.irq_dest_mode = 1, /* logical */
	...
	.calc_dest_apicid = apic_flat_calc_apicid,
};


static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
{
	...
	msg->address_lo =
		MSI_ADDR_BASE_LO |
		((apic->irq_dest_mode == 0) ?
			MSI_ADDR_DEST_MODE_PHYSICAL :
			MSI_ADDR_DEST_MODE_LOGICAL) |
		MSI_ADDR_REDIRECTION_CPU |
		MSI_ADDR_DEST_ID(cfg->dest_apicid);
	...
}

The "flat" driver defines the MSI addressing scheme to be used as
logical addressing in flat mode. The MSI msg address is composed
accordingly, but sets MSI_ADDR_REDIRECTION_CPU which is a zero at bit[3].

The Intel SDM states for the MSI address format (SDM vol.3 10.11):
  31-20  0xfee
  19-12  Destination ID (DID)
  11-4   Reserved
  3      Redirection Hint (RH)
  2      Destination Mode (DM)
  1-0    XX

The relation between RH and DM is as follows: if RH is 0, DM is ignored
and the DID field is interpreted the same way as bits [63:56] in the
IO-APIC, meaning as a local APIC ID.

If RH is 1 and DM is 0, physical addressing is used (see the
apic_physflat driver).
If RH is 1 and DM is 1, logical addressing is used, which splits up into
flat and cluster mode, determined by the APIC's DFR and LDR, using the
logical APIC address.
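
The field layout listed above can be made concrete with a small decoder.
This is only a sketch: the macro names, the struct and the function are
made up for illustration and are not kernel identifiers. The bit
positions are the ones given in the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field layout of the lower 32 address bits, as listed in the table above. */
#define EX_MSI_BASE_MASK   0xfff00000u  /* bits 31-20, must read 0xfee */
#define EX_MSI_BASE_VALUE  0xfee00000u
#define EX_MSI_DID_SHIFT   12           /* bits 19-12 */
#define EX_MSI_DID_MASK    0xffu
#define EX_MSI_RH_BIT      (1u << 3)    /* bit 3 */
#define EX_MSI_DM_BIT      (1u << 2)    /* bit 2 */

/* Hypothetical container for the decoded fields (not a kernel type). */
struct msi_addr_fields {
	uint8_t did;  /* Destination ID */
	bool rh;      /* Redirection Hint */
	bool dm;      /* Destination Mode: false = physical, true = logical */
};

/*
 * Split an MSI address into its fields.
 * Returns false if the address is not inside the 0xfee window.
 */
static bool msi_addr_decode(uint32_t addr, struct msi_addr_fields *out)
{
	assert(out != NULL);

	if ((addr & EX_MSI_BASE_MASK) != EX_MSI_BASE_VALUE)
		return false;

	out->did = (uint8_t)((addr >> EX_MSI_DID_SHIFT) & EX_MSI_DID_MASK);
	out->rh = (addr & EX_MSI_RH_BIT) != 0;
	out->dm = (addr & EX_MSI_DM_BIT) != 0;
	return true;
}
```

Feeding it 0xfee01004 gives DID = 0x01, RH = 0, DM = 1.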


Currently, irq_msi_compose_msg composes for the "flat" driver an address
like 0xfee0'1004 for a 64-bit single-core system without IO-APIC and MSI
remapping and no ACPI (a virtual system).
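
To show where that value comes from, here is a standalone sketch of the
composition step. The EX_* constants are illustrative stand-ins that
follow the SDM layout quoted above, not the kernel's own definitions,
and flat_logical_id() assumes the flat driver's logical ID for CPU n is
1 << n.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants following the SDM layout; hypothetical names. */
#define EX_MSI_ADDR_BASE_LO      0xfee00000u
#define EX_MSI_ADDR_DM_PHYSICAL  (0u << 2)
#define EX_MSI_ADDR_DM_LOGICAL   (1u << 2)
#define EX_MSI_ADDR_RH_OFF       (0u << 3)  /* a zero at bit 3 */
#define EX_MSI_ADDR_DID(id)      (((uint32_t)(id) & 0xffu) << 12)

/* Assumed flat-mode logical ID: one bit per CPU, so at most 8 CPUs. */
static uint8_t flat_logical_id(unsigned int cpu)
{
	assert(cpu < 8);
	return (uint8_t)(1u << cpu);
}

/* Mirrors the shape of the address_lo expression quoted earlier. */
static uint32_t compose_msi_addr_lo(int dest_mode_logical, uint8_t dest_id)
{
	return EX_MSI_ADDR_BASE_LO |
	       (dest_mode_logical ? EX_MSI_ADDR_DM_LOGICAL
				  : EX_MSI_ADDR_DM_PHYSICAL) |
	       EX_MSI_ADDR_RH_OFF |
	       EX_MSI_ADDR_DID(dest_id);
}
```

With logical mode and the logical ID of CPU0, compose_msi_addr_lo(1,
flat_logical_id(0)) yields 0xfee01004.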

That's incorrect because RH == 0 means the DID should show a local APIC
ID, but it shows a logical APIC ID for logical flat addressing (DM == 1,
DFR[31:28] == 0).
The LDR register is correctly set up as well, so the behavior is
consistent, but completely ignores the RH value.

The DID calculation producing the local APIC ID should be done by
"apic_default_calc_apicid", when the RH bit is not set.


That's my analysis I want to put up for discussion.

I hope to have included all necessary information on my setup, please
let me know if I missed something.


I don't have an overview of all affected parts in and around the APIC,
so I am currently not able to produce a patch (besides just changing
.calc_dest_apicid, which makes the "flat" driver inconsistent).


Cheers,
Philipp


p.s. I am on vacation the next three weeks starting Saturday, so forgive
me for not answering in the meantime.


Kernel config: x86_64_defconfig
+
CONFIG_KERNEL_XZ=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_CC_STACKPROTECTOR_REGULAR=y
CONFIG_PCI_MSI=y
CONFIG_OF=y
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=16384
CONFIG_VIRTIO_BLK=y
CONFIG_SERIO_RAW=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_VIRTIO_CONSOLE=y
CONFIG_HW_RANDOM_VIRTIO=y
CONFIG_VIRTIO_PCI=y
# CONFIG_VIRTIO_PCI_LEGACY is not set
CONFIG_VIRTIO_INPUT=y
CONFIG_EXT3_FS=y
CONFIG_TMPFS=y
CONFIG_MESSAGE_LOGLEVEL_DEFAULT=7
CONFIG_DEBUG_FS=y
CONFIG_STACKTRACE=y
CONFIG_MEMTEST=y


Virtual system:
x86-64 64-bit UP, 128MB RAM, no ACPI, no MSI remapping, no IO-APIC

--
[email protected] - Tel. 0351-41 883 221
http://www.kernkonzept.com


2018-09-07 19:13:45

by Thomas Gleixner

Subject: Re: x86/apic: MSI address malformed for "flat" driver

On Thu, 6 Sep 2018, Philipp Eppelt wrote:
>
> The "flat" driver defines the MSI addressing scheme to be used as
> logical addressing in flat mode. The MSI msg address is composed
> accordingly, but sets MSI_ADDR_REDIRECTION_CPU which is a zero at bit[3].

Correct. That's what it means:

* When RH is 0, the interrupt is directed to the processor listed in the
Destination ID field.

So for DM:

* If RH is 0, then the DM bit is ignored and the message is sent ahead
independent of whether the physical or logical destination mode is
used.

which means that the delivery does not do any magic redirections,
because the Redirection Hint is off. If RH is set, then the delivery can
redirect according to the rules in the DM section. We are not using that
because we want targeted single CPU delivery.

The interpretation of the DID field is purely depending on the local APIC
itself by matching the APIC ID against the DID field. And the local APIC ID
of CPU0 is 1 << 0, i.e. 0x1 which matches the MSI message you see.

> Currently, irq_msi_compose_msg composes for the "flat" driver an address
> like 0xfee0'1004 for a 64-bit single-core system without IO-APIC and MSI
> remapping and no ACPI (a virtual system).

The DM field is irrelevant if RH is 0. If RH is one and DM is 1 then you
can do group stuff and other magic, but we don't use that for 'external'
interrupts.

Where it _is_ used though is in the IPI delivery so that IPIs to multiple
CPUs require only a single APIC write, while with physical mode it's
necessary to write a single message to each CPU.
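
The IPI point can be illustrated by counting APIC writes for a given set
of target CPUs. This is a simplified model with hypothetical function
names, assuming a flat-mode mask where bit n stands for CPU n; it is not
the kernel's IPI code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of APIC writes needed to reach every CPU in cpu_mask
 * (bit n = CPU n, up to 8 CPUs in flat mode).
 */
static unsigned int ipi_writes_logical_flat(uint8_t cpu_mask)
{
	/* One write: the mask itself serves as the logical destination. */
	return cpu_mask ? 1u : 0u;
}

static unsigned int ipi_writes_physical(uint8_t cpu_mask)
{
	unsigned int writes = 0;

	/* One write per target CPU: clear the lowest set bit each round. */
	for (; cpu_mask; cpu_mask &= (uint8_t)(cpu_mask - 1))
		writes++;
	return writes;
}
```

For CPUs 0, 1 and 3 (mask 0x0b), logical flat mode needs one write and
physical mode needs three.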

Hope that helps.

Thanks,

tglx

2018-09-11 06:11:28

by Cyril Novikov

Subject: Re: x86/apic: MSI address malformed for "flat" driver

On 9/7/2018 12:11 PM, Thomas Gleixner wrote:
> On Thu, 6 Sep 2018, Philipp Eppelt wrote:
>>
>> The "flat" driver defines the MSI addressing scheme to be used as
>> logical addressing in flat mode. The MSI msg address is composed
>> accordingly, but sets MSI_ADDR_REDIRECTION_CPU which is a zero at bit[3].
>
> Correct. That's what it means:
>
> * When RH is 0, the interrupt is directed to the processor listed in the
> Destination ID field.
>
> So for DM:
>
> * If RH is 0, then the DM bit is ignored and the message is sent ahead
> independent of whether the physical or logical destination mode is
> used.
>
> which means that the delivery does not do any magic redirections,
> because the Redirection Hint is off. If RH is set, then the delivery can
> redirect according to the rules in the DM section. We are not using that
> because we want targeted single CPU delivery.
>
> The interpretation of the DID field is purely depending on the local APIC
> itself by matching the APIC ID against the DID field. And the local APIC ID
> of CPU0 is 1 << 0, i.e. 0x1 which matches the MSI message you see.

I believe you are wrong here and the local APIC ID of CPU0 is 0.

processor : 0
vendor_id : GenuineIntel
...
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0

The fact that the code works means that DM is not ignored when RH is 0.
In other words, RH=0 DM=1 means logical destination mode.

--
Cyril

2018-09-11 12:32:22

by Thomas Gleixner

Subject: Re: x86/apic: MSI address malformed for "flat" driver

On Mon, 10 Sep 2018, Cyril Novikov wrote:
> On 9/7/2018 12:11 PM, Thomas Gleixner wrote:
> > On Thu, 6 Sep 2018, Philipp Eppelt wrote:
> > >
> > > The "flat" driver defines the MSI addressing scheme to be used as
> > > logical addressing in flat mode. The MSI msg address is composed
> > > accordingly, but sets MSI_ADDR_REDIRECTION_CPU which is a zero at bit[3].
> >
> > Correct. That's what it means:
> >
> > * When RH is 0, the interrupt is directed to the processor listed in the
> > Destination ID field.
> >
> > So for DM:
> >
> > * If RH is 0, then the DM bit is ignored and the message is sent ahead
> > independent of whether the physical or logical destination mode is
> > used.
> >
> > which means that the delivery does not do any magic redirections,
> > because the Redirection Hint is off. If RH is set, then the delivery can
> > redirect according to the rules in the DM section. We are not using that
> > because we want targeted single CPU delivery.
> >
> > The interpretation of the DID field is purely depending on the local APIC
> > itself by matching the APIC ID against the DID field. And the local APIC ID
> > of CPU0 is 1 << 0, i.e. 0x1 which matches the MSI message you see.
>
> I believe you are wrong here and the local APIC ID of CPU0 is 0.
>
> processor : 0
> vendor_id : GenuineIntel
> ...
> physical id : 0
> siblings : 8
> core id : 0
> cpu cores : 4
> apicid : 0
>
> The fact that the code works means that DM is not ignored when RH is 0. In
> other words, RH=0 DM=1 means logical destination mode.

Sorry, I did not explain it very well. Let me try again.

* If RH is 0, then the DM bit is ignored and the message is sent ahead
independent of whether the physical or logical destination mode is
used.

The PCI device simply writes the message data to that address, it does not
even know what the individual bits mean. It's a write of data to address.

The write then gets directed to the APIC bus or the Processor System Bus
depending on the CPU by a translation unit. The translated message which
goes on the bus to which the APIC(s) are connected contains the DM bit
which is always evaluated by the local APICs for matching.

You can simply verify that by inverting the DM field. You probably get
completely malfunctioning interrupts or if you're lucky they are delivered
to the wrong CPU.

Why? Because the APIC has two match mechanisms.

If the message on the system/apic bus has DM = 0 then it matches
the Physical APIC ID, which you can see in /proc/cpuinfo.

If the message on the system/apic bus has DM = 1 then it matches the
Logical APIC ID which is stored in the LDR register. apic flat sets that
to 1 << CPUNr, i.e. 0x01 for CPU0.
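
The two match mechanisms can be sketched as a small model of one local
APIC. The struct and function names are invented for illustration, only
the flat logical model is covered, and the physical broadcast ID is left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one local APIC; field names are illustrative. */
struct lapic_model {
	uint8_t apic_id;     /* physical APIC ID, as shown in /proc/cpuinfo */
	uint8_t logical_id;  /* LDR[31:24]; apic flat uses 1 << cpu */
};

static struct lapic_model flat_lapic(unsigned int cpu, uint8_t apic_id)
{
	assert(cpu < 8);  /* flat mode has one logical bit per CPU */
	struct lapic_model l = {
		.apic_id = apic_id,
		.logical_id = (uint8_t)(1u << cpu),
	};
	return l;
}

/* Does this APIC accept a bus message with the given DM bit and destination? */
static bool lapic_accepts(const struct lapic_model *l, bool dm_logical,
			  uint8_t dest)
{
	if (dm_logical)
		return (dest & l->logical_id) != 0;  /* flat: AND with LDR */
	return dest == l->apic_id;  /* physical: exact ID match */
}
```

For CPU0 with APIC ID 0 and LDR 0x01, a message with destination 0x01 is
accepted when DM = 1 and rejected when DM = 0, which is why inverting DM
breaks delivery.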

If RH is set in the address then the translation unit tries to be smart
about the delivery, i.e. by directing it to the processor which has the
lowest interrupt priority. In logical mode it chooses ONE processor out of
the destination ID bits, i.e. the resulting message on the system/apic bus
contains only a single bit. Physical mode is single CPU destination anyway
so there is no real difference to RH=0.

If RH is not set then the logic translates the message without
modifications including the DM bit. If the destination ID would have more
than a single bit set, then the interrupt would be simultaneously delivered
to all CPUs which have a matching bit in the LDR. Not desired for device
interrupts, but the single CPU affinity of the vector allocation guarantees
that there is only one bit set. The kernel still uses multiple bits for
IPIs.
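
The difference between RH = 0 and RH = 1 in logical flat mode can be
modelled as follows. This is a sketch under assumptions: the function
name is made up, and prio[] is a hypothetical per-CPU priority value
(lower wins) standing in for whatever arbitration the hardware really
performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_FLAT 8  /* flat logical mode: one destination bit per CPU */

/*
 * Returns the bitmask of CPUs that receive a logical-mode message.
 * dest is the destination ID from the MSI address, bit n = CPU n.
 */
static uint8_t logical_flat_targets(uint8_t dest, bool rh,
				    const uint8_t prio[NR_CPUS_FLAT])
{
	if (!rh)
		return dest;  /* passed through: every CPU whose LDR bit is set */

	/* RH set: pick exactly one CPU out of the destination bits. */
	int best = -1;
	for (int cpu = 0; cpu < NR_CPUS_FLAT; cpu++) {
		if (!(dest & (1u << cpu)))
			continue;
		if (best < 0 || prio[cpu] < prio[best])
			best = cpu;
	}
	return best < 0 ? 0 : (uint8_t)(1u << best);
}
```

With a single bit set in dest, as the vector allocation guarantees for
device interrupts, both cases deliver to the same one CPU.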

Yes, we could switch APIC flat to use physical mode in the MSI and the
IOAPIC case, but I did not see a reason to do so.

Hope that clarifies it.

Out of curiosity: What kind of problem are you trying to solve?

Thanks,

tglx

2018-10-01 14:24:59

by Philipp Eppelt

Subject: Re: x86/apic: MSI address malformed for "flat" driver

On 09/11/2018 02:29 PM, Thomas Gleixner wrote:
> On Mon, 10 Sep 2018, Cyril Novikov wrote:
>> On 9/7/2018 12:11 PM, Thomas Gleixner wrote:
>>> On Thu, 6 Sep 2018, Philipp Eppelt wrote:
>>>>
>>>> The "flat" driver defines the MSI addressing scheme to be used as
>>>> logical addressing in flat mode. The MSI msg address is composed
>>>> accordingly, but sets MSI_ADDR_REDIRECTION_CPU which is a zero at bit[3].
>>>
>>> Correct. That's what it means:
>>>
>>> * When RH is 0, the interrupt is directed to the processor listed in the
>>> Destination ID field.
>>>
>>> So for DM:
>>>
>>> * If RH is 0, then the DM bit is ignored and the message is sent ahead
>>> independent of whether the physical or logical destination mode is
>>> used.
>>>
>>> which means that the delivery does not do any magic redirections,
>>> because the Redirection Hint is off. If RH is set, then the delivery can
>>> redirect according to the rules in the DM section. We are not using that
>>> because we want targeted single CPU delivery.
>>>
>>> The interpretation of the DID field is purely depending on the local APIC
>>> itself by matching the APIC ID against the DID field. And the local APIC ID
>>> of CPU0 is 1 << 0, i.e. 0x1 which matches the MSI message you see.
>>
>> I believe you are wrong here and the local APIC ID of CPU0 is 0.
>>
>> processor : 0
>> vendor_id : GenuineIntel
>> ...
>> physical id : 0
>> siblings : 8
>> core id : 0
>> cpu cores : 4
>> apicid : 0
>>
>> The fact that the code works means that DM is not ignored when RH is 0. In
>> other words, RH=0 DM=1 means logical destination mode.
>
> Sorry, I did not explain it very well. Let me try again.
>
> * If RH is 0, then the DM bit is ignored and the message is sent ahead
> independent of whether the physical or logical destination mode is
> used.
>
> The PCI device simply writes the message data to that address, it does not
> even know what the individual bits mean. It's a write of data to address.
>
> The write then gets directed to the APIC bus or the Processor System Bus
> depending on the CPU by a translation unit.

Ah, so there is a translation unit right before the apic/system bus
which translates the MSI address & data to the system bus format.
I missed that bit in the manual. I assumed a connection between RH and
DM when the APIC interprets the message.


> If RH is not set then the logic translates the message without
> modifications including the DM bit.

So even with RH=0, the DM bit IS interpreted by the local APIC in the end,
because the DM bit is part of the system bus message format? Can you point me to
some documentation on the translated message format? I guess it is
similar to the local APIC's Interrupt Command Register?

>
> Hope that clarifies it.

Yes, thank you very much for the additional explanations.

>
> Out of curiosity: What kind of problem are you trying to solve?

I am in the process of writing an x86_64 VMM for the L4Re OS with Linux
as a guest and am working on the PCI subsystem/MSI handling of the VMM
for virtual devices.

Cheers,
Philipp