vSMP Foundation provides locality-based interrupt routing, which
requires overriding vector_allocation_domain so that every online CPU
can handle every possible vector.

Enforcing Interrupt Routing Comply (IRC) mode requires us to unplug
this hook, as otherwise the IOAPIC, MSI and MSI-X destination
selectors always select the lowest online CPU as the destination,
i.e. the affinity of HW interrupts cannot be controlled by kernel
and/or userspace code.

The purpose of the patch is to override the vector allocation domain
only when IRC is set to "ignore", so that in "comply" mode the kernel
and userspace can effectively control the destination of the HW
interrupts.

Signed-off-by: Oren Twaig <[email protected]>
Acked-by: Shai Fultheim <[email protected]>
---
arch/x86/kernel/vsmp_64.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/vsmp_64.c b/arch/x86/kernel/vsmp_64.c
index f6584a9..b7f8e5b 100644
--- a/arch/x86/kernel/vsmp_64.c
+++ b/arch/x86/kernel/vsmp_64.c
@@ -26,6 +26,8 @@
 
 #define TOPOLOGY_REGISTER_OFFSET 0x10
 
+static int irc = 1;
+
 #if defined CONFIG_PCI && defined CONFIG_PARAVIRT
 /*
  * Interrupt control on vSMPowered systems:
@@ -101,6 +103,10 @@ static void __init set_vsmp_pv_ops(void)
 #ifdef CONFIG_SMP
 	if (cap & ctl & BIT(8)) {
 		ctl &= ~BIT(8);
+
+		/* Interrupt routing set to ignore */
+		irc = 0;
+
 #ifdef CONFIG_PROC_FS
 		/* Don't let users change irq affinity via procfs */
 		no_irq_affinity = 1;
@@ -218,7 +224,9 @@ static void vsmp_apic_post_init(void)
 {
 	/* need to update phys_pkg_id */
 	apic->phys_pkg_id = apicid_phys_pkg_id;
-	apic->vector_allocation_domain = fill_vector_allocation_domain;
+
+	if (!irc)
+		apic->vector_allocation_domain = fill_vector_allocation_domain;
 }
 
 void __init vsmp_init(void)
* Oren Twaig <[email protected]> wrote:
> vSMP Foundation provides locality-based interrupt routing, which
> requires overriding vector_allocation_domain so that every online CPU
> can handle every possible vector.
>
> Enforcing Interrupt Routing Comply (IRC) mode requires us to unplug
> this hook, as otherwise the IOAPIC, MSI and MSI-X destination
> selectors always select the lowest online CPU as the destination,
> i.e. the affinity of HW interrupts cannot be controlled by kernel
> and/or userspace code.
>
> The purpose of the patch is to override the vector allocation domain
> only when IRC is set to "ignore", so that in "comply" mode the kernel
> and userspace can effectively control the destination of the HW
> interrupts.
>
> Signed-off-by: Oren Twaig <[email protected]>
> Acked-by: Shai Fultheim <[email protected]>
So what was the behavior before the change - certain IRQs did not get
routed, they just ended up on CPU0 or on some other undesirable CPU?
Or was IRQ distribution random? It's not clear from the changelog.
Thanks,
Ingo
On 4/25/2014 11:01 AM, Ingo Molnar wrote:
>
> * Oren Twaig <[email protected]> wrote:
>
>> vSMP Foundation provides locality-based interrupt routing, which
>> requires overriding vector_allocation_domain so that every online CPU
>> can handle every possible vector.
>>
>> Enforcing Interrupt Routing Comply (IRC) mode requires us to unplug
>> this hook, as otherwise the IOAPIC, MSI and MSI-X destination
>> selectors always select the lowest online CPU as the destination,
>> i.e. the affinity of HW interrupts cannot be controlled by kernel
>> and/or userspace code.
>>
>> The purpose of the patch is to override the vector allocation domain
>> only when IRC is set to "ignore", so that in "comply" mode the kernel
>> and userspace can effectively control the destination of the HW
>> interrupts.
>>
>> Signed-off-by: Oren Twaig <[email protected]>
>> Acked-by: Shai Fultheim <[email protected]>
>
> So what was the behavior before the change - certain IRQs did not get
> routed, they just ended up on CPU0 or on some other undesirable CPU?
> Or was IRQ distribution random? It's not clear from the changelog.
It all depends on the IRC flag. When the Linux kernel sets it to
"ignore", vSMP Foundation knows that it can deliver the IRQ to
whichever CPU results in the least virtualization overhead. For
example, we can deliver the HW interrupt to the CPU which received it
or to any other CPU in the system. We could not do this without the
kernel making sure that each vector can be delivered to all CPUs. This
is why we override the vector allocation domain to include all CPUs.

But when IRC is set to "comply", before this patch we still affected
the allocation domains although it wasn't needed. It wasn't needed
because in "comply" mode we always pass the HW interrupt to the CPU
the kernel requested (by setting the IOAPIC entry, MSI/MSI-X entry or
IR entry).

Thanks,
Oren
>
> Thanks,
>
> Ingo
> diff --git a/arch/x86/kernel/vsmp_64.c b/arch/x86/kernel/vsmp_64.c
> index f6584a9..b7f8e5b 100644
> --- a/arch/x86/kernel/vsmp_64.c
> +++ b/arch/x86/kernel/vsmp_64.c
> @@ -26,6 +26,8 @@
>
> #define TOPOLOGY_REGISTER_OFFSET 0x10
>
> +static int irc = 1;
Using a static for such state is very unusual. You need to describe what
protects it against races and why that is needed over a cleaner solution.
-Andi
* Oren Twaig <[email protected]> wrote:
> On 4/25/2014 11:01 AM, Ingo Molnar wrote:
> >
> > * Oren Twaig <[email protected]> wrote:
> >
> >> vSMP Foundation provides locality-based interrupt routing, which
> >> requires overriding vector_allocation_domain so that every online CPU
> >> can handle every possible vector.
> >>
> >> Enforcing Interrupt Routing Comply (IRC) mode requires us to unplug
> >> this hook, as otherwise the IOAPIC, MSI and MSI-X destination
> >> selectors always select the lowest online CPU as the destination,
> >> i.e. the affinity of HW interrupts cannot be controlled by kernel
> >> and/or userspace code.
> >>
> >> The purpose of the patch is to override the vector allocation domain
> >> only when IRC is set to "ignore", so that in "comply" mode the kernel
> >> and userspace can effectively control the destination of the HW
> >> interrupts.
> >>
> >> Signed-off-by: Oren Twaig <[email protected]>
> >> Acked-by: Shai Fultheim <[email protected]>
> >
> > So what was the behavior before the change - certain IRQs did not get
> > routed, they just ended up on CPU0 or on some other undesirable CPU?
> > Or was IRQ distribution random? It's not clear from the changelog.
>
> It all depends on the IRC flag. When the Linux kernel sets it to
> "ignore", vSMP Foundation knows that it can deliver the IRQ to
> whichever CPU results in the least virtualization overhead. For
> example, we can deliver the HW interrupt to the CPU which received it
> or to any other CPU in the system. We could not do this without the
> kernel making sure that each vector can be delivered to all CPUs. This
> is why we override the vector allocation domain to include all CPUs.
>
> But when IRC is set to "comply", before this patch we still affected
> the allocation domains although it wasn't needed. It wasn't needed
> because in "comply" mode we always pass the HW interrupt to the CPU
> the kernel requested (by setting the IOAPIC entry, MSI/MSI-X entry or
> IR entry).
I still don't see a clear explanation of what the _user_ saw and sees
before and after the change. What is the effect of the patch: correct
IRQ routing (i.e. before the change IRQs would end up on the wrong
CPU), lower overhead IRQ routing (i.e. before the change IRQ routing
overhead was more expensive), or something else?
You don't spell this out clearly and it's a crucial piece of
information that comes before every other explanation.
Thanks,
Ingo
Hi Ingo,
On 04/26/2014 09:09 AM, Ingo Molnar wrote:
> I still don't see a clear explanation of what the _user_ saw and sees
> before and after the change. What is the effect of the patch: correct
> IRQ routing (i.e. before the change IRQs would end up on the wrong
> CPU), lower overhead IRQ routing (i.e. before the change IRQ routing
> overhead was more expensive), or something else?
>
> You don't spell this out clearly and it's a crucial piece of
> information that comes before every other explanation.
>
I see. I tried to explain the entire flow and that was confusing, so
I'll explain only the patch.

As you stated, in general, the patch corrects IRQ routing in case a
vSMP Foundation box is detected but the Interrupt Routing Comply (IRC)
is set to "comply".

Before the patch:
When a vSMP Foundation box was detected and IRC was set to "comply",
users (and the kernel) couldn't effectively set the destination of the
IRQs. This is because the hook inside vsmp_64.c always set up all CPUs
as the IRQ destination, using cpumask_setall() to produce the IRQ
allocation mask. Later, this overridden mask caused the kernel to set
the IRQ destination to the lowest online CPU in the mask (usually
CPU0).

After the patch:
When IRC is set to "comply", users (and the kernel) can control the
destination of the IRQs, as we no longer change the default
"apic->vector_allocation_domain".

Thanks,
Oren
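
The before/after behaviour described above can be modelled in a few
lines of userspace C. This is only a toy sketch under stated
assumptions, not kernel code: the 64-bit mask_t type and the functions
fill_domain(), single_cpu_domain() and pick_destination() are
hypothetical stand-ins, and the sketch assumes that the default domain
contains just the requested CPU and that the destination is the lowest
online CPU in the returned mask.

```c
#include <assert.h>
#include <stdint.h>

/* Toy CPU mask: bit n set means CPU n is in the mask (hypothetical type). */
typedef uint64_t mask_t;

/* Shape of a vector allocation domain hook in this model. */
typedef void (*alloc_domain_fn)(int cpu, mask_t *retmask);

/* Models the vSMP override: every CPU is allowed to take the vector. */
static void fill_domain(int cpu, mask_t *retmask)
{
	(void)cpu;
	*retmask = ~(mask_t)0;
}

/* Models a default domain (assumption): only the requested CPU. */
static void single_cpu_domain(int cpu, mask_t *retmask)
{
	*retmask = (mask_t)1 << cpu;
}

/*
 * Pick the interrupt destination as the lowest online CPU in the
 * domain returned by the hook. Returns -1 if no CPU qualifies.
 */
static int pick_destination(alloc_domain_fn domain, int requested_cpu,
			    mask_t online)
{
	mask_t mask;
	int cpu;

	assert(requested_cpu >= 0 && requested_cpu < 64);
	domain(requested_cpu, &mask);
	mask &= online;

	for (cpu = 0; cpu < 64; cpu++)
		if (mask & ((mask_t)1 << cpu))
			return cpu;
	return -1;
}
```

With CPUs 0-7 online and CPU 5 requested, the fill_domain() hook makes
the model pick CPU 0, while single_cpu_domain() yields CPU 5, which
mirrors the "comply" behaviour before and after the patch.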
> Thanks,
>
> Ingo
Hi Andi,
On 04/25/2014 05:22 PM, Andi Kleen wrote:
>> +static int irc = 1;
> Using a static for such state is very unusual. You need to describe what
> protects it against races and why that is needed over a cleaner solution.
The only reason I've used a static variable is that I wanted to avoid
adding more code/functions that depend on CONFIG_PCI. The code is used
once during initialization and hence cannot be racy.
But if static variables are unusual (new to the Linux kernel), I will
change the flow to read the HW state again (using the PCI functions).
Please let me know if that is desirable.
Thanks,
Oren.
>
> -Andi
* Oren Twaig <[email protected]> wrote:
> Hi Ingo,
>
> On 04/26/2014 09:09 AM, Ingo Molnar wrote:
> > I still don't see a clear explanation of what the _user_ saw and sees
> > before and after the change. What is the effect of the patch: correct
> > IRQ routing (i.e. before the change IRQs would end up on the wrong
> > CPU), lower overhead IRQ routing (i.e. before the change IRQ routing
> > overhead was more expensive), or something else?
> >
> > You don't spell this out clearly and it's a crucial piece of
> > information that comes before every other explanation.
> >
> I see. I tried to explain the entire flow and that was confusing, so
> I'll explain only the patch.
>
> As you stated, in general, the patch corrects IRQ routing in case a
> vSMP Foundation box is detected but the Interrupt Routing Comply (IRC)
> is set to "comply".
>
> Before the patch:
> When a vSMP Foundation box was detected and IRC was set to "comply",
> users (and the kernel) couldn't effectively set the destination of the
> IRQs. This is because the hook inside vsmp_64.c always set up all CPUs
> as the IRQ destination, using cpumask_setall() to produce the IRQ
> allocation mask. Later, this overridden mask caused the kernel to set
> the IRQ destination to the lowest online CPU in the mask (usually
> CPU0).
>
> After the patch:
> When IRC is set to "comply", users (and the kernel) can control the
> destination of the IRQs, as we no longer change the default
> "apic->vector_allocation_domain".
Much better, thanks!
Ingo
* Oren Twaig <[email protected]> wrote:
> Hi Andi,
>
> On 04/25/2014 05:22 PM, Andi Kleen wrote:
> >> +static int irc = 1;
> >
> > Using a static for such state is very unusual. You need to
> > describe what protects it against races and why that is needed
> > over a cleaner solution.
>
> The only reason I've used a static variable is that I wanted to
> avoid adding more code/functions that depend on CONFIG_PCI. The
> code is used once during initialization and hence cannot be racy.
>
> But if static variables are unusual (new to the Linux kernel), [...]
They aren't unusual at all - Andi Kleen is known to troll x86
discussions time and again with random input, just ignore it when you
get bad advice.
> [...] I will change the flow to read the HW state again (using the
> PCI functions). Please let me know if that is desirable.
No, being slower is not desirable.
Maybe name the flag in a clearer fashion (the term 'irc' is used for
something entirely different, most of the time), i.e. make sure it's
very obvious that it's a set-once init flag.
Thanks,
Ingo
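
One possible shape for such a flag, modelled in plain userspace C so it
can be compiled on its own. This is a sketch under assumptions, not the
posted patch: the name irq_routing_comply, the struct apic_model type
and both helper functions are hypothetical. The only parts taken from
the patch are the default of "comply" and the bit 8 test on cap and ctl
that selects "ignore".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Set-once init flag (hypothetical name; the posted patch calls it
 * "irc"). Defaults to "comply" and is only ever cleared during init.
 */
static bool irq_routing_comply = true;

/* Minimal stand-in for the APIC driver state touched by the patch. */
struct apic_model {
	bool domain_overridden;
};

/* Models the check in set_vsmp_pv_ops(): bit 8 in cap and ctl means "ignore". */
static void detect_irq_routing(unsigned int cap, unsigned int ctl)
{
	if (cap & ctl & (1u << 8))
		irq_routing_comply = false;
}

/* Models vsmp_apic_post_init(): override the domain only in "ignore" mode. */
static void apic_post_init_model(struct apic_model *apic)
{
	apic->domain_overridden = !irq_routing_comply;
}
```

A name that spells out the mode reads unambiguously at the point of use,
and a single writer during init keeps the set-once nature easy to see.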
On Sun, Apr 27, 2014 at 09:57:59AM +0300, Oren Twaig wrote:
> Hi Andi,
>
> On 04/25/2014 05:22 PM, Andi Kleen wrote:
> >> +static int irc = 1;
> > Using a static for such state is very unusual. You need to describe what
> > protects it against races and why that is needed over a cleaner solution.
>
> The only reason I've used a static variable is that I wanted to avoid
> adding more code/functions that depend on CONFIG_PCI. The code is used
> once during initialization and hence cannot be racy.
Again, what lock protects it?
If you cannot answer that question you likely shouldn't use static.
-Andi
Hi Andi,
On 04/27/2014 09:34 PM, Andi Kleen wrote:
> Again, what lock protects it?
>
> If you cannot answer that question you likely shouldn't use static.
The only function which touches this variable is vsmp_init(), which is an
"__init" function that is guaranteed to be run by a single CPU. This means
there is no race.
Thanks,
Oren
>
> -Andi