2006-10-15 07:29:57

by Yinghai Lu

[permalink] [raw]
Subject: [PATCH] x86_64: typo in __assign_irq_vector when update pos for vector and offset

typo with cpu instead of new_cpu when update the pos struct.

Signed-off-by: Yinghai Lu <[email protected]>

diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index 44b55f8..3857440 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -655,8 +655,8 @@ next:
goto next;
/* Found one! */
for_each_cpu_mask(new_cpu, domain) {
- pos[cpu].vector = vector;
- pos[cpu].offset = offset;
+ pos[new_cpu].vector = vector;
+ pos[new_cpu].offset = offset;
}
if (old_vector >= 0) {
int old_cpu;


2006-10-15 19:21:36

by Yinghai Lu

[permalink] [raw]
Subject: re: [PATCH] x86_64: typo in __assign_irq_vector when update pos for vector and offset

Please use this one

typo with cpu instead of new_cpu

Signed-off-by: Yinghai Lu <[email protected]>

diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index 44b55f8..756d097 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -651,12 +651,12 @@ next:
if (vector == IA32_SYSCALL_VECTOR)
goto next;
for_each_cpu_mask(new_cpu, domain)
- if (per_cpu(vector_irq, cpu)[vector] != -1)
+ if (per_cpu(vector_irq, new_cpu)[vector] != -1)
goto next;
/* Found one! */
for_each_cpu_mask(new_cpu, domain) {
- pos[cpu].vector = vector;
- pos[cpu].offset = offset;
+ pos[new_cpu].vector = vector;
+ pos[new_cpu].offset = offset;
}
if (old_vector >= 0) {
int old_cpu;


Attachments:
(No filename) (0.99 kB)
io_apic_x.diff (644.00 B)
Download all attachments

2006-10-16 15:26:13

by Lu, Yinghai

[permalink] [raw]
Subject: Fwd: [PATCH] x86_64: typo in __assign_irq_vector when update pos for vector and offset

Please check the patch.

Also I have a question about TARGET_CPUS in io_apic.c.

for a 16 sockets system with 32 non coherent ht chain. and if every
chain have 8 irq for devices, the genapic will use physflat. and it
should use you new added "different cpu can have same vector for
different irq". --- in the i8259.c

but the setup_IOAPIC_irqs and arch_setup_ht_irq and arch_setup_msi_irq
is still using TARGET_CPUS ( it is cpumask_of_cpu(0) for physflat), so
the assign_irq_vector will not get vector for them, becase cpu 0 does
not have that much vector to be alllocated. and later
setup_affinity_xxx_irq can not be used because before irq is not there
and show on /proc/interrupts.

So I want to
1. for arch_setup_ht_irq and arch_setup_msi_irq, we can use the dev it
takes to get bus and use bus->sysdata to get bus->node mapping that is
created in fillin_cpumask_to_bus, to get real target_cpus instead of
cpu0.
2. for ioapics, may need to add another array,
ioapic_node[MAX_IOAPICS], and use ioapic address to get the numa node
for it. So later can use it to get real targets cpus when need to use
TARGET_CPUS.

Please comments.

Thanks

Yinghai Lu





---------- Forwarded message ----------
From: yhlu <[email protected]>
Date: Oct 15, 2006 12:21 PM
Subject: re: [PATCH] x86_64: typo in __assign_irq_vector when update
pos for vector and offset
To: "Eric W. Biederman" <[email protected]>, Andi Kleen <[email protected]>
Cc: linux kernel mailing list <[email protected]>,
[email protected]


Please use this one

typo with cpu instead of new_cpu

Signed-off-by: Yinghai Lu <[email protected]>

diff --git a/arch/x86_64/kernel/io_apic.c b/arch/x86_64/kernel/io_apic.c
index 44b55f8..756d097 100644
--- a/arch/x86_64/kernel/io_apic.c
+++ b/arch/x86_64/kernel/io_apic.c
@@ -651,12 +651,12 @@ next:
if (vector == IA32_SYSCALL_VECTOR)
goto next;
for_each_cpu_mask(new_cpu, domain)
- if (per_cpu(vector_irq, cpu)[vector] != -1)
+ if (per_cpu(vector_irq, new_cpu)[vector] != -1)
goto next;
/* Found one! */
for_each_cpu_mask(new_cpu, domain) {
- pos[cpu].vector = vector;
- pos[cpu].offset = offset;
+ pos[new_cpu].vector = vector;
+ pos[new_cpu].offset = offset;
}
if (old_vector >= 0) {
int old_cpu;


Attachments:
(No filename) (2.46 kB)
io_apic_x.diff (644.00 B)
Download all attachments

2006-10-16 18:07:45

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Fwd: [PATCH] x86_64: typo in __assign_irq_vector when update pos for vector and offset

"Yinghai Lu" <[email protected]> writes:

> Please check the patch.

It looks good. I think I was having a bad day when I introduced those thinkos.
I'm a bit distracted at the moment so I don't trust myself to forward
the patch on. Hopefully later today.

> Also I have a question about TARGET_CPUS in io_apic.c.
>
> for a 16 sockets system with 32 non coherent ht chain. and if every
> chain have 8 irq for devices, the genapic will use physflat. and it
> should use you new added "different cpu can have same vector for
> different irq". --- in the i8259.c
>
> but the setup_IOAPIC_irqs and arch_setup_ht_irq and arch_setup_msi_irq
> is still using TARGET_CPUS ( it is cpumask_of_cpu(0) for physflat), so
> the assign_irq_vector will not get vector for them, becase cpu 0 does
> not have that much vector to be alllocated. and later
> setup_affinity_xxx_irq can not be used because before irq is not there
> and show on /proc/interrupts.

Yep, that is a bug. I hadn't realized real systems that exhausted the
number of vectors were quite so close. That totally explains your
interest. That is one I/O heavy development system you have YH.

Unless there is a reason not to, TARGET_CPUS should be cpu_online map.
I don't know why it is restricted to a single cpu. I'm curious how
that would affect the user space irq balancer.

> So I want to
> 1. for arch_setup_ht_irq and arch_setup_msi_irq, we can use the dev it
> takes to get bus and use bus->sysdata to get bus->node mapping that is
> created in fillin_cpumask_to_bus, to get real target_cpus instead of
> cpu0.

That sounds like a very reasonable approach.

> 2. for ioapics, may need to add another array,
> ioapic_node[MAX_IOAPICS], and use ioapic address to get the numa node
> for it. So later can use it to get real targets cpus when need to use
> TARGET_CPUS.

Yes. Numa affinity for the ioapics seems to make sense. I don't
think we can count on looking at the address though? Are your
ioapics pci devices? If so that would be the easiest way to
deal with this problem.

> Please comments.

So far that is the job of the user space irqbalancer, but I don't see
why the kernel can't setup some reasonable defaults, if we have all of
the information needed.

For 2.6.19 we should be able to get my typos fixed, and probably
the default mask increased so that we are given a choice of something
other than cpu 0.

Beyond that it is going to take some additional working and thinking
and so it probably makes sense to have the code sit in the -mm
or Andi's tree for a while, and let it mature for 2.6.20.

Eric

2006-10-16 18:28:00

by Lu, Yinghai

[permalink] [raw]
Subject: RE: Fwd: [PATCH] x86_64: typo in __assign_irq_vector when update pos for vector and offset

With phys_flat mode, the apic will be delivered in phys mode, we only
can use cpu real apic id as target instead of apicid mask. Because that
only has 8 bits.

For io apic controllers, it seems the kernel didn't have pci_dev
corresponding, and we can use address stored in mpc_config.

YH


2006-10-16 18:35:45

by Andi Kleen

[permalink] [raw]
Subject: Re: Fwd: [PATCH] x86_64: typo in __assign_irq_vector when update pos for vector and offset

On Mon, Oct 16, 2006 at 12:05:35PM -0600, Eric W. Biederman wrote:
> For 2.6.19 we should be able to get my typos fixed, and probably
> the default mask increased so that we are given a choice of something
> other than cpu 0.
>
> Beyond that it is going to take some additional working and thinking
> and so it probably makes sense to have the code sit in the -mm
> or Andi's tree for a while, and let it mature for 2.6.20.

I admit I lost track of the patches for this new code which went
in while I was away.

Is that the only patch needed or are there other known problems too?

-Andi

2006-10-16 18:57:29

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Fwd: [PATCH] x86_64: typo in __assign_irq_vector when update pos for vector and offset

Andi Kleen <[email protected]> writes:

> On Mon, Oct 16, 2006 at 12:05:35PM -0600, Eric W. Biederman wrote:
>> For 2.6.19 we should be able to get my typos fixed, and probably
>> the default mask increased so that we are given a choice of something
>> other than cpu 0.
>>
>> Beyond that it is going to take some additional working and thinking
>> and so it probably makes sense to have the code sit in the -mm
>> or Andi's tree for a while, and let it mature for 2.6.20.
>
> I admit I lost track of the patches for this new code which went
> in while I was away.
>
> Is that the only patch needed or are there other known problems too?

The story.

The code went from -mm to -linus and everything was going smoothly
until we found a hyperthreaded cpu that in lowest priority delivery
mode would delivery mode would deliver irqs to hyperthreads even
if they were not in the mask.

Because of that and because Linus really likes lowest priority delivery
mode I submitted a patch that generalized the vector allocator to be
able to work on multiple cpus.

The patch was good except I missed one spot in retrigger irqs,
that YH caught, and I while the logic was fine in the vector allocator
I had several cases of using the wrong variable in the vector allocator.

Grumble! I made the vector allocator too general....

The one thinko that really affects people I fixed. There are two more
cases that YH caught this weekend, that are still to be pushed. Plus
there is the case that YH just caught that TARGET_CPUS being only a single
cpu is a real problem if you have more than 240 distinct irq sources.

So all that is pending are those two thinko fixes from YH. The first
that fixes the retrigger irqs is a real bug fix. The second that started
this thread is clearly the right thing to do, but I don't think there
are actually any real side effects from it.

Given how many thinkos I made last time I attacked this I'm not wanting
to do anything until I know I am thinking clearly :)

Eric

2006-10-16 19:06:28

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Fwd: [PATCH] x86_64: typo in __assign_irq_vector when update pos for vector and offset

"Lu, Yinghai" <[email protected]> writes:

> With phys_flat mode, the apic will be delivered in phys mode, we only
> can use cpu real apic id as target instead of apicid mask. Because that
> only has 8 bits.

Yes but the linux abstraction is a cpu mask. The current vector allocator
will keep going until it finds a cpu with a free vector if you give it
a mask with multiple cpus.

So to get things going making TARGET_CPUS cpu_online_map looks like
the right thing to do.

> For io apic controllers, it seems the kernel didn't have pci_dev
> corresponding, and we can use address stored in mpc_config.

My question is are your io_apics pci devices? Not does the kernel
have them.

So the truth is we really don't care about where the io_apics are. We
care about the source of the irqs, but in general they will all
be on the same NUMA node. As for using the addresses that doesn't feel
quite right as it doesn't sound like a general solution.

There are a lot of ways we can approach assigning irqs to cpus and there
is a lot of work there. I think Adrian Bunk has been doing some work
with the user space irq balancer, and should probably be involved.

Anyway this is all 2.6.20+ work to get the kernel to have a sane default.
As soon as 2.6.19 is solid I will worry about the future.

Eric

2006-10-16 20:06:25

by Lu, Yinghai

[permalink] [raw]
Subject: RE: Fwd: [PATCH] x86_64: typo in __assign_irq_vector when update pos for vector and offset

>So to get things going making TARGET_CPUS cpu_online_map looks like
>the right thing to do.

Yes. but need to other reference to TARGET_CPUS to verify...it doesn't
break sth.

>My question is are your io_apics pci devices? Not does the kernel
>have them.

Yes, I'm testing with 32 amd8132 in the simulator. Or forget about about
ioapic, and use MSI, and HT-irq directly...?

>There are a lot of ways we can approach assigning irqs to cpus and
there
>is a lot of work there. I think Adrian Bunk has been doing some work
>with the user space irq balancer, and should probably be involved.

Right. We need only do needed in kernel space, and leave most to irq
balancer.

YH





2006-10-17 17:57:40

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Fwd: [PATCH] x86_64: typo in __assign_irq_vector when update pos for vector and offset

"Lu, Yinghai" <[email protected]> writes:

>>So to get things going making TARGET_CPUS cpu_online_map looks like
>>the right thing to do.
>
> Yes. but need to other reference to TARGET_CPUS to verify...it doesn't
> break sth.

I just looked and tested and we are fine.

>>My question is are your io_apics pci devices? Not does the kernel
>>have them.
>
> Yes, I'm testing with 32 amd8132 in the simulator. Or forget about about
> ioapic, and use MSI, and HT-irq directly...?

Ok. So if want a pci device we can have one :)
Usually what I have seen is that all io_apics except the
first one show up as pci devices.

>>There are a lot of ways we can approach assigning irqs to cpus and
> there
>>is a lot of work there. I think Adrian Bunk has been doing some work
>>with the user space irq balancer, and should probably be involved.
>
> Right. We need only do needed in kernel space, and leave most to irq
> balancer.

Actually I was just being pragmatic. Make it work now. Make it optimal
later :)

Eric

2006-10-17 18:05:41

by Lu, Yinghai

[permalink] [raw]
Subject: RE: Fwd: [PATCH] x86_64: typo in __assign_irq_vector when update pos for vector and offset

From: [email protected] [mailto:[email protected]]
>I just looked and tested and we are fine.

Yes. my test is ok, We can change cpumask_of_cpu(0) in
physflat_target_cpus to cpu_online_map. Or just use flat_target_cpus
instead of physflat_target_cpus.

YH