2015-12-11 07:49:51

by kernel test robot

[permalink] [raw]
Subject: [lkp] [x86/irq] 4c24cee6b2: IP-Config: Auto-configuration of network failed

FYI, we noticed the below changes on

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/urgent
commit 4c24cee6b2aeaee3dab896f76fef4fe79d9e4183 ("x86/irq: Enhance __assign_irq_vector() to rollback in case of failure")


+------------------------------------------------+------------+------------+
| | 6dd7cb991f | 4c24cee6b2 |
+------------------------------------------------+------------+------------+
| boot_successes | 6 | 0 |
| boot_failures | 0 | 8 |
| IP-Config:Auto-configuration_of_network_failed | 0 | 6 |
| BUG:kernel_boot_hang | 0 | 2 |
+------------------------------------------------+------------+------------+

It appears that the Ethernet card doesn't work properly after your patch.

[ 15.342990] Waiting up to 110 more seconds for network.
[ 25.346987] Waiting up to 100 more seconds for network.
[ 35.350995] Waiting up to 90 more seconds for network.
[ 45.350993] Waiting up to 80 more seconds for network.
[ 55.351006] Waiting up to 70 more seconds for network.
[ 65.350992] Waiting up to 60 more seconds for network.
[ 75.355017] Waiting up to 50 more seconds for network.
[ 85.359009] Waiting up to 40 more seconds for network.
[ 95.363009] Waiting up to 30 more seconds for network.
[ 305.883015] Waiting up to 20 more seconds for network.
[ 315.887002] Waiting up to 10 more seconds for network.
[ 325.887524] Sending DHCP requests ...... timed out!
[ 417.893036] IP-Config: Auto-configuration of network failed
[ 417.893852] ALSA device list:
[ 417.894270] No soundcards found.
[ 417.899649] Freeing unused kernel memory: 2884K (ffffffff82574000 - ffffffff82845000)


Thanks,
Ying Huang


Attachments:
(No filename) (1.82 kB)
config-4.4.0-rc4-00088-g4c24cee (109.75 kB)
dmesg.xz (11.52 kB)
Download all attachments

2015-12-14 06:38:09

by Jiang Liu

[permalink] [raw]
Subject: Re: [lkp] [x86/irq] 4c24cee6b2: IP-Config: Auto-configuration of network failed

Hi Ying,
Thanks for reporting this issue. But I couldn't figure
out what's wrong with this commit. And there's no error or
warning messages in the attached dmesg file. Are there other
systems reporting the same issue?
Thanks,
Gerry

On 2015/12/11 15:49, kernel test robot wrote:
> FYI, we noticed the below changes on
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/urgent
> commit 4c24cee6b2aeaee3dab896f76fef4fe79d9e4183 ("x86/irq: Enhance __assign_irq_vector() to rollback in case of failure")
>
>
> +------------------------------------------------+------------+------------+
> | | 6dd7cb991f | 4c24cee6b2 |
> +------------------------------------------------+------------+------------+
> | boot_successes | 6 | 0 |
> | boot_failures | 0 | 8 |
> | IP-Config:Auto-configuration_of_network_failed | 0 | 6 |
> | BUG:kernel_boot_hang | 0 | 2 |
> +------------------------------------------------+------------+------------+
>
> It appears that the Ethernet card doesn't work properly after your patch.
>
> [ 15.342990] Waiting up to 110 more seconds for network.
> [ 25.346987] Waiting up to 100 more seconds for network.
> [ 35.350995] Waiting up to 90 more seconds for network.
> [ 45.350993] Waiting up to 80 more seconds for network.
> [ 55.351006] Waiting up to 70 more seconds for network.
> [ 65.350992] Waiting up to 60 more seconds for network.
> [ 75.355017] Waiting up to 50 more seconds for network.
> [ 85.359009] Waiting up to 40 more seconds for network.
> [ 95.363009] Waiting up to 30 more seconds for network.
> [ 305.883015] Waiting up to 20 more seconds for network.
> [ 315.887002] Waiting up to 10 more seconds for network.
> [ 325.887524] Sending DHCP requests ...... timed out!
> [ 417.893036] IP-Config: Auto-configuration of network failed
> [ 417.893852] ALSA device list:
> [ 417.894270] No soundcards found.
> [ 417.899649] Freeing unused kernel memory: 2884K (ffffffff82574000 - ffffffff82845000)
>
>
> Thanks,
> Ying Huang
>

2015-12-14 06:54:07

by Huang, Ying

[permalink] [raw]
Subject: Re: [LKP] [lkp] [x86/irq] 4c24cee6b2: IP-Config: Auto-configuration of network failed

Hi, Jiang,

Jiang Liu <[email protected]> writes:

> Hi Ying,
> Thanks for reporting this issue. But I couldn't figure
> out what's wrong with this commit. And there's no error or
> warning messages in the attached dmesg file. Are there other
> systems reporting the same issue?

No, there are no other systems reporting the same issue. I will queue
more tests for make sure this is not a false positive.

Best Regards,
Huang, Ying

> Thanks,
> Gerry
>
> On 2015/12/11 15:49, kernel test robot wrote:
>> FYI, we noticed the below changes on
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/urgent
>> commit 4c24cee6b2aeaee3dab896f76fef4fe79d9e4183 ("x86/irq: Enhance
>> __assign_irq_vector() to rollback in case of failure")
>>
>>
>> +------------------------------------------------+------------+------------+
>> | | 6dd7cb991f | 4c24cee6b2 |
>> +------------------------------------------------+------------+------------+
>> | boot_successes | 6 | 0 |
>> | boot_failures | 0 | 8 |
>> | IP-Config:Auto-configuration_of_network_failed | 0 | 6 |
>> | BUG:kernel_boot_hang | 0 | 2 |
>> +------------------------------------------------+------------+------------+
>>
>> It appears that the Ethernet card doesn't work properly after your patch.
>>
>> [ 15.342990] Waiting up to 110 more seconds for network.
>> [ 25.346987] Waiting up to 100 more seconds for network.
>> [ 35.350995] Waiting up to 90 more seconds for network.
>> [ 45.350993] Waiting up to 80 more seconds for network.
>> [ 55.351006] Waiting up to 70 more seconds for network.
>> [ 65.350992] Waiting up to 60 more seconds for network.
>> [ 75.355017] Waiting up to 50 more seconds for network.
>> [ 85.359009] Waiting up to 40 more seconds for network.
>> [ 95.363009] Waiting up to 30 more seconds for network.
>> [ 305.883015] Waiting up to 20 more seconds for network.
>> [ 315.887002] Waiting up to 10 more seconds for network.
>> [ 325.887524] Sending DHCP requests ...... timed out!
>> [ 417.893036] IP-Config: Auto-configuration of network failed
>> [ 417.893852] ALSA device list:
>> [ 417.894270] No soundcards found.
>> [ 417.899649] Freeing unused kernel memory: 2884K (ffffffff82574000 - ffffffff82845000)
>>
>>
>> Thanks,
>> Ying Huang
>>
> _______________________________________________
> LKP mailing list
> [email protected]
> https://lists.01.org/mailman/listinfo/lkp

2015-12-14 09:54:34

by Borislav Petkov

[permalink] [raw]
Subject: Re: [LKP] [lkp] [x86/irq] 4c24cee6b2: IP-Config: Auto-configuration of network failed

On Mon, Dec 14, 2015 at 02:54:02PM +0800, Huang, Ying wrote:
> No, there are no other systems reporting the same issue. I will queue
> more tests for make sure this is not a false positive.

I can trigger this too with my guest here.

I have these two ontop of rc5:

cc22b9b83f6a x86/irq: Enhance __assign_irq_vector() to rollback in case of failure
45dd79e03e1e x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer
9f9499ae8e64 Linux 4.4-rc5

and my guest stalls while booting.

The new thing I see in dmesg is this:

..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
+..MP-BIOS bug: 8254 timer not connected to IO-APIC
+...trying to set up timer (IRQ0) through the 8259A ...
+..... (found apic 0 pin 2) ...
+....... failed.
+...trying to set up timer as Virtual Wire IRQ...
+..... failed.
+...trying to set up timer as ExtINT IRQ...
+..... works.
+APIC calibration not consistent with PM-Timer: 111ms instead of 100ms
+APIC delta adjusted to PM-Timer: 6248393 (6997337)

which leads to boot stalling and timeoutting when loading the hdd
driver:

...
[ 3.973447] console [netcon0] enabled
[ 3.976099] netconsole: network logging started
[ 3.979604] rtc_cmos 00:00: setting system clock to 2015-12-14 10:45:35 UTC (1450089935)
[ 3.985348] PM: Checking hibernation image partition /dev/sdb1
[ 6.600706] usb 1-1: New USB device found, idVendor=0627, idProduct=0001
[ 6.613651] usb 1-1: New USB device strings: Mfr=1, Product=3, SerialNumber=5
[ 6.636905] usb 1-1: Product: QEMU USB Tablet
[ 6.642248] usb 1-1: Manufacturer: QEMU
[ 6.647109] usb 1-1: SerialNumber: 42
[ 7.580995] ata2.00: qc timeout (cmd 0xa0)
[ 7.589300] ata2.00: TEST_UNIT_READY failed (err_mask=0x5)
[ 7.750715] ata2.01: NODEV after polling detection
[ 7.759605] ata2.00: configured for MWDMA2
[ 8.585691] input: QEMU QEMU USB Tablet as /devices/pci0000:00/0000:00:01.2/usb1/1-1/1-1:1.0/0003:0627:0001.0001/input/input1
[ 8.602467] hid-generic 0003:0627:0001.0001: input,hidraw0: USB HID v0.01 Pointer [QEMU QEMU USB Tablet] on usb-0000:00:01.2-1/input0
[ 12.760846] ata2.00: qc timeout (cmd 0xa0)
[ 12.786543] ata2.00: TEST_UNIT_READY failed (err_mask=0x5)
[ 12.796576] ata2.00: limiting speed to MWDMA2:PIO3
[ 12.958455] ata2.01: NODEV after polling detection
[ 12.969693] ata2.00: configured for MWDMA2
[ 17.972782] ata2.00: qc timeout (cmd 0xa0)
[ 17.978967] ata2.00: TEST_UNIT_READY failed (err_mask=0x5)
[ 17.983495] ata2.00: disabled
[ 17.986352] ata2: soft resetting link
[ 18.146586] ata2.01: NODEV after polling detection
[ 18.151413] ata2: EH complete
[ 32.745227] ata1: lost interrupt (Status 0x50)
[ 32.748470] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 32.756586] ata1.00: failed command: READ DMA
[ 32.761251] ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 32.761251] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 32.773928] ata1.00: status: { DRDY }
[ 32.777028] ata1: soft resetting link
[ 32.934437] ata1.01: NODEV after polling detection
[ 32.946663] ata1.00: configured for MWDMA2
[ 32.949964] ata1.00: device reported invalid CHS sector 0
[ 32.953793] ata1: EH complete
[ 63.849089] ata1: lost interrupt (Status 0x50)
[ 63.857470] ata1.00: limiting speed to MWDMA1:PIO4
[ 63.860982] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 63.865862] ata1.00: failed command: READ DMA
[ 63.883697] ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
[ 63.883697] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 63.899573] ata1.00: status: { DRDY }
[ 63.902649] ata1: soft resetting link
[ 64.062580] ata1.01: NODEV after polling detection
[ 64.073800] ata1.00: configured for MWDMA1
[ 64.076813] ata1.00: device reported invalid CHS sector 0
[ 64.096188] ata1: EH complete

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.

2015-12-15 07:55:21

by Jiang Liu

[permalink] [raw]
Subject: Re: [LKP] [lkp] [x86/irq] 4c24cee6b2: IP-Config: Auto-configuration of network failed

On 2015/12/14 17:54, Borislav Petkov wrote:
> On Mon, Dec 14, 2015 at 02:54:02PM +0800, Huang, Ying wrote:
>> No, there are no other systems reporting the same issue. I will queue
>> more tests for make sure this is not a false positive.
>
> I can trigger this too with my guest here.
>
> I have these two ontop of rc5:
>
> cc22b9b83f6a x86/irq: Enhance __assign_irq_vector() to rollback in case of failure
> 45dd79e03e1e x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer
> 9f9499ae8e64 Linux 4.4-rc5
>
> and my guest stalls while booting.
>
> The new thing I see in dmesg is this:
>
> ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> +..MP-BIOS bug: 8254 timer not connected to IO-APIC
> +...trying to set up timer (IRQ0) through the 8259A ...
> +..... (found apic 0 pin 2) ...
> +....... failed.
> +...trying to set up timer as Virtual Wire IRQ...
> +..... failed.
> +...trying to set up timer as ExtINT IRQ...
> +..... works.
> +APIC calibration not consistent with PM-Timer: 111ms instead of 100ms
> +APIC delta adjusted to PM-Timer: 6248393 (6997337)
>
> which leads to boot stalling and timeoutting when loading the hdd
> driver:
Hi Boris and Ying,
Aha, found a possible regression. Could you please help to
apply the attached bugfix patch ontop of "cc22b9b83f6a x86/irq:
Enhance __assign_irq_vector() to rollback in case of failure"?
Hi Ying, I have push this patch to github so it should reach
0day test farm soon:)
Thanks,
Gerry

>
> ...
> [ 3.973447] console [netcon0] enabled
> [ 3.976099] netconsole: network logging started
> [ 3.979604] rtc_cmos 00:00: setting system clock to 2015-12-14 10:45:35 UTC (1450089935)
> [ 3.985348] PM: Checking hibernation image partition /dev/sdb1
> [ 6.600706] usb 1-1: New USB device found, idVendor=0627, idProduct=0001
> [ 6.613651] usb 1-1: New USB device strings: Mfr=1, Product=3, SerialNumber=5
> [ 6.636905] usb 1-1: Product: QEMU USB Tablet
> [ 6.642248] usb 1-1: Manufacturer: QEMU
> [ 6.647109] usb 1-1: SerialNumber: 42
> [ 7.580995] ata2.00: qc timeout (cmd 0xa0)
> [ 7.589300] ata2.00: TEST_UNIT_READY failed (err_mask=0x5)
> [ 7.750715] ata2.01: NODEV after polling detection
> [ 7.759605] ata2.00: configured for MWDMA2
> [ 8.585691] input: QEMU QEMU USB Tablet as /devices/pci0000:00/0000:00:01.2/usb1/1-1/1-1:1.0/0003:0627:0001.0001/input/input1
> [ 8.602467] hid-generic 0003:0627:0001.0001: input,hidraw0: USB HID v0.01 Pointer [QEMU QEMU USB Tablet] on usb-0000:00:01.2-1/input0
> [ 12.760846] ata2.00: qc timeout (cmd 0xa0)
> [ 12.786543] ata2.00: TEST_UNIT_READY failed (err_mask=0x5)
> [ 12.796576] ata2.00: limiting speed to MWDMA2:PIO3
> [ 12.958455] ata2.01: NODEV after polling detection
> [ 12.969693] ata2.00: configured for MWDMA2
> [ 17.972782] ata2.00: qc timeout (cmd 0xa0)
> [ 17.978967] ata2.00: TEST_UNIT_READY failed (err_mask=0x5)
> [ 17.983495] ata2.00: disabled
> [ 17.986352] ata2: soft resetting link
> [ 18.146586] ata2.01: NODEV after polling detection
> [ 18.151413] ata2: EH complete
> [ 32.745227] ata1: lost interrupt (Status 0x50)
> [ 32.748470] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> [ 32.756586] ata1.00: failed command: READ DMA
> [ 32.761251] ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
> [ 32.761251] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> [ 32.773928] ata1.00: status: { DRDY }
> [ 32.777028] ata1: soft resetting link
> [ 32.934437] ata1.01: NODEV after polling detection
> [ 32.946663] ata1.00: configured for MWDMA2
> [ 32.949964] ata1.00: device reported invalid CHS sector 0
> [ 32.953793] ata1: EH complete
> [ 63.849089] ata1: lost interrupt (Status 0x50)
> [ 63.857470] ata1.00: limiting speed to MWDMA1:PIO4
> [ 63.860982] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> [ 63.865862] ata1.00: failed command: READ DMA
> [ 63.883697] ata1.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
> [ 63.883697] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> [ 63.899573] ata1.00: status: { DRDY }
> [ 63.902649] ata1: soft resetting link
> [ 64.062580] ata1.01: NODEV after polling detection
> [ 64.073800] ata1.00: configured for MWDMA1
> [ 64.076813] ata1.00: device reported invalid CHS sector 0
> [ 64.096188] ata1: EH complete
>


Attachments:
0001-.patch (1.83 kB)

2015-12-15 10:08:28

by Borislav Petkov

[permalink] [raw]
Subject: Re: [LKP] [lkp] [x86/irq] 4c24cee6b2: IP-Config: Auto-configuration of network failed

On Tue, Dec 15, 2015 at 03:55:14PM +0800, Jiang Liu wrote:
> Aha, found a possible regression. Could you please help to apply
> the attached bugfix patch ontop of "cc22b9b83f6a x86/irq: Enhance
> __assign_irq_vector() to rollback in case of failure"?

Yap, attached patch seems to work here.

Tested-by: Borislav Petkov <[email protected]>

Thanks!

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.

2015-12-19 20:32:41

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [LKP] [lkp] [x86/irq] 4c24cee6b2: IP-Config: Auto-configuration of network failed

On Tue, 15 Dec 2015, Jiang Liu wrote:
> Hi Boris and Ying,
> Aha, found a possible regression. Could you please help to
> apply the attached bugfix patch ontop of "cc22b9b83f6a x86/irq:
> Enhance __assign_irq_vector() to rollback in case of failure"?
> Hi Ying, I have push this patch to github so it should reach
> 0day test farm soon:)

So if that's fixed, can you please resend the series with everything
folded back?

Thanks,

tglx

2015-12-23 14:10:18

by Jiang Liu

[permalink] [raw]
Subject: [Bugfix v2 1/5] x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer

Function __assign_irq_vector() makes use of apic_chip_data.old_domain
as a temporary buffer, which causes trouble to rollback logic in case of
failure. So use a dedicated temporary buffer for __assign_irq_vector().

Signed-off-by: Jiang Liu <[email protected]>
---
arch/x86/kernel/apic/vector.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 861bc59c8f25..d6ec36b4461e 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -30,7 +30,7 @@ struct apic_chip_data {

struct irq_domain *x86_vector_domain;
static DEFINE_RAW_SPINLOCK(vector_lock);
-static cpumask_var_t vector_cpumask;
+static cpumask_var_t vector_cpumask, used_cpumask;
static struct irq_chip lapic_controller;
#ifdef CONFIG_X86_IO_APIC
static struct apic_chip_data *legacy_irq_data[NR_IRQS_LEGACY];
@@ -124,6 +124,7 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
/* Only try and allocate irqs on cpus that are present */
err = -ENOSPC;
cpumask_clear(d->old_domain);
+ cpumask_clear(used_cpumask);
cpu = cpumask_first_and(mask, cpu_online_mask);
while (cpu < nr_cpu_ids) {
int new_cpu, vector, offset;
@@ -157,9 +158,8 @@ next:
}

if (unlikely(current_vector == vector)) {
- cpumask_or(d->old_domain, d->old_domain,
- vector_cpumask);
- cpumask_andnot(vector_cpumask, mask, d->old_domain);
+ cpumask_or(used_cpumask, used_cpumask, vector_cpumask);
+ cpumask_andnot(vector_cpumask, mask, used_cpumask);
cpu = cpumask_first_and(vector_cpumask,
cpu_online_mask);
continue;
@@ -404,6 +404,7 @@ int __init arch_early_irq_init(void)
arch_init_htirq_domain(x86_vector_domain);

BUG_ON(!alloc_cpumask_var(&vector_cpumask, GFP_KERNEL));
+ BUG_ON(!alloc_cpumask_var(&used_cpumask, GFP_KERNEL));

return arch_early_ioapic_init();
}
--
1.7.10.4

2015-12-23 14:11:10

by Jiang Liu

[permalink] [raw]
Subject: [Bugfix v2 2/5] x86/irq: Enhance __assign_irq_vector() to rollback in case of failure

Enhance __assign_irq_vector() to rollback in case of failure so the
caller doesn't need to explicitly rollback.

Signed-off-by: Jiang Liu <[email protected]>
---
arch/x86/kernel/apic/vector.c | 26 ++++++++++++--------------
1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index d6ec36b4461e..b32c6ef7b4b0 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -117,6 +117,7 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
int cpu, err;
+ unsigned int dest;

if (d->move_in_progress)
return -EBUSY;
@@ -132,19 +133,21 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
apic->vector_allocation_domain(cpu, vector_cpumask, mask);

if (cpumask_subset(vector_cpumask, d->domain)) {
- err = 0;
- if (cpumask_equal(vector_cpumask, d->domain))
- break;
/*
* New cpumask using the vector is a proper subset of
* the current in use mask. So cleanup the vector
* allocation for the members that are not used anymore.
*/
+ cpumask_and(used_cpumask, d->domain, vector_cpumask);
+ err = apic->cpu_mask_to_apicid_and(mask, used_cpumask,
+ &dest);
+ if (err || cpumask_equal(vector_cpumask, d->domain))
+ break;
cpumask_andnot(d->old_domain, d->domain,
vector_cpumask);
d->move_in_progress =
cpumask_intersects(d->old_domain, cpu_online_mask);
- cpumask_and(d->domain, d->domain, vector_cpumask);
+ cpumask_copy(d->domain, used_cpumask);
break;
}

@@ -167,11 +170,13 @@ next:

if (test_bit(vector, used_vectors))
goto next;
-
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask) {
if (!IS_ERR_OR_NULL(per_cpu(vector_irq, new_cpu)[vector]))
goto next;
}
+ if (apic->cpu_mask_to_apicid_and(mask, vector_cpumask, &dest))
+ goto next;
+
/* Found one! */
current_vector = vector;
current_offset = offset;
@@ -190,8 +195,7 @@ next:

if (!err) {
/* cache destination APIC IDs into cfg->dest_apicid */
- err = apic->cpu_mask_to_apicid_and(mask, d->domain,
- &d->cfg.dest_apicid);
+ d->cfg.dest_apicid = dest;
}

return err;
@@ -493,14 +497,8 @@ static int apic_set_affinity(struct irq_data *irq_data,
return -EINVAL;

err = assign_irq_vector(irq, data, dest);
- if (err) {
- if (assign_irq_vector(irq, data,
- irq_data_get_affinity_mask(irq_data)))
- pr_err("Failed to recover vector for irq %d\n", irq);
- return err;
- }

- return IRQ_SET_MASK_OK;
+ return err ? err : IRQ_SET_MASK_OK;
}

static struct irq_chip lapic_controller = {
--
1.7.10.4

2015-12-23 14:11:05

by Jiang Liu

[permalink] [raw]
Subject: [Bugfix v2 3/5] x86/irq: Fix a race window in x86_vector_free_irqs()

There's a race condition between x86_vector_free_irqs()
{
free_apic_chip_data(irq_data->chip_data);
xxxxx //irq_data->chip_data has been freed, but the pointer
//hasn't been reset yet
irq_domain_reset_irq_data(irq_data);
}
and smp_irq_move_cleanup_interrupt()
{
raw_spin_lock(&vector_lock);
data = apic_chip_data(irq_desc_get_irq_data(desc));
access data->xxxx // may access freed memory
raw_spin_unlock(&desc->lock);
}
, which may cause smp_irq_move_cleanup_interrupt() accesses freed memory.
So use vector_lock to guard all memory free code in x86_vector_free_irqs().

Signed-off-by: Jiang Liu <[email protected]>
---
arch/x86/kernel/apic/vector.c | 20 ++++++++------------
1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index b32c6ef7b4b0..f648fce39d5e 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -228,23 +228,16 @@ static int assign_irq_vector_policy(int irq, int node,
static void clear_irq_vector(int irq, struct apic_chip_data *data)
{
struct irq_desc *desc;
- unsigned long flags;
- int cpu, vector;
-
- raw_spin_lock_irqsave(&vector_lock, flags);
- BUG_ON(!data->cfg.vector);
+ int cpu, vector = data->cfg.vector;

- vector = data->cfg.vector;
+ BUG_ON(!vector);
for_each_cpu_and(cpu, data->domain, cpu_online_mask)
per_cpu(vector_irq, cpu)[vector] = VECTOR_UNUSED;
-
data->cfg.vector = 0;
cpumask_clear(data->domain);

- if (likely(!data->move_in_progress)) {
- raw_spin_unlock_irqrestore(&vector_lock, flags);
+ if (likely(!data->move_in_progress))
return;
- }

desc = irq_to_desc(irq);
for_each_cpu_and(cpu, data->old_domain, cpu_online_mask) {
@@ -257,7 +250,7 @@ static void clear_irq_vector(int irq, struct apic_chip_data *data)
}
}
data->move_in_progress = 0;
- raw_spin_unlock_irqrestore(&vector_lock, flags);
+ cpumask_clear(data->old_domain);
}

void init_irq_alloc_info(struct irq_alloc_info *info,
@@ -279,18 +272,21 @@ static void x86_vector_free_irqs(struct irq_domain *domain,
unsigned int virq, unsigned int nr_irqs)
{
struct irq_data *irq_data;
+ unsigned long flags;
int i;

for (i = 0; i < nr_irqs; i++) {
irq_data = irq_domain_get_irq_data(x86_vector_domain, virq + i);
if (irq_data && irq_data->chip_data) {
+ raw_spin_lock_irqsave(&vector_lock, flags);
clear_irq_vector(virq + i, irq_data->chip_data);
free_apic_chip_data(irq_data->chip_data);
+ irq_domain_reset_irq_data(irq_data);
+ raw_spin_unlock_irqrestore(&vector_lock, flags);
#ifdef CONFIG_X86_IO_APIC
if (virq + i < nr_legacy_irqs())
legacy_irq_data[virq + i] = NULL;
#endif
- irq_domain_reset_irq_data(irq_data);
}
}
}
--
1.7.10.4

2015-12-23 14:11:12

by Jiang Liu

[permalink] [raw]
Subject: [Bugfix v2 4/5] x86/irq: Fix a race condition between vector assigning and cleanup

Joe Lawrence <[email protected]> reported an use after release
issue related to x86 IRQ management code. Please refer to following
link for more information:
https://www.mail-archive.com/[email protected]/msg1026840.html

Thomas pointed out that it's caused by a race condition between
__assign_irq_vector() and __send_cleanup_vector(). Based on Thomas'
draft patch, we solve this race condition by:
1) Use move_in_progress to signal that an IRQ cleanup IPI is needed
2) Use old_domain to save old CPU mask for IRQ cleanup
3) Use vector to protect move_in_progress and old_domain

This bugfix patch also helps to get rid of that atomic allocation in
__send_cleanup_vector().

Signed-off-by: Jiang Liu <[email protected]>
---
arch/x86/kernel/apic/vector.c | 76 ++++++++++++++++++-----------------------
1 file changed, 34 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index f648fce39d5e..ab54b296a7d0 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -119,7 +119,7 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
int cpu, err;
unsigned int dest;

- if (d->move_in_progress)
+ if (cpumask_intersects(d->old_domain, cpu_online_mask))
return -EBUSY;

/* Only try and allocate irqs on cpus that are present */
@@ -141,13 +141,14 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
cpumask_and(used_cpumask, d->domain, vector_cpumask);
err = apic->cpu_mask_to_apicid_and(mask, used_cpumask,
&dest);
- if (err || cpumask_equal(vector_cpumask, d->domain))
+ if (err)
break;
- cpumask_andnot(d->old_domain, d->domain,
- vector_cpumask);
- d->move_in_progress =
- cpumask_intersects(d->old_domain, cpu_online_mask);
- cpumask_copy(d->domain, used_cpumask);
+ d->cfg.dest_apicid = dest;
+ if (!cpumask_equal(vector_cpumask, d->domain)) {
+ cpumask_andnot(d->old_domain, d->domain,
+ vector_cpumask);
+ cpumask_copy(d->domain, used_cpumask);
+ }
break;
}

@@ -180,22 +181,20 @@ next:
/* Found one! */
current_vector = vector;
current_offset = offset;
- if (d->cfg.vector) {
+ if (d->cfg.vector)
cpumask_copy(d->old_domain, d->domain);
- d->move_in_progress =
- cpumask_intersects(d->old_domain, cpu_online_mask);
- }
+ d->cfg.vector = vector;
+ d->cfg.dest_apicid = dest;
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask)
per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
- d->cfg.vector = vector;
cpumask_copy(d->domain, vector_cpumask);
err = 0;
break;
}

if (!err) {
- /* cache destination APIC IDs into cfg->dest_apicid */
- d->cfg.dest_apicid = dest;
+ cpumask_and(d->old_domain, d->old_domain, cpu_online_mask);
+ d->move_in_progress = !cpumask_empty(d->old_domain);
}

return err;
@@ -227,7 +226,7 @@ static int assign_irq_vector_policy(int irq, int node,

static void clear_irq_vector(int irq, struct apic_chip_data *data)
{
- struct irq_desc *desc;
+ struct irq_desc *desc = irq_to_desc(irq);
int cpu, vector = data->cfg.vector;

BUG_ON(!vector);
@@ -236,10 +235,6 @@ static void clear_irq_vector(int irq, struct apic_chip_data *data)
data->cfg.vector = 0;
cpumask_clear(data->domain);

- if (likely(!data->move_in_progress))
- return;
-
- desc = irq_to_desc(irq);
for_each_cpu_and(cpu, data->old_domain, cpu_online_mask) {
for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS;
vector++) {
@@ -421,10 +416,13 @@ static void __setup_vector_irq(int cpu)
struct irq_data *idata = irq_desc_get_irq_data(desc);

data = apic_chip_data(idata);
- if (!data || !cpumask_test_cpu(cpu, data->domain))
- continue;
- vector = data->cfg.vector;
- per_cpu(vector_irq, cpu)[vector] = desc;
+ if (data) {
+ cpumask_clear_cpu(cpu, data->old_domain);
+ if (cpumask_test_cpu(cpu, data->domain)) {
+ vector = data->cfg.vector;
+ per_cpu(vector_irq, cpu)[vector] = desc;
+ }
+ }
}
/* Mark the free vectors */
for (vector = 0; vector < NR_VECTORS; ++vector) {
@@ -506,20 +504,17 @@ static struct irq_chip lapic_controller = {
#ifdef CONFIG_SMP
static void __send_cleanup_vector(struct apic_chip_data *data)
{
- cpumask_var_t cleanup_mask;
-
- if (unlikely(!alloc_cpumask_var(&cleanup_mask, GFP_ATOMIC))) {
- unsigned int i;
+ unsigned long flags;

- for_each_cpu_and(i, data->old_domain, cpu_online_mask)
- apic->send_IPI_mask(cpumask_of(i),
- IRQ_MOVE_CLEANUP_VECTOR);
- } else {
- cpumask_and(cleanup_mask, data->old_domain, cpu_online_mask);
- apic->send_IPI_mask(cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR);
- free_cpumask_var(cleanup_mask);
- }
+ raw_spin_lock_irqsave(&vector_lock, flags);
+ if (!data->move_in_progress)
+ goto out_unlock;
data->move_in_progress = 0;
+ cpumask_and(data->old_domain, data->old_domain, cpu_online_mask);
+ if (!cpumask_empty(data->old_domain))
+ apic->send_IPI_mask(data->old_domain, IRQ_MOVE_CLEANUP_VECTOR);
+out_unlock:
+ raw_spin_unlock_irqrestore(&vector_lock, flags);
}

void send_cleanup_vector(struct irq_cfg *cfg)
@@ -563,14 +558,10 @@ asmlinkage __visible void smp_irq_move_cleanup_interrupt(void)
goto unlock;

/*
- * Check if the irq migration is in progress. If so, we
- * haven't received the cleanup request yet for this irq.
+ * Nothing to cleanup if this cpu is not set
+ * in the old_domain mask.
*/
- if (data->move_in_progress)
- goto unlock;
-
- if (vector == data->cfg.vector &&
- cpumask_test_cpu(me, data->domain))
+ if (!cpumask_test_cpu(me, data->old_domain))
goto unlock;

irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
@@ -586,6 +577,7 @@ asmlinkage __visible void smp_irq_move_cleanup_interrupt(void)
goto unlock;
}
__this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
+ cpumask_clear_cpu(me, data->old_domain);
unlock:
raw_spin_unlock(&desc->lock);
}
--
1.7.10.4

2015-12-23 14:11:14

by Jiang Liu

[permalink] [raw]
Subject: [Bugfix v2 5/5] x86/irq: Trivial cleanups for x86 vector allocation code

Trivial cleanups for x86 vector allocation code:
1) reorganize apic_chip_data to optimize for size and cache efficiency
2) avoid redundant calling of irq_to_desc()
3) refine code comments

Signed-off-by: Jiang Liu <[email protected]>
---
arch/x86/kernel/apic/vector.c | 54 ++++++++++++++++++-----------------------
1 file changed, 23 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index ab54b296a7d0..008114d0d2bd 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -23,9 +23,9 @@

struct apic_chip_data {
struct irq_cfg cfg;
+ u8 move_in_progress : 1;
cpumask_var_t domain;
cpumask_var_t old_domain;
- u8 move_in_progress : 1;
};

struct irq_domain *x86_vector_domain;
@@ -38,7 +38,7 @@ static struct apic_chip_data *legacy_irq_data[NR_IRQS_LEGACY];

void lock_vector_lock(void)
{
- /* Used to the online set of cpus does not change
+ /* Used to ensure that the online set of cpus does not change
* during assign_irq_vector.
*/
raw_spin_lock(&vector_lock);
@@ -100,8 +100,7 @@ static void free_apic_chip_data(struct apic_chip_data *data)
}
}

-static int __assign_irq_vector(int irq, struct apic_chip_data *d,
- const struct cpumask *mask)
+static int assign_irq_vector(struct irq_data *data, const struct cpumask *mask)
{
/*
* NOTE! The local APIC isn't very good at handling
@@ -116,11 +115,15 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
*/
static int current_vector = FIRST_EXTERNAL_VECTOR + VECTOR_OFFSET_START;
static int current_offset = VECTOR_OFFSET_START % 16;
- int cpu, err;
+ int cpu, err = -EBUSY;
+ struct irq_desc *desc = irq_data_to_desc(data);
+ struct apic_chip_data *d = data->chip_data;
unsigned int dest;
+ unsigned long flags;

+ raw_spin_lock_irqsave(&vector_lock, flags);
if (cpumask_intersects(d->old_domain, cpu_online_mask))
- return -EBUSY;
+ goto out;

/* Only try and allocate irqs on cpus that are present */
err = -ENOSPC;
@@ -186,7 +189,7 @@ next:
d->cfg.vector = vector;
d->cfg.dest_apicid = dest;
for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask)
- per_cpu(vector_irq, new_cpu)[vector] = irq_to_desc(irq);
+ per_cpu(vector_irq, new_cpu)[vector] = desc;
cpumask_copy(d->domain, vector_cpumask);
err = 0;
break;
@@ -196,37 +199,27 @@ next:
cpumask_and(d->old_domain, d->old_domain, cpu_online_mask);
d->move_in_progress = !cpumask_empty(d->old_domain);
}
-
- return err;
-}
-
-static int assign_irq_vector(int irq, struct apic_chip_data *data,
- const struct cpumask *mask)
-{
- int err;
- unsigned long flags;
-
- raw_spin_lock_irqsave(&vector_lock, flags);
- err = __assign_irq_vector(irq, data, mask);
+out:
raw_spin_unlock_irqrestore(&vector_lock, flags);
+
return err;
}

-static int assign_irq_vector_policy(int irq, int node,
- struct apic_chip_data *data,
+static int assign_irq_vector_policy(struct irq_data *data, int node,
struct irq_alloc_info *info)
{
if (info && info->mask)
- return assign_irq_vector(irq, data, info->mask);
+ return assign_irq_vector(data, info->mask);
if (node != NUMA_NO_NODE &&
- assign_irq_vector(irq, data, cpumask_of_node(node)) == 0)
+ assign_irq_vector(data, cpumask_of_node(node)) == 0)
return 0;
- return assign_irq_vector(irq, data, apic->target_cpus());
+ return assign_irq_vector(data, apic->target_cpus());
}

-static void clear_irq_vector(int irq, struct apic_chip_data *data)
+static void clear_irq_vector(struct irq_data *irq_data)
{
- struct irq_desc *desc = irq_to_desc(irq);
+ struct irq_desc *desc = irq_data_to_desc(irq_data);
+ struct apic_chip_data *data = irq_data->chip_data;
int cpu, vector = data->cfg.vector;

BUG_ON(!vector);
@@ -274,7 +267,7 @@ static void x86_vector_free_irqs(struct irq_domain *domain,
irq_data = irq_domain_get_irq_data(x86_vector_domain, virq + i);
if (irq_data && irq_data->chip_data) {
raw_spin_lock_irqsave(&vector_lock, flags);
- clear_irq_vector(virq + i, irq_data->chip_data);
+ clear_irq_vector(irq_data);
free_apic_chip_data(irq_data->chip_data);
irq_domain_reset_irq_data(irq_data);
raw_spin_unlock_irqrestore(&vector_lock, flags);
@@ -319,7 +312,7 @@ static int x86_vector_alloc_irqs(struct irq_domain *domain, unsigned int virq,
irq_data->chip = &lapic_controller;
irq_data->chip_data = data;
irq_data->hwirq = virq + i;
- err = assign_irq_vector_policy(virq + i, node, data, info);
+ err = assign_irq_vector_policy(irq_data, node, info);
if (err)
goto error;
}
@@ -481,8 +474,7 @@ void apic_ack_edge(struct irq_data *data)
static int apic_set_affinity(struct irq_data *irq_data,
const struct cpumask *dest, bool force)
{
- struct apic_chip_data *data = irq_data->chip_data;
- int err, irq = irq_data->irq;
+ int err;

if (!config_enabled(CONFIG_SMP))
return -EPERM;
@@ -490,7 +482,7 @@ static int apic_set_affinity(struct irq_data *irq_data,
if (!cpumask_intersects(dest, cpu_online_mask))
return -EINVAL;

- err = assign_irq_vector(irq, data, dest);
+ err = assign_irq_vector(irq_data, dest);

return err ? err : IRQ_SET_MASK_OK;
}
--
1.7.10.4

2015-12-23 18:41:44

by Borislav Petkov

[permalink] [raw]
Subject: Re: [Bugfix v2 4/5] x86/irq: Fix a race condition between vector assigning and cleanup

On Wed, Dec 23, 2015 at 10:13:29PM +0800, Jiang Liu wrote:
> Joe Lawrence <[email protected]> reported an use after release
> issue related to x86 IRQ management code. Please refer to following
> link for more information:
> https://www.mail-archive.com/[email protected]/msg1026840.html

Please use link references of the following format:

http://lkml.kernel.org/r/<Message-ID>

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.

2015-12-23 19:10:45

by Borislav Petkov

[permalink] [raw]
Subject: Re: [Bugfix v2 1/5] x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer

On Wed, Dec 23, 2015 at 10:13:26PM +0800, Jiang Liu wrote:
> Function __assign_irq_vector() makes use of apic_chip_data.old_domain
> as a temporary buffer, which causes trouble to rollback logic in case of
> failure. So use a dedicated temporary buffer for __assign_irq_vector().
>
> Signed-off-by: Jiang Liu <[email protected]>
> ---
> arch/x86/kernel/apic/vector.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)

All 5:

Tested-by: Borislav Petkov <[email protected]>

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.

2015-12-24 05:15:40

by Jeremiah Mahler

[permalink] [raw]
Subject: Re: [Bugfix v2 1/5] x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer

Jiang,

On Wed, Dec 23, 2015 at 10:13:26PM +0800, Jiang Liu wrote:
> Function __assign_irq_vector() makes use of apic_chip_data.old_domain
> as a temporary buffer, which causes trouble to rollback logic in case of
> failure. So use a dedicated temporary buffer for __assign_irq_vector().
>
> Signed-off-by: Jiang Liu <[email protected]>
> ---
> arch/x86/kernel/apic/vector.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
[...]

I tried this patch and the rest in the series but unfortunately
the bug is still present.

[ 10.184649] wlan0: authenticated
[ 10.187883] wlan0: associate with 02:1a:11:fb:90:1c (try 1/3)
[ 10.191574] do_IRQ: 0.35 No irq handler for vector
[ 10.191589] do_IRQ: 0.35 No irq handler for vector
[ 10.198159] do_IRQ: 0.35 No irq handler for vector
[ 10.198165] do_IRQ: 0.35 No irq handler for vector
[ 10.200534] wlan0: RX AssocResp from 02:1a:11:fb:90:1c (capab=0x431
status=0 aid=1)
[ 10.204611] wlan0: associated
[ 10.238883] do_IRQ: 0.35 No irq handler for vector
[ 10.238892] do_IRQ: 0.35 No irq handler for vector
[ 10.280716] do_IRQ: 0.35 No irq handler for vector
[ 10.281083] do_IRQ: 0.35 No irq handler for vector
[ 10.286484] do_IRQ: 0.35 No irq handler for vector
...

--
- Jeremiah Mahler

2015-12-24 14:34:33

by Joe Lawrence

[permalink] [raw]
Subject: Re: [Bugfix v2 1/5] x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer

On 12/23/2015 09:13 AM, Jiang Liu wrote:
> Function __assign_irq_vector() makes use of apic_chip_data.old_domain
> as a temporary buffer, which causes trouble to rollback logic in case of
> failure. So use a dedicated temporary buffer for __assign_irq_vector().
>
> Signed-off-by: Jiang Liu <[email protected]>

Hi Jiang,

FWIW, my overnight testing is still running okay, so for v2:

Tested-by: Joe Lawrence <[email protected]>

However, it looks like there will probably be a v3.

-- Joe

2015-12-28 08:24:39

by Jiang Liu

[permalink] [raw]
Subject: Re: [Bugfix v2 1/5] x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer

On 2015/12/24 13:15, Jeremiah Mahler wrote:
> Jiang,
>
> On Wed, Dec 23, 2015 at 10:13:26PM +0800, Jiang Liu wrote:
>> Function __assign_irq_vector() makes use of apic_chip_data.old_domain
>> as a temporary buffer, which causes trouble to rollback logic in case of
>> failure. So use a dedicated temporary buffer for __assign_irq_vector().
>>
>> Signed-off-by: Jiang Liu <[email protected]>
>> ---
>> arch/x86/kernel/apic/vector.c | 9 +++++----
>> 1 file changed, 5 insertions(+), 4 deletions(-)
> [...]
>
> I tried this patch and the rest in the series but unfortunately
> the bug is still present.
>
> [ 10.184649] wlan0: authenticated
> [ 10.187883] wlan0: associate with 02:1a:11:fb:90:1c (try 1/3)
> [ 10.191574] do_IRQ: 0.35 No irq handler for vector
> [ 10.191589] do_IRQ: 0.35 No irq handler for vector
> [ 10.198159] do_IRQ: 0.35 No irq handler for vector
> [ 10.198165] do_IRQ: 0.35 No irq handler for vector
> [ 10.200534] wlan0: RX AssocResp from 02:1a:11:fb:90:1c (capab=0x431
> status=0 aid=1)
> [ 10.204611] wlan0: associated
> [ 10.238883] do_IRQ: 0.35 No irq handler for vector
> [ 10.238892] do_IRQ: 0.35 No irq handler for vector
> [ 10.280716] do_IRQ: 0.35 No irq handler for vector
> [ 10.281083] do_IRQ: 0.35 No irq handler for vector
> [ 10.286484] do_IRQ: 0.35 No irq handler for vector
> ...
>
Hi Jeremiah,
Could you please help to confirm which commit caused the
regression?
1) x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary
buffer
2) x86/irq: Fix a race condition between vector assigning and cleanup

Thanks,
Gerry

2015-12-29 03:26:10

by Jeremiah Mahler

[permalink] [raw]
Subject: Re: [Bugfix v2 1/5] x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer

Gerry,

On Mon, Dec 28, 2015 at 04:24:32PM +0800, Jiang Liu wrote:
[...]
> Hi Jeremiah,
> Could you please help to confirm which commit caused the
> regression?
> 1) x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary
> buffer
> 2) x86/irq: Fix a race condition between vector assigning and cleanup
>
> Thanks,
> Gerry

According to my original bisect [1], it is the second patch.

[1]: https://lkml.org/lkml/2015/12/20/11

--
- Jeremiah Mahler

2015-12-29 13:41:08

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [Bugfix v2 3/5] x86/irq: Fix a race window in x86_vector_free_irqs()

On Wed, 23 Dec 2015, Jiang Liu wrote:
> void init_irq_alloc_info(struct irq_alloc_info *info,
> @@ -279,18 +272,21 @@ static void x86_vector_free_irqs(struct irq_domain *domain,
> unsigned int virq, unsigned int nr_irqs)
> {
> struct irq_data *irq_data;
> + unsigned long flags;
> int i;
>
> for (i = 0; i < nr_irqs; i++) {
> irq_data = irq_domain_get_irq_data(x86_vector_domain, virq + i);
> if (irq_data && irq_data->chip_data) {
> + raw_spin_lock_irqsave(&vector_lock, flags);
> clear_irq_vector(virq + i, irq_data->chip_data);
> free_apic_chip_data(irq_data->chip_data);
> + irq_domain_reset_irq_data(irq_data);
> + raw_spin_unlock_irqrestore(&vector_lock, flags);

We really do not need to free stuff under the vector lock.

lock();
data = irq_data->chip_data;
irq_domain_reset_irq_data(irq_data);
unlock();
free_apic_chip_data(data);

Hmm?

Thanks,

tglx

2015-12-30 17:27:01

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [Bugfix v2 4/5] x86/irq: Fix a race condition between vector assigning and cleanup

On Wed, 23 Dec 2015, Jiang Liu wrote:
> static void clear_irq_vector(int irq, struct apic_chip_data *data)
> {
> - struct irq_desc *desc;
> + struct irq_desc *desc = irq_to_desc(irq);
> int cpu, vector = data->cfg.vector;
>
> BUG_ON(!vector);
> @@ -236,10 +235,6 @@ static void clear_irq_vector(int irq, struct apic_chip_data *data)
> data->cfg.vector = 0;
> cpumask_clear(data->domain);
>
> - if (likely(!data->move_in_progress))
> - return;

Why are you removing this?

> -
> - desc = irq_to_desc(irq);
> for_each_cpu_and(cpu, data->old_domain, cpu_online_mask) {
> for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS;
> vector++) {
> @@ -421,10 +416,13 @@ static void __setup_vector_irq(int cpu)
> struct irq_data *idata = irq_desc_get_irq_data(desc);
>
> data = apic_chip_data(idata);
> - if (!data || !cpumask_test_cpu(cpu, data->domain))
> - continue;
> - vector = data->cfg.vector;
> - per_cpu(vector_irq, cpu)[vector] = desc;
> + if (data) {
> + cpumask_clear_cpu(cpu, data->old_domain);

Why would the newly online cpu be in data->old_domain?

> + if (cpumask_test_cpu(cpu, data->domain)) {
> + vector = data->cfg.vector;
> + per_cpu(vector_irq, cpu)[vector] = desc;
> + }
> + }
> @@ -563,14 +558,10 @@ asmlinkage __visible void smp_irq_move_cleanup_interrupt(void)
> goto unlock;
>
> /*
> - * Check if the irq migration is in progress. If so, we
> - * haven't received the cleanup request yet for this irq.
> + * Nothing to cleanup if this cpu is not set
> + * in the old_domain mask.
> */
> - if (data->move_in_progress)
> - goto unlock;

Removing this is broken. If data->move_in_progress is set, then you cannot
clear the vector. If there are two interrupt moves pending then the IPI of the
first one will clear the second one as well, which might be still targeted to
this cpu.

This whole patch set is way too complex. A lot of tiny changes here and there
and none of them is documented.

Thanks,

tglx

2015-12-30 18:53:39

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [Bugfix v2 2/5] x86/irq: Enhance __assign_irq_vector() to rollback in case of failure

On Wed, 23 Dec 2015, Jiang Liu wrote:
> @@ -167,11 +170,13 @@ next:
>
> if (test_bit(vector, used_vectors))
> goto next;
> -
> for_each_cpu_and(new_cpu, vector_cpumask, cpu_online_mask) {
> if (!IS_ERR_OR_NULL(per_cpu(vector_irq, new_cpu)[vector]))
> goto next;
> }
> + if (apic->cpu_mask_to_apicid_and(mask, vector_cpumask, &dest))
> + goto next;

This is silly. If the checks in cpu_mask_to_apicid_and() fail, then there is
no point to try the next vector number simply because the checks will fail
with the next vector again. You need a new vector_cpumask() in order to make
progress.

> +
> /* Found one! */
> current_vector = vector;
> current_offset = offset;
> @@ -190,8 +195,7 @@ next:
>
> if (!err) {
> /* cache destination APIC IDs into cfg->dest_apicid */
> - err = apic->cpu_mask_to_apicid_and(mask, d->domain,
> - &d->cfg.dest_apicid);

I have serious doubts that this is the proper solution.

d->domain is @vector_cpumask. @vector_cpumask gets assigned by the call to
apic->vector_allocation_domain()

So the only way this can fail is when:

vector_cpumask & mask & cpu_online_mask == 0

So we can do that check way earlier, i.e. right after the call to
apic->vector_allocation_domain(). Once we have established that this check
does not fail, we know that the call to apic->cpu_mask_to_apicid_and() wont
fail either once we have a vector number to use.

Thanks,

tglx

2015-12-30 22:52:00

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [Bugfix v2 4/5] x86/irq: Fix a race condition between vector assigning and cleanup

On Wed, 23 Dec 2015, Jiang Liu wrote:
> /*
> - * Check if the irq migration is in progress. If so, we
> - * haven't received the cleanup request yet for this irq.
> + * Nothing to cleanup if this cpu is not set
> + * in the old_domain mask.
> */
> - if (data->move_in_progress)
> - goto unlock;
> -
> - if (vector == data->cfg.vector &&
> - cpumask_test_cpu(me, data->domain))

Actually we cannot remove this check either. If the vector changed from 100 to
80, then we cleanup vector 80 which is the active vector and the old one
remains stale. Not pretty, right?

I'll send out a fine granular series tomorrow when my brain is more awake.

Thanks,

tglx