Hi,
next-20150619 is un-bootable (it `drowns` in warnings).
I modified irq_move_masked_irq() to do WARN_ONCE() instead of WARN_ON
every !irqd_can_balance(), and here it is:
[ 0.412227] ------------[ cut here ]------------
[ 0.412291] WARNING: CPU: 0 PID: 0 at kernel/irq/migration.c:21 irq_move_masked_irq+0x57/0xc4()
[ 0.412371] Can't balance irq 0 [edge]
[ 0.412428] Modules linked in:
[ 0.412524] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-rc8-next-20150619-dbg-00010-g841f367-dirty #228
[ 0.412606] Hardware name: SAMSUNG ELECTRONICS CO.,LTD Samsung DeskTop System/Samsung DeskTop System, BIOS 05CC 04/09/2010
[ 0.412693] 0000000000000009 ffff880137c03e18 ffffffff814f2acd ffffffff810851a4
[ 0.412920] ffff880137c03e68 ffff880137c03e58 ffffffff8103d196 0000000000000000
[ 0.413146] ffffffff8108bfe8 ffff880133007c00 0000000000000000 0000000000000000
[ 0.413373] Call Trace:
[ 0.413428] <IRQ> [<ffffffff814f2acd>] dump_stack+0x4c/0x6e
[ 0.413532] [<ffffffff810851a4>] ? console_unlock+0x400/0x42f
[ 0.413596] [<ffffffff8103d196>] warn_slowpath_common+0x9b/0xb5
[ 0.413659] [<ffffffff8108bfe8>] ? irq_move_masked_irq+0x57/0xc4
[ 0.413722] [<ffffffff8103d1f6>] warn_slowpath_fmt+0x46/0x48
[ 0.413785] [<ffffffff8108bfe8>] irq_move_masked_irq+0x57/0xc4
[ 0.413847] [<ffffffff8108c097>] irq_move_irq+0x42/0x56
[ 0.413910] [<ffffffff8102c410>] apic_ack_edge+0x23/0x3b
[ 0.413972] [<ffffffff81089b79>] irq_chip_ack_parent+0x14/0x16
[ 0.414034] [<ffffffff810894cd>] handle_edge_irq+0x90/0xe1
[ 0.414096] [<ffffffff81004aa9>] handle_irq+0x24/0x2a
[ 0.414158] [<ffffffff814fb2ec>] do_IRQ+0x4c/0xcd
[ 0.414219] [<ffffffff814f9c6a>] common_interrupt+0x6a/0x6a
[ 0.414280] <EOI> [<ffffffff81403266>] ? cpuidle_enter_state+0x195/0x29d
[ 0.414386] [<ffffffff81403262>] ? cpuidle_enter_state+0x191/0x29d
[ 0.414450] [<ffffffff8140339a>] cpuidle_enter+0x17/0x19
[ 0.414512] [<ffffffff81075c30>] call_cpuidle+0x55/0x57
[ 0.414573] [<ffffffff81403381>] ? cpuidle_select+0x13/0x15
[ 0.414636] [<ffffffff81075eac>] cpu_startup_entry+0x27a/0x399
[ 0.414700] [<ffffffff814ef8bd>] rest_init+0x134/0x13a
[ 0.414762] [<ffffffff81b0df2c>] start_kernel+0x41a/0x427
[ 0.414825] [<ffffffff81b0d2fe>] x86_64_start_reservations+0x2a/0x2c
[ 0.414889] [<ffffffff81b0d46b>] x86_64_start_kernel+0x16b/0x17a
[ 0.414987] ---[ end trace 082909e0a7e1599a ]---
Do you guys want to replace WAN_ON() with WARN_ONCE(), perhaps? This, of course,
doesn't fix anything; but at least one can boot the system. (not really a patch,
just an idea).
Signed-off-by: Sergey Senozhatsky <[email protected]>
---
kernel/irq/migration.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/irq/migration.c b/kernel/irq/migration.c
index dd203e2..0352b04 100644
--- a/kernel/irq/migration.c
+++ b/kernel/irq/migration.c
@@ -16,7 +16,9 @@ void irq_move_masked_irq(struct irq_data *idata)
* Paranoia: cpu-local interrupts shouldn't be calling in here anyway.
*/
if (!irqd_can_balance(&desc->irq_data)) {
- WARN_ON(1);
+ WARN_ONCE(1, "Can't balance irq %d [%s]\n",
+ desc->irq_data.irq,
+ desc->name);
return;
}
On Fri, 19 Jun 2015, Sergey Senozhatsky wrote:
> [ 0.412291] WARNING: CPU: 0 PID: 0 at kernel/irq/migration.c:21 irq_move_masked_irq+0x57/0xc4()
> [ 0.412371] Can't balance irq 0 [edge]
Yuck.
> Do you guys want to replace WAN_ON() with WARN_ONCE(), perhaps? This, of course,
> doesn't fix anything; but at least one can boot the system. (not really a patch,
> just an idea).
Indeed. We really want to clear the move pending bit before the can
balance check. Patch below. But that does not explain why this happens
in the first place.
Can you please send me a full dmesg, kernel config and output of
/proc/interrupts ? (Private mail is fine, or upload it to some place)
Thanks,
tglx
---
diff --git a/kernel/irq/migration.c b/kernel/irq/migration.c
index dd203e276b07..8cf8c24afaab 100644
--- a/kernel/irq/migration.c
+++ b/kernel/irq/migration.c
@@ -12,16 +12,19 @@ void irq_move_masked_irq(struct irq_data *idata)
if (likely(!irqd_is_setaffinity_pending(&desc->irq_data)))
return;
+ irqd_clr_move_pending(&desc->irq_data);
+
/*
* Paranoia: cpu-local interrupts shouldn't be calling in here anyway.
*/
if (!irqd_can_balance(&desc->irq_data)) {
- WARN_ON(1);
+ pr_warn("Cannot move irq %d: %s\n",
+ desc->irq_data.irq,
+ irqd_is_per_cpu(&desc->irq_data) ?
+ "per cpu" : "balancing disabled");
return;
}
- irqd_clr_move_pending(&desc->irq_data);
-
if (unlikely(cpumask_empty(desc->pending_mask)))
return;
On Fri, 19 Jun 2015, Thomas Gleixner wrote:
> On Fri, 19 Jun 2015, Sergey Senozhatsky wrote:
> > [ 0.412291] WARNING: CPU: 0 PID: 0 at kernel/irq/migration.c:21 irq_move_masked_irq+0x57/0xc4()
> > [ 0.412371] Can't balance irq 0 [edge]
>
> Yuck.
>
> > Do you guys want to replace WAN_ON() with WARN_ONCE(), perhaps? This, of course,
> > doesn't fix anything; but at least one can boot the system. (not really a patch,
> > just an idea).
>
> Indeed. We really want to clear the move pending bit before the can
> balance check. Patch below. But that does not explain why this happens
> in the first place.
>
> Can you please send me a full dmesg, kernel config and output of
> /proc/interrupts ? (Private mail is fine, or upload it to some place)
Thanks for providing the data. I think I know what happens.
Something in the kernel (not yet clear what) tries to move the hpet
irq 0 by calling irq_set_affinity(). That's an kernel internal
interface which does not check whether the NO BALANCE flag is set for
the irq. So the call runs and triggers the move from next interrupt
machinery which ends up calling irq_move_masked_irq() and that trips
over the flag and yells.
That's why I changed the WARN to a pr_warn() because we already know
the call stack.
So the core behaviour is inconsistent. We let the caller of
irq_set_affinity() succeed and yell later because we think it's wrong.
I'm pretty sure that we must drop the check for NO BALANCE in
irq_move_masked_irq() and only check for the per_cpu bit, but at the
same time I really want to know where that call to irq_set_affinity(irq0)
is coming from.
Can you please collect the output of /proc/timer_list for the previous
patch and then replace the previous patch with the one below and
gather all the data again?
Thanks,
tglx
On (06/19/15 14:21), Thomas Gleixner wrote:
> On Fri, 19 Jun 2015, Thomas Gleixner wrote:
> > On Fri, 19 Jun 2015, Sergey Senozhatsky wrote:
> > > [ 0.412291] WARNING: CPU: 0 PID: 0 at kernel/irq/migration.c:21 irq_move_masked_irq+0x57/0xc4()
> > > [ 0.412371] Can't balance irq 0 [edge]
> >
> > Yuck.
> >
> > > Do you guys want to replace WAN_ON() with WARN_ONCE(), perhaps? This, of course,
> > > doesn't fix anything; but at least one can boot the system. (not really a patch,
> > > just an idea).
> >
> > Indeed. We really want to clear the move pending bit before the can
> > balance check. Patch below. But that does not explain why this happens
> > in the first place.
> >
> > Can you please send me a full dmesg, kernel config and output of
> > /proc/interrupts ? (Private mail is fine, or upload it to some place)
>
> Thanks for providing the data. I think I know what happens.
>
> Something in the kernel (not yet clear what) tries to move the hpet
> irq 0 by calling irq_set_affinity(). That's an kernel internal
> interface which does not check whether the NO BALANCE flag is set for
> the irq. So the call runs and triggers the move from next interrupt
> machinery which ends up calling irq_move_masked_irq() and that trips
> over the flag and yells.
>
> That's why I changed the WARN to a pr_warn() because we already know
> the call stack.
>
> So the core behaviour is inconsistent. We let the caller of
> irq_set_affinity() succeed and yell later because we think it's wrong.
>
> I'm pretty sure that we must drop the check for NO BALANCE in
> irq_move_masked_irq() and only check for the per_cpu bit, but at the
> same time I really want to know where that call to irq_set_affinity(irq0)
> is coming from.
>
> Can you please collect the output of /proc/timer_list for the previous
> patch and then replace the previous patch with the one below and
> gather all the data again?
>
It's 10pm here in Korea and I'm out of office already. I'll try
to collect the data tomorrow (or on Monday in the worst case).
Thank you.
-ss
On 2015/6/19 20:21, Thomas Gleixner wrote:
> On Fri, 19 Jun 2015, Thomas Gleixner wrote:
>> On Fri, 19 Jun 2015, Sergey Senozhatsky wrote:
>>> [ 0.412291] WARNING: CPU: 0 PID: 0 at kernel/irq/migration.c:21 irq_move_masked_irq+0x57/0xc4()
>>> [ 0.412371] Can't balance irq 0 [edge]
>>
>> Yuck.
>>
>>> Do you guys want to replace WAN_ON() with WARN_ONCE(), perhaps? This, of course,
>>> doesn't fix anything; but at least one can boot the system. (not really a patch,
>>> just an idea).
>>
>> Indeed. We really want to clear the move pending bit before the can
>> balance check. Patch below. But that does not explain why this happens
>> in the first place.
>>
>> Can you please send me a full dmesg, kernel config and output of
>> /proc/interrupts ? (Private mail is fine, or upload it to some place)
>
> Thanks for providing the data. I think I know what happens.
>
> Something in the kernel (not yet clear what) tries to move the hpet
> irq 0 by calling irq_set_affinity(). That's an kernel internal
> interface which does not check whether the NO BALANCE flag is set for
> the irq. So the call runs and triggers the move from next interrupt
> machinery which ends up calling irq_move_masked_irq() and that trips
> over the flag and yells.
>
> That's why I changed the WARN to a pr_warn() because we already know
> the call stack.
>
> So the core behaviour is inconsistent. We let the caller of
> irq_set_affinity() succeed and yell later because we think it's wrong.
>
> I'm pretty sure that we must drop the check for NO BALANCE in
> irq_move_masked_irq() and only check for the per_cpu bit, but at the
> same time I really want to know where that call to irq_set_affinity(irq0)
> is coming from.
>
> Can you please collect the output of /proc/timer_list for the previous
> patch and then replace the previous patch with the one below and
> gather all the data again?
Hi Thomas,
Maybe it's caused by the hpet driver itself?
irq_set_affinity() may set the IRQD_SETAFFINITY_PENDING flag,
thus triggering the warning.
---------------------------------------------------------------
static int hpet_setup_irq(struct hpet_dev *dev)
{
if (request_irq(dev->irq, hpet_interrupt_handler,
IRQF_TIMER | IRQF_NOBALANCING,
dev->name, dev))
return -1;
disable_irq(dev->irq);
irq_set_affinity(dev->irq, cpumask_of(dev->cpu));
enable_irq(dev->irq);
printk(KERN_DEBUG "hpet: %s irq %d for MSI\n",
dev->name, dev->irq);
return 0;
}
-------------------------------------------------------------
Thanks!
Gerry
>
> Thanks,
>
> tglx
>
[...]
>> Something in the kernel (not yet clear what) tries to move the hpet
>> irq 0 by calling irq_set_affinity(). That's an kernel internal
>> interface which does not check whether the NO BALANCE flag is set for
>> the irq. So the call runs and triggers the move from next interrupt
>> machinery which ends up calling irq_move_masked_irq() and that trips
>> over the flag and yells.
>>
>> That's why I changed the WARN to a pr_warn() because we already know
>> the call stack.
>>
>> So the core behaviour is inconsistent. We let the caller of
>> irq_set_affinity() succeed and yell later because we think it's wrong.
>>
>> I'm pretty sure that we must drop the check for NO BALANCE in
>> irq_move_masked_irq() and only check for the per_cpu bit, but at the
>> same time I really want to know where that call to irq_set_affinity(irq0)
>> is coming from.
>>
>> Can you please collect the output of /proc/timer_list for the previous
>> patch and then replace the previous patch with the one below and
>> gather all the data again?
>
> Hi Thomas,
> Maybe it's caused by the hpet driver itself?
> irq_set_affinity() may set the IRQD_SETAFFINITY_PENDING flag,
> thus triggering the warning.
And the usage pattern seems reasonable, the IRQF_NOBALANCING flag
means nobody may change the affinity except myself:)
On Sat, 20 Jun 2015, Jiang Liu wrote:
> [...]
> >> Something in the kernel (not yet clear what) tries to move the hpet
> >> irq 0 by calling irq_set_affinity(). That's an kernel internal
> >> interface which does not check whether the NO BALANCE flag is set for
> >> the irq. So the call runs and triggers the move from next interrupt
> >> machinery which ends up calling irq_move_masked_irq() and that trips
> >> over the flag and yells.
> >>
> >> That's why I changed the WARN to a pr_warn() because we already know
> >> the call stack.
> >>
> >> So the core behaviour is inconsistent. We let the caller of
> >> irq_set_affinity() succeed and yell later because we think it's wrong.
> >>
> >> I'm pretty sure that we must drop the check for NO BALANCE in
> >> irq_move_masked_irq() and only check for the per_cpu bit, but at the
> >> same time I really want to know where that call to irq_set_affinity(irq0)
> >> is coming from.
> >>
> >> Can you please collect the output of /proc/timer_list for the previous
> >> patch and then replace the previous patch with the one below and
> >> gather all the data again?
> >
> > Hi Thomas,
> > Maybe it's caused by the hpet driver itself?
> > irq_set_affinity() may set the IRQD_SETAFFINITY_PENDING flag,
> > thus triggering the warning.
> And the usage pattern seems reasonable, the IRQF_NOBALANCING flag
> means nobody may change the affinity except myself:)
Right, that's why I removed the restriction. I just wonder why we have
not seen that before ...
Thanks,
tglx
On 2015/6/20 0:15, Thomas Gleixner wrote:
> On Sat, 20 Jun 2015, Jiang Liu wrote:
>
>> [...]
>>>> Something in the kernel (not yet clear what) tries to move the hpet
>>>> irq 0 by calling irq_set_affinity(). That's an kernel internal
>>>> interface which does not check whether the NO BALANCE flag is set for
>>>> the irq. So the call runs and triggers the move from next interrupt
>>>> machinery which ends up calling irq_move_masked_irq() and that trips
>>>> over the flag and yells.
>>>>
>>>> That's why I changed the WARN to a pr_warn() because we already know
>>>> the call stack.
>>>>
>>>> So the core behaviour is inconsistent. We let the caller of
>>>> irq_set_affinity() succeed and yell later because we think it's wrong.
>>>>
>>>> I'm pretty sure that we must drop the check for NO BALANCE in
>>>> irq_move_masked_irq() and only check for the per_cpu bit, but at the
>>>> same time I really want to know where that call to irq_set_affinity(irq0)
>>>> is coming from.
>>>>
>>>> Can you please collect the output of /proc/timer_list for the previous
>>>> patch and then replace the previous patch with the one below and
>>>> gather all the data again?
>>>
>>> Hi Thomas,
>>> Maybe it's caused by the hpet driver itself?
>>> irq_set_affinity() may set the IRQD_SETAFFINITY_PENDING flag,
>>> thus triggering the warning.
>> And the usage pattern seems reasonable, the IRQF_NOBALANCING flag
>> means nobody may change the affinity except myself:)
>
> Right, that's why I removed the restriction. I just wonder why we have
> not seen that before ...
I suspected it's caused by the hierarchy irqdomain at first glance
because the multiple irq_datas issue, but seems it's not after checking
the code. It will only be triggered if HPET works in MSI mode instead of
legacy IRQ mode, but still need more investigation here.
On (06/19/15 18:15), Thomas Gleixner wrote:
[..]
> > > Hi Thomas,
> > > Maybe it's caused by the hpet driver itself?
> > > irq_set_affinity() may set the IRQD_SETAFFINITY_PENDING flag,
> > > thus triggering the warning.
> > And the usage pattern seems reasonable, the IRQF_NOBALANCING flag
> > means nobody may change the affinity except myself:)
>
> Right, that's why I removed the restriction.
>
> I just wonder why we have not seen that before ...
just a note,
I compile/run every linux-next on at least two different
x86-64 boxen, w/o changing the .config a lot.
-ss
On Sat, 20 Jun 2015, Jiang Liu wrote:
> On 2015/6/20 0:15, Thomas Gleixner wrote:
> > On Sat, 20 Jun 2015, Jiang Liu wrote:
> >
> >> [...]
> >>>> Something in the kernel (not yet clear what) tries to move the hpet
> >>>> irq 0 by calling irq_set_affinity(). That's an kernel internal
> >>>> interface which does not check whether the NO BALANCE flag is set for
> >>>> the irq. So the call runs and triggers the move from next interrupt
> >>>> machinery which ends up calling irq_move_masked_irq() and that trips
> >>>> over the flag and yells.
> >>>>
> >>>> That's why I changed the WARN to a pr_warn() because we already know
> >>>> the call stack.
> >>>>
> >>>> So the core behaviour is inconsistent. We let the caller of
> >>>> irq_set_affinity() succeed and yell later because we think it's wrong.
> >>>>
> >>>> I'm pretty sure that we must drop the check for NO BALANCE in
> >>>> irq_move_masked_irq() and only check for the per_cpu bit, but at the
> >>>> same time I really want to know where that call to irq_set_affinity(irq0)
> >>>> is coming from.
> >>>>
> >>>> Can you please collect the output of /proc/timer_list for the previous
> >>>> patch and then replace the previous patch with the one below and
> >>>> gather all the data again?
> >>>
> >>> Hi Thomas,
> >>> Maybe it's caused by the hpet driver itself?
> >>> irq_set_affinity() may set the IRQD_SETAFFINITY_PENDING flag,
> >>> thus triggering the warning.
> >> And the usage pattern seems reasonable, the IRQF_NOBALANCING flag
> >> means nobody may change the affinity except myself:)
> >
> > Right, that's why I removed the restriction. I just wonder why we have
> > not seen that before ...
> I suspected it's caused by the hierarchy irqdomain at first glance
> because the multiple irq_datas issue, but seems it's not after checking
> the code. It will only be triggered if HPET works in MSI mode instead of
> legacy IRQ mode, but still need more investigation here.
Right. And what confuses me is that this happens on irq0, which is
ioapic edge.
Thanks,
tglx
On 2015/6/20 0:30, Thomas Gleixner wrote:
> On Sat, 20 Jun 2015, Jiang Liu wrote:
>> On 2015/6/20 0:15, Thomas Gleixner wrote:
>>> On Sat, 20 Jun 2015, Jiang Liu wrote:
>>>
>>>> [...]
>>>>>> Something in the kernel (not yet clear what) tries to move the hpet
>>>>>> irq 0 by calling irq_set_affinity(). That's an kernel internal
>>>>>> interface which does not check whether the NO BALANCE flag is set for
>>>>>> the irq. So the call runs and triggers the move from next interrupt
>>>>>> machinery which ends up calling irq_move_masked_irq() and that trips
>>>>>> over the flag and yells.
>>>>>>
>>>>>> That's why I changed the WARN to a pr_warn() because we already know
>>>>>> the call stack.
>>>>>>
>>>>>> So the core behaviour is inconsistent. We let the caller of
>>>>>> irq_set_affinity() succeed and yell later because we think it's wrong.
>>>>>>
>>>>>> I'm pretty sure that we must drop the check for NO BALANCE in
>>>>>> irq_move_masked_irq() and only check for the per_cpu bit, but at the
>>>>>> same time I really want to know where that call to irq_set_affinity(irq0)
>>>>>> is coming from.
>>>>>>
>>>>>> Can you please collect the output of /proc/timer_list for the previous
>>>>>> patch and then replace the previous patch with the one below and
>>>>>> gather all the data again?
>>>>>
>>>>> Hi Thomas,
>>>>> Maybe it's caused by the hpet driver itself?
>>>>> irq_set_affinity() may set the IRQD_SETAFFINITY_PENDING flag,
>>>>> thus triggering the warning.
>>>> And the usage pattern seems reasonable, the IRQF_NOBALANCING flag
>>>> means nobody may change the affinity except myself:)
>>>
>>> Right, that's why I removed the restriction. I just wonder why we have
>>> not seen that before ...
>> I suspected it's caused by the hierarchy irqdomain at first glance
>> because the multiple irq_datas issue, but seems it's not after checking
>> the code. It will only be triggered if HPET works in MSI mode instead of
>> legacy IRQ mode, but still need more investigation here.
>
> Right. And what confuses me is that this happens on irq0, which is
> ioapic edge.
Hi Thomas,
One possible cause is that hpet_assign_irq() fails, and the
code below treats 0 as a valid irq number(which should be fixed).
----------------------------------------------------------------------
static void hpet_msi_capability_lookup()
{
.........................
irq = hpet_assign_irq(hpet_domain, hdev, hdev->num);
if (irq < 0)
continue;
sprintf(hdev->name, "hpet%d", i);
hdev->num = i;
hdev->irq = irq;
hdev->flags = 0;
...............................
}
Hi Sergey,
Could you please help to apply the debug patch below
and check the log message?
Thanks!
Gerry
--------------------------------------------------------------------
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 40cf79a55fe7..28a65d94fd89 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -578,8 +578,11 @@ static void hpet_msi_capability_lookup(unsigned int
start_timer)
continue;
irq = hpet_assign_irq(hpet_domain, hdev, hdev->num);
- if (irq < 0)
+ if (irq <= 0) {
+ pr_warn("irqdomain: fails to allocate irq for
hpet%d\n",
+ hdev->num);
continue;
+ }
sprintf(hdev->name, "hpet%d", i);
hdev->num = i;
-----------------------------------------------------------------------
On (06/20/15 00:47), Jiang Liu wrote:
> Hi Sergey,
> Could you please help to apply the debug patch below
> and check the log message?
> Thanks!
Hi,
sure, as soon as I get back that box.
-ss
> Gerry
> --------------------------------------------------------------------
> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> index 40cf79a55fe7..28a65d94fd89 100644
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -578,8 +578,11 @@ static void hpet_msi_capability_lookup(unsigned int
> start_timer)
> continue;
>
> irq = hpet_assign_irq(hpet_domain, hdev, hdev->num);
> - if (irq < 0)
> + if (irq <= 0) {
> + pr_warn("irqdomain: fails to allocate irq for
> hpet%d\n",
> + hdev->num);
> continue;
> + }
>
> sprintf(hdev->name, "hpet%d", i);
> hdev->num = i;
> -----------------------------------------------------------------------
>
On (06/19/15 14:21), Thomas Gleixner wrote:
[..]
> Can you please collect the output of /proc/timer_list for the previous
> patch and then replace the previous patch with the one below and
> gather all the data again?
>
attached.
-ss
On (06/20/15 00:47), Jiang Liu wrote:
[..]
> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
> index 40cf79a55fe7..28a65d94fd89 100644
> --- a/arch/x86/kernel/hpet.c
> +++ b/arch/x86/kernel/hpet.c
> @@ -578,8 +578,11 @@ static void hpet_msi_capability_lookup(unsigned int
> start_timer)
> continue;
>
> irq = hpet_assign_irq(hpet_domain, hdev, hdev->num);
> - if (irq < 0)
> + if (irq <= 0) {
> + pr_warn("irqdomain: fails to allocate irq for
> hpet%d\n",
> + hdev->num);
> continue;
> + }
>
> sprintf(hdev->name, "hpet%d", i);
> hdev->num = i;
dmesg attached.
---
[ 0.288688] irqdomain: fails to allocate irq for hpet0
[ 0.288755] irqdomain: fails to allocate irq for hpet0
[ 0.288820] irqdomain: fails to allocate irq for hpet0
[ 0.288885] irqdomain: fails to allocate irq for hpet0
---
-ss
On Sat, 20 Jun 2015, Sergey Senozhatsky wrote:
> On (06/19/15 14:21), Thomas Gleixner wrote:
> [..]
> > Can you please collect the output of /proc/timer_list for the previous
> > patch and then replace the previous patch with the one below and
> > gather all the data again?
> >
>
> attached.
Can you please send me the dmesg of the second patch as well?
Thanks,
tglx
On (06/20/15 09:52), Thomas Gleixner wrote:
[..]
>
> Can you please send me the dmesg of the second patch as well?
>
I'm sorry, what second patch?
-ss
On Sat, 20 Jun 2015, Sergey Senozhatsky wrote:
> On (06/20/15 09:52), Thomas Gleixner wrote:
> [..]
> >
> > Can you please send me the dmesg of the second patch as well?
> >
>
> I'm sorry, what second patch?
The one which I forgot to append when I replied yesterday. Appended now.
Thanks,
tglx
----
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index e2449cf38b06..c47aab35a17e 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -578,7 +578,7 @@ static void hpet_msi_capability_lookup(unsigned int start_timer)
continue;
irq = hpet_assign_irq(hpet_domain, hdev, hdev->num);
- if (irq < 0)
+ if (irq <= 0)
continue;
sprintf(hdev->name, "hpet%d", i);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index b1c7e8f46bfb..49f9b6d2edf6 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -209,6 +209,9 @@ int irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask,
if (!chip || !chip->irq_set_affinity)
return -EINVAL;
+ /* Figure out where the set affinitycall comes from */
+ WARN_ON(data->irq == 0);
+
if (irq_can_move_pcntxt(data)) {
ret = irq_do_set_affinity(data, mask, force);
} else {
diff --git a/kernel/irq/migration.c b/kernel/irq/migration.c
index dd203e276b07..00bbabbc144b 100644
--- a/kernel/irq/migration.c
+++ b/kernel/irq/migration.c
@@ -12,16 +12,17 @@ void irq_move_masked_irq(struct irq_data *idata)
if (likely(!irqd_is_setaffinity_pending(&desc->irq_data)))
return;
+ irqd_clr_move_pending(&desc->irq_data);
+
+
/*
* Paranoia: cpu-local interrupts shouldn't be calling in here anyway.
*/
- if (!irqd_can_balance(&desc->irq_data)) {
+ if (irqd_is_per_cpu(&desc->irq_data)) {
WARN_ON(1);
return;
}
- irqd_clr_move_pending(&desc->irq_data);
-
if (unlikely(cpumask_empty(desc->pending_mask)))
return;
On (06/20/15 10:13), Thomas Gleixner wrote:
[..]
> > > Can you please send me the dmesg of the second patch as well?
> > >
> >
> > I'm sorry, what second patch?
>
> The one which I forgot to append when I replied yesterday. Appended now.
>
attached.
[ 0.197188] ------------[ cut here ]------------
[ 0.197250] WARNING: CPU: 0 PID: 1 at kernel/irq/manage.c:213 irq_set_affinity_locked+0x5b/0xf9()
[ 0.197329] Modules linked in:
[ 0.197426] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.1.0-rc8-next-20150619-dbg-00010-g841f367-dirty #234
[ 0.197507] Hardware name: SAMSUNG ELECTRONICS CO.,LTD Samsung DeskTop System/Samsung DeskTop System, BIOS 05CC 04/09/2010
[ 0.197594] 0000000000000009 ffff880133127db8 ffffffff814f2add ffffffff810851a4
[ 0.197824] 0000000000000000 ffff880133127df8 ffffffff8103d196 0000000000000000
[ 0.198052] ffffffff8108795c ffff880133007c00 0000000000000000 ffff880133007c00
[ 0.198281] Call Trace:
[ 0.198339] [<ffffffff814f2add>] dump_stack+0x4c/0x6e
[ 0.198401] [<ffffffff810851a4>] ? console_unlock+0x400/0x42f
[ 0.198465] [<ffffffff8103d196>] warn_slowpath_common+0x9b/0xb5
[ 0.198528] [<ffffffff8108795c>] ? irq_set_affinity_locked+0x5b/0xf9
[ 0.198592] [<ffffffff8103d253>] warn_slowpath_null+0x1a/0x1c
[ 0.198654] [<ffffffff8108795c>] irq_set_affinity_locked+0x5b/0xf9
[ 0.198717] [<ffffffff81087a46>] __irq_set_affinity+0x4c/0x67
[ 0.198781] [<ffffffff81b1d48c>] setup_ioapic_dest+0xa7/0xbf
[ 0.198843] [<ffffffff810851a4>] ? console_unlock+0x400/0x42f
[ 0.198906] [<ffffffff81b1b1ef>] native_smp_cpus_done+0xa4/0xae
[ 0.198970] [<ffffffff81b2408a>] smp_init+0x75/0x7a
[ 0.199032] [<ffffffff81b0dfc7>] kernel_init_freeable+0x8e/0x198
[ 0.199096] [<ffffffff814ef8d3>] ? rest_init+0x13a/0x13a
[ 0.199158] [<ffffffff814ef8e1>] kernel_init+0xe/0xd4
[ 0.199219] [<ffffffff814f953f>] ret_from_fork+0x3f/0x70
[ 0.199281] [<ffffffff814ef8d3>] ? rest_init+0x13a/0x13a
[ 0.199359] ---[ end trace 3b2deb957481a854 ]---
-ss
On Sat, 20 Jun 2015, Sergey Senozhatsky wrote:
> On (06/20/15 10:13), Thomas Gleixner wrote:
> [..]
> > > > Can you please send me the dmesg of the second patch as well?
> > > >
> > >
> > > I'm sorry, what second patch?
> >
> > The one which I forgot to append when I replied yesterday. Appended now.
> >
> [ 0.198654] [<ffffffff8108795c>] irq_set_affinity_locked+0x5b/0xf9
> [ 0.198717] [<ffffffff81087a46>] __irq_set_affinity+0x4c/0x67
> [ 0.198781] [<ffffffff81b1d48c>] setup_ioapic_dest+0xa7/0xbf
> [ 0.198843] [<ffffffff810851a4>] ? console_unlock+0x400/0x42f
> [ 0.198906] [<ffffffff81b1b1ef>] native_smp_cpus_done+0xa4/0xae
> [ 0.198970] [<ffffffff81b2408a>] smp_init+0x75/0x7a
OK. It's setup_ioapic_dest() and I understand now where this all comes
from.
One last request. Can you please provide /proc/timer_list and dmesg
from a mainline kernel for comparison?
Thanks,
tglx
Commit-ID: bafac298fb20e9ae1305c710d4fd8d20c5911afa
Gitweb: http://git.kernel.org/tip/bafac298fb20e9ae1305c710d4fd8d20c5911afa
Author: Jiang Liu <[email protected]>
AuthorDate: Sat, 20 Jun 2015 11:50:50 +0200
Committer: Thomas Gleixner <[email protected]>
CommitDate: Sat, 20 Jun 2015 12:00:58 +0200
x86/hpet: Check for irq==0 when allocating hpet MSI interrupts
irq == 0 is not a valid irq for a irqdomain MSI allocation, but hpet
code checks only for negative return values.
Reported-by: Sergey Senozhatsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/kernel/hpet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index e2449cf..c47aab3 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -578,7 +578,7 @@ static void hpet_msi_capability_lookup(unsigned int start_timer)
continue;
irq = hpet_assign_irq(hpet_domain, hdev, hdev->num);
- if (irq < 0)
+ if (irq <= 0)
continue;
sprintf(hdev->name, "hpet%d", i);
Commit-ID: a614a610ac9b28f195d790d25be72d26f345c53a
Gitweb: http://git.kernel.org/tip/a614a610ac9b28f195d790d25be72d26f345c53a
Author: Thomas Gleixner <[email protected]>
AuthorDate: Sat, 20 Jun 2015 12:05:40 +0200
Committer: Thomas Gleixner <[email protected]>
CommitDate: Sat, 20 Jun 2015 19:05:14 +0200
genirq: Remove bogus restriction in irq_move_mask_irq()
If an interrupt is marked with the no balancing flag, we still allow
setting the affinity for such an interrupt from the kernel itself, but
for interrupts which move the affinity from interrupt context via
irq_move_mask_irq() this runs into a check for the no balancing flag,
which in turn ends up with an endless storm of stack dumps because the
move pending flag is not reset.
Allow the move for interrupts which have the no balancing flag set and
clear the move pending bit before checking for interrupts with the per
cpu flag set.
Reported-by: Sergey Senozhatsky <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Jiang Liu <[email protected]>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1506201002570.4107@nanos
Signed-off-by: Thomas Gleixner <[email protected]>
---
kernel/irq/migration.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/irq/migration.c b/kernel/irq/migration.c
index dd203e2..37ddb7b 100644
--- a/kernel/irq/migration.c
+++ b/kernel/irq/migration.c
@@ -12,16 +12,16 @@ void irq_move_masked_irq(struct irq_data *idata)
if (likely(!irqd_is_setaffinity_pending(&desc->irq_data)))
return;
+ irqd_clr_move_pending(&desc->irq_data);
+
/*
* Paranoia: cpu-local interrupts shouldn't be calling in here anyway.
*/
- if (!irqd_can_balance(&desc->irq_data)) {
+ if (irqd_is_per_cpu(&desc->irq_data)) {
WARN_ON(1);
return;
}
- irqd_clr_move_pending(&desc->irq_data);
-
if (unlikely(cpumask_empty(desc->pending_mask)))
return;
On (06/20/15 11:12), Thomas Gleixner wrote:
[..]
> > [ 0.198654] [<ffffffff8108795c>] irq_set_affinity_locked+0x5b/0xf9
> > [ 0.198717] [<ffffffff81087a46>] __irq_set_affinity+0x4c/0x67
> > [ 0.198781] [<ffffffff81b1d48c>] setup_ioapic_dest+0xa7/0xbf
> > [ 0.198843] [<ffffffff810851a4>] ? console_unlock+0x400/0x42f
> > [ 0.198906] [<ffffffff81b1b1ef>] native_smp_cpus_done+0xa4/0xae
> > [ 0.198970] [<ffffffff81b2408a>] smp_init+0x75/0x7a
>
> OK. It's setup_ioapic_dest() and I understand now where this all comes
> from.
>
> One last request. Can you please provide /proc/timer_list and dmesg
> from a mainline kernel for comparison?
>
sorry, was offline. sure, will send out tomorrow.
-ss
On (06/20/15 11:12), Thomas Gleixner wrote:
[..]
>
> One last request. Can you please provide /proc/timer_list and dmesg
> from a mainline kernel for comparison?
>
attached.
-ss
On Sun, 21 Jun 2015, Sergey Senozhatsky wrote:
> On (06/20/15 11:12), Thomas Gleixner wrote:
> > One last request. Can you please provide /proc/timer_list and dmesg
> > from a mainline kernel for comparison?
>
> attached.
Thanks for providing that. As I feared already, there is a regression
as well. The new code fails to allocated more than one hpet MSI
interrupt. Fix below.
Thanks,
tglx
---
Subject: x86/hpet: Use proper hpet device number for MSI allocation
From: Thomas Gleixner <[email protected]>
Date: Sun, 21 Jun 2015 16:21:50 +0200
hpet_assign_irq() is called with hpet_device->num as "hardware
interrupt number", but hpet_device->num is initialized after the
interrupt has been assigned, so it's always 0. As a consequence only
the first MSI allocation succeeds, the following ones fail because the
"hardware interrupt number" already exists.
Move the initialization of dev->num and other fields before the call
to hpet_assign_irq(), which is the ordering before the offending
commit which introduced that regression.
Fixes: "3cb96f0c9733 x86/hpet: Enhance HPET IRQ to support hierarchical irqdomains"
Reported-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Borislav Petkov <[email protected]>
---
arch/x86/kernel/hpet.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
Index: tip/arch/x86/kernel/hpet.c
===================================================================
--- tip.orig/arch/x86/kernel/hpet.c
+++ tip/arch/x86/kernel/hpet.c
@@ -577,16 +577,17 @@ static void hpet_msi_capability_lookup(u
if (!(cfg & HPET_TN_FSB_CAP))
continue;
+ hdev->flags = 0;
+ if (cfg & HPET_TN_PERIODIC_CAP)
+ hdev->flags |= HPET_DEV_PERI_CAP;
+ sprintf(hdev->name, "hpet%d", i);
+ hdev->num = i;
+
irq = hpet_assign_irq(hpet_domain, hdev, hdev->num);
if (irq <= 0)
continue;
- sprintf(hdev->name, "hpet%d", i);
- hdev->num = i;
hdev->irq = irq;
- hdev->flags = 0;
- if (cfg & HPET_TN_PERIODIC_CAP)
- hdev->flags |= HPET_DEV_PERI_CAP;
hdev->flags |= HPET_DEV_FSB_CAP;
hdev->flags |= HPET_DEV_VALID;
num_timers_used++;
Commit-ID: cb17b2a674f2059343f997599b4b001e64eec516
Gitweb: http://git.kernel.org/tip/cb17b2a674f2059343f997599b4b001e64eec516
Author: Thomas Gleixner <[email protected]>
AuthorDate: Sun, 21 Jun 2015 16:21:50 +0200
Committer: Thomas Gleixner <[email protected]>
CommitDate: Sun, 21 Jun 2015 16:38:40 +0200
x86/hpet: Use proper hpet device number for MSI allocation
hpet_assign_irq() is called with hpet_device->num as "hardware
interrupt number", but hpet_device->num is initialized after the
interrupt has been assigned, so it's always 0. As a consequence only
the first MSI allocation succeeds, the following ones fail because the
"hardware interrupt number" already exists.
Move the initialization of dev->num and other fields before the call
to hpet_assign_irq(), which is the ordering before the offending
commit which introduced that regression.
Fixes: "3cb96f0c9733 x86/hpet: Enhance HPET IRQ to support hierarchical irqdomains"
Reported-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1506211635010.4107@nanos
Cc: Jiang Liu <[email protected]>
Cc: Borislav Petkov <[email protected]>
---
arch/x86/kernel/hpet.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index c47aab3..10757d0 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -577,16 +577,17 @@ static void hpet_msi_capability_lookup(unsigned int start_timer)
if (!(cfg & HPET_TN_FSB_CAP))
continue;
+ hdev->flags = 0;
+ if (cfg & HPET_TN_PERIODIC_CAP)
+ hdev->flags |= HPET_DEV_PERI_CAP;
+ sprintf(hdev->name, "hpet%d", i);
+ hdev->num = i;
+
irq = hpet_assign_irq(hpet_domain, hdev, hdev->num);
if (irq <= 0)
continue;
- sprintf(hdev->name, "hpet%d", i);
- hdev->num = i;
hdev->irq = irq;
- hdev->flags = 0;
- if (cfg & HPET_TN_PERIODIC_CAP)
- hdev->flags |= HPET_DEV_PERI_CAP;
hdev->flags |= HPET_DEV_FSB_CAP;
hdev->flags |= HPET_DEV_VALID;
num_timers_used++;