Subject: do_IRQ: 0.126 No irq handler for vector (irq -1)

Hello list,

I have 36 servers, all running a vanilla 3.18.18 kernel, under very
high disk and network load.

For the past few days I have regularly been seeing the following error
messages and, quite often, completely hanging disk I/O:
[535040.439859] do_IRQ: 0.126 No irq handler for vector (irq -1)
[548400.353679] do_IRQ: 2.109 No irq handler for vector (irq -1)
[551624.894507] do_IRQ: 4.84 No irq handler for vector (irq -1)
[557524.288691] do_IRQ: 1.158 No irq handler for vector (irq -1)
[559786.928441] do_IRQ: 3.172 No irq handler for vector (irq -1)
[572906.281394] do_IRQ: 3.72 No irq handler for vector (irq -1)
[576611.808128] do_IRQ: 3.118 No irq handler for vector (irq -1)
[577242.682643] do_IRQ: 2.45 No irq handler for vector (irq -1)
[578524.584545] do_IRQ: 5.190 No irq handler for vector (irq -1)
[602109.548268] do_IRQ: 3.101 No irq handler for vector (irq -1)

All systems are Single E5 Xeons and I'm running irqbalance on them.

Chipset:
Intel C602J chipset

Is there anything I can do to fix this? Is there perhaps a kernel patch
available?

Thanks!

Greets,
Stefan
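
A note on reading the messages above: as far as I can tell from the format string in the 3.18-era x86 do_IRQ() code, the two numbers after "do_IRQ:" are the CPU that took the interrupt and the vector number, so "3.172" would be vector 172 on CPU 3. The following is a minimal sketch for tallying such lines per CPU and per vector. The function names are made up for illustration, and the sample is simply the ten lines from the report.

```python
import re
from collections import Counter

# Matches lines like:
#   [535040.439859] do_IRQ: 0.126 No irq handler for vector (irq -1)
# Assumption: the pair after "do_IRQ:" is <cpu>.<vector>.
LINE_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+do_IRQ:\s+(\d+)\.(\d+)\s+No irq handler for vector"
)

# The ten messages quoted in the report.
SAMPLE = """\
[535040.439859] do_IRQ: 0.126 No irq handler for vector (irq -1)
[548400.353679] do_IRQ: 2.109 No irq handler for vector (irq -1)
[551624.894507] do_IRQ: 4.84 No irq handler for vector (irq -1)
[557524.288691] do_IRQ: 1.158 No irq handler for vector (irq -1)
[559786.928441] do_IRQ: 3.172 No irq handler for vector (irq -1)
[572906.281394] do_IRQ: 3.72 No irq handler for vector (irq -1)
[576611.808128] do_IRQ: 3.118 No irq handler for vector (irq -1)
[577242.682643] do_IRQ: 2.45 No irq handler for vector (irq -1)
[578524.584545] do_IRQ: 5.190 No irq handler for vector (irq -1)
[602109.548268] do_IRQ: 3.101 No irq handler for vector (irq -1)
"""


def parse_events(text):
    """Return (timestamp_seconds, cpu, vector) for every matching line."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            events.append((float(m.group(1)), int(m.group(2)), int(m.group(3))))
    return events


def tally(events):
    """Count events per CPU and per vector."""
    per_cpu = Counter(cpu for _, cpu, _ in events)
    per_vector = Counter(vector for _, _, vector in events)
    return per_cpu, per_vector


events = parse_events(SAMPLE)
per_cpu, per_vector = tally(events)
print("events per CPU:", dict(sorted(per_cpu.items())))
print("distinct vectors:", len(per_vector))
```

On this sample the messages are spread across CPUs 0 to 5 and no vector repeats. To run it on a real machine, pass the saved dmesg text to parse_events() instead of SAMPLE.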


2015-07-20 10:53:42

by Thomas Gleixner

Subject: Re: do_IRQ: 0.126 No irq handler for vector (irq -1)

On Mon, 20 Jul 2015, Stefan Priebe - Profihost AG wrote:
> Hello list,
>
> I have 36 servers, all running a vanilla 3.18.18 kernel, under very
> high disk and network load.
>
> For the past few days I have regularly been seeing the following error
> messages and, quite often, completely hanging disk I/O:
> [535040.439859] do_IRQ: 0.126 No irq handler for vector (irq -1)

Did this happen right after you updated to 3.18.18?

Which kernel version were you using before that?

Have you observed such error messages before the update?

> All systems are Single E5 Xeons and I'm running irqbalance on them.

Does it stop if you disable irqbalance ?

Thanks,

tglx
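
One way to see what irqbalance (or its absence) changes is to compare two snapshots of /proc/interrupts and look at which CPUs each IRQ actually fired on in between. The sketch below is only an illustration: the function names are hypothetical, and the two sample snapshots are made-up numbers, not data from the affected machines.

```python
def parse_proc_interrupts(text):
    """Parse /proc/interrupts-style text into {irq_label: [count per CPU]}."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for field in rest.split()[:ncpu]:
            if not field.isdigit():       # reached the chip/device description
                break
            counts.append(int(field))
        table[label.strip()] = counts
    return table


def delta(before, after):
    """Per-IRQ, per-CPU increase between two parsed snapshots."""
    result = {}
    for irq, now in after.items():
        old = before.get(irq, [0] * len(now))
        result[irq] = [n - o for n, o in zip(now, old)]
    return result


# Made-up snapshots for illustration only (two CPUs, three rows).
SNAP_BEFORE = """\
           CPU0       CPU1
 24:       1000        200   PCI-MSI-edge      eth0-rx-0
 25:         50       3000   PCI-MSI-edge      ahci
NMI:          3          4   Non-maskable interrupts
"""
SNAP_AFTER = """\
           CPU0       CPU1
 24:       1500        200   PCI-MSI-edge      eth0-rx-0
 25:         50       3900   PCI-MSI-edge      ahci
NMI:          3          4   Non-maskable interrupts
"""

changes = delta(parse_proc_interrupts(SNAP_BEFORE),
                parse_proc_interrupts(SNAP_AFTER))
for irq, per_cpu_delta in changes.items():
    print(irq, per_cpu_delta)
```

On a live system the two texts would come from reading /proc/interrupts a few seconds apart. With irqbalance stopped, an IRQ whose counts keep growing on the same CPU has not been migrated in that interval.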

Subject: Re: do_IRQ: 0.126 No irq handler for vector (irq -1)

On 20.07.2015 at 12:53, Thomas Gleixner wrote:
> On Mon, 20 Jul 2015, Stefan Priebe - Profihost AG wrote:
>> Hello list,
>>
>> I have 36 servers, all running a vanilla 3.18.18 kernel, under very
>> high disk and network load.
>>
>> For the past few days I have regularly been seeing the following error
>> messages and, quite often, completely hanging disk I/O:
>> [535040.439859] do_IRQ: 0.126 No irq handler for vector (irq -1)
>
> Did this happen right after you updated to 3.18.18?
> Which kernel version were you using before that?
> Have you observed such error messages before the update?

No, it always worked before, or at least I never noticed a device hang
or such messages. The previous kernel was vanilla 3.10.78.

>> All systems are Single E5 Xeons and I'm running irqbalance on them.
>
> Does it stop if you disable irqbalance ?

Will try that today.

Stefan

Subject: Re: do_IRQ: 0.126 No irq handler for vector (irq -1)


On 20.07.2015 at 12:53, Thomas Gleixner wrote:
> On Mon, 20 Jul 2015, Stefan Priebe - Profihost AG wrote:
>> Hello list,
>>
>> I have 36 servers, all running a vanilla 3.18.18 kernel, under very
>> high disk and network load.
>>
>> For the past few days I have regularly been seeing the following error
>> messages and, quite often, completely hanging disk I/O:
>> [535040.439859] do_IRQ: 0.126 No irq handler for vector (irq -1)
>
> Did this happen right after you updated to 3.18.18?

Seems so.

> Which kernel version were you using before that?

3.10.78

> Have you observed such error messages before the update?

No.

>> All systems are Single E5 Xeons and I'm running irqbalance on them.
>
> Does it stop if you disable irqbalance ?

No. The machines still crash.

Stefan

>
> Thanks,
>
> tglx
>

2015-07-21 21:15:20

by Thomas Gleixner

Subject: Re: do_IRQ: 0.126 No irq handler for vector (irq -1)

On Tue, 21 Jul 2015, Stefan Priebe wrote:
> On 20.07.2015 at 12:53, Thomas Gleixner wrote:
> > On Mon, 20 Jul 2015, Stefan Priebe - Profihost AG wrote:
> > > Hello list,
> > >
> > > I have 36 servers, all running a vanilla 3.18.18 kernel, under very
> > > high disk and network load.
> > >
> > > For the past few days I have regularly been seeing the following error
> > > messages and, quite often, completely hanging disk I/O:
> > > [535040.439859] do_IRQ: 0.126 No irq handler for vector (irq -1)
> > >
> > > All systems are Single E5 Xeons and I'm running irqbalance on them.
> >
> > Does it stop if you disable irqbalance ?
>
> No. The machines still crash.

crash as in running into a BUG? Or is it just that disk I/O is stalled?

Can you please provide the full dmesg output of such a machine?

I'll cook up a debug patch for that against 3.18.18.

Thanks,

tglx

Subject: Re: do_IRQ: 0.126 No irq handler for vector (irq -1)


On 21.07.2015 at 23:15, Thomas Gleixner wrote:
> On Tue, 21 Jul 2015, Stefan Priebe wrote:
>> On 20.07.2015 at 12:53, Thomas Gleixner wrote:
>>> On Mon, 20 Jul 2015, Stefan Priebe - Profihost AG wrote:
>>>> Hello list,
>>>>
>>>> I have 36 servers, all running a vanilla 3.18.18 kernel, under very
>>>> high disk and network load.
>>>>
>>>> For the past few days I have regularly been seeing the following error
>>>> messages and, quite often, completely hanging disk I/O:
>>>> [535040.439859] do_IRQ: 0.126 No irq handler for vector (irq -1)
>>>>
>>>> All systems are Single E5 Xeons and I'm running irqbalance on them.
>>>
>>> Does it stop if you disable irqbalance ?
>>
>> No. The machines still crash.
>
> crash as in running into a BUG? Or is it just that disk I/O is stalled?

Sorry, I meant that I/O is stalled. It looks like a crash to me because
I can't log in anymore due to the hanging I/O.

> Can you please provide the full dmesg output of such a machine?

Yes (this time from a machine using 3.18.14) =>
http://pastebin.com/raw.php?i=S6kAk0iS

> I'll cook up a debug patch for that against 3.18.18.

That would be great!

Stefan

> Thanks,
>
> tglx
>

Subject: Re: do_IRQ: 0.126 No irq handler for vector (irq -1)


On 22.07.2015 at 09:23, Stefan Priebe - Profihost AG wrote:
>
> On 21.07.2015 at 23:15, Thomas Gleixner wrote:
>> On Tue, 21 Jul 2015, Stefan Priebe wrote:
>>> On 20.07.2015 at 12:53, Thomas Gleixner wrote:
>>>> On Mon, 20 Jul 2015, Stefan Priebe - Profihost AG wrote:
>>>>> Hello list,
>>>>>
>>>>> I have 36 servers, all running a vanilla 3.18.18 kernel, under very
>>>>> high disk and network load.
>>>>>
>>>>> For the past few days I have regularly been seeing the following error
>>>>> messages and, quite often, completely hanging disk I/O:
>>>>> [535040.439859] do_IRQ: 0.126 No irq handler for vector (irq -1)
>>>>>
>>>>> All systems are Single E5 Xeons and I'm running irqbalance on them.
>>>>
>>>> Does it stop if you disable irqbalance ?
>>>
>>> No. The machines still crash.
>>
>> crash as in running into a BUG? Or is it just that disk I/O is stalled?
>
> Sorry, I meant that I/O is stalled. It looks like a crash to me because
> I can't log in anymore due to the hanging I/O.
>
>> Can you please provide the full dmesg output of such a machine?
>
> Yes (this time from a machine using 3.18.14) =>
> http://pastebin.com/raw.php?i=S6kAk0iS
>
>> I'll cook up a debug patch for that against 3.18.18.

Do you have any special upstream commits in mind?

Stefan

>
> That would be great!
>
> Stefan
>
>> Thanks,
>>
>> tglx
>>

2015-07-26 19:42:26

by Thomas Gleixner

Subject: Re: do_IRQ: 0.126 No irq handler for vector (irq -1)

On Thu, 23 Jul 2015, Stefan Priebe wrote:
> Do you have any special upstream commits in mind?

Not yet. Could you run a test with 3.16 and 3.17 please? That would
narrow down the problem space quite a bit.

Thanks,

tglx
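
To illustrate how testing 3.16 and 3.17 narrows things down: the report has the 3.10 series working and the 3.18 series failing, so each extra data point shrinks the window in which the regression must have been introduced. The sketch below is hypothetical bookkeeping only. The function name is invented, it treats each release series as a single point, and it assumes one regression point (once a release is bad, all later ones are bad too). The 3.16 and 3.17 outcomes shown are placeholders, not real test results.

```python
# Mainline release series between the known-good and known-bad kernels.
RELEASES = [f"3.{n}" for n in range(10, 19)]   # "3.10" ... "3.18"


def narrow(known, releases=RELEASES):
    """Return (last_good, first_bad) given {release: problem_reproduces}.

    Assumes a single regression point: every release after the first bad
    one is also bad. Raises ValueError if the results contradict that.
    """
    good = [releases.index(r) for r, bad in known.items() if not bad]
    bad = [releases.index(r) for r, bad in known.items() if bad]
    if not good or not bad:
        raise ValueError("need at least one good and one bad release")
    last_good, first_bad = max(good), min(bad)
    if last_good >= first_bad:
        raise ValueError("results do not fit a single regression point")
    return releases[last_good], releases[first_bad]


# What the thread establishes so far: 3.10.x worked, 3.18.x does not.
reported = {"3.10": False, "3.18": True}
print("before further tests:", narrow(reported))

# Placeholder outcomes for the two requested test runs (not real data).
for r316, r317 in [(False, False), (False, True), (True, True)]:
    outcome = dict(reported, **{"3.16": r316, "3.17": r317})
    print(f"3.16 bad={r316}, 3.17 bad={r317} ->", narrow(outcome))
```

Whatever the two runs show, the suspect window drops from eight release steps to either a single step (3.16 to 3.17, or 3.17 to 3.18) or the 3.10 to 3.16 range, which is what makes the two tests worthwhile before looking for specific commits.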