2016-10-17 14:50:05

by Nathan Zimmer

Subject: console issue since 3.6, console=ttyS1 hangs

A cluster client recently tried to update from Sles11 to Sles12 and found in
some cases the boxes would hang in early boot. It came down to console=ttyS1
on the command line. After a bisection I found it happened here:

commit 835d844d1a28efba81d5aca7385e24c29d3a6db2
Author: Sean Young <[email protected]>
Date: Fri Sep 7 19:06:23 2012 +0100

8250_pnp: do pnp probe before legacy probe


I found I can revert the part of the patch in 8250.c (now 8250_core.c) and
the hangs do not happen.

BIOS of the offending box (I don't know if there is a BIOS update):
Version 2.15.1234. Copyright (C) 2012 American Megatrends, Inc.
BIOS Date: 02/05/2014 13:45:09
Ver: ma2e2054.16I

I don't have much more info but will collect anything that is asked.
Any help would be appreciated.

Nate


2016-10-17 15:19:18

by Sean Young

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Mon, Oct 17, 2016 at 09:49:51AM -0500, Nathan Zimmer wrote:
> A cluster client recently tried to update from Sles11 to Sles12 and found in
> some cases the boxes would hang in early boot. It came down to console=ttyS1
> on the command line. After a bisection I found it happened here:
>
> commit 835d844d1a28efba81d5aca7385e24c29d3a6db2
> Author: Sean Young <[email protected]>
> Date: Fri Sep 7 19:06:23 2012 +0100
>
> 8250_pnp: do pnp probe before legacy probe
>
>
> I found I can revert the part of the patch in 8250.c (now 8250_core.c) and
> the hangs do not happen.
>
> BIOS of the offending box (I don't know if there is a BIOS update):
> Version 2.15.1234. Copyright (C) 2012 American Megatrends, Inc.
> BIOS Date: 02/05/2014 13:45:09
> Ver: ma2e2054.16I
>
> I don't have much more info but will collect anything that is asked.
> Any help would be appreciated.

The console output would be helpful (both before 3.6 and with 3.6). It
could be that what the bios provides in pnp does not match the actual
config.

The output of:

cat /sys/bus/pnp/drivers/serial/*/resources

would be helpful, thanks.


Sean

2016-10-17 16:41:50

by Nathan Zimmer

Subject: Re: console issue since 3.6, console=ttyS1 hangs

OK, I'll get that sometime tomorrow. Right now they have pulled it down for maintenance.


2016-10-18 16:40:17

by Nathan Zimmer

Subject: Re: console issue since 3.6, console=ttyS1 hangs

3.7.0
cat /sys/bus/pnp/drivers/serial/*/resources
state = active
io 0x2f8-0x2ff
irq 12
dma disabled

3.6.0
:~ # cat /sys/bus/pnp/drivers/serial/*/resources
cat: /sys/bus/pnp/drivers/serial/*/resources: No such file or directory

Which is interesting.
So I thought tacking on the output of "tail /sys/devices/pnp0/*/resources" might be helpful.





Attachments:
(No filename) (1.77 kB)
info_3.6.0 (2.38 kB)
info_3.7.0 (2.41 kB)

2016-10-18 18:05:28

by Sean Young

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Tue, Oct 18, 2016 at 11:40:04AM -0500, Nathan Zimmer wrote:
> 3.7.0
> cat /sys/bus/pnp/drivers/serial/*/resources
> state = active
> io 0x2f8-0x2ff
> irq 12
> dma disabled
>
> 3.6.0
> :~ # cat /sys/bus/pnp/drivers/serial/*/resources
> cat: /sys/bus/pnp/drivers/serial/*/resources: No such file or directory

irq 12 for ttyS1? That should be irq 3. The bios is putting bogus information
in pnp. Maybe there is rubbish in the bios setup or maybe it's fixed in a
newer bios update.

So before this change, the kernel would assume irq 3. After this change,
the kernel first uses the information in pnp to see where the serial
port is. It gets told that it's irq 12 and presumably it runs into all
sorts of problems then. If memory serves that's the irq for the ps/2 mouse.

The interesting bit is in 3.6.0:

setserial
/dev/ttyS1, UART: 16550A, Port: 0x02f8, IRQ: 3

becomes in 3.7.0:

setserial
/dev/ttyS1, UART: 16550A, Port: 0x02f8, IRQ: 12

You should be able to set the right irq with setserial, but obviously
that doesn't help you if it fails in early boot. It's not immediately
obvious to me what can be done in the kernel for this. Maybe the dmesg
output could inspire, thanks.


Sean

2016-10-18 19:29:43

by Nathan Zimmer

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Tue, Oct 18, 2016 at 07:05:18PM +0100, Sean Young wrote:
> On Tue, Oct 18, 2016 at 11:40:04AM -0500, Nathan Zimmer wrote:
> > 3.7.0
> > cat /sys/bus/pnp/drivers/serial/*/resources
> > state = active
> > io 0x2f8-0x2ff
> > irq 12
> > dma disabled
> >
> > 3.6.0
> > :~ # cat /sys/bus/pnp/drivers/serial/*/resources
> > cat: /sys/bus/pnp/drivers/serial/*/resources: No such file or directory
>
> irq 12 for ttyS1? That should be irq 3. The bios is putting bogus information
> in pnp. Maybe there is rubbish in the bios setup or maybe it's fixed in a
> newer bios update.
>
> So before this change, the kernel would assume irq 3. After this change,
> the kernel first uses the information in pnp to see where the serial
> port is. It gets told that it's irq 12 and presumably it runs into all
> sorts of problems then. If memory serves that's the irq for the ps/2 mouse.
>
> The interesting bit is in 3.6.0:
>
> setserial
> /dev/ttyS1, UART: 16550A, Port: 0x02f8, IRQ: 3
>
> becomes in 3.7.0:
>
> setserial
> /dev/ttyS1, UART: 16550A, Port: 0x02f8, IRQ: 12
>
> You should be able to set the right irq with setserial, but obviously
> that doesn't help you if it fails in early boot. It's not immediately
> obvious to me what can be done in the kernel for this. Maybe the dmesg
> output could inspire, thanks.
>
>
> Sean

Yeah, the changing irq seemed weird to me too, but I couldn't come up with a guess as to why.

Here are the dmesgs.

Nate


Attachments:
(No filename) (1.41 kB)
dmesg_3.6.0 (97.30 kB)
dmesg_3.7.0 (95.04 kB)

2016-10-19 15:03:50

by Sean Young

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Tue, Oct 18, 2016 at 02:29:30PM -0500, Nathan Zimmer wrote:
> Yeah, the changing irq seemed weird to me too, but I couldn't come up with a guess as to why.
>
> Here are the dmesgs.

So with 3.6.0:

> [ 2.079980] Serial: 8250/16550 driver, 8 ports, IRQ sharing disabled
> [ 2.100887] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
> [ 2.101715] serial 00:04: unable to assign resources
> [ 2.102174] serial: probe of 00:04 failed with error -16

The pnp probe fails for some reason. I don't understand why.

With 3.7.0:

> [ 2.062700] Serial: 8250/16550 driver, 8 ports, IRQ sharing disabled
> [ 2.063250] serial 00:04: [io 0x02f8-0x02ff]
> [ 2.063875] serial 00:04: [irq 12]
> [ 2.064345] serial 00:04: [dma 18446744073709551615 disabled]
> [ 2.065540] serial 00:04: activated
> [ 2.086442] 00:04: ttyS1 at I/O 0x2f8 (irq = 12) is a 16550A

Now the pnp probe succeeds (with broken irq from pnp).

Can you please check if there is a wrong irq configured in the bios setup
or if there is a bios update available? I don't know why this worked in
the first place.


Sean

2016-10-19 22:13:38

by Nathan Zimmer

Subject: Re: console issue since 3.6, console=ttyS1 hangs



On 10/19/2016 04:07 AM, Sean Young wrote:
> So with 3.6.0:
>
>> [ 2.079980] Serial: 8250/16550 driver, 8 ports, IRQ sharing disabled
>> [ 2.100887] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
>> [ 2.101715] serial 00:04: unable to assign resources
>> [ 2.102174] serial: probe of 00:04 failed with error -16
> The pnp probe fails for some reason. I don't understand why.
>
> With 3.7.0:
>
>> [ 2.062700] Serial: 8250/16550 driver, 8 ports, IRQ sharing disabled
>> [ 2.063250] serial 00:04: [io 0x02f8-0x02ff]
>> [ 2.063875] serial 00:04: [irq 12]
>> [ 2.064345] serial 00:04: [dma 18446744073709551615 disabled]
>> [ 2.065540] serial 00:04: activated
>> [ 2.086442] 00:04: ttyS1 at I/O 0x2f8 (irq = 12) is a 16550A
> Now the pnp probe succeeds (with broken irq from pnp).
>
> Can you please check if there is a wrong irq configured in the bios setup
> or if there is a bios update available? I don't know why this worked in
> the first place.
>
>
> Sean

Apparently this is the latest BIOS available for these nodes.
Also, in the BIOS setup screens I don't see anything for changing the irq
numbers for the serial console.
But this is a cluster, so sometimes things get hidden to keep everything
as uniform as possible.

If you want to point me to the pnp probe code you would be suspicious of,
I can try to debug it and see what is going on there.


2016-10-20 20:10:59

by Sean Young

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Wed, Oct 19, 2016 at 05:13:41PM -0500, Nathan Zimmer wrote:
> Apparently this is the latest bios available for these nodes.
> Also in the bios setup screens I don't see anything for changing irq numbers
> for serial console.
> But this is a cluster, so sometimes things get hidden to keep everything as
> uniform as possible.
>
> If you want to point me to the pnp probe code you would be suspicious of, I
> can try to debug it and see what is going on there.

That would be great, thanks. A good start would be to boot 3.6.0 with
"loglevel=7 pnp.debug=1" and hopefully that will show why the probe
used to fail.

Also, does the issue still exist with a more contemporary kernel?


Sean

2016-10-21 15:55:46

by Nathan Zimmer

Subject: Re: console issue since 3.6, console=ttyS1 hangs

It didn't seem to make a difference to the output.
Did I miss a config option, or something else?

[ 0.000000] Linux version 3.6.0 (root@r1i2n0) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #3 SMP Mon Oct 17 20:43:34 EDT 2016
[ 0.000000] Command line: root=/dev/disk/by-id/ata-WDC_WD5000BHTZ-04JCPV1_WD-WXA1E54KKR60-part2 resume=/dev/disk/by-id/ata-WDC_WD5000BHTZ-04JCPV1_WD-WXA1E54KKR60-part1 crashkernel=256M-:128M loglevel=8 pnp.debug=1
...
[ 2.076084] Serial: 8250/16550 driver, 8 ports, IRQ sharing disabled
[ 2.097001] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
[ 2.097844] serial 00:04: unable to assign resources
[ 2.098303] serial: probe of 00:04 failed with error -16



2016-10-24 13:52:40

by Sean Young

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Fri, Oct 21, 2016 at 10:55:40AM -0500, Nathan Zimmer wrote:
> It didn't seem to make a difference as far as output.
> Did I miss a config option? or something else?
>
> [ 0.000000] Linux version 3.6.0 (root@r1i2n0) (gcc version 4.3.4
> [gcc-4_3-branch revision 152973] (SUSE Linux) ) #3 SMP Mon Oct 17 20:43:34
> EDT 2016
> [ 0.000000] Command line:
> root=/dev/disk/by-id/ata-WDC_WD5000BHTZ-04JCPV1_WD-WXA1E54KKR60-part2
> resume=/dev/disk/by-id/ata-WDC_WD5000BHTZ-04JCPV1_WD-WXA1E54KKR60-part1
> crashkernel=256M-:128M loglevel=8 pnp.debug=1
> ...
> [ 2.076084] Serial: 8250/16550 driver, 8 ports, IRQ sharing disabled
> [ 2.097001] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
> [ 2.097844] serial 00:04: unable to assign resources
> [ 2.098303] serial: probe of 00:04 failed with error -16

Ehm is this kernel compiled with CONFIG_PNP_DEBUG_MESSAGES? Are you
getting any debug messages? You could try ignore_loglevel.


Sean

2016-10-24 21:49:37

by Nathan Zimmer

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Mon, Oct 24, 2016 at 02:52:35PM +0100, Sean Young wrote:
> Ehm is this kernel compiled with CONFIG_PNP_DEBUG_MESSAGES? Are you
> getting any debug messages? You could try ignore_loglevel.
>
>
> Sean

Going back and forth between builds, I had become confused about the status of
CONFIG_PNP_DEBUG_MESSAGES.

Here is a better dmesg.

Nate


Attachments:
(No filename) (1.17 kB)
dmegs_pnp_3.6.0 (90.54 kB)

2016-10-25 20:41:47

by Sean Young

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Mon, Oct 24, 2016 at 04:49:25PM -0500, Nathan Zimmer wrote:
> [ 0.974874] system 00:03: Plug and Play ACPI device, IDs PNP0c02 (active)
> [ 0.975038] pnp 00:04: parse resource options
> [ 0.975048] pnp 00:04: dependent set 0 (acceptable) io min 0x2f8 max 0x2f8 align 1 size 8 flags 0x1
> [ 0.975056] pnp 00:04: dependent set 0 (acceptable) irq 3 4 5 6 7 10 11 12 flags 0x1
> [ 0.975060] pnp 00:04: dependent set 0 (acceptable) dma <none> (bitmask 0x0) flags 0x0

So here the bios claims that the serial port can use any of irqs 3-7, 10, 11 or 12.

> [ 1.543636] Serial: 8250/16550 driver, 8 ports, IRQ sharing disabled

Why is this kernel compiled with irq sharing disabled?

> [ 1.565062] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A

The ISA probe driver finds the serial port.

> [ 1.566453] serial 00:04: pnp_assign_resources, try dependent set 0
> [ 1.567383] serial 00:04: couldn't assign io 0 (min 0x2f8 max 0x2f8)

But then decides that the port is already in use (the existing serial driver).

> [ 1.568366] serial 00:04: pnp_assign_resources failed (-16)
> [ 1.569188] serial 00:04: unable to assign resources
> [ 1.569924] serial: probe of 00:04 failed with error -16

Please try and boot 3.7.0 with "8250.share_irqs=1", maybe it will pick
irq 3 and it will be happy again, but that is just a guess.

I think I have not fully understood what the failure is. Does the serial
port not work or does the boot hang? What are the symptoms?

We might be able to fix the problem with a pnp quirk, but 3.7 has not had
any releases for a long time. We will need a reproduction on a current
kernel so a patch can be written for that.


Sean

2016-10-26 18:16:22

by Nathan Zimmer

Subject: Re: console issue since 3.6, console=ttyS1 hangs



On 10/25/2016 03:41 PM, Sean Young wrote:
> On Mon, Oct 24, 2016 at 04:49:25PM -0500, Nathan Zimmer wrote:
>> [ 0.974874] system 00:03: Plug and Play ACPI device, IDs PNP0c02 (active)
>> [ 0.975038] pnp 00:04: parse resource options
>> [ 0.975048] pnp 00:04: dependent set 0 (acceptable) io min 0x2f8 max 0x2f8 align 1 size 8 flags 0x1
>> [ 0.975056] pnp 00:04: dependent set 0 (acceptable) irq 3 4 5 6 7 10 11 12 flags 0x1
>> [ 0.975060] pnp 00:04: dependent set 0 (acceptable) dma <none> (bitmask 0x0) flags 0x0
> So here the bios claims that the serial port can use any of 3 to 12 irqs.
>
>> [ 1.543636] Serial: 8250/16550 driver, 8 ports, IRQ sharing disabled
> Why is this kernel compiled with irq sharing disabled?
Because I first noticed the error on a SLES kernel, and that is how they
have it set.
The error also occurs with sharing on.

<snip from a 4.8 dmesg with irq sharing enabled>
[ 4.662336] Serial: 8250/16550 driver, 8 ports, IRQ sharing enabled
[ 4.663316] serial 00:03: pnp_assign_resources, try dependent set 0
[ 4.664249] serial 00:03: [io 0x02f8-0x02ff]
[ 4.664913] serial 00:03: device 0000:00:16.1 using irq 5
[ 4.688879] serial 00:03: device 0000:00:1f.3 using irq 10
[ 4.712265] serial 00:03: device 0000:00:16.0 using irq 11
[ 4.735265] serial 00:03: [irq 12]
[ 4.757538] serial 00:03: dma 0 disabled
[ 4.780153] serial 00:03: [dma 18446744073709551615 disabled]
[ 4.802826] serial 00:03: pnp_assign_resources succeeded: current resources:
[ 4.825758] serial 00:03: [io 0x02f8-0x02ff flags 0x40000101]
[ 4.848625] serial 00:03: [irq 12 flags 0x40000401]
[ 4.871224] serial 00:03: [dma 18446744073709551615 flags 0x50000800]
[ 4.893988] serial 00:03: pnp_start_dev: current resources:
[ 4.916634] serial 00:03: [io 0x02f8-0x02ff flags 0x40000101]
[ 4.939280] serial 00:03: [irq 12 flags 0x40000401]
[ 4.961646] serial 00:03: [dma 18446744073709551615 flags 0x50000800]
[ 4.984180] serial 00:03: set resources
[ 5.006654] serial 00:03: encode 3 resources
[ 5.028545] serial 00:03: encode io 0x2f8-0x2ff decode 0x1
[ 5.050486] serial 00:03: encode irq 12 edge high exclusive (2-byte descriptor)
[ 5.072593] serial 00:03: encode dma (disabled)
[ 5.094644] serial 00:03: activated
[ 5.136000] 00:03: ttyS1 at I/O 0x2f8 (irq = 12, base_baud = 115200) is a 16550A


>> [ 1.565062] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
> The ISA probe driver finds the serial port.
>
>> [ 1.566453] serial 00:04: pnp_assign_resources, try dependent set 0
>> [ 1.567383] serial 00:04: couldn't assign io 0 (min 0x2f8 max 0x2f8)
> But then decides that the port is already in use (the existing serial driver).
>> [ 1.568366] serial 00:04: pnp_assign_resources failed (-16)
>> [ 1.569188] serial 00:04: unable to assign resources
>> [ 1.569924] serial: probe of 00:04 failed with error -16
> Please try and boot 3.7.0 with "8250.share_irqs=1", maybe it will pick
> irq 3 and it will be happy again, but that is just a guess.
>
> I think I have not fully understood what the failure is. Does the serial
> port not work or does the boot hang? What are the symptoms?
With console=ttyS1 the boot will "hang"; sometimes it makes it all the
way through, but it may take 30 minutes instead of the 2-4 minutes this
box normally takes.

> We might be able to fix the problem with a pnp quirk, but 3.7 has not had
> any releases for a long time. We will need a reproduction on a current
> kernel so a patch can be written for that.
Yes, it still happens with 4.8+.
I had only been dwelling on 3.6/3.7 because that is where it first
appears; I don't have any attachment to those versions.

>
> Sean

2016-10-27 20:19:25

by Sean Young

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Wed, Oct 26, 2016 at 01:16:16PM -0500, Nathan Zimmer wrote:
> With console=ttyS1 the boot will "hang"; sometimes it makes it all the way
> through, but it may take 30 minutes instead of the 2-4 minutes this box
> normally takes.

Where does it hang? Any error messages?

> Yes, it still happens with 4.8+.
> I had only been dwelling on 3.6/3.7 because that is where it first appears;
> I don't have any attachment to those versions.

OK. Please try the following patch. I'm not sure it is good enough to be
merged as-is, but it should provide a start for testing. Output with
CONFIG_PNP_DEBUG_MESSAGES should now show that only irq 3 is available
for the serial port.

Sean

From 3a1705a2e28f4385b778ad96d7c517b82ea860e2 Mon Sep 17 00:00:00 2001
From: Sean Young <[email protected]>
Date: Thu, 27 Oct 2016 20:13:50 +0100
Subject: [PATCH] PNP: Add quirk for BIOS advertising wrong irqs for serial
port

Signed-off-by: Sean Young <[email protected]>
---
drivers/pnp/quirks.c | 43 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)

diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
index d28e3ab..8712161 100644
--- a/drivers/pnp/quirks.c
+++ b/drivers/pnp/quirks.c
@@ -66,6 +66,48 @@ static void quirk_awe32_resources(struct pnp_dev *dev)
 	}
 }
 
+static void quirk_serial_port(struct pnp_dev *dev)
+{
+	struct pnp_option *option;
+	struct pnp_irq *irq = NULL;
+	struct pnp_port *port = NULL;
+
+	list_for_each_entry(option, &dev->options, list) {
+		if (!pnp_option_is_dependent(option))
+			continue;
+
+		if (option->type == IORESOURCE_IO) {
+			port = &option->u.port;
+
+			if (port->min != 0x2f8 || port->max != 0x2f8 ||
+			    port->size != 8 || port->align != 1)
+				return;
+		} else if (option->type == IORESOURCE_IRQ) {
+			pnp_irq_mask_t map;
+
+			irq = &option->u.irq;
+
+			bitmap_zero(map.bits, PNP_IRQ_NR);
+			__set_bit(3, map.bits);
+			__set_bit(4, map.bits);
+			__set_bit(5, map.bits);
+			__set_bit(6, map.bits);
+			__set_bit(7, map.bits);
+			__set_bit(10, map.bits);
+			__set_bit(11, map.bits);
+			__set_bit(12, map.bits);
+
+			if (!bitmap_equal(map.bits, irq->map.bits, PNP_IRQ_NR))
+				return;
+		}
+	}
+
+	if (irq && port) {
+		bitmap_zero(irq->map.bits, PNP_IRQ_NR);
+		__set_bit(3, irq->map.bits);
+	}
+}
+
 static void quirk_cmi8330_resources(struct pnp_dev *dev)
 {
 	struct pnp_option *option;
@@ -448,6 +490,7 @@ static struct pnp_fixup pnp_fixups[] = {
 #ifdef CONFIG_PCI
 	{"PNP0c02", quirk_intel_mch},
 #endif
+	{"PNP0c02", quirk_serial_port},
 	{""}
 };

--
2.7.4

2016-10-28 19:42:46

by Nathan Zimmer

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Thu, Oct 27, 2016 at 09:19:16PM +0100, Sean Young wrote:
> Where does it hang? Any error messages?
>

Shortly after mounting the root.
After that I get no more output...

Here is a failure log from 4.8.

> OK. Please try the following patch. I'm not sure it is good enough to be
> merged as-is, but should provide a start for testing. Output with
> CONFIG_PNP_DEBUG_MESSAGES should show only irq 3 is available for the
> serial port now.
>
> Sean
>
> From 3a1705a2e28f4385b778ad96d7c517b82ea860e2 Mon Sep 17 00:00:00 2001
> From: Sean Young <[email protected]>
> Date: Thu, 27 Oct 2016 20:13:50 +0100
> Subject: [PATCH] PNP: Add quirk for BIOS advertising wrong irqs for serial
> port
>
> Signed-off-by: Sean Young <[email protected]>
> ---
> drivers/pnp/quirks.c | 43 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 43 insertions(+)
>
> diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
> index d28e3ab..8712161 100644
> --- a/drivers/pnp/quirks.c
> +++ b/drivers/pnp/quirks.c
> @@ -66,6 +66,48 @@ static void quirk_awe32_resources(struct pnp_dev *dev)
> }
> }
>
> +static void quirk_serial_port(struct pnp_dev *dev)
> +{
> +	struct pnp_option *option;
> +	struct pnp_irq *irq;
> +	struct pnp_port *port;
> +
> +	list_for_each_entry(option, &dev->options, list) {
> +		if (!pnp_option_is_dependent(option))
> +			continue;
> +
> +		if (option->type == IORESOURCE_IO) {
> +			port = &option->u.port;
> +
> +			if (port->min != 0x2f8 || port->max != 0x2f8 ||
> +			    port->size != 8 || port->align != 1)
> +				return;
> +		} else if (option->type == IORESOURCE_IRQ) {
> +			pnp_irq_mask_t map;
> +
> +			irq = &option->u.irq;
> +
> +			bitmap_zero(map.bits, PNP_IRQ_NR);
> +			__set_bit(3, map.bits);
> +			__set_bit(4, map.bits);
> +			__set_bit(5, map.bits);
> +			__set_bit(6, map.bits);
> +			__set_bit(7, map.bits);
> +			__set_bit(10, map.bits);
> +			__set_bit(11, map.bits);
> +			__set_bit(12, map.bits);
> +
> +			if (!bitmap_equal(map.bits, irq->map.bits, PNP_IRQ_NR))
> +				return;
> +		}
> +	}
> +
> +	if (irq && port) {
> +		bitmap_zero(irq->map.bits, PNP_IRQ_NR);
> +		__set_bit(3, irq->map.bits);
> +	}
> +}
> +
> static void quirk_cmi8330_resources(struct pnp_dev *dev)
> {
> struct pnp_option *option;
> @@ -448,6 +490,7 @@ static struct pnp_fixup pnp_fixups[] = {
> #ifdef CONFIG_PCI
> {"PNP0c02", quirk_intel_mch},
> #endif
> + {"PNP0c02", quirk_serial_port},
> {""}
> };
>
> --
> 2.7.4
>


Attachments:
(No filename) (3.92 kB)
con_log_4.8 (63.24 kB)

2016-10-28 19:55:51

by Nathan Zimmer

Subject: Re: console issue since 3.6, console=ttyS1 hangs

Unfortunately, the quirk crashed:

[ 3.985834] pnp 00:01: parse allocated resources
[ 3.986342] pnp 00:01: PNP0c02: calling quirk_system_pci_resources+0x0/0x180
[ 3.987055] pnp 00:01: PNP0c02: calling quirk_intel_mch+0x0/0x1a0
[ 3.987613] pnp 00:01: PNP0c02: calling quirk_serial_port+0x0/0x140
[ 3.988246] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[ 3.989044] IP: [<ffffffff813b52af>] quirk_serial_port+0xdf/0x140
[ 3.989600] PGD 0
[ 3.989798] Oops: 0002 [#1] SMP
[ 3.990089] Modules linked in:
[ 3.990476] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.8.0-00001-ge41e11b #37
[ 3.991176] Hardware name: SGI.COM ICE-XIP119/S0751-Medina, BIOS ma2e2054 02/05/2014
[ 3.994913] task: ffff88086c2b4040 task.stack: ffff88086c2b8000
[ 3.995461] RIP: 0010:[<ffffffff813b52af>] [<ffffffff813b52af>] quirk_serial_port+0xdf/0x140
[ 3.996352] RSP: 0000:ffff88086c2bb998 EFLAGS: 00010282
[ 3.996909] RAX: 0000000000000037 RBX: ffff88106f108360 RCX: 0000000000000001
[ 3.997596] RDX: 0000000000000001 RSI: 0000000000000292 RDI: ffff88106f108000
[ 3.998240] RBP: ffff88086c2bb9e8 R08: 0000000000000000 R09: 0000000000000000
[ 3.998852] R10: 000000000000000a R11: 0000000000000000 R12: ffff88106f108000
[ 3.999506] R13: ffff88106f108360 R14: 00000000ffffffea R15: ffff88084f2d3f40
[ 4.000181] FS: 0000000000000000(0000) GS:ffff88086fb40000(0000) knlGS:0000000000000000
[ 4.000880] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.001363] CR2: 0000000000000001 CR3: 0000000001a06000 CR4: 00000000001406e0
[ 4.005050] Stack:
[ 4.005249] ffff88086c2bb9b8 0000000000000000 ffffffff81810846 ffff88086c2bb988
[ 4.006061] 00000000fbe1ffff ffffffff81aaf740 ffff88106f108000 ffff88106f108000
[ 4.006838] 00000000ffffffea ffff88084f2d3f40 ffff88086c2bba08 ffffffff813b48f2
[ 4.007586] Call Trace:
[ 4.007821] [<ffffffff813b48f2>] pnp_fixup_device+0x22/0x70
[ 4.008272] [<ffffffff813b0073>] __pnp_add_device+0x13/0xf0
[ 4.008758] [<ffffffff813a1c9d>] ? acpi_walk_resources+0xe3/0x114
[ 4.009272] [<ffffffff813b6550>] ? pnpacpi_parse_allocated_resource+0xa0/0xa0
[ 4.009920] [<ffffffff813b01a0>] pnp_add_device+0x50/0xf0
[ 4.010481] [<ffffffff813a1bab>] ? acpi_walk_resource_buffer+0x10d/0x11c
[ 4.011127] [<ffffffff813b6550>] ? pnpacpi_parse_allocated_resource+0xa0/0xa0
[ 4.011823] [<ffffffff813a1cbd>] ? acpi_walk_resources+0x103/0x114
[ 4.015512] [<ffffffff81b687ff>] pnpacpi_add_device+0x1ce/0x254
[ 4.016040] [<ffffffff81b688bb>] pnpacpi_add_device_handler+0x36/0x3a
[ 4.016678] [<ffffffff8139b2c5>] acpi_ns_get_device_callback+0x134/0x156
[ 4.017356] [<ffffffff8139ad78>] acpi_ns_walk_namespace+0x10a/0x251
[ 4.017977] [<ffffffff8139b191>] ? acpi_get_devices+0x10b/0x10b
[ 4.018556] [<ffffffff81b685bb>] ? ispnpidacpi+0x7b/0x7b
[ 4.019089] [<ffffffff8139b156>] acpi_get_devices+0xd0/0x10b
[ 4.019649] [<ffffffff81b68885>] ? pnpacpi_add_device+0x254/0x254
[ 4.020236] [<ffffffff81b68611>] pnpacpi_init+0x56/0x76
[ 4.020753] [<ffffffff81000380>] do_one_initcall+0xc0/0x1d0
[ 4.021299] [<ffffffff812e7444>] ? ida_pre_get+0x54/0xe0
[ 4.024877] [<ffffffff812398da>] ? proc_alloc_inum+0x4a/0xd0
[ 4.025453] [<ffffffff81b25d25>] ? repair_env_string+0x17/0x58
[ 4.026030] [<ffffffff810849e7>] ? parse_one+0xd7/0x180
[ 4.026557] [<ffffffff81084b7c>] ? parse_args+0xec/0x300
[ 4.027094] [<ffffffff81b25d0e>] ? kernel_init_freeable+0x2a3/0x2a3
[ 4.027703] [<ffffffff810a6c3e>] ? __wake_up+0x4e/0x70
[ 4.028222] [<ffffffff81b25a45>] do_basic_setup+0xb2/0xd8
[ 4.028765] [<ffffffff81b25d0e>] ? kernel_init_freeable+0x2a3/0x2a3
[ 4.029370] [<ffffffff81b25c85>] kernel_init_freeable+0x21a/0x2a3
[ 4.029945] [<ffffffff8157a939>] kernel_init+0x9/0x100
[ 4.030443] [<ffffffff81586ecf>] ret_from_fork+0x1f/0x40
[ 4.030961] [<ffffffff8157a930>] ? rest_init+0x80/0x80
[ 4.031476] Code: 01 00 00 4c 89 e7 4c 89 f6 e8 5e 08 f4 ff 85 c0 74 3c 48 8b 1b 4c 89 f2 4c 39 eb 0f 85 65 ff ff ff 48 85 d2 74 28 4d 85 ff 74 23 <48> c7 02 00 00 00 00 48 c7 42 08 00 00 00 00 48 c7 42 10 00 00
[ 4.036680] RIP [<ffffffff813b52af>] quirk_serial_port+0xdf/0x140
[ 4.037267] RSP <ffff88086c2bb998>
[ 4.037615] CR2: 0000000000000001
[ 4.037938] ---[ end trace 5bc20b620cbcf8e2 ]---
[ 4.038360] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 4.038360]
[ 4.039210] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 4.039210]
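
In the posted quirk, `irq` and `port` are declared without initializers, so on a device where the loop never assigns them, `if (irq && port)` tests whatever was left on the stack. The oops appears consistent with that: it fires while the quirk runs on 00:01 (PNP0c02), RDX and CR2 are both 0x1, and the marked `Code:` bytes (`48 c7 02 ...`) are a store through RDX right after tests of RDX and R15. Below is a minimal userspace sketch of the quirk's decision logic with both pointers initialized to NULL. It is not kernel code: `struct model_option`, `struct model_port`, `struct model_irq` and `model_quirk_serial_port()` are hypothetical stand-ins, and the 256-bit `pnp_irq_mask_t` is shrunk to a 32-bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for struct pnp_option/pnp_port/pnp_irq. */
enum model_type { MODEL_IO, MODEL_IRQ };

struct model_port { unsigned int min, max, size, align; };
struct model_irq { uint32_t map; };	/* bit n set: IRQ n allowed (n < 32) */

struct model_option {
	bool dependent;
	enum model_type type;
	union {
		struct model_port port;
		struct model_irq irq;
	} u;
};

#define IRQ_BIT(n) (UINT32_C(1) << (n))

/* The IRQ list the posted quirk looks for: 3-7 and 10-12. */
#define ADVERTISED_IRQS (IRQ_BIT(3) | IRQ_BIT(4) | IRQ_BIT(5) | IRQ_BIT(6) | \
			 IRQ_BIT(7) | IRQ_BIT(10) | IRQ_BIT(11) | IRQ_BIT(12))

/*
 * Same checks as the posted quirk, but irq and port start out NULL, so a
 * device whose options never match is left alone.
 * Returns true if the IRQ mask was narrowed to IRQ 3.
 */
static bool model_quirk_serial_port(struct model_option *opts, size_t n)
{
	struct model_irq *irq = NULL;
	struct model_port *port = NULL;

	assert(opts != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct model_option *option = &opts[i];

		if (!option->dependent)
			continue;

		if (option->type == MODEL_IO) {
			port = &option->u.port;
			if (port->min != 0x2f8 || port->max != 0x2f8 ||
			    port->size != 8 || port->align != 1)
				return false;
		} else if (option->type == MODEL_IRQ) {
			irq = &option->u.irq;
			if (irq->map != ADVERTISED_IRQS)
				return false;
		}
	}

	if (irq && port) {
		irq->map = IRQ_BIT(3);
		return true;
	}
	return false;
}
```

With the NULL initializers, a device that has no dependent options, or whose options never match, simply falls through to `return false` instead of writing through an indeterminate pointer.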

The other thing I did as a diagnostic exercise was to look at the commit
I bisected to.

After some experimentation, I found I could revert just a small part of it
to "fix" the issue.

This probably doesn't illuminate much but...



Attachments:
hack.patch (1.44 kB)

2016-10-29 21:16:54

by Sean Young

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Fri, Oct 28, 2016 at 02:42:25PM -0500, Nathan Zimmer wrote:
> On Thu, Oct 27, 2016 at 09:19:16PM +0100, Sean Young wrote:
> > On Wed, Oct 26, 2016 at 01:16:16PM -0500, Nathan Zimmer wrote:
> > > On 10/25/2016 03:41 PM, Sean Young wrote:
> > > >On Mon, Oct 24, 2016 at 04:49:25PM -0500, Nathan Zimmer wrote:
> > > >>[ 1.565062] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
> > > >The ISA probe driver finds the serial port.
> > > >
> > > >>[ 1.566453] serial 00:04: pnp_assign_resources, try dependent set 0
> > > >>[ 1.567383] serial 00:04: couldn't assign io 0 (min 0x2f8 max 0x2f8)
> > > >But then the PNP probe decides that the port is already in use (by the existing serial driver).
> > > >>[ 1.568366] serial 00:04: pnp_assign_resources failed (-16)
> > > >>[ 1.569188] serial 00:04: unable to assign resources
> > > >>[ 1.569924] serial: probe of 00:04 failed with error -16
> > > >Please try and boot 3.7.0 with "8250.share_irqs=1", maybe it will pick
> > > >irq 3 and it will be happy again, but that is just a guess.
> > > >
> > > >I think I have not fully understood what the failure is. Does the serial
> > > >port not work or does the boot hang? What are the symptoms?
> > > With console=ttyS1 the boot will "hang", sometimes it makes it all the way
> > > through but may take 30 minutes, instead of the 2-4 minutes this box
> >
> > Where does it hang? Any error messages?
> >
>
> Shortly after mounting the root filesystem.
> After that I get no more output...
>
> Here is a failure log from 4.8.

So does the console on the serial port work on 4.8? Also, what does
"cat /proc/interrupts" say with and without my change that moves the
serial PNP probe before the ISA probe?

It could be that the serial driver picking a different IRQ forced another
driver onto a different IRQ, one which does not work.
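
Comparing the two boots is easier with just the relevant row of `/proc/interrupts` in hand. A minimal sketch, assuming the usual layout in which numbered rows start with optional spaces, the IRQ number and a colon; `irq_row()` is a hypothetical helper that works on text already read from the file, so it can be exercised without a live system.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy the row for `irq` out of /proc/interrupts-style text into `out`
 * (at most outlen - 1 characters, NUL-terminated).
 * Returns 1 if a row was found, 0 otherwise.
 */
static int irq_row(const char *text, int irq, char *out, size_t outlen)
{
	const char *line = text;

	assert(text != NULL && out != NULL && outlen > 0);

	while (line && *line) {
		const char *end = strchr(line, '\n');
		size_t len = end ? (size_t)(end - line) : strlen(line);
		const char *p = line;

		while (p < line + len && isspace((unsigned char)*p))
			p++;
		/* Named rows (NMI, LOC, ...) and the CPU header do not start with a digit. */
		if (p < line + len && isdigit((unsigned char)*p)) {
			char *after;
			long n = strtol(p, &after, 10);

			if (n == irq && after < line + len && *after == ':') {
				size_t copy = len < outlen - 1 ? len : outlen - 1;

				memcpy(out, line, copy);
				out[copy] = '\0';
				return 1;
			}
		}
		line = end ? end + 1 : NULL;
	}
	out[0] = '\0';
	return 0;
}
```

On the target box, the text would come from reading /proc/interrupts once with and once without the probe-ordering change; the rows for IRQ 3 and any other suspect IRQ can then be compared side by side.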


Sean

2016-10-30 15:33:09

by Nathan Zimmer

Subject: Re: console issue since 3.6, console=ttyS1 hangs



On 10/27/2016 03:19 PM, Sean Young wrote:
> [...]
> @@ -448,6 +490,7 @@ static struct pnp_fixup pnp_fixups[] = {
> #ifdef CONFIG_PCI
> {"PNP0c02", quirk_intel_mch},
> #endif
> + {"PNP0c02", quirk_serial_port},
> {""}
> };
>

I think this should be PNP0501 instead of PNP0c02.
Once I change that, the serial port comes up on IRQ 3 when I boot. However,
it still hangs.
I'll keep digging.
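
The crash log earlier in the thread shows why the ID in that table entry matters: quirks are dispatched by PNP ID, so an entry keyed on PNP0c02 runs on system-board devices such as 00:01 rather than on the PNP0501 serial port. Below is a simplified model of that dispatch; `struct model_dev`, `model_fixups[]` and `model_fixup_device()` are hypothetical, and the ID comparison is a plain case-insensitive string match rather than the kernel's own ID matching.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

struct model_dev {
	const char *name;	/* e.g. "00:04" */
	const char *id;		/* e.g. "PNP0501" */
	int quirk_calls;	/* how many quirks ran on this device */
};

struct model_fixup {
	const char *id;
	void (*quirk)(struct model_dev *dev);
};

/* Placeholder quirk that only records that it was called. */
static void serial_quirk_stub(struct model_dev *dev)
{
	dev->quirk_calls++;
}

/* Table terminated by an empty ID, like pnp_fixups[] in the patch. */
static const struct model_fixup model_fixups[] = {
	{ "PNP0501", serial_quirk_stub },
	{ "", NULL },
};

static int ids_equal(const char *a, const char *b)
{
	while (*a && *b) {
		if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;
}

/* Run every quirk whose table ID matches the device's ID. */
static void model_fixup_device(struct model_dev *dev)
{
	assert(dev != NULL && dev->id != NULL);
	for (const struct model_fixup *f = model_fixups; f->id[0]; f++)
		if (ids_equal(dev->id, f->id))
			f->quirk(dev);
}
```

Keyed on PNP0501, the stub runs once for the serial port and never for the PNP0c02 device.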

2016-10-30 16:01:45

by Nathan Zimmer

Subject: Re: console issue since 3.6, console=ttyS1 hangs



On 10/29/2016 04:16 PM, Sean Young wrote:
> On Fri, Oct 28, 2016 at 02:42:25PM -0500, Nathan Zimmer wrote:
>> Shortly after mounting the root filesystem.
>> After that I get no more output...
>>
>> Here is a failure log from 4.8.
> So does the console on the serial port work on 4.8? Also what does
> "cat /proc/interrupts" say with and without my change of ordering the
> serial pnp probe before the isa probe?
>
> It could be that the serial driver picking a different irq caused another
> driver to be forced to pick another irq which does not work.
>
>
> Sean

"Works" is a strong word. It produces output until the boot hangs.

It certainly could be another driver, but I don't see another IRQ 12 in
the dmesg log.

I'll grab some data once I'm back in the office tomorrow.


Nate

2016-10-31 20:27:16

by Sean Young

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Sun, Oct 30, 2016 at 10:33:02AM -0500, Nathan wrote:
> I think this should be PNP0501 instead of PNP0c02.
> Once I change that, the serial port comes up on IRQ 3 when I boot. However,
> it still hangs.
> I'll keep digging.

Well, that's that theory out of the window. I'm not sure where to look now;
I would start by enabling as many of the "kernel hacking" config options as
possible and seeing if anything gets caught.

Looking at your earlier messages, you have a collection of percpu allocation
failures. That might be worth resolving before anything else.


Sean

2016-11-01 02:55:54

by Peter Hurley

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Mon, Oct 31, 2016 at 2:27 PM, Sean Young <[email protected]> wrote:
> On Sun, Oct 30, 2016 at 10:33:02AM -0500, Nathan wrote:
>> I think this should be PNP0501 instead of PNP0c02.
>> Once I change that, the serial port comes up on IRQ 3 when I boot. However,
>> it still hangs.
>> I'll keep digging.
>
> Well, that's that theory out of the window. I'm not sure where to look now;
> I would start by enabling as many of the "kernel hacking" config options as
> possible and seeing if anything gets caught.
>
> Looking at your earlier messages, you have a collection of percpu allocation
> failures. That might be worth resolving before anything else.

Hi Nathan,

Couple of questions:
1. Was login over serial console setup and working on SLES 11? or was
the 'console=ttyS1' only for debug output?
I ask because console output doesn't use IRQs; iow, maybe the serial
port w/ driver never actually worked.
2. Can you post dmesg for the SLES 11 setup? That would show if there
were probe errors even on that.
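
Point 1 rests on how the 8250 console emits text: it polls the line status register until the transmitter can take another byte and then writes it, so console output can keep working even if the port's interrupt never arrives, whereas a getty login normally depends on the interrupt-driven tty path. A minimal userspace model of that polled loop, assuming the standard 16550 layout (THR at offset 0, LSR at offset 5, THRE = 0x20); `struct fake_uart`, `uart_in()` and `uart_out()` are hypothetical stand-ins for port I/O at the UART's base address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UART_THR 0		/* transmit holding register (write, DLAB = 0) */
#define UART_LSR 5		/* line status register */
#define UART_LSR_THRE 0x20	/* transmit holding register empty */

/* Hypothetical register file standing in for port I/O at base + offset. */
struct fake_uart {
	unsigned int busy_polls;	/* LSR reads that still report "busy" */
	char sent[64];			/* bytes written to THR so far */
	size_t nsent;
};

static uint8_t uart_in(struct fake_uart *u, int reg)
{
	assert(reg == UART_LSR);
	if (u->busy_polls > 0) {
		u->busy_polls--;
		return 0;		/* transmitter still busy */
	}
	return UART_LSR_THRE;
}

static void uart_out(struct fake_uart *u, int reg, uint8_t val)
{
	assert(reg == UART_THR);
	if (u->nsent < sizeof u->sent - 1) {
		u->sent[u->nsent++] = (char)val;
		u->sent[u->nsent] = '\0';
	}
}

/* Busy-wait on LSR.THRE, then write one byte: no interrupt is involved. */
static void polled_putchar(struct fake_uart *u, char c)
{
	while (!(uart_in(u, UART_LSR) & UART_LSR_THRE))
		;
	uart_out(u, UART_THR, (uint8_t)c);
}

/* Console-style write: emit CR before each LF. */
static void polled_console_write(struct fake_uart *u, const char *s)
{
	for (; *s; s++) {
		if (*s == '\n')
			polled_putchar(u, '\r');
		polled_putchar(u, *s);
	}
}
```

Nothing in this path waits for an interrupt, which is why console output alone says little about whether the port's IRQ is actually routed.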

An alternative that should be equivalent to your previous setup is to
build w/ CONFIG_SERIAL_8250_PNP=n
Seems like your ACPI BIOS is buggy, but also that something else is using IRQ 3?

Regards,
Peter Hurley

2016-11-02 15:30:00

by Nathan Zimmer

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Mon, Oct 31, 2016 at 08:55:49PM -0600, Peter Hurley wrote:
> On Mon, Oct 31, 2016 at 2:27 PM, Sean Young <[email protected]> wrote:
> > On Sun, Oct 30, 2016 at 10:33:02AM -0500, Nathan wrote:
> >> I think this should be PNP0501 instead of PNP0c02.
> >> Once I change that, the serial port comes up on IRQ 3 when I boot. However,
> >> it still hangs.
> >> I'll keep digging.
> >
> > Well, that's that theory out of the window. I'm not sure where to look now;
> > I would start by enabling as many of the "kernel hacking" config options as
> > possible and seeing if anything gets caught.
> >
> > Looking at your earlier messages, you have a collection of percpu allocation
> > failures. That might be worth resolving before anything else.
>
> Hi Nathan,
>
> Couple of questions:
> 1. Was login over serial console setup and working on SLES 11? or was
> the 'console=ttyS1' only for debug output?
> I ask because console output doesn't use IRQs; iow, maybe the serial
> port w/ driver never actually worked.
> 2. Can you post dmesg for the SLES 11 setup? That would show if there
> were probe errors even on that.
>
> An alternative that should be equivalent to your previous setup is to
> build w/ CONFIG_SERIAL_8250_PNP=n
> Seems like your ACPI BIOS is buggy, but also that something else is using IRQ 3?
>
> Regards,
> Peter Hurley



1) Yes, I can confirm I sometimes used it to log in.

I built with CONFIG_SERIAL_8250_PNP=n and that seemed to work better, in that the system did not hang.
However, I couldn't log in on the serial port and got these error messages; I suspect I broke something while trying different permutations.

gdm[5206]: WARNING: GdmDisplay: display lasted 0.136636 seconds
gdm[5206]: WARNING: GdmDisplay: display lasted 0.180955 seconds
gdm[5206]: WARNING: GdmDisplay: display lasted 0.161415 seconds
gdm[5206]: WARNING: GdmLocalDisplayFactory: maximum number of X display failures reached: check X server log for errors

It did boot all the way though.

2) attached log



Attachments:
(No filename) (1.94 kB)
dmesg_sles11 (91.76 kB)

2016-11-04 00:25:50

by Peter Hurley

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Wed, Nov 2, 2016 at 9:29 AM, Nathan Zimmer <[email protected]> wrote:
> 1) Yes, I can confirm I sometimes used it to log in.
>
> I built with CONFIG_SERIAL_8250_PNP=n and that seemed to work better, in that the system did not hang.
> However, I couldn't log in on the serial port and got these error messages; I suspect I broke something while trying different permutations.
>
> gdm[5206]: WARNING: GdmDisplay: display lasted 0.136636 seconds
> gdm[5206]: WARNING: GdmDisplay: display lasted 0.180955 seconds
> gdm[5206]: WARNING: GdmDisplay: display lasted 0.161415 seconds
> gdm[5206]: WARNING: GdmLocalDisplayFactory: maximum number of X display failures reached: check X server log for errors
>
> It did boot all the way though.
>
> 2) attached log

So I'm confused where this leaves us.

In your OP, you claim to have gotten it working with a partial revert
of commit 835d844d1a28 (but you didn't attach the partial revert so no
one knows what you did); however, my suggestion should have been
equivalent.

Note that you have the serial port disabled in BIOS; that's why you're
getting the probe error for PNP.
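
For reference, the -16 in the earlier probe log is -EBUSY. One plausible reading of that log: with 00:04 reported as disabled, the PNP core has to assign it resources from its only dependent set (I/O at exactly 0x2f8), and that range is already claimed by the port the legacy probe registered. A minimal sketch of such a conflict check; `struct region_table` and `claim_region()` are hypothetical simplifications of the kernel's I/O resource tracking, and the 8-port length matches the `size` the posted quirk expects.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct region { unsigned long start, end; };	/* inclusive range */

#define MAX_REGIONS 8

struct region_table {
	struct region r[MAX_REGIONS];
	size_t n;
};

static int overlaps(const struct region *a, unsigned long start, unsigned long end)
{
	return start <= a->end && a->start <= end;
}

/* Claim [start, start + len - 1]; fail with -EBUSY if any part is already held. */
static int claim_region(struct region_table *t, unsigned long start, unsigned long len)
{
	unsigned long end;

	assert(t != NULL && len > 0);
	end = start + len - 1;
	for (size_t i = 0; i < t->n; i++)
		if (overlaps(&t->r[i], start, end))
			return -EBUSY;
	if (t->n == MAX_REGIONS)
		return -ENOMEM;
	t->r[t->n].start = start;
	t->r[t->n].end = end;
	t->n++;
	return 0;
}
```

Because the option allows no other base than 0x2f8, there is nothing else to fall back to once the first claim holds that range.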

Regards,
Peter Hurley

2016-11-04 21:33:54

by Nathan Zimmer

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Thu, Nov 03, 2016 at 06:25:46PM -0600, Peter Hurley wrote:
> So I'm confused where this leaves us.
>
> In your OP, you claim to have gotten it working with a partial revert
> of commit 835d844d1a28 (but you didn't attach the partial revert so no
> one knows what you did); however, my suggestion should have been
> equivalent.

I apologize if I was unclear. Your suggestion of CONFIG_SERIAL_8250_PNP=n did boot successfully and provide messages
across the console, and yes, it is basically equivalent to the revert.
I only just noticed those warnings in the dmesg; they weren't there before.

>
> Note that you have the serial port disabled in BIOS; that's why you're
> getting the probe error for PNP.
>
> Regards,
> Peter Hurley


Now, when you say it's disabled in the BIOS, how can I double-check that?
These BIOS screens do not mention any PNP settings.
I am getting output over the console (via IPMI) until the boot hangs.

Nate

2016-11-04 22:18:46

by Peter Hurley

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Fri, Nov 4, 2016 at 3:33 PM, Nathan Zimmer <[email protected]> wrote:
> On Thu, Nov 03, 2016 at 06:25:46PM -0600, Peter Hurley wrote:
>> So I'm confused where this leaves us.
>>
>> In your OP, you claim to have gotten it working with a partial revert
>> of commit 835d844d1a28 (but you didn't attach the partial revert so no
>> one knows what you did); however, my suggestion should have been
>> equivalent.
>
> I apologize if I was unclear. Your suggestion of CONFIG_SERIAL_8250_PNP=n did boot successfully and provide messages
> across the console, and yes, it is basically equivalent to the revert.

Ok, so the partial revert didn't get the login working then?

> I only just noticed those warnings in the dmesg; they weren't there before.
>
>>
>> Note that you have the serial port disabled in BIOS; that's why you're
>> getting the probe error for PNP.
>
> Now, when you say it's disabled in the BIOS, how can I be sure and double-check that?

Well, the ACPI BIOS is reporting it as disabled. Even the SLES11 log says:

[ 2.136899] pnp 00:04: Plug and Play ACPI device, IDs PNP0501 (disabled)
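For reference, the "(disabled)" tag appears to reflect the device's ACPI _STA value: as I understand the pnpacpi code, Linux marks a PNP device active only when _STA bit 1 ("enabled and decoding its resources") is set. A minimal sketch of decoding _STA, with the bit meanings taken from the ACPI specification; the enum and classify_sta() names are hypothetical.

```c
#include <stdint.h>

/* _STA bits as defined by the ACPI specification. */
#define ACPI_STA_PRESENT     (1u << 0)  /* device is present */
#define ACPI_STA_ENABLED     (1u << 1)  /* enabled and decoding its resources */
#define ACPI_STA_SHOW_IN_UI  (1u << 2)  /* should be shown in the UI */
#define ACPI_STA_FUNCTIONING (1u << 3)  /* passed its self-test */

/* Hypothetical classification of a _STA value for a serial port node. */
enum sta_class { STA_ABSENT, STA_PRESENT_DISABLED, STA_ENABLED };

static enum sta_class classify_sta(uint32_t sta)
{
    if (!(sta & ACPI_STA_PRESENT))
        return STA_ABSENT;
    if (!(sta & ACPI_STA_ENABLED))
        return STA_PRESENT_DISABLED;    /* what "(disabled)" would correspond to */
    return STA_ENABLED;
}
```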


> These bios screens do not have any mention of PNP settings.
> I am getting output over the console (via ipmi) until the boot hangs.

Yeah, probably the device actually decodes io address access anyway,
but in the disabled state probably has not routed IRQ.
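This is presumably why console=ttyS1 output keeps flowing while login on the port fails: console writes poll the UART's line status register instead of waiting for an interrupt, so an unrouted IRQ only hurts the interrupt-driven tty path. A rough sketch of such a polling loop, using the standard 16550 register layout (THR at offset 0, LSR at offset 5, THRE = 0x20) but with a hypothetical simulated register file (struct fake_uart) in place of real port I/O so it can run anywhere:

```c
#include <stddef.h>
#include <stdint.h>

#define UART_THR      0     /* transmit holding register (write) */
#define UART_LSR      5     /* line status register (read) */
#define UART_LSR_THRE 0x20  /* transmit holding register empty */

/* Simulated UART standing in for inb()/outb() on a real I/O port. */
struct fake_uart {
    uint8_t regs[8];
    char sent[64];      /* bytes "transmitted" so far */
    size_t nsent;
    int busy_polls;     /* LSR reads that report "busy" before becoming ready */
};

static uint8_t fake_read(struct fake_uart *u, int reg)
{
    if (reg == UART_LSR) {
        if (u->busy_polls > 0) {
            u->busy_polls--;
            return 0;               /* THRE clear: transmitter still busy */
        }
        return UART_LSR_THRE;       /* THRE set: ready for the next byte */
    }
    return u->regs[reg];
}

static void fake_write(struct fake_uart *u, int reg, uint8_t val)
{
    u->regs[reg] = val;
    if (reg == UART_THR && u->nsent < sizeof(u->sent) - 1)
        u->sent[u->nsent++] = (char)val;
}

/* Polled console write: spin on LSR.THRE; no interrupt is ever needed. */
static size_t polled_console_write(struct fake_uart *u, const char *s)
{
    size_t polls = 0;
    for (; *s; s++) {
        while (!(fake_read(u, UART_LSR) & UART_LSR_THRE))
            polls++;                /* busy-wait until the UART can accept a byte */
        fake_write(u, UART_THR, (uint8_t)*s);
    }
    u->sent[u->nsent] = '\0';
    return polls;                   /* number of "busy" LSR reads observed */
}
```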

I have no idea how to help you with the bios, sorry.

Regards,
Peter Hurley

2016-11-05 23:44:38

by Maciej W. Rozycki

Subject: Re: console issue since 3.6, console=ttyS1 hangs

On Fri, 4 Nov 2016, Peter Hurley wrote:

> > These bios screens do not have any mention of PNP settings.
> > I am getting output over the console (via ipmi) until the boot hangs.
>
> Yeah, probably the device actually decodes io address access anyway,
> but in the disabled state probably has not routed IRQ.
>
> I have no idea how to help you with the bios, sorry.

I'd look out for serial port, Super-I/O or COM1 port (which is how PC-DOS
named the device some 35 years ago) settings rather than anything to do
with PNP. Typically you'd be able to choose from a few classic combined
I/O space address and IRQ assignments in addition to a `Disabled' setting.
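When reading those BIOS options for ttyS1 specifically, it may help to keep the classic PC assignments in mind: ttyS1 conventionally corresponds to COM2 at I/O 0x2F8 with IRQ 3, which is the IRQ 3 questioned earlier in this thread. A small lookup sketch of the four legacy COM assignments (the struct and function names are hypothetical):

```c
#include <stddef.h>

/* Classic PC COM-port resources; Linux's legacy ttyS0-3 defaults match these. */
struct legacy_com {
    const char *name;
    unsigned short iobase;
    unsigned char irq;
};

static const struct legacy_com legacy_coms[] = {
    { "COM1 / ttyS0", 0x3F8, 4 },
    { "COM2 / ttyS1", 0x2F8, 3 },
    { "COM3 / ttyS2", 0x3E8, 4 },
    { "COM4 / ttyS3", 0x2E8, 3 },
};

/* Return the legacy resources for ttyS<index>, or NULL if out of range. */
static const struct legacy_com *legacy_com_for_ttyS(int index)
{
    if (index < 0 || (size_t)index >= sizeof legacy_coms / sizeof legacy_coms[0])
        return NULL;
    return &legacy_coms[index];
}
```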

There might be a genuine BIOS bug there as well of course as serial ports
seem to be less used these days and the issue may have escaped validation.

Maciej

2016-11-07 15:40:38

by Nathan Zimmer

Subject: Re: console issue since 3.6, console=ttyS1 hangs



On 11/05/2016 06:44 PM, Maciej W. Rozycki wrote:
> On Fri, 4 Nov 2016, Peter Hurley wrote:
>
>>> These bios screens do not have any mention of PNP settings.
>>> I am getting output over the console (via ipmi) until the boot hangs.
>> Yeah, probably the device actually decodes io address access anyway,
>> but in the disabled state probably has not routed IRQ.
>>
>> I have no idea how to help you with the bios, sorry.
> I'd look out for serial port, Super-I/O or COM1 port (which is how PC-DOS
> named the device some 35 years ago) settings rather than anything to do
> with PNP. Typically you'd be able to choose from a few classic combined
> I/O space address and IRQ assignments in addition to a `Disabled' setting.
>
> There might be a genuine BIOS bug there as well of course as serial ports
> seem to be less used these days and the issue may have escaped validation.
>
> Maciej

Given that they hid some of the settings in the BIOS to keep these boxes
uniform, I will have to talk with the BIOS guys more about that.
Last time, they were certain it wasn't a big deal.

Thanks,
Nate

2016-11-22 15:31:00

by Nathan Zimmer

Subject: Re: console issue since 3.6, console=ttyS1 hangs



On 11/05/2016 06:44 PM, Maciej W. Rozycki wrote:
> On Fri, 4 Nov 2016, Peter Hurley wrote:
>
>>> These bios screens do not have any mention of PNP settings.
>>> I am getting output over the console (via ipmi) until the boot hangs.
>> Yeah, probably the device actually decodes io address access anyway,
>> but in the disabled state probably has not routed IRQ.
>>
>> I have no idea how to help you with the bios, sorry.
> I'd look out for serial port, Super-I/O or COM1 port (which is how PC-DOS
> named the device some 35 years ago) settings rather than anything to do
> with PNP. Typically you'd be able to choose from a few classic combined
> I/O space address and IRQ assignments in addition to a `Disabled' setting.
>
> There might be a genuine BIOS bug there as well of course as serial ports
> seem to be less used these days and the issue may have escaped validation.
>
> Maciej

Having gotten back to this after some other hot issues came up, I am sure
there is a BIOS bug: comparing the acpidump output of this box with that of
other boxes makes it obvious.

I found that pnpacpi=off allows me to boot fine and get console traffic.
However, I was hoping to find something that casts a narrower net.
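To confirm which workaround a given boot actually used (pnpacpi=off turns off ACPI PnP enumeration for every device, hence the wide net), the running kernel's command line can be read from /proc/cmdline. A hypothetical helper that looks for an exact whitespace-separated token, so that for example "pnpacpi=offx" does not count as a match; it does not handle quoted parameters containing spaces.

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 if the whitespace-separated command line
 * contains exactly `token` (e.g. "pnpacpi=off"), 0 otherwise.
 */
static int cmdline_has_token(const char *cmdline, const char *token)
{
    size_t tlen = strlen(token);
    const char *p = cmdline;

    while (*p) {
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;                            /* skip separators */
        const char *start = p;
        while (*p && *p != ' ' && *p != '\t' && *p != '\n')
            p++;                            /* find the end of this token */
        if ((size_t)(p - start) == tlen && strncmp(start, token, tlen) == 0)
            return 1;
    }
    return 0;
}
```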


Nate