2015-06-19 13:54:04

by Böszörményi Zoltán

[permalink] [raw]
Subject: Ethernet chip disappeared from lspci

Hi,

I have a problem on a special POS mainboard that has
a Realtek RTL8111/8168/8411 chip. I use mainline kernel 4.0.5.

The initial problem was that whenever r8169 was not blacklisted,
IRQ problems appeared as soon as the driver loaded: key presses
on the USB keyboard were duplicated and the system was sluggish.
On power-off, the r8169 driver complained about
"rtl_eriar_cond = 1 loop 100" or something like that, and the
system could not reboot or power down properly.

It was impossible to get dmesg or other diagnostics info out of
the system in this state.

When I blacklisted r8169, everything was OK except there was
no network, obviously.

I also noticed that with kernel 4.0.5, there are memory range conflicts, like

pci 0000:00:02.0: can't claim BAR 0 [mem ....]: address conflict with PCI Bus 0000:00 [mem
... window]

I also tried to load the r8168 driver from Realtek, with the
same results as with r8169.

I don't know what caused it, whether the "official" Realtek driver
disabled the chip or my toggling PXE boot in the BIOS did,
but now lspci no longer lists the Ethernet chip and not even
the PXE boot messages show up, although PXE is enabled in the BIOS.
I tried kernels 3.18.16, 4.0.5 again, and 4.1.0-rc8.

I have this in dmesg:

[ 0.136171] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 *7 10 11 12 14 15)
[ 0.136323] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 10 11 12 *14 15)
[ 0.136466] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 10 11 12 14 *15)
[ 0.136609] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 10 11 12 14 *15)
[ 0.136751] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled.
[ 0.136894] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled.
[ 0.137050] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled.
[ 0.137195] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 *6 7 10 11 12 14 15)
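
A minimal Python sketch for reading the interrupt link lines above: it extracts the starred (currently selected) IRQ of each link and lists the IRQs selected by more than one link. The names `LINK_RE`, `parse_links` and `shared_irqs` are illustrative, and the sample is a subset of the excerpt. Shared PCI interrupts are normal, so the result is only a pointer for further inspection, not a diagnosis.

```python
import re
from collections import defaultdict

# Matches e.g. "ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 *7 10 11 12 14 15)"
LINK_RE = re.compile(r"ACPI: PCI Interrupt Link \[(\w+)\] \(IRQs ([^)]*)\)")


def parse_links(dmesg_text):
    """Return {link name: selected IRQ, or None if no IRQ in the list is starred}."""
    links = {}
    for name, irq_list in LINK_RE.findall(dmesg_text):
        starred = [tok for tok in irq_list.split() if tok.startswith("*")]
        links[name] = int(starred[0][1:]) if starred else None
    return links


def shared_irqs(links):
    """Group link names by selected IRQ; keep IRQs used by more than one link."""
    by_irq = defaultdict(list)
    for name, irq in links.items():
        if irq is not None:
            by_irq[irq].append(name)
    return {irq: names for irq, names in by_irq.items() if len(names) > 1}


SAMPLE = """\
[ 0.136171] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 *7 10 11 12 14 15)
[ 0.136323] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 10 11 12 *14 15)
[ 0.136466] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 10 11 12 14 *15)
[ 0.136609] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 10 11 12 14 *15)
[ 0.136751] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled.
[ 0.137195] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 *6 7 10 11 12 14 15)
"""

if __name__ == "__main__":
    print(shared_irqs(parse_links(SAMPLE)))
```

On this sample, LNKC and LNKD are the only links that select the same IRQ (15); LNKE has no starred IRQ inside its list and is reported as None.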

and

[ 0.139098] PCI: Using ACPI for IRQ routing
[ 0.139098] PCI: pci_cache_line_size set to 64 bytes
[ 0.139098] pci 0000:00:02.0: can't claim BAR 0 [mem 0xfeb00000-0xfeb7ffff]: address conflict with PCI Bus 0000:00 [mem 0xf0000000-0xfed8ffff window]
[ 0.139098] pci 0000:00:02.0: can't claim BAR 2 [mem 0xd0000000-0xdfffffff pref]: address conflict with PCI Bus 0000:00 [mem 0x7f700000-0xdfffffff window]
[ 0.139104] pci 0000:00:02.0: can't claim BAR 3 [mem 0xfea00000-0xfeafffff]: address conflict with PCI Bus 0000:00 [mem 0xf0000000-0xfed8ffff window]
[ 0.139113] pci 0000:00:02.1: can't claim BAR 0 [mem 0xfeb80000-0xfebfffff]: address conflict with PCI Bus 0000:00 [mem 0xf0000000-0xfed8ffff window]
[ 0.139123] pci 0000:00:1b.0: can't claim BAR 0 [mem 0xfe9f8000-0xfe9fbfff 64bit]: address conflict with PCI Bus 0000:00 [mem 0xf0000000-0xfed8ffff window]
[ 0.139146] pci 0000:00:1d.7: can't claim BAR 0 [mem 0xfe9f7c00-0xfe9f7fff]: address conflict with PCI Bus 0000:00 [mem 0xf0000000-0xfed8ffff window]
[ 0.139161] pci 0000:00:1f.2: can't claim BAR 5 [mem 0xfe9f7800-0xfe9f7bff]: address conflict with PCI Bus 0000:00 [mem 0xf0000000-0xfed8ffff window]
[ 0.139190] Expanded resource reserved due to conflict with PCI Bus 0000:00
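
A minimal Python sketch for checking the "can't claim BAR" lines mechanically: it parses each line, computes the BAR size, and tests whether the BAR range lies inside the window it is reported to conflict with. The regular expression and the names `CONFLICT_RE` and `parse_conflicts` are illustrative assumptions; the sample holds three of the seven lines from the excerpt.

```python
import re

# Tolerates the mail line wrap either before or after the word "address".
CONFLICT_RE = re.compile(
    r"pci (\S+): can't claim BAR (\d+) "
    r"\[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)[^\]]*\]:\s+address\s+conflict with "
    r"PCI Bus \S+ \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+) window\]"
)


def parse_conflicts(dmesg_text):
    """Return one record per conflict line: device, BAR number, size, containment."""
    records = []
    for dev, bar, b0, b1, w0, w1 in CONFLICT_RE.findall(dmesg_text):
        bar_start, bar_end = int(b0, 16), int(b1, 16)
        win_start, win_end = int(w0, 16), int(w1, 16)
        records.append({
            "device": dev,
            "bar": int(bar),
            "size": bar_end - bar_start + 1,
            # True when the BAR range is fully inside the reported window.
            "inside_window": win_start <= bar_start and bar_end <= win_end,
        })
    return records


SAMPLE = """\
[ 0.139098] pci 0000:00:02.0: can't claim BAR 0 [mem 0xfeb00000-0xfeb7ffff]: address conflict with PCI Bus 0000:00 [mem 0xf0000000-0xfed8ffff window]
[ 0.139098] pci 0000:00:02.0: can't claim BAR 2 [mem 0xd0000000-0xdfffffff pref]: address conflict with PCI Bus 0000:00 [mem 0x7f700000-0xdfffffff window]
[ 0.139123] pci 0000:00:1b.0: can't claim BAR 0 [mem 0xfe9f8000-0xfe9fbfff 64bit]: address conflict with PCI Bus 0000:00 [mem 0xf0000000-0xfed8ffff window]
"""

if __name__ == "__main__":
    for rec in parse_conflicts(SAMPLE):
        print(rec)
```

For the three sample lines, each BAR lies entirely inside the window named in its own message, so the conflicts are not simply BARs placed outside the host bridge windows. What that implies about the kernel's resource handling is left open here.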

Full dmesg for 4.0.5 is attached.

Can anyone help me re-enable the network card?

Thanks in advance,
Zoltán Böszörményi


Attachments:
dmesg-4.0.5.log (45.26 kB)

2015-06-19 13:31:50

by Böszörményi Zoltán

[permalink] [raw]
Subject: Re: Ethernet chip disappeared from lspci

Never mind, this is a POS machine with a big battery inside.
When I allowed it to discharge, the network card came back
with PXE boot. Some bad state may have been kept alive
by the battery.

Sorry for the noise.


2015-06-19 13:47:05

by Böszörményi Zoltán

[permalink] [raw]
Subject: ACPI regression? Was Re: Ethernet chip disappeared from lspci

Hi,

After the network card came alive again, I tried kernels
3.18.16, 4.0.5 and 4.1.0-rc8. With the last two kernels, loading
the r8169 driver triggers the symptoms described in my original
report. Also, after booting 4.0.5 and then 4.1.0-rc8, the network
card disappeared from the PCI devices again: neither the PXE
messages nor the device in lspci show up.

It seems I will have to wait again until the battery has
discharged to get the network chip back.

I would be happy to test patches that may fix this behavior.

With 3.18.16 and the device in lspci, the network works with r8169.

Best regards,
Zoltán Böszörményi

2015-06-19 22:47:16

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: ACPI regression? Was Re: Ethernet chip disappeared from lspci

On Friday, June 19, 2015 03:46:48 PM Boszormenyi Zoltan wrote:
> Hi,
>
> so after the network card came alive again, I tried kernels
> 3.18.16, 4.0.5 and 4.1.0-rc8. With the last two kernels, when
> loading the r8169 driver, I experience the symptoms described
> below. Also, after booting 4.0.5 and then 4.1.0-rc8, the network
> card disappeared from the PCI devices again, neither PXE shows up
> nor the device in lspci.
>
> It seems I will have to wait again until the battery loses its
> capacity since the last testing to get the network chip back.
>
> I would be happy to test patches that may fix this behavior.
>
> With 3.18.16 and the device in lspci, the network works with r8169.

The only thing I can suggest is to try this patch:

https://patchwork.kernel.org/patch/6628061/

and see if it makes any difference.



--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2015-06-20 06:38:55

by Böszörményi Zoltán

[permalink] [raw]
Subject: Re: ACPI regression? Was Re: Ethernet chip disappeared from lspci

On 2015-06-20 01:13, Rafael J. Wysocki wrote:
> On Friday, June 19, 2015 03:46:48 PM Boszormenyi Zoltan wrote:
>> Hi,
>>
>> so after the network card came alive again, I tried kernels
>> 3.18.16, 4.0.5 and 4.1.0-rc8. With the last two kernels, when
>> loading the r8169 driver, I experience the symptoms described
>> below. Also, after booting 4.0.5 and then 4.1.0-rc8, the network
>> card disappeared from the PCI devices again, neither PXE shows up
>> nor the device in lspci.
>>
>> It seems I will have to wait again until the battery loses its
>> capacity since the last testing to get the network chip back.
>>
>> I would be happy to test patches that may fix this behavior.
>>
>> With 3.18.16 and the device in lspci, the network works with r8169.
> The only thing I can suggest is to try this patch:
>
> https://patchwork.kernel.org/patch/6628061/
>
> and see if it makes any difference.

Thanks, I tried it on 4.1.0-rc8 and it didn't make a difference.
Attached are the dmesg and the acpidump output, both compressed.
I hope you or someone else can see something that fixes this issue.
Until then, I am stuck with 3.18.16 on this machine.

Thanks in advance,
Zoltán Böszörményi



Attachments:
acpidump.tgz (47.89 kB)
dmesg-4.1.0-rc8-3.log.gz (12.85 kB)

2015-06-20 07:45:15

by Andreas Mohr

[permalink] [raw]
Subject: Re: ACPI regression? Was Re: Ethernet chip disappeared from lspci

Hi,

[lkml.org still broken --> no accurate mail header info possible...]

Just to ask the obvious:
I assume using /sys/bus/pci/rescan does not help once it's broken?
(since the machine comes up empty at initial-boot scan, too)
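
A minimal Python sketch of the rescan attempt, assuming the standard sysfs locations `/sys/bus/pci/rescan` and `/sys/bus/pci/devices`. The function names are illustrative, and writing the rescan attribute requires root on a real system.

```python
import os
from pathlib import Path


def list_pci_devices(devices_dir="/sys/bus/pci/devices"):
    """Return the sorted device addresses currently known to the kernel."""
    return sorted(os.listdir(devices_dir))


def request_rescan(rescan_file="/sys/bus/pci/rescan"):
    """Ask the kernel to rescan all PCI buses by writing "1" to the attribute."""
    Path(rescan_file).write_text("1")


def rescan_and_compare(devices_dir="/sys/bus/pci/devices",
                       rescan_file="/sys/bus/pci/rescan"):
    """Return the addresses that are listed after the rescan but not before it."""
    before = set(list_pci_devices(devices_dir))
    request_rescan(rescan_file)
    after = set(list_pci_devices(devices_dir))
    return sorted(after - before)


if __name__ == "__main__":
    print(rescan_and_compare())
```

An empty result would mean the rescan found nothing new, which is the expected outcome if the chip does not respond to configuration accesses at all.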


Also, you could try diffing the lspci -vvxxx -s.... output
of a working vs. a "distorting" kernel version. Perhaps some register setup
has changed (e.g. due to power management improvements or some such),
which may push the card
into a problematic/corrupt state.
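
A minimal Python sketch of such a comparison, using only the standard library. `capture` runs lspci for a slot address that the caller has to supply (none is given here), and `diff_dumps` compares two saved dumps. Both function names and the two sample strings are invented for illustration; they are not output from the machine in question.

```python
import difflib
import subprocess


def capture(slot):
    """Return the `lspci -vvxxx -s <slot>` output; needs root for full config space."""
    result = subprocess.run(
        ["lspci", "-vvxxx", "-s", slot],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def diff_dumps(working, broken, working_name="working", broken_name="broken"):
    """Return unified-diff lines (no context) between two lspci dumps."""
    return list(difflib.unified_diff(
        working.splitlines(), broken.splitlines(),
        fromfile=working_name, tofile=broken_name,
        lineterm="", n=0,
    ))


# Invented sample input, in the style of lspci -vv output.
WORKING = "Control: I/O+ Mem+ BusMaster+\nLatency: 0\n"
BROKEN = "Control: I/O+ Mem+ BusMaster-\nLatency: 0\n"

if __name__ == "__main__":
    print("\n".join(diff_dumps(WORKING, BROKEN)))
```

Saving one dump per kernel version to a file and diffing the files afterwards avoids having to keep the machine in the broken state while comparing.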

> Upon powering off the system,
> the r8169 driver complained about "rtl_eriar_cond = 1 loop 100"

Yup, that seems to be
rtl_eri_read() in ethernet/realtek/r8169.c
waiting on low condition of
RTL_R32(ERIAR) & ERIAR_FLAG;
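
To illustrate what a message of the form "cond = 1 loop 100" means, here is a minimal Python sketch of a bounded polling loop. It is not the driver's code: `loop_wait`, its parameters and the printed text are assumptions modeled on the paraphrased message, and `check` stands in for reading the flag bit.

```python
import time


def loop_wait(check, want, loops=100, delay_s=0.0001, name="cond"):
    """Poll check() up to `loops` times; return True once it equals `want`."""
    for _ in range(loops):
        time.sleep(delay_s)
        if check() == want:
            return True
    # Report the value the condition was stuck at when the budget ran out.
    print(f"{name} = {int(not want)} loop {loops}")
    return False


if __name__ == "__main__":
    # A flag that never clears: waiting for "low" exhausts all 100 polls.
    loop_wait(lambda: True, want=False, name="rtl_eriar_cond")
```

Read this way, the message says the flag was still set after 100 polls, which would also be consistent with a device that no longer answers register reads.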

Andreas Mohr

2015-06-21 10:34:39

by Böszörményi Zoltán

[permalink] [raw]
Subject: Re: ACPI regression? Was Re: Ethernet chip disappeared from lspci

Hi,

Please cc me; I am not subscribed to lkml.

> Hi,
>
> [lkml.org still broken --> no accurate mail header info possible...]
>
> Just to ask the obvious:
> I assume using /sys/bus/pci/rescan does not help once it's broken?
> (since the machine comes up empty at initial-boot scan, too)

I will try it, too, but I am not sure it would work.

Currently I can't test it because I completely discharged the battery
last time. I also disconnected it so that I could get the Realtek chip back
immediately for faster testing. Now that I have reconnected the battery,
I need to wait until it is charged somewhat before I can reproduce
losing the network chip.

> Also, you could try diffing lspci -vvxxx -s.... output
> of working vs. "distorting" kernel version - perhaps some register setup
> has been changed (e.g. due to power management improvements or some such),
> which may encourage the card
> to get a problematic/corrupt state.

I attached a tarball that contains lspci -vvxxx for
- all devices / only the network chip
- before / after "modprobe r8169"
- for all 3 kernel versions tested.

I figured out that if I type the modprobe and lspci in the same command line,
I can get diagnostics out of the machine, after all.

It's not just the Realtek chip that has changed parameters.

(Vague idea) I noticed that some devices have changed like this:

- Memory behind bridge: 80000000-801fffff
- Prefetchable memory behind bridge: 0000000080200000-00000000803fffff
+ Memory behind bridge: ff000000-ff1fffff
+ Prefetchable memory behind bridge: 00000000ff200000-00000000ff3fffff

Can't this cause a problem? E.g. programming the bridge with an address range
that the bridge doesn't actually support?
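
A minimal Python sketch of the range check behind this question. `HOST_WINDOWS` holds only the two host bridge windows that appear in the 4.0.5 dmesg excerpt from the original report; the full log may list more, and the names used here are illustrative.

```python
# Host bridge memory windows quoted in the dmesg excerpt (assumed incomplete).
HOST_WINDOWS = [
    (0x7F700000, 0xDFFFFFFF),
    (0xF0000000, 0xFED8FFFF),
]


def containing_window(start, end, windows=HOST_WINDOWS):
    """Return the first window that fully contains [start, end], else None."""
    for win_start, win_end in windows:
        if win_start <= start and end <= win_end:
            return (win_start, win_end)
    return None


if __name__ == "__main__":
    bridge_ranges = [
        (0x80000000, 0x801FFFFF), (0x80200000, 0x803FFFFF),  # "-" lines
        (0xFF000000, 0xFF1FFFFF), (0xFF200000, 0xFF3FFFFF),  # "+" lines
    ]
    for start, end in bridge_ranges:
        print(f"{start:#010x}-{end:#010x}: {containing_window(start, end)}")
```

With these two windows, both 0x80... ranges fall inside [0x7f700000-0xdfffffff], while both 0xff... ranges fall inside neither window. A result of None only means "not inside the two windows listed here".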

>
> > Upon powering off the system,
> > the r8169 driver complained about "rtl_eriar_cond = 1 loop 100"
>
> Yup, that seems to be
> rtl_eri_read() in ethernet/realtek/r8169.c
> waiting on low condition of
> RTL_R32(ERIAR) & ERIAR_FLAG;

I found that, too, and I think it is a symptom rather than the cause.

Thanks for your efforts,
Zoltán Böszörményi


Attachments:
lspci.tgz (22.32 kB)

2015-06-21 14:04:23

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: ACPI regression? Was Re: Ethernet chip disappeared from lspci

[+cc linux-pci]

Hi Boszormenyi,

On Sun, Jun 21, 2015 at 5:34 AM, Boszormenyi Zoltan wrote:
> (Vague idea) I noticed that some devices have changed like this:
>
> - Memory behind bridge: 80000000-801fffff
> - Prefetchable memory behind bridge: 0000000080200000-00000000803fffff
> + Memory behind bridge: ff000000-ff1fffff
> + Prefetchable memory behind bridge: 00000000ff200000-00000000ff3fffff
>
> Can't this cause a problem? E.g. programming the bridge with an address range
> that the bridge doesn't actually support?

This worked in v3.18.16, but not in v4.0.5 or v4.1.0-rc8. You
attached a v4.1.0-rc8 dmesg log earlier. Would you mind collecting a
v3.18.16 dmesg log, so we can compare them?

These (from the v4.1.0-rc8 dmesg) look wrong, but I'll have to look at
the code to see what might be going on:

acpi PNP0A08:00: host bridge window expanded to [mem 0x00000000-0xffffffff window]; [mem 0x00000000-0xffffffff window] ignored
pci 0000:00:1c.1: can't claim BAR 15 [mem 0xfdf00000-0xfdffffff 64bit pref]: address conflict with PCI Bus 0000:00 [mem 0xf0000000-0xfed8ffff window]

Bjorn

2015-06-21 14:19:40

by Böszörményi Zoltán

[permalink] [raw]
Subject: Re: ACPI regression? Was Re: Ethernet chip disappeared from lspci

On 2015-06-21 16:03, Bjorn Helgaas wrote:
> This worked in v3.18.16, but not in v4.0.5 or v4.1.0-rc8. You
> attached a v4.1.0-rc8 dmesg log earlier. Would you mind collecting a
> v3.18.16 dmesg log, so we can compare them?

I collected all three for comparison; they are attached, compressed.

BTW, I browsed the git log and found commit
2ea3d266bab3b497238113b20136f7c3f69ad9c0 suspicious.
I will try the 4.0/4.1 kernels with this one reverted.


Thanks,
Zoltán Böszörményi


Attachments:
dmesg.tgz (38.18 kB)

2015-06-21 15:38:03

by Böszörményi Zoltán

[permalink] [raw]
Subject: Re: ACPI regression? Was Re: Ethernet chip disappeared from lspci

On 2015-06-21 16:19, Boszormenyi Zoltan wrote:
> On 2015-06-21 16:03, Bjorn Helgaas wrote:
>> This worked in v3.18.16, but not in v4.0.5 or v4.1.0-rc8. You
>> attached a v4.1.0-rc8 dmesg log earlier. Would you mind collecting a
>> v3.18.16 dmesg log, so we can compare them?
> I collected all 3 for you to compare them, compressed, attached.
>
> BTW, I browsed git log and found 2ea3d266bab3b497238113b20136f7c3f69ad9c0
> as suspicious. I will try the 4.0/4.1 kernels with this one reverted.

Reverting this one didn't help.

>
>> These (from the v4.1.0-rc8 dmesg) look wrong, but I'll have to look at
>> the code to see what might be going on:
>>
>> acpi PNP0A08:00: host bridge window expanded to [mem
>> 0x00000000-0xffffffff window]; [mem 0x00000000-0xffffffff window]
>> ignored
>> pci 0000:00:1c.1: can't claim BAR 15 [mem 0xfdf00000-0xfdffffff
>> 64bit pref]: address conflict with PCI Bus 0000:00 [mem
>> 0xf0000000-0xfed8ffff window]
>>
>> Bjorn
>>
> Thanks,
> Zoltán Böszörményi
>

2015-06-21 17:25:14

by Jiang Liu

Subject: Re: ACPI regression? Was Re: Ethernet chip disappeared from lspci

On 2015/6/21 22:19, Boszormenyi Zoltan wrote:
> On 2015-06-21 16:03, Bjorn Helgaas wrote:
>> [+cc linux-pci]
>>
>> Hi Boszormenyi,
>>
>> On Sun, Jun 21, 2015 at 5:34 AM, Boszormenyi Zoltan <[email protected]> wrote:
>>> Hi,
>>>
>>> please, cc me, I am not subscribed to lkml.
>>>
>>>> Hi,
>>>>
>>>> [lkml.org still broken --> no accurate mail header info possible...]
>>>>
>>>> Just to ask the obvious:
>>>> I assume using /sys/bus/pci/rescan does not help once it's broken?
>>>> (since the machine comes up empty at initial-boot scan, too)
>>> I will try it, too, but I am not sure it would work.
>>>
>>> Currently I can't test it because the last time I completely discharged
>>> the battery. I also disconnected it to be able to get the realtek chip back
>>> immediately for faster testing. Now, that I have reconnected the battery,
>>> I need to wait for it to be charged somewhat to be able to reproduce
>>> losing the network chip.
>>>
>>>> Also, you could try diffing lspci -vvxxx -s.... output
>>>> of working vs. "distorting" kernel version - perhaps some register setup
>>>> has been changed (e.g. due to power management improvements or some such),
>>>> which may encourage the card
>>>> to get a problematic/corrupt state.
>>> I attached a tarball that contains lspci -vvxxx for
>>> - all devices / only the network chip
>>> - before / after "modprobe r8169"
>>> - for all 3 kernel versions tested.
>>>
>>> I figured out that if I type the modprobe and lspci in the same command line,
>>> I can get diagnostics out of the machine, after all.
>>>
>>> It's not just the Realtek chip that has changed parameters.
>>>
>>> (Vague idea) I noticed that some devices have changed like this:
>>>
>>> - Memory behind bridge: 80000000-801fffff
>>> - Prefetchable memory behind bridge: 0000000080200000-00000000803fffff
>>> + Memory behind bridge: ff000000-ff1fffff
>>> + Prefetchable memory behind bridge: 00000000ff200000-00000000ff3fffff
>>>
>>> Can't this cause a problem? E.g. programming the bridge with an address range
>>> that the bridge doesn't actually support?
>> This worked in v3.18.16, but not in v4.0.5 or v4.1.0-rc8. You
>> attached a v4.1.0-rc8 dmesg log earlier. Would you mind collecting a
>> v3.18.16 dmesg log, so we can compare them?
>
> I collected all 3 for you to compare them, compressed, attached.
>
> BTW, I browsed git log and found 2ea3d266bab3b497238113b20136f7c3f69ad9c0
> as suspicious. I will try the 4.0/4.1 kernels with this one reverted.
>
>>
>> These (from the v4.1.0-rc8 dmesg) look wrong, but I'll have to look at
>> the code to see what might be going on:
>>
>> acpi PNP0A08:00: host bridge window expanded to [mem
>> 0x00000000-0xffffffff window]; [mem 0x00000000-0xffffffff window]
>> ignored
>> pci 0000:00:1c.1: can't claim BAR 15 [mem 0xfdf00000-0xfdffffff
>> 64bit pref]: address conflict with PCI Bus 0000:00 [mem
>> 0xf0000000-0xfed8ffff window]
>>
>> Bjorn
Hi Bjorn and Boszormenyi,
From the 3.18 kernel, we got this message:
[ 0.126248] acpi PNP0A08:00: host bridge window [0x400000000-0xfffffffff] (ignored, not CPU addressable)
And from 4.1.0-rc8, we got a different message:
[ 0.127051] acpi PNP0A08:00: host bridge window expanded to [mem 0x00000000-0xffffffff window]; [mem 0x00000000-0xffffffff window] ignored

That smells like a 32bit overflow or 64bit cut-off issue.

Hi Boszormenyi, could you please provide an acpidump from the
machine?
Thanks!
Gerry


2015-06-21 17:56:09

by Jiang Liu

Subject: Re: ACPI regression? Was Re: Ethernet chip disappeared from lspci

On 2015/6/22 1:25, Jiang Liu wrote:
[...]
>>>> - Memory behind bridge: 80000000-801fffff
>>>> - Prefetchable memory behind bridge: 0000000080200000-00000000803fffff
>>>> + Memory behind bridge: ff000000-ff1fffff
>>>> + Prefetchable memory behind bridge: 00000000ff200000-00000000ff3fffff
>>>>
>>>> Can't this cause a problem? E.g. programming the bridge with an address range
>>>> that the bridge doesn't actually support?
>>> This worked in v3.18.16, but not in v4.0.5 or v4.1.0-rc8. You
>>> attached a v4.1.0-rc8 dmesg log earlier. Would you mind collecting a
>>> v3.18.16 dmesg log, so we can compare them?
>>
>> I collected all 3 for you to compare them, compressed, attached.
>>
>> BTW, I browsed git log and found 2ea3d266bab3b497238113b20136f7c3f69ad9c0
>> as suspicious. I will try the 4.0/4.1 kernels with this one reverted.
>>
>>>
>>> These (from the v4.1.0-rc8 dmesg) look wrong, but I'll have to look at
>>> the code to see what might be going on:
>>>
>>> acpi PNP0A08:00: host bridge window expanded to [mem
>>> 0x00000000-0xffffffff window]; [mem 0x00000000-0xffffffff window]
>>> ignored
>>> pci 0000:00:1c.1: can't claim BAR 15 [mem 0xfdf00000-0xfdffffff
>>> 64bit pref]: address conflict with PCI Bus 0000:00 [mem
>>> 0xf0000000-0xfed8ffff window]
>>>
>>> Bjorn
> Hi Bjorn and Boszormenyi,
> From the 3.18 kernel, we got a message:
> [ 0.126248] acpi PNP0A08:00: host bridge window
> [0x400000000-0xfffffffff] (ignored, not CPU addressable)
> And from 4.1.-rc8, we got another message:
> [ 0.127051] acpi PNP0A08:00: host bridge window expanded to [mem
> 0x00000000-0xffffffff window]; [mem 0x00000000-0xffffffff window] ignored
>
> That smells like a 32bit overflow or 64bit cut-off issue.
Hi Bjorn and Boszormenyi,
In v3.18.6, the code uses u64 to compare resource ranges. Recent changes
switched it to resource_size_t, which may be u32 or u64 depending on
the configuration. So the resource range [0x400000000-0xfffffffff] may
have been truncated to [0x00000000-0xffffffff], which would cause the
trouble.
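
The truncation described above can be reproduced in plain user-space C. The following is only a minimal sketch, not kernel code: `narrow_size_t` is a hypothetical stand-in for resource_size_t on a 32-bit build without 64-bit physical addresses, and `fits_narrow` and `window_addressable` are illustrative names that do not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for resource_size_t when it is only 32 bits wide. */
typedef uint32_t narrow_size_t;

/* True when a 64-bit value survives the narrowing conversion unchanged. */
static bool fits_narrow(uint64_t value)
{
	narrow_size_t stored = (narrow_size_t)value;

	return (uint64_t)stored == value;
}

/*
 * Store a 64-bit window [min, max] into the narrow type, then report
 * whether both ends round-trip. A false result means the window is
 * not representable and should be ignored rather than used.
 */
static bool window_addressable(uint64_t min, uint64_t max,
			       narrow_size_t *start, narrow_size_t *end)
{
	*start = (narrow_size_t)min;	/* high bits are silently dropped */
	*end = (narrow_size_t)max;

	return fits_narrow(min) && fits_narrow(max);
}
```

Under these assumptions, passing 0x400000000 and 0xfffffffff stores 0x00000000 and 0xffffffff, which matches the bogus window in the v4.1.0-rc8 log, and the function returns false. A window that lies below 4 GiB, such as [0xf0000000-0xfed8ffff], round-trips and is accepted.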

Hi Boszormenyi,
Could you please try the following test patch against v4.1-rc8?
Thanks!
Gerry
-------------------------------------------------------------------
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
index 8244f013f210..d7b8c392c420 100644
--- a/drivers/acpi/resource.c
+++ b/drivers/acpi/resource.c
@@ -206,6 +206,11 @@ static bool acpi_decode_space(struct resource_win *win,

 	res->start = attr->minimum;
 	res->end = attr->maximum;
+	if (res->start != attr->minimum || res->end != attr->maximum) {
+		pr_warn("resource window ([%#llx-%#llx] ignored, not CPU addressable)\n",
+			attr->minimum, attr->maximum);
+		return false;
+	}

 	/*
 	 * For bridges that translate addresses across the bridge,
-----------------------------------------------------------------------------

2015-06-21 18:28:29

by Böszörményi Zoltán

Subject: Re: ACPI regression? Was Re: Ethernet chip disappeared from lspci

On 2015-06-21 19:25, Jiang Liu wrote:
> On 2015/6/21 22:19, Boszormenyi Zoltan wrote:
>> On 2015-06-21 16:03, Bjorn Helgaas wrote:
>>> [+cc linux-pci]
>>>
>>> Hi Boszormenyi,
>>>
>>> On Sun, Jun 21, 2015 at 5:34 AM, Boszormenyi Zoltan <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> please, cc me, I am not subscribed to lkml.
>>>>
>>>>> Hi,
>>>>>
>>>>> [lkml.org still broken --> no accurate mail header info possible...]
>>>>>
>>>>> Just to ask the obvious:
>>>>> I assume using /sys/bus/pci/rescan does not help once it's broken?
>>>>> (since the machine comes up empty at initial-boot scan, too)
>>>> I will try it, too, but I am not sure it would work.
>>>>
>>>> Currently I can't test it because the last time I completely discharged
>>>> the battery. I also disconnected it to be able to get the realtek chip back
>>>> immediately for faster testing. Now, that I have reconnected the battery,
>>>> I need to wait for it to be charged somewhat to be able to reproduce
>>>> losing the network chip.
>>>>
>>>>> Also, you could try diffing lspci -vvxxx -s.... output
>>>>> of working vs. "distorting" kernel version - perhaps some register setup
>>>>> has been changed (e.g. due to power management improvements or some such),
>>>>> which may encourage the card
>>>>> to get a problematic/corrupt state.
>>>> I attached a tarball that contains lspci -vvxxx for
>>>> - all devices / only the network chip
>>>> - before / after "modprobe r8169"
>>>> - for all 3 kernel versions tested.
>>>>
>>>> I figured out that if I type the modprobe and lspci in the same command line,
>>>> I can get diagnostics out of the machine, after all.
>>>>
>>>> It's not just the Realtek chip that has changed parameters.
>>>>
>>>> (Vague idea) I noticed that some devices have changed like this:
>>>>
>>>> - Memory behind bridge: 80000000-801fffff
>>>> - Prefetchable memory behind bridge: 0000000080200000-00000000803fffff
>>>> + Memory behind bridge: ff000000-ff1fffff
>>>> + Prefetchable memory behind bridge: 00000000ff200000-00000000ff3fffff
>>>>
>>>> Can't this cause a problem? E.g. programming the bridge with an address range
>>>> that the bridge doesn't actually support?
>>> This worked in v3.18.16, but not in v4.0.5 or v4.1.0-rc8. You
>>> attached a v4.1.0-rc8 dmesg log earlier. Would you mind collecting a
>>> v3.18.16 dmesg log, so we can compare them?
>> I collected all 3 for you to compare them, compressed, attached.
>>
>> BTW, I browsed git log and found 2ea3d266bab3b497238113b20136f7c3f69ad9c0
>> as suspicious. I will try the 4.0/4.1 kernels with this one reverted.
>>
>>> These (from the v4.1.0-rc8 dmesg) look wrong, but I'll have to look at
>>> the code to see what might be going on:
>>>
>>> acpi PNP0A08:00: host bridge window expanded to [mem
>>> 0x00000000-0xffffffff window]; [mem 0x00000000-0xffffffff window]
>>> ignored
>>> pci 0000:00:1c.1: can't claim BAR 15 [mem 0xfdf00000-0xfdffffff
>>> 64bit pref]: address conflict with PCI Bus 0000:00 [mem
>>> 0xf0000000-0xfed8ffff window]
>>>
>>> Bjorn
> Hi Bjorn and Boszormenyi,
> From the 3.18 kernel, we got a message:
> [ 0.126248] acpi PNP0A08:00: host bridge window
> [0x400000000-0xfffffffff] (ignored, not CPU addressable)
> And from 4.1.-rc8, we got another message:
> [ 0.127051] acpi PNP0A08:00: host bridge window expanded to [mem
> 0x00000000-0xffffffff window]; [mem 0x00000000-0xffffffff window] ignored
>
> That smells like a 32bit overflow or 64bit cut-off issue.
>
> Hi Boszormenyi, could you please help to provide acpidump from the
> machine?

I already did in a previous mail which was only sent to LKML, but here it is again.

Thanks,
Zoltán



> Thanks!
> Gerry
>
>
>
>


Attachments:
acpidump.tgz (47.89 kB)

2015-06-21 18:56:09

by Böszörményi Zoltán

Subject: Re: ACPI regression? Was Re: Ethernet chip disappeared from lspci

On 2015-06-21 19:55, Jiang Liu wrote:
> On 2015/6/22 1:25, Jiang Liu wrote:
> [...]
>>>>> - Memory behind bridge: 80000000-801fffff
>>>>> - Prefetchable memory behind bridge: 0000000080200000-00000000803fffff
>>>>> + Memory behind bridge: ff000000-ff1fffff
>>>>> + Prefetchable memory behind bridge: 00000000ff200000-00000000ff3fffff
>>>>>
>>>>> Can't this cause a problem? E.g. programming the bridge with an address range
>>>>> that the bridge doesn't actually support?
>>>> This worked in v3.18.16, but not in v4.0.5 or v4.1.0-rc8. You
>>>> attached a v4.1.0-rc8 dmesg log earlier. Would you mind collecting a
>>>> v3.18.16 dmesg log, so we can compare them?
>>> I collected all 3 for you to compare them, compressed, attached.
>>>
>>> BTW, I browsed git log and found 2ea3d266bab3b497238113b20136f7c3f69ad9c0
>>> as suspicious. I will try the 4.0/4.1 kernels with this one reverted.
>>>
>>>> These (from the v4.1.0-rc8 dmesg) look wrong, but I'll have to look at
>>>> the code to see what might be going on:
>>>>
>>>> acpi PNP0A08:00: host bridge window expanded to [mem
>>>> 0x00000000-0xffffffff window]; [mem 0x00000000-0xffffffff window]
>>>> ignored
>>>> pci 0000:00:1c.1: can't claim BAR 15 [mem 0xfdf00000-0xfdffffff
>>>> 64bit pref]: address conflict with PCI Bus 0000:00 [mem
>>>> 0xf0000000-0xfed8ffff window]
>>>>
>>>> Bjorn
>> Hi Bjorn and Boszormenyi,
>> From the 3.18 kernel, we got a message:
>> [ 0.126248] acpi PNP0A08:00: host bridge window
>> [0x400000000-0xfffffffff] (ignored, not CPU addressable)
>> And from 4.1.-rc8, we got another message:
>> [ 0.127051] acpi PNP0A08:00: host bridge window expanded to [mem
>> 0x00000000-0xffffffff window]; [mem 0x00000000-0xffffffff window] ignored
>>
>> That smells like a 32bit overflow or 64bit cut-off issue.
> Hi Bjorn and Boszormenyi,
> With v3.18.6, it uses u64 to compare resource ranges. We changed to use
> resource_size_t with recent changes, and resource_size_t
> may be u32 or u64 depending on configuration. So resource range
> [0x400000000-0xfffffffff] may have been cut-off as
> [0x00000000-0xffffffff], thus cause the trouble.
>
> Hi Boszormenyi,
> Could you please help to try following test patch?
> against v4.1-rc8?

I have tried it. The result (dmesg, lspci before/after modprobe) is attached.
The "not CPU addressable" message shows up once in dmesg.
The device shows up in lspci and the module can be loaded. The previously
experienced sluggishness is gone now, but the network doesn't work after modprobe.
I think it was an expected outcome, since that particular range is ignored with this patch.

Thanks,
Zoltán

> Thanks!
> Gerry
> -------------------------------------------------------------------
> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
> index 8244f013f210..d7b8c392c420 100644
> --- a/drivers/acpi/resource.c
> +++ b/drivers/acpi/resource.c
> @@ -206,6 +206,11 @@ static bool acpi_decode_space(struct resource_win *win,
>
> res->start = attr->minimum;
> res->end = attr->maximum;
> + if (res->start != attr->minimum || res->end != attr->maximum) {
> + pr_warn("resource window ([%#llx-%#llx] ignored, not CPU addressable)\n",
> + attr->minimum, attr->maximum);
> + return false;
> + }
>
> /*
> * For bridges that translate addresses across the bridge,
> -----------------------------------------------------------------------------
>


Attachments:
dmesg-lspci-xx2.tgz (21.35 kB)

2015-06-21 19:59:23

by Böszörményi Zoltán

Subject: Re: ACPI regression? Was Re: Ethernet chip disappeared from lspci

On 2015-06-21 20:55, Boszormenyi Zoltan wrote:
> On 2015-06-21 19:55, Jiang Liu wrote:
>> On 2015/6/22 1:25, Jiang Liu wrote:
>> [...]
>>>>>> - Memory behind bridge: 80000000-801fffff
>>>>>> - Prefetchable memory behind bridge: 0000000080200000-00000000803fffff
>>>>>> + Memory behind bridge: ff000000-ff1fffff
>>>>>> + Prefetchable memory behind bridge: 00000000ff200000-00000000ff3fffff
>>>>>>
>>>>>> Can't this cause a problem? E.g. programming the bridge with an address range
>>>>>> that the bridge doesn't actually support?
>>>>> This worked in v3.18.16, but not in v4.0.5 or v4.1.0-rc8. You
>>>>> attached a v4.1.0-rc8 dmesg log earlier. Would you mind collecting a
>>>>> v3.18.16 dmesg log, so we can compare them?
>>>> I collected all 3 for you to compare them, compressed, attached.
>>>>
>>>> BTW, I browsed git log and found 2ea3d266bab3b497238113b20136f7c3f69ad9c0
>>>> as suspicious. I will try the 4.0/4.1 kernels with this one reverted.
>>>>
>>>>> These (from the v4.1.0-rc8 dmesg) look wrong, but I'll have to look at
>>>>> the code to see what might be going on:
>>>>>
>>>>> acpi PNP0A08:00: host bridge window expanded to [mem
>>>>> 0x00000000-0xffffffff window]; [mem 0x00000000-0xffffffff window]
>>>>> ignored
>>>>> pci 0000:00:1c.1: can't claim BAR 15 [mem 0xfdf00000-0xfdffffff
>>>>> 64bit pref]: address conflict with PCI Bus 0000:00 [mem
>>>>> 0xf0000000-0xfed8ffff window]
>>>>>
>>>>> Bjorn
>>> Hi Bjorn and Boszormenyi,
>>> From the 3.18 kernel, we got a message:
>>> [ 0.126248] acpi PNP0A08:00: host bridge window
>>> [0x400000000-0xfffffffff] (ignored, not CPU addressable)
>>> And from 4.1.-rc8, we got another message:
>>> [ 0.127051] acpi PNP0A08:00: host bridge window expanded to [mem
>>> 0x00000000-0xffffffff window]; [mem 0x00000000-0xffffffff window] ignored
>>>
>>> That smells like a 32bit overflow or 64bit cut-off issue.
>> Hi Bjorn and Boszormenyi,
>> With v3.18.6, it uses u64 to compare resource ranges. We changed to use
>> resource_size_t with recent changes, and resource_size_t
>> may be u32 or u64 depending on configuration. So resource range
>> [0x400000000-0xfffffffff] may have been cut-off as
>> [0x00000000-0xffffffff], thus cause the trouble.
>>
>> Hi Boszormenyi,
>> Could you please help to try following test patch?
>> against v4.1-rc8?
> I have tried it. The result (dmesg, lspci before/after modprobe) is attached.
> The "not CPU addressable" message shows up once in dmesg.
> The device shows up in lspci and the module can be loaded. The previously
> experienced sluggishness is gone now, but the network doesn't work after modprobe.
> I think it was an expected outcome, since that particular range is ignored with this patch.

Hm, I can see a very similar message in 3.18.16, so it was not
the expected outcome.

After building the "official" r8168 from Realtek for 4.1.0-rc8,
the difference in lspci from the working 3.18.16 is nil, before
and after modprobe. (r8168 was built for 3.18.16, that's why.)

However, connman (similar to NetworkManager) still sees the network
connectivity as "down". I checked that the firmware files are there in
/lib/firmware/rtl_nic.

With r8168 (the "official" Realtek driver), the kernel message about
"link up" appears immediately and connman can configure the network.

I have tried the patch on 4.0.5, too, with the same result.

So, there may be another problem with the r8169 driver itself besides
this ACPI problem, but no matter what I do, I can't seem to enable
debugging messages for r8169.

So, for now I can use r8168 instead of r8169 with this patch.

Thanks,
Zoltán

>
> Thanks,
> Zoltán
>
>> Thanks!
>> Gerry
>> -------------------------------------------------------------------
>> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
>> index 8244f013f210..d7b8c392c420 100644
>> --- a/drivers/acpi/resource.c
>> +++ b/drivers/acpi/resource.c
>> @@ -206,6 +206,11 @@ static bool acpi_decode_space(struct resource_win *win,
>>
>> res->start = attr->minimum;
>> res->end = attr->maximum;
>> + if (res->start != attr->minimum || res->end != attr->maximum) {
>> + pr_warn("resource window ([%#llx-%#llx] ignored, not CPU addressable)\n",
>> + attr->minimum, attr->maximum);
>> + return false;
>> + }
>>
>> /*
>> * For bridges that translate addresses across the bridge,
>> -----------------------------------------------------------------------------
>>

2015-06-23 04:10:32

by Jiang Liu

Subject: [Patch v1] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32bit kernel

The data type resource_size_t may be 32 bits or 64 bits depending on
CONFIG_PHYS_ADDR_T_64BIT. So reject ACPI resource descriptors which
would cause resource_size_t to overflow on a 32bit kernel.

This issue was triggered on a platform running 32bit kernel with an
ACPI resource descriptor with address range [0x400000000-0xfffffffff].
Please refer to https://lkml.org/lkml/2015/6/19/277 for more information.

Reported-by: Boszormenyi Zoltan <[email protected]>
Fixes: 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation")
Signed-off-by: Jiang Liu <[email protected]>
Cc: [email protected] # 4.0
---
drivers/acpi/resource.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
index 8244f013f210..f1c966e05078 100644
--- a/drivers/acpi/resource.c
+++ b/drivers/acpi/resource.c
@@ -193,6 +193,7 @@ static bool acpi_decode_space(struct resource_win *win,
u8 iodec = attr->granularity == 0xfff ? ACPI_DECODE_10 : ACPI_DECODE_16;
bool wp = addr->info.mem.write_protect;
u64 len = attr->address_length;
+ u64 start, end, offset = 0;
struct resource *res = &win->res;

/*
@@ -204,9 +205,6 @@ static bool acpi_decode_space(struct resource_win *win,
pr_debug("ACPI: Invalid address space min_addr_fix %d, max_addr_fix %d, len %llx\n",
addr->min_address_fixed, addr->max_address_fixed, len);

- res->start = attr->minimum;
- res->end = attr->maximum;
-
/*
* For bridges that translate addresses across the bridge,
* translation_offset is the offset that must be added to the
@@ -214,12 +212,22 @@ static bool acpi_decode_space(struct resource_win *win,
* primary side. Non-bridge devices must list 0 for all Address
* Translation offset bits.
*/
- if (addr->producer_consumer == ACPI_PRODUCER) {
- res->start += attr->translation_offset;
- res->end += attr->translation_offset;
- } else if (attr->translation_offset) {
+ if (addr->producer_consumer == ACPI_PRODUCER)
+ offset = attr->translation_offset;
+ else if (attr->translation_offset)
pr_debug("ACPI: translation_offset(%lld) is invalid for non-bridge device.\n",
attr->translation_offset);
+ start = attr->minimum + offset;
+ end = attr->maximum + offset;
+
+ win->offset = offset;
+ res->start = start;
+ res->end = end;
+ if (sizeof(resource_size_t) < sizeof(u64) &&
+ (offset != win->offset || start != res->start || end != res->end)) {
+ pr_warn("acpi resource window ([%#llx-%#llx] ignored, not CPU addressable)\n",
+ attr->minimum, attr->maximum);
+ return false;
}

switch (addr->resource_type) {
@@ -236,8 +244,6 @@ static bool acpi_decode_space(struct resource_win *win,
return false;
}

- win->offset = attr->translation_offset;
-
if (addr->producer_consumer == ACPI_PRODUCER)
res->flags |= IORESOURCE_WINDOW;

--
1.7.10.4

2015-06-23 07:35:45

by Ingo Molnar

Subject: Re: [Patch v1] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32bit kernel


* Jiang Liu <[email protected]> wrote:

> The data type resource_size_t may be 32 bits or 64 bits depending on
> CONFIG_PHYS_ADDR_T_64BIT. So reject ACPI resource descriptors which
> will cause resource_size_t overflow with 32bit kernel
>
> This issue was triggered on a platform running 32bit kernel with an
> ACPI resource descriptor with address range [0x400000000-0xfffffffff].
> Please refer to https://lkml.org/lkml/2015/6/19/277 for more information.
>
> Reported-by: Boszormenyi Zoltan <[email protected]>
> Fixes: 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation")
> Signed-off-by: Jiang Liu <[email protected]>
> Cc: [email protected] # 4.0

Yeah, so please use the customary changelog style we use in the kernel:

" Current code does (A), this causes problem (B) when doing (C).

In that case the user notices (D).

We can improve this doing (E), because now the user will experience (F),
which is more desirable."

Please fill in A-F accordingly.

In particular your changelog is missing 'B' and 'D': what exactly is a
'resource_size_t overflow' and what does the user notice from it?

Your changelog is also missing 'F'.

Thanks,

Ingo