2015-02-25 03:47:59

by Dave Airlie

Subject: regression in 4.0.0-rc1 with r8169 ethernet

Hey,

just booted an old AMD rs780 box with new kernel and my ethernet
failed to come up!

Looks like the MAC addr is all ff's because the PCI bridge windows are
messed up.
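
For readers unfamiliar with the symptom: a read from a PCI address that no bridge forwards typically completes with all bits set, which is why a driver behind a missing window can end up reporting a MAC of ff:ff:ff:ff:ff:ff. A minimal Python sketch of a sanity check for that pattern follows. The function name `looks_unreadable` is made up for illustration and is not part of r8169 or any kernel interface.

```python
def looks_unreadable(mac):
    """Return True if every octet of a colon-separated MAC address is 0xff.

    Hypothetical helper: an all-ones MAC usually means the register reads
    never reached the device, not that the device holds this address.
    """
    octets = mac.lower().split(":")
    return len(octets) == 6 and all(octet == "ff" for octet in octets)


print(looks_unreadable("ff:ff:ff:ff:ff:ff"))
print(looks_unreadable("00:1a:2b:3c:4d:5e"))
```

The second address is an arbitrary example value, not one taken from the attached logs.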

I've attached two dmesg logs: one from a 3.19.0-rc6 kernel I had on it, and
one from the failing 4.0.0-rc1 boot.

b24e2bdde4af656bb0679a101265ebb8f8735d3c is latest Linus commit in
that tree (I have some radeon patches on top).

motherboard is a Gigabyte GA-MA78GM-S2H, lspci also attached.

Dave.


Attachments:
r8169-2.txt (79.11 kB)
r8169-fail.txt (83.80 kB)
mylspci (28.21 kB)

2015-02-25 05:46:43

by Bjorn Helgaas

Subject: Re: regression in 4.0.0-rc1 with r8169 ethernet

[+cc linux-pci]

On Tue, Feb 24, 2015 at 9:47 PM, Dave Airlie <[email protected]> wrote:
> Hey,
>
> just booted an old AMD rs780 box with new kernel and my ethernet
> failed to come up!
>
> Looks like the MAC addr is all ff's because the PCI bridge windows are
> messed up.
>
> I've attached two dmesg one from a 3.19.0-rc6 I had on it, and one
> failing from the 4.0.0-rc1 time.
>
> b24e2bdde4af656bb0679a101265ebb8f8735d3c is latest Linus commit in
> that tree (I have some radeon patches on top).
>
> motherboard is a Gigabyte GA-MA78GM-S2H, lspci also attached.

Hi Dave,

Looking, thanks for the report and sorry for the inconvenience.

Bjorn

2015-02-25 07:11:21

by Bjorn Helgaas

Subject: Re: regression in 4.0.0-rc1 with r8169 ethernet

[+cc Jiang, Thomas, Lv, Rafael, linux-pci, linux-acpi]

On Tue, Feb 24, 2015 at 9:47 PM, Dave Airlie <[email protected]> wrote:
> Hey,
>
> just booted an old AMD rs780 box with new kernel and my ethernet
> failed to come up!
>
> Looks like the MAC addr is all ff's because the PCI bridge windows are
> messed up.
>
> I've attached two dmesg one from a 3.19.0-rc6 I had on it, and one
> failing from the 4.0.0-rc1 time.
>
> b24e2bdde4af656bb0679a101265ebb8f8735d3c is latest Linus commit in
> that tree (I have some radeon patches on top).
>
> motherboard is a Gigabyte GA-MA78GM-S2H, lspci also attached.

Here's the dmesg diff that looks relevant to me:

-pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7]
-pci_bus 0000:00: root bus resource [io 0x0d00-0xffff]
-pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
-pci_bus 0000:00: root bus resource [mem 0x000c0000-0x000dffff]
-pci_bus 0000:00: root bus resource [mem 0xfed40000-0xfed44fff]
-pci_bus 0000:00: root bus resource [mem 0x7ff00000-0xfebfffff]
+pci_bus 0000:00: root bus resource [io 0x0cf8-0x0cff]
+pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
+pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
+pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
+pci_bus 0000:00: root bus resource [mem 0x000c0000-0x000dffff window]
+pci_bus 0000:00: root bus resource [mem 0xfed40000-0xfed44fff window]

What's interesting is:

* v3.19 ignored [io 0x0cf8-0x0cff], but v4.0 includes it. I think
it's wrong to include it because that's the configuration space
address/data registers, so it's consumed by the host bridge and not
produced on the downstream side.

* v3.19 includes [mem 0x7ff00000-0xfebfffff], but v4.0 does not. This
is what's screwing up the devices.

I think all the windows should be marked as ACPI_PRODUCER in _CRS
since the space is "produced" on the downstream side of the bridge.
The [io 0x0cf8-0x0cff] region should probably be marked
ACPI_CONSUMER, and maybe that accounts for why v3.19 ignores it. But
I haven't found the code that does that yet.
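
The producer/consumer idea in the paragraph above can be sketched as a small Python model. This is only an illustration of the hypothesis, not the kernel's parsing code: the `CrsEntry` class and `bridge_windows` function are invented names, and the producer flags below are assumed for the example rather than read from this machine's firmware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrsEntry:
    """One _CRS address range (hypothetical, heavily simplified model)."""
    kind: str       # "io" or "mem"
    start: int
    end: int
    producer: bool  # True: forwarded downstream; False: used by the bridge itself


def bridge_windows(entries):
    """Keep only the ranges the host bridge produces for downstream devices."""
    return [entry for entry in entries if entry.producer]


# Ranges taken from the dmesg diff above; the producer flags are ASSUMED.
crs = [
    CrsEntry("io", 0x0CF8, 0x0CFF, producer=False),  # config address/data ports
    CrsEntry("io", 0x0000, 0x0CF7, producer=True),
    CrsEntry("io", 0x0D00, 0xFFFF, producer=True),
    CrsEntry("mem", 0x7FF00000, 0xFEBFFFFF, producer=True),
]

for entry in bridge_windows(crs):
    print(f"root bus resource [{entry.kind} {entry.start:#06x}-{entry.end:#06x}]")
```

Under these assumed flags the 0x0cf8-0x0cff range is dropped and the large memory window is kept, which matches the v3.19 behaviour described above.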

I suspect this is all related to the ACPI resource parsing rework. I
looked through that briefly, but no issues jumped out at me, so this
is just a heads-up in case it is obvious to you guys.

Dave, it'd be useful if you could collect an acpidump so we can look
at the _CRS data in more detail.

Bjorn
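
The comparison of root bus resources between the two logs can be reproduced mechanically. The following Python sketch is one way to do it, assuming the log lines have the same shape as the ones quoted in the diff above; the function names are made up for illustration, and the sample text is a subset of those quoted lines.

```python
import re

# Matches e.g. "pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]".
# Anything after the range (such as " window") is tolerated and ignored.
_RES = re.compile(
    r"root bus resource \[(io|mem)\s+(0x[0-9a-f]+)-(0x[0-9a-f]+)[^\]]*\]"
)


def root_bus_resources(dmesg_text):
    """Return the set of (kind, start, end) root bus ranges found in a log."""
    found = set()
    for line in dmesg_text.splitlines():
        match = _RES.search(line)
        if match:
            found.add((match.group(1),
                       int(match.group(2), 16),
                       int(match.group(3), 16)))
    return found


def compare(old_text, new_text):
    """Return (ranges only in old, ranges only in new), each sorted."""
    old = root_bus_resources(old_text)
    new = root_bus_resources(new_text)
    return sorted(old - new), sorted(new - old)


OLD = """\
pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7]
pci_bus 0000:00: root bus resource [mem 0xfed40000-0xfed44fff]
pci_bus 0000:00: root bus resource [mem 0x7ff00000-0xfebfffff]
"""
NEW = """\
pci_bus 0000:00: root bus resource [io 0x0cf8-0x0cff]
pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
pci_bus 0000:00: root bus resource [mem 0xfed40000-0xfed44fff window]
"""

lost, gained = compare(OLD, NEW)
for kind, start, end in lost:
    print(f"only in old: [{kind} {start:#x}-{end:#x}]")
for kind, start, end in gained:
    print(f"only in new: [{kind} {start:#x}-{end:#x}]")
```

Comparing parsed ranges rather than raw lines keeps the added " window" suffix from showing up as a difference on every line.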

2015-02-25 08:04:01

by Dave Airlie

Subject: Re: regression in 4.0.0-rc1 with r8169 ethernet

> * v3.19 ignored [io 0x0cf8-0x0cff], but v4.0 includes it. I think
> it's wrong to include it because that's the configuration space
> address/data registers, so it's consumed by the host bridge and not
> produced on the downstream side.
>
> * v3.19 includes [mem 0x7ff00000-0xfebfffff], but v4.0 does not. This
> is what's screwing up the devices.
>
> I think all the windows should be marked as ACPI_PRODUCER in _CRS
> since the space is "produced" on the downstream side of the bridge.
> The [io 0x0cf8-0x0cff] region should probably be marked
> ACPI_CONSUMER, and maybe that accounts for why v3.19 ignores it. But
> I haven't found the code that does that yet.
>
> I suspect this is all related to the ACPI resource parsing rework. I
> looked through that briefly, but no issues jumped out at me, so this
> is just a heads-up in case it is obvious to you guys.
>
> Dave, it'd be useful if you could collect an acpidump so we can look
> at the _CRS data in more detail.

acpidump fails here with a /dev/mem warning in the kernel.

I won't be near the machine again until next week most likely, so I
only have ssh for now, and it is running a kernel I no longer have the
source for!

Are there any tables under /sys I can grab instead?

Dave.

2015-02-25 22:19:04

by Rafael J. Wysocki

Subject: Re: regression in 4.0.0-rc1 with r8169 ethernet

On Wednesday, February 25, 2015 06:03:56 PM Dave Airlie wrote:
> > * v3.19 ignored [io 0x0cf8-0x0cff], but v4.0 includes it. I think
> > it's wrong to include it because that's the configuration space
> > address/data registers, so it's consumed by the host bridge and not
> > produced on the downstream side.
> >
> > * v3.19 includes [mem 0x7ff00000-0xfebfffff], but v4.0 does not. This
> > is what's screwing up the devices.
> >
> > I think all the windows should be marked as ACPI_PRODUCER in _CRS
> > since the space is "produced" on the downstream side of the bridge.
> > The [io 0x0cf8-0x0cff] region should probably be marked
> > ACPI_CONSUMER, and maybe that accounts for why v3.19 ignores it. But
> > I haven't found the code that does that yet.
> >
> > I suspect this is all related to the ACPI resource parsing rework. I
> > looked through that briefly, but no issues jumped out at me, so this
> > is just a heads-up in case it is obvious to you guys.
> >
> > Dave, it'd be useful if you could collect an acpidump so we can look
> > at the _CRS data in more detail.
>
> acpidump fails here with a /dev/mem warning in the kernel,
>
> now I'm not near the machine again until next week most likely, so I
> only have ssh for now,
> and the kernel it is running for which I don't have the source anymore!
>
> is there any of tables from /sys I can grab instead?

/sys/firmware/acpi/tables/DSDT
/sys/firmware/acpi/tables/SSDT*

Also I'm wondering if reverting commit 2ea3d266bab3 (ACPI: Translate
resource into master side address for bridge window resources) makes
any difference?

Rafael
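
Grabbing the tables named above over ssh can be scripted. Here is a minimal Python sketch, assuming the two sysfs paths from the message; the function name `collect_acpi_tables` and the destination directory are arbitrary, and reading these files normally requires root.

```python
from pathlib import Path


def collect_acpi_tables(dest, src="/sys/firmware/acpi/tables"):
    """Copy DSDT and any SSDT* files from src into dest.

    Returns the names of the tables that were copied. Other tables in
    the directory are left alone.
    """
    src_dir = Path(src)
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    candidates = sorted(src_dir.glob("DSDT")) + sorted(src_dir.glob("SSDT*"))
    for table in candidates:
        if table.is_file():
            # Read the binary table and write it out as a regular file.
            (dest_dir / table.name).write_bytes(table.read_bytes())
            copied.append(table.name)
    return copied
```

Calling `collect_acpi_tables("acpi-tables")` as root would leave the raw tables in an `acpi-tables` directory, ready to be attached to a reply. The `src` parameter exists only so the function can be pointed at a saved copy of the directory.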

2015-02-27 14:56:13

by Thomas Voegtle

Subject: Re: regression in 4.0.0-rc1 with r8169 ethernet


Hi,

I have the same problem with an Asrock Q1900B-ITX mainboard with
an onboard Intel Celeron J1900.

I did a bisect and ended up with:

593669c2ac0fe18baee04a3cd5539a148aa48574 is the first bad commit

commit 593669c2ac0fe18baee04a3cd5539a148aa48574
Author: Jiang Liu <[email protected]>
Date: Thu Feb 5 13:44:46 2015 +0800

x86/PCI/ACPI: Use common ACPI resource interfaces to simplify
implementation


I can revert this quite big commit on current git head (4f671fe) with no
problems and then everything is fine again.



Thomas

2015-02-27 22:01:23

by Rafael J. Wysocki

Subject: Re: regression in 4.0.0-rc1 with r8169 ethernet

On Friday, February 27, 2015 03:50:32 PM Thomas Voegtle wrote:
>
> Hi,
>
> I have the same problem with a Asrock Q1900B-ITX mainboard with
> a Intel Celeron J1900 onboard.
>
> I did a bisect and ended up with:
>
> 593669c2ac0fe18baee04a3cd5539a148aa48574 is the first bad commit
>
> commit 593669c2ac0fe18baee04a3cd5539a148aa48574
> Author: Jiang Liu <[email protected]>
> Date: Thu Feb 5 13:44:46 2015 +0800
>
> x86/PCI/ACPI: Use common ACPI resource interfaces to simplify
> implementation
>
>
> I can revert this quite big commit on current git head (4f671fe) with no
> problems and then everything is fine again.

Thanks for nailing this one!

It really wasn't supposed to make any functional difference, though, so there
must be some subtle mistake that escaped everyone in it.

I'll have a look at that and hopefully Jiang Liu will be able to help in the
meantime too.


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2015-02-28 08:06:54

by Jiang Liu

Subject: Re: regression in 4.0.0-rc1 with r8169 ethernet

On 2015/2/28 6:24, Rafael J. Wysocki wrote:
> On Friday, February 27, 2015 03:50:32 PM Thomas Voegtle wrote:
>>
>> Hi,
>>
>> I have the same problem with a Asrock Q1900B-ITX mainboard with
>> a Intel Celeron J1900 onboard.
>>
>> I did a bisect and ended up with:
>>
>> 593669c2ac0fe18baee04a3cd5539a148aa48574 is the first bad commit
>>
>> commit 593669c2ac0fe18baee04a3cd5539a148aa48574
>> Author: Jiang Liu <[email protected]>
>> Date: Thu Feb 5 13:44:46 2015 +0800
>>
>> x86/PCI/ACPI: Use common ACPI resource interfaces to simplify
>> implementation
>>
>>
>> I can revert this quite big commit on current git head (4f671fe) with no
>> problems and then everything is fine again.
>
> Thanks for nailing this one!
>
> It really wasn't supposed to make any functional difference, though, so there
> must be some subtle mistake that escaped everyone in it.
>
> I'll have a look at that and hopefully Jiang Liu will be able to help in the
> meantime too.
Hi all,
Sorry for the slow response; I just returned from the Chinese New Year holidays :)
Hi Thomas,
Could you please provide the dmesg logs from before and after the
revert?
Thanks!
Gerry

>
>

2015-02-28 08:36:43

by Marcel Holtmann

Subject: Re: regression in 4.0.0-rc1 with r8169 ethernet

Hi Jiang,

>>> I have the same problem with a Asrock Q1900B-ITX mainboard with
>>> a Intel Celeron J1900 onboard.
>>>
>>> I did a bisect and ended up with:
>>>
>>> 593669c2ac0fe18baee04a3cd5539a148aa48574 is the first bad commit
>>>
>>> commit 593669c2ac0fe18baee04a3cd5539a148aa48574
>>> Author: Jiang Liu <[email protected]>
>>> Date: Thu Feb 5 13:44:46 2015 +0800
>>>
>>> x86/PCI/ACPI: Use common ACPI resource interfaces to simplify
>>> implementation
>>>
>>>
>>> I can revert this quite big commit on current git head (4f671fe) with no
>>> problems and then everything is fine again.
>>
>> Thanks for nailing this one!
>>
>> It really wasn't supposed to make any functional difference, though, so there
>> must be some subtle mistake that escaped everyone in it.
>>
>> I'll have a look at that and hopefully Jiang Liu will be able to help in the
>> meantime too.
> Hi all,
> Sorry for slow response, just return from Chinese New Holidays:)
> Hi Thomas,
> Could you please help to provide the dmesgs before and after the
> revert?

Just grab a Minnowboard Max and test this yourself. It has the same problem: with this patch the MAC address is ff:ff:ff:ff:ff:ff. Once I reverted it, the MAC address was read correctly again.

Regards

Marcel

2015-02-28 08:43:44

by Jiang Liu

Subject: Re: regression in 4.0.0-rc1 with r8169 ethernet

On 2015/2/25 15:10, Bjorn Helgaas wrote:
> [+cc Jiang, Thomas, Lv, Rafael, linux-pci, linux-acpi]
>
> On Tue, Feb 24, 2015 at 9:47 PM, Dave Airlie <[email protected]> wrote:
>> Hey,
>>
>> just booted an old AMD rs780 box with new kernel and my ethernet
>> failed to come up!
>>
>> Looks like the MAC addr is all ff's because the PCI bridge windows are
>> messed up.
>>
>> I've attached two dmesg one from a 3.19.0-rc6 I had on it, and one
>> failing from the 4.0.0-rc1 time.
>>
>> b24e2bdde4af656bb0679a101265ebb8f8735d3c is latest Linus commit in
>> that tree (I have some radeon patches on top).
>>
>> motherboard is a Gigabyte GA-MA78GM-S2H, lspci also attached.
>
> Here's the dmesg diff that looks relevant to me:
>
> -pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7]
> -pci_bus 0000:00: root bus resource [io 0x0d00-0xffff]
> -pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
> -pci_bus 0000:00: root bus resource [mem 0x000c0000-0x000dffff]
> -pci_bus 0000:00: root bus resource [mem 0xfed40000-0xfed44fff]
> -pci_bus 0000:00: root bus resource [mem 0x7ff00000-0xfebfffff]
> +pci_bus 0000:00: root bus resource [io 0x0cf8-0x0cff]
> +pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
> +pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
> +pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
> +pci_bus 0000:00: root bus resource [mem 0x000c0000-0x000dffff window]
> +pci_bus 0000:00: root bus resource [mem 0xfed40000-0xfed44fff window]
>
> What's interesting is:
>
> * v3.19 ignored [io 0x0cf8-0x0cff], but v4.0 includes it. I think
> it's wrong to include it because that's the configuration space
> address/data registers, so it's consumed by the host bridge and not
> produced on the downstream side.
Hi Bjorn,
We should ignore resources consumed by the host bridge itself.
I will work out a patch to fix this issue.

>
> * v3.19 includes [mem 0x7ff00000-0xfebfffff], but v4.0 does not. This
> is what's screwing up the devices.
I need more information to figure out why resource [mem
0x7ff00000-0xfebfffff] is ignored by the new ACPI parser.
Could you please forward the original dmesg attachments?
Regards!
Gerry

>
> I think all the windows should be marked as ACPI_PRODUCER in _CRS
> since the space is "produced" on the downstream side of the bridge.
> The [io 0x0cf8-0x0cff] region should probably be marked
> ACPI_CONSUMER, and maybe that accounts for why v3.19 ignores it. But
> I haven't found the code that does that yet.
>
> I suspect this is all related to the ACPI resource parsing rework. I
> looked through that briefly, but no issues jumped out at me, so this
> is just a heads-up in case it is obvious to you guys.
>
> Dave, it'd be useful if you could collect an acpidump so we can look
> at the _CRS data in more detail.
>
> Bjorn
>

2015-02-28 08:45:20

by Jiang Liu

Subject: Re: regression in 4.0.0-rc1 with r8169 ethernet

On 2015/2/28 16:36, Marcel Holtmann wrote:
> Hi Jiang,
>
>>>> I have the same problem with a Asrock Q1900B-ITX mainboard with
>>>> a Intel Celeron J1900 onboard.
>>>>
>>>> I did a bisect and ended up with:
>>>>
>>>> 593669c2ac0fe18baee04a3cd5539a148aa48574 is the first bad commit
>>>>
>>>> commit 593669c2ac0fe18baee04a3cd5539a148aa48574
>>>> Author: Jiang Liu <[email protected]>
>>>> Date: Thu Feb 5 13:44:46 2015 +0800
>>>>
>>>> x86/PCI/ACPI: Use common ACPI resource interfaces to simplify
>>>> implementation
>>>>
>>>>
>>>> I can revert this quite big commit on current git head (4f671fe) with no
>>>> problems and then everything is fine again.
>>>
>>> Thanks for nailing this one!
>>>
>>> It really wasn't supposed to make any functional difference, though, so there
>>> must be some subtle mistake that escaped everyone in it.
>>>
>>> I'll have a look at that and hopefully Jiang Liu will be able to help in the
>>> meantime too.
>> Hi all,
>> Sorry for slow response, just return from Chinese New Holidays:)
>> Hi Thomas,
>> Could you please help to provide the dmesgs before and after the
>> revert?
>
> just grab a Minnowboard Max and test this by yourself. It has the same problem. The MAC address is ff:ff:ff:ff:ff:ff with this patch. Once I reverted it, the MAC address is correctly read again.
Hi Marcel,
I don't have access to a Minnowboard Max board, so I can't test
it myself. Could you please forward the two dmesg files you
sent to Bjorn in the original email?
Thanks!
Gerry

>
> Regards
>
> Marcel
>

2015-02-28 10:04:42

by Thomas Voegtle

Subject: Re: regression in 4.0.0-rc1 with r8169 ethernet

On Sat, 28 Feb 2015, Jiang Liu wrote:

> On 2015/2/28 6:24, Rafael J. Wysocki wrote:
>> On Friday, February 27, 2015 03:50:32 PM Thomas Voegtle wrote:
>>>
>>> Hi,
>>>
>>> I have the same problem with a Asrock Q1900B-ITX mainboard with
>>> a Intel Celeron J1900 onboard.
>>>
>>> I did a bisect and ended up with:
>>>
>>> 593669c2ac0fe18baee04a3cd5539a148aa48574 is the first bad commit
>>>
>>> commit 593669c2ac0fe18baee04a3cd5539a148aa48574
>>> Author: Jiang Liu <[email protected]>
>>> Date: Thu Feb 5 13:44:46 2015 +0800
>>>
>>> x86/PCI/ACPI: Use common ACPI resource interfaces to simplify
>>> implementation
>>>
>>>
>>> I can revert this quite big commit on current git head (4f671fe) with no
>>> problems and then everything is fine again.
>>
>> Thanks for nailing this one!
>>
>> It really wasn't supposed to make any functional difference, though, so there
>> must be some subtle mistake that escaped everyone in it.
>>
>> I'll have a look at that and hopefully Jiang Liu will be able to help in the
>> meantime too.
> Hi all,
> Sorry for slow response, just return from Chinese New Holidays:)
> Hi Thomas,
> Could you please help to provide the dmesgs before and after the
> revert?


Attached.

Thanks,

Thomas


Attachments:
4.0.0-rc1-00036-withRevert.dmesg.txt (42.19 kB)
4.0.0-rc1-00036-g4f671fe.dmesg.txt (49.31 kB)