2020-02-22 11:13:43

by Martin Volf

[permalink] [raw]
Subject: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

Hello,

hardware monitoring sensors NCT6796D on my Asus PRIME Z390M-PLUS
motherboard with Intel i7-9700 CPU don't work with 5.4 and newer linux
kernels, the driver nct6775 does not load.

It is working OK in version 5.3. I have used almost all released stable
versions from 5.3.8 to 5.3.16; I didn't try older kernels.

Even on new kernels the sensors-detect finds the sensors:
Found `Nuvoton NCT6796D Super IO Sensors' Success!
(address 0x290, driver `nct6775')
but "modprobe nct6775" says:
ERROR: could not insert 'nct6775': No such device
There is nothing interesting in dmesg.

git bisect found out the first bad commit is
b84398d6d7f900805662b1619223fd644d862d7c,
i2c: i801: Use iTCO version 6 in Cannon Lake PCH and beyond

Unfortunately I am not able to revert it in v5.4 to confirm it is really
the culprit.

Is there a way to have working hwmon sensors on my system in newer linux
kernels?

Thanks,

Martin

--8<--
lspci
00:00.0 Host bridge: Intel Corporation 8th Gen Core 8-core Desktop
Processor Host Bridge/DRAM Registers [Coffee Lake S] (rev 0d)
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630
(Desktop 9 Series) (rev 02)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI
Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH
HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake PCH SATA AHCI
Controller (rev 10)
00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root
Port #17 (rev f0)
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root
Port #1 (rev f0)
00:1c.6 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root
Port #7 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root
Port #9 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Z390 Chipset LPC/eSPI Controller (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake
PCH SPI Controller (rev 10)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7)
I219-V (rev 10)


2020-02-22 11:52:34

by Wolfram Sang

[permalink] [raw]
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

Hi Martin,

On Sat, Feb 22, 2020 at 12:13:07PM +0100, Martin Volf wrote:
> Hello,
>
> hardware monitoring sensors NCT6796D on my Asus PRIME Z390M-PLUS
> motherboard with Intel i7-9700 CPU don't work with 5.4 and newer linux
> kernels, the driver nct6775 does not load.
>
> It is working OK in version 5.3. I have used almost all released stable
> versions from 5.3.8 to 5.3.16; I didn't try older kernels.
>
> Even on new kernels the sensors-detect finds the sensors:
> Found `Nuvoton NCT6796D Super IO Sensors' Success!
> (address 0x290, driver `nct6775')
> but "modprobe nct6775" says:
> ERROR: could not insert 'nct6775': No such device
> There is nothing interesting in dmesg.
>
> git bisect found out the first bad commit is
> b84398d6d7f900805662b1619223fd644d862d7c,
> i2c: i801: Use iTCO version 6 in Cannon Lake PCH and beyond
>
> Unfortunately I am not able to revert it in v5.4 to confirm it is really
> the culprit.
>
> Is there a way to have working hwmon sensors on my system in newer linux
> kernels?

Well, it worked before, so I am quite sure it can be fixed. Thank you
very much for your detailed regression report! Sadly, I am not familiar
enough with those drivers, but you put the right people on CC, so I
think you will get more feedback within the next days. I'll keep an eye
on this, too.

Happy hacking,

Wolfram

>
> Thanks,
>
> Martin
>
> --8<--
> lspci
> 00:00.0 Host bridge: Intel Corporation 8th Gen Core 8-core Desktop
> Processor Host Bridge/DRAM Registers [Coffee Lake S] (rev 0d)
> 00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630
> (Desktop 9 Series) (rev 02)
> 00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI
> Host Controller (rev 10)
> 00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
> 00:16.0 Communication controller: Intel Corporation Cannon Lake PCH
> HECI Controller (rev 10)
> 00:17.0 SATA controller: Intel Corporation Cannon Lake PCH SATA AHCI
> Controller (rev 10)
> 00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root
> Port #17 (rev f0)
> 00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root
> Port #1 (rev f0)
> 00:1c.6 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root
> Port #7 (rev f0)
> 00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root
> Port #9 (rev f0)
> 00:1f.0 ISA bridge: Intel Corporation Z390 Chipset LPC/eSPI Controller (rev 10)
> 00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
> 00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake
> PCH SPI Controller (rev 10)
> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7)
> I219-V (rev 10)


Attachments:
(No filename) (2.82 kB)
signature.asc (849.00 B)
Download all attachments

2020-02-22 15:41:41

by Guenter Roeck

[permalink] [raw]
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

On 2/22/20 3:13 AM, Martin Volf wrote:
> Hello,
>
> hardware monitoring sensors NCT6796D on my Asus PRIME Z390M-PLUS
> motherboard with Intel i7-9700 CPU don't work with 5.4 and newer linux
> kernels, the driver nct6775 does not load.
>
> It is working OK in version 5.3. I have used almost all released stable
> versions from 5.3.8 to 5.3.16; I didn't try older kernels.
>
> Even on new kernels the sensors-detect finds the sensors:
> Found `Nuvoton NCT6796D Super IO Sensors' Success!
> (address 0x290, driver `nct6775')
> but "modprobe nct6775" says:
> ERROR: could not insert 'nct6775': No such device
> There is nothing interesting in dmesg.
>

My wild guess would be that the i801 driver is a bit aggressive with
reserving memory spaces, but I don't immediately see what it does
differently in that regard after the offending patch. Does it work
if you unload the i2c_i801 driver first ?

You could also try to compare the output of /proc/ioports with
the old and the new kernel, and see if the IO address space used
by nct6775 in v5.3 is assigned to the i801 driver (or some other
driver, such as the watchdog driver) in v5.4.

If you are into hacking the kernel, you could also add some
debug messages into the nct6775 driver to find out where exactly
it fails. If that helps, maybe we can then add those messages into
into the driver source to help others if this is observed again.

Thanks,
Guenter

2020-02-22 17:57:22

by Martin Volf

[permalink] [raw]
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

Hello,

On Sat, Feb 22, 2020 at 4:41 PM Guenter Roeck <[email protected]> wrote:
> On 2/22/20 3:13 AM, Martin Volf wrote:
> > hardware monitoring sensors NCT6796D on my Asus PRIME Z390M-PLUS
> > motherboard with Intel i7-9700 CPU don't work with 5.4 and newer linux
> > kernels, the driver nct6775 does not load.
> >
> > It is working OK in version 5.3. I have used almost all released stable
> > versions from 5.3.8 to 5.3.16; I didn't try older kernels.
...
> My wild guess would be that the i801 driver is a bit aggressive with
> reserving memory spaces, but I don't immediately see what it does
> differently in that regard after the offending patch. Does it work
> if you unload the i2c_i801 driver first ?

Yes, after unloading i2c_i801, the nct6775 works. There is definitely
some sort of race between these two drivers. Mostly i2c_i801 wins, but it
happened twice that nct6775 got automatically loaded just before i2c_i801
and the sensors worked.

> You could also try to compare the output of /proc/ioports with
> the old and the new kernel, and see if the IO address space used
> by nct6775 in v5.3 is assigned to the i801 driver (or some other
> driver, such as the watchdog driver) in v5.4.

This is diff of /proc/ioports in 5.3.18 with loaded nct6775 and in
5.4.21 without:

--- ioports-5.3.18
+++ ioports-5.4.21
@@ -2,6 +2,7 @@
0000-001f : dma1
0020-0021 : pic1
002e-0031 : iTCO_wdt
+ 002e-0031 : iTCO_wdt
0040-0043 : timer0
0050-0053 : timer1
0060-0060 : keyboard
@@ -14,11 +15,10 @@
00f0-00ff : fpu
00f0-00f0 : PNP0C04:00
0290-029f : pnp 00:01
- 0295-0296 : nct6775
- 0295-0296 : nct6775
03c0-03df : vga+
03f8-03ff : serial
0400-041f : iTCO_wdt
+ 0400-041f : iTCO_wdt
0680-069f : pnp 00:03
0cf8-0cff : PCI conf1
0d00-ffff : PCI Bus 0000:00

> If you are into hacking the kernel, you could also add some
> debug messages into the nct6775 driver to find out where exactly
> it fails. If that helps, maybe we can then add those messages into
> into the driver source to help others if this is observed again.

I have added some pr_info calls, the diff is at the and of this massage.

"bad" dmesg (i.e. i2c_i801 loaded before modprobe nct6775)

[ 1631.975392] nct6775: ### sensors_nct6775_init:
platform_driver_register() -> 0x0
[ 1631.975396] nct6775: ### nct6775_find: superio_enter(0x2e) -> 0xfffffff0
[ 1631.975417] nct6775: ### nct6775_find: superio_enter(0x4e) -> 0x0
[ 1631.975455] nct6775: ### nct6775_find: (val & SIO_ID_MASK) == 0xffff

"good" dmesg (rmmod i2c_i801; modprobe nct6775)

[ 1730.751188] nct6775: ### sensors_nct6775_init:
platform_driver_register() -> 0x0
[ 1730.751213] nct6775: ### nct6775_find: superio_enter(0x2e) -> 0x0
[ 1730.751251] nct6775: ### nct6775_find: (val & SIO_ID_MASK) == 0xd42b
[ 1730.751359] nct6775: Found NCT6798D or compatible chip at 0x2e:0x290
[ 1730.751367] ACPI Warning: SystemIO range
0x0000000000000295-0x0000000000000296 conflicts with OpRegion
0x0000000000000290-0x0000000000000299 (\AMW0.SHWM)
(20190816/utaddress-204)
[ 1730.751379] ACPI: This conflict may cause random problems and
system instability
[ 1730.751381] ACPI: If an ACPI driver is available for this device,
you should use it instead of the native driver
[ 1730.751431] nct6775: ### nct6775_probe: platform_get_resource() -> 0xfdac7b00
[ 1730.751434] nct6775: ### nct6775_probe: devm_request_region() -> 0
[ 1730.751554] nct6775 nct6775.656: Invalid temperature source 28 at
index 0, source register 0x100, temp register 0x73
[ 1730.751588] nct6775 nct6775.656: Invalid temperature source 28 at
index 1, source register 0x200, temp register 0x75
[ 1730.751686] nct6775 nct6775.656: Invalid temperature source 28 at
index 4, source register 0x900, temp register 0x7b
[ 1730.751865] nct6775: ### nct6775_probe: superio_enter(0x2e) -> 0x0
[ 1730.753685] nct6775: ### nct6775_find: superio_enter(0x4e) -> 0x0
[ 1730.753722] nct6775: ### nct6775_find: (val & SIO_ID_MASK) == 0xffff

So 0x2e is the resource the two drivers are fighting for.

I have created /etc/modprobe.d/nct6775-before-i2c_i801.conf with
install i2c_i801 /sbin/modprobe nct6775; /sbin/modprobe
--ignore-install i2c_i801

and it is working. I'm OK with this workaround, but I can do more
experiments if you instruct me what to try.

Thanks,

Martin

--8<--
--- nct6775.c.orig
+++ nct6775.c
@@ -3806,10 +3806,13 @@ static int nct6775_probe(struct platform
int num_attr_groups = 0;

res = platform_get_resource(pdev, IORESOURCE_IO, 0);
+pr_info("### nct6775_probe: platform_get_resource() -> 0x%x\n", res);
if (!devm_request_region(&pdev->dev, res->start, IOREGION_LENGTH,
DRVNAME))
return -EBUSY;

+pr_info("### nct6775_probe: devm_request_region() -> 0");
+
data = devm_kzalloc(&pdev->dev, sizeof(struct nct6775_data),
GFP_KERNEL);
if (!data)
@@ -4318,6 +4321,7 @@ static int nct6775_probe(struct platform

break;
default:
+pr_info("### nct6775_probe: data->kind == 0x%x\n", data->kind);
return -ENODEV;
}
data->have_in = BIT(data->in_num) - 1;
@@ -4503,6 +4507,7 @@ static int nct6775_probe(struct platform
nct6775_init_device(data);

err = superio_enter(sio_data->sioreg);
+pr_info("### nct6775_probe: superio_enter(0x%x) -> 0x%x\n",
sio_data->sioreg, err);
if (err)
return err;

@@ -4729,6 +4734,7 @@ static int __init nct6775_find(int sioad
int addr;

err = superio_enter(sioaddr);
+pr_info("### nct6775_find: superio_enter(0x%x) -> 0x%x\n", sioaddr, err);
if (err)
return err;

@@ -4737,6 +4743,7 @@ static int __init nct6775_find(int sioad
if (force_id && val != 0xffff)
val = force_id;

+pr_info("### nct6775_find: (val & SIO_ID_MASK) == 0x%04x\n", val);
switch (val & SIO_ID_MASK) {
case SIO_NCT6106_ID:
sio_data->kind = nct6106;
@@ -4831,6 +4838,7 @@ static int __init sensors_nct6775_init(v
int sioaddr[2] = { 0x2e, 0x4e };

err = platform_driver_register(&nct6775_driver);
+pr_info("### sensors_nct6775_init: platform_driver_register() -> 0x%x\n", err);
if (err)
return err;

2020-02-22 19:06:14

by Guenter Roeck

[permalink] [raw]
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

On 2/22/20 9:55 AM, Martin Volf wrote:
> Hello,
>
> On Sat, Feb 22, 2020 at 4:41 PM Guenter Roeck <[email protected]> wrote:
>> On 2/22/20 3:13 AM, Martin Volf wrote:
>>> hardware monitoring sensors NCT6796D on my Asus PRIME Z390M-PLUS
>>> motherboard with Intel i7-9700 CPU don't work with 5.4 and newer linux
>>> kernels, the driver nct6775 does not load.
>>>
>>> It is working OK in version 5.3. I have used almost all released stable
>>> versions from 5.3.8 to 5.3.16; I didn't try older kernels.
> ...
>> My wild guess would be that the i801 driver is a bit aggressive with
>> reserving memory spaces, but I don't immediately see what it does
>> differently in that regard after the offending patch. Does it work
>> if you unload the i2c_i801 driver first ?
>
> Yes, after unloading i2c_i801, the nct6775 works. There is definitely
> some sort of race between these two drivers. Mostly i2c_i801 wins, but it
> happened twice that nct6775 got automatically loaded just before i2c_i801
> and the sensors worked.
>
>> You could also try to compare the output of /proc/ioports with
>> the old and the new kernel, and see if the IO address space used
>> by nct6775 in v5.3 is assigned to the i801 driver (or some other
>> driver, such as the watchdog driver) in v5.4.
>
> This is diff of /proc/ioports in 5.3.18 with loaded nct6775 and in
> 5.4.21 without:
>
> --- ioports-5.3.18
> +++ ioports-5.4.21
> @@ -2,6 +2,7 @@
> 0000-001f : dma1
> 0020-0021 : pic1
> 002e-0031 : iTCO_wdt
> + 002e-0031 : iTCO_wdt
> 0040-0043 : timer0
> 0050-0053 : timer1
> 0060-0060 : keyboard
> @@ -14,11 +15,10 @@
> 00f0-00ff : fpu
> 00f0-00f0 : PNP0C04:00
> 0290-029f : pnp 00:01
> - 0295-0296 : nct6775
> - 0295-0296 : nct6775
> 03c0-03df : vga+
> 03f8-03ff : serial
> 0400-041f : iTCO_wdt
> + 0400-041f : iTCO_wdt
> 0680-069f : pnp 00:03
> 0cf8-0cff : PCI conf1
> 0d00-ffff : PCI Bus 0000:00
>
>> If you are into hacking the kernel, you could also add some
>> debug messages into the nct6775 driver to find out where exactly
>> it fails. If that helps, maybe we can then add those messages into
>> into the driver source to help others if this is observed again.
>
> I have added some pr_info calls, the diff is at the and of this massage.
>
> "bad" dmesg (i.e. i2c_i801 loaded before modprobe nct6775)
>
> [ 1631.975392] nct6775: ### sensors_nct6775_init:
> platform_driver_register() -> 0x0
> [ 1631.975396] nct6775: ### nct6775_find: superio_enter(0x2e) -> 0xfffffff0
> [ 1631.975417] nct6775: ### nct6775_find: superio_enter(0x4e) -> 0x0
> [ 1631.975455] nct6775: ### nct6775_find: (val & SIO_ID_MASK) == 0xffff
>
> "good" dmesg (rmmod i2c_i801; modprobe nct6775)
>
> [ 1730.751188] nct6775: ### sensors_nct6775_init:
> platform_driver_register() -> 0x0
> [ 1730.751213] nct6775: ### nct6775_find: superio_enter(0x2e) -> 0x0
> [ 1730.751251] nct6775: ### nct6775_find: (val & SIO_ID_MASK) == 0xd42b
> [ 1730.751359] nct6775: Found NCT6798D or compatible chip at 0x2e:0x290
> [ 1730.751367] ACPI Warning: SystemIO range
> 0x0000000000000295-0x0000000000000296 conflicts with OpRegion
> 0x0000000000000290-0x0000000000000299 (\AMW0.SHWM)
> (20190816/utaddress-204)
> [ 1730.751379] ACPI: This conflict may cause random problems and
> system instability
> [ 1730.751381] ACPI: If an ACPI driver is available for this device,
> you should use it instead of the native driver
> [ 1730.751431] nct6775: ### nct6775_probe: platform_get_resource() -> 0xfdac7b00
> [ 1730.751434] nct6775: ### nct6775_probe: devm_request_region() -> 0
> [ 1730.751554] nct6775 nct6775.656: Invalid temperature source 28 at
> index 0, source register 0x100, temp register 0x73
> [ 1730.751588] nct6775 nct6775.656: Invalid temperature source 28 at
> index 1, source register 0x200, temp register 0x75
> [ 1730.751686] nct6775 nct6775.656: Invalid temperature source 28 at
> index 4, source register 0x900, temp register 0x7b
> [ 1730.751865] nct6775: ### nct6775_probe: superio_enter(0x2e) -> 0x0
> [ 1730.753685] nct6775: ### nct6775_find: superio_enter(0x4e) -> 0x0
> [ 1730.753722] nct6775: ### nct6775_find: (val & SIO_ID_MASK) == 0xffff
>
> So 0x2e is the resource the two drivers are fighting for.
>

Yes, and it should not do that, since the range can be used to access
different segments of the same chip from multiple drivers. This region
should only be reserved temporarily, using request_muxed_region() when
needed and release_region() after the access is complete. Either case,
I don't immediately see why that region would be interesting for the
iTCO watchdog driver.

Can you add some debugging into the i801 driver to see what memory regions
it reserves, and how it gets to reserve 0x2e..0x31 ? That range really
doesn't make any sense to me.

Thanks,
Guenter

> I have created /etc/modprobe.d/nct6775-before-i2c_i801.conf with
> install i2c_i801 /sbin/modprobe nct6775; /sbin/modprobe
> --ignore-install i2c_i801
>
> and it is working. I'm OK with this workaround, but I can do more
> experiments if you instruct me what to try.
>
> Thanks,
>
> Martin
>
> --8<--
> --- nct6775.c.orig
> +++ nct6775.c
> @@ -3806,10 +3806,13 @@ static int nct6775_probe(struct platform
> int num_attr_groups = 0;
>
> res = platform_get_resource(pdev, IORESOURCE_IO, 0);
> +pr_info("### nct6775_probe: platform_get_resource() -> 0x%x\n", res);
> if (!devm_request_region(&pdev->dev, res->start, IOREGION_LENGTH,
> DRVNAME))
> return -EBUSY;
>
> +pr_info("### nct6775_probe: devm_request_region() -> 0");
> +
> data = devm_kzalloc(&pdev->dev, sizeof(struct nct6775_data),
> GFP_KERNEL);
> if (!data)
> @@ -4318,6 +4321,7 @@ static int nct6775_probe(struct platform
>
> break;
> default:
> +pr_info("### nct6775_probe: data->kind == 0x%x\n", data->kind);
> return -ENODEV;
> }
> data->have_in = BIT(data->in_num) - 1;
> @@ -4503,6 +4507,7 @@ static int nct6775_probe(struct platform
> nct6775_init_device(data);
>
> err = superio_enter(sio_data->sioreg);
> +pr_info("### nct6775_probe: superio_enter(0x%x) -> 0x%x\n",
> sio_data->sioreg, err);
> if (err)
> return err;
>
> @@ -4729,6 +4734,7 @@ static int __init nct6775_find(int sioad
> int addr;
>
> err = superio_enter(sioaddr);
> +pr_info("### nct6775_find: superio_enter(0x%x) -> 0x%x\n", sioaddr, err);
> if (err)
> return err;
>
> @@ -4737,6 +4743,7 @@ static int __init nct6775_find(int sioad
> if (force_id && val != 0xffff)
> val = force_id;
>
> +pr_info("### nct6775_find: (val & SIO_ID_MASK) == 0x%04x\n", val);
> switch (val & SIO_ID_MASK) {
> case SIO_NCT6106_ID:
> sio_data->kind = nct6106;
> @@ -4831,6 +4838,7 @@ static int __init sensors_nct6775_init(v
> int sioaddr[2] = { 0x2e, 0x4e };
>
> err = platform_driver_register(&nct6775_driver);
> +pr_info("### sensors_nct6775_init: platform_driver_register() -> 0x%x\n", err);
> if (err)
> return err;
>

2020-02-22 20:49:37

by Martin Volf

[permalink] [raw]
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

Hello,

On Sat, Feb 22, 2020 at 8:05 PM Guenter Roeck <[email protected]> wrote:
> On 2/22/20 9:55 AM, Martin Volf wrote:
> > On Sat, Feb 22, 2020 at 4:41 PM Guenter Roeck <[email protected]> wrote:
> >> On 2/22/20 3:13 AM, Martin Volf wrote:
> >>> hardware monitoring sensors NCT6796D on my Asus PRIME Z390M-PLUS
> >>> motherboard with Intel i7-9700 CPU don't work with 5.4 and newer linux
> >>> kernels, the driver nct6775 does not load.
> >>>
> >>> It is working OK in version 5.3. I have used almost all released stable
> >>> versions from 5.3.8 to 5.3.16; I didn't try older kernels.
> > ...
> >> My wild guess would be that the i801 driver is a bit aggressive with
> >> reserving memory spaces, but I don't immediately see what it does
> >> differently in that regard after the offending patch. Does it work
> >> if you unload the i2c_i801 driver first ?
> >
> > Yes, after unloading i2c_i801, the nct6775 works.
...
> > This is diff of /proc/ioports in 5.3.18 with loaded nct6775 and in
> > 5.4.21 without:
> >
> > --- ioports-5.3.18
> > +++ ioports-5.4.21
> > @@ -2,6 +2,7 @@
> > 0000-001f : dma1
> > 0020-0021 : pic1
> > 002e-0031 : iTCO_wdt
> > + 002e-0031 : iTCO_wdt
> > 0040-0043 : timer0
> > 0050-0053 : timer1
...
> > So 0x2e is the resource the two drivers are fighting for.
...
> Yes, and it should not do that, since the range can be used to access
> different segments of the same chip from multiple drivers. This region
> should only be reserved temporarily, using request_muxed_region() when
> needed and release_region() after the access is complete. Either case,
> I don't immediately see why that region would be interesting for the
> iTCO watchdog driver.
>
> Can you add some debugging into the i801 driver to see what memory regions
> it reserves, and how it gets to reserve 0x2e..0x31 ? That range really
> doesn't make any sense to me.

in the function i801_add_tco() in drivers/i2c/busses/i2c-i801.c
(line 1601 in 5.4.21), there is this code:

/*
* Power Management registers.
*/
devfn = PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 2);
pci_bus_read_config_dword(pci_dev->bus, devfn, ACPIBASE, &base_addr);

res = &tco_res[ICH_RES_IO_SMI];
res->start = (base_addr & ~1) + ACPIBASE_SMI_OFF;
res->end = res->start + 3;
res->flags = IORESOURCE_IO;

base_addr is 0xffffffff after pci_bus_read_config_dword() call.
ACPIBASE_SMI_OFF is 0x030, therefore res->start is 0x2e.
Not that I understand even a bit of this...

Martin

2020-02-22 21:27:45

by Guenter Roeck

[permalink] [raw]
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

On 2/22/20 12:49 PM, Martin Volf wrote:
> Hello,
>
> On Sat, Feb 22, 2020 at 8:05 PM Guenter Roeck <[email protected]> wrote:
>> On 2/22/20 9:55 AM, Martin Volf wrote:
>>> On Sat, Feb 22, 2020 at 4:41 PM Guenter Roeck <[email protected]> wrote:
>>>> On 2/22/20 3:13 AM, Martin Volf wrote:
>>>>> hardware monitoring sensors NCT6796D on my Asus PRIME Z390M-PLUS
>>>>> motherboard with Intel i7-9700 CPU don't work with 5.4 and newer linux
>>>>> kernels, the driver nct6775 does not load.
>>>>>
>>>>> It is working OK in version 5.3. I have used almost all released stable
>>>>> versions from 5.3.8 to 5.3.16; I didn't try older kernels.
>>> ...
>>>> My wild guess would be that the i801 driver is a bit aggressive with
>>>> reserving memory spaces, but I don't immediately see what it does
>>>> differently in that regard after the offending patch. Does it work
>>>> if you unload the i2c_i801 driver first ?
>>>
>>> Yes, after unloading i2c_i801, the nct6775 works.
> ...
>>> This is diff of /proc/ioports in 5.3.18 with loaded nct6775 and in
>>> 5.4.21 without:
>>>
>>> --- ioports-5.3.18
>>> +++ ioports-5.4.21
>>> @@ -2,6 +2,7 @@
>>> 0000-001f : dma1
>>> 0020-0021 : pic1
>>> 002e-0031 : iTCO_wdt
>>> + 002e-0031 : iTCO_wdt
>>> 0040-0043 : timer0
>>> 0050-0053 : timer1
> ...
>>> So 0x2e is the resource the two drivers are fighting for.
> ...
>> Yes, and it should not do that, since the range can be used to access
>> different segments of the same chip from multiple drivers. This region
>> should only be reserved temporarily, using request_muxed_region() when
>> needed and release_region() after the access is complete. Either case,
>> I don't immediately see why that region would be interesting for the
>> iTCO watchdog driver.
>>
>> Can you add some debugging into the i801 driver to see what memory regions
>> it reserves, and how it gets to reserve 0x2e..0x31 ? That range really
>> doesn't make any sense to me.
>
> in the function i801_add_tco() in drivers/i2c/busses/i2c-i801.c
> (line 1601 in 5.4.21), there is this code:
>
> /*
> * Power Management registers.
> */
> devfn = PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 2);
> pci_bus_read_config_dword(pci_dev->bus, devfn, ACPIBASE, &base_addr);
>
> res = &tco_res[ICH_RES_IO_SMI];
> res->start = (base_addr & ~1) + ACPIBASE_SMI_OFF;
> res->end = res->start + 3;
> res->flags = IORESOURCE_IO;
>
> base_addr is 0xffffffff after pci_bus_read_config_dword() call.
> ACPIBASE_SMI_OFF is 0x030, therefore res->start is 0x2e.
> Not that I understand even a bit of this...
>

Outch. This means that the code is broken. ACPIBASE is not configured,
or disabled, or the code reads from the wrong PCI configuration register.
What I don't understand is why this works with v5.3 kernels; the code
looks just as bad there for me. I must be missing something. Either case,
the only thing you can really do at this point is to blacklist the
iTCO_wdt driver.

Other than that, we can only hope that someone who understands above
code can provide a fix. Maybe Wolfram has an idea.


Guenter

2020-02-23 07:12:30

by Martin Volf

[permalink] [raw]
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

Hello,

On Sat, Feb 22, 2020 at 10:26 PM Guenter Roeck <[email protected]> wrote:
> On 2/22/20 12:49 PM, Martin Volf wrote:
> > On Sat, Feb 22, 2020 at 8:05 PM Guenter Roeck <[email protected]> wrote:
> >> On 2/22/20 9:55 AM, Martin Volf wrote:
> >>> On Sat, Feb 22, 2020 at 4:41 PM Guenter Roeck <[email protected]> wrote:
> >>>> On 2/22/20 3:13 AM, Martin Volf wrote:
> >>>>> hardware monitoring sensors NCT6796D on my Asus PRIME Z390M-PLUS
> >>>>> motherboard with Intel i7-9700 CPU don't work with 5.4 and newer linux
> >>>>> kernels, the driver nct6775 does not load.
> >>>>>
> >>>>> It is working OK in version 5.3. I have used almost all released stable
> >>>>> versions from 5.3.8 to 5.3.16; I didn't try older kernels.
> >>> ...
> >>>> My wild guess would be that the i801 driver is a bit aggressive with
> >>>> reserving memory spaces, but I don't immediately see what it does
> >>>> differently in that regard after the offending patch. Does it work
> >>>> if you unload the i2c_i801 driver first ?
> >>>
> >>> Yes, after unloading i2c_i801, the nct6775 works.
> > ...
> >>> This is diff of /proc/ioports in 5.3.18 with loaded nct6775 and in
> >>> 5.4.21 without:
> >>>
> >>> --- ioports-5.3.18
> >>> +++ ioports-5.4.21
> >>> @@ -2,6 +2,7 @@
> >>> 0000-001f : dma1
> >>> 0020-0021 : pic1
> >>> 002e-0031 : iTCO_wdt
> >>> + 002e-0031 : iTCO_wdt
> >>> 0040-0043 : timer0
> >>> 0050-0053 : timer1
> > ...
> >>> So 0x2e is the resource the two drivers are fighting for.
> > ...
> >> Yes, and it should not do that, since the range can be used to access
> >> different segments of the same chip from multiple drivers. This region
> >> should only be reserved temporarily, using request_muxed_region() when
> >> needed and release_region() after the access is complete. Either case,
> >> I don't immediately see why that region would be interesting for the
> >> iTCO watchdog driver.
> >>
> >> Can you add some debugging into the i801 driver to see what memory regions
> >> it reserves, and how it gets to reserve 0x2e..0x31 ? That range really
> >> doesn't make any sense to me.
> >
> > in the function i801_add_tco() in drivers/i2c/busses/i2c-i801.c
> > (line 1601 in 5.4.21), there is this code:
> >
> > /*
> > * Power Management registers.
> > */
> > devfn = PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 2);
> > pci_bus_read_config_dword(pci_dev->bus, devfn, ACPIBASE, &base_addr);
> >
> > res = &tco_res[ICH_RES_IO_SMI];
> > res->start = (base_addr & ~1) + ACPIBASE_SMI_OFF;
> > res->end = res->start + 3;
> > res->flags = IORESOURCE_IO;
> >
> > base_addr is 0xffffffff after pci_bus_read_config_dword() call.
> > ACPIBASE_SMI_OFF is 0x030, therefore res->start is 0x2e.
> > Not that I understand even a bit of this...
> >
>
> Outch. This means that the code is broken. ACPIBASE is not configured,
> or disabled, or the code reads from the wrong PCI configuration register.
> What I don't understand is why this works with v5.3 kernels; the code
> looks just as bad there for me. I must be missing something. Either case,
> the only thing you can really do at this point is to blacklist the
> iTCO_wdt driver.
>
> Other than that, we can only hope that someone who understands above
> code can provide a fix. Maybe Wolfram has an idea.

I have disabled the watchdog subsystem in kernel config (v5.5.5)
and the modprobe.d workaround and sensors are working.

Thanks a lot for your support!

Martin

2020-02-23 16:42:31

by Wolfram Sang

[permalink] [raw]
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080


> Outch. This means that the code is broken. ACPIBASE is not configured,
> or disabled, or the code reads from the wrong PCI configuration register.
> What I don't understand is why this works with v5.3 kernels; the code
> looks just as bad there for me. I must be missing something. Either case,
> the only thing you can really do at this point is to blacklist the
> iTCO_wdt driver.
>
> Other than that, we can only hope that someone who understands above
> code can provide a fix. Maybe Wolfram has an idea.

I'd love to but I don't know much about ACPI and its resource handling.
This is really Jean's realm, and and Jarkko, Andy, and Mika work with
this driver, too. Let's hope they can provide feedback after the
weekend.


Attachments:
(No filename) (747.00 B)
signature.asc (849.00 B)
Download all attachments

2020-02-23 17:57:44

by Gabriel C

[permalink] [raw]
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

Am Sa., 22. Feb. 2020 um 12:13 Uhr schrieb Martin Volf
<[email protected]>:
>
> Hello,
>

Hello,

> git bisect found out the first bad commit is
> b84398d6d7f900805662b1619223fd644d862d7c,i801_probe
> i2c: i801: Use iTCO version 6 in Cannon Lake PCH and beyond
>
> Unfortunately I am not able to revert it in v5.4 to confirm it is really
> the culprit.

I don't think you need to revert that to test, just move back
Cannon Lake to use iTCO Version 4 in i801_probe().

Something like this:

https://crazy.dev.frugalware.org/CANNONLAKE-use-iTCO-ver4.patch

BR,

Gabriel C

2020-02-24 10:19:36

by Mika Westerberg

[permalink] [raw]
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

+Andy and Jarkko

On Sat, Feb 22, 2020 at 01:26:48PM -0800, Guenter Roeck wrote:
> On 2/22/20 12:49 PM, Martin Volf wrote:
> > Hello,
> >
> > On Sat, Feb 22, 2020 at 8:05 PM Guenter Roeck <[email protected]> wrote:
> > > On 2/22/20 9:55 AM, Martin Volf wrote:
> > > > On Sat, Feb 22, 2020 at 4:41 PM Guenter Roeck <[email protected]> wrote:
> > > > > On 2/22/20 3:13 AM, Martin Volf wrote:
> > > > > > hardware monitoring sensors NCT6796D on my Asus PRIME Z390M-PLUS
> > > > > > motherboard with Intel i7-9700 CPU don't work with 5.4 and newer linux
> > > > > > kernels, the driver nct6775 does not load.
> > > > > >
> > > > > > It is working OK in version 5.3. I have used almost all released stable
> > > > > > versions from 5.3.8 to 5.3.16; I didn't try older kernels.
> > > > ...
> > > > > My wild guess would be that the i801 driver is a bit aggressive with
> > > > > reserving memory spaces, but I don't immediately see what it does
> > > > > differently in that regard after the offending patch. Does it work
> > > > > if you unload the i2c_i801 driver first ?
> > > >
> > > > Yes, after unloading i2c_i801, the nct6775 works.
> > ...
> > > > This is diff of /proc/ioports in 5.3.18 with loaded nct6775 and in
> > > > 5.4.21 without:
> > > >
> > > > --- ioports-5.3.18
> > > > +++ ioports-5.4.21
> > > > @@ -2,6 +2,7 @@
> > > > 0000-001f : dma1
> > > > 0020-0021 : pic1
> > > > 002e-0031 : iTCO_wdt
> > > > + 002e-0031 : iTCO_wdt
> > > > 0040-0043 : timer0
> > > > 0050-0053 : timer1
> > ...
> > > > So 0x2e is the resource the two drivers are fighting for.
> > ...
> > > Yes, and it should not do that, since the range can be used to access
> > > different segments of the same chip from multiple drivers. This region
> > > should only be reserved temporarily, using request_muxed_region() when
> > > needed and release_region() after the access is complete. Either case,
> > > I don't immediately see why that region would be interesting for the
> > > iTCO watchdog driver.
> > >
> > > Can you add some debugging into the i801 driver to see what memory regions
> > > it reserves, and how it gets to reserve 0x2e..0x31 ? That range really
> > > doesn't make any sense to me.
> >
> > in the function i801_add_tco() in drivers/i2c/busses/i2c-i801.c
> > (line 1601 in 5.4.21), there is this code:
> >
> > /*
> > * Power Management registers.
> > */
> > devfn = PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 2);
> > pci_bus_read_config_dword(pci_dev->bus, devfn, ACPIBASE, &base_addr);
> >
> > res = &tco_res[ICH_RES_IO_SMI];
> > res->start = (base_addr & ~1) + ACPIBASE_SMI_OFF;
> > res->end = res->start + 3;
> > res->flags = IORESOURCE_IO;
> >
> > base_addr is 0xffffffff after pci_bus_read_config_dword() call.
> > ACPIBASE_SMI_OFF is 0x030, therefore res->start is 0x2e.
> > Not that I understand even a bit of this...
> >
>
> Outch. This means that the code is broken. ACPIBASE is not configured,
> or disabled, or the code reads from the wrong PCI configuration register.
> What I don't understand is why this works with v5.3 kernels; the code
> looks just as bad there for me. I must be missing something. Either case,
> the only thing you can really do at this point is to blacklist the
> iTCO_wdt driver.

Indeed it looks like the code reads from a register that is not there
anymore in this generation hardware, or something like that. It tries to
read the PMC (1f.2) register add address 0x40 which is supposed to be
base of ACPI PM registers but that does not seem to exist any more in
newer chipset.

We'll look into this more and return back.

2020-02-24 10:37:51

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

On Mon, Feb 24, 2020 at 12:18:00PM +0200, Mika Westerberg wrote:
> On Sat, Feb 22, 2020 at 01:26:48PM -0800, Guenter Roeck wrote:
> > On 2/22/20 12:49 PM, Martin Volf wrote:
> > > On Sat, Feb 22, 2020 at 8:05 PM Guenter Roeck <[email protected]> wrote:

...

> > > devfn = PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 2);

I'm wondering if

pci_dev_is_present(...);

returns false here.

> > > pci_bus_read_config_dword(pci_dev->bus, devfn, ACPIBASE, &base_addr);
> > >
> > > res = &tco_res[ICH_RES_IO_SMI];
> > > res->start = (base_addr & ~1) + ACPIBASE_SMI_OFF;
> > > res->end = res->start + 3;
> > > res->flags = IORESOURCE_IO;
> > >
> > > base_addr is 0xffffffff after pci_bus_read_config_dword() call.
> > > ACPIBASE_SMI_OFF is 0x030, therefore res->start is 0x2e.
> > > Not that I understand even a bit of this...
> > >
> >
> > Outch. This means that the code is broken. ACPIBASE is not configured,
> > or disabled, or the code reads from the wrong PCI configuration register.
> > What I don't understand is why this works with v5.3 kernels; the code
> > looks just as bad there for me. I must be missing something. Either case,
> > the only thing you can really do at this point is to blacklist the
> > iTCO_wdt driver.
>
> Indeed it looks like the code reads from a register that is not there
> anymore in this generation hardware, or something like that. It tries to
> read the PMC (1f.2) register add address 0x40 which is supposed to be
> base of ACPI PM registers but that does not seem to exist any more in
> newer chipset.
>
> We'll look into this more and return back.

--
With Best Regards,
Andy Shevchenko


2020-02-24 10:52:50

by Mika Westerberg

[permalink] [raw]
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

On Mon, Feb 24, 2020 at 12:37:31PM +0200, Andy Shevchenko wrote:
> On Mon, Feb 24, 2020 at 12:18:00PM +0200, Mika Westerberg wrote:
> > On Sat, Feb 22, 2020 at 01:26:48PM -0800, Guenter Roeck wrote:
> > > On 2/22/20 12:49 PM, Martin Volf wrote:
> > > > On Sat, Feb 22, 2020 at 8:05 PM Guenter Roeck <[email protected]> wrote:
>
> ...
>
> > > > devfn = PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 2);
>
> I'm wondering if
>
> pci_dev_is_present(...);
>
> returns false here.

Well that might also be the case since lspci shows this:

00:1f.0 ISA bridge: Intel Corporation Z390 Chipset LPC/eSPI Controller (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)

PMC is 1f.2 and not present here. However, it may be that the PMC is
still there it just does not "enumerate" because its devid/vendorid are
set to 0xffff. Similar hiding was done for the P2SB bridge.

2020-02-24 11:29:06

by Mika Westerberg

[permalink] [raw]
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

On Mon, Feb 24, 2020 at 12:51:25PM +0200, Mika Westerberg wrote:
> > I'm wondering if
> >
> > pci_dev_is_present(...);
> >
> > returns false here.
>
> Well that might also be the case since lspci shows this:
>
> 00:1f.0 ISA bridge: Intel Corporation Z390 Chipset LPC/eSPI Controller (rev 10)
> 00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
> 00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
>
> PMC is 1f.2 and not present here. However, it may be that the PMC is
> still there it just does not "enumerate" because its devid/vendorid are
> set to 0xffff. Similar hiding was done for the P2SB bridge.

Actually I think this is the case here.

I don't know the iTCO_wdt well enough to say if it could live without
the ICH_RES_IO_SMI. It looks like this register is used to disable SMI
generation but not sure how well it works if it is left to BIOS to
configure. I suppose these systems should use WDAT instead.

Martin, can you try the below patch?

diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c
index ba87305f4332..c16e5ad08641 100644
--- a/drivers/i2c/busses/i2c-i801.c
+++ b/drivers/i2c/busses/i2c-i801.c
@@ -1593,7 +1593,7 @@ i801_add_tco_cnl(struct i801_priv *priv, struct pci_dev *pci_dev,
static void i801_add_tco(struct i801_priv *priv)
{
u32 base_addr, tco_base, tco_ctl, ctrl_val;
- struct pci_dev *pci_dev = priv->pci_dev;
+ struct pci_dev *pmc_dev, *pci_dev = priv->pci_dev;
struct resource tco_res[3], *res;
unsigned int devfn;

@@ -1620,7 +1620,12 @@ static void i801_add_tco(struct i801_priv *priv)
* Power Management registers.
*/
devfn = PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 2);
- pci_bus_read_config_dword(pci_dev->bus, devfn, ACPIBASE, &base_addr);
+ pmc_dev = pci_get_slot(pci_dev->bus, devfn);
+ if (!pmc_dev) {
+ dev_info(&pci_dev->dev, "PMC device disabled, not enabling iTCO\n");
+ return;
+ }
+ pci_read_config_dword(pmc_dev, ACPIBASE, &base_addr);

res = &tco_res[ICH_RES_IO_SMI];
res->start = (base_addr & ~1) + ACPIBASE_SMI_OFF;
@@ -1630,15 +1635,17 @@ static void i801_add_tco(struct i801_priv *priv)
/*
* Enable the ACPI I/O space.
*/
- pci_bus_read_config_dword(pci_dev->bus, devfn, ACPICTRL, &ctrl_val);
+ pci_read_config_dword(pmc_dev, ACPICTRL, &ctrl_val);
ctrl_val |= ACPICTRL_EN;
- pci_bus_write_config_dword(pci_dev->bus, devfn, ACPICTRL, ctrl_val);
+ pci_write_config_dword(pmc_dev, ACPICTRL, ctrl_val);

if (priv->features & FEATURE_TCO_CNL)
priv->tco_pdev = i801_add_tco_cnl(priv, pci_dev, tco_res);
else
priv->tco_pdev = i801_add_tco_spt(priv, pci_dev, tco_res);

+ pci_dev_put(pmc_dev);
+
if (IS_ERR(priv->tco_pdev))
dev_warn(&pci_dev->dev, "failed to create iTCO device\n");
}

2020-02-24 17:32:06

by Martin Volf

[permalink] [raw]
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

On Mon, Feb 24, 2020 at 12:27 PM Mika Westerberg
<[email protected]> wrote:
>
> On Mon, Feb 24, 2020 at 12:51:25PM +0200, Mika Westerberg wrote:
> > > I'm wondering if
> > >
> > > pci_dev_is_present(...);
> > >
> > > returns false here.
> >
> > Well that might also be the case since lspci shows this:
> >
> > 00:1f.0 ISA bridge: Intel Corporation Z390 Chipset LPC/eSPI Controller (rev 10)
> > 00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
> > 00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
> >
> > PMC is 1f.2 and not present here. However, it may be that the PMC is
> > still there it just does not "enumerate" because its devid/vendorid are
> > set to 0xffff. Similar hiding was done for the P2SB bridge.
>
> Actually I think this is the case here.
>
> I don't know the iTCO_wdt well enough to say if it could live without
> the ICH_RES_IO_SMI. It looks like this register is used to disable SMI
> generation but not sure how well it works if it is left to BIOS to
> configure. I suppose these systems should use WDAT instead.
>
> Martin, can you try the below patch?
>
> diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c
> index ba87305f4332..c16e5ad08641 100644
> --- a/drivers/i2c/busses/i2c-i801.c
> +++ b/drivers/i2c/busses/i2c-i801.c
> @@ -1593,7 +1593,7 @@ i801_add_tco_cnl(struct i801_priv *priv, struct pci_dev *pci_dev,
> static void i801_add_tco(struct i801_priv *priv)
> {
> u32 base_addr, tco_base, tco_ctl, ctrl_val;
> - struct pci_dev *pci_dev = priv->pci_dev;
> + struct pci_dev *pmc_dev, *pci_dev = priv->pci_dev;
> struct resource tco_res[3], *res;
> unsigned int devfn;
>
> @@ -1620,7 +1620,12 @@ static void i801_add_tco(struct i801_priv *priv)
> * Power Management registers.
> */
> devfn = PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 2);
> - pci_bus_read_config_dword(pci_dev->bus, devfn, ACPIBASE, &base_addr);
> + pmc_dev = pci_get_slot(pci_dev->bus, devfn);
> + if (!pmc_dev) {
> + dev_info(&pci_dev->dev, "PMC device disabled, not enabling iTCO\n");
> + return;
> + }
> + pci_read_config_dword(pmc_dev, ACPIBASE, &base_addr);
>
> res = &tco_res[ICH_RES_IO_SMI];
> res->start = (base_addr & ~1) + ACPIBASE_SMI_OFF;
> @@ -1630,15 +1635,17 @@ static void i801_add_tco(struct i801_priv *priv)
> /*
> * Enable the ACPI I/O space.
> */
> - pci_bus_read_config_dword(pci_dev->bus, devfn, ACPICTRL, &ctrl_val);
> + pci_read_config_dword(pmc_dev, ACPICTRL, &ctrl_val);
> ctrl_val |= ACPICTRL_EN;
> - pci_bus_write_config_dword(pci_dev->bus, devfn, ACPICTRL, ctrl_val);
> + pci_write_config_dword(pmc_dev, ACPICTRL, ctrl_val);
>
> if (priv->features & FEATURE_TCO_CNL)
> priv->tco_pdev = i801_add_tco_cnl(priv, pci_dev, tco_res);
> else
> priv->tco_pdev = i801_add_tco_spt(priv, pci_dev, tco_res);
>
> + pci_dev_put(pmc_dev);
> +
> if (IS_ERR(priv->tco_pdev))
> dev_warn(&pci_dev->dev, "failed to create iTCO device\n");
> }

Hello,

with the patch applied, the sensors are working, dmesg says:
...
[ 2.804423] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[ 2.804478] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt
[ 2.804491] i801_smbus 0000:00:1f.4: PMC device disabled, not enabling iTCO
...
[ 2.826373] nct6775: Enabling hardware monitor logical device mappings.
[ 2.826447] nct6775: Found NCT6798D or compatible chip at 0x2e:0x290
...

and there is no "002e-0031" line in /proc/ioports.

Martin

2020-02-24 18:28:40

by Guenter Roeck

[permalink] [raw]
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

On Mon, Feb 24, 2020 at 01:27:40PM +0200, Mika Westerberg wrote:
> On Mon, Feb 24, 2020 at 12:51:25PM +0200, Mika Westerberg wrote:
> > > I'm wondering if
> > >
> > > pci_dev_is_present(...);
> > >
> > > returns false here.
> >
> > Well that might also be the case since lspci shows this:
> >
> > 00:1f.0 ISA bridge: Intel Corporation Z390 Chipset LPC/eSPI Controller (rev 10)
> > 00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
> > 00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
> >
> > PMC is 1f.2 and not present here. However, it may be that the PMC is
> > still there it just does not "enumerate" because its devid/vendorid are
> > set to 0xffff. Similar hiding was done for the P2SB bridge.
>
> Actually I think this is the case here.
>
> I don't know the iTCO_wdt well enough to say if it could live without
> the ICH_RES_IO_SMI. It looks like this register is used to disable SMI
> generation but not sure how well it works if it is left to BIOS to
> configure. I suppose these systems should use WDAT instead.
>

In practice the region is only used if
if (turn_SMI_watchdog_clear_off >= p->iTCO_version) {
where turn_SMI_watchdog_clear_off is 1 by default. It is also
passed to vendor specific code, but that is only relevant for
really old systems. In short, it isn't really needed for any
recent-generation systems, and could be made optional with
a few lines of code in iTCO_wdt.c.

Guenter

2020-02-25 12:15:52

by Mika Westerberg

[permalink] [raw]
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

On Mon, Feb 24, 2020 at 06:30:21PM +0100, Martin Volf wrote:
> On Mon, Feb 24, 2020 at 12:27 PM Mika Westerberg
> <[email protected]> wrote:
> >
> > On Mon, Feb 24, 2020 at 12:51:25PM +0200, Mika Westerberg wrote:
> > > > I'm wondering if
> > > >
> > > > pci_dev_is_present(...);
> > > >
> > > > returns false here.
> > >
> > > Well that might also be the case since lspci shows this:
> > >
> > > 00:1f.0 ISA bridge: Intel Corporation Z390 Chipset LPC/eSPI Controller (rev 10)
> > > 00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
> > > 00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
> > >
> > > PMC is 1f.2 and not present here. However, it may be that the PMC is
> > > still there it just does not "enumerate" because its devid/vendorid are
> > > set to 0xffff. Similar hiding was done for the P2SB bridge.
> >
> > Actually I think this is the case here.
> >
> > I don't know the iTCO_wdt well enough to say if it could live without
> > the ICH_RES_IO_SMI. It looks like this register is used to disable SMI
> > generation but not sure how well it works if it is left to BIOS to
> > configure. I suppose these systems should use WDAT instead.
> >
> > Martin, can you try the below patch?
> >
> > diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c
> > index ba87305f4332..c16e5ad08641 100644
> > --- a/drivers/i2c/busses/i2c-i801.c
> > +++ b/drivers/i2c/busses/i2c-i801.c
> > @@ -1593,7 +1593,7 @@ i801_add_tco_cnl(struct i801_priv *priv, struct pci_dev *pci_dev,
> > static void i801_add_tco(struct i801_priv *priv)
> > {
> > u32 base_addr, tco_base, tco_ctl, ctrl_val;
> > - struct pci_dev *pci_dev = priv->pci_dev;
> > + struct pci_dev *pmc_dev, *pci_dev = priv->pci_dev;
> > struct resource tco_res[3], *res;
> > unsigned int devfn;
> >
> > @@ -1620,7 +1620,12 @@ static void i801_add_tco(struct i801_priv *priv)
> > * Power Management registers.
> > */
> > devfn = PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 2);
> > - pci_bus_read_config_dword(pci_dev->bus, devfn, ACPIBASE, &base_addr);
> > + pmc_dev = pci_get_slot(pci_dev->bus, devfn);
> > + if (!pmc_dev) {
> > + dev_info(&pci_dev->dev, "PMC device disabled, not enabling iTCO\n");
> > + return;
> > + }
> > + pci_read_config_dword(pmc_dev, ACPIBASE, &base_addr);
> >
> > res = &tco_res[ICH_RES_IO_SMI];
> > res->start = (base_addr & ~1) + ACPIBASE_SMI_OFF;
> > @@ -1630,15 +1635,17 @@ static void i801_add_tco(struct i801_priv *priv)
> > /*
> > * Enable the ACPI I/O space.
> > */
> > - pci_bus_read_config_dword(pci_dev->bus, devfn, ACPICTRL, &ctrl_val);
> > + pci_read_config_dword(pmc_dev, ACPICTRL, &ctrl_val);
> > ctrl_val |= ACPICTRL_EN;
> > - pci_bus_write_config_dword(pci_dev->bus, devfn, ACPICTRL, ctrl_val);
> > + pci_write_config_dword(pmc_dev, ACPICTRL, ctrl_val);
> >
> > if (priv->features & FEATURE_TCO_CNL)
> > priv->tco_pdev = i801_add_tco_cnl(priv, pci_dev, tco_res);
> > else
> > priv->tco_pdev = i801_add_tco_spt(priv, pci_dev, tco_res);
> >
> > + pci_dev_put(pmc_dev);
> > +
> > if (IS_ERR(priv->tco_pdev))
> > dev_warn(&pci_dev->dev, "failed to create iTCO device\n");
> > }
>
> Hello,
>
> with the patch applied, the sensors are working, dmesg says:
> ...
> [ 2.804423] i801_smbus 0000:00:1f.4: SPD Write Disable is set
> [ 2.804478] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt
> [ 2.804491] i801_smbus 0000:00:1f.4: PMC device disabled, not enabling iTCO
> ...
> [ 2.826373] nct6775: Enabling hardware monitor logical device mappings.
> [ 2.826447] nct6775: Found NCT6798D or compatible chip at 0x2e:0x290
> ...
>
> and there is no "002e-0031" line in /proc/ioports.

Great, thanks for testing. I'll make an updated patch as suggested by
Guenter that makes the SMI resource optional and send it to you guys.

2020-02-25 12:16:15

by Mika Westerberg

[permalink] [raw]
Subject: Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

On Mon, Feb 24, 2020 at 10:27:30AM -0800, Guenter Roeck wrote:
> On Mon, Feb 24, 2020 at 01:27:40PM +0200, Mika Westerberg wrote:
> > On Mon, Feb 24, 2020 at 12:51:25PM +0200, Mika Westerberg wrote:
> > > > I'm wondering if
> > > >
> > > > pci_dev_is_present(...);
> > > >
> > > > returns false here.
> > >
> > > Well that might also be the case since lspci shows this:
> > >
> > > 00:1f.0 ISA bridge: Intel Corporation Z390 Chipset LPC/eSPI Controller (rev 10)
> > > 00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
> > > 00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
> > >
> > > PMC is 1f.2 and not present here. However, it may be that the PMC is
> > > still there it just does not "enumerate" because its devid/vendorid are
> > > set to 0xffff. Similar hiding was done for the P2SB bridge.
> >
> > Actually I think this is the case here.
> >
> > I don't know the iTCO_wdt well enough to say if it could live without
> > the ICH_RES_IO_SMI. It looks like this register is used to disable SMI
> > generation but not sure how well it works if it is left to BIOS to
> > configure. I suppose these systems should use WDAT instead.
> >
>
> In practice the region is only used if
> if (turn_SMI_watchdog_clear_off >= p->iTCO_version) {
> where turn_SMI_watchdog_clear_off is 1 by default. It is also
> passed to vendor specific code, but that is only relevant for
> really old systems. In short, it isn't really needed for any
> recent-generation systems, and could be made optional with
> a few lines of code in iTCO_wdt.c.

Indeed that seems to be the case. Let me come up with a series that
makes the SMI optional and fixes the problematic code in the i801 driver
by not passing the SMI resource if the PMC device is not present.