2006-01-14 19:23:55

by Andrey Borzenkov

Subject: 2.6.15: lm90 0-004c: Register 0x13 read failed (-1)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Vanilla 2.6.15 on Toshiba Portege 4000. I get constant messages in dmesg:

i2c_adapter i2c-0: Error: command never completed
lm90 0-004c: Register 0x1 read failed (-1)
i2c_adapter i2c-0: Error: command never completed
lm90 0-004c: Register 0x14 read failed (-1)
i2c_adapter i2c-0: Error: command never completed
lm90 0-004c: Register 0x8 read failed (-1)
i2c_adapter i2c-0: Error: command never completed
lm90 0-004c: Register 0x0 read failed (-1)

for quite a number of registers. Apparently I can still read the sensors just
fine, but I am uneasy seeing those.

{pts/1}% lspci
00:00.0 Host bridge: ALi Corporation M1644/M1644T Northbridge+Trident (rev 01)
00:01.0 PCI bridge: ALi Corporation PCI to AGP Controller
00:02.0 USB Controller: ALi Corporation USB 1.1 Controller (rev 03)
00:04.0 IDE interface: ALi Corporation M5229 IDE (rev c3)
00:06.0 Multimedia audio controller: ALi Corporation M5451 PCI AC-Link
Controller Audio Device (rev 01)
00:07.0 ISA bridge: ALi Corporation M1533 PCI to ISA Bridge [Aladdin IV]
00:08.0 Bridge: ALi Corporation M7101 Power Management Controller [PMU]
00:0a.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100]
(rev 08)
00:10.0 CardBus bridge: Texas Instruments PCI1410 PC card Cardbus Controller
(rev 01)
00:11.0 CardBus bridge: Toshiba America Info Systems ToPIC100 PCI to Cardbus
Bridge with ZV Support (rev 32)
00:11.1 CardBus bridge: Toshiba America Info Systems ToPIC100 PCI to Cardbus
Bridge with ZV Support (rev 32)
00:12.0 System peripheral: Toshiba America Info Systems SD TypA Controller
(rev 03)
01:00.0 VGA compatible controller: Trident Microsystems CyberBlade XPAi1 (rev
82)
{pts/1}% sensors
eeprom-i2c-0-50
Adapter: SMBus ALI1535 adapter at ef00
Memory type: SDR SDRAM DIMM
Memory size (MB): 256

adm1032-i2c-0-4c
Adapter: SMBus ALI1535 adapter at ef00
M/B Temp: +43°C (low = -65°C, high = +127°C)
CPU Temp: +47.6°C (low = +43.0°C, high = +51.0°C) ALARM
M/B Crit: +127°C (hyst = +122°C)
CPU Crit: +100°C (hyst = +95°C)


- -andrey
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQFDyU+3R6LMutpd94wRAhIpAJ9jAaVmEx6v3FF5f7pDvmD/Xu7GnQCeO/5O
RSvVH1lgezCRTdrAQdLD0js=
=i2dt
-----END PGP SIGNATURE-----


2006-01-14 21:20:09

by Jean Delvare

Subject: Re: [lm-sensors] 2.6.15: lm90 0-004c: Register 0x13 read failed (-1)

Hi Andrey,

> Vanilla 2.6.15 on Toshiba Portege 4000. I get constant messages in dmesg:
>
> i2c_adapter i2c-0: Error: command never completed
> lm90 0-004c: Register 0x1 read failed (-1)
> i2c_adapter i2c-0: Error: command never completed
> lm90 0-004c: Register 0x14 read failed (-1)
> i2c_adapter i2c-0: Error: command never completed
> lm90 0-004c: Register 0x8 read failed (-1)
> i2c_adapter i2c-0: Error: command never completed
> lm90 0-004c: Register 0x0 read failed (-1)
>
> for quite a number of registers. Apparently I can read sensors just fine still
> I am uneasy seeing those.

Before 2.6.15, the lm90 driver did not handle read errors in any way,
so they were probably already there, you simply were not aware of it.
However, I guess that you already had the "command never completed"
errors? These come from the i2c-ali1535 bus driver.

It would be possible to add a retry-on-failure mechanism in the lm90
driver. However, the real problem is more likely in the i2c-ali1535
driver so fixing this one driver would be preferable.

> eeprom-i2c-0-50
> Adapter: SMBus ALI1535 adapter at ef00
> Memory type: SDR SDRAM DIMM
> Memory size (MB): 256
>
> adm1032-i2c-0-4c
> Adapter: SMBus ALI1535 adapter at ef00
> M/B Temp: +43°C (low = -65°C, high = +127°C)
> CPU Temp: +47.6°C (low = +43.0°C, high = +51.0°C) ALARM
> M/B Crit: +127°C (hyst = +122°C)
> CPU Crit: +100°C (hyst = +95°C)

Do you also have "command never completed" errors without an associated
error from the lm90 driver? This would suggest that the eeprom driver
too is triggering errors, which in turn would confirm that we need to
fix the i2c-ali1535 driver rather than adding a workaround to the lm90
driver.

It looks like the i2c-ali1535 driver as it exists in the lm_sensors CVS
repository (for Linux 2.4 kernels) did receive a major change in March
2005. These changes were supposed to "fix stability problems" (by
adding delay loops pretty much everywhere). They were never ported to
the Linux 2.6 version of the driver. Maybe we should try doing so now.

This is a 400-line patch, and porting it won't be trivial: I am not
familiar with this driver myself and I don't have a chip to test my
changes on. So if someone else wants to take a shot at it, go ahead. If
not, I'll do it.

Andrey, will you be able to test an i2c-ali1535 patch if we come up with
one?

--
Jean Delvare

2006-01-14 21:45:36

by Andrey Borzenkov

Subject: Re: [lm-sensors] 2.6.15: lm90 0-004c: Register 0x13 read failed (-1)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sunday 15 January 2006 00:20, Jean Delvare wrote:
> Hi Andrey,
>
> > Vanilla 2.6.15 on Toshiba Portege 4000. I get constant messages in dmesg:
> >
> > i2c_adapter i2c-0: Error: command never completed
> > lm90 0-004c: Register 0x1 read failed (-1)
> > i2c_adapter i2c-0: Error: command never completed
> > lm90 0-004c: Register 0x14 read failed (-1)
> > i2c_adapter i2c-0: Error: command never completed
> > lm90 0-004c: Register 0x8 read failed (-1)
> > i2c_adapter i2c-0: Error: command never completed
> > lm90 0-004c: Register 0x0 read failed (-1)
> >
> > for quite a number of registers. Apparently I can read sensors just fine
> > still I am uneasy seeing those.
>
> Before 2.6.15, the lm90 driver did not handle read errors in any way,
> so they were probably already there, you simply were not aware of it.
> However, I guess that you already had the "command never completed"
> errors? These come from the i2c-ali1535 bus driver.
>

Before 2.6.15 I ran the Mandriva kernel 2.6.12-12mdk. I do not remember seeing
them, but maybe I just never actually looked in dmesg :)

> It would be possible to add a retry-on-failure mechanism in the lm90
> driver. However, the real problem is more likely in the i2c-ali1535
> driver so fixing this one driver would be preferable.
>
> > eeprom-i2c-0-50
> > Adapter: SMBus ALI1535 adapter at ef00
> > Memory type: SDR SDRAM DIMM
> > Memory size (MB): 256
> >
> > adm1032-i2c-0-4c
> > Adapter: SMBus ALI1535 adapter at ef00
> > M/B Temp: +43°C (low = -65°C, high = +127°C)
> > CPU Temp: +47.6°C (low = +43.0°C, high = +51.0°C) ALARM
> > M/B Crit: +127°C (hyst = +122°C)
> > CPU Crit: +100°C (hyst = +95°C)
>
> Do you also have "command never completed" errors without an associated
> error from the lm90 driver?

yes, on boot.

> This would suggest that the eeprom driver
> too is triggering errors, which in turn would confirm that we need to
> fix the i2c-ali1535 driver rather than adding a workaround to the lm90
> driver.
>
> It looks like the i2c-ali1535 driver as it exists in the lm_sensors CVS
> repository (for Linux 2.4 kernels) did receive a major change in March
> 2005. These changes were supposed to "fix stability problems" (by
> adding delay loops pretty much everywhere). They were never ported to
> the Linux 2.6 version of the driver. Maybe we should try doing so now.
>
> This is a 400 lines patch, porting it won't be trivial, I am not
> familiar with this driver myself and I don't have a chip to test my
> changes on, so if someone else wants to take his/her chance, go. If
> not, I'll do it.
>
> Andrey, will you be able to test a i2c-ali1535 patch if we come up with
> one?

Yes. Send me a patch (or give a link) and I'll see what I can do to port it. I
will ask if I have questions :)

BTW that reminds me - I actually have two 256M modules. sensors shows just one.
Both are from Toshiba, so it is unlikely that one lacks SPD. Any idea
why eeprom does not find the second one? Oh, and it was the same when I had two
128M modules, so it is unlikely to be caused by the modules.

Thank you for the reply.

- -andrey
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQFDyXD6R6LMutpd94wRAsIqAJwP5CdEisSKsA/iGqv2ouZ58xLe8ACgvRIY
WfuwZrsE996ZEtSoYvElgnQ=
=SSCR
-----END PGP SIGNATURE-----

2006-01-15 19:12:38

by Andrey Borzenkov

Subject: Re: [lm-sensors] 2.6.15: lm90 0-004c: Register 0x13 read failed (-1)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sunday 15 January 2006 00:45, Andrey Borzenkov wrote:
> On Sunday 15 January 2006 00:20, Jean Delvare wrote:
> > It looks like the i2c-ali1535 driver as it exists in the lm_sensors CVS
> > repository (for Linux 2.4 kernels) did receive a major change in March
> > 2005. These changes were supposed to "fix stability problems" (by
> > adding delay loops pretty much everywhere). They were never ported to
> > the Linux 2.6 version of the driver. Maybe we should try doing so now.
> >
> > This is a 400 lines patch, porting it won't be trivial, I am not
> > familiar with this driver myself and I don't have a chip to test my
> > changes on, so if someone else wants to take his/her chance, go. If
> > not, I'll do it.
> >
> > Andrey, will you be able to test a i2c-ali1535 patch if we come up with
> > one?
>
> Yes. Send me a patch (or give a link) and I'll try what I can do to port
> it. I ask if I have a question :)
>

Do you mean revision 1.21 (date: 2005/03/27 02:22:10; author: mds)? I
checked, and this one seems to be in the current 2.6.15.1 kernel. I did not check
whether anything was omitted compared with CVS, but the current kernel does
contain and use ali1535_transaction(), which was added by that patch.

- -andrey
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQFDyp6fR6LMutpd94wRAnqNAKCn3kW51rt3YrPatfVibeU1WPClvQCfTWAG
u3RJ0TZnP3izDyPS1HwbVg0=
=35MJ
-----END PGP SIGNATURE-----

2006-01-15 19:48:21

by Andrey Borzenkov

Subject: Re: [lm-sensors] 2.6.15: lm90 0-004c: Register 0x13 read failed (-1)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sunday 15 January 2006 22:12, Andrey Borzenkov wrote:
> On Sunday 15 January 2006 00:45, Andrey Borzenkov wrote:
> > On Sunday 15 January 2006 00:20, Jean Delvare wrote:
> > > It looks like the i2c-ali1535 driver as it exists in the lm_sensors CVS
> > > repository (for Linux 2.4 kernels) did receive a major change in March
> > > 2005. These changes were supposed to "fix stability problems" (by
> > > adding delay loops pretty much everywhere). They were never ported to
> > > the Linux 2.6 version of the driver. Maybe we should try doing so now.
> > >
> > > This is a 400 lines patch, porting it won't be trivial, I am not
> > > familiar with this driver myself and I don't have a chip to test my
> > > changes on, so if someone else wants to take his/her chance, go. If
> > > not, I'll do it.
> > >
> > > Andrey, will you be able to test a i2c-ali1535 patch if we come up with
> > > one?
> >
> > Yes. Send me a patch (or give a link) and I'll try what I can do to port
> > it. I ask if I have a question :)
>
> Do you mean revision 1.21 with date: 2005/03/27 02:22:10; author: mds? I
> checked and this one seems to be in current 2.6.15.1 kernel. I did not
> check if there were any omissions comparing with CVS but current kernel
> does contain and use ali1535_transaction() added by mentioned patch.
>


I compiled i2c-ali1535 with debugging. I have two types of errors. The first block
is:

Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=14,
TYP=10, CMD=03, ADD=99, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
TYP=00, CMD=03, ADD=9a, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Error: no response or bus
collision ADD=9a
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Error: command never
completed
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=44,
TYP=00, CMD=03, ADD=9a, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
TYP=00, CMD=03, ADD=a0, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=14,
TYP=00, CMD=03, ADD=a0, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
TYP=00, CMD=03, ADD=a0, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=14,
TYP=00, CMD=03, ADD=a0, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
TYP=00, CMD=03, ADD=a2, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Error: no response or bus
collision ADD=a2
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Error: command never
completed
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=44,
TYP=00, CMD=03, ADD=a2, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
TYP=00, CMD=03, ADD=a4, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Error: no response or bus
collision ADD=a4
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Error: command never
completed
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=44,
TYP=00, CMD=03, ADD=a4, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
TYP=00, CMD=03, ADD=a6, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Error: no response or bus
collision ADD=a6
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Error: command never
completed
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=44,
TYP=00, CMD=03, ADD=a6, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
TYP=00, CMD=03, ADD=a8, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Error: no response or bus
collision ADD=a8
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Error: command never
completed
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=44,
TYP=00, CMD=03, ADD=a8, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
TYP=00, CMD=03, ADD=aa, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Error: no response or bus
collision ADD=aa
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Error: command never
completed
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=44,
TYP=00, CMD=03, ADD=aa, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
TYP=00, CMD=03, ADD=ac, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Error: no response or bus
collision ADD=ac
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Error: command never
completed
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=44,
TYP=00, CMD=03, ADD=ac, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
TYP=00, CMD=03, ADD=ae, DAT0=00, DAT1=10
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Error: no response or bus
collision ADD=ae
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Error: command never
completed
Jan 15 22:17:53 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=44,
TYP=00, CMD=03, ADD=ae, DAT0=00, DAT1=10
Jan 15 22:17:57 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
TYP=10, CMD=00, ADD=98, DAT0=00, DAT1=10
Jan 15 22:17:57 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=14,
TYP=10, CMD=00, ADD=98, DAT0=00, DAT1=10

This appears to be simply probing for non-existent i2c devices (correct me if I am
wrong), presumably by the eeprom driver.

The second block is errors from lm90 for different registers:

Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
TYP=10, CMD=01, ADD=99, DAT0=a0, DAT1=10
Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=14,
TYP=10, CMD=01, ADD=99, DAT0=29, DAT1=10
Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
TYP=10, CMD=08, ADD=98, DAT0=29, DAT1=10
Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Error: command never
completed
Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=04,
TYP=10, CMD=08, ADD=98, DAT0=29, DAT1=10
Jan 15 22:24:02 cooker kernel: lm90 0-004c: Register 0x8 read failed (-1)
Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
TYP=10, CMD=07, ADD=98, DAT0=29, DAT1=10
Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=14,
TYP=10, CMD=07, ADD=98, DAT0=29, DAT1=10

Here I do not see SMBus errors - it really appears that the i2c device did not
respond. OTOH it is interesting that there is no timeout. Apparently the command
completed without setting the DONE bit. As I have zero knowledge about the hardware I
cannot interpret it. Next the driver resets the SMBus and it works for some time
again. Judging by comments in the source, it apparently signifies a hung ali1535,
not an external i2c device (it is using KILL, and "this doesn't seem to clear
the controller if an external device is hung").
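
The STS values in the logs above can be read as bit flags. A minimal sketch, assuming the bit assignments below match the status-register definitions in the 2.6 i2c-ali1535 driver (only ALI1535_STS_DONE is named in this thread; the other names and values are assumptions to be checked against the source), with a hypothetical helper ali1535_sts_str():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed status bits; verify against drivers/i2c/busses/i2c-ali1535.c. */
#define ALI1535_STS_IDLE   0x04
#define ALI1535_STS_BUSY   0x08
#define ALI1535_STS_DONE   0x10   /* the only name confirmed by the patch */
#define ALI1535_STS_DEV    0x20
#define ALI1535_STS_BUSERR 0x40
#define ALI1535_STS_FAIL   0x80

/* Hypothetical helper: render the flags set in sts as "A|B|C" into buf. */
const char *ali1535_sts_str(unsigned char sts, char *buf, size_t len)
{
    static const struct {
        unsigned char bit;
        const char *name;
    } flags[] = {
        { ALI1535_STS_IDLE,   "IDLE"   },
        { ALI1535_STS_BUSY,   "BUSY"   },
        { ALI1535_STS_DONE,   "DONE"   },
        { ALI1535_STS_DEV,    "DEV"    },
        { ALI1535_STS_BUSERR, "BUSERR" },
        { ALI1535_STS_FAIL,   "FAIL"   },
    };
    size_t i;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        if (!(sts & flags[i].bit))
            continue;
        /* Separate flag names with '|' once something has been written. */
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, flags[i].name, len - strlen(buf) - 1);
    }
    return buf;
}
```

Under these assumptions STS=14 reads as IDLE|DONE, STS=44 as IDLE|BUSERR, and the STS=04 logged after the failed lm90 read as IDLE alone, i.e. idle with neither DONE nor any error bit set.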

I am ready to test any patch.

- -andrey
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQFDyqb3R6LMutpd94wRAkvsAJ4/nD91TVzezwLIIcRzasBMjVbvewCeKxqa
I563XEGbgfGG239rAQZzJ/A=
=E7Yd
-----END PGP SIGNATURE-----

2006-01-15 20:34:27

by Rudolf Marek

Subject: Re: [lm-sensors] 2.6.15: lm90 0-004c: Register 0x13 read failed (-1)

Hello all,

>
> this appears simply a probing for non-existent i2c ports (correct me if I am
> wrong) presumably by eeprom driver.

Yes, I think you are right. (ADD/2 is the address of the chip that it tries to access.)
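
To illustrate the ADD/2 rule, here is a small sketch. It assumes the usual SMBus convention that the low bit of the address byte is the read/write flag (1 = read); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoded form of the ADD byte shown in the debug log. */
struct smbus_target {
    unsigned char addr;  /* 7-bit chip address */
    int is_read;         /* 1 = read, 0 = write (assumed convention) */
};

/* Split an ADD byte into the 7-bit address and the read/write flag. */
void decode_add(unsigned char add, struct smbus_target *out)
{
    assert(out != NULL);
    out->addr = add >> 1;     /* "ADD/2" */
    out->is_read = add & 1;   /* low bit selects the direction */
}
```

With this, ADD=99 decodes to chip 0x4c (the lm90 at 0-004c) as a read, ADD=a0 to 0x50 (the eeprom at 0-50), and ADD=a2 through ADD=ae to 0x51 through 0x57.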

> Second block are errors from lm90 for different registers:
>
> Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
> TYP=10, CMD=01, ADD=99, DAT0=a0, DAT1=10
> Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=14,
> TYP=10, CMD=01, ADD=99, DAT0=29, DAT1=10
> Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
> TYP=10, CMD=08, ADD=98, DAT0=29, DAT1=10
> Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Error: command never
> completed
> Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=04,
> TYP=10, CMD=08, ADD=98, DAT0=29, DAT1=10
> Jan 15 22:24:02 cooker kernel: lm90 0-004c: Register 0x8 read failed (-1)
> Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
> TYP=10, CMD=07, ADD=98, DAT0=29, DAT1=10
> Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=14,
> TYP=10, CMD=07, ADD=98, DAT0=29, DAT1=10
>
> Here I do not see SMBus errors - it appears really that i2c device did not
> respond. OTOH interesting is that there is no timeout. Apparently command
> completed without setting DONE bit. As I have zero knowledge about hardware I
> cannot interpret it. Next driver resets SMBus and it works for some time
> again. Judging by comments in source, it apparently signifies hung ali1535,
> not external i2c device (it is using KILL, and "this doesn't seem to clear
> the controller if an external device is hung")

Well, it seems the ali 15x3 may have the same hardware bug. It was already mentioned here:
http://www2.lm-sensors.nu/~lm78/readticket.cgi?ticket=2030

> In the log below you can see that the ALI15X3 chip seems to keep in idle-state
> without reporting "done", but it does not turn in "busy" state. I patched the
> driver to do the reset procedure (with ALI15X3_T_OUT) after the error, but
> afterwards, the chip turns to "busy" state until next reboot.

And it continued:

http://lists.lm-sensors.org/pipermail/lm-sensors/2005-October/013808.html

I asked for a patch, and about a month later I received one that works for them:

> Dear Rudolf,
>
> unfortunately I do not have cvs installed on my machine. I hope it's okay if
> I send you the complete patched module (the only file I changed was
> i2c-ali15x3.c) so you can do the patch yourself. Since I'm not an experienced
> driver developer I do not know what you meant with your last sentence and I
> did not find any remarks on the website.
>
> However, feel free to contact me if you have still any questions.
>
> This version works fine and without any problems over many days in our test
> system.
>
> Regards,
> Claudio Klingler

I'm attaching it. (It is against the lm_sensors CVS, so it is the 2.4 driver.)

Since I don't own a motherboard with this chip (nor the datasheet) and the resulting driver was hard to read, I just left this issue alone.
I hope it can help now.

Regards
Rudolf


Attachments:
i2c-ali15x3.c (16.59 kB)

2006-01-15 20:58:47

by Andrey Borzenkov

Subject: Re: [lm-sensors] 2.6.15: lm90 0-004c: Register 0x13 read failed (-1)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sunday 15 January 2006 23:33, Rudolf Marek wrote:
>
> Well it seems this ali 15x3 has maybe same hardware bug? It was mentioned
> already here: http://www2.lm-sensors.nu/~lm78/readticket.cgi?ticket=2030
>
> > In the log below you can see that the ALI15X3 chip seems to keep in
> > idle-state without reporting "done", but it does not turn in "busy"
> > state. I patched the driver to do the reset procedure (with
> > ALI15X3_T_OUT) after the error, but afterwards, the chip turns to "busy"
> > state until next reboot.
>

This is already done in i2c-ali1535 in the current kernel. So it looks like a HW
issue that can at best be ignored. After the reset the SMBus continues to work. The
only question is whether we should provide an option to silence those errors;
assuming the user knows (s)he has a buggy controller, there is no reason to spam
dmesg with a known issue. Would such a patch be accepted? I would emit the first occurrence
of this error to let users know something is fishy and suppress further ones.
But this has to wait for next week, it is already too late here.

Thank you for the information.

- -andrey
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQFDyrd9R6LMutpd94wRAuNVAKCwq+yTwvFt6jYLS1wL5pIDr68IMwCbBHb+
yXAnHp+jzVFW1ddKVbZVkY8=
=ABky
-----END PGP SIGNATURE-----

2006-01-16 19:40:21

by Andrey Borzenkov

Subject: Re: [lm-sensors] 2.6.15: lm90 0-004c: Register 0x13 read failed (-1)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sunday 15 January 2006 23:33, Rudolf Marek wrote:
> Well it seems this ali 15x3 has maybe same hardware bug? It was mentioned
> already here: http://www2.lm-sensors.nu/~lm78/readticket.cgi?ticket=2030
>
[...]
> Since I dont own the motherboard with this chip (nor the datasheet) and the
> resulting driver was hard to read I just left this issue. I hope it can
> help now.

Actually it did. I realized that the 15x3 driver you sent attempts recovery while
the current 1535 one does not. After some experiments I came up with this patch (it is not
meant for inclusion but only for discussion) that seems to work. I had a hard
time finding the exact place where to retry the command, but now I get

Jan 16 22:20:14 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
TYP=10, CMD=01, ADD=99, DAT0=05, DAT1=10
Jan 16 22:20:14 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=14,
TYP=10, CMD=01, ADD=99, DAT0=2c, DAT1=10
Jan 16 22:20:14 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=04,
TYP=10, CMD=10, ADD=98, DAT0=2c, DAT1=10
Jan 16 22:20:14 cooker kernel: i2c_adapter i2c-0: Error: command never
completed
Jan 16 22:20:14 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=04,
TYP=10, CMD=10, ADD=98, DAT0=2c, DAT1=10
Jan 16 22:20:14 cooker kernel: i2c_adapter i2c-0: Adapter hung, retrying after
reset
Jan 16 22:20:14 cooker kernel: i2c_adapter i2c-0: Transaction (pre): STS=00,
TYP=00, CMD=10, ADD=98, DAT0=2c, DAT1=10
Jan 16 22:20:14 cooker kernel: i2c_adapter i2c-0: Transaction (post): STS=14,
TYP=00, CMD=10, ADD=98, DAT0=2c, DAT1=10

so it appears to recover nicely. Does it look like it returns correct value
after retry?

I intend to squash the errors, leaving only the first occurrence but making it more
verbose. Probably:

Error: command never completed. This is probably a hardware bug.
The command will be retried after the controller is reset.
Further occurrences of this error won't be reported as long as the retry is
successful.

Is the wording OK (I am not a native English speaker)?
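
The retry-and-report-once behaviour described here can be sketched in userspace as follows. This is only an illustration of the control flow, not the driver code: the return codes mirror the -1/-2 convention of the patch, while the function names, the callback type and the stub transfer are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define XFER_OK    0
#define XFER_ERR  -1   /* ordinary failure, not retried */
#define XFER_HUNG -2   /* "command never completed"; controller was reset */

typedef int (*xfer_fn)(void *ctx);

static int hung_reported;  /* set once the first hang has been logged */

/* Run xfer, retrying up to max_retries times when the adapter hung. */
int access_with_retry(xfer_fn xfer, void *ctx, int max_retries)
{
    int result;

    assert(xfer != NULL);
    for (;;) {
        result = xfer(ctx);
        if (result != XFER_HUNG)
            return result ? XFER_ERR : XFER_OK;
        if (max_retries-- <= 0) {
            /* Always report a hang that the retries could not cure. */
            fprintf(stderr, "command never completed, retry failed\n");
            return XFER_ERR;
        }
        if (!hung_reported) {
            hung_reported = 1;
            fprintf(stderr, "command never completed, retrying after "
                            "reset; further occurrences not reported\n");
        }
    }
}

/* Stub transfer: hangs *ctx more times, then succeeds. */
int flaky_xfer(void *ctx)
{
    int *hangs_left = ctx;

    if (*hangs_left > 0) {
        (*hangs_left)--;
        return XFER_HUNG;
    }
    return XFER_OK;
}
```

With one allowed retry, a transfer that hangs once succeeds on the second attempt, while one that hangs twice is reported as a plain failure to the caller.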

regards

- -andrey

- --- linux-2.6.15/drivers/i2c/busses/i2c-ali1535.c 2006-01-03
06:21:10.000000000 +0300
+++ i2c-ali1535.c 2006-01-16 22:22:51.000000000 +0300
@@ -311,8 +311,8 @@ static int ali1535_transaction(struct i2
}

/* check to see if the "command complete" indication is set */
- - if (!(temp & ALI1535_STS_DONE)) {
- - result = -1;
+ if (!result && !(temp & ALI1535_STS_DONE)) {
+ result = -2;
dev_err(&adap->dev, "Error: command never completed\n");
}

@@ -344,6 +344,7 @@ static s32 ali1535_access(struct i2c_ada
int temp;
int timeout;
s32 result = 0;
+ int retry = 1;

down(&i2c_ali1535_sem);
/* make sure SMBus is idle */
@@ -360,6 +361,7 @@ static s32 ali1535_access(struct i2c_ada
/* clear status register (clear-on-write) */
outb_p(0xFF, SMBHSTSTS);

+retry:
switch (size) {
case I2C_SMBUS_PROC_CALL:
dev_err(&adap->dev, "I2C_SMBUS_PROC_CALL not supported!\n");
@@ -424,7 +426,14 @@ static s32 ali1535_access(struct i2c_ada
break;
}

- - if (ali1535_transaction(adap)) {
+ if (((result = ali1535_transaction(adap)) == -2) && retry--) {
+ /* Adapter hung and was reset; retry */
+ dev_dbg(&adap->dev, "Adapter hung, retrying after reset\n");
+ result = 0;
+ goto retry;
+ }
+
+ if (result) {
/* Error in transaction */
result = -1;
goto EXIT;

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD4DBQFDy/aZR6LMutpd94wRAoBlAJ0ZLlhPMIBC5Fmz0Iw4NBoNjM7wfwCUCB0t
+sFjdErqBnZatcpLmiPTKA==
=MW/Q
-----END PGP SIGNATURE-----

2006-01-16 21:18:09

by Rudolf Marek

Subject: Re: [lm-sensors] 2.6.15: lm90 0-004c: Register 0x13 read failed (-1)

Hello all,

> Actually it did. I realized that 15x3 you sent attempted recovery while
> current 1535 not. After some experiments I came up with this patch (it is not
> meant for inclusion but only for discussion) that seems to work. I had hard
> time finding the exact place where to retry command but now I get
>
> so it appears to recover nicely. Does it look like it returns correct value
> after retry?

This is possible. I may know why this is happening. I now have the datasheets
for both the ali 1535 and 15x3. I found that there are special bits that
control when the bus is considered idle.

Those bits are in the PCI config space of the same device that holds the SMBus base address.

For the ali 15x3 the register is located at 0xe2 and the bits are:

Bit   Description
7-5   SMB Clock Select (default 001b).
        000 : 149K
        001 : 74K (recommended)
        010 : 37K
        100 : 223K
        101 : 111K
        110 : 55K
      These three bits are used to select the base clock for the internal
      state machine. All the timings will be based on this clock. The clock
      is derived from OSC14M.
4-3   Idle Delay Setting (default 0h).
        00 : BaseClk*64 (53.76 us ref. 1.19M base clock; default)
        01 : BaseClk*32
        10 : BaseClk*128
        Others : Reserved
      These two bits are used to decide the idle time to qualify the SMBus
      as being in idle state. The time is calculated based on the base clock
      defined in bits [7:5].
2-0   Reserved (default 0h).

For the 1535 the register is at offset 0xF2:

Bit   Description
7-5   The base clock referenced by the SMB host controller (default 001).
        000: 149K
        001: 74K
        010: 31K
        100: 223K
        101: 111K
        110: 55K
4-3   Bus Delay Timer Setting. The base clock is set in the previous field.
      This timer decides when the SMB bus is actually idle.
        00: Base Clock × 4
        01: Base Clock × 2
        10: Base Clock × 8
        11: Reserved
2-0   Reserved.
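
The 1535 layout above can be expressed as a small encode/decode pair. This is a sketch built only from the table as quoted here; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fields of the 1535 register at offset 0xF2, per the table above. */
struct smbclk_fields {
    unsigned clock_sel;  /* bits 7-5: base clock select */
    unsigned delay_sel;  /* bits 4-3: bus delay timer setting */
    unsigned reserved;   /* bits 2-0 */
};

/* Split a register value into its three fields. */
void smbclk_decode(unsigned char reg, struct smbclk_fields *f)
{
    assert(f != NULL);
    f->clock_sel = (reg >> 5) & 0x7;
    f->delay_sel = (reg >> 3) & 0x3;
    f->reserved  = reg & 0x7;
}

/* Build a register value from the two fields, reserved bits zero. */
unsigned char smbclk_encode(unsigned clock_sel, unsigned delay_sel)
{
    assert(clock_sel <= 7 && delay_sel <= 3);
    return (unsigned char)((clock_sel << 5) | (delay_sel << 3));
}
```

Per the table, 0x20 decodes to clock 001 (74K) with delay 00 (Base Clock × 4), and clock 001 combined with delay 10 (Base Clock × 8) encodes to 0x30.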

What is interesting is that both drivers set this to 0x20, overwriting the reserved bits - this is no good.
/* set SMB clock to 74KHz as recommended in data sheet */
pci_write_config_byte(dev, SMBCLK, 0x20);
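
One way to avoid clobbering the other bits would be a read-modify-write that touches only bits 7-5. Below is a userspace sketch of the bit arithmetic alone; in the driver the old value would presumably come from pci_read_config_byte() and the result go back through pci_write_config_byte(). The macro and function names are hypothetical.

```c
#include <assert.h>

#define SMBCLK_CLOCK_MASK 0xE0  /* bits 7-5: base clock select */
#define SMBCLK_74KHZ      0x20  /* 001b in bits 7-5 */

/* Return old with only the clock-select field replaced by clock_bits. */
unsigned char smbclk_set_clock(unsigned char old, unsigned char clock_bits)
{
    /* The new field must not spill into the delay or reserved bits. */
    assert((clock_bits & ~SMBCLK_CLOCK_MASK) == 0);
    return (unsigned char)((old & ~SMBCLK_CLOCK_MASK) | clock_bits);
}
```

For example, applying SMBCLK_74KHZ to an old value of 0x15 gives 0x35: the delay and reserved bits survive, and only the clock field changes.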

Andrey and Claudio,
could you please send back the output of lspci -d 10b9:7101 -x -x -x from before you load the ali driver?

You can both also try to change the delay a bit after the driver loads (or remove the above line that sets it).

for andrey (1535): setpci -d 10b9:7101 f2.b=28
(this should set it to base*8)

For Claudio:
I don't know if you want to dig into this, but if so, please try with a driver that reports when it resets the controller.
setpci -d 10b9:7101 e2.b=28
(this should set it to base*128)

When done, please load your chip driver and let it run, and observe whether it resets more or less often. You may play with the SMBus clock too if you want.
I hope this helps.

> I intend to squash errors, leaving only the first occurrence but making it more
> verbose. Probably:
>
> Error: command never completed. It is probably hardware bug
> Command will be retried after controller is reset
> further occurrences of this error won't be reported as long as retry is
> successful
>
> is the wording OK (I am not a native English speaker)?

I guess it would be best to emit some kind of error after all retries have
failed, but the question is how to do it cleanly.


> regards
>
> -andrey
>
> --- linux-2.6.15/drivers/i2c/busses/i2c-ali1535.c 2006-01-03
> 06:21:10.000000000 +0300
> +++ i2c-ali1535.c 2006-01-16 22:22:51.000000000 +0300
> @@ -311,8 +311,8 @@ static int ali1535_transaction(struct i2
> }
>
> /* check to see if the "command complete" indication is set */
> - if (!(temp & ALI1535_STS_DONE)) {
> - result = -1;
> + if (!result && !(temp & ALI1535_STS_DONE)) {
> + result = -2;
> dev_err(&adap->dev, "Error: command never completed\n");

Perhaps this dev_err can be moved down.

> }
>
> @@ -344,6 +344,7 @@ static s32 ali1535_access(struct i2c_ada
> int temp;
> int timeout;
> s32 result = 0;
> + int retry = 1;
>
> down(&i2c_ali1535_sem);
> /* make sure SMBus is idle */
> @@ -360,6 +361,7 @@ static s32 ali1535_access(struct i2c_ada
> /* clear status register (clear-on-write) */
> outb_p(0xFF, SMBHSTSTS);
>
> +retry:
> switch (size) {
> case I2C_SMBUS_PROC_CALL:
> dev_err(&adap->dev, "I2C_SMBUS_PROC_CALL not supported!\n");
> @@ -424,7 +426,14 @@ static s32 ali1535_access(struct i2c_ada
> break;
> }
>
> - if (ali1535_transaction(adap)) {
> + if (((result = ali1535_transaction(adap)) == -2) && retry--) {
> + /* Adapter hung and was reset; retry */
> + dev_dbg(&adap->dev, "Adapter hung, retrying after reset\n");
> + result = 0;
> + goto retry;
> + }
> +
> + if (result) {

Perhaps test here whether result is -2 and tell the user that the command never completed?

> /* Error in transaction */
> result = -1;
> goto EXIT;
>

That's all from me,

regards
Rudolf

2006-01-21 21:03:19

by Andrey Borzenkov

Subject: Re: [lm-sensors] 2.6.15: lm90 0-004c: Register 0x13 read failed (-1)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tuesday 17 January 2006 00:17, Rudolf Marek wrote:
> Hello all,
>
> > Actually it did. I realized that the 15x3 driver you sent attempts
> > recovery while the current 1535 one does not. After some experiments I
> > came up with this patch (it is not meant for inclusion but only for
> > discussion) that seems to work. I had a hard time finding the exact place
> > to retry the command but now I get
> >
> > so it appears to recover nicely. Does it look like it returns the correct
> > value after a retry?
>
> This is possible. I may know why this is happening. I now have the
> datasheets for both the ALi 1535 and the 15x3.

Any chance I can see it (for 1535)?

> I found that there are special bits
> that control when the bus is considered idle.
>
> Those bits are in the PCI config space of the same device that holds the
> SMBus base address.
>
> For the ALi 15x3 the register is located at 0xe2 and the bits are:
>
> Bit   Description
> 7-5   (001b) SMB Clock Select.
>       [7:5] : "clock"
>       000 : 149K
>       001 : 74K (recommended)
>       010 : 37K
>       100 : 223K
>       101 : 111K
>       110 : 55K
>       These three bits are used to select the base clock for internal
>       state machine. All the timings will be based on this clock. The
>       clock is derived from OSC14M.
> 4-3   (0h) Idle Delay Setting.
>       [4:3] : "idle time"
>       00 : BaseClk*64 53.76 us ref. 1.19M base clock. (default)
>       01 : BaseClk*32
>       10 : BaseClk*128
>       Others : Reserved
>       These two bits are used to decide the idle time to qualify SMBus
>       is in idle state. The time is calculated based on the base clock
>       defined in bits[7:5].
> 2-0   (0h) Reserved.
>
> For the 1535 the register is at offset 0xF2:
>
> Bit   Description
> 7-5   (001) The base clock referenced by the SMB host controller.
>       000: 149K.
>       001: 74K.
>       010: 31K.
>       100: 223K.
>       101: 111K.
>       110: 55K.
> 4-3   Bus Delay Timer Setting. The base clock is set in the previous
>       field. This timer decides when the SMB bus is actually idle.
>       00: Base Clock × 4.
>       01: Base Clock × 2.
>       10: Base Clock × 8.
>       11: Reserved.
> 2-0   Reserved.
>
> What is interesting is that both drivers set this to 0x20, overwriting the
> reserved bits - this is no good.

Fixed in attached patch.

> /* set SMB clock to 74KHz as recommended in data sheet */
> pci_write_config_byte(dev, SMBCLK, 0x20);
>
> Andrey and Claudio,
> could you please send the output of lspci -d 10b9:7101 -x -x -x, taken
> before you load the ali driver?
>

It is exactly 0x20, the same value the driver sets anyway.

> You can both also try changing the delay a bit after the driver loads (or
> remove the line above that sets it).
>
> for andrey (1535): setpci -d 10b9:7101 f2.b=28
> (this should set it to base*8)
>

This did not completely eliminate the problems but made them far less frequent
than before. Combined with the patch below, it results in something like:

i2c_adapter i2c-0: Error: adapter did not idle after transaction
i2c_adapter i2c-0: adapter not idle before command; retrying
i2c_adapter i2c-0: adapter not idle before command; retrying
i2c_adapter i2c-0: Error: command never completed
i2c_adapter i2c-0: Adapter hung, retrying after reset
i2c_adapter i2c-0: Error: adapter did not idle after transaction
i2c_adapter i2c-0: adapter not idle before command; retrying
i2c_adapter i2c-0: adapter not idle before command; retrying
i2c_adapter i2c-0: adapter not idle before command; retrying
i2c_adapter i2c-0: Failed to execute command after 3 retries status: 00
lm90 0-004c: Register 0x1 read failed (-1)
i2c_adapter i2c-0: adapter not idle before command; retrying
i2c_adapter i2c-0: adapter not idle before command; retrying
i2c_adapter i2c-0: Error: command never completed
i2c_adapter i2c-0: Adapter hung, retrying after reset

So sometimes it still cannot be recovered. Unfortunately I cannot do much
at this stage without the data sheet, as everything else is just
guesswork.
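
The pattern visible in the log above (try, reset on a hang, give up after a fixed number of attempts) can be sketched in isolation as follows. This is not the driver code: the ops structure, the return-code names and the fake adapter are all hypothetical stand-ins so the loop can be exercised without hardware. Only the bound of 3 and the use of -2 for "hung" are borrowed from the patch.

```c
#define MAX_RETRIES 3		/* same arbitrary bound as in the patch */
#define XFER_OK      0
#define XFER_FAILED (-1)	/* hard error, not worth retrying */
#define XFER_HUNG   (-2)	/* adapter hung, reset and try again */

/* Hypothetical hooks standing in for the real transaction and reset code. */
struct xfer_ops {
	int  (*transact)(void *ctx);
	void (*reset)(void *ctx);
};

/* Try a transfer up to MAX_RETRIES times, resetting after each hang.
 * Callers only ever see XFER_OK or XFER_FAILED. */
static int xfer_with_retry(const struct xfer_ops *ops, void *ctx)
{
	int attempt;

	for (attempt = 0; attempt < MAX_RETRIES; attempt++) {
		int rc = ops->transact(ctx);

		if (rc != XFER_HUNG)
			return rc == XFER_OK ? XFER_OK : XFER_FAILED;
		ops->reset(ctx);
	}
	return XFER_FAILED;
}

/* Fake adapter for demonstration: hangs a fixed number of times, then works. */
struct fake_adapter {
	int hangs_left;
	int resets;
};

static int fake_transact(void *ctx)
{
	struct fake_adapter *a = ctx;

	if (a->hangs_left > 0) {
		a->hangs_left--;
		return XFER_HUNG;
	}
	return XFER_OK;
}

static void fake_reset(void *ctx)
{
	struct fake_adapter *a = ctx;

	a->resets++;
}
```

A fake adapter that hangs twice succeeds on the third attempt after two resets, while one that hangs five times is given up on after three resets. That second case corresponds to the "Failed to execute command after 3 retries" line in the log.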

> I guess it would be best to emit some kind of error after all retries have
> failed, but the question is how to do it cleanly.
>

Below is the proposed patch. After it has been sufficiently discussed and tested
I intend to replace most of the dev_{err,info,warn} calls in the retry path with
dev_dbg and leave only one message after a retry has failed. Could you comment on
it now that you have the datasheet? Is there a better way to reset the adapter
after an error?

regards

-andrey


Subject: [PATCH] ali1535 error recovery cleanup, PCI config fix

- fix interpretation of the BUSY flag. The old code apparently assumed it was
asserted while a transaction was active. My tests show it is asserted
in response to the transaction start command if the transaction could not be
initiated successfully

- introduce retry logic. If a transaction did not complete, retry. The same goes
for waiting for the idle condition. The number of retries is rather arbitrary.

- preserve reserved bits in PCI config byte f2 (clock timing), suggested
by Rudolf Marek

- set bus delay multiplier to x8, suggested by Rudolf Marek

- restructure the overall code; after that it actually became very similar
to the patch for ali15x3 from Claudio Klinger, except I try to ensure the
adapter is in a sane state before a command is started

Signed-off-by: Andrey Borzenkov <[email protected]>

---

drivers/i2c/busses/i2c-ali1535.c | 203 ++++++++++++++++++++------------------
1 files changed, 109 insertions(+), 94 deletions(-)

10f6fcf00b69ea2861eb2321e2f4d204a9fcfa2c
diff --git a/drivers/i2c/busses/i2c-ali1535.c b/drivers/i2c/busses/i2c-ali1535.c
index 3eb4789..ae48573 100644
--- a/drivers/i2c/busses/i2c-ali1535.c
+++ b/drivers/i2c/busses/i2c-ali1535.c
@@ -50,7 +50,6 @@
This driver does not use interrupts.
*/

-
/* Note: we assume there can only be one ALI1535, with one SMBus interface */

#include <linux/module.h>
@@ -86,6 +85,9 @@

/* Other settings */
#define MAX_TIMEOUT 500 /* times 1/100 sec */
+#define MAX_RETRIES 3 /* times to retry hung transaction */
+#define STATUS_SET 1
+#define STATUS_UNSET 0
#define ALI1535_SMB_IOSIZE 32

#define ALI1535_SMB_DEFAULTBASE 0x8040
@@ -138,6 +140,22 @@ static struct pci_driver ali1535_driver;
static unsigned short ali1535_smba;
static DECLARE_MUTEX(i2c_ali1535_sem);

+static inline s32 ali1535_wait_for_status(int set, int status)
+{
+ int timeout = 0;
+ int temp = 0;
+
+ /* clear status register (clear-on-write) */
+ outb_p(0xFF, SMBHSTSTS);
+ do {
+ msleep(1);
+ timeout += 1;
+ temp = inb_p(SMBHSTSTS);
+ } while (!!(temp & status) != set && timeout <= MAX_TIMEOUT);
+
+ return temp;
+}
+
/* Detect whether a ALI1535 can be found, and initialize it, where necessary.
Note the differences between kernels with the old PCI BIOS interface and
newer kernels with the real PCI interface. In compat.h some things are
@@ -184,7 +202,11 @@ static int ali1535_setup(struct pci_dev
}

/* set SMB clock to 74KHz as recommended in data sheet */
- pci_write_config_byte(dev, SMBCLK, 0x20);
+ /* set bus delay multiplier to x8 as suggested by Rudolf Marek
+ * also preserve reserved bits (also from Rudolf Marek)
+ */
+ pci_read_config_byte(dev, SMBCLK, &temp);
+ pci_write_config_byte(dev, SMBCLK, (temp & 3) | 0x28);

/*
The interrupt routing for SMB is set up in register 0x77 in the
@@ -210,83 +232,18 @@ static int ali1535_transaction(struct i2
{
int temp;
int result = 0;
- int timeout = 0;

dev_dbg(&adap->dev, "Transaction (pre): STS=%02x, TYP=%02x, "
"CMD=%02x, ADD=%02x, DAT0=%02x, DAT1=%02x\n",
inb_p(SMBHSTSTS), inb_p(SMBHSTTYP), inb_p(SMBHSTCMD),
inb_p(SMBHSTADD), inb_p(SMBHSTDAT0), inb_p(SMBHSTDAT1));

- /* get status */
- temp = inb_p(SMBHSTSTS);
-
- /* Make sure the SMBus host is ready to start transmitting */
- /* Check the busy bit first */
- if (temp & ALI1535_STS_BUSY) {
- /* If the host controller is still busy, it may have timed out
- * in the previous transaction, resulting in a "SMBus Timeout"
- * printk. I've tried the following to reset a stuck busy bit.
- * 1. Reset the controller with an KILL command. (this
- * doesn't seem to clear the controller if an external
- * device is hung)
- * 2. Reset the controller and the other SMBus devices with a
- * T_OUT command. (this clears the host busy bit if an
- * external device is hung, but it comes back upon a new
- * access to a device)
- * 3. Disable and reenable the controller in SMBHSTCFG. Worst
- * case, nothing seems to work except power reset.
- */
-
- /* Try resetting entire SMB bus, including other devices - This
- * may not work either - it clears the BUSY bit but then the
- * BUSY bit may come back on when you try and use the chip
- * again. If that's the case you are stuck.
- */
- dev_info(&adap->dev,
- "Resetting entire SMB Bus to clear busy condition (%02x)\n",
- temp);
- outb_p(ALI1535_T_OUT, SMBHSTTYP);
- temp = inb_p(SMBHSTSTS);
- }
-
- /* now check the error bits and the busy bit */
- if (temp & (ALI1535_STS_ERR | ALI1535_STS_BUSY)) {
- /* do a clear-on-write */
- outb_p(0xFF, SMBHSTSTS);
- if ((temp = inb_p(SMBHSTSTS)) &
- (ALI1535_STS_ERR | ALI1535_STS_BUSY)) {
- /* This is probably going to be correctable only by a
- * power reset as one of the bits now appears to be
- * stuck */
- /* This may be a bus or device with electrical problems. */
- dev_err(&adap->dev,
- "SMBus reset failed! (0x%02x) - controller or "
- "device on bus is probably hung\n", temp);
- return -1;
- }
- } else {
- /* check and clear done bit */
- if (temp & ALI1535_STS_DONE) {
- outb_p(temp, SMBHSTSTS);
- }
- }
-
- -
/* start the transaction by writing anything to the start register */
outb_p(0xFF, SMBHSTPORT);

/* We will always wait for a fraction of a second! */
- timeout = 0;
- do {
- msleep(1);
- temp = inb_p(SMBHSTSTS);
- } while (((temp & ALI1535_STS_BUSY) && !(temp & ALI1535_STS_IDLE))
- && (timeout++ < MAX_TIMEOUT));
-
- /* If the SMBus is still busy, we give up */
- if (timeout >= MAX_TIMEOUT) {
- result = -1;
- dev_err(&adap->dev, "SMBus Timeout!\n");
- }
+ temp = ali1535_wait_for_status(STATUS_SET,
+ ALI1535_STS_ERR | ALI1535_STS_DONE | ALI1535_STS_BUSY);

if (temp & ALI1535_STS_FAIL) {
result = -1;
@@ -311,9 +268,16 @@ static int ali1535_transaction(struct i2
}

/* check to see if the "command complete" indication is set */
- if (!(temp & ALI1535_STS_DONE)) {
- result = -1;
- dev_err(&adap->dev, "Error: command never completed\n");
+ if (!result) {
+ if (temp & ALI1535_STS_BUSY) {
+ result = -2;
+ dev_err(&adap->dev, "Error: adapter busy\n");
+ } else if (!(temp & ALI1535_STS_DONE)) {
+ result = -2;
+ dev_err(&adap->dev, "Error: command never completed\n");
+ }
+ if (!(temp & ALI1535_STS_IDLE))
+ dev_err(&adap->dev, "Error: adapter did not idle after transaction\n");
}

dev_dbg(&adap->dev, "Transaction (post): STS=%02x, TYP=%02x, "
@@ -321,41 +285,83 @@ static int ali1535_transaction(struct i2
inb_p(SMBHSTSTS), inb_p(SMBHSTTYP), inb_p(SMBHSTCMD),
inb_p(SMBHSTADD), inb_p(SMBHSTDAT0), inb_p(SMBHSTDAT1));

- /* take consequent actions for error conditions */
- if (!(temp & ALI1535_STS_DONE)) {
- /* issue "kill" to reset host controller */
- outb_p(ALI1535_KILL,SMBHSTTYP);
- outb_p(0xFF,SMBHSTSTS);
- } else if (temp & ALI1535_STS_ERR) {
- /* issue "timeout" to reset all devices on bus */
- outb_p(ALI1535_T_OUT,SMBHSTTYP);
- outb_p(0xFF,SMBHSTSTS);
- }
-
return result;
}

+static void ali1535_reset(struct i2c_adapter *adap)
+{
+ int temp = inb_p(SMBHSTSTS);
+
+ dev_dbg(&adap->dev, "reset(pre): STS=%02x\n", temp);
+
+ /* If the host controller is still busy, it may have timed out
+ * in the previous transaction, resulting in a "SMBus Timeout"
+ * printk. I've tried the following to reset a stuck busy bit.
+ * 1. Reset the controller with an KILL command. (this
+ * doesn't seem to clear the controller if an external
+ * device is hung)
+ * 2. Reset the controller and the other SMBus devices with a
+ * T_OUT command. (this clears the host busy bit if an
+ * external device is hung, but it comes back upon a new
+ * access to a device)
+ * 3. Disable and reenable the controller in SMBHSTCFG. Worst
+ * case, nothing seems to work except power reset.
+ */
+
+ /* Try resetting entire SMB bus, including other devices - This
+ * may not work either - it clears the BUSY bit but then the
+ * BUSY bit may come back on when you try and use the chip
+ * again. If that's the case you are stuck.
+ */
+
+ if ((temp & ALI1535_STS_ERR) || !(temp & ALI1535_STS_IDLE))
+ outb_p(ALI1535_T_OUT, SMBHSTTYP);
+ else if (!(temp & ALI1535_STS_DONE))
+ outb_p(ALI1535_KILL, SMBHSTTYP);
+
+ dev_dbg(&adap->dev, "reset(post): STS=%02x\n", inb_p(SMBHSTSTS));
+}
+
+static inline s32 ali1535_wait_for_idle(struct i2c_adapter *adap)
+{
+ int temp;
+
+ temp = inb_p(SMBHSTSTS);
+
+ dev_dbg(&adap->dev, "wait_for_idle(pre): STS=%02x\n", temp);
+
+ temp = ali1535_wait_for_status(STATUS_SET, ALI1535_STS_IDLE);
+
+ dev_dbg(&adap->dev, "wait_for_idle(post): STS=%02x\n", temp);
+
+ return !(temp & ALI1535_STS_IDLE);
+}
+
/* Return -1 on error. */
static s32 ali1535_access(struct i2c_adapter *adap, u16 addr,
unsigned short flags, char read_write, u8 command,
int size, union i2c_smbus_data *data)
{
int i, len;
- int temp;
- int timeout;
s32 result = 0;
+ int retry = 0;

down(&i2c_ali1535_sem);
+retry:
+ if (retry >= MAX_RETRIES) {
+ dev_err(&adap->dev, "Failed to execute command after %d retries"
+ " status: %02x\n", MAX_RETRIES, inb_p(SMBHSTSTS));
+ result = -1;
+ goto EXIT;
+ }
+
/* make sure SMBus is idle */
- temp = inb_p(SMBHSTSTS);
- for (timeout = 0;
- (timeout < MAX_TIMEOUT) && !(temp & ALI1535_STS_IDLE);
- timeout++) {
- msleep(1);
- temp = inb_p(SMBHSTSTS);
+ if (ali1535_wait_for_idle(adap)) {
+ dev_warn(&adap->dev, "adapter not idle before command; retrying\n");
+ retry++;
+ ali1535_reset(adap);
+ goto retry;
}
- if (timeout >= MAX_TIMEOUT)
- dev_warn(&adap->dev, "Idle wait Timeout! STS=0x%02x\n", temp);

/* clear status register (clear-on-write) */
outb_p(0xFF, SMBHSTSTS);
@@ -424,7 +430,16 @@ static s32 ali1535_access(struct i2c_ada
break;
}

- if (ali1535_transaction(adap)) {
+ if ((result = ali1535_transaction(adap)) == -2) {
+ /* Adapter hung and was reset; retry */
+ dev_err(&adap->dev, "Adapter hung, retrying after reset\n");
+ result = 0;
+ retry++;
+ ali1535_reset(adap);
+ goto retry;
+ }
+
+ if (result) {
/* Error in transaction */
result = -1;
goto EXIT;
--
1.1.3
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQFD0qF0R6LMutpd94wRAqJZAKCGgz7BKIZBsZFZWy4xUdPnidt3AgCfRrA9
3vT8vnL7YWJf2iOGBF1I9RI=
=ZFOW
-----END PGP SIGNATURE-----

2006-01-27 04:15:21

by Andrey Borzenkov

Subject: Re: [lm-sensors] 2.6.15: lm90 0-004c: Register 0x13 read failed (-1)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sunday 15 January 2006 23:33, Rudolf Marek wrote:
> Hello all,
>
> > this appears simply a probing for non-existent i2c ports (correct me if I
> > am wrong) presumably by eeprom driver.
>
> yes I think you are right. (ADD/2 is the address of chip, that it tries to
> access)
>
> > Second block are errors from lm90 for different registers:
> >
> > Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Transaction (pre):
> > STS=04, TYP=10, CMD=01, ADD=99, DAT0=a0, DAT1=10
> > Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Transaction (post):
> > STS=14, TYP=10, CMD=01, ADD=99, DAT0=29, DAT1=10
> > Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Transaction (pre):
> > STS=04, TYP=10, CMD=08, ADD=98, DAT0=29, DAT1=10
> > Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Error: command never
> > completed
> > Jan 15 22:24:02 cooker kernel: i2c_adapter i2c-0: Transaction (post):
> > STS=04, TYP=10, CMD=08, ADD=98, DAT0=29, DAT1=10
> > Jan 15 22:24:02 cooker kernel: lm90 0-004c: Register 0x8 read failed (-1)

I still have not had much time to spend on it, but on booting today I suddenly got

i2c_adapter i2c-0: Unsupported chip (man_id=0x41, chip_id=0x42).

I begin to suspect that it is still lm90 (at least partly). The transaction did
not fail (otherwise we would not be here) but returned some strange value. Does
anyone know if such a chip really exists?

TIA

-andrey
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQFD2Z5UR6LMutpd94wRAmaYAKCAdwCutdUWK+RFbQu9nMiLuIl6jACdGgj9
IHiDsWm37Xr4UWmQYbvwIOk=
=a1Ao
-----END PGP SIGNATURE-----