On Fri, Sep 15, 2023 at 07:28:57AM -0700, Doug Smythies wrote:
> Kernel 6.6-rc1 has an error during boot. The guilty commit is:
> b7f1f7b2523a6a4382f12fe953380b847b80e09d
> hwmon: (nct6775) Additional TEMP registers for nct6799
>
> There seems to be confusion between the indexes into
> the NCT6799_ALARM_BITS array or the
> NCT6779_ALARM_BITS array. I do not understand the code
> and do not know if it is the indexing that is reversed or the
> wrong table is being used.
>
> The error from kern.log (edited):
>
> ================================================================================
> UBSAN: shift-out-of-bounds in drivers/hwmon/nct6775-core.c:1757:39
> shift exponent -1 is negative
> CPU: 9 PID: 822 Comm: sensors Not tainted 6.6.0-rc1-stock2 #1165
> Hardware name: ASUS System Product Name/PRIME Z490-A, BIOS 9902 09/15/2021
> Call Trace:
> <TASK>
> dump_stack_lvl+0x48/0x70
> dump_stack+0x10/0x20
> ubsan_epilogue+0x9/0x40
> __ubsan_handle_shift_out_of_bounds+0x10f/0x170
> ...
>
> I added a "pr_info" line (in the below it was as of the prior commit,
> 43fbe66dc216 hwmon: Add driver for Renesas HS3001):
>
> doug@s19:~/kernel/linux$ git diff
> diff --git a/drivers/hwmon/nct6775-core.c b/drivers/hwmon/nct6775-core.c
> index 33533d95cf48..12e3df84c034 100644
> --- a/drivers/hwmon/nct6775-core.c
> +++ b/drivers/hwmon/nct6775-core.c
> @@ -1727,6 +1727,7 @@ nct6775_show_alarm(struct device *dev, struct device_attribute *attr, char *buf)
> return PTR_ERR(data);
>
> nr = data->ALARM_BITS[sattr->index];
> + pr_info("doug: nr: %d ; index %d\n", nr, sattr->index);
> return sprintf(buf, "%u\n",
> (unsigned int)((data->alarms >> nr) & 0x01));
> }
>
> And for b7f1f7b2523a got (edited):
>
> nct6775_core: doug: nr: 0 ; index 0
> nct6775_core: doug: nr: 1 ; index 1
> nct6775_core: doug: nr: 2 ; index 2
> nct6775_core: doug: nr: 3 ; index 3
> nct6775_core: doug: nr: 8 ; index 4
> nct6775_core: doug: nr: -1 ; index 5
> ================================================================================
> UBSAN: shift-out-of-bounds in drivers/hwmon/nct6775-core.c:1758:39
> shift exponent -1 is negative
> ...
> nct6775_core: doug: nr: 20 ; index 6
> nct6775_core: doug: nr: 16 ; index 7
> nct6775_core: doug: nr: 17 ; index 8
> nct6775_core: doug: nr: 24 ; index 9
> nct6775_core: doug: nr: 25 ; index 10
> nct6775_core: doug: nr: 26 ; index 11
> nct6775_core: doug: nr: 27 ; index 12
> nct6775_core: doug: nr: 28 ; index 13
> nct6775_core: doug: nr: 29 ; index 14
> nct6775_core: doug: nr: 6 ; index 24
> nct6775_core: doug: nr: 7 ; index 25
> nct6775_core: doug: nr: 11 ; index 26
> nct6775_core: doug: nr: 10 ; index 27
> nct6775_core: doug: nr: 23 ; index 28
> nct6775_core: doug: nr: 33 ; index 29
> nct6775_core: doug: nr: 12 ; index 48
> nct6775_core: doug: nr: 9 ; index 49
>
> Observe that the table seems to be
> NCT6799_ALARM_BITS
> But the indexes seem to be for
> NCT6779_ALARM_BITS
>
> static const s8 NCT6799_ALARM_BITS[NUM_ALARM_BITS] = {
> 0, 1, 2, 3, 8, -1, 20, 16, 17, 24, 25, 26, /* in0-in11 */
> 27, 28, 29, 30, 31, -1, -1, -1, -1, -1, -1, -1, /* in12-in23 */
> 6, 7, 11, 10, 23, 33, -1, -1, -1, -1, -1, -1, /* fan1-fan12 */
> 4, 5, 40, 41, 42, 43, 44, -1, -1, -1, -1, -1, /* temp1-temp12 */
> 12, 9, /* intr0-intr1 */
> };
>
> Now repeat the test as of 43fbe66dc216:
>
> nct6775_core: doug: nr: 0 ; index 0
> nct6775_core: doug: nr: 1 ; index 1
> nct6775_core: doug: nr: 2 ; index 2
> nct6775_core: doug: nr: 3 ; index 3
> nct6775_core: doug: nr: 8 ; index 4
> nct6775_core: doug: nr: 21 ; index 5
> nct6775_core: doug: nr: 20 ; index 6
> nct6775_core: doug: nr: 16 ; index 7
> nct6775_core: doug: nr: 17 ; index 8
> nct6775_core: doug: nr: 24 ; index 9
> nct6775_core: doug: nr: 25 ; index 10
> nct6775_core: doug: nr: 26 ; index 11
> nct6775_core: doug: nr: 27 ; index 12
> nct6775_core: doug: nr: 28 ; index 13
> nct6775_core: doug: nr: 29 ; index 14
> nct6775_core: doug: nr: 6 ; index 24
> nct6775_core: doug: nr: 7 ; index 25
> nct6775_core: doug: nr: 11 ; index 26
> nct6775_core: doug: nr: 10 ; index 27
> nct6775_core: doug: nr: 23 ; index 28
> nct6775_core: doug: nr: 33 ; index 29
> nct6775_core: doug: nr: 12 ; index 48
> nct6775_core: doug: nr: 9 ; index 49
>
> Observe that the table seems to be
> NCT6779_ALARM_BITS
> And the indexing seems to be for that
> Table.
>
> static const s8 NCT6779_ALARM_BITS[NUM_ALARM_BITS] = {
> 0, 1, 2, 3, 8, 21, 20, 16, 17, 24, 25, 26, /* in0-in11 */
> 27, 28, 29, -1, -1, -1, -1, -1, -1, -1, -1, -1, /* in12-in23 */
> 6, 7, 11, 10, 23, -1, -1, -1, -1, -1, -1, -1, /* fan1-fan12 */
> 4, 5, 13, -1, -1, -1, -1, -1, -1, -1, -1, -1, /* temp1-temp12 */
> 12, 9, /* intr0-intr1 */
> };
>
> You probably need this information:
> nct6775: Found NCT6798D or compatible chip at 0x2e:0x290
>
Thanks for the regression report. I'm adding it to regzbot:
#regzbot ^introduced: b7f1f7b2523a6a
#regzbot title: shift out-of-bounds in nct6799's additional TEMP registers
--
An old man doll... just what I always wanted! - Clara
[TLDR: This mail in primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]
On 16.09.23 11:20, Bagas Sanjaya wrote:
> On Fri, Sep 15, 2023 at 07:28:57AM -0700, Doug Smythies wrote:
>> Kernel 6.6-rc1 has an error during boot. The guilty commit is:
>> b7f1f7b2523a6a4382f12fe953380b847b80e09d
>> hwmon: (nct6775) Additional TEMP registers for nct6799
> [...]
> Thanks for the regression report. I'm adding it to regzbot:
>
> #regzbot ^introduced: b7f1f7b2523a6a
> #regzbot title: shift out-of-bounds in nct6799's additional TEMP registers
Fix doesn't use a link regzbot is able to handle, thus point to it manually:
#regzbot fix: hwmon: (nct6775) Fix non-existent ALARM warning
#regzbot ignore-activity
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.