2021-05-11 22:53:46

by syzbot

[permalink] [raw]
Subject: [syzbot] KASAN: out-of-bounds Read in i801_isr

Hello,

syzbot found the following issue on:

HEAD commit: e1381380 Merge git://git.kernel.org/pub/scm/linux/kernel/g..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10be778ad00000
kernel config: https://syzkaller.appspot.com/x/.config?x=5adab0bdee099d7a
dashboard link: https://syzkaller.appspot.com/bug?extid=b4d3fd1dfd53e90afd79
userspace arch: i386

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

==================================================================
BUG: KASAN: out-of-bounds in i801_isr_byte_done drivers/i2c/busses/i2c-i801.c:608 [inline]
BUG: KASAN: out-of-bounds in i801_isr drivers/i2c/busses/i2c-i801.c:668 [inline]
BUG: KASAN: out-of-bounds in i801_isr+0xc70/0xd60 drivers/i2c/busses/i2c-i801.c:649
Read of size 1 at addr ffffc90004237d68 by task systemd-udevd/5027

CPU: 2 PID: 5027 Comm: systemd-udevd Not tainted 5.12.0-rc4-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5/0x2f8 mm/kasan/report.c:232
__kasan_report mm/kasan/report.c:399 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
i801_isr_byte_done drivers/i2c/busses/i2c-i801.c:608 [inline]
i801_isr drivers/i2c/busses/i2c-i801.c:668 [inline]
i801_isr+0xc70/0xd60 drivers/i2c/busses/i2c-i801.c:649
__handle_irq_event_percpu+0x303/0x8f0 kernel/irq/handle.c:156
handle_irq_event_percpu kernel/irq/handle.c:196 [inline]
handle_irq_event+0x102/0x290 kernel/irq/handle.c:213
handle_fasteoi_irq+0x22f/0x9f0 kernel/irq/chip.c:714
generic_handle_irq_desc include/linux/irqdesc.h:158 [inline]
handle_irq arch/x86/kernel/irq.c:231 [inline]
__common_interrupt+0x9e/0x200 arch/x86/kernel/irq.c:250
common_interrupt+0x4c/0xd0 arch/x86/kernel/irq.c:240
asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:623
RIP: 0010:__do_softirq+0x196/0x9f6 kernel/softirq.c:331
Code: 00 48 01 f0 48 89 44 24 18 48 c7 c7 80 39 6a 89 e8 7f 8d c6 ff 65 66 c7 05 35 46 c3 76 00 00 e8 00 df 39 f8 fb b8 ff ff ff ff <48> c7 c3 c0 a0 c0 8b 41 0f bc c5 41 89 c7 41 83 c7 01 75 47 e9 d2
RSP: 0018:ffffc90000550f78 EFLAGS: 00000216
RAX: 00000000ffffffff RBX: ffff888018491c40 RCX: 1ffffffff1b8be19
RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000000
RBP: ffff888011159c40 R08: 0000000000000001 R09: 0000000000000001
R10: ffffffff8179e0c8 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000200 R14: 0000000000000000 R15: 0000000000000000
invoke_softirq kernel/softirq.c:221 [inline]
__irq_exit_rcu kernel/softirq.c:422 [inline]
irq_exit_rcu+0x134/0x200 kernel/softirq.c:434
sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1100
</IRQ>
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632
RIP: 0010:check_kcov_mode kernel/kcov.c:163 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x7/0x60 kernel/kcov.c:197
Code: ff ff b9 ff ff ff ff ba 08 00 00 00 4d 8b 03 48 0f bd ca 49 8b 45 00 48 63 c9 e9 64 ff ff ff 0f 1f 40 00 65 8b 05 49 fe 8d 7e <89> c1 48 8b 34 24 81 e1 00 01 00 00 65 48 8b 14 25 00 f0 01 00 a9
RSP: 0018:ffffc90009ff7868 EFLAGS: 00000283
RAX: 0000000080000000 RBX: 0000000000000004 RCX: ffff888018491c40
RDX: 0000000000000000 RSI: ffff888018491c40 RDI: 0000000000000003
RBP: ffff8880143aa100 R08: 0000000000000000 R09: 0000000000000004
R10: ffffffff8392c68a R11: 0000000000000010 R12: 0000000000000002
R13: 00000000000003e4 R14: dffffc0000000000 R15: 0000000000000000
tomoyo_domain_quota_is_ok+0x2f1/0x550 security/tomoyo/util.c:1093
tomoyo_supervisor+0x2f2/0xf00 security/tomoyo/common.c:2089
tomoyo_audit_path_log security/tomoyo/file.c:168 [inline]
tomoyo_path_permission security/tomoyo/file.c:587 [inline]
tomoyo_path_permission+0x270/0x3a0 security/tomoyo/file.c:573
tomoyo_path_perm+0x2f0/0x400 security/tomoyo/file.c:838
security_inode_getattr+0xcf/0x140 security/security.c:1288
vfs_getattr fs/stat.c:131 [inline]
vfs_statx+0x164/0x390 fs/stat.c:199
vfs_fstatat fs/stat.c:217 [inline]
vfs_lstat include/linux/fs.h:3240 [inline]
__do_sys_newlstat+0x91/0x110 fs/stat.c:372
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc3c1a88335
Code: 69 db 2b 00 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 83 ff 01 48 89 f0 77 30 48 89 c7 48 89 d6 b8 06 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 03 f3 c3 90 48 8b 15 31 db 2b 00 f7 d8 64 89
RSP: 002b:00007ffd4b6fa5c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
RAX: ffffffffffffffda RBX: 0000558fa852a210 RCX: 00007fc3c1a88335
RDX: 00007ffd4b6fa600 RSI: 00007ffd4b6fa600 RDI: 0000558fa8529210
RBP: 00007ffd4b6fa6c0 R08: 00007fc3c1d472c8 R09: 0000000000001010
R10: 0000000000000020 R11: 0000000000000246 R12: 0000558fa8529210
R13: 0000558fa852923a R14: 0000558fa8501ed1 R15: 0000558fa8501eda


Memory state around the buggy address:
ffffc90004237c00: f2 f2 f2 00 00 f3 f3 00 00 00 00 00 00 00 00 00
ffffc90004237c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffc90004237d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
ffffc90004237d80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
ffffc90004237e00: 00 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 f3
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


2021-05-21 20:17:34

by Jean Delvare

[permalink] [raw]
Subject: Re: [syzbot] KASAN: out-of-bounds Read in i801_isr

On Tue, 11 May 2021 15:52:21 -0700, syzbot wrote:
> syzbot found the following issue on:
>
> HEAD commit: e1381380 Merge git://git.kernel.org/pub/scm/linux/kernel/g..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=10be778ad00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=5adab0bdee099d7a
> dashboard link: https://syzkaller.appspot.com/bug?extid=b4d3fd1dfd53e90afd79
> userspace arch: i386
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]
>
> ==================================================================
> BUG: KASAN: out-of-bounds in i801_isr_byte_done drivers/i2c/busses/i2c-i801.c:608 [inline]
> BUG: KASAN: out-of-bounds in i801_isr drivers/i2c/busses/i2c-i801.c:668 [inline]
> BUG: KASAN: out-of-bounds in i801_isr+0xc70/0xd60 drivers/i2c/busses/i2c-i801.c:649
> Read of size 1 at addr ffffc90004237d68 by task systemd-udevd/5027
>
> CPU: 2 PID: 5027 Comm: systemd-udevd Not tainted 5.12.0-rc4-syzkaller #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> Call Trace:
> <IRQ>
> __dump_stack lib/dump_stack.c:79 [inline]
> dump_stack+0x141/0x1d7 lib/dump_stack.c:120
> print_address_description.constprop.0.cold+0x5/0x2f8 mm/kasan/report.c:232
> __kasan_report mm/kasan/report.c:399 [inline]
> kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
> i801_isr_byte_done drivers/i2c/busses/i2c-i801.c:608 [inline]
> i801_isr drivers/i2c/busses/i2c-i801.c:668 [inline]
> i801_isr+0xc70/0xd60 drivers/i2c/busses/i2c-i801.c:649
> __handle_irq_event_percpu+0x303/0x8f0 kernel/irq/handle.c:156
> handle_irq_event_percpu kernel/irq/handle.c:196 [inline]
> handle_irq_event+0x102/0x290 kernel/irq/handle.c:213
> handle_fasteoi_irq+0x22f/0x9f0 kernel/irq/chip.c:714
> generic_handle_irq_desc include/linux/irqdesc.h:158 [inline]
> handle_irq arch/x86/kernel/irq.c:231 [inline]
> __common_interrupt+0x9e/0x200 arch/x86/kernel/irq.c:250
> common_interrupt+0x4c/0xd0 arch/x86/kernel/irq.c:240
> asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:623
> RIP: 0010:__do_softirq+0x196/0x9f6 kernel/softirq.c:331
> Code: 00 48 01 f0 48 89 44 24 18 48 c7 c7 80 39 6a 89 e8 7f 8d c6 ff 65 66 c7 05 35 46 c3 76 00 00 e8 00 df 39 f8 fb b8 ff ff ff ff <48> c7 c3 c0 a0 c0 8b 41 0f bc c5 41 89 c7 41 83 c7 01 75 47 e9 d2
> RSP: 0018:ffffc90000550f78 EFLAGS: 00000216
> RAX: 00000000ffffffff RBX: ffff888018491c40 RCX: 1ffffffff1b8be19
> RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000000
> RBP: ffff888011159c40 R08: 0000000000000001 R09: 0000000000000001
> R10: ffffffff8179e0c8 R11: 0000000000000001 R12: 0000000000000000
> R13: 0000000000000200 R14: 0000000000000000 R15: 0000000000000000
> invoke_softirq kernel/softirq.c:221 [inline]
> __irq_exit_rcu kernel/softirq.c:422 [inline]
> irq_exit_rcu+0x134/0x200 kernel/softirq.c:434
> sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1100
> </IRQ>
> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632
> RIP: 0010:check_kcov_mode kernel/kcov.c:163 [inline]
> RIP: 0010:__sanitizer_cov_trace_pc+0x7/0x60 kernel/kcov.c:197
> Code: ff ff b9 ff ff ff ff ba 08 00 00 00 4d 8b 03 48 0f bd ca 49 8b 45 00 48 63 c9 e9 64 ff ff ff 0f 1f 40 00 65 8b 05 49 fe 8d 7e <89> c1 48 8b 34 24 81 e1 00 01 00 00 65 48 8b 14 25 00 f0 01 00 a9
> RSP: 0018:ffffc90009ff7868 EFLAGS: 00000283
> RAX: 0000000080000000 RBX: 0000000000000004 RCX: ffff888018491c40
> RDX: 0000000000000000 RSI: ffff888018491c40 RDI: 0000000000000003
> RBP: ffff8880143aa100 R08: 0000000000000000 R09: 0000000000000004
> R10: ffffffff8392c68a R11: 0000000000000010 R12: 0000000000000002
> R13: 00000000000003e4 R14: dffffc0000000000 R15: 0000000000000000
> tomoyo_domain_quota_is_ok+0x2f1/0x550 security/tomoyo/util.c:1093
> tomoyo_supervisor+0x2f2/0xf00 security/tomoyo/common.c:2089
> tomoyo_audit_path_log security/tomoyo/file.c:168 [inline]
> tomoyo_path_permission security/tomoyo/file.c:587 [inline]
> tomoyo_path_permission+0x270/0x3a0 security/tomoyo/file.c:573
> tomoyo_path_perm+0x2f0/0x400 security/tomoyo/file.c:838
> security_inode_getattr+0xcf/0x140 security/security.c:1288
> vfs_getattr fs/stat.c:131 [inline]
> vfs_statx+0x164/0x390 fs/stat.c:199
> vfs_fstatat fs/stat.c:217 [inline]
> vfs_lstat include/linux/fs.h:3240 [inline]
> __do_sys_newlstat+0x91/0x110 fs/stat.c:372
> do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7fc3c1a88335
> Code: 69 db 2b 00 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 83 ff 01 48 89 f0 77 30 48 89 c7 48 89 d6 b8 06 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 03 f3 c3 90 48 8b 15 31 db 2b 00 f7 d8 64 89
> RSP: 002b:00007ffd4b6fa5c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
> RAX: ffffffffffffffda RBX: 0000558fa852a210 RCX: 00007fc3c1a88335
> RDX: 00007ffd4b6fa600 RSI: 00007ffd4b6fa600 RDI: 0000558fa8529210
> RBP: 00007ffd4b6fa6c0 R08: 00007fc3c1d472c8 R09: 0000000000001010
> R10: 0000000000000020 R11: 0000000000000246 R12: 0000558fa8529210
> R13: 0000558fa852923a R14: 0000558fa8501ed1 R15: 0000558fa8501eda
>
>
> Memory state around the buggy address:
> ffffc90004237c00: f2 f2 f2 00 00 f3 f3 00 00 00 00 00 00 00 00 00
> ffffc90004237c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >ffffc90004237d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ^
> ffffc90004237d80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
> ffffc90004237e00: 00 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 f3
> ==================================================================

We get similar reports from Kasan since at least January 2020.
Sometimes about reads, sometimes about writes. Originally in vmalloc'd
area, then on the stack, then in vmalloc's aread again, and now just
"out-of-bound" without the memory type. What all these reports have in
common is that the problem happens in i801_isr_byte_done(). The exact
line differs though. Today's error is on this line:

} else if (priv->count < priv->len - 1) {
/* Write next byte, except for IRQ after last byte */
outb_p(priv->data[++priv->count], SMBBLKDAT(priv)); <<--- HERE
}

while the previous report was on that line:

/* Read next byte */
if (priv->count < priv->len)
priv->data[priv->count++] = inb(SMBBLKDAT(priv)); <<--- HERE
else
dev_dbg(&priv->pci_dev->dev,
"Discarding extra byte on block read\n");

All reports before that were on either location.

All reports also have in common that a previous transaction before the
bug triggers has failed with a timeout:

[ 376.721552][T12286] i801_smbus 0000:00:1f.3: Timeout waiting for interrupt!
[ 376.743857][T12286] i801_smbus 0000:00:1f.3: Transaction timeout
[ 376.766812][T12286] i801_smbus 0000:00:1f.3: Failed terminating the transaction

I think the restoration of the SMBus controller to a working state
after a transaction failed like that never actually worked, typically
the controller would stop responding until next reboot. Most probably
because the problem that caused the timeout (like a broken I2C device
holding either SDA or SCL permanently up or down) is still present. But
it should definitely not cause a memory overrun.

Looking at the ICH9 datasheet, I see the following description for the
KILL bit (which is what we try to use to reset the SMBus controller):

"Kills the current host transaction taking place, sets the FAILED
status bit, and asserts the interrupt (or SMI#)."

At the time the recovery code was written, i2c-i801 was a polling-only
driver, interrupts were not supported, so asserting the interrupt had
no effect. Now that the driver does support interrupts, this would call
i801_isr(), right?

So my theory is that our attempt to kill a timed-out byte-by-byte block
transaction triggers an interrupt, which calls in i801_isr() with the
SMBHSTSTS_BYTE_DONE bit set. This in turn causes i801_isr_byte_done()
to be called while we are absolutely not ready nor even supposed to
process the next data byte.

I guess we should clear SMBHSTSTS_BYTE_DONE before issuing a
SMBHSTCNT_KILL. Alternatively we could add a check at the beginning of
i801_isr() to bail out immediately if SMBHSTCNT_KILL is set. While
possibly more robust, this approach has the drawback of increasing the
processing time of all interrupts, even standard/legitimate ones. So
maybe just clearing SMBHSTSTS_BYTE_DONE is more reasonable. Something
like:

--- linux-5.11.orig/drivers/i2c/busses/i2c-i801.c
+++ linux-5.11/drivers/i2c/busses/i2c-i801.c
@@ -393,6 +393,8 @@ static int i801_check_post(struct i801_p
dev_err(&priv->pci_dev->dev, "Transaction timeout\n");
/* try to stop the current command */
dev_dbg(&priv->pci_dev->dev, "Terminating the current operation\n");
+ /* Clear BYTE_DONE so as to not confuse i801_isr() */
+ outb_p(SMBHSTSTS_BYTE_DONE, SMBHSTSTS(priv));
outb_p(inb_p(SMBHSTCNT(priv)) | SMBHSTCNT_KILL,
SMBHSTCNT(priv));
usleep_range(1000, 2000);

I must say I wonder why SMBHSTCNT_KILL generates an interrupt in the
first place, I can't see who would need this.

Jarkko, Andy, Mika, what do you think? I would appreciate a second pair
of eyes looking at the issue, in case I'm on the wrong track.

Of course, another question is why we hit the timeout in the first
place. But we would need a reproducer to investigate this. Is there any
place where I could see the actual I2C commands being sent by the bot?
The log only traces the bot-level functions, and I could guess how some
of the parameters are being mapped to struct members that are then fed
to i2c-dev, but some of the parameters are still a mystery to me.

To be honest, I find the bot logs pretty hard to read, as apparently a
lot of tests are running in parallel for various subsystems, and
everything ends mixed up in the log. Wouldn't it be more valuable to
serialize the tests on one given virtual host, so that the developers
can then read the log and more easily see what's going on?

Thanks,
--
Jean Delvare
SUSE L3 Support

2021-05-24 09:18:29

by Jean Delvare

[permalink] [raw]
Subject: Re: [syzbot] KASAN: out-of-bounds Read in i801_isr

On Fri, 21 May 2021 15:48:20 +0200, Jean Delvare wrote:
> Looking at the ICH9 datasheet, I see the following description for the
> KILL bit (which is what we try to use to reset the SMBus controller):
>
> "Kills the current host transaction taking place, sets the FAILED
> status bit, and asserts the interrupt (or SMI#)."
>
> At the time the recovery code was written, i2c-i801 was a polling-only
> driver, interrupts were not supported, so asserting the interrupt had
> no effect. Now that the driver does support interrupts, this would call
> i801_isr(), right?

I modified the i2c-i801 driver to simulate a timeout once in a while,
and I can confirm that i801_isr() gets called when the KILL bit is set.

> So my theory is that our attempt to kill a timed-out byte-by-byte block
> transaction triggers an interrupt, which calls in i801_isr() with the
> SMBHSTSTS_BYTE_DONE bit set. This in turn causes i801_isr_byte_done()
> to be called while we are absolutely not ready nor even supposed to
> process the next data byte.

I can also report that I'm reproducing the bug reported by syzbot when
this happens, by running decode-dimms with my modified i2c-i801 driver.
I have 4 SPD EEPROMs instantiated for my DIMMs, so decode-dimms
triggers a lot of calls to i2c_smbus_read_i2c_block_data().

> I guess we should clear SMBHSTSTS_BYTE_DONE before issuing a
> SMBHSTCNT_KILL. Alternatively we could add a check at the beginning of
> i801_isr() to bail out immediately if SMBHSTCNT_KILL is set. While
> possibly more robust, this approach has the drawback of increasing the
> processing time of all interrupts, even standard/legitimate ones. So
> maybe just clearing SMBHSTSTS_BYTE_DONE is more reasonable. Something
> like:
>
> --- linux-5.11.orig/drivers/i2c/busses/i2c-i801.c
> +++ linux-5.11/drivers/i2c/busses/i2c-i801.c
> @@ -393,6 +393,8 @@ static int i801_check_post(struct i801_p
> dev_err(&priv->pci_dev->dev, "Transaction timeout\n");
> /* try to stop the current command */
> dev_dbg(&priv->pci_dev->dev, "Terminating the current operation\n");
> + /* Clear BYTE_DONE so as to not confuse i801_isr() */
> + outb_p(SMBHSTSTS_BYTE_DONE, SMBHSTSTS(priv));
> outb_p(inb_p(SMBHSTCNT(priv)) | SMBHSTCNT_KILL,
> SMBHSTCNT(priv));
> usleep_range(1000, 2000);

I ended up with a different approach. Clearing BYTE_DONE would possibly
avoid the problem immediately at hand, however the interrupt would
still be triggered, even for other (not byte-by-byte) transactions,
causing us to fiddle with the status register. I don't think it's
needed, and it could have unintended side effects.

So instead I tried the alternative approach of checking the KILL bit in
i801_isr() and returning immediately if it is set. While it does work,
I noticed that i801_isr() is in fact called in a loop for the whole
duration of the usleep_range(1000, 2000) before we clear the KILL bit.
While better than an out-of-bounds memory access, an interrupt flood is
still not ideal.

> I must say I wonder why SMBHSTCNT_KILL generates an interrupt in the
> first place, I can't see who would need this.

Which made me think... The interrupt is being generated because we
*ask* for it. It doesn't have to be generated if we aren't interested
in it. So the fix would be very simple: don't bother preserving the
other bits in HST_CNT, they will be set properly again for the next
transaction anyway, only set the KILL bit to 1. In particular don't
preserve the INTREN bit, so no interrupt will be generated.

@@ -395,11 +401,9 @@ static int i801_check_post(struct i801_p
dev_err(&priv->pci_dev->dev, "Transaction timeout\n");
/* try to stop the current command */
dev_dbg(&priv->pci_dev->dev, "Terminating the current operation\n");
- outb_p(inb_p(SMBHSTCNT(priv)) | SMBHSTCNT_KILL,
- SMBHSTCNT(priv));
+ outb_p(SMBHSTCNT_KILL, SMBHSTCNT(priv));
usleep_range(1000, 2000);
- outb_p(inb_p(SMBHSTCNT(priv)) & (~SMBHSTCNT_KILL),
- SMBHSTCNT(priv));
+ outb_p(0, SMBHSTCNT(priv));

/* Check if it worked */
status = inb_p(SMBHSTSTS(priv));

This works for me. If there are no objections I'll post a proper patch.

--
Jean Delvare
SUSE L3 Support

2021-05-24 10:43:31

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [syzbot] KASAN: out-of-bounds Read in i801_isr

On Mon, May 24, 2021 at 11:16:52AM +0200, Jean Delvare wrote:
> On Fri, 21 May 2021 15:48:20 +0200, Jean Delvare wrote:
> > Looking at the ICH9 datasheet, I see the following description for the
> > KILL bit (which is what we try to use to reset the SMBus controller):
> >
> > "Kills the current host transaction taking place, sets the FAILED
> > status bit, and asserts the interrupt (or SMI#)."
> >
> > At the time the recovery code was written, i2c-i801 was a polling-only
> > driver, interrupts were not supported, so asserting the interrupt had
> > no effect. Now that the driver does support interrupts, this would call
> > i801_isr(), right?
>
> I modified the i2c-i801 driver to simulate a timeout once in a while,
> and I can confirm that i801_isr() gets called when the KILL bit is set.
>
> > So my theory is that our attempt to kill a timed-out byte-by-byte block
> > transaction triggers an interrupt, which calls in i801_isr() with the
> > SMBHSTSTS_BYTE_DONE bit set. This in turn causes i801_isr_byte_done()
> > to be called while we are absolutely not ready nor even supposed to
> > process the next data byte.
>
> I can also report that I'm reproducing the bug reported by syzbot when
> this happens, by running decode-dimms with my modified i2c-i801 driver.
> I have 4 SPD EEPROMs instantiated for my DIMMs, so decode-dimms
> triggers a lot of calls to i2c_smbus_read_i2c_block_data().
>
> > I guess we should clear SMBHSTSTS_BYTE_DONE before issuing a
> > SMBHSTCNT_KILL. Alternatively we could add a check at the beginning of
> > i801_isr() to bail out immediately if SMBHSTCNT_KILL is set. While
> > possibly more robust, this approach has the drawback of increasing the
> > processing time of all interrupts, even standard/legitimate ones. So
> > maybe just clearing SMBHSTSTS_BYTE_DONE is more reasonable. Something
> > like:
> >
> > --- linux-5.11.orig/drivers/i2c/busses/i2c-i801.c
> > +++ linux-5.11/drivers/i2c/busses/i2c-i801.c
> > @@ -393,6 +393,8 @@ static int i801_check_post(struct i801_p
> > dev_err(&priv->pci_dev->dev, "Transaction timeout\n");
> > /* try to stop the current command */
> > dev_dbg(&priv->pci_dev->dev, "Terminating the current operation\n");
> > + /* Clear BYTE_DONE so as to not confuse i801_isr() */
> > + outb_p(SMBHSTSTS_BYTE_DONE, SMBHSTSTS(priv));
> > outb_p(inb_p(SMBHSTCNT(priv)) | SMBHSTCNT_KILL,
> > SMBHSTCNT(priv));
> > usleep_range(1000, 2000);
>
> I ended up with a different approach. Clearing BYTE_DONE would possibly
> avoid the problem immediately at hand, however the interrupt would
> still be triggered, even for other (not byte-by-byte) transactions,
> causing us to fiddle with the status register. I don't think it's
> needed, and it could have unintended side effects.
>
> So instead I tried the alternative approach of checking the KILL bit in
> i801_isr() and returning immediately if it is set. While it does work,
> I noticed that i801_isr() is in fact called in a loop for the whole
> duration of the usleep_range(1000, 2000) before we clear the KILL bit.
> While better than an out-of-bounds memory access, an interrupt flood is
> still not ideal.
>
> > I must say I wonder why SMBHSTCNT_KILL generates an interrupt in the
> > first place, I can't see who would need this.
>
> Which made me think... The interrupt is being generated because we
> *ask* for it. It doesn't have to be generated if we aren't interested
> in it. So the fix would be very simple: don't bother preserving the
> other bits in HST_CNT, they will be set properly again for the next
> transaction anyway, only set the KILL bit to 1. In particular don't
> preserve the INTREN bit, so no interrupt will be generated.
>
> @@ -395,11 +401,9 @@ static int i801_check_post(struct i801_p
> dev_err(&priv->pci_dev->dev, "Transaction timeout\n");
> /* try to stop the current command */
> dev_dbg(&priv->pci_dev->dev, "Terminating the current operation\n");
> - outb_p(inb_p(SMBHSTCNT(priv)) | SMBHSTCNT_KILL,
> - SMBHSTCNT(priv));
> + outb_p(SMBHSTCNT_KILL, SMBHSTCNT(priv));
> usleep_range(1000, 2000);
> - outb_p(inb_p(SMBHSTCNT(priv)) & (~SMBHSTCNT_KILL),
> - SMBHSTCNT(priv));
> + outb_p(0, SMBHSTCNT(priv));
>
> /* Check if it worked */
> status = inb_p(SMBHSTSTS(priv));
>
> This works for me. If there are no objections I'll post a proper patch.

No objection from me, thanks for conducting a research and developing the fix!
Feel free to add mine
Acked-by: Andy Shevchenko <[email protected]>
to whatever solution you consider the best.

--
With Best Regards,
Andy Shevchenko