2022-10-18 00:43:42

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 4.9 1/8] of/fdt: Don't calculate initrd size from DT if start > end

From: Marek Bykowski <[email protected]>

[ Upstream commit d5e3050c0feb8bf7b9a75482fafcc31b90257926 ]

If the properties 'linux,initrd-start' and 'linux,initrd-end' of
the chosen node populated from the bootloader, eg. U-Boot, are so that
start > end, then the phys_initrd_size calculated from end - start is
negative that subsequently gets converted to a high positive value for
being unsigned long long. Then, the memory region with the (invalid)
size is added to the bootmem and attempted being paged in paging_init()
that results in the kernel fault.

For example, on the FVP ARM64 system I'm running, the U-Boot populates
the 'linux,initrd-start' with 8800_0000 and 'linux,initrd-end' with 0.
The phys_initrd_size calculated is then ffff_ffff_7800_0000
(= 0 - 8800_0000 = -8800_0000 + ULLONG_MAX + 1). paging_init() then
attempts to map the address 8800_0000 + ffff_ffff_7800_0000 and oops'es
as below.

It should be stressed, it is generally a fault of the bootloader's with
the kernel relying on it, however we should not allow the bootloader's
misconfiguration to lead to the kernel oops. Not only the kernel should be
bullet proof against it but also finding the root cause of the paging
fault spanning over the bootloader, DT, and kernel may happen is not so
easy.

Unable to handle kernel paging request at virtual address fffffffefe43c000
Mem abort info:
ESR = 0x96000007
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000007
CM = 0, WnR = 0
swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000080e3d000
[fffffffefe43c000] pgd=0000000080de9003, pud=0000000080de9003
Unable to handle kernel paging request at virtual address ffffff8000de9f90
Mem abort info:
ESR = 0x96000005
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000005
CM = 0, WnR = 0
swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000080e3d000
[ffffff8000de9f90] pgd=0000000000000000, pud=0000000000000000
Internal error: Oops: 96000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.51-yocto-standard #1
Hardware name: FVP Base (DT)
pstate: 60000085 (nZCv daIf -PAN -UAO)
pc : show_pte+0x12c/0x1b4
lr : show_pte+0x100/0x1b4
sp : ffffffc010ce3b30
x29: ffffffc010ce3b30 x28: ffffffc010ceed80
x27: fffffffefe43c000 x26: fffffffefe43a028
x25: 0000000080bf0000 x24: 0000000000000025
x23: ffffffc010b8d000 x22: ffffffc010e3d000
x23: ffffffc010b8d000 x22: ffffffc010e3d000
x21: 0000000080de9000 x20: ffffff7f80000f90
x19: fffffffefe43c000 x18: 0000000000000030
x17: 0000000000001400 x16: 0000000000001c00
x15: ffffffc010cef1b8 x14: ffffffffffffffff
x13: ffffffc010df1f40 x12: ffffffc010df1b70
x11: ffffffc010ce3b30 x10: ffffffc010ce3b30
x9 : 00000000ffffffc8 x8 : 0000000000000000
x7 : 000000000000000f x6 : ffffffc010df16e8
x5 : 0000000000000000 x4 : 0000000000000000
x3 : 00000000ffffffff x2 : 0000000000000000
x1 : 0000008080000000 x0 : ffffffc010af1d68
Call trace:
show_pte+0x12c/0x1b4
die_kernel_fault+0x54/0x78
__do_kernel_fault+0x11c/0x128
do_translation_fault+0x58/0xac
do_mem_abort+0x50/0xb0
el1_da+0x1c/0x90
__create_pgd_mapping+0x348/0x598
paging_init+0x3f0/0x70d0
setup_arch+0x2c0/0x5d4
start_kernel+0x94/0x49c
Code: 92748eb5 900052a0 9135a000 cb010294 (f8756a96) 

Signed-off-by: Marek Bykowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Rob Herring <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/of/fdt.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 513558eecfd6..44903f94d0cd 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -917,6 +917,8 @@ static void __init early_init_dt_check_for_initrd(unsigned long node)
if (!prop)
return;
end = of_read_number(prop, len/4);
+ if (start > end)
+ return;

__early_init_dt_declare_initrd(start, end);

--
2.35.1


2022-10-18 00:45:47

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 4.9 5/8] m68knommu: fix non-mmu classic 68000 legacy timer tick selection

From: Greg Ungerer <[email protected]>

[ Upstream commit 18011e50c497f04a57a8e00122906f04922b30b4 ]

The family of classic 68000 parts supported when in non-mmu mode all
currently use the legacy timer support. Move the selection of that config
option, LEGACY_TIMER_TICK, into the core CPU configuration.

This fixes compilation if no specific CPU variant is selected, since
the LEGACY_TIMER_TICK option was only selected in the specific CPU
variant configurations.

Reviewed-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Greg Ungerer <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/m68k/Kconfig.cpu | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/m68k/Kconfig.cpu b/arch/m68k/Kconfig.cpu
index 2268d19cc915..327347792550 100644
--- a/arch/m68k/Kconfig.cpu
+++ b/arch/m68k/Kconfig.cpu
@@ -42,6 +42,7 @@ config M68000
select GENERIC_CSUM
select CPU_NO_EFFICIENT_FFS
select HAVE_ARCH_HASH
+ select LEGACY_TIMER_TICK
help
The Freescale (was Motorola) 68000 CPU is the first generation of
the well known M68K family of processors. The CPU core as well as
--
2.35.1

2022-10-18 00:52:33

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 4.9 8/8] virtio_pci: don't try to use intxif pin is zero

From: Angus Chen <[email protected]>

[ Upstream commit 71491c54eafa318fdd24a1f26a1c82b28e1ac21d ]

The background is that we use dpu in cloud computing,the arch is x86,80
cores. We will have a lots of virtio devices,like 512 or more.
When we probe about 200 virtio_blk devices,it will fail and
the stack is printed as follows:

[25338.485128] virtio-pci 0000:b3:00.0: virtio_pci: leaving for legacy driver
[25338.496174] genirq: Flags mismatch irq 0. 00000080 (virtio418) vs. 00015a00 (timer)
[25338.503822] CPU: 20 PID: 5431 Comm: kworker/20:0 Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.30.1.el8.x86_64
[25338.516403] Hardware name: Inspur NF5280M5/YZMB-00882-10E, BIOS 4.1.21 08/25/2021
[25338.523881] Workqueue: events work_for_cpu_fn
[25338.528235] Call Trace:
[25338.530687] dump_stack+0x5c/0x80
[25338.534000] __setup_irq.cold.53+0x7c/0xd3
[25338.538098] request_threaded_irq+0xf5/0x160
[25338.542371] vp_find_vqs+0xc7/0x190
[25338.545866] init_vq+0x17c/0x2e0 [virtio_blk]
[25338.550223] ? ncpus_cmp_func+0x10/0x10
[25338.554061] virtblk_probe+0xe6/0x8a0 [virtio_blk]
[25338.558846] virtio_dev_probe+0x158/0x1f0
[25338.562861] really_probe+0x255/0x4a0
[25338.566524] ? __driver_attach_async_helper+0x90/0x90
[25338.571567] driver_probe_device+0x49/0xc0
[25338.575660] bus_for_each_drv+0x79/0xc0
[25338.579499] __device_attach+0xdc/0x160
[25338.583337] bus_probe_device+0x9d/0xb0
[25338.587167] device_add+0x418/0x780
[25338.590654] register_virtio_device+0x9e/0xe0
[25338.595011] virtio_pci_probe+0xb3/0x140
[25338.598941] local_pci_probe+0x41/0x90
[25338.602689] work_for_cpu_fn+0x16/0x20
[25338.606443] process_one_work+0x1a7/0x360
[25338.610456] ? create_worker+0x1a0/0x1a0
[25338.614381] worker_thread+0x1cf/0x390
[25338.618132] ? create_worker+0x1a0/0x1a0
[25338.622051] kthread+0x116/0x130
[25338.625283] ? kthread_flush_work_fn+0x10/0x10
[25338.629731] ret_from_fork+0x1f/0x40
[25338.633395] virtio_blk: probe of virtio418 failed with error -16

The log :
"genirq: Flags mismatch irq 0. 00000080 (virtio418) vs. 00015a00 (timer)"
was printed because of the irq 0 is used by timer exclusive,and when
vp_find_vqs call vp_find_vqs_msix and returns false twice (for
whatever reason), then it will call vp_find_vqs_intx as a fallback.
Because vp_dev->pci_dev->irq is zero, we request irq 0 with
flag IRQF_SHARED, and get a backtrace like above.

According to PCI spec about "Interrupt Pin" Register (Offset 3Dh):
"The Interrupt Pin register is a read-only register that identifies the
legacy interrupt Message(s) the Function uses. Valid values are 01h, 02h,
03h, and 04h that map to legacy interrupt Messages for INTA,
INTB, INTC, and INTD respectively. A value of 00h indicates that the
Function uses no legacy interrupt Message(s)."

So if vp_dev->pci_dev->pin is zero, we should not request legacy
interrupt.

Signed-off-by: Angus Chen <[email protected]>
Suggested-by: Michael S. Tsirkin <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/virtio/virtio_pci_common.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 37e3ba5dadf6..d634eb926a2f 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -389,6 +389,9 @@ int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs,
true, false);
if (!err)
return 0;
+ /* Is there an interrupt pin? If not give up. */
+ if (!(to_vp_device(vdev)->pci_dev->pin))
+ return err;
/* Finally fall back to regular interrupts. */
return vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
false, false);
--
2.35.1

2022-10-18 00:55:02

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 4.9 7/8] 9p/trans_fd: always use O_NONBLOCK read/write

From: Tetsuo Handa <[email protected]>

[ Upstream commit ef575281b21e9a34dfae544a187c6aac2ae424a9 ]

syzbot is reporting hung task at p9_fd_close() [1], for p9_mux_poll_stop()
from p9_conn_destroy() from p9_fd_close() is failing to interrupt already
started kernel_read() from p9_fd_read() from p9_read_work() and/or
kernel_write() from p9_fd_write() from p9_write_work() requests.

Since p9_socket_open() sets O_NONBLOCK flag, p9_mux_poll_stop() does not
need to interrupt kernel_read()/kernel_write(). However, since p9_fd_open()
does not set O_NONBLOCK flag, but pipe blocks unless signal is pending,
p9_mux_poll_stop() needs to interrupt kernel_read()/kernel_write() when
the file descriptor refers to a pipe. In other words, pipe file descriptor
needs to be handled as if socket file descriptor.

We somehow need to interrupt kernel_read()/kernel_write() on pipes.

A minimal change, which this patch is doing, is to set O_NONBLOCK flag
from p9_fd_open(), for O_NONBLOCK flag does not affect reading/writing
of regular files. But this approach changes O_NONBLOCK flag on userspace-
supplied file descriptors (which might break userspace programs), and
O_NONBLOCK flag could be changed by userspace. It would be possible to set
O_NONBLOCK flag every time p9_fd_read()/p9_fd_write() is invoked, but still
remains small race window for clearing O_NONBLOCK flag.

If we don't want to manipulate O_NONBLOCK flag, we might be able to
surround kernel_read()/kernel_write() with set_thread_flag(TIF_SIGPENDING)
and recalc_sigpending(). Since p9_read_work()/p9_write_work() works are
processed by kernel threads which process global system_wq workqueue,
signals could not be delivered from remote threads when p9_mux_poll_stop()
from p9_conn_destroy() from p9_fd_close() is called. Therefore, calling
set_thread_flag(TIF_SIGPENDING)/recalc_sigpending() every time would be
needed if we count on signals for making kernel_read()/kernel_write()
non-blocking.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://syzkaller.appspot.com/bug?extid=8b41a1365f1106fd0f33 [1]
Reported-by: syzbot <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
Tested-by: syzbot <[email protected]>
Reviewed-by: Christian Schoenebeck <[email protected]>
[Dominique: add comment at Christian's suggestion]
Signed-off-by: Dominique Martinet <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/9p/trans_fd.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 83cdb13c6322..a7973bd56a40 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -820,11 +820,14 @@ static int p9_fd_open(struct p9_client *client, int rfd, int wfd)
goto out_free_ts;
if (!(ts->rd->f_mode & FMODE_READ))
goto out_put_rd;
+ /* prevent workers from hanging on IO when fd is a pipe */
+ ts->rd->f_flags |= O_NONBLOCK;
ts->wr = fget(wfd);
if (!ts->wr)
goto out_put_rd;
if (!(ts->wr->f_mode & FMODE_WRITE))
goto out_put_wr;
+ ts->wr->f_flags |= O_NONBLOCK;

client->trans = ts;
client->status = Connected;
--
2.35.1

2022-10-18 01:11:08

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 4.9 2/8] objtool,x86: Teach decode about LOOP* instructions

From: Peter Zijlstra <[email protected]>

[ Upstream commit 7a7621dfa417aa3715d2a3bd1bdd6cf5018274d0 ]

When 'discussing' control flow Masami mentioned the LOOP* instructions
and I realized objtool doesn't decode them properly.

As it turns out, these instructions are somewhat inefficient and as
such unlikely to be emitted by the compiler (a few vmlinux.o checks
can't find a single one) so this isn't critical, but still, best to
decode them properly.

Reported-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin <[email protected]>
---
tools/objtool/arch/x86/decode.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 540a209b78ab..faaf2820e932 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -440,6 +440,12 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
*type = INSN_CONTEXT_SWITCH;
break;

+ case 0xe0: /* loopne */
+ case 0xe1: /* loope */
+ case 0xe2: /* loop */
+ *type = INSN_JUMP_CONDITIONAL;
+ break;
+
case 0xe8:
*type = INSN_CALL;
break;
--
2.35.1

2022-10-18 01:27:47

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 4.9 6/8] 9p: trans_fd/p9_conn_cancel: drop client lock earlier

From: Dominique Martinet <[email protected]>

[ Upstream commit 52f1c45dde9136f964d63a77d19826c8a74e2c7f ]

syzbot reported a double-lock here and we no longer need this
lock after requests have been moved off to local list:
just drop the lock earlier.

Link: https://lkml.kernel.org/r/[email protected]
Reported-by: [email protected]
Signed-off-by: Dominique Martinet <[email protected]>
Tested-by: Schspa Shi <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
net/9p/trans_fd.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 33b317a25a2d..83cdb13c6322 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -215,6 +215,8 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
list_move(&req->req_list, &cancel_list);
}

+ spin_unlock(&m->client->lock);
+
list_for_each_entry_safe(req, rtmp, &cancel_list, req_list) {
p9_debug(P9_DEBUG_ERROR, "call back req %p\n", req);
list_del(&req->req_list);
@@ -222,7 +224,6 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
req->t_err = err;
p9_client_cb(m->client, req, REQ_STATUS_ERROR);
}
- spin_unlock(&m->client->lock);
}

static int
--
2.35.1

2022-10-18 01:41:57

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 4.9 4/8] m68knommu: fix non-specific 68328 choice interrupt build failure

From: Greg Ungerer <[email protected]>

[ Upstream commit 750321ace9107e103f254bf46900629ff347eb7b ]

Compiling for a classic m68k non-MMU target with no specific CPU
selected fails with the following error:

arch/m68k/68000/ints.c: In function 'process_int':
>> arch/m68k/68000/ints.c:82:30: error: 'ISR' undeclared (first use in this function)
82 | unsigned long pend = ISR;
| ^~~

This interrupt handling code is specific to the 68328 family of 68000
parts. There is a couple of variants (68EZ328, 68VZ328) and the common
ancestor of them the strait 68328.

The code here includes a specific header for each variant type. But if
none is selected then nothing is included to supply the appropriate
register and bit flags defines.

Rearrange the includes so that at least one type is always included.
At the very least the 68328 base type should be the fallback, so make
that true.

Reported-by: kernel test robot <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Greg Ungerer <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/m68k/68000/ints.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/m68k/68000/ints.c b/arch/m68k/68000/ints.c
index cda49b12d7be..f9a5ec781408 100644
--- a/arch/m68k/68000/ints.c
+++ b/arch/m68k/68000/ints.c
@@ -18,12 +18,12 @@
#include <asm/io.h>
#include <asm/machdep.h>

-#if defined(CONFIG_M68328)
-#include <asm/MC68328.h>
-#elif defined(CONFIG_M68EZ328)
+#if defined(CONFIG_M68EZ328)
#include <asm/MC68EZ328.h>
#elif defined(CONFIG_M68VZ328)
#include <asm/MC68VZ328.h>
+#else
+#include <asm/MC68328.h>
#endif

/* assembler routines */
--
2.35.1

2022-10-18 10:39:02

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 4.9 1/8] of/fdt: Don't calculate initrd size from DT if start > end

Hi!

> It should be stressed, it is generally a fault of the bootloader's with
> the kernel relying on it, however we should not allow the bootloader's
> misconfiguration to lead to the kernel oops. Not only the kernel
> should be

I believe we should at least printk() if we detect bootloader bug of
this severity.

Best regards,
Pavel


> +++ b/drivers/of/fdt.c
> @@ -917,6 +917,8 @@ static void __init early_init_dt_check_for_initrd(unsigned long node)
> if (!prop)
> return;
> end = of_read_number(prop, len/4);
> + if (start > end)
> + return;
>
> __early_init_dt_declare_initrd(start, end);
>
> --
> 2.35.1

--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


Attachments:
(No filename) (811.00 B)
signature.asc (201.00 B)
Download all attachments

2022-10-19 00:56:15

by Angus Chen

[permalink] [raw]
Subject: RE: [PATCH AUTOSEL 4.9 8/8] virtio_pci: don't try to use intxif pin is zero

Hi sasha

> -----Original Message-----
> From: Sasha Levin <[email protected]>
> Sent: Tuesday, October 18, 2022 8:12 AM
> To: [email protected]; [email protected]
> Cc: Angus Chen <[email protected]>; Michael S . Tsirkin
> <[email protected]>; Sasha Levin <[email protected]>; [email protected];
> [email protected]
> Subject: [PATCH AUTOSEL 4.9 8/8] virtio_pci: don't try to use intxif pin is zero
>
> From: Angus Chen <[email protected]>
>
> [ Upstream commit 71491c54eafa318fdd24a1f26a1c82b28e1ac21d ]
>
> The background is that we use dpu in cloud computing,the arch is x86,80
> cores. We will have a lots of virtio devices,like 512 or more.
> When we probe about 200 virtio_blk devices,it will fail and
> the stack is printed as follows:
>
> [25338.485128] virtio-pci 0000:b3:00.0: virtio_pci: leaving for legacy driver
> [25338.496174] genirq: Flags mismatch irq 0. 00000080 (virtio418) vs. 00015a00
> (timer)
> [25338.503822] CPU: 20 PID: 5431 Comm: kworker/20:0 Kdump: loaded Tainted:
> G OE --------- - - 4.18.0-305.30.1.el8.x86_64
> [25338.516403] Hardware name: Inspur NF5280M5/YZMB-00882-10E, BIOS
> 4.1.21 08/25/2021
> [25338.523881] Workqueue: events work_for_cpu_fn
> [25338.528235] Call Trace:
> [25338.530687] dump_stack+0x5c/0x80
> [25338.534000] __setup_irq.cold.53+0x7c/0xd3
> [25338.538098] request_threaded_irq+0xf5/0x160
> [25338.542371] vp_find_vqs+0xc7/0x190
> [25338.545866] init_vq+0x17c/0x2e0 [virtio_blk]
> [25338.550223] ? ncpus_cmp_func+0x10/0x10
> [25338.554061] virtblk_probe+0xe6/0x8a0 [virtio_blk]
> [25338.558846] virtio_dev_probe+0x158/0x1f0
> [25338.562861] really_probe+0x255/0x4a0
> [25338.566524] ? __driver_attach_async_helper+0x90/0x90
> [25338.571567] driver_probe_device+0x49/0xc0
> [25338.575660] bus_for_each_drv+0x79/0xc0
> [25338.579499] __device_attach+0xdc/0x160
> [25338.583337] bus_probe_device+0x9d/0xb0
> [25338.587167] device_add+0x418/0x780
> [25338.590654] register_virtio_device+0x9e/0xe0
> [25338.595011] virtio_pci_probe+0xb3/0x140
> [25338.598941] local_pci_probe+0x41/0x90
> [25338.602689] work_for_cpu_fn+0x16/0x20
> [25338.606443] process_one_work+0x1a7/0x360
> [25338.610456] ? create_worker+0x1a0/0x1a0
> [25338.614381] worker_thread+0x1cf/0x390
> [25338.618132] ? create_worker+0x1a0/0x1a0
> [25338.622051] kthread+0x116/0x130
> [25338.625283] ? kthread_flush_work_fn+0x10/0x10
> [25338.629731] ret_from_fork+0x1f/0x40
> [25338.633395] virtio_blk: probe of virtio418 failed with error -16
>
> The log :
> "genirq: Flags mismatch irq 0. 00000080 (virtio418) vs. 00015a00 (timer)"
> was printed because of the irq 0 is used by timer exclusive,and when
> vp_find_vqs call vp_find_vqs_msix and returns false twice (for
> whatever reason), then it will call vp_find_vqs_intx as a fallback.
> Because vp_dev->pci_dev->irq is zero, we request irq 0 with
> flag IRQF_SHARED, and get a backtrace like above.
>
> According to PCI spec about "Interrupt Pin" Register (Offset 3Dh):
> "The Interrupt Pin register is a read-only register that identifies the
> legacy interrupt Message(s) the Function uses. Valid values are 01h, 02h,
> 03h, and 04h that map to legacy interrupt Messages for INTA,
> INTB, INTC, and INTD respectively. A value of 00h indicates that the
> Function uses no legacy interrupt Message(s)."
>
> So if vp_dev->pci_dev->pin is zero, we should not request legacy
> interrupt.
>
> Signed-off-by: Angus Chen <[email protected]>
> Suggested-by: Michael S. Tsirkin <[email protected]>
> Message-Id: <[email protected]>
> Signed-off-by: Michael S. Tsirkin <[email protected]>
> Signed-off-by: Sasha Levin <[email protected]>
> ---
> drivers/virtio/virtio_pci_common.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/virtio/virtio_pci_common.c
> b/drivers/virtio/virtio_pci_common.c
> index 37e3ba5dadf6..d634eb926a2f 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -389,6 +389,9 @@ int vp_find_vqs(struct virtio_device *vdev, unsigned
> nvqs,
> true, false);
> if (!err)
> return 0;
> + /* Is there an interrupt pin? If not give up. */
> + if (!(to_vp_device(vdev)->pci_dev->pin))
> + return err;
> /* Finally fall back to regular interrupts. */
> return vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
> false, false);
> --
> 2.35.1

the patch 71491c54eafa31 has been fixed by 2145ab513e3b3,
It is report by Michael Ellerman <[email protected]> and suggested by linus.
If it is merged in the stable git repo, I worry about powerpc arch.
Thans.

2022-10-19 06:57:18

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 4.9 8/8] virtio_pci: don't try to use intxif pin is zero

On Wed, Oct 19, 2022 at 12:27:46AM +0000, Angus Chen wrote:
> Hi sasha
>
> > -----Original Message-----
> > From: Sasha Levin <[email protected]>
> > Sent: Tuesday, October 18, 2022 8:12 AM
> > To: [email protected]; [email protected]
> > Cc: Angus Chen <[email protected]>; Michael S . Tsirkin
> > <[email protected]>; Sasha Levin <[email protected]>; [email protected];
> > [email protected]
> > Subject: [PATCH AUTOSEL 4.9 8/8] virtio_pci: don't try to use intxif pin is zero
> >
> > From: Angus Chen <[email protected]>
> >
> > [ Upstream commit 71491c54eafa318fdd24a1f26a1c82b28e1ac21d ]
> >
> > The background is that we use dpu in cloud computing,the arch is x86,80
> > cores. We will have a lots of virtio devices,like 512 or more.
> > When we probe about 200 virtio_blk devices,it will fail and
> > the stack is printed as follows:
> >
> > [25338.485128] virtio-pci 0000:b3:00.0: virtio_pci: leaving for legacy driver
> > [25338.496174] genirq: Flags mismatch irq 0. 00000080 (virtio418) vs. 00015a00
> > (timer)
> > [25338.503822] CPU: 20 PID: 5431 Comm: kworker/20:0 Kdump: loaded Tainted:
> > G OE --------- - - 4.18.0-305.30.1.el8.x86_64
> > [25338.516403] Hardware name: Inspur NF5280M5/YZMB-00882-10E, BIOS
> > 4.1.21 08/25/2021
> > [25338.523881] Workqueue: events work_for_cpu_fn
> > [25338.528235] Call Trace:
> > [25338.530687] dump_stack+0x5c/0x80
> > [25338.534000] __setup_irq.cold.53+0x7c/0xd3
> > [25338.538098] request_threaded_irq+0xf5/0x160
> > [25338.542371] vp_find_vqs+0xc7/0x190
> > [25338.545866] init_vq+0x17c/0x2e0 [virtio_blk]
> > [25338.550223] ? ncpus_cmp_func+0x10/0x10
> > [25338.554061] virtblk_probe+0xe6/0x8a0 [virtio_blk]
> > [25338.558846] virtio_dev_probe+0x158/0x1f0
> > [25338.562861] really_probe+0x255/0x4a0
> > [25338.566524] ? __driver_attach_async_helper+0x90/0x90
> > [25338.571567] driver_probe_device+0x49/0xc0
> > [25338.575660] bus_for_each_drv+0x79/0xc0
> > [25338.579499] __device_attach+0xdc/0x160
> > [25338.583337] bus_probe_device+0x9d/0xb0
> > [25338.587167] device_add+0x418/0x780
> > [25338.590654] register_virtio_device+0x9e/0xe0
> > [25338.595011] virtio_pci_probe+0xb3/0x140
> > [25338.598941] local_pci_probe+0x41/0x90
> > [25338.602689] work_for_cpu_fn+0x16/0x20
> > [25338.606443] process_one_work+0x1a7/0x360
> > [25338.610456] ? create_worker+0x1a0/0x1a0
> > [25338.614381] worker_thread+0x1cf/0x390
> > [25338.618132] ? create_worker+0x1a0/0x1a0
> > [25338.622051] kthread+0x116/0x130
> > [25338.625283] ? kthread_flush_work_fn+0x10/0x10
> > [25338.629731] ret_from_fork+0x1f/0x40
> > [25338.633395] virtio_blk: probe of virtio418 failed with error -16
> >
> > The log :
> > "genirq: Flags mismatch irq 0. 00000080 (virtio418) vs. 00015a00 (timer)"
> > was printed because of the irq 0 is used by timer exclusive,and when
> > vp_find_vqs call vp_find_vqs_msix and returns false twice (for
> > whatever reason), then it will call vp_find_vqs_intx as a fallback.
> > Because vp_dev->pci_dev->irq is zero, we request irq 0 with
> > flag IRQF_SHARED, and get a backtrace like above.
> >
> > According to PCI spec about "Interrupt Pin" Register (Offset 3Dh):
> > "The Interrupt Pin register is a read-only register that identifies the
> > legacy interrupt Message(s) the Function uses. Valid values are 01h, 02h,
> > 03h, and 04h that map to legacy interrupt Messages for INTA,
> > INTB, INTC, and INTD respectively. A value of 00h indicates that the
> > Function uses no legacy interrupt Message(s)."
> >
> > So if vp_dev->pci_dev->pin is zero, we should not request legacy
> > interrupt.
> >
> > Signed-off-by: Angus Chen <[email protected]>
> > Suggested-by: Michael S. Tsirkin <[email protected]>
> > Message-Id: <[email protected]>
> > Signed-off-by: Michael S. Tsirkin <[email protected]>
> > Signed-off-by: Sasha Levin <[email protected]>
> > ---
> > drivers/virtio/virtio_pci_common.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/virtio/virtio_pci_common.c
> > b/drivers/virtio/virtio_pci_common.c
> > index 37e3ba5dadf6..d634eb926a2f 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -389,6 +389,9 @@ int vp_find_vqs(struct virtio_device *vdev, unsigned
> > nvqs,
> > true, false);
> > if (!err)
> > return 0;
> > + /* Is there an interrupt pin? If not give up. */
> > + if (!(to_vp_device(vdev)->pci_dev->pin))
> > + return err;
> > /* Finally fall back to regular interrupts. */
> > return vp_try_to_find_vqs(vdev, nvqs, vqs, callbacks, names,
> > false, false);
> > --
> > 2.35.1
>
> the patch 71491c54eafa31 has been fixed by 2145ab513e3b3,
> It is report by Michael Ellerman <[email protected]> and suggested by linus.
> If it is merged in the stable git repo, I worry about powerpc arch.
> Thans.

Yes, please either pick up both this and the fixup or none, and
same for all other stable trees where this was autoselected.

It looks like autoselection basically picks up everything that
has a Fixes tag in it yes?

--
MST