Pali, et al,
I'm encountering a problem booting an i.MX6-based device (Gateworks
Ventana SBC). This is apparently a known issue:
https://lkml.org/lkml/2020/7/16/388
Do you guys know of a fix for this? Booting this machine reliably is
lately impossible.
First, it spews a warning:
pcieport 0000:00:00.0: PME: Signaling with IRQ 310
sysfs: cannot create duplicate filename '/devices/platform/soc/1ffc000.pcie/pci0000:00/0000:00:00.0/0000:e0'
CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted 5.14.0-rc3+ #40
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Workqueue: events_unbound async_run_entry_fn
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x40/0x4c
dump_stack_lvl from sysfs_warn_dup+0x54/0x60
sysfs_warn_dup from sysfs_add_file_mode_ns+0x154/0x1a8
sysfs_add_file_mode_ns from sysfs_create_bin_file+0x60/0x8c
sysfs_create_bin_file from pci_create_resource_files+0xf4/0x140
pci_create_resource_files from pci_bus_add_device+0x20/0x8c
pci_bus_add_device from pci_bus_add_devices+0x3c/0x80
pci_bus_add_devices from pci_bus_add_devices+0x70/0x80
pci_bus_add_devices from pci_host_probe+0x3c/0x90
pci_host_probe from dw_pcie_host_init+0x200/0x4b4
dw_pcie_host_init from imx6_pcie_probe+0x338/0x668
imx6_pcie_probe from platform_probe+0x80/0xc0
platform_probe from really_probe+0x158/0x324
really_probe from __driver_probe_device+0x84/0xe4
__driver_probe_device from driver_probe_device+0x34/0xd0
driver_probe_device from __driver_attach_async_helper+0x20/0x38
__driver_attach_async_helper from async_run_entry_fn+0x24/0xb4
async_run_entry_fn from process_one_work+0x164/0x3b0
process_one_work from worker_thread+0x2c/0x52c
worker_thread from kthread+0x110/0x154
kthread from ret_from_fork+0x14/0x24
pcieport 0000:01:00.0: enabling device (0140 -> 0143)
Then:
Unable to handle kernel paging request at virtual address 6f736572 (ASCII = "reso")
pgd = (ptrval)
[6f736572] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted 5.14.0-rc3+ #40
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Workqueue: events_unbound async_run_entry_fn
PC is at string_nocheck+0x20/0xa0
LR is at string+0x54/0x64
Process kworker/u8:0 (pid: 7, stack limit = 0x(ptrval))
string_nocheck from string+0x54/0x64
string from vsnprintf+0x1c4/0x48c
vsnprintf from vprintk_store+0x80/0x33c
vprintk_store from vprintk_emit+0x6c/0x1e0
vprintk_emit from vprintk_default+0x20/0x28
vprintk_default from printk+0x1c/0x2c
printk from sysfs_warn_dup+0x50/0x60
sysfs_warn_dup from sysfs_add_file_mode_ns+0x154/0x1a8
sysfs_add_file_mode_ns from sysfs_create_bin_file+0x60/0x8c
sysfs_create_bin_file from pci_create_resource_files+0xf4/0x140
pci_create_resource_files from pci_bus_add_device+0x20/0x8c
pci_bus_add_device from pci_bus_add_devices+0x3c/0x80
pci_bus_add_devices from pci_bus_add_devices+0x70/0x80
pci_bus_add_devices from pci_host_probe+0x3c/0x90
pci_host_probe from dw_pcie_host_init+0x200/0x4b4
dw_pcie_host_init from imx6_pcie_probe+0x338/0x668
imx6_pcie_probe from platform_probe+0x80/0xc0
platform_probe from really_probe+0x158/0x324
really_probe from __driver_probe_device+0x84/0xe4
__driver_probe_device from driver_probe_device+0x34/0xd0
driver_probe_device from __driver_attach_async_helper+0x20/0x38
__driver_attach_async_helper from async_run_entry_fn+0x24/0xb4
async_run_entry_fn from process_one_work+0x164/0x3b0
process_one_work from worker_thread+0x2c/0x52c
worker_thread from kthread+0x110/0x154
kthread from ret_from_fork+0x14/0x24
Code: e1dd42b2 e58d301c e3540000 0a000019 (e5d2e000)
--
Krzysztof "Chris" Hałasa
Sieć Badawcza Łukasiewicz
Przemysłowy Instytut Automatyki i Pomiarów PIAP
Al. Jerozolimskie 202, 02-486 Warszawa
On Friday 30 July 2021 10:18:44 Krzysztof Hałasa wrote:
> Pali, et al,
>
> I'm encountering a problem booting an i.MX6-based device (Gateworks
> Ventana SBC). This is apparently a known issue:
> https://lkml.org/lkml/2020/7/16/388
>
> Do you guys know of a fix for this? Booting this machine reliably is
> lately impossible.
Hello Krzysztof Hałasa! This is known issue and Krzysztof Wilczyński is
working on it... Just it will take some time as fixing it is not so
easy.
> First, it spews a warning:
>
> pcieport 0000:00:00.0: PME: Signaling with IRQ 310
>
> sysfs: cannot create duplicate filename '/devices/platform/soc/1ffc000.pcie/pci0000:00/0000:00:00.0/0000:e0'
> CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted 5.14.0-rc3+ #40
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> Workqueue: events_unbound async_run_entry_fn
>
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x40/0x4c
> dump_stack_lvl from sysfs_warn_dup+0x54/0x60
> sysfs_warn_dup from sysfs_add_file_mode_ns+0x154/0x1a8
> sysfs_add_file_mode_ns from sysfs_create_bin_file+0x60/0x8c
> sysfs_create_bin_file from pci_create_resource_files+0xf4/0x140
> pci_create_resource_files from pci_bus_add_device+0x20/0x8c
> pci_bus_add_device from pci_bus_add_devices+0x3c/0x80
> pci_bus_add_devices from pci_bus_add_devices+0x70/0x80
> pci_bus_add_devices from pci_host_probe+0x3c/0x90
> pci_host_probe from dw_pcie_host_init+0x200/0x4b4
> dw_pcie_host_init from imx6_pcie_probe+0x338/0x668
> imx6_pcie_probe from platform_probe+0x80/0xc0
> platform_probe from really_probe+0x158/0x324
> really_probe from __driver_probe_device+0x84/0xe4
> __driver_probe_device from driver_probe_device+0x34/0xd0
> driver_probe_device from __driver_attach_async_helper+0x20/0x38
> __driver_attach_async_helper from async_run_entry_fn+0x24/0xb4
> async_run_entry_fn from process_one_work+0x164/0x3b0
> process_one_work from worker_thread+0x2c/0x52c
> worker_thread from kthread+0x110/0x154
> kthread from ret_from_fork+0x14/0x24
>
> pcieport 0000:01:00.0: enabling device (0140 -> 0143)
>
> Then:
> Unable to handle kernel paging request at virtual address 6f736572 (ASCII = "reso")
> pgd = (ptrval)
> [6f736572] *pgd=00000000
> Internal error: Oops: 5 [#1] SMP ARM
> Modules linked in:
> CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted 5.14.0-rc3+ #40
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> Workqueue: events_unbound async_run_entry_fn
> PC is at string_nocheck+0x20/0xa0
> LR is at string+0x54/0x64
> Process kworker/u8:0 (pid: 7, stack limit = 0x(ptrval))
>
> string_nocheck from string+0x54/0x64
> string from vsnprintf+0x1c4/0x48c
> vsnprintf from vprintk_store+0x80/0x33c
> vprintk_store from vprintk_emit+0x6c/0x1e0
> vprintk_emit from vprintk_default+0x20/0x28
> vprintk_default from printk+0x1c/0x2c
> printk from sysfs_warn_dup+0x50/0x60
> sysfs_warn_dup from sysfs_add_file_mode_ns+0x154/0x1a8
> sysfs_add_file_mode_ns from sysfs_create_bin_file+0x60/0x8c
> sysfs_create_bin_file from pci_create_resource_files+0xf4/0x140
> pci_create_resource_files from pci_bus_add_device+0x20/0x8c
> pci_bus_add_device from pci_bus_add_devices+0x3c/0x80
> pci_bus_add_devices from pci_bus_add_devices+0x70/0x80
> pci_bus_add_devices from pci_host_probe+0x3c/0x90
> pci_host_probe from dw_pcie_host_init+0x200/0x4b4
> dw_pcie_host_init from imx6_pcie_probe+0x338/0x668
> imx6_pcie_probe from platform_probe+0x80/0xc0
> platform_probe from really_probe+0x158/0x324
> really_probe from __driver_probe_device+0x84/0xe4
> __driver_probe_device from driver_probe_device+0x34/0xd0
> driver_probe_device from __driver_attach_async_helper+0x20/0x38
> __driver_attach_async_helper from async_run_entry_fn+0x24/0xb4
> async_run_entry_fn from process_one_work+0x164/0x3b0
> process_one_work from worker_thread+0x2c/0x52c
> worker_thread from kthread+0x110/0x154
> kthread from ret_from_fork+0x14/0x24
> Code: e1dd42b2 e58d301c e3540000 0a000019 (e5d2e000)
> --
> Krzysztof "Chris" Hałasa
>
> Sieć Badawcza Łukasiewicz
> Przemysłowy Instytut Automatyki i Pomiarów PIAP
> Al. Jerozolimskie 202, 02-486 Warszawa
On Fri, Jul 30, 2021 at 10:18:44AM +0200, Krzysztof Hałasa wrote:
> Pali, et al,
>
> I'm encountering a problem booting an i.MX6-based device (Gateworks
> Ventana SBC). This is apparently a known issue:
> https://lkml.org/lkml/2020/7/16/388
>
> Do you guys know of a fix for this? Booting this machine reliably is
> lately impossible.
Hi Krzysztof, is this still an issue?
e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute") and
similar patches appeared in v5.13, and they should resolve things like
this:
sysfs: cannot create duplicate filename '/devices/platform/soc/d0070000.pcie/pci0000:00/0000:00:00.0/config'
But your report is different because it looks like we have a problem
when creating a *directory*:
> sysfs: cannot create duplicate filename '/devices/platform/soc/1ffc000.pcie/pci0000:00/0000:00:00.0/0000:e0'
and then it seems that we corrupted a pointer:
> Unable to handle kernel paging request at virtual address 6f736572 (ASCII = "reso")
which is even more worrisome.
Bjorn
Hi Bjorn, Krzysztof,
Bjorn Helgaas <[email protected]> writes:
> On Fri, Jul 30, 2021 at 10:18:44AM +0200, Krzysztof Hałasa wrote:
>> I'm encountering a problem booting an i.MX6-based device (Gateworks
>> Ventana SBC). This is apparently a known issue:
>> https://lkml.org/lkml/2020/7/16/388
> Hi Krzysztof, is this still an issue?
Well... I'm still using i.MX6 with 5.14 and Krzysiek Wilczyński's "PCI:
Race condition in pci_create_sysfs_dev_files (can't boot)" patch (which
fixes the problem). It seems parts of this patch are now in mainline,
but most of it is still missing. So I guess the problem isn't fixed.
Perhaps because the DEC Alpha part of the patch is not yet ready?
Krzysztof?
> e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute") and
> similar patches appeared in v5.13,
Right, but they hadn't fixed the problem. 5.13 was released in June, and
I have started using the patch in August (apparently with pre-5.14
first, then with final 5.14).
--
Krzysztof "Chris" Hałasa
Sieć Badawcza Łukasiewicz
Przemysłowy Instytut Automatyki i Pomiarów PIAP
Al. Jerozolimskie 202, 02-486 Warszawa
On Fri, Jan 21, 2022 at 08:14:46AM +0100, Krzysztof Hałasa wrote:
> Hi Bjorn, Krzysztof,
>
> Bjorn Helgaas <[email protected]> writes:
>
> > On Fri, Jul 30, 2021 at 10:18:44AM +0200, Krzysztof Hałasa wrote:
> >> I'm encountering a problem booting an i.MX6-based device (Gateworks
> >> Ventana SBC). This is apparently a known issue:
> >> https://lkml.org/lkml/2020/7/16/388
>
> > Hi Krzysztof, is this still an issue?
>
> Well... I'm still using i.MX6 with 5.14 and Krzysiek Wilczyński's "PCI:
> Race condition in pci_create_sysfs_dev_files (can't boot)" patch (which
> fixes the problem). It seems parts of this patch are now in mainline,
> but most of it is still missing. So I guess the problem isn't fixed.
>
> Perhaps because the DEC Alpha part of the patch is not yet ready?
> Krzysztof?
>
> > e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute") and
> > similar patches appeared in v5.13,
>
> Right, but they hadn't fixed the problem. 5.13 was released in June, and
> I have started using the patch in August (apparently with pre-5.14
> first, then with final 5.14).
Thanks. e1d3f3268b0e and related patches converted individual files
("config", "rom", "vpd", etc) to static attributes, but since the
problem you're seeing is with a directory, it's likely different.
I opened this bugzilla report to try to keep this from getting lost:
https://bugzilla.kernel.org/show_bug.cgi?id=215515
Bjorn
Bjorn Helgaas <[email protected]> writes:
> Thanks. e1d3f3268b0e and related patches converted individual files
> ("config", "rom", "vpd", etc) to static attributes, but since the
> problem you're seeing is with a directory, it's likely different.
>
> I opened this bugzilla report to try to keep this from getting lost:
> https://bugzilla.kernel.org/show_bug.cgi?id=215515
Ok, thanks.
--
Krzysztof "Chris" Hałasa
Sieć Badawcza Łukasiewicz
Przemysłowy Instytut Automatyki i Pomiarów PIAP
Al. Jerozolimskie 202, 02-486 Warszawa
[+cc Koen, Dexuan from this old thread:
https://lore.kernel.org/all/20200716110423.xtfyb3n6tn5ixedh@pali/]
On Mon, Jan 31, 2022 at 12:56:18PM +0100, Krzysztof Hałasa wrote:
> Bjorn Helgaas <[email protected]> writes:
>
> > Thanks. e1d3f3268b0e and related patches converted individual files
> > ("config", "rom", "vpd", etc) to static attributes, but since the
> > problem you're seeing is with a directory, it's likely different.
> >
> > I opened this bugzilla report to try to keep this from getting lost:
> > https://bugzilla.kernel.org/show_bug.cgi?id=215515
>
> Ok, thanks.
Koen collected some interesting logs at
https://lore.kernel.org/all/[email protected]/
They're from v5.10, which was before all of Krzysztof W's nice work
converting to static attributes, but Koen's log shows the error
happening in the pci_sysfs_init() initcall, which is *after*
imx6_pcie_probe():
imx6_pcie_probe # probably device initcall (level 6)
...
pci_create_sysfs_dev_files
pci_sysfs_init # late initcall (level 7)
pci_create_sysfs_dev_files
"sysfs: cannot create duplicate filename"
Krzysztof, you're running v5.14, which includes Krzysztof W's work,
but that shouldn't affect the imx6_pcie_probe()/pci_sysfs_init()
ordering. Your log shows the error in imx6_pcie_probe().
Would you mind adding the patch below and attaching a complete dmesg
log to the bugzilla?
Bjorn
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 7bbf2673c7f2..6a2b62fe7704 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1200,6 +1200,9 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine)
res_attr->attr.mode = 0600;
res_attr->size = pci_resource_len(pdev, num);
res_attr->private = (void *)(unsigned long)num;
+ pci_info(pdev, "pci_create_attr:");
+ print_hex_dump(KERN_INFO, "", DUMP_PREFIX_OFFSET, 16, 1,
+ res_attr, 64, true);
retval = sysfs_create_bin_file(&pdev->dev.kobj, res_attr);
if (retval)
kfree(res_attr);
@@ -1380,6 +1383,9 @@ static const struct attribute_group pci_dev_reset_attr_group = {
int __must_check pci_create_sysfs_dev_files(struct pci_dev *pdev)
{
+ pci_info(pdev, "pci_create_sysfs_dev_files\n");
+ dump_stack();
+
if (!sysfs_initialized)
return -EACCES;
Bjorn,
> Krzysztof, you're running v5.14, which includes Krzysztof W's work,
> but that shouldn't affect the imx6_pcie_probe()/pci_sysfs_init()
> ordering. Your log shows the error in imx6_pcie_probe().
>
> Would you mind adding the patch below and attaching a complete dmesg
> log to the bugzilla?
Sure. In fact I was using 5.16-rc4 (with the media tree) lately, without
the fix from the other Krzysztof, and, as I can now see, it worked
without the problem in question.
I switched it to 5.14 I was using previously, reverted the fix, added
your patch, and tried again. It booted 100+ times without experiencing
the problem, too.
I will try again with the previous 5.14-rc3+ (which needed the fix) next
week. Perhaps it was an unrelated bug affecting 5.14-rc only (fixed
before final 5.14)? I haven't checked back then if 5.14 needed the fix,
most probably I've simply rebased from rc3+ to final.
The other possibility is changed .config. Or, I don't know, the gcc
versions. We will see.
--
Krzysztof "Chris" Hałasa
Sieć Badawcza Łukasiewicz
Przemysłowy Instytut Automatyki i Pomiarów PIAP
Al. Jerozolimskie 202, 02-486 Warszawa
Hi Bjorn,
Bjorn Helgaas <[email protected]> writes:
> Koen collected some interesting logs at
> https://lore.kernel.org/all/[email protected]/
> They're from v5.10, which was before all of Krzysztof W's nice work
> converting to static attributes, but Koen's log shows the error
> happening in the pci_sysfs_init() initcall, which is *after*
> imx6_pcie_probe():
>
> imx6_pcie_probe # probably device initcall (level 6)
> ...
> pci_create_sysfs_dev_files
>
> pci_sysfs_init # late initcall (level 7)
> pci_create_sysfs_dev_files
> "sysfs: cannot create duplicate filename"
Well, imx6_pcie_probe() is called indirectly by
platform_driver_register(). I guess it doesn't know about the initcall
ordering, after it's registered.
It looks like the problem is the imx6_pcie_probe() (via
dw_pcie_host_init() -> pci_host_probe()) is interfering with
pci_sysfs_init(). This may eventually cause some invalid memory access
as well.
BTW I thought for a moment that maybe 5.14 is free from this. I was
wrong. The problem doesn't manifest itself on my custom i.MX6 device
(using Tinyrex CPU module from Voipac/Fedevel, perhaps because I don't
use any PCI devices there). It does on Ventana SBC from Gateworks,
though. BTW the above (and below) is v5.16.
It goes like this:
[0.096212] do_initcall_level: 6
[0.105625] imx6_pcie_init
[0.106106] imx6_pcie_probe <<<<<<<<<<<<<<<<<<<<<
[0.106412] imx6q-pcie 1ffc000.pcie: host bridge /soc/pcie@1ffc000 ranges:
[0.322613] imx6q-pcie 1ffc000.pcie: Link up
[0.322776] imx6q-pcie 1ffc000.pcie: PCI host bridge to bus 0000:00
[0.322790] pci_bus 0000:00: root bus resource [bus 00-ff]
[0.405251] do_initcall_level: 6 ENDs but imx6_pcie_probe() still active
[0.405262] do_initcall_level: 7
[0.410393] pci_sysfs_init <<<<<<<<<<<<<<<<<<<<<
[0.410423] pci 0000:00:00.0: pci_create_sysfs_dev_files
[0.410532] [<8068091c>] (pci_create_sysfs_dev_files)
[0.410551] [<80918710>] (pci_sysfs_init)
[0.410568] [<8010166c>] (do_one_initcall)
[0.410717] pci_sysfs_init END <<<<<<<<<<<<<<<<<<<<<
[0.533843] [<803f1c74>] (pci_bus_add_devices)
[0.533862] [<803f574c>] (pci_host_probe)
[0.533879] [<80414310>] (dw_pcie_host_init)
[0.533895] [<80681ac8>] (imx6_pcie_probe)
[0.533915] [<8045e9e4>] (platform_probe)
(Repeats multiple times, I guess for each PCI device)
[0.543893] imx6_pcie_probe END <<<<<<<<<<<<<<<<<<<<<
[0.692244] do_initcall_level: 7 END
--
Krzysztof "Chris" Hałasa
Sieć Badawcza Łukasiewicz
Przemysłowy Instytut Automatyki i Pomiarów PIAP
Al. Jerozolimskie 202, 02-486 Warszawa
On 15.02.22 07:35, Krzysztof Hałasa wrote:
> Hi Bjorn,
>
> Bjorn Helgaas <[email protected]> writes:
>
>> Koen collected some interesting logs at
>> https://lore.kernel.org/all/[email protected]/
>> They're from v5.10, which was before all of Krzysztof W's nice work
>> converting to static attributes, but Koen's log shows the error
>> happening in the pci_sysfs_init() initcall, which is *after*
>> imx6_pcie_probe():
>>
>> imx6_pcie_probe # probably device initcall (level 6)
>> ...
>> pci_create_sysfs_dev_files
>>
>> pci_sysfs_init # late initcall (level 7)
>> pci_create_sysfs_dev_files
>> "sysfs: cannot create duplicate filename"
> Well, imx6_pcie_probe() is called indirectly by
> platform_driver_register(). I guess it doesn't know about the initcall
> ordering, after it's registered.
>
> It looks like the problem is the imx6_pcie_probe() (via
> dw_pcie_host_init() -> pci_host_probe()) is interfering with
> pci_sysfs_init(). This may eventually cause some invalid memory access
> as well.
>
> BTW I thought for a moment that maybe 5.14 is free from this. I was
> wrong. The problem doesn't manifest itself on my custom i.MX6 device
> (using Tinyrex CPU module from Voipac/Fedevel, perhaps because I don't
> use any PCI devices there). It does on Ventana SBC from Gateworks,
> though. BTW the above (and below) is v5.16.
>
> It goes like this:
> [0.096212] do_initcall_level: 6
> [0.105625] imx6_pcie_init
> [0.106106] imx6_pcie_probe <<<<<<<<<<<<<<<<<<<<<
> [0.106412] imx6q-pcie 1ffc000.pcie: host bridge /soc/pcie@1ffc000 ranges:
>
> [0.322613] imx6q-pcie 1ffc000.pcie: Link up
> [0.322776] imx6q-pcie 1ffc000.pcie: PCI host bridge to bus 0000:00
> [0.322790] pci_bus 0000:00: root bus resource [bus 00-ff]
>
> [0.405251] do_initcall_level: 6 ENDs but imx6_pcie_probe() still active
> [0.405262] do_initcall_level: 7
>
> [0.410393] pci_sysfs_init <<<<<<<<<<<<<<<<<<<<<
> [0.410423] pci 0000:00:00.0: pci_create_sysfs_dev_files
>
> [0.410532] [<8068091c>] (pci_create_sysfs_dev_files)
> [0.410551] [<80918710>] (pci_sysfs_init)
> [0.410568] [<8010166c>] (do_one_initcall)
>
> [0.410717] pci_sysfs_init END <<<<<<<<<<<<<<<<<<<<<
>
> [0.533843] [<803f1c74>] (pci_bus_add_devices)
> [0.533862] [<803f574c>] (pci_host_probe)
> [0.533879] [<80414310>] (dw_pcie_host_init)
> [0.533895] [<80681ac8>] (imx6_pcie_probe)
> [0.533915] [<8045e9e4>] (platform_probe)
> (Repeats multiple times, I guess for each PCI device)
>
> [0.543893] imx6_pcie_probe END <<<<<<<<<<<<<<<<<<<<<
>
> [0.692244] do_initcall_level: 7 END
Hi all,
Any update on this topic?
I just tested kernel 5.15 on imx6 (gateworks Ventana 5200) and as soon
as I connect a pcie device on one of the ports,
following happens:
https://pastebin.com/raw/mgfSvTRB
Any idea if this is related?
Thanks,
Koen
On 01.04.22 15:50, Koen Vandeputte wrote:
>
> On 15.02.22 07:35, Krzysztof Hałasa wrote:
>> Hi Bjorn,
>>
>> Bjorn Helgaas <[email protected]> writes:
>>
>>> Koen collected some interesting logs at
>>> https://lore.kernel.org/all/[email protected]/
>>>
>>> They're from v5.10, which was before all of Krzysztof W's nice work
>>> converting to static attributes, but Koen's log shows the error
>>> happening in the pci_sysfs_init() initcall, which is *after*
>>> imx6_pcie_probe():
>>>
>>> imx6_pcie_probe # probably device initcall (level 6)
>>> ...
>>> pci_create_sysfs_dev_files
>>>
>>> pci_sysfs_init # late initcall (level 7)
>>> pci_create_sysfs_dev_files
>>> "sysfs: cannot create duplicate filename"
>> Well, imx6_pcie_probe() is called indirectly by
>> platform_driver_register(). I guess it doesn't know about the initcall
>> ordering, after it's registered.
>>
>> It looks like the problem is the imx6_pcie_probe() (via
>> dw_pcie_host_init() -> pci_host_probe()) is interfering with
>> pci_sysfs_init(). This may eventually cause some invalid memory access
>> as well.
>>
>> BTW I thought for a moment that maybe 5.14 is free from this. I was
>> wrong. The problem doesn't manifest itself on my custom i.MX6 device
>> (using Tinyrex CPU module from Voipac/Fedevel, perhaps because I don't
>> use any PCI devices there). It does on Ventana SBC from Gateworks,
>> though. BTW the above (and below) is v5.16.
>>
>> It goes like this:
>> [0.096212] do_initcall_level: 6
>> [0.105625] imx6_pcie_init
>> [0.106106] imx6_pcie_probe <<<<<<<<<<<<<<<<<<<<<
>> [0.106412] imx6q-pcie 1ffc000.pcie: host bridge /soc/pcie@1ffc000
>> ranges:
>>
>> [0.322613] imx6q-pcie 1ffc000.pcie: Link up
>> [0.322776] imx6q-pcie 1ffc000.pcie: PCI host bridge to bus 0000:00
>> [0.322790] pci_bus 0000:00: root bus resource [bus 00-ff]
>>
>> [0.405251] do_initcall_level: 6 ENDs but imx6_pcie_probe() still active
>> [0.405262] do_initcall_level: 7
>>
>> [0.410393] pci_sysfs_init <<<<<<<<<<<<<<<<<<<<<
>> [0.410423] pci 0000:00:00.0: pci_create_sysfs_dev_files
>>
>> [0.410532] [<8068091c>] (pci_create_sysfs_dev_files)
>> [0.410551] [<80918710>] (pci_sysfs_init)
>> [0.410568] [<8010166c>] (do_one_initcall)
>>
>> [0.410717] pci_sysfs_init END <<<<<<<<<<<<<<<<<<<<<
>>
>> [0.533843] [<803f1c74>] (pci_bus_add_devices)
>> [0.533862] [<803f574c>] (pci_host_probe)
>> [0.533879] [<80414310>] (dw_pcie_host_init)
>> [0.533895] [<80681ac8>] (imx6_pcie_probe)
>> [0.533915] [<8045e9e4>] (platform_probe)
>> (Repeats multiple times, I guess for each PCI device)
>>
>> [0.543893] imx6_pcie_probe END <<<<<<<<<<<<<<<<<<<<<
>>
>> [0.692244] do_initcall_level: 7 END
>
>
> Hi all,
>
> Any update on this topic?
> I just tested kernel 5.15 on imx6 (gateworks Ventana 5200) and as soon
> as I connect a pcie device on one of the ports,
>
> following happens:
>
> https://pastebin.com/raw/mgfSvTRB
>
> Any idea if this is related?
>
>
> Thanks,
>
> Koen
>
Hi all,
I tested a bit more today and simply let the board reboot all day long.
After roughly 20 reboots, it suddenly booted once stable without any
errors/warnings.
Looks like a race condition ..
Any idea?
Thanks,
Koen
On 06.04.22 16:08, Koen Vandeputte wrote:
>
> On 01.04.22 15:50, Koen Vandeputte wrote:
>>
>> On 15.02.22 07:35, Krzysztof Hałasa wrote:
>>> Hi Bjorn,
>>>
>>> Bjorn Helgaas <[email protected]> writes:
>>>
>>>> Koen collected some interesting logs at
>>>> https://lore.kernel.org/all/[email protected]/
>>>>
>>>> They're from v5.10, which was before all of Krzysztof W's nice work
>>>> converting to static attributes, but Koen's log shows the error
>>>> happening in the pci_sysfs_init() initcall, which is *after*
>>>> imx6_pcie_probe():
>>>>
>>>> imx6_pcie_probe # probably device initcall (level 6)
>>>> ...
>>>> pci_create_sysfs_dev_files
>>>>
>>>> pci_sysfs_init # late initcall (level 7)
>>>> pci_create_sysfs_dev_files
>>>> "sysfs: cannot create duplicate filename"
>>> Well, imx6_pcie_probe() is called indirectly by
>>> platform_driver_register(). I guess it doesn't know about the initcall
>>> ordering, after it's registered.
>>>
>>> It looks like the problem is the imx6_pcie_probe() (via
>>> dw_pcie_host_init() -> pci_host_probe()) is interfering with
>>> pci_sysfs_init(). This may eventually cause some invalid memory access
>>> as well.
>>>
>>> BTW I thought for a moment that maybe 5.14 is free from this. I was
>>> wrong. The problem doesn't manifest itself on my custom i.MX6 device
>>> (using Tinyrex CPU module from Voipac/Fedevel, perhaps because I don't
>>> use any PCI devices there). It does on Ventana SBC from Gateworks,
>>> though. BTW the above (and below) is v5.16.
>>>
>>> It goes like this:
>>> [0.096212] do_initcall_level: 6
>>> [0.105625] imx6_pcie_init
>>> [0.106106] imx6_pcie_probe <<<<<<<<<<<<<<<<<<<<<
>>> [0.106412] imx6q-pcie 1ffc000.pcie: host bridge /soc/pcie@1ffc000
>>> ranges:
>>>
>>> [0.322613] imx6q-pcie 1ffc000.pcie: Link up
>>> [0.322776] imx6q-pcie 1ffc000.pcie: PCI host bridge to bus 0000:00
>>> [0.322790] pci_bus 0000:00: root bus resource [bus 00-ff]
>>>
>>> [0.405251] do_initcall_level: 6 ENDs but imx6_pcie_probe() still active
>>> [0.405262] do_initcall_level: 7
>>>
>>> [0.410393] pci_sysfs_init <<<<<<<<<<<<<<<<<<<<<
>>> [0.410423] pci 0000:00:00.0: pci_create_sysfs_dev_files
>>>
>>> [0.410532] [<8068091c>] (pci_create_sysfs_dev_files)
>>> [0.410551] [<80918710>] (pci_sysfs_init)
>>> [0.410568] [<8010166c>] (do_one_initcall)
>>>
>>> [0.410717] pci_sysfs_init END <<<<<<<<<<<<<<<<<<<<<
>>>
>>> [0.533843] [<803f1c74>] (pci_bus_add_devices)
>>> [0.533862] [<803f574c>] (pci_host_probe)
>>> [0.533879] [<80414310>] (dw_pcie_host_init)
>>> [0.533895] [<80681ac8>] (imx6_pcie_probe)
>>> [0.533915] [<8045e9e4>] (platform_probe)
>>> (Repeats multiple times, I guess for each PCI device)
>>>
>>> [0.543893] imx6_pcie_probe END <<<<<<<<<<<<<<<<<<<<<
>>>
>>> [0.692244] do_initcall_level: 7 END
>>
>>
>> Hi all,
>>
>> Any update on this topic?
>> I just tested kernel 5.15 on imx6 (gateworks Ventana 5200) and as
>> soon as I connect a pcie device on one of the ports,
>>
>> following happens:
>>
>> https://pastebin.com/raw/mgfSvTRB
>>
>> Any idea if this is related?
>>
>>
>> Thanks,
>>
>> Koen
>>
> Hi all,
>
> I tested a bit more today and simply let the board reboot all day long.
> After roughly 20 reboots, it suddenly booted once stable without any
> errors/warnings.
>
> Looks like a race condition ..
>
> Any idea?
>
> Thanks,
>
> Koen
>
As an additional addendum:
This issue is seen on a Gateworks Ventana gw5200 which has a PLX bridge.
I also have a GW5100 which is identical but without the PLX bridge, and
it works fine every time.
So when a PCI device is sitting behind a bridge, the issue is triggered.
Hope this helps to easily reproduce.
Koen