2024-02-06 16:11:42

by Nícolas F. R. A. Prado

Subject: Probe regression of efuse@11f10000 on mt8183-kukui-jacuzzi-juniper-sku16 running next-20240202

Hi,

KernelCI has identified a regression [1] on the
mt8183-kukui-jacuzzi-juniper-sku16 machine running on next-20240202 compared to
next-20240118:

<4>[ 0.627077] sysfs: cannot create duplicate filename '/bus/nvmem/devices/mtk-efuse0'
<4>[ 0.634945] CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc2-next-20240202 #1
<4>[ 0.642542] Hardware name: Google juniper sku16 board (DT)
<4>[ 0.648237] Call trace:
<4>[ 0.650917] dump_backtrace+0x94/0xec
<4>[ 0.654815] show_stack+0x18/0x24
<4>[ 0.658359] dump_stack_lvl+0x48/0x60
<4>[ 0.662252] dump_stack+0x18/0x24
<4>[ 0.665796] sysfs_warn_dup+0x64/0x80
<4>[ 0.669688] sysfs_do_create_link_sd+0xf0/0xf8
<4>[ 0.674353] sysfs_create_link+0x20/0x40
<4>[ 0.678500] bus_add_device+0x64/0x104
<4>[ 0.682475] device_add+0x33c/0x778
<4>[ 0.686193] nvmem_register+0x514/0x714
<4>[ 0.690256] devm_nvmem_register+0x1c/0x6c
<4>[ 0.694577] mtk_efuse_probe+0xe8/0x170
<4>[ 0.698637] platform_probe+0x68/0xd8
<4>[ 0.702525] really_probe+0x148/0x2b4
<4>[ 0.706413] __driver_probe_device+0x78/0x12c
<4>[ 0.710990] driver_probe_device+0xdc/0x160
<4>[ 0.715394] __driver_attach+0x94/0x19c
<4>[ 0.719453] bus_for_each_dev+0x74/0xd4
<4>[ 0.723512] driver_attach+0x24/0x30
<4>[ 0.727312] bus_add_driver+0xe4/0x1e8
<4>[ 0.731284] driver_register+0x60/0x128
<4>[ 0.735343] __platform_driver_register+0x28/0x34
<4>[ 0.740265] mtk_efuse_init+0x20/0x5c
<4>[ 0.744155] do_one_initcall+0x6c/0x1b0
<4>[ 0.748214] kernel_init_freeable+0x1c8/0x290
<4>[ 0.752795] kernel_init+0x20/0x1dc
<4>[ 0.756512] ret_from_fork+0x10/0x20
<4>[ 0.760353] mediatek,efuse: probe of 11f10000.efuse failed with error -17
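
For reference, error -17 is -EEXIST on Linux, which is consistent with the
"cannot create duplicate filename" warning at the top of the trace. A minimal
Python sketch for decoding such a line follows; the helper name
decode_probe_failure and the regular expression are illustrative assumptions,
not part of any kernel or KernelCI tooling:

```python
import errno
import os
import re

# Final line of the trace above, copied verbatim.
LINE = "<4>[ 0.760353] mediatek,efuse: probe of 11f10000.efuse failed with error -17"


def decode_probe_failure(line):
    """Return (device, errno name, errno message) for a 'probe of ... failed' line.

    Hypothetical helper: returns None if the line does not match the pattern.
    """
    m = re.search(r"probe of (\S+) failed with error -(\d+)", line)
    if m is None:
        return None
    code = int(m.group(2))
    # errno.errorcode maps numeric values to symbolic names on this platform.
    return m.group(1), errno.errorcode.get(code, "unknown"), os.strerror(code)


print(decode_probe_failure(LINE))
```

The symbolic name comes from the errno table of the machine running the
script, so it only matches the kernel's meaning when run on Linux.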

This efuse probe failure causes the probe failure of other components that
depend on it, including the display pipeline:

/soc/dsi-phy@11e50000
/soc/dsi@14014000
/soc/efuse@11f10000
/soc/i2c@11008000/anx7625@58
/soc/i2c@11008000/anx7625@58/aux-bus/panel
/soc/thermal@1100b000

There is a series already addressing the issue [2]. The first two patches have
been merged into the mediatek tree, but that tree isn't currently being
integrated into linux-next. Besides that, patch 3 hasn't been merged into the
nvmem tree yet, and it is required in order to solve the issue.

I'm sending this regression report so we can properly track the regression until
the fixes land in linux-next.

Thanks,
Nícolas

[1] https://linux.kernelci.org/test/plan/id/65bd63c3f12d8a95e200a225/
[2] https://lore.kernel.org/linux-mediatek/[email protected]/

#regzbot introduced next-20240118..next-20240202


2024-03-08 14:32:08

by Nícolas F. R. A. Prado

Subject: Re: Probe regression of efuse@11f10000 on mt8183-kukui-jacuzzi-juniper-sku16 running next-20240202

On Tue, Feb 06, 2024 at 11:11:00AM -0500, Nícolas F. R. A. Prado wrote:
> Hi,
>
> KernelCI has identified a regression [1] on the
> mt8183-kukui-jacuzzi-juniper-sku16 machine running on next-20240202 compared to
> next-20240118:
>
> <4>[ 0.627077] sysfs: cannot create duplicate filename '/bus/nvmem/devices/mtk-efuse0'
> <4>[ 0.634945] CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc2-next-20240202 #1
> <4>[ 0.642542] Hardware name: Google juniper sku16 board (DT)
> <4>[ 0.648237] Call trace:
> <4>[ 0.650917] dump_backtrace+0x94/0xec
> <4>[ 0.654815] show_stack+0x18/0x24
> <4>[ 0.658359] dump_stack_lvl+0x48/0x60
> <4>[ 0.662252] dump_stack+0x18/0x24
> <4>[ 0.665796] sysfs_warn_dup+0x64/0x80
> <4>[ 0.669688] sysfs_do_create_link_sd+0xf0/0xf8
> <4>[ 0.674353] sysfs_create_link+0x20/0x40
> <4>[ 0.678500] bus_add_device+0x64/0x104
> <4>[ 0.682475] device_add+0x33c/0x778
> <4>[ 0.686193] nvmem_register+0x514/0x714
> <4>[ 0.690256] devm_nvmem_register+0x1c/0x6c
> <4>[ 0.694577] mtk_efuse_probe+0xe8/0x170
> <4>[ 0.698637] platform_probe+0x68/0xd8
> <4>[ 0.702525] really_probe+0x148/0x2b4
> <4>[ 0.706413] __driver_probe_device+0x78/0x12c
> <4>[ 0.710990] driver_probe_device+0xdc/0x160
> <4>[ 0.715394] __driver_attach+0x94/0x19c
> <4>[ 0.719453] bus_for_each_dev+0x74/0xd4
> <4>[ 0.723512] driver_attach+0x24/0x30
> <4>[ 0.727312] bus_add_driver+0xe4/0x1e8
> <4>[ 0.731284] driver_register+0x60/0x128
> <4>[ 0.735343] __platform_driver_register+0x28/0x34
> <4>[ 0.740265] mtk_efuse_init+0x20/0x5c
> <4>[ 0.744155] do_one_initcall+0x6c/0x1b0
> <4>[ 0.748214] kernel_init_freeable+0x1c8/0x290
> <4>[ 0.752795] kernel_init+0x20/0x1dc
> <4>[ 0.756512] ret_from_fork+0x10/0x20
> <4>[ 0.760353] mediatek,efuse: probe of 11f10000.efuse failed with error -17
>
> This efuse probe failure causes the probe failure of other components that
> depend on it, including the display pipeline:
>
> /soc/dsi-phy@11e50000
> /soc/dsi@14014000
> /soc/efuse@11f10000
> /soc/i2c@11008000/anx7625@58
> /soc/i2c@11008000/anx7625@58/aux-bus/panel
> /soc/thermal@1100b000
>
> There is a series already addressing the issue [2]. The first two patches have
> been merged into the mediatek tree, but that tree isn't currently being
> integrated into linux-next. Besides that, patch 3 hasn't been merged into the
> nvmem tree yet, and it is required in order to solve the issue.
>
> I'm sending this regression report so we can properly track the regression until
> the fixes land in linux-next.
>
> Thanks,
> Nícolas
>
> [1] https://linux.kernelci.org/test/plan/id/65bd63c3f12d8a95e200a225/
> [2] https://lore.kernel.org/linux-mediatek/[email protected]/
>
> #regzbot introduced next-20240118..next-20240202

Not sure why this got filed by regzbot under the mainline tab rather than next.
Maybe it was the missing colon? Let me try again:

#regzbot introduced: next-20240118..next-20240202

In any case, the fix has already made it to linux-next, so this should close the
regression:

#regzbot fix: nvmem: mtk-efuse: Drop NVMEM device name

Thanks,
Nícolas

Subject: Re: Probe regression of efuse@11f10000 on mt8183-kukui-jacuzzi-juniper-sku16 running next-20240202

On 08.03.24 15:31, Nícolas F. R. A. Prado wrote:
> On Tue, Feb 06, 2024 at 11:11:00AM -0500, Nícolas F. R. A. Prado wrote:
>>
>> KernelCI has identified a regression [1] on the
>> mt8183-kukui-jacuzzi-juniper-sku16 machine running on next-20240202 compared to
>> next-20240118:
>>
>> #regzbot introduced next-20240118..next-20240202
>
> Not sure why this got filed by regzbot under the mainline tab rather than next.
> Maybe it was the missing colon?

No, I guess that is a bug in regzbot: the support for -next is there,
but not much tested. Will need to take a closer look, will do so in the
next few days.

> In any case, the fix has already made it to linux-next, so this should close the
> regression:
>
> #regzbot fix: nvmem: mtk-efuse: Drop NVMEM device name

Out of interest: Is involving regzbot worth it in case the fix is
already in -next? Or is that primarily to keep track of "we found a
regression and a fix was already available in next"? I don't mind if
it's the latter, just curious.

Ciao, Thorsten

2024-03-11 13:55:33

by Nícolas F. R. A. Prado

Subject: Re: Probe regression of efuse@11f10000 on mt8183-kukui-jacuzzi-juniper-sku16 running next-20240202

On Sat, Mar 09, 2024 at 03:06:38PM +0100, Thorsten Leemhuis wrote:
> On 08.03.24 15:31, Nícolas F. R. A. Prado wrote:
> > On Tue, Feb 06, 2024 at 11:11:00AM -0500, Nícolas F. R. A. Prado wrote:
> >>
> >> KernelCI has identified a regression [1] on the
> >> mt8183-kukui-jacuzzi-juniper-sku16 machine running on next-20240202 compared to
> >> next-20240118:
> >>
> >> #regzbot introduced next-20240118..next-20240202
> >
> > Not sure why this got filed by regzbot under the mainline tab rather than next.
> > Maybe it was the missing colon?
>
> No, I guess that is a bug in regzbot: the support for -next is there,
> but not much tested. Will need to take a closer look, will do so in the
> next few days.

Ah ok, that's good to know.

>
> > In any case, the fix has already made it to linux-next, so this should close the
> > regression:
> >
> > #regzbot fix: nvmem: mtk-efuse: Drop NVMEM device name
>
> Out of interest: Is involving regzbot worth it in case the fix is
> already in -next? Or is that primarily to keep track of "we found a
> regression and a fix was already available in next"? I don't mind if
> it's the latter, just curious.

When the fix has already landed in next, no, I guess it wouldn't make sense to
involve regzbot, as that would be like creating a regression ticket that is
closed from the start. (And basically it would mean we're testing an outdated
kernel release, which is not very helpful.)

In this case, when I sent the regression report, the regression was still
present on the latest next. The fix had been sent, but not yet merged into next.
In that situation it's very helpful to involve regzbot so we can track the fix
and make sure it gets applied. Also, as in this case and probably many others,
the fix patch didn't have much information on the symptoms and circumstances of
the issue, while the regression report I sent did. So the regression report
should be easier to find for people encountering the regression, and they can
then easily see its status through regzbot.

Thanks,
Nícolas