2023-02-26 01:37:53

by Munehisa Kamata

[permalink] [raw]
Subject: [PATCH] pwm: core: Zero-initialize the temp state

Zero-initialize the on-stack structure to avoid unexpected behaviors. Some
drivers may not set or initialize all the values in pwm_state through their
.get_state() callback and therefore some random values may remain there and
be set into pwm->state eventually.

This actually caused regression on ODROID-N2+ as reported in [1]; kernel
fails to boot due to random panic or hang-up.

[1] https://forum.odroid.com/viewtopic.php?f=177&t=46360

Fixes: c73a3107624d ("pwm: Handle .get_state() failures")
Cc: [email protected] # 6.2
Signed-off-by: Munehisa Kamata <[email protected]>
---
drivers/pwm/core.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/pwm/core.c b/drivers/pwm/core.c
index e01147f66e15..6eac8022a2c2 100644
--- a/drivers/pwm/core.c
+++ b/drivers/pwm/core.c
@@ -117,6 +117,7 @@ static int pwm_device_request(struct pwm_device *pwm, const char *label)
if (pwm->chip->ops->get_state) {
struct pwm_state state;

+ memset(&state, 0, sizeof(struct pwm_state));
err = pwm->chip->ops->get_state(pwm->chip, pwm, &state);
trace_pwm_get(pwm, &state, err);

--
2.25.1



2023-02-26 09:18:09

by Uwe Kleine-König

[permalink] [raw]
Subject: Re: [PATCH] pwm: core: Zero-initialize the temp state

Hello,

On Sat, Feb 25, 2023 at 05:37:21PM -0800, Munehisa Kamata wrote:
> Zero-initialize the on-stack structure to avoid unexpected behaviors. Some
> drivers may not set or initialize all the values in pwm_state through their
> .get_state() callback and therefore some random values may remain there and
> be set into pwm->state eventually.
>
> This actually caused regression on ODROID-N2+ as reported in [1]; kernel
> fails to boot due to random panic or hang-up.
>
> [1] https://forum.odroid.com/viewtopic.php?f=177&t=46360

Looking through the report I wonder what actually made the machine fail
to boot. Doesn't this paper over a problem that should be fixed (also)
somewhere else?

Which driver is the one that the problem occur for?

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-K?nig |
Industrial Linux Solutions | https://www.pengutronix.de/ |


Attachments:
(No filename) (917.00 B)
signature.asc (488.00 B)
Download all attachments

2023-02-27 02:48:54

by Munehisa Kamata

[permalink] [raw]
Subject: Re: [PATCH] pwm: core: Zero-initialize the temp state

On Sun, 2023-02-26 09:17:52 +0000, Uwe Kleine-König <[email protected]> wrote:
>
> Hello,
>
> On Sat, Feb 25, 2023 at 05:37:21PM -0800, Munehisa Kamata wrote:
> > Zero-initialize the on-stack structure to avoid unexpected behaviors. Some
> > drivers may not set or initialize all the values in pwm_state through their
> > .get_state() callback and therefore some random values may remain there and
> > be set into pwm->state eventually.
> >
> > This actually caused regression on ODROID-N2+ as reported in [1]; kernel
> > fails to boot due to random panic or hang-up.
> >
> > [1] https://forum.odroid.com/viewtopic.php?f=177&t=46360
>
> Looking through the report I wonder what actually made the machine fail
> to boot. Doesn't this paper over a problem that should be fixed (also)
> somewhere else?

Yes, you're right and I think the commit message could have described more
details. This patch is for ensuring all drivers see zeroed state same as
before, but I still don't fully understand how it ends up such random-ish
crashes. There could be another or bigger problem that should be fixed.

> Which driver is the one that the problem occur for?

It's pwm-meson and seems to be caused by random polarity value somehow. If
meson_pwm_get_state() sets polarity to zero instead, I don't see the
problem. According the comment, looks like the driver does not set polarity
by design.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pwm/pwm-meson.c?h=v6.2&id=c9c3395d5e3dcc6daee66c6908354d47bf98cb0c#n9

Before commit c73a3107624d, the memory was kcalloc'ed and always zeroed,
but I don't know if the driver was (is) assuming that. I'm adding Meson SoC
people to CC.

Apart from how polarity should be handled in the driver, I'm very puzzled
by the crashes I've observed so far. There seems to be some patterns, but
they don't seem obviously related to PWM.

[ 1.360542] soc soc0: Amlogic Meson G12B (S922X) Revision 29:c (40:2) Detected
[ 1.363906] Insufficient stack space to handle exception!
[ 1.363913] ESR: 0x0000000096000047 -- DABT (current EL)
[ 1.363917] FAR: 0xffff800009a47ff0
[ 1.363920] Task stack: [0xffff800009a48000..0xffff800009a4c000]
[ 1.363923] IRQ stack: [0xffff8000099a8000..0xffff8000099ac000]
[ 1.363927] Overflow stack: [0xffff000077b76100..0xffff000077b77100]
[ 1.363931] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.2.0-odroid-arm64 #47
[ 1.363938] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ 1.363941] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1.363947] pc : __do_kernel_fault+0x4/0x180
[ 1.363961] lr : do_page_fault+0xd0/0x3d0
[ 1.363968] sp : ffff800009a48020
[ 1.363970] x29: ffff800009a48020 x28: ffff000002948000 x27: 0000000000000000
[ 1.363980] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[ 1.363987] x23: 0000000096000004 x22: ffff800009a48110 x21: 00000000ffffffff
[ 1.363994] x20: ffff800009a48110 x19: 00000000ffffffff x18: 000000000000001c
[ 1.364001] x17: 00000000863047f6 x16: ffff800009860a80 x15: 0000000000000003
[ 1.364008] x14: 00000000000003ef x13: 0000000000000000 x12: 0000000000000279
[ 1.364015] x11: 0000000000000001 x10: 0000000000000001 x9 : 0000000000000400
[ 1.364021] x8 : ffff000077b7fc40 x7 : ffff000077b7fbc0 x6 : ffff8000094f7990
[ 1.364028] x5 : 0000000000000000 x4 : ffff000002948000 x3 : 0001000000000000
[ 1.364035] x2 : ffff800009a48110 x1 : 0000000096000004 x0 : 00000000ffffffff
[ 1.364043] Kernel panic - not syncing: kernel stack overflow
[ 1.364047] SMP: stopping secondary CPUs

Another example:

[ 1.360997] soc soc0: Amlogic Meson G12B (S922X) Revision 29:c (40:2) Detected
[ 1.364333] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 1.364435] Unable to handle kernel paging request at virtual address 0000000008003388
[ 1.367470] Internal error: Oops - Undefined instruction: 0000000002000000 [#1] PREEMPT SMP
[ 1.367483] Modules linked in:
[ 1.367490] CPU: 2 PID: 66 Comm: kworker/u12:1 Not tainted 6.2.0-odroid-arm64 #47
[ 1.367498] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ 1.367502] Workqueue: events_unbound async_run_entry_fn
[ 1.367516] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1.367523] pc : arch_timer_handler_phys+0x0/0x44
[ 1.367535] lr : handle_percpu_devid_irq+0x84/0x140
[ 1.367543] sp : ffff8000099abf70
[ 1.367546] x29: ffff8000099abf70 x28: ffff000002a40000 x27: 0000000000000006
[ 1.367556] x26: ffff800009d208a4 x25: ffff800009d20b44 x24: 0000000000000008
[ 1.367565] x23: ffff8000094f8b50 x22: 000000000000000b x21: ffff000002829000
[ 1.367573] x20: ffff800008f89fb0 x19: ffff00000282a600 x18: ffff0000021792e8
[ 1.367581] x17: ffff80006e685000 x16: ffff8000099ac000 x15: 0000000000004000
[ 1.367589] x14: ffff800009d27bde x13: ffff800009d2887b x12: 0000000000000308
[ 1.367597] x11: 0000000000000a92 x10: 0000000000000068 x9 : ef01a5948f440b31
[ 1.367606] x8 : 0000000000003273 x7 : ffff800009d262a8 x6 : 000000003754d375
[ 1.367614] x5 : ffff80006e685000 x4 : ffff8000099abf70 x3 : ffff80006e685000
[ 1.367622] x2 : ffff800008c26c30 x1 : ffff000077b83a00 x0 : 000000000000000b
[ 1.367630] Call trace:
[ 1.367633] arch_timer_handler_phys+0x0/0x44
[ 1.367641] generic_handle_domain_irq+0x2c/0x44
[ 1.367650] gic_handle_irq+0x44/0xc4
[ 1.367659] call_on_irq_stack+0x2c/0x5c
[ 1.367666] do_interrupt_handler+0x80/0x84
[ 1.367672] el1_interrupt+0x34/0x70
[ 1.367682] el1h_64_irq_handler+0x18/0x2c
[ 1.367686] el1h_64_irq+0x64/0x68
[ 1.367690] HUF_decompress4X1_usingDTable_internal_default+0x2fc/0xd60
[ 1.367702] HUF_decompress4X_hufOnly_wksp_bmi2+0xec/0x140
[ 1.367711] ZSTD_decodeLiteralsBlock+0x580/0x630
[ 1.367717] ZSTD_decompressBlock_internal.part.0+0x5c/0x1b4
[ 1.367723] ZSTD_decompressBlock_internal+0x1c/0x30
[ 1.367729] ZSTD_decompressContinue.part.0+0x364/0x444
[ 1.367734] ZSTD_decompressContinueStream+0x98/0x180
[ 1.367738] ZSTD_decompressStream+0x5b0/0x8c0
[ 1.367743] zstd_decompress_stream+0x10/0x20
[ 1.367751] unzstd+0x290/0x37c
[ 1.367760] unpack_to_rootfs+0x174/0x298
[ 1.367767] do_populate_rootfs+0x84/0x1dc
[ 1.367773] async_run_entry_fn+0x34/0x150
[ 1.367778] process_one_work+0x1d0/0x320
[ 1.367785] worker_thread+0x14c/0x444

Perhaps, the driver or the PWM core could do something wrong based on the
invalid polarity and corrupt certain memory location, but I'm still not
able to relate the random value to these crashes.


Regards,
Munehisa

> Best regards
> Uwe
>
> --
> Pengutronix e.K. | Uwe Kleine-König |
> Industrial Linux Solutions | https://www.pengutronix.de/ |
>

2023-02-27 09:20:35

by Uwe Kleine-König

[permalink] [raw]
Subject: Re: [PATCH] pwm: core: Zero-initialize the temp state

Hello,

On Sun, Feb 26, 2023 at 06:48:30PM -0800, Munehisa Kamata wrote:
> On Sun, 2023-02-26 09:17:52 +0000, Uwe Kleine-K?nig <[email protected]> wrote:
> > On Sat, Feb 25, 2023 at 05:37:21PM -0800, Munehisa Kamata wrote:
> > > Zero-initialize the on-stack structure to avoid unexpected behaviors. Some
> > > drivers may not set or initialize all the values in pwm_state through their
> > > .get_state() callback and therefore some random values may remain there and
> > > be set into pwm->state eventually.
> > >
> > > This actually caused regression on ODROID-N2+ as reported in [1]; kernel
> > > fails to boot due to random panic or hang-up.
> > >
> > > [1] https://forum.odroid.com/viewtopic.php?f=177&t=46360
> >
> > Looking through the report I wonder what actually made the machine fail
> > to boot. Doesn't this paper over a problem that should be fixed (also)
> > somewhere else?
>
> Yes, you're right and I think the commit message could have described more
> details. This patch is for ensuring all drivers see zeroed state same as
> before, but I still don't fully understand how it ends up such random-ish
> crashes. There could be another or bigger problem that should be fixed.
>
> > Which driver is the one that the problem occur for?
>
> It's pwm-meson and seems to be caused by random polarity value somehow. If
> meson_pwm_get_state() sets polarity to zero instead, I don't see the
> problem. According the comment, looks like the driver does not set polarity
> by design.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pwm/pwm-meson.c?h=v6.2&id=c9c3395d5e3dcc6daee66c6908354d47bf98cb0c#n9
>
> Before commit c73a3107624d, the memory was kcalloc'ed and always zeroed,
> but I don't know if the driver was (is) assuming that. I'm adding Meson SoC
> people to CC.
>
> Apart from how polarity should be handled in the driver, I'm very puzzled
> by the crashes I've observed so far. There seems to be some patterns, but
> they don't seem obviously related to PWM.
>
> [ 1.360542] soc soc0: Amlogic Meson G12B (S922X) Revision 29:c (40:2) Detected
> [ 1.363906] Insufficient stack space to handle exception!
> [ 1.363913] ESR: 0x0000000096000047 -- DABT (current EL)
> [ 1.363917] FAR: 0xffff800009a47ff0
> [ 1.363920] Task stack: [0xffff800009a48000..0xffff800009a4c000]
> [ 1.363923] IRQ stack: [0xffff8000099a8000..0xffff8000099ac000]
> [ 1.363927] Overflow stack: [0xffff000077b76100..0xffff000077b77100]
> [ 1.363931] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.2.0-odroid-arm64 #47
> [ 1.363938] Hardware name: Hardkernel ODROID-N2Plus (DT)
> [ 1.363941] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 1.363947] pc : __do_kernel_fault+0x4/0x180
> [ 1.363961] lr : do_page_fault+0xd0/0x3d0
> [ 1.363968] sp : ffff800009a48020
> [ 1.363970] x29: ffff800009a48020 x28: ffff000002948000 x27: 0000000000000000
> [ 1.363980] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
> [ 1.363987] x23: 0000000096000004 x22: ffff800009a48110 x21: 00000000ffffffff
> [ 1.363994] x20: ffff800009a48110 x19: 00000000ffffffff x18: 000000000000001c
> [ 1.364001] x17: 00000000863047f6 x16: ffff800009860a80 x15: 0000000000000003
> [ 1.364008] x14: 00000000000003ef x13: 0000000000000000 x12: 0000000000000279
> [ 1.364015] x11: 0000000000000001 x10: 0000000000000001 x9 : 0000000000000400
> [ 1.364021] x8 : ffff000077b7fc40 x7 : ffff000077b7fbc0 x6 : ffff8000094f7990
> [ 1.364028] x5 : 0000000000000000 x4 : ffff000002948000 x3 : 0001000000000000
> [ 1.364035] x2 : ffff800009a48110 x1 : 0000000096000004 x0 : 00000000ffffffff
> [ 1.364043] Kernel panic - not syncing: kernel stack overflow
> [ 1.364047] SMP: stopping secondary CPUs
>
> Another example:
>
> [ 1.360997] soc soc0: Amlogic Meson G12B (S922X) Revision 29:c (40:2) Detected
> [ 1.364333] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> [ 1.364435] Unable to handle kernel paging request at virtual address 0000000008003388
> [ 1.367470] Internal error: Oops - Undefined instruction: 0000000002000000 [#1] PREEMPT SMP
> [ 1.367483] Modules linked in:
> [ 1.367490] CPU: 2 PID: 66 Comm: kworker/u12:1 Not tainted 6.2.0-odroid-arm64 #47
> [ 1.367498] Hardware name: Hardkernel ODROID-N2Plus (DT)
> [ 1.367502] Workqueue: events_unbound async_run_entry_fn
> [ 1.367516] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 1.367523] pc : arch_timer_handler_phys+0x0/0x44
> [ 1.367535] lr : handle_percpu_devid_irq+0x84/0x140
> [ 1.367543] sp : ffff8000099abf70
> [ 1.367546] x29: ffff8000099abf70 x28: ffff000002a40000 x27: 0000000000000006
> [ 1.367556] x26: ffff800009d208a4 x25: ffff800009d20b44 x24: 0000000000000008
> [ 1.367565] x23: ffff8000094f8b50 x22: 000000000000000b x21: ffff000002829000
> [ 1.367573] x20: ffff800008f89fb0 x19: ffff00000282a600 x18: ffff0000021792e8
> [ 1.367581] x17: ffff80006e685000 x16: ffff8000099ac000 x15: 0000000000004000
> [ 1.367589] x14: ffff800009d27bde x13: ffff800009d2887b x12: 0000000000000308
> [ 1.367597] x11: 0000000000000a92 x10: 0000000000000068 x9 : ef01a5948f440b31
> [ 1.367606] x8 : 0000000000003273 x7 : ffff800009d262a8 x6 : 000000003754d375
> [ 1.367614] x5 : ffff80006e685000 x4 : ffff8000099abf70 x3 : ffff80006e685000
> [ 1.367622] x2 : ffff800008c26c30 x1 : ffff000077b83a00 x0 : 000000000000000b
> [ 1.367630] Call trace:
> [ 1.367633] arch_timer_handler_phys+0x0/0x44
> [ 1.367641] generic_handle_domain_irq+0x2c/0x44
> [ 1.367650] gic_handle_irq+0x44/0xc4
> [ 1.367659] call_on_irq_stack+0x2c/0x5c
> [ 1.367666] do_interrupt_handler+0x80/0x84
> [ 1.367672] el1_interrupt+0x34/0x70
> [ 1.367682] el1h_64_irq_handler+0x18/0x2c
> [ 1.367686] el1h_64_irq+0x64/0x68
> [ 1.367690] HUF_decompress4X1_usingDTable_internal_default+0x2fc/0xd60
> [ 1.367702] HUF_decompress4X_hufOnly_wksp_bmi2+0xec/0x140
> [ 1.367711] ZSTD_decodeLiteralsBlock+0x580/0x630
> [ 1.367717] ZSTD_decompressBlock_internal.part.0+0x5c/0x1b4
> [ 1.367723] ZSTD_decompressBlock_internal+0x1c/0x30
> [ 1.367729] ZSTD_decompressContinue.part.0+0x364/0x444
> [ 1.367734] ZSTD_decompressContinueStream+0x98/0x180
> [ 1.367738] ZSTD_decompressStream+0x5b0/0x8c0
> [ 1.367743] zstd_decompress_stream+0x10/0x20
> [ 1.367751] unzstd+0x290/0x37c
> [ 1.367760] unpack_to_rootfs+0x174/0x298
> [ 1.367767] do_populate_rootfs+0x84/0x1dc
> [ 1.367773] async_run_entry_fn+0x34/0x150
> [ 1.367778] process_one_work+0x1d0/0x320
> [ 1.367785] worker_thread+0x14c/0x444
>
> Perhaps, the driver or the PWM core could do something wrong based on the
> invalid polarity and corrupt certain memory location, but I'm still not
> able to relate the random value to these crashes.

I can imagine that the compiler creates code (a jump table) that relies
on .polarity being either PWM_POLARITY_NORMAL or PWM_POLARITY_INVERSED
and that exibits UB if that isn't true. (I think the compiler is allowed
to do that.)

Looking at the driver, there are several problems:

- The driver does just

if (state->polarity == PWM_POLARITY_INVERSED)
duty = period - duty;

which is broken because each period is supposed to start with the
active part. (You could argue this being ridiculous and I'd agree.
But that's what we have and just doing it differently in a driver is
wrong.) The fix is to check if the hardware actually emits normal or
inversed polarity and refuse the other one. Then in .get_state() set
the appropriate polarity.

- In meson_pwm_calc() we have:

unsigned int duty, ...;

duty = state->duty_cycle;

which is wrong for state->duty_cycle > U32_MAX. The same for period.

- After period is fixed to be proper u64, fin_freq * (u64)period might
overflow.

- harmless: The check

duty_cnt = div64_u64(fin_freq * (u64)duty,
NSEC_PER_SEC * (pre_div + 1));
if (duty_cnt > 0xffff)
...

never triggers, as duty <= period (after fixing the first issue) and
we already know that

div64_u64(fin_freq * (u64)period, NSEC_PER_SEC * (pre_div + 1)) <= 0xffff.

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-K?nig |
Industrial Linux Solutions | https://www.pengutronix.de/ |


Attachments:
(No filename) (8.27 kB)
signature.asc (488.00 B)
Download all attachments

2023-02-28 08:54:11

by Munehisa Kamata

[permalink] [raw]
Subject: Re: [PATCH] pwm: core: Zero-initialize the temp state

Hi

On Mon, 2023-02-27 09:16:21 +0000, Uwe Kleine-König wrote:
>
> Hello,
>
> On Sun, Feb 26, 2023 at 06:48:30PM -0800, Munehisa Kamata wrote:
> > On Sun, 2023-02-26 09:17:52 +0000, Uwe Kleine-König <[email protected]> wrote:
> > > On Sat, Feb 25, 2023 at 05:37:21PM -0800, Munehisa Kamata wrote:
> > > > Zero-initialize the on-stack structure to avoid unexpected behaviors. Some
> > > > drivers may not set or initialize all the values in pwm_state through their
> > > > .get_state() callback and therefore some random values may remain there and
> > > > be set into pwm->state eventually.
> > > >
> > > > This actually caused regression on ODROID-N2+ as reported in [1]; kernel
> > > > fails to boot due to random panic or hang-up.
> > > >
> > > > [1] https://forum.odroid.com/viewtopic.php?f=177&t=46360
> > >
> > > Looking through the report I wonder what actually made the machine fail
> > > to boot. Doesn't this paper over a problem that should be fixed (also)
> > > somewhere else?
> >
> > Yes, you're right and I think the commit message could have described more
> > details. This patch is for ensuring all drivers see zeroed state same as
> > before, but I still don't fully understand how it ends up such random-ish
> > crashes. There could be another or bigger problem that should be fixed.
> >
> > > Which driver is the one that the problem occur for?
> >
> > It's pwm-meson and seems to be caused by random polarity value somehow. If
> > meson_pwm_get_state() sets polarity to zero instead, I don't see the
> > problem. According the comment, looks like the driver does not set polarity
> > by design.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pwm/pwm-meson.c?h=v6.2&id=c9c3395d5e3dcc6daee66c6908354d47bf98cb0c#n9
> >
> > Before commit c73a3107624d, the memory was kcalloc'ed and always zeroed,
> > but I don't know if the driver was (is) assuming that. I'm adding Meson SoC
> > people to CC.
> >
> > Apart from how polarity should be handled in the driver, I'm very puzzled
> > by the crashes I've observed so far. There seems to be some patterns, but
> > they don't seem obviously related to PWM.
> >
> > [ 1.360542] soc soc0: Amlogic Meson G12B (S922X) Revision 29:c (40:2) Detected
> > [ 1.363906] Insufficient stack space to handle exception!
> > [ 1.363913] ESR: 0x0000000096000047 -- DABT (current EL)
> > [ 1.363917] FAR: 0xffff800009a47ff0
> > [ 1.363920] Task stack: [0xffff800009a48000..0xffff800009a4c000]
> > [ 1.363923] IRQ stack: [0xffff8000099a8000..0xffff8000099ac000]
> > [ 1.363927] Overflow stack: [0xffff000077b76100..0xffff000077b77100]
> > [ 1.363931] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.2.0-odroid-arm64 #47
> > [ 1.363938] Hardware name: Hardkernel ODROID-N2Plus (DT)
> > [ 1.363941] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ 1.363947] pc : __do_kernel_fault+0x4/0x180
> > [ 1.363961] lr : do_page_fault+0xd0/0x3d0
> > [ 1.363968] sp : ffff800009a48020
> > [ 1.363970] x29: ffff800009a48020 x28: ffff000002948000 x27: 0000000000000000
> > [ 1.363980] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
> > [ 1.363987] x23: 0000000096000004 x22: ffff800009a48110 x21: 00000000ffffffff
> > [ 1.363994] x20: ffff800009a48110 x19: 00000000ffffffff x18: 000000000000001c
> > [ 1.364001] x17: 00000000863047f6 x16: ffff800009860a80 x15: 0000000000000003
> > [ 1.364008] x14: 00000000000003ef x13: 0000000000000000 x12: 0000000000000279
> > [ 1.364015] x11: 0000000000000001 x10: 0000000000000001 x9 : 0000000000000400
> > [ 1.364021] x8 : ffff000077b7fc40 x7 : ffff000077b7fbc0 x6 : ffff8000094f7990
> > [ 1.364028] x5 : 0000000000000000 x4 : ffff000002948000 x3 : 0001000000000000
> > [ 1.364035] x2 : ffff800009a48110 x1 : 0000000096000004 x0 : 00000000ffffffff
> > [ 1.364043] Kernel panic - not syncing: kernel stack overflow
> > [ 1.364047] SMP: stopping secondary CPUs
> >
> > Another example:
> >
> > [ 1.360997] soc soc0: Amlogic Meson G12B (S922X) Revision 29:c (40:2) Detected
> > [ 1.364333] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> > [ 1.364435] Unable to handle kernel paging request at virtual address 0000000008003388
> > [ 1.367470] Internal error: Oops - Undefined instruction: 0000000002000000 [#1] PREEMPT SMP
> > [ 1.367483] Modules linked in:
> > [ 1.367490] CPU: 2 PID: 66 Comm: kworker/u12:1 Not tainted 6.2.0-odroid-arm64 #47
> > [ 1.367498] Hardware name: Hardkernel ODROID-N2Plus (DT)
> > [ 1.367502] Workqueue: events_unbound async_run_entry_fn
> > [ 1.367516] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ 1.367523] pc : arch_timer_handler_phys+0x0/0x44
> > [ 1.367535] lr : handle_percpu_devid_irq+0x84/0x140
> > [ 1.367543] sp : ffff8000099abf70
> > [ 1.367546] x29: ffff8000099abf70 x28: ffff000002a40000 x27: 0000000000000006
> > [ 1.367556] x26: ffff800009d208a4 x25: ffff800009d20b44 x24: 0000000000000008
> > [ 1.367565] x23: ffff8000094f8b50 x22: 000000000000000b x21: ffff000002829000
> > [ 1.367573] x20: ffff800008f89fb0 x19: ffff00000282a600 x18: ffff0000021792e8
> > [ 1.367581] x17: ffff80006e685000 x16: ffff8000099ac000 x15: 0000000000004000
> > [ 1.367589] x14: ffff800009d27bde x13: ffff800009d2887b x12: 0000000000000308
> > [ 1.367597] x11: 0000000000000a92 x10: 0000000000000068 x9 : ef01a5948f440b31
> > [ 1.367606] x8 : 0000000000003273 x7 : ffff800009d262a8 x6 : 000000003754d375
> > [ 1.367614] x5 : ffff80006e685000 x4 : ffff8000099abf70 x3 : ffff80006e685000
> > [ 1.367622] x2 : ffff800008c26c30 x1 : ffff000077b83a00 x0 : 000000000000000b
> > [ 1.367630] Call trace:
> > [ 1.367633] arch_timer_handler_phys+0x0/0x44
> > [ 1.367641] generic_handle_domain_irq+0x2c/0x44
> > [ 1.367650] gic_handle_irq+0x44/0xc4
> > [ 1.367659] call_on_irq_stack+0x2c/0x5c
> > [ 1.367666] do_interrupt_handler+0x80/0x84
> > [ 1.367672] el1_interrupt+0x34/0x70
> > [ 1.367682] el1h_64_irq_handler+0x18/0x2c
> > [ 1.367686] el1h_64_irq+0x64/0x68
> > [ 1.367690] HUF_decompress4X1_usingDTable_internal_default+0x2fc/0xd60
> > [ 1.367702] HUF_decompress4X_hufOnly_wksp_bmi2+0xec/0x140
> > [ 1.367711] ZSTD_decodeLiteralsBlock+0x580/0x630
> > [ 1.367717] ZSTD_decompressBlock_internal.part.0+0x5c/0x1b4
> > [ 1.367723] ZSTD_decompressBlock_internal+0x1c/0x30
> > [ 1.367729] ZSTD_decompressContinue.part.0+0x364/0x444
> > [ 1.367734] ZSTD_decompressContinueStream+0x98/0x180
> > [ 1.367738] ZSTD_decompressStream+0x5b0/0x8c0
> > [ 1.367743] zstd_decompress_stream+0x10/0x20
> > [ 1.367751] unzstd+0x290/0x37c
> > [ 1.367760] unpack_to_rootfs+0x174/0x298
> > [ 1.367767] do_populate_rootfs+0x84/0x1dc
> > [ 1.367773] async_run_entry_fn+0x34/0x150
> > [ 1.367778] process_one_work+0x1d0/0x320
> > [ 1.367785] worker_thread+0x14c/0x444
> >
> > Perhaps, the driver or the PWM core could do something wrong based on the
> > invalid polarity and corrupt certain memory location, but I'm still not
> > able to relate the random value to these crashes.
>
> I can imagine that the compiler creates code (a jump table) that relies
> on .polarity being either PWM_POLARITY_NORMAL or PWM_POLARITY_INVERSED
> and that exibits UB if that isn't true. (I think the compiler is allowed
> to do that.)
>
> Looking at the driver, there are several problems:
>
> - The driver does just
>
> if (state->polarity == PWM_POLARITY_INVERSED)
> duty = period - duty;
>
> which is broken because each period is supposed to start with the
> active part. (You could argue this being ridiculous and I'd agree.
> But that's what we have and just doing it differently in a driver is
> wrong.) The fix is to check if the hardware actually emits normal or
> inversed polarity and refuse the other one. Then in .get_state() set
> the appropriate polarity.

Looking at the comment in the source, I'm not sure if such checking is
possible with this hardware.

From pwm-meson.c:
* The hardware has no "polarity" setting. This driver reverses the period
* cycles (the low length is inverted with the high length) for
* PWM_POLARITY_INVERSED. This means that .get_state cannot read the polarity
* from the hardware.

As far as I see, there seems to be some drivers that hard-code polarity in
.get_state() without getting it from hardware. Perhaps, we could do that
instead? Honestly, I don't understand how it's a bad idea here.

> - In meson_pwm_calc() we have:
>
> unsigned int duty, ...;
>
> duty = state->duty_cycle;
>
> which is wrong for state->duty_cycle > U32_MAX. The same for period.
>
> - After period is fixed to be proper u64, fin_freq * (u64)period might
> overflow.
>
> - harmless: The check
>
> duty_cnt = div64_u64(fin_freq * (u64)duty,
> NSEC_PER_SEC * (pre_div + 1));
> if (duty_cnt > 0xffff)
> ...
>
> never triggers, as duty <= period (after fixing the first issue) and
> we already know that
>
> div64_u64(fin_freq * (u64)period, NSEC_PER_SEC * (pre_div + 1)) <= 0xffff.

Ah, I believe this requires a separate fix.


Regards,
Munehisa

2023-02-28 09:39:22

by Uwe Kleine-König

[permalink] [raw]
Subject: Re: [PATCH] pwm: core: Zero-initialize the temp state

Hello,

On Tue, Feb 28, 2023 at 12:53:28AM -0800, Munehisa Kamata wrote:
> On Mon, 2023-02-27 09:16:21 +0000, Uwe Kleine-K?nig wrote:
> > On Sun, Feb 26, 2023 at 06:48:30PM -0800, Munehisa Kamata wrote:
> > > On Sun, 2023-02-26 09:17:52 +0000, Uwe Kleine-K?nig <[email protected]> wrote:
> > > > On Sat, Feb 25, 2023 at 05:37:21PM -0800, Munehisa Kamata wrote:
> > > > > Zero-initialize the on-stack structure to avoid unexpected behaviors. Some
> > > > > drivers may not set or initialize all the values in pwm_state through their
> > > > > .get_state() callback and therefore some random values may remain there and
> > > > > be set into pwm->state eventually.
> > > > >
> > > > > This actually caused regression on ODROID-N2+ as reported in [1]; kernel
> > > > > fails to boot due to random panic or hang-up.
> > > > >
> > > > > [1] https://forum.odroid.com/viewtopic.php?f=177&t=46360
> > > >
> > > > Looking through the report I wonder what actually made the machine fail
> > > > to boot. Doesn't this paper over a problem that should be fixed (also)
> > > > somewhere else?
> > >
> > > Yes, you're right and I think the commit message could have described more
> > > details. This patch is for ensuring all drivers see zeroed state same as
> > > before, but I still don't fully understand how it ends up such random-ish
> > > crashes. There could be another or bigger problem that should be fixed.
> > >
> > > > Which driver is the one that the problem occur for?
> > >
> > > It's pwm-meson and seems to be caused by random polarity value somehow. If
> > > meson_pwm_get_state() sets polarity to zero instead, I don't see the
> > > problem. According the comment, looks like the driver does not set polarity
> > > by design.
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pwm/pwm-meson.c?h=v6.2&id=c9c3395d5e3dcc6daee66c6908354d47bf98cb0c#n9
> > >
> > > Before commit c73a3107624d, the memory was kcalloc'ed and always zeroed,
> > > but I don't know if the driver was (is) assuming that. I'm adding Meson SoC
> > > people to CC.
> > >
> > > Apart from how polarity should be handled in the driver, I'm very puzzled
> > > by the crashes I've observed so far. There seems to be some patterns, but
> > > they don't seem obviously related to PWM.
> > >
> > > [ 1.360542] soc soc0: Amlogic Meson G12B (S922X) Revision 29:c (40:2) Detected
> > > [ 1.363906] Insufficient stack space to handle exception!
> > > [ 1.363913] ESR: 0x0000000096000047 -- DABT (current EL)
> > > [ 1.363917] FAR: 0xffff800009a47ff0
> > > [ 1.363920] Task stack: [0xffff800009a48000..0xffff800009a4c000]
> > > [ 1.363923] IRQ stack: [0xffff8000099a8000..0xffff8000099ac000]
> > > [ 1.363927] Overflow stack: [0xffff000077b76100..0xffff000077b77100]
> > > [ 1.363931] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.2.0-odroid-arm64 #47
> > > [ 1.363938] Hardware name: Hardkernel ODROID-N2Plus (DT)
> > > [ 1.363941] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > [ 1.363947] pc : __do_kernel_fault+0x4/0x180
> > > [ 1.363961] lr : do_page_fault+0xd0/0x3d0
> > > [ 1.363968] sp : ffff800009a48020
> > > [ 1.363970] x29: ffff800009a48020 x28: ffff000002948000 x27: 0000000000000000
> > > [ 1.363980] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
> > > [ 1.363987] x23: 0000000096000004 x22: ffff800009a48110 x21: 00000000ffffffff
> > > [ 1.363994] x20: ffff800009a48110 x19: 00000000ffffffff x18: 000000000000001c
> > > [ 1.364001] x17: 00000000863047f6 x16: ffff800009860a80 x15: 0000000000000003
> > > [ 1.364008] x14: 00000000000003ef x13: 0000000000000000 x12: 0000000000000279
> > > [ 1.364015] x11: 0000000000000001 x10: 0000000000000001 x9 : 0000000000000400
> > > [ 1.364021] x8 : ffff000077b7fc40 x7 : ffff000077b7fbc0 x6 : ffff8000094f7990
> > > [ 1.364028] x5 : 0000000000000000 x4 : ffff000002948000 x3 : 0001000000000000
> > > [ 1.364035] x2 : ffff800009a48110 x1 : 0000000096000004 x0 : 00000000ffffffff
> > > [ 1.364043] Kernel panic - not syncing: kernel stack overflow
> > > [ 1.364047] SMP: stopping secondary CPUs
> > >
> > > Another example:
> > >
> > > [ 1.360997] soc soc0: Amlogic Meson G12B (S922X) Revision 29:c (40:2) Detected
> > > [ 1.364333] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> > > [ 1.364435] Unable to handle kernel paging request at virtual address 0000000008003388
> > > [ 1.367470] Internal error: Oops - Undefined instruction: 0000000002000000 [#1] PREEMPT SMP
> > > [ 1.367483] Modules linked in:
> > > [ 1.367490] CPU: 2 PID: 66 Comm: kworker/u12:1 Not tainted 6.2.0-odroid-arm64 #47
> > > [ 1.367498] Hardware name: Hardkernel ODROID-N2Plus (DT)
> > > [ 1.367502] Workqueue: events_unbound async_run_entry_fn
> > > [ 1.367516] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > [ 1.367523] pc : arch_timer_handler_phys+0x0/0x44
> > > [ 1.367535] lr : handle_percpu_devid_irq+0x84/0x140
> > > [ 1.367543] sp : ffff8000099abf70
> > > [ 1.367546] x29: ffff8000099abf70 x28: ffff000002a40000 x27: 0000000000000006
> > > [ 1.367556] x26: ffff800009d208a4 x25: ffff800009d20b44 x24: 0000000000000008
> > > [ 1.367565] x23: ffff8000094f8b50 x22: 000000000000000b x21: ffff000002829000
> > > [ 1.367573] x20: ffff800008f89fb0 x19: ffff00000282a600 x18: ffff0000021792e8
> > > [ 1.367581] x17: ffff80006e685000 x16: ffff8000099ac000 x15: 0000000000004000
> > > [ 1.367589] x14: ffff800009d27bde x13: ffff800009d2887b x12: 0000000000000308
> > > [ 1.367597] x11: 0000000000000a92 x10: 0000000000000068 x9 : ef01a5948f440b31
> > > [ 1.367606] x8 : 0000000000003273 x7 : ffff800009d262a8 x6 : 000000003754d375
> > > [ 1.367614] x5 : ffff80006e685000 x4 : ffff8000099abf70 x3 : ffff80006e685000
> > > [ 1.367622] x2 : ffff800008c26c30 x1 : ffff000077b83a00 x0 : 000000000000000b
> > > [ 1.367630] Call trace:
> > > [ 1.367633] arch_timer_handler_phys+0x0/0x44
> > > [ 1.367641] generic_handle_domain_irq+0x2c/0x44
> > > [ 1.367650] gic_handle_irq+0x44/0xc4
> > > [ 1.367659] call_on_irq_stack+0x2c/0x5c
> > > [ 1.367666] do_interrupt_handler+0x80/0x84
> > > [ 1.367672] el1_interrupt+0x34/0x70
> > > [ 1.367682] el1h_64_irq_handler+0x18/0x2c
> > > [ 1.367686] el1h_64_irq+0x64/0x68
> > > [ 1.367690] HUF_decompress4X1_usingDTable_internal_default+0x2fc/0xd60
> > > [ 1.367702] HUF_decompress4X_hufOnly_wksp_bmi2+0xec/0x140
> > > [ 1.367711] ZSTD_decodeLiteralsBlock+0x580/0x630
> > > [ 1.367717] ZSTD_decompressBlock_internal.part.0+0x5c/0x1b4
> > > [ 1.367723] ZSTD_decompressBlock_internal+0x1c/0x30
> > > [ 1.367729] ZSTD_decompressContinue.part.0+0x364/0x444
> > > [ 1.367734] ZSTD_decompressContinueStream+0x98/0x180
> > > [ 1.367738] ZSTD_decompressStream+0x5b0/0x8c0
> > > [ 1.367743] zstd_decompress_stream+0x10/0x20
> > > [ 1.367751] unzstd+0x290/0x37c
> > > [ 1.367760] unpack_to_rootfs+0x174/0x298
> > > [ 1.367767] do_populate_rootfs+0x84/0x1dc
> > > [ 1.367773] async_run_entry_fn+0x34/0x150
> > > [ 1.367778] process_one_work+0x1d0/0x320
> > > [ 1.367785] worker_thread+0x14c/0x444
> > >
> > > Perhaps, the driver or the PWM core could do something wrong based on the
> > > invalid polarity and corrupt certain memory location, but I'm still not
> > > able to relate the random value to these crashes.
> >
> > I can imagine that the compiler creates code (a jump table) that relies
> > on .polarity being either PWM_POLARITY_NORMAL or PWM_POLARITY_INVERSED
> > and that exibits UB if that isn't true. (I think the compiler is allowed
> > to do that.)
> >
> > Looking at the driver, there are several problems:
> >
> > - The driver does just
> >
> > if (state->polarity == PWM_POLARITY_INVERSED)
> > duty = period - duty;
> >
> > which is broken because each period is supposed to start with the
> > active part. (You could argue this being ridiculous and I'd agree.
> > But that's what we have and just doing it differently in a driver is
> > wrong.) The fix is to check if the hardware actually emits normal or
> > inversed polarity and refuse the other one. Then in .get_state() set
> > the appropriate polarity.
>
> Looking at the comment in the source, I'm not sure if such checking is
> possible with this hardware.

If I understand the situation right the driver claims to support both
polarities, but if inverted polarity (i.e.

___ ___ ___ ___
\__________/ \__________/ \__________/ \__________/ \
^ ^ ^ ^ ^

) is requested it just emits a wave form with normal polarity and
duty_cycle adapted to match the "activity average" (i.e.

___ ___ ___ ___
/ \__________/ \__________/ \__________/ \__________/
^ ^ ^ ^ ^

). That it's impossible to determine the polarity is obvious then.
That's like a fruiterer who ships apples no matter if apples or oranges
where ordered and who should look at a shipped fruit and tell if this
belongs to a shipment of apples or oranges.

So the actual problem is not "the driver cannot determine polarity", but
"the driver must not fake support for inverted polarity".

> From pwm-meson.c:
> * The hardware has no "polarity" setting. This driver reverses the period
> * cycles (the low length is inverted with the high length) for
> * PWM_POLARITY_INVERSED. This means that .get_state cannot read the polarity
> * from the hardware.
>
> As far as I see, there seems to be some drivers that hard-code polarity in
> .get_state() without getting it from hardware. Perhaps, we could do that
> instead?

This is the right thing to do for hardware that only supports a single
polarity. This should be matched by

if (state->polarity != PWM_POLARITY_INVERSED)
return -EINVAL;

in .apply(). (Depending on which polarity is actually supported, the
test has to be adapted.)

> Honestly, I don't understand how it's a bad idea here.

For dimming a LED it doesn't matter. If you drive a motor and need to
sync the PWM signal to something else (another PWM or an enable GPIO) it
might matter. (Of course today's PWM framework doesn't support such a
synchronization and if I hadn't marked the period starts above there
would be no way to distinguish the above wave forms. It might only be
identified by an external observer while the wave configuration changes
and even that is hardware dependant. That's why I wrote about polarity
being ridiculous in an earlier mail.)

> > - In meson_pwm_calc() we have:
> >
> > unsigned int duty, ...;
> >
> > duty = state->duty_cycle;
> >
> > which is wrong for state->duty_cycle > U32_MAX. The same for period.
> >
> > - After period is fixed to be proper u64, fin_freq * (u64)period might
> > overflow.
> >
> > - harmless: The check
> >
> > duty_cnt = div64_u64(fin_freq * (u64)duty,
> > NSEC_PER_SEC * (pre_div + 1));
> > if (duty_cnt > 0xffff)
> > ...
> >
> > never triggers, as duty <= period (after fixing the first issue) and
> > we already know that
> >
> > div64_u64(fin_freq * (u64)period, NSEC_PER_SEC * (pre_div + 1)) <= 0xffff.
>
> Ah, I believe this requires a separate fix.

Yes. Three separate changes even, I'd say.

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-K?nig |
Industrial Linux Solutions | https://www.pengutronix.de/ |


Attachments:
(No filename) (11.47 kB)
signature.asc (488.00 B)
Download all attachments

2023-02-28 10:16:14

by Uwe Kleine-König

[permalink] [raw]
Subject: [PATCH] pwm: Zero-initialize the pwm_state passed to driver's .get_state()

This is just to ensure that .usage_power is properly initialized and
doesn't contain random stack data. The other members of struct pwm_state
should get a value assigned in a successful call to .get_state(). So in
the absence of bugs in driver implementations, this is only a safe-guard
and no fix.

Reported-by: Munehisa Kamata <[email protected]>
Signed-off-by: Uwe Kleine-K?nig <[email protected]>
---
drivers/pwm/core.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

Hello,

On Sat, Feb 25, 2023 at 05:37:21PM -0800, Munehisa Kamata wrote:
> Zero-initialize the on-stack structure to avoid unexpected behaviors. Some
> drivers may not set or initialize all the values in pwm_state through their
> .get_state() callback and therefore some random values may remain there and
> be set into pwm->state eventually.
>
> This actually caused regression on ODROID-N2+ as reported in [1]; kernel
> fails to boot due to random panic or hang-up.
>
> [1] https://forum.odroid.com/viewtopic.php?f=177&t=46360
>
> Fixes: c73a3107624d ("pwm: Handle .get_state() failures")
> Cc: [email protected] # 6.2
> Signed-off-by: Munehisa Kamata <[email protected]>

My patch is essentially the same as Munehisa's, just written a bit
differently (to maybe make it easier for the compiler to optimize it?)
and with an explaining comment. The actual motivation is different so
the commit log is considerably different, too.

I was unsure how to honor Munehisa's effort, I went with a
"Reported-by". Please tell me if you want this to be different.

Best regards
Uwe

diff --git a/drivers/pwm/core.c b/drivers/pwm/core.c
index e01147f66e15..533ef5bd3add 100644
--- a/drivers/pwm/core.c
+++ b/drivers/pwm/core.c
@@ -115,7 +115,14 @@ static int pwm_device_request(struct pwm_device *pwm, const char *label)
}

if (pwm->chip->ops->get_state) {
- struct pwm_state state;
+ /*
+ * Zero-initialize state because most drivers are unaware of
+ * .usage_power. The other members of state are supposed to be
+ * set by lowlevel drivers. We still initialize the whole
+ * structure for simplicity even though this might paper over
+ * faulty implementations of .get_state().
+ */
+ struct pwm_state state = { 0, };

err = pwm->chip->ops->get_state(pwm->chip, pwm, &state);
trace_pwm_get(pwm, &state, err);
--
2.39.1


--
Pengutronix e.K. | Uwe Kleine-K?nig |
Industrial Linux Solutions | https://www.pengutronix.de/ |


Attachments:
(No filename) (2.45 kB)
signature.asc (488.00 B)
Download all attachments

2023-02-28 19:45:24

by Munehisa Kamata

[permalink] [raw]
Subject: Re: [PATCH] pwm: Zero-initialize the pwm_state passed to driver's .get_state()

On Tue, 2023-02-28 10:15:58 +0000, Uwe Kleine-König wrote:
>
> This is just to ensure that .usage_power is properly initialized and
> doesn't contain random stack data. The other members of struct pwm_state
> should get a value assigned in a successful call to .get_state(). So in
> the absence of bugs in driver implementations, this is only a safe-guard
> and no fix.
>
> Reported-by: Munehisa Kamata <[email protected]>
> Signed-off-by: Uwe Kleine-König <[email protected]>
> ---
> drivers/pwm/core.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> Hello,
>
> On Sat, Feb 25, 2023 at 05:37:21PM -0800, Munehisa Kamata wrote:
> > Zero-initialize the on-stack structure to avoid unexpected behaviors. Some
> > drivers may not set or initialize all the values in pwm_state through their
> > .get_state() callback and therefore some random values may remain there and
> > be set into pwm->state eventually.
> >
> > This actually caused regression on ODROID-N2+ as reported in [1]; kernel
> > fails to boot due to random panic or hang-up.
> >
> > [1] https://forum.odroid.com/viewtopic.php?f=177&t=46360
> >
> > Fixes: c73a3107624d ("pwm: Handle .get_state() failures")
> > Cc: [email protected] # 6.2
> > Signed-off-by: Munehisa Kamata <[email protected]>
>
> My patch is essentially the same as Munehisa's, just written a bit
> differently (to maybe make it easier for the compiler to optimize it?)
> and with an explaining comment. The actual motivation is different so
> the commit log is considerably different, too.
>
> I was unsure how to honor Munehisa's effort, I went with a
> "Reported-by". Please tell me if you want this to be different.

I'm okay with that, thank you.

Perhaps, you should also add Cc tag for the stable tree? I did that in my
patch and we're actually CCing to the stable list, but I'm not sure if it
can pick up your patch without the tag. This should be fixed in linux-6.2.y.


Regards,
Munehisa

> Best regards
> Uwe
>
> diff --git a/drivers/pwm/core.c b/drivers/pwm/core.c
> index e01147f66e15..533ef5bd3add 100644
> --- a/drivers/pwm/core.c
> +++ b/drivers/pwm/core.c
> @@ -115,7 +115,14 @@ static int pwm_device_request(struct pwm_device *pwm, const char *label)
> }
>
> if (pwm->chip->ops->get_state) {
> - struct pwm_state state;
> + /*
> + * Zero-initialize state because most drivers are unaware of
> + * .usage_power. The other members of state are supposed to be
> + * set by lowlevel drivers. We still initialize the whole
> + * structure for simplicity even though this might paper over
> + * faulty implementations of .get_state().
> + */
> + struct pwm_state state = { 0, };
>
> err = pwm->chip->ops->get_state(pwm->chip, pwm, &state);
> trace_pwm_get(pwm, &state, err);
> --
> 2.39.1
>
>
> --
> Pengutronix e.K. | Uwe Kleine-König |
> Industrial Linux Solutions | https://www.pengutronix.de/ |
>

2023-02-28 21:48:27

by Uwe Kleine-König

[permalink] [raw]
Subject: Re: [PATCH] pwm: Zero-initialize the pwm_state passed to driver's .get_state()

Hello,

On Tue, Feb 28, 2023 at 11:43:27AM -0800, Munehisa Kamata wrote:
> On Tue, 2023-02-28 10:15:58 +0000, Uwe Kleine-K?nig wrote:
> >
> > This is just to ensure that .usage_power is properly initialized and
> > doesn't contain random stack data. The other members of struct pwm_state
> > should get a value assigned in a successful call to .get_state(). So in
> > the absence of bugs in driver implementations, this is only a safe-guard
> > and no fix.
> >
> > Reported-by: Munehisa Kamata <[email protected]>
> > Signed-off-by: Uwe Kleine-K?nig <[email protected]>
> > ---
> > drivers/pwm/core.c | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > Hello,
> >
> > On Sat, Feb 25, 2023 at 05:37:21PM -0800, Munehisa Kamata wrote:
> > > Zero-initialize the on-stack structure to avoid unexpected behaviors. Some
> > > drivers may not set or initialize all the values in pwm_state through their
> > > .get_state() callback and therefore some random values may remain there and
> > > be set into pwm->state eventually.
> > >
> > > This actually caused regression on ODROID-N2+ as reported in [1]; kernel
> > > fails to boot due to random panic or hang-up.
> > >
> > > [1] https://forum.odroid.com/viewtopic.php?f=177&t=46360
> > >
> > > Fixes: c73a3107624d ("pwm: Handle .get_state() failures")
> > > Cc: [email protected] # 6.2
> > > Signed-off-by: Munehisa Kamata <[email protected]>
> >
> > My patch is essentially the same as Munehisa's, just written a bit
> > differently (to maybe make it easier for the compiler to optimize it?)
> > and with an explaining comment. The actual motivation is different so
> > the commit log is considerably different, too.
> >
> > I was unsure how to honor Munehisa's effort, I went with a
> > "Reported-by". Please tell me if you want this to be different.
>
> I'm okay with that, thank you.
>
> Perhaps, you should also add Cc tag for the stable tree? I did that in my
> patch and we're actually CCing to the stable list, but I'm not sure if it
> can pick up your patch without the tag. This should be fixed in linux-6.2.y.

IMHO the problem you reported should be fixed by adapting .get_state()
in the meson driver.

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-K?nig |
Industrial Linux Solutions | https://www.pengutronix.de/ |


Attachments:
(No filename) (2.33 kB)
signature.asc (488.00 B)
Download all attachments

2023-03-10 19:16:11

by Uwe Kleine-König

[permalink] [raw]
Subject: Re: [PATCH] pwm: Zero-initialize the pwm_state passed to driver's .get_state()

Hello,

On Tue, Feb 28, 2023 at 10:48:18PM +0100, Uwe Kleine-K?nig wrote:
> On Tue, Feb 28, 2023 at 11:43:27AM -0800, Munehisa Kamata wrote:
> > Perhaps, you should also add Cc tag for the stable tree? I did that in my
> > patch and we're actually CCing to the stable list, but I'm not sure if it
> > can pick up your patch without the tag. This should be fixed in linux-6.2.y.
>
> IMHO the problem you reported should be fixed by adapting .get_state()
> in the meson driver.

Note I sent a patch that implements this now. Find it at
https://lore.kernel.org/linux-pwm/[email protected]

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-K?nig |
Industrial Linux Solutions | https://www.pengutronix.de/ |


Attachments:
(No filename) (798.00 B)
signature.asc (488.00 B)
Download all attachments