2018-06-06 19:40:48

by Mathieu Malaterre

[permalink] [raw]
Subject: [PATCH] percpu_counter: `percpu_counter_read_positive' should not return negative number

Since function `percpu_counter_add' may result in a signed integer overflow
the result stored in `fbc->count' could be negative. Make sure that
function `percpu_counter_read_positive' does not return a negative number
in this case. This will match behavior when CONFIG_SMP=y.

Detected wth CONFIG_UBSAN=y

[76404.888450] ================================================================================
[76404.888477] UBSAN: Undefined behaviour in ../include/linux/percpu_counter.h:136:13
[76404.888485] signed integer overflow:
[76404.888490] 9223308017647617321 + 76624449492175 cannot be represented in type 'long long int'
[76404.888501] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #50
[76404.888506] Call Trace:
[76404.888523] [dffedd30] [c0478b90] ubsan_epilogue+0x18/0x4c (unreliable)
[76404.888533] [dffedd40] [c0479530] handle_overflow+0xbc/0xdc
[76404.888548] [dffeddc0] [c0439044] cfq_completed_request+0x560/0x1234
[76404.888557] [dffede40] [c03f3fc4] __blk_put_request+0xb0/0x2dc
[76404.888570] [dffede80] [c05a81d0] scsi_end_request+0x19c/0x344
[76404.888579] [dffedeb0] [c05a9954] scsi_io_completion+0x4b4/0x854
[76404.888592] [dffedf10] [c04046b4] blk_done_softirq+0xe4/0x1e0
[76404.888605] [dffedf60] [c07ec1d4] __do_softirq+0x16c/0x5f0
[76404.888617] [dffedfd0] [c0065160] irq_exit+0x110/0x1a8
[76404.888629] [dffedff0] [c001646c] call_do_irq+0x24/0x3c
[76404.888641] [c0cdbe80] [c0009a2c] do_IRQ+0x98/0x1a0
[76404.888649] [c0cdbeb0] [c001b93c] ret_from_except+0x0/0x14
[76404.888659] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
LR = arch_cpu_idle+0x30/0x78
[76404.888667] [c0cdbf70] [c0cda000] 0xc0cda000 (unreliable)
[76404.888679] [c0cdbf80] [c00a3844] do_idle+0xc4/0x158
[76404.888687] [c0cdbfb0] [c00a3a90] cpu_startup_entry+0x24/0x28
[76404.888696] [c0cdbfc0] [c097f820] start_kernel+0x47c/0x490
[76404.888703] [c0cdbff0] [00003444] 0x3444
[76404.888708] ================================================================================
[76409.458652] ================================================================================
[76409.458679] UBSAN: Undefined behaviour in ../include/linux/percpu_counter.h:136:13
[76409.458687] signed integer overflow:
[76409.458692] 9223369047059056210 + 76629034867964 cannot be represented in type 'long long int'
[76409.458703] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #50
[76409.458708] Call Trace:
[76409.458725] [dffedd30] [c0478b90] ubsan_epilogue+0x18/0x4c (unreliable)
[76409.458736] [dffedd40] [c0479530] handle_overflow+0xbc/0xdc
[76409.458751] [dffeddc0] [c0438e60] cfq_completed_request+0x37c/0x1234
[76409.458759] [dffede40] [c03f3fc4] __blk_put_request+0xb0/0x2dc
[76409.458773] [dffede80] [c05a81d0] scsi_end_request+0x19c/0x344
[76409.458781] [dffedeb0] [c05a9954] scsi_io_completion+0x4b4/0x854
[76409.458794] [dffedf10] [c04046b4] blk_done_softirq+0xe4/0x1e0
[76409.458807] [dffedf60] [c07ec1d4] __do_softirq+0x16c/0x5f0
[76409.458820] [dffedfd0] [c0065160] irq_exit+0x110/0x1a8
[76409.458832] [dffedff0] [c001646c] call_do_irq+0x24/0x3c
[76409.458844] [c0cdbe80] [c0009a2c] do_IRQ+0x98/0x1a0
[76409.458852] [c0cdbeb0] [c001b93c] ret_from_except+0x0/0x14
[76409.458861] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
LR = arch_cpu_idle+0x30/0x78
[76409.458870] [c0cdbf70] [c0cda000] 0xc0cda000 (unreliable)
[76409.458882] [c0cdbf80] [c00a3844] do_idle+0xc4/0x158
[76409.458889] [c0cdbfb0] [c00a3a8c] cpu_startup_entry+0x20/0x28
[76409.458899] [c0cdbfc0] [c097f820] start_kernel+0x47c/0x490
[76409.458905] [c0cdbff0] [00003444] 0x3444
[76409.458910] ================================================================================

Signed-off-by: Mathieu Malaterre <[email protected]>
---
include/linux/percpu_counter.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 4f052496cdfd..fd14f629123f 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -133,6 +133,7 @@ static inline void
percpu_counter_add(struct percpu_counter *fbc, s64 amount)
{
preempt_disable();
+ /* possible signed integer overflow */
fbc->count += amount;
preempt_enable();
}
@@ -154,7 +155,10 @@ static inline s64 percpu_counter_read(struct percpu_counter *fbc)
*/
static inline s64 percpu_counter_read_positive(struct percpu_counter *fbc)
{
- return fbc->count;
+ if (fbc->count >= 0)
+ return fbc->count;
+
+ return 0;
}

static inline s64 percpu_counter_sum_positive(struct percpu_counter *fbc)
--
2.11.0



2018-06-06 21:03:20

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] percpu_counter: `percpu_counter_read_positive' should not return negative number

On Wed, Jun 06, 2018 at 09:39:40PM +0200, Mathieu Malaterre wrote:
> Since function `percpu_counter_add' may result in a signed integer overflow
> the result stored in `fbc->count' could be negative. Make sure that
> function `percpu_counter_read_positive' does not return a negative number
> in this case. This will match behavior when CONFIG_SMP=y.
>
> Detected wth CONFIG_UBSAN=y
>
> [76404.888450] ================================================================================
> [76404.888477] UBSAN: Undefined behaviour in ../include/linux/percpu_counter.h:136:13
> [76404.888485] signed integer overflow:
> [76404.888490] 9223308017647617321 + 76624449492175 cannot be represented in type 'long long int'
> [76404.888501] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #50
> [76404.888506] Call Trace:
> [76404.888523] [dffedd30] [c0478b90] ubsan_epilogue+0x18/0x4c (unreliable)
> [76404.888533] [dffedd40] [c0479530] handle_overflow+0xbc/0xdc
> [76404.888548] [dffeddc0] [c0439044] cfq_completed_request+0x560/0x1234
> [76404.888557] [dffede40] [c03f3fc4] __blk_put_request+0xb0/0x2dc
> [76404.888570] [dffede80] [c05a81d0] scsi_end_request+0x19c/0x344
> [76404.888579] [dffedeb0] [c05a9954] scsi_io_completion+0x4b4/0x854
> [76404.888592] [dffedf10] [c04046b4] blk_done_softirq+0xe4/0x1e0
> [76404.888605] [dffedf60] [c07ec1d4] __do_softirq+0x16c/0x5f0
> [76404.888617] [dffedfd0] [c0065160] irq_exit+0x110/0x1a8
> [76404.888629] [dffedff0] [c001646c] call_do_irq+0x24/0x3c
> [76404.888641] [c0cdbe80] [c0009a2c] do_IRQ+0x98/0x1a0
> [76404.888649] [c0cdbeb0] [c001b93c] ret_from_except+0x0/0x14
> [76404.888659] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
> LR = arch_cpu_idle+0x30/0x78
> [76404.888667] [c0cdbf70] [c0cda000] 0xc0cda000 (unreliable)
> [76404.888679] [c0cdbf80] [c00a3844] do_idle+0xc4/0x158
> [76404.888687] [c0cdbfb0] [c00a3a90] cpu_startup_entry+0x24/0x28
> [76404.888696] [c0cdbfc0] [c097f820] start_kernel+0x47c/0x490
> [76404.888703] [c0cdbff0] [00003444] 0x3444

So, the _positive versions are there to deal with small negative reads
coming from percpu propagation delays. It's not there to protect
against actual counter overflow although it can't detect that on SMP.
It's just outright wrong to report 0 when the counter actually
overflowed and applying the suggested patch masks a real problem
undetectable.

I think the right thing to do is actually undersatnding what's going
on (why is a 64bit counter overflowing?) and fix the underlying issue.

Thanks.

--
tejun

2018-06-07 06:26:32

by Mathieu Malaterre

[permalink] [raw]
Subject: Re: [PATCH] percpu_counter: `percpu_counter_read_positive' should not return negative number

On Wed, Jun 6, 2018 at 9:57 PM Tejun Heo <[email protected]> wrote:
>
> On Wed, Jun 06, 2018 at 09:39:40PM +0200, Mathieu Malaterre wrote:
> > Since function `percpu_counter_add' may result in a signed integer overflow
> > the result stored in `fbc->count' could be negative. Make sure that
> > function `percpu_counter_read_positive' does not return a negative number
> > in this case. This will match behavior when CONFIG_SMP=y.
> >
> > Detected wth CONFIG_UBSAN=y
> >
> > [76404.888450] ================================================================================
> > [76404.888477] UBSAN: Undefined behaviour in ../include/linux/percpu_counter.h:136:13
> > [76404.888485] signed integer overflow:
> > [76404.888490] 9223308017647617321 + 76624449492175 cannot be represented in type 'long long int'
> > [76404.888501] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #50
> > [76404.888506] Call Trace:
> > [76404.888523] [dffedd30] [c0478b90] ubsan_epilogue+0x18/0x4c (unreliable)
> > [76404.888533] [dffedd40] [c0479530] handle_overflow+0xbc/0xdc
> > [76404.888548] [dffeddc0] [c0439044] cfq_completed_request+0x560/0x1234
> > [76404.888557] [dffede40] [c03f3fc4] __blk_put_request+0xb0/0x2dc
> > [76404.888570] [dffede80] [c05a81d0] scsi_end_request+0x19c/0x344
> > [76404.888579] [dffedeb0] [c05a9954] scsi_io_completion+0x4b4/0x854
> > [76404.888592] [dffedf10] [c04046b4] blk_done_softirq+0xe4/0x1e0
> > [76404.888605] [dffedf60] [c07ec1d4] __do_softirq+0x16c/0x5f0
> > [76404.888617] [dffedfd0] [c0065160] irq_exit+0x110/0x1a8
> > [76404.888629] [dffedff0] [c001646c] call_do_irq+0x24/0x3c
> > [76404.888641] [c0cdbe80] [c0009a2c] do_IRQ+0x98/0x1a0
> > [76404.888649] [c0cdbeb0] [c001b93c] ret_from_except+0x0/0x14
> > [76404.888659] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
> > LR = arch_cpu_idle+0x30/0x78
> > [76404.888667] [c0cdbf70] [c0cda000] 0xc0cda000 (unreliable)
> > [76404.888679] [c0cdbf80] [c00a3844] do_idle+0xc4/0x158
> > [76404.888687] [c0cdbfb0] [c00a3a90] cpu_startup_entry+0x24/0x28
> > [76404.888696] [c0cdbfc0] [c097f820] start_kernel+0x47c/0x490
> > [76404.888703] [c0cdbff0] [00003444] 0x3444
>
> So, the _positive versions are there to deal with small negative reads
> coming from percpu propagation delays. It's not there to protect
> against actual counter overflow although it can't detect that on SMP.
> It's just outright wrong to report 0 when the counter actually
> overflowed and applying the suggested patch masks a real problem
> undetectable.

I see, thanks for the explanation.

> I think the right thing to do is actually undersatnding what's going
> on (why is a 64bit counter overflowing?) and fix the underlying issue.
>
> Thanks.
>
> --
> tejun