2014-04-29 19:20:00

by Rik van Riel

[permalink] [raw]
Subject: [PATCH] mm,writeback: fix divide by zero in pos_ratio_polynom

It is possible for "limit - setpoint + 1" to equal zero, leading to a
divide by zero error. Blindly adding 1 to "limit - setpoint" is not
working, so we need to actually test the divisor before calling div64.

Signed-off-by: Rik van Riel <[email protected]>
Cc: [email protected]
---
mm/page-writeback.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ef41349..2682516 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -597,11 +597,16 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
unsigned long dirty,
unsigned long limit)
{
+ unsigned int divisor;
long long pos_ratio;
long x;

+ divisor = limit - setpoint;
+ if (!divisor)
+ divisor = 1;
+
x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
- limit - setpoint + 1);
+ divisor);
pos_ratio = x;
pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;


2014-04-29 19:44:37

by Motohiro Kosaki

[permalink] [raw]
Subject: RE: [PATCH] mm,writeback: fix divide by zero in pos_ratio_polynom



> -----Original Message-----
> From: Rik van Riel [mailto:[email protected]]
> Sent: Tuesday, April 29, 2014 3:19 PM
> To: [email protected]
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; Motohiro Kosaki JP;
> [email protected]; [email protected]; [email protected]
> Subject: [PATCH] mm,writeback: fix divide by zero in pos_ratio_polynom
>
> It is possible for "limit - setpoint + 1" to equal zero, leading to a divide by zero error. Blindly adding 1 to "limit - setpoint" is not working,
> so we need to actually test the divisor before calling div64.
>
> Signed-off-by: Rik van Riel <[email protected]>
> Cc: [email protected]

Fairly obvious fix.

Acked-by: KOSAKI Motohiro <[email protected]>


> ---
> mm/page-writeback.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c index ef41349..2682516 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -597,11 +597,16 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
> unsigned long dirty,
> unsigned long limit)
> {
> + unsigned int divisor;
> long long pos_ratio;
> long x;
>
> + divisor = limit - setpoint;
> + if (!divisor)
> + divisor = 1;
> +
> x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
> - limit - setpoint + 1);
> + divisor);
> pos_ratio = x;
> pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
> pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;

2014-04-29 22:39:40

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] mm,writeback: fix divide by zero in pos_ratio_polynom

On Tue, 29 Apr 2014 15:19:10 -0400 Rik van Riel <[email protected]> wrote:

> It is possible for "limit - setpoint + 1" to equal zero, leading to a
> divide by zero error. Blindly adding 1 to "limit - setpoint" is not
> working, so we need to actually test the divisor before calling div64.
>
> ...
>
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -597,11 +597,16 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
> unsigned long dirty,
> unsigned long limit)
> {
> + unsigned int divisor;

I'm thinking this would be better as a ulong so I don't have to worry
my pretty head over truncation things?

> long long pos_ratio;
> long x;
>
> + divisor = limit - setpoint;
> + if (!divisor)
> + divisor = 1;
> +
> x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
> - limit - setpoint + 1);
> + divisor);
> pos_ratio = x;
> pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
> pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;

--- a/mm/page-writeback.c~mm-page-writebackc-fix-divide-by-zero-in-pos_ratio_polynom-fix
+++ a/mm/page-writeback.c
@@ -597,13 +597,13 @@ static inline long long pos_ratio_polyno
unsigned long dirty,
unsigned long limit)
{
- unsigned int divisor;
+ unsigned long divisor;
long long pos_ratio;
long x;

divisor = limit - setpoint;
if (!divisor)
- divisor = 1;
+ divisor = 1; /* Avoid div-by-zero */

x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
divisor);
_

2014-04-29 22:48:25

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH] mm,writeback: fix divide by zero in pos_ratio_polynom

On 04/29/2014 06:39 PM, Andrew Morton wrote:
> On Tue, 29 Apr 2014 15:19:10 -0400 Rik van Riel <[email protected]> wrote:
>
>> It is possible for "limit - setpoint + 1" to equal zero, leading to a
>> divide by zero error. Blindly adding 1 to "limit - setpoint" is not
>> working, so we need to actually test the divisor before calling div64.
>>
>> ...
>>
>> --- a/mm/page-writeback.c
>> +++ b/mm/page-writeback.c
>> @@ -597,11 +597,16 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
>> unsigned long dirty,
>> unsigned long limit)
>> {
>> + unsigned int divisor;
>
> I'm thinking this would be better as a ulong so I don't have to worry
> my pretty head over truncation things?

I looked at div_*64, and the second argument is a 32 bit
variable. I guess a long would be ok, since if we are
dividing by more than 4 billion we don't really care :)

static inline s64 div_s64(s64 dividend, s32 divisor)

> --- a/mm/page-writeback.c~mm-page-writebackc-fix-divide-by-zero-in-pos_ratio_polynom-fix
> +++ a/mm/page-writeback.c
> @@ -597,13 +597,13 @@ static inline long long pos_ratio_polyno
> unsigned long dirty,
> unsigned long limit)
> {
> - unsigned int divisor;
> + unsigned long divisor;
> long long pos_ratio;
> long x;
>
> divisor = limit - setpoint;
> if (!divisor)
> - divisor = 1;
> + divisor = 1; /* Avoid div-by-zero */

Works for me :)

--
All rights reversed

2014-04-29 22:53:56

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] mm,writeback: fix divide by zero in pos_ratio_polynom

On Tue, 29 Apr 2014 18:48:11 -0400 Rik van Riel <[email protected]> wrote:

> On 04/29/2014 06:39 PM, Andrew Morton wrote:
> > On Tue, 29 Apr 2014 15:19:10 -0400 Rik van Riel <[email protected]> wrote:
> >
> >> It is possible for "limit - setpoint + 1" to equal zero, leading to a
> >> divide by zero error. Blindly adding 1 to "limit - setpoint" is not
> >> working, so we need to actually test the divisor before calling div64.
> >>
> >> ...
> >>
> >> --- a/mm/page-writeback.c
> >> +++ b/mm/page-writeback.c
> >> @@ -597,11 +597,16 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
> >> unsigned long dirty,
> >> unsigned long limit)
> >> {
> >> + unsigned int divisor;
> >
> > I'm thinking this would be better as a ulong so I don't have to worry
> > my pretty head over truncation things?
>
> I looked at div_*64, and the second argument is a 32 bit
> variable. I guess a long would be ok, since if we are
> dividing by more than 4 billion we don't really care :)
>
> static inline s64 div_s64(s64 dividend, s32 divisor)

ah, good point. Switching to ulong is perhaps a bit misleading then.

2014-04-30 08:04:20

by Maxim Patlasov

[permalink] [raw]
Subject: Re: [PATCH] mm,writeback: fix divide by zero in pos_ratio_polynom

Hi Rik!

On 04/29/2014 11:19 PM, Rik van Riel wrote:
> It is possible for "limit - setpoint + 1" to equal zero, leading to a
> divide by zero error. Blindly adding 1 to "limit - setpoint" is not
> working, so we need to actually test the divisor before calling div64.

The patch looks correct, but I'm afraid it can hide an actual bug in a
caller of pos_ratio_polynom(). The latter is not intended for setpoint >
limit. All callers take pains to ensure that setpoint <= limit. Look,
for example, at global_dirty_limits():

> if (background >= dirty)
> background = dirty / 2;

If you ever encountered "limit - setpoint + 1" equal zero, it may be
worthy to investigate how you came to setpoint greater than limit.

Thanks,
Maxim

>
> Signed-off-by: Rik van Riel <[email protected]>
> Cc: [email protected]
> ---
> mm/page-writeback.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index ef41349..2682516 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -597,11 +597,16 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
> unsigned long dirty,
> unsigned long limit)
> {
> + unsigned int divisor;
> long long pos_ratio;
> long x;
>
> + divisor = limit - setpoint;
> + if (!divisor)
> + divisor = 1;
> +
> x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
> - limit - setpoint + 1);
> + divisor);
> pos_ratio = x;
> pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
> pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;

2014-04-30 08:13:09

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] mm,writeback: fix divide by zero in pos_ratio_polynom

On Wed 30-04-14 12:04:04, Maxim Patlasov wrote:
> Hi Rik!
>
> On 04/29/2014 11:19 PM, Rik van Riel wrote:
> >It is possible for "limit - setpoint + 1" to equal zero, leading to a
> >divide by zero error. Blindly adding 1 to "limit - setpoint" is not
> >working, so we need to actually test the divisor before calling div64.
>
> The patch looks correct, but I'm afraid it can hide an actual bug in a
> caller of pos_ratio_polynom(). The latter is not intended for setpoint >
> limit. All callers take pains to ensure that setpoint <= limit. Look, for
> example, at global_dirty_limits():

The bug might trigger even if setpoint < limit because the result is
trucated to s32 and I guess this is what is going on here?
Is (limit - setpoint + 1) > 4G possible?

>
> > if (background >= dirty)
> > background = dirty / 2;
>
> If you ever encountered "limit - setpoint + 1" equal zero, it may be worthy
> to investigate how you came to setpoint greater than limit.
>
> Thanks,
> Maxim
>
> >
> >Signed-off-by: Rik van Riel <[email protected]>
> >Cc: [email protected]
> >---
> > mm/page-writeback.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> >diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> >index ef41349..2682516 100644
> >--- a/mm/page-writeback.c
> >+++ b/mm/page-writeback.c
> >@@ -597,11 +597,16 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
> > unsigned long dirty,
> > unsigned long limit)
> > {
> >+ unsigned int divisor;
> > long long pos_ratio;
> > long x;
> >+ divisor = limit - setpoint;
> >+ if (!divisor)
> >+ divisor = 1;
> >+
> > x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
> >- limit - setpoint + 1);
> >+ divisor);
> > pos_ratio = x;
> > pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
> > pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
>

--
Michal Hocko
SUSE Labs

2014-04-30 08:34:58

by Maxim Patlasov

[permalink] [raw]
Subject: Re: [PATCH] mm,writeback: fix divide by zero in pos_ratio_polynom

On 04/30/2014 12:12 PM, Michal Hocko wrote:
> On Wed 30-04-14 12:04:04, Maxim Patlasov wrote:
>> Hi Rik!
>>
>> On 04/29/2014 11:19 PM, Rik van Riel wrote:
>>> It is possible for "limit - setpoint + 1" to equal zero, leading to a
>>> divide by zero error. Blindly adding 1 to "limit - setpoint" is not
>>> working, so we need to actually test the divisor before calling div64.
>> The patch looks correct, but I'm afraid it can hide an actual bug in a
>> caller of pos_ratio_polynom(). The latter is not intended for setpoint >
>> limit. All callers take pains to ensure that setpoint <= limit. Look, for
>> example, at global_dirty_limits():
> The bug might trigger even if setpoint < limit because the result is
> trucated to s32 and I guess this is what is going on here?
> Is (limit - setpoint + 1) > 4G possible?

Yes, you are right. Probably the problem came from s32 overflow.

>
>>> if (background >= dirty)
>>> background = dirty / 2;
>> If you ever encountered "limit - setpoint + 1" equal zero, it may be worthy
>> to investigate how you came to setpoint greater than limit.
>>
>> Thanks,
>> Maxim
>>
>>> Signed-off-by: Rik van Riel <[email protected]>
>>> Cc: [email protected]
>>> ---
>>> mm/page-writeback.c | 7 ++++++-
>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>>> index ef41349..2682516 100644
>>> --- a/mm/page-writeback.c
>>> +++ b/mm/page-writeback.c
>>> @@ -597,11 +597,16 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
>>> unsigned long dirty,
>>> unsigned long limit)
>>> {
>>> + unsigned int divisor;
>>> long long pos_ratio;
>>> long x;
>>> + divisor = limit - setpoint;
>>> + if (!divisor)
>>> + divisor = 1;
>>> +
>>> x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
>>> - limit - setpoint + 1);
>>> + divisor);
>>> pos_ratio = x;
>>> pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
>>> pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;

2014-04-30 10:02:39

by Masayoshi MIZUMA

[permalink] [raw]
Subject: Re: [PATCH] mm,writeback: fix divide by zero in pos_ratio_polynom

Hi Rik,

I applied your patch to linux-next kernel, then divide error happened
when I ran ltp stress test.
The divide error occurred on the following div_u64(), so the following
should be also fixed...

static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,
unsigned long thresh,
unsigned long bg_thresh,
unsigned long dirty,
unsigned long bdi_thresh,
unsigned long bdi_dirty)
{
...
if (bdi_dirty < x_intercept - span / 4) {
pos_ratio = div_u64(pos_ratio * (x_intercept - bdi_dirty),
x_intercept - bdi_setpoint + 1);

The result of disassemble is as follows.

0xffffffff8116f520 <bdi_position_ratio+0xf0>: mov %rsi,%rax
0xffffffff8116f523 <bdi_position_ratio+0xf3>: sub %ebx,%esi
0xffffffff8116f525 <bdi_position_ratio+0xf5>: xor %edx,%edx
0xffffffff8116f527 <bdi_position_ratio+0xf7>: sub %r13,%rax
0xffffffff8116f52a <bdi_position_ratio+0xfa>: add $0x1,%esi
0xffffffff8116f52d <bdi_position_ratio+0xfd>: imul %r11,%rax
0xffffffff8116f531 <bdi_position_ratio+0x101>: div %rsi <= divide error!

The panic log is as follows.
---
[ 4102.894894] divide error: 0000 [#1] SMP
[ 4102.899344] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle tun bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables cfg80211 rfkill btrfs raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx coretemp kvm_intel kvm dm_mod raid6_pq i7core_edac edac_core iTCO_wdt iTCO_vendor_support xor lpc_ich i2c_i801 mfd_core serio_raw pcspkr acpi_power_meter crc32c_intel shpchp ipmi_si tpm_infineon ipmi_msghandler acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd sunrpc uinput xfs libcrc32c sd_mod crc_t10dif crct10dif_common sr_mod cdrom mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm e1000e ahci mptsas libahci scsi_transport_sas libata mptscsih ptp mptbase i2c_core pps_core
[ 4102.984462] CPU: 7 PID: 19758 Comm: mmap-corruption Not tainted 3.15.0-rc3-next-20140429+ #1
[ 4102.993984] Hardware name: FUJITSU PRIMERGY TX150 S7 /D2759, BIOS 6.00 Rev. 1.16.2759.A1 06/22/2010
[ 4103.008759] task: ffff88003680ed00 ti: ffff88000db62000 task.ti: ffff88000db62000
[ 4103.017179] RIP: 0010:[<ffffffff8116f531>] [<ffffffff8116f531>] bdi_position_ratio.isra.12+0x101/0x1d0
[ 4103.027772] RSP: 0000:ffff88000db63be8 EFLAGS: 00010256
[ 4103.033775] RAX: 04790000004f17b7 RBX: 00000000000026f0 RCX: 00003fffffffffff
[ 4103.041807] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 4103.049834] RBP: ffff88000db63c00 R08: 0000000000001623 R09: 000000000000063f
[ 4103.057897] R10: 000000000000065e R11: 0000000000000479 R12: 000000000000065f
[ 4103.065959] R13: 0000000000001540 R14: ffffea0000e4af30 R15: 0000000000000001
[ 4103.073992] FS: 00007f869a3e8740(0000) GS:ffff88003f5c0000(0000) knlGS:0000000000000000
[ 4103.083093] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4103.089558] CR2: 00007f8694bc6000 CR3: 0000000037332000 CR4: 00000000000007e0
[ 4103.097595] Stack:
[ 4103.099859] 0000000000001540 0000000000000000 ffff8800347e3570 ffff88000db63d18
[ 4103.108240] ffffffff8162f858 0000000000001540 ffffffff8120cbc1 ffff88000db63c48
[ 4103.116614] ffffffff8120c318 ffffea00003b7dc0 0000000000001000 0000000000000000
[ 4103.124984] Call Trace:
[ 4103.127740] [<ffffffff8162f858>] balance_dirty_pages.isra.21+0x278/0x5f1
[ 4103.135378] [<ffffffff8120cbc1>] ? __block_commit_write.isra.21+0x81/0xb0
[ 4103.143115] [<ffffffff8120c318>] ? __set_page_dirty_buffers+0x88/0xb0
[ 4103.150461] [<ffffffff8120fa43>] ? block_page_mkwrite+0x63/0xb0
[ 4103.157222] [<ffffffff81170c97>] balance_dirty_pages_ratelimited+0xe7/0x110
[ 4103.165154] [<ffffffff8119149c>] do_shared_fault+0x15c/0x230
[ 4103.171619] [<ffffffff81192406>] handle_mm_fault+0x2d6/0x1080
[ 4103.178184] [<ffffffff810a0666>] ? ftrace_raw_event_sched_stat_runtime+0x86/0xc0
[ 4103.188806] [<ffffffff8105dff6>] __do_page_fault+0x1b6/0x550
[ 4103.197479] [<ffffffff81142b2a>] ? ftrace_event_buffer_commit+0x8a/0xc0
[ 4103.207216] [<ffffffff810a0c63>] ? ftrace_raw_event_sched_switch+0xb3/0xf0
[ 4103.217240] [<ffffffff81012625>] ? __switch_to+0x165/0x590
[ 4103.225710] [<ffffffff8105e3c1>] do_page_fault+0x31/0x70
[ 4103.233948] [<ffffffff8163dec8>] page_fault+0x28/0x30
[ 4103.241856] Code: 48 c1 e9 12 48 c1 eb 10 48 c1 ee 10 48 01 de 48 89 f2 48 29 ca 49 39 d5 73 14 48 89 f0 29 de 31 d2 4c 29 e8 83 c6 01 49 0f af c3 <48> f7 f6 4c 89 e2 48 d1 ea 49 39 d5 73 15 49 c1 ec 04 4d 39 e5
[ 4103.268205] RIP [<ffffffff8116f531>] bdi_position_ratio.isra.12+0x101/0x1d0
[ 4103.278446] RSP <ffff88000db63be8>
---

Thanks,
Masayoshi Mizuma

On Tue, 29 Apr 2014 15:19:10 -0400 Rik van Riel wrote:
> It is possible for "limit - setpoint + 1" to equal zero, leading to a
> divide by zero error. Blindly adding 1 to "limit - setpoint" is not
> working, so we need to actually test the divisor before calling div64.
>
> Signed-off-by: Rik van Riel <[email protected]>
> Cc: [email protected]
> ---
> mm/page-writeback.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index ef41349..2682516 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -597,11 +597,16 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
> unsigned long dirty,
> unsigned long limit)
> {
> + unsigned int divisor;
> long long pos_ratio;
> long x;
>
> + divisor = limit - setpoint;
> + if (!divisor)
> + divisor = 1;
> +
> x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
> - limit - setpoint + 1);
> + divisor);
> pos_ratio = x;
> pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
> pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>

2014-04-30 13:44:34

by Rik van Riel

[permalink] [raw]
Subject: [PATCH v2] mm,writeback: fix divide by zero in pos_ratio_polynom

On Wed, 30 Apr 2014 19:01:11 +0900
Masayoshi Mizuma <[email protected]> wrote:

> Hi Rik,
>
> I applied your patch to linux-next kernel, then divide error happened
> when I ran ltp stress test.
> The divide error occurred on the following div_u64(), so the following
> should be also fixed...
>
> static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,

Good catch. This patch fixes both places, and also has Andrew's
improvements in both places.

Andrew, this can drop in -mm instead of my previous patch and
your two cleanups to it.

---8<---

Subject: mm,writeback: fix divide by zero in pos_ratio_polynom

It is possible for "limit - setpoint + 1" to equal zero, leading to a
divide by zero error. Blindly adding 1 to "limit - setpoint" is not
working, so we need to actually test the divisor before calling div64.

Signed-off-by: Rik van Riel <[email protected]>
---
mm/page-writeback.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ef41349..f98a297 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -597,11 +597,16 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
unsigned long dirty,
unsigned long limit)
{
+ unsigned long divisor;
long long pos_ratio;
long x;

+ divisor = limit - setpoint;
+ if (!divisor)
+ divisor = 1; /* Avoid div-by-zero */
+
x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
- limit - setpoint + 1);
+ divisor);
pos_ratio = x;
pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
@@ -842,8 +847,12 @@ static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,
x_intercept = bdi_setpoint + span;

if (bdi_dirty < x_intercept - span / 4) {
+ unsigned long divisor = x_intercept - bdi_setpoint;
+ if (!divisor)
+ divisor = 1; /* Avoid div-by-zero */
+
pos_ratio = div_u64(pos_ratio * (x_intercept - bdi_dirty),
- x_intercept - bdi_setpoint + 1);
+ divisor);
} else
pos_ratio /= 4;

2014-04-30 13:48:31

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v2] mm,writeback: fix divide by zero in pos_ratio_polynom

On Wed 30-04-14 09:30:35, Rik van Riel wrote:
[...]
> Subject: mm,writeback: fix divide by zero in pos_ratio_polynom
>
> It is possible for "limit - setpoint + 1" to equal zero, leading to a
> divide by zero error. Blindly adding 1 to "limit - setpoint" is not
> working, so we need to actually test the divisor before calling div64.
>
> Signed-off-by: Rik van Riel <[email protected]>
> ---
> mm/page-writeback.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index ef41349..f98a297 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -597,11 +597,16 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
> unsigned long dirty,
> unsigned long limit)
> {
> + unsigned long divisor;
> long long pos_ratio;
> long x;
>
> + divisor = limit - setpoint;
> + if (!divisor)
> + divisor = 1; /* Avoid div-by-zero */
> +

This is still prone to u64 -> s32 issue, isn't it?
What was the original problem anyway? Was it really setpoint > limit or
rather the overflow?

> x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
> - limit - setpoint + 1);
> + divisor);
> pos_ratio = x;
> pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
> pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
> @@ -842,8 +847,12 @@ static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,
> x_intercept = bdi_setpoint + span;
>
> if (bdi_dirty < x_intercept - span / 4) {
> + unsigned long divisor = x_intercept - bdi_setpoint;

Same here.

> + if (!divisor)
> + divisor = 1; /* Avoid div-by-zero */
> +
> pos_ratio = div_u64(pos_ratio * (x_intercept - bdi_dirty),
> - x_intercept - bdi_setpoint + 1);
> + divisor);
> } else
> pos_ratio /= 4;
>

--
Michal Hocko
SUSE Labs

2014-04-30 14:29:12

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH v2] mm,writeback: fix divide by zero in pos_ratio_polynom

On 04/30/2014 09:48 AM, Michal Hocko wrote:
> On Wed 30-04-14 09:30:35, Rik van Riel wrote:
> [...]
>> Subject: mm,writeback: fix divide by zero in pos_ratio_polynom
>>
>> It is possible for "limit - setpoint + 1" to equal zero, leading to a
>> divide by zero error. Blindly adding 1 to "limit - setpoint" is not
>> working, so we need to actually test the divisor before calling div64.
>>
>> Signed-off-by: Rik van Riel <[email protected]>
>> ---
>> mm/page-writeback.c | 13 +++++++++++--
>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>> index ef41349..f98a297 100644
>> --- a/mm/page-writeback.c
>> +++ b/mm/page-writeback.c
>> @@ -597,11 +597,16 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
>> unsigned long dirty,
>> unsigned long limit)
>> {
>> + unsigned long divisor;
>> long long pos_ratio;
>> long x;
>>
>> + divisor = limit - setpoint;
>> + if (!divisor)
>> + divisor = 1; /* Avoid div-by-zero */
>> +
>
> This is still prone to u64 -> s32 issue, isn't it?
> What was the original problem anyway? Was it really setpoint > limit or
> rather the overflow?

Good point. I guess we need these divisors to be "int" and
"unsigned int" respectively, since those are the arguments
given to div_s64 and div_u64.

Let me send a v3 right now.

>> x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
>> - limit - setpoint + 1);
>> + divisor);
>> pos_ratio = x;
>> pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
>> pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
>> @@ -842,8 +847,12 @@ static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,
>> x_intercept = bdi_setpoint + span;
>>
>> if (bdi_dirty < x_intercept - span / 4) {
>> + unsigned long divisor = x_intercept - bdi_setpoint;
>
> Same here.
>
>> + if (!divisor)
>> + divisor = 1; /* Avoid div-by-zero */
>> +
>> pos_ratio = div_u64(pos_ratio * (x_intercept - bdi_dirty),
>> - x_intercept - bdi_setpoint + 1);
>> + divisor);
>> } else
>> pos_ratio /= 4;
>>
>

2014-04-30 14:34:10

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH v2] mm,writeback: fix divide by zero in pos_ratio_polynom

On 04/30/2014 09:48 AM, Michal Hocko wrote:
> On Wed 30-04-14 09:30:35, Rik van Riel wrote:
> [...]
>> Subject: mm,writeback: fix divide by zero in pos_ratio_polynom
>>
>> It is possible for "limit - setpoint + 1" to equal zero, leading to a
>> divide by zero error. Blindly adding 1 to "limit - setpoint" is not
>> working, so we need to actually test the divisor before calling div64.
>>
>> Signed-off-by: Rik van Riel <[email protected]>
>> ---
>> mm/page-writeback.c | 13 +++++++++++--
>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>> index ef41349..f98a297 100644
>> --- a/mm/page-writeback.c
>> +++ b/mm/page-writeback.c
>> @@ -597,11 +597,16 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
>> unsigned long dirty,
>> unsigned long limit)
>> {
>> + unsigned long divisor;
>> long long pos_ratio;
>> long x;
>>
>> + divisor = limit - setpoint;
>> + if (!divisor)
>> + divisor = 1; /* Avoid div-by-zero */
>> +
>
> This is still prone to u64 -> s32 issue, isn't it?
> What was the original problem anyway? Was it really setpoint > limit or
> rather the overflow?

Thinking about it some more, is it possible that
limit and/or setpoint are larger than 32 bits, but
the difference between them is not?

In that case, truncating both to 32 bits before
doing the subtraction would be troublesome, and
it would be better to do a cast in the comparison:

if (!(s32)divisor)
divisor = 1;

2014-04-30 14:43:55

by Rik van Riel

[permalink] [raw]
Subject: [PATCH v3] mm,writeback: fix divide by zero in pos_ratio_polynom

On Wed, 30 Apr 2014 15:48:26 +0200
Michal Hocko <[email protected]> wrote:

> This is still prone to u64 -> s32 issue, isn't it?
> What was the original problem anyway? Was it really setpoint > limit or
> rather the overflow?

This patch should avoid math overflows with both the initial
subtraction, and with use of the truncated divisor by div_s64
and div_u64.

I added redundant casts in the div_s64 and div_u64 calls to
make it clear what those functions do internally, which should
make it easy to understand why we do the same cast in the if
statements right above.

I believe this version of the patch addresses everybody's concerns.

---8<---

Subject: mm,writeback: fix divide by zero in pos_ratio_polynom

It is possible for "limit - setpoint + 1" to equal zero, leading to a
divide by zero error. Blindly adding 1 to "limit - setpoint" is not
working, so we need to actually test the divisor before calling div64.

Signed-off-by: Rik van Riel <[email protected]>
---
mm/page-writeback.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ef41349..6405687 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -598,10 +598,15 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
unsigned long limit)
{
long long pos_ratio;
+ long divisor;
long x;

+ divisor = limit - setpoint;
+ if (!(s32)divisor)
+ divisor = 1; /* Avoid div-by-zero */
+
x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
- limit - setpoint + 1);
+ (s32)divisor);
pos_ratio = x;
pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
@@ -842,8 +847,12 @@ static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,
x_intercept = bdi_setpoint + span;

if (bdi_dirty < x_intercept - span / 4) {
+ unsigned long divisor = x_intercept - bdi_setpoint;
+ if (!(u32)divisor)
+ divisor = 1; /* Avoid div-by-zero */
+
pos_ratio = div_u64(pos_ratio * (x_intercept - bdi_dirty),
- x_intercept - bdi_setpoint + 1);
+ (u32)divisor);
} else
pos_ratio /= 4;

2014-04-30 14:49:12

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v2] mm,writeback: fix divide by zero in pos_ratio_polynom

On Wed 30-04-14 10:31:29, Rik van Riel wrote:
> On 04/30/2014 09:48 AM, Michal Hocko wrote:
> >On Wed 30-04-14 09:30:35, Rik van Riel wrote:
> >[...]
> >>Subject: mm,writeback: fix divide by zero in pos_ratio_polynom
> >>
> >>It is possible for "limit - setpoint + 1" to equal zero, leading to a
> >>divide by zero error. Blindly adding 1 to "limit - setpoint" is not
> >>working, so we need to actually test the divisor before calling div64.
> >>
> >>Signed-off-by: Rik van Riel <[email protected]>
> >>---
> >> mm/page-writeback.c | 13 +++++++++++--
> >> 1 file changed, 11 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> >>index ef41349..f98a297 100644
> >>--- a/mm/page-writeback.c
> >>+++ b/mm/page-writeback.c
> >>@@ -597,11 +597,16 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
> >> unsigned long dirty,
> >> unsigned long limit)
> >> {
> >>+ unsigned long divisor;
> >> long long pos_ratio;
> >> long x;
> >>
> >>+ divisor = limit - setpoint;
> >>+ if (!divisor)
> >>+ divisor = 1; /* Avoid div-by-zero */
> >>+
> >
> >This is still prone to u64 -> s32 issue, isn't it?
> >What was the original problem anyway? Was it really setpoint > limit or
> >rather the overflow?
>
> Thinking about it some more, is it possible that
> limit and/or setpoint are larger than 32 bits, but
> the difference between them is not?
>
> In that case, truncating both to 32 bits before
> doing the subtraction would be troublesome, and
> it would be better to do a cast in the comparison:
>
> if (!(s32)divisor)
> divisor = 1;

How is that any different than defining divisor as 32b directly?


--
Michal Hocko
SUSE Labs

2014-04-30 15:31:17

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH v2] mm,writeback: fix divide by zero in pos_ratio_polynom

On 04/30/2014 10:49 AM, Michal Hocko wrote:
> On Wed 30-04-14 10:31:29, Rik van Riel wrote:
>> On 04/30/2014 09:48 AM, Michal Hocko wrote:
>>> On Wed 30-04-14 09:30:35, Rik van Riel wrote:
>>> [...]
>>>> Subject: mm,writeback: fix divide by zero in pos_ratio_polynom
>>>>
>>>> It is possible for "limit - setpoint + 1" to equal zero, leading to a
>>>> divide by zero error. Blindly adding 1 to "limit - setpoint" is not
>>>> working, so we need to actually test the divisor before calling div64.
>>>>
>>>> Signed-off-by: Rik van Riel <[email protected]>
>>>> ---
>>>> mm/page-writeback.c | 13 +++++++++++--
>>>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>>>> index ef41349..f98a297 100644
>>>> --- a/mm/page-writeback.c
>>>> +++ b/mm/page-writeback.c
>>>> @@ -597,11 +597,16 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
>>>> unsigned long dirty,
>>>> unsigned long limit)
>>>> {
>>>> + unsigned long divisor;
>>>> long long pos_ratio;
>>>> long x;
>>>>
>>>> + divisor = limit - setpoint;
>>>> + if (!divisor)
>>>> + divisor = 1; /* Avoid div-by-zero */
>>>> +
>>>
>>> This is still prone to u64 -> s32 issue, isn't it?
>>> What was the original problem anyway? Was it really setpoint > limit or
>>> rather the overflow?
>>
>> Thinking about it some more, is it possible that
>> limit and/or setpoint are larger than 32 bits, but
>> the difference between them is not?
>>
>> In that case, truncating both to 32 bits before
>> doing the subtraction would be troublesome, and
>> it would be better to do a cast in the comparison:
>>
>> if (!(s32)divisor)
>> divisor = 1;
>
> How is that any different than defining divisor as 32b directly?

For unsigned, it probably doesn't make a difference.

For signed int vs unsigned long, I wonder if there is a
corner case where casting the second to the first before
doing the "limit - setpoint" calculation can lead to a
different outcome...

2014-04-30 19:00:06

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3] mm,writeback: fix divide by zero in pos_ratio_polynom

On Wed, 30 Apr 2014 10:41:14 -0400 Rik van Riel <[email protected]> wrote:

> It is possible for "limit - setpoint + 1" to equal zero, leading to a
> divide by zero error. Blindly adding 1 to "limit - setpoint" is not
> working, so we need to actually test the divisor before calling div64.
>
> ...
>
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -598,10 +598,15 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
> unsigned long limit)
> {
> long long pos_ratio;
> + long divisor;
> long x;
>
> + divisor = limit - setpoint;
> + if (!(s32)divisor)
> + divisor = 1; /* Avoid div-by-zero */
> +
> x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
> - limit - setpoint + 1);
> + (s32)divisor);

Doesn't this just paper over the bug one time in four billion? The
other 3999999999 times, pos_ratio_polynom() returns an incorect result?

If it is indeed the case that pos_ratio_polynom() callers are
legitimately passing a setpoint which is more than 2^32 less than limit
then it would be better to handle that input correctly.

Writing a new suite of div functions sounds overkillish. At some loss
of precision could we do something like

if (divisor > 2^32) {
divisor >>= log2(divisor) - 32;
dividend >>= log2(divisor) - 32;
}
x = div(dividend, divisor);

?

And let's uninline the sorry thing while we're in there ;)

2014-04-30 19:32:51

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH v3] mm,writeback: fix divide by zero in pos_ratio_polynom

On 04/30/2014 03:00 PM, Andrew Morton wrote:
> On Wed, 30 Apr 2014 10:41:14 -0400 Rik van Riel <[email protected]> wrote:
>
>> It is possible for "limit - setpoint + 1" to equal zero, leading to a
>> divide by zero error. Blindly adding 1 to "limit - setpoint" is not
>> working, so we need to actually test the divisor before calling div64.
>>
>> ...
>>
>> --- a/mm/page-writeback.c
>> +++ b/mm/page-writeback.c
>> @@ -598,10 +598,15 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
>> unsigned long limit)
>> {
>> long long pos_ratio;
>> + long divisor;
>> long x;
>>
>> + divisor = limit - setpoint;
>> + if (!(s32)divisor)
>> + divisor = 1; /* Avoid div-by-zero */
>> +
>> x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
>> - limit - setpoint + 1);
>> + (s32)divisor);
>
> Doesn't this just paper over the bug one time in four billion? The
> other 3999999999 times, pos_ratio_polynom() returns an incorect result?
>
> If it is indeed the case that pos_ratio_polynom() callers are
> legitimately passing a setpoint which is more than 2^32 less than limit
> then it would be better to handle that input correctly.

The easy way would be by calling div64_s64 and div64_u64,
which are 64 bit all the way through.

Any objections?

The inlined bits seem to be stubs calling the _rem variants
of the functions, and discarding the remainder.

2014-04-30 19:35:28

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3] mm,writeback: fix divide by zero in pos_ratio_polynom

On Wed, 30 Apr 2014 15:30:04 -0400 Rik van Riel <[email protected]> wrote:

> On 04/30/2014 03:00 PM, Andrew Morton wrote:
> > On Wed, 30 Apr 2014 10:41:14 -0400 Rik van Riel <[email protected]> wrote:
> >
> >> It is possible for "limit - setpoint + 1" to equal zero, leading to a
> >> divide by zero error. Blindly adding 1 to "limit - setpoint" is not
> >> working, so we need to actually test the divisor before calling div64.
> >>
> >> ...
> >>
> >> --- a/mm/page-writeback.c
> >> +++ b/mm/page-writeback.c
> >> @@ -598,10 +598,15 @@ static inline long long pos_ratio_polynom(unsigned long setpoint,
> >> unsigned long limit)
> >> {
> >> long long pos_ratio;
> >> + long divisor;
> >> long x;
> >>
> >> + divisor = limit - setpoint;
> >> + if (!(s32)divisor)
> >> + divisor = 1; /* Avoid div-by-zero */
> >> +
> >> x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
> >> - limit - setpoint + 1);
> >> + (s32)divisor);
> >
> > Doesn't this just paper over the bug one time in four billion? The
> > other 3999999999 times, pos_ratio_polynom() returns an incorect result?
> >
> > If it is indeed the case that pos_ratio_polynom() callers are
> > legitimately passing a setpoint which is more than 2^32 less than limit
> > then it would be better to handle that input correctly.
>
> The easy way would be by calling div64_s64 and div64_u64,
> which are 64 bit all the way through.
>
> Any objections?

Sounds good to me.

> The inlined bits seem to be stubs calling the _rem variants
> of the functions, and discarding the remainder.

I was referring to pos_ratio_polynom(). The compiler will probably be
uninlining it anyway, but still...

2014-04-30 20:05:31

by Rik van Riel

[permalink] [raw]
Subject: [PATCH v4] mm,writeback: fix divide by zero in pos_ratio_polynom

On Wed, 30 Apr 2014 12:35:26 -0700
Andrew Morton <[email protected]> wrote:

> > The easy way would be by calling div64_s64 and div64_u64,
> > which are 64 bit all the way through.
> >
> > Any objections?
>
> Sounds good to me.
>
> > The inlined bits seem to be stubs calling the _rem variants
> > of the functions, and discarding the remainder.
>
> I was referring to pos_ratio_polynom(). The compiler will probably be
> uninlining it anyway, but still...

I believe this should do the trick.

---8<---

Subject: mm,writeback: fix divide by zero in pos_ratio_polynom

It is possible for "limit - setpoint + 1" to equal zero, leading to a
divide by zero error. Blindly adding 1 to "limit - setpoint" is not
working, so we need to actually test the divisor before calling div64.

Signed-off-by: Rik van Riel <[email protected]>
---
mm/page-writeback.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ef41349..37f56bb 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -593,15 +593,20 @@ unsigned long bdi_dirty_limit(struct backing_dev_info *bdi, unsigned long dirty)
* (5) the closer to setpoint, the smaller |df/dx| (and the reverse)
* => fast response on large errors; small oscillation near setpoint
*/
-static inline long long pos_ratio_polynom(unsigned long setpoint,
+static long long pos_ratio_polynom(unsigned long setpoint,
unsigned long dirty,
unsigned long limit)
{
+ unsigned long divisor;
long long pos_ratio;
long x;

- x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
- limit - setpoint + 1);
+ divisor = limit - setpoint;
+ if (!divisor)
+ divisor = 1; /* Avoid div-by-zero */
+
+ x = div64_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
+ divisor);
pos_ratio = x;
pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
@@ -842,8 +847,12 @@ static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,
x_intercept = bdi_setpoint + span;

if (bdi_dirty < x_intercept - span / 4) {
- pos_ratio = div_u64(pos_ratio * (x_intercept - bdi_dirty),
- x_intercept - bdi_setpoint + 1);
+ unsigned long divisor = x_intercept - bdi_setpoint;
+ if (!divisor)
+ divisor = 1; /* Avoid div-by-zero */
+
+ pos_ratio = div64_u64(pos_ratio * (x_intercept - bdi_dirty),
+ divisor);
} else
pos_ratio /= 4;

2014-04-30 20:13:56

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v4] mm,writeback: fix divide by zero in pos_ratio_polynom

On Wed, 30 Apr 2014 16:02:18 -0400 Rik van Riel <[email protected]> wrote:

> I believe this should do the trick.
>
> ---8<---
>
> Subject: mm,writeback: fix divide by zero in pos_ratio_polynom
>
> It is possible for "limit - setpoint + 1" to equal zero, leading to a
> divide by zero error. Blindly adding 1 to "limit - setpoint" is not
> working, so we need to actually test the divisor before calling div64.
>

Changelog is a bit stale.

> ---
> mm/page-writeback.c | 19 ++++++++++++++-----
> 1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index ef41349..37f56bb 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -593,15 +593,20 @@ unsigned long bdi_dirty_limit(struct backing_dev_info *bdi, unsigned long dirty)
> * (5) the closer to setpoint, the smaller |df/dx| (and the reverse)
> * => fast response on large errors; small oscillation near setpoint
> */
> -static inline long long pos_ratio_polynom(unsigned long setpoint,
> +static long long pos_ratio_polynom(unsigned long setpoint,
> unsigned long dirty,
> unsigned long limit)
> {
> + unsigned long divisor;
> long long pos_ratio;
> long x;
>
> - x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
> - limit - setpoint + 1);
> + divisor = limit - setpoint;
> + if (!divisor)
> + divisor = 1; /* Avoid div-by-zero */

This was a consequence of 64->32 truncation and it can't happen any
more, can it?

2014-04-30 20:35:27

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH v4] mm,writeback: fix divide by zero in pos_ratio_polynom

On 04/30/2014 04:13 PM, Andrew Morton wrote:
> On Wed, 30 Apr 2014 16:02:18 -0400 Rik van Riel <[email protected]> wrote:
>
>> I believe this should do the trick.
>>
>> ---8<---
>>
>> Subject: mm,writeback: fix divide by zero in pos_ratio_polynom
>>
>> It is possible for "limit - setpoint + 1" to equal zero, leading to a
>> divide by zero error. Blindly adding 1 to "limit - setpoint" is not
>> working, so we need to actually test the divisor before calling div64.
>
> Changelog is a bit stale.

Will update.

>> -static inline long long pos_ratio_polynom(unsigned long setpoint,
>> +static long long pos_ratio_polynom(unsigned long setpoint,
>> unsigned long dirty,
>> unsigned long limit)
>> {
>> + unsigned long divisor;
>> long long pos_ratio;
>> long x;
>>
>> - x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
>> - limit - setpoint + 1);
>> + divisor = limit - setpoint;
>> + if (!divisor)
>> + divisor = 1; /* Avoid div-by-zero */
>
> This was a consequence of 64->32 truncation and it can't happen any
> more, can it?

That is a good question. Looking at the code some more,
I guess it may indeed be exclusively due to the truncation,
and we can go back to the older code, just with the fully
64 bit divide functions...

Good thing Masayoshi-san has a reproducer :)

2014-04-30 20:45:37

by Rik van Riel

[permalink] [raw]
Subject: [PATCH v5] mm,writeback: fix divide by zero in pos_ratio_polynom

On Wed, 30 Apr 2014 13:13:53 -0700
Andrew Morton <[email protected]> wrote:

> This was a consequence of 64->32 truncation and it can't happen any
> more, can it?

Andrew, this is cleaner indeed :)

Masayoshi-san, does the bug still happen with this version, or does
this fix the problem?

---8<---

Subject: mm,writeback: fix divide by zero in pos_ratio_polynom

It is possible for "limit - setpoint + 1" to equal zero, after
getting truncated to a 32 bit variable, and resulting in a divide
by zero error.

Using the fully 64 bit divide functions avoids this problem.

Also uninline pos_ratio_polynom, at Andrew's request.

Signed-off-by: Rik van Riel <[email protected]>
---
mm/page-writeback.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ef41349..a4317da 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -593,14 +593,14 @@ unsigned long bdi_dirty_limit(struct backing_dev_info *bdi, unsigned long dirty)
* (5) the closer to setpoint, the smaller |df/dx| (and the reverse)
* => fast response on large errors; small oscillation near setpoint
*/
-static inline long long pos_ratio_polynom(unsigned long setpoint,
+static long long pos_ratio_polynom(unsigned long setpoint,
unsigned long dirty,
unsigned long limit)
{
long long pos_ratio;
long x;

- x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
+ x = div64_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
limit - setpoint + 1);
pos_ratio = x;
pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
@@ -842,7 +842,7 @@ static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,
x_intercept = bdi_setpoint + span;

if (bdi_dirty < x_intercept - span / 4) {
- pos_ratio = div_u64(pos_ratio * (x_intercept - bdi_dirty),
+ pos_ratio = div64_u64(pos_ratio * (x_intercept - bdi_dirty),
x_intercept - bdi_setpoint + 1);
} else
pos_ratio /= 4;

2014-04-30 21:01:00

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v5] mm,writeback: fix divide by zero in pos_ratio_polynom

On Wed, 30 Apr 2014 16:42:55 -0400 Rik van Riel <[email protected]> wrote:

> On Wed, 30 Apr 2014 13:13:53 -0700
> Andrew Morton <[email protected]> wrote:
>
> > This was a consequence of 64->32 truncation and it can't happen any
> > more, can it?
>
> Andrew, this is cleaner indeed :)

I'm starting to get worried about 32-bit wraparound in the patch
version number ;)

> Masayoshi-san, does the bug still happen with this version, or does
> this fix the problem?
>

We could put something like

if (WARN_ON_ONCE(setpoint == limit))
setpoint--;

in there if we're not sure. But it's better to be sure!

2014-04-30 21:24:09

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH v5] mm,writeback: fix divide by zero in pos_ratio_polynom

On 04/30/2014 05:00 PM, Andrew Morton wrote:
> On Wed, 30 Apr 2014 16:42:55 -0400 Rik van Riel <[email protected]> wrote:
>
>> On Wed, 30 Apr 2014 13:13:53 -0700
>> Andrew Morton <[email protected]> wrote:
>>
>>> This was a consequence of 64->32 truncation and it can't happen any
>>> more, can it?
>>
>> Andrew, this is cleaner indeed :)
>
> I'm starting to get worried about 32-bit wraparound in the patch
> version number ;)
>
>> Masayoshi-san, does the bug still happen with this version, or does
>> this fix the problem?
>>
>
> We could put something like
>
> if (WARN_ON_ONCE(setpoint == limit))
> setpoint--;
>
> in there if we're not sure. But it's better to be sure!

The more I look at the code, the more I am convinced that
Michal is right, and we cannot actually hit the case that
"limit - setpoint + 1 == 0".

Setpoint always seems to be some in-between point.

2014-04-30 21:33:03

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v5] mm,writeback: fix divide by zero in pos_ratio_polynom

On Wed, 30 Apr 2014 16:42:55 -0400 Rik van Riel <[email protected]> wrote:

> On Wed, 30 Apr 2014 13:13:53 -0700
> Andrew Morton <[email protected]> wrote:
>
> > This was a consequence of 64->32 truncation and it can't happen any
> > more, can it?
>
> Andrew, this is cleaner indeed :)
>
> Masayoshi-san, does the bug still happen with this version, or does
> this fix the problem?

I assumed we wanted a reported-by in there.

> Subject: mm,writeback: fix divide by zero in pos_ratio_polynom
>
> It is possible for "limit - setpoint + 1" to equal zero, after
> getting truncated to a 32 bit variable, and resulting in a divide
> by zero error.
>
> Using the fully 64 bit divide functions avoids this problem.

This isn't the whole story, is it? I added stuff:

: Using the fully 64 bit divide functions avoids this problem. It also will
: cause pos_ratio_polynom() to return the correct value when (setpoint -
: limit) exceeds 2^32.