2017-09-08 20:43:09

by YASUAKI ISHIMATSU

Subject: [PATCH] mm/memory_hotplug: fix wrong casting for __remove_section()

__remove_section() calls __remove_zone() to shrink zone and pgdat.
But due to wrong castings, __remove_zone() cannot shrink zone
and pgdat correctly if pfn is over 0xffffffff.

So the patch fixes the following 3 wrong castings.

1. find_smallest_section_pfn() returns 0 or start_pfn, which is defined
as unsigned long. But the function always returns a 32-bit value
since its return type is int.

2. find_biggest_section_pfn() returns 0 or pfn, which is defined as
unsigned long. But the function always returns a 32-bit value
since its return type is int.

3. __remove_section() calculates start_pfn using section_nr_to_pfn()
and scn_nr. section_nr_to_pfn() just shifts scn_nr by
PFN_SECTION_SHIFT bits. But since scn_nr is defined as int,
section_nr_to_pfn() always returns a 32-bit value.

The patch fixes the wrong castings.

Signed-off-by: Yasuaki Ishimatsu <[email protected]>
---
mm/memory_hotplug.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 73bf17d..3514ef2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -331,7 +331,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,

#ifdef CONFIG_MEMORY_HOTREMOVE
/* find the smallest valid pfn in the range [start_pfn, end_pfn) */
-static int find_smallest_section_pfn(int nid, struct zone *zone,
+static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
unsigned long start_pfn,
unsigned long end_pfn)
{
@@ -356,7 +356,7 @@ static int find_smallest_section_pfn(int nid, struct zone *zone,
}

/* find the biggest valid pfn in the range [start_pfn, end_pfn). */
-static int find_biggest_section_pfn(int nid, struct zone *zone,
+static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
unsigned long start_pfn,
unsigned long end_pfn)
{
@@ -544,7 +544,7 @@ static int __remove_section(struct zone *zone, struct mem_section *ms,
return ret;

scn_nr = __section_nr(ms);
- start_pfn = section_nr_to_pfn(scn_nr);
+ start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
__remove_zone(zone, start_pfn);

sparse_remove_one_section(zone, ms, map_offset);
--
1.8.3.1
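
The return-type problem described in items 1 and 2 can be reproduced in
user space. The following is only a minimal sketch, not kernel code:
pfn_t, pfn_via_int(), pfn_via_ulong() and check_return_width() are
hypothetical names, and uint64_t is assumed to stand in for the
kernel's 64-bit unsigned long.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: uint64_t stands in for the kernel's 64-bit unsigned long. */
typedef uint64_t pfn_t;

/* Hypothetical helper with the old "int" return type. */
static inline int pfn_via_int(pfn_t pfn)
{
	/*
	 * Out-of-range conversion to int is implementation-defined;
	 * GCC and Clang keep only the low 32 bits.
	 */
	return (int)pfn;
}

/* Hypothetical helper with the widened return type. */
static inline pfn_t pfn_via_ulong(pfn_t pfn)
{
	return pfn;
}

/* Hypothetical self-check using a pfn just above 0xffffffff. */
static inline void check_return_width(void)
{
	pfn_t pfn = 0x100001234ULL;

	assert((pfn_t)pfn_via_int(pfn) == 0x1234ULL);	/* upper bits lost */
	assert(pfn_via_ulong(pfn) == pfn);		/* value preserved */
}
```

Calling check_return_width() should pass on GCC or Clang. Note that a
caller storing the int result in an unsigned long cannot recover the
lost upper bits.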


2017-09-12 12:49:56

by Michal Hocko

Subject: Re: [PATCH] mm/memory_hotplug: fix wrong casting for __remove_section()

On Fri 08-09-17 16:43:04, YASUAKI ISHIMATSU wrote:
> __remove_section() calls __remove_zone() to shrink zone and pgdat.
> But due to wrong castings, __remove_zone() cannot shrink zone
> and pgdat correctly if pfn is over 0xffffffff.
>
> So the patch fixes the following 3 wrong castings.
>
> 1. find_smallest_section_pfn() returns 0 or start_pfn, which is defined
> as unsigned long. But the function always returns a 32-bit value
> since its return type is int.
>
> 2. find_biggest_section_pfn() returns 0 or pfn, which is defined as
> unsigned long. But the function always returns a 32-bit value
> since its return type is int.

This is indeed wrong. Pfns over 15TB would be really broken. Not that
unrealistic these days.

>
> 3. __remove_section() calculates start_pfn using section_nr_to_pfn()
> and scn_nr. section_nr_to_pfn() just shifts scn_nr by
> PFN_SECTION_SHIFT bits. But since scn_nr is defined as int,
> section_nr_to_pfn() always returns a 32-bit value.

Dohh, those nasty macros. This is hidden quite well. It seems other
callers are using unsigned long properly. But I would rather make sure
we won't repeat that error again. Can we instead make section_nr_to_pfn
resp. pfn_to_section_nr static inline and enforce proper types?

I would also split this into two patches.

Thanks!

> The patch fixes the wrong castings.
>
> Signed-off-by: Yasuaki Ishimatsu <[email protected]>
> ---
> mm/memory_hotplug.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 73bf17d..3514ef2 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -331,7 +331,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>
> #ifdef CONFIG_MEMORY_HOTREMOVE
> /* find the smallest valid pfn in the range [start_pfn, end_pfn) */
> -static int find_smallest_section_pfn(int nid, struct zone *zone,
> +static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
> unsigned long start_pfn,
> unsigned long end_pfn)
> {
> @@ -356,7 +356,7 @@ static int find_smallest_section_pfn(int nid, struct zone *zone,
> }
>
> /* find the biggest valid pfn in the range [start_pfn, end_pfn). */
> -static int find_biggest_section_pfn(int nid, struct zone *zone,
> +static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
> unsigned long start_pfn,
> unsigned long end_pfn)
> {
> @@ -544,7 +544,7 @@ static int __remove_section(struct zone *zone, struct mem_section *ms,
> return ret;
>
> scn_nr = __section_nr(ms);
> - start_pfn = section_nr_to_pfn(scn_nr);
> + start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
> __remove_zone(zone, start_pfn);
>
> sparse_remove_one_section(zone, ms, map_offset);
> --
> 1.8.3.1
>

--
Michal Hocko
SUSE Labs
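
As a rough illustration of the static inline suggestion above, the
sketch below contrasts a macro with typed helpers. It is a user-space
approximation under stated assumptions, not the kernel header:
PFN_SECTION_SHIFT is assumed to be 15 (the x86_64 value, from
SECTION_SIZE_BITS 27 minus PAGE_SHIFT 12), uint64_t stands in for a
64-bit unsigned long, and section_nr_to_pfn_macro() and
check_section_helpers() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 15 matches x86_64 (SECTION_SIZE_BITS 27 - PAGE_SHIFT 12). */
#define PFN_SECTION_SHIFT 15

/* Macro form: the shift is done in whatever type the caller passes in. */
#define section_nr_to_pfn_macro(sec) ((sec) << PFN_SECTION_SHIFT)

/*
 * Inline form: the prototype widens the argument before the shift.
 * uint64_t stands in for a 64-bit unsigned long (assumption).
 */
static inline uint64_t section_nr_to_pfn(uint64_t sec)
{
	return sec << PFN_SECTION_SHIFT;
}

static inline uint64_t pfn_to_section_nr(uint64_t pfn)
{
	return pfn >> PFN_SECTION_SHIFT;
}

/* Hypothetical self-check. */
static inline void check_section_helpers(void)
{
	int scn_nr = 0x20000;	/* section whose start pfn is 0x100000000 */

	/*
	 * uint32_t is used for the macro case so that the wraparound is
	 * well defined; overflowing a signed int shift would be undefined.
	 */
	assert(section_nr_to_pfn_macro((uint32_t)scn_nr) == 0);
	assert(section_nr_to_pfn(scn_nr) == 0x100000000ULL);
	assert(pfn_to_section_nr(0x100000000ULL) == 0x20000ULL);
}
```

With the typed helper, a caller that still holds the section number in
an int gets the full 64-bit pfn without needing a cast at the call
site.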

2017-09-12 17:05:47

by YASUAKI ISHIMATSU

Subject: Re: [PATCH] mm/memory_hotplug: fix wrong casting for __remove_section()

Hi Michal,

Thank you for reviewing my patch.

On 09/12/2017 08:49 AM, Michal Hocko wrote:
> On Fri 08-09-17 16:43:04, YASUAKI ISHIMATSU wrote:
>> __remove_section() calls __remove_zone() to shrink zone and pgdat.
>> But due to wrong castings, __remove_zone() cannot shrink zone
>> and pgdat correctly if pfn is over 0xffffffff.
>>
>> So the patch fixes the following 3 wrong castings.
>>
>> 1. find_smallest_section_pfn() returns 0 or start_pfn, which is defined
>> as unsigned long. But the function always returns a 32-bit value
>> since its return type is int.
>>
>> 2. find_biggest_section_pfn() returns 0 or pfn, which is defined as
>> unsigned long. But the function always returns a 32-bit value
>> since its return type is int.
>
> This is indeed wrong. Pfns over 15TB would be really broken. Not that
> unrealistic these days.

Why 15TB?

Actually, all callers use a pfn defined as unsigned long to receive
the return value of find_{smallest|biggest}_section_pfn(). So it will break
over 16TB.

>
>>
>> 3. __remove_section() calculates start_pfn using section_nr_to_pfn()
>> and scn_nr. section_nr_to_pfn() just shifts scn_nr by
>> PFN_SECTION_SHIFT bits. But since scn_nr is defined as int,
>> section_nr_to_pfn() always returns a 32-bit value.
>
> Dohh, those nasty macros. This is hidden quite well. It seems other
> callers are using unsigned long properly. But I would rather make sure
> we won't repeat that error again. Can we instead make section_nr_to_pfn
> resp. pfn_to_section_nr static inline and enforce proper types?

I'll update it.

>
> I would also split this into two patches.

I'll update it.

Thanks,
Yasuaki Ishimatsu

>
> Thanks!
>
>> The patch fixes the wrong castings.
>>
>> Signed-off-by: Yasuaki Ishimatsu <[email protected]>
>> ---
>> mm/memory_hotplug.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 73bf17d..3514ef2 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -331,7 +331,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>>
>> #ifdef CONFIG_MEMORY_HOTREMOVE
>> /* find the smallest valid pfn in the range [start_pfn, end_pfn) */
>> -static int find_smallest_section_pfn(int nid, struct zone *zone,
>> +static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
>> unsigned long start_pfn,
>> unsigned long end_pfn)
>> {
>> @@ -356,7 +356,7 @@ static int find_smallest_section_pfn(int nid, struct zone *zone,
>> }
>>
>> /* find the biggest valid pfn in the range [start_pfn, end_pfn). */
>> -static int find_biggest_section_pfn(int nid, struct zone *zone,
>> +static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
>> unsigned long start_pfn,
>> unsigned long end_pfn)
>> {
>> @@ -544,7 +544,7 @@ static int __remove_section(struct zone *zone, struct mem_section *ms,
>> return ret;
>>
>> scn_nr = __section_nr(ms);
>> - start_pfn = section_nr_to_pfn(scn_nr);
>> + start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
>> __remove_zone(zone, start_pfn);
>>
>> sparse_remove_one_section(zone, ms, map_offset);
>> --
>> 1.8.3.1
>>
>

2017-09-13 05:59:19

by Michal Hocko

Subject: Re: [PATCH] mm/memory_hotplug: fix wrong casting for __remove_section()

On Tue 12-09-17 13:05:39, YASUAKI ISHIMATSU wrote:
> Hi Michal,
>
> Thank you for reviewing my patch.
>
> On 09/12/2017 08:49 AM, Michal Hocko wrote:
> > On Fri 08-09-17 16:43:04, YASUAKI ISHIMATSU wrote:
> >> __remove_section() calls __remove_zone() to shrink zone and pgdat.
> >> But due to wrong castings, __remove_zone() cannot shrink zone
> >> and pgdat correctly if pfn is over 0xffffffff.
> >>
> >> So the patch fixes the following 3 wrong castings.
> >>
> >> 1. find_smallest_section_pfn() returns 0 or start_pfn, which is defined
> >> as unsigned long. But the function always returns a 32-bit value
> >> since its return type is int.
> >>
> >> 2. find_biggest_section_pfn() returns 0 or pfn, which is defined as
> >> unsigned long. But the function always returns a 32-bit value
> >> since its return type is int.
> >
> > This is indeed wrong. Pfns over 15TB would be really broken. Not that
> > unrealistic these days.
>
> Why 15TB?

0xffffffff>>28

--
Michal Hocko
SUSE Labs
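
The 15TB and 16TB figures in this exchange come from the same
arithmetic. A minimal sketch, assuming 4KB pages (PAGE_SHIFT of 12);
TIB_SHIFT, pfn_to_tib() and check_pfn_limits() are hypothetical names
introduced only for this illustration:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* Assumption: 4KB pages. */
#define TIB_SHIFT  40	/* 1 TiB == 1 << 40 bytes. */

/* Hypothetical helper: whole TiB below the address of the given pfn. */
static inline uint64_t pfn_to_tib(uint64_t pfn)
{
	return pfn >> (TIB_SHIFT - PAGE_SHIFT);	/* same as pfn >> 28 */
}

/* Hypothetical self-check. */
static inline void check_pfn_limits(void)
{
	/* The largest 32-bit pfn rounds down to 15 whole TiB ... */
	assert(pfn_to_tib(0xffffffffULL) == 15);
	/* ... and the first pfn that needs 33 bits sits at exactly 16 TiB. */
	assert(pfn_to_tib(0x100000000ULL) == 16);
	assert((0x100000000ULL << PAGE_SHIFT) == (16ULL << TIB_SHIFT));
}
```

So 0xffffffff>>28 gives 15 because the shift truncates, while the
boundary at which a 32-bit pfn stops being sufficient is 16TB.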

2017-09-14 15:43:15

by YASUAKI ISHIMATSU

Subject: Re: [PATCH] mm/memory_hotplug: fix wrong casting for __remove_section()

Hi Michal,

On 09/13/2017 01:59 AM, Michal Hocko wrote:
> On Tue 12-09-17 13:05:39, YASUAKI ISHIMATSU wrote:
>> Hi Michal,
>>
>> Thank you for reviewing my patch.
>>
>> On 09/12/2017 08:49 AM, Michal Hocko wrote:
>>> On Fri 08-09-17 16:43:04, YASUAKI ISHIMATSU wrote:
>>>> __remove_section() calls __remove_zone() to shrink zone and pgdat.
>>>> But due to wrong castings, __remove_zone() cannot shrink zone
>>>> and pgdat correctly if pfn is over 0xffffffff.
>>>>
>>>> So the patch fixes the following 3 wrong castings.
>>>>
>>>> 1. find_smallest_section_pfn() returns 0 or start_pfn, which is defined
>>>> as unsigned long. But the function always returns a 32-bit value
>>>> since its return type is int.
>>>>
>>>> 2. find_biggest_section_pfn() returns 0 or pfn, which is defined as
>>>> unsigned long. But the function always returns a 32-bit value
>>>> since its return type is int.
>>>
>>> This is indeed wrong. Pfns over 15TB would be really broken. Not that
>>> unrealistic these days.
>>
>> Why 15TB?
>
> 0xffffffff>>28
>

Even though I see your explanation, I cannot understand it.

In my understanding, find_{smallest|biggest}_section_pfn() return an int.
So the functions always return a value in 0x00000000 - 0xffffffff. Therefore, if the pfn is over
0xffffffff (just under 16TB), the functions cannot work correctly.

Where am I wrong?

Thanks,
Yasuaki Ishimatsu

2017-09-15 09:37:00

by Michal Hocko

Subject: Re: [PATCH] mm/memory_hotplug: fix wrong casting for __remove_section()

On Thu 14-09-17 11:43:10, YASUAKI ISHIMATSU wrote:
> Hi Michal,
>
> On 09/13/2017 01:59 AM, Michal Hocko wrote:
> > On Tue 12-09-17 13:05:39, YASUAKI ISHIMATSU wrote:
> >> Hi Michal,
> >>
> >> Thank you for reviewing my patch.
> >>
> >> On 09/12/2017 08:49 AM, Michal Hocko wrote:
> >>> On Fri 08-09-17 16:43:04, YASUAKI ISHIMATSU wrote:
> >>>> __remove_section() calls __remove_zone() to shrink zone and pgdat.
> >>>> But due to wrong castings, __remove_zone() cannot shrink zone
> >>>> and pgdat correctly if pfn is over 0xffffffff.
> >>>>
> >>>> So the patch fixes the following 3 wrong castings.
> >>>>
> >>>> 1. find_smallest_section_pfn() returns 0 or start_pfn, which is defined
> >>>> as unsigned long. But the function always returns a 32-bit value
> >>>> since its return type is int.
> >>>>
> >>>> 2. find_biggest_section_pfn() returns 0 or pfn, which is defined as
> >>>> unsigned long. But the function always returns a 32-bit value
> >>>> since its return type is int.
> >>>
> >>> This is indeed wrong. Pfns over 15TB would be really broken. Not that
> >>> unrealistic these days.
> >>
> >> Why 15TB?
> >
> > 0xffffffff>>28
> >
>
> Even though I see your explanation, I cannot understand it.
>
> In my understanding, find_{smallest|biggest}_section_pfn() return an int.
> So the functions always return a value in 0x00000000 - 0xffffffff. Therefore, if the pfn is over
> 0xffffffff (just under 16TB), the functions cannot work correctly.
>
> Where am I wrong?

You are not wrong. We are talking about the same thing AFAICS. I was
just less precise...

--
Michal Hocko
SUSE Labs