2006-08-03 09:07:08

by moreau francis

[permalink] [raw]
Subject: Re : Re : sparsemem usage

Alan Cox wrote:
>
> Mapping out parts of a section is quite normal - think about the 640K to
> 1Mb hole in PC memory space.

OK. But I'm still worry. Please consider the following code

for (...; ...; ...) {
[...]
if (pfn_valid(i))
num_physpages++;
[...]
}

In that case num_physpages won't store an accurate value. Still it will be
used by the kernel to make some statistic assumptions on other kernel
data structure sizes.

Francis





2006-08-03 09:20:41

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: Re : Re : sparsemem usage

On Thu, 3 Aug 2006 09:07:06 +0000 (GMT)
moreau francis <[email protected]> wrote:

> Alan Cox wrote:
> >
> > Mapping out parts of a section is quite normal - think about the 640K to
> > 1Mb hole in PC memory space.
>
> OK. But I'm still worry. Please consider the following code
>
> for (...; ...; ...) {
> [...]
> if (pfn_valid(i))
> num_physpages++;
> [...]
> }
>
> In that case num_physpages won't store an accurate value. Still it will be
> used by the kernel to make some statistic assumptions on other kernel
> data structure sizes.
>
In my understanding, pfn_valid() just returns "the page has page struct or not".
So, don't use pfn_valid() to count physical pages..


-Kame

2006-08-03 09:48:42

by Andy Whitcroft

[permalink] [raw]
Subject: Re: Re : Re : sparsemem usage

moreau francis wrote:
> Alan Cox wrote:
>> Mapping out parts of a section is quite normal - think about the 640K to
>> 1Mb hole in PC memory space.
>
> OK. But I'm still worry. Please consider the following code
>
> for (...; ...; ...) {
> [...]
> if (pfn_valid(i))
> num_physpages++;
> [...]
> }
>
> In that case num_physpages won't store an accurate value. Still it will be
> used by the kernel to make some statistic assumptions on other kernel
> data structure sizes.

That would be incorrect usage. pfn_valid() simply doesn't tell you if
you have memory backing a pfn, it mearly means you can interrogate the
page* for it. A good example of code which counts pages in a region is
in count_highmem_pages() which has a form as below:

for (pfn = start; pfn < end; pfn++) {
if (!pfn_valid(pfn))
continue;
page = pfn_to_page(pfn);
if (PageReserved(page))
continue;
num_physpages++;
}

-apw

2006-08-03 12:46:31

by moreau francis

[permalink] [raw]
Subject: Re : Re : Re : sparsemem usage

Andy Whitcroft wrote:
> That would be incorrect usage. pfn_valid() simply doesn't tell you if
> you have memory backing a pfn, it mearly means you can interrogate the
> page* for it. A good example of code which counts pages in a region is
> in count_highmem_pages() which has a form as below:
>
> for (pfn = start; pfn < end; pfn++) {
> if (!pfn_valid(pfn))
> continue;
> page = pfn_to_page(pfn);
> if (PageReserved(page))
> continue;
> num_physpages++;
> }
>
num_physpages would still not give the right total number of pages in the
system. It will report a value smaller than the size of all memories which can
be suprising, depending on how it is used. In my mind I thought that it should
store the number of all pages in the system (reserved + free + ...).

Futhermore for flatmem model, my example that count the number of physical
pages is valid: reserved pages are really pages that are in used by the kernel.
But it's not valid anymore for sparsemem model. For consistency and code
sharing, I would make the same meaning of pfn_valid() and PageReserved() for
both models.

Francis


2006-08-03 13:15:27

by Andy Whitcroft

[permalink] [raw]
Subject: Re: Re : Re : Re : sparsemem usage

moreau francis wrote:
> Andy Whitcroft wrote:
>> That would be incorrect usage. pfn_valid() simply doesn't tell you if
>> you have memory backing a pfn, it mearly means you can interrogate the
>> page* for it. A good example of code which counts pages in a region is
>> in count_highmem_pages() which has a form as below:
>>
>> for (pfn = start; pfn < end; pfn++) {
>> if (!pfn_valid(pfn))
>> continue;
>> page = pfn_to_page(pfn);
>> if (PageReserved(page))
>> continue;
>> num_physpages++;
>> }
>>
> num_physpages would still not give the right total number of pages in the
> system. It will report a value smaller than the size of all memories which can
> be suprising, depending on how it is used. In my mind I thought that it should
> store the number of all pages in the system (reserved + free + ...).
>
> Futhermore for flatmem model, my example that count the number of physical
> pages is valid: reserved pages are really pages that are in used by the kernel.
> But it's not valid anymore for sparsemem model. For consistency and code
> sharing, I would make the same meaning of pfn_valid() and PageReserved() for
> both models.

The semantics and meaning of both pfn_valid() and PageReserved() are the
same in all three memory models, just not what you need them to be for
your pfn_valid() loop to tell you how many real frames there are.
I do not believe it is correct to say that your loop would give you the
number of physical pages under FLATMEM. If there are any gaps at all
(such as there is for IO space just below 1MB) that will pass
pfn_valid(), and yet does _not_ have any real memory associated with it.
With FLATMEM you will get pfn_valid() passing on non-memory pages.

I have to re-iterate pfn_valid() does not mean pfn_valid_memory(), it
means pfn_valid_memmap(). If you want to know if a page is valid and
memory (at least on x86) you could use:

if (pfn_valid(pfn) && page_is_ram(pfn)) {
}

It is rare you care how many real page frames there are in the system.
You are more interested in how many usable frames there are. Such as
for use in sizing hashes or caches. The reserved pages should be
excluded in this calculation. ACPI pages, BIOS pages and the like
simply are no interest.

I don't see anywhere in the kernel using that construct to work out how
many pages there are in the system. Mostly we have architectual
information to tell us what real physical pages exist in the system such
as the srat or e820 etc. If we really care about real page counts at
that accuracy we have those to refer to.

Do you have a usage model in which we really care about the number of
pages in the system to that level of accuracy?

-apw

2006-08-09 14:19:04

by moreau francis

[permalink] [raw]
Subject: Re : Re : Re : Re : sparsemem usage

Andy Whitcroft wrote:
>
> I have to re-iterate pfn_valid() does not mean pfn_valid_memory(), it
> means pfn_valid_memmap(). If you want to know if a page is valid and
> memory (at least on x86) you could use:
>
> if (pfn_valid(pfn) && page_is_ram(pfn)) {
> }
>
> It is rare you care how many real page frames there are in the system.
> You are more interested in how many usable frames there are. Such as
> for use in sizing hashes or caches. The reserved pages should be
> excluded in this calculation. ACPI pages, BIOS pages and the like
> simply are no interest.
>
> I don't see anywhere in the kernel using that construct to work out how
> many pages there are in the system. Mostly we have architectual
> information to tell us what real physical pages exist in the system such
> as the srat or e820 etc. If we really care about real page counts at
> that accuracy we have those to refer to.
>

Not all arch have page_is_ram(). OK it should be easy to have but we will
need create new data structures to keep this info. The point is that it's
really easy for memory model such sparsemem to keep this info.

> Do you have a usage model in which we really care about the number of
> pages in the system to that level of accuracy?
>

show_mem(), which is arch specific, needs to report them. And some
implementations use only pfn_valid().

thanks

Francis


2006-08-10 04:43:58

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: Re : Re : Re : Re : sparsemem usage

On Wed, 9 Aug 2006 14:19:01 +0000 (GMT)
moreau francis <[email protected]> wrote:

> Not all arch have page_is_ram(). OK it should be easy to have but we will
> need create new data structures to keep this info. The point is that it's
> really easy for memory model such sparsemem to keep this info.
>
> > Do you have a usage model in which we really care about the number of
> > pages in the system to that level of accuracy?
> >
>
> show_mem(), which is arch specific, needs to report them. And some
> implementations use only pfn_valid().
>

BTW, ioresouce information (see kernel/resouce.c)

[kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
00000000-0009fbff : System RAM
000a0000-000bffff : Video RAM area
00100000-2dfeffff : System RAM

is not enough ?

I think kdump depends on this resouce information to determine
where should be dumped.

-Kame


2006-08-10 12:40:54

by moreau francis

[permalink] [raw]
Subject: Re : sparsemem usage

KAMEZAWA Hiroyuki wrote:
> On Wed, 9 Aug 2006 14:19:01 +0000 (GMT)
> moreau francis <[email protected]> wrote:
>
>> Not all arch have page_is_ram(). OK it should be easy to have but we will
>> need create new data structures to keep this info. The point is that it's
>> really easy for memory model such sparsemem to keep this info.
>>
>>> Do you have a usage model in which we really care about the number of
>>> pages in the system to that level of accuracy?
>>>
>> show_mem(), which is arch specific, needs to report them. And some
>> implementations use only pfn_valid().
>>
>
> BTW, ioresouce information (see kernel/resouce.c)
>
> [kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
> 00000000-0009fbff : System RAM
> 000a0000-000bffff : Video RAM area
> 00100000-2dfeffff : System RAM
>
> is not enough ?
>

well actually you show that to get a really simple information, ie does
a page exist ?, we need to parse some kernel data structures like
ioresource (which is, IMHO, hackish) or duplicate in each architecture
some data to keep track of existing pages.

Francis

2006-08-10 12:50:31

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: Re : sparsemem usage

On Thu, 10 Aug 2006 14:40:52 +0200 (CEST)
moreau francis <[email protected]> wrote:
> > BTW, ioresouce information (see kernel/resouce.c)
> >
> > [kamezawa@aworks Development]$ cat /proc/iomem | grep RAM
> > 00000000-0009fbff : System RAM
> > 000a0000-000bffff : Video RAM area
> > 00100000-2dfeffff : System RAM
> >
> > is not enough ?
> >
>
> well actually you show that to get a really simple information, ie does
> a page exist ?, we need to parse some kernel data structures like
> ioresource (which is, IMHO, hackish) or duplicate in each architecture
> some data to keep track of existing pages.
>

becasue memory map from e820(x86) or efi(ia64) are registered to iomem_resource,
we should avoid duplicates that information. kdump and memory hotplug uses
this information. (memory hotplug updates this iomem_resource.)

Implementing "page_is_exist" function based on ioresouce is one of generic
and rubust way to go, I think.
(if performance of list walking is problem, enhancing ioresouce code is
better.)

-Kame