2004-04-08 16:32:57

by Andy Whitcroft

Subject: HUGETLB commit handling.

We have been looking at the HUGETLB page commit issue (offlist) and are
close to a final merged patch. However, our testing seems to have thrown up
an inconsistency in the interface which we are not sure whether to fix.

With normal shm segments we commit the pages we will need at shmget()
time, the real pages being allocated on demand. With hugetlb pages we
currently do not manage commit, but allocate them on map, shmat() in this
case. When we add commit handling it would seem most appropriate to commit
the pages in shmget(), as for small page mappings. However, this might seem
to change the semantics slightly, in that if there are insufficient huge
pages available then the failure would come at shmget() and not shmat()
time.

I would contend this is the right thing to do, as it makes the semantics of
hugepages match that of the existing small pages. We are looking for a
consensus as this might be construed as a semantic change.

Thoughts.

-apw



2004-04-08 17:05:58

by Andi Kleen

Subject: Re: HUGETLB commit handling.

Andy Whitcroft <[email protected]> writes:

> We have been looking at the HUGETLB page commit issue (offlist) and are
> close to a final merged patch. However, our testing seems to have thrown up

This includes lazy allocation for i386 and IA64, right?

If yes, I'm waiting for your final patch, and will then remerge the NUMA
policy code into it (currently the NUMA API contains a dumb version of lazy
allocation for i386 without any prereservation).

> I would contend this is the right thing to do, as it makes the semantics of
> hugepages match that of the existing small pages. We are looking for a
> consensus as this might be construed as a semantic change.

I think it's cleaner to do it at shmget() time too, so it's probably the
right thing to do.

-Andi

2004-04-08 17:22:18

by Ray Bryant

Subject: Re: HUGETLB commit handling.

Andi,

Yes, that is the plan we are heading for. However, to keep things simple and
follow the "submit a patch that does one thing" rule, we will likely do two
patches: one to add hugetlb commit handling, and a second to add lazy
allocation for i386 and IA64.

The other problem we are wrestling with is how to do the i386 and ia64 lazy
allocation code without breaking the architectures that haven't yet switched
to lazy allocation. There will probably be some

#define ARCH_USES_HUGETLB_PREFAULT

nonsense added to deal with the latter, if needed.

Then, further down the road, we'd like to get the code that is common
across architectures moved up from arch/*/mm to mm.

Andi Kleen wrote:
> Andy Whitcroft <[email protected]> writes:
>
>
>>We have been looking at the HUGETLB page commit issue (offlist) and are
>>close to a final merged patch. However, our testing seems to have thrown up
>
>
> This includes lazy allocation for i386 and IA64, right?
>
> If yes, I'm waiting for your final patch then to remerge the NUMA
> policy code into it (currently NUMA API contains a dumb version of lazy
> allocation for i386 without any prereservation)
>
>
>>I would contend this is the right thing to do, as it makes the semantics of
>>hugepages match that of the existing small pages. We are looking for a
>>consensus as this might be construed as a semantic change.
>
>
> I think it's more clean to do it at shmget() time too, so it's probably the
> right thing to do.
>
> -Andi
>
>

--
Best Regards,
Ray
-----------------------------------------------
Ray Bryant
512-453-9679 (work) 512-507-7807 (cell)
[email protected] [email protected]
The box said: "Requires Windows 98 or better",
so I installed Linux.
-----------------------------------------------

2004-04-08 17:52:03

by Andi Kleen

Subject: Re: HUGETLB commit handling.

> The other problem we are wrestling with is how to do the i386 and ia64
> lazy allocation code without breaking the architectures that haven't yet
> switched to lazy allocation. There will probably be some
>
> #define ARCH_USES_HUGETLB_PREFAULT
>
> nonsense added to deal with the latter, if needed.

In my patch I just used weak functions: use a dummy weak function
in the high level code and overwrite from the architecture specific
code as needed. This avoids all the ifdefs.

-Andi

2004-04-08 21:59:36

by Rohit Seth

Subject: RE: HUGETLB commit handling.

Andy Whitcroft <> wrote on Thursday, April 08, 2004 9:36 AM:

> We have been looking at the HUGETLB page commit issue (offlist) and
> are close to a final merged patch. However, our testing seems to have
> thrown up an inconsistency in interface which we are not sure whether
> to fix or not.
>
> With normal shm segments we commit the pages we will need at shmget()
> time.
> The real pages being allocated on demand. With hugetlb pages we
> currently do not manage commit, but allocate them on map, shmat() in
> this case. When we add commit handling it would seem most
> appropriate to commit the pages in shmget() as for small page
> mappings. However, this might seem to change the semantics slightly,
> in that if there is insufficient hugepages available then the failure
> would come at shmget() and not shmat() time.
>
> I would contend this is the right thing to do, as it makes the
> semantics of hugepages match that of the existing small pages. We
> are looking for a consensus as this might be construed as a semantic
> change.
>

IMO, doing this accounting check at shmget() time seems reasonable as it
aligns the accounting semantics of normal and huge pages.


> Thoughts.
>
> -apw

2004-04-08 22:46:13

by Andrew Morton

Subject: Re: HUGETLB commit handling.

Andy Whitcroft <[email protected]> wrote:
>
> We have been looking at the HUGETLB page commit issue (offlist) and are
> close to a final merged patch.

Be aware that I've merged a patch from Bill which does all the hugetlb code
unduplication. A thousand lines gone:

25-akpm/arch/i386/mm/hugetlbpage.c | 264 ----------------------------------
25-akpm/arch/ia64/mm/hugetlbpage.c | 251 --------------------------------
25-akpm/arch/ppc64/mm/hugetlbpage.c | 257 ---------------------------------
25-akpm/arch/sh/mm/hugetlbpage.c | 258 ---------------------------------
25-akpm/arch/sparc64/mm/hugetlbpage.c | 259 ---------------------------------
25-akpm/fs/hugetlbfs/inode.c | 2
25-akpm/include/linux/hugetlb.h | 7
25-akpm/kernel/sysctl.c | 6
25-akpm/mm/Makefile | 1
25-akpm/mm/hugetlb.c | 245 +++++++++++++++++++++++++++++++
10 files changed, 263 insertions(+), 1287 deletions(-)

Of course, this buggers up everyone else's patches, but I do think this
work has to come first.

I still need to test this on ppc64 and ia64. I've dropped a rollup against
2.6.5 at http://www.zip.com.au/~akpm/linux/patches/stuff/mc3.bz2 which you
should work against until I get -mc3 out for real.

2004-04-13 10:22:43

by Andy Whitcroft

Subject: Re: HUGETLB commit handling.

--On Thursday, April 08, 2004 15:47:42 -0700 Andrew Morton <[email protected]> wrote:

> Andy Whitcroft <[email protected]> wrote:
>>
>> We have been looking at the HUGETLB page commit issue (offlist) and are
>> close to a final merged patch.
>
> Be aware that I've merged a patch from Bill which does all the hugetlb code
> unduplication. A thousand lines gone:
>
> 25-akpm/arch/i386/mm/hugetlbpage.c | 264 ----------------------------------
> 25-akpm/arch/ia64/mm/hugetlbpage.c | 251 --------------------------------
> 25-akpm/arch/ppc64/mm/hugetlbpage.c | 257 ---------------------------------
> 25-akpm/arch/sh/mm/hugetlbpage.c | 258 ---------------------------------
> 25-akpm/arch/sparc64/mm/hugetlbpage.c | 259 ---------------------------------
> 25-akpm/fs/hugetlbfs/inode.c | 2
> 25-akpm/include/linux/hugetlb.h | 7
> 25-akpm/kernel/sysctl.c | 6
> 25-akpm/mm/Makefile | 1
> 25-akpm/mm/hugetlb.c | 245 +++++++++++++++++++++++++++++++
> 10 files changed, 263 insertions(+), 1287 deletions(-)
>
> Of course, this buggers up everyone else's patches, but I do think this
> work has to come first.
>
> I still need to test this on ppc64 and ia64. I've dropped a rollup against
> 2.6.5 at http://www.zip.com.au/~akpm/linux/patches/stuff/mc3.bz2 which you
> should work against until I get -mc3 out for real.

After bashing my poor bruised head against the screen for a
considerable period I've discovered that memset'ing your IO space
to zero is a very good way to stop your machine dead, silently.
Anyhow, here is a patch against -mc4 to make HUGETLB support actually
work in the presence of memory in ZONE_HIGHMEM.

-apw

=== 8< ===
When clearing a large page allocation ensure we use a page clear function
which will correctly clear a ZONE_HIGHMEM page.

---
hugetlb.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletion(-)

diff -upN reference/mm/hugetlb.c current/mm/hugetlb.c
--- reference/mm/hugetlb.c 2004-04-13 12:10:56.000000000 +0100
+++ current/mm/hugetlb.c 2004-04-13 12:12:20.000000000 +0100
@@ -9,6 +9,7 @@
#include <linux/mm.h>
#include <linux/hugetlb.h>
#include <linux/sysctl.h>
+#include <linux/highmem.h>

const unsigned long hugetlb_zero = 0, hugetlb_infinity = ~0UL;
static unsigned long nr_huge_pages, free_huge_pages;
@@ -66,6 +67,7 @@ void free_huge_page(struct page *page)
struct page *alloc_huge_page(void)
{
struct page *page;
+ int i;

spin_lock(&hugetlb_lock);
page = dequeue_huge_page();
@@ -77,7 +79,8 @@ struct page *alloc_huge_page(void)
spin_unlock(&hugetlb_lock);
set_page_count(page, 1);
page->lru.prev = (void *)free_huge_page;
- memset(page_address(page), 0, HPAGE_SIZE);
+ for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i)
+ clear_highpage(&page[i]);
return page;
}