2005-10-07 21:21:34

by Rohit Seth

Subject: Re: FW: [PATCH 0/3] Demand faulting for huge pages

On Fri, 2005-10-07 at 10:47 -0700, Adam Litke wrote:
>
>
> If I were to spend time coding up a patch to remove truncation support
> for hugetlbfs, would it be something other people would want to see
> merged as well?
>

In its current form, there is very little use for the hugetlb truncate
functionality. Currently it only allows reducing the size of a hugetlb
backing file.

IMO it will be useful to keep and enhance this capability so that apps
can dynamically reduce or increase the size of backing files (for
example based on availability of memory at any time).

-rohit


2005-10-08 07:58:31

by Chen, Kenneth W

Subject: RE: FW: [PATCH 0/3] Demand faulting for huge pages

Rohit Seth wrote on Friday, October 07, 2005 2:29 PM
> On Fri, 2005-10-07 at 10:47 -0700, Adam Litke wrote:
> > If I were to spend time coding up a patch to remove truncation
> > support for hugetlbfs, would it be something other people would
> > want to see merged as well?
>
> In its current form, there is very little use for the hugetlb truncate
> functionality. Currently it only allows reducing the size of a hugetlb
> backing file.
>
> IMO it will be useful to keep and enhance this capability so that
> apps can dynamically reduce or increase the size of backing files
> (for example based on availability of memory at any time).

Yup, here is a patch to enhance that capability. It mostly brings
ftruncate on a hugetlbfs file a step closer to the semantics of
ftruncate on files in other file systems.


---
Add expanding ftruncate to hugetlbfs.

Signed-off-by: Ken Chen <[email protected]>

--- linux-2.6.14-rc3/fs/hugetlbfs/inode.c.orig 2005-10-07 18:07:38.131373873 -0700
+++ linux-2.6.14-rc3/fs/hugetlbfs/inode.c 2005-10-08 00:31:15.951404405 -0700
@@ -327,20 +327,20 @@ hugetlb_vmtruncate_list(struct prio_tree
 	}
 }

-/*
- * Expanding truncates are not allowed.
- */
 static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 {
 	unsigned long pgoff;
 	struct address_space *mapping = inode->i_mapping;
-
-	if (offset > inode->i_size)
-		return -EINVAL;
+	struct vm_area_struct *vma;
+	struct prio_tree_iter iter;
+	int ret = 0;

 	BUG_ON(offset & ~HPAGE_MASK);
 	pgoff = offset >> HPAGE_SHIFT;

+	if (offset > inode->i_size)
+		goto do_expand;
+
 	inode->i_size = offset;
 	spin_lock(&mapping->i_mmap_lock);
 	if (!prio_tree_empty(&mapping->i_mmap))
@@ -348,6 +348,18 @@ static int hugetlb_vmtruncate(struct ino
 	spin_unlock(&mapping->i_mmap_lock);
 	truncate_hugepages(mapping, offset);
 	return 0;
+
+do_expand:
+	spin_lock(&mapping->i_mmap_lock);
+	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, ULONG_MAX) {
+		ret = hugetlb_prefault(mapping, vma);
+		if (ret == 0)
+			inode->i_size = offset;
+		else
+			break;
+	}
+	spin_unlock(&mapping->i_mmap_lock);
+	return ret;
 }

 static int hugetlbfs_setattr(struct dentry *dentry, struct iattr *attr)
--- linux-2.6.14-rc3/mm/hugetlb.c.orig 2005-10-07 23:16:42.789349826 -0700
+++ linux-2.6.14-rc3/mm/hugetlb.c 2005-10-07 23:25:04.175085872 -0700
@@ -340,7 +340,7 @@ void zap_hugepage_range(struct vm_area_s

 int hugetlb_prefault(struct address_space *mapping, struct vm_area_struct *vma)
 {
-	struct mm_struct *mm = current->mm;
+	struct mm_struct *mm = vma->vm_mm;
 	unsigned long addr;
 	int ret = 0;

@@ -360,6 +360,8 @@ int hugetlb_prefault(struct address_spac
 			ret = -ENOMEM;
 			goto out;
 		}
+		if (pte_present(*pte))
+			continue;

 		idx = ((addr - vma->vm_start) >> HPAGE_SHIFT)
 			+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
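
The index arithmetic used by the patch can be checked in isolation. The
sketch below is a userspace model, not kernel code. It assumes 4 KiB base
pages and 2 MiB huge pages (both are architecture dependent), and
model_vma, hpage_index and truncate_pgoff are hypothetical names that
mirror only the fields and expressions visible in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page geometry; the real values are architecture dependent. */
#define PAGE_SHIFT  12				/* 4 KiB base pages */
#define HPAGE_SHIFT 21				/* 2 MiB huge pages */
#define HPAGE_SIZE  (UINT64_C(1) << HPAGE_SHIFT)
#define HPAGE_MASK  (~(HPAGE_SIZE - 1))

/* Hypothetical stand-in for the two vm_area_struct fields the patch reads. */
struct model_vma {
	uint64_t vm_start;	/* first virtual address of the mapping */
	uint64_t vm_pgoff;	/* file offset of the mapping, in base pages */
};

/*
 * Mirror of the idx expression in hugetlb_prefault(): the index, in huge
 * pages, of the file page that backs virtual address addr.
 */
static uint64_t hpage_index(const struct model_vma *vma, uint64_t addr)
{
	assert(addr >= vma->vm_start);
	return ((addr - vma->vm_start) >> HPAGE_SHIFT)
		+ (vma->vm_pgoff >> (HPAGE_SHIFT - PAGE_SHIFT));
}

/*
 * Mirror of the pgoff computation in hugetlb_vmtruncate(). The kernel uses
 * BUG_ON() for unaligned offsets; this model uses assert() instead.
 */
static uint64_t truncate_pgoff(uint64_t offset)
{
	assert((offset & ~HPAGE_MASK) == 0);
	return offset >> HPAGE_SHIFT;
}
```

With these assumed sizes, a mapping whose vm_pgoff is 1024 base pages
starts at huge page 2 of the file, so the address three huge pages into
the mapping corresponds to file index 5.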

2005-10-09 12:28:19

by Hugh Dickins

Subject: RE: FW: [PATCH 0/3] Demand faulting for huge pages

On Sat, 8 Oct 2005, Chen, Kenneth W wrote:
> Rohit Seth wrote on Friday, October 07, 2005 2:29 PM
> > On Fri, 2005-10-07 at 10:47 -0700, Adam Litke wrote:
> > > If I were to spend time coding up a patch to remove truncation
> > > support for hugetlbfs, would it be something other people would
> > > want to see merged as well?
> >
> > In its current form, there is very little use for the hugetlb truncate
> > functionality. Currently it only allows reducing the size of a hugetlb
> > backing file.

And is that functionality actually used?

> > IMO it will be useful to keep and enhance this capability so that
> > apps can dynamically reduce or increase the size of backing files
> > (for example based on availability of memory at any time).

And is that functionality actually being asked for?

> Yup, here is a patch to enhance that capability. It mostly brings
> ftruncate on a hugetlbfs file a step closer to the semantics of
> ftruncate on files in other file systems.

Well, it's peculiar semantics that extending a file slots its pages
into existing mmaps, as in your patch. Though that may indeed match
the existing prefault semantics for hugetlb mmaps and files. But in
those existing peculiar semantics, the file can already be extended,
by mmaping further, so you're not really adding new capability.
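
For comparison, this is roughly what an expanding ftruncate looks like on
an ordinary filesystem, where nothing is slotted into existing mappings up
front and pages are faulted in at first touch. It is a minimal sketch that
assumes a Linux system with a writable /tmp; extend_then_touch is a
hypothetical helper name, and the file is a regular file, not a hugetlbfs
one.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if a byte stored through a pre-existing shared mapping reaches
 * the file after an expanding ftruncate(), 0 if it does not, and -1 if the
 * setup fails.
 */
static int extend_then_touch(void)
{
	char path[] = "/tmp/extend-demo-XXXXXX";
	long psz = sysconf(_SC_PAGESIZE);
	char got = 0;
	char *map;
	int fd, ok = -1;

	if (psz <= 0)
		return -1;
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* the open descriptor keeps the file alive */

	/* One page of file, but a two-page mapping over it. */
	if (ftruncate(fd, psz) < 0)
		goto out;
	map = mmap(NULL, 2 * psz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	/* Touching map[psz] at this point would raise SIGBUS (beyond EOF). */
	if (ftruncate(fd, 2 * psz) == 0) {
		map[psz] = 'x';	/* second page is faulted in on demand */
		if (msync(map, 2 * psz, MS_SYNC) == 0 &&
		    pread(fd, &got, 1, psz) == 1)
			ok = (got == 'x');
	}
	munmap(map, 2 * psz);
out:
	close(fd);
	return ok;
}
```

The expanding ftruncate here only changes the file size; the page behind
map[psz] does not exist until the store faults it in.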

But please don't expect me to decide one way or another. We all seem
to have different agendas for hugetlb. I'm interested in fixing the
existing bugs with truncation (see -mm), and getting the locking to
fit with my page_table_lock patches. Prohibiting truncation is an
attractively easy and efficient way of fixing several such problems.
Adam is interested in fault on demand, which needs further work if
truncation is allowed. You and Rohit are interested in enhancing
the generality of hugetlbfs.

I'd imagine supporting "read" and "write" would be the first priorities
if you were really trying to make hugetlbfs more like an ordinary fs.
But I thought it was intentionally kept at the minimum to do its job.

Hugh

2005-10-10 06:53:36

by Chen, Kenneth W

Subject: RE: FW: [PATCH 0/3] Demand faulting for huge pages

Hugh Dickins wrote on Sunday, October 09, 2005 5:27 AM
> We all seem
> to have different agendas for hugetlb. I'm interested in fixing the
> existing bugs with truncation (see -mm), and getting the locking to
> fit with my page_table_lock patches. Prohibiting truncation is an
> attractively easy and efficient way of fixing several such problems.
> Adam is interested in fault on demand, which needs further work if
> truncation is allowed. You and Rohit are interested in enhancing
> the generality of hugetlbfs.

IMO, these three things do not contradict each other. They are
orthogonal. Even though we may all be touching the same lines of
code, in the end, everyone is working toward better and more robust
hugetlb code.

Demand paging is one aspect of enhancing the generality of hugetlb. Intel
initially proposed the feature 18 months ago [* see link below] along
with SGI. Christoph Lameter at SGI scratched that subject in Oct 2004.
And now, Adam at IBM attempts it again. There is a growing need to
make hugetlb easier to use, more transparent in its use of hugetlb
pages, etc. All of this requires hugetlb code to be more generalized,
instead of reducing functionality.

Granted, the patch I posted on expanding ftruncate will be replaced
once demand paging goes in. I wanted to demonstrate that it is a
feature we should implement, instead of cutting back further on the
already thin functionality in hugetlbfs. (With demand paging, expanding
ftruncate should be really easy and clean, instead of having "peculiar
semantics" all because of prefaulting.)
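
The contrast with prefaulting can be made concrete with a toy model.
Under demand faulting, an expanding truncate only needs to move the file
size, and huge pages appear at first touch. The sketch below is a
userspace toy, not kernel code; model_file, model_expand, model_fault and
model_resident are hypothetical names, and the 8-page bound is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_HPAGES 8	/* arbitrary bound for the toy model */

struct model_file {
	size_t size_hpages;			/* file size, in huge pages */
	bool present[MODEL_MAX_HPAGES];		/* instantiated huge pages */
};

/* Expanding truncate under demand faulting: only the size changes. */
static int model_expand(struct model_file *f, size_t new_hpages)
{
	if (new_hpages > MODEL_MAX_HPAGES || new_hpages < f->size_hpages)
		return -1;
	f->size_hpages = new_hpages;
	return 0;
}

/* Fault path: instantiate a page on first touch, refuse beyond EOF. */
static int model_fault(struct model_file *f, size_t idx)
{
	if (idx >= f->size_hpages)
		return -1;	/* a real access beyond EOF gets SIGBUS */
	f->present[idx] = true;
	return 0;
}

/* Number of huge pages currently instantiated. */
static size_t model_resident(const struct model_file *f)
{
	size_t i, n = 0;

	for (i = 0; i < MODEL_MAX_HPAGES; i++)
		n += f->present[i];
	return n;
}
```

In this model, growing a file to four huge pages leaves zero pages
resident, and only the pages that are later touched get instantiated.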

- Ken

[*] http://marc.theaimsgroup.com/?l=linux-ia64&m=108189860401704&w=2

2005-10-10 09:35:05

by Andi Kleen

Subject: Re: FW: [PATCH 0/3] Demand faulting for huge pages

On Monday 10 October 2005 08:51, Chen, Kenneth W wrote:

> Demand paging is one aspect of enhancing the generality of hugetlb. Intel
> initially proposed the feature 18 months ago [* see link below] along
> with SGI. Christoph Lameter at SGI scratched that subject in Oct 2004.
> And now, Adam at IBM attempts it again. There is a growing need to
> make hugetlb easier to use, more transparent in its use of hugetlb
> pages, etc. All of this requires hugetlb code to be more generalized,
> instead of reducing functionality.

It's also badly needed to make hugetlbfs NUMA policy aware. mbind
requires allocation on demand, because it runs after mmap and
cannot fix up the policy when the pages are already allocated.

> Granted, the patch I posted on expanding ftruncate will be replaced
> once demand paging goes in. I wanted to demonstrate that it is a
> feature we should implement, instead of cutting back further on the
> already thin functionality in hugetlbfs. (With demand paging, expanding
> ftruncate should be really easy and clean, instead of having "peculiar
> semantics" all because of prefaulting.)

I would like to have it. I remember hating to implement extending
truncate by hand when I did the test programs for the hugetlbfs numa policy.

-Andi

2005-10-10 16:17:36

by Adam Litke

Subject: RE: FW: [PATCH 0/3] Demand faulting for huge pages

On Sun, 2005-10-09 at 13:27 +0100, Hugh Dickins wrote:
> On Sat, 8 Oct 2005, Chen, Kenneth W wrote:
> > Rohit Seth wrote on Friday, October 07, 2005 2:29 PM
> > > On Fri, 2005-10-07 at 10:47 -0700, Adam Litke wrote:
> > > > If I were to spend time coding up a patch to remove truncation
> > > > support for hugetlbfs, would it be something other people would
> > > > want to see merged as well?
> > >
> > > > In its current form, there is very little use for the hugetlb truncate
> > > > functionality. Currently it only allows reducing the size of a hugetlb
> > > > backing file.
>
> And is that functionality actually used?
>
> > > IMO it will be useful to keep and enhance this capability so that
> > > apps can dynamically reduce or increase the size of backing files
> > > (for example based on availability of memory at any time).
>
> And is that functionality actually being asked for?
>
> > Yup, here is a patch to enhance that capability. It mostly brings
> > ftruncate on a hugetlbfs file a step closer to the semantics of
> > ftruncate on files in other file systems.
>
> Well, it's peculiar semantics that extending a file slots its pages
> into existing mmaps, as in your patch. Though that may indeed match
> the existing prefault semantics for hugetlb mmaps and files. But in
> those existing peculiar semantics, the file can already be extended,
> by mmaping further, so you're not really adding new capability.
>
> But please don't expect me to decide one way or another. We all seem
> to have different agendas for hugetlb. I'm interested in fixing the
> existing bugs with truncation (see -mm), and getting the locking to
> fit with my page_table_lock patches. Prohibiting truncation is an
> attractively easy and efficient way of fixing several such problems.
> Adam is interested in fault on demand, which needs further work if
> truncation is allowed. You and Rohit are interested in enhancing
> the generality of hugetlbfs.
>
> I'd imagine supporting "read" and "write" would be the first priorities
> if you were really trying to make hugetlbfs more like an ordinary fs.
> But I thought it was intentionally kept at the minimum to do its job.

Honestly, I think there is an even more fundamental issue at hand. If
the goal is transparent and flexible use of huge pages, it seems to me
that there are two ways to go:

1) Continue with hugetlbfs and work to finish implementing all of the
operations (that make sense) properly (like read, write, truncate, etc).

2) Recognize that trying to use hugetlbfs files to transparently replace
normal memory is ultimately a hack. Normal memory is not implemented as
a file system so using hugetlb pages here will always cause headaches as
implemented. So work towards removing filesystem-like behaviour and
treating huge pages more like regular memory.

If we can all agree on 1 or 2 then it should be easier to make decisions
like this thread calls for. I'll put my vote in for #2. Thoughts?

--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

2005-10-11 03:11:11

by Andrew Morton

Subject: Re: FW: [PATCH 0/3] Demand faulting for huge pages

Adam Litke <[email protected]> wrote:
>
> Honestly, I think there is an even more fundamental issue at hand. If
> the goal is transparent and flexible use of huge pages, it seems to me
> that there are two ways to go:
>
> 1) Continue with hugetlbfs and work to finish implementing all of the
> operations (that make sense) properly (like read, write, truncate, etc).

hugetlbfs provides the API by which applications may obtain
hugetlb-page-backed memory. In fact the filesystem didn't even exist in the
initial version of the patch - the first version used specific syscalls to
obtain the hugepage memory.

So. Given that hugetlbfs is purely there as a means by which applications
can access (and share) hugepage memory, it doesn't make sense to flesh that
filesystem out any further. IOW: no need for read() and write().
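
As an illustration of that API, the following is a minimal sketch of how
an application might obtain hugepage-backed memory using nothing but
open() and mmap(). It assumes a hugetlbfs mount already exists and that
huge pages are 2 MiB; map_hugetlb_file is a hypothetical helper name, and
any path passed to it is the caller's assumption about where hugetlbfs is
mounted.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed huge page size; check Hugepagesize in /proc/meminfo. */
#define HPAGE_SIZE (2UL * 1024 * 1024)

/*
 * Map nr_hpages huge pages backed by the file at path, which is assumed
 * to live on a mounted hugetlbfs. Returns the mapping, or NULL on failure.
 */
static void *map_hugetlb_file(const char *path, size_t nr_hpages)
{
	void *addr;
	int fd;

	if (nr_hpages == 0)
		return NULL;
	fd = open(path, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return NULL;

	/* MAP_SHARED lets other processes mapping the same file share it. */
	addr = mmap(NULL, nr_hpages * HPAGE_SIZE, PROT_READ | PROT_WRITE,
		    MAP_SHARED, fd, 0);
	close(fd);	/* the mapping keeps the file referenced */
	return addr == MAP_FAILED ? NULL : addr;
}
```

Nothing in this path calls read() or write(); the file exists only as a
name through which the memory can be mapped and shared.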

> 2) Recognize that trying to use hugetlbfs files to transparently replace
> normal memory is ultimately a hack. Normal memory is not implemented as
> a file system so using hugetlb pages here will always cause headaches as
> implemented. So work towards removing filesystem-like behaviour and
> treating huge pages more like regular memory.

Early Linus diktat was that we shouldn't attempt to make the core MM aware
of multiple page sizes in the manner which you suggest. Trying to sneak
this in via "improved integration of hugepage support" would likely create
a mess.

The design approach for hugepage integration was that the MM would continue
to be focussed on a fixed page size and that hugepages would be some
non-intrusive thing off to the side - more like a mmappable device driver
than some core part of the MM system.

This is not all meant to say "don't do it". But I am saying that you'll
need to review several years' worth of discussion on the topic and
understand the downsides and objections, and be prepared for a big project.
One which risks causing Hugh a ton of grief in ongoing core MM
improvements.

Aside: one problem with the kernel's hugepage support is that it doesn't
have a single person who performs the overall maintenance function. Bill
Irwin was doing this for a while, but now seems to have gone quiet.

Consequently various people come in and attempt various
this-is-a-change-i-need operations. Problem is, with no single person
keeping track of who the affected stakeholders are, and what the likely
effects of each change upon the stakeholders will be, things proceed slowly
and various people end up maintaining various out-of-tree things (I think).

I attempt to plug the gaps, but the time intervals between flurries of
hugetlb activity are long and I forget who's doing what.