This patch makes the function make_huge_pte non-static, so it can be used
by drivers that want to mmap huge pages. Consequently, a prototype for the
function is added to hugetlb.h. Since I was looking here, I noticed some
coding style problems in the function and fix them with this patch.
Signed-off-by: Mark Rustad <[email protected]>
include/linux/hugetlb.h | 1 +
mm/hugetlb.c | 13 ++++++-------
2 files changed, 7 insertions(+), 7 deletions(-)
--- a/include/linux/hugetlb.h 2005-11-28 15:58:37.000000000 -0600
+++ b/include/linux/hugetlb.h 2005-12-08 10:38:36.099028314 -0600
@@ -24,6 +24,7 @@ int is_hugepage_mem_enough(size_t);
unsigned long hugetlb_total_pages(void);
struct page *alloc_huge_page(void);
void free_huge_page(struct page *);
+pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page);
int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, int write_access);
--- a/mm/hugetlb.c 2005-11-29 12:11:43.794928597 -0600
+++ b/mm/hugetlb.c 2005-12-08 10:43:43.983294767 -0600
@@ -261,16 +261,15 @@ struct vm_operations_struct hugetlb_vm_o
.nopage = hugetlb_nopage,
};
-static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page)
+pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page)
{
pte_t entry;
- if (vma->vm_flags & VM_WRITE) {
- entry =
- pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
- } else {
- entry = pte_wrprotect(mk_pte(page, vma->vm_page_prot));
- }
+ entry = mk_pte(page, vma->vm_page_prot);
+ if (vma->vm_flags & VM_WRITE)
+ entry = pte_mkwrite(pte_mkdirty(entry));
+ else
+ entry = pte_wrprotect(entry);
entry = pte_mkyoung(entry);
entry = pte_mkhuge(entry);
--
Mark Rustad, [email protected]
On Fri, 2005-12-09 at 10:39 -0600, Mark Rustad wrote:
> This patch makes the function make_huge_pte non-static, so it can be used
> by drivers that want to mmap huge pages. Consequently, a prototype for the
> function is added to hugetlb.h. Since I was looking here, I noticed some
> coding style problems in the function and fix them with this patch.
>
> Signed-off-by: Mark Rustad <[email protected]>
Call me crazy, but I cringe when I think of any old driver directly
mucking with huge_ptes. Forgive me if I am missing something, but why
can't you just call do_mmap with a hugetlbfs file like everyone else?
Otherwise, the CodingStyle cleanups look alright.
--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
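For reference, the hugetlbfs route Adam describes looks roughly like this from userspace. This is a sketch: the /mnt/huge mount point and file name are assumptions, and the mapping length must be a multiple of the huge page size:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define LENGTH	(16UL * 1024 * 1024)	/* must be a multiple of the hugepage size */

int main(void)
{
	/* assumes hugetlbfs is mounted: mount -t hugetlbfs none /mnt/huge */
	int fd = open("/mnt/huge/example", O_CREAT | O_RDWR, 0600);
	char *p;

	if (fd < 0) {
		perror("open");
		return 1;
	}
	p = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		close(fd);
		return 1;
	}
	p[0] = 1;	/* the mapping is backed by huge pages from the pool */
	munmap(p, LENGTH);
	close(fd);
	return 0;
}

Each huge page obtained this way is physically contiguous internally, but consecutive huge pages in the file are not guaranteed to be adjacent to one another - which is the constraint Mark raises next.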
On Fri, 2005-12-09 at 10:39 -0600, Mark Rustad wrote:
> This patch makes the function make_huge_pte non-static, so it can be used
> by drivers that want to mmap huge pages. Consequently, a prototype for the
> function is added to hugetlb.h. Since I was looking here, I noticed some
> coding style problems in the function and fix them with this patch.
What driver needs to map huge pages? Is it in the kernel tree now? If
not, can you post the source, please?
-- Dave
On Dec 9, 2005, at 11:05 AM, Adam Litke wrote:
> On Fri, 2005-12-09 at 10:39 -0600, Mark Rustad wrote:
>> This patch makes the function make_huge_pte non-static, so it can be used
>> by drivers that want to mmap huge pages. Consequently, a prototype for the
>> function is added to hugetlb.h. Since I was looking here, I noticed some
>> coding style problems in the function and fix them with this patch.
>>
>> Signed-off-by: Mark Rustad <[email protected]>
>
> Call me crazy, but I cringe when I think of any old driver directly
> mucking with huge_ptes. Forgive me if I am missing something, but why
> can't you just call do_mmap with a hugetlbfs file like everyone else?
> Otherwise, the CodingStyle cleanups look alright.
That would be nice, but we need multiple, contiguous huge pages.
Actually, about 768M worth. Yeah, I guess I'll stipulate that what
we're doing is pretty crazy, but it works well. I figure if I can
call alloc_huge_page, I should be able to remap such a page.
Actually, I would prefer an explicit remap call for this purpose, but
in doing my own I found that I needed precisely the code that was
already in make_huge_pte.
I don't have any strong feeling about whether this is accepted or
not. I just thought that I should share a change that might be useful
to others.
--
Mark Rustad, [email protected]
On Dec 9, 2005, at 11:16 AM, Dave Hansen wrote:
> What driver needs to map huge pages? Is it in the kernel tree now?
> If not, can you post the source, please?
It is a funky driver for an embedded system. I can't imagine it ever
being in the kernel tree, because not many people want to share 768M
of contiguous physical memory.
I can post the source, but it really is a bunch of random stuff for
an embedded application. We do make it available as part of our GPL
source release to customers.
--
Mark Rustad, [email protected]
On Fri, 2005-12-09 at 11:55 -0600, Mark Rustad wrote:
> On Dec 9, 2005, at 11:16 AM, Dave Hansen wrote:
>
> > What driver needs to map huge pages? Is it in the kernel tree now?
> > If not, can you post the source, please?
>
> It is a funky driver for an embedded system. I can't imagine it ever
> being in the kernel tree, because not many people want to share 768M
> of contiguous physical memory.
>
> I can post the source, but it really is a bunch of random stuff for
> an embedded application. We do make it available as part of our GPL
> source release to customers.
You'd be surprised. If we know what you're actually trying to do, we
might be able to suggest another option. As Adam said, having userspace
mmap a hugetlb area, then hand it to the driver would certainly keep
your kernel modifications to a minimum.
-- Dave
On Dec 9, 2005, at 12:08 PM, Dave Hansen wrote:
> On Fri, 2005-12-09 at 11:55 -0600, Mark Rustad wrote:
>> On Dec 9, 2005, at 11:16 AM, Dave Hansen wrote:
>>
>>> What driver needs to map huge pages? Is it in the kernel tree now?
>>> If not, can you post the source, please?
>>
>> It is a funky driver for an embedded system. I can't imagine it ever
>> being in the kernel tree, because not many people want to share 768M
>> of contiguous physical memory.
>>
>> I can post the source, but it really is a bunch of random stuff for
>> an embedded application. We do make it available as part of our GPL
>> source release to customers.
>
> You'd be surprised. If we know what you're actually trying to do, we
> might be able to suggest another option. As Adam said, having userspace
> mmap a hugetlb area, then hand it to the driver would certainly keep
> your kernel modifications to a minimum.
Actually, the driver never touches any of the memory at all either
directly or indirectly - it simply maps the memory for the processes
that use it. Those processes actually contain PCI device drivers
which do DMA on much of the memory. The same memory also holds data
structures shared by those processes. If hugetlbfs could be
guaranteed to provide contiguous memory for a file, that could be
used in this application. We used to use remap_pfn_range in our
driver, but recent changes there made that not work for this
application, so I figured I may as well switch to huge pages for
these monster areas. In particular, gdb was unable to access these
large shared areas, which is a deal-breaker for our developers.
In case you are wondering, these processes were ported onto Linux
from a different, non-x86-based, system that had three cpus which
could directly address each other's memory. They have now been
running happily on top of x86 Linux for well over a year.
--
Mark Rustad, [email protected]
On Fri, 9 Dec 2005, Mark Rustad wrote:
>
> If hugetlbfs could be guaranteed to provide contiguous memory for a file, that
> could be used in this application. We used to use remap_pfn_range in our
> driver, but recent changes there made that not work for this application, so I
You're not the only one to have trouble with recent remap_pfn_range changes.
Would you let us know what you were doing, that you can no longer do?
Some of the change may need to be reverted.
Hugh
On Dec 9, 2005, at 2:37 PM, Hugh Dickins wrote:
> On Fri, 9 Dec 2005, Mark Rustad wrote:
>>
>> If hugetlbfs could be guaranteed to provide contiguous memory for a
>> file, that could be used in this application. We used to use
>> remap_pfn_range in our driver, but recent changes there made that not
>> work for this application, so I
>
> You're not the only one to have trouble with recent remap_pfn_range
> changes. Would you let us know what you were doing, that you can no
> longer do? Some of the change may need to be reverted.
Well, our driver had been allocating two 320MB and one 128MB range of
memory, each of the three being contiguous. These were allocated by
allocating lots of 1MB groups of pages until we got a contiguous
range, then the unneeded pages were freed.
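To make the mechanics concrete, here is a hypothetical reconstruction of that scheme. The chunk order, limits, and function name are illustrative, not the driver's actual code; with 4K pages, order 8 gives the 1MB chunks described:

#include <linux/mm.h>
#include <linux/gfp.h>

#define CHUNK_ORDER	8			/* 2^8 * 4K pages = 1MB chunks */
#define CHUNK_PAGES	(1UL << CHUNK_ORDER)
#define MAX_CHUNKS	1024			/* hold at most 1GB while hunting */

/*
 * Allocate 1MB chunks and hold on to all of them until `want' chunks
 * line up physically back to back, then free every chunk outside that
 * run. The static array makes this non-reentrant; init-time use only.
 */
static struct page *grab_contiguous(int want)
{
	static struct page *chunk[MAX_CHUNKS];
	int n = 0, i, j, run = -1;

	while (n < MAX_CHUNKS && run < 0) {
		struct page *p = alloc_pages(GFP_KERNEL, CHUNK_ORDER);

		if (!p)
			break;			/* out of memory, give up */
		/* keep the array sorted by pfn (insertion sort step) */
		for (i = n; i > 0 && page_to_pfn(chunk[i - 1]) > page_to_pfn(p); i--)
			chunk[i] = chunk[i - 1];
		chunk[i] = p;
		n++;
		/* does any window of `want' chunks have consecutive pfns? */
		for (i = 0; i + want <= n && run < 0; i++) {
			for (j = 1; j < want; j++)
				if (page_to_pfn(chunk[i + j]) !=
				    page_to_pfn(chunk[i]) + j * CHUNK_PAGES)
					break;
			if (j == want)
				run = i;
		}
	}
	/* free everything outside the run (or everything, on failure) */
	for (i = 0; i < n; i++)
		if (run < 0 || i < run || i >= run + want)
			__free_pages(chunk[i], CHUNK_ORDER);
	return run >= 0 ? chunk[run] : NULL;
}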
These areas were then mapped into the application with
remap_pfn_range. We have been running on a SuSE kernel derived from
2.6.5 for a long time where this worked fine, even for gdb to access
during debugging. Now that we are moving to a more current kernel,
changes were needed mainly to allow gdb to access these shared memory
areas.
I had messed with simply taking the large memory by restricting the
kernel's memory range with mem=, but gdb still can't get to the pages
because it believes that they are for I/O (there would be no struct
page in that case).
Given the situation, using hugepages seemed more attractive anyway,
so I just decided to go that way and specify hugepages=192 on the
kernel command line.
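The arithmetic works out because a huge page on non-PAE i386 is 4MB, so hugepages=192 reserves 192 x 4MB = 768MB at boot (with 2MB huge pages under PAE, the same amount would take hugepages=384). The reservation can be sanity-checked after boot; illustrative output:

$ grep Huge /proc/meminfo
HugePages_Total:   192
HugePages_Free:    192
Hugepagesize:     4096 kB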
We also have a single page shared between our processes and the
driver, but we now use the new vm_insert_page call for that,
which works nicely. It seemed to me that calling that for each of
the single pages in our 768M of shared memory was silly, so I went
the hugepage route, and that proved to be less trouble than I had
expected. I feel like things now are really where they should have
been all along.
--
Mark Rustad, [email protected]
On Fri, 9 Dec 2005, Mark Rustad wrote:
> On Dec 9, 2005, at 2:37 PM, Hugh Dickins wrote:
> >
> > You're not the only one to have trouble with recent remap_pfn_range
> > changes.
> > Would you let us know what you were doing, that you can no longer do?
> > Some of the change may need to be reverted.
>
> Well, our driver had been allocating two 320MB and one 128MB range of memory,
> each of the three being contiguous. These were allocated by allocating lots of
> 1MB groups of pages until we got a contiguous range, then the unneeded pages
> were freed.
I can understand that you might be dissatisfied with that.
> These areas were then mapped into the application with remap_pfn_range. We
> have been running on a SuSE kernel derived from 2.6.5 for a long time where
> this worked fine, even for gdb to access during debugging. Now that we are
> moving to a more current kernel, changes were needed mainly to allow gdb to
> access these shared memory areas.
Okay, I think I get the picture. 2.6.15-rc5 would work if you used
three adjacent mmaps, but that would involve changes to your driver and
to your userspace, so you thought better to do it another way anyway.
> I had messed with simply taking the large memory by restricting the kernel's
> memory range with mem=, but gdb still can't get to the pages because it
> believes that they are for I/O (there would be no struct page in that case).
>
> Given the situation, using hugepages seemed more attractive anyway, so I just
> decided to go that way and specify hugepages=192 on the kernel command line.
>
> We also have a single page shared between our processes and the driver, but we
> now use the new vm_insert_page call for that, which works nicely. It
> seemed to me that calling that for each of the single pages in our 768M of
> shared memory was silly, so I went the hugepage route, and that proved to be
> less trouble than I had expected. I feel like things now are really where they
> should have been all along.
Hmm. Well, I share the doubts Dave and Adam have expressed. Out-of-tree
drivers making up their own page tables are likely to break and be broken,
and the more so once you get into hugepages. You'll be much more portable
from release to release if you stick with lots of vm_insert_pages, silly
as all that does seem, yes. Sorry, I don't have a better answer to hand.
Hugh
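For concreteness, the route Hugh suggests would look roughly like this in a driver's mmap hook. A sketch only: the function name and the origin of `first' are assumptions, vm_insert_page is the 2.6.15 helper, and a flat mem_map is assumed so that first + i walks physically consecutive pages. At 4K per page, 768M means 196,608 such calls - the "silly" but robust part.

#include <linux/mm.h>

/*
 * Populate the vma one 4K page at a time with vm_insert_page()
 * instead of building huge page tables by hand. `first' is assumed
 * to point at the start of the driver's physically contiguous region
 * of refcounted order-0 pages.
 */
static int map_region(struct vm_area_struct *vma, struct page *first)
{
	unsigned long npages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
	unsigned long i, addr = vma->vm_start;
	int err;

	for (i = 0; i < npages; i++, addr += PAGE_SIZE) {
		err = vm_insert_page(vma, addr, first + i);
		if (err)
			return err;	/* caller unmaps on failure */
	}
	return 0;
}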
On Dec 9, 2005, at 4:12 PM, Hugh Dickins wrote:
> On Fri, 9 Dec 2005, Mark Rustad wrote:
>> On Dec 9, 2005, at 2:37 PM, Hugh Dickins wrote:
>>>
>>> You're not the only one to have trouble with recent remap_pfn_range
>>> changes. Would you let us know what you were doing, that you can no
>>> longer do? Some of the change may need to be reverted.
>>
>> Well, our driver had been allocating two 320MB and one 128MB range of
>> memory, each of the three being contiguous. These were allocated by
>> allocating lots of 1MB groups of pages until we got a contiguous
>> range, then the unneeded pages were freed.
>
> I can understand that you might be dissatisfied with that.
Not dissatisfied really. It worked fine, not optimal of course, but
it worked. Since a change was forced by kernel changes, it made sense
to try to do better while making it work again.
>> These areas were then mapped into the application with remap_pfn_range.
>> We have been running on a SuSE kernel derived from 2.6.5 for a long
>> time where this worked fine, even for gdb to access during debugging.
>> Now that we are moving to a more current kernel, changes were needed
>> mainly to allow gdb to access these shared memory areas.
>
> Okay, I think I get the picture. 2.6.15-rc5 would work if you used
> three adjacent mmaps, but that would involve changes to your driver and
> to your userspace, so you thought better to do it another way anyway.
We had to move away from remap_pfn_range though, because it was no
longer usable if one wanted to be able to access the memory from gdb,
which is a requirement for debugging. Also, all of our shared memory
had been coming out of low memory - the change to hugepages has now
lifted that restriction, which is also a much better place to be.
>> I had messed with simply taking the large memory by restricting the
>> kernel's memory range with mem=, but gdb still can't get to the pages
>> because it believes that they are for I/O (there would be no struct
>> page in that case).
>>
>> Given the situation, using hugepages seemed more attractive anyway, so
>> I just decided to go that way and specify hugepages=192 on the kernel
>> command line.
>>
>> We also have a single page shared between our processes and the
>> driver, but we now use the new vm_insert_page call for that, which
>> works nicely. It seemed to me that calling that for each of the single
>> pages in our 768M of shared memory was silly, so I went the hugepage
>> route, and that proved to be less trouble than I had expected. I feel
>> like things now are really where they should have been all along.
>
> Hmm. Well, I share the doubts Dave and Adam have expressed.
> Out-of-tree drivers making up their own page tables are likely to
> break and be broken, and the more so once you get into hugepages.
> You'll be much more portable from release to release if you stick with
> lots of vm_insert_pages, silly as all that does seem, yes. Sorry, I
> don't have a better answer to hand.
Well, portability is less important than maintainability. I think
being able to call make_huge_pte likely makes what I'm doing somewhat
more maintainable than duplicating the code locally. I would prefer
an explicit call to do the mapping, but I don't really expect that
there are very many things doing memory mapping on this scale, so I
have not submitted a patch to implement that. I am also not confident
enough in my understanding of this area of the kernel to think
that I could provide an adequate patch, even though the code that I
have is working for our application.
Do you foresee problems in using the make_huge_pte function? I am
also calling alloc_huge_page, free_huge_page, huge_pte_alloc,
set_huge_pte_at and pte_none in my code. I only had to change
make_huge_pte in order to be able to call it - the other functions
were already callable. Note that in my case, the driver is built into
the kernel - it is not a module. My patch for changing make_huge_pte
does not export that symbol from the kernel for use by modules.
--
Mark Rustad, [email protected]