2014-06-12 19:15:51

by Waiman Long

Subject: [PATCH] mm: Move __vma_address() to internal.h to be inlined in huge_memory.c

The vma_address() function which is used to compute the virtual address
within a VMA is used only by 2 files in the mm subsystem - rmap.c and
huge_memory.c. This function is defined in rmap.c and is inlined by
its callers there, but it is also declared as an external function.

However, the __split_huge_page() function which calls vma_address()
in huge_memory.c is calling it as a real function call. This is not
as efficient as an inlined function. This patch moves the underlying
inlined __vma_address() function to internal.h to be shared by both
the rmap.c and huge_memory.c files.

Signed-off-by: Waiman Long <[email protected]>
---
 mm/huge_memory.c |  4 ++--
 mm/internal.h    | 19 +++++++++++++++----
 mm/rmap.c        | 14 --------------
 3 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b4b1feb..75a54ce 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1816,7 +1816,7 @@ static void __split_huge_page(struct page *page,
         mapcount = 0;
         anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
                 struct vm_area_struct *vma = avc->vma;
-                unsigned long addr = vma_address(page, vma);
+                unsigned long addr = __vma_address(page, vma);
                 BUG_ON(is_vma_temporary_stack(vma));
                 mapcount += __split_huge_page_splitting(page, vma, addr);
         }
@@ -1840,7 +1840,7 @@ static void __split_huge_page(struct page *page,
         mapcount2 = 0;
         anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
                 struct vm_area_struct *vma = avc->vma;
-                unsigned long addr = vma_address(page, vma);
+                unsigned long addr = __vma_address(page, vma);
                 BUG_ON(is_vma_temporary_stack(vma));
                 mapcount2 += __split_huge_page_map(page, vma, addr);
         }
diff --git a/mm/internal.h b/mm/internal.h
index 07b6736..3c9dbc2 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -13,6 +13,7 @@
 
 #include <linux/fs.h>
 #include <linux/mm.h>
+#include <linux/hugetlb.h>
 
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
                 unsigned long floor, unsigned long ceiling);
@@ -238,12 +239,22 @@ static inline void mlock_migrate_page(struct page *newpage, struct page *page)
         }
 }
 
+/*
+ * __vma_address - at what user virtual address is page expected in @vma?
+ */
+static inline unsigned long
+__vma_address(struct page *page, struct vm_area_struct *vma)
+{
+        pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+
+        if (unlikely(is_vm_hugetlb_page(vma)))
+                pgoff = page->index << huge_page_order(page_hstate(page));
+
+        return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
+}
+
 extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-extern unsigned long vma_address(struct page *page,
-                                 struct vm_area_struct *vma);
-#endif
 #else /* !CONFIG_MMU */
 static inline int mlocked_vma_newpage(struct vm_area_struct *v, struct page *p)
 {
diff --git a/mm/rmap.c b/mm/rmap.c
index 83bfafa..5ab9a74 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -509,21 +509,7 @@ void page_unlock_anon_vma_read(struct anon_vma *anon_vma)
         anon_vma_unlock_read(anon_vma);
 }
 
-/*
- * At what user virtual address is page expected in @vma?
- */
 static inline unsigned long
-__vma_address(struct page *page, struct vm_area_struct *vma)
-{
-        pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
-
-        if (unlikely(is_vm_hugetlb_page(vma)))
-                pgoff = page->index << huge_page_order(page_hstate(page));
-
-        return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
-}
-
-inline unsigned long
 vma_address(struct page *page, struct vm_area_struct *vma)
 {
         unsigned long address = __vma_address(page, vma);
--
1.7.1


2014-06-12 19:25:49

by Andrew Morton

Subject: Re: [PATCH] mm: Move __vma_address() to internal.h to be inlined in huge_memory.c

On Thu, 12 Jun 2014 15:15:40 -0400 Waiman Long <[email protected]> wrote:

> The vma_address() function which is used to compute the virtual address
> within a VMA is used only by 2 files in the mm subsystem - rmap.c and
> huge_memory.c. This function is defined in rmap.c and is inlined by
> its callers there, but it is also declared as an external function.
>
> However, the __split_huge_page() function which calls vma_address()
> in huge_memory.c is calling it as a real function call. This is not
> as efficient as an inlined function. This patch moves the underlying
> inlined __vma_address() function to internal.h to be shared by both
> the rmap.c and huge_memory.c files.

This increases huge_memory.o's text+data+bss by 311 bytes, which makes
me suspect that it is a bad change due to its increase of kernel cache
footprint.

Perhaps we should be noinlining __vma_address()?

2014-06-12 21:34:22

by Waiman Long

Subject: Re: [PATCH] mm: Move __vma_address() to internal.h to be inlined in huge_memory.c

On 06/12/2014 03:25 PM, Andrew Morton wrote:
> On Thu, 12 Jun 2014 15:15:40 -0400 Waiman Long<[email protected]> wrote:
>
>> The vma_address() function which is used to compute the virtual address
>> within a VMA is used only by 2 files in the mm subsystem - rmap.c and
>> huge_memory.c. This function is defined in rmap.c and is inlined by
>> its callers there, but it is also declared as an external function.
>>
>> However, the __split_huge_page() function which calls vma_address()
>> in huge_memory.c is calling it as a real function call. This is not
>> as efficient as an inlined function. This patch moves the underlying
>> inlined __vma_address() function to internal.h to be shared by both
>> the rmap.c and huge_memory.c files.
> This increases huge_memory.o's text+data_bss by 311 bytes, which makes
> me suspect that it is a bad change due to its increase of kernel cache
> footprint.
>
> Perhaps we should be noinlining __vma_address()?

On my test machine, I saw an increase of 144 bytes in the text segment
of huge_memory.o. The increase in size is caused by growth of the
__split_huge_page() function. When I remove the

        if (unlikely(is_vm_hugetlb_page(vma)))
                pgoff = page->index << huge_page_order(page_hstate(page));

check, the increase in size drops down to 24 bytes. As a THP cannot be
a hugetlb page, there is no point in doing this check for a THP. I will
update the patch to pass in an additional argument to disable this
check for __split_huge_page.

-Longman

2014-06-12 21:45:18

by David Rientjes

Subject: Re: [PATCH] mm: Move __vma_address() to internal.h to be inlined in huge_memory.c

On Thu, 12 Jun 2014, Waiman Long wrote:

> > > The vma_address() function which is used to compute the virtual address
> > > within a VMA is used only by 2 files in the mm subsystem - rmap.c and
> > > huge_memory.c. This function is defined in rmap.c and is inlined by
> > > its callers there, but it is also declared as an external function.
> > >
> > > However, the __split_huge_page() function which calls vma_address()
> > > in huge_memory.c is calling it as a real function call. This is not
> > > as efficient as an inlined function. This patch moves the underlying
> > > inlined __vma_address() function to internal.h to be shared by both
> > > the rmap.c and huge_memory.c files.
> > This increases huge_memory.o's text+data+bss by 311 bytes, which makes
> > me suspect that it is a bad change due to its increase of kernel cache
> > footprint.
> >
> > Perhaps we should be noinlining __vma_address()?
>
> On my test machine, I saw an increase of 144 bytes in the text segment
> of huge_memory.o. The increase in size is caused by growth of the
> __split_huge_page() function. When I remove the
>
>         if (unlikely(is_vm_hugetlb_page(vma)))
>                 pgoff = page->index << huge_page_order(page_hstate(page));
>
> check, the increase in size drops down to 24 bytes. As a THP cannot be
> a hugetlb page, there is no point in doing this check for a THP. I will
> update the patch to pass in an additional argument to disable this
> check for __split_huge_page.
>

I think we're seeking a reason or performance numbers that suggest
__vma_address() being inline is appropriate and so far we lack any such
evidence. Adding additional parameters to determine checks isn't going to
change the fact that it increases text size needlessly.

2014-06-16 19:34:52

by Waiman Long

Subject: Re: [PATCH] mm: Move __vma_address() to internal.h to be inlined in huge_memory.c

On 06/12/2014 05:45 PM, David Rientjes wrote:
> On Thu, 12 Jun 2014, Waiman Long wrote:
>
>>>> The vma_address() function which is used to compute the virtual address
>>>> within a VMA is used only by 2 files in the mm subsystem - rmap.c and
>>>> huge_memory.c. This function is defined in rmap.c and is inlined by
>>>> its callers there, but it is also declared as an external function.
>>>>
>>>> However, the __split_huge_page() function which calls vma_address()
>>>> in huge_memory.c is calling it as a real function call. This is not
>>>> as efficient as an inlined function. This patch moves the underlying
>>>> inlined __vma_address() function to internal.h to be shared by both
>>>> the rmap.c and huge_memory.c files.
>>> This increases huge_memory.o's text+data+bss by 311 bytes, which makes
>>> me suspect that it is a bad change due to its increase of kernel cache
>>> footprint.
>>>
>>> Perhaps we should be noinlining __vma_address()?
>> On my test machine, I saw an increase of 144 bytes in the text segment
>> of huge_memory.o. The increase in size is caused by growth of the
>> __split_huge_page() function. When I remove the
>>
>>         if (unlikely(is_vm_hugetlb_page(vma)))
>>                 pgoff = page->index << huge_page_order(page_hstate(page));
>>
>> check, the increase in size drops down to 24 bytes. As a THP cannot be
>> a hugetlb page, there is no point in doing this check for a THP. I will
>> update the patch to pass in an additional argument to disable this
>> check for __split_huge_page.
>>
> I think we're seeking a reason or performance numbers that suggest
> __vma_address() being inline is appropriate and so far we lack any such
> evidence. Adding additional parameters to determine checks isn't going to
> change the fact that it increases text size needlessly.

This patch was motivated by my investigation of a freeze problem in an
application running on SLES11 SP3, which was caused by the long time it
took to munmap part of a THP. Inlining vma_address() helps a bit in that
situation. However, the problem will be essentially gone after including
the patches that change the anon_vma_chain to use an rbtree instead of a
simple list.

I do agree that the performance impact of inlining vma_address() is
minimal in the latest kernel, so I am not going to pursue this any
further.

Thanks for the review.

-Longman