2013-03-18 15:41:00

by Michal Hocko

Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage

On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote:
> This patch extends check_range() to handle vmas with VM_HUGETLB set.
> With this change, we can migrate hugepages with migrate_pages(2).
> Note that for larger hugepages (covered by pud entries, 1GB on
> x86_64 for example), we simply skip them for now.
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> ---
> include/linux/hugetlb.h | 6 ++++--
> mm/hugetlb.c | 10 ++++++++++
> mm/mempolicy.c | 46 ++++++++++++++++++++++++++++++++++------------
> 3 files changed, 48 insertions(+), 14 deletions(-)
>
> diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
> index 8f87115..eb33df5 100644
> --- v3.8.orig/include/linux/hugetlb.h
> +++ v3.8/include/linux/hugetlb.h
> @@ -69,6 +69,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
> int dequeue_hwpoisoned_huge_page(struct page *page);
> void putback_active_hugepage(struct page *page);
> void putback_active_hugepages(struct list_head *l);
> +void migrate_hugepage_add(struct page *page, struct list_head *list);
> void copy_huge_page(struct page *dst, struct page *src);
>
> extern unsigned long hugepages_treat_as_movable;
> @@ -88,8 +89,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
> pmd_t *pmd, int write);
> struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
> pud_t *pud, int write);
> -int pmd_huge(pmd_t pmd);
> -int pud_huge(pud_t pmd);
> +extern int pmd_huge(pmd_t pmd);
> +extern int pud_huge(pud_t pmd);

extern is not needed here.

> unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
> unsigned long address, unsigned long end, pgprot_t newprot);
>
> @@ -134,6 +135,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
>
> #define putback_active_hugepage(p) 0
> #define putback_active_hugepages(l) 0
> +#define migrate_hugepage_add(p, l) 0
> static inline void copy_huge_page(struct page *dst, struct page *src)
> {
> }
> diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
> index cb9d43b8..86ffcb7 100644
> --- v3.8.orig/mm/hugetlb.c
> +++ v3.8/mm/hugetlb.c
> @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l)
> list_for_each_entry_safe(page, page2, l, lru)
> putback_active_hugepage(page);
> }
> +
> +void migrate_hugepage_add(struct page *page, struct list_head *list)
> +{
> + VM_BUG_ON(!PageHuge(page));
> + get_page(page);
> + spin_lock(&hugetlb_lock);

Why hugetlb_lock? Comment for this lock says that it protects
hugepage_freelists, nr_huge_pages, and free_huge_pages.

> + list_move_tail(&page->lru, list);
> + spin_unlock(&hugetlb_lock);
> + return;
> +}
> diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c
> index e2df1c1..8627135 100644
> --- v3.8.orig/mm/mempolicy.c
> +++ v3.8/mm/mempolicy.c
> @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> return addr != end;
> }
>
> +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
> + const nodemask_t *nodes, unsigned long flags,
> + void *private)
> +{
> +#ifdef CONFIG_HUGETLB_PAGE
> + int nid;
> + struct page *page;
> +
> + spin_lock(&vma->vm_mm->page_table_lock);
> + page = pte_page(huge_ptep_get((pte_t *)pmd));
> + spin_unlock(&vma->vm_mm->page_table_lock);

I am a bit confused why page_table_lock is used here and why it doesn't
cover the page usage.

> + nid = page_to_nid(page);
> + if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT)
> + && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1)
> + || flags & MPOL_MF_MOVE_ALL))
> + migrate_hugepage_add(page, private);
> +#else
> + BUG();
> +#endif
> +}
> +
> static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> unsigned long addr, unsigned long end,
> const nodemask_t *nodes, unsigned long flags,
> @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> pmd = pmd_offset(pud, addr);
> do {
> next = pmd_addr_end(addr, end);
> + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {

Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge()
sufficient?

> + check_hugetlb_pmd_range(vma, pmd, nodes,
> + flags, private);
> + continue;
> + }
> split_huge_page_pmd(vma, addr, pmd);
> if (pmd_none_or_trans_huge_or_clear_bad(pmd))
> continue;
[...]
--
Michal Hocko
SUSE Labs


2013-03-19 00:08:11

by Naoya Horiguchi

Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage

On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote:
> On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote:
> > This patch extends check_range() to handle vmas with VM_HUGETLB set.
> > With this change, we can migrate hugepages with migrate_pages(2).
> > Note that for larger hugepages (covered by pud entries, 1GB on
> > x86_64 for example), we simply skip them for now.
> >
> > Signed-off-by: Naoya Horiguchi <[email protected]>
> > ---
> > include/linux/hugetlb.h | 6 ++++--
> > mm/hugetlb.c | 10 ++++++++++
> > mm/mempolicy.c | 46 ++++++++++++++++++++++++++++++++++------------
> > 3 files changed, 48 insertions(+), 14 deletions(-)
> >
> > diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
> > index 8f87115..eb33df5 100644
> > --- v3.8.orig/include/linux/hugetlb.h
> > +++ v3.8/include/linux/hugetlb.h
> > @@ -69,6 +69,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
> > int dequeue_hwpoisoned_huge_page(struct page *page);
> > void putback_active_hugepage(struct page *page);
> > void putback_active_hugepages(struct list_head *l);
> > +void migrate_hugepage_add(struct page *page, struct list_head *list);
> > void copy_huge_page(struct page *dst, struct page *src);
> >
> > extern unsigned long hugepages_treat_as_movable;
> > @@ -88,8 +89,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
> > pmd_t *pmd, int write);
> > struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
> > pud_t *pud, int write);
> > -int pmd_huge(pmd_t pmd);
> > -int pud_huge(pud_t pmd);
> > +extern int pmd_huge(pmd_t pmd);
> > +extern int pud_huge(pud_t pmd);
>
> extern is not needed here.

OK.

> > unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
> > unsigned long address, unsigned long end, pgprot_t newprot);
> >
> > @@ -134,6 +135,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
> >
> > #define putback_active_hugepage(p) 0
> > #define putback_active_hugepages(l) 0
> > +#define migrate_hugepage_add(p, l) 0
> > static inline void copy_huge_page(struct page *dst, struct page *src)
> > {
> > }
> > diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
> > index cb9d43b8..86ffcb7 100644
> > --- v3.8.orig/mm/hugetlb.c
> > +++ v3.8/mm/hugetlb.c
> > @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l)
> > list_for_each_entry_safe(page, page2, l, lru)
> > putback_active_hugepage(page);
> > }
> > +
> > +void migrate_hugepage_add(struct page *page, struct list_head *list)
> > +{
> > + VM_BUG_ON(!PageHuge(page));
> > + get_page(page);
> > + spin_lock(&hugetlb_lock);
>
> Why hugetlb_lock? Comment for this lock says that it protects
> hugepage_freelists, nr_huge_pages, and free_huge_pages.

I think that this comment is out of date: hugepage_activelists,
which was introduced recently, should also be protected, because this
patchset adds is_hugepage_movable(), which runs through the list.
So I'll update the comment in the next post.
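
For reference, a sketch of the kind of comment update being discussed
(the wording below is mine, not taken from the patchset):

/*
 * Protects updates to hugepage_freelists, hugepage_activelists,
 * nr_huge_pages, and free_huge_pages -- hugepage_activelists is
 * walked by is_hugepage_movable(), so additions and removals
 * must also hold this lock.
 */
static DEFINE_SPINLOCK(hugetlb_lock);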

> > + list_move_tail(&page->lru, list);
> > + spin_unlock(&hugetlb_lock);
> > + return;
> > +}
> > diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c
> > index e2df1c1..8627135 100644
> > --- v3.8.orig/mm/mempolicy.c
> > +++ v3.8/mm/mempolicy.c
> > @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> > return addr != end;
> > }
> >
> > +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
> > + const nodemask_t *nodes, unsigned long flags,
> > + void *private)
> > +{
> > +#ifdef CONFIG_HUGETLB_PAGE
> > + int nid;
> > + struct page *page;
> > +
> > + spin_lock(&vma->vm_mm->page_table_lock);
> > + page = pte_page(huge_ptep_get((pte_t *)pmd));
> > + spin_unlock(&vma->vm_mm->page_table_lock);
>
> I am a bit confused why page_table_lock is used here and why it doesn't
> cover the page usage.

I expected this function to do the same for pmd as check_pte_range() does
for pte, but the above code didn't do it. I should've put spin_unlock
below migrate_hugepage_add(). Sorry for the confusion.
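
To make that concrete, here is a minimal sketch of the corrected function
(my reconstruction of the fix described above, not the actual repost):

static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
		const nodemask_t *nodes, unsigned long flags,
		void *private)
{
#ifdef CONFIG_HUGETLB_PAGE
	int nid;
	struct page *page;

	/*
	 * Hold the lock across both the mapcount check and the list move,
	 * mirroring what check_pte_range() does for regular ptes.
	 */
	spin_lock(&vma->vm_mm->page_table_lock);
	page = pte_page(huge_ptep_get((pte_t *)pmd));
	nid = page_to_nid(page);
	if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT)
	    && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1)
		|| flags & MPOL_MF_MOVE_ALL))
		migrate_hugepage_add(page, private);
	spin_unlock(&vma->vm_mm->page_table_lock);
#else
	BUG();
#endif
}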

> > + nid = page_to_nid(page);
> > + if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT)
> > + && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1)
> > + || flags & MPOL_MF_MOVE_ALL))
> > + migrate_hugepage_add(page, private);
> > +#else
> > + BUG();
> > +#endif
> > +}
> > +
> > static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> > unsigned long addr, unsigned long end,
> > const nodemask_t *nodes, unsigned long flags,
> > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> > pmd = pmd_offset(pud, addr);
> > do {
> > next = pmd_addr_end(addr, end);
> > + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {
>
> Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge()
> sufficient?

I think we need both checks here, because if we used only pmd_huge(),
a pmd for thp would wrongly take this branch.

Thanks,
Naoya

> > + check_hugetlb_pmd_range(vma, pmd, nodes,
> > + flags, private);
> > + continue;
> > + }
> > split_huge_page_pmd(vma, addr, pmd);
> > if (pmd_none_or_trans_huge_or_clear_bad(pmd))
> > continue;
> [...]
> --
> Michal Hocko
> SUSE Labs

2013-03-19 07:11:19

by Michal Hocko

Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage

On Mon 18-03-13 20:07:16, Naoya Horiguchi wrote:
> On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote:
> > On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote:
[...]
> > > @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l)
> > > list_for_each_entry_safe(page, page2, l, lru)
> > > putback_active_hugepage(page);
> > > }
> > > +
> > > +void migrate_hugepage_add(struct page *page, struct list_head *list)
> > > +{
> > > + VM_BUG_ON(!PageHuge(page));
> > > + get_page(page);
> > > + spin_lock(&hugetlb_lock);
> >
> > Why hugetlb_lock? Comment for this lock says that it protects
> > hugepage_freelists, nr_huge_pages, and free_huge_pages.
>
> I think that this comment is out of date: hugepage_activelists,
> which was introduced recently, should also be protected, because this
> patchset adds is_hugepage_movable(), which runs through the list.
> So I'll update the comment in the next post.
>
> > > + list_move_tail(&page->lru, list);
> > > + spin_unlock(&hugetlb_lock);
> > > + return;
> > > +}
> > > diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c
> > > index e2df1c1..8627135 100644
> > > --- v3.8.orig/mm/mempolicy.c
> > > +++ v3.8/mm/mempolicy.c
> > > @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> > > return addr != end;
> > > }
> > >
> > > +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
> > > + const nodemask_t *nodes, unsigned long flags,
> > > + void *private)
> > > +{
> > > +#ifdef CONFIG_HUGETLB_PAGE
> > > + int nid;
> > > + struct page *page;
> > > +
> > > + spin_lock(&vma->vm_mm->page_table_lock);
> > > + page = pte_page(huge_ptep_get((pte_t *)pmd));
> > > + spin_unlock(&vma->vm_mm->page_table_lock);
> >
> > I am a bit confused why page_table_lock is used here and why it doesn't
> > cover the page usage.
>
> I expected this function to do the same for pmd as check_pte_range() does
> for pte, but the above code didn't do it. I should've put spin_unlock
> below migrate_hugepage_add(). Sorry for the confusion.

OK, I see. So you want to prevent racing with the pmd unmap.

> > > + nid = page_to_nid(page);
> > > + if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT)
> > > + && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1)
> > > + || flags & MPOL_MF_MOVE_ALL))
> > > + migrate_hugepage_add(page, private);
> > > +#else
> > > + BUG();
> > > +#endif
> > > +}
> > > +
> > > static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> > > unsigned long addr, unsigned long end,
> > > const nodemask_t *nodes, unsigned long flags,
> > > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> > > pmd = pmd_offset(pud, addr);
> > > do {
> > > next = pmd_addr_end(addr, end);
> > > + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {
> >
> > Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge()
> > sufficient?
>
> I think we need both checks here, because if we used only pmd_huge(),
> a pmd for thp would wrongly take this branch.

Bahh. You are right. I thought that pmd_huge was a hugetlb thing, but it
obviously checks only _PAGE_PSE, the same as pmd_large(), which is really
unfortunate and confusing. Can we make it hugetlb-specific?
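
For context, the x86 implementation of the time really does test only the
PSE bit, and a thp pmd sets the very same bit (quoted from memory of
v3.8's arch/x86/mm/hugetlbpage.c, so treat it as illustrative):

int pmd_huge(pmd_t pmd)
{
	return !!(pmd_val(pmd) & _PAGE_PSE);
}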

>
> Thanks,
> Naoya
--
Michal Hocko
SUSE Labs

2013-03-20 00:31:14

by Simon Jeons

Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage

Hi Naoya,
On 03/19/2013 08:07 AM, Naoya Horiguchi wrote:
> On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote:
>> On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote:
>>> This patch extends check_range() to handle vmas with VM_HUGETLB set.
>>> With this change, we can migrate hugepages with migrate_pages(2).
>>> Note that for larger hugepages (covered by pud entries, 1GB on
>>> x86_64 for example), we simply skip them for now.
>>>
>>> Signed-off-by: Naoya Horiguchi <[email protected]>
>>> ---
>>> include/linux/hugetlb.h | 6 ++++--
>>> mm/hugetlb.c | 10 ++++++++++
>>> mm/mempolicy.c | 46 ++++++++++++++++++++++++++++++++++------------
>>> 3 files changed, 48 insertions(+), 14 deletions(-)
>>>
>>> diff --git v3.8.orig/include/linux/hugetlb.h v3.8/include/linux/hugetlb.h
>>> index 8f87115..eb33df5 100644
>>> --- v3.8.orig/include/linux/hugetlb.h
>>> +++ v3.8/include/linux/hugetlb.h
>>> @@ -69,6 +69,7 @@ void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
>>> int dequeue_hwpoisoned_huge_page(struct page *page);
>>> void putback_active_hugepage(struct page *page);
>>> void putback_active_hugepages(struct list_head *l);
>>> +void migrate_hugepage_add(struct page *page, struct list_head *list);
>>> void copy_huge_page(struct page *dst, struct page *src);
>>>
>>> extern unsigned long hugepages_treat_as_movable;
>>> @@ -88,8 +89,8 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
>>> pmd_t *pmd, int write);
>>> struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
>>> pud_t *pud, int write);
>>> -int pmd_huge(pmd_t pmd);
>>> -int pud_huge(pud_t pmd);
>>> +extern int pmd_huge(pmd_t pmd);
>>> +extern int pud_huge(pud_t pmd);
>> extern is not needed here.
> OK.
>
>>> unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
>>> unsigned long address, unsigned long end, pgprot_t newprot);
>>>
>>> @@ -134,6 +135,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct page *page)
>>>
>>> #define putback_active_hugepage(p) 0
>>> #define putback_active_hugepages(l) 0
>>> +#define migrate_hugepage_add(p, l) 0
>>> static inline void copy_huge_page(struct page *dst, struct page *src)
>>> {
>>> }
>>> diff --git v3.8.orig/mm/hugetlb.c v3.8/mm/hugetlb.c
>>> index cb9d43b8..86ffcb7 100644
>>> --- v3.8.orig/mm/hugetlb.c
>>> +++ v3.8/mm/hugetlb.c
>>> @@ -3202,3 +3202,13 @@ void putback_active_hugepages(struct list_head *l)
>>> list_for_each_entry_safe(page, page2, l, lru)
>>> putback_active_hugepage(page);
>>> }
>>> +
>>> +void migrate_hugepage_add(struct page *page, struct list_head *list)
>>> +{
>>> + VM_BUG_ON(!PageHuge(page));
>>> + get_page(page);
>>> + spin_lock(&hugetlb_lock);
>> Why hugetlb_lock? Comment for this lock says that it protects
>> hugepage_freelists, nr_huge_pages, and free_huge_pages.
> I think that this comment is out of date: hugepage_activelists,
> which was introduced recently, should also be protected, because this
> patchset adds is_hugepage_movable(), which runs through the list.
> So I'll update the comment in the next post.
>
>>> + list_move_tail(&page->lru, list);
>>> + spin_unlock(&hugetlb_lock);
>>> + return;
>>> +}
>>> diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c
>>> index e2df1c1..8627135 100644
>>> --- v3.8.orig/mm/mempolicy.c
>>> +++ v3.8/mm/mempolicy.c
>>> @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>>> return addr != end;
>>> }
>>>
>>> +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
>>> + const nodemask_t *nodes, unsigned long flags,
>>> + void *private)
>>> +{
>>> +#ifdef CONFIG_HUGETLB_PAGE
>>> + int nid;
>>> + struct page *page;
>>> +
>>> + spin_lock(&vma->vm_mm->page_table_lock);
>>> + page = pte_page(huge_ptep_get((pte_t *)pmd));
>>> + spin_unlock(&vma->vm_mm->page_table_lock);
>> I am a bit confused why page_table_lock is used here and why it doesn't
>> cover the page usage.
> I expected this function to do the same for pmd as check_pte_range() does
> for pte, but the above code didn't do it. I should've put spin_unlock
> below migrate_hugepage_add(). Sorry for the confusion.

I'm still confused! Could you explain in more detail?

>
>>> + nid = page_to_nid(page);
>>> + if (node_isset(nid, *nodes) != !!(flags & MPOL_MF_INVERT)
>>> + && ((flags & MPOL_MF_MOVE && page_mapcount(page) == 1)
>>> + || flags & MPOL_MF_MOVE_ALL))
>>> + migrate_hugepage_add(page, private);
>>> +#else
>>> + BUG();
>>> +#endif
>>> +}
>>> +
>>> static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
>>> unsigned long addr, unsigned long end,
>>> const nodemask_t *nodes, unsigned long flags,
>>> @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
>>> pmd = pmd_offset(pud, addr);
>>> do {
>>> next = pmd_addr_end(addr, end);
>>> + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {
>> Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge()
>> sufficient?
> I think we need both checks here, because if we used only pmd_huge(),
> a pmd for thp would wrongly take this branch.
>
> Thanks,
> Naoya
>
>>> + check_hugetlb_pmd_range(vma, pmd, nodes,
>>> + flags, private);
>>> + continue;
>>> + }
>>> split_huge_page_pmd(vma, addr, pmd);
>>> if (pmd_none_or_trans_huge_or_clear_bad(pmd))
>>> continue;
>> [...]
>> --
>> Michal Hocko
>> SUSE Labs

2013-03-20 06:13:55

by Naoya Horiguchi

Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage

On Tue, Mar 19, 2013 at 08:11:13AM +0100, Michal Hocko wrote:
> On Mon 18-03-13 20:07:16, Naoya Horiguchi wrote:
> > On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote:
> > > On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote:
...
> > > > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> > > > pmd = pmd_offset(pud, addr);
> > > > do {
> > > > next = pmd_addr_end(addr, end);
> > > > + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {
> > >
> > > Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge()
> > > sufficient?
> >
> > I think we need both checks here, because if we used only pmd_huge(),
> > a pmd for thp would wrongly take this branch.
>
> > Bahh. You are right. I thought that pmd_huge was a hugetlb thing, but it
> > obviously checks only _PAGE_PSE, the same as pmd_large(), which is really
> > unfortunate and confusing. Can we make it hugetlb-specific?

I agree that we had better fix this confusion.

What pmd_huge() (or pmd_large() on some architectures) does is just
check whether a given pmd points to a huge/large page or not.
It does not say which type of hugepage it is,
so it shouldn't be used to decide whether the hugepage is a hugetlbfs one.
I think it would be better to introduce pmd_hugetlb(), which takes a pmd
and a vma as arguments and returns true only for a hugetlbfs pmd.
Checking pmd_hugetlb() should come before checking pmd_trans_huge(),
because pmd_trans_huge() implicitly assumes that the vma covering the
virtual address of a given pmd is not a hugetlbfs vma.
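
A minimal sketch of what such a helper could look like (hypothetical --
the name and shape follow the proposal above; this is not merged code):

static inline bool pmd_hugetlb(struct vm_area_struct *vma, pmd_t pmd)
{
	/*
	 * True only for hugetlbfs pmds: a thp pmd also has _PAGE_PSE
	 * set, so the vma check is what disambiguates the two.
	 */
	return pmd_huge(pmd) && is_vm_hugetlb_page(vma);
}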

I'm interested in this cleanup, so will work on it after this patchset.

Thanks,
Naoya

2013-03-20 07:41:23

by Michal Hocko

Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage

On Wed 20-03-13 02:12:54, Naoya Horiguchi wrote:
> On Tue, Mar 19, 2013 at 08:11:13AM +0100, Michal Hocko wrote:
> > On Mon 18-03-13 20:07:16, Naoya Horiguchi wrote:
> > > On Mon, Mar 18, 2013 at 04:40:57PM +0100, Michal Hocko wrote:
> > > > On Thu 21-02-13 14:41:44, Naoya Horiguchi wrote:
> ...
> > > > > @@ -536,6 +557,11 @@ static inline int check_pmd_range(struct vm_area_struct *vma, pud_t *pud,
> > > > > pmd = pmd_offset(pud, addr);
> > > > > do {
> > > > > next = pmd_addr_end(addr, end);
> > > > > + if (pmd_huge(*pmd) && is_vm_hugetlb_page(vma)) {
> > > >
> > > > Why an explicit check for is_vm_hugetlb_page here? Isn't pmd_huge()
> > > > sufficient?
> > >
> > > I think we need both checks here, because if we used only pmd_huge(),
> > > a pmd for thp would wrongly take this branch.
> >
> > Bahh. You are right. I thought that pmd_huge was a hugetlb thing, but it
> > obviously checks only _PAGE_PSE, the same as pmd_large(), which is really
> > unfortunate and confusing. Can we make it hugetlb-specific?
>
> I agree that we had better fix this confusion.
>
> What pmd_huge() (or pmd_large() on some architectures) does is just
> check whether a given pmd points to a huge/large page or not.
> It does not say which type of hugepage it is,
> so it shouldn't be used to decide whether the hugepage is a hugetlbfs one.
> I think it would be better to introduce pmd_hugetlb(), which takes a pmd
> and a vma as arguments and returns true only for a hugetlbfs pmd.
> Checking pmd_hugetlb() should come before checking pmd_trans_huge(),
> because pmd_trans_huge() implicitly assumes that the vma covering the
> virtual address of a given pmd is not a hugetlbfs vma.
>
> I'm interested in this cleanup, so will work on it after this patchset.

pmd_huge is used in only a few places, so it shouldn't be a very big
change. On the other hand, the vma is not always available there, so it
gets tricky.

Thanks
--
Michal Hocko
SUSE Labs

2013-03-20 22:01:05

by Naoya Horiguchi

Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage

On Wed, Mar 20, 2013 at 08:31:06AM +0800, Simon Jeons wrote:
...
> >>> diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c
> >>> index e2df1c1..8627135 100644
> >>> --- v3.8.orig/mm/mempolicy.c
> >>> +++ v3.8/mm/mempolicy.c
> >>> @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> >>> return addr != end;
> >>> }
> >>>
> >>> +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
> >>> + const nodemask_t *nodes, unsigned long flags,
> >>> + void *private)
> >>> +{
> >>> +#ifdef CONFIG_HUGETLB_PAGE
> >>> + int nid;
> >>> + struct page *page;
> >>> +
> >>> + spin_lock(&vma->vm_mm->page_table_lock);
> >>> + page = pte_page(huge_ptep_get((pte_t *)pmd));
> >>> + spin_unlock(&vma->vm_mm->page_table_lock);
> >> I am a bit confused why page_table_lock is used here and why it doesn't
> >> cover the page usage.
> > I expected this function to do the same for pmd as check_pte_range() does
> > for pte, but the above code didn't do it. I should've put spin_unlock
> > below migrate_hugepage_add(). Sorry for the confusion.
>
> I'm still confused! Could you explain in more detail?

With the above code, check_hugetlb_pmd_range() checks page_mapcount
outside the page table lock, but the mapcount can be decremented by
__unmap_hugepage_range(), so there's a race.
__unmap_hugepage_range() calls page_remove_rmap() inside the page table
lock, so we can avoid this race by doing all of check_hugetlb_pmd_range()'s
work inside the page table lock.
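
Schematically, the racing side looks like this (heavily abridged from the
v3.8-era __unmap_hugepage_range(); the names are real but the body is a
simplified sketch, not verbatim kernel code):

void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
			    unsigned long start, unsigned long end,
			    struct page *ref_page)
{
	struct mm_struct *mm = vma->vm_mm;
	struct hstate *h = hstate_vma(vma);
	unsigned long address;
	pte_t *ptep, pte;
	struct page *page;

	spin_lock(&mm->page_table_lock);
	for (address = start; address < end; address += huge_page_size(h)) {
		ptep = huge_pte_offset(mm, address);
		if (!ptep)
			continue;
		if (huge_pte_none(huge_ptep_get(ptep)))
			continue;
		pte = huge_ptep_get_and_clear(mm, address, ptep);
		page = pte_page(pte);
		/*
		 * The mapcount drops here, still under page_table_lock --
		 * which is why the mapcount check on the migration side
		 * must take the same lock.
		 */
		page_remove_rmap(page);
		tlb_remove_page(tlb, page);
	}
	spin_unlock(&mm->page_table_lock);
}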

Thanks,
Naoya

2013-03-21 00:06:14

by Simon Jeons

Subject: Re: [PATCH 5/9] migrate: enable migrate_pages() to migrate hugepage

Hi Naoya,
On 03/21/2013 05:59 AM, Naoya Horiguchi wrote:
> On Wed, Mar 20, 2013 at 08:31:06AM +0800, Simon Jeons wrote:
> ...
>>>>> diff --git v3.8.orig/mm/mempolicy.c v3.8/mm/mempolicy.c
>>>>> index e2df1c1..8627135 100644
>>>>> --- v3.8.orig/mm/mempolicy.c
>>>>> +++ v3.8/mm/mempolicy.c
>>>>> @@ -525,6 +525,27 @@ static int check_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>>>>> return addr != end;
>>>>> }
>>>>>
>>>>> +static void check_hugetlb_pmd_range(struct vm_area_struct *vma, pmd_t *pmd,
>>>>> + const nodemask_t *nodes, unsigned long flags,
>>>>> + void *private)
>>>>> +{
>>>>> +#ifdef CONFIG_HUGETLB_PAGE
>>>>> + int nid;
>>>>> + struct page *page;
>>>>> +
>>>>> + spin_lock(&vma->vm_mm->page_table_lock);
>>>>> + page = pte_page(huge_ptep_get((pte_t *)pmd));
>>>>> + spin_unlock(&vma->vm_mm->page_table_lock);
>>>> I am a bit confused why page_table_lock is used here and why it doesn't
>>>> cover the page usage.
>>> I expected this function to do the same for pmd as check_pte_range() does
>>> for pte, but the above code didn't do it. I should've put spin_unlock
>>> below migrate_hugepage_add(). Sorry for the confusion.
>> I'm still confused! Could you explain in more detail?
> With the above code, check_hugetlb_pmd_range() checks page_mapcount
> outside the page table lock, but the mapcount can be decremented by
> __unmap_hugepage_range(), so there's a race.
> __unmap_hugepage_range() calls page_remove_rmap() inside the page table
> lock, so we can avoid this race by doing all of check_hugetlb_pmd_range()'s
> work inside the page table lock.

Why do you use page_table_lock instead of the split ptlock to protect the 2MB page?

>
> Thanks,
> Naoya