LinuxLists.cc - [PATCH 0/2] fix hugepage coredump

2013-03-28 15:43:59

Subject: [PATCH 0/2] fix hugepage coredump

Hi,

This small patch series fixes problems on hugepage coredump,
where we cannot include any data on hugepages into coredump.
See individual patches for more details.

Thanks,
Naoya Horiguchi

2013-03-28 15:43:58

by Naoya Horiguchi

[permalink] [raw]

Subject: [PATCH 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page()

With applying the previous patch "hugetlbfs: stop setting VM_DONTDUMP in
initializing vma(VM_HUGETLB)" to reenable hugepage coredump, if a memory
error happens on a hugepage and the affected processes try to access
the error hugepage, we hit VM_BUG_ON(atomic_read(&page->_count) <= 0)
in get_page().

The reason for this bug is that coredump-related code doesn't recognise
"hugepage hwpoison entry" with which a pmd entry is replaced when a memory
error occurs on a hugepage.
In other words, physical address information is stored in different bit layout
between hugepage hwpoison entry and pmd entry, so follow_hugetlb_page()
which is called in get_dump_page() returns a wrong page from a given address.

We need to filter out only hwpoison hugepages to have data on healthy
hugepages in coredump. So this patch makes follow_hugetlb_page() avoid
trying to get page when a pmd is in swap entry like format.

Signed-off-by: Naoya Horiguchi <[email protected]>
---
mm/hugetlb.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git v3.9-rc3.orig/mm/hugetlb.c v3.9-rc3/mm/hugetlb.c
index 0d1705b..8462e2c 100644
--- v3.9-rc3.orig/mm/hugetlb.c
+++ v3.9-rc3/mm/hugetlb.c
@@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
* first, for the page indexing below to work.
*/
pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
- absent = !pte || huge_pte_none(huge_ptep_get(pte));
+ absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
+ is_swap_pte(huge_ptep_get(pte));

/*
* When coredumping, it suits get_dump_page if we just return
--
1.7.11.7

2013-03-28 15:43:57

by Naoya Horiguchi

[permalink] [raw]

Subject: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)

Currently we fail to include any data on hugepages into coredump,
because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
mm->reserved_vm counter". This looks to me a serious regression,
so let's fix it.

Signed-off-by: Naoya Horiguchi <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
---
fs/hugetlbfs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git v3.9-rc3.orig/fs/hugetlbfs/inode.c v3.9-rc3/fs/hugetlbfs/inode.c
index 84e3d85..523464e 100644
--- v3.9-rc3.orig/fs/hugetlbfs/inode.c
+++ v3.9-rc3/fs/hugetlbfs/inode.c
@@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
* way when do_mmap_pgoff unwinds (may be important on powerpc
* and ia64).
*/
- vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
+ vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
vma->vm_ops = &hugetlb_vm_ops;

if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
--
1.7.11.7

2013-03-28 15:51:15

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)

On Thu, Mar 28, 2013 at 11:42:37AM -0400, Naoya Horiguchi wrote:
> Currently we fail to include any data on hugepages into coredump,
> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> mm->reserved_vm counter". This looks to me a serious regression,
> so let's fix it.
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> Cc: Konstantin Khlebnikov <[email protected]>
> ---
> fs/hugetlbfs/inode.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

</formletter>

2013-03-28 16:05:26

by Naoya Horiguchi

[permalink] [raw]

Subject: Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)

On Thu, Mar 28, 2013 at 08:51:09AM -0700, Greg KH wrote:
> On Thu, Mar 28, 2013 at 11:42:37AM -0400, Naoya Horiguchi wrote:
> > Currently we fail to include any data on hugepages into coredump,
> > because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> > introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> > mm->reserved_vm counter". This looks to me a serious regression,
> > so let's fix it.
> >
> > Signed-off-by: Naoya Horiguchi <[email protected]>
> > Cc: Konstantin Khlebnikov <[email protected]>
> > ---
> > fs/hugetlbfs/inode.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
>
>
> <formletter>
>
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree. Please read Documentation/stable_kernel_rules.txt
> for how to do this properly.
>
> </formletter>

I guess you mean this patch violates one/both of these rules:

- It must fix a problem that causes a build error (but not for things
marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
security issue, or some "oh, that's not good" issue. In short, something
critical.
- It or an equivalent fix must already exist in Linus' tree (upstream).

I'm not sure if the problem "we can't get any hugepage in coredump"
is considered as 'some "oh, that's not good" issue'.
But yes, it's not a critical one.
If you mean I violated the second rule, sorry, I'll get it into upstream first.

Thanks,
Naoya

2013-03-28 17:03:25

by Konstantin Khlebnikov

[permalink] [raw]

Subject: Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)

Naoya Horiguchi wrote:
> Currently we fail to include any data on hugepages into coredump,
> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> mm->reserved_vm counter". This looks to me a serious regression,
> so let's fix it.

That was introduced in my patch? Really?
Here was VM_RESERVED and it had the same effect as VM_DONTDUMP. At least I thought so.

>
> Signed-off-by: Naoya Horiguchi<[email protected]>
> Cc: Konstantin Khlebnikov<[email protected]>
> ---
> fs/hugetlbfs/inode.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git v3.9-rc3.orig/fs/hugetlbfs/inode.c v3.9-rc3/fs/hugetlbfs/inode.c
> index 84e3d85..523464e 100644
> --- v3.9-rc3.orig/fs/hugetlbfs/inode.c
> +++ v3.9-rc3/fs/hugetlbfs/inode.c
> @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
> * way when do_mmap_pgoff unwinds (may be important on powerpc
> * and ia64).
> */
> - vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
> + vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
> vma->vm_ops =&hugetlb_vm_ops;
>
> if (vma->vm_pgoff& (~huge_page_mask(h)>> PAGE_SHIFT))

2013-03-28 18:30:23

by Naoya Horiguchi

[permalink] [raw]

Subject: Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)

On Thu, Mar 28, 2013 at 09:03:16PM +0400, Konstantin Khlebnikov wrote:
> Naoya Horiguchi wrote:
> >Currently we fail to include any data on hugepages into coredump,
> >because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> >introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> >mm->reserved_vm counter". This looks to me a serious regression,
> >so let's fix it.
>
> That was introduced in my patch? Really?
> Here was VM_RESERVED and it had the same effect as VM_DONTDUMP. At least I thought so.

vma_dump_size() does like this (the diff is the one in 314e51b98):

static unsigned long vma_dump_size(struct vm_area_struct *vma,
unsigned long mm_flags)
{
#define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))

/* always dump the vdso and vsyscall sections */
if (always_dump_vma(vma))
goto whole;

if (vma->vm_flags & VM_DONTDUMP)
return 0;

/* Hugetlb memory check */
if (vma->vm_flags & VM_HUGETLB) {
if ((vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_SHARED))
goto whole;
if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
goto whole;
}

/* Do not dump I/O mapped devices or special mappings */
- if (vma->vm_flags & (VM_IO | VM_RESERVED))
+ if (vma->vm_flags & VM_IO)
return 0;

We have hugetlb memory check after VM_DONTDUMP check, so the following
changed the behavior.

--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
* way when do_mmap_pgoff unwinds (may be important on powerpc
* and ia64).
*/
- vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
+ vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
vma->vm_ops = &hugetlb_vm_ops;

if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))

I think we don't have to set VM_DONTDUMP on hugetlbfs's vma.

Thanks,
Naoya

2013-03-28 19:39:28

by Ben Hutchings

[permalink] [raw]

Subject: Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)

On Thu, Mar 28, 2013 at 12:04:29PM -0400, Naoya Horiguchi wrote:
[...]
> I guess you mean this patch violates one/both of these rules:
>
> - It must fix a problem that causes a build error (but not for things
> marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
> security issue, or some "oh, that's not good" issue. In short, something
> critical.
> - It or an equivalent fix must already exist in Linus' tree (upstream).
>
> I'm not sure if the problem "we can't get any hugepage in coredump"
> is considered as 'some "oh, that's not good" issue'.
> But yes, it's not a critical one.
> If you mean I violated the second rule, sorry, I'll get it into upstream first.

The second rule is the clear one. If you are submitting a patch to
a subsystem maintainer and you want it to go into stable branches as
well, you must put 'Cc: [email protected]' in the commit message,
not just the mail header.

Ben.

--
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
- Albert Camus

2013-03-28 19:48:36

by Naoya Horiguchi

[permalink] [raw]

Subject: Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)

On Thu, Mar 28, 2013 at 07:39:01PM +0000, Ben Hutchings wrote:
> On Thu, Mar 28, 2013 at 12:04:29PM -0400, Naoya Horiguchi wrote:
> [...]
> > I guess you mean this patch violates one/both of these rules:
> >
> > - It must fix a problem that causes a build error (but not for things
> > marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
> > security issue, or some "oh, that's not good" issue. In short, something
> > critical.
> > - It or an equivalent fix must already exist in Linus' tree (upstream).
> >
> > I'm not sure if the problem "we can't get any hugepage in coredump"
> > is considered as 'some "oh, that's not good" issue'.
> > But yes, it's not a critical one.
> > If you mean I violated the second rule, sorry, I'll get it into upstream first.
>
> The second rule is the clear one. If you are submitting a patch to
> a subsystem maintainer and you want it to go into stable branches as
> well, you must put 'Cc: [email protected]' in the commit message,
> not just the mail header.

Got it. Thank you.
Naoya

2013-03-29 05:30:24

by Konstantin Khlebnikov

[permalink] [raw]

Subject: Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)

Naoya Horiguchi wrote:
> On Thu, Mar 28, 2013 at 09:03:16PM +0400, Konstantin Khlebnikov wrote:
>> Naoya Horiguchi wrote:
>>> Currently we fail to include any data on hugepages into coredump,
>>> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
>>> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
>>> mm->reserved_vm counter". This looks to me a serious regression,
>>> so let's fix it.
>>
>> That was introduced in my patch? Really?
>> Here was VM_RESERVED and it had the same effect as VM_DONTDUMP. At least I thought so.
>
> vma_dump_size() does like this (the diff is the one in 314e51b98):
>
> static unsigned long vma_dump_size(struct vm_area_struct *vma,
> unsigned long mm_flags)
> {
> #define FILTER(type) (mm_flags& (1UL<< MMF_DUMP_##type))
>
> /* always dump the vdso and vsyscall sections */
> if (always_dump_vma(vma))
> goto whole;
>
> if (vma->vm_flags& VM_DONTDUMP)
> return 0;
>
> /* Hugetlb memory check */
> if (vma->vm_flags& VM_HUGETLB) {
> if ((vma->vm_flags& VM_SHARED)&& FILTER(HUGETLB_SHARED))
> goto whole;
> if (!(vma->vm_flags& VM_SHARED)&& FILTER(HUGETLB_PRIVATE))
> goto whole;
> }
>
> /* Do not dump I/O mapped devices or special mappings */
> - if (vma->vm_flags& (VM_IO | VM_RESERVED))
> + if (vma->vm_flags& VM_IO)
> return 0;
>
> We have hugetlb memory check after VM_DONTDUMP check, so the following
> changed the behavior.

Ok, I missed this in my patch.

>
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
> * way when do_mmap_pgoff unwinds (may be important on powerpc
> * and ia64).
> */
> - vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
> + vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
> vma->vm_ops =&hugetlb_vm_ops;
>
> if (vma->vm_pgoff& (~huge_page_mask(h)>> PAGE_SHIFT))
>
> I think we don't have to set VM_DONTDUMP on hugetlbfs's vma.

Acked-by: Konstantin Khlebnikov <[email protected]>

>
> Thanks,
> Naoya

2013-03-29 12:09:42

by Konstantin Khlebnikov

[permalink] [raw]

Subject: Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)

Konstantin Khlebnikov wrote:
> Naoya Horiguchi wrote:
>> On Thu, Mar 28, 2013 at 09:03:16PM +0400, Konstantin Khlebnikov wrote:
>>> Naoya Horiguchi wrote:
>>>> Currently we fail to include any data on hugepages into coredump,
>>>> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
>>>> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
>>>> mm->reserved_vm counter". This looks to me a serious regression,
>>>> so let's fix it.
>>>
>>> That was introduced in my patch? Really?
>>> Here was VM_RESERVED and it had the same effect as VM_DONTDUMP. At least I thought so.
>>
>> vma_dump_size() does like this (the diff is the one in 314e51b98):
>>
>> static unsigned long vma_dump_size(struct vm_area_struct *vma,
>> unsigned long mm_flags)
>> {
>> #define FILTER(type) (mm_flags& (1UL<< MMF_DUMP_##type))
>>
>> /* always dump the vdso and vsyscall sections */
>> if (always_dump_vma(vma))
>> goto whole;
>>
>> if (vma->vm_flags& VM_DONTDUMP)
>> return 0;
>>
>> /* Hugetlb memory check */
>> if (vma->vm_flags& VM_HUGETLB) {
>> if ((vma->vm_flags& VM_SHARED)&& FILTER(HUGETLB_SHARED))
>> goto whole;
>> if (!(vma->vm_flags& VM_SHARED)&& FILTER(HUGETLB_PRIVATE))
>> goto whole;
>> }
>>
>> /* Do not dump I/O mapped devices or special mappings */
>> - if (vma->vm_flags& (VM_IO | VM_RESERVED))
>> + if (vma->vm_flags& VM_IO)
>> return 0;
>>
>> We have hugetlb memory check after VM_DONTDUMP check, so the following
>> changed the behavior.
>
> Ok, I missed this in my patch.
>
>>
>> --- a/fs/hugetlbfs/inode.c
>> +++ b/fs/hugetlbfs/inode.c
>> @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
>> * way when do_mmap_pgoff unwinds (may be important on powerpc
>> * and ia64).
>> */
>> - vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
>> + vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
>> vma->vm_ops =&hugetlb_vm_ops;
>>
>> if (vma->vm_pgoff& (~huge_page_mask(h)>> PAGE_SHIFT))
>>
>> I think we don't have to set VM_DONTDUMP on hugetlbfs's vma.
>
> Acked-by: Konstantin Khlebnikov<[email protected]>

hugetlb coredump filter also should be fixed in this way:

--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1154,6 +1154,7 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
goto whole;
if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
goto whole;
+ return 0;
}

/* Do not dump I/O mapped devices or special mappings */

>
>>
>> Thanks,
>> Naoya
>

2013-03-29 13:47:32

by Michal Hocko

[permalink] [raw]

Subject: Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)

On Thu 28-03-13 11:42:37, Naoya Horiguchi wrote:
> Currently we fail to include any data on hugepages into coredump,
> because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> mm->reserved_vm counter". This looks to me a serious regression,
> so let's fix it.
>
> Signed-off-by: Naoya Horiguchi <[email protected]>
> Cc: Konstantin Khlebnikov <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> fs/hugetlbfs/inode.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git v3.9-rc3.orig/fs/hugetlbfs/inode.c v3.9-rc3/fs/hugetlbfs/inode.c
> index 84e3d85..523464e 100644
> --- v3.9-rc3.orig/fs/hugetlbfs/inode.c
> +++ v3.9-rc3/fs/hugetlbfs/inode.c
> @@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
> * way when do_mmap_pgoff unwinds (may be important on powerpc
> * and ia64).
> */
> - vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
> + vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
> vma->vm_ops = &hugetlb_vm_ops;
>
> if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
> --
> 1.7.11.7
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

--
Michal Hocko
SUSE Labs

2013-03-29 13:57:34

by Michal Hocko

[permalink] [raw]

Subject: Re: [PATCH 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page()

On Thu 28-03-13 11:42:38, Naoya Horiguchi wrote:
[...]
> @@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> * first, for the page indexing below to work.
> */
> pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> - absent = !pte || huge_pte_none(huge_ptep_get(pte));
> + absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> + is_swap_pte(huge_ptep_get(pte));

is_swap_pte doesn't seem right. Shouldn't you use is_hugetlb_entry_hwpoisoned
instead?

--
Michal Hocko
SUSE Labs

2013-03-29 17:00:22

by Naoya Horiguchi

[permalink] [raw]

Subject: Re: [PATCH 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)

On Fri, Mar 29, 2013 at 04:09:36PM +0400, Konstantin Khlebnikov wrote:
...
> hugetlb coredump filter also should be fixed in this way:
>
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1154,6 +1154,7 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
> goto whole;
> if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
> goto whole;
> + return 0;
> }

OK, I'll add it.

Thanks,
Naoya

2013-03-29 17:25:11

by Naoya Horiguchi

[permalink] [raw]

Subject: Re: [PATCH 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page()

Hi,

On Fri, Mar 29, 2013 at 02:57:30PM +0100, Michal Hocko wrote:
> On Thu 28-03-13 11:42:38, Naoya Horiguchi wrote:
> [...]
> > @@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> > * first, for the page indexing below to work.
> > */
> > pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> > - absent = !pte || huge_pte_none(huge_ptep_get(pte));
> > + absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> > + is_swap_pte(huge_ptep_get(pte));
>
> is_swap_pte doesn't seem right. Shouldn't you use is_hugetlb_entry_hwpoisoned
> instead?

I tested only hwpoisoned hugepage, but the same can happen for hugepages
under migration. So I intended to filter out all types of swap entries.
The local variable 'absent' seems to mean whether data on the address
is immediately available, so swap type entry isn't included in it.

Thanks,
Naoya

2013-04-02 09:24:48

by Michal Hocko

[permalink] [raw]

Subject: Re: [PATCH 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page()

On Fri 29-03-13 13:23:38, Naoya Horiguchi wrote:
> Hi,
>
> On Fri, Mar 29, 2013 at 02:57:30PM +0100, Michal Hocko wrote:
> > On Thu 28-03-13 11:42:38, Naoya Horiguchi wrote:
> > [...]
> > > @@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> > > * first, for the page indexing below to work.
> > > */
> > > pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> > > - absent = !pte || huge_pte_none(huge_ptep_get(pte));
> > > + absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> > > + is_swap_pte(huge_ptep_get(pte));
> >
> > is_swap_pte doesn't seem right. Shouldn't you use is_hugetlb_entry_hwpoisoned
> > instead?
>
> I tested only hwpoisoned hugepage, but the same can happen for hugepages
> under migration. So I intended to filter out all types of swap entries.
> The local variable 'absent' seems to mean whether data on the address
> is immediately available, so swap type entry isn't included in it.

OK, I didn't consider huge pages under migration and I was merely worried
that is_hugetlb_entry_hwpoisoned sounds more appropriate than
is_swap_pte.

Could you add a comment which would clarify that is_swap_pte covers both
migration and hwpoison pages, please? Something like:

/*
* is_swap_pte test covers both is_hugetlb_entry_hwpoisoned
* and hugepages under migration in which case
* hugetlb_fault waits for the migration and bails out
* properly for HWPosined pages.
*/
absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
is_swap_pte(huge_ptep_get(pte));

Other than that feel free to add
Reviewed-by: Michal Hocko <[email protected]>

Thanks.
--
Michal Hocko
SUSE Labs

2013-04-02 14:37:31

by Naoya Horiguchi

[permalink] [raw]

Subject: Re: [PATCH 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page()

On Tue, Apr 02, 2013 at 11:24:41AM +0200, Michal Hocko wrote:
> On Fri 29-03-13 13:23:38, Naoya Horiguchi wrote:
> > Hi,
> >
> > On Fri, Mar 29, 2013 at 02:57:30PM +0100, Michal Hocko wrote:
> > > On Thu 28-03-13 11:42:38, Naoya Horiguchi wrote:
> > > [...]
> > > > @@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> > > > * first, for the page indexing below to work.
> > > > */
> > > > pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> > > > - absent = !pte || huge_pte_none(huge_ptep_get(pte));
> > > > + absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> > > > + is_swap_pte(huge_ptep_get(pte));
> > >
> > > is_swap_pte doesn't seem right. Shouldn't you use is_hugetlb_entry_hwpoisoned
> > > instead?
> >
> > I tested only hwpoisoned hugepage, but the same can happen for hugepages
> > under migration. So I intended to filter out all types of swap entries.
> > The local variable 'absent' seems to mean whether data on the address
> > is immediately available, so swap type entry isn't included in it.
>
> OK, I didn't consider huge pages under migration and I was merely worried
> that is_hugetlb_entry_hwpoisoned sounds more appropriate than
> is_swap_pte.
>
> Could you add a comment which would clarify that is_swap_pte covers both
> migration and hwpoison pages, please? Something like:
>
> /*
> * is_swap_pte test covers both is_hugetlb_entry_hwpoisoned
> * and hugepages under migration in which case
> * hugetlb_fault waits for the migration and bails out
> * properly for HWPosined pages.
> */
> absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> is_swap_pte(huge_ptep_get(pte));

OK, I'll add this.

> Other than that feel free to add
> Reviewed-by: Michal Hocko <[email protected]>

Thank you!
Naoya