2013-04-01 17:22:40

by Naoya Horiguchi

Subject: [PATCH v2 0/2] fix hugepage coredump

Hi,

Here is the second version of the hugepage coredump fix.
See individual patches for more details.

Thanks,
Naoya Horiguchi


2013-04-01 17:22:41

by Naoya Horiguchi

Subject: [PATCH v2 2/2] hugetlbfs: add swap entry check in follow_hugetlb_page()

With the previous patch "hugetlbfs: stop setting VM_DONTDUMP in
initializing vma(VM_HUGETLB)" applied to re-enable hugepage coredump, if a
memory error happens on a hugepage and the affected processes try to access
the poisoned hugepage, we hit VM_BUG_ON(atomic_read(&page->_count) <= 0)
in get_page().

The reason for this bug is that the coredump-related code doesn't recognise
the "hugepage hwpoison entry" that replaces a pmd entry when a memory
error occurs on a hugepage.
In other words, a hugepage hwpoison entry and a pmd entry store physical
address information in different bit layouts, so follow_hugetlb_page(),
which is called from get_dump_page(), returns the wrong page for a given
address.

We need to filter out only hwpoisoned hugepages so that the data on healthy
hugepages is still included in the coredump. So this patch makes
follow_hugetlb_page() avoid trying to get a page when the pmd is in a
swap-entry-like format.

Signed-off-by: Naoya Horiguchi <[email protected]>
Cc: [email protected]
---
mm/hugetlb.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git v3.9-rc3.orig/mm/hugetlb.c v3.9-rc3/mm/hugetlb.c
index 0d1705b..8462e2c 100644
--- v3.9-rc3.orig/mm/hugetlb.c
+++ v3.9-rc3/mm/hugetlb.c
@@ -2968,7 +2968,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * first, for the page indexing below to work.
 		 */
 		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
-		absent = !pte || huge_pte_none(huge_ptep_get(pte));
+		absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
+			is_swap_pte(huge_ptep_get(pte));

 		/*
 		 * When coredumping, it suits get_dump_page if we just return
--
1.7.11.7
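
Note: the effect of the extra is_swap_pte() test can be illustrated with a
small userspace model. This is only a sketch under stated assumptions: the
entry layout, the TOY_* constants and the toy_* helpers below are all
hypothetical and do not match any real architecture's encoding. The one thing
taken from the kernel is the definition of a swap-like entry as "not none and
not present". The model shows why decoding a hwpoison entry as if it were a
present entry yields the wrong page frame, and how the new check treats such
an entry as absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy 64-bit entry layout. Hypothetical, NOT a real hardware encoding. */
#define TOY_PRESENT          (1ULL << 0)  /* bit 0: entry maps a page      */
#define TOY_PFN_SHIFT        12           /* present entry: pfn at bit 12+ */
#define TOY_SWP_TYPE_SHIFT   1            /* swap-like entry: type, bits 1-5 */
#define TOY_SWP_OFFSET_SHIFT 6            /* swap-like entry: offset at bit 6+ */
#define TOY_SWP_HWPOISON     0x1eULL      /* made-up "hwpoison" type value */

typedef uint64_t toy_entry_t;

/* Build an entry that maps page frame 'pfn'. */
static inline toy_entry_t toy_mk_present(uint64_t pfn)
{
	assert(pfn < (1ULL << 40));
	return (pfn << TOY_PFN_SHIFT) | TOY_PRESENT;
}

/* Build a hwpoison marker that remembers 'pfn' in the swap-offset field. */
static inline toy_entry_t toy_mk_hwpoison(uint64_t pfn)
{
	assert(pfn < (1ULL << 40));
	return (pfn << TOY_SWP_OFFSET_SHIFT) |
	       (TOY_SWP_HWPOISON << TOY_SWP_TYPE_SHIFT);
}

static inline bool toy_none(toy_entry_t e)
{
	return e == 0;
}

/* Same idea as the kernel's is_swap_pte(): not none and not present. */
static inline bool toy_is_swap_like(toy_entry_t e)
{
	return !toy_none(e) && !(e & TOY_PRESENT);
}

/* Decode the entry as if it were present, which is what the old path did. */
static inline uint64_t toy_naive_pfn(toy_entry_t e)
{
	return e >> TOY_PFN_SHIFT;
}

/* "absent" test before the patch: only empty entries are skipped. */
static inline bool toy_absent_old(toy_entry_t e)
{
	return toy_none(e);
}

/* "absent" test after the patch: swap-like entries are skipped too. */
static inline bool toy_absent_new(toy_entry_t e)
{
	return toy_none(e) || toy_is_swap_like(e);
}
```

In this model a hwpoison entry passes the old test as "not absent", so the
caller would go on to decode a bogus page frame from it. With the new test
the entry is reported as absent, while entries that map healthy pages are
unaffected.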

2013-04-01 17:22:40

by Naoya Horiguchi

Subject: [PATCH v2 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)

Currently we fail to include any data on hugepages in the coredump,
because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
mm->reserved_vm counter". This looks to me like a serious regression,
so let's fix it.

ChangeLog v2:
- add 'return 0' in hugepage memory check

Signed-off-by: Naoya Horiguchi <[email protected]>
Acked-by: Konstantin Khlebnikov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: [email protected]
---
fs/binfmt_elf.c | 1 +
fs/hugetlbfs/inode.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git v3.9-rc3.orig/fs/binfmt_elf.c v3.9-rc3/fs/binfmt_elf.c
index 3939829..86af964 100644
--- v3.9-rc3.orig/fs/binfmt_elf.c
+++ v3.9-rc3/fs/binfmt_elf.c
@@ -1137,6 +1137,7 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
 			goto whole;
 		if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
 			goto whole;
+		return 0;
 	}

 	/* Do not dump I/O mapped devices or special mappings */
diff --git v3.9-rc3.orig/fs/hugetlbfs/inode.c v3.9-rc3/fs/hugetlbfs/inode.c
index 84e3d85..523464e 100644
--- v3.9-rc3.orig/fs/hugetlbfs/inode.c
+++ v3.9-rc3/fs/hugetlbfs/inode.c
@@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	 * way when do_mmap_pgoff unwinds (may be important on powerpc
 	 * and ia64).
 	 */
-	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND | VM_DONTDUMP;
+	vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
 	vma->vm_ops = &hugetlb_vm_ops;

 	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
--
1.7.11.7
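
Note: why VM_DONTDUMP hides hugepages from the coredump regardless of the
coredump filter can be sketched in userspace as follows. The sketch assumes
that the dump-size decision tests VM_DONTDUMP before it looks at the hugetlb
filter bits. The TOY_VM_* flag values and the toy_* function names are
hypothetical and chosen only for this illustration. The filter bit positions
(bit 5 for private hugetlb memory, bit 6 for shared hugetlb memory) follow the
coredump_filter description in Documentation/filesystems/proc.txt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this sketch; the kernel's VM_* values differ. */
#define TOY_VM_SHARED    0x1u
#define TOY_VM_HUGETLB   0x2u
#define TOY_VM_DONTDUMP  0x4u

/* coredump_filter bits that control hugetlb memory. */
#define TOY_FILTER_HUGETLB_PRIVATE (1u << 5)
#define TOY_FILTER_HUGETLB_SHARED  (1u << 6)

/* Flags given to a hugetlbfs mapping before the patch. */
static inline unsigned int toy_mmap_flags_before(void)
{
	return TOY_VM_HUGETLB | TOY_VM_DONTDUMP;
}

/* Flags given to a hugetlbfs mapping after the patch. */
static inline unsigned int toy_mmap_flags_after(void)
{
	return TOY_VM_HUGETLB;
}

/* Decide whether a hugetlb mapping ends up in the dump. */
static inline bool toy_hugetlb_dumped(unsigned int vm_flags,
				      unsigned int filter)
{
	assert(vm_flags & TOY_VM_HUGETLB);

	/* Assumed to be tested first, so the filter is never consulted. */
	if (vm_flags & TOY_VM_DONTDUMP)
		return false;

	if (vm_flags & TOY_VM_SHARED)
		return (filter & TOY_FILTER_HUGETLB_SHARED) != 0;
	return (filter & TOY_FILTER_HUGETLB_PRIVATE) != 0;
}
```

With the "before" flags the function returns false for every filter value,
which corresponds to the regression described above. With the "after" flags
the result depends only on filter bits 5 and 6.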

2013-04-02 05:34:35

by Konstantin Khlebnikov

Subject: Re: [PATCH v2 0/2] fix hugepage coredump

Naoya Horiguchi wrote:
> Hi,
>
> Here is the second version of the hugepage coredump fix.
> See individual patches for more details.
>
> Thanks,
> Naoya Horiguchi

ACK to both patches


The VM_* bits cleanup patchset was merged into v3.7, so only the two most recent stable kernels need this fix.

2013-04-02 14:08:29

by Naoya Horiguchi

Subject: Re: [PATCH v2 1/2] hugetlbfs: stop setting VM_DONTDUMP in initializing vma(VM_HUGETLB)

On Tue, Apr 02, 2013 at 08:32:33PM +0900, HATAYAMA Daisuke wrote:
> 2013/4/2 Naoya Horiguchi <[email protected]>
>
> > Currently we fail to include any data on hugepages in the coredump,
> > because VM_DONTDUMP is set on hugetlbfs's vma. This behavior was recently
> > introduced by commit 314e51b98 "mm: kill vma flag VM_RESERVED and
> > mm->reserved_vm counter". This looks to me like a serious regression,
> > so let's fix it.
> >
> > ChangeLog v2:
> > - add 'return 0' in hugepage memory check
> >
> <cut>
>
> > @@ -1137,6 +1137,7 @@ static unsigned long vma_dump_size(struct
> > vm_area_struct *vma,
> > goto whole;
> > if (!(vma->vm_flags & VM_SHARED) &&
> > FILTER(HUGETLB_PRIVATE))
> > goto whole;
> > + return 0;
> > }
> >
>
> You should split this part into another patch. This fix is orthogonal to
> the bug this patch tries to fix.

Fair enough, thanks.

> The bug you're trying to fix implicitly here is the filtering behaviour
> that doesn't follow
> the description in Documentation/filesystems/proc.txt that:
>
> Note bit 0-4 doesn't effect any hugetlb memory. hugetlb memory are only
> effected by bit 5-6.
>
> Right?

Right. Without this return, we fall through to the subsequent flag checks
of bits 0-4 for vma(VM_HUGETLB).

Thanks,
Naoya Horiguchi
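
Note: the fall-through discussed above can be modelled with a short userspace
sketch. It is an illustration under assumptions, not the kernel code: struct
toy_vma, toy_dumped() and the with_return switch are hypothetical, and the
checks for bits 0-4 are collapsed into a single stand-in test on bit 0
(private) or bit 1 (shared). The bit positions follow the coredump_filter
description in Documentation/filesystems/proc.txt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* coredump_filter bits used by this sketch. */
#define TOY_FILTER_ANON_PRIVATE    (1u << 0)
#define TOY_FILTER_ANON_SHARED     (1u << 1)
#define TOY_FILTER_HUGETLB_PRIVATE (1u << 5)
#define TOY_FILTER_HUGETLB_SHARED  (1u << 6)

/* Hypothetical, stripped-down stand-in for a vma. */
struct toy_vma {
	bool hugetlb;  /* mapping is backed by hugetlbfs */
	bool shared;   /* mapping is shared rather than private */
};

/*
 * Decide whether a mapping is dumped. 'with_return' selects between the
 * behaviour without (false) and with (true) the added 'return 0'.
 */
static inline bool toy_dumped(const struct toy_vma *vma, unsigned int filter,
			      bool with_return)
{
	assert(vma != NULL);

	if (vma->hugetlb) {
		if (vma->shared && (filter & TOY_FILTER_HUGETLB_SHARED))
			return true;
		if (!vma->shared && (filter & TOY_FILTER_HUGETLB_PRIVATE))
			return true;
		if (with_return)
			return false;  /* hugetlb is decided by bits 5-6 only */
	}

	/* Collapsed stand-in for the checks of bits 0-4. */
	if (vma->shared)
		return (filter & TOY_FILTER_ANON_SHARED) != 0;
	return (filter & TOY_FILTER_ANON_PRIVATE) != 0;
}
```

In this model, a private hugetlb mapping with a filter that has only bits 0
and 1 set is dumped when with_return is false and skipped when it is true.
The second result is the one that matches the documented behaviour, where
hugetlb memory is affected only by bits 5-6.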