2008-08-28 05:30:35

by KOSAKI Motohiro

[permalink] [raw]
Subject: [PATCH] coredump_filter: add hugepage core dumping

Now, hugepage's vma has VM_RESERVED flag because it cannot be swapped.

and VM_RESERVED vma isn't core dumped because its flag often be used for
kernel internal vma (e.g. vmalloc, sound related).

So, hugepage is never dumped and it indicate hugepages's program can't be debugged easily.

In these days, demand on making use of hugepage is increasing.
IMO, native support for coredump of hugepage is useful.


I think VM_RESERVED default dumping bahavior is good,
then I'd like to add coredump_filter mask.

This patch doesn't change dafault behavior.


I tested by following method.

# ulimit -c unlimited
# echo 0x23 > /proc/self/coredump_filter
# ./hugepage_dump
# gdb ./hugepage_dump core


hugepage_dump.c
------------------------------------------------
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

#define HUGEPAGE_SIZE (256*1024*1024)

int main(int argc, char** argv)
{
int err;
int shmid;
int *pval;
int shm_flags = 0666;

if ((argc >= 2) && (strcmp(argv[1], "-h")==0))
shm_flags |= SHM_HUGETLB;

err = shmid = shmget(IPC_PRIVATE, HUGEPAGE_SIZE, shm_flags);
if (err < 0) {
perror("shmget");
exit(1);
}

pval = shmat(shmid, 0, 0);
if (pval == (void*)-1) {
perror("shmat");
exit(1);
}

*pval = 1;

*(int*)0 = 1;

exit(0);
}
-----------------------------------------------------


Signed-off-by: KOSAKI Motohiro <[email protected]>
CC: Kawai, Hidehiro <[email protected]>
CC: Hugh Dickins <[email protected]>
CC: William Irwin <[email protected]>
CC: Adam Litke <[email protected]>

---
Documentation/filesystems/proc.txt | 3 ++-
fs/binfmt_elf.c | 7 ++++++-
include/linux/sched.h | 3 ++-
3 files changed, 10 insertions(+), 3 deletions(-)

Index: b/Documentation/filesystems/proc.txt
===================================================================
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -2389,11 +2389,12 @@ will be dumped when the <pid> process is
of memory types. If a bit of the bitmask is set, memory segments of the
corresponding memory type are dumped, otherwise they are not dumped.

-The following 4 memory types are supported:
+The following 5 memory types are supported:
- (bit 0) anonymous private memory
- (bit 1) anonymous shared memory
- (bit 2) file-backed private memory
- (bit 3) file-backed shared memory
+ - (bit 5) hugetlb memory

Note that MMIO pages such as frame buffer are never dumped and vDSO pages
are always dumped regardless of the bitmask status.
Index: b/include/linux/sched.h
===================================================================
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -402,8 +402,9 @@ extern int get_dumpable(struct mm_struct
#define MMF_DUMP_MAPPED_PRIVATE 4
#define MMF_DUMP_MAPPED_SHARED 5
#define MMF_DUMP_ELF_HEADERS 6
+#define MMF_DUMP_HUGETLB 7
#define MMF_DUMP_FILTER_SHIFT MMF_DUMPABLE_BITS
-#define MMF_DUMP_FILTER_BITS 5
+#define MMF_DUMP_FILTER_BITS (MMF_DUMP_HUGETLB - MMF_DUMP_ANON_PRIVATE + 1)
#define MMF_DUMP_FILTER_MASK \
(((1 << MMF_DUMP_FILTER_BITS) - 1) << MMF_DUMP_FILTER_SHIFT)
#define MMF_DUMP_FILTER_DEFAULT \
Index: b/fs/binfmt_elf.c
===================================================================
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1160,11 +1160,16 @@ static unsigned long vma_dump_size(struc
if (vma->vm_flags & VM_ALWAYSDUMP)
goto whole;

+#define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))
+
+ if ((vma->vm_flags & VM_HUGETLB) && FILTER(HUGETLB))
+ goto whole;
+
/* Do not dump I/O mapped devices or special mappings */
if (vma->vm_flags & (VM_IO | VM_RESERVED))
return 0;

-#define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))

/* By default, dump shared memory if mapped from an anonymous file. */
if (vma->vm_flags & VM_SHARED) {


2008-08-28 14:48:55

by Adam Litke

[permalink] [raw]
Subject: Re: [PATCH] coredump_filter: add hugepage core dumping

On Thu, 2008-08-28 at 14:24 +0900, KOSAKI Motohiro wrote:
> Now, hugepage's vma has VM_RESERVED flag because it cannot be swapped.
>
> and VM_RESERVED vma isn't core dumped because its flag often be used for
> kernel internal vma (e.g. vmalloc, sound related).
>
> So, hugepage is never dumped and it indicate hugepages's program can't be debugged easily.
>
> In these days, demand on making use of hugepage is increasing.
> IMO, native support for coredump of hugepage is useful.
>
>
> I think VM_RESERVED default dumping bahavior is good,
> then I'd like to add coredump_filter mask.
>
> This patch doesn't change dafault behavior.

I have tested this using both the provided C program and with a program
that used libhugetlbfs to back its text and data segments with huge
pages. This seems to work as expected. From a hugetlbfs perspective,
this seems like a reasonable patch to me.

> Signed-off-by: KOSAKI Motohiro <[email protected]>
> CC: Kawai, Hidehiro <[email protected]>
> CC: Hugh Dickins <[email protected]>
> CC: William Irwin <[email protected]>
> CC: Adam Litke <[email protected]>

Tested-by: Adam Litke <[email protected]>
Acked-by: Adam Litke <[email protected]>

> ---
> Documentation/filesystems/proc.txt | 3 ++-
> fs/binfmt_elf.c | 7 ++++++-
> include/linux/sched.h | 3 ++-
> 3 files changed, 10 insertions(+), 3 deletions(-)
>
> Index: b/Documentation/filesystems/proc.txt
> ===================================================================
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -2389,11 +2389,12 @@ will be dumped when the <pid> process is
> of memory types. If a bit of the bitmask is set, memory segments of the
> corresponding memory type are dumped, otherwise they are not dumped.
>
> -The following 4 memory types are supported:
> +The following 5 memory types are supported:
> - (bit 0) anonymous private memory
> - (bit 1) anonymous shared memory
> - (bit 2) file-backed private memory
> - (bit 3) file-backed shared memory
> + - (bit 5) hugetlb memory
>
> Note that MMIO pages such as frame buffer are never dumped and vDSO pages
> are always dumped regardless of the bitmask status.
> Index: b/include/linux/sched.h
> ===================================================================
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -402,8 +402,9 @@ extern int get_dumpable(struct mm_struct
> #define MMF_DUMP_MAPPED_PRIVATE 4
> #define MMF_DUMP_MAPPED_SHARED 5
> #define MMF_DUMP_ELF_HEADERS 6
> +#define MMF_DUMP_HUGETLB 7
> #define MMF_DUMP_FILTER_SHIFT MMF_DUMPABLE_BITS
> -#define MMF_DUMP_FILTER_BITS 5
> +#define MMF_DUMP_FILTER_BITS (MMF_DUMP_HUGETLB - MMF_DUMP_ANON_PRIVATE + 1)
> #define MMF_DUMP_FILTER_MASK \
> (((1 << MMF_DUMP_FILTER_BITS) - 1) << MMF_DUMP_FILTER_SHIFT)
> #define MMF_DUMP_FILTER_DEFAULT \
> Index: b/fs/binfmt_elf.c
> ===================================================================
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1160,11 +1160,16 @@ static unsigned long vma_dump_size(struc
> if (vma->vm_flags & VM_ALWAYSDUMP)
> goto whole;
>
> +#define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))
> +
> + if ((vma->vm_flags & VM_HUGETLB) && FILTER(HUGETLB))
> + goto whole;
> +
> /* Do not dump I/O mapped devices or special mappings */
> if (vma->vm_flags & (VM_IO | VM_RESERVED))
> return 0;
>
> -#define FILTER(type) (mm_flags & (1UL << MMF_DUMP_##type))
>
> /* By default, dump shared memory if mapped from an anonymous file. */
> if (vma->vm_flags & VM_SHARED) {
>
>
--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

2008-08-28 14:59:19

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] coredump_filter: add hugepage core dumping

> I have tested this using both the provided C program and with a program
> that used libhugetlbfs to back its text and data segments with huge
> pages. This seems to work as expected. From a hugetlbfs perspective,
> this seems like a reasonable patch to me.

Thank you, Adam!

Yes, libhugetlbfs user increased in these days.
So, I believe debaggability of libhugetlbfs linked program is important.

>> Signed-off-by: KOSAKI Motohiro <[email protected]>
>> CC: Kawai, Hidehiro <[email protected]>
>> CC: Hugh Dickins <[email protected]>
>> CC: William Irwin <[email protected]>
>> CC: Adam Litke <[email protected]>
>
> Tested-by: Adam Litke <[email protected]>
> Acked-by: Adam Litke <[email protected]>
>

2008-08-28 16:40:41

by Hugh Dickins

[permalink] [raw]
Subject: Re: [PATCH] coredump_filter: add hugepage core dumping

On Thu, 28 Aug 2008, KOSAKI Motohiro wrote:
> Now, hugepage's vma has VM_RESERVED flag because it cannot be swapped.

(Ooh, don't get me started on what VM_RESERVED means or does not mean.)

>
> and VM_RESERVED vma isn't core dumped because its flag often be used for
> kernel internal vma (e.g. vmalloc, sound related).
>
> So, hugepage is never dumped and it indicate hugepages's program can't
> be debugged easily.
>
> In these days, demand on making use of hugepage is increasing.
> IMO, native support for coredump of hugepage is useful.
>
>
> I think VM_RESERVED default dumping bahavior is good,
> then I'd like to add coredump_filter mask.
>
> This patch doesn't change dafault behavior.

This seems very reasonable to me
(though I've little use for coredumps or hugepages myself).

One caution though: how well does it behave when coredumping a large
area of hugepages which have not actually been instantiated prior to
the coredump? We have that funny FOLL_ANON ZERO_PAGE code in
follow_page() to avoid wasting memory on large uninstantiated anon
areas, but hugepages won't go that way. If the dump hangs waiting for
memory to be freed, or OOMkills other processes, that wouldn't be good;
whereas if hugepage reservations (I've not followed what happens with
them) or whatever just make it skip when no more, that should be okay.

Hugh

2008-08-28 23:37:28

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH] coredump_filter: add hugepage core dumping

Hi

> > I think VM_RESERVED default dumping bahavior is good,
> > then I'd like to add coredump_filter mask.
> >
> > This patch doesn't change dafault behavior.
>
> This seems very reasonable to me
> (though I've little use for coredumps or hugepages myself).

Thanks a lot.

> One caution though: how well does it behave when coredumping a large
> area of hugepages which have not actually been instantiated prior to
> the coredump? We have that funny FOLL_ANON ZERO_PAGE code in
> follow_page() to avoid wasting memory on large uninstantiated anon
> areas, but hugepages won't go that way. If the dump hangs waiting for
> memory to be freed, or OOMkills other processes, that wouldn't be good;
> whereas if hugepage reservations (I've not followed what happens with
> them) or whatever just make it skip when no more, that should be okay.

I think hugepage reservation pages always exist when hugepage COW happend.
Then, hugepage access never cause try_to_free_pages() nor OOM.

Adam, if I talk lie, please tell us truth.



2008-08-29 15:28:45

by Adam Litke

[permalink] [raw]
Subject: Re: [PATCH] coredump_filter: add hugepage core dumping

On Fri, 2008-08-29 at 08:35 +0900, KOSAKI Motohiro wrote:
> Hi
>
> > > I think VM_RESERVED default dumping bahavior is good,
> > > then I'd like to add coredump_filter mask.
> > >
> > > This patch doesn't change dafault behavior.
> >
> > This seems very reasonable to me
> > (though I've little use for coredumps or hugepages myself).
>
> Thanks a lot.
>
> > One caution though: how well does it behave when coredumping a large
> > area of hugepages which have not actually been instantiated prior to
> > the coredump? We have that funny FOLL_ANON ZERO_PAGE code in
> > follow_page() to avoid wasting memory on large uninstantiated anon
> > areas, but hugepages won't go that way. If the dump hangs waiting for
> > memory to be freed, or OOMkills other processes, that wouldn't be good;
> > whereas if hugepage reservations (I've not followed what happens with
> > them) or whatever just make it skip when no more, that should be okay.
>
> I think hugepage reservation pages always exist when hugepage COW happend.
> Then, hugepage access never cause try_to_free_pages() nor OOM.

(Mel, since you wrote the private reservation hugetlb code, would you
care to verify the following:)

Well, reserved huge pages _almost_ always exist. The notable exception
happens when a process creates a MAP_PRIVATE hugetlb mapping and then
forks. No guarantees are made to the children for access to that
hugetlb mapping. So if such a child were to core dump an unavailable
huge page, follow_hugetlb_page() would fail. I think that case is
harmless since it looks like elf_coredump() will replace it with a
zeroed page?

The part of Hugh's email that does deserve more attention is the part
about FOLL_ANON and the ZERO_PAGE. It seems like an awful waste to zero
out and instantiate huge pages just for a core dump. I think it would
be worth adding a flag to follow_hugetlb_page() so that it can be
instructed to not fault in un-instantiated huge pages. This would take
some investigating as to whether it is even valid for
follow_hugetlb_page() to return the ZERO_PAGE().

--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center