2022-02-01 20:42:31

by Matthew Wilcox

Subject: [PATCH] binfmt_elf: Take the mmap lock when walking the VMA list

I'm not sure if the VMA list can change under us, but dump_vma_snapshot()
is very careful to take the mmap_lock in write mode. We only need to
take it in read mode here as we do not care if the size of the stack
VMA changes underneath us.

If it can be changed underneath us, this is a potential use-after-free
for a multithreaded process which is dumping core.

Fixes: 2aa362c49c31 ("coredump: extend core dump note section to contain file names of mapped files")
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
---
fs/binfmt_elf.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 605017eb9349..dc2318355762 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1651,6 +1651,7 @@ static int fill_files_note(struct memelfnote *note)
name_base = name_curpos = ((char *)data) + names_ofs;
remaining = size - names_ofs;
count = 0;
+ mmap_read_lock(mm);
for (vma = mm->mmap; vma != NULL; vma = vma->vm_next) {
struct file *file;
const char *filename;
@@ -1661,6 +1662,7 @@ static int fill_files_note(struct memelfnote *note)
filename = file_path(file, name_curpos, remaining);
if (IS_ERR(filename)) {
if (PTR_ERR(filename) == -ENAMETOOLONG) {
+ mmap_read_unlock(mm);
kvfree(data);
size = size * 5 / 4;
goto alloc;
@@ -1680,6 +1682,7 @@ static int fill_files_note(struct memelfnote *note)
*start_end_ofs++ = vma->vm_pgoff;
count++;
}
+ mmap_read_unlock(mm);

/* Now we know exact count of files, can store it */
data[0] = count;
--
2.34.1


2022-02-01 20:43:06

by Eric W. Biederman

Subject: Re: [PATCH] binfmt_elf: Take the mmap lock when walking the VMA list

"Matthew Wilcox (Oracle)" <[email protected]> writes:

> I'm not sure if the VMA list can change under us, but dump_vma_snapshot()
> is very careful to take the mmap_lock in write mode. We only need to
> take it in read mode here as we do not care if the size of the stack
> VMA changes underneath us.
>
> If it can be changed underneath us, this is a potential use-after-free
> for a multithreaded process which is dumping core.

The problem is not so much a multi-threaded process as processes that
share their mm.

I think rather than take a lock we should be using the snapshot captured
with dump_vma_snapshot. Otherwise we have the very real chance that the
two get out of sync, which would result in a nonsense core file.

Probably that means that dump_vma_snapshot needs to call get_file on
vma->vm_file and store it in core_vma_metadata.

Do you think you can fix it with something like that?

Eric


> Fixes: 2aa362c49c31 ("coredump: extend core dump note section to contain file names of mapped files")
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> Reviewed-by: Liam R. Howlett <[email protected]>
> ---
> fs/binfmt_elf.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 605017eb9349..dc2318355762 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -1651,6 +1651,7 @@ static int fill_files_note(struct memelfnote *note)
> name_base = name_curpos = ((char *)data) + names_ofs;
> remaining = size - names_ofs;
> count = 0;
> + mmap_read_lock(mm);
> for (vma = mm->mmap; vma != NULL; vma = vma->vm_next) {
> struct file *file;
> const char *filename;
> @@ -1661,6 +1662,7 @@ static int fill_files_note(struct memelfnote *note)
> filename = file_path(file, name_curpos, remaining);
> if (IS_ERR(filename)) {
> if (PTR_ERR(filename) == -ENAMETOOLONG) {
> + mmap_read_unlock(mm);
> kvfree(data);
> size = size * 5 / 4;
> goto alloc;
> @@ -1680,6 +1682,7 @@ static int fill_files_note(struct memelfnote *note)
> *start_end_ofs++ = vma->vm_pgoff;
> count++;
> }
> + mmap_read_unlock(mm);
>
> /* Now we know exact count of files, can store it */
> data[0] = count;

2022-02-01 20:43:42

by Matthew Wilcox

Subject: Re: [PATCH] binfmt_elf: Take the mmap lock when walking the VMA list

On Mon, Jan 31, 2022 at 10:03:31AM -0600, Eric W. Biederman wrote:
> "Matthew Wilcox (Oracle)" <[email protected]> writes:
>
> > I'm not sure if the VMA list can change under us, but dump_vma_snapshot()
> > is very careful to take the mmap_lock in write mode. We only need to
> > take it in read mode here as we do not care if the size of the stack
> > VMA changes underneath us.
> >
> > If it can be changed underneath us, this is a potential use-after-free
> > for a multithreaded process which is dumping core.
>
> The problem is not multi-threaded process so much as processes that
> share their mm.

I don't understand the difference. I appreciate that another process can
get read access to an mm through, eg, /proc, but how can another process
(that isn't a thread of this process) modify the VMAs?

> I think rather than take a lock we should be using the snapshot captured
> with dump_vma_snapshot. Otherwise we have the very real chance that the
> two get out of sync. Which would result in a non-sense core file.
>
> Probably that means that dump_vma_snapshot needs to call get_file on
> vma->vm_file store it in core_vma_metadata.
>
> Do you think you can fix it something like that?

Uhh .. that seems like it needs a lot more understanding of binfmt_elf
than I currently possess. I'd rather spend my time working on folios
than learning much more about binfmt_elf. I was just trying to fix an
assertion failure with the maple tree patches (we now assert that you're
holding a lock when walking the list of VMAs).

2022-02-01 20:44:35

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] binfmt_elf: Take the mmap lock when walking the VMA list

On Mon, Jan 31, 2022 at 10:26:11AM -0600, Eric W. Biederman wrote:
> Matthew Wilcox <[email protected]> writes:
>
> > On Mon, Jan 31, 2022 at 10:03:31AM -0600, Eric W. Biederman wrote:
> >> "Matthew Wilcox (Oracle)" <[email protected]> writes:
> >>
> >> > I'm not sure if the VMA list can change under us, but dump_vma_snapshot()
> >> > is very careful to take the mmap_lock in write mode. We only need to
> >> > take it in read mode here as we do not care if the size of the stack
> >> > VMA changes underneath us.
> >> >
> >> > If it can be changed underneath us, this is a potential use-after-free
> >> > for a multithreaded process which is dumping core.
> >>
> >> The problem is not multi-threaded process so much as processes that
> >> share their mm.
> >
> > I don't understand the difference. I appreciate that another process can
> > get read access to an mm through, eg, /proc, but how can another process
> > (that isn't a thread of this process) modify the VMAs?
>
> There are a couple of ways.
>
> A classic way is a multi-threads process can call vfork, and the
> mm_struct is shared with the child until exec is called.

While true, I thought the semantics of vfork() were that the parent
was suspended. Given that, it can't core dump until the child execs
... right?

> A process can do this more deliberately by forking a child using
> clone(CLONE_VM) and not including CLONE_THREAD. Supporting this case
> is a hold over from before CLONE_THREAD was supported in the kernel and
> such processes were used to simulate threads.

That is a multithreaded process then! Maybe not in the strict POSIX
compliance sense, but the intent is to be a multithreaded process.
ie multiple threads of execution, sharing an address space.

> It also happens that there are subsystems in the kernel that do things
> like kthread_use_mm that can also be modifying the mm during a coredump.

Yikes. That's terrifying. It's really legitimate for a kthread to
attach to a process and start tearing down VMAs?

> > Uhh .. that seems like it needs a lot more understanding of binfmt_elf
> > than I currently possess. I'd rather spend my time working on folios
> > than learning much more about binfmt_elf. I was just trying to fix an
> > assertion failure with the maple tree patches (we now assert that you're
> > holding a lock when walking the list of VMAs).
>
> Fair enough. I will put it on my list of things to address.

Thanks. Now that I've disclosed it's a UAF, I hope you're able to
get to it soon. Otherwise we should put this band-aid in for now
and you can address it properly in the fullness of time.

2022-02-01 20:45:11

by Eric W. Biederman

Subject: Re: [PATCH] binfmt_elf: Take the mmap lock when walking the VMA list

Matthew Wilcox <[email protected]> writes:

> On Mon, Jan 31, 2022 at 10:03:31AM -0600, Eric W. Biederman wrote:
>> "Matthew Wilcox (Oracle)" <[email protected]> writes:
>>
>> > I'm not sure if the VMA list can change under us, but dump_vma_snapshot()
>> > is very careful to take the mmap_lock in write mode. We only need to
>> > take it in read mode here as we do not care if the size of the stack
>> > VMA changes underneath us.
>> >
>> > If it can be changed underneath us, this is a potential use-after-free
>> > for a multithreaded process which is dumping core.
>>
>> The problem is not multi-threaded process so much as processes that
>> share their mm.
>
> I don't understand the difference. I appreciate that another process can
> get read access to an mm through, eg, /proc, but how can another process
> (that isn't a thread of this process) modify the VMAs?

There are a couple of ways.

A classic way is that a multi-threaded process can call vfork, and the
mm_struct is shared with the child until exec is called.

A process can do this more deliberately by forking a child using
clone(CLONE_VM) and not including CLONE_THREAD. Supporting this case
is a hold over from before CLONE_THREAD was supported in the kernel and
such processes were used to simulate threads.

The practical difference between a CLONE_THREAD thread and a
non-CLONE_THREAD process is that the signal handling is not shared.
Without sharing the signal handlers it does not make sense for a fatal
signal to kill the other process.

From the perspective of coredump generation, a coredump stops the
execution of all CLONE_THREAD threads that are going to be part of the
coredump and allows anyone else who shares the mm_struct to keep running.


It also happens that there are subsystems in the kernel that do things
like kthread_use_mm that can also be modifying the mm during a coredump.

Which is why we have dump_vma_snapshot. Preventing the mm_struct and
the vmas from being modified during a coredump is not really practical.


>> I think rather than take a lock we should be using the snapshot captured
>> with dump_vma_snapshot. Otherwise we have the very real chance that the
>> two get out of sync. Which would result in a non-sense core file.
>>
>> Probably that means that dump_vma_snapshot needs to call get_file on
>> vma->vm_file store it in core_vma_metadata.
>>
>> Do you think you can fix it something like that?
>
> Uhh .. that seems like it needs a lot more understanding of binfmt_elf
> than I currently possess. I'd rather spend my time working on folios
> than learning much more about binfmt_elf. I was just trying to fix an
> assertion failure with the maple tree patches (we now assert that you're
> holding a lock when walking the list of VMAs).

Fair enough. I will put it on my list of things to address.

Eric

2022-02-01 20:45:49

by Jann Horn

Subject: Re: [PATCH] binfmt_elf: Take the mmap lock when walking the VMA list

On Mon, Jan 31, 2022 at 5:35 PM Matthew Wilcox <[email protected]> wrote:
>
> On Mon, Jan 31, 2022 at 10:26:11AM -0600, Eric W. Biederman wrote:
> > Matthew Wilcox <[email protected]> writes:
> >
> > > On Mon, Jan 31, 2022 at 10:03:31AM -0600, Eric W. Biederman wrote:
> > >> "Matthew Wilcox (Oracle)" <[email protected]> writes:
> > >>
> > >> > I'm not sure if the VMA list can change under us, but dump_vma_snapshot()
> > >> > is very careful to take the mmap_lock in write mode. We only need to
> > >> > take it in read mode here as we do not care if the size of the stack
> > >> > VMA changes underneath us.
> > >> >
> > >> > If it can be changed underneath us, this is a potential use-after-free
> > >> > for a multithreaded process which is dumping core.
> > >>
> > >> The problem is not multi-threaded process so much as processes that
> > >> share their mm.
> > >
> > > I don't understand the difference. I appreciate that another process can
> > > get read access to an mm through, eg, /proc, but how can another process
> > > (that isn't a thread of this process) modify the VMAs?
> >
> > There are a couple of ways.
> >
> > A classic way is a multi-threads process can call vfork, and the
> > mm_struct is shared with the child until exec is called.
>
> While true, I thought the semantics of vfork() were that the parent
> was suspended. Given that, it can't core dump until the child execs
> ... right?
>
> > A process can do this more deliberately by forking a child using
> > clone(CLONE_VM) and not including CLONE_THREAD. Supporting this case
> > is a hold over from before CLONE_THREAD was supported in the kernel and
> > such processes were used to simulate threads.

The syscall clone() is kind of the generalized version of fork() and
vfork(), and it lets you create fun combinations of these things (some
of which might be useful, others which make little sense), and e.g.
vfork() is basically just clone() with CLONE_VM (for sharing address
space) plus CLONE_VFORK (block until child exits or execs) plus
SIGCHLD (child should send SIGCHLD to parent when it terminates).

(Some combinations are disallowed, but you can IIRC make things like
threads with separate FD tables, or processes that share virtual
memory and signal handler tables but aren't actual threads.)


Note that until commit 0258b5fd7c71 ("coredump: Limit coredumps to a
single thread group", first in 5.16), coredumps would always tear down
every process that shares the MM before dumping, and the coredumping
code was trying to rely on that to protect against concurrency. (Which
at some point didn't actually work anymore, see below...)

> That is a multithreaded process then! Maybe not in the strict POSIX
> compliance sense, but the intent is to be a multithreaded process.
> ie multiple threads of execution, sharing an address space.

current_is_single_threaded() agrees with you. :P

> > It also happens that there are subsystems in the kernel that do things
> > like kthread_use_mm that can also be modifying the mm during a coredump.
>
> Yikes. That's terrifying. It's really legitimate for a kthread to
> attach to a process and start tearing down VMAs?

I don't know anything about kthreads doing this, but a fun way to
remotely mess with VMAs is userfaultfd. See
https://bugs.chromium.org/p/project-zero/issues/detail?id=1790 ("Issue
1790: Linux: missing locking between ELF coredump code and userfaultfd
VMA modification") for the long version - but the gist is that
userfaultfd can set/clear flags on virtual address ranges (implemented
as flags on VMAs), and that may involve splitting VMAs (when the
boundaries of the specified range don't correspond to existing VMA
boundaries) or merging VMAs (when the flags of adjacent VMAs become
equal). And userfaultfd can by design be used remotely on another
process (so long as that process first created the userfaultfd and
then handed it over).

That's why I added that locking in the coredumping code.

> > > Uhh .. that seems like it needs a lot more understanding of binfmt_elf
> > > than I currently possess. I'd rather spend my time working on folios
> > > than learning much more about binfmt_elf. I was just trying to fix an
> > > assertion failure with the maple tree patches (we now assert that you're
> > > holding a lock when walking the list of VMAs).
> >
> > Fair enough. I will put it on my list of things to address.
>
> Thanks. Now that I've disclosed it's a UAF, I hope you're able to
> get to it soon. Otherwise we should put this band-aid in for now
> and you can address it properly in the fullness of time.

2022-02-01 20:46:14

by Eric W. Biederman

Subject: Re: [PATCH] binfmt_elf: Take the mmap lock when walking the VMA list

Matthew Wilcox <[email protected]> writes:

> On Mon, Jan 31, 2022 at 10:26:11AM -0600, Eric W. Biederman wrote:
>> Matthew Wilcox <[email protected]> writes:
>>
>> > On Mon, Jan 31, 2022 at 10:03:31AM -0600, Eric W. Biederman wrote:
>> >> "Matthew Wilcox (Oracle)" <[email protected]> writes:
>> >>
>> >> > I'm not sure if the VMA list can change under us, but dump_vma_snapshot()
>> >> > is very careful to take the mmap_lock in write mode. We only need to
>> >> > take it in read mode here as we do not care if the size of the stack
>> >> > VMA changes underneath us.
>> >> >
>> >> > If it can be changed underneath us, this is a potential use-after-free
>> >> > for a multithreaded process which is dumping core.
>> >>
>> >> The problem is not multi-threaded process so much as processes that
>> >> share their mm.
>> >
>> > I don't understand the difference. I appreciate that another process can
>> > get read access to an mm through, eg, /proc, but how can another process
>> > (that isn't a thread of this process) modify the VMAs?
>>
>> There are a couple of ways.
>>
>> A classic way is a multi-threads process can call vfork, and the
>> mm_struct is shared with the child until exec is called.
>
> While true, I thought the semantics of vfork() were that the parent
> was suspended. Given that, it can't core dump until the child execs
> ... right?

The thread that called vfork is suspended. The other threads can
continue to execute.

>> A process can do this more deliberately by forking a child using
>> clone(CLONE_VM) and not including CLONE_THREAD. Supporting this case
>> is a hold over from before CLONE_THREAD was supported in the kernel and
>> such processes were used to simulate threads.
>
> That is a multithreaded process then! Maybe not in the strict POSIX
> compliance sense, but the intent is to be a multithreaded process.
> ie multiple threads of execution, sharing an address space.

Sometimes. From a coredump perspective it is just another process
that happens to share the mm. Like the vfork process.

For a while the coredump code was trying to kill and possibly dump all
of these "threads" that shared a vm. The practical problem was that
a failing exec after vfork could trigger a coredump that would kill
its parent process.

So when I look at these from a coredump or signal perspective I just
treat them as weird processes that happen to share an mm_struct.

>> It also happens that there are subsystems in the kernel that do things
>> like kthread_use_mm that can also be modifying the mm during a coredump.
>
> Yikes. That's terrifying. It's really legitimate for a kthread to
> attach to a process and start tearing down VMAs?

I don't know how much VMA manipulation makes sense, but it is legitimate
to attach to an mm and do those things, as Jann pointed out.

> Thanks. Now that I've disclosed it's a UAF, I hope you're able to
> get to it soon. Otherwise we should put this band-aid in for now
> and you can address it properly in the fullness of time.

Working on it now.

Eric

2022-02-01 20:48:52

by Eric W. Biederman

Subject: [PATCH 0/5] Fix fill_files_note


Matthew Wilcox has reported a missing mmap_lock in fill_files_note,
which could cause trouble.

Refactor the code and clean it up so that the vma snapshot makes
it to fill_files_note, and then use the vma snapshot in fill_files_note.

Folks, please review this: it looks correct to me, but I haven't done
anything beyond compile-testing it yet.

Eric W. Biederman (5):
coredump: Move definition of struct coredump_params into coredump.h
coredump: Snapshot the vmas in do_coredump
coredump: Remove the WARN_ON in dump_vma_snapshot
coredump/elf: Pass coredump_params into fill_note_info
coredump: Use the vma snapshot in fill_files_note

fs/binfmt_elf.c | 61 ++++++++++++++++++++++--------------------------
fs/binfmt_elf_fdpic.c | 18 +++++---------
fs/coredump.c | 55 +++++++++++++++++++++++++++++--------------
include/linux/binfmts.h | 13 +----------
include/linux/coredump.h | 20 ++++++++++++----
5 files changed, 88 insertions(+), 79 deletions(-)


Eric

2022-02-01 20:49:12

by Eric W. Biederman

Subject: [PATCH 2/5] coredump: Snapshot the vmas in do_coredump


Move the call of dump_vma_snapshot and kvfree(vma_meta) out of the
individual coredump routines into do_coredump itself. This makes
the code less error prone and easier to maintain.

Make the vma snapshot available to the coredump routines
in struct coredump_params. This makes it easier to
change and update what is captured in the vma snapshot
and will be needed for fixing fill_files_note.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
fs/binfmt_elf.c | 20 +++++++-------------
fs/binfmt_elf_fdpic.c | 18 ++++++------------
fs/coredump.c | 36 ++++++++++++++++++++----------------
include/linux/coredump.h | 6 +++---
4 files changed, 36 insertions(+), 44 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 605017eb9349..4822b04154e1 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -2169,8 +2169,7 @@ static void fill_extnum_info(struct elfhdr *elf, struct elf_shdr *shdr4extnum,
static int elf_core_dump(struct coredump_params *cprm)
{
int has_dumped = 0;
- int vma_count, segs, i;
- size_t vma_data_size;
+ int segs, i;
struct elfhdr elf;
loff_t offset = 0, dataoff;
struct elf_note_info info = { };
@@ -2178,16 +2177,12 @@ static int elf_core_dump(struct coredump_params *cprm)
struct elf_shdr *shdr4extnum = NULL;
Elf_Half e_phnum;
elf_addr_t e_shoff;
- struct core_vma_metadata *vma_meta;
-
- if (dump_vma_snapshot(cprm, &vma_count, &vma_meta, &vma_data_size))
- return 0;

/*
* The number of segs are recored into ELF header as 16bit value.
* Please check DEFAULT_MAX_MAP_COUNT definition when you modify here.
*/
- segs = vma_count + elf_core_extra_phdrs();
+ segs = cprm->vma_count + elf_core_extra_phdrs();

/* for notes section */
segs++;
@@ -2226,7 +2221,7 @@ static int elf_core_dump(struct coredump_params *cprm)

dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);

- offset += vma_data_size;
+ offset += cprm->vma_data_size;
offset += elf_core_extra_data_size();
e_shoff = offset;

@@ -2246,8 +2241,8 @@ static int elf_core_dump(struct coredump_params *cprm)
goto end_coredump;

/* Write program headers for segments dump */
- for (i = 0; i < vma_count; i++) {
- struct core_vma_metadata *meta = vma_meta + i;
+ for (i = 0; i < cprm->vma_count; i++) {
+ struct core_vma_metadata *meta = cprm->vma_meta + i;
struct elf_phdr phdr;

phdr.p_type = PT_LOAD;
@@ -2284,8 +2279,8 @@ static int elf_core_dump(struct coredump_params *cprm)
/* Align to page */
dump_skip_to(cprm, dataoff);

- for (i = 0; i < vma_count; i++) {
- struct core_vma_metadata *meta = vma_meta + i;
+ for (i = 0; i < cprm->vma_count; i++) {
+ struct core_vma_metadata *meta = cprm->vma_meta + i;

if (!dump_user_range(cprm, meta->start, meta->dump_size))
goto end_coredump;
@@ -2302,7 +2297,6 @@ static int elf_core_dump(struct coredump_params *cprm)
end_coredump:
free_note_info(&info);
kfree(shdr4extnum);
- kvfree(vma_meta);
kfree(phdr4note);
return has_dumped;
}
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index c6f588dc4a9d..1a25536b0120 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1465,7 +1465,7 @@ static bool elf_fdpic_dump_segments(struct coredump_params *cprm,
static int elf_fdpic_core_dump(struct coredump_params *cprm)
{
int has_dumped = 0;
- int vma_count, segs;
+ int segs;
int i;
struct elfhdr *elf = NULL;
loff_t offset = 0, dataoff;
@@ -1480,8 +1480,6 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)
elf_addr_t e_shoff;
struct core_thread *ct;
struct elf_thread_status *tmp;
- struct core_vma_metadata *vma_meta = NULL;
- size_t vma_data_size;

/* alloc memory for large data structures: too large to be on stack */
elf = kmalloc(sizeof(*elf), GFP_KERNEL);
@@ -1491,9 +1489,6 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)
if (!psinfo)
goto end_coredump;

- if (dump_vma_snapshot(cprm, &vma_count, &vma_meta, &vma_data_size))
- goto end_coredump;
-
for (ct = current->signal->core_state->dumper.next;
ct; ct = ct->next) {
tmp = elf_dump_thread_status(cprm->siginfo->si_signo,
@@ -1513,7 +1508,7 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)
tmp->next = thread_list;
thread_list = tmp;

- segs = vma_count + elf_core_extra_phdrs();
+ segs = cprm->vma_count + elf_core_extra_phdrs();

/* for notes section */
segs++;
@@ -1558,7 +1553,7 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)
/* Page-align dumped data */
dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);

- offset += vma_data_size;
+ offset += cprm->vma_data_size;
offset += elf_core_extra_data_size();
e_shoff = offset;

@@ -1578,8 +1573,8 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)
goto end_coredump;

/* write program headers for segments dump */
- for (i = 0; i < vma_count; i++) {
- struct core_vma_metadata *meta = vma_meta + i;
+ for (i = 0; i < cprm->vma_count; i++) {
+ struct core_vma_metadata *meta = cprm->vma_meta + i;
struct elf_phdr phdr;
size_t sz;

@@ -1628,7 +1623,7 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)

dump_skip_to(cprm, dataoff);

- if (!elf_fdpic_dump_segments(cprm, vma_meta, vma_count))
+ if (!elf_fdpic_dump_segments(cprm, cprm->vma_meta, cprm->vma_count))
goto end_coredump;

if (!elf_core_write_extra_data(cprm))
@@ -1652,7 +1647,6 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)
thread_list = thread_list->next;
kfree(tmp);
}
- kvfree(vma_meta);
kfree(phdr4note);
kfree(elf);
kfree(psinfo);
diff --git a/fs/coredump.c b/fs/coredump.c
index 1c060c0a2d72..def2a0c9cb14 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -53,6 +53,8 @@

#include <trace/events/sched.h>

+static bool dump_vma_snapshot(struct coredump_params *cprm);
+
static int core_uses_pid;
static unsigned int core_pipe_limit;
static char core_pattern[CORENAME_MAX_SIZE] = "core";
@@ -531,6 +533,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
* by any locks.
*/
.mm_flags = mm->flags,
+ .vma_meta = NULL,
};

audit_core_dumps(siginfo->si_signo);
@@ -745,6 +748,9 @@ void do_coredump(const kernel_siginfo_t *siginfo)
pr_info("Core dump to |%s disabled\n", cn.corename);
goto close_fail;
}
+ if (!dump_vma_snapshot(&cprm))
+ goto close_fail;
+
file_start_write(cprm.file);
core_dumped = binfmt->core_dump(&cprm);
/*
@@ -758,6 +764,7 @@ void do_coredump(const kernel_siginfo_t *siginfo)
dump_emit(&cprm, "", 1);
}
file_end_write(cprm.file);
+ kvfree(cprm.vma_meta);
}
if (ispipe && core_pipe_limit)
wait_for_dump_helpers(cprm.file);
@@ -1082,14 +1089,11 @@ static struct vm_area_struct *next_vma(struct vm_area_struct *this_vma,
* Under the mmap_lock, take a snapshot of relevant information about the task's
* VMAs.
*/
-int dump_vma_snapshot(struct coredump_params *cprm, int *vma_count,
- struct core_vma_metadata **vma_meta,
- size_t *vma_data_size_ptr)
+static bool dump_vma_snapshot(struct coredump_params *cprm)
{
struct vm_area_struct *vma, *gate_vma;
struct mm_struct *mm = current->mm;
int i;
- size_t vma_data_size = 0;

/*
* Once the stack expansion code is fixed to not change VMA bounds
@@ -1097,36 +1101,36 @@ int dump_vma_snapshot(struct coredump_params *cprm, int *vma_count,
* mmap_lock in read mode.
*/
if (mmap_write_lock_killable(mm))
- return -EINTR;
+ return false;

+ cprm->vma_data_size = 0;
gate_vma = get_gate_vma(mm);
- *vma_count = mm->map_count + (gate_vma ? 1 : 0);
+ cprm->vma_count = mm->map_count + (gate_vma ? 1 : 0);

- *vma_meta = kvmalloc_array(*vma_count, sizeof(**vma_meta), GFP_KERNEL);
- if (!*vma_meta) {
+ cprm->vma_meta = kvmalloc_array(cprm->vma_count, sizeof(*cprm->vma_meta), GFP_KERNEL);
+ if (!cprm->vma_meta) {
mmap_write_unlock(mm);
- return -ENOMEM;
+ return false;
}

for (i = 0, vma = first_vma(current, gate_vma); vma != NULL;
vma = next_vma(vma, gate_vma), i++) {
- struct core_vma_metadata *m = (*vma_meta) + i;
+ struct core_vma_metadata *m = cprm->vma_meta + i;

m->start = vma->vm_start;
m->end = vma->vm_end;
m->flags = vma->vm_flags;
m->dump_size = vma_dump_size(vma, cprm->mm_flags);

- vma_data_size += m->dump_size;
+ cprm->vma_data_size += m->dump_size;
}

mmap_write_unlock(mm);

- if (WARN_ON(i != *vma_count)) {
- kvfree(*vma_meta);
- return -EFAULT;
+ if (WARN_ON(i != cprm->vma_count)) {
+ kvfree(cprm->vma_meta);
+ return false;
}

- *vma_data_size_ptr = vma_data_size;
- return 0;
+ return true;
}
diff --git a/include/linux/coredump.h b/include/linux/coredump.h
index 2ee1460a1d66..7d05370e555e 100644
--- a/include/linux/coredump.h
+++ b/include/linux/coredump.h
@@ -23,6 +23,9 @@ struct coredump_params {
loff_t written;
loff_t pos;
loff_t to_skip;
+ int vma_count;
+ size_t vma_data_size;
+ struct core_vma_metadata *vma_meta;
};

/*
@@ -35,9 +38,6 @@ extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr);
extern int dump_align(struct coredump_params *cprm, int align);
int dump_user_range(struct coredump_params *cprm, unsigned long start,
unsigned long len);
-int dump_vma_snapshot(struct coredump_params *cprm, int *vma_count,
- struct core_vma_metadata **vma_meta,
- size_t *vma_data_size_ptr);
extern void do_coredump(const kernel_siginfo_t *siginfo);
#else
static inline void do_coredump(const kernel_siginfo_t *siginfo) {}
--
2.29.2

2022-02-01 20:51:39

by Kees Cook

Subject: Re: [PATCH 0/5] Fix fill_files_note

On Mon, Jan 31, 2022 at 12:44:53PM -0600, Eric W. Biederman wrote:
>
> Matthew Wilcox has reported a missing mmap_lock in fill_files_note,
> which could cause trouble.
>
> Refactor the code and clean it up so that the vma snapshot makes
> it to fill_files_note, and then use the vma snapshot in fill_files_note.
>
> Folks please review this as this looks correct to me but I haven't done
> anything beyond compile testing this yet.
>
> Eric W. Biederman (5):
> coredump: Move definition of struct coredump_params into coredump.h
> coredump: Snapshot the vmas in do_coredump
> coredump: Remove the WARN_ON in dump_vma_snapshot
> coredump/elf: Pass coredump_params into fill_note_info
> coredump: Use the vma snapshot in fill_files_note
>
> fs/binfmt_elf.c | 61 ++++++++++++++++++++++--------------------------
> fs/binfmt_elf_fdpic.c | 18 +++++---------
> fs/coredump.c | 55 +++++++++++++++++++++++++++++--------------
> include/linux/binfmts.h | 13 +----------
> include/linux/coredump.h | 20 ++++++++++++----
> 5 files changed, 88 insertions(+), 79 deletions(-)
>
>
> Eric

This looks like a good clean-up to me. For the series:

Reviewed-by: Kees Cook <[email protected]>

--
Kees Cook

2022-02-02 14:02:10

by Jann Horn

Subject: Re: [PATCH 2/5] coredump: Snapshot the vmas in do_coredump

On Mon, Jan 31, 2022 at 7:46 PM Eric W. Biederman <[email protected]> wrote:
> Move the call of dump_vma_snapshot and kvfree(vma_meta) out of the
> individual coredump routines into do_coredump itself. This makes
> the code less error prone and easier to maintain.
>
> Make the vma snapshot available to the coredump routines
> in struct coredump_params. This makes it easier to
> change and update what is captured in the vma snapshot
> and will be needed for fixing fill_files_note.
>
> Signed-off-by: "Eric W. Biederman" <[email protected]>

Reviewed-by: Jann Horn <[email protected]>

> for (i = 0, vma = first_vma(current, gate_vma); vma != NULL;
> vma = next_vma(vma, gate_vma), i++) {
> - struct core_vma_metadata *m = (*vma_meta) + i;
> + struct core_vma_metadata *m = cprm->vma_meta + i;
>
> m->start = vma->vm_start;
> m->end = vma->vm_end;
> m->flags = vma->vm_flags;
> m->dump_size = vma_dump_size(vma, cprm->mm_flags);
>
> - vma_data_size += m->dump_size;
> + cprm->vma_data_size += m->dump_size;

FYI, this part is probably going to cause a merge conflict with the
fix https://www.ozlabs.org/~akpm/mmotm/broken-out/coredump-also-dump-first-pages-of-non-executable-elf-libraries.patch
in akpm's tree. I don't know what the right way to handle that is,
just thought I'd point it out.

2022-02-02 18:48:45

by Eric W. Biederman

Subject: Re: [PATCH 2/5] coredump: Snapshot the vmas in do_coredump

Jann Horn <[email protected]> writes:

> On Mon, Jan 31, 2022 at 7:46 PM Eric W. Biederman <[email protected]> wrote:
>> Move the call of dump_vma_snapshot and kvfree(vma_meta) out of the
>> individual coredump routines into do_coredump itself. This makes
>> the code less error prone and easier to maintain.
>>
>> Make the vma snapshot available to the coredump routines
>> in struct coredump_params. This makes it easier to
>> change and update what is captured in the vma snapshot
>> and will be needed for fixing fill_files_note.
>>
>> Signed-off-by: "Eric W. Biederman" <[email protected]>
>
> Reviewed-by: Jann Horn <[email protected]>
>
>> for (i = 0, vma = first_vma(current, gate_vma); vma != NULL;
>> vma = next_vma(vma, gate_vma), i++) {
>> - struct core_vma_metadata *m = (*vma_meta) + i;
>> + struct core_vma_metadata *m = cprm->vma_meta + i;
>>
>> m->start = vma->vm_start;
>> m->end = vma->vm_end;
>> m->flags = vma->vm_flags;
>> m->dump_size = vma_dump_size(vma, cprm->mm_flags);
>>
>> - vma_data_size += m->dump_size;
>> + cprm->vma_data_size += m->dump_size;
>
> FYI, this part is probably going to cause a merge conflict with the
> fix https://www.ozlabs.org/~akpm/mmotm/broken-out/coredump-also-dump-first-pages-of-non-executable-elf-libraries.patch
> in akpm's tree. I don't know what the right way to handle that is,
> just thought I'd point it out.

There are no conflicts in principle, so we could just let conflict
resolution handle it. Unfortunately both are candidates for backporting.

Either we replace your fix with a simple deletion of the executable
check, or I need to base mine on yours.

Since I need to repost mine anyway I will look at the latter.

Eric


2022-03-09 00:52:51

by Eric W. Biederman

Subject: [GIT PULL] Fix fill_files_note


Kees,

Please pull the coredump-vma-snapshot-fix branch from the git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git coredump-vma-snapshot-fix

HEAD: 390031c942116d4733310f0684beb8db19885fe6 coredump: Use the vma snapshot in fill_files_note

Matthew Wilcox has reported a missing mmap_lock in fill_files_note,
which could cause trouble.

Refactor the code and clean it up so that the vma snapshot makes
it to fill_files_note, and then use the vma snapshot in fill_files_note.

Eric W. Biederman (5):
coredump: Move definition of struct coredump_params into coredump.h
coredump: Snapshot the vmas in do_coredump
coredump: Remove the WARN_ON in dump_vma_snapshot
coredump/elf: Pass coredump_params into fill_note_info
coredump: Use the vma snapshot in fill_files_note

fs/binfmt_elf.c | 66 ++++++++++++++++++++++--------------------------
fs/binfmt_elf_fdpic.c | 18 +++++--------
fs/binfmt_flat.c | 1 +
fs/coredump.c | 59 ++++++++++++++++++++++++++++---------------
include/linux/binfmts.h | 13 +---------
include/linux/coredump.h | 20 ++++++++++++---
6 files changed, 93 insertions(+), 84 deletions(-)

---

Kees, I realized I needed to rebase this on Jann Horn's commit
84158b7f6a06 ("coredump: Also dump first pages of non-executable ELF
libraries"). Unfortunately, before I got that done I got distracted, and
these changes have been sitting in limbo for most of the development
cycle. Since you are running a tree that includes changes like this,
including Jann's, can you please pull them into your tree?

Thank you,
Eric