Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751540AbdGQWv0 (ORCPT ); Mon, 17 Jul 2017 18:51:26 -0400 Received: from mail-it0-f53.google.com ([209.85.214.53]:35534 "EHLO mail-it0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751358AbdGQWvY (ORCPT ); Mon, 17 Jul 2017 18:51:24 -0400 MIME-Version: 1.0 In-Reply-To: <34b776c32a3f75ff155bc85d3c1554e21be7bab2.1500330091.git.luto@kernel.org> References: <34b776c32a3f75ff155bc85d3c1554e21be7bab2.1500330091.git.luto@kernel.org> From: Kees Cook Date: Mon, 17 Jul 2017 15:51:23 -0700 X-Google-Sender-Auth: 870-rx7_QB9IIIJd0buDfa-2viw Message-ID: Subject: Re: [PATCH] exec: Check stack space more strictly To: Andy Lutomirski Cc: "security@kernel.org" , Andrew Morton , LKML Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7487 Lines: 193 On Mon, Jul 17, 2017 at 3:22 PM, Andy Lutomirski wrote: > We can currently blow past the stack rlimit and cause odd behavior > if there are accounting bugs, rounding issues, or races. It's not > clear that the odd behavior is actually a problem, but it's nicer to > fail the exec instead of getting out of sync with stack limits. > > Improve the code to more carefully check for space and to abort if > our stack mm gets too large in setup_arg_pages(). > > Signed-off-by: Andy Lutomirski This is a nice check to have in case binfmts do insane things to the vma before setup_arg_pages(). However, we've still got create_elf_tables() which is a fixed size _except_ that it adds all the argv/envp pointers. Those are being accounted for in the get_arg_page() calls from copy_strings(), though. Does anything use stack between copy_strings() and create_elf_tables()? > --- > fs/exec.c | 44 ++++++++++++++++++++++++++++++++++---------- > 1 file changed, 34 insertions(+), 10 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index 62175cbcc801..0c60c0495269 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -764,23 +764,47 @@ int setup_arg_pages(struct linux_binprm *bprm, > /* mprotect_fixup is overkill to remove the temporary stack flags */ > vma->vm_flags &= ~VM_STACK_INCOMPLETE_SETUP; > > - stack_expand = 131072UL; /* randomly 32*4k (or 2*64k) pages */ > - stack_size = vma->vm_end - vma->vm_start; > /* > * Align this down to a page boundary as expand_stack > * will align it up. > */ > rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK; > + stack_size = vma->vm_end - vma->vm_start; > + > + if (stack_size > rlim_stack) { > + /* > + * If we've already used too much space (due to accounting > + * bugs, alignment, races, or any other cause), bail. > + */ > + ret = -ENOMEM; > + goto out_unlock; > + } > + > + /* > + * stack_expand is the amount of space beyond the space already used > + * that we're going to pre-allocate in our stack. For historical > + * reasons, it's 128kB, unless we have less space than that available > + * in our rlimit. > + * > + * This particular historical wart is wrong-headed, though, since > + * we haven't finished binfmt-specific setup, and the binfmt code > + * is going to eat up some or all of this space. > + */ > + stack_expand = min(rlim_stack - stack_size, 131072UL); Yeah, this needs to account for argv/envp pointers, IMO. But it's still a magic number not tied to the actual results of create_elf_tables() (which is effectively fixed in stack usage except for argv/envp pointers). I think create_elf_tables() needs checking added, or the magic number should be tied to the size of the AUX strings (platform, base_platform, rand_bytes), and elf_info words. Hmm, does anything use the stack _after_ create_elf_tables()? Could we move create_elf_tables() right after setup_arg_pages() and then perform the check? > + > #ifdef CONFIG_STACK_GROWSUP > - if (stack_size + stack_expand > rlim_stack) > - stack_base = vma->vm_start + rlim_stack; > - else > - stack_base = vma->vm_end + stack_expand; > + if (TASK_SIZE_MAX - vma->vm_end < stack_expand) { > + ret = -ENOMEM; > + goto out_unlock; > + } > + stack_base = vma->vm_end + stack_expand; > #else > - if (stack_size + stack_expand > rlim_stack) > - stack_base = vma->vm_end - rlim_stack; > - else > - stack_base = vma->vm_start - stack_expand; > + if (vma->vm_start < mmap_min_addr || > + vma->vm_start - mmap_min_addr < stack_expand) { > + ret = -ENOMEM; > + goto out_unlock; > + } > + stack_base = vma->vm_start - stack_expand; > #endif > current->mm->start_stack = bprm->p; > ret = expand_stack(vma, stack_base); > -- > 2.9.4 > FWIW, here are my current notes on the whole existing flow (though note my other series which is trying to move the secureexec callback earlier): do_execvat_common: check RLIMIT_NPROC unshare_files() (restored on failure) allocate bprm prepare_bprm_creds() (allocate bprm->cred) mutex_lock_interruptible(¤t->signal->cred_guard_mutex) prepare_exec_creds() prepare_creds() memcpy(new, old, ...) security_prepare_creds() (unimplemented by commoncap) check_unsafe_exec() do_open_execat() sched_exec() if closed fd, set BINPRM_FLAGS_PATH_INACCESSIBLE bprm_mm_init() (allocate bprm->mm) prepare_binprm() bprm_fill_uid() (fill bprm->cred) resets euid/egid to current euid/egid sets euid/egid on bprm based on set*id security_bprm_set_creds() (handles all euid etc logic) kernel_read(bprm->file, ...) copy_strings() (argv) get_arg_page() (could fail with E2BIG for args size) copy_strings() (envp) get_arg_page() (could fail with E2BIG for args size) would_dump() (examine inode perms for MAY_READ) maybe set BINPRM_FLAGS_ENFORCE_NONDUMP exec_binprm() search_binary_handler() (may go recursive for script and misc) security_bprm_check() for each format load_binary() (e.g. load_elf_binary) load_elf_phdrs() kernel_read(bprm->file, ..., elf_interpreter) would_dump() (interpreter) maybe set BINPRM_FLAGS_ENFORCE_NONDUMP PT_GNU_STACK arch_check_elf() flush_old_exec() de_thread() exec_mmap() current->mm = bprm->mm bprm->mm = NULL; (failures after here will SEGV current) flush_thread() current->personality setup_new_exec() arch_pick_mmap_layout() (uses current RLIMIT_STACK) dumpable (using current uid/euid?!) arch_setup_new_exec() perf_event_exec() bprm->cred uid/euid pdeath_signal = 0 dumpable (based on BINPRM_FLAGS_ENFORCE_NONDUMP) current->self_exec_id++ flush_signal_handlers() install_exec_creds() security_bprm_committing_creds() commit_creds() bprm->cred = NULL if !dumpable perf_event_exit_task() security_bprm_committed_creds() mutex_unlock(¤t->signal->cred_guard_mutex) setup_arg_pages() uses RLIMIT_STACK load ELF image (ET_EXEC vs ET_DYN, load_bias, etc) set_brk() load_elf_interp() (if elf_interpreter) create_elf_tables() (eats fixed stack except argv/envp ptrs) arch_randomize_brk() start_thread() (everything is up and running now) if (retval < 0 && !bprm->mm) SIGSEGV (can we send signals to my setuid parent?) audit_bprm() trace_sched_process_exec() ptrace_event(PTRACE_EVENT_EXEC) proc_exec_connector() clean up -Kees -- Kees Cook Pixel Security