Message-ID: <124c1c2805286c70a9b2cc8e4b0abad7ef997ed4.camel@intel.com>
Subject: Re: [RFC PATCH v4 21/27] x86/cet/shstk: ELF header parsing of Shadow Stack
From: Yu-cheng Yu
To: Kees Cook, Andy Lutomirski
Cc: X86 ML, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar, LKML,
 "open list:DOCUMENTATION", Linux-MM, linux-arch, Linux API, Arnd Bergmann,
 Balbir Singh, Cyrill Gorcunov, Dave Hansen, Florian Weimer, "H.J. Lu",
 Jann Horn, Jonathan Corbet, Mike Kravetz, Nadav Amit, Oleg Nesterov,
 Pavel Machek, Peter Zijlstra, Randy Dunlap, "Ravi V.
 Shankar", Vedvyas Shanbhogue
Date: Tue, 16 Oct 2018 10:23:43 -0700
References: <20180921150351.20898-1-yu-cheng.yu@intel.com> <20180921150351.20898-22-yu-cheng.yu@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2018-10-15 at 16:40 -0700, Kees Cook wrote:
> On Fri, Sep 21, 2018 at 8:03 AM, Yu-cheng Yu wrote:
> > Look in .note.gnu.property of an ELF file and check if Shadow Stack needs
> > to be enabled for the task.

[...]

> > +/*
> > + * The .note.gnu.property layout:
> > + *
> > + *	struct elf_note {
> > + *		u32 n_namesz; --> sizeof(n_name[]); always (4)
> > + *		u32 n_descsz; --> sizeof(property[])
> > + *		u32 n_type;   --> always NT_GNU_PROPERTY_TYPE_0
> > + *	};
> > + *	char n_name[4]; --> always 'GNU\0'
> > + *
> > + *	struct {
> > + *		struct property_x86 {
> > + *			u32 pr_type;
> > + *			u32 pr_datasz;
> > + *		};
> > + *		u8 pr_data[pr_datasz];
> > + *	}[];
> > + */
>
> Does NT_GNU_PROPERTY_TYPE_0 only ever contain property_x86 bytes? (I
> assume not, since there is a pr_type?)

There are other property types, but we only look for NT_GNU_PROPERTY_TYPE_0.

> > +
> > +#define BUF_SIZE (PAGE_SIZE / 4)
> > +
> > +struct property_x86 {
> > +	u32 pr_type;
> > +	u32 pr_datasz;
> > +};
> > +
> > +typedef bool (test_fn)(void *buf, u32 *arg);
> > +typedef void *(next_fn)(void *buf, u32 *arg);
> > +
> > +static inline bool test_note_type_0(void *buf, u32 *arg)
> > +{
> > +	struct elf_note *n = buf;
> > +
> > +	return ((n->n_namesz == 4) && (memcmp(n + 1, "GNU", 4) == 0) &&
> > +		(n->n_type == NT_GNU_PROPERTY_TYPE_0));
>
> Cheaper to test n_type first...

Yes, Thanks!

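For reference, the reordering suggested above could look roughly like the following. This is only a userspace sketch, not the patch's code: `struct elf_note` and `NT_GNU_PROPERTY_TYPE_0` are redefined locally as stand-ins for the kernel definitions, and the function name `is_gnu_property_note` and its `len` parameter are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins for the kernel definitions, for illustration only. */
#define NT_GNU_PROPERTY_TYPE_0 5

struct elf_note {
	uint32_t n_namesz;
	uint32_t n_descsz;
	uint32_t n_type;
};

/*
 * Return true if buf (len bytes long) starts with an
 * NT_GNU_PROPERTY_TYPE_0 note whose name is "GNU\0".
 */
static bool is_gnu_property_note(const void *buf, size_t len)
{
	struct elf_note n;

	/* The header plus the 4-byte name must fit in the buffer. */
	if (len < sizeof(n) + 4)
		return false;
	memcpy(&n, buf, sizeof(n));	/* copy out to avoid unaligned access */

	/* Integer compares first; the memcmp() runs only if both match. */
	return n.n_type == NT_GNU_PROPERTY_TYPE_0 &&
	       n.n_namesz == 4 &&
	       memcmp((const char *)buf + sizeof(n), "GNU", 4) == 0;
}
```

Unlike test_note_type_0() in the patch, this variant takes the buffer length, so the size check happens in the same place as the field tests.
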
>
> > +}
> > +
> > +static inline void *next_note(void *buf, u32 *arg)
> > +{
> > +	struct elf_note *n = buf;
> > +	u32 align = *arg;
> > +	int size;
> > +
> > +	size = round_up(sizeof(*n) + n->n_namesz, align);
>
> I think this could overflow: n_namesz can be u64 for elf64_note.
>
> > +	size = round_up(size + n->n_descsz, align);
>
> Same here. You may want to use check_add_overflow(), etc, and u64 types.

Note->n_namesz is always four-byte. I should have used u32.

>
> > +
> > +	if (buf + size < buf)
> > +		return NULL;
>
> I don't understand this. You want to check size not exceeding the
> allocation, which isn't passed into this function. Checking for a full
> unsigned address wrap around is not sufficient to detect overflow.

Here we only detect the wrap around. After this returns we then check other
types of overflow in scan().

>
> > +	else
> > +		return (buf + size);
> > +}
> > +
> > +static inline bool test_property_x86(void *buf, u32 *arg)
> > +{
> > +	struct property_x86 *pr = buf;
> > +	u32 max_type = *arg;
> > +
> > +	if (pr->pr_type > max_type)
> > +		*arg = pr->pr_type;
>
> Why is *arg being updated? I don't see last_pr used outside of here --
> are properties required to be pr_type-ordered?

Yes, they need to be in ascending order.

>
> > +
> > +	return (pr->pr_type == GNU_PROPERTY_X86_FEATURE_1_AND);
> > +}
> > +
> > +static inline void *next_property(void *buf, u32 *arg)
> > +{
> > +	struct property_x86 *pr = buf;
> > +	u32 max_type = *arg;
> > +
> > +	if ((buf + sizeof(*pr) + pr->pr_datasz < buf) ||
>
> Again, this "< buf" test doesn't look at all correct to me.
>
> > +	    (pr->pr_type > GNU_PROPERTY_X86_FEATURE_1_AND) ||
> > +	    (pr->pr_type > max_type))
> > +		return NULL;
> > +	else
> > +		return (buf + sizeof(*pr) + pr->pr_datasz);
> > +}
> > +
> > +/*
> > + * Scan 'buf' for a pattern; return true if found.
> > + * *pos is the distance from the beginning of buf to where
> > + * the searched item or the next item is located.
> > + */
> > +static int scan(u8 *buf, u32 buf_size, int item_size,
> > +		 test_fn test, next_fn next, u32 *arg, u32 *pos)
>
> I'm not a fan of the short "scan", "test" and "next" names, and I
> really don't like an arg named "arg". Something slightly more
> descriptive for all of these would be nice, please.

I need to work on that :-) What would you suggest?

>
> > +{
> > +	int found = 0;
> > +	u8 *p, *max;
> > +
> > +	max = buf + buf_size;
> > +	if (max < buf)
> > +		return 0;
> > +
> > +	p = buf;
> > +
> > +	while ((p + item_size < max) && (p + item_size > buf)) {
>
> These comparisons are safe due to the BUF_SIZE limit of buf_size and
> the only used size of item_size, but if this becomes more generic, it
> should be more defensive on the size calculations (e.g. make sure that
> "item_size < max" and then here "p < max - item_size", etc).
>
> I'd kind of rather this code walked the base type and check each for
> the matching feature. What is the general specification for what
> NT_GNU_PROPERTY_TYPE_0 contains?

There are other property types, but the kernel does not look at most of them.
If the kernel needs to look at others, we need to rewrite this.

[...]

> > +
> > +/*
> > + * Search a PT_NOTE segment for the first NT_GNU_PROPERTY_TYPE_0.
> > + */
> > +static int find_note_type_0(struct file *file, unsigned long note_size,
> > +			    loff_t file_offset, u32 align, u32 *feature)
> > +{
> > +	u8 *buf;
> > +	u32 buf_pos;
> > +	unsigned long read_size;
> > +	unsigned long done;
> > +	int found = 0;
> > +	int ret = 0;
> > +
> > +	buf = kmalloc(BUF_SIZE, GFP_KERNEL);
> > +	if (!buf)
> > +		return -ENOMEM;
>
> Why kmalloc over stack variable? (Or, does BUF_SIZE here really need
> to be 1024?)

BUF_SIZE can be smaller, for example 64. If it is too small, we need to do
kernel_read() too often.

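To make the overflow discussion above concrete, here is a rough userspace sketch of a note step that compares sizes against the remaining buffer length instead of testing for pointer wrap-around. It is not the patch's code. `__builtin_add_overflow()` is the GCC/Clang builtin, used here as a stand-in for the kernel's check_add_overflow(); the function names, the `len`/`pos` parameters, and the local `struct elf_note` are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for the kernel's struct elf_note. */
struct elf_note {
	uint32_t n_namesz;
	uint32_t n_descsz;
	uint32_t n_type;
};

/* Round v up to a multiple of align (a power of two); 0 on overflow. */
static int round_up_checked(size_t v, size_t align, size_t *out)
{
	if (__builtin_add_overflow(v, align - 1, out))
		return 0;
	*out &= ~(align - 1);
	return 1;
}

/*
 * Given a note at offset pos in buf[0..len), store the offset of the
 * following note in *next. Returns 0 if the note does not fit in buf.
 */
static int next_note_offset(const uint8_t *buf, size_t len, size_t pos,
			    size_t align, size_t *next)
{
	struct elf_note n;
	size_t size;

	/* Compare against the remaining length; never form buf + size. */
	if (pos > len || len - pos < sizeof(n))
		return 0;
	memcpy(&n, buf + pos, sizeof(n));

	if (__builtin_add_overflow(sizeof(n), (size_t)n.n_namesz, &size) ||
	    !round_up_checked(size, align, &size) ||
	    __builtin_add_overflow(size, (size_t)n.n_descsz, &size) ||
	    !round_up_checked(size, align, &size))
		return 0;

	if (size > len - pos)
		return 0;

	*next = pos + size;
	return 1;
}
```

Passing the buffer length into the step function keeps the bounds check next to the arithmetic it guards, which is the "p < max - item_size" style of comparison suggested above.
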
>
> > +
> > +	*feature = 0;
> > +	buf_pos = 0;
> > +
> > +	for (done = 0; done < note_size; done += buf_pos) {
> > +		read_size = note_size - done;
> > +		if (read_size > BUF_SIZE)
> > +			read_size = BUF_SIZE;
> > +
> > +		ret = kernel_read(file, buf, read_size, &file_offset);
> > +
> > +		if (ret != read_size) {
> > +			ret = (ret < 0) ? ret : -EIO;
> > +			kfree(buf);
> > +			return ret;
> > +		}
> > +
> > +		/*
> > +		 * item_size = sizeof(struct elf_note) + elf_note.n_namesz.
> > +		 * n_namesz is 4 for the note type we look for.
> > +		 */
> > +		ret = 0;
> > +		found += scan(buf, read_size, sizeof(struct elf_note) + 4,
> > +			      test_note_type_0, next_note,
> > +			      &align, &buf_pos);
> > +
> > +		file_offset += buf_pos - read_size;
> > +
> > +		if (found == 1) {
> > +			struct elf_note *n =
> > +				(struct elf_note *)(buf + buf_pos);
> > +			u32 start = round_up(sizeof(*n) + n->n_namesz, align);
> > +			u32 total = round_up(start + n->n_descsz, align);
>
> Same overflow notes from earlier...
>
> > +
> > +			ret = find_feature_x86(file, n->n_descsz,
> > +					       file_offset + start,
> > +					       buf, feature);
> > +			file_offset += total;
> > +			buf_pos += total;
> > +		} else if (!buf_pos) {
> > +			*feature = 0;
> > +			break;
> > +		}
> > +	}
> > +
> > +	kfree(buf);
> > +	return ret;
> > +}
> > +
> > +#ifdef CONFIG_COMPAT
> > +static int check_notes_32(struct file *file, struct elf32_phdr *phdr,
> > +			  int phnum, u32 *feature)
> > +{
> > +	int i;
> > +	int err = 0;
> > +
> > +	for (i = 0; i < phnum; i++, phdr++) {
> > +		if ((phdr->p_type != PT_NOTE) || (phdr->p_align != 4))
> > +			continue;
> > +
> > +		err = find_note_type_0(file, phdr->p_filesz, phdr->p_offset,
> > +				       phdr->p_align, feature);
> > +		if (err)
> > +			return err;
> > +	}
> > +
> > +	return 0;
> > +}
> > +#endif
> > +
> > +#ifdef CONFIG_X86_64
> > +static int check_notes_64(struct file *file, struct elf64_phdr *phdr,
> > +			  int phnum, u32 *feature)
> > +{
> > +	int i;
> > +	int err = 0;
> > +
> > +	for (i = 0; i < phnum; i++, phdr++) {
> > +		if ((phdr->p_type != PT_NOTE) || (phdr->p_align != 8))
> > +			continue;
>
> Instead of a separate parser here, wouldn't it be a bit nicer to
> attach this to the existing binfmt_elf program header parsing loop:

We need to wait until SET_PERSONALITY2() is done.

[...]

> > +int arch_setup_features(void *ehdr_p, void *phdr_p,
> > +			 struct file *file, bool interp)
> > +{
> > +	int err = 0;
> > +	u32 feature = 0;
> > +
> > +	struct elf64_hdr *ehdr64 = ehdr_p;
> > +
> > +	if (!cpu_feature_enabled(X86_FEATURE_SHSTK))
> > +		return 0;
> > +
> > +	if (ehdr64->e_ident[EI_CLASS] == ELFCLASS64) {
> > +		struct elf64_phdr *phdr64 = phdr_p;
> > +
> > +		err = check_notes_64(file, phdr64, ehdr64->e_phnum,
> > +				     &feature);
> > +		if (err < 0)
> > +			goto out;
> > +	} else {
> > +#ifdef CONFIG_COMPAT
> > +		struct elf32_hdr *ehdr32 = ehdr_p;
> > +
> > +		if (ehdr32->e_ident[EI_CLASS] == ELFCLASS32) {
> > +			struct elf32_phdr *phdr32 = phdr_p;
> > +
> > +			err = check_notes_32(file, phdr32, ehdr32->e_phnum,
> > +					     &feature);
> > +			if (err < 0)
> > +				goto out;
> > +		}
> > +#endif
>
> Should there be an #else error here?

Yes, thanks.

> I'd like to be using this code for a few other cases too (not just
> x86-specific). For example, for marking KASan binaries as needing a
> "legacy" memory layouts[1]. Others might be setting things like
> no_new_privs at exec time, etc.

If the item is a bit of GNU_PROPERTY_X86_FEATURE_1_AND, then this code would
work. Has it been finalized?

Yu-cheng
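
As an illustration of walking the property array and testing each entry, which is the approach asked about earlier in this thread, a userspace sketch might look like the following. It is not the patch's code: the struct name `gnu_property`, the function `find_x86_feature_1`, and its parameters are hypothetical, the constants are redefined locally, and the ascending-order check discussed above is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the ABI constants, for illustration only. */
#define GNU_PROPERTY_X86_FEATURE_1_AND   0xc0000002u
#define GNU_PROPERTY_X86_FEATURE_1_SHSTK (1u << 1)

struct gnu_property {
	uint32_t pr_type;
	uint32_t pr_datasz;
};

/*
 * Walk the property array in desc[0..len) and return the
 * GNU_PROPERTY_X86_FEATURE_1_AND word, or 0 if absent or malformed.
 * align is 8 for ELFCLASS64 and 4 for ELFCLASS32.
 */
static uint32_t find_x86_feature_1(const uint8_t *desc, size_t len,
				   size_t align)
{
	size_t pos = 0;

	while (len - pos >= sizeof(struct gnu_property)) {
		struct gnu_property pr;
		size_t padded;
		uint32_t val;

		memcpy(&pr, desc + pos, sizeof(pr));
		pos += sizeof(pr);

		if (pr.pr_datasz > len - pos)
			return 0;	/* data runs past the descriptor */

		if (pr.pr_type == GNU_PROPERTY_X86_FEATURE_1_AND) {
			if (pr.pr_datasz != sizeof(val))
				return 0;
			memcpy(&val, desc + pos, sizeof(val));
			return val;
		}

		/* Skip pr_data plus its padding; bail out if it does not fit. */
		padded = ((size_t)pr.pr_datasz + align - 1) & ~(align - 1);
		if (padded > len - pos)
			return 0;
		pos += padded;
	}
	return 0;
}
```

A caller would then test individual bits of the returned word, for example `find_x86_feature_1(desc, descsz, 8) & GNU_PROPERTY_X86_FEATURE_1_SHSTK` for a 64-bit binary.
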