Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp19097233ybl; Fri, 3 Jan 2020 15:48:10 -0800 (PST) X-Google-Smtp-Source: APXvYqzezrl8jt3LjQ+HSziDm2nTVj4iYRJKJQa8nngmfZSNaVsdQAZWVfF5rToM/A94yYUN6XHo X-Received: by 2002:a9d:5e8c:: with SMTP id f12mr66798601otl.368.1578095290588; Fri, 03 Jan 2020 15:48:10 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1578095290; cv=none; d=google.com; s=arc-20160816; b=A/bTrixUud9X6KEhrsYDIkF26wid67FYPmsQ6pLXuDBdVamLWuYUApTeIgOqlfLkVY poyIordVXfuJAPG+2Q1hbU+4RSKRqGZp0qM0GCNA4ODUeaTzfm4fOGy12+DkZCUmVYco 5nXTClua81HZylms07Xd2u32fkHRSD6TNBHWsRTAXN4dcDu85SVru7wI5EbF4p6fBDtj o3N8gqmRYO9Vi62iL+5KW/T/dq8ya4MRG+Pvyp9OYhRgYwxLaVqfuaKT3+KZqrWSVWcH irysffR99aJftt6Nk04Tyo0jQTG6OtUbgXF7j5Ka8p1eISXxxhYvE3D25v0sQuv+51Bo leIg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=xqpA7+IQg8wmtI5+/njoQrdK03zJ9XpJ8M9GpWAhFPk=; b=lQmpIm0QtcFx07w8/rB/IkqbkcJ/ExLrHL+L6H/gKp7BPGkjw5P8ihZkj//dvJql3w 4POIzdfKUiU9ISWfFkOpRHfLmD8kqfNfE5LHx1O7J9sUvmJjnhryJ8lgH3EGOCsER5uq LzqrqzOjfnwKlSgOB+WIo30jlxMHmXNbrswhHcdf9W83918kEDKbLbtnDZf786i2p5xn 0PwBWgZ7MwQgYo5ESLum9c+dL/SRqZt179d5qC91COklkmj8m825ZFnH55gFrarMwxYB hcXXRN82ZSw0q053NsEAHdV74MUUiG8oGf9H8ElOIbkPVgUEuef7ePPjwogj70wtkhU5 EcOg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=SSe8Boj9; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l2si27227774oti.303.2020.01.03.15.47.57; Fri, 03 Jan 2020 15:48:10 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=SSe8Boj9; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726426AbgACXrQ (ORCPT + 99 others); Fri, 3 Jan 2020 18:47:16 -0500 Received: from mail-wm1-f65.google.com ([209.85.128.65]:53186 "EHLO mail-wm1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726299AbgACXrQ (ORCPT ); Fri, 3 Jan 2020 18:47:16 -0500 Received: by mail-wm1-f65.google.com with SMTP id p9so9689925wmc.2 for ; Fri, 03 Jan 2020 15:47:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=xqpA7+IQg8wmtI5+/njoQrdK03zJ9XpJ8M9GpWAhFPk=; b=SSe8Boj9RRGI3BiJ+6SLOcjxHe8xsZyytqj19N8JTEqg14ts0P2ZcvhEl3Q0GrGUuq XjfL0eFLqlalIt2vXNcK38SVfF7btM6Jx/SumoH3F87p0SQ+WQxMjeIvHW3bEwosjzPm vOYFZR2wIUQKBHPhiHj5k5OvBfvYDDvvxCclo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=xqpA7+IQg8wmtI5+/njoQrdK03zJ9XpJ8M9GpWAhFPk=; b=Zh5oHXnDqxJ1JA1CFMfjJjpzJVxkRPLKTaINXdu/xKN7v4s+6FBo+G6wKOC5MSAChE qP3Sd9COBo3sYfN+r61Lc4eM8s98WR92sC/PkYmd5jzJT53Jqr3rDCkANpB7aXmt52e4 fvtVEOM+OfJqwvd6ir1sFMeHeVZ6GPtxB75eo4x1B6ipg74nlSQ8caURk3+rzc9bmKI4 bJ9UiI1t9xdBTnR7jUzx2SvMyn5SFUp66JYKw5Pk6GVzBrgGh/26IXr5j/aOopsPK6zV cMMcC9WNdXjTPLiKq/yvyzS2fNEquEIllzJSmEzJaW7FFY/BsgguVyjp2MPZ06ksoKkE Fd5A== X-Gm-Message-State: APjAAAW3lzZQELVwzMDrgDZn5Kx8gAOQwnBZBZGuVJsYjHkuwUqNsdEF 5ZHoLyCzd35jrXySx7iZzkofvPpx9BA= X-Received: by 2002:a05:600c:2c53:: with SMTP id r19mr21764423wmg.39.1578095233571; Fri, 03 Jan 2020 15:47:13 -0800 (PST) Received: from kpsingh-kernel.localdomain (77-56-209-237.dclient.hispeed.ch. [77.56.209.237]) by smtp.gmail.com with ESMTPSA id e6sm60931506wru.44.2020.01.03.15.47.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 03 Jan 2020 15:47:13 -0800 (PST) From: KP Singh To: linux-kernel@vger.kernel.org, bpf@vger.kernel.org, x86@kernel.org, linux-security-module@vger.kernel.org Cc: Kees Cook , "David S. Miller" , Alexey Kuznetsov , Hideaki YOSHIFUJI , Alexei Starovoitov , Daniel Borkmann , Martin KaFai Lau , Song Liu , Yonghong Song , Andrii Nakryiko , Thomas Garnier , Florent Revest , Brendan Jackman , Jann Horn , Matthew Garrett , Michael Halcrow Subject: [PATCH bpf-next] bpf: Make trampolines W^X Date: Sat, 4 Jan 2020 00:47:25 +0100 Message-Id: <20200103234725.22846-1-kpsingh@chromium.org> X-Mailer: git-send-email 2.20.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: KP Singh The image for the BPF trampolines is allocated with bpf_jit_alloc_exe_page which marks this allocated page executable. This means that the allocated memory is W and X at the same time making it susceptible to WX based attacks. Since the allocated memory is shared between two trampolines (the current and the next), 2 pages must be allocated to adhere to W^X and the following sequence is obeyed where trampolines are modified: - Mark memory as non executable (set_memory_nx). While module_alloc for x86 allocates the memory as PAGE_KERNEL and not PAGE_KERNEL_EXEC, not all implementations of module_alloc do so - Mark the memory as read/write (set_memory_rw) - Modify the trampoline - Mark the memory as read-only (set_memory_ro) - Mark the memory as executable (set_memory_x) There's a window between the modification of the trampoline and setting the trampoline as read-only where an attacker and ~could~ change the contents of the memory. This can be further improved by adding more verification after the memory is marked as read-only. Reported-by: Kees Cook Signed-off-by: KP Singh --- arch/x86/net/bpf_jit_comp.c | 2 +- include/linux/bpf.h | 1 - kernel/bpf/dispatcher.c | 30 +++++++++++++++++++++++---- kernel/bpf/trampoline.c | 41 +++++++++++++++++++------------------ 4 files changed, 48 insertions(+), 26 deletions(-) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 4c8a2d1f8470..a510f8260322 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -1527,7 +1527,7 @@ int arch_prepare_bpf_trampoline(void *image, struct btf_func_model *m, u32 flags * Another half is an area for next trampoline. * Make sure the trampoline generation logic doesn't overflow. */ - if (WARN_ON_ONCE(prog - (u8 *)image > PAGE_SIZE / 2 - BPF_INSN_SAFETY)) + if (WARN_ON_ONCE(prog - (u8 *)image > PAGE_SIZE - BPF_INSN_SAFETY)) return -EFAULT; return 0; } diff --git a/include/linux/bpf.h b/include/linux/bpf.h index b14e51d56a82..3be8b1b0166d 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -502,7 +502,6 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key); int bpf_trampoline_link_prog(struct bpf_prog *prog); int bpf_trampoline_unlink_prog(struct bpf_prog *prog); void bpf_trampoline_put(struct bpf_trampoline *tr); -void *bpf_jit_alloc_exec_page(void); #define BPF_DISPATCHER_INIT(name) { \ .mutex = __MUTEX_INITIALIZER(name.mutex), \ .func = &name##func, \ diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c index 204ee61a3904..f4589da3bb34 100644 --- a/kernel/bpf/dispatcher.c +++ b/kernel/bpf/dispatcher.c @@ -93,13 +93,34 @@ int __weak arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs) static int bpf_dispatcher_prepare(struct bpf_dispatcher *d, void *image) { s64 ips[BPF_DISPATCHER_MAX] = {}, *ipsp = &ips[0]; - int i; + int i, ret; for (i = 0; i < BPF_DISPATCHER_MAX; i++) { if (d->progs[i].prog) *ipsp++ = (s64)(uintptr_t)d->progs[i].prog->bpf_func; } - return arch_prepare_bpf_dispatcher(image, &ips[0], d->num_progs); + + /* First make the page non-executable and then make it RW to avoid it + * from being W+X. While x86's implementation of module_alloc + * allocates memory as non-executable, not all implementations do so. + * Till these are fixed, explicitly mark the memory as NX. + */ + set_memory_nx((unsigned long)image, 1); + set_memory_rw((unsigned long)image, 1); + + ret = arch_prepare_bpf_dispatcher(image, &ips[0], d->num_progs); + if (ret) + return ret; + + /* First make the page read-only, and only then make it executable to + * prevent it from being W+X in between. + */ + set_memory_ro((unsigned long)image, 1); + /* More checks can be done here to ensure that nothing was changed + * between arch_prepare_bpf_dispatcher and set_memory_ro. + */ + set_memory_x((unsigned long)image, 1); + return 0; } static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) @@ -113,7 +134,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs) noff = 0; } else { old = d->image + d->image_off; - noff = d->image_off ^ (PAGE_SIZE / 2); + noff = d->image_off ^ PAGE_SIZE; } new = d->num_progs ? d->image + noff : NULL; @@ -140,10 +161,11 @@ void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from, mutex_lock(&d->mutex); if (!d->image) { - d->image = bpf_jit_alloc_exec_page(); + d->image = bpf_jit_alloc_exec(2 * PAGE_SIZE); if (!d->image) goto out; } + set_vm_flush_reset_perms(d->image); prev_num_progs = d->num_progs; changed |= bpf_dispatcher_remove_prog(d, from); diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index 505f4e4b31d2..ff6a92ef8945 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -14,22 +14,6 @@ static struct hlist_head trampoline_table[TRAMPOLINE_TABLE_SIZE]; /* serializes access to trampoline_table */ static DEFINE_MUTEX(trampoline_mutex); -void *bpf_jit_alloc_exec_page(void) -{ - void *image; - - image = bpf_jit_alloc_exec(PAGE_SIZE); - if (!image) - return NULL; - - set_vm_flush_reset_perms(image); - /* Keep image as writeable. The alternative is to keep flipping ro/rw - * everytime new program is attached or detached. - */ - set_memory_x((long)image, 1); - return image; -} - struct bpf_trampoline *bpf_trampoline_lookup(u64 key) { struct bpf_trampoline *tr; @@ -50,12 +34,13 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key) goto out; /* is_root was checked earlier. No need for bpf_jit_charge_modmem() */ - image = bpf_jit_alloc_exec_page(); + image = bpf_jit_alloc_exec(2 * PAGE_SIZE); if (!image) { kfree(tr); tr = NULL; goto out; } + set_vm_flush_reset_perms(image); tr->key = key; INIT_HLIST_NODE(&tr->hlist); @@ -125,14 +110,14 @@ static int register_fentry(struct bpf_trampoline *tr, void *new_addr) } /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50 - * bytes on x86. Pick a number to fit into PAGE_SIZE / 2 + * bytes on x86. Pick a number to fit into PAGE_SIZE. */ #define BPF_MAX_TRAMP_PROGS 40 static int bpf_trampoline_update(struct bpf_trampoline *tr) { - void *old_image = tr->image + ((tr->selector + 1) & 1) * PAGE_SIZE/2; - void *new_image = tr->image + (tr->selector & 1) * PAGE_SIZE/2; + void *old_image = tr->image + ((tr->selector + 1) & 1) * PAGE_SIZE; + void *new_image = tr->image + (tr->selector & 1) * PAGE_SIZE; struct bpf_prog *progs_to_run[BPF_MAX_TRAMP_PROGS]; int fentry_cnt = tr->progs_cnt[BPF_TRAMP_FENTRY]; int fexit_cnt = tr->progs_cnt[BPF_TRAMP_FEXIT]; @@ -160,6 +145,13 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr) if (fexit_cnt) flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME; + + /* First make the page non-executable and then make it RW to avoid it + * from being being W+X. + */ + set_memory_nx((unsigned long)new_image, 1); + set_memory_rw((unsigned long)new_image, 1); + err = arch_prepare_bpf_trampoline(new_image, &tr->func.model, flags, fentry, fentry_cnt, fexit, fexit_cnt, @@ -167,6 +159,15 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr) if (err) goto out; + /* First make the page read-only, and only then make it executable to + * prevent it from being W+X in between. + */ + set_memory_ro((unsigned long)new_image, 1); + /* More checks can be done here to ensure that nothing was changed + * between arch_prepare_bpf_trampoline and set_memory_ro. + */ + set_memory_x((unsigned long)new_image, 1); + if (tr->selector) /* progs already running at this address */ err = modify_fentry(tr, old_image, new_image); -- 2.20.1