From: Rick Edgecombe
To: kernel-hardening@lists.openwall.com, daniel@iogearbox.net,
 keescook@chromium.org, catalin.marinas@arm.com, will.deacon@arm.com,
 davem@davemloft.net, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 x86@kernel.org, arnd@arndb.de, jeyu@kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mips@linux-mips.org, linux-s390@vger.kernel.org,
 sparclinux@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-arch@vger.kernel.org, jannh@google.com
Cc: kristen@linux.intel.com, dave.hansen@intel.com,
 arjan@linux.intel.com, deneen.t.dock@intel.com, Rick Edgecombe
Subject: [PATCH
RFC v3 0/3] Rlimit for module space
Date: Fri, 19 Oct 2018 13:47:20 -0700
Message-Id: <20181019204723.3903-1-rick.p.edgecombe@intel.com>

If BPF JIT is on, there is no effective limit to prevent filling the
entire module space with JITed e/BPF filters. For classic BPF filters
attached with setsockopt SO_ATTACH_FILTER, there is no memlock rlimit
check to limit the number of insertions like there is for the bpf
syscall.

This patch adds a per-user rlimit for module space, as well as a
system-wide limit for BPF JIT. In a previously reviewed patchset, Jann
Horn pointed out the problem that in some cases a user can get access to
65536 UIDs, so the effective limit cannot be set low enough to stop an
attacker and still be useful for the general case. A discussed
alternative solution was a system-wide limit for BPF JIT filters. This
resolves the problem of exhaustion and de-randomizing much more simply
in the case of non-CONFIG_BPF_JIT_ALWAYS_ON. If CONFIG_BPF_JIT_ALWAYS_ON
is on, however, BPF insertions will fail if another user exhausts the
BPF JIT limit. In this case a per-user limit is still needed. If the
subuid facility is disabled for normal users, this should still be OK
because the higher limit cannot be worked around that way.

The new BPF JIT limit can be set like this:

    echo 5000000 > /proc/sys/net/core/bpf_jit_limit

So I *think* this patchset should resolve that issue except for the
configuration of CONFIG_BPF_JIT_ALWAYS_ON and subuid allowed for normal
users. Better module space KASLR is another way to resolve the
de-randomizing issue, and then you would just be left with the BPF DoS
in that configuration.

Jann also pointed out how, by purposely fragmenting the module space,
you could make the effective module space blockage area much larger.
This is also somewhat unresolved.
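
As a rough illustration of the fragmentation concern, the sketch below
estimates how much address space a fixed allocation budget could make
unusable for a request of a given size. It is a simplified model and is
not taken from the patches: the helper name blocked_span_bytes, the
4096-byte page size, and the assumption that an attacker can place
single-page allocations so that every free gap is one page too small for
the victim are all hypothetical. Guard pages and alignment are ignored.

```c
#include <stdint.h>

/* Assumed page size for this sketch; real systems may differ. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Hypothetical helper, not part of the patchset.
 *
 * Estimates how many bytes of address space an attacker holding
 * budget_bytes of single-page allocations could make unusable for a
 * request of victim_bytes, assuming ideal placement: every free gap
 * is one page too small to hold the victim allocation.
 */
static uint64_t blocked_span_bytes(uint64_t budget_bytes,
                                   uint64_t victim_bytes,
                                   uint64_t space_bytes)
{
    uint64_t pages = budget_bytes / SKETCH_PAGE_SIZE;
    /* Round the victim request up to whole pages. */
    uint64_t victim_pages = victim_bytes / SKETCH_PAGE_SIZE
                          + (victim_bytes % SKETCH_PAGE_SIZE != 0);
    uint64_t victim = victim_pages * SKETCH_PAGE_SIZE;

    if (pages == 0 || victim == 0)
        return 0;

    /*
     * One attacker page plus the too-small gap after it covers one
     * victim-sized stride. Cap the result at the size of the space.
     */
    if (pages > space_bytes / victim)
        return space_bytes;
    return pages * victim;
}
```

Under these assumptions, a 1 MiB budget blocks only 1 MiB against
single-page requests, but about 16 MiB against 64 KiB requests, which is
why the blocked area grows with the size being allocated.
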
The impact would depend on how big a space you are trying to allocate.
The limit has been lowered on x86_64 so that at least typically sized
BPF filters cannot be blocked.

If anyone with more experience with subuid/user namespaces has any
suggestions, I'd be glad to hear them. On an Ubuntu machine it didn't
seem like an unprivileged user could do this. I am going to keep working
on this and see if I can find a better solution.

Changes since v2:
 - System-wide BPF JIT limit (discussion with Jann Horn)
 - Holding reference to user correctly (Jann)
 - Having arch versions of module_alloc (Dave Hansen, Jessica Yu)
 - Shrinking of default limits, to help prevent the limit being worked
   around with fragmentation (Jann)

Changes since v1:
 - Plug in for non-x86
 - Arch specific default values

Rick Edgecombe (3):
  modules: Create arch versions of module alloc/free
  modules: Create rlimit for module space
  bpf: Add system wide BPF JIT limit

 arch/arm/kernel/module.c                |   2 +-
 arch/arm64/kernel/module.c              |   2 +-
 arch/mips/kernel/module.c               |   2 +-
 arch/nds32/kernel/module.c              |   2 +-
 arch/nios2/kernel/module.c              |   4 +-
 arch/parisc/kernel/module.c             |   2 +-
 arch/s390/kernel/module.c               |   2 +-
 arch/sparc/kernel/module.c              |   2 +-
 arch/unicore32/kernel/module.c          |   2 +-
 arch/x86/include/asm/pgtable_32_types.h |   3 +
 arch/x86/include/asm/pgtable_64_types.h |   2 +
 arch/x86/kernel/module.c                |   2 +-
 fs/proc/base.c                          |   1 +
 include/asm-generic/resource.h          |   8 ++
 include/linux/bpf.h                     |   7 ++
 include/linux/filter.h                  |   1 +
 include/linux/sched/user.h              |   4 +
 include/uapi/asm-generic/resource.h     |   3 +-
 kernel/bpf/core.c                       |  22 +++-
 kernel/bpf/inode.c                      |  16 +++
 kernel/module.c                         | 152 +++++++++++++++++++++++-
 net/core/sysctl_net_core.c              |   7 ++
 22 files changed, 233 insertions(+), 15 deletions(-)

-- 
2.17.1