Received: by 2002:a05:6a10:16a7:0:0:0:0 with SMTP id gp39csp379359pxb; Wed, 11 Nov 2020 06:08:42 -0800 (PST) X-Google-Smtp-Source: ABdhPJyomW1Oc8C1upbq/c4SMqm/Z79xkJcqw9+dXA2z4rKMJQgXXOXNgpaUS3mV0CKHBz6LW18j X-Received: by 2002:a17:906:4803:: with SMTP id w3mr24763191ejq.406.1605103722382; Wed, 11 Nov 2020 06:08:42 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1605103722; cv=none; d=google.com; s=arc-20160816; b=UkANNtfTEuZ8Pb+DSWLMJUWeeCf+f+2e4910c/8QPRrSR7TXZU4UtEUTpBjMufXz0r e4eNhysC1TOm2+tVcdUnd3vLm7LpSzA/X6Ypsl5QW8uH6eB4Heb+IcdfJluQO/bQ8MzP XO1iBfpHyCNXbyf3485vkXPnucuy26Ff9y5t32HrFWtTS5V50+0ZVJK2szR5D/6oAfTR juc8VZrbhnERgFEykfJEIV8k7kfFSSCy0nUpIF0wP2UR343W8yC/yqmdlEFs2SILY+3b N43dDWoeF8HxyUEgoFWGKgh4SVe9R1Ezrkjk2FkF5Wxmu81cUm7Q2visozJGTvuq03gy 5TnQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=es+7eu/1JIS0u6NUqnkeBJLaNZv/3XzsgPMcIjCfPvw=; b=YdXaTvr4Pq7vxIYpg9mUEPx9IEFyrpCUmLcZ3YUNL6Eg0Q7w+dhVIjCrRHyJ83S8aR 8QJsiCsL3WHTeqjfP4TIPPDeb4uBNyW4Fl54SD//N0YLT9c5WvVHhXLBZntGefKSALCz 7+4V5S/C1bR5Ywu3z10jWCJzpL/BQN+asy2OHykRouZT8r8I5G9mOJUBr4AoR4QdCbWv Gh70YVf6MvJ9QLI3nosn63pnROEGkKLMspjcHz3Xq3I8ZO/b+LoTroxNZ8rvAJ9B3DQ1 FtbQco9oVuQjGjFsdaQNy0vmYJVZhv0rcvqz2JnCiJjpyxx5StUVUga3eHEkoMayXWZ5 miyw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=SiSS9BGt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id s19si1539027edx.88.2020.11.11.06.08.17; Wed, 11 Nov 2020 06:08:42 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=SiSS9BGt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726844AbgKKNfN (ORCPT + 99 others); Wed, 11 Nov 2020 08:35:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54118 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726570AbgKKNep (ORCPT ); Wed, 11 Nov 2020 08:34:45 -0500 Received: from mail-qk1-x744.google.com (mail-qk1-x744.google.com [IPv6:2607:f8b0:4864:20::744]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C72A0C0613D1; Wed, 11 Nov 2020 05:34:45 -0800 (PST) Received: by mail-qk1-x744.google.com with SMTP id q5so1560962qkc.12; Wed, 11 Nov 2020 05:34:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=es+7eu/1JIS0u6NUqnkeBJLaNZv/3XzsgPMcIjCfPvw=; b=SiSS9BGtwjPR7pjydIAWebSVTiMEer42LpN7EpvhCX58lUuaU+5VL/V+DDUEclGLtr EdcBEjukeRJlEPueY+LcsN/JpriAjIIsc9XUZ8HS2rH0qqJR20vP48D53z0KxGGFw6aI tlyC7/8tNSL95t5fvWWV8XPYP6g7ssIFLJknQViYrRiWAd2Q+Ry08GJAb8cTJHI8pCK9 buTguV8FlprsoG+3jxCMVQxLj4MYGR2vub+bz/qGKFmULijoRwvTIkIDNrV7Wg0W21m6 +w7ArHtUx7CSn0LOJewHqljJ3+K6Jej2SPqV7G9EAbyec7KPiSCxZa+a5LHijsgcUkh0 +q8A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=es+7eu/1JIS0u6NUqnkeBJLaNZv/3XzsgPMcIjCfPvw=; b=fXqUBRubeWFkcNyFH9ZJ8rlgnKKlNt5tAEAE7eTRIhufWTSH0y6YxMRWle1ug/WLuC XT09aLFa1y6zXKH+44hPCUM3jAbqmwY6LT8Ezm33TL76ps7YAwO1RU0uRKUSCPlLB0UT ztuo+M8ZqlfOdkDtSV9GxRsq+7p/ij/rOQrZEHFwiau5veLrVi95NQgvZICA3dFi8XWE jEasHOoWq8VHTkOJMFXzV3PEzaxeyP3WD9QfZiyBzk/BUXrpZvIX2zr7DzNQqdFcnK+F 20UdOUlzyx/ZCn6Wf2PTkMkt0Nu3EMN/fglxMpg2aEc31wGD4Er9pKJKWs7c8AoxuP7t N8lQ== X-Gm-Message-State: AOAM533M3BtKRrHkEEAwOehTIaHMGRFwQJh/QsczsKEQDxPdNhroP/RC VeQwSWxjFKiloDOlEZ4ptzU= X-Received: by 2002:a37:64d4:: with SMTP id y203mr23844160qkb.150.1605101685029; Wed, 11 Nov 2020 05:34:45 -0800 (PST) Received: from localhost.localdomain (host-173-230-99-154.tnkngak.clients.pavlovmedia.com. [173.230.99.154]) by smtp.gmail.com with ESMTPSA id r190sm1997814qkf.101.2020.11.11.05.34.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 05:34:44 -0800 (PST) From: YiFei Zhu To: containers@lists.linux-foundation.org Cc: YiFei Zhu , linux-csky@vger.kernel.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-kernel@vger.kernel.org, Aleksa Sarai , Andrea Arcangeli , Andy Lutomirski , David Laight , Dimitrios Skarlatos , Giuseppe Scrivano , Hubertus Franke , Jack Chen , Jann Horn , Josep Torrellas , Kees Cook , Tianyin Xu , Tobin Feldman-Fitzthum , Tycho Andersen , Valentin Rothberg , Will Drewry Subject: [PATCH seccomp v2 8/8] seccomp/cache: Report cache data through /proc/pid/seccomp_cache Date: Wed, 11 Nov 2020 07:33:54 -0600 Message-Id: <94e663fa53136f5a11f432c661794d1ee7060779.1605101222.git.yifeifz2@illinois.edu> X-Mailer: git-send-email 2.29.2 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: YiFei Zhu Currently the kernel does not provide an infrastructure to translate architecture numbers to a human-readable name. Translating syscall numbers to syscall names is possible through FTRACE_SYSCALL infrastructure but it does not provide support for compat syscalls. This will create a file for each PID as /proc/pid/seccomp_cache. The file will be empty when no seccomp filters are loaded, or be in the format of: where ALLOW means the cache is guaranteed to allow the syscall, and filter means the cache will pass the syscall to the BPF filter. For the docker default profile on x86_64 it looks like: x86_64 0 ALLOW x86_64 1 ALLOW x86_64 2 ALLOW x86_64 3 ALLOW [...] x86_64 132 ALLOW x86_64 133 ALLOW x86_64 134 FILTER x86_64 135 FILTER x86_64 136 FILTER x86_64 137 ALLOW x86_64 138 ALLOW x86_64 139 FILTER x86_64 140 ALLOW x86_64 141 ALLOW [...] This file is guarded by CONFIG_SECCOMP_CACHE_DEBUG with a default of N because I think certain users of seccomp might not want the application to know which syscalls are definitely usable. For the same reason, it is also guarded by CAP_SYS_ADMIN. Suggested-by: Jann Horn Link: https://lore.kernel.org/lkml/CAG48ez3Ofqp4crXGksLmZY6=fGrF_tWyUCg7PBkAetvbbOPeOA@mail.gmail.com/ Signed-off-by: YiFei Zhu --- arch/Kconfig | 15 +++++++++++ fs/proc/base.c | 6 +++++ include/linux/seccomp.h | 7 +++++ kernel/seccomp.c | 59 +++++++++++++++++++++++++++++++++++++++++ 4 files changed, 87 insertions(+) diff --git a/arch/Kconfig b/arch/Kconfig index 56b6ccc0e32d..6e2eb7171da0 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -514,6 +514,21 @@ config SECCOMP_FILTER See Documentation/userspace-api/seccomp_filter.rst for details. +config SECCOMP_CACHE_DEBUG + bool "Show seccomp filter cache status in /proc/pid/seccomp_cache" + depends on SECCOMP + depends on SECCOMP_FILTER && !HAVE_SPARSE_SYSCALL_NR + depends on PROC_FS + help + This enables the /proc/pid/seccomp_cache interface to monitor + seccomp cache data. The file format is subject to change. Reading + the file requires CAP_SYS_ADMIN. + + This option is for debugging only. Enabling presents the risk that + an adversary may be able to infer the seccomp filter logic. + + If unsure, say N. + config HAVE_ARCH_STACKLEAK bool help diff --git a/fs/proc/base.c b/fs/proc/base.c index 0f707003dda5..d652f9dbaecc 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3261,6 +3261,9 @@ static const struct pid_entry tgid_base_stuff[] = { #ifdef CONFIG_PROC_PID_ARCH_STATUS ONE("arch_status", S_IRUGO, proc_pid_arch_status), #endif +#ifdef CONFIG_SECCOMP_CACHE_DEBUG + ONE("seccomp_cache", S_IRUSR, proc_pid_seccomp_cache), +#endif }; static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx) @@ -3590,6 +3593,9 @@ static const struct pid_entry tid_base_stuff[] = { #ifdef CONFIG_PROC_PID_ARCH_STATUS ONE("arch_status", S_IRUGO, proc_pid_arch_status), #endif +#ifdef CONFIG_SECCOMP_CACHE_DEBUG + ONE("seccomp_cache", S_IRUSR, proc_pid_seccomp_cache), +#endif }; static int proc_tid_base_readdir(struct file *file, struct dir_context *ctx) diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h index 02aef2844c38..76963ec4641a 100644 --- a/include/linux/seccomp.h +++ b/include/linux/seccomp.h @@ -121,4 +121,11 @@ static inline long seccomp_get_metadata(struct task_struct *task, return -EINVAL; } #endif /* CONFIG_SECCOMP_FILTER && CONFIG_CHECKPOINT_RESTORE */ + +#ifdef CONFIG_SECCOMP_CACHE_DEBUG +struct seq_file; + +int proc_pid_seccomp_cache(struct seq_file *m, struct pid_namespace *ns, + struct pid *pid, struct task_struct *task); +#endif #endif /* _LINUX_SECCOMP_H */ diff --git a/kernel/seccomp.c b/kernel/seccomp.c index d8cf468dbe1e..76f524e320b1 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -553,6 +553,9 @@ void seccomp_filter_release(struct task_struct *tsk) { struct seccomp_filter *orig = tsk->seccomp.filter; + /* We are effectively holding the siglock by not having any sighand. */ + WARN_ON(tsk->sighand != NULL); + /* Detach task from its filter tree. */ tsk->seccomp.filter = NULL; __seccomp_filter_release(orig); @@ -2335,3 +2338,59 @@ static int __init seccomp_sysctl_init(void) device_initcall(seccomp_sysctl_init) #endif /* CONFIG_SYSCTL */ + +#ifdef CONFIG_SECCOMP_CACHE_DEBUG +/* Currently CONFIG_SECCOMP_CACHE_DEBUG implies SECCOMP_ARCH_NATIVE */ +static void proc_pid_seccomp_cache_arch(struct seq_file *m, const char *name, + const void *bitmap, size_t bitmap_size) +{ + int nr; + + for (nr = 0; nr < bitmap_size; nr++) { + bool cached = test_bit(nr, bitmap); + char *status = cached ? "ALLOW" : "FILTER"; + + seq_printf(m, "%s %d %s\n", name, nr, status); + } +} + +int proc_pid_seccomp_cache(struct seq_file *m, struct pid_namespace *ns, + struct pid *pid, struct task_struct *task) +{ + struct seccomp_filter *f; + unsigned long flags; + + /* + * We don't want some sandboxed process to know what their seccomp + * filters consist of. + */ + if (!file_ns_capable(m->file, &init_user_ns, CAP_SYS_ADMIN)) + return -EACCES; + + if (!lock_task_sighand(task, &flags)) + return -ESRCH; + + f = READ_ONCE(task->seccomp.filter); + if (!f) { + unlock_task_sighand(task, &flags); + return 0; + } + + /* prevent filter from being freed while we are printing it */ + __get_seccomp_filter(f); + unlock_task_sighand(task, &flags); + + proc_pid_seccomp_cache_arch(m, SECCOMP_ARCH_NATIVE_NAME, + f->cache.allow_native, + SECCOMP_ARCH_NATIVE_NR); + +#ifdef SECCOMP_ARCH_COMPAT + proc_pid_seccomp_cache_arch(m, SECCOMP_ARCH_COMPAT_NAME, + f->cache.allow_compat, + SECCOMP_ARCH_COMPAT_NR); +#endif /* SECCOMP_ARCH_COMPAT */ + + __put_seccomp_filter(f); + return 0; +} +#endif /* CONFIG_SECCOMP_CACHE_DEBUG */ -- 2.29.2