From: Christian Brauner
To: torvalds@linux-foundation.org, viro@zeniv.linux.org.uk, jannh@google.com, dhowells@redhat.com, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: serge@hallyn.com, luto@kernel.org, arnd@arndb.de, ebiederm@xmission.com, keescook@chromium.org, adobriyan@gmail.com, tglx@linutronix.de, mtk.manpages@gmail.com, bl0pbl33p@gmail.com, ldv@altlinux.org, akpm@linux-foundation.org, oleg@redhat.com, cyphar@cyphar.com, joel@joelfernandes.org, dancol@google.com, Christian Brauner
Subject: [RFC-2 PATCH 4/4] samples: show race-free pidfd metadata access
Date: Thu, 11 Apr 2019 01:40:45 +0200
Message-Id: <20190410234045.29846-6-christian@brauner.io>
In-Reply-To: <20190410234045.29846-1-christian@brauner.io>
References: <20190410234045.29846-1-christian@brauner.io>
X-Mailing-List: linux-kernel@vger.kernel.org

This is a sample program to show userspace how to get race-free access
to process metadata from a pidfd. It is really not that difficult, and
instead of burdening the kernel with this task by using fds to /proc/ we
can simply add a helper to libc that does it for the user.

Signed-off-by: Christian Brauner
Signed-off-by: Jann Horn
Cc: Arnd Bergmann
Cc: "Eric W. Biederman"
Cc: Kees Cook
Cc: Alexey Dobriyan
Cc: Thomas Gleixner
Cc: David Howells
Cc: "Michael Kerrisk (man-pages)"
Cc: Jonathan Kowalski
Cc: "Dmitry V. Levin"
Cc: Andy Lutomirsky
Cc: Andrew Morton
Cc: Oleg Nesterov
Cc: Aleksa Sarai
Cc: Linus Torvalds
Cc: Al Viro
---
 samples/Makefile               |   2 +-
 samples/pidfd/Makefile         |   6 ++
 samples/pidfd/pidfd-metadata.c | 169 +++++++++++++++++++++++++++++++++
 3 files changed, 176 insertions(+), 1 deletion(-)
 create mode 100644 samples/pidfd/Makefile
 create mode 100644 samples/pidfd/pidfd-metadata.c

diff --git a/samples/Makefile b/samples/Makefile
index b1142a958811..fadadb1c3b05 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -3,4 +3,4 @@
 obj-$(CONFIG_SAMPLES)	+= kobject/ kprobes/ trace_events/ livepatch/ \
 			   hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/ \
 			   configfs/ connector/ v4l/ trace_printk/ \
-			   vfio-mdev/ statx/ qmi/ binderfs/
+			   vfio-mdev/ statx/ qmi/ binderfs/ pidfd/
diff --git a/samples/pidfd/Makefile b/samples/pidfd/Makefile
new file mode 100644
index 000000000000..0ff97784177a
--- /dev/null
+++ b/samples/pidfd/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+
+hostprogs-y := pidfd-metadata
+always := $(hostprogs-y)
+HOSTCFLAGS_pidfd-metadata.o += -I$(objtree)/usr/include
+all: pidfd-metadata
diff --git a/samples/pidfd/pidfd-metadata.c b/samples/pidfd/pidfd-metadata.c
new file mode 100644
index 000000000000..c46c6c34a012
--- /dev/null
+++ b/samples/pidfd/pidfd-metadata.c
@@ -0,0 +1,169 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include <err.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#ifndef CLONE_PIDFD
+#define CLONE_PIDFD 0x00001000
+#endif
+
+static int raw_clone_pidfd(void)
+{
+	unsigned long flags = CLONE_PIDFD;
+
+#if defined(__s390x__) || defined(__s390__) || defined(__CRIS__)
+	/* On s390/s390x and cris the order of the first and second arguments
+	 * of the system call is reversed.
+	 */
+	return (int)syscall(__NR_clone, NULL, flags | SIGCHLD);
+#elif defined(__sparc__) && defined(__arch64__)
+	{
+		/*
+		 * sparc64 always returns the other process id in %o0, and a
+		 * boolean flag whether this is the child or the parent in %o1.
+		 * Inline assembly is needed to get the flag returned in %o1.
+		 */
+		int in_child;
+		int child_pid;
+		asm volatile("mov %2, %%g1\n\t"
+			     "mov %3, %%o0\n\t"
+			     "mov 0 , %%o1\n\t"
+			     "t 0x6d\n\t"
+			     "mov %%o1, %0\n\t"
+			     "mov %%o0, %1"
+			     : "=r"(in_child), "=r"(child_pid)
+			     : "i"(__NR_clone), "r"(flags | SIGCHLD)
+			     : "%o1", "%o0", "%g1");
+
+		if (in_child)
+			return 0;
+		else
+			return child_pid;
+	}
+#elif defined(__ia64__)
+	/* On ia64 the stack and stack size are passed as separate arguments. */
+	return (int)syscall(__NR_clone, flags | SIGCHLD, NULL, 0UL);
+#else
+	return (int)syscall(__NR_clone, flags | SIGCHLD, NULL);
+#endif
+}
+
+static inline int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
+					unsigned int flags)
+{
+	return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
+}
+
+static int pidfd_metadata_fd(int pidfd)
+{
+	int procfd, ret;
+	char path[100];
+	FILE *f;
+	size_t n = 0;
+	char *line = NULL;
+
+	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", pidfd);
+
+	f = fopen(path, "re");
+	if (!f)
+		return -1;
+
+	ret = 0;
+	while (getline(&line, &n, f) != -1) {
+		char *numstr;
+		size_t len;
+
+		if (strncmp(line, "Pid:\t", 5))
+			continue;
+
+		numstr = line + 5;
+		len = strlen(numstr);
+		if (len > 0 && numstr[len - 1] == '\n')
+			numstr[len - 1] = '\0';
+		ret = snprintf(path, sizeof(path), "/proc/%s", numstr);
+		break;
+	}
+	free(line);
+	fclose(f);
+
+	if (!ret) {
+		errno = ENOENT;
+		warn("Failed to parse pid from fdinfo\n");
+		return -1;
+	}
+
+	procfd = open(path, O_DIRECTORY | O_RDONLY | O_CLOEXEC);
+	if (procfd < 0) {
+		warn("Failed to open %s\n", path);
+		return -1;
+	}
+
+	/*
+	 * Verify that the pid has not been recycled and our /proc/ handle
+	 * is still valid.
+	 */
+	if (sys_pidfd_send_signal(pidfd, 0, NULL, 0) < 0) {
+		/* process does not exist */
+		if (errno == ESRCH) {
+			warn("The pid was recycled\n");
+			close(procfd);
+			return -1;
+		}
+
+		/* just not allowed to signal it */
+	}
+
+	return procfd;
+}
+
+int main(int argc, char *argv[])
+{
+	int procfd, ret = EXIT_FAILURE;
+	ssize_t bytes;
+	char buf[4096] = { 0 };
+
+	int pidfd = raw_clone_pidfd();
+	if (pidfd < 0)
+		return -1;
+
+	if (pidfd == 0) {
+		printf("%d\n", getpid());
+		exit(EXIT_SUCCESS);
+	}
+
+	procfd = pidfd_metadata_fd(pidfd);
+	close(pidfd);
+	if (procfd < 0)
+		goto out;
+
+	int statusfd = openat(procfd, "status", O_RDONLY | O_CLOEXEC);
+	close(procfd);
+	if (statusfd < 0)
+		goto out;
+
+	bytes = read(statusfd, buf, sizeof(buf));
+	if (bytes > 0)
+		bytes = write(STDOUT_FILENO, buf, bytes);
+	close(statusfd);
+	if (bytes >= 0)
+		ret = EXIT_SUCCESS;
+
+out:
+	(void)wait(NULL);
+
+	exit(ret);
+}
-- 
2.21.0
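
The fdinfo lookup in pidfd_metadata_fd() could be factored into a small
helper that returns a numeric pid instead of a path string, which is closer
to what a libc wrapper might want. A minimal sketch, assuming the fdinfo
file carries a "Pid:\t<n>" line as the sample above expects;
fdinfo_parse_pid() is a hypothetical name and is not part of this patch or
of any libc:

```c
#define _GNU_SOURCE
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper (illustration only): scan fdinfo-style text for the
 * "Pid:\t<n>" line. Returns the pid on success, or -1 if the line is
 * missing, not a number, or not a positive value that fits in an int.
 */
static pid_t fdinfo_parse_pid(FILE *f)
{
	char *line = NULL;
	size_t n = 0;
	pid_t pid = -1;

	while (getline(&line, &n, f) != -1) {
		char *start, *end;
		long val;

		/* Skip every line that is not the Pid entry. */
		if (strncmp(line, "Pid:\t", 5))
			continue;

		start = line + 5;
		val = strtol(start, &end, 10);
		/* Require at least one digit and nothing after the number. */
		if (end != start && (*end == '\n' || *end == '\0') &&
		    val > 0 && val <= INT_MAX)
			pid = (pid_t)val;
		break;
	}
	free(line);

	return pid;
}
```

A caller would fopen() /proc/self/fdinfo/<pidfd>, pass the stream to the
helper, and still need the pidfd_send_signal(pidfd, 0, NULL, 0) check
afterwards to detect pid recycling, as the sample does.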