In-Reply-To: <1520875093-18174-1-git-send-email-nagarathnam.muthusamy@oracle.com>
From: Jann Horn
Date: Tue, 13 Mar 2018 13:47:43 -0700
Subject: Re: [RESEND RFC] translate_pid API
To: Nagarathnam Muthusamy
Cc: kernel list, Linux API, Konstantin Khlebnikov, Nagarajan.Muthukrishnan@oracle.com, Prakash Sangappa, Andy Lutomirski, Andrew Morton, Oleg Nesterov, Serge Hallyn, "Eric W. Biederman", Eugene Syromiatnikov, xemul@parallels.com
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Mar 12, 2018 at 10:18 AM, wrote:
> Resending the RFC with participants of previous discussions
> in the list.
>
> Following patch which is a variation of a solution discussed
> in https://lwn.net/Articles/736330/ provides the users of
> pid namespace, the functionality of pid translation between
> namespaces using a namespace identifier. The topic of
> pid translation has been discussed in the community few times
> but there has always been a resistance to adding new solution
> for this problem.
> I will outline the planned usecase of pid namespace by oracle
> database and explain why any of the existing solution cannot
> be used to solve their problem.
>
> Consider a system in which several PID namespaces with multiple
> nested levels exists in parallel with monitor processes managing
> all the namespaces. PID translation is required for controlling
> and accessing information about the processes by the monitors
> and other processes down the hierarchy of namespaces. Controlling
> primarily involves sending signals or using ptrace by a process in
> parent namespace on any of the processes in its child namespace.
> Accessing information deals with the reading /proc//* files
> of processes in child namespace. None of the processes have
> root/CAP_SYS_ADMIN privileges.

How are you dealing with PID reuse?

[...]
> diff --git a/fs/nsfs.c b/fs/nsfs.c
> index 36b0772..c635465 100644
> --- a/fs/nsfs.c
> +++ b/fs/nsfs.c
> @@ -222,8 +222,13 @@ int ns_get_name(char *buf, size_t size, struct task_struct *task,
>  	const char *name;
>  	ns = ns_ops->get(task);
>  	if (ns) {
> -		name = ns_ops->real_ns_name ? : ns_ops->name;
> -		res = snprintf(buf, size, "%s:[%u]", name, ns->inum);
> +		if (!strcmp(ns_ops->name, "pidns_id")) {

Wouldn't it be cleaner to check for "ns_ops==&pidns_id_operations"?

> +			res = snprintf(buf, size, "[%llu]",
> +				(unsigned long long)ns->ns_id);
> +		} else {
> +			name = ns_ops->real_ns_name ? : ns_ops->name;
> +			res = snprintf(buf, size, "%s:[%u]", name, ns->inum);
> +		}
>  		ns_ops->put(ns);
>  	}
>  	return res;
[...]
> diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
> index 49538b1..11d1d57 100644
> --- a/include/linux/pid_namespace.h
> +++ b/include/linux/pid_namespace.h
> @@ -11,6 +11,7 @@
>  #include
>  #include
>  #include
> +#include
>
>
>  struct fs_pin;
> @@ -44,6 +45,8 @@ struct pid_namespace {
>  	kgid_t pid_gid;
>  	int hide_pid;
>  	int reboot;	/* group exit code if this pidns was rebooted */
> +	struct hlist_bl_node node;
> +	atomic_t lookups_pending;
>  	struct ns_common ns;
>  } __randomize_layout;
>
[...]
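To illustrate the point about the strcmp() check in the fs/nsfs.c hunk above, here is a minimal userspace sketch of matching an operations table by name versus by address. Everything in it is a stand-in for illustration: struct mock_ns_ops, the three tables, and both helpers are hypothetical and are not the kernel's definitions.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for an ns operations table; only the name field. */
struct mock_ns_ops {
	const char *name;
};

static const struct mock_ns_ops mock_pidns_ops = { .name = "pid" };
static const struct mock_ns_ops mock_pidns_id_ops = { .name = "pidns_id" };

/* A different table that happens to carry the same name string. */
static const struct mock_ns_ops mock_lookalike_ops = { .name = "pidns_id" };

/* String match: any table whose name reads "pidns_id" passes. */
static bool is_pidns_id_by_name(const struct mock_ns_ops *ops)
{
	return strcmp(ops->name, "pidns_id") == 0;
}

/* Identity match: only the one table at this address passes. */
static bool is_pidns_id_by_identity(const struct mock_ns_ops *ops)
{
	return ops == &mock_pidns_id_ops;
}
```

With these mocks, both helpers accept &mock_pidns_id_ops and reject &mock_pidns_ops, but only the name-based one accepts &mock_lookalike_ops. The pointer comparison also avoids a string compare on every call and keeps working if the name is ever changed.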
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 0b53eef..ff83aa8 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
[...]
> @@ -159,6 +201,30 @@ static void delayed_free_pidns(struct rcu_head *p)
>
>  static void destroy_pid_namespace(struct pid_namespace *ns)
>  {
> +	struct pid_namespace *ph;
> +	struct hlist_bl_head *head;
> +	struct hlist_bl_node *dup_node;
> +
> +	/*
> +	 * Remove the namespace structure from hash table so
> +	 * now new lookups can start on it.

s/now new/no new/

[...]
> @@ -474,9 +551,116 @@ static struct user_namespace *pidns_owner(struct ns_common *ns)
>  	.get_parent = pidns_get_parent,
>  };
>
> +/*
> + * translate_pid - convert pid in source pid-ns into target pid-ns.
> + * @pid: pid for translation
> + * @source: pid-ns id
> + * @target: pid-ns id
> + *
> + * Return pid in @target pid-ns, zero if task have no pid there,
> + * or -ESRCH of task with @pid is not found in @source pid-ns.

s/of/if/

> + */
> +SYSCALL_DEFINE3(translate_pid, pid_t, pid, u64, source,
> +		u64, target)
> +{
> +	struct pid_namespace *source_ns = NULL, *target_ns = NULL;
> +	struct pid *struct_pid;
> +	struct pid_namespace *ph;
> +	struct hlist_bl_head *shead = NULL;
> +	struct hlist_bl_head *thead = NULL;
> +	struct hlist_bl_node *dup_node;
> +	pid_t result;
> +
> +	if (!source) {
> +		source_ns = &init_pid_ns;
> +	} else {
> +		shead = pid_ns_hash_head(pid_ns_hash, source);
> +		hlist_bl_lock(shead);
> +		hlist_bl_for_each_entry(ph, dup_node, shead, node) {
> +			if (source == ph->ns.ns_id) {
> +				source_ns = ph;
> +				break;
> +			}
> +		}
> +		if (!source_ns) {
> +			hlist_bl_unlock(shead);
> +			return -EINVAL;
> +		}
> +	}
> +	if (!ptrace_may_access(source_ns->child_reaper,
> +		PTRACE_MODE_READ_FSCREDS)) {

AFAICS this proposal breaks the visibility restrictions that
namespaces normally create.
If there are two namespace-based containers that use the same UID
range, I don't think they should be able to learn information about
each other, such as which PIDs are in use in the other container; but
as far as I can tell, your proposal makes it possible to do that
(unless an LSM or so is interfering). I would prefer it if this API
required visibility of the targeted PID namespaces in the caller's PID
namespace.

When doing ptrace access checks, please use the real creds in syscalls
like this one, not the fs creds. The fs creds are for filesystem
syscalls (in particular sys_open()), not for specialized syscalls like
ptrace() or this one.
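
To make the visibility requirement above concrete, here is a rough userspace sketch. It assumes that "visible" means the target PID namespace is either the caller's own namespace or one nested below it. struct mock_pidns and mock_pidns_visible_from() are hypothetical names for this illustration only, not existing kernel interfaces; the model keeps just a parent link and a nesting level.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a PID namespace. */
struct mock_pidns {
	const struct mock_pidns *parent;	/* NULL for the initial namespace */
	unsigned int level;			/* 0 for the initial namespace */
};

/*
 * Return true if @target is @caller itself or a descendant of @caller.
 * Siblings, ancestors and unrelated namespaces are reported as not visible.
 */
static bool mock_pidns_visible_from(const struct mock_pidns *caller,
				    const struct mock_pidns *target)
{
	/* Walk up from the target until it is no deeper than the caller. */
	while (target && target->level > caller->level)
		target = target->parent;

	return target == caller;
}
```

Under this assumption, a translate call would be refused unless both the source and the target namespace pass such a check against the caller's own PID namespace, so two sibling containers could not query each other.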