From: Jann Horn
Date: Tue, 13 Mar 2018 14:28:35 -0700
Subject: Re: [RESEND RFC] translate_pid API
To: Nagarathnam Muthusamy
Cc: kernel list, Linux API, Konstantin Khlebnikov, Nagarajan.Muthukrishnan@oracle.com, Prakash Sangappa, Andy Lutomirski, Andrew Morton, Oleg Nesterov, Serge Hallyn, "Eric W. Biederman", Eugene Syromiatnikov, xemul@parallels.com
In-Reply-To: <69f13674-7f84-5dc7-0bd7-e5e65e9cb3b0@oracle.com>
References: <1520875093-18174-1-git-send-email-nagarathnam.muthusamy@oracle.com> <69f13674-7f84-5dc7-0bd7-e5e65e9cb3b0@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 13, 2018 at 2:20 PM, Nagarathnam Muthusamy wrote:
> On 03/13/2018 01:47 PM, Jann Horn wrote:
>> On Mon, Mar 12, 2018 at 10:18 AM,
>> wrote:
>>>
>>> Resending the RFC with participants of previous discussions
>>> in the list.
>>>
>>> Following patch which is a variation of a solution discussed
>>> in https://lwn.net/Articles/736330/ provides the users of
>>> pid namespace, the functionality of pid translation between
>>> namespaces using a namespace identifier. The topic of
>>> pid translation has been discussed in the community few times
>>> but there has always been a resistance to adding new solution
>>> for this problem.
>>> I will outline the planned usecase of pid namespace by oracle
>>> database and explain why any of the existing solution cannot
>>> be used to solve their problem.
>>>
>>> Consider a system in which several PID namespaces with multiple
>>> nested levels exists in parallel with monitor processes managing
>>> all the namespaces. PID translation is required for controlling
>>> and accessing information about the processes by the monitors
>>> and other processes down the hierarchy of namespaces. Controlling
>>> primarily involves sending signals or using ptrace by a process in
>>> parent namespace on any of the processes in its child namespace.
>>> Accessing information deals with the reading /proc//* files
>>> of processes in child namespace. None of the processes have
>>> root/CAP_SYS_ADMIN privileges.
>>
>> How are you dealing with PID reuse?
>
>
> We have a monitor process which keeps track of the aliveness of
> important processes. When a process dies, monitor makes a note of
> it and hence detects if pid is reused.

How do you do that in a race-free manner?
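
To make the concern concrete, here is a minimal userspace sketch of the generic check-then-act pattern that worries me. It is not taken from the patch or from the monitor described above; the helper names pid_is_alive() and signal_if_alive() are hypothetical, and I am only assuming the monitor does something of this shape.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if some process currently has this PID.
 * kill(pid, 0) sends no signal; it only performs the existence and
 * permission checks.
 */
int pid_is_alive(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;
    /* EPERM means the PID exists but we are not allowed to signal it. */
    return errno == EPERM;
}

/*
 * Hypothetical monitor action with a check-then-act race.
 * Returns 0 on success, -1 if the PID looked dead or kill() failed.
 */
int signal_if_alive(pid_t pid, int sig)
{
    if (!pid_is_alive(pid))
        return -1;
    /*
     * Race window: the process may exit here and, once it has been
     * reaped, its PID may be handed to an unrelated process before the
     * next line runs. The check above cannot rule that out.
     */
    return kill(pid, sig);
}
```

The liveness check and the signal are two separate system calls, so noting a death after the fact does not close the window between them.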
>>> + */
>>> +SYSCALL_DEFINE3(translate_pid, pid_t, pid, u64, source,
>>> +		u64, target)
>>> +{
>>> +	struct pid_namespace *source_ns = NULL, *target_ns = NULL;
>>> +	struct pid *struct_pid;
>>> +	struct pid_namespace *ph;
>>> +	struct hlist_bl_head *shead = NULL;
>>> +	struct hlist_bl_head *thead = NULL;
>>> +	struct hlist_bl_node *dup_node;
>>> +	pid_t result;
>>> +
>>> +	if (!source) {
>>> +		source_ns = &init_pid_ns;
>>> +	} else {
>>> +		shead = pid_ns_hash_head(pid_ns_hash, source);
>>> +		hlist_bl_lock(shead);
>>> +		hlist_bl_for_each_entry(ph, dup_node, shead, node) {
>>> +			if (source == ph->ns.ns_id) {
>>> +				source_ns = ph;
>>> +				break;
>>> +			}
>>> +		}
>>> +		if (!source_ns) {
>>> +			hlist_bl_unlock(shead);
>>> +			return -EINVAL;
>>> +		}
>>> +	}
>>> +	if (!ptrace_may_access(source_ns->child_reaper,
>>> +				PTRACE_MODE_READ_FSCREDS)) {
>>
>> AFAICS this proposal breaks the visibility restrictions that
>> namespaces normally create. If there are two namespaces-based
>> containers that use the same UID range, I don't think they should be
>> able to learn information about each other, such as which PIDs are in
>> use in the other container; but as far as I can tell, your proposal
>> makes it possible to do that (unless an LSM or so is interfering). I
>> would prefer it if this API required visibility of the targeted PID
>> namespaces in the caller's PID namespace.
>
>
> I am trying to simulate the same access restrictions allowed
> on a process's /proc//ns/pid file. If the translator has
> access to /proc//ns/pid file of both source and destination
> namespaces, shouldn't it be allowed to translate the pid between
> them?

But the translator doesn't actually need to have access to those
procfs files, right?
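
For comparison, here is a rough userspace sketch of what "having access to the ns/pid file" amounts to: a readlink() on the ns/pid symlink under the process's /proc directory. Per namespaces(7), reading that link is subject to a PTRACE_MODE_READ_FSCREDS check and yields a string such as pid:[4026531836]. The helper names are hypothetical, and whether that inode number corresponds to the ns_id used in the patch is an assumption I have not verified.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: parse a link target of the form "pid:[<inode>]".
 * Returns 0 and stores the inode number on success, -1 otherwise.
 */
int parse_pid_ns_link(const char *link, unsigned long long *inode)
{
    int end = 0;

    /* %n is only reached (and 'end' only set) if the closing ']' matched. */
    if (sscanf(link, "pid:[%llu]%n", inode, &end) != 1)
        return -1;
    if (end == 0 || link[end] != '\0')
        return -1;
    return 0;
}

/*
 * Hypothetical helper: read the PID namespace link of a process.
 * This fails unless the caller can see the process in its own procfs
 * mount and passes the kernel's access check on the symlink.
 */
int read_pid_ns_inode(pid_t pid, unsigned long long *inode)
{
    char path[64];
    char link[64];
    ssize_t n;

    snprintf(path, sizeof(path), "/proc/%ld/ns/pid", (long)pid);
    n = readlink(path, link, sizeof(link) - 1);
    if (n < 0)
        return -1;
    link[n] = '\0';	/* readlink() does not NUL-terminate */
    return parse_pid_ns_link(link, inode);
}
```

As far as I can tell, the proposed syscall takes a bare u64, so nothing forces a caller to go through a step like read_pid_ns_inode() for a process that is visible in its own PID namespace.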