Message-ID: <59F9FD8B.8090607@oracle.com>
Date: Wed, 01 Nov 2017 09:59:55 -0700
From: nagarathnam muthusamy
Organization: Oracle Corporation
To: prakash sangappa
CC: Andy Lutomirski, Andrew Morton, Konstantin Khlebnikov, Oleg Nesterov, Linux API, "linux-kernel@vger.kernel.org", Serge Hallyn, "Eric W. Biederman", Eugene Syromiatnikov
Subject: Re: [PATCH v4] pidns: introduce syscall translate_pid
In-Reply-To: <59E689F5.2080706@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

I believe all the questions raised in this thread were answered. Just wondering if there are any outstanding questions?

Thanks,
Nagarathnam.

On 10/17/2017 3:53 PM, prakash sangappa wrote:
>
> On 10/17/2017 3:40 PM, Andy Lutomirski wrote:
>> On Tue, Oct 17, 2017 at 3:35 PM, prakash sangappa wrote:
>>> On 10/17/2017 3:02 PM, Andy Lutomirski wrote:
>>>> On Tue, Oct 17, 2017 at 8:38 AM, Prakash Sangappa wrote:
>>>>>
>>>>> On 10/16/17 5:52 PM, Andy Lutomirski wrote:
>>>>>> On Mon, Oct 16, 2017 at 3:54 PM, prakash.sangappa wrote:
>>>>>>>
>>>>>>> On 10/16/2017 03:07 PM, Nagarathnam Muthusamy wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/16/2017 02:36 PM, Andrew Morton wrote:
>>>>>>>>> On Sat, 14 Oct 2017 11:17:47 +0300 Konstantin Khlebnikov wrote:
>>>>>>>>>
>>>>>>>>>>>>> pid_t translate_pid(pid_t pid, int source, int target);
>>>>>>>>>>>>>
>>>>>>>>>>>>> This syscall converts a pid from the source pid-ns into the corresponding pid in the target pid-ns. If the pid is unreachable from the target pid-ns, it returns zero.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Pid namespaces are referred to by file descriptors opened on the proc files /proc/[pid]/ns/pid or /proc/[pid]/ns/pid_for_children. A negative argument refers to the current pid namespace, the same as the file /proc/self/ns/pid.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The kernel exposes virtual pids in /proc/[pid]/status:NSpid, but backward translation requires scanning all tasks. Pids could also be translated by sending them through a unix socket between namespaces, but this method is slow and insecure because the other side is exposed inside the pid namespace.
>>>>>>>>>> Andrew asked why we might need this.
>>>>>>>>>>
>>>>>>>>>> Such conversion is required for interaction between processes across pid-namespaces. For example, to identify a process in a container by its pid file when looking from outside.
>>>>>>>>>>
>>>>>>>>>> Two years ago I solved this in a project of mine with monstrous code that forks a couple of times just to convert a pid; luckily for me, performance wasn't important.
>>>>>>>>> That's a single user who needed this a single time, and found a userspace-based solution anyway. This is not exactly compelling!
>>>>>>>>>
>>>>>>>>> Is there a stronger case to be made? How does this change benefit our users? Sell it to us!
>>>>>>>> Oracle Database is planning to use pid namespaces for sandboxing database instances, and they need an API similar to translate_pid to effectively translate process IDs from other pid namespaces. Prakash (cced in mail) can provide more details on this use case.
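The backward translation that the patch description calls expensive (scanning every task's NSpid line) can be sketched in C roughly as below. This illustrates the existing workaround, not translate_pid itself; parse_nspid() and find_outer_pid() are hypothetical helper names. The sketch assumes /proc is mounted for the caller's pid namespace, a kernel that reports NSpid (Linux 4.1 or later), and that the target namespace is identified by the st_dev/st_ino pair obtained by calling stat() on a /proc/[pid]/ns/pid link of one of its members.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* The initial pid namespace plus at most 32 nested levels. */
#define MAX_NS_LEVELS 33

/*
 * Parse an "NSpid:" line from /proc/[pid]/status into out[], outermost
 * namespace first. Returns the number of ids, or -1 if the line is not
 * an NSpid line.
 */
int parse_nspid(const char *line, long *out, int max)
{
    if (strncmp(line, "NSpid:", 6) != 0)
        return -1;
    const char *p = line + 6;
    int n = 0;
    while (n < max) {
        char *end;
        long v = strtol(p, &end, 10);  /* skips the tab separators */
        if (end == p)
            break;                     /* no more numbers on the line */
        out[n++] = v;
        p = end;
    }
    return n;
}

/*
 * Hypothetical helper: return the pid, as numbered in the /proc mount's
 * namespace, of the process whose pid is inner_pid inside the namespace
 * identified by (ns_dev, ns_ino); 0 if none is found. Only processes are
 * scanned (threads live under /proc/[pid]/task/). Every process must be
 * visited, which is the cost a direct syscall would avoid.
 */
pid_t find_outer_pid(long inner_pid, dev_t ns_dev, ino_t ns_ino)
{
    DIR *proc = opendir("/proc");
    if (!proc)
        return 0;
    pid_t found = 0;
    struct dirent *de;
    while (!found && (de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                  /* not a process directory */
        char path[300];
        struct stat st;
        snprintf(path, sizeof path, "/proc/%s/ns/pid", de->d_name);
        if (stat(path, &st) != 0 || st.st_dev != ns_dev || st.st_ino != ns_ino)
            continue;                  /* other namespace, or no access */
        snprintf(path, sizeof path, "/proc/%s/status", de->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;                  /* the task may have exited */
        char line[512];
        long ids[MAX_NS_LEVELS];
        while (fgets(line, sizeof line, f)) {
            int n = parse_nspid(line, ids, MAX_NS_LEVELS);
            if (n < 0)
                continue;
            /* The last field is the pid in the task's own namespace. */
            if (n > 0 && ids[n - 1] == inner_pid)
                found = (pid_t)ids[0];
            break;
        }
        fclose(f);
    }
    closedir(proc);
    return found;
}
```

Matching on the namespace inode rather than only on the innermost pid matters because sibling namespaces can each contain a process with the same local pid. The race is inherent: a task can exit, and its pid can be reused, between the stat() and the read of its status file.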
>>>>>>>
>>>>>>> As Nagarathnam indicated, Oracle Database will be using pid namespaces and needs a direct method of converting pids of processes in the pid namespace hierarchy. In this use case, multiple nested PID namespaces will be used. The currently available mechanisms are not very efficient for this use case. For example, as Konstantin described, using /proc//status would require the application to scan every pid's status file to determine the pid of a given process in a child namespace.
>>>>>>>
>>>>>>> Using an SCM_CREDENTIALS socket message is another way, but it would require every process starting inside a pid namespace to send this message, and the receiving process in the target namespace would have to save the converted pid and reference it. This mechanism becomes cumbersome, especially if the application has to deal with multiple nested pid namespaces. Also, the Database needs to be able to convert a thread's global pid (gettid()). Passing the thread's pid (gettid()) in an SCM_CREDENTIALS message requires CAP_SYS_ADMIN, which is an issue.
>>>>>>>
>>>>>>> So a direct method, like the API that Konstantin is proposing, will work best for the Database, since the pid of a process in any of the nested pid namespaces can be converted as and when required. I think with the proposed API, the application should be able to convert the pid of a process or the tid (gettid()) of a thread as well.
>>>>>>>
>>>>>> Can you explain what Oracle's database is planning to do with this information?
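The SCM_CREDENTIALS route described above relies on the kernel renumbering the pid in struct ucred into the receiver's pid namespace when the message is received (the receiver sees 0 if the sender is not visible there). A minimal sketch of both ends follows, assuming Linux and glibc (_GNU_SOURCE exposes struct ucred and SCM_CREDENTIALS); send_creds() and recv_sender_pid() are hypothetical helper names. The sender may only claim its own process ID (its thread-group ID); placing another thread's gettid() value in .pid is rejected with EPERM unless the sender has CAP_SYS_ADMIN, which is the limitation raised in the thread.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer large enough for one struct ucred, suitably aligned. */
union cred_ctrl {
    char buf[CMSG_SPACE(sizeof(struct ucred))];
    struct cmsghdr align;
};

/* Send a one-byte datagram carrying our own pid/uid/gid as SCM_CREDENTIALS. */
int send_creds(int fd)
{
    struct ucred cred = { .pid = getpid(), .uid = getuid(), .gid = getgid() };
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union cred_ctrl ctrl;
    memset(&ctrl, 0, sizeof ctrl);
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof ctrl.buf,
    };
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_CREDENTIALS;
    cm->cmsg_len = CMSG_LEN(sizeof(struct ucred));
    memcpy(CMSG_DATA(cm), &cred, sizeof cred);
    return sendmsg(fd, &msg, 0) == 1 ? 0 : -1;
}

/*
 * Receive one datagram on fd (SO_PASSCRED must already be enabled on it)
 * and return the sender's pid as numbered in the receiver's namespace,
 * or -1 if no credentials arrived.
 */
pid_t recv_sender_pid(int fd)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union cred_ctrl ctrl;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof ctrl.buf,
    };
    if (recvmsg(fd, &msg, 0) != 1)
        return -1;
    for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
        if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_CREDENTIALS) {
            struct ucred cred;
            memcpy(&cred, CMSG_DATA(cm), sizeof cred);
            return cred.pid;
        }
    }
    return -1;
}
```

For a single-process check, create a socketpair(AF_UNIX, SOCK_DGRAM, 0, sv), enable SO_PASSCRED on sv[1] with setsockopt(), call send_creds(sv[0]), and recv_sender_pid(sv[1]) then returns the caller's own pid. The per-process message exchange this requires is what makes the approach cumbersome for many nested namespaces.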
>>>>>
>>>>> The database uses the PID to programmatically find out whether the process/thread is alive (kill 0), to send signals to the processes requesting them to dump status/debug information, and to kill the processes in case of a shutdown abort of the instance.
>>>> What I'm wondering is: how does the caller of kill() end up controlling a task whose pid it doesn't know in its own namespace?
>>>
>>> I was describing in general how the DB would use the PID of a process. The above description was for the case when no namespaces are used.
>>>
>>> With namespaces, the DB would convert the PIDs of processes inside its child namespaces to PIDs in its own namespace and use those PIDs to issue kill().
>> Seems vaguely sensible.
>>
>> If I were designing this type of system, I'd have a manager process in each namespace running as PID 1, though -- PID 1 is special and needs to understand what's going on anyway. Then PID 1 would do the kill() calls and wouldn't need translate_pid().
>
> Yes, this has been tried out with the prototype use of PID namespaces in the DB. It works, but it would be slow, as the manager would have to exchange messages with the controlling processes, which would be in the parent namespace. The DB could use the API to convert the pid.
>
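The liveness check mentioned in the thread ("kill 0") is the standard POSIX idiom: signal 0 performs the existence and permission checks without delivering anything. A minimal C sketch follows; pid_is_alive() is a hypothetical helper name. The pid passed in must already be valid in the caller's own pid namespace, which is exactly the translation step under discussion; the sketch does no translation itself.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Returns 1 if pid names an existing process in the caller's pid
 * namespace, 0 if it does not, and -1 for invalid input or other errors.
 * An unreaped zombie still counts as existing.
 */
int pid_is_alive(pid_t pid)
{
    if (pid <= 0) {
        /* 0 and negative values address process groups (-1: every process). */
        errno = EINVAL;
        return -1;
    }
    if (kill(pid, 0) == 0)
        return 1;
    if (errno == EPERM)
        return 1;   /* exists, but we lack permission to signal it */
    if (errno == ESRCH)
        return 0;   /* no such process in our pid namespace */
    return -1;
}
```

Treating EPERM as "alive" matters when the database and the probed process run under different credentials; the same pid, once translated, would then be used for the debug-dump and shutdown signals.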