Subject: Re: [PATCH RFC v5] pidns: introduce syscall translate_pid
From: Nagarathnam Muthusamy
To: Konstantin Khlebnikov, "Eric W. Biederman"
Cc: linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, Jann Horn,
    Serge Hallyn, Oleg Nesterov, Andy Lutomirski, Prakash Sangappa,
    Andrew Morton
Date: Tue, 15 May 2018 10:19:03 -0700
In-Reply-To: <778ab3d0-b6bc-fdb5-669a-40222e5020d4@yandex-team.ru>
References: <152286911105.615669.14053871624892399807.stgit@buzz>
    <87h8oqhagl.fsf@xmission.com>
    <112c7cac-1982-3a2e-ffc0-878bc5ae4bb6@yandex-team.ru>
    <778ab3d0-b6bc-fdb5-669a-40222e5020d4@yandex-team.ru>
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/24/2018 10:36 PM, Konstantin Khlebnikov wrote:
> On 23.04.2018 20:37, Nagarathnam Muthusamy wrote:
>>
>>
>> On 04/05/2018 12:02 AM, Konstantin Khlebnikov wrote:
>>> On 05.04.2018 01:29, Eric W. Biederman wrote:
>>>> Nagarathnam Muthusamy writes:
>>>>
>>>>> On 04/04/2018 12:11 PM, Konstantin Khlebnikov wrote:
>>>>>> Each process have different pids, one for each pid namespace it
>>>>>> belongs.
>>>>>> When interaction happens within single pid-ns translation isn't
>>>>>> required.
>>>>>> More complicated scenarios needs special handling.
>>>>>>
>>>>>> For example:
>>>>>> - reading pid-files or logs written inside container with pid
>>>>>> namespace
>>>>>> - attaching with ptrace to tasks from different pid namespace
>>>>>> - passing pids across pid namespaces in any kind of API
>>>>>>
>>>>>> Currently there are several interfaces that could be used here:
>>>>>>
>>>>>> Pid namespaces are identified by inode number of /proc/[pid]/ns/pid.
>>>>
>>>> Using the inode number in interfaces is not an option. Especially not
>>>> withou referencing the device number for the filesystem as well.
>>>
>>> This is supposed to be single-instance fs,
>>> not part of proc but referenced but its magic "symlinks".
>>>
>>> Device numbers are not mentioned in "man namespaces".
>>>
>>>>
>>>>>> Pids for nested Pid namespaces are shown in file /proc/[pid]/status.
>>>>>> In some cases conversion pid -> vpid could be easily done using this
>>>>>> information, but backward translation requires scanning all tasks.
>>>>>>
>>>>>> Unix socket automatically translates pid attached to
>>>>>> SCM_CREDENTIALS.
>>>>>> This requires CAP_SYS_ADMIN for sending arbitrary pids and entering
>>>>>> into pid namespace, this expose process and could be insecure.
>>>>>>
>>>>>> This patch adds new syscall for converting pids between pid
>>>>>> namespaces:
>>>>>>
>>>>>> pid_t translate_pid(pid_t pid, int source_type, int source,
>>>>>>                                  int target_type, int target);
>>>>>>
>>>>>> @source_type and @target_type defines type of following arguments:
>>>>>>
>>>>>> TRANSLATE_PID_CURRENT_PIDNS  - current pid namespace, argument is
>>>>>> unused
>>>>>> TRANSLATE_PID_TASK_PIDNS     - task pid-ns, argument is task pid
>>>>>
>>>>> I believe using pid to represent the namespace has been already
>>>>> discussed in V1 of this patch in https://lkml.org/lkml/2015/9/22/1087
>>>>> after which we moved on to fd based version of this interface.
>>>>
>>>> Or in short why is the case of pids important?
>>>>
>>>> You Konstantin you almost said why they were important in your message
>>>> saying you were going to send this one.  However you don't explain in
>>>> your description why you want to identify pid namespaces by pid.
>>>>
>>>
>>> Open of /proc/[pid]/ns/pid requires same permissions as ptrace,
>>> pid based variant doesn't have such restrictions.
>>
>> Can you provide more information on usecase requiring PID translation
>> but not used for tracing related purposes?
>
> Any introspection for [nested] containers. It's easier to work when
> you have all information when you don't have any.
> For example our CMS https://github.com/yandex/porto allows to start
> nested sub-container (or even deeper) by request from any container
> and have to tell back which pid task is have. And it could translate
> any pid inside into accessible by client and vice versa.
>

I still don't get the exact reason why a PID-based approach to
identifying the namespace during pid translation is absolutely required,
compared to the fd-based approach.
From your version of TranslatePid in
https://github.com/yandex/porto/blob/0d7e6e7e1830dcd0038a057b2ab9964cec5b8fab/src/util/unix.cpp
I see that you are going through the trouble of forking a process and
sending SCM_CREDENTIALS for pid translation. Even your existing API could
be greatly simplified if the file-descriptor-based translate_pid makes it
through the gate, and I believe from the last discussion that it was
almost there:
https://patchwork.kernel.org/patch/10305439/

>> On a side note, can we have the types TRANSLATE_PID_CURRENT_PIDNS and
>> TRANSLATE_PID_FD_PIDNS integrated first and then possibly extend the
>> interface to include TRANSLATE_PID_TASK_PIDNS in future?
>
> I don't see reason for this separation.
> Pids and pid namespaces are part of the API for a long time.

If you are talking about the proposed translate_pid API, I believe the V4
posted at https://patchwork.kernel.org/patch/10003935/ had only an
fd-based API, before a mix of PID-based and fd-based was proposed in V5.
Again, I was just wondering whether we can get the fd-based approach in
first and then extend the API to include the PID-based approach later,
since the fd-based approach could provide a lot of immediate benefits.

Thanks,
Nagarathnam.

>
>>
>> Thanks,
>> Nagarathnam.
>>> Most pid-based syscalls are racy in some cases but they are
>>> here for decades and everybody knowns how to deal with it.
>>> So, I've decided to merge both worlds in one interface which clearly
>>> tells what to expect.
>>
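
For reference, the /proc/[pid]/status route mentioned in the quoted patch
description (pid -> vpid) can be sketched roughly as below. This is only
an illustration, assuming a kernel that exposes the NSpid field in
/proc/[pid]/status (Linux 4.1 or later). The helper names parse_nspid and
read_nspid are hypothetical and are not part of the patch or of any
library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Parse the "NSpid:" line from text in /proc/[pid]/status format.
 * Stores up to max pids in out[], in the order the kernel prints them:
 * the outermost visible pid namespace first, the task's own (innermost)
 * namespace last. Returns the number of pids stored, or -1 if the text
 * has no NSpid line. (Hypothetical helper, for illustration only.)
 */
int parse_nspid(const char *status, pid_t *out, size_t max)
{
    assert(status != NULL && out != NULL);

    const char *line = status;
    while (line != NULL && *line != '\0') {
        if (strncmp(line, "NSpid:", 6) == 0) {
            const char *p = line + 6;
            size_t n = 0;
            while (n < max) {
                while (*p == ' ' || *p == '\t')
                    p++;                /* skip field separators */
                if (*p == '\n' || *p == '\0')
                    break;              /* end of the NSpid line */
                char *end;
                long v = strtol(p, &end, 10);
                if (end == p)
                    break;              /* not a number, stop parsing */
                out[n++] = (pid_t)v;
                p = end;
            }
            return (int)n;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;                     /* step past the newline */
    }
    return -1;
}

/*
 * Read /proc/<pid>/status and extract the NSpid list for that task.
 * Returns the number of pids stored, or -1 on error.
 */
int read_nspid(pid_t pid, pid_t *out, size_t max)
{
    char path[64];
    char buf[8192];

    snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t len = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[len] = '\0';
    return parse_nspid(buf, out, max);
}
```

Calling read_nspid(pid, v, n) from an outer namespace fills v with the
task's pid at each nesting level, so the last entry is its pid inside its
own namespace. The reverse direction (vpid -> pid) has no such shortcut
and needs a scan over all tasks, as the patch description notes.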