Subject: Re: [PATCH RFC v5] pidns: introduce syscall translate_pid
From: Nagarathnam Muthusamy
To: Konstantin Khlebnikov, "Eric W. Biederman"
Cc: linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, Jann Horn, Serge Hallyn, Oleg Nesterov, Andy Lutomirski, Prakash Sangappa, Andrew Morton
Date: Thu, 31 May 2018 10:41:20 -0700
Message-ID: <3bdd6b27-0a46-5802-8671-07268cecc1c7@oracle.com>
In-Reply-To: <3e2c285a-1bf8-f71d-1b74-4d6465c29a54@yandex-team.ru>

On 05/15/2018 10:36 AM, Konstantin Khlebnikov wrote:
>
>
> On 15.05.2018 20:19, Nagarathnam Muthusamy wrote:
>>
>>
>> On 04/24/2018 10:36 PM, Konstantin Khlebnikov wrote:
>>> On 23.04.2018 20:37, Nagarathnam Muthusamy wrote:
>>>>
>>>>
>>>> On 04/05/2018 12:02 AM, Konstantin Khlebnikov wrote:
>>>>> On 05.04.2018 01:29, Eric W. Biederman wrote:
>>>>>> Nagarathnam Muthusamy writes:
>>>>>>
>>>>>>> On 04/04/2018 12:11 PM, Konstantin Khlebnikov wrote:
>>>>>>>> Each process has different pids, one for each pid namespace it
>>>>>>>> belongs to. When interaction happens within a single pid-ns,
>>>>>>>> translation isn't required. More complicated scenarios need
>>>>>>>> special handling.
>>>>>>>>
>>>>>>>> For example:
>>>>>>>> - reading pid-files or logs written inside a container with a pid
>>>>>>>>   namespace
>>>>>>>> - attaching with ptrace to tasks from a different pid namespace
>>>>>>>> - passing pids across pid namespaces in any kind of API
>>>>>>>>
>>>>>>>> Currently there are several interfaces that could be used here:
>>>>>>>>
>>>>>>>> Pid namespaces are identified by the inode number of
>>>>>>>> /proc/[pid]/ns/pid.
>>>>>>
>>>>>> Using the inode number in interfaces is not an option. Especially
>>>>>> not without referencing the device number for the filesystem as well.
>>>>>
>>>>> This is supposed to be a single-instance fs,
>>>>> not part of proc but referenced by its magic "symlinks".
>>>>>
>>>>> Device numbers are not mentioned in "man namespaces".
>>>>>
>>>>>>
>>>>>>>> Pids for nested pid namespaces are shown in the file
>>>>>>>> /proc/[pid]/status. In some cases conversion pid -> vpid could
>>>>>>>> easily be done using this information, but backward translation
>>>>>>>> requires scanning all tasks.
>>>>>>>>
>>>>>>>> Unix sockets automatically translate the pid attached to
>>>>>>>> SCM_CREDENTIALS.
>>>>>>>> This requires CAP_SYS_ADMIN for sending arbitrary pids and
>>>>>>>> entering into the pid namespace; this exposes the process and
>>>>>>>> could be insecure.
>>>>>>>>
>>>>>>>> This patch adds a new syscall for converting pids between pid
>>>>>>>> namespaces:
>>>>>>>>
>>>>>>>> pid_t translate_pid(pid_t pid, int source_type, int source,
>>>>>>>>                     int target_type, int target);
>>>>>>>>
>>>>>>>> @source_type and @target_type define the type of the following
>>>>>>>> arguments:
>>>>>>>>
>>>>>>>> TRANSLATE_PID_CURRENT_PIDNS  - current pid namespace, argument is unused
>>>>>>>> TRANSLATE_PID_TASK_PIDNS     - task pid-ns, argument is task pid
>>>>>>>
>>>>>>> I believe using a pid to represent the namespace has already been
>>>>>>> discussed in V1 of this patch in
>>>>>>> https://lkml.org/lkml/2015/9/22/1087
>>>>>>> after which we moved on to an fd-based version of this interface.
>>>>>>
>>>>>> Or, in short, why is the case of pids important?
>>>>>>
>>>>>> You, Konstantin, almost said why they were important in your message
>>>>>> saying you were going to send this one. However, you don't explain in
>>>>>> your description why you want to identify pid namespaces by pid.
>>>>>>
>>>>>
>>>>> Opening /proc/[pid]/ns/pid requires the same permissions as ptrace;
>>>>> the pid-based variant doesn't have such restrictions.
>>>>
>>>> Can you provide more information on use cases requiring PID
>>>> translation that are not tracing-related?
>>>
>>> Any introspection for [nested] containers. It's easier to work when
>>> you have all the information than when you don't have any.
>>> For example, our CMS https://github.com/yandex/porto allows starting a
>>> nested sub-container (or even deeper) on request from any container
>>> and has to report back which pid the task has. It can also translate
>>> any pid inside into one accessible by the client, and vice versa.
>>>
>>
>> I still don't get the exact reason why a PID-based approach to identify
>> the namespace during the pid translation process is absolutely required
>> compared to an fd-based approach.
>
> As I said, open(/proc/%d/ns/pid) has security restrictions: same
> uid/CAP_SYS_PTRACE/whatever.
> A pidns fd holds a reference to the pid namespace and, without
> restrictions, could be abused.
> The pid-based API is racy but always available without any restrictions.
>
>
>> From your version of TranslatePid in
>>
>> https://github.com/yandex/porto/blob/0d7e6e7e1830dcd0038a057b2ab9964cec5b8fab/src/util/unix.cpp
>>
>>
>> I see that you are going through the trouble of forking a process and
>> sending SCM_CREDENTIALS for pid translation. Even your existing API
>> could be extremely simplified if a translate_pid based on file
>> descriptors makes it in, and I believe from the last discussion it was
>> almost there:
>> https://patchwork.kernel.org/patch/10305439/
>>
>>
>>>> On a side note, can we have the types TRANSLATE_PID_CURRENT_PIDNS
>>>> and TRANSLATE_PID_FD_PIDNS integrated first and then possibly
>>>> extend the interface to include TRANSLATE_PID_TASK_PIDNS in future?
>>>
>>> I don't see a reason for this separation.
>>> Pids and pid namespaces have been part of the API for a long time.
>>
>> If you are talking about the proposed translate_pid API, I believe
>> the V4 proposed under https://patchwork.kernel.org/patch/10003935/
>> had only an fd-based API before a mix of PID- and fd-based was proposed
>> in V5. Again, I was just wondering if we can get the fd-based approach
>> in first and then extend the API to include the PID-based approach
>> later, as the fd-based approach could provide a lot of immediate
>> benefits?
>>
>> Thanks,
>> Nagarathnam.
>>>
>>>>
>>>> Thanks,
>>>> Nagarathnam.
>>>>> Most pid-based syscalls are racy in some cases, but they have been
>>>>> here for decades and everybody knows how to deal with it.
>>>>> So, I've decided to merge both worlds in one interface which
>>>>> clearly tells what to expect.
>>>>
>>

Ping? Any additional comments on this patch?

Thanks,
Nagarathnam.