Subject: Re: [PATCH RFC v5] pidns: introduce syscall translate_pid
To: Konstantin Khlebnikov, "Eric W. Biederman"
Cc: linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, Jann Horn, Serge Hallyn, Oleg Nesterov, Andy Lutomirski, Prakash Sangappa, Andrew Morton
References: <152286911105.615669.14053871624892399807.stgit@buzz> <87h8oqhagl.fsf@xmission.com> <112c7cac-1982-3a2e-ffc0-878bc5ae4bb6@yandex-team.ru> <778ab3d0-b6bc-fdb5-669a-40222e5020d4@yandex-team.ru> <3e2c285a-1bf8-f71d-1b74-4d6465c29a54@yandex-team.ru>
From: Nagarathnam Muthusamy
Message-ID: <4b21a648-0abc-e92d-ec37-681dff63bb55@oracle.com>
Date: Tue, 15 May 2018 10:40:19 -0700
In-Reply-To: <3e2c285a-1bf8-f71d-1b74-4d6465c29a54@yandex-team.ru>
X-Mailing-List:
linux-kernel@vger.kernel.org

On 05/15/2018 10:36 AM, Konstantin Khlebnikov wrote:
>
>
> On 15.05.2018 20:19, Nagarathnam Muthusamy wrote:
>>
>>
>> On 04/24/2018 10:36 PM, Konstantin Khlebnikov wrote:
>>> On 23.04.2018 20:37, Nagarathnam Muthusamy wrote:
>>>>
>>>>
>>>> On 04/05/2018 12:02 AM, Konstantin Khlebnikov wrote:
>>>>> On 05.04.2018 01:29, Eric W. Biederman wrote:
>>>>>> Nagarathnam Muthusamy writes:
>>>>>>
>>>>>>> On 04/04/2018 12:11 PM, Konstantin Khlebnikov wrote:
>>>>>>>> Each process has different pids, one for each pid namespace it
>>>>>>>> belongs to.
>>>>>>>> When interaction happens within a single pid-ns, translation isn't
>>>>>>>> required.
>>>>>>>> More complicated scenarios need special handling.
>>>>>>>>
>>>>>>>> For example:
>>>>>>>> - reading pid-files or logs written inside a container with a pid
>>>>>>>>   namespace
>>>>>>>> - attaching with ptrace to tasks from a different pid namespace
>>>>>>>> - passing pids across pid namespaces in any kind of API
>>>>>>>>
>>>>>>>> Currently there are several interfaces that could be used here:
>>>>>>>>
>>>>>>>> Pid namespaces are identified by the inode number of
>>>>>>>> /proc/[pid]/ns/pid.
>>>>>>
>>>>>> Using the inode number in interfaces is not an option. Especially not
>>>>>> without referencing the device number for the filesystem as well.
>>>>>
>>>>> This is supposed to be a single-instance fs,
>>>>> not part of proc but referenced by its magic "symlinks".
>>>>>
>>>>> Device numbers are not mentioned in "man namespaces".
>>>>>
>>>>>>
>>>>>>>> Pids for nested pid namespaces are shown in the file
>>>>>>>> /proc/[pid]/status.
>>>>>>>> In some cases conversion pid -> vpid could be easily done using this
>>>>>>>> information, but backward translation requires scanning all tasks.
>>>>>>>>
>>>>>>>> Unix socket automatically translates pid attached to
>>>>>>>> SCM_CREDENTIALS.
>>>>>>>> This requires CAP_SYS_ADMIN for sending arbitrary pids and entering
>>>>>>>> into the pid namespace; this exposes the process and could be insecure.
>>>>>>>>
>>>>>>>> This patch adds a new syscall for converting pids between pid
>>>>>>>> namespaces:
>>>>>>>>
>>>>>>>> pid_t translate_pid(pid_t pid, int source_type, int source,
>>>>>>>>                                  int target_type, int target);
>>>>>>>>
>>>>>>>> @source_type and @target_type define the type of the following arguments:
>>>>>>>>
>>>>>>>> TRANSLATE_PID_CURRENT_PIDNS  - current pid namespace, argument is unused
>>>>>>>> TRANSLATE_PID_TASK_PIDNS     - task pid-ns, argument is task pid
>>>>>>>
>>>>>>> I believe using a pid to represent the namespace has already been
>>>>>>> discussed in V1 of this patch in
>>>>>>> https://lkml.org/lkml/2015/9/22/1087
>>>>>>> after which we moved on to the fd based version of this interface.
>>>>>>
>>>>>> Or in short why is the case of pids important?
>>>>>>
>>>>>> Konstantin, you almost said why they were important in your message
>>>>>> saying you were going to send this one.  However you don't explain in
>>>>>> your description why you want to identify pid namespaces by pid.
>>>>>>
>>>>>
>>>>> Open of /proc/[pid]/ns/pid requires the same permissions as ptrace,
>>>>> the pid based variant doesn't have such restrictions.
>>>>
>>>> Can you provide more information on a use case requiring PID
>>>> translation but not used for tracing related purposes?
>>>
>>> Any introspection for [nested] containers. It's easier to work when
>>> you have all information than when you don't have any.
>>> For example our CMS https://github.com/yandex/porto allows starting a
>>> nested sub-container (or even deeper) by request from any container
>>> and has to tell back which pid the task has. And it could translate
>>> any pid inside into one accessible by the client and vice versa.
>>>
>>
>> I still don't get the exact reason why the PID based approach to identify
>> the namespace during the pid translation process is absolutely required
>> compared to the fd based approach.
>
> As I said, open(/proc/%d/ns/pid) has security restrictions - same
> uid/CAP_SYS_PTRACE/whatever
> Pidns-fd holds pid-namespace and without restrictions could be abused.
> Pid based API is racy but always available without any restrictions.

I get that the pid based API is available without any restrictions, but do
we have any existing use case which requires the pid based API and cannot
use the pidns-fd based API? Most of the use cases discussed in this thread
deal with introspection of a process by another process, and I believe
that the security requirement for opening /proc/%d/ns/pid applies to
all such use cases. In other words, why would a process which neither has
the same uid as the observed process nor has CAP_SYS_PTRACE be allowed to
translate a PID?

Thanks,
Nagarathnam.

>
>
>> From your version of TranslatePid in
>>
>> https://github.com/yandex/porto/blob/0d7e6e7e1830dcd0038a057b2ab9964cec5b8fab/src/util/unix.cpp
>>
>>
>> I see that you are going through the trouble of forking a process and
>> sending SCM_CREDENTIALS for pid translation. Even your existing API
>> could be extremely simplified if translate_pid based on file
>> descriptors makes it to the gate, and I believe from the last
>> discussion it was almost there
>> https://patchwork.kernel.org/patch/10305439/
>>
>>
>>>> On a side note, can we have the types TRANSLATE_PID_CURRENT_PIDNS
>>>> and TRANSLATE_PID_FD_PIDNS integrated first and then possibly
>>>> extend the interface to include TRANSLATE_PID_TASK_PIDNS in future?
>>>
>>> I don't see a reason for this separation.
>>> Pids and pid namespaces have been part of the API for a long time.
>>
>> If you are talking about the proposed translate_pid API, I believe
>> the V4 proposed under https://patchwork.kernel.org/patch/10003935/
>> had only an fd based API before a mix of PID and fd based was proposed in
>> V5. Again, I was just wondering if we can get the fd based approach
>> in first and then extend the API to include the PID based approach later,
>> as the fd based approach could provide a lot of immediate benefits?
>>
>> Thanks,
>> Nagarathnam.
>>>
>>>>
>>>> Thanks,
>>>> Nagarathnam.
>>>>> Most pid-based syscalls are racy in some cases but they have been
>>>>> here for decades and everybody knows how to deal with it.
>>>>> So, I've decided to merge both worlds in one interface which
>>>>> clearly tells what to expect.
>>>>
>>
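
For reference, the quoted patch description says that pids for nested pid
namespaces are shown in /proc/[pid]/status and that pid -> vpid conversion
can sometimes be done from that file. A minimal sketch of that forward
direction, assuming the NSpid field described in proc(5) is present and
that the caller can read the target's status file. The function names
parse_nspid and pid_to_vpid are hypothetical and not part of the patch or
of any existing API.

```python
def parse_nspid(status_text):
    """Return the pids on the NSpid line, outermost visible namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            # Fields after the label are whitespace-separated pids.
            return [int(field) for field in line.split()[1:]]
    return []  # NSpid line absent (assumption: kernel does not provide it)


def pid_to_vpid(pid):
    """Return the pid of `pid` in its innermost pid namespace, or None."""
    with open(f"/proc/{pid}/status") as status_file:
        pids = parse_nspid(status_file.read())
    # The last entry is the pid as seen inside the task's own namespace.
    return pids[-1] if pids else None
```

This only covers pid -> vpid. As the quoted text notes, going the other
way with this interface means scanning the status file of every task.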