Subject: Re: [RESEND RFC] translate_pid API
To: "Eric W. Biederman"
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
 khlebnikov@yandex-team.ru, prakash.sangappa@oracle.com, luto@kernel.org,
 akpm@linux-foundation.org, oleg@redhat.com, serge.hallyn@ubuntu.com,
 esyr@redhat.com, jannh@google.com
References: <1520875093-18174-1-git-send-email-nagarathnam.muthusamy@oracle.com>
 <87vadzqqq6.fsf@xmission.com>
 <990e88fa-ab50-9645-b031-14e1afbf7ccc@oracle.com>
 <877eqejowd.fsf@xmission.com>
From: Nagarathnam Muthusamy
Message-ID: <3a46a03d-e4dd-59b6-e25f-0020be1b1dc9@oracle.com>
Date: Tue, 20 Mar 2018 13:14:14 -0700
In-Reply-To: <877eqejowd.fsf@xmission.com>
X-Mailing-List: linux-kernel@vger.kernel.org

(Resending the reply as there was a reject due to
HTML in email)

On 03/14/2018 03:03 PM, ebiederm@xmission.com wrote:
> Nagarathnam Muthusamy writes:
>
>> On 03/13/2018 08:29 PM, ebiederm@xmission.com wrote:
>>> The cost of that ``cheaper'' u64 that is not in any namespace is that
>>> you now have to go and implement a namespace of namespaces. You haven't
>>> even attempted it. So just no. Anything that brings us to needing
>>> a namespace of namespaces is a bad design.
>> I am not trying to implement a namespace of namespaces.
> No, you are using a design that will require a namespace of namespaces
> to be implemented to support CRIU (checkpoint/restore in userspace).
>
> So when I see your patch I see a patch that only implements the easy
> half of the work that needs to be done.
>
>>>> The following patch uses a 64-bit ID for the namespace, exported by
>>>> procfs, for pid translation through a new file /proc//ns/pidns_id.
>>> And this design detail is what brings the automatic nack.
>>>
>>> Use file descriptors and it sounds like your use case justifies what you
>>> are trying to do.
>> File descriptors are problematic for the following reasons.
>> 1) I need to open a couple of file descriptors for every pid
>>    translation request.
> You can cache descriptors across requests. I suspect simply
> by tracking the origin of the shared memory segment you can figure
> out its pid namespace.
>
>> 2) In case of nested PID namespaces, say a new pid namespace is
>>    created at level 20. With a unique ID, I could just record this ID
>>    in shared memory for interested processes to use. In case of file
>>    descriptors, every level has to figure out the process ID of the
>>    newly created namespace's init process and open a file descriptor
>>    to track it.
> Toss in a bind mount of the file in some filesystem if that helps.
>
> But if I understand what you are talking about you are talking about
> having a shared memory segment shared between processes in different
> pid namespaces.
>
> In that shared memory segment for processes in different namespaces
> you are talking about having the conversation structured as having
> information structured as pid-namespace pid.
>
> And crucially you want anyone in any pid namespace to be able to read
> that shared memory segment and to make sense of what is going on,
> by just reading the pid namespace id.

This captures the use case. Adding to that, every level is made up of a
combination of user, pid and mount namespaces.

>
> Namespaces are all about making identifiers relative to their namespace.
>
> The only way I can see you gain an advantage with your shared memory
> design is by making identifiers that are not relative to their pid
> namespace. As such identifiers will completely defeat the ability
> to implement CRIU support.
>
> The closest I have to such identifiers today are bind mounts of the
> namespace files. So if you also have a common mount namespace you could
> use that.

We don't have a common mount namespace. Each nested level will have a new
mount namespace. When a new nested level (user + pid + mnt) is created, the
init process of the new level cannot bind mount the namespace directory, as
the effects won't be visible to the other levels. On the other hand, the new
init process could send an SCM_CREDENTIALS message to a centralized listener
running outside of the whole setup which does only bind mounts. Here, we
have a single point of failure for the whole system, and this listener has
to run as root to be able to do bind mounts. Apart from these, I am not able
to see the bind mount made by the listener being propagated to child
namespaces in my setup. Not sure if I am missing anything or if this is the
expected behavior.

Is it possible to have the application provide the ID to be associated with
the namespace? During dump, we can save the ID, and during restore, we can
assign the ID using the same API. There is a possibility of collision during
restore. Is it OK to fail the restore in such a scenario?
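
To make the dump/restore question above concrete, here is a minimal
userspace sketch of the semantics being asked about. It is only a model:
NamespaceIdRegistry, NamespaceIdCollision and the string namespace labels
are hypothetical names for illustration, not an existing kernel or CRIU
interface.

```python
class NamespaceIdCollision(Exception):
    """Raised when a requested namespace ID is already in use."""


class NamespaceIdRegistry:
    """Hypothetical model of application-assigned namespace IDs."""

    def __init__(self):
        # Maps an application-chosen ID to a label for the namespace it names.
        self._owner_by_id = {}

    def assign(self, ns_id, owner):
        # An ID may name only one namespace at a time.
        if ns_id in self._owner_by_id:
            raise NamespaceIdCollision(ns_id)
        self._owner_by_id[ns_id] = owner

    def dump(self):
        # Checkpoint side: save a copy of the current ID assignments.
        return dict(self._owner_by_id)

    def restore(self, saved):
        # Restore side: check every ID first, so that a collision fails the
        # whole restore and leaves the registry untouched.
        for ns_id in saved:
            if ns_id in self._owner_by_id:
                raise NamespaceIdCollision(ns_id)
        self._owner_by_id.update(saved)
```

With this model, restoring onto a system where one of the saved IDs is
already taken raises NamespaceIdCollision and changes nothing, which is the
"fail the restore" behavior asked about.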

Thanks,
Nagarathnam.

>
> In theory a name in some other namespace is possible. However anyone in
> a container will only be able to see the names in their container or in
> nested sub containers. Which is what you have already with pids. So I
> don't think that will help.
>
> Eric
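
For reference, the suggestion earlier in the thread to cache descriptors
across requests could look roughly like the sketch below. It assumes the
usual rule from namespaces(7) that two namespace files refer to the same
namespace when their device and inode numbers match. PidNsFdCache and its
proc_root parameter are hypothetical names; the sketch does not perform pid
translation and does not guard against pid reuse.

```python
import os


class PidNsFdCache:
    """Keep one open descriptor per pid for <proc_root>/<pid>/ns/pid."""

    def __init__(self, proc_root="/proc"):
        # proc_root is an assumption that lets the path be overridden.
        self._proc_root = proc_root
        self._fds = {}

    def get(self, pid):
        # Open the namespace file only on the first request for this pid.
        # Note: a recycled pid would still hit the old cache entry.
        fd = self._fds.get(pid)
        if fd is None:
            path = os.path.join(self._proc_root, str(pid), "ns", "pid")
            fd = os.open(path, os.O_RDONLY)
            self._fds[pid] = fd
        return fd

    def same_namespace(self, pid_a, pid_b):
        # Same device and inode number means the same namespace object.
        a = os.fstat(self.get(pid_a))
        b = os.fstat(self.get(pid_b))
        return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

    def close(self):
        # Release every cached descriptor.
        for fd in self._fds.values():
            os.close(fd)
        self._fds.clear()
```

Opening the namespace file of another process is subject to the usual
ptrace access checks, so a real user of this would need suitable privileges.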