From: "Eric W. Biederman"
To: yunhui cui
Cc: akpm@linux-foundation.org, keescook@chromium.org, brauner@kernel.org,
    jeffxu@google.com, frederic@kernel.org, mcgrof@kernel.org,
    cyphar@cyphar.com, rongtao@cestc.cn, linux-kernel@vger.kernel.org,
    Linux Containers
Subject: Re: [External] Re: [PATCH] pid_ns: support pidns switching between sibling
Date: Fri, 13 Oct 2023 08:03:27 -0500
Message-ID: <87r0lyad40.fsf@email.froward.int.ebiederm.org>
References: <20231011065446.53034-1-cuiyunhui@bytedance.com> <87sf6gcyb3.fsf@email.froward.int.ebiederm.org>
In-Reply-To: (yunhui cui's message of "Fri, 13 Oct 2023 10:44:45 +0800")
X-Mailing-List: linux-kernel@vger.kernel.org

yunhui cui writes:

> Hi Eric,
>
> On Thu, Oct 12, 2023 at 11:31 AM Eric W. Biederman wrote:
>>
>> The check you are deleting is what verifies the pid namespaces you are
>> attempting to change pid_ns_for_children to is a member of the tasks
>> current pid namespace (aka task_active_pid_ns).
>>
>>
>> There is a perfectly good comment describing why what you are attempting
>> to do is unsupportable.
>>
>>         /*
>>          * Only allow entering the current active pid namespace
>>          * or a child of the current active pid namespace.
>>          *
>>          * This is required for fork to return a usable pid value and
>>          * this maintains the property that processes and their
>>          * children can not escape their current pid namespace.
>>          */
>>
>>
>> If you pick a pid namespace that does not meet the restrictions you are
>> removing the pid of the new child can not be mapped into the pid
>> namespace of the parent that called setns.
>>
>> AKA the following code can not work.
>>
>>         pid = fork();
>>         if (!pid) {
>>                 /* child */
>>                 do_something();
>>                 _exit(0);
>>         }
>>         waitpid(pid);
>
> Sorry, I don't understand what you mean here.

What I mean is that if your simple patch were adopted, then the classic
way of controlling a fork would fail.

        pid = fork();
        ^--------------- Would return 0 for both parent and child
        ^--------------- Look at pid_nr_ns to understand.
        if (!pid) {
                /* child */
                do_something();
                _exit(0);
        }
        waitpid(pid, NULL, 0);

For your use case there are more serious problems as well.
The entire process hierarchy that gets built would be incorrect, which
means that children signaling their parents when they exit would be
incorrect, and parents would not be able to wait on their children.

I do understand the desire to COW the memory space of all of the
processes. That can potentially save a lot of resources. In other
checkpoint/restart scenarios people have been using userfaultfd to get a
similar benefit. I suggest you look at the CRIU project.

Eric