From: Giuseppe Scrivano
To: Christian Brauner
Cc: ebiederm@xmission.com, "Serge E. Hallyn", linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] userns: automatically split user namespace extent
References: <20201203150252.1229077-1-gscrivan@redhat.com> <20210510172351.GA19918@mail.hallyn.com> <20210510185715.GA20897@mail.hallyn.com> <87h7idbskw.fsf@redhat.com> <20210605130016.jdkkviwtuefocset@wittgenstein>
Date: Mon, 07 Jun 2021 11:31:37 +0200
In-Reply-To: <20210605130016.jdkkviwtuefocset@wittgenstein> (Christian Brauner's message of "Sat, 5 Jun 2021 15:00:16 +0200")
Message-ID: <874keaaume.fsf@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Christian Brauner writes:

> On Fri, Jun 04, 2021 at 04:41:19PM +0200, Giuseppe Scrivano wrote:
>> Christian, Eric,
>>
>> are you fine with this patch or is there anything more you'd like me to
>> change?
>
> Before being a little bit of a party pooper thanks for your patches! I
> appreciate the work you're doing!
>
> So my concern is that this may cause silent regressions/security issues
> for tools in userspace by making this work automagically.
>
> For example we have a go library that calculates idmap ranges and
> extents. Those idmappings are stored in the database and in the
> container's config and for backups and so on.
>
> The calculated extents match exactly with how these lines look in
> /proc//*id_map.
> If we miscalculate the extents and we try to write them to
> /proc//*id_map we get told to go away and immediately recognize the
> bug.
> With this patch however we may succeed and then we record misleading
> extents in the db or the config.
>
> Turning this into a general concern, I think it is a non-trivial
> semantic change to break up the 1:1 correspondence between mappings
> written and mappings applied that we had for such a long time now.
>
> In general I'm not sure it should be the kernel that has the idmapping
> ranges smarts.
>
> I'd rather see a generic userspace library that container runtimes make
> use of that also breaks up idmapping ranges. We can certainly accommodate
> this in
> https://pkg.go.dev/github.com/lxc/lxd/shared/idmap
>
> Is that a reasonable concern?

I've ended up adding similar logic to Podman for the same reason as
above.

In our use case, containers are created within a user namespace that
usually has two extents: the current unprivileged ID mapped to root, and
the additional IDs allocated to the user through /etc/sub?id, mapped
starting at 1.

Within this user namespace, other user namespaces can be created and we
let users specify the mappings. It is a common mistake to specify a
mapping that overlaps multiple extents in the parent userns, e.g.
0:0:IDS_AVAILABLE.

To avoid the problem we have to first parse /proc/self/?id_map and then
split the specified extents when they overlap.

In our case this is not an issue anymore; moving the logic to the kernel
would just avoid an open syscall.

IMHO the 1:1 mapping is just an implementation detail that is not
obvious to users. Having the split in the kernel would also avoid
having to add this same check to every container runtime that uses
nested user namespaces.

Thanks,
Giuseppe
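
For illustration only, the userspace split described above (parse the parent's id_map, then cut a requested extent wherever it crosses a parent extent boundary) could be sketched roughly as follows. This is not the Podman code nor the kernel patch; the names `Extent`, `parse_id_map` and `split_extent` are hypothetical, and the parent map in the usage note is an assumed example of a rootless setup (UID 1000 mapped to root, 65536 subordinate IDs starting at 100000).

```python
from typing import List, NamedTuple


class Extent(NamedTuple):
    inside: int   # first ID as seen inside the namespace
    outside: int  # first ID as seen from the parent namespace
    count: int    # number of consecutive IDs


def parse_id_map(text: str) -> List[Extent]:
    """Parse the text of a uid_map/gid_map file: one 'inside outside count' per line."""
    return [Extent(*map(int, line.split()))
            for line in text.splitlines() if line.strip()]


def split_extent(requested: Extent, parent: List[Extent]) -> List[Extent]:
    """Split `requested` so that each piece lies within a single parent extent.

    The `outside` IDs of the requested extent are IDs of the namespace we are
    currently in, so they are compared against the `inside` ranges of that
    namespace's own map (assumed to be non-overlapping).
    """
    req_start = requested.outside
    req_end = requested.outside + requested.count
    pieces = []
    for p in sorted(parent, key=lambda e: e.inside):
        # Intersection of the requested range with this parent extent.
        start = max(req_start, p.inside)
        end = min(req_end, p.inside + p.count)
        if start < end:
            pieces.append(Extent(requested.inside + (start - req_start),
                                 start, end - start))
    # Any ID left uncovered is not mapped in the current namespace at all.
    if sum(e.count for e in pieces) != requested.count:
        raise ValueError("requested extent uses IDs not mapped in the parent namespace")
    return pieces
```

With an assumed parent map of "0 1000 1" and "1 100000 65536", a request of 0:0:65537 would come back as two extents, 0:0:1 and 1:1:65536, which are the lines a runtime would then write to the child's id_map and record in its config.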