Date: Wed, 17 Apr 2024 16:02:14 +0300 (EEST)
From: Julian Anastasov
To: Alexander Mikhalitsyn
cc: horms@verge.net.au, netdev@vger.kernel.org, lvs-devel@vger.kernel.org,
    netfilter-devel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Stéphane Graber, Christian Brauner, Pablo Neira Ayuso,
    Jozsef Kadlecsik, Florian Westphal
Subject: Re: [PATCH net-next] ipvs: allow some sysctls in non-init user namespaces
In-Reply-To: <20240416144814.173185-1-aleksandr.mikhalitsyn@canonical.com>
Message-ID: <32f56a2e-8142-4391-916a-65fe51a57933@ssi.bg>
References: <20240416144814.173185-1-aleksandr.mikhalitsyn@canonical.com>

	Hello,

On Tue, 16 Apr 2024, Alexander Mikhalitsyn wrote:

> Let's make all IPVS sysctls visible and RO even when
> network namespace is owned by non-initial user namespace.
>
> Let's make a few sysctls to be writable:
> - conntrack
> - conn_reuse_mode
> - expire_nodest_conn
> - expire_quiescent_template
>
> I'm trying to be conservative with this to prevent
> introducing any security issues in there. Maybe,
> we can allow more sysctls to be writable, but let's
> do this on-demand and when we see real use-case.
>
> This list of sysctls was chosen because I can't
> see any security risks allowing them and also
> Kubernetes uses [2] these specific sysctls.
>
> This patch is motivated by user request in the LXC
> project [1].
>
> [1] https://github.com/lxc/lxc/issues/4278
> [2] https://github.com/kubernetes/kubernetes/blob/b722d017a34b300a2284b890448e5a605f21d01e/pkg/proxy/ipvs/proxier.go#L103
>
> Cc: Stéphane Graber
> Cc: Christian Brauner
> Cc: Julian Anastasov
> Cc: Simon Horman
> Cc: Pablo Neira Ayuso
> Cc: Jozsef Kadlecsik
> Cc: Florian Westphal
> Signed-off-by: Alexander Mikhalitsyn
> ---
>  net/netfilter/ipvs/ip_vs_ctl.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index 143a341bbc0a..92a818c2f783 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -4285,10 +4285,22 @@ static int __net_init ip_vs_control_net_init_sysctl(struct netns_ipvs *ipvs)

	As the list of privileged vars is short I prefer
to use a bool and to make only some vars read-only:

	bool unpriv = false;

>  		if (tbl == NULL)
>  			return -ENOMEM;
>
> -		/* Don't export sysctls to unprivileged users */
> +		/* Let's show all sysctls in non-init user namespace-owned
> +		 * net namespaces, but make them read-only.
> +		 *
> +		 * Allow only a few specific sysctls to be writable.
> +		 */
>  		if (net->user_ns != &init_user_ns) {

	Here we should just set:

	unpriv = true;

> -			tbl[0].procname = NULL;
> -			ctl_table_size = 0;
> +			for (idx = 0; idx < ARRAY_SIZE(vs_vars); idx++) {
> +				if (!tbl[idx].procname)
> +					continue;
> +
> +				if (!((strcmp(tbl[idx].procname, "conntrack") == 0) ||
> +				      (strcmp(tbl[idx].procname, "conn_reuse_mode") == 0) ||
> +				      (strcmp(tbl[idx].procname, "expire_nodest_conn") == 0) ||
> +				      (strcmp(tbl[idx].procname, "expire_quiescent_template") == 0)))
> +					tbl[idx].mode = 0444;
> +			}
>  		}
>  	} else
>  		tbl = vs_vars;

	And below at every place to use:

	if (unpriv)
		tbl[idx].mode = 0444;

	for the following 4 privileged sysctl vars:

- sync_qlen_max:
	- allocates messages in kernel context
	- this needs better tuning in another patch

- sync_sock_size:
	- allocates messages in kernel context

- run_estimation:
	- for now, better init ns to decide if to use est stats

- est_nice:
	- for now, better init ns to decide the value

- debug_level:
	- already set to 0444

	I.e. these vars allocate resources (mem, CPU) without
proper control, so for now we will just copy them from init ns
without allowing writing. And they are vars that are not tuned
often. Also we do not know which netns is supposed to be the
privileged one, some solutions move all devices out of init_net,
so we can not decide where to use lower limits.

	OTOH, "amemthresh" is not privileged but needs a single
READ_ONCE for sysctl_amemthresh in update_defense_level() due to
the possible div by zero if we allow writing to anyone, e.g.:

	int amemthresh = max(READ_ONCE(ipvs->sysctl_amemthresh), 0);
	...
	nomem = availmem < amemthresh;
	... use only amemthresh

	All other vars can be writable.

Regards

--
Julian Anastasov
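
	For illustration only, the two suggestions in this review (a
single "unpriv" flag that makes only the privileged vars read-only, and
a one-time snapshot of amemthresh before it is compared and used as a
divisor) might look roughly like the userspace sketch below. Everything
named demo_* or DEMO_* is a hypothetical stand-in and not a kernel API:
struct demo_ctl is a reduced substitute for struct ctl_table, the
three-entry table and the 0644 default do not reflect the real vs_vars
layout, DEMO_READ_ONCE only approximates READ_ONCE with a volatile read,
and the division is an assumed model of the computation that could hit
a zero divisor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the kernel's struct ctl_table. */
struct demo_ctl {
	const char *procname;
	unsigned int mode;
};

/* Userspace approximation of READ_ONCE() for an int (assumption). */
#define DEMO_READ_ONCE(x) (*(const volatile int *)&(x))

/*
 * Fill a three-entry demo table the way the review suggests: every
 * entry stays visible, and only the privileged ones are switched to
 * 0444 when the netns is owned by a non-init user namespace.
 */
void demo_init_table(struct demo_ctl tbl[3], bool unpriv)
{
	size_t idx = 0;

	assert(tbl != NULL);

	/* Not privileged: stays writable in every netns. */
	tbl[idx].procname = "conntrack";
	tbl[idx++].mode = 0644;

	/* Privileged: read-only for an unprivileged netns owner. */
	tbl[idx].procname = "sync_qlen_max";
	tbl[idx].mode = 0644;
	if (unpriv)
		tbl[idx].mode = 0444;
	idx++;

	/* Privileged: same treatment. */
	tbl[idx].procname = "est_nice";
	tbl[idx].mode = 0644;
	if (unpriv)
		tbl[idx].mode = 0444;
	idx++;
}

/*
 * Read the threshold once, clamp it to >= 0, and use only the local
 * copy afterwards. Because the comparison and the division see the
 * same value, the divisor is strictly positive whenever we divide.
 */
int demo_drop_rate(const int *sysctl_amemthresh, int availmem)
{
	int amemthresh = DEMO_READ_ONCE(*sysctl_amemthresh);

	if (amemthresh < 0)
		amemthresh = 0;
	if (availmem >= amemthresh)	/* no memory pressure */
		return 0;
	return amemthresh / (amemthresh - availmem);
}
```

	The point of the second function is that a concurrent writer
can change the sysctl between the check and the division; reading it
twice could pair a passing check with a divisor of zero, while the
local copy cannot.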