From: Jann Horn
Date: Tue, 9 Apr 2019 22:41:44 +0200
Subject: Re: [PATCH v3 bpf-next 00/21] bpf: Sysctl hook
To: Andrey Ignatov
Cc: Network Development, Alexei Starovoitov, Daniel Borkmann, guro@fb.com, kernel-team@fb.com, Luis Chamberlain, Kees Cook, Alexey Dobriyan, kernel list, linux-fsdevel, linux-security-module
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Apr 9, 2019 at 10:26 PM Andrey Ignatov wrote:
> The patch set introduces new BPF hook for sysctl.
>
> It adds new program type BPF_PROG_TYPE_CGROUP_SYSCTL and attach type
> BPF_CGROUP_SYSCTL.
>
> BPF_CGROUP_SYSCTL hook is placed before calling to sysctl's proc_handler so
> that accesses (read/write) to sysctl can be controlled for specific cgroup
> and either allowed or denied, or traced.

Don't look at the credentials of "current" in a read or write handler.
Consider what happens if, for example, someone inside a cgroup opens a
sysctl file and passes the file descriptor to another process outside
the cgroup over a unix domain socket, and that other process then
writes to it. Either do your access check on open, or use the
credentials that were saved during open() in the read/write handler.

> The hook has access to sysctl name, current sysctl value and (on write
> only) to new sysctl value via corresponding helpers.
> New sysctl value can be overridden by program. Both name and values
> (current/new) are represented as strings same way they're visible in
> /proc/sys/. It is up to program to parse these strings.

But even if a filter is installed that prevents all access to a
sysctl, you can still read it by installing your own filter that, when
a read is attempted the next time, dumps the value into a map or
something like that, right?

> To help with parsing the most common kind of sysctl value, vector of
> integers, two new helpers are provided: bpf_strtol and bpf_strtoul with
> semantic similar to user space strtol(3) and strtoul(3).
>
> The hook also provides bpf_sysctl context with two fields:
> * @write indicates whether sysctl is being read (= 0) or written (= 1);
> * @file_pos is sysctl file position to read from or write to, can be
>   overridden.
>
> The hook allows to make better isolation for containerized applications
> that are run as root so that one container can't change a sysctl and affect
> all other containers on a host, make changes to allowed sysctl in a safer
> way and simplify sysctl tracing for cgroups.

Why can't you use a user namespace and isolate things properly that
way? That would be much cleaner, wouldn't it?
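
To make the file-descriptor-passing concern from earlier in this reply concrete, here is a minimal userspace sketch in Python (3.9+, Linux). It is only an illustration under stated assumptions: the function name write_via_passed_fd is made up, an ordinary temporary file stands in for a /proc/sys entry, and no cgroups or BPF programs are involved. It shows only that the process calling write() need not be the process that called open().

```python
import os
import socket
import tempfile


def write_via_passed_fd(path: str, payload: bytes) -> int:
    """Open `path` in a child process, pass the fd to the parent, write there.

    Hypothetical helper for illustration. Returns the number of bytes written
    by the receiving process, which never opened the file itself.
    """
    parent_sock, child_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    pid = os.fork()
    if pid == 0:
        # Child: stands in for the process "inside the cgroup" that opens the file.
        status = 1
        try:
            parent_sock.close()
            fd = os.open(path, os.O_WRONLY)
            # send_fds() wraps sendmsg() with an SCM_RIGHTS control message.
            socket.send_fds(child_sock, [b"x"], [fd])
            status = 0
        finally:
            os._exit(status)

    # Parent: stands in for the process "outside the cgroup" that receives the fd.
    child_sock.close()
    _msg, fds, _flags, _addr = socket.recv_fds(parent_sock, 1, 1)
    os.waitpid(pid, 0)
    parent_sock.close()
    if not fds:
        raise RuntimeError("no file descriptor received from the child")
    try:
        # At this point "current" is the parent, not the opener.
        return os.write(fds[0], payload)
    finally:
        os.close(fds[0])


if __name__ == "__main__":
    # A temporary file is used as a stand-in for a sysctl file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        demo_path = tmp.name
    try:
        written = write_via_passed_fd(demo_path, b"1\n")
        print("bytes written by the receiving process:", written)
    finally:
        os.unlink(demo_path)
```

A check keyed on the task performing the write would evaluate the parent here, while a check done at open(), or one using the credentials saved at open(), would evaluate the child.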