Date: Tue, 6 Jun 2023 22:37:52 +0900
From: Masami Hiramatsu (Google)
To: Beau Belgrave
Cc: Christian Brauner, Alexei Starovoitov, Steven Rostedt, Masami Hiramatsu, LKML, linux-trace-kernel@vger.kernel.org, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf, David Vernet, Linus Torvalds, Dave Thaler, Christoph Hellwig
Subject: Re: [PATCH] tracing/user_events: Run BPF program if attached
Message-Id: <20230606223752.65dd725c04b11346b45e0546@kernel.org>
In-Reply-To: <20230601162921.GA152@W11-BEAU-MD.localdomain>
References: <20230509163050.127d5123@rorschach.local.home>
 <20230515165707.hv65ekwp2djkjj5i@MacBook-Pro-8.local>
 <20230515192407.GA85@W11-BEAU-MD.localdomain>
 <20230517003628.aqqlvmzffj7fzzoj@MacBook-Pro-8.local>
 <20230516212658.2f5cc2c6@gandalf.local.home>
 <20230517165028.GA71@W11-BEAU-MD.localdomain>
 <20230601-urenkel-holzofen-cd9403b9cadd@brauner>
 <20230601152414.GA71@W11-BEAU-MD.localdomain>
 <20230601-legten-festplatten-fe053c6f16a4@brauner>
 <20230601162921.GA152@W11-BEAU-MD.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Beau,

On Thu, 1 Jun 2023 09:29:21 -0700
Beau Belgrave wrote:

> > > These are stubs to integrate namespace support. I've been working on a
> > > series that adds a tracing namespace support similar to the IMA
> > > namespace work [1].
> > > That series is ending up taking more time than I
> >
> > Look, this is all well and nice but you've integrated user events with
> > tracefs. This is currently a single-instance global filesystem. So what
> > you're effectively implying is that you're namespacing tracefs by
> > hanging it off of struct user namespace making it mountable by
> > unprivileged users. Or what's the plan?
> >
>
> We don't have plans for unprivileged users currently. I think that is a
> great goal and requires a proper tracing namespace, which we currently
> don't have. I've done some thinking on this, but I would like to hear
> your thoughts and others on how to do this properly. We do talk about
> this in the tracefs meetings (those might be out of your time zone
> unfortunately).
>
> > That alone is massive work with _wild_ security implications. My
> > appetite for exposing more stuff under user namespaces is very low given
> > the amount of CVEs we've had over the years.
> >
>
> Ok, I based that approach on the feedback given in LPC 2022 - Containers
> and Checkpoint/Restore MC [1]. I believe you gave feedback to use user
> namespaces to provide the encapsulation that was required :)

Even with the user namespace, I think we still need to provide a separate
"eventname-space" for each application, since it may depend on the context:
who launched it and where. I think the easiest solution is (perhaps)
providing a new PID-based group for each instance (the PID prefix or suffix
will be hidden from the application).

I think it may not be good to allow unprivileged user processes to detect
each other's registered event names by default.

>
> > > anticipated.
> >
> > Yet you were confident enough to leave the namespacing stubs for this
> > functionality in the code. ;)
> >
> > What is the overall goal here? Letting arbitrary unprivileged containers
> > define their own custom user event type by mounting tracefs inside
> > unprivileged containers?
> > If so, what security story is going to
> > guarantee that writing arbitrary tracepoints from random unprivileged
> > containers is safe?
> >
>
> Unprivileged containers is not a goal, however, having a per-pod
> user_event system name, such as user_event_, would be ideal
> for certain diagnostic scenarios, such as monitoring the entire pod.

That can be done in the user-space tools, not in the kernel.

> When you have a lot of containers, you also want to limit how many
> tracepoints each container can create, even if they are given access to
> the tracefs file. The per-group can limit how many events/tracepoints
> that container can go create, since we currently only have 16-bit
> identifiers for trace_event's we need to be cautious we don't run out.

I agree, we need to have a knob to limit it to avoid DoS attacks.

> user_events in general has tracepoint validators to ensure the payloads
> coming in are "safe" from what the kernel might do with them, such as
> filtering out data.

[...]

> > > changing the system name of user_events on a per-namespace basis.
> >
> > What is the "system name" and how does it protect against namespaces
> > messing with each other?
>
> trace_events in the tracing facility require both a system name and an
> event name. IE: sched/sched_waking, sched is the system name,
> sched_waking is the event name. For user_events in the root group, the
> system name is "user_events". When groups are introduced, the system
> name can be "user_events_" for example.

So my suggestion is to use the PID in the root PID namespace instead of
a GUID by default.

Thank you,

--
Masami Hiramatsu (Google)
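
For illustration only, the naming and limit ideas discussed in this thread can be modelled in a few lines of Python. This is a rough sketch under stated assumptions, not kernel code and not the actual user_events implementation: the names group_system_name and EventGroup are hypothetical, the "user_events_<PID>" format is just one reading of the PID-based suggestion, and max_events stands in for whatever limit knob might eventually exist. Only the "user_events" root system name, the "system/event" naming, and the 16-bit identifier space come from the discussion itself.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

ROOT_SYSTEM_NAME = "user_events"  # system name of the root group (from the thread)
TRACE_EVENT_ID_SPACE = 1 << 16    # trace_event identifiers are 16 bits wide (from the thread)


def group_system_name(root_pid: Optional[int] = None) -> str:
    """Return the system name for a group.

    Hypothetical scheme: the root group keeps "user_events"; any other
    group appends the owner's PID as seen from the root PID namespace.
    """
    if root_pid is None:
        return ROOT_SYSTEM_NAME
    if root_pid <= 0:
        raise ValueError("PID must be positive")
    return f"{ROOT_SYSTEM_NAME}_{root_pid}"


@dataclass
class EventGroup:
    """Hypothetical per-group registry with a configurable event limit."""

    system_name: str
    max_events: int
    events: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # A group can never be allowed more events than the global ID space.
        if not 0 < self.max_events <= TRACE_EVENT_ID_SPACE:
            raise ValueError("max_events must be in 1..65536")

    def register(self, event_name: str) -> str:
        """Register an event and return its full "system/event" name."""
        if event_name not in self.events:
            if len(self.events) >= self.max_events:
                raise RuntimeError(f"{self.system_name}: event limit reached")
            self.events.add(event_name)
        return f"{self.system_name}/{event_name}"


if __name__ == "__main__":
    group = EventGroup(group_system_name(4242), max_events=2)
    print(group.register("request_start"))
    print(group.register("request_end"))
    try:
        group.register("one_too_many")
    except RuntimeError as err:
        print("rejected:", err)
```

Re-registering an existing name is treated as a lookup here so that it does not count against the limit; whether a real implementation should behave that way is an open design choice.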