References: <20210413155337.644993-1-namhyung@kernel.org>
            <20210413155337.644993-2-namhyung@kernel.org>
From: Namhyung Kim <namhyung@kernel.org>
Date: Mon, 3 May 2021 14:53:27 -0700
Subject: Re: [PATCH v3 1/2] perf/core: Share an event with multiple cgroups
To: Peter Zijlstra
Cc: Stephane Eranian, Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa,
    Mark Rutland, Alexander Shishkin, LKML, Andi Kleen, Ian Rogers,
    Song Liu, Tejun Heo, kernel test robot, Thomas Gleixner
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Peter,

On Wed, Apr 21, 2021 at 12:37 PM Namhyung Kim wrote:
>
> On Tue, Apr 20, 2021 at 8:29 PM Peter Zijlstra wrote:
> >
> > On Tue, Apr 20, 2021 at 01:34:40AM -0700, Stephane Eranian wrote:
> > > This does not scale for us:
> > > - run against the fd limit, but also memory consumption in the
> > >   kernel per struct file, struct inode, struct perf_event ....
> > > - number of events per-cpu is still also large
> > > - require event scheduling on cgroup switches, even with RB-tree
> > >   improvements, still heavy
> > > - require event scheduling even if measuring the same events across
> > >   all cgroups
> > >
> > > One factor in that equation above needs to disappear. The one counter
> > > per file descriptor is respected with Namhyung's patch because he is
> > > operating a plain per-cpu mode. What changes is just how and where the
> > > count is accumulated in perf_events. The resulting programming on the
> > > hardware is the same as before.
> >
> > Yes, you're aggregating differently. And that's exactly the problem. The
> > aggregation is a variable one with fairly poor semantics. Suppose you
> > create a new cgroup, then you have to tear down and recreate the whole
> > thing, which is pretty crap.
>
> Yep, but I think cgroup aggregation is an important use case and
> we'd better support it efficiently.
>
> Tracking all cgroups (including new ones) can be difficult; that's why
> I suggested passing a list of interested cgroups and counting them
> only. I can change it to allow adding new cgroups without tearing
> down the existing list. Is that OK with you?

Trying to move it forward: I'll post v4 if you don't object to adding
new cgroup nodes while keeping the existing ones.

Thanks,
Namhyung
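(For context, the per-cgroup mode under discussion opens one counter per
cgroup per CPU, so the fd count grows as ncgroups * ncpus. A minimal
sketch of that pattern follows; the helper name and the choice of a
cycles event are illustrative only, not taken from the patch:)

    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* One counter per (cgroup, cpu).  With PERF_FLAG_PID_CGROUP,
     * `pid' is a file descriptor of a cgroupfs directory and `cpu'
     * must name a specific CPU (cgroup events are per-cpu). */
    static int open_cgroup_counter(int cgrp_fd, int cpu)
    {
        struct perf_event_attr attr = {
            .type   = PERF_TYPE_HARDWARE,
            .size   = sizeof(attr),
            .config = PERF_COUNT_HW_CPU_CYCLES,
        };

        return syscall(__NR_perf_event_open, &attr, cgrp_fd, cpu,
                       -1 /* group_fd */, PERF_FLAG_PID_CGROUP);
    }

Calling this for every cgroup on every CPU is exactly the fd-limit and
per-event kernel memory cost in Stephane's first bullet above.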
> >
> > Ftrace had a similar issue; where people wanted aggregation, and that
> > resulted in the event histogram, which, quite frankly, is a scary
> > monster that I've no intention of duplicating. That's half a programming
> > language implemented.
>
> The ftrace event histogram supports generic aggregation. IOW users
> can specify which key and data field to aggregate. That surely would
> complicate things.
>
> > > As you point out, the difficulty is how to express the cgroups of
> > > interest and how to read the counts back. I agree that the ioctl() is
> > > not ideal for the latter. For the former, if you do not want ioctl()
> > > then you would have to overload perf_event_open() with a vector of
> > > cgroup fd, for instance. As for the read, you could, as you suggest,
> > > use the read syscall if you want to read all the cgroups at once using
> > > a new read_format. I don't have a problem with that. As for cgroup-id
> > > vs. cgroup-fd, I think you make a fair point about consistency with
> > > the existing approach. I don't have a problem with that either.
> >
> > So that is a problem of aggregation; which is basically a
> > programmability problem. You're asking for a variadic fixed-function
> > now, but tomorrow someone else will come and want another one.
>
> Well, maybe we can add more stuff later if it's really needed.
> But BPF also can handle many aggregations these days. :)
>
> Thanks,
> Namhyung
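(To illustrate the BPF route mentioned at the end: aggregation can be
done on the BPF side, keyed by cgroup id, with no new kernel interface.
This is a minimal sketch assuming a libbpf-style program attached to a
perf event; the map size and program name are made up for the example:)

    #include <linux/bpf.h>
    #include <linux/bpf_perf_event.h>
    #include <bpf/bpf_helpers.h>

    /* Per-cgroup sample counts, keyed by cgroup id. */
    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 1024);
        __type(key, __u64);    /* cgroup id */
        __type(value, __u64);  /* count */
    } counts SEC(".maps");

    SEC("perf_event")
    int count_by_cgroup(struct bpf_perf_event_data *ctx)
    {
        __u64 cgid = bpf_get_current_cgroup_id();
        __u64 one = 1, *val;

        /* Bump the counter for the current task's cgroup. */
        val = bpf_map_lookup_elem(&counts, &cgid);
        if (val)
            __sync_fetch_and_add(val, 1);
        else
            bpf_map_update_elem(&counts, &cgid, &one, BPF_NOEXIST);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";

New cgroups show up as new map keys automatically, which sidesteps the
tear-down-and-recreate problem Peter raised, at the cost of counting
samples rather than reading free-running counters.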