Date: Mon, 10 Aug 2020 07:45:18 -0700
From: Andi Kleen
To: Alexander Shishkin
Cc: peterz@infradead.org, Arnaldo Carvalho de Melo, Ingo Molnar,
    linux-kernel@vger.kernel.org, Jiri Olsa, alexey.budankov@linux.intel.com,
    adrian.hunter@intel.com
Subject: Re: [PATCH 1/2] perf: Add closing sibling events' file descriptors
Message-ID: <20200810144518.GB1448395@tassilo.jf.intel.com>
References: <20200708151635.81239-1-alexander.shishkin@linux.intel.com>
 <20200708151635.81239-2-alexander.shishkin@linux.intel.com>
 <20200806083530.GV2674@hirez.programming.kicks-ass.net>
 <20200806153205.GA1448395@tassilo.jf.intel.com>
 <875z9q1u3g.fsf@ashishki-desk.ger.corp.intel.com>
In-Reply-To: <875z9q1u3g.fsf@ashishki-desk.ger.corp.intel.com>

> It didn't. I can't figure out what to charge on the locked memory, as
> all that memory is in kernel-side objects. It also needs to make sense

I don't see how that makes a difference for the count. It just accounts
bytes. Can you elaborate?

> as iirc the default MLOCK_LIMIT is quite low, you'd hit it sooner than
> the file descriptor limit.

For a single process?

> > It has a minor issue that it might break some existing setups that rely
> > on the mmap fitting exactly into the mmap allocation, but that could
> > be solved by allowing a little slack, since the existing setups
> > likely don't have that many events.
>
> I don't see how to make this work in a sane way. Besides, if we have to
> have a limit anyway, sticking with the existing one is just easier and
> 1:1 is kind of more logical.

It's just a very wasteful way, because we need an extra inode and file
descriptor for each event*cgroup. And of course it's super unfriendly to
users, because managing the fd limits is a pain, apart from them not
really working that well anyway (someone who really wants to do a memory
DoS can still open RLIMIT_NPROC*RLIMIT_NOFILE fds just by forking).

Unfortunately we're kind of stuck with the old NOFILE=1024 default, even
though it makes little sense on modern servers. Right now a lot of very
reasonable perf stat command lines don't work out of the box on larger
machines because of this (and since core counts are growing all the time,
the "larger machines" of today are the standard servers of tomorrow).

Maybe an alternative would be to allow a multiplier: for each open fd you
could have N perf events, with N being a little higher than the current
cost of the inode + file descriptor together. But I would really prefer
some kind of per-uid limit that is actually sane.

-Andi
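As a rough illustration of the fd arithmetic behind the "doesn't work out
of the box on larger machines" point above (not part of the original
thread; the 20-events-per-CPU figure and the one-fd-per-event-per-CPU
model are assumptions for the sketch), a small C program can show how
quickly a per-CPU counting session runs past the historical 1024
descriptor default:

/* Hypothetical sketch: estimate how many file descriptors a per-CPU
 * event session would need and compare against RLIMIT_NOFILE. */
#include <stdio.h>
#include <unistd.h>
#include <sys/resource.h>

int main(void)
{
        long ncpus = sysconf(_SC_NPROCESSORS_ONLN); /* online CPUs */
        long nevents = 20;                          /* assumed events per CPU */
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
                perror("getrlimit");
                return 1;
        }

        /* one fd per event per CPU in this model */
        long needed = ncpus * nevents;

        printf("CPUs: %ld, events/CPU: %ld => fds needed: %ld (soft limit: %llu)\n",
               ncpus, nevents, needed, (unsigned long long)rl.rlim_cur);

        if (needed > (long)rl.rlim_cur)
                printf("exceeds RLIMIT_NOFILE; open() of further events would fail\n");
        return 0;
}

On a 64-core machine this already asks for 1280 descriptors, which is why
the default soft limit is the first thing such command lines hit.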