From: Dmitry Vyukov
Date: Thu, 2 Feb 2023 08:36:34 +0100
Subject: Re: [PATCH v4] posix-timers: Prefer delivery of signals to the current thread
To: Thomas Gleixner
Cc: Marco Elver, oleg@redhat.com, linux-kernel@vger.kernel.org,
	"Eric W. Biederman", Frederic Weisbecker

On Fri, 27 Jan 2023 at 07:58, Dmitry Vyukov wrote:
>
> On Thu, 26 Jan 2023 at 20:57, Thomas Gleixner wrote:
> >
> > On Thu, Jan 26 2023 at 18:51, Marco Elver wrote:
> > > On Thu, 26 Jan 2023 at 16:41, Dmitry Vyukov wrote:
> > >>
> > >> Prefer to deliver signals to the current thread if SIGEV_THREAD_ID
> > >> is not set.
> > >> We used to prefer the main thread, but delivering
> > >> to the current thread is faster, allows sampling the actual thread
> > >> activity for CLOCK_PROCESS_CPUTIME_ID timers, and does not change
> > >> the semantics (since we queue into shared_pending, all threads may
> > >> receive the signal in both cases).
> > >
> > > Reviewed-by: Marco Elver
> > >
> > > Nice - and given the test, hopefully this behaviour won't regress in future.
> >
> > The test does not tell much. It just waits until each thread got a
> > signal once, which can take quite a while. It says nothing about the
> > distribution of the signals, which can be completely skewed
> > towards a few threads.
> >
> > Also, for real-world use cases where you have multiple threads with
> > different periods and runtimes per period, I have a hard time
> > understanding how that signal measures anything useful.
> >
> > The most time-consuming thread might actually trigger rarely, while
> > other short threads end up being the ones which cause the timer to fire.
> >
> > What's the usefulness of this information?
> >
> > Thanks,
> >
> >         tglx
>
> Hi Thomas,
>
> Our goal is to sample what threads are doing in production with low
> frequency and low overhead. We did not find any reasonable existing
> way of doing this on Linux today, as outlined in the RFC version of the
> patch (other solutions are either much more complex and/or incur
> higher memory and/or CPU overheads):
> https://lore.kernel.org/all/20221216171807.760147-1-dvyukov@google.com/
>
> This sampling does not need to be 100% precise the way CPU profilers
> would require; getting high precision generally requires more complexity
> and overhead. The emphasis is on use in production and low overhead.
> Consider that we sample with an O(seconds) interval, so some activities
> can still go unsampled whatever we do here (if they take less than the
> sampling interval). On the other hand, the intention is to use this over
> billions of CPU hours, so on the global scale we still observe
> more-or-less everything.
>
> Currently all signals are practically delivered to the main thread, and
> the added test does not pass (at least I couldn't wait long enough).
> After this change the test passes quickly (within a second for me).
> Testing the actual distribution without flaky failures is very hard in
> unit tests. After rounds of complaints and deflaking, they usually
> transform into roughly what this test is doing -- all threads are
> getting at least something.
> If we want to test ultimate fairness, we would need to start with the
> scheduler itself. If threads don't get fair fractions of CPU time, then
> signals won't be evenly distributed either. I am not sure there are unit
> tests for the scheduler that ensure this in all configurations (e.g.
> uneven ratios of runnable threads to CPUs, running in VMs, etc.).
> I agree this test is not perfect, but, as I said, it does not pass now.
> So it is useful and will detect a future regression in this logic. It
> ensures that running threads eventually get signals.
>
> But regardless of our motivation, this change looks like an
> improvement in general. Consider performance alone (why would we wake
> another thread, maybe send an IPI, and evict caches?). Sending the
> signal to the thread that overflowed the counter also looks reasonable.
> For some programs it may actually give a good picture. Say thread A
> runs for a prolonged time, and then thread B runs. The program will
> first get signals in thread A and then in thread B (instead of getting
> them on an unrelated thread).
Hi Thomas,

Has this answered your question? Do you have any other concerns?
If not, please take this into some tree for upstreaming.

Thanks
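P.S. For concreteness, here is a minimal userspace sketch of the scenario
being discussed. It is only an illustration, not the actual selftest added
by the patch; the thread count, the choice of SIGPROF, and the 10ms period
are arbitrary. A process-wide CLOCK_PROCESS_CPUTIME_ID timer is armed with
plain SIGEV_SIGNAL (no SIGEV_THREAD_ID), the main thread blocks the signal,
and a handful of CPU-burning threads spin until each of them has received
the signal at least once, which is roughly what the added test waits for:

#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 8

static atomic_int got_signal[NTHREADS]; /* per-thread delivery counts */
static __thread int thread_idx = -1;    /* index of the current thread */

static void handler(int sig)
{
	(void)sig;
	/* Credit the thread the signal was actually delivered to. */
	atomic_fetch_add(&got_signal[thread_idx], 1);
}

static int all_threads_signaled(void)
{
	for (int i = 0; i < NTHREADS; i++)
		if (atomic_load(&got_signal[i]) == 0)
			return 0;
	return 1;
}

static void *spin(void *arg)
{
	sigset_t set;

	thread_idx = (int)(long)arg;

	/*
	 * Threads inherit main's signal mask, so unblock SIGPROF here:
	 * the spinners are the only threads eligible to receive it, and
	 * thread_idx is already set by the time delivery is possible.
	 */
	sigemptyset(&set);
	sigaddset(&set, SIGPROF);
	pthread_sigmask(SIG_UNBLOCK, &set, NULL);

	/* Burn CPU so this thread can be the one whose tick fires the timer. */
	while (!all_threads_signaled())
		;
	return NULL;
}

int main(void)
{
	struct sigaction sa = { .sa_handler = handler };
	struct sigevent sev = { 0 };
	struct itimerspec its = { 0 };
	pthread_t threads[NTHREADS];
	sigset_t set;
	timer_t timer;

	sigaction(SIGPROF, &sa, NULL);

	/* Keep SIGPROF away from the (mostly idle) main thread. */
	sigemptyset(&set);
	sigaddset(&set, SIGPROF);
	pthread_sigmask(SIG_BLOCK, &set, NULL);

	/*
	 * Process-wide CPU-time timer; note: no SIGEV_THREAD_ID, so the
	 * kernel picks the target thread out of shared_pending waiters.
	 */
	sev.sigev_notify = SIGEV_SIGNAL;
	sev.sigev_signo = SIGPROF;
	if (timer_create(CLOCK_PROCESS_CPUTIME_ID, &sev, &timer)) {
		perror("timer_create");
		return 1;
	}

	/* Fire after every 10ms of consumed process CPU time. */
	its.it_value.tv_nsec = 10 * 1000 * 1000;
	its.it_interval = its.it_value;
	timer_settime(timer, 0, &its, NULL);

	for (long i = 0; i < NTHREADS; i++)
		pthread_create(&threads[i], NULL, spin, (void *)i);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(threads[i], NULL);

	printf("all %d threads received SIGPROF at least once\n", NTHREADS);
	return 0;
}

Built with something like "gcc -O2 sketch.c -o sketch -pthread" (older
glibc may also need -lrt for timer_create), this will likely spin forever
when delivery always favors one thread, and should finish within a second
or so when signals prefer the currently running thread, mirroring the
before/after behaviour of the test described above.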