Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp2029463pxj; Sat, 19 Jun 2021 00:36:09 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxesUIeII8vbNJfeGK6AMOV7kq6jmqExJwLI7QfkIdFQ7qrfMTt9fD8QEDnYWT6Rki974Ig X-Received: by 2002:a05:6402:50d2:: with SMTP id h18mr8500044edb.384.1624088169323; Sat, 19 Jun 2021 00:36:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1624088169; cv=none; d=google.com; s=arc-20160816; b=FgqDA5zc9/WRApemUwtp4KdWnd2OsgJI4ay77gw0Fx9BpSvyaBiboXL9BMlaVUsuBk 5RMwc6i1r/8gP1nakfuMQU35iS3kAlcK7P8v/y6PQBKRisAmTW5mqFyX83YDGpPkg0LE jNJMxFcrB0sjKcPCHyQICyg9QQOEAd0u9KrSUZEjxbN44VQwToTOlEbguy1t//LD1kOm 5V3hLUNxIkLsYXA9NMrT/sc1CA+hrt3b8pRYbfXnCtg5razwIoQ7ePNUDrdk1afJ13kK k1PeabKy6h8B+C1duNUjDex7HavzLiAa391G1Lqa4sadfl2Nc9m5loeA0yn4y15sOcrX iI8A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:message-id:date:references :in-reply-to:subject:cc:to:dkim-signature:dkim-signature:from; bh=0dH3VEH9mRkPB+iMMG13Bg8QIpRSqyXaZ7vD6FSJ6Q8=; b=czmfrTBhwcJVPTpL+HApXypDc2KTc6fNbGyMVcM8f538ZmG8C9oI3a0Epvr6Jhuxg3 /SYXfv7uxnqUM/+t/JnnstUtW945uqMfUxpiu8vvviG37f1Izwkw97fp9v0ekEL3cVT3 DDgvFkEb5kElYeL9dXDqo4e43qAW/bRuX3I+ELslnv/3QDR9UaQigrR4t9a1xZ42xRzS J1CymwTKwCw5f2vvYoz3hkRtorie5x8tSPVJSbpn00c3GZj7Z/tf3XnEik0twtGKnmXe 3Rlx/Op6+Oq1qzCLlp3Y0bb5VFz6s7L9yvzsfRvDXkpVh6E09hKXAD40sZOzC5BKetRy 57JQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@linutronix.de header.s=2020 header.b=EvgI4ThM; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=J1eYyM5F; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id v26si10908212eds.296.2021.06.19.00.35.46; Sat, 19 Jun 2021 00:36:09 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=fail header.i=@linutronix.de header.s=2020 header.b=EvgI4ThM; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=J1eYyM5F; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235047AbhFRWtS (ORCPT + 99 others); Fri, 18 Jun 2021 18:49:18 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:59600 "EHLO galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231455AbhFRWtR (ORCPT ); Fri, 18 Jun 2021 18:49:17 -0400 From: Thomas Gleixner DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1624056424; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=0dH3VEH9mRkPB+iMMG13Bg8QIpRSqyXaZ7vD6FSJ6Q8=; b=EvgI4ThMXq7mw55btOrbvC5XMqBy+62FBj6X7cCE+sM9qHF5Wd6m8w5uIMNyW7MJt7L482 NLiPaCjdEqFPc5cwtS00du/Q5HP9A1dwKB67axQ8Ig6Rs03wsiI7Z1reoTK0+ZNl/KQUze JQFbkUvrS2DeMhw5h5m0wFdopfPKkmFPzsKSlYFzU6XL3m+ilRgeWxJLGmGAPaqVddCMGr Tacf56+DP/cGko2g5hLQDgKxCKN2Y3/+OaQS2XCvXcv4O8otqRtrdl5qinu/QY33qLdd5u 8h/GV+3y4JsJKRjt+Y4bPi0sf/mUuPgZMPfm0wSkAtrqYB+gYTfLq0kod+Ghog== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1624056424; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=0dH3VEH9mRkPB+iMMG13Bg8QIpRSqyXaZ7vD6FSJ6Q8=; b=J1eYyM5FnvUOfduY5/xCpovdMVbnQQY/6U+oOtSydNL2Txvp6Alw8UltJsFNy9miAWtZlh 0lZpGx+w9S7zvbAQ== To: Nicolas Saenz Julienne Cc: linux-rt-users@vger.kernel.org, frederic@kernel.org, mtosatti@redhat.com, LKML , Peter Zijlstra Subject: Re: [RFC 2/2] timers: Make sure irq_work is handled when no pending timers In-Reply-To: <20210610125945.558872-2-nsaenzju@redhat.com> References: <20210610125945.558872-1-nsaenzju@redhat.com> <20210610125945.558872-2-nsaenzju@redhat.com> Date: Sat, 19 Jun 2021 00:47:04 +0200 Message-ID: <87mtrmeqon.ffs@nanos.tec.linutronix.de> MIME-Version: 1.0 Content-Type: text/plain Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Nicolas, On Thu, Jun 10 2021 at 14:59, Nicolas Saenz Julienne wrote: > PREEMPT_RT systems defer most irq_work handling into the timer softirq. > This softirq is only triggered when a timer is expired, which adds some > delay to the irq_work handling. It's a price PREEMPT_RT systems are > willing to pay in exchange for less IRQ noise. > > This works fine for the majority of systems, but there's a catch. What > if no timer is ever armed after an irq_work is queued. This has been > observed on nohz_full CPUs while running oslat. The lack of armed timers > prevents a pending irq_work to run. Which in turn prevents the nohz code > from fully stopping the tick. > > To avoid this situation introduce new logic in run_local_timers(). The > timer softirq will be triggered when an irq_work is pending but no > timers have been armed. This situation is only possible in PREEMPT_RT > systems, so make the code conditional to it. now I can see the problem you are trying to solve, but unfortunately the solution is fundamentally wrong. > NOTE: All in all, this is the best I could think of with my limited > timers knowledge. A bigger hammer would be to unanimously trigger > the softirq if irq_work_needs_cpu(). But I get the feeling this is > something we want to avoid. Technical decisions based on feelings are not solving anything and they result in hard to analyse subtle issues: Q: Assume there is a timer armed to expire 24h from now. What's the difference to no timer being armed? A: None. Just because your use case has either no timers armed at all or has timers armed with short expiry is no reason to ignore the really simple and obvious technical facts. But that aside, you analyzed the problem pretty good, but then you stopped short of identifying the root cause and went off to cure the symptom. The irq_work deferring on RT to the timer softirq is the root cause of the problem. It's a sloppy hack and I'm well aware of that. So this needs to be fixed and not worked around by adding random undocumented workarounds into the timer code. Let's take a step back and analyze why this deferring to timer softirq context exists on RT. 1) The irq_work callbacks which are deferred on RT cannot be invoked from hard interrupt (IPI) context usually - due to spin_lock usage. 2) Such irq_work has to be delegated to some suitable context and the trivial and lazy way out was to just stick into the timer softirq. That hack survived for a long time and while I was aware of it, it was not really high on my priority list of cleanups. The real solution is to delegate this to a suitable context which is executed independent of any other constraints. There are two solutions: 1) Create a IRQ_WORK softirq and raise that 2) Simply delegate it to a workqueue So now you might rightfully ask why I did not do that back then: #1 is not an option because we don't want to proliferate softirqs for various reasons. #2 was not feasible because back then queueing work from hard interrupt context was not possible. So yes, I took the sloppy and easy way out and just glued it onto the timer softirqs. Nobody complained so far. As we since then made work queues RT aware and it's possible to queue work from the irq_work IPI context, the obvious solution is to delegate this to a work queue. If we do a proper analysis of the affected irq work callbacks then this probably makes a lot of sense independent of RT. There are only a few really urgent irq work items which need to be handled immediately in the IPI. RT is special, but as we have demonstrated over time it's not _that_ special. It just needs a proper analysis and a real good argument why something has to be special for RT and does not fit into the common case. Or to demonstrate that the common case approach of 'do it right away' is pointless or even harmfull. Thanks, tglx --- P.S.: I'm not blaming this on you as a newcomer. There are people on your team who should have known better.