Received: by 2002:a05:6a10:6d10:0:0:0:0 with SMTP id gq16csp1132091pxb; Fri, 22 Apr 2022 20:49:20 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxfZUAWSSyV4vZlPcIxVNxKIzYWUuv/vqdjZCvMIowqRbvp/M+BVYYUjfR3paYumZD2o7qM X-Received: by 2002:a17:90b:4c84:b0:1d2:cadc:4e4d with SMTP id my4-20020a17090b4c8400b001d2cadc4e4dmr19713277pjb.8.1650685760406; Fri, 22 Apr 2022 20:49:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1650685760; cv=none; d=google.com; s=arc-20160816; b=eQBbwo2aowLpUClgO/w+tZnQtMYIw1D5yzWKkXrOvl3HmuJH8uZSp5JTPorFgb3hLb xua07w2Y2k35iJtlyyDR03k4XEtRXfhcqsKmV15Ek8gKyS/yEyURt/wBSm5y+sLRMwrI eZiu7qeFDg9VY+yXM3LU+bEEncK9j/EVSWKsSBz/R62hXhS+H86aSvLpljaw2IVACCYn yHQwBwLP26PDEXPGrgshknDqxYOMx5GgE5BP0Ee0YsObpk2z+bmQQWPoEZJFHHucScYR 4SeDvoDRHu01Op7cTIzZt9dd0WONWjSh+zhzxzwZ0tnLBGwZ+HeY0Tth77aGuiSVHCc2 HSZg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:message-id :mime-version:in-reply-to:references:cc:to:subject:from:date :dkim-signature; bh=viXXWgWXVo95+o/9NZZA7H7Wid1vy02UHZgXcnwGRG0=; b=grrT5ukFaWDm+ZjKzS7zuL+JVkzwH7TzbYpDErpxfBpwQOyf0BQARiG1ZLZmvWIJsk kQPhfRd6OnKwRpDffHaJ+xUSNNVS6UtEyyrVl16aryLS1MDqtkneI5lJPWGo33wAuu8f IrUluag5Yh5MRgL48WEPpS6ttM6Mm0scqfud2AKMjTKnfDA94ZkXcokqhrlCaGvcp2wv cEkAE0/v9fJrDYVQ6EoSRXCAFYs72lJRCBMl+avoIsOo+TOFNE1YX+eWzvjNqRHgIEAA kDfwp7iYVmuQ//S48TvUodbNCqa9zz2ZDe/3m92EQOHx9zug2N4vb8GiE4ITdYX8xFko nvsg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=gPj64rNy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id bm19-20020a656e93000000b00382274f6ae9si10745388pgb.589.2022.04.22.20.49.04; Fri, 22 Apr 2022 20:49:20 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=gPj64rNy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231196AbiDWCck (ORCPT + 99 others); Fri, 22 Apr 2022 22:32:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49140 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231948AbiDWCcc (ORCPT ); Fri, 22 Apr 2022 22:32:32 -0400 Received: from mail-pf1-x430.google.com (mail-pf1-x430.google.com [IPv6:2607:f8b0:4864:20::430]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 34EEA22CA98; Fri, 22 Apr 2022 19:29:37 -0700 (PDT) Received: by mail-pf1-x430.google.com with SMTP id x80so9644371pfc.1; Fri, 22 Apr 2022 19:29:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=date:from:subject:to:cc:references:in-reply-to:mime-version :message-id:content-transfer-encoding; bh=viXXWgWXVo95+o/9NZZA7H7Wid1vy02UHZgXcnwGRG0=; b=gPj64rNyAU5X7R2GnB5ivbb2GkOtZwpkeSL7PXb1I4pG9+VwTbZebtgmrNmy2h9xyZ Do+gaWYFMCm7y1fKMksrcse9C+iAW/8dF7Y3W9cUP0T6MkW4hRZELHDvqfCVk2MiWyYs x7UVyWVSC2kVmWuLTsd22vCDGi+rT0cKczrLmptm3Uf6P1i4i3a9NUoDuF0Qh4xAWkVC hNHnlAwDVqqzW9IdKnpRxa59naX3DoWYZXvOcb3S4WnvS2joavGB4772Y7uwKNyYygtv kyVrUNQ/jG3FdBhNYsgaDmoxcQXYIjeluqBuZsoqaOKR9B29Cmd13gp0t7lpRYPblAqA ROyA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:subject:to:cc:references:in-reply-to :mime-version:message-id:content-transfer-encoding; bh=viXXWgWXVo95+o/9NZZA7H7Wid1vy02UHZgXcnwGRG0=; b=h0FoIZR0G5o7olkluQm834109/qeSDgCzX3DoU4d5hBMQpYWEIFxNVhXkxMnjjM8aH jZh2+8BWzL2bQwL29wgYfMmP5X1cAcmEopp7y0pMmf99e+m8YlzVHG6QQPvXV/Yw4C5U dQIcQkY0mAwpbePEV9Fcmv6/vCSBPAC68ErxcOf+jNmBT5tv7nOq47giG3xElma6UiuR A/Y9F3PbIKzFe6Rk0y/Dgz3/ZlzuHPBTPBX6wwiRJyqVq/enatAbPlRrAm5/AkpBhL4z +AF2bZ7zMLGn9KmvjPcpAmMhOfJqNrswTJhNfZ167amHQsINITnn1YYvvmwPFqHVcrxJ RW+g== X-Gm-Message-State: AOAM531gC4vMXyBslc3EPk9PVFh36xDscyl3PuLLJAF4Ke5jsGOXFU8n P73DJeaPjFBtczmsGvjYxuQ= X-Received: by 2002:a05:6a00:2444:b0:4fd:db81:cbdd with SMTP id d4-20020a056a00244400b004fddb81cbddmr7857701pfj.32.1650680976716; Fri, 22 Apr 2022 19:29:36 -0700 (PDT) Received: from localhost (193-116-116-20.tpgi.com.au. [193.116.116.20]) by smtp.gmail.com with ESMTPSA id g9-20020a056a00078900b0050a4bae652fsm3701750pfu.131.2022.04.22.19.29.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 22 Apr 2022 19:29:36 -0700 (PDT) Date: Sat, 23 Apr 2022 12:29:30 +1000 From: Nicholas Piggin Subject: Re: To: Michael Ellerman , paulmck@kernel.org, Thomas Gleixner , Zhouyi Zhou Cc: =?iso-8859-1?q?Daniel=0A?= Lezcano , linux-kernel@vger.kernel.org, linuxppc-dev , Miguel Ojeda , rcu , =?iso-8859-1?q?Viresh=0A?= Kumar References: <20220406170012.GO4285@paulmck-ThinkPad-P17-Gen-1> <87pmls6nt7.fsf@mpe.ellerman.id.au> <87k0bz7i1s.fsf@mpe.ellerman.id.au> <1649818529.j46672mh2p.astroid@bobo.none> <87fsm55d3f.ffs@tglx> In-Reply-To: <87fsm55d3f.ffs@tglx> MIME-Version: 1.0 Message-Id: <1650680264.0sj0tnkzx7.astroid@bobo.none> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Excerpts from Thomas Gleixner's message of April 23, 2022 1:53 am: > On Wed, Apr 13 2022 at 15:11, Nicholas Piggin wrote: >> So we traced the problem down to possibly a misunderstanding between=20 >> decrementer clock event device and core code. >> >> The decrementer is only oneshot*ish*. It actually needs to either be=20 >> reprogrammed or shut down otherwise it just continues to cause=20 >> interrupts. >=20 > I always thought that PPC had sane timers. That's really disillusioning. My comment was probably a bit misleading explanation of the whole situation. This weirdness is actually in software in the powerpc clock event driver due to a recent change I made assuming the clock=20 event goes to oneshot-stopped. The hardware is relatively sane I think, global synchronized constant rate high frequency clock distributed to the CPUs so reads don't go off-core. And per-CPU "decrementer" event interrupt at the same frequency as the clock -- program it to a +ve value and it decrements until zero then creates basically a level triggered interrupt. Before my change, the decrementer interrupt would always clear the interrupt at entry. The event_handler usually programs another timer in so I tried to avoid that first clear counting on the oneshot_stopped callback to clear the interrupt if there was no other timer. >> Before commit 35de589cb879, it was sort of two-shot. The initial=20 >> interrupt at the programmed time would set its internal next_tb variable= =20 >> to ~0 and call the ->event_handler(). If that did not set_next_event or=20 >> stop the timer, the interrupt will fire again immediately, notice=20 >> next_tb is ~0, and only then stop the decrementer interrupt. >> >> So that was already kind of ugly, this patch just turned it into a hang. >> >> The problem happens when the tick is stopped with an event still=20 >> pending, then tick_nohz_handler() is called, but it bails out because=20 >> tick_stopped =3D=3D 1 so the device never gets programmed again, and so = it=20 >> keeps firing. >> >> How to fix it? Before commit a7cba02deced, powerpc's decrementer was=20 >> really oneshot, but we would like to avoid doing that because it require= s=20 >> additional programming of the hardware on each timer interrupt. We have=20 >> the ONESHOT_STOPPED state which seems to be just about what we want. >> >> Did the ONESHOT_STOPPED patch just miss this case, or is there a reason=20 >> we don't stop it here? This patch seems to fix the hang (not heavily >> tested though). >=20 > This was definitely overlooked, but it's arguable it is is not required > for real oneshot clockevent devices. This should only handle the case > where the interrupt was already pending. >=20 > The ONESHOT_STOPPED state was introduced to handle the case where the > last timer gets canceled, so the already programmed event does not fire. >=20 > It was not necessarily meant to "fix" clockevent devices which are > pretending to be ONESHOT, but keep firing over and over. >=20 > That, said. I'm fine with the change along with a big fat comment why > this is required. Thanks for taking a look and confirming. I just sent a patch with a comment and what looks like another missed case. Hopefully it's okay. Thanks, Nick