Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp6594585iob; Wed, 11 May 2022 00:37:23 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyOC+ZKYvjlw7bz/qjT1cK3oMW4yh2tPWiRA2ju9fv/PsqKj8K//NX+4QOV4Msqw7Gm+QrQ X-Received: by 2002:aa7:d70a:0:b0:425:e56d:c1b9 with SMTP id t10-20020aa7d70a000000b00425e56dc1b9mr26968614edq.78.1652254643455; Wed, 11 May 2022 00:37:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1652254643; cv=none; d=google.com; s=arc-20160816; b=kFP+IcXpPNxokQfqgSPg+hD8rX4O0LSqpUmV5orqRimN5aY2z4i1KUxhQIYsoi8glV UcHKtGEZ8aagxFlh36C2DOVgEQMkSSZOEdLLD2MlnmPCVL5NrLdToolqrmRBbOVkspBc NEO2iiBWMQMefUclHWkgVD1rXxUyNMOaPiTx7dUtPADnoH8/neEdOwyAuSPtv0wm3XSe 83TCdXHI+l8RkhWs8bq4yIuu0Jstv5JcBsVyEcQ9r7O9m3Lwy9zkT4a/8wfiAaNwjuOm mMWTRg1Y+KVj9cnftoKTO7rc0JgRmbvWVZCCSiZFl3Uz5Pdlwqym1E8xCDM2XUlUeR2N LcFQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=3oHyPqLJGENzv+UTgrFJwhzX7QSJJV6MZtQgyJXUE4c=; b=sukjmLx6fy4PcbE2bAeBgP4NQNIan2cCJFyfQjssc2QSlQOgPAst7BrCo1MSlJnS4U miS90vC78TyE6PNWSd8AUcobd66mRA9zZiMp8wiz8xT72ELIDOEKlsNZwXwrOGEUkKrt 8NhHygLhfgfVS4sGnv5OolWSBOlSwHYIH7+RiBi6sM19XcghQ65tI/Fj7quv3RElYxGr 2LVrmqyAjt/FpW57fVlv7YrxurmqKFrUcoVp669zPm96EtDMc8DatGv5sCh0tNx99ukJ bY1Lc1YyXBQdil3Bpi3KNjhAEI3I6Y89qnEATz4xkpuEFAT68O2AyEfBqvW+rc0gjmSJ foKw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=mNWsoI+7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id o7-20020a170906974700b006e87b73b8ffsi1941471ejy.131.2022.05.11.00.36.56; Wed, 11 May 2022 00:37:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=mNWsoI+7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238794AbiEKAdk (ORCPT + 99 others); Tue, 10 May 2022 20:33:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44130 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229795AbiEKAdh (ORCPT ); Tue, 10 May 2022 20:33:37 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6D08F27BC67; Tue, 10 May 2022 17:33:36 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 0649EB80E7B; Wed, 11 May 2022 00:33:35 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3F4F6C385CC; Wed, 11 May 2022 00:33:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1652229213; bh=ysnlGX7zF/bsKYFpLB45z+ORre2t36cMPtH5tLe1FCA=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=mNWsoI+7gjSAh0cJTJZ8fZANpw4IZvWP4A+7kd/Qa4CF1qvbOj28mAOaZXxmjZGXV WtM3zNdzLjKQRei7yXzNqQuxOu16TX1EW+i8uiVIm+FdB0P2ZrbsZULswJMFxTMhj7 jQo/UpinjfC4I1We6qpUyIMbfPBRVEbhvzVNEv1t65cetz/CuTXGUyBeE9gBTDrJBf Hs9m0azVuVvO4jSw1nd1VpwF1cG3ZNk60XDztH4M0rOQODr6Rr8/mCmAsb11OiM4lv Jo7GffZlN+4Sb3p29dLbhb+r7cGOQCV81VoHYf5rK99z1bOebLx937yCfApHbY9T9i TOAfR/O2slAIw== Date: Tue, 10 May 2022 17:33:31 -0700 From: Josh Poimboeuf To: Song Liu Cc: Rik van Riel , "song@kernel.org" , "joe.lawrence@redhat.com" , "peterz@infradead.org" , "mingo@redhat.com" , "vincent.guittot@linaro.org" , "pmladek@suse.com" , "live-patching@vger.kernel.org" , Kernel Team , "linux-kernel@vger.kernel.org" , "jpoimboe@redhat.com" Subject: Re: [RFC] sched,livepatch: call klp_try_switch_task in __cond_resched Message-ID: <20220511003331.clfvwfgpmbr5yx6n@treble> References: <3a9bfb4a52b715bd8739d8834409c9549ec7f22f.camel@fb.com> <6bf85ff908377508a5f5bcc7c4e75d598b96f388.camel@fb.com> <20220510165244.ikfh64ertnvodxb4@treble> <1bd15361edfd4db9fc9271d35e7bbe5edad1b87a.camel@fb.com> <20220510184213.l3gjweeleyg7obca@treble> <47440502-930F-4CBD-B859-3AC9BBFF8FC6@fb.com> <20220510230402.e5ymkwt45sg7bd35@treble> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Spam-Status: No, score=-7.7 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, May 10, 2022 at 11:57:04PM +0000, Song Liu wrote: > > If it's a real bug, we should fix it everywhere, not just for Facebook. > > Otherwise CONFIG_PREEMPT and/or non-x86 arches become second-class > > citizens. > > I think "is it a real bug?" is the top question for me. So maybe we > should take a step back. > > The behavior we see is: A busy kernel thread blocks klp transition > for more than a minute. But the transition eventually succeeded after > < 10 retries on most systems. The kernel thread is well-behaved, as > it calls cond_resched() at a reasonable frequency, so this is not a > deadlock. > > If I understand Petr correctly, this behavior is expected, and thus > is not a bug or issue for the livepatch subsystem. This is different > to our original expectation, but if this is what we agree on, we > will look into ways to incorporate long wait time for patch > transition in our automations. That's how we've traditionally looked at it, though apparently Red Hat and SUSE have implemented different ideas of what a long wait time is. In practice, one minute has always been enough for all of kpatch's users -- AFAIK, everybody except SUSE -- up until now. That timeout is not set in stone, and can be extended if needed, either with "kpatch load" waiting longer, or with it returning immediately and instead reporting livepatch stalls via WARN(). The latter option is a bigger change in behavior, since any errors will be delayed and reported in a different way, but if other users are ok with that, it's fine with me. Though, these options might be considered workarounds, as it's theoretically possible for a kthread to be CPU-bound indefinitely, beyond any arbitrarily chosen timeout. But maybe that's not realistic beyond a certain timeout value of X and we don't care? I dunno. > OTOH, if we think this is a suboptimal behavior (not necessarily a > bug, IIUC), we should consider improve it. I personally don’t think > shipping an improvement to one configuration makes other configurations > second-class citizens. And, it is possible PREEMPT kernel does NOT > even have such suboptimal behavior, right? (I really don't know.) No, PREEMPT will have the same problem. Preempting a kthread doesn't patch it unless the task happens to still be blocked during the next invocation of klp_transition_work_fn(). > So, if we come back to the same question: is this a bug (or a suboptimal > behavior that worth fixing)? If so, we are open to any solution that > would also help PREEMPT and/or non-x86 arches. You're obviously trying to fix a real world problem. Whether you call that issue a bug, or "suboptimal behavior", it's still a problem. As you said, the kthread is acting within the allowed parameters of how a kthread should behave. But "kpatch load" will fail. That sounds broken to me. We need to figure out a way to fix that for all configs/arches. > Lastly, maybe a really naive question: does the following also helps > PREEMPT=y configurations? As I have been trying to say, that won't work for PREEMPT+!ORC, because, when the kthread gets preempted, the stack trace will be attempted from an IRQ and will be reported as unreliable. Ideally we'd have the ORC unwinder for all arches, that would make this much easier. But we're not there yet. -- Josh