Date: Tue, 10 May 2022 17:44:51 +0200
From: Petr Mladek
To: Rik van Riel
Cc: Song Liu, "joe.lawrence@redhat.com", "song@kernel.org",
    "jpoimboe@redhat.com", "peterz@infradead.org", "mingo@redhat.com",
    "vincent.guittot@linaro.org", "live-patching@vger.kernel.org",
    "linux-kernel@vger.kernel.org", Kernel Team
Subject: Re: [RFC] sched,livepatch: call klp_try_switch_task in __cond_resched

On Tue 2022-05-10 13:33:13, Rik van Riel wrote:
> On Tue, 2022-05-10 at 09:56 +0200, Petr Mladek wrote:
> >
> > IMHO, the problem is that klp_transition_work_fn() tries the
> > transition "only" once per second, see
> >
> > void klp_try_complete_transition(void)
> > {
> > [...]
> >         schedule_delayed_work(&klp_transition_work,
> >                               round_jiffies_relative(HZ));
> > [...]
> > }
> >
> > It means that there are "only" 60 attempts to migrate the busy
> > process.
> > It fails when the process is in the running state or sleeping in a
> > livepatched function. There is a _non-zero_ chance of a bad luck.
> >
>
> We are definitely hitting that non-zero chance :)
>
> > Anyway, the limit 60s looks like a bad idea to me. It is too low.
>
> That has its own issues, though. System management software
> tracks whether kpatch succeeds, and a run of the system
> management software will not complete until all of the commands
> it has run have completed.
>
> One reason for this is that allowing system management software
> to just fork more and more things that might potentially get
> stuck is that you never want your system management software
> to come even close to resembling a fork bomb :)
>
> Rollout of the next config change to a system should not be
> blocked on KLP completion.

Makes sense.

> I think the best approach for us might be to just track what
> is causing the transition failures, and send in trivial patches
> to make the outer loop in such kernel threads do the same KLP
> transition the idle task already does.

I am afraid that is a way to hell. We might end up doing
really crazy things if we want to complete the transition
in one minute.

The great thing about the current approach is that it tries to
livepatch the system without too much disruption. The more we
try to speed up the transition, the more we might disrupt the
system. Not to mention the code complexity and potential bugs.

IMHO, a better approach is to fix your management system. The task
is done when the livepatch module is loaded. If you want to know
whether there is a problem, the livepatch code might write
a warning when the transition has not finished within some
reasonable time frame (1 hour or so). It might be monitored the same
way as the messages from various watchdogs, ...

Best Regards,
Petr
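
For illustration only, here is a rough user-space sketch of the monitoring idea. Note that this is not the in-kernel warning suggested above; it swaps in a check that a management tool could run itself after the module has been loaded. It assumes the tool polls the per-patch "transition" attribute under /sys/kernel/livepatch/<patch>/, which reads as 1 while the transition is in progress. The function names and the KLP_WARN_AFTER_SECS threshold are hypothetical; only the "1 hour or so" value comes from this thread.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical threshold, based on the "1 hour or so" mentioned above. */
#define KLP_WARN_AFTER_SECS 3600

/*
 * Read a livepatch "transition" attribute from the given path.
 * Returns 1 if the patch is still in transition, 0 if it is not,
 * and -1 if the file cannot be opened or parsed.
 */
static int klp_in_transition(const char *path)
{
	FILE *f = fopen(path, "r");
	int value;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);

	if (value < 0)
		return -1;
	return value != 0;
}

/*
 * Returns 1 when the patch is still in transition and more than
 * KLP_WARN_AFTER_SECS seconds have passed since it was loaded.
 * The caller decides how to report it (log message, alert, ...).
 */
static int klp_should_warn(const char *path, time_t loaded_at, time_t now)
{
	if (klp_in_transition(path) != 1)
		return 0;
	return difftime(now, loaded_at) > KLP_WARN_AFTER_SECS;
}
```

With something like this, the run that loads the livepatch can report success right away, and a later periodic check would call klp_should_warn() with the recorded load time, so the next config rollout is never blocked on KLP completion.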