Date: Fri, 13 May 2022 14:33:34 +0200
From: Petr Mladek
To: Song Liu
Cc: Josh Poimboeuf, Rik van Riel, "song@kernel.org", "joe.lawrence@redhat.com", "peterz@infradead.org", "mingo@redhat.com", "vincent.guittot@linaro.org", "live-patching@vger.kernel.org", Kernel Team, "linux-kernel@vger.kernel.org", "jpoimboe@redhat.com"
Subject: Re: [RFC] sched,livepatch: call klp_try_switch_task in __cond_resched

On Wed 2022-05-11 16:33:57, Song Liu wrote:
>
> > On May 11, 2022, at 2:24 AM, Petr Mladek wrote:
> >
> > On Tue 2022-05-10 17:33:31, Josh Poimboeuf wrote:
> >> On Tue, May 10, 2022 at 11:57:04PM +0000, Song Liu wrote:
> >>>> If it's a real bug, we should fix it everywhere, not just for Facebook.
> >>>> Otherwise CONFIG_PREEMPT and/or non-x86 arches become second-class
> >>>> citizens.
> >>>
> >>> I think "is it a real bug?" is the top question for me. So maybe we
> >>> should take a step back.
> >>>
> >>> The behavior we see is: A busy kernel thread blocks klp transition
> >>> for more than a minute. But the transition eventually succeeded after
> >>> < 10 retries on most systems. The kernel thread is well-behaved, as
> >>> it calls cond_resched() at a reasonable frequency, so this is not a
> >>> deadlock.
> >>>
> >>> If I understand Petr correctly, this behavior is expected, and thus
> >>> is not a bug or issue for the livepatch subsystem. This is different
> >>> to our original expectation, but if this is what we agree on, we
> >>> will look into ways to incorporate long wait time for patch
> >>> transition in our automations.
> >>
> >> That's how we've traditionally looked at it, though apparently Red Hat
> >> and SUSE have implemented different ideas of what a long wait time is.
> >>
> >> In practice, one minute has always been enough for all of kpatch's users
> >> -- AFAIK, everybody except SUSE -- up until now.
> >
> > I am actually surprised that nobody met the problem yet. There are
> > "only" 60 attempts to transition the pending tasks.
>
> Maybe we should consider increase the frequency we try? Say to 10 times
> per second? I guess this will solve most of the failures we are seeing
> in current case.

My concern is that klp_try_complete_transition() checks all processes
under read_lock(&tasklist_lock). It might create some contention
on this lock.

I am not sure if this lock is fair. It might slow down block writers
(creating/deleting tasks).

Best Regards,
Petr
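
To put the retry-frequency trade-off into numbers, here is a rough back-of-the-envelope model. It is not kernel code. It assumes that every transition attempt walks all tasks while holding the read lock and that each task check costs a fixed time; the function name, the task count and the per-task cost are made-up values for illustration only, and the real cost of klp_try_complete_transition() will differ.

```python
def read_lock_duty_cycle(tries_per_sec, num_tasks, usec_per_task):
    """Estimate the fraction of wall-clock time spent holding the read lock.

    Hypothetical model: each attempt holds the lock for
    num_tasks * usec_per_task microseconds, and attempts do not overlap.
    """
    held_usec_per_try = num_tasks * usec_per_task
    # Cap at 1.0: the lock cannot be held for more than 100% of the time.
    return min(1.0, tries_per_sec * held_usec_per_try / 1_000_000)


# Assumed numbers, chosen only to show the scaling.
NUM_TASKS = 10_000
USEC_PER_TASK = 2.0

for rate in (1, 10):  # current one try per second vs. the proposed ten
    duty = read_lock_duty_cycle(rate, NUM_TASKS, USEC_PER_TASK)
    print(f"{rate:>2} tries/s: read lock held about {duty:.1%} of the time")
```

Under this model the time spent under the read lock grows linearly with the retry rate, so going from 1 to 10 tries per second means ten times more time during which writers (task creation and deletion) may have to wait. How much that hurts in practice depends on whether the lock favors readers, which is the open question above.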