Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp5907735iob; Tue, 10 May 2022 06:30:20 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyi83u54lbyi6O0He20ka5MrTIkpRCPMfL6gVovryKpfVPJ7VN+qOKPqMqa/dk9EvL40ig+ X-Received: by 2002:a17:907:2d28:b0:6f4:487e:8554 with SMTP id gs40-20020a1709072d2800b006f4487e8554mr20216992ejc.87.1652189420306; Tue, 10 May 2022 06:30:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1652189420; cv=none; d=google.com; s=arc-20160816; b=yzblJ+qecQLo8yQcFDc/Pj+9RDqUXGZYqW4y9mw0YLWf9OV7fbElJWoQhKnkZ4wXBw AfR7EvzJL+mYRET/enZJrKvq28/woGMjwRuWprdDcu3/lkLZKrNITAZ9B/dRqEMgHdBz 6RpOPni1Tkt4UFCyT4amLVLE4njns8ZK9My9DdXMYBunOrCvDkEFvrOGIwxhW52HN+iX xlT+zEuvNoeQUtX2+DPOLlXLnN/bOi98GTuhS0Ju2RCipevlcDhZ56zaCxbVwWgPgrYp HDLxxTHxXn+HzC52XUXJ2U7YeiQ/2XiTN/v6aVIfyjHDZvKJldDg3lePY+nwoEFkjg5t txWA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=7ZUtK17hBNi/Td6QjHwkqgCi5bgm/d69DDW/g/56M8s=; b=Pu9gGQZgvPpoxFjWwviI30GUVUpMNhnD8V5VHBjL88h4EQxNWIgz8d2PXQg7XJtQAd q+g9oVcTmxbsYyfkYP/Q9+bHalB5TyrqIXqSpWilhybDOsbA/Ed3eRWE0Vj8kZKewLZ0 gd7QVQewOqNoyCEX5N9yBcw3Qru+dxTa6ZAtmp1t3TkhaF2PPS5uQQcm3oYlHfMr3/qg nZX69+inmOBrkhzwpns80jRbF0vlepNJcu6TvTYMv0HAzlSp7NQLA8wuWcg+bEspqett 5pvD07QUwonb53bJI5s2x/OlTMYSlatvEW//9W2fcUUgLXjygWWRZxspTMHcW2OUDUU4 XO4g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=Vz2UetjM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=suse.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ne22-20020a1709077b9600b006e00f0a2fc0si19712245ejc.818.2022.05.10.06.29.55; Tue, 10 May 2022 06:30:20 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=Vz2UetjM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=suse.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237603AbiEJIBR (ORCPT + 99 others); Tue, 10 May 2022 04:01:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60004 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237543AbiEJIA5 (ORCPT ); Tue, 10 May 2022 04:00:57 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8D8AE1E251D; Tue, 10 May 2022 00:56:51 -0700 (PDT) Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id D240F21B2F; Tue, 10 May 2022 07:56:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1652169409; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7ZUtK17hBNi/Td6QjHwkqgCi5bgm/d69DDW/g/56M8s=; b=Vz2UetjMX1QNQkhz/4lkCD4aJvTxtPPvFqqVkjpDar6JysSHDsVhT+r5olJ2KnUmt/iv/w 4l6bjj1ncF7O39Y9TNILbvo9iUBkARApAyGxeXQhwFjPDmsE9SblErbd8a7hDeJoffNvug U7qB1rd6+qUR+Q708VLARz5TRFOyk5c= Received: from suse.cz (unknown [10.100.208.146]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id ABC412C141; Tue, 10 May 2022 07:56:49 +0000 (UTC) Date: Tue, 10 May 2022 09:56:49 +0200 From: Petr Mladek To: Song Liu Cc: Song Liu , Linux Kernel Mailing List , "live-patching@vger.kernel.org" , Ingo Molnar , Peter Zijlstra , "vincent.guittot@linaro.org" , Josh Poimboeuf , "joe.lawrence@redhat.com" , Kernel Team Subject: Re: [RFC] sched,livepatch: call klp_try_switch_task in __cond_resched Message-ID: References: <20220507174628.2086373-1-song@kernel.org> <9C7DF147-5112-42E7-9F7C-7159EFDFB766@fb.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <9C7DF147-5112-42E7-9F7C-7159EFDFB766@fb.com> X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon 2022-05-09 16:22:11, Song Liu wrote: > > > > On May 9, 2022, at 8:07 AM, Petr Mladek wrote: > > > > On Sat 2022-05-07 10:46:28, Song Liu wrote: > >> Busy kernel threads may block the transition of livepatch. Call > >> klp_try_switch_task from __cond_resched to make the transition easier. > > > > Do you have some numbers how this speeds up the transition > > and how it slows down the scheduler, please? > > We don’t have number on how much this would slow down the scheduler. > For the transition, we see cases where the transition cannot finish > with in 60 seconds (how much "kpatch load" waits by default). 60s might be too low limit, see below. > > cond_resched() is typically called in cycles with many interactions > > where the task might spend a lot of time. There are two possibilities. > > cond_resched() is called in: > > > > + livepatched function > > > > In this case, klp_try_switch_task(current) will always fail. > > And it will non-necessarily slow down every iteration by > > checking the very same stack. > > > > > > + non-livepatched function > > > > In this case, the transition will succeed on the first attempt. > > OK, but it would succeed also without that patch. The task would > > most likely sleep in this cond_resched() so that it might > > be successfully transitioned on the next occasion. > > We are in the non-live patched case. But the transition didn’t happen > in time, because the kernel thread doesn’t go to sleep. While there is > clearly something weird with this thread, we think live patch should > work because the thread does call cond_resched from time to time. I guess that it goes to sleep. Otherwise it would trigger soft lockup report if you have the watchdog enabled. IMHO, the problem is that klp_transition_work_fn() tries the transition "only" once per second, see void klp_try_complete_transition(void) { [...] schedule_delayed_work(&klp_transition_work, round_jiffies_relative(HZ)); [...] } It means that there are "only" 60 attempts to migrate the busy process. It fails when the process is in the running state or sleeping in a livepatched function. There is a _non-zero_ chance of a bad luck. It would be great to measure how long it will take to complete the transition if you remove the limit 60s. Anyway, the limit 60s looks like a bad idea to me. It is too low. For example, we do not use any limit at all in SUSE products. And the only report was that some thread from a 3rd party module could not be migrated. It was stuck with a livepatched function on the stack. The kthread had really problematic design. I am afraid that even this patch would not help in that case. Now, back to the limit. There are basically two problems when the transition takes too long: + It blocks another transition. But the frequency of new livepatches it typically counted in days or even weeks. + It means that a process is running the buggy/vulnerable code longer time. But few hours should be acceptable given how long it takes to prepare the livepatch. Also it looks better to have 99.9% of processes running the fixed code that revert the fix just because a single process needs longer time to get transitioned. I could imagine that there really is a process that is almost impossible to livepatch. It might get worse on NO_HZ system. The question is it happens in the real life. I would personally start with prolonging or removing the limit. Best Regards, Petr