Date: Tue, 20 Mar 2018 13:25:38 +0100
From: Petr Mladek
To: Josh Poimboeuf
Cc: Jiri Kosina, Miroslav Benes, Jason Baron, Joe Lawrence,
    Jessica Yu, Evgenii Shatokhin, live-patching@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH v10 05/10] livepatch: Support separate list for replaced patches.
Message-ID: <20180320122538.t75rplwhmhtap5q2@pathway.suse.cz>
References: <20180307082039.10196-1-pmladek@suse.com>
 <20180307082039.10196-6-pmladek@suse.com>
 <20180313224613.sdkdkcvhpqv54s6c@treble>
 <20180319150207.iz5ecbsogg5lpwac@pathway.suse.cz>
 <20180319214324.riyp233trtfxbeto@treble>
In-Reply-To: <20180319214324.riyp233trtfxbeto@treble>

On Mon 2018-03-19 16:43:24, Josh Poimboeuf wrote:
> On Mon, Mar 19, 2018 at 04:02:07PM +0100, Petr Mladek wrote:
> > > Can someone remind me why we're permanently disabling replaced patches?
> > > I seem to remember being involved in that decision, but at least with
> > > this latest version of the patches, it seems like it would be simpler to
> > > just let 'replace' patches be rolled back to the previous state when
> > > they're unpatched. Then we don't need two lists of patches, the nops
> > > can become more permanent, the replaced patches remain "enabled" but
> > > inert, and the unpatching behavior is less surprising to the user, more
> > > like a normal rollback.
> >
> > Yes, keeping the patches might make some things easier. But it might
> > also bring some problems and it would make the feature less useful.
> > The following arguments come to my mind:
> >
> > 1. The feature should help to keep the system in a consistent and
> >    well-defined state. It should not depend on what patches were
> >    installed before.
>
> But the nops already accomplish that. If they didn't, then this patch
> set has a major problem.

The nops are enough to keep the functionality, but they might harm
performance. Livepatching is about fixing bugs without a reboot. I can
easily imagine that ftrace on a hot code path might cause performance
problems on some workloads.
And I would like to have a way out in that case.

Anyway, I am reworking the patchset so that it implements your
approach first. The possibility to remove NOPs and replaced
livepatches is then added by a follow-up patch. This should help us
discuss whether the changes are worth it.

> > 2. The feature should allow unpatching some functions while keeping
> >    the others patched.
> >
> >    The ftrace handler might cause some unwanted slowdown or other
> >    problems. The performance might get restored only when we remove
> >    the NOPs once they are no longer necessary.
>
> I'd say simplicity and maintainability of the code is more important
> than an (imagined) performance issue. The NOPs should be pretty fast
> anyway.
>
> Not to mention that my proposal would make the behavior less surprising
> and more user friendly (reverting a 'replace' patch restores it to its
> previous state).

Only if the "disable" way works as expected; see below. Also, it is
less surprising only to people who understand the stack of patches.
For those familiar only with replace patches, it is normal that
patches get replaced. It is then like a package version update.

> > 3. The handling of callbacks is already problematic. We run only
> >    the ones from the last patch to make things easier.
> >
> >    We would need to come up with something more complicated if we
> >    want to support rollback to "random" patches on the stack.
> >    And support for random patches is fundamental, at least
> >    from my point of view.
>
> Can you elaborate on what you mean by random patches and why it would
> require something more complicated from the callbacks?

Let's say that we use atomic replace for cumulative patches. Then
every new patch knows what the earlier patches did; it just does not
know which of them were already installed. Therefore it needs to
detect which callbacks have already been called. The callbacks usually
create or change something, so there should be something to check.
Therefore the way forward should be rather straightforward. The way
back is more problematic: the callbacks in the new cumulative patch
would need to store information about the previous state and be able
to restore it when the patch gets disabled. That might more or less
double the callback code and the testing scenarios.

> > > Along those lines, I'd also propose that we constrain our existing patch
> > > stacking even further. Right now we allow a new patch to be registered
> > > on top of a disabled patch, though we don't allow the new patch to be
> > > enabled until the previous patch gets enabled. I'd propose we no longer
> > > allow that condition. We should instead enforce that all existing
> > > patches are *enabled* before allowing a new patch to be registered on
> > > top. That way the patch stacking is even more sane, and there are fewer
> > > "unusual" conditions to worry about. We have enough of those already.
> > > Each additional bit of flexibility has a maintenance cost, and this one
> > > isn't worth it IMO.
> >
> > Again, this might make some things easier, but it might also bring
> > problems.
> >
> > For example, we would need to solve the situation when the last
> > patch is disabled and cannot be removed because the transition
> > was forced. This might be more common after removing the immediate
> > feature.
>
> I would stop worrying about forced patches so much :-)

I have already seen a blocked transition several times. True, it was
with kGraft, but we just do not have enough real-life experience with
the upstream livepatch code.

> Forced patches already come with a disclaimer, and we can't bend over
> backwards for them. In such a rare case, the admin can just re-enable
> the forced patch before loading the 'replace' patch.
>
> > Also it might be less user friendly.
>
> I don't know, does anybody really care about this case (patching on top
> of a disabled patch)?
> It just adds to the crazy matrix of possible
> scenarios we have to keep in our heads, which means more bugs, for very
> little (hypothetical) gain.

It depends on whether we remove the replaced patches or not. If they
are removed, then replacing disabled patches is rather trivial from
both the coding and the understanding side. I am going to add this as
a separate patch as well; let's discuss it with the code in hand.

> > While the atomic replace could make things easier for both
> > developers and users.
>
> I agree that atomic replace is a useful feature and I'm not arguing
> against it, so maybe I missed your point?

Your suggestion allows simpler code, but it reduces the advantages of
the atomic replace feature. We would achieve almost the same result
with a normal livepatch whose functions behave like the original code.

Also, removing the replaced patches can be seen as a cleanup after
each patch. It might be more code, but the target system might be
easier to debug, and we would not need to worry about the various
disable scenarios.

Best Regards,
Petr