Date: Tue, 12 Apr 2016 12:35:49 -0500
From: Chris J Arges <chris.j.arges@canonical.com>
To: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>, Jessica Yu <jeyu@redhat.com>,
        Miroslav Benes <mbenes@suse.cz>, linux-kernel@vger.kernel.org,
        live-patching@vger.kernel.org, Vojtech Pavlik <vojtech@suse.com>
Subject: Re: [RFC PATCH v1.9 08/14] livepatch: separate enabled and patched
 states
Message-ID: <20160412173548.GA26049@canonical.com>
References: <cover.1458933243.git.jpoimboe@redhat.com>
 <e4dbf8827421c83d45427c28b3a31516c59cc6a5.1458933243.git.jpoimboe@redhat.com>
 <20160412144443.GA21882@canonical.com>
 <20160412171600.eikmt75i4qapjj2g@treble>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160412171600.eikmt75i4qapjj2g@treble>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5447
Lines: 150

On Tue, Apr 12, 2016 at 12:16:00PM -0500, Josh Poimboeuf wrote:
> On Tue, Apr 12, 2016 at 09:44:43AM -0500, Chris J Arges wrote:
> > On Fri, Mar 25, 2016 at 02:34:55PM -0500, Josh Poimboeuf wrote:
> > > Once we have a consistency model, patches and their objects will be
> > > enabled and disabled at different times.  For example, when a patch is
> > > disabled, its loaded objects' funcs can remain registered with ftrace
> > > indefinitely until the unpatching operation is complete and they're no
> > > longer in use.
> > > 
> > > It's less confusing if we give them different names: patches can be
> > > enabled or disabled; objects (and their funcs) can be patched or
> > > unpatched:
> > > 
> > > - Enabled means that a patch is logically enabled (but not necessarily
> > >   fully applied).
> > > 
> > > - Patched means that an object's funcs are registered with ftrace and
> > >   added to the klp_ops func stack.
> > > 
> > > Also, since these states are binary, represent them with booleans
> > > instead of ints.
> > > 
> > 
> > Josh,
> > 
> > Awesome work here first of all!
> > 
> > Looking through the patchset a bit I see the following bools:
> > - functions: patched, transitioning
> > - objects: patched
> > - patches: enabled
> > 
> > It seems this reflects the following states at a patch level:
> > 	disabled - module inserted, not yet logically enabled
> > 	enabled - logically enabled, but not all objects/functions patched
> > 	transitioning - objects/functions are being applied or reverted
> > 	patched - all objects/functions patched
> > 
> > However each object and function could have the same state and the parent object
> > just reflects the 'aggregate state'. For example if all funcs in an object are
> > patched then the object is also patched.
> > 
> > Perhaps we need more states (or maybe there will be more in the future), but
> > wouldn't this just be easier to have something like for each patch, object, and
> > function?
> > 
> > enum klp_state{
> > 	KLP_DISABLED,
> > 	KLP_ENABLED,
> > 	KLP_TRANSITION,
> > 	KLP_PATCHED,
> > }
> > 
> > 
> > I'm happy to help out too.
> 
> Thanks for the comments.  First let me try to explain why I chose two
> bools rather than a single state variable.
> 
> At a func level, it's always in one of the following states:
> 
>   patched=0 transition=0: unpatched
>   patched=0 transition=1: unpatched, temporary starting state
>   patched=1 transition=1: patched, may be visible to some tasks
>   patched=1 transition=0: patched, visible to all tasks
> 
> And for unpatching, it goes in the reverse order:
> 
>   patched=1 transition=0: patched, visible to all tasks
>   patched=1 transition=1: patched, may be visible to some tasks
>   patched=0 transition=1: unpatched, temporary ending state
>   patched=0 transition=0: unpatched
> 
> (note to self, put the above in a comment somewhere)
>

This is helpful and makes more sense.
 
> Now, we could convert the states from two bools into a single enum.  But
> I think it would complicate the code.  Because nowhere in the code does
> it need to access the full state.  In some places it accesses
> func->patched and in other places it accesses func->transition, but it
> never needs to access both.
> 
> So for example, the following check in klp_ftrace_handler():
> 
> 	if (unlikely(func->transition)) { 
> 
> would change to:
> 
> 	if (unlikely(func->state == KLP_ENABLED || func->state == KLP_TRANSITION)) {
> 
> Sure, we could use a helper function to make that more readable.  But
> with the bools its clearer and you don't need a helper function.
> 
> As another example, see the following code in klp_complete_transition():
> 
> 	klp_for_each_object(klp_transition_patch, obj)
> 		klp_for_each_func(obj, func)
> 			func->transition = false;
> 
> The last line would have to be changed to something like:
> 
> 			if (patching...)
> 				func->state = KLP_PATCHED;
> 			else /* unpatching */
> 				func->state = KLP_DISABLED;
> 		
> So that's why I picked two bools over a single state variable: it seems
> to make the code simpler.
> 
> As to the other idea about copying the func states to the object and
> patch level, I get the feeling that would also complicate the code.  We
> patch and transition at a function granularity, so the "real" state is
> at the func level.  Proliferating that state to objects and patches
> might be tricky to get right, and it could make it harder to understand
> exactly what's going on in the code.  And I don't really see a benefit
> to doing that.
> 
> -- 
> Josh
> 

I think keeping it simple makes a ton of sense, thanks for the explanation.

I'm also envisioning how to troubleshoot patches that don't converge.

So if someone wanted to check if a function has been fully patched on all tasks
they would check:

/sys/kernel/livepatch/<patch>/transition
/sys/kernel/livepatch/<patch>/patched

And see if patched=1 and transition=0 things are applied.

However if patched=1 and transition=1 then if a user wanted to dig down and see
which pid's we were waiting on we could do:

 cat /proc/*/patch_status

and check if any pid's patch_status values still contain KLP_UNIVERSE_OLD.

If we wanted to see which function in the task needs patching we could:
 cat /proc/<pid>/stack
and see if anything in that stack contains the functions in question.

Anyway I'll keep looking at this patchset, patching using the samples/livepatch
code works for me without issue so far.

--chris