Date: Tue, 10 Feb 2015 08:21:36 +0100 (CET)
From: Jiri Kosina <jkosina@suse.cz>
To: Josh Poimboeuf <jpoimboe@redhat.com>
cc: Seth Jennings <sjenning@redhat.com>, Vojtech Pavlik <vojtech@suse.cz>,
        Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
        live-patching@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/9] livepatch: consistency model
In-Reply-To: <20150210030553.GA30688@treble.redhat.com>
Message-ID: <alpine.LNX.2.00.1502100806280.10719@pobox.suse.cz>
References: <cover.1423499826.git.jpoimboe@redhat.com> <alpine.LNX.2.00.1502100001450.10719@pobox.suse.cz> <20150210030553.GA30688@treble.redhat.com>
User-Agent: Alpine 2.00 (LNX 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2516
Lines: 60

On Mon, 9 Feb 2015, Josh Poimboeuf wrote:

> > The way to detect whether a given CPU is running in userspace 
> > (without interfering with it too much, like, say, sending costly IPI) 
> > is rather tricky though. On kernels with CONFIG_CONTEXT_TRACKING we 
> > could make use of that feature, but my gut feeling is that most people 
> > keep that disabled.

> Yeah, that seems to be related to nohz.  I think we'd have to have it
> enabled 100% of the time on all CPUs, even when not patching.  Sounds
> like a lot of unnecessary overhead (unless the user already has it
> enabled on all CPUs).

Agreed, we could make use of it when it's enabled in kernel config anyway, 
but it would be impractical for us to hard require it.

> > Another alternative is what we are doing in kgraft with 
> > kgr_needs_lazy_migration(), but admittedly that's very far from being 
> > pretty.
> 
> Hm, is it really safe to read a stack while the task could be writing to
> it?

It might indeed look unsafe at first sight :) but let's look at the 
possible race scenarios:

(1) task is running in userspace when you start looking at its kernel 
    stack, and while you are examining it, it enters the kernel. That's 
    not a problem, because no matter what verdict kgr_needs_lazy_migration() 
    yields, the migration to the new universe happens during kernel entry 
    anyway.

(2) task is actively running in kernelspace. There is no way for 
    print_context_stack() to yield such a small number of nr_entries. 
    The stack context might be bogus due to the race, but it always 
    starts at a valid bp which can't be that low.

(3) task is running in kernelspace, but is about to exit to userspace, and 
    looking at the kernel stack races with this. That's again not a 
    problem, because no matter what verdict kgr_needs_lazy_migration() 
    yields, the migration to the new universe happens during kernel exit 
    anyway.
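
To make the three cases concrete, here is a rough userspace model of the 
reasoning. It is not the kgraft code: struct stack_snapshot, decide() and 
the threshold USERSPACE_MAX_ENTRIES are made up for illustration (the 
actual threshold is not stated in this thread), and the only thing taken 
from the above is that the verdict depends on nr_entries of a possibly 
racy stack walk.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Assumed threshold, for illustration only: a walk that yields more
 * entries than this is taken to mean the task is executing kernel code.
 */
#define USERSPACE_MAX_ENTRIES 2

/* Hypothetical stand-in for the result of a (possibly racy) stack walk. */
struct stack_snapshot {
	size_t nr_entries;
};

enum migration {
	MIGRATE_NOW,			/* task looks like it sits in userspace */
	MIGRATE_AT_KERNEL_BOUNDARY,	/* defer to kernel entry/exit */
};

/* Model of the verdict: true means the task must be migrated lazily. */
static bool needs_lazy_migration(const struct stack_snapshot *s)
{
	return s->nr_entries > USERSPACE_MAX_ENTRIES;
}

/*
 * Cases (1) and (3): whichever value this returns, the task crosses the
 * kernel boundary and is migrated there anyway, so a stale snapshot is
 * harmless. Case (2): a task running in the kernel is assumed never to
 * produce a snapshot short enough to yield MIGRATE_NOW.
 */
static enum migration decide(const struct stack_snapshot *s)
{
	return needs_lazy_migration(s) ? MIGRATE_AT_KERNEL_BOUNDARY
				       : MIGRATE_NOW;
}
```

The point of the model is that only a short trace ever leads to immediate 
migration, and per case (2) a short trace should not be obtainable from a 
task that is really executing kernel code.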

So I agree that this is ugly as hell, and depends on the 
architecture-specific implementation of print_context_stack(); but 
architectures are free to give up this optimization if it can't be used.

But yes, we should be able to come up with something better if we want to 
use this optimization upstream.

Thanks,

-- 
Jiri Kosina
SUSE Labs