2014-06-25 11:05:40

by Jiri Slaby

[permalink] [raw]
Subject: [PATCH 00/21] kGraft

Hi,

this is a repost of the second round of RFC on kGraft, the linux
kernel online patching developed at SUSE. This repost only widened the
target audience for broader review, no code change happened.

Please speak up now (or be silent till the next merge window). That
is, if there are no objections, we plan pushing the tree into -next
and asking Linus in the next merge window for comments.

The patches are posted as a reply to this email and can be also
obtained as a whole tree from:
https://git.kernel.org/cgit/linux/kernel/git/jirislaby/kgraft.git/log/?h=kgraft

Jiri Kosina (6):
kgr: initial code
kgr: x86: refuse to build without fentry support
kgr: add procfs interface for per-process 'kgr_in_progress'
kgr: make a per-process 'in progress' flag a single bit
kgr: expose global 'in_progress' state through procfs
kgr: x86: optimize handling of CPU-bound tasks

Jiri Slaby (14):
ftrace: Add function to find fentry of function
ftrace: Make ftrace_is_dead available globally
kgr: add testing kgraft patch
kgr: update Kconfig documentation
kgr: add Documentation
kgr: trigger the first check earlier
kgr: sched.h, introduce kgr_task_safe helper
kgr: mark task_safe in some kthreads
kgr: kthreads support
kgr: handle irqs
kgr: add MAINTAINERS entry
kgr: add support for missing functions
kgr: exercise non-present function
kgr: fix race of stub and patching

Libor Pechacek (1):
kgr: rephrase the "kGraft failed" message

Documentation/kgraft.txt | 44 ++++
MAINTAINERS | 9 +
arch/x86/Kconfig | 2 +
arch/x86/include/asm/kgraft.h | 61 ++++++
arch/x86/include/asm/thread_info.h | 6 +-
arch/x86/kernel/entry_64.S | 9 +
drivers/base/devtmpfs.c | 1 +
drivers/scsi/scsi_error.c | 2 +
drivers/usb/core/hub.c | 4 +-
fs/jbd2/journal.c | 2 +
fs/notify/mark.c | 5 +-
fs/proc/base.c | 11 +
include/linux/freezer.h | 2 +
include/linux/ftrace.h | 4 +
include/linux/kgraft.h | 90 ++++++++
include/linux/sched.h | 9 +
kernel/Kconfig.kgraft | 10 +
kernel/Makefile | 1 +
kernel/hung_task.c | 5 +-
kernel/kgraft.c | 430 +++++++++++++++++++++++++++++++++++++
kernel/kthread.c | 3 +
kernel/rcu/tree.c | 6 +-
kernel/rcu/tree_plugin.h | 10 +-
kernel/smpboot.c | 2 +
kernel/trace/ftrace.c | 30 +++
kernel/trace/trace.h | 2 -
kernel/workqueue.c | 3 +
mm/huge_memory.c | 1 +
net/bluetooth/rfcomm/core.c | 2 +
samples/Kconfig | 8 +
samples/Makefile | 3 +-
samples/kgraft/Makefile | 1 +
samples/kgraft/kgraft_patcher.c | 99 +++++++++
33 files changed, 864 insertions(+), 13 deletions(-)
create mode 100644 Documentation/kgraft.txt
create mode 100644 arch/x86/include/asm/kgraft.h
create mode 100644 include/linux/kgraft.h
create mode 100644 kernel/Kconfig.kgraft
create mode 100644 kernel/kgraft.c
create mode 100644 samples/kgraft/Makefile
create mode 100644 samples/kgraft/kgraft_patcher.c

--
2.0.0


2014-06-25 12:56:22

by Alan Cox

[permalink] [raw]
Subject: Re: [PATCH 00/21] kGraft

On Wed, 25 Jun 2014 13:05:29 +0200
Jiri Slaby <[email protected]> wrote:

> Hi,
>
> this is a repost of the second round of RFC on kGraft, the linux
> kernel online patching developed at SUSE. This repost only widened the
> target audience for broader review, no code change happened.

- wait_event_freezable(khubd_wait,
+ wait_event_freezable(khubd_wait,
({ kgr_task_safe(current);

The changes are somewhat ugly with all the kgraft crap leaking into plces
like jbd and freezer and usb. That says to me its not well isolated and
there must be a better way of hiding that kgr stuff so we don't have to
kepe putting more code into all the kthreads, and inevitably missing them
or have people getting it wrong.

You also wake up all the kthreads when some of them might not expect to
be woken and may not check for the case of a bogus wake.

You add instructions to various hotpaths (despite the CONFIG comment).
Have you done timing analysis of their impact when turned on ?

Can you explain a bit more about why the path you've chosen was used as
opposed to using the power management freeze/resume path. Would that not
be a lot less intrusive ?

Don't get me wrong - I'd like a good working ksplice/graft !

Alan

2014-06-25 15:54:54

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH 00/21] kGraft

On Wed, 25 Jun 2014, One Thousand Gnomes wrote:

> > this is a repost of the second round of RFC on kGraft, the linux
> > kernel online patching developed at SUSE. This repost only widened the
> > target audience for broader review, no code change happened.
>
> - wait_event_freezable(khubd_wait,
> + wait_event_freezable(khubd_wait,
> ({ kgr_task_safe(current);
>
> The changes are somewhat ugly with all the kgraft crap leaking into plces
> like jbd and freezer and usb. That says to me its not well isolated and
> there must be a better way of hiding that kgr stuff so we don't have to
> kepe putting more code into all the kthreads, and inevitably missing them
> or have people getting it wrong.

Tejun commented on this very issue during the first RFC submission a
couple weeks ago. Part of his proposal was actually to take this as a good
pretext to review the existing kernel threads, as the idea is that a big
portion of those could easily be converted to workqueues.

It of course has its own implications, such as not being able to
prioritize that easily, but there is probably a lot of low-hanging fruits
where driver authors and their grandmas have been creating kthreads where
workqueue would suffice.

But it's a very long term goal, and it's probably impractical to make
kGraft dependent on it.

> You add instructions to various hotpaths (despite the CONFIG comment).

Kernel entry is hidden behind _TIF_WORK_SYSCALL_ENTRY and
_TIF_ALLWORK_MASK, so it doesn't add any overhead if kGraft is not
enabled. What other hot paths do you refer to?

> Don't get me wrong - I'd like a good working ksplice/graft !

Thanks Alan,

--
Jiri Kosina
SUSE Labs

2014-06-26 06:18:50

by Vojtech Pavlik

[permalink] [raw]
Subject: Re: [PATCH 00/21] kGraft

On Wed, Jun 25, 2014 at 05:54:50PM +0200, Jiri Kosina wrote:
> > - wait_event_freezable(khubd_wait,
> > + wait_event_freezable(khubd_wait,
> > ({ kgr_task_safe(current);
> >
> > The changes are somewhat ugly with all the kgraft crap leaking into plces
> > like jbd and freezer and usb. That says to me its not well isolated and
> > there must be a better way of hiding that kgr stuff so we don't have to
> > kepe putting more code into all the kthreads, and inevitably missing them
> > or have people getting it wrong.
>
> Tejun commented on this very issue during the first RFC submission a
> couple weeks ago. Part of his proposal was actually to take this as a good
> pretext to review the existing kernel threads, as the idea is that a big
> portion of those could easily be converted to workqueues.

In the past few years there was a significant proliferation of kernel
threads just for cases where something needs to be done from a process
context now and then.

This is underlined by the fact that on an average installation there are
more kernel threads than real userspace processes.

Converting these bits to workqueues would then also take care of their
interaction with freezer, kthread_stop. The main loop code wouldn't have
to be replicated over and over with slight variations. kgr_taks_safe
would then plug into that rather elegantly.

In the absence of this rework, kGraft hijacks the freezer and
kthread_stop to pinpoint the 'end' of the main loop of most kernel
threads and annotates the rest that doesn't handle freezing or stopping
with kgr_task_safe directly.


> It of course has its own implications, such as not being able to
> prioritize that easily, but there is probably a lot of low-hanging fruits
> where driver authors and their grandmas have been creating kthreads where
> workqueue would suffice.

Indeed, on the other hand, there are enough workqueues today and on
realtime systems the need for prioritizing them exists already.

> But it's a very long term goal, and it's probably impractical to make
> kGraft dependent on it.

Should the kernel thread changes turn to be a big issue, we could do the
initial submission without them and depend on the kthread->workqueue
rework. Once we add support for multiple patches in progress, the fact
that we don't support kernel threads would not be as painful.

> > You add instructions to various hotpaths (despite the CONFIG
> > comment).
>
> Kernel entry is hidden behind _TIF_WORK_SYSCALL_ENTRY and
> _TIF_ALLWORK_MASK, so it doesn't add any overhead if kGraft is not
> enabled. What other hot paths do you refer to?

As Ji?? says, I don't see any hot path where kGraft would add
instructions: Kernel exit and entry are handled outside the hot path
in the _TIF_WORK_SYSCALL_ENTRY and _TIF_ALLWORK_MASK handlers. [patch 15/21]

It adds instructions into the main loops of kernel threads, but those
can hardly be called hot paths as they mostly sleep for long times.

--
Vojtech Pavlik
Director SUSE Labs

2014-07-02 12:12:23

by Jiri Slaby

[permalink] [raw]
Subject: kGraft to -next [was: 00/21 kGraft]

On 06/25/2014 01:05 PM, Jiri Slaby wrote:
> Hi,
>
> this is a repost of the second round of RFC on kGraft, the linux
> kernel online patching developed at SUSE. This repost only widened the
> target audience for broader review, no code change happened.
>
> Please speak up now (or be silent till the next merge window). That
> is, if there are no objections, we plan pushing the tree into -next
> and asking Linus in the next merge window for comments.
>
> The patches are posted as a reply to this email and can be also
> obtained as a whole tree from:
> https://git.kernel.org/cgit/linux/kernel/git/jirislaby/kgraft.git/log/?h=kgraft

Stephen,

may I ask you to add the kGraft tree to -next?

git://git.kernel.org/pub/scm/linux/kernel/git/jirislaby/kgraft.git#kgraft

Thank you.

Documentation/kgraft.txt | 44 ++++
MAINTAINERS | 9 +
arch/x86/Kconfig | 2 +
arch/x86/include/asm/kgraft.h | 61 +++++
arch/x86/include/asm/thread_info.h | 6 +-
arch/x86/kernel/entry_64.S | 9 +
drivers/base/devtmpfs.c | 1 +
drivers/scsi/scsi_error.c | 2 +
drivers/usb/core/hub.c | 4 +-
fs/jbd2/journal.c | 2 +
fs/notify/mark.c | 5 +-
fs/proc/base.c | 11 +
include/linux/freezer.h | 2 +
include/linux/ftrace.h | 4 +
include/linux/kgraft.h | 130 ++++++++++
include/linux/sched.h | 9 +
kernel/Kconfig.kgraft | 12 +
kernel/Makefile | 1 +
kernel/hung_task.c | 5 +-
kernel/kgraft.c | 477 ++++++++++++++++++++++++++++++
kernel/kgraft_files.c | 150 ++++++++++++
kernel/kthread.c | 3 +
kernel/rcu/tree.c | 6 +-
kernel/rcu/tree_plugin.h | 10 +-
kernel/smpboot.c | 2 +
kernel/trace/ftrace.c | 30 +++
kernel/trace/trace.h | 2 -
kernel/workqueue.c | 3 +
mm/huge_memory.c | 6 +-
net/bluetooth/rfcomm/core.c | 2 +
samples/Kconfig | 8 +
samples/Makefile | 3 +-
samples/kgraft/Makefile | 1 +
samples/kgraft/kgraft_patcher.c | 97 ++++++++
34 files changed, 1104 insertions(+), 15 deletions(-)
create mode 100644 Documentation/kgraft.txt
create mode 100644 arch/x86/include/asm/kgraft.h
create mode 100644 include/linux/kgraft.h
create mode 100644 kernel/Kconfig.kgraft
create mode 100644 kernel/kgraft.c
create mode 100644 kernel/kgraft_files.c
create mode 100644 samples/kgraft/Makefile
create mode 100644 samples/kgraft/kgraft_patcher.c

--
js
suse labs

2014-07-02 12:30:08

by Tejun Heo

[permalink] [raw]
Subject: Re: kGraft to -next [was: 00/21 kGraft]

Hello,

On Wed, Jul 02, 2014 at 02:04:38PM +0200, Jiri Slaby wrote:
> On 06/25/2014 01:05 PM, Jiri Slaby wrote:
...
> > https://git.kernel.org/cgit/linux/kernel/git/jirislaby/kgraft.git/log/?h=kgraft
>
> Stephen,
>
> may I ask you to add the kGraft tree to -next?
>
> git://git.kernel.org/pub/scm/linux/kernel/git/jirislaby/kgraft.git#kgraft

Do we have consensus on the approach? I personally really don't like
the fact that it's adding another aspect to kthread management which
is difficult to get right and nearly impossible to verify
automatically.

IIUC, there are three similar solutions. What are the pros and cons
of each? Can we combine the different approaches?

Thanks.

--
tejun

2014-07-02 12:48:05

by Steven Rostedt

[permalink] [raw]
Subject: Re: kGraft to -next [was: 00/21 kGraft]

On Wed, 2 Jul 2014 08:30:02 -0400
Tejun Heo <[email protected]> wrote:

> Hello,
>
> On Wed, Jul 02, 2014 at 02:04:38PM +0200, Jiri Slaby wrote:
> > On 06/25/2014 01:05 PM, Jiri Slaby wrote:
> ...
> > > https://git.kernel.org/cgit/linux/kernel/git/jirislaby/kgraft.git/log/?h=kgraft
> >
> > Stephen,
> >
> > may I ask you to add the kGraft tree to -next?
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/jirislaby/kgraft.git#kgraft
>
> Do we have consensus on the approach? I personally really don't like
> the fact that it's adding another aspect to kthread management which
> is difficult to get right and nearly impossible to verify
> automatically.
>
> IIUC, there are three similar solutions. What are the pros and cons
> of each? Can we combine the different approaches?

Has this been agreed on to be accepted yet? I don't believe so.

linux-next is for code that will be going into the next release of the
kernel. Not for developmental code or code that is still being
discussed.

-- Steve

2014-07-02 12:49:16

by Alan Cox

[permalink] [raw]
Subject: Re: kGraft to -next [was: 00/21 kGraft]

On Wed, 2 Jul 2014 08:30:02 -0400
Tejun Heo <[email protected]> wrote:

> Hello,
>
> On Wed, Jul 02, 2014 at 02:04:38PM +0200, Jiri Slaby wrote:
> > On 06/25/2014 01:05 PM, Jiri Slaby wrote:
> ...
> > > https://git.kernel.org/cgit/linux/kernel/git/jirislaby/kgraft.git/log/?h=kgraft
> >
> > Stephen,
> >
> > may I ask you to add the kGraft tree to -next?
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/jirislaby/kgraft.git#kgraft
>
> Do we have consensus on the approach? I personally really don't like
> the fact that it's adding another aspect to kthread management which
> is difficult to get right and nearly impossible to verify
> automatically.

Nor me. I don't see why it can't use the kernel freeze functionality ?

Alan

2014-07-02 13:01:21

by Jiri Kosina

[permalink] [raw]
Subject: Re: kGraft to -next [was: 00/21 kGraft]

On Wed, 2 Jul 2014, One Thousand Gnomes wrote:

> > > > https://git.kernel.org/cgit/linux/kernel/git/jirislaby/kgraft.git/log/?h=kgraft
> > >
> > > Stephen,
> > >
> > > may I ask you to add the kGraft tree to -next?
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/jirislaby/kgraft.git#kgraft
> >
> > Do we have consensus on the approach? I personally really don't like
> > the fact that it's adding another aspect to kthread management which
> > is difficult to get right and nearly impossible to verify
> > automatically.
>
> Nor me. I don't see why it can't use the kernel freeze functionality ?

Well, we are, see the patch no. 9. It all boils down basically to

[ ... ]
diff --git a/include/linux/freezer.h b/include/linux/freezer.h
index 7fd81b8c4897..e08c3bef251b 100644
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -61,6 +61,8 @@ static inline bool try_to_freeze_unsafe(void)

static inline bool try_to_freeze(void)
{
+ kgr_task_safe(current);
+
if (!(current->flags & PF_NOFREEZE))
debug_check_no_locks_held();
return try_to_freeze_unsafe();
[ ... ]


Then there are non-freezable kernel threads. We are currently handling
those explicitly, but I agree it can be argued that they might not require
any special handling at all, given they claimed themselves explicitly
non-freezable.

--
Jiri Kosina
SUSE Labs

2014-07-02 13:30:59

by Tejun Heo

[permalink] [raw]
Subject: Re: kGraft to -next [was: 00/21 kGraft]

Hello,

On Wed, Jul 02, 2014 at 03:01:16PM +0200, Jiri Kosina wrote:
> static inline bool try_to_freeze(void)
> {
> + kgr_task_safe(current);
> +
> if (!(current->flags & PF_NOFREEZE))
> debug_check_no_locks_held();
> return try_to_freeze_unsafe();

Heh, I'm totally confused now. Why is this correct? What guarantees
that context is not carried across try_to_freeze()?

Thanks.

--
tejun

2014-07-02 16:29:13

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: kGraft to -next [was: 00/21 kGraft]

On Wed, Jul 02, 2014 at 08:30:02AM -0400, Tejun Heo wrote:
> Hello,
>
> On Wed, Jul 02, 2014 at 02:04:38PM +0200, Jiri Slaby wrote:
> > On 06/25/2014 01:05 PM, Jiri Slaby wrote:
> ...
> > > https://git.kernel.org/cgit/linux/kernel/git/jirislaby/kgraft.git/log/?h=kgraft
> >
> > Stephen,
> >
> > may I ask you to add the kGraft tree to -next?
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/jirislaby/kgraft.git#kgraft
>
> Do we have consensus on the approach? I personally really don't like
> the fact that it's adding another aspect to kthread management which
> is difficult to get right and nearly impossible to verify
> automatically.
>
> IIUC, there are three similar solutions. What are the pros and cons
> of each? Can we combine the different approaches?

Please don't forget about kpatch. The most recent version was posted
here:

https://lkml.org/lkml/2014/5/1/273

We've made a ton of improvements since then, so I'll probably post a new
patch set soon.

kpatch advantages:

* 100% self-contained in its own module [1]

* Doesn't rely on changing all the kthreads

* Patch is applied atomically using stop_machine(), so it's safer with
respect to data semantic changes

* Patching atomically also makes it much easier to analyze a patch to
determine whether it's safe for live patching

* Already supports many advanced features which kGraft is lacking:
- patched functions can access non-exported symbols, e.g. static
variables
- safe unpatching
- module patching (and deferred module patching)
- atomic patch replacement
- supports atomic load/unload user hook functions
- proper duplicate symbol handling
- address verification sanity checks
- sophisticated user space tools for analyzing and converting source
patches to binary patch modules
- ability to properly deal with many special sections (__bug_table,
.data..percpu, etc)

kpatch disadvantages:

* Can't yet patch functions which are always active (schedule(),
sys_poll(), etc). We're currently working on ways to overcome this
limitation. One way is to allow the user to skip the backtrace check
for those patches which don't change data semantics (which, for
security fixes, should be most patches). We also have some other
ideas brewing...

* stop_machine() latency. We've found that stop_machine() is still
pretty fast. IIRC, we measured ~1ms on an idle system and ~40ms on a
heavily loaded 16 CPU system.

* Currently we don't freeze kernel threads. Instead we just put them to
sleep. We _could_ freeze them, but I think it needs more discussion.
It's definitely not a cure-all because you still have to worry about
user threads.

With our current approach, when analyzing whether patches are safe to
apply live, we assume that all kernel and user threads will be asleep.
We make no assumptions that kernel threads will be frozen. In general
we avoid changing data and data semantics as much as possible, so it
shouldn't matter in most cases. Personally I haven't yet run into a
case where freezing kernel threads would have made a patch become
"safe".


[1] https://github.com/dynup/kpatch/blob/master/kmod/core/core.c

--
Josh

2014-07-03 00:26:21

by Stephen Rothwell

[permalink] [raw]
Subject: Re: kGraft to -next [was: 00/21 kGraft]

Hi Jiri,

On Wed, 02 Jul 2014 14:04:38 +0200 Jiri Slaby <[email protected]> wrote:
>
> may I ask you to add the kGraft tree to -next?
>
> git://git.kernel.org/pub/scm/linux/kernel/git/jirislaby/kgraft.git#kgraft

Given the ongoing discussion, I think this needs to wait ...

--
Cheers,
Stephen Rothwell [email protected]


Attachments:
signature.asc (819.00 B)

2014-07-05 20:04:57

by Jiri Kosina

[permalink] [raw]
Subject: Re: kGraft to -next [was: 00/21 kGraft]

On Wed, 2 Jul 2014, Tejun Heo wrote:

> > static inline bool try_to_freeze(void)
> > {
> > + kgr_task_safe(current);
> > +
> > if (!(current->flags & PF_NOFREEZE))
> > debug_check_no_locks_held();
> > return try_to_freeze_unsafe();
>
> Heh, I'm totally confused now. Why is this correct? What guarantees
> that context is not carried across try_to_freeze()?

I think we need to take a step back now, and ask ourselves a question
"What is the actual goal here?".

What we need is to have a defined point in execution where we can draw a
line between "old" and "new" universes. For processess that are crossing
the userspace/kernelspace boundary, the obvious choice, that covers most
of the use-cases, has been made. There are still scenarios where this
aproach can't be just-blindly-applied(TM) for various reasons (changing
lock order might cause deadlocks, there are cases where state is lingering
between two user <-> kernel transitions, etc). So we'll need to provide
guidelines for kGraft patch writers anyway.

The same holds for the kernel threads -- until all (or most of) the
kthreads are converted to workqueues, the obivous choice, that should
cover most of the use-cases, has been made.

But manual/human inspection is absolutely unavoidably necessary in any
case.

Please keep in mind that this is designed for fixes that need immediate
response (getting bounds checking right, adding an extra check, adding a
missing lock, etc -- please see my previous mail on this topic in the old
thread). It's absolutely by design not intended for implementing whole new
features or exchanging the whole kernel on the fly; there are other
solutions for that (such as the criu-based thing). As such, we tend to
interfere with the rest of the kernel as little as possible, but it
inadverently brings drawbacks in the form of putting burden of more work
to the actual kGraft patch writers. I don't see that as a bad thing.

Thanks,

--
Jiri Kosina
SUSE Labs

2014-07-05 20:36:13

by Tejun Heo

[permalink] [raw]
Subject: Re: kGraft to -next [was: 00/21 kGraft]

Hello,

On Sat, Jul 05, 2014 at 10:04:52PM +0200, Jiri Kosina wrote:
> What we need is to have a defined point in execution where we can draw a
> line between "old" and "new" universes. For processess that are crossing
> the userspace/kernelspace boundary, the obvious choice, that covers most
> of the use-cases, has been made. There are still scenarios where this
> aproach can't be just-blindly-applied(TM) for various reasons (changing
> lock order might cause deadlocks, there are cases where state is lingering

Sure, if something breaks work due to semantic differences, there
isn't anything we can do.

> between two user <-> kernel transitions, etc). So we'll need to provide
> guidelines for kGraft patch writers anyway.
>
> The same holds for the kernel threads -- until all (or most of) the
> kthreads are converted to workqueues, the obivous choice, that should
> cover most of the use-cases, has been made.

But, this is different. This is the mechanism itself being flaky and
buggy. This can make things fail not because the patch itself is
semantically wrong but because the mechanism stopped the kernel at the
wrong place. The mechanism is simply broken and it might as well
declare that patching whatever kthreads are running isn't supported.

> But manual/human inspection is absolutely unavoidably necessary in any
> case.

This is extremely tricky to inspect and likely to just work in most,
but not all, cases when it goes wrong just out of sheer luck.

> Please keep in mind that this is designed for fixes that need immediate
> response (getting bounds checking right, adding an extra check, adding a
> missing lock, etc -- please see my previous mail on this topic in the old
> thread). It's absolutely by design not intended for implementing whole new
> features or exchanging the whole kernel on the fly; there are other
> solutions for that (such as the criu-based thing). As such, we tend to
> interfere with the rest of the kernel as little as possible, but it
> inadverently brings drawbacks in the form of putting burden of more work
> to the actual kGraft patch writers. I don't see that as a bad thing.

I'm not saying this must be able to do everything but it shouldn't be
voodoo either. Freezer points *can* coincide with what kgraft
requires but it may not too. Why claim that freezing points are safe
as stopping points when they aren't? That doesn't make any sense to
me. If kthread can't be safely supported now, that's fine but then
make it clear that functions which may be executed by kthreads aren't
supported.

Thanks.

--
tejun

2014-07-05 20:49:23

by Jiri Kosina

[permalink] [raw]
Subject: Re: kGraft to -next [was: 00/21 kGraft]

On Sat, 5 Jul 2014, Tejun Heo wrote:

> > What we need is to have a defined point in execution where we can draw a
> > line between "old" and "new" universes. For processess that are crossing
> > the userspace/kernelspace boundary, the obvious choice, that covers most
> > of the use-cases, has been made. There are still scenarios where this
> > aproach can't be just-blindly-applied(TM) for various reasons (changing
> > lock order might cause deadlocks, there are cases where state is lingering
>
> Sure, if something breaks work due to semantic differences, there
> isn't anything we can do.
[ ... ]
> > The same holds for the kernel threads -- until all (or most of) the
> > kthreads are converted to workqueues, the obivous choice, that should
> > cover most of the use-cases, has been made.
>
> But, this is different.

Quite frankly, I have to say that I don't personally see *that* big
difference. In both cases we are making assumptions about semantics, and
are trying to get "as close as possible" to the point in time where the
set of assumptions can be made as minimal as possible.

With userspace thread, this is kernel/userspace boundary. With kthread,
this is starting of new iteration of the main loop / try_to_freeze().

We were not able to come up with anything better. Alan, you were
stating "I don't see why it can't use the kernel freeze functionality?".
What exactly was your suggestion, if not this, please?

> This is the mechanism itself being flaky and buggy. This can make
> things fail not because the patch itself is semantically wrong but
> because the mechanism stopped the kernel at the wrong place.

Well, to be precise, we are not "stopping" the kernel at all.

> If kthread can't be safely supported now, that's fine but then make it
> clear that functions which may be executed by kthreads aren't supported.

So one of the aproaches implied by this would be first merging a light
version of kGraft which doesn't support kthreads, and working towards a
solution for kthreads as well (which might be tangential to kGraft; if
most of the kthreads get converted to workqueues, it's a win-win
situation anyway); would such kGraft-light get your Ack then? :)

Thanks,

--
Jiri Kosina
SUSE Labs

2014-07-05 21:04:10

by Tejun Heo

[permalink] [raw]
Subject: Re: kGraft to -next [was: 00/21 kGraft]

Hello,

On Sat, Jul 05, 2014 at 10:49:18PM +0200, Jiri Kosina wrote:
> Quite frankly, I have to say that I don't personally see *that* big
> difference. In both cases we are making assumptions about semantics, and
> are trying to get "as close as possible" to the point in time where the
> set of assumptions can be made as minimal as possible.
>
> With userspace thread, this is kernel/userspace boundary. With kthread,
> this is starting of new iteration of the main loop / try_to_freeze().

This is really different. With kernel/userspace boundary, if the
patch is correct, the mechanism, sans bugs, should be able to
guarantee that the patched result is correct. With kthreads, it
can't. The boundary between the old and new worlds might or might not
be correct. How can they be the same?

The fact that they may coincide often can be useful as a guideline or
whatever but I'm completely against just mushing it together when it
isn't correct. This kind of things quickly lead to ambiguous
situations where people are not sure about the specific semantics or
guarantees of the construct and implement weird voodoo code followed
by voodoo fixes. We already had a full round of that with the kernel
freezer itself, where people thought that the freezer magically makes
PM work properly for a subsystem. Let's please not do that again.

> > This is the mechanism itself being flaky and buggy. This can make
> > things fail not because the patch itself is semantically wrong but
> > because the mechanism stopped the kernel at the wrong place.
>
> Well, to be precise, we are not "stopping" the kernel at all.

Sure, whatever the term is, this is the boundary that the mechanism
considers it safe to swap the code, right? User/kernel boundary
should be able to serve that purpose. Freezing points can't.

> > If kthread can't be safely supported now, that's fine but then make it
> > clear that functions which may be executed by kthreads aren't supported.
>
> So one of the aproaches implied by this would be first merging a light
> version of kGraft which doesn't support kthreads, and working towards a
> solution for kthreads as well (which might be tangential to kGraft; if
> most of the kthreads get converted to workqueues, it's a win-win
> situation anyway); would such kGraft-light get your Ack then? :)

Yes, I think that converting things over to workqueue or
kthread_worker is generally a good idea but I'm not sure I'm in the
position to ack or nack kgraft as a whole. I am not too sure about
the capability itself (neither positive or negative) and it'd take
quite a bit of energy to scrutinize and compare all the alternatives.
It'd be awesome if people who are working on the features can talk to
each other and see whether things can be combined.

Thanks.

--
tejun

2014-07-05 21:06:32

by Jiri Kosina

[permalink] [raw]
Subject: Re: kGraft to -next [was: 00/21 kGraft]

On Sat, 5 Jul 2014, Tejun Heo wrote:

> It'd be awesome if people who are working on the features can talk to
> each other and see whether things can be combined.

Oh, I absolutely agree; trust me, we are trying to get as much discussion
going on as possible :) I proposed it as a Kernel Summit topic, and
hopefully work will be done at the mini-summit proposed by Steven at
Plumbers as well.

--
Jiri Kosina
SUSE Labs

2014-07-05 21:08:59

by Tejun Heo

[permalink] [raw]
Subject: Re: kGraft to -next [was: 00/21 kGraft]

On Sat, Jul 05, 2014 at 11:06:28PM +0200, Jiri Kosina wrote:
> On Sat, 5 Jul 2014, Tejun Heo wrote:
>
> > It'd be awesome if people who are working on the features can talk to
> > each other and see whether things can be combined.
>
> Oh, I absolutely agree; trust me, we are trying to get as much discussion
> going on as possible :) I proposed it as a Kernel Summit topic, and
> hopefully work will be done at the mini-summit proposed by Steven at
> Plumbers as well.

Ah, that's awesome to hear. :)

--
tejun

2014-07-29 14:05:39

by Jiri Kosina

[permalink] [raw]
Subject: Re: kGraft to -next [was: 00/21 kGraft]

On Sat, 5 Jul 2014, Jiri Kosina wrote:

> > > The same holds for the kernel threads -- until all (or most of) the
> > > kthreads are converted to workqueues, the obivous choice, that should
> > > cover most of the use-cases, has been made.
> >
> > But, this is different.
>
> Quite frankly, I have to say that I don't personally see *that* big
> difference. In both cases we are making assumptions about semantics, and
> are trying to get "as close as possible" to the point in time where the
> set of assumptions can be made as minimal as possible.
>
> With userspace thread, this is kernel/userspace boundary. With kthread,
> this is starting of new iteration of the main loop / try_to_freeze().
>
> We were not able to come up with anything better. Alan, you were
> stating "I don't see why it can't use the kernel freeze functionality?".
> What exactly was your suggestion, if not this, please?

Seems like there was no further activity on this wile I was on vacation,
so let me resurrect this.

Alan, either we are already doing what you suggested, or I misunderstood
your proposal above. Could you please elaborate?

Thanks,

--
Jiri Kosina
SUSE Labs