2020-05-01 16:02:32

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 0/4] allow multiple kthreadd's

From: "J. Bruce Fields" <[email protected]>

These patches allow a caller to create its own kthreadd.

The motivation is file delegations: currently any write operation from a
client breaks all delegations, even delegations held by the same client.

To fix that, we need to know which client is performing a given
operation.

So, we let nfsd put all the nfsd threads into the same thread group (by
spawning them from its own private kthreadd), then patch the delegation
code to treat delegation breaks from the same thread group as not
conflicting, and then leave it to nfsd to sort out conflicts among its
own clients. Those patches are in:

git://linux-nfs.org/~bfields/linux.git deleg-fix-self-conflicts

This was an idea from Trond. Part of his motivation was that it could
work for userspace servers (like Ganesha and Samba) as well. (We don't
currently let them request delegations, but probably will some day--it
shouldn't be difficult.)

Previously I considered instead adding a new field somewhere in the
struct task. That might require a new system call to expose to user
space. Or we might be able to put this in a keyring, if David Howells
thought that would work.

Before that I tried passing the identity of the breaker explicitly, but
that looks like it would require passing the new argument around to huge
swaths of the VFS.

Anyway, does this multiple kthreadd approach look reasonable?

(If so, who should handle the patches?)

--b.

J. Bruce Fields (4):
kthreads: minor kthreadd refactoring
kthreads: Simplify tsk_fork_get_node
kthreads: allow multiple kthreadd's
kthreads: allow cloning threads with different flags

include/linux/kthread.h | 21 +++++-
init/init_task.c | 3 +
init/main.c | 4 +-
kernel/fork.c | 4 ++
kernel/kthread.c | 140 +++++++++++++++++++++++++++++-----------
5 files changed, 132 insertions(+), 40 deletions(-)

--
2.26.2


2020-05-01 16:02:32

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 1/4] kthreads: minor kthreadd refactoring

From: "J. Bruce Fields" <[email protected]>

Trivial refactoring, no change in behavior.

Not really necessary, a separate function for the inner loop just seems
a little nicer to me.

Signed-off-by: J. Bruce Fields <[email protected]>
---
kernel/kthread.c | 33 +++++++++++++++++++--------------
1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index bfbfa481be3a..4217fded891a 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -578,6 +578,24 @@ int kthread_stop(struct task_struct *k)
}
EXPORT_SYMBOL(kthread_stop);

+void kthread_do_work(void)
+{
+ spin_lock(&kthread_create_lock);
+ while (!list_empty(&kthread_create_list)) {
+ struct kthread_create_info *create;
+
+ create = list_entry(kthread_create_list.next,
+ struct kthread_create_info, list);
+ list_del_init(&create->list);
+ spin_unlock(&kthread_create_lock);
+
+ create_kthread(create);
+
+ spin_lock(&kthread_create_lock);
+ }
+ spin_unlock(&kthread_create_lock);
+}
+
int kthreadd(void *unused)
{
struct task_struct *tsk = current;
@@ -597,20 +615,7 @@ int kthreadd(void *unused)
schedule();
__set_current_state(TASK_RUNNING);

- spin_lock(&kthread_create_lock);
- while (!list_empty(&kthread_create_list)) {
- struct kthread_create_info *create;
-
- create = list_entry(kthread_create_list.next,
- struct kthread_create_info, list);
- list_del_init(&create->list);
- spin_unlock(&kthread_create_lock);
-
- create_kthread(create);
-
- spin_lock(&kthread_create_lock);
- }
- spin_unlock(&kthread_create_lock);
+ kthread_do_work();
}

return 0;
--
2.26.2

2020-05-01 16:02:32

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 2/4] kthreads: Simplify tsk_fork_get_node

From: "J. Bruce Fields" <[email protected]>

This will also simplify a following patch that allows multiple
kthreadd's.

Signed-off-by: J. Bruce Fields <[email protected]>
---
init/init_task.c | 3 +++
kernel/fork.c | 4 ++++
kernel/kthread.c | 3 +--
3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/init/init_task.c b/init/init_task.c
index bd403ed3e418..fdd760393760 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -154,6 +154,9 @@ struct task_struct init_task
.vtime.starttime = 0,
.vtime.state = VTIME_SYS,
#endif
+#ifdef CONFIG_NUMA
+ .pref_node_fork = NUMA_NO_NODE,
+#endif
#ifdef CONFIG_NUMA_BALANCING
.numa_preferred_nid = NUMA_NO_NODE,
.numa_group = NULL,
diff --git a/kernel/fork.c b/kernel/fork.c
index 8c700f881d92..fa35890534d5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -942,6 +942,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
tsk->fail_nth = 0;
#endif

+#ifdef CONFIG_NUMA
+ tsk->pref_node_fork = NUMA_NO_NODE;
+#endif
+
#ifdef CONFIG_BLK_CGROUP
tsk->throttle_queue = NULL;
tsk->use_memdelay = 0;
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 4217fded891a..483bee57a9c8 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -274,8 +274,7 @@ static int kthread(void *_create)
int tsk_fork_get_node(struct task_struct *tsk)
{
#ifdef CONFIG_NUMA
- if (tsk == kthreadd_task)
- return tsk->pref_node_fork;
+ return tsk->pref_node_fork;
#endif
return NUMA_NO_NODE;
}
--
2.26.2

2020-05-01 16:02:32

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 4/4] kthreads: allow cloning threads with different flags

From: "J. Bruce Fields" <[email protected]>

This is so knfsd can add CLONE_THREAD.

Signed-off-by: J. Bruce Fields <[email protected]>
---
include/linux/kthread.h | 3 ++-
kernel/kthread.c | 11 +++++++----
2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index a7ffdf96a3b2..7069feb6da65 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -10,11 +10,12 @@ struct kthread_group {
spinlock_t create_lock;
struct list_head create_list;
struct task_struct *task;
+ unsigned long flags;
};

extern struct kthread_group kthreadd_default;

-struct kthread_group *kthread_start_group(char *);
+struct kthread_group *kthread_start_group(unsigned long, char *);
void kthread_stop_group(struct kthread_group *);

struct task_struct *kthread_group_create_on_node(struct kthread_group *,
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 5232f6f597b5..57f6687ecec7 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -29,6 +29,7 @@ struct kthread_group kthreadd_default = {
.name = "kthreadd",
.create_lock = __SPIN_LOCK_UNLOCKED(kthreadd_default.create_lock),
.create_list = LIST_HEAD_INIT(kthreadd_default.create_list),
+ .flags = CLONE_FS | CLONE_FILES | SIGCHLD,
};

void wake_kthreadd(struct kthread_group *kg)
@@ -36,7 +37,7 @@ void wake_kthreadd(struct kthread_group *kg)
wake_up_process(kg->task);
}

-struct kthread_group *kthread_start_group(char *name)
+struct kthread_group *kthread_start_group(unsigned long flags, char *name)
{
struct kthread_group *new;
struct task_struct *task;
@@ -47,6 +48,7 @@ struct kthread_group *kthread_start_group(char *name)
spin_lock_init(&new->create_lock);
INIT_LIST_HEAD(&new->create_list);
new->name = name;
+ new->flags = flags;
task = kthread_run(kthreadd, new, name);
if (IS_ERR(task)) {
kfree(new);
@@ -314,7 +316,8 @@ int tsk_fork_get_node(struct task_struct *tsk)
return NUMA_NO_NODE;
}

-static void create_kthread(struct kthread_create_info *create)
+static void create_kthread(struct kthread_create_info *create,
+ unsigned long flags)
{
int pid;

@@ -322,7 +325,7 @@ static void create_kthread(struct kthread_create_info *create)
current->pref_node_fork = create->node;
#endif
/* We want our own signal handler (we take no signals by default). */
- pid = kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD);
+ pid = kernel_thread(kthread, create, flags);
if (pid < 0) {
/* If user was SIGKILLed, I release the structure. */
struct completion *done = xchg(&create->done, NULL);
@@ -645,7 +648,7 @@ void kthread_do_work(struct kthread_group *kg)
list_del_init(&create->list);
spin_unlock(&kg->create_lock);

- create_kthread(create);
+ create_kthread(create, kg->flags);

spin_lock(&kg->create_lock);
}
--
2.26.2

2020-05-01 18:00:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/4] allow multiple kthreadd's

On Fri, May 1, 2020 at 9:02 AM J. Bruce Fields <[email protected]> wrote:
>
> Anyway, does this multiple kthreadd approach look reasonable?

I don't see anything that looks alarming.

My main reaction was that I don't like the "kthreadd" name, but that's
because for some reason I always read it as "kthre add". That may be
just me. It normally doesn't bother me (this code doesn't get all that
much work on it, it's been very stable), but it was very obvious when
reading your patches.

In fact, I liked _your_ naming better, to the point where I was going
"'kthread_group' is a much better name than 'kthreadd', and that
'kthreadd()' function would read better as 'kthread_group_run()' or
something".

But that may just be a personal quirk of mine, and isn't a big deal.
On the whole the patches looked all sane to me.

> (If so, who should handle the patches?)

We have had _very_ little work in this area, probably because most of
the kthread work has been subsumed by workqueues.

Which kind of makes me want to point a finger at Tejun. But it's been
mostly PeterZ touching this file lately..

Linus

2020-05-01 18:22:49

by [email protected]

[permalink] [raw]
Subject: Re: [PATCH 0/4] allow multiple kthreadd's

Hello,

On Fri, May 01, 2020 at 10:59:24AM -0700, Linus Torvalds wrote:
> Which kind of makes me want to point a finger at Tejun. But it's been
> mostly PeterZ touching this file lately..

Looks fine to me too. I don't quite understand the usecase tho. It looks
like all it's being used for is to tag some kthreads as belonging to the
same group. Can't that be done with kthread_data()?

Thanks.

--
tejun

2020-05-01 18:32:01

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/4] allow multiple kthreadd's

On Fri, May 1, 2020 at 11:22 AM Tejun Heo <[email protected]> wrote:
>
> Looks fine to me too. I don't quite understand the usecase tho. It looks
> like all it's being used for is to tag some kthreads as belonging to the
> same group. Can't that be done with kthread_data()?

I _think_ Bruce wants the signal handling unification too, because
nfsd wants to react to being shut down with signals.

But you're right, more explanation of why nfsd wants/needs a separate
grouping is a good idea.

Linus

2020-05-01 18:50:14

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 0/4] allow multiple kthreadd's

On Fri, May 01, 2020 at 02:21:54PM -0400, Tejun Heo wrote:
> Hello,
>
> On Fri, May 01, 2020 at 10:59:24AM -0700, Linus Torvalds wrote:
> > Which kind of makes me want to point a finger at Tejun. But it's been
> > mostly PeterZ touching this file lately..
>
> Looks fine to me too. I don't quite understand the usecase tho. It looks
> like all it's being used for is to tag some kthreads as belonging to the
> same group.

Pretty much.

> Can't that be done with kthread_data()?

Huh, maybe so, thanks.

I need to check this from generic file locking code that could be run by
any task--but I assume there's an easy way I can check if I'm a kthread
before calling kthread_data(current).

I do expect to expose a delegation interface for userspace servers
eventually too. But we could do the tgid check for them and still use
kthread_data() for nfsd. That might work.

--b.

2020-05-01 19:06:20

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 0/4] allow multiple kthreadd's

On Fri, 2020-05-01 at 14:49 -0400, J. Bruce Fields wrote:
> On Fri, May 01, 2020 at 02:21:54PM -0400, Tejun Heo wrote:
> > Hello,
> >
> > On Fri, May 01, 2020 at 10:59:24AM -0700, Linus Torvalds wrote:
> > > Which kind of makes me want to point a finger at Tejun. But it's
> > > been
> > > mostly PeterZ touching this file lately..
> >
> > Looks fine to me too. I don't quite understand the usecase tho. It
> > looks
> > like all it's being used for is to tag some kthreads as belonging
> > to the
> > same group.
>
> Pretty much.
>

Wen running an instance of knfsd from inside a container, you want to
be able to have the knfsd kthreads be parented to the container init
process so that they get killed off when the container is killed.

Right now, we can easily leak those kernel threads simply by killing
the container.

> > Can't that be done with kthread_data()?
>
> Huh, maybe so, thanks.
>
> I need to check this from generic file locking code that could be run
> by
> any task--but I assume there's an easy way I can check if I'm a
> kthread
> before calling kthread_data(current).
>
> I do expect to expose a delegation interface for userspace servers
> eventually too. But we could do the tgid check for them and still
> use
> kthread_data() for nfsd. That might work.
>
> --b.
>
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2020-05-01 19:21:33

by [email protected]

[permalink] [raw]
Subject: Re: [PATCH 0/4] allow multiple kthreadd's

Hello,

On Fri, May 01, 2020 at 07:05:46PM +0000, Trond Myklebust wrote:
> Wen running an instance of knfsd from inside a container, you want to
> be able to have the knfsd kthreads be parented to the container init
> process so that they get killed off when the container is killed.
>
> Right now, we can easily leak those kernel threads simply by killing
> the container.

Hmm... I'm not sure that'd be a lot easier because they're in their own
thread group. It's not like you can queue signal to the group leader cause
the other kthreads to automatically die. Each would have to handle the exit
explicitly. As long as there is a way to iterate the member kthreads (e.g.
list going through kthread_data or sth else hanging off there), just using
existing kthread interface might not be much different, or maybe even easier
given how hairy signal handling in kthreads can get.

Thanks.

--
tejun

2020-05-01 19:23:31

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 0/4] allow multiple kthreadd's

On Fri, May 01, 2020 at 07:05:46PM +0000, Trond Myklebust wrote:
> On Fri, 2020-05-01 at 14:49 -0400, J. Bruce Fields wrote:
> > On Fri, May 01, 2020 at 02:21:54PM -0400, Tejun Heo wrote:
> > > Hello,
> > >
> > > On Fri, May 01, 2020 at 10:59:24AM -0700, Linus Torvalds wrote:
> > > > Which kind of makes me want to point a finger at Tejun. But it's
> > > > been
> > > > mostly PeterZ touching this file lately..
> > >
> > > Looks fine to me too. I don't quite understand the usecase tho. It
> > > looks
> > > like all it's being used for is to tag some kthreads as belonging
> > > to the
> > > same group.
> >
> > Pretty much.
>
> Wen running an instance of knfsd from inside a container, you want to
> be able to have the knfsd kthreads be parented to the container init
> process so that they get killed off when the container is killed.
>
> Right now, we can easily leak those kernel threads simply by killing
> the container.

Oh, got it.

Currently knfsd supports nfs service in containers, but it uses a single
set of threads to serve requests from any container. It should shut the
server threads down when the last container using them goes away.

--b.