LinuxLists.cc - [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

2005-02-17 14:58:31

Subject: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

Hello,

It's a new patch that implements a fork connector in the
kernel/fork.c:do_fork() routine. The connector sends information about
parent PID and child PID over a netlink interface. It allows to several
user space applications to be alerted when a fork occurs in the kernel.
The main drawback is that even if nobody listens, a message is send. I
don't know how to avoid that. I added an option (FORK_CONNECTOR) to
enable the fork connector (or disable) when compiling the kernel. To
work, connector must be compiled as built-in (CONFIG_CONNECTOR=y). It
has been tested on a 2.6.11-rc3-mm2 kernel with two user space
applications connected.

It is used by ELSA to manage group of processes in user space. In
conjunction with a per-process accounting information, like BSD or CSA,
ELSA provides a per-group of processes accounting.

Every comments are welcome,

Guillaume

Signed-off-by: Guillaume Thouvenin <[email protected]>
---

drivers/connector/Kconfig | 10 ++++++++++
include/linux/connector.h | 2 ++
kernel/fork.c | 41 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 53 insertions(+)

diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/drivers/connector/Kconfig linux-2.6.11-rc3-mm2-cnfork/drivers/connector/Kconfig
--- linux-2.6.11-rc3-mm2/drivers/connector/Kconfig 2005-02-11 11:00:16.000000000 +0100
+++ linux-2.6.11-rc3-mm2-cnfork/drivers/connector/Kconfig 2005-02-17 15:48:41.000000000 +0100
@@ -10,4 +10,14 @@ config CONNECTOR
Connector support can also be built as a module. If so, the module
will be called cn.ko.

+config FORK_CONNECTOR
+ bool "Enable fork connector"
+ depends on CONNECTOR
+ ---help---
+ It adds a connector in kernel/fork.c:do_fork() function. When a fork
+ occurs, netlink is used to transfer information about the parent and
+ its child. This information can be used by a user space application.
+
+ Note: it only works if connector is built in the kernel.
+
endmenu
diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/include/linux/connector.h linux-2.6.11-rc3-mm2-cnfork/include/linux/connector.h
--- linux-2.6.11-rc3-mm2/include/linux/connector.h 2005-02-11 11:00:18.000000000 +0100
+++ linux-2.6.11-rc3-mm2-cnfork/include/linux/connector.h 2005-02-16 15:07:46.000000000 +0100
@@ -28,6 +28,8 @@
#define CN_VAL_KOBJECT_UEVENT 0x0000
#define CN_IDX_SUPERIO 0xaabb /* SuperIO subsystem */
#define CN_VAL_SUPERIO 0xccdd
+#define CN_IDX_FORK 0xfeed /* fork events */
+#define CN_VAL_FORK 0xbeef

#define CONNECTOR_MAX_MSG_SIZE 1024
diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/kernel/fork.c linux-2.6.11-rc3-mm2-cnfork/kernel/fork.c
--- linux-2.6.11-rc3-mm2/kernel/fork.c 2005-02-11 11:00:18.000000000 +0100
+++ linux-2.6.11-rc3-mm2-cnfork/kernel/fork.c 2005-02-17 13:43:48.000000000 +0100
@@ -41,6 +41,7 @@
#include <linux/profile.h>
#include <linux/rmap.h>
#include <linux/acct.h>
+#include <linux/connector.h>

#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -63,6 +64,44 @@ DEFINE_PER_CPU(unsigned long, process_co

EXPORT_SYMBOL(tasklist_lock);

+#if defined(CONFIG_CONNECTOR) && defined(CONFIG_FORK_CONNECTOR)
+#define FORK_CN_INFO_SIZE 64
+static inline void fork_connector(pid_t parent, pid_t child)
+{
+ struct cb_id fork_id = {CN_IDX_FORK, CN_VAL_FORK};
+ static __u32 seq; /* used to test if we lost message */
+
+ if (cn_already_initialized) {
+ struct cn_msg *msg;
+ size_t size;
+
+ size = sizeof(*msg) + FORK_CN_INFO_SIZE;
+ msg = kmalloc(size, GFP_KERNEL);
+ if (msg) {
+ memset(msg, '\0', size);
+ memcpy(&msg->id, &fork_id, sizeof(msg->id));
+ msg->seq = seq++;
+ msg->ack = 0; /* not used */
+ /*
+ * size of data is the number of characters
+ * printed plus one for the trailing '\0'
+ */
+ msg->len = snprintf(msg->data, FORK_CN_INFO_SIZE-1,
+ "%i %i", parent, child) + 1;
+
+ cn_netlink_send(msg, 1);
+
+ kfree(msg);
+ }
+ }
+}
+#else
+static inline void fork_connector(pid_t parent, pid_t child)
+{
+ return;
+}
+#endif
+
int nr_processes(void)
{
int cpu;
@@ -1238,6 +1277,8 @@ long do_fork(unsigned long clone_flags,
if (unlikely (current->ptrace & PT_TRACE_VFORK_DONE))
ptrace_notify ((PTRACE_EVENT_VFORK_DONE << 8) | SIGTRAP);
}
+
+ fork_connector(current->pid, p->pid);
} else {
free_pidmap(pid);
pid = PTR_ERR(p);

2005-02-17 15:48:18

by Evgeniy Polyakov

[permalink] [raw]

Subject: Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

On Thu, 2005-02-17 at 15:55 +0100, Guillaume Thouvenin wrote:
> Hello,

Hello, I have small note about connector's usage in your module
in particular and others in general.

> It's a new patch that implements a fork connector in the
> kernel/fork.c:do_fork() routine. The connector sends information about
> parent PID and child PID over a netlink interface. It allows to several
> user space applications to be alerted when a fork occurs in the kernel.
> The main drawback is that even if nobody listens, a message is send. I
> don't know how to avoid that. I added an option (FORK_CONNECTOR) to
> enable the fork connector (or disable) when compiling the kernel. To
> work, connector must be compiled as built-in (CONFIG_CONNECTOR=y). It
> has been tested on a 2.6.11-rc3-mm2 kernel with two user space
> applications connected.
>
> It is used by ELSA to manage group of processes in user space. In
> conjunction with a per-process accounting information, like BSD or CSA,
> ELSA provides a per-group of processes accounting.

I think people will complain here...
From rom one point of view it is step to the chaotic microkernel message
flows,
from the other side why only fork is monitored in this way?
I still think that lsm with all calls logging is the best way to
achieve
this goal.

> Every comments are welcome,
>
> Guillaume
>
> Signed-off-by: Guillaume Thouvenin <[email protected]>
> ---
>
> drivers/connector/Kconfig | 10 ++++++++++
> include/linux/connector.h | 2 ++
> kernel/fork.c | 41 +++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 53 insertions(+)
>
> diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/drivers/connector/Kconfig linux-2.6.11-rc3-mm2-cnfork/drivers/connector/Kconfig
> --- linux-2.6.11-rc3-mm2/drivers/connector/Kconfig 2005-02-11 11:00:16.000000000 +0100
> +++ linux-2.6.11-rc3-mm2-cnfork/drivers/connector/Kconfig 2005-02-17 15:48:41.000000000 +0100
> @@ -10,4 +10,14 @@ config CONNECTOR
> Connector support can also be built as a module. If so, the module
> will be called cn.ko.
>
> +config FORK_CONNECTOR
> + bool "Enable fork connector"
> + depends on CONNECTOR
> + ---help---
> + It adds a connector in kernel/fork.c:do_fork() function. When a fork
> + occurs, netlink is used to transfer information about the parent and
> + its child. This information can be used by a user space application.
> +
> + Note: it only works if connector is built in the kernel.
> +
> endmenu
> diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/include/linux/connector.h linux-2.6.11-rc3-mm2-cnfork/include/linux/connector.h
> --- linux-2.6.11-rc3-mm2/include/linux/connector.h 2005-02-11 11:00:18.000000000 +0100
> +++ linux-2.6.11-rc3-mm2-cnfork/include/linux/connector.h 2005-02-16 15:07:46.000000000 +0100
> @@ -28,6 +28,8 @@
> #define CN_VAL_KOBJECT_UEVENT 0x0000
> #define CN_IDX_SUPERIO 0xaabb /* SuperIO subsystem */
> #define CN_VAL_SUPERIO 0xccdd
> +#define CN_IDX_FORK 0xfeed /* fork events */
> +#define CN_VAL_FORK 0xbeef
>
>
> #define CONNECTOR_MAX_MSG_SIZE 1024
> diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/kernel/fork.c linux-2.6.11-rc3-mm2-cnfork/kernel/fork.c
> --- linux-2.6.11-rc3-mm2/kernel/fork.c 2005-02-11 11:00:18.000000000 +0100
> +++ linux-2.6.11-rc3-mm2-cnfork/kernel/fork.c 2005-02-17 13:43:48.000000000 +0100
> @@ -41,6 +41,7 @@
> #include <linux/profile.h>
> #include <linux/rmap.h>
> #include <linux/acct.h>
> +#include <linux/connector.h>
>
> #include <asm/pgtable.h>
> #include <asm/pgalloc.h>
> @@ -63,6 +64,44 @@ DEFINE_PER_CPU(unsigned long, process_co
>
> EXPORT_SYMBOL(tasklist_lock);
>
> +#if defined(CONFIG_CONNECTOR) && defined(CONFIG_FORK_CONNECTOR)

I suspect CONFIG_FORK_CONNECTOR is enough.

> +#define FORK_CN_INFO_SIZE 64
> +static inline void fork_connector(pid_t parent, pid_t child)
> +{
> + struct cb_id fork_id = {CN_IDX_FORK, CN_VAL_FORK};
> + static __u32 seq; /* used to test if we lost message */
> +
> + if (cn_already_initialized) {
> + struct cn_msg *msg;
> + size_t size;
> +
> + size = sizeof(*msg) + FORK_CN_INFO_SIZE;
> + msg = kmalloc(size, GFP_KERNEL);
> + if (msg) {
> + memset(msg, '\0', size);
> + memcpy(&msg->id, &fork_id, sizeof(msg->id));
> + msg->seq = seq++;
> + msg->ack = 0; /* not used */
> + /*
> + * size of data is the number of characters
> + * printed plus one for the trailing '\0'
> + */
> + msg->len = snprintf(msg->data, FORK_CN_INFO_SIZE-1,
> + "%i %i", parent, child) + 1;
> +
> + cn_netlink_send(msg, 1);

"1" here means that this message will be delivered to any group
which has it's first bit set(1, 3, and so on) in given socket queue.
I suspect it is not what you want.
By design connector's users should send messages to the group it was
registered with
(which is obtained from idx field of the struct cb_id), in your case it
is CN_IDX_FORK,
and connector userspace consumers should bind to the same group (idx
value).
It is of course not requirement, but a fair path(hmm, I can add more
strict checks into connector).
By setting 0 as second parameter for cn_netlink_send() you will force
connector's core
to select proper group for you.

> + kfree(msg);
> + }
> + }
> +}
> +#else
> +static inline void fork_connector(pid_t parent, pid_t child)
> +{
> + return;
> +}
> +#endif
> +
> int nr_processes(void)
> {
> int cpu;
> @@ -1238,6 +1277,8 @@ long do_fork(unsigned long clone_flags,
> if (unlikely (current->ptrace & PT_TRACE_VFORK_DONE))
> ptrace_notify ((PTRACE_EVENT_VFORK_DONE << 8) | SIGTRAP);
> }
> +
> + fork_connector(current->pid, p->pid);
> } else {
> free_pidmap(pid);
> pid = PTR_ERR(p);
>
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski

Attachments:

signature.asc (189.00 B)
This is a digitally signed message part

2005-02-21 07:08:10

by Guillaume Thouvenin

[permalink] [raw]

Subject: Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

On Thu, 2005-02-17 at 18:50 +0300, Evgeniy Polyakov wrote:
> On Thu, 2005-02-17 at 15:55 +0100, Guillaume Thouvenin wrote:
> > It's a new patch that implements a fork connector in the
> > kernel/fork.c:do_fork() routine. The connector sends information about
> > parent PID and child PID over a netlink interface. It allows to several
> > user space applications to be alerted when a fork occurs in the kernel.
> > The main drawback is that even if nobody listens, a message is send. I
> > don't know how to avoid that. I added an option (FORK_CONNECTOR) to
> > enable the fork connector (or disable) when compiling the kernel. To
> > work, connector must be compiled as built-in (CONFIG_CONNECTOR=y). It
> > has been tested on a 2.6.11-rc3-mm2 kernel with two user space
> > applications connected.
> >
> > It is used by ELSA to manage group of processes in user space. In
> > conjunction with a per-process accounting information, like BSD or CSA,
> > ELSA provides a per-group of processes accounting.
>
> I think people will complain here...
> ... [cut here] ...
> I still think that lsm with all calls logging is the best way to
> achieve this goal.

I agree with you. My first implementation was with LSM but Chris Wright
(I think it was him) notice that it's not the right framework (and it
seems true). So I looked for another solution. I though about kobject
but it was too "big" and finally, Greg KH spoke about connectors. It's
small and efficient.

> from the other side why only fork is monitored in this way?

The problem is the following: I have a user space daemon that manages
group of processes. The main idea is, if a parent belongs to a group
then its child belongs to the same group. To achieve this I need to know
when a fork occurs and which processes are involved. I don't see how to
do this without a hook in the do_fork() routine... Any ideas are
welcome.

Thank you Evgeniy for all your comments about the code, it helps and I
will modify the patch.

Regards,
Guillaume

2005-02-21 08:05:41

by Guillaume Thouvenin

[permalink] [raw]

Subject: Re: [Elsa-devel] Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

On Thu, 2005-02-17 at 18:50 +0300, Evgeniy Polyakov wrote:
> >
> > +#if defined(CONFIG_CONNECTOR) && defined(CONFIG_FORK_CONNECTOR)
>
> I suspect CONFIG_FORK_CONNECTOR is enough.

The problem here is that if connector is compiled as a module and
fork_connector as builtin there will be undefined reference to symbol
like 'cn_already_initialized' or 'cn_netlink_send'. That's why the
fork_connector() must be enable if CONFIG_CONNECTOR and
CONFIG_FORK_CONNECTOR are selected as builtin and not as a module. I
agree that it's not very elegant...

> > + cn_netlink_send(msg, 1);
>
> "1" here means that this message will be delivered to any group
> which has it's first bit set(1, 3, and so on) in given socket queue.
> I suspect it is not what you want.
> By design connector's users should send messages to the group it was
> registered with
> (which is obtained from idx field of the struct cb_id), in your case it
> is CN_IDX_FORK,
> and connector userspace consumers should bind to the same group (idx
> value).
> It is of course not requirement, but a fair path(hmm, I can add more
> strict checks into connector).
> By setting 0 as second parameter for cn_netlink_send() you will force
> connector's core
> to select proper group for you.

Yes but cn_netlink_send() is looking for a callback with the same idx.
As I don't use any callback, found == 0 and I have an error "Failed to
find multicast netlink group for callback[0xfeed.0xbeef]". So the good
call is: cn_netlink_send(msg, CN_IDX_FORK);

Thanks for your help,
Guillaume

2005-02-21 08:36:36

by Evgeniy Polyakov

[permalink] [raw]

Subject: Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

On Mon, 2005-02-21 at 08:07 +0100, Guillaume Thouvenin wrote:
> On Thu, 2005-02-17 at 18:50 +0300, Evgeniy Polyakov wrote:
> > On Thu, 2005-02-17 at 15:55 +0100, Guillaume Thouvenin wrote:
> > > It's a new patch that implements a fork connector in the
> > > kernel/fork.c:do_fork() routine. The connector sends information about
> > > parent PID and child PID over a netlink interface. It allows to several
> > > user space applications to be alerted when a fork occurs in the kernel.
> > > The main drawback is that even if nobody listens, a message is send. I
> > > don't know how to avoid that. I added an option (FORK_CONNECTOR) to
> > > enable the fork connector (or disable) when compiling the kernel. To
> > > work, connector must be compiled as built-in (CONFIG_CONNECTOR=y). It
> > > has been tested on a 2.6.11-rc3-mm2 kernel with two user space
> > > applications connected.
> > >
> > > It is used by ELSA to manage group of processes in user space. In
> > > conjunction with a per-process accounting information, like BSD or CSA,
> > > ELSA provides a per-group of processes accounting.
> >
> > I think people will complain here...
> > ... [cut here] ...
> > I still think that lsm with all calls logging is the best way to
> > achieve this goal.
>
> I agree with you. My first implementation was with LSM but Chris Wright
> (I think it was him) notice that it's not the right framework (and it
> seems true). So I looked for another solution. I though about kobject
> but it was too "big" and finally, Greg KH spoke about connectors. It's
> small and efficient.

Your do_fork() change really looks like either audit addon(but it is
really
not the case) or LSM logging facility.
I think adding cn_netlink_send() in every function in security/dummy.c
and renaming it to security/cn_logger.c or something is not such a bad
idea...
Or even wait in each function until userspace replies with the decision
to
allow or not such call.
Although it can create a lock (need to recheck security hooks in
send/recv pathes).

> > from the other side why only fork is monitored in this way?
>
> The problem is the following: I have a user space daemon that manages
> group of processes. The main idea is, if a parent belongs to a group
> then its child belongs to the same group. To achieve this I need to know
> when a fork occurs and which processes are involved. I don't see how to
> do this without a hook in the do_fork() routine... Any ideas are
> welcome.

Now I begin to understand Chris Wright - LSM are designed not for
monitoring,
but only for initialisation path - i.e. LSM will say only if something
is allowed or not,
but not if it was performed.

So, for exactly your setup there is no any other way then to patch
do_fork().

> Thank you Evgeniy for all your comments about the code, it helps and I
> will modify the patch.
>
> Regards,
> Guillaume
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski

Attachments:

signature.asc (189.00 B)
This is a digitally signed message part

2005-02-21 08:43:09

by Evgeniy Polyakov

[permalink] [raw]

Subject: Re: [Elsa-devel] Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

On Mon, 2005-02-21 at 09:05 +0100, Guillaume Thouvenin wrote:
> On Thu, 2005-02-17 at 18:50 +0300, Evgeniy Polyakov wrote:
> > >
> > > +#if defined(CONFIG_CONNECTOR) && defined(CONFIG_FORK_CONNECTOR)
> >
> > I suspect CONFIG_FORK_CONNECTOR is enough.
>
> The problem here is that if connector is compiled as a module and
> fork_connector as builtin there will be undefined reference to symbol
> like 'cn_already_initialized' or 'cn_netlink_send'. That's why the
> fork_connector() must be enable if CONFIG_CONNECTOR and
> CONFIG_FORK_CONNECTOR are selected as builtin and not as a module. I
> agree that it's not very elegant...

Maybe "depends on CONNECTOR=y" ?

> > > + cn_netlink_send(msg, 1);
> >
> > "1" here means that this message will be delivered to any group
> > which has it's first bit set(1, 3, and so on) in given socket queue.
> > I suspect it is not what you want.
> > By design connector's users should send messages to the group it was
> > registered with
> > (which is obtained from idx field of the struct cb_id), in your case it
> > is CN_IDX_FORK,
> > and connector userspace consumers should bind to the same group (idx
> > value).
> > It is of course not requirement, but a fair path(hmm, I can add more
> > strict checks into connector).
> > By setting 0 as second parameter for cn_netlink_send() you will force
> > connector's core
> > to select proper group for you.
>
> Yes but cn_netlink_send() is looking for a callback with the same idx.
> As I don't use any callback, found == 0 and I have an error "Failed to
> find multicast netlink group for callback[0xfeed.0xbeef]". So the good
> call is: cn_netlink_send(msg, CN_IDX_FORK);

Uh-oh, I see...
I recall your previous patch with the fork_callback()...

Nevertheless "1" is a bad idea, CN_IDX_FORK is what is expected.

> Thanks for your help,
> Guillaume
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski

Attachments:

signature.asc (189.00 B)
This is a digitally signed message part

2005-02-21 09:51:28

by Paul Jackson

[permalink] [raw]

Subject: Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

Guillaume wrote:
> The problem is the following: I have a user space daemon that manages
> group of processes. The main idea is, if a parent belongs to a group
> then its child belongs to the same group. To achieve this I need to know
> when a fork occurs and which processes are involved. I don't see how to
> do this without a hook in the do_fork() routine...

How is what you need, for process grouping, any more complex than
another sort of {bank, job, aggregate, session, group, ...} integer id
field in the task struct, that is copied on fork, and can be queried and
manipulated from user space, in accordance with whatever rules you
implement?

When I look at the elsacct_process_copy() routine, which is called from
fork, in your patch-2.6.8.1-elsa, I'm not sure what it does, but it sure
looks like it could cause scaling and performance problems. Linux works
really hard to keep fork costs low. Copying another integer field, as
part of the block copy of the task struct at fork, sure would be cheaper
than this. Not only does this hook look too expensive, I don't even see
the need for any such explicit code hook in fork for accounting at all.

Does your user space daemon require to know about each task as it is
forked, in near real time? Is it trying to do something with this
accounting information while the tasks being accounted for are
necessarily still alive? The classic accounting that I am familiar
with, from years ago, only did post-mortem analysis. So long as enough
entrails were left around so that it could piece together the story, it
didn't require any immediate notice of anything. You need to clear a
couple of accounting accumulators directly in the task struct at fork,
and write a record to a specified open file at exit. That's about it.

The main problems I was aware of with that classic accounting (which
is probably what is now known as BSD accounting) are:
1) The fixed length accounting record didn't allow for added or
longer fields. A little bit more flexible and extensible format
is desired, but some effort should be made to keep the format
still reasonably tight and compressed. No full spec XML.
The format should allow for some form of resyncronization after
a chunk of data is lost.
2) An additional bank/job/aggregate/session/group/... id seems desired.
I have yet to understand why this need be anything fancier than
another integer field in the task struct.
3) Probably some more data items are worth collecting -- which could
be placed in the outgoing compressed data stream, along with the
existing records written on task exit. Over time, appropriate
hooks should be proposed to collect such data as seems needed.
4) The current mechanism of collecting per-task data only on exit
makes it difficult to account for long running jobs. Perhaps we
could use a leisurely background task that slowly scans the tasks
looking for those that have been present since the last scan, and
causes an intermediate accounting record to be written for them.

What other essential deficiencies are there that you need to address?

I don't see any need for explicit hooks in fork to resolve the above
deficiencies.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373, 1.925.600.0401

2005-02-21 10:33:14

by Guillaume Thouvenin

[permalink] [raw]

Subject: Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

On Mon, 2005-02-21 at 01:47 -0800, Paul Jackson wrote:
> Guillaume wrote:
> > The problem is the following: I have a user space daemon that manages
> > group of processes. The main idea is, if a parent belongs to a group
> > then its child belongs to the same group. To achieve this I need to know
> > when a fork occurs and which processes are involved. I don't see how to
> > do this without a hook in the do_fork() routine...
>
> How is what you need, for process grouping, any more complex than
> another sort of {bank, job, aggregate, session, group, ...} integer id
> field in the task struct, that is copied on fork, and can be queried and
> manipulated from user space, in accordance with whatever rules you
> implement?

If a process belongs to several group of processes, an new integer in
the task_struct is not enough, you need a list or something like this.
If you're using a list you need to add function to manage this list in
the kernel but we don't want to add this kind of management inside the
kernel because with the fork connector we can keep it outside.

> When I look at the elsacct_process_copy() routine, which is called from
> fork, in your patch-2.6.8.1-elsa, I'm not sure what it does, but it sure
> looks like it could cause scaling and performance problems.

This patch is an old one with many kernel modifications that impacts the
Linux performance. That's why we thought about another solution where
all management is done by a user space daemon. Currently we're using the
fork connector.

> Does your user space daemon require to know about each task as it is
> forked, in near real time? Is it trying to do something with this
> accounting information while the tasks being accounted for are
> necessarily still alive?

I don't need real time. I just need to know which process forks during
the accounting period. The user space daemon provided by ELSA just keeps
a trace of parents and its children during a given period (generally
it's the accounting period). The analysis will be done later by another
application (also provided by ELSA) by using the trace of parents and
children plus the accounting trace.

> The main problems I was aware of with that classic accounting (which
> is probably what is now known as BSD accounting) are:
> ... [cut here] ...
> 2) An additional bank/job/aggregate/session/group/... id seems desired.
> I have yet to understand why this need be anything fancier than
> another integer field in the task struct.

We're trying to solve this one. I think that I answer to the integer
field problem. For the necessity of the per-group accounting, it can be
interesting to do accounting on a specific "task". For example if you
want to have accounting data for a compilation you can add the
corresponding shell in a group of processes and commands involved in the
compilation like gcc, cc, as, collect2, ... will be automatically added
in the same group and you will be able to get statistics about this
compilation.

> 3) Probably some more data items are worth collecting -- which could
> be placed in the outgoing compressed data stream, along with the
> existing records written on task exit. Over time, appropriate
> hooks should be proposed to collect such data as seems needed.

There is a discussion about this with Jay Lan to merge the CSA and BSD
accounting framework.

I don't know if there is some work around 1) and 4).

Regards,
Guillaume

2005-02-21 12:00:41

by Paul Jackson

[permalink] [raw]

Subject: Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

Thank-you for your quick answer.

Guillaume wrote:
>
> If a process belongs to several group of processes, an new integer in
> the task_struct is not enough, you need a list or something like this.
> If you're using a list you need to add function to manage this list in
> the kernel but we don't want to add this kind of management inside the
> kernel because with the fork connector we can keep it outside.

Ok - fork connect. From your patch of a couple days ago, for the
benefit of lurkers:
>
> It's a new patch that implements a fork connector in the
> kernel/fork.c:do_fork() routine. The connector sends information about
> parent PID and child PID over a netlink interface. It allows to several
> user space applications to be alerted when a fork occurs in the kernel.

Whoaa ... you're saying that because you might have several groups a
task could belong to at once, you'll use netlink to avoid managing lists
in the kernel. Seems that you're spending thousands of instructions to
save dozens. This is not a good trade off.

I can imagine several way cheaper ways to handle this.

If the number of groups to which a task could belong has some small
finite upper limit, like at most 5 groups, you could have 5 integer id's
in the task struct instead of 1. If the number of elements in a
particular group has a small upper bound, you could even replace the
ints with bit fields.

Or you could enumerate the different combinations of groups to which a
task might belong, assign each such combination a unique integer, and
keep that integer in the task struct. The enumeration could be done
dynamically, only counting the particular combinations of group
memberships that actually had use. This has the disadvantage that a
particular combination, once enumerated, would have to stay around until
the next boot - a potential memory leak. Probably not acceptable,
unless the cost of storing a no longer used combination is nearly zero.

Or you could have a little 'jobids' struct that held a list and a
reference counter, where the list held a particular combination of ids,
and the reference counter tracked how many tasks referenced that jobids
struct. Put a single pointer in the task struct to a jobids struct, and
increment and decrement the reference counter in the jobids struct on
fork and exit. Free it if the count goes to zero on exit. This solves
the memory leak of the previous, with increased cost to the fork. Since
we really do design these systems to stay up 'forever', this is perhaps
the winner. Any time a particular task is added to, or removed from, a
group, if the ref count of its jobids struct is one, then modify the id
list attached to that jobids struct in place. If the ref count is more
than one, copy the jobids struct and list to a new one, decrement the
count in the old one, and modify the new one in place. Such list and
counter manipulations are the daily stuff of kernel code. No need to
avoid such.

Just because you have more than one id doesn't mean each task has to be
connected directly into its own custom list, and even if you needed
that, I don't see that it's a win to avoid such a list by using netlink.

It can be a worthwhile exercise to single step through each machine
instruction that you add to fork, in the forking task or any other task
that is sent data or a signal therefrom. You really do want to keep the
number of added instructions (and number of additional cache lines and
memory pages accessed, especially written) to a minimum. If the effort
of single stepping through such would require the patience of
Copernicus, then it's back to the drawing board for a more efficient
solution.

> I don't know if there is some work around 1) and 4).

Well, you might have dodged the (1) bullet up until now by using netlink
and not extending the accounting record at exit. Bullet (1) was
extending the accounting record past its fairly constrained size, if
that's still a problem; it's been years since I looked. But if you
adapt one of the above suggestions, and don't send anything out of the
task context at fork, then you will have to deal with (1) in order to
include the list of job id's in the record written at exit.

If you want to collect any other data, bullet (3), you will also to
solve bullet (1).

Item (4), collecting accounting data for long running tasks, is probably
less pressing. Its solution will also likely require solving (1),
however.

Taking a quick look at init/Kconfig and include/linux/acct.h, it seems
we are using BSD_PROCESS_ACCT_V3 format, which is the latest 64 byte
format, allowing for larger uid/gid.

With slight variations, this 64 byte format has lasted about 25 years.
It's time to replace it, especially if you have designs on collecting
any additional information, which you clearly do.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373, 1.925.600.0401

2005-02-21 14:44:10

by Guillaume Thouvenin

[permalink] [raw]

Subject: Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

On Mon, 2005-02-21 at 03:58 -0800, Paul Jackson wrote:
> > It's a new patch that implements a fork connector in the
> > kernel/fork.c:do_fork() routine. The connector sends information about
> > parent PID and child PID over a netlink interface. It allows to several
> > user space applications to be alerted when a fork occurs in the kernel.
>
> Whoaa ... you're saying that because you might have several groups a
> task could belong to at once, you'll use netlink to avoid managing lists
> in the kernel. Seems that you're spending thousands of instructions to
> save dozens. This is not a good trade off.

I understand your point of view but I'm using netlink interface
because it's already in the kernel so my choice is to use something that
is already in the kernel instead of adding dozens of new instructions
and also to do things in user space. The fork connector is here to move
the management in the user space. Otherwise there is PAGG that manages
group of processes in the kernel. To test performances, I tried to
compile a kernel several times with and without the fork connector and
here are the resource usage computed with the following command:

time /bin/sh -c 'make O=/home/guill/build/k2610 oldconfig \
&& make O=/home/guill/build/k2610 bzImage \
&& make O=/home/guill/build/k2610 modules'

between each test, the directory that contains object files was
destroyed and a 'sync' was done.

Results are:

kernel without fork connector
real : 8m17.042s 8m10.113s 8m08.597s 8m10.068s 8m08.930s
user : 7m32.376s 7m35.985s 7m34.424s 7m34.221s 7m34.835s
sys : 0m50.730s 0m51.139s 0m51.159s 0m51.406s 0m51.020s

kernel with the fork connector
real : 8m14.492s 8m08.656s 8m07.754s 8m08.002s 8m07.854s
user : 7m31.664s 7m33.528s 7m33.625s 7m33.500s 7m33.822s
sys : 0m50.651s 0m51.222s 0m51.102s 0m51.367s 0m50.894s

kernel with the fork connector + application listens
real : 8m08.596s 8m08.950s 8m08.899s 8m08.678s 8m08.987s
user : 7m33.312s 7m33.898s 7m34.004s 7m33.285s 7m33.628s
sys : 0m52.222s 0m52.013s 0m51.809s 0m52.361s 0m52.036s

I also choose this implementation because Erich Focht wrote in the
email http://lkml.org/lkml/2004/12/17/99 that keeps the historic about
the creation of processes "sounds very useful for a lot of interesting
stuff". So I thought about something that can be used by other
application and with netlink, information is available to everyone.

> I can imagine several way cheaper ways to handle this.
>
> If the number of groups to which a task could belong has some small
> finite upper limit, like at most 5 groups, you could have 5 integer id's
> in the task struct instead of 1. If the number of elements in a
> particular group has a small upper bound, you could even replace the
> ints with bit fields.
>
> Or you could enumerate the different combinations of groups to which a
> task might belong, assign each such combination a unique integer, and
> keep that integer in the task struct. The enumeration could be done
> dynamically, only counting the particular combinations of group
> memberships that actually had use. This has the disadvantage that a
> particular combination, once enumerated, would have to stay around until
> the next boot - a potential memory leak. Probably not acceptable,
> unless the cost of storing a no longer used combination is nearly zero.

The problem with those solutions is that we suppose that a process can
belong to a finite number of task. I suppose that it can be true in
practice.

> Or you could have a little 'jobids' struct that held a list and a
> reference counter, where the list held a particular combination of ids,
> and the reference counter tracked how many tasks referenced that jobids
> struct. Put a single pointer in the task struct to a jobids struct, and
> increment and decrement the reference counter in the jobids struct on
> fork and exit. Free it if the count goes to zero on exit. This solves
> the memory leak of the previous, with increased cost to the fork. Since
> we really do design these systems to stay up 'forever', this is perhaps
> the winner. Any time a particular task is added to, or removed from, a
> group, if the ref count of its jobids struct is one, then modify the id
> list attached to that jobids struct in place. If the ref count is more
> than one, copy the jobids struct and list to a new one, decrement the
> count in the old one, and modify the new one in place. Such list and
> counter manipulations are the daily stuff of kernel code. No need to
> avoid such.

This solution is interesting. The problem is to know if a fork
connector is useful for some other projects. As I said, one of my goal
was to provide a way to alert user space application when a fork occurs
in the kernel because I think that other applications need this kind of
information. But if it is needed only by ELSA, you're right, a solution
specific to our problem is clearly more efficient.

> Just because you have more than one id doesn't mean each task has to be
> connected directly into its own custom list, and even if you needed
> that, I don't see that it's a win to avoid such a list by using netlink.

The advantages with the fork connector is that it can be used by other
application and modification in the current kernel is minimal. The main
drawbacks is maybe the performance...

So nobody needs such hook (Erich?)

Thank Paul for your comments,
Guillaume

2005-02-21 16:55:21

by Erich Focht

[permalink] [raw]

Subject: Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

On Monday 21 February 2005 15:43, Guillaume Thouvenin wrote:
>
> I also choose this implementation because Erich Focht wrote in the
> email http://lkml.org/lkml/2004/12/17/99 that keeps the historic about
> the creation of processes "sounds very useful for a lot of interesting
> stuff". So I thought about something that can be used by other
> application and with netlink, information is available to everyone.

Besides accounting I had in mind something like cluster-wide pid
tracking in userspace with builtin relationship information. A bit of
single system image integration... As I don't have it, yet, I'm not
(yet) a very strong requester for the service provided by your
module. But I still think it's usefull and might want later a hook on
exit, too. (And yes, I can imagine of other ways to get the data
effectively out of the kernel, too).

> Results are:
>
> kernel without fork connector
> real : 8m17.042s 8m10.113s 8m08.597s 8m10.068s 8m08.930s
> user : 7m32.376s 7m35.985s 7m34.424s 7m34.221s 7m34.835s
> sys : 0m50.730s 0m51.139s 0m51.159s 0m51.406s 0m51.020s
>
> kernel with the fork connector
> real : 8m14.492s 8m08.656s 8m07.754s 8m08.002s 8m07.854s
> user : 7m31.664s 7m33.528s 7m33.625s 7m33.500s 7m33.822s
> sys : 0m50.651s 0m51.222s 0m51.102s 0m51.367s 0m50.894s
>
> kernel with the fork connector + application listens
> real : 8m08.596s 8m08.950s 8m08.899s 8m08.678s 8m08.987s
> user : 7m33.312s 7m33.898s 7m34.004s 7m33.285s 7m33.628s
> sys : 0m52.222s 0m52.013s 0m51.809s 0m52.361s 0m52.036s

I liked the previous lean implementation more, but the performance
of this one doesn't look at all as scary as I thought.

Best regards,
Erich

2005-02-21 17:56:51

by Paul Jackson

[permalink] [raw]

Subject: Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

Guillaume wrote:
>
> I understand your point of view but I'm using netlink interface
> because it's already in the kernel so my choice is to use something that
> is already in the kernel instead of adding dozens of new instructions
> and also to do things in user space.

All else equal, yes it is good to use what facilities are
already available, and it is good to do things in user space.

If one adds many cpu cycles and quite a few pages of memory
to read and touch to every fork, then all else is not equal.

> To test performances, I tried to
> compile a kernel several times with and without the fork connector

I agree that a kernel compile does not measure well fork costs.

There is a good benchmark of fork, exit and other facilities such
as socket, bind and mmap, at:

http://bulk.fefe.de/scalability/

(the trailing '/' is needed).

You might try downloading them (see the cvs instructions on this page)
and seeing how your changes impact fork and exit. I had to fiddle with
the rl.rlim counts in forkbench.c to get my copy to run without exceeding
rlimits on fork.

Notice also the rather dramatic improvements from Linux 2.4 to 2.6 in
some of these benchmarks, described elsewhere on the above page. The
Linux developers take this stuff seriously, and have provided some
seriously good performance under load.

I'd be quite interested to see how your changes affect this benchmark.
Or perhaps you can find some other good measure of fork and exit costs,
the two areas that accounting is at immediate risk of impacting.

> I also choose this implementation because Erich Focht wrote in the
> email http://lkml.org/lkml/2004/12/17/99 that keeps the historic about
> the creation of processes "sounds very useful for a lot of interesting
> stuff".

(Some of us who only speak English would find 'history' more idiomatic
here than 'historic' ...)

Adding framework on the basis of such potential future useful value
is a hard sell in Linux land. It is better to wait until each need is
immediately clear, and it is essential to keep the kernel infrastructure
is light weight as we can, or it will overwhelm the mental capacity of
most of us working on it, including myself for sure ;).

> Thank Paul for your comments,

You're welcome. Thanks for tackling this task.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373, 1.925.600.0401

2005-02-23 08:52:43

by Guillaume Thouvenin

[permalink] [raw]

Subject: Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

Hello,

This patch replaces the relay_fork module and it implements a fork
connector in the kernel/fork.c:do_fork() routine. The connector sends
information about parent PID and child PID over a netlink interface. It
allows to several user space applications to be informed when a fork
occurs in the kernel. The main drawback is that even if nobody listens,
message is send. I don't know how to avoid that.

It is used by ELSA to manage group of processes in user space.

ChangeLog:

- Add a "depends on CONNECTOR=y" in the Kconfig
- Remove the macro #if defined(CONFIG_CONNECTOR)
- cn_netlink_send() sends messages to the correct group which is
CN_IDX_FORK

Thanks to Evgeniy and to everyone for the comments,
Guillaume

Signed-off-by: Guillaume Thouvenin <[email protected]>

---

drivers/connector/Kconfig | 8 ++++++++
include/linux/connector.h | 2 ++
kernel/fork.c | 41 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 51 insertions(+)

diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/drivers/connector/Kconfig linux-2.6.11-rc3-mm2-cnfork/drivers/connector/Kconfig
--- linux-2.6.11-rc3-mm2/drivers/connector/Kconfig 2005-02-11 11:00:16.000000000 +0100
+++ linux-2.6.11-rc3-mm2-cnfork/drivers/connector/Kconfig 2005-02-21 09:52:22.000000000 +0100
@@ -10,4 +10,12 @@ config CONNECTOR
Connector support can also be built as a module. If so, the module
will be called cn.ko.

+config FORK_CONNECTOR
+ bool "Enable fork connector"
+ depends on CONNECTOR=y
+ ---help---
+ It adds a connector in kernel/fork.c:do_fork() function. When a fork
+ occurs, netlink is used to transfer information about the parent and
+ its child. This information can be used by a user space application.
+
endmenu
diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/include/linux/connector.h linux-2.6.11-rc3-mm2-cnfork/include/linux/connector.h
--- linux-2.6.11-rc3-mm2/include/linux/connector.h 2005-02-11 11:00:18.000000000 +0100
+++ linux-2.6.11-rc3-mm2-cnfork/include/linux/connector.h 2005-02-16 15:07:46.000000000 +0100
@@ -28,6 +28,8 @@
#define CN_VAL_KOBJECT_UEVENT 0x0000
#define CN_IDX_SUPERIO 0xaabb /* SuperIO subsystem */
#define CN_VAL_SUPERIO 0xccdd
+#define CN_IDX_FORK 0xfeed /* fork events */
+#define CN_VAL_FORK 0xbeef

#define CONNECTOR_MAX_MSG_SIZE 1024
diff -uprN -X dontdiff linux-2.6.11-rc3-mm2/kernel/fork.c linux-2.6.11-rc3-mm2-cnfork/kernel/fork.c
--- linux-2.6.11-rc3-mm2/kernel/fork.c 2005-02-11 11:00:18.000000000 +0100
+++ linux-2.6.11-rc3-mm2-cnfork/kernel/fork.c 2005-02-21 09:52:40.000000000 +0100
@@ -41,6 +41,7 @@
#include <linux/profile.h>
#include <linux/rmap.h>
#include <linux/acct.h>
+#include <linux/connector.h>

#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -63,6 +64,44 @@ DEFINE_PER_CPU(unsigned long, process_co

EXPORT_SYMBOL(tasklist_lock);

+#ifdef CONFIG_FORK_CONNECTOR
+#define FORK_CN_INFO_SIZE 64
+static inline void fork_connector(pid_t parent, pid_t child)
+{
+ struct cb_id fork_id = {CN_IDX_FORK, CN_VAL_FORK};
+ static __u32 seq; /* used to test if we lost message */
+
+ if (cn_already_initialized) {
+ struct cn_msg *msg;
+ size_t size;
+
+ size = sizeof(*msg) + FORK_CN_INFO_SIZE;
+ msg = kmalloc(size, GFP_KERNEL);
+ if (msg) {
+ memset(msg, '\0', size);
+ memcpy(&msg->id, &fork_id, sizeof(msg->id));
+ msg->seq = seq++;
+ msg->ack = 0; /* not used */
+ /*
+ * size of data is the number of characters
+ * printed plus one for the trailing '\0'
+ */
+ msg->len = snprintf(msg->data, FORK_CN_INFO_SIZE-1,
+ "%i %i", parent, child) + 1;
+
+ cn_netlink_send(msg, CN_IDX_FORK);
+
+ kfree(msg);
+ }
+ }
+}
+#else
+static inline void fork_connector(pid_t parent, pid_t child)
+{
+ return;
+}
+#endif
+
int nr_processes(void)
{
int cpu;
@@ -1238,6 +1277,8 @@ long do_fork(unsigned long clone_flags,
if (unlikely (current->ptrace & PT_TRACE_VFORK_DONE))
ptrace_notify ((PTRACE_EVENT_VFORK_DONE << 8) | SIGTRAP);
}
+
+ fork_connector(current->pid, p->pid);
} else {
free_pidmap(pid);
pid = PTR_ERR(p);

2005-02-23 09:09:12

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

Guillaume Thouvenin <[email protected]> wrote:
>
> Hello,
>
> This patch replaces the relay_fork module and it implements a fork
> connector in the kernel/fork.c:do_fork() routine. The connector sends
> information about parent PID and child PID over a netlink interface. It
> allows to several user space applications to be informed when a fork
> occurs in the kernel. The main drawback is that even if nobody listens,
> message is send. I don't know how to avoid that.

We really should find a way to fix that. Especially if we want all the
distributors to enable the connector in their builds (we do).

What happened to the idea of sending an on/off message down the netlink
socket?

> +#ifdef CONFIG_FORK_CONNECTOR
> +#define FORK_CN_INFO_SIZE 64
> +static inline void fork_connector(pid_t parent, pid_t child)
> +{
> + struct cb_id fork_id = {CN_IDX_FORK, CN_VAL_FORK};

This can be static const, which will save some stack and cycles.

> + static __u32 seq; /* used to test if we lost message */
> +
> + if (cn_already_initialized) {
> + struct cn_msg *msg;
> + size_t size;
> +
> + size = sizeof(*msg) + FORK_CN_INFO_SIZE;
> + msg = kmalloc(size, GFP_KERNEL);
> + if (msg) {
> + memset(msg, '\0', size);

Do we really need to memset the whole thing?

> + memcpy(&msg->id, &fork_id, sizeof(msg->id));
> + msg->seq = seq++;

`seq' needs a lock to protect it. Or use atomic_add_return(), maybe.

> + msg->ack = 0; /* not used */
> + /*
> + * size of data is the number of characters
> + * printed plus one for the trailing '\0'
> + */
> + msg->len = snprintf(msg->data, FORK_CN_INFO_SIZE-1,
> + "%i %i", parent, child) + 1;

scnprintf() would be more appropriate here, even though it should be a "can't
happen".

> + cn_netlink_send(msg, CN_IDX_FORK);
> +
> + kfree(msg);

`msg' is only 84 bytes and do_fork() has a shallow call graph. Make `msg'
a local?

2005-02-23 10:44:17

by Evgeniy Polyakov

[permalink] [raw]

Subject: Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

On Wed, 23 Feb 2005 01:07:47 -0800
Andrew Morton <[email protected]> wrote:

> Guillaume Thouvenin <[email protected]> wrote:
> >
> > Hello,
> >
> > This patch replaces the relay_fork module and it implements a fork
> > connector in the kernel/fork.c:do_fork() routine. The connector sends
> > information about parent PID and child PID over a netlink interface. It
> > allows to several user space applications to be informed when a fork
> > occurs in the kernel. The main drawback is that even if nobody listens,
> > message is send. I don't know how to avoid that.
>
> We really should find a way to fix that. Especially if we want all the
> distributors to enable the connector in their builds (we do).

Mesage is never reached anyone if there are no listeners, skb will be just freed,
even without any linking.
do_one_broadcast() in net/netlink/af_netlink.c takes care of it.
Unicast message also will be discarded in cn_rx_skb().

These operations are quite cheap - just link/unlink skb to/from appropriate
queues.

> What happened to the idea of sending an on/off message down the netlink
> socket?

?

> > +#ifdef CONFIG_FORK_CONNECTOR
> > +#define FORK_CN_INFO_SIZE 64
> > +static inline void fork_connector(pid_t parent, pid_t child)
> > +{
> > + struct cb_id fork_id = {CN_IDX_FORK, CN_VAL_FORK};
>
> This can be static const, which will save some stack and cycles.
>
> > + static __u32 seq; /* used to test if we lost message */
> > +
> > + if (cn_already_initialized) {
> > + struct cn_msg *msg;
> > + size_t size;
> > +
> > + size = sizeof(*msg) + FORK_CN_INFO_SIZE;
> > + msg = kmalloc(size, GFP_KERNEL);
> > + if (msg) {
> > + memset(msg, '\0', size);
>
> Do we really need to memset the whole thing?

Yes, to not leak kernel memory context.

> > + memcpy(&msg->id, &fork_id, sizeof(msg->id));
> > + msg->seq = seq++;
>
> `seq' needs a lock to protect it. Or use atomic_add_return(), maybe.

Not necessary, I doubt fork userspace listener uses protocol described in
connector.c and relies on seq field since it is not needed to have acks/replays.
Although it can be used as a flag that it is new fork, but message itself
is already such an event.

> > + msg->ack = 0; /* not used */
> > + /*
> > + * size of data is the number of characters
> > + * printed plus one for the trailing '\0'
> > + */
> > + msg->len = snprintf(msg->data, FORK_CN_INFO_SIZE-1,
> > + "%i %i", parent, child) + 1;
>
> scnprintf() would be more appropriate here, even though it should be a "can't
> happen".
>
> > + cn_netlink_send(msg, CN_IDX_FORK);
> > +
> > + kfree(msg);
>
> `msg' is only 84 bytes and do_fork() has a shallow call graph. Make `msg'
> a local?

Yes it can be local.

Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

2005-02-23 10:59:27

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

Evgeniy Polyakov <[email protected]> wrote:
>
> On Wed, 23 Feb 2005 01:07:47 -0800
> Andrew Morton <[email protected]> wrote:
>
> > Guillaume Thouvenin <[email protected]> wrote:
> > >
> > > Hello,
> > >
> > > This patch replaces the relay_fork module and it implements a fork
> > > connector in the kernel/fork.c:do_fork() routine. The connector sends
> > > information about parent PID and child PID over a netlink interface. It
> > > allows to several user space applications to be informed when a fork
> > > occurs in the kernel. The main drawback is that even if nobody listens,
> > > message is send. I don't know how to avoid that.
> >
> > We really should find a way to fix that. Especially if we want all the
> > distributors to enable the connector in their builds (we do).
>
> Mesage is never reached anyone if there are no listeners, skb will be just freed,
> even without any linking.
> do_one_broadcast() in net/netlink/af_netlink.c takes care of it.
> Unicast message also will be discarded in cn_rx_skb().

We should assume that there will always be listeners. (why was the
connector thing added anyway? Its changelog is pathetic).

> These operations are quite cheap - just link/unlink skb to/from appropriate
> queues.

Please assume that <whatever secret application the connector stuff was
originally written for> will always be listening.

> > What happened to the idea of sending an on/off message down the netlink
> > socket?
>
> ?

All those emails I sent last week.

Arrange for the userspace daemon to send a message to the fork_connector
subsystem turning it on or off. So we can bypass all this code in the
common case where <secret application> is listening, but your daemon is
not.

> > > + if (msg) {
> > > + memset(msg, '\0', size);
> >
> > Do we really need to memset the whole thing?
>
> Yes, to not leak kernel memory context.

How would we do that? There are no gaps in the payload and we tell netlink
the exact length.

> > > + memcpy(&msg->id, &fork_id, sizeof(msg->id));
> > > + msg->seq = seq++;
> >
> > `seq' needs a lock to protect it. Or use atomic_add_return(), maybe.
>
> Not necessary, I doubt fork userspace listener uses protocol described in
> connector.c and relies on seq field since it is not needed to have acks/replays.
> Although it can be used as a flag that it is new fork, but message itself
> is already such an event.

Without a lock you can have two messages with the same sequence number.
Even if the daemon which you're planning on implementing can handle that,
we shouldn't allow it.

2005-02-23 11:17:05

by Evgeniy Polyakov

[permalink] [raw]

Subject: Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

On Wed, 23 Feb 2005 02:58:06 -0800
Andrew Morton <[email protected]> wrote:

> Evgeniy Polyakov <[email protected]> wrote:
> >
> > On Wed, 23 Feb 2005 01:07:47 -0800
> > Andrew Morton <[email protected]> wrote:
> >
> > > Guillaume Thouvenin <[email protected]> wrote:
> > > >
> > > > Hello,
> > > >
> > > > This patch replaces the relay_fork module and it implements a fork
> > > > connector in the kernel/fork.c:do_fork() routine. The connector sends
> > > > information about parent PID and child PID over a netlink interface. It
> > > > allows to several user space applications to be informed when a fork
> > > > occurs in the kernel. The main drawback is that even if nobody listens,
> > > > message is send. I don't know how to avoid that.
> > >
> > > We really should find a way to fix that. Especially if we want all the
> > > distributors to enable the connector in their builds (we do).
> >
> > Mesage is never reached anyone if there are no listeners, skb will be just freed,
> > even without any linking.
> > do_one_broadcast() in net/netlink/af_netlink.c takes care of it.
> > Unicast message also will be discarded in cn_rx_skb().
>
> We should assume that there will always be listeners. (why was the
> connector thing added anyway? Its changelog is pathetic).
>
> > These operations are quite cheap - just link/unlink skb to/from appropriate
> > queues.
>
> Please assume that <whatever secret application the connector stuff was
> originally written for> will always be listening.
>
> > > What happened to the idea of sending an on/off message down the netlink
> > > socket?
> >
> > ?
>
> All those emails I sent last week.
>
> Arrange for the userspace daemon to send a message to the fork_connector
> subsystem turning it on or off. So we can bypass all this code in the
> common case where <secret application> is listening, but your daemon is
> not.

Ok, now I see(I'm not a fork connector author, so I did not receive them).
That will require to add real fork connector with callback routing.
Guillaume?

> > > > + if (msg) {
> > > > + memset(msg, '\0', size);
> > >
> > > Do we really need to memset the whole thing?
> >
> > Yes, to not leak kernel memory context.
>
> How would we do that? There are no gaps in the payload and we tell netlink
> the exact length.

Without memset kernel memory from slab cache will be sent to the userspace.
Neither kmalloc() itself nor cache does not prefill the allocated area.

> > > > + memcpy(&msg->id, &fork_id, sizeof(msg->id));
> > > > + msg->seq = seq++;
> > >
> > > `seq' needs a lock to protect it. Or use atomic_add_return(), maybe.
> >
> > Not necessary, I doubt fork userspace listener uses protocol described in
> > connector.c and relies on seq field since it is not needed to have acks/replays.
> > Although it can be used as a flag that it is new fork, but message itself
> > is already such an event.
>
> Without a lock you can have two messages with the same sequence number.
> Even if the daemon which you're planning on implementing can handle that,
> we shouldn't allow it.

Yes, they can have the same number, but does it cost atomic/lock overhead?
Anyway, simple spin_lock() should be enough in do_fork() context.
Guillaume?

Evgeniy Polyakov

Only failure makes us experts. -- Theo de Raadt

2005-02-24 06:41:22

by Guillaume Thouvenin

[permalink] [raw]

Subject: Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

On Wed, 2005-02-23 at 14:41 +0300, Evgeniy Polyakov wrote:
> > Please assume that <whatever secret application the connector stuff was
> > originally written for> will always be listening.
> >
> > > > What happened to the idea of sending an on/off message down the netlink
> > > > socket?
> > ...
> > Arrange for the userspace daemon to send a message to the fork_connector
> > subsystem turning it on or off. So we can bypass all this code in the
> > common case where <secret application> is listening, but your daemon is
> > not.
>
> Ok, now I see(I'm not a fork connector author, so I did not receive them).
> That will require to add real fork connector with callback routing.
> Guillaume?

Yes the connector's callback is a good solution. I will add a fork
enable/disable callback in drivers/connector/connector.c that will
switch a global variable when called from user space. It will be
something like:

void cn_fork_callback(void)
{
if (cn_already_initialized)
cn_fork_enable = cn_fork_enable ? 0 : 1 ;
}

With cn_fork_enable set to 0 by default. In the do_fork() I will replace
the statement "if (cn_already_initialized)" by "if (cn_fork_enable)"

> > Without a lock you can have two messages with the same sequence number.
> > Even if the daemon which you're planning on implementing can handle that,
> > we shouldn't allow it.
>
> Yes, they can have the same number, but does it cost atomic/lock overhead?
> Anyway, simple spin_lock() should be enough in do_fork() context.
> Guillaume?

I will protect the incrementation by a spin_lock(&fork_cn_lock).

Guillaume

2005-02-24 09:15:54

by Evgeniy Polyakov

[permalink] [raw]

Subject: Re: [PATCH 2.6.11-rc3-mm2] connector: Add a fork connector

On Thu, 2005-02-24 at 07:41 +0100, Guillaume Thouvenin wrote:
> On Wed, 2005-02-23 at 14:41 +0300, Evgeniy Polyakov wrote:
> > > Please assume that <whatever secret application the connector stuff was
> > > originally written for> will always be listening.
> > >
> > > > > What happened to the idea of sending an on/off message down the netlink
> > > > > socket?
> > > ...
> > > Arrange for the userspace daemon to send a message to the fork_connector
> > > subsystem turning it on or off. So we can bypass all this code in the
> > > common case where <secret application> is listening, but your daemon is
> > > not.
> >
> > Ok, now I see(I'm not a fork connector author, so I did not receive them).
> > That will require to add real fork connector with callback routing.
> > Guillaume?
>
> Yes the connector's callback is a good solution. I will add a fork
> enable/disable callback in drivers/connector/connector.c that will
> switch a global variable when called from user space. It will be
> something like:
>
> void cn_fork_callback(void)
> {
> if (cn_already_initialized)
> cn_fork_enable = cn_fork_enable ? 0 : 1 ;
> }
>
> With cn_fork_enable set to 0 by default. In the do_fork() I will replace
> the statement "if (cn_already_initialized)" by "if (cn_fork_enable)"

Yes, it is right solution, but I do not think generic connector should
have it.
Create your own module that will "depend on CONNECTOR=y", which will
register
callback and export some function that will be called from do_fork() if
FORK_CONNECTOR is defined and do nothing otherwise.
I think you even need to create some simple protocol over connector,
i.e.:

void cn_fork_callback(void *data)
{
struct cn_msg *msg = (struct cn_msg *)data;

if (msg->len != sizeof(cn_fork_enable))
return;

memcpy(&cn_fork_enable, msg->data, sizeof(cn_fork_enable));
}

And register cn_for_callback() with the connector.

By default cn_fork_enable is zero and fork_connector() called from
do_fork()
will do nothing, otherwise cn_netlink_send(data, 0);
Since something is registered with connector then "0" here will select
appropriate group.
Or you may still use CN_IDX_FORK.

Userspace daemod, which is bound to the CN_IDX_FORK group will send
cn_msg with the appropriate
enable/disable message.

Or you can even create more complex protocol, which will enable/disable
various fork parameters to be logged...

> > > Without a lock you can have two messages with the same sequence number.
> > > Even if the daemon which you're planning on implementing can handle that,
> > > we shouldn't allow it.
> >
> > Yes, they can have the same number, but does it cost atomic/lock overhead?
> > Anyway, simple spin_lock() should be enough in do_fork() context.
> > Guillaume?
>
> I will protect the incrementation by a spin_lock(&fork_cn_lock).
>
> Guillaume
--
Evgeniy Polyakov

Crash is better than data corruption -- Arthur Grabowski

Attachments:

signature.asc (189.00 B)
This is a digitally signed message part