2006-01-23 00:43:56

by Albert Cahalan

[permalink] [raw]
Subject: anon unions are cool

Now that the kernel requires a compiler that
supports anonymous unions, we can do some
really neat stuff.

For example, suppose we wanted to rename
a badly-named struct member. If the struct is
used all over the kernel, this becomes a giant
project with huge potential for patch conflicts.

Anonymous unions solve this. First, we replace
the badly-named struct member with an anonymous
union containing both the old name and the new
name. Second, we change much of the kernel
to use the new name. Third, we mark the old name
with the __depricated tag to generate warnings.
Fourth, we get the stragglers over days or months.
Fifth, we get rid of the anonymous union by replacing
it with the new name.

The struct evolves like so:

struct a {
int foo;
int bad;
int bar;
};

struct a {
int foo;
union {
int bad;
int baz;
};
int bar;
};

struct a {
int foo;
union {
int __deprecated bad;
int baz;
};
int bar;
};

struct a {
int foo;
int baz;
int bar;
};


2006-01-23 00:56:32

by Kyle Moffett

[permalink] [raw]
Subject: Re: anon unions are cool

On Jan 22, 2006, at 19:36, Albert Cahalan wrote:
> For example, suppose we wanted to rename a badly-named struct
> member. If the struct is used all over the kernel, this becomes a
> giant project with huge potential for patch conflicts.

Actually, pure struct-member renames are not that common, however it
_is_ common to add an accessor method due to upcoming locking/
virtualization changes or similar. For that case (Using a random
recent example from the list. This was decided to not be the best
route for pid virtualization, but it's just an example):

struct task_struct {
[...]

pid_t pid;

[...]
};

This would become:

struct task_struct {
[...]

union {
pid_t pid __deprecated;
pid_t __internal_pid;
};

[...]
};

static inline pid_t task_pid(struct task_struct *t) {
return t->__internal_pid;
}

Code that used task->pid would get a deprecation warning, however new
code that used the task_pid(task) would work fine; and the task_pid()
function itself would generate no warnings.

Cheers,
Kyle Moffett

--
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it.
-- Brian Kernighan


2006-01-23 01:24:04

by Albert Cahalan

[permalink] [raw]
Subject: Re: anon unions are cool

On 1/22/06, Kyle Moffett <[email protected]> wrote:
> On Jan 22, 2006, at 19:36, Albert Cahalan wrote:
> > For example, suppose we wanted to rename a badly-named struct
> > member. If the struct is used all over the kernel, this becomes a
> > giant project with huge potential for patch conflicts.
>
> Actually, pure struct-member renames are not that common, however it
> _is_ common to add an accessor method due to upcoming locking/
> virtualization changes or similar. For that case (Using a random
> recent example from the list. This was decided to not be the best
> route for pid virtualization, but it's just an example):
>
> struct task_struct {
> [...]
>
> pid_t pid;
>
> [...]
> };

This case is rather interesting. At more than one place I've worked,
I found people assuming that the kernel's "pid" was a pid, when in
fact it is a tid. (a "task ID" or "thread ID") The confusion probably
leads to kernel bugs. I've seen many bugs related to this, though I
can't be 100% sure that they don't all involve code that existed prior
to the tgid concept.

2006-01-23 18:26:22

by Eric W. Biederman

[permalink] [raw]
Subject: Re: anon unions are cool

Albert Cahalan <[email protected]> writes:

> This case is rather interesting. At more than one place I've worked,
> I found people assuming that the kernel's "pid" was a pid, when in
> fact it is a tid. (a "task ID" or "thread ID") The confusion probably
> leads to kernel bugs. I've seen many bugs related to this, though I
> can't be 100% sure that they don't all involve code that existed prior
> to the tgid concept.

Actually this is a really weird area. A lot of it depends on your
perspective. current->pid is more than just a thread group ID.

You can always use kill to send a signal to the ``pid'' of any
task. However if that task is part of a thread group the signal
might go to one of that threads siblings.

If you were to follow the normal rules and send to a signal
to a thread group. All threads would get it. Although I
don't think the kernel has code for that.

So from a signal perspective current->pid is the pid.

However get_pid has been modified to return the thread group id.
Instead of the PID. And a lot of times when you are exporting
information to user space you want the thread group id.

But in practice neither thread ID nor process ID map directly
to the kernels concept.

Then there is the other weird side where only the leaders of
thread groups are placed in sessions and process groups.

Eric


2006-01-24 03:39:49

by Albert Cahalan

[permalink] [raw]
Subject: Re: anon unions are cool

On 1/23/06, Eric W. Biederman <[email protected]> wrote:
> Albert Cahalan <[email protected]> writes:
>
> > This case is rather interesting. At more than one place I've worked,
> > I found people assuming that the kernel's "pid" was a pid, when in
> > fact it is a tid. (a "task ID" or "thread ID") The confusion probably
> > leads to kernel bugs. I've seen many bugs related to this, though I
> > can't be 100% sure that they don't all involve code that existed prior
> > to the tgid concept.
>
> Actually this is a really weird area. A lot of it depends on your
> perspective. current->pid is more than just a thread group ID.

It's not a thread group ID at all. It's a task ID or thread ID.

The thread group ID is something different. It's the same as
a POSIX process ID. The kernel uses "tgid" as the name.

At least you're not alone in your confusion. :-)

> You can always use kill to send a signal to the ``pid'' of any
> task. However if that task is part of a thread group the signal
> might go to one of that threads siblings.

This is sort of a bug really, though we're probably stuck
with it now. You should get errno=ESRCH when you use
an ID that does not match up with a tgid.

The "feature" doesn't work for thread group leaders
because kill() doesn't let you specify that you really
want to address the task rather than the task group.
We have tkill() and tgkill() to address this problem.

> If you were to follow the normal rules and send to a signal
> to a thread group. All threads would get it. Although I
> don't think the kernel has code for that.

That would be wrong for most signals. The kernel follows
the POSIX behavior, with "process" being struct signal or
the tgid. A few signals, like SIGKILL, go the the whole process.
All others are delivered to the first task willing to take it,
which is how POSIX treats a process with threads. (there
is also different behavior for per-task stuff like SIGSEGV)

> So from a signal perspective current->pid is the pid.

Aside from bugs, no.

> However get_pid has been modified to return the thread group id.
> Instead of the PID. And a lot of times when you are exporting
> information to user space you want the thread group id.
>
> But in practice neither thread ID nor process ID map directly
> to the kernels concept.

They sure do, unless you play with abnormal clone() flags.

> Then there is the other weird side where only the leaders of
> thread groups are placed in sessions and process groups.

That's not at all weird. It fits perfectly with the use of the tgid
as the POSIX PID and the use of the "pid" (ugh) as the POSIX
thread ID.

Aside from POSIX just being arguably weird, the only weird
things here are:

1. kill() not returning errno=ESRCH when it should
2. the name "pid" being used oddly in the kernel

2006-01-24 06:04:22

by Kyle Moffett

[permalink] [raw]
Subject: Re: anon unions are cool

On Jan 23, 2006, at 22:39, Albert Cahalan wrote:
>> Then there is the other weird side where only the leaders of
>> thread groups are placed in sessions and process groups.
>
> That's not at all weird. It fits perfectly with the use of the tgid
> as the POSIX PID and the use of the "pid" (ugh) as the POSIX thread
> ID.
>
> Aside from POSIX just being arguably weird, the only weird things
> here are:
>
> 1. kill() not returning errno=ESRCH when it should
> 2. the name "pid" being used oddly in the kernel

And actually, my use of the task->pid member was correct. The "pid
virtualization" patches are virtualizing not only the process IDs,
but the thread IDs as well (the whole point is to provide completely
unique PID/TID spaces, so everything needs to be virtualized). In
any case, it was just a poorly chosen example (because that
particular virtualization bit was not necessary). I agree with you
that the kernel's behavior is weird, but it has its reasons and a lot
of history. If you think it's that important, a cleanup patch
(assuming it's not too intrusive) would probably be welcomed.

Cheers,
Kyle Moffett

--
I lost interest in "blade servers" when I found they didn't throw
knives at people who weren't supposed to be in your machine room.
-- Anthony de Boer