the attached patch implements a new syscall on x86, clone_startup().
The parameters are:
clone_startup(fn, child_stack, flags, tls_desc, pid_addr)
with the help of this syscall glibc's next-generation pthreads code is
able to perform single-syscall thread creation: clone_startup() sets up
the child TLS and writes the child PID back into userspace. (which address
points to the thread control block.).
the child PID has to be written back because otherwise the parent and the
child would have to do expensive and unrobust synchronization. The first
instruction the child executes might as well be a signal handler, and
glibc code needs the TLS and the PID of the thread. There are a number of
workarounds in userspace that can solve this problem without
clone_startup(), but each of them has disadvantages:
- the parent might disable all signals across clone() calls. This adds 3
extra syscalls: 2 in the parent, 1 in the child. The old LinuxThreads
code did something like this.
- glibc could check whether the PID is available and call getpid() if
not, but this introduces runtime overhead.
- glibc might define a wrapper function around signal handlers - this is
both unrobust and might cause compatibility problems.
- glibc could use a mutex with the parent to let the parent fill in the
PID. This causes two extra context-switches if the child is executed
first after clone().
and the kernel can indeed provide a pretty good solution - why not do it
this way.
the TLS setup avoids an extra set_thread_area() syscalls. [Standalone
glibc applications still use set_thread_area(), so this syscall does not
obsolete set_thread_area().]
Implementational issues: i've introduced a new kernel-internal clone flag,
CLONE_STARTUP. In theory we could use the existing clone() syscall and let
applications fill in CLONE_STARTUP - i felt uneasy about this solution
because it introduces a versioned sys_clone() parametering, makings things
messy. But it would undoubtedly work safely and reliably, and it's even a
bit slower than the separated syscalls solution this patch adds.
for performance reasons the kernel does not recognize the -1 TLS
descriptor number, but this is a non-issue, child threads inherit the TLS
index of the parent anyway. Also, the kernel does not allow an 'empty' TLS
to be defined.
Comments?
Ingo
--- linux/arch/i386/kernel/process.c.orig Tue Aug 13 16:24:59 2002
+++ linux/arch/i386/kernel/process.c Tue Aug 13 16:51:51 2002
@@ -559,6 +559,7 @@
unsigned long unused,
struct task_struct * p, struct pt_regs * regs)
{
+ struct thread_struct *t = &p->thread;
struct pt_regs * childregs;
struct task_struct *tsk;
@@ -567,17 +568,42 @@
childregs->eax = 0;
childregs->esp = esp;
- p->thread.esp = (unsigned long) childregs;
- p->thread.esp0 = (unsigned long) (childregs+1);
+ t->esp = (unsigned long) childregs;
+ t->esp0 = (unsigned long) (childregs+1);
+ t->eip = (unsigned long) ret_from_fork;
- p->thread.eip = (unsigned long) ret_from_fork;
-
- savesegment(fs,p->thread.fs);
- savesegment(gs,p->thread.gs);
+ savesegment(fs, t->fs);
+ savesegment(gs, t->gs);
tsk = current;
unlazy_fpu(tsk);
- struct_cpy(&p->thread.i387, &tsk->thread.i387);
+ struct_cpy(&t->i387, &tsk->thread.i387);
+
+ /*
+ * In the clone_startup() case we set up the child's
+ * TLS, and we also put its TID into userspace.
+ */
+ if (clone_flags & CLONE_STARTUP) {
+ struct desc_struct *desc;
+ struct user_desc info;
+ int idx;
+
+ if (copy_from_user(&info, (void *)childregs->esi, sizeof(info)))
+ return -EFAULT;
+ if (LDT_empty(&info))
+ return -EINVAL;
+
+ idx = info.entry_number;
+ if (idx < GDT_ENTRY_TLS_MIN || idx > GDT_ENTRY_TLS_MAX)
+ return -EINVAL;
+
+ desc = t->tls_array + idx - GDT_ENTRY_TLS_MIN;
+ desc->a = LDT_entry_a(&info);
+ desc->b = LDT_entry_b(&info);
+
+ if (put_user(p->pid, (pid_t *)childregs->edx))
+ return -EFAULT;
+ }
if (unlikely(NULL != tsk->thread.ts_io_bitmap)) {
p->thread.ts_io_bitmap = kmalloc(IO_BITMAP_BYTES, GFP_KERNEL);
@@ -744,18 +770,27 @@
return IS_ERR(p) ? PTR_ERR(p) : p->pid;
}
-asmlinkage int sys_clone(struct pt_regs regs)
+static inline int __clone(struct pt_regs *regs, unsigned long clone_flags)
{
struct task_struct *p;
- unsigned long clone_flags;
unsigned long newsp;
- clone_flags = regs.ebx;
- newsp = regs.ecx;
+ clone_flags = regs->ebx;
+ newsp = regs->ecx;
if (!newsp)
- newsp = regs.esp;
- p = do_fork(clone_flags & ~CLONE_IDLETASK, newsp, ®s, 0);
+ newsp = regs->esp;
+ p = do_fork(clone_flags & ~CLONE_IDLETASK, newsp, regs, 0);
return IS_ERR(p) ? PTR_ERR(p) : p->pid;
+}
+
+asmlinkage int sys_clone(struct pt_regs regs)
+{
+ return __clone(®s, regs.ebx & ~CLONE_STARTUP);
+}
+
+asmlinkage int sys_clone_startup(struct pt_regs regs)
+{
+ return __clone(®s, regs.ebx | CLONE_STARTUP);
}
/*
--- linux/arch/i386/kernel/entry.S.orig Tue Aug 13 16:50:01 2002
+++ linux/arch/i386/kernel/entry.S Tue Aug 13 16:49:48 2002
@@ -754,6 +754,7 @@
.long sys_sched_getaffinity
.long sys_set_thread_area
.long sys_get_thread_area
+ .long sys_clone_startup /* 245 */
.rept NR_syscalls-(.-sys_call_table)/4
.long sys_ni_syscall
--- linux/include/linux/sched.h.orig Tue Aug 13 16:48:27 2002
+++ linux/include/linux/sched.h Tue Aug 13 16:49:09 2002
@@ -45,6 +45,7 @@
#define CLONE_THREAD 0x00010000 /* Same thread group? */
#define CLONE_NEWNS 0x00020000 /* New namespace group? */
#define CLONE_SYSVSEM 0x00040000 /* share system V SEM_UNDO semantics */
+#define CLONE_STARTUP 0x00080000 /* create child state */
#define CLONE_SIGNAL (CLONE_SIGHAND | CLONE_THREAD)
--- linux/include/asm-i386/unistd.h.orig Tue Aug 13 16:50:11 2002
+++ linux/include/asm-i386/unistd.h Tue Aug 13 16:50:28 2002
@@ -249,6 +249,7 @@
#define __NR_sched_getaffinity 242
#define __NR_set_thread_area 243
#define __NR_get_thread_area 244
+#define __NR_clone_startup 245
/* user-visible error numbers are in the range -1 - -124: see <asm-i386/errno.h> */
On Tue, 13 Aug 2002, Ingo Molnar wrote:
>
> the attached patch implements a new syscall on x86, clone_startup().
Hmm.
Please don't do that CLONE_STARTUP flag thing. Just do the code inside the
sys_clone_startup(), instead of dynamically adding a new flag. And that
code looks like it returns EFAULT without freeing the stuff it has
allocated.
Also, "caching" &tsk->thread only adds code - even if it makes the sources
slightly shorter. It adds a register that is just a constant offset from
another register.
Other than that the thing seems to make sense.
Linus
On Tue, Aug 13, 2002 at 05:09:03PM +0200, Ingo Molnar wrote:
>
> the attached patch implements a new syscall on x86, clone_startup().
> The parameters are:
>
> clone_startup(fn, child_stack, flags, tls_desc, pid_addr)
First the name souns horrible. What about spawn_thread or create_thread
instead? it's not our good old clone and not a lookalike, it's some
pthreadish monster..
> with the help of this syscall glibc's next-generation pthreads code
have you discussed this code with IBM's pthread group? I think we don't
want to add a new syscall that is only useful to glibc..
On Tue, Aug 13, 2002 at 09:01:56AM -0700, Linus Torvalds wrote:
> > First the name souns horrible. What about spawn_thread or create_thread
> > instead? it's not our good old clone and not a lookalike, it's some
> > pthreadish monster..
>
> This one definitely isn't a pthread-specific problem. The old UNIX fork()
> semantics for <pid> returning really are fairly broken, since the notion
> of returning the pid in a local register is inherently racy for _anything_
> that wants to maintain a list of its children and needs to access the list
> in the SIGCHLD handler.
The TLS setting makes it pretty pthread-specific, though (or at least
thread-specific). Also the fn parameter makes it very different from
both clone and fork. What about spawn() if you dislike a thread in the name?
On Tue, 13 Aug 2002, Christoph Hellwig wrote:
>
> First the name souns horrible. What about spawn_thread or create_thread
> instead? it's not our good old clone and not a lookalike, it's some
> pthreadish monster..
This one definitely isn't a pthread-specific problem. The old UNIX fork()
semantics for <pid> returning really are fairly broken, since the notion
of returning the pid in a local register is inherently racy for _anything_
that wants to maintain a list of its children and needs to access the list
in the SIGCHLD handler.
(Simple explanation: imagine a child that exits so quickly that the parent
hasn't even had time to do the "store pid into the array" before the
parent is already signalled with SIGCHLD. Yes, this happens, and yes, it's
a real problem. It wasn't that long ago that bash would _crash_ on this).
With a system call like this, the parent can
- allocate the new child information block first
- do the fork with the pid pointer pointing into the child information
block.
- when SIGCHLD for that child comes in, all state will have been set up
with no races.
Note how this isn't even thread-specific: it very much would work with a
regular fork-like approach and a standard shell.
> > with the help of this syscall glibc's next-generation pthreads code
>
> have you discussed this code with IBM's pthread group? I think we don't
> want to add a new syscall that is only useful to glibc..
I agree that the name is a bit ugly, but this is a system call that I
actually think is fundamentally useful (ie I can see how it would be
totally usable quite outside any specific library implementation issues).
Talking it over with the IBM threading guys is still worthwhile, though.
There may be _other_ information that the IBM guys have problems with, and
it could easily be that the interface we really want is even more generic.
Linus
On Tue Aug 13, 2002 at 04:44:15PM +0100, Christoph Hellwig wrote:
> On Tue, Aug 13, 2002 at 05:09:03PM +0200, Ingo Molnar wrote:
> >
> > the attached patch implements a new syscall on x86, clone_startup().
> > The parameters are:
> >
> > clone_startup(fn, child_stack, flags, tls_desc, pid_addr)
>
> First the name souns horrible. What about spawn_thread or create_thread
> instead? it's not our good old clone and not a lookalike, it's some
> pthreadish monster..
How about "clone2"?
-Erik
--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--
On Tue, Aug 13, 2002 at 10:09:24AM -0600, Erik Andersen wrote:
> > First the name souns horrible. What about spawn_thread or create_thread
> > instead? it's not our good old clone and not a lookalike, it's some
> > pthreadish monster..
>
> How about "clone2"?
Already used by ia64 for a hybrid between the good old clone and the new
monster :)
And as I said again, it doesn't really clone - it starts something
different, namely the fn argument.
On Tue, 13 Aug 2002, Christoph Hellwig wrote:
> > This one definitely isn't a pthread-specific problem. The old UNIX fork()
> > semantics for <pid> returning really are fairly broken, since the notion
> > of returning the pid in a local register is inherently racy for _anything_
> > that wants to maintain a list of its children and needs to access the list
> > in the SIGCHLD handler.
>
> The TLS setting makes it pretty pthread-specific, though (or at least
> thread-specific).
That's certainly true, and potentially worth a clone() flag of its own,
quite independently of the startup thing ("CLONE_SETTLS").
Ingo, how about breaking it down that way?
> Also the fn parameter makes it very different from
> both clone and fork.
The fn thing is a purely user-mode effect, it's not there in the system
call. Which is true of a regular clone() too.
> What about spawn() if you dislike a thread in the name?
spawn() to me implies doing the equivalent of "vfork()+execve()", the way
most non-UNIX OS's do new process creation.
I don't dislike the "thread" name too much, but I want this to be generic,
and seen as such. The same way the original clone() was a proper superset
of fork(), this needs to be a proper superset, not just in name, but in
_usage_. If it's useful for only one thing, that's not good.
Linus
>>>>> On Tue, 13 Aug 2002 17:11:38 +0100, Christoph Hellwig <[email protected]> said:
Chris> On Tue, Aug 13, 2002 at 10:09:24AM -0600, Erik Andersen
Chris> wrote:
>> > First the name souns horrible. What about spawn_thread or
>> create_thread > instead? it's not our good old clone and not a
>> lookalike, it's some > pthreadish monster..
>>
>> How about "clone2"?
Chris> Already used by ia64 for a hybrid between the good old clone
Chris> and the new monster :)
The original clone() system call was misdesigned. Even if you chose
to ignore ia64, clone() cannot be used by portable applications to
specify a stack (think "stack-growth direction").
clone() should have specified a stack memory *area* from the
beginning. (UNIX got this right from the beginning, see, e.g.,
sigaltstack()).
--david
On Tue, Aug 13, 2002 at 09:32:50AM -0700, David Mosberger wrote:
> clone() should have specified a stack memory *area* from the
> beginning. (UNIX got this right from the beginning, see, e.g.,
> sigaltstack()).
doesn't sigstack() predate that :)
regards
john
--
"It is unbecoming for young men to utter maxims."
- Aristotle
On Tue, 13 Aug 2002, Christoph Hellwig wrote:
>On Tue, Aug 13, 2002 at 10:09:24AM -0600, Erik Andersen wrote:
>
>And as I said again, it doesn't really clone - it starts something
>different, namely the fn argument.
How about:
sys_launch
or
sys_run_process
Ruth
--
Ruth Ivimey-Cook
Software engineer and technical writer.
On Tue, 13 Aug 2002, Erik Andersen wrote:
> > First the name souns horrible. What about spawn_thread or create_thread
> > instead? it's not our good old clone and not a lookalike, it's some
> > pthreadish monster..
>
> How about "clone2"?
ni fact it was sys_clone2() first time around, but Ulrich Drepper
requested another name for it because in glibc it collided with ia64 where
clone2() is something different. So whatever name there is going to be, it
should not be sys_clone2().
Ingo
On Tue, 13 Aug 2002, Christoph Hellwig wrote:
> > with the help of this syscall glibc's next-generation pthreads code
>
> have you discussed this code with IBM's pthread group? I think we don't
> want to add a new syscall that is only useful to glibc..
what i did was i thought about the problem in the generic context of
threading, so these syscalls help both glibc and NGPT.
(i'd have loved to test these concepts with NGPT as well, but i couldnt
get it to run the attached simple pthreads application (perf.c - measures
a few kinds of threaded workloads), with the latest NGPT version from the
NGPT website. So if anyone could send me a working static binary of said
application linked against NGPT, that would be great.)
btw., with the help of these syscalls and futexes, the layer between Linux
and pthreads became very thin - almost nonexistant in the majority of
cases.
Ingo
#define _GNU_SOURCE 1
#include <argp.h>
#include <error.h>
#include <errno.h>
#include <fcntl.h>
#include <inttypes.h>
#include <limits.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#ifndef MAX_THREADS
# define MAX_THREADS 100000
#endif
#ifndef DEFAULT_THREADS
# define DEFAULT_THREADS 50
#endif
#define OPT_TO_THREAD 300
#define OPT_TO_PROCESS 301
static const struct argp_option options[] =
{
{ NULL, 0, NULL, 0, "\
This is a test for threads so we allow ther user to selection the number of \
threads which are used at any one time. Independently the total number of \
rounds can be selected. This is the total number of threads which will have \
run when the process terminates:" },
{ "threads", 't', "NUMBER", 0, "Number of threads used at once" },
{ "starts", 's', "NUMBER", 0, "Total number of working threads" },
{ NULL, 0, NULL, 0, "\
Each thread can do one of two things: sleep or do work. The latter is 100% \
CPU bound. The work load is the probability a thread does work. All values \
from zero to 100 (inclusive) are valid. How often each thread repeats this \
can be determined by the number of rounds. The work cost determines how long \
each work session (not sleeping) takes. If it is zero a thread would \
effectively nothing. By setting the number of rounds to zero the thread \
does no work at all and pure thread creation times can be measured." },
{ "workload", 'w', "PERCENT", 0, "Percentage of time spent working" },
{ "workcost", 'c', "NUMBER", 0,
"Factor in the cost of each round of working" },
{ "rounds", 'r', "NUMBER", 0, "Number of rounds each thread runs" },
{ NULL, 0, NULL, 0, "\
One parameter for each threads execution is the size of the stack. If this \
parameter is not used the system's default stack size is used. If many \
threads are used the stack size should be chosen quite small." },
{ "stacksize", 'S', "BYTES", 0, "Size of threads stack" },
{ "guardsize", 'g', "BYTES", 0,
"Size of stack guard area; must fit into the stack" },
{ NULL, 0, NULL, 0, "Signal options:" },
{ "to-thread", OPT_TO_THREAD, NULL, 0, "Send signal to main thread" },
{ "to-process", OPT_TO_PROCESS, NULL, 0,
"Send signal to process (default)" },
{ NULL, 0, NULL, 0, "Administrative options:" },
{ "progress", 'p', NULL, 0, "Show signs of progress" },
{ "timing", 'T', NULL, 0,
"Measure time from startup to the last thread finishing" },
{ NULL, 0, NULL, 0, NULL }
};
/* Prototype for option handler. */
static error_t parse_opt (int key, char *arg, struct argp_state *state);
/* Data structure to communicate with argp functions. */
static struct argp argp =
{
options, parse_opt
};
static unsigned long int threads = DEFAULT_THREADS;
static unsigned long int workload = 50;
static unsigned long int workcost = 20;
static unsigned long int rounds = 100;
static long int starts = 5000;
static unsigned long int stacksize;
static long int guardsize = -1;
static bool progress;
static bool timing;
static bool to_thread;
static long int running;
static pthread_mutex_t running_mutex = PTHREAD_MUTEX_INITIALIZER;
static pid_t pid;
static pthread_t tmain;
static clockid_t cl;
static struct timespec start_time;
static pthread_mutex_t sum_mutex = PTHREAD_MUTEX_INITIALIZER;
unsigned int sum;
/* We use 64bit values for the times. */
typedef unsigned long long int hp_timing_t;
static void *
thread_function (void *arg)
{
unsigned long int i;
unsigned int state = (unsigned long int) arg;
for (i = 0; i < rounds; ++i)
{
/* Determine what to do. */
unsigned int rnum;
/* Equal distribution. */
do
rnum = rand_r (&state);
while (rnum >= UINT_MAX - (UINT_MAX % 100));
rnum %= 100;
if (rnum < workload)
{
int j;
int a[4] = { i, rnum, i + rnum, rnum - i };
if (progress)
write (STDERR_FILENO, "c", 1);
for (j = 0; j < workcost * 10000; ++j)
{
a[0] += a[3] >> 12;
a[1] += a[2] >> 20;
a[2] += a[1] ^ 0x3423423;
a[3] += a[0] - a[1];
}
pthread_mutex_lock (&sum_mutex);
sum += a[0] + a[1] + a[2] + a[3];
pthread_mutex_unlock (&sum_mutex);
}
else
{
/* Just sleep. */
struct timespec tv;
tv.tv_sec = 0;
tv.tv_nsec = 40000000;
if (progress)
write (STDERR_FILENO, "w", 1);
nanosleep (&tv, NULL);
}
}
pthread_mutex_lock (&running_mutex);
if (--running <= 0 && starts <= 0)
{
/* We are done. */
if (progress)
write (STDERR_FILENO, "\n", 1);
if (timing)
{
struct timespec end_time;
if (clock_gettime (cl, &end_time) == 0)
{
end_time.tv_sec -= start_time.tv_sec;
end_time.tv_nsec -= start_time.tv_nsec;
if (end_time.tv_nsec < 0)
{
end_time.tv_nsec += 1000000000;
--end_time.tv_sec;
}
printf ("\nRuntime: %lu.%09lu seconds\n",
(unsigned long int) end_time.tv_sec,
(unsigned long int) end_time.tv_nsec);
}
}
printf ("Result: %08x\n", sum);
exit (0);
}
pthread_mutex_unlock (&running_mutex);
if (to_thread)
/* This code sends a signal to the main thread. */
pthread_kill (tmain, SIGUSR1);
else
/* Use this code to test sending a signal to the process. */
kill (pid, SIGUSR1);
if (progress)
write (STDERR_FILENO, "f", 1);
return NULL;
}
int
main (int argc, char *argv[])
{
int remaining;
sigset_t ss;
pthread_attr_t attr;
/* Parse and process arguments. */
argp_parse (&argp, argc, argv, 0, &remaining, NULL);
if (timing)
{
if (clock_getcpuclockid (0, &cl) != 0
|| clock_gettime (cl, &start_time) != 0)
timing = false;
}
/* We need this later. */
pid = getpid ();
tmain = pthread_self ();
/* We use signal SIGUSR1 for communication between the threads and
the main thread. We only want sychronous notification. */
sigemptyset (&ss);
sigaddset (&ss, SIGUSR1);
if (sigprocmask (SIG_BLOCK, &ss, NULL) != 0)
error (EXIT_FAILURE, errno, "cannot set signal mask");
/* Create the thread attribute. */
pthread_attr_init (&attr);
/* If the user provided a stack size use it. */
if (stacksize != 0
&& pthread_attr_setstacksize (&attr, stacksize) != 0)
puts ("could not set stack size; will use default");
/* And stack guard size. */
if (guardsize != -1
&& pthread_attr_setguardsize (&attr, guardsize) != 0)
puts ("invalid stack guard size; will use default");
/* All threads are created detached. */
pthread_attr_setdetachstate (&attr, PTHREAD_CREATE_DETACHED);
while (1)
{
pthread_t th;
int err;
bool cont = true;
bool do_wait = false;;
pthread_mutex_lock (&running_mutex);
if (starts-- < 0)
cont = false;
else
do_wait = ++running >= threads && starts > 0;
pthread_mutex_unlock (&running_mutex);
if (! cont)
break;
if (progress)
write (STDERR_FILENO, "t", 1);
err = pthread_create (&th, &attr, thread_function, (void *) starts);
if (err != 0)
error (EXIT_FAILURE, err, "cannot start thread %lu", starts);
if (do_wait)
sigwaitinfo (&ss, NULL);
}
/* Do nothing anymore. On of the threads will terminate the program. */
sigfillset (&ss);
sigdelset (&ss, SIGINT);
while (1)
sigsuspend (&ss);
/* NOTREACHED */
return 0;
}
/* Handle program arguments. */
static error_t
parse_opt (int key, char *arg, struct argp_state *state)
{
unsigned long int num;
long int snum;
switch (key)
{
case 't':
num = strtoul (arg, NULL, 0);
if (num < MAX_THREADS)
threads = num;
else
printf ("\
number of threads limited to %u; recompile with a higher limit if necessary",
MAX_THREADS);
break;
case 'w':
num = strtoul (arg, NULL, 0);
if (num <= 100)
workload = num;
else
puts ("workload must be between 0 and 100 percent");
break;
case 'c':
workcost = strtoul (arg, NULL, 0);
break;
case 'r':
rounds = strtoul (arg, NULL, 0);
break;
case 's':
starts = strtoul (arg, NULL, 0);
break;
case 'S':
num = strtoul (arg, NULL, 0);
if (num >= PTHREAD_STACK_MIN)
stacksize = num;
else
printf ("minimum stack size is %d\n", PTHREAD_STACK_MIN);
break;
case 'g':
snum = strtol (arg, NULL, 0);
if (snum < 0)
printf ("invalid guard size %s\n", arg);
else
guardsize = snum;
break;
case 'p':
progress = true;
break;
case 'T':
timing = true;
break;
case OPT_TO_THREAD:
to_thread = true;
break;
case OPT_TO_PROCESS:
to_thread = false;
break;
default:
return ARGP_ERR_UNKNOWN;
}
return 0;
}
static hp_timing_t
get_clockfreq (void)
{
/* We read the information from the /proc filesystem. It contains at
least one line like
cpu MHz : 497.840237
or also
cpu MHz : 497.841
We search for this line and convert the number in an integer. */
static hp_timing_t result;
int fd;
/* If this function was called before, we know the result. */
if (result != 0)
return result;
fd = open ("/proc/cpuinfo", O_RDONLY);
if (__builtin_expect (fd != -1, 1))
{
/* XXX AFAIK the /proc filesystem can generate "files" only up
to a size of 4096 bytes. */
char buf[4096];
ssize_t n;
n = read (fd, buf, sizeof buf);
if (__builtin_expect (n, 1) > 0)
{
char *mhz = memmem (buf, n, "cpu MHz", 7);
if (__builtin_expect (mhz != NULL, 1))
{
char *endp = buf + n;
int seen_decpoint = 0;
int ndigits = 0;
/* Search for the beginning of the string. */
while (mhz < endp && (*mhz < '0' || *mhz > '9') && *mhz != '\n')
++mhz;
while (mhz < endp && *mhz != '\n')
{
if (*mhz >= '0' && *mhz <= '9')
{
result *= 10;
result += *mhz - '0';
if (seen_decpoint)
++ndigits;
}
else if (*mhz == '.')
seen_decpoint = 1;
++mhz;
}
/* Compensate for missing digits at the end. */
while (ndigits++ < 6)
result *= 10;
}
}
close (fd);
}
return result;
}
int
clock_getcpuclockid (pid_t pid, clockid_t *clock_id)
{
/* We don't allow any process ID but our own. */
if (pid != 0 && pid != getpid ())
return EPERM;
#ifdef CLOCK_PROCESS_CPUTIME_ID
/* Store the number. */
*clock_id = CLOCK_PROCESS_CPUTIME_ID;
return 0;
#else
/* We don't have a timer for that. */
return ENOENT;
#endif
}
#define HP_TIMING_NOW(Var) __asm__ __volatile__ ("rdtsc" : "=A" (Var))
/* Get current value of CLOCK and store it in TP. */
int
clock_gettime (clockid_t clock_id, struct timespec *tp)
{
int retval = -1;
switch (clock_id)
{
case CLOCK_PROCESS_CPUTIME_ID:
{
static hp_timing_t freq;
hp_timing_t tsc;
/* Get the current counter. */
HP_TIMING_NOW (tsc);
if (freq == 0)
{
freq = get_clockfreq ();
if (freq == 0)
return EINVAL;
}
/* Compute the seconds. */
tp->tv_sec = tsc / freq;
/* And the nanoseconds. This computation should be stable until
we get machines with about 16GHz frequency. */
tp->tv_nsec = ((tsc % freq) * UINT64_C (1000000000)) / freq;
retval = 0;
}
break;
default:
errno = EINVAL;
break;
}
return retval;
}
On Tue, 13 Aug 2002, Christoph Hellwig wrote:
> On Tue, Aug 13, 2002 at 10:09:24AM -0600, Erik Andersen wrote:
> > > First the name souns horrible. What about spawn_thread or create_thread
> > > instead? it's not our good old clone and not a lookalike, it's some
> > > pthreadish monster..
> >
> > How about "clone2"?
>
> Already used by ia64 for a hybrid between the good old clone and the new
> monster :)
>
> And as I said again, it doesn't really clone - it starts something
> different, namely the fn argument.
>
fn_startup()
Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.
On Tue, 13 Aug 2002, Ingo Molnar wrote:
>
> CLONE_SETTLS => if present then the third clone() syscall parameter
> is the new TLS.
>
> CLONE_SETTID => if present then the child TID is written to the
> address specified by the fourth clone() parameter.
Except you actually test the CLONE_SETTLS bit..
This looks basically ok, although that "struct thread_struct *t" still
serves no useful purpose..
Linus
okay, the attached patch gets rid of clone_startup() and adds two new
clone() flags instead:
CLONE_SETTLS => if present then the third clone() syscall parameter
is the new TLS.
CLONE_SETTID => if present then the child TID is written to the
address specified by the fourth clone() parameter.
the new parameters are handled in a safe way, clone() returns -EFAULT or
-EINVAL if there's some problem with them.
No current code is affected by these new flags. Patch was testbooted on
2.5.31-BK-current.
Ingo
--- linux/arch/i386/kernel/process.c.orig Tue Aug 13 20:10:25 2002
+++ linux/arch/i386/kernel/process.c Tue Aug 13 20:30:11 2002
@@ -559,6 +559,7 @@
unsigned long unused,
struct task_struct * p, struct pt_regs * regs)
{
+ struct thread_struct *t = &p->thread;
struct pt_regs * childregs;
struct task_struct *tsk;
@@ -567,17 +568,45 @@
childregs->eax = 0;
childregs->esp = esp;
- p->thread.esp = (unsigned long) childregs;
- p->thread.esp0 = (unsigned long) (childregs+1);
+ t->esp = (unsigned long) childregs;
+ t->esp0 = (unsigned long) (childregs+1);
+ t->eip = (unsigned long) ret_from_fork;
- p->thread.eip = (unsigned long) ret_from_fork;
-
- savesegment(fs,p->thread.fs);
- savesegment(gs,p->thread.gs);
+ savesegment(fs, t->fs);
+ savesegment(gs, t->gs);
tsk = current;
unlazy_fpu(tsk);
- struct_cpy(&p->thread.i387, &tsk->thread.i387);
+ struct_cpy(&t->i387, &tsk->thread.i387);
+
+ /*
+ * Set a new TLS for the child thread?
+ */
+ if (clone_flags & CLONE_SETTLS) {
+ struct desc_struct *desc;
+ struct user_desc info;
+ int idx;
+
+ if (copy_from_user(&info, (void *)childregs->esi, sizeof(info)))
+ return -EFAULT;
+ if (LDT_empty(&info))
+ return -EINVAL;
+
+ idx = info.entry_number;
+ if (idx < GDT_ENTRY_TLS_MIN || idx > GDT_ENTRY_TLS_MAX)
+ return -EINVAL;
+
+ desc = t->tls_array + idx - GDT_ENTRY_TLS_MIN;
+ desc->a = LDT_entry_a(&info);
+ desc->b = LDT_entry_b(&info);
+ }
+
+ /*
+ * Notify the child of the TID?
+ */
+ if (clone_flags & CLONE_SETTLS)
+ if (put_user(p->pid, (pid_t *)childregs->edx))
+ return -EFAULT;
if (unlikely(NULL != tsk->thread.ts_io_bitmap)) {
p->thread.ts_io_bitmap = kmalloc(IO_BITMAP_BYTES, GFP_KERNEL);
@@ -747,8 +776,7 @@
asmlinkage int sys_clone(struct pt_regs regs)
{
struct task_struct *p;
- unsigned long clone_flags;
- unsigned long newsp;
+ unsigned long clone_flags, newsp;
clone_flags = regs.ebx;
newsp = regs.ecx;
--- linux/include/linux/sched.h.orig Tue Aug 13 19:55:06 2002
+++ linux/include/linux/sched.h Tue Aug 13 20:27:23 2002
@@ -45,6 +45,8 @@
#define CLONE_THREAD 0x00010000 /* Same thread group? */
#define CLONE_NEWNS 0x00020000 /* New namespace group? */
#define CLONE_SYSVSEM 0x00040000 /* share system V SEM_UNDO semantics */
+#define CLONE_SETTLS 0x00080000 /* create a new TLS for the child */
+#define CLONE_SETTID 0x00100000 /* write the TID back to userspace */
#define CLONE_SIGNAL (CLONE_SIGHAND | CLONE_THREAD)
On Tue, 13 Aug 2002, Linus Torvalds wrote:
> > CLONE_SETTLS => if present then the third clone() syscall parameter
> > is the new TLS.
> >
> > CLONE_SETTID => if present then the child TID is written to the
> > address specified by the fourth clone() parameter.
>
> Except you actually test the CLONE_SETTLS bit..
We've tested clone_startup() with real threads on a 2.4-backported version
of yesterday's final TLS API quite extensively, and it works as expected.
(as we've tested earlier incarnations of the TLS API and code as well.)
This portion of the code is not changed by CLONE_SETTLS - so it should
work equally well, unless i've done something stupid ... I'll backport
this and will fit glibc to the new API later today. (Ulrich's on LW)
> This looks basically ok, although that "struct thread_struct *t" still
> serves no useful purpose..
this was just me shortening some code a bit. I've attached a new patch
that gets rid of those unrelated changes and does the CLONE_SETTLS /
CLONE_SETTID bits only.
Ingo
--- linux/arch/i386/kernel/process.c.orig Tue Aug 13 21:08:01 2002
+++ linux/arch/i386/kernel/process.c Tue Aug 13 21:10:36 2002
@@ -579,6 +579,35 @@
unlazy_fpu(tsk);
struct_cpy(&p->thread.i387, &tsk->thread.i387);
+ /*
+ * Set a new TLS for the child thread?
+ */
+ if (clone_flags & CLONE_SETTLS) {
+ struct desc_struct *desc;
+ struct user_desc info;
+ int idx;
+
+ if (copy_from_user(&info, (void *)childregs->esi, sizeof(info)))
+ return -EFAULT;
+ if (LDT_empty(&info))
+ return -EINVAL;
+
+ idx = info.entry_number;
+ if (idx < GDT_ENTRY_TLS_MIN || idx > GDT_ENTRY_TLS_MAX)
+ return -EINVAL;
+
+ desc = p->thread.tls_array + idx - GDT_ENTRY_TLS_MIN;
+ desc->a = LDT_entry_a(&info);
+ desc->b = LDT_entry_b(&info);
+ }
+
+ /*
+ * Notify the child of the TID?
+ */
+ if (clone_flags & CLONE_SETTLS)
+ if (put_user(p->pid, (pid_t *)childregs->edx))
+ return -EFAULT;
+
if (unlikely(NULL != tsk->thread.ts_io_bitmap)) {
p->thread.ts_io_bitmap = kmalloc(IO_BITMAP_BYTES, GFP_KERNEL);
if (!p->thread.ts_io_bitmap)
--- linux/include/linux/sched.h.orig Tue Aug 13 21:01:51 2002
+++ linux/include/linux/sched.h Tue Aug 13 21:08:56 2002
@@ -45,6 +45,8 @@
#define CLONE_THREAD 0x00010000 /* Same thread group? */
#define CLONE_NEWNS 0x00020000 /* New namespace group? */
#define CLONE_SYSVSEM 0x00040000 /* share system V SEM_UNDO semantics */
+#define CLONE_SETTLS 0x00080000 /* create a new TLS for the child */
+#define CLONE_SETTID 0x00100000 /* write the TID back to userspace */
#define CLONE_SIGNAL (CLONE_SIGHAND | CLONE_THREAD)
On Tue, 13 Aug 2002, Linus Torvalds wrote:
> > CLONE_SETTID => if present then the child TID is written to the
> > address specified by the fourth clone() parameter.
>
> Except you actually test the CLONE_SETTLS bit..
doh - Joe Perches finally told me how to parse this sentence. Correct
patch is:
--- linux/arch/i386/kernel/process.c.orig Tue Aug 13 21:08:01 2002
+++ linux/arch/i386/kernel/process.c Tue Aug 13 21:10:36 2002
@@ -579,6 +579,35 @@
unlazy_fpu(tsk);
struct_cpy(&p->thread.i387, &tsk->thread.i387);
+ /*
+ * Set a new TLS for the child thread?
+ */
+ if (clone_flags & CLONE_SETTLS) {
+ struct desc_struct *desc;
+ struct user_desc info;
+ int idx;
+
+ if (copy_from_user(&info, (void *)childregs->esi, sizeof(info)))
+ return -EFAULT;
+ if (LDT_empty(&info))
+ return -EINVAL;
+
+ idx = info.entry_number;
+ if (idx < GDT_ENTRY_TLS_MIN || idx > GDT_ENTRY_TLS_MAX)
+ return -EINVAL;
+
+ desc = p->thread.tls_array + idx - GDT_ENTRY_TLS_MIN;
+ desc->a = LDT_entry_a(&info);
+ desc->b = LDT_entry_b(&info);
+ }
+
+ /*
+ * Notify the child of the TID?
+ */
+ if (clone_flags & CLONE_SETTID)
+ if (put_user(p->pid, (pid_t *)childregs->edx))
+ return -EFAULT;
+
if (unlikely(NULL != tsk->thread.ts_io_bitmap)) {
p->thread.ts_io_bitmap = kmalloc(IO_BITMAP_BYTES, GFP_KERNEL);
if (!p->thread.ts_io_bitmap)
--- linux/include/linux/sched.h.orig Tue Aug 13 21:01:51 2002
+++ linux/include/linux/sched.h Tue Aug 13 21:08:56 2002
@@ -45,6 +45,8 @@
#define CLONE_THREAD 0x00010000 /* Same thread group? */
#define CLONE_NEWNS 0x00020000 /* New namespace group? */
#define CLONE_SYSVSEM 0x00040000 /* share system V SEM_UNDO semantics */
+#define CLONE_SETTLS 0x00080000 /* create a new TLS for the child */
+#define CLONE_SETTID 0x00100000 /* write the TID back to userspace */
#define CLONE_SIGNAL (CLONE_SIGHAND | CLONE_THREAD)
On Tue, 13 Aug 2002, Linus Torvalds wrote:
> It's still buggy, and you didn't read what I wrote.
yeah - paperbag time.
> Notice how your testing did not find it, since you always just set both
> flags.
yes, the test used the old single CLONE_STARTUP flag.
Ingo
On Tue, 13 Aug 2002, Ingo Molnar wrote:
> >
> > Except you actually test the CLONE_SETTLS bit..
>
> We've tested clone_startup() with real threads on a 2.4-backported version
> of yesterday's final TLS API quite extensively, and it works as expected.
> (as we've tested earlier incarnations of the TLS API and code as well.)
It's still buggy, and you didn't read what I wrote.
+ /*
+ * Notify the child of the TID?
+ */
+ if (clone_flags & CLONE_SETTLS)
+ if (put_user(p->pid, (pid_t *)childregs->edx))
+ return -EFAULT;
Find the bug. Find the sentence I wrote that pointed it out last time.
Notice how your testing did not find it, since you always just set both
flags.
Linus
Followup to: <[email protected]>
By author: David Mosberger <[email protected]>
In newsgroup: linux.dev.kernel
>
> The original clone() system call was misdesigned. Even if you chose
> to ignore ia64, clone() cannot be used by portable applications to
> specify a stack (think "stack-growth direction").
>
This is something that can be handled in userspace on most
architectures. The rest (ia64) can pass all the information on to
kernel space.
The clone() system call cannot be used by portable applications *AT
ALL*, since it inherently needs a user-space assembly wrapper. It's
just a matter of how you define the interface to the assembly wrapper.
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>
David Mosberger wrote:
>>>>>>On 13 Aug 2002 12:52:11 -0700, "H. Peter Anvin" <[email protected]> said:
>>>>>
>
> >> The clone() system call cannot be used by portable applications
> >> *AT ALL*, since it inherently needs a user-space assembly
> >> wrapper. It's just a matter of how you define the interface to
> >> the assembly wrapper.
>
> The function call issue is a separate question though. If you want to
> argue that the libc clone() wrapper is broken, fine by me. (BTW:
> wasn't that you who complained about platform-specific Linux syscalls
> recently? ;-)
>
I was, however, the flaws that you complained of had nothing to do with
the syscall -- it's all in the syscall wrapper (which is required for
clone(), like it or not.)
-hpa
>>>>> On 13 Aug 2002 12:52:11 -0700, "H. Peter Anvin" <[email protected]> said:
>> The clone() system call cannot be used by portable applications
>> *AT ALL*, since it inherently needs a user-space assembly
>> wrapper. It's just a matter of how you define the interface to
>> the assembly wrapper.
The function call issue is a separate question though. If you want to
argue that the libc clone() wrapper is broken, fine by me. (BTW:
wasn't that you who complained about platform-specific Linux syscalls
recently? ;-)
--david
>>>>> On Tue, 13 Aug 2002 13:27:02 -0700, "H. Peter Anvin" <[email protected]> said:
>> I was, however, the flaws that you complained of had nothing to
>> do with the syscall -- it's all in the syscall wrapper (which is
>> required for clone(), like it or not.)
The issue is not whether a wrapper is needed or not.
My point is that it is cleaner to always describe stack areas as
memory areas (e.g., as a base/size pair). Note that this is
effectively what's happening in the platform-independent part of the
kernel today.
--david
[email protected] (Ingo Molnar) wrote on 13.08.02 in <[email protected]>:
> On Tue, 13 Aug 2002, Erik Andersen wrote:
>
> > > First the name souns horrible. What about spawn_thread or create_thread
> > > instead? it's not our good old clone and not a lookalike, it's some
> > > pthreadish monster..
> >
> > How about "clone2"?
>
> ni fact it was sys_clone2() first time around, but Ulrich Drepper
> requested another name for it because in glibc it collided with ia64 where
> clone2() is something different. So whatever name there is going to be, it
> should not be sys_clone2().
clone_and_start() or clone_and_go() or something along those lines,
perhaps?
MfG Kai
David Mosberger wrote:
>>>>>>On Tue, 13 Aug 2002 13:27:02 -0700, "H. Peter Anvin" <[email protected]> said:
>>>>>
>
> >> I was, however, the flaws that you complained of had nothing to
> >> do with the syscall -- it's all in the syscall wrapper (which is
> >> required for clone(), like it or not.)
>
> The issue is not whether a wrapper is needed or not.
>
> My point is that it is cleaner to always describe stack areas as
> memory areas (e.g., as a base/size pair). Note that this is
> effectively what's happening in the platform-independent part of the
> kernel today.
>
Right, but all the stack handling is a matter of the wrapper. sys_clone
doesn't affect the stack at all.
-hpa
On Tue, 13 Aug 2002 19:27:35 +0200 (CEST)
Ingo Molnar <[email protected]> wrote:
> btw., with the help of these syscalls and futexes, the layer between Linux
> and pthreads became very thin - almost nonexistant in the majority of
> cases.
Woohoo!
Please send futex patch BTW, I wanna see 8)
Rusty.
--
there are those who do and those who hang on and you don't see too
many doers quoting their contemporaries. -- Larry McVoy
Rusty Russell wrote:
> > btw., with the help of these syscalls and futexes, the layer between Linux
> > and pthreads became very thin - almost nonexistant in the majority of
> > cases.
>
> Woohoo!
Yes, futexes are wonderful, if quite difficult to figure out at first.
(I still haven't understood why the futex example reader-writer locks are
they way are.)
Even things like waitpid() are potentially faster with futexes -- no pid
to lookup!
-- Jamie
Hi,
On 13 Aug 2002, Kai Henningsen wrote:
> clone_and_start() or clone_and_go() or something along those lines,
> perhaps?
clone_n_kick()
Thunder
--
--./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .-
--/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..-
.- -/---/--/---/.-./.-./---/.--/.-.-.-
--./.-/-.../.-./.././.-../.-.-.-
About new clone() flags...
One of the obvious things for a thread library is to:
(a) try clone() with lots of snazzy flags
(b) if that returns -EINVAL, we must be running on an older kernel;
try with fewer flags and more workarounds
However, I don't see any code in sys_clone() that rejects a call that
specifies unknown flags. So, code that uses e.g. CLONE_SETTID will
appear to run perfectly well on an old kernel... except that it will
behave incorrectly.
That leads to having to write some silly test for each feature prior to
using it, instead of trying it and falling back. E.g. I'd need to
do the silly signal-blocking workaround when creating the second thread
in a program, just to find out whether CLONE_SETTID actually worked.
Either that, or check the kernel version.
Ingo, how do you handle this sort of backward compatibility in your
latest pthreads library, or don't you do backward compatibility?
For future-proofing, here's a patch:
diff -u linux-2.5/kernel/fork.c.orig linux-2.5/kernel/fork.c
--- linux-2.5/kernel/fork.c
+++ linux-2.5/kernel/fork.c Thu Aug 15 23:35:00 2002
@@ -619,6 +619,11 @@
struct task_struct *p = NULL;
struct completion vfork;
+ if ((clone_flags & ~(0UL|CSIGNAL|CLONE_VM|CLONE_FS|CLONE_FILES
+ |CLONE_SIGHAND|CLONE_PID|CLONE_PTRACE|CLONE_VFORK
+ |CLONE_PARENT|CLONE_THREAD)))
+ return ERR_PTR (-EINVAL);
+
if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
return ERR_PTR(-EINVAL);
On Thu, 15 Aug 2002, Jamie Lokier wrote:
> Ingo, how do you handle this sort of backward compatibility in your
> latest pthreads library, or don't you do backward compatibility?
[btw., it's not me doing it but Ulrich Drepper. I'm mostly doing the 'lets
find out how the kernel could help' side of things.]
the proper way of doing this is a way of getting fundamental kernel
capabilities, not the 'probing' of the kernel in various ways. Glibc
starts looking like old ISA drivers trying to do nonintrusive
autodetection: 'lets try this port carefully without disturbing state,
perhaps this feature is there'.
one way to handle this cleanly would be to add a kernel capabilities
bitmask to sysconf(), and backport this to all mainstream Linux kernels
where current glibc is supposed to run. Support for something like this
would be added to glibc the day it's in the main kernel. Eg. glibc has to
symlink in the old pthreads library upon bootup, if the feature-set does
not enable the more integrated threading library.
(nevertheless your patch makes good sense, from an API-cleanliness POV.)
Ingo
Hi!
> > Ingo, how do you handle this sort of backward compatibility in your
> > latest pthreads library, or don't you do backward compatibility?
>
> [btw., it's not me doing it but Ulrich Drepper. I'm mostly doing the 'lets
> find out how the kernel could help' side of things.]
>
> the proper way of doing this is a way of getting fundamental kernel
> capabilities, not the 'probing' of the kernel in various ways. Glibc
> starts looking like old ISA drivers trying to do nonintrusive
> autodetection: 'lets try this port carefully without disturbing state,
> perhaps this feature is there'.
>
> one way to handle this cleanly would be to add a kernel capabilities
> bitmask to sysconf(), and backport this to all mainstream Linux kernels
I'm afraid that bitmask will get out-of-date. Doing EINVAL seems like a good
way to do this. [Hmm, perhaps we need CLONE_DONT which only checks capabilities
and returns? It still seems better than sysconf...]
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.