2004-06-05 15:24:59

by Russell Leighton

[permalink] [raw]
Subject: clone() <-> getpid() bug in 2.6?


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>

/* to compile:
gcc -Wall clone-pid-test.c -o clone-pid-test
*/

static int run_thread(void *arg)
{
printf("thread reported pid: %d\n", getpid());

return 0;
}

static char stack[4096];

static int create_thread()
{
int
pid;

/* create thread */
if ( (pid = clone(run_thread, stack + sizeof(stack) -1,
CLONE_FS|CLONE_FILES|CLONE_VM|SIGCHLD, NULL)) == -1 ) {
perror("clone:");
exit(-1);
}/* end if */

return pid ;
}

int main(int argc, char *argv[])
{
printf("parent pid: %d\n", getpid());
printf("clone returned pid: %d\n", create_thread());
wait(NULL);
return 0;
}


Attachments:
clone-pid-test.c (733.00 B)

2004-06-05 20:45:46

by Linus Torvalds

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?



On Sat, 5 Jun 2004, Russell Leighton wrote:
>
> I have a test program (see attached) that shows what looks like a bug in
> 2.6.5-1.358 (FedoraCore2)...and breaks my program :(
>
> In summary, I am doing:
>
> clone(run_thread, stack + sizeof(stack) -1,
> CLONE_FS|CLONE_FILES|CLONE_VM|SIGCHLD, NULL))
>
> According to the man page the child process should have its own pid as
> returned by getpid()...much like fork().
>
> In 2.6 the child receives the parent's pid from getpid(), while 2.4
> works as documented:
>
> In 2.4 the test program does:
> parent pid: 26647
> clone returned pid: 26648
> thread reported pid: 26648
>
> In 2.6 the test program does:
> parent pid: 16665
> thread reported pid: 16665
> clone returned pid: 16666

Hmm.. The above is the correct behaviour if you use CLONE_THREAD
("getpid()" will then return the _thread_ ID), but it shouldn't happen
without that. And clearly you don't have it set.

And indeed, it doesn't happen for me on my system:

parent pid: 13552
thread reported pid: 13553
clone returned pid: 13553

so I wonder if either the Fedora libc always adds that CLONE_THREAD thing
to the clone() calls, or whether the FC2 kernel is buggy.

Arjan?

Linus

2004-06-05 20:56:04

by Arjan van de Ven

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

On Sat, Jun 05, 2004 at 01:45:33PM -0700, Linus Torvalds wrote:
> so I wonder if either the Fedora libc always adds that CLONE_THREAD thing
> to the clone() calls, or whether the FC2 kernel is buggy.

... or glibc internally caches the getpid() result and doesn't notice the
app calls clone() internally... strace seems to show 1 getpid() call total
not 2.


Attachments:
(No filename) (358.00 B)
(No filename) (189.00 B)
Download all attachments

2004-06-05 21:13:25

by Linus Torvalds

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?



On Sat, 5 Jun 2004, Arjan van de Ven wrote:
>
> ... or glibc internally caches the getpid() result and doesn't notice the
> app calls clone() internally... strace seems to show 1 getpid() call total
> not 2.

Why the hell does glibc do that totally useless optimization?

It's a classic "optimize for benchmarks" thing - evil. I don't see any
real app caring, and clearly apps _do_ fail when you get it wrong, as
shown by Russell.

And it's really easy to get it wrong. You can't just invalidate the cache
when you see a "clone()" - you have to make sure that you never ever cache
it ever after either.

Uli, if Arjan is right, then please fix this. It's a buggy and pointless
optimization. Anybody who optimizes purely for benchmarks should be
ashamed of themselves.

Linus


2004-06-05 21:48:18

by Robert Love

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

On Sat, 2004-06-05 at 14:13 -0700, Linus Torvalds wrote:

> Uli, if Arjan is right, then please fix this. It's a buggy and pointless
> optimization. Anybody who optimizes purely for benchmarks should be
> ashamed of themselves.

Eh, it definitely does, in nptl/sysdeps/unix/sysv/linux/getpid.c:

pid_t result = THREAD_GETMEM (THREAD_SELF, pid);
if (__builtin_expect (result <= 0, 0))
result = really_getpid (result);

A few places, including the fork code, fix it:

/* Adjust the PID field for the new process. */
THREAD_SETMEM (self, pid, THREAD_GETMEM (self, tid));

But not direct calls to clone(2).

Robert Love


2004-06-05 21:53:54

by Chris Wedgwood

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

On Sat, Jun 05, 2004 at 10:55:47PM +0200, Arjan van de Ven wrote:

> ... or glibc internally caches the getpid() result and doesn't
> notice the app calls clone() internally... strace seems to show 1
> getpid() call total not 2.

glibc caches getpid() ?!?

it's not like it's a slow syscall or used often


--cw

2004-06-05 22:44:21

by Linus Torvalds

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?



On Sat, 5 Jun 2004, Robert Love wrote:
>
> Eh, it definitely does, in nptl/sysdeps/unix/sysv/linux/getpid.c:

What a piece of shit.

> A few places, including the fork code, fix it:
>
> /* Adjust the PID field for the new process. */
> THREAD_SETMEM (self, pid, THREAD_GETMEM (self, tid));
>
> But not direct calls to clone(2).

As mentioned, you can't even "adjust" it in a clone(). It's just _wrong_
to cache _ever_ after a clone(), whether it was cached before or not. You
can't just re-set the pid information like in a fork().

The whole notion of caching "pid" is sick and idiotic. It has no redeeming
values.

Linus

2004-06-05 22:47:45

by Robert Love

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

On Sat, 2004-06-05 at 14:53 -0700, Chris Wedgwood wrote:

> glibc caches getpid() ?!?
>
> it's not like it's a slow syscall or used often

It is almost certainly done to improve the speed of some stupid
microbenchmark - say, one that just calls getpid() repeatedly (simple
because it is NOT slow) to measure system call overhead.

Or maybe libc uses the PID a lot internally. I don't know.

But it sure seems wrong.

Robert Love


2004-06-05 22:59:38

by David Miller

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

On Sat, 05 Jun 2004 18:47:43 -0400
Robert Love <[email protected]> wrote:

> It is almost certainly done to improve the speed of some stupid
> microbenchmark - say, one that just calls getpid() repeatedly (simple
> because it is NOT slow) to measure system call overhead.

Luckily lmbench and friends use getppid() specifically because it
is impossible to cache that (as far as I can tell).

2004-06-05 23:01:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?



On Sat, 5 Jun 2004, Robert Love wrote:
>
> It is almost certainly done to improve the speed of some stupid
> microbenchmark - say, one that just calls getpid() repeatedly (simple
> because it is NOT slow) to measure system call overhead.

getpid() is only used in bad benchmarks these days, exactly because
caching of "pid" has been done for a long time. Every single microkernel
OS ever built did it, I think.

So yes, you'll find some broken benchmarks using getpid(), but nobody sane
would take those benchmarks seriously anyway. If you want _real_ system
call performance measurements, use something like lmbench, which does:

- getppid() - same cost as getpid(), except you can't just cache the
results on any POSIX OS (well, you could, but it's more complex:
you end up having to map the uarea-equvalent into user space or have
some notifier for the parent dying).
- read/write: a hell of a lot more important than the simple system calls
anyway.
- open/close/stat/fstat: very common system calls with interesting
performance issues.

That gives you some _real_ data.

> Or maybe libc uses the PID a lot internally. I don't know.

I think some threading libraries historically used hashes of the
thread-specific pid to look up the thread-specific data. But since glibc
shows the same pid for all regular threads, and gets the pid wrong for
other threads, that's clearly _not_ a valid reason for caching it either.

Linus

2004-06-05 23:07:14

by Davide Libenzi

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

On Sat, 5 Jun 2004, Robert Love wrote:

> On Sat, 2004-06-05 at 14:53 -0700, Chris Wedgwood wrote:
>
> > glibc caches getpid() ?!?
> >
> > it's not like it's a slow syscall or used often
>
> It is almost certainly done to improve the speed of some stupid
> microbenchmark - say, one that just calls getpid() repeatedly (simple
> because it is NOT slow) to measure system call overhead.
>
> Or maybe libc uses the PID a lot internally. I don't know.
>
> But it sure seems wrong.

It is likely used by pthread_self(), that is pretty much performance
sensitive. I'd agree with Ulrich here.



- Davide

2004-06-05 23:18:31

by Linus Torvalds

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?



On Sat, 5 Jun 2004, Davide Libenzi wrote:
>
> It is likely used by pthread_self(), that is pretty much performance
> sensitive. I'd agree with Ulrich here.

It _can't_ be used for pthread_self(), since the pid is the _same_ across
all threads in a pthread environment.

See?

I re-iterate:

- getpid() was used in some historical threading packages (maybe even
LinuxThreads) to get the thread ID thing.

But this particular usage requires the old Linux behaviour of returning
separate threads ID's for separate threads. CLONE_THREAD does not do
that, and in particular the glibc caching _breaks_ any such attempt
even when you don't use CLONE_THREAD.

pthreads() in modern libc sets CLONE_THREAD and then uses the
per-thread segment thing to identify threads. No getpid() anywhere.

Ergo: getpid() caching may result in faster execution, but in this case
it's actively _wrong_, and breaks the app. It's easy to make things go
fast if you don't actually care about whether the end result is correct
or not.

- getpid() is used in some silly benchmarks.

But in this case caching getpid() just lies to the benchmark, and is
pointless.

Ergo: getpid() caching results in higher scores, but the scores are now
meaningless, since they have nothing to do with system call speed any
more.

So in both of these cases, the caching literally results in WRONG
behaviour.

Can somebody actually find an application that doesn't fall into either of
the above two categories, and calls getpid() more than a handful of times?

Linus

2004-06-05 23:19:57

by Robert Love

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

On Sat, 2004-06-05 at 16:07 -0700, Davide Libenzi wrote:

> It is likely used by pthread_self(), that is pretty much performance
> sensitive. I'd agree with Ulrich here.

I think it would want gettid(), not getpid(), for that. CLONE_THREAD
behavior has getpid() return the same PID for all threads..

Robert Love


2004-06-05 23:26:23

by Davide Libenzi

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

On Sat, 5 Jun 2004, Linus Torvalds wrote:

>
>
> On Sat, 5 Jun 2004, Davide Libenzi wrote:
> >
> > It is likely used by pthread_self(), that is pretty much performance
> > sensitive. I'd agree with Ulrich here.
>
> It _can't_ be used for pthread_self(), since the pid is the _same_ across
> all threads in a pthread environment.

Yeah, I just checked. It is not for that. And like Robert was saying, it'd
have been a gettid().



- Davide

2004-06-06 05:09:42

by Kalin KOZHUHAROV

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

#include <sys/types.h>
#include <sys/param.h>
#include <netdb.h>
#include <openssl/ssl.h>
#include "uint16.h"
#include "str.h"
#include "byte.h"
#include "fmt.h"
#include "scan.h"
#include "ip4.h"
#include "fd.h"
#include "exit.h"
#include "env.h"
#include "prot.h"
#include "open.h"
#include "wait.h"
#include "readwrite.h"
#include "stralloc.h"
#include "alloc.h"
#include "buffer.h"
#include "error.h"
#include "strerr.h"
#include "sgetopt.h"
#include "pathexec.h"
#include "socket.h"
#include "ndelay.h"
#include "remoteinfo.h"
#include "rules.h"
#include "sig.h"
#include "dns.h"

int verbosity = 1;
int flagkillopts = 1;
int flagdelay = 1;
char *banner = "";
int flagremoteinfo = 1;
int flagremotehost = 1;
int flagparanoid = 0;
unsigned long timeout = 26;
#ifdef WITH_SSL
int flagssl = 0;
struct stralloc certfile = {0};
#define CERTFILE "./cert.pem"

void translate(SSL*, int, int, unsigned int);
#endif

static stralloc tcpremoteinfo;

uint16 localport;
char localportstr[FMT_ULONG];
char localip[4];
char localipstr[IP4_FMT];
static stralloc localhostsa;
char *localhost = 0;

uint16 remoteport;
char remoteportstr[FMT_ULONG];
char remoteip[4];
char remoteipstr[IP4_FMT];
static stralloc remotehostsa;
char *remotehost = 0;

char strnum[FMT_ULONG];
char strnum2[FMT_ULONG];

static stralloc tmp;
static stralloc fqdn;
static stralloc addresses;

char bspace[16];
buffer b;



/* ---------------------------- child */

#define DROP "tcpserver: warning: dropping connection, "

int flagdeny = 0;
int flagallownorules = 0;
char *fnrules = 0;

void drop_nomem(void)
{
strerr_die2sys(111,DROP,"out of memory");
}
void cats(char *s)
{
if (!stralloc_cats(&tmp,s)) drop_nomem();
}
void append(char *ch)
{
if (!stralloc_append(&tmp,ch)) drop_nomem();
}
void safecats(char *s)
{
char ch;
int i;

for (i = 0;i < 100;++i) {
ch = s[i];
if (!ch) return;
if (ch < 33) ch = '?';
if (ch > 126) ch = '?';
if (ch == '%') ch = '?'; /* logger stupidity */
if (ch == ':') ch = '?';
append(&ch);
}
cats("...");
}
void env(char *s,char *t)
{
if (!pathexec_env(s,t)) drop_nomem();
}
void drop_rules(void)
{
strerr_die4sys(111,DROP,"unable to read ",fnrules,": ");
}

void found(char *data,unsigned int datalen)
{
unsigned int next0;
unsigned int split;

while ((next0 = byte_chr(data,datalen,0)) < datalen) {
switch(data[0]) {
case 'D':
flagdeny = 1;
break;
case '+':
split = str_chr(data + 1,'=');
if (data[1 + split] == '=') {
data[1 + split] = 0;
env(data + 1,data + 1 + split + 1);
}
break;
}
++next0;
data += next0; datalen -= next0;
}
}

void doit(int t)
{
int j;

remoteipstr[ip4_fmt(remoteipstr,remoteip)] = 0;

if (verbosity >= 2) {
strnum[fmt_ulong(strnum,getpid())] = 0;
strerr_warn4("tcpserver: pid ",strnum," from ",remoteipstr,0);
}

if (flagkillopts)
socket_ipoptionskill(t);
if (!flagdelay)
socket_tcpnodelay(t);

if (*banner) {
buffer_init(&b,write,t,bspace,sizeof bspace);
if (buffer_putsflush(&b,banner) == -1)
strerr_die2sys(111,DROP,"unable to print banner: ");
}

if (socket_local4(t,localip,&localport) == -1)
strerr_die2sys(111,DROP,"unable to get local address: ");

localipstr[ip4_fmt(localipstr,localip)] = 0;
remoteportstr[fmt_ulong(remoteportstr,remoteport)] = 0;

if (!localhost)
if (dns_name4(&localhostsa,localip) == 0)
if (localhostsa.len) {
if (!stralloc_0(&localhostsa)) drop_nomem();
localhost = localhostsa.s;
}
env("PROTO","TCP");
env("TCPLOCALIP",localipstr);
env("TCPLOCALPORT",localportstr);
env("TCPLOCALHOST",localhost);

if (flagremotehost)
if (dns_name4(&remotehostsa,remoteip) == 0)
if (remotehostsa.len) {
if (flagparanoid)
if (dns_ip4(&tmp,&remotehostsa) == 0)
for (j = 0;j + 4 <= tmp.len;j += 4)
if (byte_equal(remoteip,4,tmp.s + j)) {
flagparanoid = 0;
break;
}
if (!flagparanoid) {
if (!stralloc_0(&remotehostsa)) drop_nomem();
remotehost = remotehostsa.s;
}
}
env("TCPREMOTEIP",remoteipstr);
env("TCPREMOTEPORT",remoteportstr);
env("TCPREMOTEHOST",remotehost);

if (flagremoteinfo) {
if (remoteinfo(&tcpremoteinfo,remoteip,remoteport,localip,localport,timeout) == -1)
flagremoteinfo = 0;
if (!stralloc_0(&tcpremoteinfo)) drop_nomem();
}
env("TCPREMOTEINFO",flagremoteinfo ? tcpremoteinfo.s : 0);

if (fnrules) {
int fdrules;
fdrules = open_read(fnrules);
if (fdrules == -1) {
if (errno != error_noent) drop_rules();
if (!flagallownorules) drop_rules();
}
else {
if (rules(found,fdrules,remoteipstr,remotehost,flagremoteinfo ? tcpremoteinfo.s : 0) == -1) drop_rules();
close(fdrules);
}
}

if (verbosity >= 2) {
strnum[fmt_ulong(strnum,getpid())] = 0;
if (!stralloc_copys(&tmp,"tcpserver: ")) drop_nomem();
safecats(flagdeny ? "deny" : "ok");
cats(" "); safecats(strnum);
cats(" "); if (localhost) safecats(localhost);
cats(":"); safecats(localipstr);
cats(":"); safecats(localportstr);
cats(" "); if (remotehost) safecats(remotehost);
cats(":"); safecats(remoteipstr);
cats(":"); if (flagremoteinfo) safecats(tcpremoteinfo.s);
cats(":"); safecats(remoteportstr);
cats("\n");
buffer_putflush(buffer_2,tmp.s,tmp.len);
}

if (flagdeny) _exit(100);
}



/* ---------------------------- parent */

#define FATAL "tcpserver: fatal: "

void usage(void)
{
#ifndef WITH_SSL
strerr_warn1("\
tcpserver: usage: tcpserver \
[ -1UXpPhHrRoOdDqQv ] \
[ -c limit ] \
[ -x rules.cdb ] \
[ -B banner ] \
[ -g gid ] \
[ -u uid ] \
[ -b backlog ] \
[ -l localname ] \
[ -t timeout ] \
host port program",0);
#else
strerr_warn1("\
tcpserver: usage: tcpserver \
[ -1UXpPhHrRoOdDqQsSv ] \
[ -c limit ] \
[ -x rules.cdb ] \
[ -B banner ] \
[ -g gid ] \
[ -u uid ] \
[ -b backlog ] \
[ -l localname ] \
[ -t timeout ] \
[ -n certfile ] \
host port program",0);
#endif
_exit(100);
}

unsigned long limit = 40;
unsigned long numchildren = 0;

int flag1 = 0;
unsigned long backlog = 20;
unsigned long uid = 0;
unsigned long gid = 0;

void printstatus(void)
{
if (verbosity < 2) return;
strnum[fmt_ulong(strnum,numchildren)] = 0;
strnum2[fmt_ulong(strnum2,limit)] = 0;
strerr_warn4("tcpserver: status: ",strnum,"/",strnum2,0);
}

void sigterm()
{
_exit(0);
}

void sigchld()
{
int wstat;
int pid;

while ((pid = wait_nohang(&wstat)) > 0) {
if (verbosity >= 2) {
strnum[fmt_ulong(strnum,pid)] = 0;
strnum2[fmt_ulong(strnum2,wstat)] = 0;
strerr_warn4("tcpserver: end ",strnum," status ",strnum2,0);
}
if (numchildren) --numchildren; printstatus();
}
}

main(int argc,char **argv)
{
char *hostname;
char *portname;
int opt;
struct servent *se;
char *x;
unsigned long u;
int s;
int t;
#ifdef WITH_SSL
BIO *sbio;
SSL *ssl;
SSL_CTX *ctx;
int pi2c[2], pi4c[2];

ctx = NULL;

if (!stralloc_copys(&certfile, CERTFILE) || !stralloc_0(&certfile) )
strerr_die2x(111,FATAL,"out of memory");
while ((opt = getopt(argc,argv,"dDvqQhHrRsS1UXx:t:u:g:l:b:B:c:n:pPoO")) != opteof)
#else
while ((opt = getopt(argc,argv,"dDvqQhHrR1UXx:t:u:g:l:b:B:c:pPoO")) != opteof)
#endif
switch(opt) {
case 'b': scan_ulong(optarg,&backlog); break;
case 'c': scan_ulong(optarg,&limit); break;
case 'X': flagallownorules = 1; break;
case 'x': fnrules = optarg; break;
case 'B': banner = optarg; break;
case 'd': flagdelay = 1; break;
case 'D': flagdelay = 0; break;
case 'v': verbosity = 2; break;
case 'q': verbosity = 0; break;
case 'Q': verbosity = 1; break;
case 'P': flagparanoid = 0; break;
case 'p': flagparanoid = 1; break;
case 'O': flagkillopts = 1; break;
case 'o': flagkillopts = 0; break;
case 'H': flagremotehost = 0; break;
case 'h': flagremotehost = 1; break;
case 'R': flagremoteinfo = 0; break;
case 'r': flagremoteinfo = 1; break;
case 't': scan_ulong(optarg,&timeout); break;
case 'U': x = env_get("UID"); if (x) scan_ulong(x,&uid);
x = env_get("GID"); if (x) scan_ulong(x,&gid); break;
case 'u': scan_ulong(optarg,&uid); break;
case 'g': scan_ulong(optarg,&gid); break;
case '1': flag1 = 1; break;
case 'l': localhost = optarg; break;
#ifdef WITH_SSL
case 's': flagssl = 1; break;
case 'S': flagssl = 0; break;
case 'n': if (!stralloc_copys(&certfile, optarg) ||
!stralloc_0(&certfile) )
strerr_die2x(111,FATAL,"out of memory");
break;
#endif
default: usage();
}
argc -= optind;
argv += optind;

if (!verbosity)
buffer_2->fd = -1;

hostname = *argv++;
if (!hostname) usage();
if (str_equal(hostname,"")) hostname = "0.0.0.0";
if (str_equal(hostname,"0")) hostname = "0.0.0.0";

x = *argv++;
if (!x) usage();
if (!x[scan_ulong(x,&u)])
localport = u;
else {
se = getservbyname(x,"tcp");
if (!se)
strerr_die3x(111,FATAL,"unable to figure out port number for ",x);
localport = ntohs(se->s_port);
}

if (!*argv) usage();

sig_block(sig_child);
sig_catch(sig_child,sigchld);
sig_catch(sig_term,sigterm);
sig_ignore(sig_pipe);

if (!stralloc_copys(&tmp,hostname))
strerr_die2x(111,FATAL,"out of memory");
if (dns_ip4_qualify(&addresses,&fqdn,&tmp) == -1)
strerr_die4sys(111,FATAL,"temporarily unable to figure out IP address for ",hostname,": ");
if (addresses.len < 4)
strerr_die3x(111,FATAL,"no IP address for ",hostname);
byte_copy(localip,4,addresses.s);

#ifdef WITH_SSL
if (flagssl == 1) {
/* setup SSL context (load key and cert into ctx) */
SSL_library_init();
ctx=SSL_CTX_new(SSLv23_server_method());
if (!ctx) strerr_die2x(111,FATAL,"unable to create SSL context");

if(SSL_CTX_use_RSAPrivateKey_file(ctx, certfile.s, SSL_FILETYPE_PEM) != 1)
strerr_die2x(111,FATAL,"unable to load RSA private key");
if(SSL_CTX_use_certificate_file(ctx, certfile.s, SSL_FILETYPE_PEM) != 1)
strerr_die2x(111,FATAL,"unable to load certificate");
}
#endif

s = socket_tcp();
if (s == -1)
strerr_die2sys(111,FATAL,"unable to create socket: ");
if (socket_bind4_reuse(s,localip,localport) == -1)
strerr_die2sys(111,FATAL,"unable to bind: ");
if (socket_local4(s,localip,&localport) == -1)
strerr_die2sys(111,FATAL,"unable to get local address: ");
if (socket_listen(s,backlog) == -1)
strerr_die2sys(111,FATAL,"unable to listen: ");
ndelay_off(s);

if (gid) if (prot_gid(gid) == -1)
strerr_die2sys(111,FATAL,"unable to set gid: ");
if (uid) if (prot_uid(uid) == -1)
strerr_die2sys(111,FATAL,"unable to set uid: ");


localportstr[fmt_ulong(localportstr,localport)] = 0;
if (flag1) {
buffer_init(&b,write,1,bspace,sizeof bspace);
buffer_puts(&b,localportstr);
buffer_puts(&b,"\n");
buffer_flush(&b);
}

close(0);
close(1);
printstatus();

for (;;) {
while (numchildren >= limit) sig_pause();

sig_unblock(sig_child);
t = socket_accept4(s,remoteip,&remoteport);
sig_block(sig_child);

if (t == -1) continue;
++numchildren; printstatus();

switch(fork()) {
case 0:
close(s);
doit(t);
if ((fd_move(0,t) == -1) || (fd_copy(1,0) == -1))
strerr_die2sys(111,DROP,"unable to set up descriptors: ");
sig_uncatch(sig_child);
sig_unblock(sig_child);
sig_uncatch(sig_term);
sig_uncatch(sig_pipe);
#ifdef WITH_SSL
if (flagssl == 1) {
if (pipe(pi2c) != 0)
strerr_die2sys(111,DROP,"unable to create pipe: ");
if (pipe(pi4c) != 0)
strerr_die2sys(111,DROP,"unable to create pipe: ");
switch(fork()) {
case 0:
close(0); close(1);
close(pi2c[1]);
close(pi4c[0]);
if ((fd_move(0,pi2c[0]) == -1) || (fd_move(1,pi4c[1]) == -1))
strerr_die2sys(111,DROP,"unable to set up descriptors: ");
/* signals are allready set in the parent */
pathexec(argv);
strerr_die4sys(111,DROP,"unable to run ",*argv,": ");
case -1:
strerr_die2sys(111,DROP,"unable to fork: ");
default:
ssl = SSL_new(ctx);
if (!ssl)
strerr_die2x(111,DROP,"unable to set up SSL session");
sbio = BIO_new_socket(0,BIO_NOCLOSE);
if (!sbio)
strerr_die2x(111,DROP,"unable to set up BIO socket");
SSL_set_bio(ssl,sbio,sbio);
close(pi2c[0]);
close(pi4c[1]);
translate(ssl, pi2c[1], pi4c[0], 3600);
_exit(0);
}
}
#endif
pathexec(argv);
strerr_die4sys(111,DROP,"unable to run ",*argv,": ");
case -1:
strerr_warn2(DROP,"unable to fork: ",&strerr_sys);
--numchildren; printstatus();
}
close(t);
}
}

#ifdef WITH_SSL
static int allwrite(int fd, char *buf, int len)
{
int w;

while (len) {
w = write(fd,buf,len);
if (w == -1) {
if (errno == error_intr) continue;
return -1; /* note that some data may have been written */
}
if (w == 0) ; /* luser's fault */
buf += w;
len -= w;
}
return 0;
}

static int allwritessl(SSL* ssl, char *buf, int len)
{
int w;

while (len) {
w = SSL_write(ssl,buf,len);
if (w == -1) {
if (errno == error_intr) continue;
return -1; /* note that some data may have been written */
}
if (w == 0) ; /* luser's fault */
buf += w;
len -= w;
}
return 0;
}

char tbuf[2048];

void translate(SSL* ssl, int clearout, int clearin, unsigned int iotimeout)
{
struct taia now;
struct taia deadline;
iopause_fd iop[2];
int flagexitasap;
int iopl;
int sslout, sslin;
int n, r;

sslin = SSL_get_fd(ssl);
sslout = SSL_get_fd(ssl);
if (sslin == -1 || sslout == -1)
strerr_die2x(111,DROP,"unable to set up SSL connection");

flagexitasap = 0;

if (SSL_accept(ssl)<=0)
strerr_die2x(111,DROP,"unable to accept SSL connection");

while (!flagexitasap) {
taia_now(&now);
taia_uint(&deadline,iotimeout);
taia_add(&deadline,&now,&deadline);

/* fill iopause struct */
iopl = 2;
iop[0].fd = sslin;
iop[0].events = IOPAUSE_READ;
iop[1].fd = clearin;
iop[1].events = IOPAUSE_READ;

/* do iopause read */
iopause(iop,iopl,&deadline,&now);
if (iop[0].revents) {
/* data on sslin */
n = SSL_read(ssl, tbuf, sizeof(tbuf));
if ( n < 0 )
strerr_die2sys(111,DROP,"unable to read form network: ");
if ( n == 0 )
flagexitasap = 1;
r = allwrite(clearout, tbuf, n);
if ( r < 0 )
strerr_die2sys(111,DROP,"unable to write to client: ");
}
if (iop[1].revents) {
/* data on clearin */
n = read(clearin, tbuf, sizeof(tbuf));
if ( n < 0 )
strerr_die2sys(111,DROP,"unable to read form client: ");
if ( n == 0 )
flagexitasap = 1;
r = allwritessl(ssl, tbuf, n);
if ( r < 0 )
strerr_die2sys(111,DROP,"unable to write to network: ");
}
if (!iop[0].revents && !iop[1].revents)
strerr_die2x(0, DROP,"timeout reached without input");
}
}
#endif


Attachments:
tcpserver.c (14.69 kB)

2004-06-06 05:14:03

by Chris Wedgwood

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

On Sun, Jun 06, 2004 at 02:08:52PM +0900, Kalin KOZHUHAROV wrote:

> [pid 13097] getpid() = 13097
> [pid 13097] getpid() = 13097
> [pid 13097] getpid() = 13097
> [pid 13097] getpid() = 13097
> [pid 13097] getpid() = 13097
> [pid 13097] getpid() = 13097
> [pid 13097] getpid() = 13097
> [pid 13097] getpid() = 13097
> [pid 13097] getpid() = 13097

is this repeartable? if you strace -tt how often is this?

what's more, i wonder why this is going on? i'd almost be tempted to
attach to it with gdb and take a bt from getpid and see wtf is going
on


--cw

2004-06-06 05:34:42

by Kalin KOZHUHAROV

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

Chris Wedgwood wrote:
> On Sun, Jun 06, 2004 at 02:08:52PM +0900, Kalin KOZHUHAROV wrote:
>
>
>>[pid 13097] getpid() = 13097
>>[pid 13097] getpid() = 13097
>>[pid 13097] getpid() = 13097
>>[pid 13097] getpid() = 13097
>>[pid 13097] getpid() = 13097
>>[pid 13097] getpid() = 13097
>>[pid 13097] getpid() = 13097
>>[pid 13097] getpid() = 13097
>>[pid 13097] getpid() = 13097
>
>
> is this repeartable? if you strace -tt how often is this?
Yes, definately on this system. Here we go:

poi root # strace -tt -f -F -p 3550
Process 3550 attached - interrupt to quit
14:24:31.589227 accept(3, {sa_family=AF_INET, sin_port=htons(40575), sin_addr=inet_addr("192.168.1.20")}, [16]) = 0
14:25:04.068649 rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
14:25:04.068898 write(2, "tcpserver: status: 1/40\n", 24) = 24
14:25:04.069229 fork(Process 14320 attached
) = 14320
[pid 3550] 14:25:04.069663 close(0 <unfinished ...>
[pid 14320] 14:25:04.069771 close(3 <unfinished ...>
[pid 3550] 14:25:04.069874 <... close resumed> ) = 0
[pid 3550] 14:25:04.069977 rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
[pid 3550] 14:25:04.070169 accept(3, <unfinished ...>
[pid 14320] 14:25:04.070278 <... close resumed> ) = 0
[pid 14320] 14:25:04.070406 getpid() = 14320
[pid 14320] 14:25:04.070614 write(2, "tcpserver: pid 14320 from 192.16"..., 39) = 39
...
[pid 14320] 14:25:04.139116 close(3) = 0
[pid 14320] 14:25:04.139242 close(6) = 0
[pid 14320] 14:25:04.139350 time(NULL) = 1086499504
[pid 14320] 14:25:04.139516 getpid() = 14320
[pid 14320] 14:25:04.139742 read(0, "\26\3\1\0s\1\0\0o\3\1", 11) = 11
[pid 14320] 14:25:04.139910 time(NULL) = 1086499504
[pid 14320] 14:25:04.140017 getpid() = 14320
[pid 14320] 14:25:04.140125 read(0, "\0?\230A\323\323\304\226\v\37NI\310\3634i\375\346\300p"..., 109) = 109
[pid 14320] 14:25:04.140278 time(NULL) = 1086499504
[pid 14320] 14:25:04.140383 getpid() = 14320
[pid 14320] 14:25:04.140475 getpid() = 14320
[pid 14320] 14:25:04.140584 getpid() = 14320
[pid 14320] 14:25:04.140681 open("/dev/urandom", O_RDONLY|O_NONBLOCK|O_NOCTTY) = 3
[pid 14320] 14:25:04.140811 select(4, [3], NULL, NULL, {0, 10000}) = 1 (in [3], left {0, 10000})
[pid 14320] 14:25:04.140972 read(3, "g\t\270X\304\260\17P\'\367\t\"\253\375\24\350\272\240\327"..., 32) = 32
[pid 14320] 14:25:04.141130 close(3) = 0
[pid 14320] 14:25:04.141224 getpid() = 14320
[pid 14320] 14:25:04.141326 getpid() = 14320
[pid 14320] 14:25:04.141435 getuid32() = 89
[pid 14320] 14:25:04.141531 getpid() = 14320
[pid 14320] 14:25:04.141650 time(NULL) = 1086499504
[pid 14320] 14:25:04.141747 getpid() = 14320
[pid 14320] 14:25:04.141846 getpid() = 14320
[pid 14320] 14:25:04.141949 getpid() = 14320
[pid 14320] 14:25:04.142051 getpid() = 14320
[pid 14320] 14:25:04.142153 getpid() = 14320
[pid 14320] 14:25:04.142257 getpid() = 14320
[pid 14320] 14:25:04.142359 getpid() = 14320
[pid 14320] 14:25:04.142461 getpid() = 14320
[pid 14320] 14:25:04.142580 getpid() = 14320
[pid 14320] 14:25:04.142683 getpid() = 14320
[pid 14320] 14:25:04.142784 getpid() = 14320
[pid 14320] 14:25:04.142886 getpid() = 14320
[pid 14320] 14:25:04.142989 getpid() = 14320
[pid 14320] 14:25:04.143090 getpid() = 14320
[pid 14320] 14:25:04.143192 getpid() = 14320
[pid 14320] 14:25:04.143294 getpid() = 14320
[pid 14320] 14:25:04.143396 getpid() = 14320
[pid 14320] 14:25:04.143499 getpid() = 14320
[pid 14320] 14:25:04.143619 getpid() = 14320
[pid 14320] 14:25:04.143722 getpid() = 14320
[pid 14320] 14:25:04.143824 getpid() = 14320
[pid 14320] 14:25:04.143927 getpid() = 14320
[pid 14320] 14:25:04.144029 getpid() = 14320
[pid 14320] 14:25:04.144131 getpid() = 14320
[pid 14320] 14:25:04.144233 getpid() = 14320
[pid 14320] 14:25:04.144336 getpid() = 14320
[pid 14320] 14:25:04.144438 getpid() = 14320
[pid 14320] 14:25:04.144540 getpid() = 14320
[pid 14320] 14:25:04.144701 getpid() = 14320
[pid 14320] 14:25:04.144804 getpid() = 14320
[pid 14320] 14:25:04.144906 getpid() = 14320
[pid 14320] 14:25:04.145009 getpid() = 14320
[pid 14320] 14:25:04.145111 getpid() = 14320
[pid 14320] 14:25:04.145214 getpid() = 14320
[pid 14320] 14:25:04.145316 getpid() = 14320
[pid 14320] 14:25:04.145419 getpid() = 14320
[pid 14320] 14:25:04.145520 getpid() = 14320
[pid 14320] 14:25:04.145640 getpid() = 14320
[pid 14320] 14:25:04.145742 getpid() = 14320
[pid 14320] 14:25:04.145845 getpid() = 14320
[pid 14320] 14:25:04.145948 getpid() = 14320
[pid 14320] 14:25:04.146050 getpid() = 14320
[pid 14320] 14:25:04.146152 getpid() = 14320
[pid 14320] 14:25:04.146254 getpid() = 14320
[pid 14320] 14:25:04.146356 getpid() = 14320
[pid 14320] 14:25:04.146458 getpid() = 14320
[pid 14320] 14:25:04.146575 getpid() = 14320
[pid 14320] 14:25:04.146678 getpid() = 14320
[pid 14320] 14:25:04.146780 getpid() = 14320
[pid 14320] 14:25:04.146882 getpid() = 14320
[pid 14320] 14:25:04.146985 getpid() = 14320
[pid 14320] 14:25:04.147087 getpid() = 14320
[pid 14320] 14:25:04.147189 getpid() = 14320
[pid 14320] 14:25:04.147345 time(NULL) = 1086499504
[pid 14320] 14:25:04.147442 getpid() = 14320
[pid 14320] 14:25:04.147531 getpid() = 14320
[pid 14320] 14:25:04.147773 write(0, "\26\3\1\0J\2\0\0F\3\1@\302\252\260\204\203~\367\r\305G"..., 1127) = 1127
[pid 14320] 14:25:04.147937 read(0, "\26\3\1\1\6", 5) = 5
[pid 14320] 14:25:04.151416 read(0, "\20\0\1\2\1\0S\303\2\335E\264\330\250\261oj\307a~en\262"..., 262) = 262
[pid 14320] 14:25:04.151575 getpid() = 14320
[pid 14320] 14:25:04.151685 time([1086499504]) = 1086499504
[pid 14320] 14:25:04.151790 getpid() = 14320
[pid 14320] 14:25:04.151883 getpid() = 14320
[pid 14320] 14:25:04.155052 getpid() = 14320
[pid 14320] 14:25:04.155150 getpid() = 14320
[pid 14320] 14:25:04.178154 read(0, "\24\3\1\0\1", 5) = 5
[pid 14320] 14:25:04.178346 read(0, "\1", 1) = 1
[pid 14320] 14:25:04.178646 read(0, "\26\3\1\0000", 5) = 5
[pid 14320] 14:25:04.178810 read(0, "\10\215\371\334e+\324\356\224|\216\371\204\351\330f+\203"..., 48) = 48
[pid 14320] 14:25:04.179055 write(0, "\24\3\1\0\1\1\26\3\1\0000\26\2678\202\31\16\251\346\224"..., 59) = 59
[pid 14320] 14:25:04.179302 gettimeofday({1086499504, 179360}, NULL) = 0
...

The ppid changed because I restarted /service while testing.
I sent a short mail via SMTP/SSL to trigger the above trace.

> what's more, i wonder why this is going on? i'd almost be tempted to
> attach to it with gdb and take a bt from getpid and see wtf is going
> on
Well, there was no gdb on that machine, compiling it now.
AFAIR, tcpserver is stripped (it is a "production" machine), so I wonder how inforamative it will be.

I havent used gbd a lot, educate me what to do.
For example how do I set a bp to trigger when getpid() is called ?
I know how to set one in a function, but I guess function names are stripped.

Kalin.

--
||///_ o *****************************
||//'_/> WWW: http://ThinRope.net/
|||\/<"
|||\\ '
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

2004-06-06 06:07:48

by Linus Torvalds

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?



On Sun, 6 Jun 2004, Kalin KOZHUHAROV wrote:
>
> Well, not exactly sure about my reply, but let me try.
>
> The other day I was debugging some config problems with my qmail instalation and I ended up doing:
> # strace -p 4563 -f -F
> ...
> [pid 13097] read(3, "\347\374\375TBH~\342\233\337\220\302l\220\317\237\37\25"..., 32) = 32
> [pid 13097] close(3) = 0
> [pid 13097] getpid() = 13097
> [pid 13097] getpid() = 13097
> [pid 13097] getuid32() = 89
> [pid 13097] getpid() = 13097
> [pid 13097] time(NULL) = 1086497450
> [pid 13097] getpid() = 13097
> [pid 13097] getpid() = 13097
> [pid 13097] getpid() = 13097

qmail is a piece of crap. The source code is completely unreadable, and it
seems to think that "getpid()" is a good source of random data. Don't ask
me why.

It literally does things like

random = now() + (getpid() << 16);

and since there isn't a single comment in the whole source tree, it's
pointless to wonder why. (In case you wonder, "now()" just does a
"time(NULL)" call - whee.).

I don't understand why people bother with it. It's not like Dan Bernstein
is so charming that it makes up for the deficiencies of his programs.

But no, even despite the strange usage, this isn't a performance issue.
qmail will call "getpid()" a few tens of times per connection because of
the wonderful quality of randomness it provides, or something.

This is another gem you find when grepping for "getpid()" in qmail, and
apparently the source of most of them:

if (now() - when < ((60 + (getpid() & 31)) << 6))

Don't you love it how timeouts etc seem to be based on random values that
are calculated off the lower 5 bits of the process ID? And don't you find
the above (totally uncommented) line just a thing of beauty and clarity?

Yeah.

Anyway, you did find something that used more than a handful of getpid()
calls, but no, it doesn't qualify as performance-critical, and even
despite it's peyote-induced (or hey, some people are just crazy on their
own) getpid() usage, it's not a reason to have a buggy glibc.

Linus

2004-06-06 06:43:33

by Kalin KOZHUHAROV

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

Linus Torvalds wrote:
>
> On Sun, 6 Jun 2004, Kalin KOZHUHAROV wrote:
>
>>Well, not exactly sure about my reply, but let me try.
>>
>>The other day I was debugging some config problems with my qmail instalation and I ended up doing:
>># strace -p 4563 -f -F
>>...
>>[pid 13097] read(3, "\347\374\375TBH~\342\233\337\220\302l\220\317\237\37\25"..., 32) = 32
>>[pid 13097] close(3) = 0
>>[pid 13097] getpid() = 13097
>>[pid 13097] getpid() = 13097
>>[pid 13097] getuid32() = 89
>>[pid 13097] getpid() = 13097
>>[pid 13097] time(NULL) = 1086497450
>>[pid 13097] getpid() = 13097
>>[pid 13097] getpid() = 13097
>>[pid 13097] getpid() = 13097
>
>
> qmail is a piece of crap. The source code is completely unreadable, and it
> seems to think that "getpid()" is a good source of random data. Don't ask
> me why.
>
> It literally does things like
>
> random = now() + (getpid() << 16);
>
> and since there isn't a single comment in the whole source tree, it's
> pointless to wonder why. (In case you wonder, "now()" just does a
> "time(NULL)" call - whee.).
>
> I don't understand why people bother with it. It's not like Dan Bernstein
> is so charming that it makes up for the deficiencies of his programs.
Well, it just works once you set it up right. And many people use it and you can get community support to a certain extent.

> But no, even despite the strange usage, this isn't a performance issue.
> qmail will call "getpid()" a few tens of times per connection because of
> the wonderful quality of randomness it provides, or something.
>
> This is another gem you find when grepping for "getpid()" in qmail, and
> apparently the source of most of them:
>
> if (now() - when < ((60 + (getpid() & 31)) << 6))
>
> Don't you love it how timeouts etc seem to be based on random values that
> are calculated off the lower 5 bits of the process ID? And don't you find
> the above (totally uncommented) line just a thing of beauty and clarity?
:-) DJB is (in)famous for its "code clarity".

> Yeah.
>
> Anyway, you did find something that used more than a handful of getpid()
> calls, but no, it doesn't qualify as performance-critical, and even
> despite it's peyote-induced (or hey, some people are just crazy on their
> own) getpid() usage, it's not a reason to have a buggy glibc.

I definately agree that getpid() should not be cached as it gives inconsistent results.
That is why I just reported I case of "more than a handful of getpid() calls" that struck me recently.

Ok, I think I/we have no more to say about the above getpid() usage.

Is there anybody insisting on getpid() caching?
And can/will anydoby fix that in glibc?

Kalin.


--
||///_ o *****************************
||//'_/> WWW: http://ThinRope.net/
|||\/<"
|||\\ '
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

2004-06-06 07:58:11

by Erik Andersen

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

On Sat Jun 05, 2004 at 11:07:25PM -0700, Linus Torvalds wrote:
> qmail is a piece of crap. The source code is completely unreadable, and it
> seems to think that "getpid()" is a good source of random data. Don't ask
> me why.
>
> It literally does things like
>
> random = now() + (getpid() << 16);
[-----------snip-----------]

http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/libc/sysdeps/posix/tempname.c?rev=1.36&content-type=text/plain&cvsroot=glibc

/* Get some more or less random data. */
random_time_bits = ((uint64_t) tv.tv_usec << 16) ^ tv.tv_sec;
value += random_time_bits ^ __getpid ();

etc....

-Erik

--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--

2004-06-06 09:52:13

by Bernd Eckenfels

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

In article <[email protected]> you wrote:
> It literally does things like
>
> random = now() + (getpid() << 16);

It does that for the unique filenames and id stamps (maildir format and
message ids). But it should be easy to replace this with a cached getpid
result, if this is realy a performance problem. On a traditional unix system
pid and timestamp should be locally unique for non threaded applications.

> Anyway, you did find something that used more than a handful of getpid()
> calls, but no, it doesn't qualify as performance-critical, and even
> despite it's peyote-induced (or hey, some people are just crazy on their
> own) getpid() usage, it's not a reason to have a buggy glibc.

I wonder if it easyly would be possible to cache the getpid() result in some
thread local segment. Is there any, which is present for all clone flags?
Not tha I care much about this unneeded glibc optimizsation, but more out of
curiousity about the new threadind functionality.

Greetings
Bernd
--
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/

2004-06-06 13:07:56

by Paul Rolland

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

Hello,

> It does that for the unique filenames and id stamps
> (maildir format and

I though this was a job for mktemp and co.

Regards,
Paul


2004-06-06 14:25:28

by Russell Leighton

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?


I have read the discussion on this issue and I wanted to get
confirmation of my understanding...

Issue:
It seems glibc is caching getpid() which is wrong and breaks
programs like mine.

You are using an older version of glibc than I and that is why you
could run the test program.

Given the above, that means that:
Upgrading my kernel on the FedoraCore2 system won't help because the
bug is in glibc

Assuming that this is a "new" feature of glibc, any others upgrading
would then starting seeing this bug

Linus Torvalds wrote:

>On Sat, 5 Jun 2004, Russell Leighton wrote:
>
>
>>I have a test program (see attached) that shows what looks like a bug in
>>2.6.5-1.358 (FedoraCore2)...and breaks my program :(
>>
>>In summary, I am doing:
>>
>> clone(run_thread, stack + sizeof(stack) -1,
>> CLONE_FS|CLONE_FILES|CLONE_VM|SIGCHLD, NULL))
>>
>>According to the man page the child process should have its own pid as
>>returned by getpid()...much like fork().
>>
>>In 2.6 the child receives the parent's pid from getpid(), while 2.4
>>works as documented:
>>
>>In 2.4 the test program does:
>> parent pid: 26647
>> clone returned pid: 26648
>> thread reported pid: 26648
>>
>>In 2.6 the test program does:
>> parent pid: 16665
>> thread reported pid: 16665
>> clone returned pid: 16666
>>
>>
>
>Hmm.. The above is the correct behaviour if you use CLONE_THREAD
>("getpid()" will then return the _thread_ ID), but it shouldn't happen
>without that. And clearly you don't have it set.
>
>And indeed, it doesn't happen for me on my system:
>
> parent pid: 13552
> thread reported pid: 13553
> clone returned pid: 13553
>
>so I wonder if either the Fedora libc always adds that CLONE_THREAD thing
>to the clone() calls, or whether the FC2 kernel is buggy.
>
>Arjan?
>
> Linus
>
>
>
>

2004-06-06 15:34:45

by Russell Leighton

[permalink] [raw]
Subject: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]

Linus said he could not imagine a program using getpid() more than a
handful of times...
well, (I am ashamed to admit it) I found this getpid() bug with just
such a program.

Could someone can suggest an alternative to using getpid() for me?
Here's the problem...

I have a library that creates 2 threads using clone().
[NOTE: I can't use pthreads for a variety of reasons, mostly due
to the wacky signal handling rules...it turns out that using clone() is
cleaner for me anyway.]

Some of the code can ONLY be executed in one of the threads
called the 'handler' thread. The reason for that restriction is that
there are
performance, locking and synchronization issues that demand this
constraint.

Entry points to this code are exported for programmers to use.
I currently have a check in these functions similar to:

if ( !INHANDLERCONTEXT(mon) ) barf();

where:

#define INHANDLERCONTEXT(mon) ( (mon)->handler_q.thread->pid == getpid() )

So the questions are:

Given a code library with some exported functions which CAN be
executed outside a particular thread and others that MUST be executed in
a particular thread, how can I efficiently prevent or trap using of
these functions outside the proper execution context?

Would gettid() be any better?

Thx

Russ

Russell Leighton wrote:

>
> I have read the discussion on this issue and I wanted to get
> confirmation of my understanding...
>
> Issue:
> It seems glibc is caching getpid() which is wrong and breaks
> programs like mine.
>
> You are using an older version of glibc than I and that is why you
> could run the test program.
>
> Given the above, that means that:
> Upgrading my kernel on the FedoraCore2 system won't help because
> the bug is in glibc
>
> Assuming that this is a "new" feature of glibc, any others
> upgrading would then starting seeing this bug
>
> Linus Torvalds wrote:
>
>> On Sat, 5 Jun 2004, Russell Leighton wrote:
>>
>>
>>> I have a test program (see attached) that shows what looks like a
>>> bug in 2.6.5-1.358 (FedoraCore2)...and breaks my program :(
>>>
>>> In summary, I am doing:
>>>
>>> clone(run_thread, stack + sizeof(stack) -1,
>>> CLONE_FS|CLONE_FILES|CLONE_VM|SIGCHLD, NULL))
>>>
>>> According to the man page the child process should have its own pid
>>> as returned by getpid()...much like fork().
>>>
>>> In 2.6 the child receives the parent's pid from getpid(), while 2.4
>>> works as documented:
>>>
>>> In 2.4 the test program does:
>>> parent pid: 26647
>>> clone returned pid: 26648
>>> thread reported pid: 26648
>>>
>>> In 2.6 the test program does:
>>> parent pid: 16665
>>> thread reported pid: 16665
>>> clone returned pid: 16666
>>>
>>
>>
>> Hmm.. The above is the correct behaviour if you use CLONE_THREAD
>> ("getpid()" will then return the _thread_ ID), but it shouldn't
>> happen without that. And clearly you don't have it set.
>>
>> And indeed, it doesn't happen for me on my system:
>>
>> parent pid: 13552
>> thread reported pid: 13553
>> clone returned pid: 13553
>>
>> so I wonder if either the Fedora libc always adds that CLONE_THREAD
>> thing
>> to the clone() calls, or whether the FC2 kernel is buggy.
>>
>> Arjan?
>>
>> Linus
>>
>>
>>
>>
>
> -
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>

2004-06-06 15:44:14

by Robert Love

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]

On Sun, 2004-06-06 at 11:38 -0400, Russell Leighton wrote:

> Given a code library with some exported functions which CAN be
> executed outside a particular thread and others that MUST be executed in
> a particular thread, how can I efficiently prevent or trap using of
> these functions outside the proper execution context?

Are you sure you cannot use pthreads? The new implementation (NPTL) has
a lot of improvements, including saner signal handling behavior.

If not, you probably are out of luck. Best I can think of is somehow
using thread-specific storage, but you would obviously need to index
into it using something OTHER than the PID. Which basically leaves you
with the stack. So, unless you could cache the PID in a local
variable..

> Would gettid() be any better?

If you aren't using CLONE_THREAD, gettid() will just return the PID.
And if you were using CLONE_THREAD, then getpid() would be worthless for
you and you would have to use gettid(). Either way, though, they
basically do the same thing (return current->tid vs. current->pid).

Robert Love


2004-06-06 15:58:15

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]

On Sun, 2004-06-06 at 17:38, Russell Leighton wrote:
> I have a library that creates 2 threads using clone().
> [NOTE: I can't use pthreads for a variety of reasons, mostly due
> to the wacky signal handling rules...it turns out that using clone() is
> cleaner for me anyway.]

a library using clone sounds suspect to me, I can't imagine an app using
pthreads being able to just use your library as a result. Note also that
clone() is not a portable interface even within linux architectures.


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2004-06-06 16:33:21

by Chris Evans

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

Hi Linus,

On Sat, 5 Jun 2004, Linus Torvalds wrote:

> But no, even despite the strange usage, this isn't a performance issue.
> qmail will call "getpid()" a few tens of times per connection because of
> the wonderful quality of randomness it provides, or something.

The openssl library seems to have a strange love for getpid() too. At
least, my old-ish version "openssl-0.9.6b-35.7" does.
My strace logs show over 50 consecutive getpid() calls during the
processing of the SSL_accept() routine. (I'm adding SSL support to vsftpd;
SSL_accept() is called every client connect and also to accept passive
data connections).

Cheers
Chris

2004-06-06 16:57:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?



On Sun, 6 Jun 2004, Erik Andersen wrote:
>
> http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/libc/sysdeps/posix/tempname.c?rev=1.36&content-type=text/plain&cvsroot=glibc
>
> /* Get some more or less random data. */
> random_time_bits = ((uint64_t) tv.tv_usec << 16) ^ tv.tv_sec;
> value += random_time_bits ^ __getpid ();

This one is historically at least a bit understandable: it's quite common
for various programs to create their lockfiles with the "unique"
identifier that is the pid of the locker. So the algorithm used to be
roughly (used by various things, ranging from tty subsystem locking to
whatever):

int fd, len;
int pid = getpid();

len = sprintf(name, "lockfile.%d", pid);
fd = open(name, O_EXCL | O_CREAT | O_WRONLY, 0644);
if (fd < 0) {
if (errno == EEXIST) {
fprintf(stderr, "Stale lockfile %s. Remove me\n", name);
exit(1); /* Or just "unlink + repeat" */
}
perror("lockfile");
exit(1);
}

/* Write the pid into the lockfile, fsync it */
write(fd, name + 9, len - 9);
fsync(fd);

/* Try to move it atomically */
while (rename("lockfile", name) < 0) {
if (errno != EEXIST) {
perror("lockfile");
exit(1);
}
/* Try again later */
sleep(1);
}

... We now have an exclusive lock ..

and here "pid" is in fact a good system-wide unique identifier. It's not
random, and it works badly with networked filesystems (which is why you'll
find versions of this algorithm that use the hostname in addition to the
pid too), but it's "obvious".

I think the original "mktemp()" also used to start off with "pid" as a
"unique" identifier. Because while it's not random, it _is_ system-wide
unique, so getpid() actually makes _sense_ in that respect.

In other words: getpid() makes little sense as a random value, but it does
make some sense as a unique value. Maybe less now than it did 20 years ago
due to the prevalence of networking, but history is powerful.

Linus

2004-06-06 17:19:21

by Linus Torvalds

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]



On Sun, 6 Jun 2004, Russell Leighton wrote:
>
> Linus said he could not imagine a program using getpid() more than a
> handful of times...

No, I said that I could not imagine using it more than a handful of times
_except_ for the cases of:

- thread identification without a native thread area
- benchmarking.

(and in both of these cases it is _wrong_ to cache the pid value)

> well, (I am ashamed to admit it) I found this getpid() bug with just
> such a program.
>
> Could someone can suggest an alternative to using getpid() for me?
> Here's the problem...
>
> I have a library that creates 2 threads using clone().

Your problem falls under the thread ID thing. It's fine and understandable
to use getpid() for that, although you could probably do it faster if you
are willing to use the support that modern kernels give you and that glibc
uses: the "TLS" aka Thread Local Storage support.

Thread-local storage involved having a user-mode register that points some
way to a special part of the address space. On x86, where the general
register set is very limited and stealing a general reg would thus be bad,
it uses a segment and loads the TLS pointer into a segment register
(segment registers are registers too - and since nobody sane uses them for
anything else these days, both %gs and %fs are freely usable).

> Would gettid() be any better?

You'd avoid this particular glibc bug with gettid.

Linus

2004-06-06 17:21:05

by Patrick J. LoPresti

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

"Paul Rolland" <[email protected]> writes:

> Hello,
>
> > It does that for the unique filenames and id stamps
> > (maildir format and
>
> I though this was a job for mktemp and co.

Sure, if you want to create a unique filename on a local unshared
disk.

But suppose you want a unique filename on a partition which might be
NFS-mounted with multiple concurrent writers. Oh, and the year is
1997. Still want to use mktemp?

Bernstein decided to name each file:

<time>.<PID>.<fully-qualified-domain-name>

This approach makes three assumptions. First, that the time does not
repeat. Second, that the FQDN is unique. Third, that the time
changes (i.e., one second elapses) before a PID is reused on the local
system. Subject to these assumptions, this is a perfectly reliable,
very efficient, lock-free way to create unique file names.

Can you do better?

Sure, Bernstein's code is an unmaintainable nightmare. But his
designs and implementations are usually pretty sound. The last
release of qmail was in 1998, and we still use it where I work. Six
years with zero crashes, zero dropped messages, and zero security
holes is not a trivial track record for an MTA.

Just don't ever try to edit the sources. :-)

- Pat

2004-06-06 17:31:28

by Paul Rolland

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

> > I though this was a job for mktemp and co.
>
> Sure, if you want to create a unique filename on a local unshared
> disk.

Right.

> But suppose you want a unique filename on a partition which might be
> NFS-mounted with multiple concurrent writers. Oh, and the year is
> 1997. Still want to use mktemp?

Probably not...

> Bernstein decided to name each file:
>
> <time>.<PID>.<fully-qualified-domain-name>
>
> This approach makes three assumptions. First, that the time does not
> repeat. Second, that the FQDN is unique. Third, that the time
> changes (i.e., one second elapses) before a PID is reused on the local
> system. Subject to these assumptions, this is a perfectly reliable,
> very efficient, lock-free way to create unique file names.
>
> Can you do better?
Not sure it's the best place to run a contest...
However, I could have used something like :
<FQDN>.<time-up-to-usec>.<random-number>

But this is biased because I know I must find an answer with <PID> :-)

> Sure, Bernstein's code is an unmaintainable nightmare. But his
> designs and implementations are usually pretty sound. The last
> release of qmail was in 1998, and we still use it where I work. Six
> years with zero crashes, zero dropped messages, and zero security
> holes is not a trivial track record for an MTA.

One point for him.

Paul


2004-06-06 17:48:01

by Davide Libenzi

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

On Sun, 6 Jun 2004, Patrick J. LoPresti wrote:

> "Paul Rolland" <[email protected]> writes:
>
> > Hello,
> >
> > > It does that for the unique filenames and id stamps
> > > (maildir format and
> >
> > I though this was a job for mktemp and co.
>
> Sure, if you want to create a unique filename on a local unshared
> disk.
>
> But suppose you want a unique filename on a partition which might be
> NFS-mounted with multiple concurrent writers. Oh, and the year is
> 1997. Still want to use mktemp?
>
> Bernstein decided to name each file:
>
> <time>.<PID>.<fully-qualified-domain-name>
>
> This approach makes three assumptions. First, that the time does not
> repeat. Second, that the FQDN is unique. Third, that the time
> changes (i.e., one second elapses) before a PID is reused on the local
> system. Subject to these assumptions, this is a perfectly reliable,
> very efficient, lock-free way to create unique file names.

As long as 1) your program is not MT 2) your process creates exactly *one*
unique file. Then yes, it is fine. But, if your program is MT (with
threads sharing the PID) or your single thread process creates two (or
more) unique files within 1 sec, you've got a problem. Using gettid()
would solve the MT issue (there's still the tid recycling period), but not
the time issue.



- Davide

2004-06-06 17:55:46

by Linus Torvalds

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?



On Sun, 6 Jun 2004, Denis Vlasenko wrote:
> >
> > if (now() - when < ((60 + (getpid() & 31)) << 6))
>
> timeout randomization across several qmail-send processes, to prevent
> stampede effect?
>
> > And don't you find
> > the above (totally uncommented) line just a thing of beauty and clarity?
>
> Yes, a comment would be in place. :(

The problem is not so much what it does, but how it's coded.

For example, if it really wanted to do this, then how about having a nice
variable that is initialized at run-time (preferably just once at
startup), and has a nice name and a comment on what it's all about? Or
even a macro, to make it more readable? Or even _just_ the comment?

You could write the above incomprehensible one-liner as

/* Timeout in seconds: 64 - 97 minutes, depending on pid */
#define ERROR_FAILURE_TIMEOUT ((60 + (getpid() & 31)) << 6)

...

/*
* If this IP address had a SMTP connection timeout last
* time, don't let it try to connect again immediately
*/
if (now() - when < ERROR_FAILURE_TIMEOUT)

and it would at least have made some sense.

As it is, you CANNOT read "tcpto.c" and make any sense of it. It's not
sensible code. You have to know what the hell the whole thing is all
about, and you have to actually try to figure out what the expression is
to even understand the code.

Trust me. Try sometime. The code is horrible.

It wouldn't irritate me so much, but then the person who wrote that
abomination has the galls to complain about sendmail, NFS, System V etc.

And it's not like qmail is so big and has such a long history that it
would be a huge deal to comment it and write it more readably. But since
the license doesn't allow for anything but patches, and patches would be
totally unmaintainable if they tried to fix these kinds of things, it will
never be done.

Which is a pity, since I actually _agree_ with you that qmail has some
good sides: it's small, it's efficient, and it's pretty modular. It has
the _potential_ to be great code.

Linus

2004-06-06 18:17:31

by Rik van Riel

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

On 6 Jun 2004, Patrick J. LoPresti wrote:

> Sure, Bernstein's code is an unmaintainable nightmare. But his
> designs and implementations are usually pretty sound. The last
> release of qmail was in 1998, and we still use it where I work.

Ahhh, so YOU'RE the guy who's flooding the whole world
with bounces from spam and virusses to unknown users,
with the virusses still attached ? ;)))

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

2004-06-06 18:37:24

by Patrick J. LoPresti

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

Rik van Riel <[email protected]> writes:

> On 6 Jun 2004, Patrick J. LoPresti wrote:
>
> > Sure, Bernstein's code is an unmaintainable nightmare. But his
> > designs and implementations are usually pretty sound. The last
> > release of qmail was in 1998, and we still use it where I work.
>
> Ahhh, so YOU'RE the guy who's flooding the whole world
> with bounces from spam and virusses to unknown users,
> with the virusses still attached ? ;)))

OK, I admit we no longer use it as an external SMTP server. The world
has changed a bit since 1998. :-)

My main point still stands, though, which is that djb is crazy, not
stupid (i.e., a mathematician).

- Pat

2004-06-06 18:53:30

by Simon Kirby

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

On Sun, Jun 06, 2004 at 09:57:20AM -0700, Linus Torvalds wrote:

> /* Write the pid into the lockfile, fsync it */
> write(fd, name + 9, len - 9);
> fsync(fd);

Unrelated to this discussion -- and there is a close() missing -- but is
there any reason for fsync() to be there? I've seen this often before,
but I've never understood why it would be necessary to force the data to
disk, especially when it will likely be removed later before it would
have otherwise been written to disk. Shouldn't the lock file behave
properly without fsync(), even across NFS, and even across all OSes?

Simon-

2004-06-06 19:00:59

by Linus Torvalds

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?



On Sun, 6 Jun 2004, Simon Kirby wrote:
>
> Unrelated to this discussion -- and there is a close() missing -- but is
> there any reason for fsync() to be there?

It only matters if the lock is meaningful over an involuntary reboot (aka
crash). Usually it isn't. So yes, the fsync() is usually not necessary.

Sometimes the lock file may contain real data that is meaningful for
recovery, though. The pid generally isn't, but it could point to a log
that describes what the thing was about to do. Or it could just be helpful
on next reboot when you have a stale lockfile, and you want to match that
lockfile to any syslog messages or similar.

Linus

2004-06-06 23:45:48

by Russell Leighton

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]

Arjan van de Ven wrote:

> Note also that
>
>clone() is not a portable interface even within linux architectures.
>
>
>
Really???!!! How so?

2004-06-07 00:05:24

by Russell Leighton

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]

Arjan van de Ven wrote:

>On Sun, 2004-06-06 at 17:38, Russell Leighton wrote:
>
>
>>I have a library that creates 2 threads using clone().
>>[NOTE: I can't use pthreads for a variety of reasons, mostly due
>>to the wacky signal handling rules...it turns out that using clone() is
>>cleaner for me anyway.]
>>
>>
>
>a library using clone sounds suspect to me, I can't imagine an app using
>pthreads being able to just use your library as a result.
>
Why? In what way would a program that uses pthreads interfere with
threads created using clone()?

2004-06-07 00:17:00

by Russell Leighton

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]


Ah, the stack. Yes, I think that will work. You see, the functions that
need this test
always take a pointer to a struct that holds all the thread info called
'mon'. That info is the pid of the
thread , the ptr to the stack mmap'd and the size of the stack, I think
I can change the test to:

{
int32_t
stackvar;

/* if address of the stackvar is OUTSIDE the stack of handler
thread, then
you are not running this function from the handler thread */
if ( &stackvar < mon->handler_q.thread->stack ||
&stackvar > mon->handler_q.thread->stack +
mon->handler_q.thread->stacksize - 1) {

fprintf(stderr, "bad, bad, bad!\n");
exit(-1);

}

}

Would that work? If so, that is nice because no syscall.

Robert Love wrote:

>On Sun, 2004-06-06 at 11:38 -0400, Russell Leighton wrote:
>
>
>
>> Given a code library with some exported functions which CAN be
>>executed outside a particular thread and others that MUST be executed in
>>a particular thread, how can I efficiently prevent or trap using of
>>these functions outside the proper execution context?
>>
>>
>
>Are you sure you cannot use pthreads? The new implementation (NPTL) has
>a lot of improvements, including saner signal handling behavior.
>
>If not, you probably are out of luck. Best I can think of is somehow
>using thread-specific storage, but you would obviously need to index
>into it using something OTHER than the PID. Which basically leaves you
>with the stack. So, unless you could cache the PID in a local
>variable..
>
>
>
>> Would gettid() be any better?
>>
>>
>
>If you aren't using CLONE_THREAD, gettid() will just return the PID.
>And if you were using CLONE_THREAD, then getpid() would be worthless for
>you and you would have to use gettid(). Either way, though, they
>basically do the same thing (return current->tid vs. current->pid).
>
> Robert Love
>
>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at http://www.tux.org/lkml/
>
>
>
>

2004-06-07 12:35:12

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]


On Sun, Jun 06, 2004 at 07:49:50PM -0400, Russell Leighton wrote:
> Arjan van de Ven wrote:
>
> >Note also that
> >
> >clone() is not a portable interface even within linux architectures.
> >
> >
> >
> Really???!!! How so?

for example ia64 doesn't have it.


Attachments:
(No filename) (261.00 B)
(No filename) (189.00 B)
Download all attachments

2004-06-07 13:12:07

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]

On Sun, Jun 06, 2004 at 08:09:17PM -0400, Russell Leighton wrote:
> >a library using clone sounds suspect to me, I can't imagine an app using
> >pthreads being able to just use your library as a result.
> >
> Why? In what way would a program that uses pthreads interfere with
> threads created using clone()?

you do all kinds of tricks behind the threading library's back, and share
some stuff while not others. I'd be suprised of that would NOT break.


Attachments:
(No filename) (456.00 B)
(No filename) (189.00 B)
Download all attachments

2004-06-07 13:48:37

by Sean Neakums

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]

Arjan van de Ven <[email protected]> writes:

> On Sun, Jun 06, 2004 at 07:49:50PM -0400, Russell Leighton wrote:
>> Arjan van de Ven wrote:
>>
>> >Note also that
>> >
>> >clone() is not a portable interface even within linux architectures.
>> >
>> >
>> >
>> Really???!!! How so?
>
> for example ia64 doesn't have it.

Then what is the sys_clone2 implementation in arch/is64/kernel/entry.S for?

2004-06-07 14:00:18

by Christoph Hellwig

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]

On Mon, Jun 07, 2004 at 02:48:31PM +0100, Sean Neakums wrote:
> > for example ia64 doesn't have it.
>
> Then what is the sys_clone2 implementation in arch/is64/kernel/entry.S for?

It's clone with a slightly different calling convention.

2004-06-07 14:11:58

by Sean Neakums

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]

Christoph Hellwig <[email protected]> writes:

> On Mon, Jun 07, 2004 at 02:48:31PM +0100, Sean Neakums wrote:
>> > for example ia64 doesn't have it.
>>
>> Then what is the sys_clone2 implementation in arch/is64/kernel/entry.S for?
>
> It's clone with a slightly different calling convention.

Ah. I misintereted Arjan as saying that ia64 didn't have a clone at
all, which would have been pretty wacky. Sorry for the noise.

2004-06-07 18:22:28

by Bruce Guenter

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

On Mon, Jun 07, 2004 at 11:35:23PM +0000, Linus Torvalds wrote:
> On Sun, 6 Jun 2004, Kalin KOZHUHAROV wrote:
> >
> > Well, not exactly sure about my reply, but let me try.
> >
> > The other day I was debugging some config problems with my qmail instalation and I ended up doing:
> > # strace -p 4563 -f -F
> > ...
> > [pid 13097] read(3, "\347\374\375TBH~\342\233\337\220\302l\220\317\237\37\25"..., 32) = 32
> > [pid 13097] close(3) = 0
> > [pid 13097] getpid() = 13097
> > [pid 13097] getpid() = 13097
> > [pid 13097] getuid32() = 89
> > [pid 13097] getpid() = 13097
> > [pid 13097] time(NULL) = 1086497450
> > [pid 13097] getpid() = 13097
> > [pid 13097] getpid() = 13097
> > [pid 13097] getpid() = 13097
>
> qmail is a piece of crap. The source code is completely unreadable, and it
> seems to think that "getpid()" is a good source of random data. Don't ask
> me why.

In this case, however, it has nothing directly to do with qmail. This
is tcpserver, and tcpserver only uses getpid for two things: printing
out status lines with the PID in them (which seems perfectly valid to
me), and once when adding to random initializer for DNS requests.

This strace pattern seemed rather odd to me, so for comparison I straced
my own tcpserver setups, and could not get them to produce more than two
getpid calls per connection. Something is wrong with this trace,
possibly some weirdness in a patch, like whatever the SSL library is
doing.
--
Bruce Guenter <[email protected]> http://em.ca/~bruceg/ http://untroubled.org/
OpenPGP key: 699980E8 / D0B7 C8DD 365D A395 29DA 2E2A E96F B2DC 6999 80E8


Attachments:
(No filename) (1.72 kB)
(No filename) (189.00 B)
Download all attachments

2004-06-07 18:42:30

by David Mosberger

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]

>>>>> On Mon, 7 Jun 2004 15:00:09 +0100, Christoph Hellwig <[email protected]> said:

Christoph> On Mon, Jun 07, 2004 at 02:48:31PM +0100, Sean Neakums
Christoph> wrote:
>> > for example ia64 doesn't have it.

>> Then what is the sys_clone2 implementation in
>> arch/is64/kernel/entry.S for?

Christoph> It's clone with a slightly different calling convention.

Note that the only difference is that the stack-area is expressed as a
range (starting-address + size), rather than a direct stack-pointer
value. IMHO, it was a mistake to not do it that way right from the
beginning (consider that different arches grow stacks in different
directions, for example).

--david

2004-06-07 22:58:36

by Russell Leighton

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]

David Mosberger wrote:

>>>>>>On Mon, 7 Jun 2004 15:00:09 +0100, Christoph Hellwig <[email protected]> said:
>>>>>>
>>>>>>
>
> Christoph> On Mon, Jun 07, 2004 at 02:48:31PM +0100, Sean Neakums
> Christoph> wrote:
> >> > for example ia64 doesn't have it.
>
> >> Then what is the sys_clone2 implementation in
> >> arch/is64/kernel/entry.S for?
>
> Christoph> It's clone with a slightly different calling convention.
>
>Note that the only difference is that the stack-area is expressed as a
>range (starting-address + size), rather than a direct stack-pointer
>value. IMHO, it was a mistake to not do it that way right from the
>beginning (consider that different arches grow stacks in different
>directions, for example).
>
>
>
So Ia64 does have it..that's good. Does glibc wrap it?

I agree with the above...could glibc's clone() should have a size added?
Then the arch specific stack issues
could be hidden.

BTW, does gcc have a built-in #define like __STACK_GROWSUP__ that would
allow one to deal with the missing size parameter
when you called clone() by adjusting what you passed with and #ifdef?.

2004-06-07 23:27:36

by David Mosberger

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]

>>>>> On Mon, 07 Jun 2004 19:02:34 -0400, Russell Leighton <[email protected]> said:

Russell> So Ia64 does have it..that's good. Does glibc wrap it?

Here is how it works at the user-level:

- There _is_ a clone(), with the original interface. However,
this version only works when you create a new address-space.

- There is clone2(), which adds the extra "size" argument. This
one works for all cases.

Russell> I agree with the above...could glibc's clone() should have
Russell> a size added? Then the arch specific stack issues could be
Russell> hidden.

In my opinion, it would make sense for all platforms to support
clone2(), since it's more in line with the normal UNIX-convention of
specifying stacks as a memory-range (e.g., see stack_t). So far, the
interest in doing this has been lack-luster (and, IIRC, Linus was
against it in the past, so I haven't spent a lot of effort on it).

--david

2004-06-08 06:03:48

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]

On Mon, Jun 07, 2004 at 07:02:34PM -0400, Russell Leighton wrote:
> >
> So Ia64 does have it..that's good. Does glibc wrap it?
>
> I agree with the above...could glibc's clone() should have a size added?
> Then the arch specific stack issues
> could be hidden.

glibc doesn't provide clone other than a raw syscall wrapper, under the
assumption that when you want threads, you'll use it's thread creation call.
Not too unfair imo.


Attachments:
(No filename) (433.00 B)
(No filename) (189.00 B)
Download all attachments

2004-06-08 09:51:23

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]

Arjan van de Ven <[email protected]> writes:

> On Mon, Jun 07, 2004 at 07:02:34PM -0400, Russell Leighton wrote:
> > >
> > So Ia64 does have it..that's good. Does glibc wrap it?
> >
> > I agree with the above...could glibc's clone() should have a size added?
> > Then the arch specific stack issues
> > could be hidden.
>
> glibc doesn't provide clone other than a raw syscall wrapper, under the
> assumption that when you want threads, you'll use it's thread creation call.
> Not too unfair imo.

That fn parameter certainly more than a raw wrapper. I do agree that
what is needed is a fairly raw wrapper though.

I don't see how creating a clone2 wrapper that drops the extra argument
on platforms that don't use it is any different than what glibc already
does though.

Eric


2004-06-08 11:07:07

by Kalin KOZHUHAROV

[permalink] [raw]
Subject: Re: clone() <-> getpid() bug in 2.6?

Bruce Guenter wrote:
> On Mon, Jun 07, 2004 at 11:35:23PM +0000, Linus Torvalds wrote:
>
>>On Sun, 6 Jun 2004, Kalin KOZHUHAROV wrote:
>>
>>>Well, not exactly sure about my reply, but let me try.
>>>
>>>The other day I was debugging some config problems with my qmail instalation and I ended up doing:
>>># strace -p 4563 -f -F
>>>...
>>>[pid 13097] read(3, "\347\374\375TBH~\342\233\337\220\302l\220\317\237\37\25"..., 32) = 32
>>>[pid 13097] close(3) = 0
>>>[pid 13097] getpid() = 13097
>>>[pid 13097] getpid() = 13097
>>>[pid 13097] getuid32() = 89
>>>[pid 13097] getpid() = 13097
>>>[pid 13097] time(NULL) = 1086497450
>>>[pid 13097] getpid() = 13097
>>>[pid 13097] getpid() = 13097
>>>[pid 13097] getpid() = 13097
>>
>>qmail is a piece of crap. The source code is completely unreadable, and it
>>seems to think that "getpid()" is a good source of random data. Don't ask
>>me why.
>
>
> In this case, however, it has nothing directly to do with qmail. This
> is tcpserver, and tcpserver only uses getpid for two things: printing
> out status lines with the PID in them (which seems perfectly valid to
> me), and once when adding to random initializer for DNS requests.
>
> This strace pattern seemed rather odd to me, so for comparison I straced
> my own tcpserver setups, and could not get them to produce more than two
> getpid calls per connection. Something is wrong with this trace,
> possibly some weirdness in a patch, like whatever the SSL library is
> doing.

Yes, it is strange. I'll have a look at the applied patches. Without SSL there is no such getpidding :-)
Will post you (not LKML) if I find the culprit.

Kalin.

--
||///_ o *****************************
||//'_/> WWW: http://ThinRope.net/
|||\/<"
|||\\ '
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

2004-06-12 09:15:50

by Dominik Straßer

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]

Linus Torvalds wrote:

>On Sun, 6 Jun 2004, Russell Leighton wrote:
>
>
>>Linus said he could not imagine a program using getpid() more than a
>>handful of times...
>>
>>
>
>No, I said that I could not imagine using it more than a handful of times
>_except_ for the cases of:
>
> - thread identification without a native thread area
> - benchmarking.
>
>(and in both of these cases it is _wrong_ to cache the pid value)
>
>
>
>>well, (I am ashamed to admit it) I found this getpid() bug with just
>>such a program.
>>
>>Could someone can suggest an alternative to using getpid() for me?
>>Here's the problem...
>>
>>I have a library that creates 2 threads using clone().
>>
>>
>
>Your problem falls under the thread ID thing. It's fine and understandable
>to use getpid() for that, although you could probably do it faster if you
>are willing to use the support that modern kernels give you and that glibc
>uses: the "TLS" aka Thread Local Storage support.
>
>Thread-local storage involved having a user-mode register that points some
>way to a special part of the address space. On x86, where the general
>register set is very limited and stealing a general reg would thus be bad,
>it uses a segment and loads the TLS pointer into a segment register
>(segment registers are registers too - and since nobody sane uses them for
>anything else these days, both %gs and %fs are freely usable).
>
>
>
>> Would gettid() be any better?
>>
>>
>
>You'd avoid this particular glibc bug with gettid.
>
>
I am facing the following problem:
I want to sum up the time spent in the main thread + all threads that
ever existed.
getrusage dosn't work (and didn't do so in pre-NPTL-times) as the time
spent in threads is not taken into account.
To work around this problem I created a map pid->time used which used
getpid in the pre-NPTL-time and looked up the time in /proc/<pid>. As
this doesn't work with NPTL, changed it to use the gettid syscall as I
didn't find a saner way.

Regards

Dominik

2004-06-12 13:48:12

by Linus Torvalds

[permalink] [raw]
Subject: Re: Using getpid() often, another way? [was Re: clone() <-> getpid() bug in 2.6?]



On Sat, 12 Jun 2004, Dominik Stra?er wrote:
>
> I am facing the following problem:
> I want to sum up the time spent in the main thread + all threads that
> ever existed.
> getrusage dosn't work (and didn't do so in pre-NPTL-times) as the time
> spent in threads is not taken into account.

Hmm.. That's likely a bug. It definitely should work, but I guess the
self-reaping ends up meaning that the time never gets percolated to the
parent any more.

Ingo, any comments/ideas?

> To work around this problem I created a map pid->time used which used
> getpid in the pre-NPTL-time and looked up the time in /proc/<pid>. As
> this doesn't work with NPTL, changed it to use the gettid syscall as I
> didn't find a saner way.

Changing it to gettid sounds like the right thing to do, but I also think
that you shouldn't _need_ to do things like this.

Linus