2003-11-08 17:46:40

by Denis Vlasenko

[permalink] [raw]
Subject: 2.6-test6: nanosleep+SIGCONT weirdness

I observe some strange behaviour in 2.6-test6 with this small program:

n.c
===
#include <sys/time.h>
#include <errno.h>
int main(int argc, char* argv[]) {
struct timespec t = { 5000,0};
while(nanosleep(&t,&t)<0) {
puts("Yeah");
if(errno!=EINTR) break;
}
return 0;
}

In 2.4 stracing it while doing killall -CONT ./n works fine:

# strace ./n
execve("./n", ["./n", "5000"], [/* 23 vars */]) = 0
nanosleep({5000, 0}, 0xbffffd54) = -1 EINTR (Interrupted system
call)
--- SIGCONT (Continued) ---
write(1, "Yeah", 4Yeah) = 4
write(1, "\n", 1
) = 1
nanosleep({4994, 730000000}, 0xbffffd54) = -1 EINTR (Interrupted system
call)
--- SIGCONT (Continued) ---
write(1, "Yeah", 4Yeah) = 4
write(1, "\n", 1
) = 1
nanosleep({4994, 280000000}, 0xbffffd54) = -1 EINTR (Interrupted system
call)
--- SIGCONT (Continued) ---
write(1, "Yeah", 4Yeah) = 4
write(1, "\n", 1
) = 1
nanosleep({4993, 930000000},

But in 2.6 it does this:

# strace ./n
execve("./n", ["./n", "400"], [/* 26 vars */]) = 0
nanosleep({5000, 0}, 0xbffffc44) = -1 ERRNO_516 (errno 516)
--- SIGCONT (Continued) ---
setup() = -1 EFAULT (Bad address)
--- SIGCONT (Continued) ---
write(1, "Yeah", 4Yeah) = 4
write(1, "\n", 1
) = 1
_exit(0) = ?
--
vda


2003-11-08 18:29:02

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.6-test6: nanosleep+SIGCONT weirdness


On Sat, 8 Nov 2003, Denis wrote:
>
> I observe some strange behaviour in 2.6-test6 with this small program:

Good catch.

That nanosleep restart seems to be broken, and quite frankly, looking at
the mess in kernel/posix-timers.c I'm not all that surprised. The code is
total and absolute crap. I have no idea how it's even supposed to work.

I suspect that it might just work right if you disable the nanosleep stuff
by undefining FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP in <linux/signal.h>.
Because unlike the "folded" version in posix-timers.c, the original
version at least looks sane.

Linus

2003-11-08 18:45:52

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.6-test6: nanosleep+SIGCONT weirdness


On Sat, 8 Nov 2003, Linus Torvalds wrote:
>
> That nanosleep restart seems to be broken, and quite frankly, looking at
> the mess in kernel/posix-timers.c I'm not all that surprised. The code is
> total and absolute crap. I have no idea how it's even supposed to work.

I'd suggest just removing the regular nanosleep() emulation from there.
The clock_nanosleep() restart is likely still broken, but at least this
way the _regular_ nanosleep() system call works correctly, and fixing
clock_nanosleep() is likely easier since the restart stuff doesn't have to
worry about _which_ system call it should restart.

Denis, does this work for you?

Linus

-----
===== include/linux/signal.h 1.13 vs edited =====
--- 1.13/include/linux/signal.h Mon Jun 2 04:13:33 2003
+++ edited/include/linux/signal.h Sat Nov 8 10:42:20 2003
@@ -214,7 +214,7 @@
struct pt_regs;
extern int get_signal_to_deliver(siginfo_t *info, struct pt_regs *regs, void *cookie);
#endif
-#define FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP
+
#endif /* __KERNEL__ */

#endif /* _LINUX_SIGNAL_H */
===== kernel/posix-timers.c 1.21 vs edited =====
--- 1.21/kernel/posix-timers.c Sun Sep 21 14:50:25 2003
+++ edited/kernel/posix-timers.c Sat Nov 8 10:41:57 2003
@@ -1104,29 +1104,6 @@
extern long do_clock_nanosleep(clockid_t which_clock, int flags,
struct timespec *t);

-#ifdef FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP
-
-asmlinkage long
-sys_nanosleep(struct timespec __user *rqtp, struct timespec __user *rmtp)
-{
- struct timespec t;
- long ret;
-
- if (copy_from_user(&t, rqtp, sizeof (t)))
- return -EFAULT;
-
- if ((unsigned) t.tv_nsec >= NSEC_PER_SEC || t.tv_sec < 0)
- return -EINVAL;
-
- ret = do_clock_nanosleep(CLOCK_REALTIME, 0, &t);
-
- if (ret == -ERESTART_RESTARTBLOCK && rmtp &&
- copy_to_user(rmtp, &t, sizeof (t)))
- return -EFAULT;
- return ret;
-}
-#endif // ! FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP
-
asmlinkage long
sys_clock_nanosleep(clockid_t which_clock, int flags,
const struct timespec __user *rqtp,
@@ -1244,7 +1221,7 @@
return 0;
}
/*
- * This will restart either clock_nanosleep or clock_nanosleep
+ * This will restart clock_nanosleep
*/
long
clock_nanosleep_restart(struct restart_block *restart_block)
===== kernel/timer.c 1.72 vs edited =====
--- 1.72/kernel/timer.c Tue Oct 21 22:09:54 2003
+++ edited/kernel/timer.c Sat Nov 8 10:42:37 2003
@@ -1059,7 +1059,6 @@
{
return current->pid;
}
-#ifndef FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP

static long nanosleep_restart(struct restart_block *restart)
{
@@ -1118,7 +1117,6 @@
}
return ret;
}
-#endif // ! FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP

/*
* sys_sysinfo - fill in sysinfo struct

2003-11-08 18:46:36

by Anton Blanchard

[permalink] [raw]
Subject: Re: 2.6-test6: nanosleep+SIGCONT weirdness


> I observe some strange behaviour in 2.6-test6 with this small program:

Something looks wrong with the syscall restart stuff in 2.6, I noticed it
when doing:

sleep 600

Then doing ctrl z; fg twice.

Happens on x86 and ppc32.

Anton

2003-11-08 18:56:29

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.6-test6: nanosleep+SIGCONT weirdness


On Sun, 9 Nov 2003, Anton Blanchard wrote:
>
> > I observe some strange behaviour in 2.6-test6 with this small program:
>
> Something looks wrong with the syscall restart stuff in 2.6, I noticed it
> when doing:

No, the restart code is fine. But the posix timer code looks fundamentally
broken. Try the patch I just sent, I bet it fixes it.

I can try to fix the posix-timer.c code too, but it looks like even modern
glibc doesn't even export the clock_nanosleep() function. So it might not
be worth fixing at this point..

Linus

2003-11-08 19:19:49

by Ulrich Drepper

[permalink] [raw]
Subject: Re: 2.6-test6: nanosleep+SIGCONT weirdness

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Linus Torvalds wrote:

> I can try to fix the posix-timer.c code too, but it looks like even modern
> glibc doesn't even export the clock_nanosleep() function. So it might not
> be worth fixing at this point..

Of course it's supported. the clock_* functions are in librt. And they
are supported and, when working, greatly increase the usability. The
old user-level implementation is as good as we get it but really not up
to the job.

- --
- --------------. ,-. 444 Castro Street
Ulrich Drepper \ ,-----------------' \ Mountain View, CA 94041 USA
Red Hat `--' drepper at redhat.com `---------------------------
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/rUGN2ijCOnn/RHQRAtV2AKCTXNl4dAW2OXzxmmvox6uEXxBSBwCfboRq
Ml2plaRE9tY7YzB19auWmUQ=
=sZrM
-----END PGP SIGNATURE-----

2003-11-08 19:46:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.6-test6: nanosleep+SIGCONT weirdness


On Sat, 8 Nov 2003, Ulrich Drepper wrote:
>
> Of course it's supported. the clock_* functions are in librt. And they
> are supported and, when working, greatly increase the usability. The
> old user-level implementation is as good as we get it but really not up
> to the job.

Okey-dokey - I verified that the clock_nanosleep() function is also
broken.

I'll see if I can see what the mess is all about. That posix-timer "retry"
code really is a pretty horrible thing.

Linus


2003-11-08 20:17:58

by Anton Blanchard

[permalink] [raw]
Subject: Re: 2.6-test6: nanosleep+SIGCONT weirdness


> No, the restart code is fine. But the posix timer code looks fundamentally
> broken. Try the patch I just sent, I bet it fixes it.

Yep, things work fine with your patch.

Anton

2003-11-08 20:34:32

by Denis Vlasenko

[permalink] [raw]
Subject: Re: 2.6-test6: nanosleep+SIGCONT weirdness

On Saturday 08 November 2003 20:45, Linus Torvalds wrote:
> On Sat, 8 Nov 2003, Linus Torvalds wrote:
> > That nanosleep restart seems to be broken, and quite frankly,
> > looking at the mess in kernel/posix-timers.c I'm not all that
> > surprised. The code is total and absolute crap. I have no idea how
> > it's even supposed to work.
>
> I'd suggest just removing the regular nanosleep() emulation from
> there. The clock_nanosleep() restart is likely still broken, but at
> least this way the _regular_ nanosleep() system call works correctly,
> and fixing clock_nanosleep() is likely easier since the restart stuff
> doesn't have to worry about _which_ system call it should restart.
>
> Denis, does this work for you?

Does not seem to work, same symptoms 8(

Although I did not do make clean since build system is supposed to be
fixed. I just patched the tree where kernel was already built,
and rebuild it (make bzImage modules modules_install).

Will apply to pristine test6 tree (or test9 - need to check whether it
is downloaded or not yet) and retest
--
vda

2003-11-09 08:18:36

by Denis Vlasenko

[permalink] [raw]
Subject: Re: 2.6-test6: nanosleep+SIGCONT weirdness

On Saturday 08 November 2003 22:32, Denis wrote:
> > > That nanosleep restart seems to be broken, and quite frankly,
> > > looking at the mess in kernel/posix-timers.c I'm not all that
> > > surprised. The code is total and absolute crap. I have no idea
> > > how it's even supposed to work.
> >
> > I'd suggest just removing the regular nanosleep() emulation from
> > there. The clock_nanosleep() restart is likely still broken, but at
> > least this way the _regular_ nanosleep() system call works
> > correctly, and fixing clock_nanosleep() is likely easier since the
> > restart stuff doesn't have to worry about _which_ system call it
> > should restart.
> >
> > Denis, does this work for you?
>
> Does not seem to work, same symptoms 8(

Ok I retested with pristine test9+patch and it works.
--
vda

2003-11-11 22:46:21

by George Anzinger

[permalink] [raw]
Subject: [PATCH] 2.6-test6: nanosleep+SIGCONT weirdness

diff -urP -I '\$Id:.*Exp \$' -X /usr/src/patch.exclude linux-2.6.0-test9-kb/MAINTAINERS linux/MAINTAINERS
--- linux-2.6.0-test9-kb/MAINTAINERS 2003-11-10 16:01:26.000000000 -0800
+++ linux/MAINTAINERS 2003-11-10 16:50:25.000000000 -0800
@@ -1575,6 +1575,12 @@
L: [email protected]
S: Maintained

+POSIX CLOCKS and TIMERS
+P: George Anzinger
+M: [email protected]
+L: [email protected]
+S: Supported
+
PNP SUPPORT
P: Adam Belay
M: [email protected]
diff -urP -I '\$Id:.*Exp \$' -X /usr/src/patch.exclude linux-2.6.0-test9-kb/include/linux/signal.h linux/include/linux/signal.h
--- linux-2.6.0-test9-kb/include/linux/signal.h 2003-11-10 16:13:59.000000000 -0800
+++ linux/include/linux/signal.h 2003-06-30 15:37:10.000000000 -0700
@@ -214,7 +214,7 @@
struct pt_regs;
extern int get_signal_to_deliver(siginfo_t *info, struct pt_regs *regs, void *cookie);
#endif
-
+#define FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP
#endif /* __KERNEL__ */

#endif /* _LINUX_SIGNAL_H */
diff -urP -I '\$Id:.*Exp \$' -X /usr/src/patch.exclude linux-2.6.0-test9-kb/kernel/posix-timers.c linux/kernel/posix-timers.c
--- linux-2.6.0-test9-kb/kernel/posix-timers.c 2003-11-10 16:13:59.000000000 -0800
+++ linux/kernel/posix-timers.c 2003-11-11 14:17:41.000000000 -0800
@@ -1104,12 +1104,43 @@
extern long do_clock_nanosleep(clockid_t which_clock, int flags,
struct timespec *t);

+#ifdef FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP
+
+asmlinkage long
+sys_nanosleep(struct timespec __user *rqtp, struct timespec __user *rmtp)
+{
+ struct timespec t;
+ struct restart_block *restart_block =
+ &(current_thread_info()->restart_block);
+ long ret;
+
+ if (copy_from_user(&t, rqtp, sizeof (t)))
+ return -EFAULT;
+
+ if ((unsigned) t.tv_nsec >= NSEC_PER_SEC || t.tv_sec < 0)
+ return -EINVAL;
+
+ ret = do_clock_nanosleep(CLOCK_REALTIME, 0, &t);
+ /*
+ * Do this here as do_clock_nanosleep does not have the real address
+ */
+ restart_block->arg1 = (unsigned long)rmtp;
+
+ if (ret == -ERESTART_RESTARTBLOCK && rmtp &&
+ copy_to_user(rmtp, &t, sizeof (t)))
+ return -EFAULT;
+ return ret;
+}
+#endif // ! FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP
+
asmlinkage long
sys_clock_nanosleep(clockid_t which_clock, int flags,
const struct timespec __user *rqtp,
struct timespec __user *rmtp)
{
struct timespec t;
+ struct restart_block *restart_block =
+ &(current_thread_info()->restart_block);
int ret;

if ((unsigned) which_clock >= MAX_CLOCKS ||
@@ -1123,6 +1154,10 @@
return -EINVAL;

ret = do_clock_nanosleep(which_clock, flags, &t);
+ /*
+ * Do this here as do_clock_nanosleep does not have the real address
+ */
+ restart_block->arg1 = (unsigned long)rmtp;

if ((ret == -ERESTART_RESTARTBLOCK) && rmtp &&
copy_to_user(rmtp, &t, sizeof (t)))
@@ -1152,7 +1187,7 @@
if (restart_block->fn == clock_nanosleep_restart) {
/*
* Interrupted by a non-delivered signal, pick up remaining
- * time and continue.
+ * time and continue. Remaining time is in arg2 & 3.
*/
restart_block->fn = do_no_restart_syscall;

@@ -1209,9 +1244,20 @@
tsave->tv_sec = div_long_long_rem(left,
NSEC_PER_SEC,
&tsave->tv_nsec);
+ /*
+ * Restart works by saving the time remaing in
+ * arg2 & 3 (it is 64-bits of jiffies). The other
+ * info we need is the clock_id (saved in arg0).
+ * The sys_call interface needs the users
+ * timespec return address which _it_ saves in arg1.
+ * Since we have cast the nanosleep call to a clock_nanosleep
+ * both can be restarted with the same code.
+ */
restart_block->fn = clock_nanosleep_restart;
restart_block->arg0 = which_clock;
- restart_block->arg1 = (unsigned long)tsave;
+ /*
+ * Caller sets arg1
+ */
restart_block->arg2 = rq_time & 0xffffffffLL;
restart_block->arg3 = rq_time >> 32;

@@ -1221,7 +1267,7 @@
return 0;
}
/*
- * This will restart clock_nanosleep. Incorrectly, btw.
+ * This will restart either clock_nanosleep or clock_nanosleep
*/
long
clock_nanosleep_restart(struct restart_block *restart_block)
diff -urP -I '\$Id:.*Exp \$' -X /usr/src/patch.exclude linux-2.6.0-test9-kb/kernel/timer.c linux/kernel/timer.c
--- linux-2.6.0-test9-kb/kernel/timer.c 2003-11-10 16:13:59.000000000 -0800
+++ linux/kernel/timer.c 2003-11-10 16:00:09.000000000 -0800
@@ -1059,6 +1059,7 @@
{
return current->pid;
}
+#ifndef FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP

static long nanosleep_restart(struct restart_block *restart)
{
@@ -1117,6 +1118,7 @@
}
return ret;
}
+#endif // ! FOLD_NANO_SLEEP_INTO_CLOCK_NANO_SLEEP

/*
* sys_sysinfo - fill in sysinfo struct
Binary files linux-2.6.0-test9-kb/scripts/bin2c and linux/scripts/bin2c differ


Attachments:
hrtimers-2.6.0-t9-fix.patch (4.60 kB)