LinuxLists.cc - [PATCH] prevent timespec/timeval to ktime

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

On Wed, Aug 30, 2006 at 10:44:28AM +0200, Thomas Gleixner wrote:
> Frank v. Waveren pointed out that on 64bit machines the timespec to
> ktime_t conversion might overflow. This is also true for timeval to
> ktime_t conversions. This breaks a "sleep inf" on 64bit machines.
...
> Check the seconds argument to the conversion and limit it to the maximum
> time which can be represented by ktime_t.

It's a solution, and it more or less fixes things without any changes
to userspace, which is nice. I still prefer my patch in
<[email protected]> though, possibly with modifications so
it doesn't affect all timespec users but only nanosleep (we'd have to
check if the other timespec users aren't converting to ktime_t).

With this patch, we sleep shorter than specified, and don't signal
this in any way. Returning EINVAL for anything except negative tv_sec
or invalid tv_nsec breaks the spec too, but I prefer errors to
silently sleeping too short.

I'll grant this is more of an aesthetic point than something that'll
cause real-world problems (300 years is a long time for any sleep),
but if things break I like them to do so as loudly as possible, as a
general rule.

--
Frank v Waveren Key fingerprint: BDD7 D61E
[email protected] 5D39 CF05 4BFC F57A
Public key: hkp://wwwkeys.pgp.net/468D62C8 FA00 7D51 468D 62C8

Attachments:

(No filename) (1.38 kB)
signature.asc (189.00 B)
Digital signature Download all attachments

2006-08-30 22:01:20

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

On Wed, 2006-08-30 at 23:44 +0200, Frank v Waveren wrote:
> On Wed, Aug 30, 2006 at 10:44:28AM +0200, Thomas Gleixner wrote:
> > Frank v. Waveren pointed out that on 64bit machines the timespec to
> > ktime_t conversion might overflow. This is also true for timeval to
> > ktime_t conversions. This breaks a "sleep inf" on 64bit machines.
> ...
> > Check the seconds argument to the conversion and limit it to the maximum
> > time which can be represented by ktime_t.
>
> It's a solution, and it more or less fixes things without any changes
> to userspace, which is nice. I still prefer my patch in
> <[email protected]> though, possibly with modifications so
> it doesn't affect all timespec users but only nanosleep (we'd have to
> check if the other timespec users aren't converting to ktime_t).
>
> With this patch, we sleep shorter than specified, and don't signal
> this in any way. Returning EINVAL for anything except negative tv_sec
> or invalid tv_nsec breaks the spec too, but I prefer errors to
> silently sleeping too short.

I really don't care whether we sleep 100 or 5000 years in the case of
"sleep MAX_LONG"

> I'll grant this is more of an aesthetic point than something that'll
> cause real-world problems (300 years is a long time for any sleep),
> but if things break I like them to do so as loudly as possible, as a
> general rule.

One thing you ignore is that your patch does not cure the introduced
user space breakage, it just replaces the overflow caused very short
sleep by a return -EINVAL, which is breaking existing userspace in a
different way. We have to preserve user space interfaces even when they
violate your aesthetic well-being.

tglx

2006-08-30 22:08:35

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

On Thu, Aug 31, 2006 at 12:05:02AM +0200, Thomas Gleixner wrote:
> > With this patch, we sleep shorter than specified, and don't signal
> > this in any way. Returning EINVAL for anything except negative tv_sec
> > or invalid tv_nsec breaks the spec too, but I prefer errors to
> > silently sleeping too short.
>
> I really don't care whether we sleep 100 or 5000 years in the case of
> "sleep MAX_LONG"
Don't sell your patch short, it still manages nearly 300 years..

> > I'll grant this is more of an aesthetic point than something that'll
> > cause real-world problems (300 years is a long time for any sleep),
> > but if things break I like them to do so as loudly as possible, as a
> > general rule.
>
> One thing you ignore is that your patch does not cure the introduced
> user space breakage, it just replaces the overflow caused very short
> sleep by a return -EINVAL, which is breaking existing userspace in a
> different way. We have to preserve user space interfaces even when they
> violate your aesthetic well-being.

The userspace interface gets broken either way. The error might
actually serve as a decent portability wake up call, solaris 64 bit
also silently overflows in nanosleep, and since I've only had the
opportunity to check on solaris and linux, I wouldn't be surprised if
other OSes had the same problem.

--
Frank v Waveren Key fingerprint: BDD7 D61E
[email protected] 5D39 CF05 4BFC F57A
Public key: hkp://wwwkeys.pgp.net/468D62C8 FA00 7D51 468D 62C8

Attachments:

(No filename) (1.53 kB)
signature.asc (189.00 B)
Digital signature Download all attachments

2006-08-30 22:19:12

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

On Thu, 2006-08-31 at 00:08 +0200, Frank v Waveren wrote:
> On Thu, Aug 31, 2006 at 12:05:02AM +0200, Thomas Gleixner wrote:
> > > With this patch, we sleep shorter than specified, and don't signal
> > > this in any way. Returning EINVAL for anything except negative tv_sec
> > > or invalid tv_nsec breaks the spec too, but I prefer errors to
> > > silently sleeping too short.
> >
> > I really don't care whether we sleep 100 or 5000 years in the case of
> > "sleep MAX_LONG"
> Don't sell your patch short, it still manages nearly 300 years..

Hehe

> > > I'll grant this is more of an aesthetic point than something that'll
> > > cause real-world problems (300 years is a long time for any sleep),
> > > but if things break I like them to do so as loudly as possible, as a
> > > general rule.
> >
> > One thing you ignore is that your patch does not cure the introduced
> > user space breakage, it just replaces the overflow caused very short
> > sleep by a return -EINVAL, which is breaking existing userspace in a
> > different way. We have to preserve user space interfaces even when they
> > violate your aesthetic well-being.
>
> The userspace interface gets broken either way. The error might
> actually serve as a decent portability wake up call, solaris 64 bit
> also silently overflows in nanosleep, and since I've only had the
> opportunity to check on solaris and linux, I wouldn't be surprised if
> other OSes had the same problem.

Well, the point is that pre hrtimer kernels did just sleep as long as
they internaly could. So the hrtimers / ktime_t merge changed the
userspace interface behaviour, which is breakage. We need to restore the
old behaviour and I consider it to be better than

1. silent overflows (even if solaris does the same)
2. returning -EINVAL without giving a minimum 1 year warning/fixup time
(see the paragraph about "Usage of invalid timevals in setitimer" in
Documentation/feature-removal-schedule.txt)

tglx

2006-08-30 22:26:46

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

On Thu, Aug 31, 2006 at 12:22:36AM +0200, Thomas Gleixner wrote:
> Well, the point is that pre hrtimer kernels did just sleep as long as
> they internaly could. So the hrtimers / ktime_t merge changed the
> userspace interface behaviour, which is breakage.
...

Ok, that's a convincing argument, your patch is the better solution.

--
Frank v Waveren Key fingerprint: BDD7 D61E
[email protected] 5D39 CF05 4BFC F57A
Public key: hkp://wwwkeys.pgp.net/468D62C8 FA00 7D51 468D 62C8

Attachments:

(No filename) (564.00 B)
signature.asc (189.00 B)
Digital signature Download all attachments

2006-09-01 03:46:50

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

On Wed, 30 Aug 2006 10:44:28 +0200
Thomas Gleixner <[email protected]> wrote:

> Frank v. Waveren pointed out that on 64bit machines the timespec to
> ktime_t conversion might overflow. This is also true for timeval to
> ktime_t conversions. This breaks a "sleep inf" on 64bit machines.
>
> While a timespec/timeval with tx.sec = MAX_LONG is valid by
> specification the internal representation of ktime_t is based on
> nanoseconds. The conversion of seconds to nanoseconds overflows for
> seconds values >= (MAX_LONG / NSEC_PER_SEC).
>
> Check the seconds argument to the conversion and limit it to the maximum
> time which can be represented by ktime_t.
>
> Signed-off-by: Thomas Gleixner <[email protected]>
>
> diff --git a/include/linux/ktime.h b/include/linux/ktime.h
> index ed3396d..1beafb3 100644
> --- a/include/linux/ktime.h
> +++ b/include/linux/ktime.h
> @@ -57,6 +57,7 @@ typedef union {
> } ktime_t;
>
> #define KTIME_MAX (~((u64)1 << 63))
> +#define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)
>
> /*
> * ktime_t definitions when using the 64-bit scalar representation:
> @@ -73,6 +74,10 @@ typedef union {
> */
> static inline ktime_t ktime_set(const long secs, const unsigned long nsecs)
> {
> +#if (BITS_PER_LONG == 64)
> + if (unlikely(secs >= KTIME_SEC_MAX))
> + return (ktime_t){ .tv64 = KTIME_MAX };
> +#endif
> return (ktime_t) { .tv64 = (s64)secs * NSEC_PER_SEC + (s64)nsecs };
> }
>

This makes my FC3 x86_64 testbox hang during udev startup. sysrq-T trace at
http://www.zip.com.au/~akpm/linux/patches/stuff/log-x.

I had a quick look at the args to hrtimer_nanosleep and all seems to be in
order.

2006-09-01 08:52:31

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

On Thu, 2006-08-31 at 20:46 -0700, Andrew Morton wrote:
> > * ktime_t definitions when using the 64-bit scalar representation:
> > @@ -73,6 +74,10 @@ typedef union {
> > */
> > static inline ktime_t ktime_set(const long secs, const unsigned long nsecs)
> > {
> > +#if (BITS_PER_LONG == 64)
> > + if (unlikely(secs >= KTIME_SEC_MAX))
> > + return (ktime_t){ .tv64 = KTIME_MAX };
> > +#endif
> > return (ktime_t) { .tv64 = (s64)secs * NSEC_PER_SEC + (s64)nsecs };
> > }
> >
>
> This makes my FC3 x86_64 testbox hang during udev startup. sysrq-T trace at
> http://www.zip.com.au/~akpm/linux/patches/stuff/log-x.

Does not help much.

> I had a quick look at the args to hrtimer_nanosleep and all seems to be in
> order.

Hmm. I can not reproduce that on any one of my boxen. Can you please try
with this debug variant, so we have a chance to figure out what's wrong.

tglx

Index: linux-2.6.18-rc5/include/linux/ktime.h
===================================================================
--- linux-2.6.18-rc5.orig/include/linux/ktime.h 2006-08-31 18:58:32.000000000 +0200
+++ linux-2.6.18-rc5/include/linux/ktime.h 2006-09-01 10:47:59.000000000 +0200
@@ -57,6 +57,7 @@ typedef union {
} ktime_t;

#define KTIME_MAX (~((u64)1 << 63))
+#define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)

/*
* ktime_t definitions when using the 64-bit scalar representation:
@@ -73,6 +74,13 @@ typedef union {
*/
static inline ktime_t ktime_set(const long secs, const unsigned long nsecs)
{
+#if (BITS_PER_LONG == 64)
+ if (unlikely(secs >= KTIME_SEC_MAX)) {
+ printk("ktime_set: %ld : %lu\n", secs, nsecs);
+ WARN_ON(1);
+ return (ktime_t){ .tv64 = KTIME_MAX };
+ }
+#endif
return (ktime_t) { .tv64 = (s64)secs * NSEC_PER_SEC + (s64)nsecs };
}

2006-09-01 09:04:15

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

On Fri, 01 Sep 2006 10:56:19 +0200
Thomas Gleixner <[email protected]> wrote:

> On Thu, 2006-08-31 at 20:46 -0700, Andrew Morton wrote:
> > > * ktime_t definitions when using the 64-bit scalar representation:
> > > @@ -73,6 +74,10 @@ typedef union {
> > > */
> > > static inline ktime_t ktime_set(const long secs, const unsigned long nsecs)
> > > {
> > > +#if (BITS_PER_LONG == 64)
> > > + if (unlikely(secs >= KTIME_SEC_MAX))
> > > + return (ktime_t){ .tv64 = KTIME_MAX };
> > > +#endif
> > > return (ktime_t) { .tv64 = (s64)secs * NSEC_PER_SEC + (s64)nsecs };
> > > }
> > >
> >
> > This makes my FC3 x86_64 testbox hang during udev startup. sysrq-T trace at
> > http://www.zip.com.au/~akpm/linux/patches/stuff/log-x.
>
> Does not help much.
>
> > I had a quick look at the args to hrtimer_nanosleep and all seems to be in
> > order.
>
> Hmm. I can not reproduce that on any one of my boxen. Can you please try
> with this debug variant, so we have a chance to figure out what's wrong.
>

With that patch the machine emits 88 bigabytes of stuff then locks up.
Something a little less aggressive is needed, methinks.

2006-09-01 09:26:53

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

On Fri, 2006-09-01 at 02:04 -0700, Andrew Morton wrote:
> > > This makes my FC3 x86_64 testbox hang during udev startup. sysrq-T trace at
> > > http://www.zip.com.au/~akpm/linux/patches/stuff/log-x.
> >
> > Does not help much.
> >
> > > I had a quick look at the args to hrtimer_nanosleep and all seems to be in
> > > order.
> >
> > Hmm. I can not reproduce that on any one of my boxen. Can you please try
> > with this debug variant, so we have a chance to figure out what's wrong.
> >
>
> With that patch the machine emits 88 bigabytes of stuff then locks up.
> Something a little less aggressive is needed, methinks.

Fun, here is a version with a bigabyte blocker.

Index: linux-2.6.18-rc5/include/linux/ktime.h
===================================================================
--- linux-2.6.18-rc5.orig/include/linux/ktime.h 2006-09-01 10:48:38.000000000 +0200
+++ linux-2.6.18-rc5/include/linux/ktime.h 2006-09-01 11:20:10.000000000 +0200
@@ -57,6 +57,7 @@ typedef union {
} ktime_t;

#define KTIME_MAX (~((u64)1 << 63))
+#define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)

/*
* ktime_t definitions when using the 64-bit scalar representation:
@@ -73,6 +74,18 @@ typedef union {
*/
static inline ktime_t ktime_set(const long secs, const unsigned long nsecs)
{
+#if (BITS_PER_LONG == 64)
+ static int no88bigabytes = 0;
+
+ if (unlikely(secs >= KTIME_SEC_MAX)) {
+ if (!no88bigabytes) {
+ no88bigabytes = 1;
+ printk("ktime_set: %ld : %lu\n", secs, nsecs);
+ WARN_ON(1);
+ }
+ return (ktime_t){ .tv64 = KTIME_MAX };
+ }
+#endif
return (ktime_t) { .tv64 = (s64)secs * NSEC_PER_SEC + (s64)nsecs };
}

2006-09-02 03:13:39

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

On Fri, 01 Sep 2006 11:30:42 +0200
Thomas Gleixner <[email protected]> wrote:

> >
> > With that patch the machine emits 88 bigabytes of stuff then locks up.
> > Something a little less aggressive is needed, methinks.
>
> Fun, here is a version with a bigabyte blocker.

Your patch triggers waaaaaaay early.

netconsole: remote IP 192.168.2.33
netconsole: remote ethernet address 00:0d:56:c6:c6:cc
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 3400.238 MHz processor.
ktime_set: -1157140842 : 0
BUG: warning at include/linux/ktime.h:84/ktime_set()

Call Trace:
<IRQ> [<ffffffff80247021>] hrtimer_run_queues+0x10e/0x211
[<ffffffff8023a5ca>] run_timer_softirq+0x28/0x1dc
[<ffffffff802368f1>] __do_softirq+0x58/0xcf
[<ffffffff8020ac28>] call_softirq+0x1c/0x28
[<ffffffff8020c7b5>] do_softirq+0x31/0x84
[<ffffffff802365ea>] irq_exit+0x3f/0x4b
[<ffffffff8020c77a>] do_IRQ+0x62/0x6c
[<ffffffff80209f1d>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff806a46ec>] console_init+0x0/0x37
[<ffffffff80691749>] start_kernel+0x123/0x1d0
[<ffffffff8069127a>] _sinittext+0x27a/0x281

Console: colour VGA+ 80x25
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)

So I modified it to only trigger if current->mm!=NULL and:

EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 156k freed
ktime_set: -1157141318 : 0
BUG: warning at kernel/hrtimer.c:892/ktime_set()

Call Trace:
<IRQ> [<ffffffff80246957>] ktime_set+0x73/0x8b
[<ffffffff80246de4>] hrtimer_run_queues+0x55/0x140
[<ffffffff8023a4f2>] run_timer_softirq+0x28/0x1dc
[<ffffffff80236819>] __do_softirq+0x58/0xcf
[<ffffffff8020ac28>] call_softirq+0x1c/0x28
[<ffffffff8020c7b5>] do_softirq+0x31/0x84
[<ffffffff80236512>] irq_exit+0x3f/0x4b
[<ffffffff802193b1>] smp_apic_timer_interrupt+0x42/0x46
[<ffffffff8020a5c6>] apic_timer_interrupt+0x66/0x6c
<EOI> [<ffffffff8026611b>] __handle_mm_fault+0x42a/0x990
[<ffffffff802660fa>] __handle_mm_fault+0x409/0x990
[<ffffffff804d6629>] _spin_unlock_irqrestore+0x1a/0x38
[<ffffffff8021f656>] do_page_fault+0x3b6/0x75c
[<ffffffff8025d214>] free_hot_cold_page+0x12c/0x14f
[<ffffffff8025d28a>] free_hot_page+0xb/0xd
[<ffffffff8025d2b5>] __free_pages+0x29/0x32
[<ffffffff8025d33e>] free_pages+0x80/0x82
[<ffffffff8020a721>] error_exit+0x0/0x84

I wonder if this is related to the occasional hrtimr_run_queues() lockup
which Andi is encountering.

--
VGER BF report: H 9.0096e-09

2006-09-02 03:32:28

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

On Fri, 1 Sep 2006 20:13:05 -0700
Andrew Morton <[email protected]> wrote:

> So I modified it to only trigger if current->mm!=NULL and:

Hey, that worked.

With this patch:

diff -puN include/linux/ktime.h~ktime-debug include/linux/ktime.h
--- a/include/linux/ktime.h~ktime-debug
+++ a/include/linux/ktime.h
@@ -57,6 +57,7 @@ typedef union {
} ktime_t;

#define KTIME_MAX (~((u64)1 << 63))
+#define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)

/*
* ktime_t definitions when using the 64-bit scalar representation:
@@ -64,17 +65,7 @@ typedef union {

#if (BITS_PER_LONG == 64) || defined(CONFIG_KTIME_SCALAR)

-/**
- * ktime_set - Set a ktime_t variable from a seconds/nanoseconds value
- * @secs: seconds to set
- * @nsecs: nanoseconds to set
- *
- * Return the ktime_t representation of the value
- */
-static inline ktime_t ktime_set(const long secs, const unsigned long nsecs)
-{
- return (ktime_t) { .tv64 = (s64)secs * NSEC_PER_SEC + (s64)nsecs };
-}
+ktime_t ktime_set(const long secs, const unsigned long nsecs);

/* Subtract two ktime_t variables. rem = lhs -rhs: */
#define ktime_sub(lhs, rhs) \
diff -puN kernel/hrtimer.c~ktime-debug kernel/hrtimer.c
--- a/kernel/hrtimer.c~ktime-debug
+++ a/kernel/hrtimer.c
@@ -870,3 +870,32 @@ void __init hrtimers_init(void)
register_cpu_notifier(&hrtimers_nb);
}

+
+#if (BITS_PER_LONG == 64) || defined(CONFIG_KTIME_SCALAR)
+
+/**
+ * ktime_set - Set a ktime_t variable from a seconds/nanoseconds value
+ * @secs: seconds to set
+ * @nsecs: nanoseconds to set
+ *
+ * Return the ktime_t representation of the value
+ */
+ktime_t ktime_set(const long secs, const unsigned long nsecs)
+{
+#if (BITS_PER_LONG == 64)
+ static int no88bigabytes = 0;
+
+ if (current->mm && unlikely(secs >= KTIME_SEC_MAX)) {
+ if (!no88bigabytes) {
+ no88bigabytes = 1;
+ printk("ktime_set: %ld : %lu\n", secs, nsecs);
+ WARN_ON(1);
+ }
+ return (ktime_t){ .tv64 = KTIME_MAX };
+ }
+#endif
+ return (ktime_t) { .tv64 = (s64)secs * NSEC_PER_SEC + (s64)nsecs };
+}
+EXPORT_SYMBOL(ktime_set);
+
+#endif
_

It emits that interrupt-time warning and gets to a login prompt.

But with this patch, which is the same thing with the debug stuff removed:

diff -puN include/linux/ktime.h~ktime-debug include/linux/ktime.h
--- a/include/linux/ktime.h~ktime-debug
+++ a/include/linux/ktime.h
@@ -57,6 +57,7 @@ typedef union {
} ktime_t;

#define KTIME_MAX (~((u64)1 << 63))
+#define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)

/*
* ktime_t definitions when using the 64-bit scalar representation:
@@ -64,17 +65,7 @@ typedef union {

#if (BITS_PER_LONG == 64) || defined(CONFIG_KTIME_SCALAR)

-/**
- * ktime_set - Set a ktime_t variable from a seconds/nanoseconds value
- * @secs: seconds to set
- * @nsecs: nanoseconds to set
- *
- * Return the ktime_t representation of the value
- */
-static inline ktime_t ktime_set(const long secs, const unsigned long nsecs)
-{
- return (ktime_t) { .tv64 = (s64)secs * NSEC_PER_SEC + (s64)nsecs };
-}
+ktime_t ktime_set(const long secs, const unsigned long nsecs);

/* Subtract two ktime_t variables. rem = lhs -rhs: */
#define ktime_sub(lhs, rhs) \
diff -puN kernel/hrtimer.c~ktime-debug kernel/hrtimer.c
--- a/kernel/hrtimer.c~ktime-debug
+++ a/kernel/hrtimer.c
@@ -870,3 +870,24 @@ void __init hrtimers_init(void)
register_cpu_notifier(&hrtimers_nb);
}

+
+#if (BITS_PER_LONG == 64) || defined(CONFIG_KTIME_SCALAR)
+
+/**
+ * ktime_set - Set a ktime_t variable from a seconds/nanoseconds value
+ * @secs: seconds to set
+ * @nsecs: nanoseconds to set
+ *
+ * Return the ktime_t representation of the value
+ */
+ktime_t ktime_set(const long secs, const unsigned long nsecs)
+{
+#if (BITS_PER_LONG == 64)
+ if (unlikely(secs >= KTIME_SEC_MAX))
+ return (ktime_t){ .tv64 = KTIME_MAX };
+#endif
+ return (ktime_t) { .tv64 = (s64)secs * NSEC_PER_SEC + (s64)nsecs };
+}
+EXPORT_SYMBOL(ktime_set);
+
+#endif
_

it hangs in udev startup.

How very unpleasant. gcc-4.0.2.

--
VGER BF report: H 0

2006-09-02 08:08:14

by Andi Kleen

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

> I wonder if this is related to the occasional hrtimr_run_queues() lockup
> which Andi is encountering.

I'm not sure all of my lockups are related to this. It was just one
case I caught when the system was still usable enough to Sysrq

I did a bisect yesterday but i ended up in some /proc patches which
didn't look too likely. Repeating it currently.

-Andi

--
VGER BF report: H 0.000684748

2006-09-02 11:04:41

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

Here's a different patch, which should actually sleep for the
specified amount of time up to 2^64 seconds with a loop around the
sleeps and a tally of how long is left to sleep. It does mean we wake
up once every 300 years on long sleeps, but that shouldn't cause any
huge performance problems.

It's not particularly pretty, the restart handler means there's a
little code duplication and the limit of 4 args in the restart block
is heaps of fun, but on the bright side this may avoid the hangs
Andrew was complaining of. As it's a patch to nanosleep instead of
timespec_to_ktime, it won't catch it if there are any other buggy
users. It should be compatible with Thomas' patch though, so once the
hangs on FC3 are resolved both could be used.

Thomas: This doesn't exactly preserve the pre-ktime behaviour, but I
assume that's ok if instead it does what it says on the tin?

Syscall restarts and such are new territory to me, so if people could
cast a critical eye over this for bugs I'd be most grateful.

Signed-off-by: Frank v Waveren <[email protected]>

diff -urpN linux-2.6.17.11/include/linux/ktime.h linux-2.6.17.11-fvw/include/linux/ktime.h
--- linux-2.6.17.11/include/linux/ktime.h 2006-08-23 23:16:33.000000000 +0200
+++ linux-2.6.17.11-fvw/include/linux/ktime.h 2006-09-02 10:48:43.000000000 +0200
@@ -57,6 +57,7 @@ typedef union {
} ktime_t;

#define KTIME_MAX (~((u64)1 << 63))
+#define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)

/*
* ktime_t definitions when using the 64-bit scalar representation:
diff -urpN linux-2.6.17.11/kernel/hrtimer.c linux-2.6.17.11-fvw/kernel/hrtimer.c
--- linux-2.6.17.11/kernel/hrtimer.c 2006-08-23 23:16:33.000000000 +0200
+++ linux-2.6.17.11-fvw/kernel/hrtimer.c 2006-09-02 12:07:02.000000000 +0200
@@ -706,31 +706,46 @@ static long __sched nanosleep_restart(st
{
struct hrtimer_sleeper t;
struct timespec __user *rmtp;
- struct timespec tu;
+ struct timespec tu, touter, tinner;
ktime_t time;

restart->fn = do_no_restart_syscall;

- hrtimer_init(&t.timer, restart->arg3, HRTIMER_ABS);
- t.timer.expires.tv64 = ((u64)restart->arg1 << 32) | (u64) restart->arg0;
-
- if (do_nanosleep(&t, HRTIMER_ABS))
- return 0;
-
- rmtp = (struct timespec __user *) restart->arg2;
- if (rmtp) {
- time = ktime_sub(t.timer.expires, t.timer.base->get_time());
- if (time.tv64 <= 0)
- return 0;
- tu = ktime_to_timespec(time);
- if (copy_to_user(rmtp, &tu, sizeof(tu)))
- return -EFAULT;
- }
-
- restart->fn = nanosleep_restart;
-
- /* The other values in restart are already filled in */
- return -ERESTART_RESTARTBLOCK;
+ time.tv64=(((u64)restart->arg1 & 0xFFFFFF) << 32) | (u64) restart->arg0;
+ touter=ktime_to_timespec(time);
+ touter.tv_sec += (u64)restart->arg1 >> 32;
+
+ while (unlikely(touter.tv_sec > 0 || touter.tv_nsec > 0))
+ {
+ tinner.tv_sec = touter.tv_sec % KTIME_SEC_MAX;
+ touter.tv_sec -= tinner.tv_sec;
+ tinner.tv_nsec = touter.tv_nsec;
+ touter.tv_nsec = 0;
+
+
+ hrtimer_init(&t.timer, restart->arg3, HRTIMER_ABS);
+ t.timer.expires = timespec_to_ktime(tinner);
+
+ if (do_nanosleep(&t, HRTIMER_ABS))
+ continue;
+
+ rmtp = (struct timespec __user *) restart->arg2;
+ if (rmtp) {
+ time = ktime_sub(t.timer.expires, t.timer.base->get_time());
+ if (time.tv64 <= 0)
+ continue;
+ tu = ktime_to_timespec(time);
+ if (copy_to_user(rmtp, &tu, sizeof(tu)))
+ return -EFAULT;
+ }
+
+ restart->fn = nanosleep_restart;
+
+ /* The other values in restart are already filled in */
+ return -ERESTART_RESTARTBLOCK;
+ }
+
+ return 0;
}

long hrtimer_nanosleep(struct timespec *rqtp, struct timespec __user *rmtp,
@@ -738,35 +753,55 @@ long hrtimer_nanosleep(struct timespec *
{
struct restart_block *restart;
struct hrtimer_sleeper t;
- struct timespec tu;
+ struct timespec tu, touter, tinner;
ktime_t rem;

- hrtimer_init(&t.timer, clockid, mode);
- t.timer.expires = timespec_to_ktime(*rqtp);
- if (do_nanosleep(&t, mode))
- return 0;
-
- /* Absolute timers do not update the rmtp value and restart: */
- if (mode == HRTIMER_ABS)
- return -ERESTARTNOHAND;
-
- if (rmtp) {
- rem = ktime_sub(t.timer.expires, t.timer.base->get_time());
- if (rem.tv64 <= 0)
- return 0;
- tu = ktime_to_timespec(rem);
- if (copy_to_user(rmtp, &tu, sizeof(tu)))
- return -EFAULT;
- }
+ touter=*rqtp;
+
+ while (touter.tv_sec > 0 || touter.tv_nsec > 0)
+ {
+ tinner.tv_sec = touter.tv_sec % KTIME_SEC_MAX;
+ touter.tv_sec -= tinner.tv_sec;
+ tinner.tv_nsec = touter.tv_nsec;
+ touter.tv_nsec = 0;
+
+ hrtimer_init(&t.timer, clockid, mode);
+ t.timer.expires = timespec_to_ktime(tinner);
+ if (do_nanosleep(&t, mode))
+ continue;
+
+ /* Absolute timers do not update the rmtp value and restart: */
+ if (mode == HRTIMER_ABS)
+ return -ERESTARTNOHAND;
+
+ if (rmtp) {
+ rem = ktime_sub(t.timer.expires, t.timer.base->get_time());
+ if (rem.tv64 <= 0)
+ continue;
+ tu = ktime_to_timespec(rem);
+ tu.tv_sec+=touter.tv_sec; /* No need to add tv_nsec, it always
+ * gets fully handled on the first
+ * iteration */
+ if (copy_to_user(rmtp, &tu, sizeof(tu)))
+ return -EFAULT;
+ }
+
+ restart = &current_thread_info()->restart_block;
+ restart->fn = nanosleep_restart;
+ restart->arg0 = t.timer.expires.tv64 & 0xFFFFFFFF;
+ /* Iff longs are 64 bits, we may have a non-zero touter.tv_sec.
+ * As long is 64 bits, the restart args are also 64 bits so
+ * we can store the touter.tv_sec in the high 32 bits of
+ * arg2 */
+ restart->arg1 = (t.timer.expires.tv64 >> 32) + ((u64)touter.tv_sec << 32);
+ restart->arg2 = (unsigned long) rmtp;
+ restart->arg3 = (unsigned long) t.timer.base->index;

- restart = &current_thread_info()->restart_block;
- restart->fn = nanosleep_restart;
- restart->arg0 = t.timer.expires.tv64 & 0xFFFFFFFF;
- restart->arg1 = t.timer.expires.tv64 >> 32;
- restart->arg2 = (unsigned long) rmtp;
- restart->arg3 = (unsigned long) t.timer.base->index;
+ return -ERESTART_RESTARTBLOCK;
+ }

- return -ERESTART_RESTARTBLOCK;
+ return 0;
+
}

asmlinkage long

--
Frank v Waveren Key fingerprint: BDD7 D61E
[email protected] 5D39 CF05 4BFC F57A
Public key: hkp://wwwkeys.pgp.net/468D62C8 FA00 7D51 468D 62C8

Attachments:

(No filename) (6.49 kB)
signature.asc (189.00 B)
Digital signature Download all attachments

2006-09-02 18:37:39

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

On Fri, 2006-09-01 at 20:13 -0700, Andrew Morton wrote:
> > Fun, here is a version with a bigabyte blocker.
>
> Your patch triggers waaaaaaay early.
>
> netconsole: remote IP 192.168.2.33
> netconsole: remote ethernet address 00:0d:56:c6:c6:cc
> Initializing CPU#0
> PID hash table entries: 4096 (order: 12, 32768 bytes)
> time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
> time.c: Detected 3400.238 MHz processor.
> ktime_set: -1157140842 : 0

-------------^ !!!!!!!

> BUG: warning at include/linux/ktime.h:84/ktime_set()
>
> Call Trace:
> <IRQ> [<ffffffff80247021>] hrtimer_run_queues+0x10e/0x211

This seems to happen inside hrtimer_get_softirq_time().
wall_to_monotonic is negative.

Why does the check trigger ? We compare a "long", which contains a
negative value against some positive constant.

ktime_t ktime_set(const long secs, const unsigned long nsecs)
{
if (unlikely(secs >= KTIME_SEC_MAX)) {

where KTIME_SEC_MAX is 0x7FFFFFFFFFFFFFFF / 1000 000 000 =

9223372036 == 0x225C17D04,

which is compared against

-1157140842 == 0xFFFFFFFFBB076E96

This smells like gcc magic. Can you please disassemble the code in
question ?

> I wonder if this is related to the occasional hrtimr_run_queues() lockup
> which Andi is encountering.

Hmm, not sure.

tglx

--
VGER BF report: H 1.60283e-12

2006-09-02 18:40:43

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

On Sat, 2006-09-02 at 13:04 +0200, Frank v Waveren wrote:
> Here's a different patch, which should actually sleep for the
> specified amount of time up to 2^64 seconds with a loop around the
> sleeps and a tally of how long is left to sleep. It does mean we wake
> up once every 300 years on long sleeps, but that shouldn't cause any
> huge performance problems.

Which non academic problem is solved by this patch ?

tglx

--
VGER BF report: U 0.491521

2006-09-02 19:24:30

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

On Sat, 2006-09-02 at 20:41 +0200, Thomas Gleixner wrote:
> This smells like gcc magic. Can you please disassemble the code in
> question ?

Doh.

Fortunately I own shares of a brown-paperbag-factory and the dividend is
payed in kind.

> > I wonder if this is related to the occasional hrtimr_run_queues() lockup
> > which Andi is encountering.
>
> Hmm, not sure.

I'm quite sure right now.

tglx

diff --git a/include/linux/ktime.h b/include/linux/ktime.h
index ed3396d..84eeecd 100644
--- a/include/linux/ktime.h
+++ b/include/linux/ktime.h
@@ -56,7 +56,8 @@ typedef union {
#endif
} ktime_t;

-#define KTIME_MAX (~((u64)1 << 63))
+#define KTIME_MAX ((s64)~((u64)1 << 63))
+#define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)

/*
* ktime_t definitions when using the 64-bit scalar representation:
@@ -73,6 +74,10 @@ typedef union {
*/
static inline ktime_t ktime_set(const long secs, const unsigned long nsecs)
{
+#if (BITS_PER_LONG == 64)
+ if (unlikely(secs >= KTIME_SEC_MAX))
+ return (ktime_t){ .tv64 = KTIME_MAX };
+#endif
return (ktime_t) { .tv64 = (s64)secs * NSEC_PER_SEC + (s64)nsecs };
}

--
VGER BF report: U 0.482714

2006-09-02 19:32:43

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

On Sat, 02 Sep 2006 20:41:33 +0200
Thomas Gleixner <[email protected]> wrote:

> On Fri, 2006-09-01 at 20:13 -0700, Andrew Morton wrote:
> > > Fun, here is a version with a bigabyte blocker.
> >
> > Your patch triggers waaaaaaay early.
> >
> > netconsole: remote IP 192.168.2.33
> > netconsole: remote ethernet address 00:0d:56:c6:c6:cc
> > Initializing CPU#0
> > PID hash table entries: 4096 (order: 12, 32768 bytes)
> > time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
> > time.c: Detected 3400.238 MHz processor.
> > ktime_set: -1157140842 : 0
>
> -------------^ !!!!!!!
>
> > BUG: warning at include/linux/ktime.h:84/ktime_set()
> >
> > Call Trace:
> > <IRQ> [<ffffffff80247021>] hrtimer_run_queues+0x10e/0x211
>
> This seems to happen inside hrtimer_get_softirq_time().
> wall_to_monotonic is negative.
>
> Why does the check trigger ? We compare a "long", which contains a
> negative value against some positive constant.
>
> ktime_t ktime_set(const long secs, const unsigned long nsecs)
> {
> if (unlikely(secs >= KTIME_SEC_MAX)) {
>
> where KTIME_SEC_MAX is 0x7FFFFFFFFFFFFFFF / 1000 000 000 =
>
> 9223372036 == 0x225C17D04,
>
> which is compared against
>
> -1157140842 == 0xFFFFFFFFBB076E96
>
> This smells like gcc magic. Can you please disassemble the code in
> question ?
>

ktime_set:
.LFB522:
.file 1 "kernel/hrtimer.c"
.loc 1 884 0
.LVL0:
pushq %rbp #
.LCFI0:
movq %rsp, %rbp #,
.LCFI1:
.LBB2:
.LBB3:
.LBB4:
.file 2 "include/asm/current.h"
.loc 2 11 0
#APP
movq %gs:0,%rax #, t
.LVL1:
#NO_APP
.LBE4:
.LBE3:
.LBE2:
.loc 1 888 0
cmpq $0, 208(%rax) #, <variable>.mm
je .L2 #,
movabsq $9223372035, %rax #, tmp66
.LVL2:
cmpq %rax, %rdi # tmp66, secs
jbe .L2 #,
.loc 1 889 0
cmpl $0, no88bigabytes.11819(%rip) #, no88bigabytes
jne .L5 #,
.loc 1 890 0
movl $1, no88bigabytes.11819(%rip) #, no88bigabytes
.loc 1 891 0
movq %rsi, %rdx # nsecs, nsecs
movq %rdi, %rsi # secs, secs

--
VGER BF report: H 0.000301227

2006-09-02 19:43:55

[permalink] [raw]

Subject: Re: [PATCH] prevent timespec/timeval to ktime_t overflow

On Sat, 02 Sep 2006 21:28:26 +0200
Thomas Gleixner <[email protected]> wrote:

> I'm quite sure right now.

Seems to work now, thanks.

--
VGER BF report: H 0

2006-09-03 03:12:54