2007-12-19 01:13:45

by Parag Warudkar

[permalink] [raw]
Subject: [PATCH] sky2: Use deferrable timer for watchdog


sky2 can use deferrable timer for watchdog - reduces wakeups from idle per
second.

Signed-off-by: Parag Warudkar <[email protected]>

--- linux-2.6/drivers/net/sky2.c 2007-12-07 10:04:39.000000000 -0500
+++ linux-2.6-work/drivers/net/sky2.c 2007-12-18 20:07:58.000000000 -0500
@@ -4230,7 +4230,10 @@
sky2_show_addr(dev1);
}

- setup_timer(&hw->watchdog_timer, sky2_watchdog, (unsigned long) hw);
+ hw->watchdog_timer.function = sky2_watchdog;
+ hw->watchdog_timer.data = (unsigned long) hw;
+ init_timer_deferrable(&hw->watchdog_timer);
+
INIT_WORK(&hw->restart_work, sky2_restart);

pci_set_drvdata(pdev, hw);


2007-12-20 17:16:46

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH] sky2: Use deferrable timer for watchdog

On Tue, 18 Dec 2007 20:13:28 -0500 (EST)
Parag Warudkar <[email protected]> wrote:

>
> sky2 can use deferrable timer for watchdog - reduces wakeups from idle per
> second.
>
> Signed-off-by: Parag Warudkar <[email protected]>
>
> --- linux-2.6/drivers/net/sky2.c 2007-12-07 10:04:39.000000000 -0500
> +++ linux-2.6-work/drivers/net/sky2.c 2007-12-18 20:07:58.000000000 -0500
> @@ -4230,7 +4230,10 @@
> sky2_show_addr(dev1);
> }
>
> - setup_timer(&hw->watchdog_timer, sky2_watchdog, (unsigned long) hw);
> + hw->watchdog_timer.function = sky2_watchdog;
> + hw->watchdog_timer.data = (unsigned long) hw;
> + init_timer_deferrable(&hw->watchdog_timer);
> +
> INIT_WORK(&hw->restart_work, sky2_restart);
>
> pci_set_drvdata(pdev, hw);

Does it really reduce the wakeup's or only change who gets charged by powertop?
The system is going to wakeup once a second anyway. Looks to me that if the
timer is using round_jiffies(), that setting deferrable just changes the accounting.

My interpretation of the api is:
* round_jiffies() - timer wants to wakeup but isn't precise about when so schedule
on next second when system will wake up anyway;
e.g why meetings are usually scheduled on the hour

* deferrable - timer doesn't have to really wakeup but wants to happen near
a particular time. e.g. "I'll meet you at the pub around 8pm"

Therefore doing deferrable is unnecessary for timers using round_jiffies unless system
is so good at doing timers that it is going to skip doing timer once per second.

--
Stephen Hemminger <[email protected]>

2007-12-20 17:52:32

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH] sky2: Use deferrable timer for watchdog

On Thu, 20 Dec 2007 17:29:23 +0000

>
> -----Original Message-----
> From: Stephen Hemminger <[email protected]>
>
> Date: Thu, 20 Dec 2007 09:16:03
> To:[email protected]
> Cc:[email protected], [email protected], [email protected]
> Subject: Re: [PATCH] sky2: Use deferrable timer for watchdog
>
>
> On Tue, 18 Dec 2007 20:13:28 -0500 (EST)
> Parag Warudkar <[email protected]> wrote:
>
> >
> > sky2 can use deferrable timer for watchdog - reduces wakeups from idle per
> > second.
> >
> > Signed-off-by: Parag Warudkar <[email protected]>
> >
> > --- linux-2.6/drivers/net/sky2.c 2007-12-07 10:04:39.000000000 -0500
> > +++ linux-2.6-work/drivers/net/sky2.c 2007-12-18 20:07:58.000000000 -0500
> > @@ -4230,7 +4230,10 @@
> > sky2_show_addr(dev1);
> > }
> >
> > - setup_timer(&hw->watchdog_timer, sky2_watchdog, (unsigned long) hw);
> > + hw->watchdog_timer.function = sky2_watchdog;
> > + hw->watchdog_timer.data = (unsigned long) hw;
> > + init_timer_deferrable(&hw->watchdog_timer);
> > +
> > INIT_WORK(&hw->restart_work, sky2_restart);
> >
> > pci_set_drvdata(pdev, hw);
>
> Does it really reduce the wakeup's or only change who gets charged by powertop?
> The system is going to wakeup once a second anyway. Looks to me that if the
> timer is using round_jiffies(), that setting deferrable just changes the accounting.
>
> My interpretation of the api is:
> * round_jiffies() - timer wants to wakeup but isn't precise about when so schedule
> on next second when system will wake up anyway;
> e.g why meetings are usually scheduled on the hour
>
> * deferrable - timer doesn't have to really wakeup but wants to happen near
> a particular time. e.g. "I'll meet you at the pub around 8pm"
>
> Therefore doing deferrable is unnecessary for timers using round_jiffies unless system
> is so good at doing timers that it is going to skip doing timer once per second.
>

[email protected] wrote:

> NO_HZ kernels don't do timers every second - if you do round_jiffies() the kernel will wakeup and run the timer at that time no matter what.
>
> The reason deferrable was introduced is to avoid waking up the kernel just for this one timer that can be called when the CPU is not idle for some reason other than this timer.
>
> In other words let's say there were two timers - one non-deferrable expiring in 3 seconds and other deferrable, expiring in 1.5 seconds. The kernel will not wake up twice - once for 1.5 second and other for 3 second - it will wake up once at expiry of 3 second timer and execute both the 1.5 second and 3 second timers.
>
> And this is not just powertop accounting thing - like I said the total num of wakeups per second go down with this patch.
>
> Parag
>
> Sent via BlackBerry from T-Mobile


Quit top-posting!

If this is the case then the whole usage of round_jiffies() is bogus. All users of round_jiffies()
should just be converted to deferrable?? I am a bit concerned that if deferrable gets used everywhere
then a strange situation would occur where all timers were waiting for some other timer to finally
happen, kind of a wierd timelock situation. Like the old chip/dale cartoon:
"you first, no you first, after you mister chip, no after you mister dale,..."

--
Stephen Hemminger <[email protected]>

2007-12-20 18:06:20

by Parag Warudkar

[permalink] [raw]
Subject: Re: [PATCH] sky2: Use deferrable timer for watchdog

On Dec 20, 2007 12:51 PM, Stephen Hemminger
<[email protected]> wrote:
>
> Quit top-posting!
>
> If this is the case then the whole usage of round_jiffies() is bogus. All users of round_jiffies()
> should just be converted to deferrable?? I am a bit concerned that if deferrable gets used everywhere
> then a strange situation would occur where all timers were waiting for some other timer to finally
> happen, kind of a wierd timelock situation. Like the old chip/dale cartoon:
> "you first, no you first, after you mister chip, no after you mister dale,..."
>

Haha - I thought about this too. I think there should be mechanism
where the machine does not idle infinitely even if there are no
non-deferrable timers. Something like an affordable QoS for non
deferrable timers - the kernel wakes up after that interval and runs
all deferrable timers even if nothing non-deferrable is set to run.
So we still get advantage of not having to wake individually for each
timer and the non-deferrable timers do get all run in reasonable
amount of time.

Who knows Thomas/Ingo already built in something of that nature or effect?!

Parag

2007-12-20 19:10:21

by Kok, Auke

[permalink] [raw]
Subject: Re: [PATCH] sky2: Use deferrable timer for watchdog

Stephen Hemminger wrote:
> On Thu, 20 Dec 2007 17:29:23 +0000
>
>> -----Original Message-----
>> From: Stephen Hemminger <[email protected]>
>>
>> Date: Thu, 20 Dec 2007 09:16:03
>> To:[email protected]
>> Cc:[email protected], [email protected], [email protected]
>> Subject: Re: [PATCH] sky2: Use deferrable timer for watchdog
>>
>>
>> On Tue, 18 Dec 2007 20:13:28 -0500 (EST)
>> Parag Warudkar <[email protected]> wrote:
>>
>>> sky2 can use deferrable timer for watchdog - reduces wakeups from idle per
>>> second.
>>>
>>> Signed-off-by: Parag Warudkar <[email protected]>
>>>
>>> --- linux-2.6/drivers/net/sky2.c 2007-12-07 10:04:39.000000000 -0500
>>> +++ linux-2.6-work/drivers/net/sky2.c 2007-12-18 20:07:58.000000000 -0500
>>> @@ -4230,7 +4230,10 @@
>>> sky2_show_addr(dev1);
>>> }
>>>
>>> - setup_timer(&hw->watchdog_timer, sky2_watchdog, (unsigned long) hw);
>>> + hw->watchdog_timer.function = sky2_watchdog;
>>> + hw->watchdog_timer.data = (unsigned long) hw;
>>> + init_timer_deferrable(&hw->watchdog_timer);
>>> +
>>> INIT_WORK(&hw->restart_work, sky2_restart);
>>>
>>> pci_set_drvdata(pdev, hw);
>> Does it really reduce the wakeup's or only change who gets charged by powertop?
>> The system is going to wakeup once a second anyway. Looks to me that if the
>> timer is using round_jiffies(), that setting deferrable just changes the accounting.
>>
>> My interpretation of the api is:
>> * round_jiffies() - timer wants to wakeup but isn't precise about when so schedule
>> on next second when system will wake up anyway;
>> e.g why meetings are usually scheduled on the hour
>>
>> * deferrable - timer doesn't have to really wakeup but wants to happen near
>> a particular time. e.g. "I'll meet you at the pub around 8pm"
>>
>> Therefore doing deferrable is unnecessary for timers using round_jiffies unless system
>> is so good at doing timers that it is going to skip doing timer once per second.
>>
>
> [email protected] wrote:
>
>> NO_HZ kernels don't do timers every second - if you do round_jiffies() the kernel will wakeup and run the timer at that time no matter what.
>>
>> The reason deferrable was introduced is to avoid waking up the kernel just for this one timer that can be called when the CPU is not idle for some reason other than this timer.
>>
>> In other words let's say there were two timers - one non-deferrable expiring in 3 seconds and other deferrable, expiring in 1.5 seconds. The kernel will not wake up twice - once for 1.5 second and other for 3 second - it will wake up once at expiry of 3 second timer and execute both the 1.5 second and 3 second timers.
>>
>> And this is not just powertop accounting thing - like I said the total num of wakeups per second go down with this patch.
>>
>> Parag
>>
>> Sent via BlackBerry from T-Mobile
>
>
> Quit top-posting!
>
> If this is the case then the whole usage of round_jiffies() is bogus. All users of round_jiffies()
> should just be converted to deferrable?? I am a bit concerned that if deferrable gets used everywhere
> then a strange situation would occur where all timers were waiting for some other timer to finally
> happen, kind of a wierd timelock situation. Like the old chip/dale cartoon:
> "you first, no you first, after you mister chip, no after you mister dale,..."



that's a dangerous situation indeed and I'd really like to know what the limits
are for deferring deferrable timers.... Arjan, do you know? Anyone?

I don't see a danger just yet on normal systems - I get something like 10 wakeups
per second from just the kernel (acpi, ahci, usb) on most my systems which
guarantees that the watchdog runs often enough, but for embedded systems and
critical timers in other drivers this may be an issue quickly


Auke

2007-12-20 19:14:05

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] sky2: Use deferrable timer for watchdog


>>> My interpretation of the api is:
>>> * round_jiffies() - timer wants to wakeup but isn't precise about when so schedule
>>> on next second when system will wake up anyway;
>>> e.g why meetings are usually scheduled on the hour
>>>
>>> * deferrable - timer doesn't have to really wakeup but wants to happen near
>>> a particular time. e.g. "I'll meet you at the pub around 8pm"

this is not correct.

deferrable means "if you're busy wake me up at this time. But if not, don't bother waking up for me, get to it
later".

The "later" can be a LONG time later, several seconds easily, if not more.
(timers are on a per cpu bases, and you may end up with a several-core system where the common timers are all on another cpu
than this one)



>> If this is the case then the whole usage of round_jiffies() is bogus. All users of round_jiffies()
>> should just be converted to deferrable?? I am a bit concerned that if deferrable gets used everywhere
>> then a strange situation would occur where all timers were waiting for some other timer to finally
>> happen, kind of a wierd timelock situation. Like the old chip/dale cartoon:
>> "you first, no you first, after you mister chip, no after you mister dale,..."
>
>
>
> that's a dangerous situation indeed and I'd really like to know what the limits
> are for deferring deferrable timers.... Arjan, do you know? Anyone?

there is NO limit to deferring a timer. Do NOT use a deferrable timer if you can't afford the timer to not happen
within.. 10 to 100 seconds! (or more)
They are really meant for things where you CAN afford for it to not happen when you're idle....


>
> I don't see a danger just yet on normal systems - I get something like 10 wakeups
> per second from just the kernel (acpi, ahci, usb) on most my systems which
> guarantees that the watchdog runs often enough, but for embedded systems and
> critical timers in other drivers this may be an issue quickly

on my work desktop test box the average time between cpu wakeups is 1.4 seconds
(and that's single core). It would be higher if it wasn't for some hpet limit issues.

2007-12-20 19:43:20

by Kok, Auke

[permalink] [raw]
Subject: Re: [PATCH] sky2: Use deferrable timer for watchdog

Arjan van de Ven wrote:
>
>>>> My interpretation of the api is:
>>>> * round_jiffies() - timer wants to wakeup but isn't precise
>>>> about when so schedule
>>>> on next second when system will wake up anyway;
>>>> e.g why meetings are usually scheduled on
>>>> the hour
>>>>
>>>> * deferrable - timer doesn't have to really wakeup but
>>>> wants to happen near
>>>> a particular time. e.g. "I'll meet you at
>>>> the pub around 8pm"
>
> this is not correct.
>
> deferrable means "if you're busy wake me up at this time. But if not,
> don't bother waking up for me, get to it
> later".
>
> The "later" can be a LONG time later, several seconds easily, if not more.
> (timers are on a per cpu bases, and you may end up with a several-core
> system where the common timers are all on another cpu
> than this one)
>
>
>
>>> If this is the case then the whole usage of round_jiffies() is bogus.
>>> All users of round_jiffies()
>>> should just be converted to deferrable?? I am a bit concerned that
>>> if deferrable gets used everywhere
>>> then a strange situation would occur where all timers were waiting
>>> for some other timer to finally
>>> happen, kind of a wierd timelock situation. Like the old chip/dale
>>> cartoon:
>>> "you first, no you first, after you mister chip, no after you mister
>>> dale,..."
>>
>>
>>
>> that's a dangerous situation indeed and I'd really like to know what
>> the limits
>> are for deferring deferrable timers.... Arjan, do you know? Anyone?
>
> there is NO limit to deferring a timer. Do NOT use a deferrable timer if
> you can't afford the timer to not happen
> within.. 10 to 100 seconds! (or more)
> They are really meant for things where you CAN afford for it to not
> happen when you're idle....

ok, that's just bad and if there's no user-defineable limit to the deferral I
definately don't like this change.

Can I safely assume that any irq will cause all deferred timers to run?

If this is the case then for e1000 this patch is still OK since the watchdog needs
to run (1) after a link up/down interrupt or (2) to update statistics. Those
statistics won't increase if there is no traffic of course...

Auke

2007-12-20 20:01:21

by Parag Warudkar

[permalink] [raw]
Subject: Re: [PATCH] sky2: Use deferrable timer for watchdog

On Dec 20, 2007 2:22 PM, Kok, Auke <[email protected]> wrote:
> ok, that's just bad and if there's no user-defineable limit to the deferral I
> definately don't like this change.
>
> Can I safely assume that any irq will cause all deferred timers to run?

I think even other causes for wakeup like process related ones will
cause the CPU to go busy and run the timers.
This, coupled with the fact that no one is yet able to reach 0 wakeups
per second makes it pretty unlikely that deferrable timers will be
deferred indefinitely.

>
> If this is the case then for e1000 this patch is still OK since the watchdog needs
> to run (1) after a link up/down interrupt or (2) to update statistics. Those
> statistics won't increase if there is no traffic of course...
>

I think it is reasonable for Network driver watchdogs to use a
deferrable timer - if the machine is 100% IDLE there is no one needing
the network to be up. If there is something running even on the other
CPU - that is going to cause an IPI, reschedule, TLB invalidation etc.
which will make it very likely in practice that each CPU will be
interrupted in reasonable amount of time.

Of course there are theoretical cases where we could land into a
situation where a CPU in a multiprocessor machine is IDLE infinitely
and that causes the watchdog that happens to be bound to run on the
same CPU to not run. To take care of these unlikely cases I think the
timer mechanism should have a reasonable limit on how long a CPU can
go IDLE if there are deferrable timers.

Parag

2007-12-20 20:04:41

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] sky2: Use deferrable timer for watchdog

Kok, Auke wrote:

> ok, that's just bad and if there's no user-defineable limit to the deferral I
> definately don't like this change.
>
> Can I safely assume that any irq will cause all deferred timers to run?

*on that cpu*. Timers are per cpu, as are interrupts. Just not per se the same one ...

2007-12-20 20:13:56

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] sky2: Use deferrable timer for watchdog

Parag Warudkar wrote:
> On Dec 20, 2007 2:22 PM, Kok, Auke <[email protected]> wrote:
>> ok, that's just bad and if there's no user-defineable limit to the deferral I
>> definately don't like this change.
>>
>> Can I safely assume that any irq will cause all deferred timers to run?
>
> I think even other causes for wakeup like process related ones will
> cause the CPU to go busy and run the timers.
> This, coupled with the fact that no one is yet able to reach 0 wakeups
> per second makes it pretty unlikely that deferrable timers will be
> deferred indefinitely.

0.8 is easy on single core today.
multicore just increases how idle you can be for a given core.

>
>> If this is the case then for e1000 this patch is still OK since the watchdog needs
>> to run (1) after a link up/down interrupt or (2) to update statistics. Those
>> statistics won't increase if there is no traffic of course...
>>
>
> I think it is reasonable for Network driver watchdogs to use a
> deferrable timer - if the machine is 100% IDLE there is no one needing
> the network to be up. If there is something running even on the other
> CPU - that is going to cause an IPI, reschedule, TLB invalidation etc.
> which will make it very likely in practice that each CPU will be
> interrupted in reasonable amount of time.

this is not correct; many machines are idle waiting for network data. Think of webservers...

>
> Of course there are theoretical cases where we could land into a
> situation where a CPU in a multiprocessor machine is IDLE infinitely
> and that causes the watchdog that happens to be bound to run on the
> same CPU to not run. To take care of these unlikely cases I think the
> timer mechanism should have a reasonable limit on how long a CPU can
> go IDLE if there are deferrable timers.

how about something else instead: a timer mechanism that takes a range instead..
that at least has defined semantics; the deferrable semantics really are "indefinite".
Lets keep at least the semantics clear and clean.

2007-12-20 20:18:25

by Krzysztof Oledzki

[permalink] [raw]
Subject: Re: [PATCH] sky2: Use deferrable timer for watchdog



On Thu, 20 Dec 2007, Parag Warudkar wrote:

> On Dec 20, 2007 2:22 PM, Kok, Auke <[email protected]> wrote:
>> ok, that's just bad and if there's no user-defineable limit to the deferral I
>> definately don't like this change.
>>
>> Can I safely assume that any irq will cause all deferred timers to run?
>
> I think even other causes for wakeup like process related ones will
> cause the CPU to go busy and run the timers.
> This, coupled with the fact that no one is yet able to reach 0 wakeups
> per second makes it pretty unlikely that deferrable timers will be
> deferred indefinitely.
>
>>
>> If this is the case then for e1000 this patch is still OK since the watchdog needs
>> to run (1) after a link up/down interrupt or (2) to update statistics. Those
>> statistics won't increase if there is no traffic of course...
>>
>
> I think it is reasonable for Network driver watchdogs to use a
> deferrable timer - if the machine is 100% IDLE there is no one needing
> the network to be up.

Please note tha being connected to a network does not only mean to send
but also to receive.

Best regards,

Krzysztof Oledzki

2007-12-20 20:36:29

by Parag Warudkar

[permalink] [raw]
Subject: Re: [PATCH] sky2: Use deferrable timer for watchdog

On Dec 20, 2007 3:04 PM, Arjan van de Ven <[email protected]> wrote:
> > I think it is reasonable for Network driver watchdogs to use a
> > deferrable timer - if the machine is 100% IDLE there is no one needing
> > the network to be up. If there is something running even on the other
> > CPU - that is going to cause an IPI, reschedule, TLB invalidation etc.
> > which will make it very likely in practice that each CPU will be
> > interrupted in reasonable amount of time.
>
> this is not correct; many machines are idle waiting for network data. Think of webservers...

Yes, I forgot the receive case. So if a server was 100% IDLE and a web
server was listening for network data and we reach 0 wakeups per
second on the CPU where the network watchdog timer is scheduled to run
deferred _and_ the network link went down, it would cause the watchdog
to not run and redo the link until some one else wakes up that CPU
later.
So as long as we make sure we don't convert every timer to deferrable
we should be ok - may be this can be resolved easily by having a
non-deferrable "dont-allow-deferring-for-too-long" timer on each CPU
that just causes at least one wake up in some reasonable time delta
from the previous wakeup (whoever caused that one.) It is still
beneficial in that all deferrable timers would run at once without
needing to have separate wakeup for each.

>
> >
> > Of course there are theoretical cases where we could land into a
> > situation where a CPU in a multiprocessor machine is IDLE infinitely
> > and that causes the watchdog that happens to be bound to run on the
> > same CPU to not run. To take care of these unlikely cases I think the
> > timer mechanism should have a reasonable limit on how long a CPU can
> > go IDLE if there are deferrable timers.
>
> how about something else instead: a timer mechanism that takes a range instead..
> that at least has defined semantics; the deferrable semantics really are "indefinite".
> Lets keep at least the semantics clear and clean.
>

Would not the simpler solution of installing a non-deferrable timer
per cpu which will not allow the CPU to go IDLE for more than x units
of time at once (or something to that effect) work? Range would
complicate the thing and I am not sure how many cases will know
reasonably correct range for their normal operation. In this instance
of the e1000 watchdog what range could it give and be successful at
what it wants to do - bring up the link in reasonable amount of time,
while also realizing the power savings?

Perhaps depending on Server/Laptop/Desktop machine (may be based on
Preemption) we could have normal or deferrable timers but that'll
exclude Servers from power savings and I am not sure Data center folks
will like that :) .

Parag

2007-12-20 21:10:04

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH] sky2: Use deferrable timer for watchdog

On Thu, 20 Dec 2007 15:36:13 -0500
"Parag Warudkar" <[email protected]> wrote:

> On Dec 20, 2007 3:04 PM, Arjan van de Ven <[email protected]> wrote:
> > > I think it is reasonable for Network driver watchdogs to use a
> > > deferrable timer - if the machine is 100% IDLE there is no one needing
> > > the network to be up. If there is something running even on the other
> > > CPU - that is going to cause an IPI, reschedule, TLB invalidation etc.
> > > which will make it very likely in practice that each CPU will be
> > > interrupted in reasonable amount of time.
> >
> > this is not correct; many machines are idle waiting for network data. Think of webservers...
>
> Yes, I forgot the receive case. So if a server was 100% IDLE and a web
> server was listening for network data and we reach 0 wakeups per
> second on the CPU where the network watchdog timer is scheduled to run
> deferred _and_ the network link went down, it would cause the watchdog
> to not run and redo the link until some one else wakes up that CPU
> later.
> So as long as we make sure we don't convert every timer to deferrable
> we should be ok - may be this can be resolved easily by having a
> non-deferrable "dont-allow-deferring-for-too-long" timer on each CPU
> that just causes at least one wake up in some reasonable time delta
> from the previous wakeup (whoever caused that one.) It is still
> beneficial in that all deferrable timers would run at once without
> needing to have separate wakeup for each.
>
> >
> > >
> > > Of course there are theoretical cases where we could land into a
> > > situation where a CPU in a multiprocessor machine is IDLE infinitely
> > > and that causes the watchdog that happens to be bound to run on the
> > > same CPU to not run. To take care of these unlikely cases I think the
> > > timer mechanism should have a reasonable limit on how long a CPU can
> > > go IDLE if there are deferrable timers.
> >
> > how about something else instead: a timer mechanism that takes a range instead..
> > that at least has defined semantics; the deferrable semantics really are "indefinite".
> > Lets keep at least the semantics clear and clean.
> >
>
> Would not the simpler solution of installing a non-deferrable timer
> per cpu which will not allow the CPU to go IDLE for more than x units
> of time at once (or something to that effect) work? Range would
> complicate the thing and I am not sure how many cases will know
> reasonably correct range for their normal operation. In this instance
> of the e1000 watchdog what range could it give and be successful at
> what it wants to do - bring up the link in reasonable amount of time,
> while also realizing the power savings?
>
> Perhaps depending on Server/Laptop/Desktop machine (may be based on
> Preemption) we could have normal or deferrable timers but that'll
> exclude Servers from power savings and I am not sure Data center folks
> will like that :) .
>
> Parag


The problem is that on a server the receiver will go deaf if the chip
bug that the watchdog is looking for triggers. Yes, no packets in
and it happily will just sit there.

So for now, I am not going to apply your simple patch and work on a
two stage timer per arjan's suggestion for a later release.

--
Stephen Hemminger <[email protected]>

2007-12-20 21:32:40

by Kok, Auke

[permalink] [raw]
Subject: Re: [PATCH] sky2: Use deferrable timer for watchdog

Stephen Hemminger wrote:
> On Thu, 20 Dec 2007 15:36:13 -0500
> "Parag Warudkar" <[email protected]> wrote:
>
>> On Dec 20, 2007 3:04 PM, Arjan van de Ven <[email protected]> wrote:
>>>> I think it is reasonable for Network driver watchdogs to use a
>>>> deferrable timer - if the machine is 100% IDLE there is no one needing
>>>> the network to be up. If there is something running even on the other
>>>> CPU - that is going to cause an IPI, reschedule, TLB invalidation etc.
>>>> which will make it very likely in practice that each CPU will be
>>>> interrupted in reasonable amount of time.
>>> this is not correct; many machines are idle waiting for network data. Think of webservers...
>> Yes, I forgot the receive case. So if a server was 100% IDLE and a web
>> server was listening for network data and we reach 0 wakeups per
>> second on the CPU where the network watchdog timer is scheduled to run
>> deferred _and_ the network link went down, it would cause the watchdog
>> to not run and redo the link until some one else wakes up that CPU
>> later.
>> So as long as we make sure we don't convert every timer to deferrable
>> we should be ok - may be this can be resolved easily by having a
>> non-deferrable "dont-allow-deferring-for-too-long" timer on each CPU
>> that just causes at least one wake up in some reasonable time delta
>> from the previous wakeup (whoever caused that one.) It is still
>> beneficial in that all deferrable timers would run at once without
>> needing to have separate wakeup for each.
>>
>>>> Of course there are theoretical cases where we could land into a
>>>> situation where a CPU in a multiprocessor machine is IDLE infinitely
>>>> and that causes the watchdog that happens to be bound to run on the
>>>> same CPU to not run. To take care of these unlikely cases I think the
>>>> timer mechanism should have a reasonable limit on how long a CPU can
>>>> go IDLE if there are deferrable timers.
>>> how about something else instead: a timer mechanism that takes a range instead..
>>> that at least has defined semantics; the deferrable semantics really are "indefinite".
>>> Lets keep at least the semantics clear and clean.
>>>
>> Would not the simpler solution of installing a non-deferrable timer
>> per cpu which will not allow the CPU to go IDLE for more than x units
>> of time at once (or something to that effect) work? Range would
>> complicate the thing and I am not sure how many cases will know
>> reasonably correct range for their normal operation. In this instance
>> of the e1000 watchdog what range could it give and be successful at
>> what it wants to do - bring up the link in reasonable amount of time,
>> while also realizing the power savings?
>>
>> Perhaps depending on Server/Laptop/Desktop machine (may be based on
>> Preemption) we could have normal or deferrable timers but that'll
>> exclude Servers from power savings and I am not sure Data center folks
>> will like that :) .
>>
>> Parag
>
>
> The problem is that on a server the receiver will go deaf if the chip
> bug that the watchdog is looking for triggers. Yes, no packets in
> and it happily will just sit there.
>
> So for now, I am not going to apply your simple patch and work on a
> two stage timer per arjan's suggestion for a later release.

I also think that's the right way to go for now. I'll ask jeff to hold off on the
two patches for now.

Auke