LinuxLists.cc - loaded router, excessive getnstimeofday in oprofile

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Friday 22 August 2008, Denys Fedoryshchenko wrote:

These are the most significant event types where I notice getnstimeofday at the top of the list.

Additions:
Counted MEMORY_REQUESTS events (Memory requests by type) with a unit mask of
0x01 (Requests to non-cacheable (UC) memory) count 5000
samples % samples % symbol name
129 31.0843 596 31.1879 getnstimeofday
54 13.0120 251 13.1345 __napi_schedule
36 8.6747 178 9.3145 default_idle
34 8.1928 164 8.5819 irq_entries_start
23 5.5422 143 7.4830 __do_softirq

and
CPU: AMD64 processors, speed 2193.74 MHz (estimated)
Counted INTERRUPTS_MASKED_CYCLES events (Cycles with interrupts masked (IF=0))
with a unit mask of 0x00 (No unit mask) count 5000
samples % symbol name
630015 62.4741 getnstimeofday
28634 2.8394 get_next_timer_interrupt
23279 2.3084 __slab_alloc
15775 1.5643 schedule
14765 1.4641 __slab_free
11154 1.1061 native_read_tsc
10953 1.0861 kmem_cache_alloc
10918 1.0827 tick_nohz_stop_sched_tick
10752 1.0662 update_wall_time
10430 1.0343 net_rx_action
10220 1.0134 __do_softirq
9895 0.9812 __update_sched_clock

2008-08-26 09:51:45

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On 22-08-2008 03:57, Denys Fedoryshchenko wrote:
> I have loaded router (~650 Mbps In+Out), based on 2xAMD Opteron 248, Sun Fire
> X4100. HPET timer available (TSC seems not available on this platform).
> Network interfaces is onboard, connected over PCI-X.
>
> Right now i am using only one processor, cause using only one interface and
> interrupts stick to it. Other is almost not used.
> At peak time i notice in mpstat, that this processor is almost "dead", and if
> i run minor application consuming resources - ping over this router will be
> terrible. For me it is clear - system overloaded. I did oprofile, and here is
> result (at low load time, but at peak time it is very similar).
...
> CPU: AMD64 processors, speed 2193.74 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
> mask of 0x00 (No unit mask) count 100000
> samples % symbol name
> 1031727 37.1736 getnstimeofday
> 230457 8.3035 __napi_schedule
> 122154 4.4013 __do_softirq
> 110036 3.9647 dev_queue_xmit
...
> What is bothering me, why getnstimeofday called so much? Even i remove HTB
> shaper, it still takes 30-40% of whole vmlinux time. From other
> applications - only zebra is running.
> Any ideas?

This function is really used in many places, and these profiles are
not enough at least to me, but it seems you could have a lot of
softirqs (and probably hrtimers) scheduling, so maybe you should try
if e.g. disabling hrtimers or changing kernel HZ makes any difference.

Jarek P.
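
For scale, the sample counts quoted above can be turned into rough cycle and time figures. This is a minimal sketch, assuming (as oprofile's event counting normally works) that one sample is recorded per `count` events; the helper names are made up for illustration, and the numbers come from the CPU_CLK_UNHALTED profile header and its getnstimeofday row.

```python
# Rough conversion of oprofile samples into cycles and CPU seconds.
# Assumption: one sample is taken every `count` events (here CPU_CLK_UNHALTED).

CPU_HZ = 2193.74e6   # "speed 2193.74 MHz (estimated)" from the profile header
COUNT = 100_000      # "count 100000" from the profile header


def samples_to_cycles(samples: int, count: int = COUNT) -> int:
    """Approximate number of events (cycles) represented by `samples`."""
    return samples * count


def cycles_to_seconds(cycles: int, hz: float = CPU_HZ) -> float:
    """Approximate CPU time for `cycles` unhalted cycles at `hz`."""
    return cycles / hz


def total_samples(samples: int, percent: float) -> float:
    """Back out the profile's total sample count from a single row."""
    return samples / (percent / 100.0)


if __name__ == "__main__":
    gnt = 1_031_727  # getnstimeofday row
    cycles = samples_to_cycles(gnt)
    print(f"getnstimeofday: ~{cycles:.3e} cycles, "
          f"~{cycles_to_seconds(cycles):.0f} s of CPU")
    print(f"profile total: ~{total_samples(gnt, 37.1736):,.0f} samples")
```

The result is only as good as the estimated clock speed and the sampling assumption, so it is best read as an order of magnitude.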

2008-08-26 10:31:34

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Tuesday 26 August 2008, Jarek Poplawski wrote:
>
> This function is really used in many places, and these profiles are
> not enough at least to me, but it seems you could have a lot of
> softirqs (and probably hrtimers) scheduling, so maybe you should try
> if e.g. disabling hrtimers or changing kernel HZ makes any difference.
>
> Jarek P.
One user is shapers, it is ok for me.
I am not sure, but maybe another user is softlockup debug option... and if
there is a lot of task switches maybe it will cause excessive load of timers
slow?

2008-08-26 10:47:34

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Tue, Aug 26, 2008 at 01:29:53PM +0300, Denys Fedoryshchenko wrote:
> On Tuesday 26 August 2008, Jarek Poplawski wrote:
> >
> > This function is really used in many places, and these profiles are
> > not enough at least to me, but it seems you could have a lot of
> > softirqs (and probably hrtimers) scheduling, so maybe you should try
> > if e.g. disabling hrtimers or changing kernel HZ makes any difference.
> >
> > Jarek P.
> One user is shapers, it is ok for me.

The question is if you really need so exact shaping at a cost of
higher system load.

> I am not sure, but maybe another user is softlockup debug option... and if
> there is a lot of task switches maybe it will cause excessive load of timers
> slow?

Maybe. Anyway, you could try if lower HZ (with longer jiffies) can help
with processing more skbs without rescheduling.

Jarek P.

2008-08-26 10:50:48

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Tuesday 26 August 2008, Jarek Poplawski wrote:
> The question is if you really need so exact shaping at a cost of
> higher system load.
That's maybe another reason to have your patch in mainline :-)
I will try it today with this case, if it will help.

Maybe it can be optional, and enabled via kernel parameter and /sys , so it
can be useful in case of crashes when TSC used and when timer is too slow.
Because it is not so useful just to disable hrtimers completely, if you need
them for some other task...

2008-08-26 11:08:09

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Tue, Aug 26, 2008 at 01:49:09PM +0300, Denys Fedoryshchenko wrote:
> On Tuesday 26 August 2008, Jarek Poplawski wrote:
> > The question is if you really need so exact shaping at a cost of
> > higher system load.
> That's maybe another reason to have your patch in mainline :-)

We should be first sure when it's really needed.

> I will try it today with this case, if it will help.
>
> Maybe it can be optional, and enabled via kernel parameter and /sys , so it
> can be useful in case of crashes when TSC used and when timer is too slow.
> Because it is not so useful just to disable hrtimers completely, if you need
> them for some other task...

Maybe it could be enough to use current parameters like: "highres=off"
according to Documentation/kernel-parameters.txt?

Jarek P.
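
When comparing runs with different boot flags, it helps to confirm which flags a given boot actually used by reading the kernel command line back from `/proc/cmdline`. This is a small sketch; `parse_cmdline` and `highres_disabled` are hypothetical helper names, and only the `highres=off` spelling mentioned above is checked.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to ''.

    Simplification: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def highres_disabled(cmdline: str) -> bool:
    """True if the command line carries highres=off."""
    return parse_cmdline(cmdline).get("highres") == "off"


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            line = f.read()
    except OSError as exc:
        print(f"cannot read /proc/cmdline: {exc}")
    else:
        if highres_disabled(line):
            print("highres=off is set")
        else:
            print("highres is not disabled on the command line")
```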

2008-08-26 11:16:13

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Tue, Aug 26, 2008 at 11:07:46AM +0000, Jarek Poplawski wrote:
> On Tue, Aug 26, 2008 at 01:49:09PM +0300, Denys Fedoryshchenko wrote:
...
> > Maybe it can be optional, and enabled via kernel parameter and /sys , so it
> > can be useful in case of crashes when TSC used and when timer is too slow.
> > Because it is not so useful just to disable hrtimers completely, if you need
> > them for some other task...
>
> Maybe it could be enough to use current parameters like: "highres=off"
> according to Documentation/kernel-parameters.txt?

Hmm.. it isn't actually answer to your question, sorry. As I said
before I think we need to have more people interested in using such
additional options, and btw. I understood from your message that
disabling htb didn't solve the problem?

Jarek P.

2008-08-26 11:18:18

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Tuesday 26 August 2008, Jarek Poplawski wrote:
> Hmm.. it isn't actually answer to your question, sorry. As I said
> before I think we need to have more people interested in using such
> additional options, and btw. I understood from your message that
> disabling htb didn't solve the problem?
>
> Jarek P.
Only HTB - no. If I disable softlockup debug, the load seems lower (I must
make sure), and if I remove HTB, it becomes low. I will try to give
exact numbers in the coming days.

2008-08-26 11:32:45

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Tue, Aug 26, 2008 at 02:16:32PM +0300, Denys Fedoryshchenko wrote:
> On Tuesday 26 August 2008, Jarek Poplawski wrote:
> > Hmm.. it isn't actually answer to your question, sorry. As I said
> > before I think we need to have more people interested in using such
> > additional options, and btw. I understood from your message that
> > disabling htb didn't solve the problem?
> >
> > Jarek P.
> Only HTB - no. If I disable softlockup debug, the load seems lower (I must
> make sure), and if I remove HTB, it becomes low. I will try to give
> exact numbers in the coming days.
>

So maybe you could try again this htb patch for limiting
qdisc_watchdog_schedule()?

Jarek P.

2008-08-26 11:33:47

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Tuesday 26 August 2008, Jarek Poplawski wrote:

> So maybe you could try again this htb patch for limiting
> qdisc_watchdog_schedule()?
>
> Jarek P.
Yes, and I am going to take snapshots of the system load with different boot
flags. It will take time, though, because it is a major router.

2008-08-26 20:14:50

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Fri, Aug 22, 2008 at 04:57:40AM +0300, Denys Fedoryshchenko ([email protected]) wrote:
> I have loaded router (~650 Mbps In+Out), based on 2xAMD Opteron 248, Sun Fire
> X4100. HPET timer available (TSC seems not available on this platform).
> Network interfaces is onboard, connected over PCI-X.
>
> Right now i am using only one processor, cause using only one interface and
> interrupts stick to it. Other is almost not used.
> At peak time i notice in mpstat, that this processor is almost "dead", and if
> i run minor application consuming resources - ping over this router will be
> terrible. For me it is clear - system overloaded. I did oprofile, and here is
> result (at low load time, but at peak time it is very similar).

Do you have any packet sockets in this system? Like running dhcp daemon?

--
Evgeniy Polyakov

2008-08-26 20:45:29

Subject: Re: loaded router, excessive getnstimeofday in oprofile

Evgeniy Polyakov wrote:
> On Fri, Aug 22, 2008 at 04:57:40AM +0300, Denys Fedoryshchenko ([email protected]) wrote:
>> I have loaded router (~650 Mbps In+Out), based on 2xAMD Opteron 248, Sun Fire
>> X4100. HPET timer available (TSC seems not available on this platform).
>> Network interfaces is onboard, connected over PCI-X.
>>
>> Right now i am using only one processor, cause using only one interface and
>> interrupts stick to it. Other is almost not used.
>> At peak time i notice in mpstat, that this processor is almost "dead", and if
>> i run minor application consuming resources - ping over this router will be
>> terrible. For me it is clear - system overloaded. I did oprofile, and here is
>> result (at low load time, but at peak time it is very similar).
>
> Do you have any packet sockets in this system? Like running dhcp daemon?
>

Another way to see this problem can be to start a sniffer on the machine, even with a
restrictive pcap filter, to check if performance change or not. (It should decrease)

For example, I believe that running "ping" could have the same effect
(increasing netstamp_needed variable : every incoming packet has to be timestamped)

So beware of pings, traceroute and other networking tools...

2008-08-26 20:52:37

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Tue, Aug 26, 2008 at 10:44:56PM +0200, Eric Dumazet ([email protected]) wrote:
> >Do you have any packet sockets in this system? Like running dhcp daemon?
> >
>
> Another way to see this problem can be to start a sniffer on the machine,
> even with a restrictive pcap filter, to check if performance change or not.
> (It should decrease)

Or just check /proc/net/packet iirc.
Anyway, having at least one packet socket ends up with timestamping of
each packet, so you will get fair load of getnstimeofday() in that case.

> For example, I believe that running "ping" could have the same effect
> (increasing netstamp_needed variable : every incoming packet has to be
> timestamped)
>
> So beware of pings, traceroute and other networking tools...

Yup, these innocent toys can end up causing such behaviour on modern
highly loaded machines.

--
Evgeniy Polyakov
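
The `/proc/net/packet` check suggested above can be scripted. This is a minimal sketch, assuming the usual column layout of that file (a header line naming the columns, then one row per packet socket); `parse_proc_net_packet` is an illustrative name, not an existing tool.

```python
from typing import Dict, List


def parse_proc_net_packet(text: str) -> List[Dict[str, str]]:
    """Turn the contents of /proc/net/packet into one dict per packet socket.

    Assumption: the first line is a header whose column names line up
    one-to-one with the whitespace-separated fields of each row.
    """
    lines = text.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, row.split())) for row in lines[1:] if row.strip()]


if __name__ == "__main__":
    try:
        with open("/proc/net/packet") as f:
            sockets = parse_proc_net_packet(f.read())
    except OSError as exc:
        print(f"cannot read /proc/net/packet: {exc}")
    else:
        if not sockets:
            print("no packet sockets open")
        for s in sockets:
            print(f"proto={s.get('Proto')} iface={s.get('Iface')} inode={s.get('Inode')}")
```

An empty result only rules out packet sockets; a regular socket that asked for timestamps would not show up here. Mapping an inode back to its owning process means scanning `/proc/<pid>/fd`, which is left out of this sketch.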

2008-08-27 12:11:01

Subject: Re: loaded router, excessive getnstimeofday in oprofile

Proto Recv-Q Send-Q Local Address      Foreign Address        State        PID/Program name
tcp        0      0 127.0.0.1:2600     0.0.0.0:*              LISTEN       3167/zebra
tcp        0      0 0.0.0.0:2601       0.0.0.0:*              LISTEN       3167/zebra
tcp        0      0 0.0.0.0:2602       0.0.0.0:*              LISTEN       3174/ripd
tcp        0      0 0.0.0.0:22         0.0.0.0:*              LISTEN       3549/sshd
tcp        0      0 194.146.153.17:22  194.146.154.161:37549  ESTABLISHED  11593/sshd
tcp        0      0 194.146.153.17:22  192.168.0.92:45891     ESTABLISHED  11803/sshd
tcp        0      0 127.0.0.1:42537    127.0.0.1:2600         ESTABLISHED  3174/ripd
tcp        0      0 194.146.153.17:22  194.146.153.18:51810   ESTABLISHED  11799/sshd
tcp        0      0 127.0.0.1:2600     127.0.0.1:42537        ESTABLISHED  3167/zebra
udp        0      0 0.0.0.0:520        0.0.0.0:*                           3174/ripd
udp        0      0 0.0.0.0:161        0.0.0.0:*                           3194/snmpd
udp        0      0 0.0.0.0:67         0.0.0.0:*                           3207/udhcpd
udp   111360      0 0.0.0.0:49619      0.0.0.0:*                           2449/syslogd

Attachments:

(No filename) (1.31 kB)
tasks.txt (1.14 kB)
2008-08-27 12:37:21

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Wed, Aug 27, 2008 at 03:09:17PM +0300, Denys Fedoryshchenko ([email protected]) wrote:
> On Tuesday 26 August 2008, Evgeniy Polyakov wrote:
> > On Tue, Aug 26, 2008 at 10:44:56PM +0200, Eric Dumazet ([email protected])
> wrote:
> > > >Do you have any packet sockets in this system? Like running dhcp daemon?
> No, nothing at all.

Can you put debug print into
net_enable_timestamp()/net_disable_timestamp() to determine if someone
enabled timestamp socket option?

> tcp 0 0 127.0.0.1:2600 0.0.0.0:* LISTEN 3167/zebra
> tcp 0 0 0.0.0.0:2601 0.0.0.0:* LISTEN 3167/zebra
> tcp 0 0 0.0.0.0:2602 0.0.0.0:* LISTEN 3174/ripd
> tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 3549/sshd
> tcp 0 0 194.146.153.17:22 194.146.154.161:37549 ESTABLISHED11593/sshd
> tcp 0 0 194.146.153.17:22 192.168.0.92:45891 ESTABLISHED11803/sshd
> tcp 0 0 127.0.0.1:42537 127.0.0.1:2600 ESTABLISHED3174/ripd
> tcp 0 0 194.146.153.17:22 194.146.153.18:51810 ESTABLISHED11799/sshd
> tcp 0 0 127.0.0.1:2600 127.0.0.1:42537 ESTABLISHED3167/zebra
> udp 0 0 0.0.0.0:520 0.0.0.0:* 3174/ripd
> udp 0 0 0.0.0.0:161 0.0.0.0:* 3194/snmpd
> udp 0 0 0.0.0.0:67 0.0.0.0:* 3207/udhcpd

This one looks suspicious ^^^^^^^^^^

> udp 111360 0 0.0.0.0:49619 0.0.0.0:* 2449/syslogd

--
Evgeniy Polyakov
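
For reference, this is roughly what the user-space side of "enabling the timestamp socket option" looks like. A hedged sketch: `SO_TIMESTAMP` is a real Linux socket option, but the fallback value 29 is the common Linux number and is an assumption that does not hold on every architecture; the two function names are illustrative. Packet (AF_PACKET) sockets are a separate trigger and are not shown.

```python
import socket

# Assumption: use the interpreter's constant if it has one, otherwise the
# usual Linux value (29 on x86; some architectures use a different number).
SO_TIMESTAMP = getattr(socket, "SO_TIMESTAMP", 29)


def enable_rx_timestamps(sock: socket.socket) -> None:
    """Ask the kernel to attach a receive timestamp to each datagram."""
    sock.setsockopt(socket.SOL_SOCKET, SO_TIMESTAMP, 1)


def rx_timestamps_enabled(sock: socket.socket) -> bool:
    """Read the option back from the socket."""
    return sock.getsockopt(socket.SOL_SOCKET, SO_TIMESTAMP) != 0


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        print("before:", rx_timestamps_enabled(s))
        enable_rx_timestamps(s)
        print("after: ", rx_timestamps_enabled(s))
```

A daemon doing this on even one socket is the kind of caller that a debug print in net_enable_timestamp() should reveal.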

2008-08-27 12:54:33

Subject: Re: loaded router, excessive getnstimeofday in oprofile

Evgeniy Polyakov <[email protected]> writes:
>
> Yup, these innocent toys can end up causing such behaviour on modern
> highly loaded machines.

I and also other people had some patches to move the time stamp
measuring into the socket. This way the time stamping didn't need to
be enabled on all packets, only on those that actually end up at a
socket that requires the time stamp.

Unfortunately DaveM didn't like it because some bank wanted
different semantics, see the discussion in
http://thread.gmane.org/gmane.linux.network/91679

Perhaps you can find out which bank it was and send them a bill for
your CPU time ;-)

-Andi

--
[email protected]

2008-08-27 14:10:45

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Wednesday 27 August 2008, Evgeniy Polyakov wrote:
> Can you put debug print into
> net_enable_timestamp()/net_disable_timestamp() to determine if someone
> enabled timestamp socket option?
OK, i will do that on next system reboot.

> > 0.0.0.0:* 3207/udhcpd
>
> This one looks suspicious
> ^^^^^^^^^^
It is busybox udhcpd... I guess it is innocent. Even if I kill it, it doesn't
change anything at all.
The only one that could be listening on a multicast socket is ripd, and I cannot kill it. But
I think it doesn't matter much either...

2008-08-27 14:23:55

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Wed, Aug 27, 2008 at 05:00:35PM +0300, Denys Fedoryshchenko ([email protected]) wrote:
> > > 0.0.0.0:* 3207/udhcpd
> >
> > This one looks suspicious
> > ^^^^^^^^^^
> It is busybox udhcpd... I guess it is innocent. Even if I kill it, it doesn't
> change anything at all.
> The only one that could be listening on a multicast socket is ripd, and I cannot kill it. But
> I think it doesn't matter much either...

It depends... If it turns timestamps on, then you will have this
behaviour. Please check if timestamps are actually enabled, so we could
remove one (im)possible case.

--
Evgeniy Polyakov

2008-08-27 16:08:29

Subject: Re: loaded router, excessive getnstimeofday in oprofile

Andi Kleen wrote:
> Evgeniy Polyakov <[email protected]> writes:
>
>>Yup, these innocent toys can end up causing such behaviour on modern
>>highly loaded machines.
>
>
> I and also other people had some patches to move the time stamp
> measuring into the socket. This way the time stamping didn't need to
> be enabled on all packets, only on those that actually end up at a
> socket that requires the time stamp.
>
> Unfortunately DaveM didn't like it because some bank wanted
> different semantics, see the discussion in
> http://thread.gmane.org/gmane.linux.network/91679
>
> Perhaps you can find out which bank it was and send them a bill for
> your CPU time ;-)

Those banks really want to crank down on latency - to the point they
start disabling interrupt coalescing. I bet they'd toss anything out
they could to shave another microsecond.

rick jones

2008-08-27 16:17:23

by Stephen Hemminger

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Wed, 27 Aug 2008 14:54:12 +0200
Andi Kleen <[email protected]> wrote:

> Evgeniy Polyakov <[email protected]> writes:
> >
> > Yup, these innocent toys can end up causing such behaviour on modern
> > highly loaded machines.
>
> I and also other people had some patches to move the time stamp
> measuring into the socket. This way the time stamping didn't need to
> be enabled on all packets, only on those that actually end up at a
> socket that requires the time stamp.
>
> Unfortunately DaveM didn't like it because some bank wanted
> different semantics, see the discussion in
> http://thread.gmane.org/gmane.linux.network/91679
>
> Perhaps you can find out which bank it was and send them a bill for
> your CPU time ;-)
>
> -Andi
>

Look at /proc/net/ptype to see if any AF_PACKET sockets are open.
There are several causes of this:
* Applications like DHCP use AF_PACKET when they could use something else
* AF_PACKET API was poorly designed and always has timestamps
* The choice was made to get more accurate timestamps by stamping early in
receive code. A better alternative would be to do it in protocol handler
after the socket filter. Sorry, Andi socket layer is too late.
* No driver is using hardware mechanisms to get accurate/free timestamps.
I was working on sky2, but never was stable/complete.

Easiest advice now is to fix userspace.
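
The `/proc/net/ptype` check can be scripted along these lines. A minimal sketch, assuming the file has a header line followed by rows of type, optional device, and handler function, and that AF_PACKET taps show up with a handler whose name contains `packet_rcv` (for example `packet_rcv` or `tpacket_rcv`); `packet_taps` is an illustrative name.

```python
from typing import List, Tuple


def packet_taps(ptype_text: str) -> List[Tuple[str, str, str]]:
    """Return (type, device, function) rows whose handler looks like an
    AF_PACKET receive hook.

    Assumption: such handlers have "packet_rcv" in their symbol name. Some
    kernels append "+offset/size" to the symbol, which is stripped here.
    """
    taps = []
    for line in ptype_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 2:
            continue
        func = fields[-1].split("+")[0]
        if "packet_rcv" not in func:
            continue
        device = fields[1] if len(fields) == 3 else ""
        taps.append((fields[0], device, func))
    return taps


if __name__ == "__main__":
    try:
        with open("/proc/net/ptype") as f:
            found = packet_taps(f.read())
    except OSError as exc:
        print(f"cannot read /proc/net/ptype: {exc}")
    else:
        for ptype, dev, func in found:
            print(f"type={ptype} dev={dev or '-'} handler={func}")
        if not found:
            print("no AF_PACKET handlers registered")
```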

2008-08-27 16:26:11

Subject: Re: loaded router, excessive getnstimeofday in oprofile

> Those banks really want to crank down on latency - to the point they
> start disabling interrupt coalescing. I bet they'd toss anything out
> they could to shave another microsecond.

This change would actually likely lower their latency.

-Andi

2008-08-27 16:49:27

Subject: Re: loaded router, excessive getnstimeofday in oprofile

Andi Kleen wrote:
>>Those banks really want to crank down on latency - to the point they
>>start disabling interrupt coalescing. I bet they'd toss anything out
>>they could to shave another microsecond.
>
>
> This change would actually likely lower their latency.

I'm guessing you mean increase their latency? I agree, it could -
depends entirely on the PPS in production I suspect.

rick jones

ftp://ftp.cup.hp.com/dist/networking/briefs/nic_latency_vs_tput.txt

I should probably refresh/update that one of these days

2008-08-27 16:54:27

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Wed, Aug 27, 2008 at 09:49:10AM -0700, Rick Jones wrote:
> Andi Kleen wrote:
> >>Those banks really want to crank down on latency - to the point they
> >>start disabling interrupt coalescing. I bet they'd toss anything out
> >>they could to shave another microsecond.
> >
> >
> >This change would actually likely lower their latency.
>
> I'm guessing you mean increase their latency? I agree, it could -
> depends entirely on the PPS in production I suspect.

No, moving the time stamps into the socket decreases latency
for all packets that don't need time stamps. And they likely
have some packets which don't need time stamps too.

As a secondary effect if they use a RT kernel it might
be also beneficial to do the (depending on the platform)
costly time stamp in the lower priority socket context
than in the high priority interrupt thread.

-Andi

2008-08-27 16:58:47

Subject: Re: loaded router, excessive getnstimeofday in oprofile

Andi Kleen wrote:
> On Wed, Aug 27, 2008 at 09:49:10AM -0700, Rick Jones wrote:
>
>>Andi Kleen wrote:
>>
>>>>Those banks really want to crank down on latency - to the point they
>>>>start disabling interrupt coalescing. I bet they'd toss anything out
>>>>they could to shave another microsecond.
>>>
>>>
>>>This change would actually likely lower their latency.
>>
>>I'm guessing you mean increase their latency? I agree, it could -
>>depends entirely on the PPS in production I suspect.
>
>
> No, moving the time stamps into the socket decreases latency
> for all packets that don't need time stamps. And they likely
> have some packets which don't need time stamps too.

Ah, since that part of the discussion wasn't in the quoted text I
assumed you were talking about the disabling of interrupt coalescing.

rick jones

>
> As a secondary effect if they use a RT kernel it might
> be also beneficial to do the (depending on the platform)
> costly time stamp in the lower priority socket context
> than in the high priority interrupt thread.
>
> -Andi

2008-08-27 17:14:22

Subject: Re: loaded router, excessive getnstimeofday in oprofile

Stephen Hemminger wrote, On 08/27/2008 06:17 PM:

> On Wed, 27 Aug 2008 14:54:12 +0200
> Andi Kleen <[email protected]> wrote:
>
>> Evgeniy Polyakov <[email protected]> writes:
>>> Yup, these innocent toys can end up causing such behaviour on modern
>>> highly loaded machines.
>> I and also other people had some patches to move the time stamp
>> measuring into the socket. This way the time stamping didn't need to
>> be enabled on all packets, only on those that actually end up at a
>> socket that requires the time stamp.
>>
>> Unfortunately DaveM didn't like it because some bank wanted
>> different semantics, see the discussion in
>> http://thread.gmane.org/gmane.linux.network/91679
>>
>> Perhaps you can find out which bank it was and send them a bill for
>> your CPU time ;-)
>>
>> -Andi
>>
>
> Look at /proc/net/ptype to see if any AF_PACKET sockets are open.
> There are several causes of this:
> * Applications like DHCP use AF_PACKET when they could use something else
> * AF_PACKET API was poorly designed and always has timestamps
> * The choice was made to get more accurate timestamps by stamping early in
> receive code. A better alternative would be to do it in protocol handler
> after the socket filter. Sorry, Andi socket layer is too late.
> * No driver is using hardware mechanisms to get accurate/free timestamps.
> I was working on sky2, but never was stable/complete.
>
> Easiest advice now is to fix userspace.

And what is working advice?

Why exactly can't the admin choose between the 2 alternatives here?

Jarek P.

2008-08-27 17:27:27

Subject: Re: loaded router, excessive getnstimeofday in oprofile

Andi Kleen wrote:
> On Wed, Aug 27, 2008 at 09:49:10AM -0700, Rick Jones wrote:
>> Andi Kleen wrote:
>>>> Those banks really want to crank down on latency - to the point they
>>>> start disabling interrupt coalescing. I bet they'd toss anything out
>>>> they could to shave another microsecond.
>>>
>>> This change would actually likely lower their latency.
>> I'm guessing you mean increase their latency? I agree, it could -
>> depends entirely on the PPS in production I suspect.
>
> No, moving the time stamps into the socket decreases latency
> for all packets that don't need time stamps. And they likely
> have some packets which don't need time stamps too.
>
> As a secondary effect if they use a RT kernel it might
> be also beneficial to do the (depending on the platform)
> costly time stamp in the lower priority socket context
> than in the high priority interrupt thread.
>

Doing the expensive timestamping in a possibly delayed thread (ie some milliseconds
after hardware notification) is wrong/useless.

Better use plain xtime instead of getnstimeofday() in this case.

We could provide a sysctl setting so that the admin can choose between precise timestamps
(current behavior) or fast but low resolution timestamping (xtime based)
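
The precise-versus-cheap trade-off proposed here can be observed from user space on later Linux kernels, which expose a tick-granular clock, CLOCK_REALTIME_COARSE, next to the precise CLOCK_REALTIME. This is only an analogy to the in-kernel xtime/getnstimeofday() choice, not the proposed sysctl. The numeric fallback 5 is the Linux clock id and is an assumption elsewhere; the helper names are illustrative.

```python
import time

# Assumption: Linux clock id 5 when the time module has no named constant.
CLOCK_REALTIME_COARSE = getattr(time, "CLOCK_REALTIME_COARSE", 5)


def resolutions() -> dict:
    """Reported resolution, in seconds, of the precise and coarse wall clocks."""
    return {
        "precise": time.clock_getres(time.CLOCK_REALTIME),
        "coarse": time.clock_getres(CLOCK_REALTIME_COARSE),
    }


def cost_per_call(clock_id: int, calls: int = 200_000) -> float:
    """Average wall time of one clock_gettime() call, interpreter overhead included."""
    start = time.perf_counter()
    for _ in range(calls):
        time.clock_gettime(clock_id)
    return (time.perf_counter() - start) / calls


if __name__ == "__main__":
    res = resolutions()
    print(f"resolution: precise={res['precise']:.9f}s coarse={res['coarse']:.9f}s")
    print(f"cost: precise={cost_per_call(time.CLOCK_REALTIME) * 1e9:.0f}ns "
          f"coarse={cost_per_call(CLOCK_REALTIME_COARSE) * 1e9:.0f}ns per call")
```

On a machine with a fast clocksource the interpreter overhead will probably hide any cost difference; a gap is more likely to show where reading the clock hardware is slow, as with the HPET case reported at the start of this thread.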

2008-08-27 18:29:57

Subject: Re: loaded router, excessive getnstimeofday in oprofile

> Doing the expensive timestamping in a possibly delayed thread (ie some
> milliseconds
> after hardware notification) is wrong/useless.

We had this discussion earlier, please review the thread I linked to.

Note that interrupts can be arbitrarily delayed too (both by cli
and by interrupt mitigation), even on a non RT kernel.

If you want exact notification (packet arriving at your NIC's buffers) you
need NIC hardware support (and more and more NICs have it[1]).
If you do it in software then even the interrupt is at the end of a long
queue with a pretty much arbitrary delay. Doing it in socket context is
just one queue more. It's pretty much all arbitrary.

The argument for doing it as late as possible is the prohibitive
cost on some systems as people notice all the time.

-Andi

[1] Unfortunately not necessarily synchronized with system time.

2008-08-27 21:34:20

Subject: Re: loaded router, excessive getnstimeofday in oprofile

From: Andi Kleen <[email protected]>
Date: Wed, 27 Aug 2008 14:54:12 +0200

> Evgeniy Polyakov <[email protected]> writes:
> >
> > Yup, these innocent toys can end up causing such behaviour on modern
> > highly loaded machines.
>
> I and also other people had some patches to move the time stamp
> measuring into the socket. This way the time stamping didn't need to
> be enabled on all packets, only on those that actually end up at a
> socket that requires the time stamp.

By the time you get to the socket, it might be eons (relatively
speaking) later, decreasing the usefulness of the timestamp.

As just an odd example if the TCP socket is user locked at the moment,
because the user is blocked on a GFP_KERNEL allocation, it could be
a very long time before we actually process the packet and timestamp
it.

UDP now does similar socket locking so could potentially hit the same
kind of problem.

That was my argument against such a change.

I find it amusing that nobody is talking about fixing the tools
that are creating the timestamp requests when they have no real
reason for having them in the first place.

2008-08-27 22:18:40

Subject: Re: loaded router, excessive getnstimeofday in oprofile

From: Andi Kleen <[email protected]>
Date: Wed, 27 Aug 2008 18:27:35 +0200

> > Those banks really want to crank down on latency - to the point they
> > start disabling interrupt coalescing. I bet they'd toss anything out
> > they could to shave another microsecond.
>
> This change would actually likely lower their latency.

They want the timestamps, but they want it to match when the packet
arrived at their system as closely as is reasonably possible.

Socket based solutions don't do that, because we can be sleeping on
GFP_KERNEL memory or similar with the socket locked, and thus not
be able to set the timestamp until the task wakes up and processes
the backlog.

2008-08-27 22:23:27

Subject: Re: loaded router, excessive getnstimeofday in oprofile

From: Andi Kleen <[email protected]>
Date: Wed, 27 Aug 2008 20:32:16 +0200

> > Doing the expensive timestamping in a possibly delayed thread (ie some
> > milliseconds
> > after hardware notification) is wrong/useless.
>
> We had this discussion earlier, please review the thread I linked to.
>
> Note that interrupts can be arbitrarily delayed too (both by cli
> and by interrupt mitigation), even on a non RT kernel.

This is a much different kind of delay compared to sleeping for seconds
or longer on the socket lock while a GFP_KERNEL allocation is being
satisfied by swapping tons of crap out to disk.

Your socket solution is not a workable scheme.

2008-08-27 22:36:09

Subject: Re: loaded router, excessive getnstimeofday in oprofile

> This is a much different kind of delay compared to sleeping for seconds
> or longer on the socket lock while a GFP_KERNEL allocation is being
> satisfied by swapping tons of crap out to disk.

When this happens then new incoming packets will be lost anyways
because there will be no new packets fed back into the RX ring
because their allocation will either stall or fail too.
I don't think time stamps of dropped packets are very useful ;-)

-Andi

--
[email protected]

2008-08-27 22:37:28

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Wed, Aug 27, 2008 at 03:18:24PM -0700, David Miller wrote:
> From: Andi Kleen <[email protected]>
> Date: Wed, 27 Aug 2008 18:27:35 +0200
>
> > > Those banks really want to crank down on latency - to the point they
> > > start disabling interrupt coalescing. I bet they'd toss anything out
> > > they could to shave another microsecond.
> >
> > This change would actually likely lower their latency.
>
> They want the timestamps, but they want it to match when the packet
> arrived at their system as closely as is reasonably possible.

Then they should use hardware time stamps which are increasingly
available (e.g. current Intel e1000 design has them and I expect
others too).

-Andi

2008-08-28 00:45:29

by Nick Piggin

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Thursday 28 August 2008 08:39, Andi Kleen wrote:
> On Wed, Aug 27, 2008 at 03:18:24PM -0700, David Miller wrote:
> > From: Andi Kleen <[email protected]>
> > Date: Wed, 27 Aug 2008 18:27:35 +0200
> >
> > > > Those banks really want to crank down on latency - to the point they
> > > > start disabling interrupt coalescing. I bet they'd toss anything out
> > > > they could to shave another microsecond.
> > >
> > > This change would actually likely lower their latency.
> >
> > They want the timestamps, but they want it to match when the packet
> > arrived at their system as closely as is reasonably possible.
>
> Then they should use hardware time stamps which are increasingly
> available (e.g. current Intel e1000 design has them and I expect
> others too).

Would it make sense to make a new option for these socket timestamps
and encourage some apps move over to it?

2008-08-28 00:49:20

Subject: Re: loaded router, excessive getnstimeofday in oprofile

From: Nick Piggin <[email protected]>
Date: Thu, 28 Aug 2008 10:45:03 +1000

> On Thursday 28 August 2008 08:39, Andi Kleen wrote:
> > On Wed, Aug 27, 2008 at 03:18:24PM -0700, David Miller wrote:
> > > From: Andi Kleen <[email protected]>
> > > Date: Wed, 27 Aug 2008 18:27:35 +0200
> > >
> > > > > Those banks really want to crank down on latency - to the point they
> > > > > start disabling interrupt coalescing. I bet they'd toss anything out
> > > > > they could to shave another microsecond.
> > > >
> > > > This change would actually likely lower their latency.
> > >
> > > They want the timestamps, but they want it to match when the packet
> > > arrived at their system as closely as is reasonably possible.
> >
> > Then they should use hardware time stamps which are increasingly
> > available (e.g. current Intel e1000 design has them and I expect
> > others too).
>
> Would it make sense to make a new option for these socket timestamps
> and encourage some apps move over to it?

We don't have support for using these specific hardware-provided timestamp
sources yet, so it's kind of premature to recommend the facility to
applications. :)

2008-08-28 01:07:42

by Nick Piggin

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Thursday 28 August 2008 10:48, David Miller wrote:
> From: Nick Piggin <[email protected]>
> Date: Thu, 28 Aug 2008 10:45:03 +1000
>
> > On Thursday 28 August 2008 08:39, Andi Kleen wrote:
> > > On Wed, Aug 27, 2008 at 03:18:24PM -0700, David Miller wrote:
> > > > From: Andi Kleen <[email protected]>
> > > > Date: Wed, 27 Aug 2008 18:27:35 +0200
> > > >
> > > > > > Those banks really want to crank down on latency - to the point
> > > > > > they start disabling interrupt coalescing. I bet they'd toss
> > > > > > anything out they could to shave another microsecond.
> > > > >
> > > > > This change would actually likely lower their latency.
> > > >
> > > > They want the timestamps, but they want it to match when the packet
> > > > arrived at their system as closely as is reasonably possible.
> > >
> > > Then they should use hardware time stamps which are increasingly
> > > available (e.g. current Intel e1000 design has them and I expect
> > > others too).
> >
> > Would it make sense to make a new option for these socket timestamps
> > and encourage some apps move over to it?
>
> We don't have support to using these specific hardware provided timestamps
> sources yet, so it's kind of premature to recommend the facility to
> applications. :)

Dang, that was really badly quoted. I was reading the thread,
got to the end, and just fired off my reply from there...

Sorry -- what I meant to ask was, would it make sense to have a new
option to enable time stamp measuring in the socket receive layer
as in the patchset that Andi referenced, but without removing existing
support for early timestamping?

2008-08-28 02:45:21

by Jason Uhlenkott

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Wed, Aug 27, 2008 at 14:34:01 -0700, David Miller wrote:
> By the time you get to the socket, it might be eons (relatively
> speaking) later, decreasing the usefulness of the timestamp.

It's a *socket* option. It's named SO_TIMESTAMP. Users of it ought
to *expect* that it records the time the packet hits the socket, not
the time the frame hits the device.

If banks want to know when frames are hitting their devices, that's
fine, but setsockopt() is the wrong layer for controlling that sort of
thing. An interface flag would make a lot more sense.

> I find it amusing that nobody is talking about fixing the tools
> that are creating the timestamp requests when they have no real
> reason for having them in the first place.

I don't agree that the tools are broken. Some of them may have
frivolous reasons for wanting timestamps, but they're asking for
something at the socket layer, with the scope of a single socket, and
it's hardly their fault that we respond to that by doing something
expensive and global at a much lower level.

2008-08-28 03:10:37

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

From: Jason Uhlenkott <[email protected]>
Date: Wed, 27 Aug 2008 19:39:58 -0700

> It's a *socket* option. It's named SO_TIMESTAMP. Users of it ought
> to *expect* that it records the time the packet hits the socket, not
> the time the frame hits the device.

When expectations equal reality, and then we change reality, that's
called breaking things.

What might (and I do mean "might") save us is how other systems
implement this. A quick check of BSD shows that at least OpenBSD
fetches the timestamp inside of the RAW and UDP usrreq handler,
which is basically socket receive.

Our man pages simply say "reception" for when the timestamp is taken,
which may also give us some more leeway.

From: Jason Uhlenkott <[email protected]>
Date: Wed, 27 Aug 2008 19:39:58 -0700

> > I find it amusing that nobody is talking about fixing the tools
> > that are creating the timestamp requests when they have no real
> > reason for having them in the first place.
>
> I don't agree that the tools are broken. Some of them may have
> frivolous reasons for wanting timestamps, but they're asking for
> something at the socket layer, with the scope of a single socket, and
> it's hardly their fault that we respond to that by doing something
> expensive and global at a much lower level.

Every application using AF_PACKET sockets gets timestamps by
default. And we do know of several specific cases where the
timestamps are unnecessary.

Even for other cases, why in the world does a DHCP client need
accurate timestamps? Give me a break. :)

2008-08-28 03:35:33

by Stephen Hemminger

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Fri, 22 Aug 2008 04:57:40 +0300
Denys Fedoryshchenko <[email protected]> wrote:

> I have loaded router (~650 Mbps In+Out), based on 2xAMD Opteron 248, Sun Fire
> X4100. HPET timer available (TSC seems not available on this platform).
> Network interfaces is onboard, connected over PCI-X.
>
> Right now i am using only one processor, cause using only one interface and
> interrupts stick to it. Other is almost not used.
> At peak time i notice in mpstat, that this processor is almost "dead", and if
> i run minor application consuming resources - ping over this router will be
> terrible. For me it is clear - system overloaded. I did oprofile, and here is
> result (at low load time, but at peak time it is very similar).
>
> CPU: AMD64 processors, speed 2193.74 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
> mask of 0x00 (No unit mask) count 100000
> CPU_CLK_UNHALT...|
> samples| %|
> ------------------
> 2679376 71.9851 vmlinux
> 287212 7.7163 e1000
> 278674 7.4870 ip_tables
> 259923 6.9832 nf_conntrack
> 29699 0.7979 iptable_nat
> 26752 0.7187 nf_nat
> 26093 0.7010 nf_conntrack_ipv4
> 16525 0.4440 iptable_mangle
> 14988 0.4027 oprofiled
>
>
> CPU: AMD64 processors, speed 2193.74 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
> mask of 0x00 (No unit mask) count 100000
> samples % symbol name
> 1031727 37.1736 getnstimeofday
> 230457 8.3035 __napi_schedule
> 122154 4.4013 __do_softirq
> 110036 3.9647 dev_queue_xmit
> 88800 3.1995 net_rx_action
> 71163 2.5640 ip_route_input
> 52232 1.8819 local_bh_enable
> 43804 1.5783 get_next_timer_interrupt
> 43387 1.5633 ip_forward
> 35501 1.2791 nf_iterate
> 35212 1.2687 __slab_alloc
> 34652 1.2485 default_idle
> 32375 1.1665 kfree
> 28127 1.0134 kmem_cache_alloc
>
> What is bothering me, why getnstimeofday called so much? Even i remove HTB
> shaper, it still takes 30-40% of whole vmlinux time. From other
> applications - only zebra is running.
> Any ideas?

What kernel version is this? There was a fix to AF_PACKET about a year ago
to reduce this.

2008-08-28 06:46:46

by Joe Malicki

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

----- "David Miller" <[email protected]> wrote:
>
> Every application using AF_PACKET sockets gets timestamps by
> default. And we do know of several specific cases where the
> timestamps are unnecessary.
>
> Even for other cases, why in the world does a DHCP client need
> accurate timestamps? Give me a break. :)
>

I've worked with systems where SO_TIMESTAMP has been used for
H.323 videoconferencing systems to synchronize audio and video
where remote systems' timestamps on the protocol streams proved
to be inaccurate (based off of different, unsynchronized clocks).

I can't see any other realistic use of this, but trying to get
timestamps for quasi-realtime protocols may be an important use
case - and in that case, you want the time when it hits the
interface, NOT when it hits the socket.

What utility does the time of hitting the socket get you?

2008-08-28 07:19:58

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

> I've worked with systems where SO_TIMESTAMP has been used for
> H.323 videoconferencing systems to synchronize audio and video

But didn't you really want an "end2end" time stamp in this case,
as in really at the end of all kernel/hardware queues on your side?

A packet roughly travels this way on a normal NIC before it hits recvmsg()

wire -> NIC on die buffers -> NIC RX ring -> interrupt handler ->
NAPI or per CPU queue -> softirq socket lookup -> socket queue -> recvmsg

These all do their own queuing and all queues can add delays depending
on the load. Right now SO_TIMESTAMP is in the interrupt handler,
but it's just an arbitrary position in a multitude of queues.

For video conferencing (or e.g. in general if you implement a retransmit
timeout in user space) scheduling delays on the local box
surely need to be taken into account too because they all add
to the final timing of the packets on the wire.

The queues inside the system are really part of the network
too. In Linux for example the algorithms that size the TCP
buffer space know that, explicitly account for it,
and reserve a local queue buffer.

> where remote systems' timestamps on the protocol streams proved
> to be inaccurate (based off of different, unsynchronized clocks).

Yes, but why ignore local scheduling delays?

>
> I can't see any other realistic use of this, but trying to get
> timestamps for quasi-realtime protocols may be an important use
> case - and in that case, you want the time when it hits the
> interface, NOT when it hits the socket.

I think it's the other way round. Why would the real time
protocol care when it hits some arbitrary queue in the network
stack instead of the time when the application can really
read the data?

> What utility does the time of hitting the socket get you?

SO_TIMESTAMP was originally invented for passive network
monitoring as in tcpdump (for which PACKET sockets were designed
originally, DHCP is really just abusing them imho). There it makes
some sense to do the time stamp as near on the wire as possible
but really a hardware time stamp would be better because
it is even nearer. But for anything that does end2end it's
the wrong semantics anyways because ignoring local queueing
delays would be just a bug, and SO_TIMESTAMP ignores them currently.

-Andi

--
[email protected]

2008-08-28 08:50:51

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Thursday 28 August 2008, Stephen Hemminger wrote:
>
> What kernel version is this? There was a fix to AF_PACKET about a year ago
> to reduce this.
> --
git net-2.6 based on 2.6.27-rc3. Means very fresh.

2008-08-28 15:03:57

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Thursday 28 August 2008, Andi Kleen wrote:
I hit one more bug: while deleting the root class for htb on ifb0, tc got stuck
(along with all operations related to tc). There were some fixes for these things
in net-2.6, so I tried to update my git tree.

It seems I cannot test the current net-2.6, because it is broken for me:
the USB part (fixed by a workaround in init scripts), and HPET is totally broken in
net-2.6 but works in the latest main git from Torvalds' tree.
I have to wait until net-2.6 is rebased onto the current Torvalds tree, then I will try
to test.

2008-08-28 16:50:44

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

My small IMHO regarding SO_TIMESTAMP.

1) Right now I have 400-500 Mbps passing through the router. If I run
5 "pings" simultaneously under _USER_ privileges (I know ping is suid),
instead of 20% free CPU time I will have 1-2% free CPU time. Sure, I know
ping is a suid program, but it has been "like this" for a long time.
Security psychos would call it a DoS.

2) Usefulness of this option. On an almost idle machine, what is the difference
whether the timestamp is retrieved at a higher level or a lower level?
And why do we need such a high-precision timestamp (with an expensive timer)
on a highly loaded server, if in my case enabling any socket with SO_TIMESTAMP
creates delays of more than 10ms (up to 100ms)?

3) Who are the main users of SO_TIMESTAMP? iputils, which is installed on almost
_ANY_ Linux machine? busybox, which uses the same option? Many other
multiplatform userspace applications? Or banks? I don't take much account of
dhcpd, which is maybe abusing this option.

So there are a few good solutions available (IMHO):
1) Introduce some SO_REALTIMESTAMP (SO_TIMESTAMP is not defined in any
standard anyway) for banks and ntp folks who need it. And even give them timespec
instead of timeval, so they will be even happier with the resolution.
2) Provide a sysctl, kernel boot, or even "build time" option for "banks" to have
a high-resolution (and expensive) SO_TIMESTAMP.

2008-08-28 16:54:21

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Thu, Aug 28, 2008 at 07:48:52PM +0300, Denys Fedoryshchenko wrote:
> 1)Right now i have 400-500 Mbps passing router. If i will run
> 5 "pings" ,simultaneous ,under _USER_ privileges(i know ping is suid),
> instead of free 20% CPU time, i will have 1-2% free CPU time. Sure i know
> ping is suid program, but it is has been "like this" since long time. By
> security psychos it will be caled DoS.

The skb timestamp overhead does not add up, it's either on or off.
If multiple pings make the router slower it must be something else.
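
The "on or off" point can be illustrated with a toy userspace model. It assumes, as described in this thread, that sockets asking for timestamps bump one global count and that the receive path stamps every packet while that count is non-zero. All names below (`timestamp_users`, `socket_enable_timestamp`, `rx_packets`, `cost_with_users`) are made up for illustration and are not kernel symbols.

```c
#include <stddef.h>

/* Toy model only: hypothetical names, not kernel code. */
static int timestamp_users;        /* sockets that currently want timestamps */
static unsigned long clock_reads;  /* stands in for getnstimeofday() calls */

static void socket_enable_timestamp(void)
{
	timestamp_users++;
}

/* Receive n packets; each one is stamped if anybody at all asked. */
static void rx_packets(size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (timestamp_users > 0)
			clock_reads++;  /* one expensive clock read per packet */
	}
}

/* Clock reads caused by n packets with the given number of listeners. */
static unsigned long cost_with_users(int users, size_t n)
{
	timestamp_users = 0;
	clock_reads = 0;
	for (int i = 0; i < users; i++)
		socket_enable_timestamp();
	rx_packets(n);
	return clock_reads;
}
```

In this model one listener and five listeners cost the same; the cost follows the packet rate, not the number of timestamping sockets.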

-Andi
--
[email protected]

2008-08-28 18:01:14

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

> What utility does the time of hitting the socket get you?

The earliest time the application could have been expected to start
processing the request. Until it hits the socket, it might as well be
somewhere in the cloud. By that reasoning of course, one could argue
that a gettimeofday() call immediately following recv() would suffice.

Earlier in the thread mention was made of financial services types. If
someone has knowledge of the (probably) arcane rules under which they
must operate it would be great to hear more. Does some entity like the
SEC (Securities and Exchange Commission in the United States) mandate
some sort of timestamp for when the trading request "arrives at the
trading system" and do they define what "arriving at the trading system"
means?

rick jones
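
The "gettimeofday() immediately following recv()" alternative mentioned above might look roughly like this in user space. This is only a sketch; the helper name `recv_then_stamp` is invented here.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/*
 * Receive one datagram, then read the wall clock in user space.
 * The timestamp therefore includes every kernel queueing and
 * scheduling delay up to the moment recv() returned.
 */
static ssize_t recv_then_stamp(int fd, void *buf, size_t len,
			       struct timeval *tv)
{
	ssize_t n = recv(fd, buf, len, 0);

	if (n >= 0 && gettimeofday(tv, NULL) != 0)
		return -1;
	return n;
}
```

No socket option is needed for this, so it adds no per-packet cost for other traffic on the box.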

2008-08-28 18:57:30

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

Denys Fedoryshchenko wrote:
> My small IMHO regarding SO_TIMESTAMP.
>
> 1)Right now i have 400-500 Mbps passing router. If i will run
> 5 "pings" ,simultaneous ,under _USER_ privileges(i know ping is suid),
> instead of free 20% CPU time, i will have 1-2% free CPU time. Sure i know
> ping is suid program, but it is has been "like this" since long time. By
> security psychos it will be caled DoS.
>

>

So... if using ping on your machine has a direct and noticeable effect on cpu load, the problem is elsewhere
(if no ping is running, you don't have skb timestamping, but getnstimeofday() is still the top function in oprofile)

1) Do you have any netfilter rule using xt_time ?
(This module also calls __net_timestamp(skb))

2) You maybe have a bad program that does something expensive relative to kernel time services.

void bad_program(void)
{
	while (1) {
		struct timeval tv0, tv1;
		gettimeofday(&tv0, NULL); // or whatever function that calls getnstimeofday()
		do_small_work();
		gettimeofday(&tv1, NULL); // or whatever function that calls getnstimeofday()
		add_stat_event(&tv1, &tv0); // record the elapsed time between the two samples
	}
}

> 2)Usefullness of this option. What is a difference if on almost idle machine
> timestamp retrieved on higher level or lower level?
> And why we need on highly loaded server so high precision timestamp (with
> expensive timer), if in my case enabling any socket with SO_TIMESTAMP
> creating delays more than 10ms(up to 100ms)?

Your setup is probably not common.
You want a PersonalComputer class machine to act as a SuperCiscoDevice(TM),
while most PC machines don't use more than 10% of CPU power on average...

Many existing programs depend on the current SO_TIMESTAMP.
We won't break them to solve a particular problem (yet to be demonstrated)

>
> 3)Who is most users of SO_TIMESTAMP? iputils which is installed on almost
> _ANY_ linux machine? busybox which is using same option? Many others
> userspace multiplatform applications? Or banks? I dont take much in account
> dhcpd, who is maybe abusing this option.
>
> So there is few good solutions available (IMHO):
> 1)Introduce some SO_REALTIMESTAMP (anyway even SO_TIMESTAMP not defined in any
> standard) for banks and ntp folks, who need them. And even give them timespec
> instead timeval, so they will be even more happy with resolution.

kernel already provides nanosecond resolution :)
Check SO_TIMESTAMPNS and SCM_TIMESTAMPNS

2008-08-28 19:01:27

by Ilpo Järvinen

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Thu, 28 Aug 2008, Denys Fedoryshchenko wrote:

> On Thursday 28 August 2008, Andi Kleen wrote:
> I hit one more bug, while deleting root class for htb on ifb0 i got tc stuck
> (and all operations related to tc), but there was some fixes for this things
> in net-2.6, so i tried to update git tree.
>
> It seems i cannot test current net-2.6, because it is broken for me
> on USB part (fixed by workaround in init scripts), HPET totally broken in
> net-2.6, but works for latest main git from torvalds tree.
> I have to wait when net-2.6 rebased to current torvalds tree, then i will try
> to test.

You could always pull net-2.6 to Linus' tree by yourself.

...And about the workflow, net-2.6 isn't rebased, instead Linus just
pulls it in to his tree.

--
i.

2008-08-28 19:27:41

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Thursday 28 August 2008, Eric Dumazet wrote:

> So... if using ping on your machine has direct an noticeable effect on cpu
> load, problem is elsewhere (if no ping is running, you dont have skb
> timestamping, but still getnstimeofday() is the top function in oprofile)

>
> 1) Do you have any netfilter rule using xt_time ?
> (This module also calls __net_timestamp(skb))
No

>
> 2) You maybe have a bad program that do something expensive relative to
> kernel time services.
No, process list is very short, it is custom semi-embedded linux distro i
made, so i know each process running there. Here is process list (kernel
processes/threads and running shell(busybox ash) removed)

1 root /bin/sh /init
1119 root init
2451 root /sbin/syslogd -R 80.83.17.2
2453 root /sbin/klogd
3168 squid /usr/sbin/zebra -d
3175 squid /usr/sbin/ripd -d
3195 root /usr/sbin/snmpd -c /config/snmpd.conf
3208 root udhcpd /config/udhcp.office.conf -S
3550 root /usr/sbin/sshd -b /etc/banner
3566 root /sbin/getty 38400 tty1
3567 root /sbin/getty 38400 tty2
3570 root /sbin/getty 38400 tty3
4055 root /usr/sbin/sshd -b /etc/banner

> Your setup is probably not common.
> You want a PersonnalComputer class machine acts as a SuperCiscoDevice(TM),
> while most PC machines dont use more than 10% of CPU power in average...
I dont think i am alone, and almost sure there is many guys trying to run
linux as high-performance router. But most of them dont know about
netdev@ :-)

Well, thats called "Increasing resources use efficiency and system
productivity". It is never a shame to utilize resources more efficiently.
Plus i am not using PC class machine. For example this one with HPET, is Sun
Fire X4100, which costs us that time a lot of bucks, and mostly because it is
reliable hardware (very good IPMI/remote kvm/... onboard, good cooling, 4
e1000, dual power supply).

I can also use PC class, but I will face some issues, like building a proper
cooling system, and maybe even then it will not work well, because some chips are not
designed for "heavy duty": under load they will not be able to dissipate
heat inside the chip and will soon break. But sometimes it is even worth
a try.

And most important, many routers are already "soft" routers. What is a Cisco
7206+NPE G1/G2? It is a MIPS CPU with a relatively large L2 cache. There seems to be
no ASIC for routing offload. That means Linux can do the same job or better.
And it means Vyatta can beat Cisco in this market, and soon be far ahead of
Cisco. As a result, more jobs for open source guys. Linux must enter "heavy
duty" and critical jobs too, not only SOHO-class routers.

>
> Many existing programs depend on current SO_TIMESTAMP.
> We wont break them to solve a particular problem (yet to be demonstrated)
I think it wouldn't break. But sure we must be very careful and on my side i
can test all possible scenarios i can implement.

Maybe even good idea to not change (for now) current default behaviour, but
to provide option for "high performance" systems then.

> kernel already provides nanosecond resolution :)
> Check SO_TIMESTAMPNS and SCM_TIMESTAMPNS
Maybe this function really must be "heavy" then.

2008-08-28 19:31:45

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

From: Denys Fedoryshchenko <[email protected]>
Date: Thu, 28 Aug 2008 18:02:17 +0300

> It seems i cannot test current net-2.6, because it is broken for me
> on USB part (fixed by workaround in init scripts), HPET totally broken in
> net-2.6, but works for latest main git from torvalds tree.
> I have to wait when net-2.6 rebased to current torvalds tree, then i will try
> to test.

Make a clone of Linus's tree, then pull in the net-2.6 tree.

This is always how you should test things especially if you want to
make sure you have whatever non-networking bug fixes your machine
might require.

2008-08-28 19:37:03

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

From: Denys Fedoryshchenko <[email protected]>
Date: Thu, 28 Aug 2008 19:48:52 +0300

> So there is few good solutions available (IMHO):
> 1)Introduce some SO_REALTIMESTAMP (anyway even SO_TIMESTAMP not defined in any
> standard) for banks and ntp folks, who need them. And even give them timespec
> instead timeval, so they will be even more happy with resolution.
> 2)Provide sysctl,kernel boot, or even "build time" option for "banks" to have
> high resolution(and expensive) SO_TIMESTAMP.

The performance hit hurts, but changing the default to lower
resolution after it having been high resolution for 10+ years
is a regression and something we really can't do.

2008-08-28 19:38:10

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

Denys Fedoryshchenko wrote:
> On Thursday 28 August 2008, Eric Dumazet wrote:
>> 2) You maybe have a bad program that do something expensive relative to
>> kernel time services.
> No, process list is very short, it is custom semi-embedded linux distro i
> made, so i know each process running there. Here is process list (kernel
> processes/threads and running shell(busybox ash) removed)
>
> 1 root /bin/sh /init
> 1119 root init
> 2451 root /sbin/syslogd -R 80.83.17.2
> 2453 root /sbin/klogd
> 3168 squid /usr/sbin/zebra -d
> 3175 squid /usr/sbin/ripd -d
> 3195 root /usr/sbin/snmpd -c /config/snmpd.conf
> 3208 root udhcpd /config/udhcp.office.conf -S
> 3550 root /usr/sbin/sshd -b /etc/banner
> 3566 root /sbin/getty 38400 tty1
> 3567 root /sbin/getty 38400 tty2
> 3570 root /sbin/getty 38400 tty3
> 4055 root /usr/sbin/sshd -b /etc/banner
>

OK, please try oprofile with call graph analysis.

>
>> kernel already provides nanosecond resolution :)
>> Check SO_TIMESTAMPNS and SCM_TIMESTAMPNS
> Maybe this function really must be "heavy" then.

Nope... the contrary :)

Kernel timestamping has nanosec resolution.

SO_TIMESTAMP needs a divide (by 1000), while SO_TIMESTAMPNS is native.
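
The divide in question is just the nanosecond-to-microsecond conversion. A userspace illustration (the function name is invented here; this is not the kernel's own helper):

```c
#include <sys/time.h>
#include <time.h>

/* Convert a nanosecond timestamp to the microsecond form: one divide. */
static struct timeval timespec_to_timeval_us(struct timespec ts)
{
	struct timeval tv;

	tv.tv_sec = ts.tv_sec;
	tv.tv_usec = ts.tv_nsec / 1000;  /* nanoseconds -> microseconds */
	return tv;
}
```

The sub-microsecond digits are simply truncated by the integer division.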

2008-08-28 19:43:15

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

From: Rick Jones <[email protected]>
Date: Thu, 28 Aug 2008 11:00:48 -0700

> Earlier in the thread mention was made of financial services types.
> If someone has knowledge of the (probably) arcane rules under which
> they must operate it would be great to hear more. Does some entity
> like the SEC (Securities and Exchange Commission in the United
> States) mandate some sort of timestamp for when the trading request
> "arrives at the trading system" and do they define what "arriving at
> the trading system" means?

The issue is the ordering of processing the requests.

So if request A arrived on interface 1 before request B arrived on
interface 2, the trade described in A should be performed before the
one in B.

This is not "arcane" as you seem to suppose it might be, but rather
pretty clear fair handling of requests sent between trading desks.
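
As a sketch of that ordering rule: given requests that each carry a receive timestamp, process them earliest first regardless of which interface they came in on. The `struct request` layout and the function names are hypothetical, chosen only for this example.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical record: one request plus the time it was stamped. */
struct request {
	int id;
	int iface;           /* interface the request arrived on */
	struct timespec rx;  /* receive timestamp */
};

/* qsort comparator: earlier timestamp first. */
static int by_arrival(const void *a, const void *b)
{
	const struct request *x = a, *y = b;

	if (x->rx.tv_sec != y->rx.tv_sec)
		return x->rx.tv_sec < y->rx.tv_sec ? -1 : 1;
	if (x->rx.tv_nsec != y->rx.tv_nsec)
		return x->rx.tv_nsec < y->rx.tv_nsec ? -1 : 1;
	return 0;
}

/* Put requests into arrival order, whatever interface they used. */
static void order_requests(struct request *reqs, size_t n)
{
	qsort(reqs, n, sizeof(*reqs), by_arrival);
}
```

The result is only as fair as the timestamps are comparable, which is why it matters where in the receive path they are taken.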

2008-08-28 19:57:15

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Thursday 28 August 2008, Eric Dumazet wrote:
> OK, please try oprofile with call graph analysis.
I did already. Even because most of programs (except ripd/zebra) can be
killed, and i kill them, it doesn't change almost anything.

it seems heavy things causing instability:

1)HTB (resolution can be lowered to improve performance, i will try Jarek
patch soon)
2) occasionally ping/tcpdump and other SO_TIMESTAMP users
3)Probably softlockup detection. Disabled already, i will come back to it
soon, if it is required.

One of other issues i notice - "CACHE MISS" cause maybe almost 5-10% in
oprofile in u32, but i am not sure it is interesting subject to discuss. I
have to optimize all my iproute2 rules first.

2008-08-28 20:01:27

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Thursday 28 August 2008, David Miller wrote:
> From: Denys Fedoryshchenko <[email protected]>
> Date: Thu, 28 Aug 2008 19:48:52 +0300
>
> > So there is few good solutions available (IMHO):
> > 1)Introduce some SO_REALTIMESTAMP (anyway even SO_TIMESTAMP not defined
> > in any standard) for banks and ntp folks, who need them. And even give
> > them timespec instead timeval, so they will be even more happy with
> > resolution. 2)Provide sysctl,kernel boot, or even "build time" option for
> > "banks" to have high resolution(and expensive) SO_TIMESTAMP.
>
> The performance hit hurts, but changing the default to lower
> resolution after it having been high resolution for 10+ years
> is a regression and something we really can't do.
Agree. Then maybe add a way to choose, because the choice is high resolution vs
performance. For example, Intel dynamically throttles interrupts on e1000*,
and that saves me in this case. They also leave an option for users who want low
latency/high throughput.
So maybe there must be a way for specific functions that use get(ns)timeofday
to use specific timers (cheap and less precise), by option. Or to limit
the number of timer calls they make.

2008-08-28 20:29:52

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

>
> The issue is the ordering of processing the requests.
>
> So if request A arrived on interface 1 before request B arrived on
> interface 2, the trade described in A should be performed before the
> one in B.
>
> This is not "arcane" as you seem to suppose it might be, but rather
> pretty clear fair handling of requests sent between trading desks.

Has the request "hit the trading system" when it hits the NIC, or when
it hits the application executing the trade? If the SEC calls for when
it hits the NIC, then none of what is done today is really
accurate/correct and one would need to start using NIC HW timestamps,
synchronized with the host and the other NICs in the system no?

The way things are today, there really isn't much guarantee that hitting
NIC 1 before NIC 2 will result in a driver-generated timestamp for the
NIC 1 packet which is before the driver-generated timestamp for the NIC
2 packet. It will be luck of the interrupt coalescing interaction with
other traffic on the NIC and/or polling out of NAPI right?

rick jones

2008-08-28 20:32:38

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

From: Rick Jones <[email protected]>
Date: Thu, 28 Aug 2008 13:29:30 -0700

> Has the request "hit the trading system" when it hits the NIC, or
> when it hits the application executing the trade? If the SEC calls
> for when it hits the NIC, then none of what is done today is really
> accurate/correct and one would need to start using NIC HW
> timestamps, synchronized with the host and the other NICs in the
> system no?

The SEC isn't mandating anything here, stop framing it that way :-)

People simply won't trade with a firm if they find out that trades
there are executed out of order.

They are simply trying to make things as fair as possible.

2008-08-28 20:46:16

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

David Miller wrote:
> From: Rick Jones <[email protected]>
> Date: Thu, 28 Aug 2008 13:29:30 -0700
>
>
>>Has the request "hit the trading system" when it hits the NIC, or
>>when it hits the application executing the trade? If the SEC calls
>>for when it hits the NIC, then none of what is done today is really
>>accurate/correct and one would need to start using NIC HW
>>timestamps, synchronized with the host and the other NICs in the
>>system no?
>
>
> The SEC isn't mandating anything here, stop framing it that way :-)

Must be my DC upbringing. I figured that if the logic wasn't 100%
concrete a US Federal Bureaucracy must be involved :)

> People simply won't trade with a firm if they find out that trades
> there are executed out of order.
>
> They are simply trying to make things as fair as possible.

But that is the very crux of the question - exactly where is "in order"
to be determined? Is it supposed to be arrival time at the NIC HW,
initial notice by the driver, or initial notice by the trading application?

Given that there are no guarantees that a packet arriving on NIC 1 and
timestamped either by the NIC HW or the driver will actually hit the
application before a packet arriving on NIC2, just how long are these
financial services applications going to wait around before executing
the trade carried in the packet arriving on NIC1?

rick jones

2008-08-28 20:47:20

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

From: Rick Jones <[email protected]>
Date: Thu, 28 Aug 2008 13:45:40 -0700

> Given that there are no guarantees that a packet arriving on NIC 1
> and timestamped either by the NIC HW or the driver will actually hit
> the application before a packet arriving on NIC2, just how long are
> these financial services applications going to wait around before
> executing the trade carried in the packet arriving on NIC1?

I have no idea.

They also care about trade processing latency btw.

2008-08-29 15:22:34

by Joe Malicki

[permalink] [raw]

Subject: Re: loaded router, excessive getnstimeofday in oprofile

----- "Andi Kleen" <[email protected]> wrote:
> > I've worked with systems where SO_TIMESTAMP has been used for
> > H.323 videoconferencing systems to synchronize audio and video
>
> But didn't you really want a "end2end" time stamp in this case,
> as in really at the end of all kernel/hardware queues on your side.

No.

That adds variance, and packets aren't comparable because they may
suffer different kernel/hardware delays.

The goal is to approximate original sendtime when the application-level
timestamps are unreliable. The more queueing delays that can be
taken out of the timestamp, the better.

> A packet roughly travels this way on a normal NIC before it hits
> recvmsg()
>
> wire -> NIC on die buffers -> NIC RX ring -> interrupt handler ->
> NAPI or per CPU queue -> softirq socket lookup -> socket queue ->
> recvmsg
>
> These all do their own queuing and all queues can add delays
> depending
> on the load. Right now SO_TIMESTAMP is in the interrupt handler,
> but it's just an arbitrary position in a multitude of queues.
>

If it could be even earlier, it would be better.

> For video conferencing (or e.g. in general if you implement a
> retransmit
> timeout in user space) scheduling delays on the local box
> surely need to be taken into account too because they all add
> to the final timing of the packets on the wire.

Retransmit timeouts might be the one case where that is interesting.
But then what value does SO_TIMESTAMP have, since you could call
gettimeofday() immediately after receipt and thereby also include
application scheduling delays?

For videoconferencing, one wants to know when to display a packet
as compared to other packets.

> The queues inside the system are really part of the network
> too. In Linux, for example, the algorithms that size the TCP
> buffer space know that, specifically account for it,
> and reserve a local queue buffer.
>
> > where remote systems' timestamps on the protocol streams proved
> > to be inaccurate (based off of different, unsynchronized clocks).
>
> Yes, but why ignore local scheduling delays?

Because one would want to ignore even network scheduling delays
if possible... unfortunately in some instances it's not.

> >
> > I can't see any other realistic use of this, but trying to get
> > timestamps for quasi-realtime protocols may be an important use
> > case - and in that case, you want the time when it hits the
> > interface, NOT when it hits the socket.
>
> I think it's the other way round. Why would the real time
> protocol care when it hits some arbitrary queue in the network
> stack instead of the time when the application can really
> read the data?
>
> > What utility does the time of hitting the socket get you?
>
> SO_TIMESTAMP was originally invented for passive network
> monitoring as in tcpdump (for which PACKET sockets were designed
> originally, DHCP is really just abusing them imho). There it makes
> some sense to do the time stamp as near on the wire as possible
> but really a hardware time stamp would be better because
> it is even nearer. But for anything that does end2end it's
> the wrong semantics anyways because ignoring local queueing
> delays would be just a bug, and SO_TIMESTAMP ignores them currently.
>
> -Andi
>
> --
> [email protected]

Why would you want to do end-to-end with SO_TIMESTAMP, vs. gettimeofday
after recv?
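
To make the two options in that question concrete, here is a minimal Python sketch, assuming Linux, that asks the kernel for SO_TIMESTAMP on a socket and also takes an application-level timestamp right after the receive call returns. The helper names (parse_timeval, enable_kernel_timestamps, recv_with_both_timestamps) are illustrative only. The fallback value 29 for SO_TIMESTAMP and the two-native-longs layout of struct timeval are assumptions that hold on common 64-bit Linux systems and may not hold elsewhere.

```python
import socket
import struct
import time

# Assumption: Linux numeric value from asm-generic/socket.h, used only when
# the socket module does not export the constant itself.
SO_TIMESTAMP = getattr(socket, "SO_TIMESTAMP", 29)

# Assumption: struct timeval is two native longs (seconds, microseconds).
TIMEVAL_FMT = "ll"
TIMEVAL_SIZE = struct.calcsize(TIMEVAL_FMT)


def parse_timeval(cdata):
    """Convert a packed struct timeval into seconds as a float."""
    sec, usec = struct.unpack(TIMEVAL_FMT, cdata[:TIMEVAL_SIZE])
    return sec + usec / 1e6


def enable_kernel_timestamps(sock):
    """Ask the kernel to attach a receive timestamp to each datagram."""
    sock.setsockopt(socket.SOL_SOCKET, SO_TIMESTAMP, 1)


def recv_with_both_timestamps(sock, bufsize=2048):
    """Return (data, kernel_time, app_time) for one received datagram."""
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_SPACE(TIMEVAL_SIZE)
    )
    # Taken after the receive call returns, so it also includes socket-queue
    # and application scheduling delay.
    app_time = time.time()
    kernel_time = None
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == SO_TIMESTAMP:
            kernel_time = parse_timeval(cdata)
    return data, kernel_time, app_time
```

The difference app_time - kernel_time for a packet is roughly the local queueing and scheduling delay that the two sides of this thread disagree about including.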

2008-08-29 15:28:18

Subject: Re: loaded router, excessive getnstimeofday in oprofile

> That adds variance, and packets aren't comparable because they may
> suffer different kernel/hardware delays.

And there are no "different kernel/hardware delays" in the network?

If your RTT measurement method cannot handle some variance (using
standard sampling and data smoothing techniques similar to TCP) then it
just needs to be fixed.
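
As a rough illustration of the TCP-style smoothing referred to above, the sketch below applies the exponentially weighted estimator that TCP uses for its retransmission timer (RFC 6298: gains of 1/8 and 1/4, first sample initializing the estimate). The function name smooth_samples is made up for this example, and applying the estimator to arbitrary delay samples rather than TCP RTTs is an assumption.

```python
def smooth_samples(samples, alpha=0.125, beta=0.25):
    """Return (smoothed, variation) after feeding every sample in order.

    Follows the RFC 6298 update rules: the first sample sets the smoothed
    value to the sample and the variation to half of it; later samples
    update the variation first, then the smoothed value.
    """
    smoothed = None
    variation = None
    for sample in samples:
        if smoothed is None:
            smoothed = sample
            variation = sample / 2
        else:
            variation = (1 - beta) * variation + beta * abs(smoothed - sample)
            smoothed = (1 - alpha) * smoothed + alpha * sample
    return smoothed, variation
```

A single outlier moves the smoothed value by only one eighth of its distance from the current estimate, which is how such a filter tolerates variance in individual measurements.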

Besides, measuring in the interrupt handler doesn't protect you
against local variance anyway, because the interrupt timing has variability
too (e.g. due to irq-off regions, interrupt mitigation, or
higher-priority interrupts).

> > Yes, but why ignore local scheduling delays?
>
> Because one would want to ignore even network scheduling delays
> if possible... unfortunately in some instances it's not.

The local delays add to the user experience too.
It's unclear why you want to ignore those.

-Andi

--
[email protected]

2008-08-29 15:43:56

by Joe Malicki

Subject: Re: loaded router, excessive getnstimeofday in oprofile

Joe Malicki
Software Engineer
MetaCarta, Inc.

----- "Andi Kleen" <[email protected]> wrote:
> > That adds variance, and packets aren't comparable because they may
> > suffer different kernel/hardware delays.
>
> And there are no "different kernel/hardware delays" in the network?
>
> If your RTT measurement method cannot handle some variance (using
> standard sampling and data smoothing techniques similar to TCP) then
> it
> just needs to be fixed.

No one's measuring RTT... whatever made you think that?

I should explain the application of SO_TIMESTAMP better.

Video camera -> Video jack -> Digitization -> Compression ->
Packetization -> NIC -> Ethernet -> NIC -> Interrupt Handler -> Queue -> Application

Microphone -> MIC jack -> Digitization -> Compression ->
Packetization -> NIC -> Ethernet -> NIC -> Interrupt Handler -> Queue -> Application

One wants to know the original time sound and light waves hit the camera
and microphone, because one wants to know when they should hit the soundcard and
video on the other end (i.e. any delays should be synchronized) but one only has control
over the receiving system. There are timestamps at the application level for this...
unfortunately, many implementations in the real world have independent clocks that skew
relative to each other, with little correction on the sending system.

Yeah, that's broken, but one has to be liberal in what one accepts from popular products.

One way to mitigate the skew between the clocks is to take measurements on the receiving
host, which you do control, and compare the average skew between the two streams and
correct for it. Interrupt handler time has variance, but it's less than application-level
time, so it's a better, more reliable estimator.
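
One possible reading of that approach, as a minimal Python sketch: for each stream, average the difference between the receiver-side arrival timestamp (for example from SO_TIMESTAMP) and the sender's media timestamp, then compare the two averages. The function names and the (sender_ts, recv_ts) tuple layout are hypothetical, and the sketch assumes both streams see a similar average path delay, so that the remaining difference can be attributed to the sender's clocks.

```python
from statistics import mean


def mean_offset(packets):
    """Average of (receive time - sender timestamp) over one stream.

    packets: iterable of (sender_ts, recv_ts) pairs in the same time unit.
    """
    return mean(recv_ts - sender_ts for sender_ts, recv_ts in packets)


def inter_stream_skew(audio_packets, video_packets):
    """Estimate how far the audio sender clock is offset from the video one.

    A positive result means audio timestamps look early relative to video
    when both are judged against the receiver's clock.
    """
    return mean_offset(audio_packets) - mean_offset(video_packets)
```

The less jitter in recv_ts, the fewer packets are needed before the average settles, which is the argument for taking the timestamp as early as possible on the receiving host.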

> Besides measuring in the interrupt handler doesn't protect you
> against local variances anyways because the interrupt timing has
> variability
> (e.g due to irq off regions or due to interrupt mitigation or
> higher priority interrupts) too
>
True, but occasionally it's the best approximation to original send time.

> > > Yes, but why ignore local scheduling delays?
> >
> > Because one would want to ignore even network scheduling delays
> > if possible... unfortunately in some instances it's not.
>
> The local delays add to the user experience too.
> It's unclear why you want to ignore those.
>
> -Andi

You don't want to ignore them, you want to compensate for them
by getting an earlier timestamp.

2008-08-29 15:43:41

by Stephen Hemminger

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Thu, 28 Aug 2008 22:55:29 +0300
Denys Fedoryshchenko <[email protected]> wrote:

> On Thursday 28 August 2008, Eric Dumazet wrote:
> > OK, please try oprofile with call graph analysis.
> I did already. Most programs (except ripd/zebra) can be killed, and
> even when I kill them it changes almost nothing.
>
> These seem to be the heavy things causing instability:
>
> 1) HTB (resolution can be lowered to improve performance, I will try Jarek's
> patch soon)

If you are doing HTB, it also calls the clock to get timing information.
Each packet dequeue in HTB calls psched_get_time(), and that becomes
another call to the nanosecond real-time clock.

If your embedded processor has a really expensive clock, you probably
just want to provide an alternative, cheaper time source with less
resolution.

2008-08-29 20:44:51

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Fri, Aug 29, 2008 at 11:21:26AM -0400, Joe Malicki ([email protected]) wrote:
> > But didn't you really want a "end2end" time stamp in this case,
> > as in really at the end of all kernel/hardware queues on your side.
>
> No.
>
> That adds variance, and packets aren't comparable because they may
> suffer different kernel/hardware delays.
>
> The goal is to approximate original sendtime when the application-level
> timestamps are unreliable. The more queueing delays that can be
> taken out of the timestamp, the better.

Just a note from someone who really developed real-time audio and
video processing engines: _no_one_ really relies on the timestamps
attached to the received packet. By no one I really mean NO ONE. It is
just wrong, broken and stupid. There are so many queues in the data
path that it just cannot be reliable by definition.

Instead, the sending path encapsulates a packet sequence number into the
appropriate packet header (such as, and in most cases only, the RTP header),
and the receiving path just multiplies this sequence number by the compression
rate and size of the packet. These numbers differ from design to design,
but the overall approach is the same: no one really depends on the hardware
timestamp attached on the receiver; only the sender's data is reliable.
If someone depends on it, it is broken and just waits for the
appropriate attack vector to inject broken data into the dataflow (such
users do not use tcp, since it "introduces unneeded delays" or similar
marketing and completely untested things).
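
A minimal Python sketch of the sequence-number approach described above, under stated assumptions: each packet carries a fixed duration of media (packet_duration is a hypothetical parameter standing in for the "compression rate and size of the packet" product), and the sequence number is 16 bits wide as in the RTP header, so it has to be unwrapped before multiplying. The function names are illustrative.

```python
SEQ_MODULUS = 1 << 16  # RTP sequence numbers are 16 bits wide


def unwrap_sequence(prev_extended, seq):
    """Extend a 16-bit sequence number using the previous extended value.

    Steps of less than half the sequence space forward are treated as
    newer packets; anything else is treated as a reordered, older packet.
    """
    delta = (seq - prev_extended) % SEQ_MODULUS
    if delta >= SEQ_MODULUS // 2:
        delta -= SEQ_MODULUS
    return prev_extended + delta


def media_time(extended_seq, first_seq, packet_duration):
    """Playout position of a packet relative to the first packet seen."""
    return (extended_seq - first_seq) * packet_duration
```

With 20 ms of audio per packet, for example, media_time(ext, first, 20) gives the playout offset in milliseconds without consulting any receive timestamp.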

So this overall discussion of the timestamp option is meaningless: we
just bloody can not change it as is, since so many applications really
depend on it (even if they should not).

We can force lower resolution in terms of xtime or similar counter,
which will be default timestamp in case of some syscall (turned off by
default), but since so far no one sent a patch, this looks very subtle.

--
Evgeniy Polyakov

2008-09-01 02:39:57

by Valdis Klētnieks

Subject: Re: loaded router, excessive getnstimeofday in oprofile

On Thu, 28 Aug 2008 11:00:48 PDT, Rick Jones said:

> Earlier in the thread mention was made of financial services types. If
> someone has knowledge of the (probably) arcane rules under which they
> must operate it would be great to hear more. Does some entity like the
> SEC (Securities and Exchange Commission in the United States) mandate
> some sort of timestamp for when the trading request "arrives at the
> trading system" and do they define what "arriving at the trading system"
> means?

As a totally pragmatic point - if the market is in such free-fall that it
matters that your order got in a ten-thousandth of a second after somebody
else's, instead of before, you probably lost at least 5 to 10 times as much
during the time it took somebody to type the damn order in and hit enter.

At that point, you have *bigger* things to worry about.


2008-09-01 03:52:07