2002-07-12 19:10:02

by Matt Stegman

[permalink] [raw]
Subject: 64 bit netdev stats counter

Hello,

I have a linux fileserver that wraps around the 'RX bytes' and 'TX bytes'
counters once every couple of days. Since these variables are "unsigned
long" in the kernel (32 bits on x86) it wraps at four gigabytes. I
patched a kernel (2.4.19-pre10-ac2) to make these two variables, along
with RX and TX packet counters, "unsigned long long" which is 64 bits on
x86.

This seems to work OK. The "ifconfig" command (from RedHat net-tools-1.60
RPM) already defines these values as "unsigned long long", so I don't have
any problems with ifconfig reporting weird numbers.

I did notice that different classes of network devices seem to use
different structs, so I guess this would only work for ethernet devices.

This is the first hacking of any kind I've done on the kernel. Any
comments would be appreciated.



The patch I used:
diff -urN linux-orig/include/linux/netdevice.h linux/include/linux/netdevice.h
--- linux-orig/include/linux/netdevice.h Tue Jul 9 13:28:43 2002
+++ linux/include/linux/netdevice.h Tue Jul 9 13:48:05 2002
@@ -96,10 +96,10 @@

struct net_device_stats
{
- unsigned long rx_packets; /* total packets received */
- unsigned long tx_packets; /* total packets transmitted */
- unsigned long rx_bytes; /* total bytes received */
- unsigned long tx_bytes; /* total bytes transmitted */
+ unsigned long long rx_packets; /* total packets received */
+ unsigned long long tx_packets; /* total packets transmitted */
+ unsigned long long rx_bytes; /* total bytes received */
+ unsigned long long tx_bytes; /* total bytes transmitted */
unsigned long rx_errors; /* bad packets received */
unsigned long tx_errors; /* packet transmit problems */
unsigned long rx_dropped; /* no space in linux buffers */
diff -urN linux-orig/net/core/dev.c linux/net/core/dev.c
--- linux-orig/net/core/dev.c Tue Jul 9 13:28:44 2002
+++ linux/net/core/dev.c Tue Jul 9 13:45:03 2002
@@ -1689,7 +1689,7 @@
int size;

if (stats)
- size = sprintf(buffer, "%6s:%8lu %7lu %4lu %4lu %4lu %5lu %10lu %9lu %8lu %7lu %4lu %4lu %4lu %5lu %7lu %10lu\n",
+ size = sprintf(buffer, "%6s:%8llu %7llu %4lu %4lu %4lu %5lu %10lu %9lu %8llu %7llu %4lu %4lu %4lu %5lu %7lu %10lu\n",
dev->name,
stats->rx_bytes,
stats->rx_packets, stats->rx_errors,

$ cat /proc/net/dev
Inter-| Receive | Transmit
face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
lo: 25660 71 0 0 0 0 0 0 25660 71 0 0 0 0 0 0
eth0:7208362840 12115561 0 0 0 0 0 0 3171323568 5493455 0 0 0 0 0 0
$ ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet addr:a.b.c.d Bcast:a.b.e.f Mask:255.255.248.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:12115561 errors:0 dropped:0 overruns:0 frame:0
TX packets:5493455 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:7208362840 (6874.4 Mb) TX bytes:3171323568 (3024.4 Mb)
Interrupt:5 Base address:0x2000


--
-Matt Stegman



2002-07-12 21:59:21

by Matt Stegman

[permalink] [raw]
Subject: Re: 64 bit netdev stats counter

On 12 Jul 2002, Stephen Hemminger wrote:

> 64 bit values are not atomic so on x86 there will be glitches if this
> ever wraps over on an SMP machine. One other engineer is already
> adressing this for inode values like size; but this would introduce the
> same problem for network stuff.

One other solution I thought of would be to have an rx_bytes_wrap counter
in the struct.

struct net_device_stats {
...
unsigned long rx_bytes;
unsigned long rx_bytes_wrap;
unsigned long tx_bytes;
unsigned long tx_bytes_wrap;
...
}

... and then, everywhere we add to rx_bytes (or tx_bytes):

stats->rx_bytes += pkt_length;
if (stats->rx_bytes <= pkt_length) stats->rx_bytes_wrap++;

I suppose this might also be more backwards compatible - if other code I
don't know about expects the rx_bytes to be a long, this would keep it so.
Also, if gcc has real problems with doing 64-bit math on a 32-bit
processor, this keeps everything in 32-bits. I could then make a local
"unsigned long long" variable in the code that prints /proc/net/dev, and
in ifconfig.

unsigned long long rx_total_bytes = stats->rx_bytes * stats->rx_bytes_wrap;
sprintf(buf, "RX bytes:%llu", stats->rx_total_bytes);

--
-Matt Stegman




2002-07-12 22:04:42

by David Miller

[permalink] [raw]
Subject: Re: 64 bit netdev stats counter

From: Matt Stegman <[email protected]>
Date: Fri, 12 Jul 2002 17:02:07 -0500 (CDT)

On 12 Jul 2002, Stephen Hemminger wrote:

> 64 bit values are not atomic so on x86 there will be glitches if this
> ever wraps over on an SMP machine. One other engineer is already
> adressing this for inode values like size; but this would introduce the
> same problem for network stuff.

One other solution I thought of would be to have an rx_bytes_wrap counter
in the struct.

32-bit values aren't atomic either, what is the issue?
We don't use atomic_t ops on these counters so they aren't
guarenteed in any way right now even. GCC is going to
output "incl MEM" or similar for net_stats->counter++, since
it lacks the 'lock;' prefix it is not atomic.

2002-07-12 22:09:48

by Alan

[permalink] [raw]
Subject: Re: 64 bit netdev stats counter

On Fri, 2002-07-12 at 22:58, David S. Miller wrote:
> 32-bit values aren't atomic either, what is the issue?
> We don't use atomic_t ops on these counters so they aren't
> guarenteed in any way right now even. GCC is going to
> output "incl MEM" or similar for net_stats->counter++, since
> it lacks the 'lock;' prefix it is not atomic.

The behaviour is quite different though. On a 32bit counter the worst we
do is lose a few counts. On a 64bit one on 32bit cpus its quite likely
gcc will output

increment low 32bit
if zero
increment high

Which means we can rapidly get 2^32 out of sync

2002-07-12 22:14:05

by David Miller

[permalink] [raw]
Subject: Re: 64 bit netdev stats counter

From: Alan Cox <[email protected]>
Date: 13 Jul 2002 00:20:53 +0100

gcc will output

increment low 32bit
if zero
increment high

Which means we can rapidly get 2^32 out of sync

True and this equals the "fix" suggested C code involving
two 32-bit counters that the original author posted :-)

So this makes both cases equally wrong.

2002-07-12 22:26:19

by Andrew Morton

[permalink] [raw]
Subject: Re: 64 bit netdev stats counter

"David S. Miller" wrote:
>
> From: Alan Cox <[email protected]>
> Date: 13 Jul 2002 00:20:53 +0100
>
> gcc will output
>
> increment low 32bit
> if zero
> increment high
>
> Which means we can rapidly get 2^32 out of sync
>
> True and this equals the "fix" suggested C code involving
> two 32-bit counters that the original author posted :-)
>

Could you make the counters per-cpu and only add them all up
in the rare case where someone wants to read the stats?

And then change all the drivers ;)

-

2002-07-12 22:29:10

by Chris Friesen

[permalink] [raw]
Subject: Re: 64 bit netdev stats counter

Alan Cox wrote:
>
> On Fri, 2002-07-12 at 22:58, David S. Miller wrote:
> > 32-bit values aren't atomic either, what is the issue?
> > We don't use atomic_t ops on these counters so they aren't
> > guarenteed in any way right now even. GCC is going to
> > output "incl MEM" or similar for net_stats->counter++, since
> > it lacks the 'lock;' prefix it is not atomic.
>
> The behaviour is quite different though. On a 32bit counter the worst we
> do is lose a few counts. On a 64bit one on 32bit cpus its quite likely
> gcc will output
>
> increment low 32bit
> if zero
> increment high
>
> Which means we can rapidly get 2^32 out of sync

Isn't this the same as 32-bit counters on a machine that doesn't do atomic
32-bit ops? Although in that case you could only be 2^16 off...

Chris

--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]

2002-07-12 22:35:11

by David Miller

[permalink] [raw]
Subject: Re: 64 bit netdev stats counter

From: Andrew Morton <[email protected]>
Date: Fri, 12 Jul 2002 15:27:09 -0700

Could you make the counters per-cpu and only add them all up
in the rare case where someone wants to read the stats?

And then change all the drivers ;)

Since spinlocks are held %99 of the time when these counters are
bumped, I'd like to suggest another more sane and space optimized
solution :-)

2002-07-13 00:50:47

by Alexey Kuznetsov

[permalink] [raw]
Subject: Re: 64 bit netdev stats counter

Hello!

> 32-bit values aren't atomic either, what is the issue?
...
> output "incl MEM" or similar for net_stats->counter++, since
> it lacks the 'lock;' prefix it is not atomic.

The issue is that this does not matter, all the updates to counters
are serialized by driver logic in any case.

The counters were atomic wrt _read_ i.e. read used to produce valid result
rather than a garbage. We have discussed this some time ago, someone
proposed to use pair of 32bit numbers and prohibiting direct read,
fetching the result with a macro sort of

do {
result_lo = lo;
result_hi = hi;
} while (lo != result_lo);

or something similar. Well, plus some barriers when/if needed.

Honestly, I do not feel any enthusiasm about doing this in kernel.
Some small bit of useless work each packet, some minor waste of memory,
some minor crap in code... All this is not essential, but does not cause any
enthisiasm yet. :-)

Alexey

2002-07-13 01:07:32

by Alan

[permalink] [raw]
Subject: Re: 64 bit netdev stats counter

On Fri, 2002-07-12 at 23:31, Chris Friesen wrote:
>
> Isn't this the same as 32-bit counters on a machine that doesn't do atomic
> 32-bit ops? Although in that case you could only be 2^16 off...

Yes but we don't happen to have any of those I care about 8)