The rolling-over of /proc/net/dev fields annoys me.
I read a couple of threads about the issue and saw a lot of whimpering
about how painful it would be to implement locking in place of the
atomicity that 32-bit counters get for free.
Alan Cox pointed out in one of them that accurate info could be
collected through "the firewalling facilities", which I take to mean
the ipt_counters structure. The caveat is that it only provides
packet and byte counts.
One alternative to throwing locks around everything accessing those
fields is to update a 64-bit counter asynchronously. Has this been
considered? It would entail atomically executing
total_rx_bytes += rx_bytes;
rx_bytes = 0;
and merely ensuring that rx_bytes does not roll over between calls.
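A minimal userspace sketch of the fold described above, using C11 atomics rather than any kernel API. The names rx_stats, rx_account and rx_fold are hypothetical, and the sketch assumes total_rx_bytes is only ever touched by whoever calls rx_fold (for example under a lock that the packet path never takes):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-interface statistics; not a real kernel structure. */
struct rx_stats {
    _Atomic uint32_t rx_bytes;  /* 32-bit counter bumped on the packet path */
    uint64_t total_rx_bytes;    /* 64-bit total, touched only by rx_fold() */
};

/* Packet path: a single 32-bit atomic add, no lock. */
void rx_account(struct rx_stats *s, uint32_t len)
{
    atomic_fetch_add(&s->rx_bytes, len);
}

/*
 * Periodic fold: atomically fetch the 32-bit counter and reset it to 0,
 * then add what was fetched to the 64-bit total. Assumption: the caller
 * serialises calls to rx_fold(), and calls it often enough that rx_bytes
 * cannot wrap between two folds.
 */
uint64_t rx_fold(struct rx_stats *s)
{
    uint32_t pending = atomic_exchange(&s->rx_bytes, 0);

    s->total_rx_bytes += pending;
    return s->total_rx_bytes;
}
```

The exchange makes "read and zero" one indivisible step on the 32-bit value, so bytes accounted between the read and the reset are not lost; the 64-bit addition itself is not atomic and relies on the serialisation assumed above.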
On Sun, Feb 16, 2003 at 04:16:16PM -0600, Mark J Roberts wrote:
> The rolling-over of /proc/net/dev fields annoys me.
Why?
How often does it happen?
> total_rx_bytes += rx_bytes;
if lval is 64-bit, then this cannot be done reliably on all
architectures
--cw
Chris Wedgwood:
> How often does it happen?
When the windows box behind my NAT is using all of my 640kbit/sec
downstream to download movies, it takes a little over 14 hours to
download four gigabytes and roll over the byte counter. This means
ifconfig is mostly useless for getting an idea of how much I've
downloaded, which is something very useful to me.
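As a rough check of that figure, the time for a 32-bit byte counter to wrap at a given line rate can be computed as below. The function name is hypothetical, and the sketch assumes "kbit" means 1000 bits and that the link is saturated the whole time:

```c
/*
 * Seconds until an unsigned 32-bit byte counter wraps, for a link
 * running flat out at bits_per_second.
 */
double seconds_to_wrap32(double bits_per_second)
{
    const double counter_range = 4294967296.0;      /* 2^32 bytes */
    double bytes_per_second = bits_per_second / 8.0;

    return counter_range / bytes_per_second;
}
```

For 640 kbit/sec this gives roughly 53,700 seconds, a little under 15 hours.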
> > total_rx_bytes += rx_bytes;
>
> if lval is 64-bit, then this cannot be done reliably on all
> architectures
I'm not sure why. I realize that x86 can't do atomic 64-bit
operations, but what I propose is to leave the 32-bit rx_bytes code
the way it is, and just have some heuristic for updating the 64-bit
value every so often, which can be done under a lock, so there would
be no opportunity for races to corrupt the counter. (This is also an
optimization since there needn't be any locks in the actual packet
handling code.)
But I admit I'm no expert programmer, and I might be suggesting
nonsense. In any case, the bug is real, the ifconfig output is
misleading, and I think it should be fixed one way or another.
Mark J Roberts wrote:
> Chris Wedgwood:
>>>total_rx_bytes += rx_bytes;
>>
>>if lval is 64-bit, then this cannot be done reliably on all
>>architectures
>
>
> I'm not sure why. I realize that x86 can't do atomic 64-bit
> operations, but what I propose is to leave the 32-bit rx_bytes code
> the way it is, and just have some heuristic for updating the 64-bit
> value every so often, which can be done under a lock, so there would
> be no opportunity for races to corrupt the counter. (This is also an
> optimization since there needn't be any locks in the actual packet
> handling code.)
I was one of the ones who was interested in making the statistics
64-bit, and adding locking to do it right. The solution finally
appeared, many months ago:
The counters don't need to be 64-bit, because it is trivially possible
for userspace to track the statistics, and to simply use the difference
between two samples as the increment used in calculating whatever
numbers you wish -- 64-bit SNMP MIB statistics were what I was
interested in. Wrapping is trivially handled by standard unsigned int
arithmetic, among other methods.
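A small sketch of that userspace approach, assuming the kernel counter is a 32-bit unsigned value and that samples are taken often enough that it wraps at most once between two of them. The names counter64 and counter64_update are made up for illustration:

```c
#include <stdint.h>

/* Hypothetical userspace accumulator for one wrapping 32-bit counter. */
struct counter64 {
    uint32_t last;   /* previous raw sample */
    uint64_t total;  /* bytes accumulated since the first sample */
    int primed;      /* zero until the first sample has been seen */
};

/*
 * Feed in a new raw sample and return the running 64-bit total.
 * Unsigned subtraction is performed modulo 2^32, so a single wrap
 * between samples still yields the correct increment.
 */
uint64_t counter64_update(struct counter64 *c, uint32_t sample)
{
    if (c->primed)
        c->total += (uint32_t)(sample - c->last);
    c->last = sample;
    c->primed = 1;
    return c->total;
}
```

The first sample only establishes a baseline, so the total counts from when tracking started, not from boot.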
If you really want the raw data, then use ethtool's NIC-specific stats
facility, to retrieve raw statistics directly from the NIC. [this of
course requires driver modifications, but they are easy on modern NICs]
Jeff
On Sun, Feb 16, 2003 at 08:46:05PM -0600, Mark J Roberts wrote:
> When the windows box behind my NAT is using all of my 640kbit/sec
> downstream to download movies, it takes a little over 14 hours to
> download four gigabytes and roll over the byte counter.
Therefore userspace needs to check the counters more often... say every
30s or so and detect rollover. Most of this could be simply
encapsulated in a library and made transparent to the upper layers.
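One piece such a library would need is a parser for the per-interface lines of /proc/net/dev. The sketch below is only an illustration: parse_netdev_line is a hypothetical name, and it assumes the usual layout of eight receive fields followed by the transmit fields, with the byte count first in each group:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one interface line of /proc/net/dev, e.g.
 *   "  eth0: 1000 10 0 0 0 0 0 0 2000 20 0 0 0 0 0 0"
 * Returns 0 on success, -1 if the line does not look like an
 * interface line (the two header lines contain no colon).
 */
int parse_netdev_line(const char *line, char *ifname, size_t ifname_len,
                      unsigned long long *rx_bytes,
                      unsigned long long *tx_bytes)
{
    const char *colon = strchr(line, ':');
    size_t n;

    if (colon == NULL)
        return -1;
    while (*line == ' ')            /* skip leading padding */
        line++;
    n = (size_t)(colon - line);
    if (n == 0 || n >= ifname_len)
        return -1;
    memcpy(ifname, line, n);
    ifname[n] = '\0';

    /* Field 1 is rx bytes; skip 7 more rx fields; field 9 is tx bytes. */
    if (sscanf(colon + 1,
               "%llu %*llu %*llu %*llu %*llu %*llu %*llu %*llu %llu",
               rx_bytes, tx_bytes) != 2)
        return -1;
    return 0;
}
```

A poller would call this for each line every 30 seconds or so and feed the byte counts into whatever rollover handling it uses.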
--cw
On Sun, Feb 16, 2003 at 08:21:56PM -0800, Chris Wedgwood wrote:
> On Sun, Feb 16, 2003 at 08:46:05PM -0600, Mark J Roberts wrote:
> > When the windows box behind my NAT is using all of my 640kbit/sec
> > downstream to download movies, it takes a little over 14 hours to
> > download four gigabytes and roll over the byte counter.
>
> Therefore userspace needs to check the counters more often... say every
> 30s or so and detect rollover. Most of this could be simply
> encapsulated in a library and made transparent to the upper layers.
Some of my colleagues once complained that at full tilt
the fiber-channel fabric overflowed its SNMP bit counters
every 2 seconds.
"We need to poll more rapidly than the poller can."
The SNMP pollers do handle 32-bit unsigned overflow gracefully,
they just need to get snapshots in increments a bit under 2G...
(Hmm.. perhaps I remember that wrong, a bit under 4G should be ok.)
> --cw
/Matti Aarnio
Don't forget that 10G ethernet is starting to leak out of the labs into
the real world. I don't know of any Linux support yet, but it will come
and then you will be able to overflow 32-bit bit counters multiple times
per second.
David Lang
On Mon, Feb 17, 2003 at 08:58:40PM -0800, David Lang wrote:
> don't forget that 10G ethernet is starting to leak out of the labs into
> the real world. I don't know of any linux support yet, but it will come
> and then you will be able to overflow 32bit bitcounters multiple times per
> second.
A machine capable of sustaining the full data rate of 10G ether needs
around 1.3 GB/sec of I/O bandwidth in each direction for the card.
64-bit PCI-X 533 gives at most 4.3 GB/sec, and in reality one can't
quite get the theoretical maximum out of the hardware.
One full-speed full-duplex 10G ether is barely doable with that new
version of PCI-X.
A giga-ether interface (or two) can be done in current generation
hardware, and even some useful things can be done to fill the pipe.
I suppose that by that time we will also be using 64-bit processors,
on which incrementing 64-bit counter variables uninterruptibly is
trivial.
I leave it as a thought exercise why a non-irq-blocking spinlock
is not a good idea for ensuring data update monotonicity.
There are algorithmic ways to handle the interruptible two-fetch
consistency problem on current 32-bit hardware. None of those
are being used, as far as I know:
irq-context:
    add to less-significant-long
    add carry to more-significant-long
reader context:
    read less-significant-long into ax
    read more-significant-long into bx
    compare less-significant-long with ax
    if they differ, start over
    compare more-significant-long with bx
    if they differ, start over
    return ax,bx
That way the reader need not worry about being interrupted,
but the implementation is, likely, assembly.
No spinlocks, no irq-blocking...
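For illustration, the scheme above could also be sketched in C rather than assembly. This is only a sketch under stated assumptions: the names split_counter, split_counter_add and split_counter_read are hypothetical, the writer is assumed to run in interrupt context on the same CPU as the reader (so it always runs to completion from the reader's point of view), and volatile is used only to force the compiler to re-read memory. It provides no SMP memory ordering.

```c
#include <stdint.h>

/* Hypothetical 64-bit counter kept as two 32-bit halves. */
struct split_counter {
    volatile uint32_t lo;   /* less-significant long */
    volatile uint32_t hi;   /* more-significant long */
};

/* Writer (irq context): add to the low half, carry into the high half. */
void split_counter_add(struct split_counter *c, uint32_t n)
{
    uint32_t old_lo = c->lo;
    uint32_t new_lo = old_lo + n;

    c->lo = new_lo;
    if (new_lo < old_lo)        /* unsigned wrap means a carry occurred */
        c->hi = c->hi + 1;
}

/*
 * Reader: fetch both halves, then re-read each and retry if either
 * changed, i.e. if an interrupt updated the counter in between.
 */
uint64_t split_counter_read(const struct split_counter *c)
{
    uint32_t lo, hi;

    do {
        lo = c->lo;
        hi = c->hi;
    } while (lo != c->lo || hi != c->hi);

    return ((uint64_t)hi << 32) | lo;
}
```

The reader never blocks the writer; it simply loops until it sees the same two halves twice in a row.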
> David Lang
/Matti Aarnio