2008-06-01 20:46:35

by Bill Fink

Subject: Re: [PATCH] net: add ability to clear stats via ethtool - e1000/pcnet32

On Sun, 1 Jun 2008, Ben Hutchings wrote:

> Bill Fink wrote:
> <snip>
> > Yes, every individual Linux network administrator can re-create the
> > wheel by devising their own scripts, but it makes much more sense
> > to me to implement a simple general kernel mechanism once that could
> > be used generically, than to have hundreds (or thousands) of Linux
> > network administrators each having to do it themselves (perhaps
> > multiple times if they have a variety of types of systems and types
> > of NICs).
>
> The ethtool interface is pretty generic, even if the names aren't.
> Here's some Python code I just knocked up which demonstrates how
> to get a set of named stats. It shouldn't be terribly hard to
> extend this to saving and subtracting stat sets.

I'm not sure what that proves. Your Python code just basically gives
the same info as running the "ethtool -S" command. But the question
is how does one devise a generic script or tool that doesn't require
any special knowledge of the specific NIC being used. For example,
here's the "ethtool -S" info for my myri10ge NIC:

[root@chance8 ~]# ethtool -S eth2
NIC statistics:
rx_packets: 53243864310
tx_packets: 112826823797
rx_bytes: 301727733072710
tx_bytes: 716648208451198
rx_errors: 0
tx_errors: 0
rx_dropped: 0
tx_dropped: 0
multicast: 0
collisions: 0
rx_length_errors: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_fifo_errors: 0
rx_missed_errors: 0
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_window_errors: 0
tx_boundary: 4096
WC: 1
irq: 8413
MSI: 1
read_dma_bw_MBs: 1398
write_dma_bw_MBs: 1613
read_write_dma_bw_MBs: 2711
serial_number: 287046
tx_pkt_start: 1157674101
tx_pkt_done: 1157674101
tx_req: 188226127
tx_done: 188226127
rx_small_cnt: 3009560676
rx_big_cnt: 1726230729
wake_queue: 57969440
stop_queue: 57969440
watchdog_resets: 0
tx_linearized: 0
link_changes: 8
link_up: 1
dropped_link_overflow: 0
dropped_link_error_or_filtered: 26584
dropped_multicast_filtered: 2190912
dropped_runt: 0
dropped_overrun: 0
dropped_no_small_buffer: 0
dropped_no_big_buffer: 0

How does one know which of these reported values are counter stats
that one wishes to zero/snapshot, and which are not?

Another issue that occurred to me is if multiple people are working
on troubleshooting a network problem, how do we ensure that they all
get a consistent view of the stats? If this is done via a kernel
mechanism then there isn't an issue. But if it's done via user space,
then you have to make sure that everyone zeros/snapshots the stats
at the same time.

Ideally, one should be able to do something like "ethtool -z ethX"
to zero/snapshot the driver stats, and then "ethtool -S ethX" to get
the stats since the last snapshot. You should be able to use the
same tool ("ethtool") to do all of this, and not some other special
tool or specially devised homegrown script. Why make users' lives
any more difficult than need be?

-Bill


2008-06-01 22:29:47

by Ben Hutchings

Subject: Re: [PATCH] net: add ability to clear stats via ethtool - e1000/pcnet32

Bill Fink wrote:
> On Sun, 1 Jun 2008, Ben Hutchings wrote:
>
> > Bill Fink wrote:
> > <snip>
> > > Yes, every individual Linux network administrator can re-create the
> > > wheel by devising their own scripts, but it makes much more sense
> > > to me to implement a simple general kernel mechanism once that could
> > > be used generically, than to have hundreds (or thousands) of Linux
> > > network administrators each having to do it themselves (perhaps
> > > multiple times if they have a variety of types of systems and types
> > > of NICs).
> >
> > The ethtool interface is pretty generic, even if the names aren't.
> > Here's some Python code I just knocked up which demonstrates how
> > to get a set of named stats. It shouldn't be terribly hard to
> > extend this to saving and subtracting stat sets.
>
> I'm not sure what that proves. Your Python code just basically gives
> the same info as running the "ethtool -S" command.

Yes, but in a form you can more easily manipulate.
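
Ben's Python code is not included in this excerpt. As a rough, hypothetical illustration of the idea (not his code), the sketch below parses `ethtool -S` text output into a name-to-value mapping. It assumes the `name: value` line format shown in the listing above and skips the `NIC statistics:` header; `parse_ethtool_stats` and `read_ethtool_stats` are illustrative names.

```python
import subprocess


def parse_ethtool_stats(text):
    """Parse `ethtool -S` output into a dict mapping stat name -> int.

    Assumes each statistic line looks like "name: value"; the
    "NIC statistics:" header and any non-integer value are skipped.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep:
            continue  # no "name: value" separator on this line
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            continue  # header line or non-numeric value
    return stats


def read_ethtool_stats(iface):
    """Run `ethtool -S IFACE` and parse its output (may need root)."""
    out = subprocess.run(["ethtool", "-S", iface], capture_output=True,
                         text=True, check=True).stdout
    return parse_ethtool_stats(out)
```

For example, `read_ethtool_stats("eth2")` would return a dict such as `{"rx_packets": ..., "irq": ..., ...}` that can be saved or subtracted like any other Python mapping.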

> But the question
> is how does one devise a generic script or tool that doesn't require
> any special knowledge of the specific NIC being used. For example,
> here's the "ethtool -S" info for my myri10ge NIC:
>
> [root@chance8 ~]# ethtool -S eth2
> NIC statistics:
[...]
> WC: 1
> irq: 8413
> MSI: 1
> read_dma_bw_MBs: 1398
> write_dma_bw_MBs: 1613
> read_write_dma_bw_MBs: 2711
> serial_number: 287046
[...]
> tx_linearized: 0
> link_changes: 8
> link_up: 1
[...]
>
> How does one know which of these reported values are counter stats
> that one wishes to zero/snapshot, and which are not?

Ah, I see, I didn't realise some drivers were abusing ethtool stats in
this way. But for the most part the differences will show up as zeroes.

If there's really a need for ethtool stats that aren't counters,
maybe the ethtool API should include flags to indicate which they are.

> Another issue that occurred to me is if multiple people are working
> on troubleshooting a network problem, how do we ensure that they all
> get a consistent view of the stats? If this is done via a kernel
> mechanism then there isn't an issue. But if it's done via user space,
> then you have to make sure that everyone zeros/snapshots the stats
> at the same time.
>
> Ideally, one should be able to do something like "ethtool -z ethX"
> to zero/snapshot the driver stats, and then "ethtool -S ethX" to get
> the stats since the last snapshot. You should be able to use the
> same tool ("ethtool") to do all of this, and not some other special
> tool or specially devised homegrown script. Why make users' lives
> any more difficult than need be?

No-one's stopping you from adding these options to ethtool. You could
have it save statistic sets in, say, /var/run/ethtool.
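
A minimal sketch of this suggestion, assuming stat sets are plain `{name: value}` dicts (for example, parsed from `ethtool -S` output) stored as JSON in one file per interface under /var/run/ethtool. The file layout, JSON format, and function names are assumptions for illustration, not an existing ethtool feature. Note that it treats every stat as a counter.

```python
import json
import os

# Directory suggested in the thread; one file per interface is an assumption.
SNAPSHOT_DIR = "/var/run/ethtool"


def save_snapshot(iface, stats, base=SNAPSHOT_DIR):
    """Save a {name: value} stat set as the new baseline for IFACE."""
    os.makedirs(base, exist_ok=True)
    path = os.path.join(base, iface)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(stats, f)
    os.replace(tmp, path)  # atomic rename: readers never see a partial file


def load_snapshot(iface, base=SNAPSHOT_DIR):
    """Return the saved baseline for IFACE, or {} if none exists."""
    try:
        with open(os.path.join(base, iface)) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def stats_since_snapshot(iface, current, base=SNAPSHOT_DIR):
    """Subtract the saved baseline from CURRENT, stat by stat.

    Every stat is treated as a counter; stats absent from the baseline
    are reported unchanged.
    """
    baseline = load_snapshot(iface, base)
    return {name: value - baseline.get(name, 0)
            for name, value in current.items()}
```

Because nothing distinguishes counters from other values here, a stat such as `irq` simply comes out as 0 after subtraction.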

Ben.

--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.

2008-06-02 03:56:21

by Bill Fink

Subject: Re: [PATCH] net: add ability to clear stats via ethtool - e1000/pcnet32

On Sun, 1 Jun 2008, Ben Hutchings wrote:

> Bill Fink wrote:
>
> > But the question
> > is how does one devise a generic script or tool that doesn't require
> > any special knowledge of the specific NIC being used. For example,
> > here's the "ethtool -S" info for my myri10ge NIC:
> >
> > [root@chance8 ~]# ethtool -S eth2
> > NIC statistics:
> [...]
> > WC: 1
> > irq: 8413
> > MSI: 1
> > read_dma_bw_MBs: 1398
> > write_dma_bw_MBs: 1613
> > read_write_dma_bw_MBs: 2711
> > serial_number: 287046
> [...]
> > tx_linearized: 0
> > link_changes: 8
> > link_up: 1
> [...]
> >
> > How does one know which of these reported values are counter stats
> > that one wishes to zero/snapshot, and which are not?
>
> Ah, I see, I didn't realise some drivers were abusing ethtool stats in
> this way. But for the most part the differences will show up as zeroes.

I'm not sure I would characterize that as abuse of the ethtool stats,
but rather just a different category of stats. And I wouldn't want
them to show up as zeros after doing a clear/snapshot of the counter
stats. The "ethtool -S" output should be completely normal afterward,
with just the counter stats being zeroed.

> If there's really a need for ethtool stats that aren't counters,
> maybe the ethtool API should include flags to indicate which they are.

That would be useful. Maybe some way of determining the type of an
ethtool stat such as COUNTER (perhaps with subtypes for signed versus
unsigned, 32-bit versus 64-bit), RUNTIME_INFO, etc.
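
To make the idea concrete, here is a minimal sketch of how a user-space tool could apply snapshot semantics if each stat carried a type. `StatType` and its members are hypothetical (the ethtool interface discussed here has no such flags), and the choice of which stats are 32-bit or 64-bit is purely illustrative. Counters are reported relative to the baseline, with modular arithmetic absorbing a single wrap of an unsigned counter. Runtime info such as `irq` or `link_up` passes through unchanged, so the output still looks like normal `ethtool -S` output.

```python
from enum import Enum


class StatType(Enum):
    # Hypothetical stat types; the value is the unsigned counter width in
    # bits, or None for non-counter runtime info.
    COUNTER_U32 = 32
    COUNTER_U64 = 64
    RUNTIME_INFO = None


def typed_stats_since_snapshot(current, baseline, types):
    """Apply zero/snapshot semantics to counter stats only.

    CURRENT and BASELINE map stat name -> value; TYPES maps stat name ->
    StatType. Stats with no known type are treated as runtime info.
    """
    result = {}
    for name, value in current.items():
        stat_type = types.get(name, StatType.RUNTIME_INFO)
        if stat_type is StatType.RUNTIME_INFO:
            result[name] = value  # e.g. irq, link_up: shown as-is
        else:
            # Unsigned delta; the modulo absorbs a single counter wrap.
            width = stat_type.value
            result[name] = (value - baseline.get(name, 0)) % (1 << width)
    return result
```

Defaulting unknown stats to runtime info errs on the side of showing a value unchanged rather than silently zeroing it.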

> > Another issue that occurred to me is if multiple people are working
> > on troubleshooting a network problem, how do we insure that they all
> > get a consistent view of the stats? If this is done via a kernel
> > mechanism then there isn't an issue. But if it's done via user space,
> > then you have to make sure that everyone zeros/snapshots the stats
> > at the same time.
> >
> > Ideally, one should be able to do something like "ethtool -z ethX"
> > to zero/snapshot the driver stats, and then "ethtool -S ethX" to get
> > the stats since the last snapshot. You should be able to use the
> > same tool ("ethtool") to do all of this, and not some other special
> > tool or specially devised homegrown script. Why make users' lives
> > any more difficult than need be?
>
> No-one's stopping you from adding these options to ethtool. You could
> have it save statistic sets in, say, /var/run/ethtool.

Yes, that could be done if the ability to determine which ethtool stats
were counter stats was added to the ethtool API. And I guess they
should be saved in something like /var/run/ethtool/ethX. But then
what happens when you start using multiple network namespaces and,
for example, have eth0 in several different network namespaces? How
could that be handled to keep one network namespace from clobbering
the stats of another? If done in the kernel,
I believe it would all work as expected, but it's not clear to me
how to handle this in user space.

-Bill

2008-06-02 05:39:53

by David Miller

Subject: Re: [PATCH] net: add ability to clear stats via ethtool - e1000/pcnet32

From: Bill Fink <[email protected]>
Date: Sun, 1 Jun 2008 16:46:14 -0400

> Another issue that occurred to me is if multiple people are working
> on troubleshooting a network problem, how do we ensure that they all
> get a consistent view of the stats?

Some of them definitely won't get a consistent view if one of them
triggers this undesirable statistic clearing knob.

2008-06-02 15:41:44

by Bill Fink

Subject: Re: [PATCH] net: add ability to clear stats via ethtool - e1000/pcnet32

On Sun, 01 Jun 2008, David Miller wrote:

> From: Bill Fink <[email protected]>
> Date: Sun, 1 Jun 2008 16:46:14 -0400
>
> > Another issue that occurred to me is if multiple people are working
> > on troubleshooting a network problem, how do we ensure that they all
> > get a consistent view of the stats?
>
> Some of them definitely won't get a consistent view if one of them
> triggers this undesirable statistic clearing knob.

The typical case I was referring to is for example when I am assisting
someone else with troubleshooting. This is coordinated with the system
admin, who would usually be the one to clear/snapshot the stats. This
cannot be done by a normal user as it requires root privileges. BTW
I'm also not clear why running "ethtool -S" requires root privileges,
as it would be more convenient if a normal user could do this (currently
I have to be given sudo access).

This is a useful mechanism for many, many people. Bottom line, I don't
really care if it's done in the kernel or in user space, but it should
be possible using the standard "ethtool -S" command, and I have raised
at least a couple of issues with modifying the ethtool command to
support this (but which wouldn't be an issue if implemented in the
kernel).

-Bill