Hi,
This patchset is a corrected version of the one I posted here a few
weeks ago. It brings back the per-partition I/O statistics that were
lost between the 2.4 and 2.6 kernels.
This version still breaks the /sys/block/_disk_/_part_/stat and
/proc/diskstats interfaces. Incidentally iostat is already able to
deal correctly with the new format and both 'iostat -x' and
'iostat -p' already display the enhanced partition statistics.
Adding new information at the end of the line, as suggested by Randy,
would also break some user space applications. For example, iostat,
which relies on the number of fields to distinguish devices from
partitions, would not work anymore. Moreover, it would be
inconsistent and confusing to display the same statistics in a
different order depending on whether they are related to a device or
a partition.
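
To illustrate the field-count dependency, here is a minimal sketch of how a
tool could tell devices from partitions in /proc/diskstats before this
patchset. It is not iostat's actual code. It assumes the pre-patch 2.6
layout: 14 fields for a device and 7 for a partition (major, minor, name,
then reads issued, sectors read, writes issued, sectors written). The
function name and the sample lines are made up for the example.

```python
# Field counts per /proc/diskstats line, including major, minor and name.
DEVICE_FIELDS = 14     # full statistics set
PARTITION_FIELDS = 7   # pre-patch 2.6: only four counters per partition


def classify(line):
    """Guess the kind of a diskstats line from its number of fields."""
    count = len(line.split())
    if count == DEVICE_FIELDS:
        return "device"
    if count == PARTITION_FIELDS:
        return "partition"
    return "unknown"


# Hypothetical sample lines; the numbers are illustrative only.
device_line = "104 0 cciss/c0d0 5000 1200 299182 4100 900 300 34220 2100 0 3800 6200"
partition_line = "104 1 cciss/c0d0p1 1950 3906 1 4"
# A device line with one extra field appended at the end.
extended_line = device_line + " 42"

for sample in (device_line, partition_line, extended_line):
    print(classify(sample))
```

A parser written this way no longer recognizes a line once a field is
appended to it, which is the breakage described above.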
I really do think that the current interface is confusing and that
user space programs use it improperly. For example, the output of
'iostat -p' is not consistent: the reported tps (transfers/s) value of
a device and the sum of its partitions are completely dissimilar
because iostat handles pre-merge and post-merge statistics in the
same manner.
'iostat -p' output without Enhanced Partition Statistics:
[root@xxx tmp]# iostat -p
Linux 2.6.24-orig (xxx)    02/01/2008

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.37    0.00    0.25    0.82    0.00   98.56

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
cciss/c0d0        3.12       161.00        18.41     299182      34220
cciss/c0d0p1      1.05         2.10         0.00       3906          4
cciss/c0d0p2      6.67       158.77        18.41     295044      34216
cciss/c0d1        0.16         3.16         0.00       5864          0
cciss/c0d1p1      0.14         0.33         0.00        620          0
cciss/c0d1p2      2.21         2.23         0.00       4148          0
dm-0              6.51       158.56        18.41     294650      34216
dm-1              0.00         0.01         0.00         24          0
'iostat -p' output with Enhanced Partition Statistics:
[root@xxx tmp]# iostat -p
Linux 2.6.24-eps (xxx)    02/01/2008

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.05    0.00    0.09    0.91    0.00   98.95

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
cciss/c0d0        3.15        18.55       136.56     345862    2546140
cciss/c0d0p1      0.01         0.21         0.00       3914          4
cciss/c0d0p2      3.14        18.33       136.56     341724    2546136
cciss/c0d1        0.02         0.31         0.00       5864          0
cciss/c0d1p1      0.01         0.05         0.00       1012          0
cciss/c0d1p2      0.01         0.22         0.00       4148          0
dm-0             17.58        18.31       136.56     341330    2546136
dm-1              0.00         0.00         0.00         24          0
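
The inconsistency can be checked directly from the two outputs above. The
short sketch below compares each device's tps with the sum of its
partitions' tps, using only the values printed by iostat; the helper name
is made up for the example.

```python
def partition_tps_gap(device_tps, partition_tps):
    """Absolute difference between the summed partition tps and the device tps."""
    return abs(round(sum(partition_tps) - device_tps, 2))


# tps values copied from the two 'iostat -p' outputs above.
without_eps = {
    "cciss/c0d0": partition_tps_gap(3.12, [1.05, 6.67]),
    "cciss/c0d1": partition_tps_gap(0.16, [0.14, 2.21]),
}
with_eps = {
    "cciss/c0d0": partition_tps_gap(3.15, [0.01, 3.14]),
    "cciss/c0d1": partition_tps_gap(0.02, [0.01, 0.01]),
}

for name in without_eps:
    print(name, "gap without EPS:", without_eps[name], "with EPS:", with_eps[name])
```

Without the patchset the partitions appear to carry several more transfers
per second than the disk that holds them; with it the two figures agree.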
Any comments?
Regards,
Jerome Marchand
On Fri, Feb 01, 2008 at 07:14:07PM +0100, Jerome Marchand wrote:
> Hi,
>
> This patchset is a corrected version of the one I posted here a few
> weeks ago. It brings back the per-partition I/O statistics that were
> lost between the 2.4 and 2.6 kernels.
>
> This version still breaks the /sys/block/_disk_/_part_/stat and
> /proc/diskstats interfaces. Incidentally iostat is already able to
> deal correctly with the new format and both 'iostat -x' and
> 'iostat -p' already display the enhanced partition statistics.
As you are changing a userspace ABI, please document it properly in
Documentation/ABI/ when you change it.
thanks,
greg k-h
Update the documentation to reflect the change in userspace interface.
Signed-off-by: Jerome Marchand <[email protected]>
---
Documentation/ABI/testing/procfs-diskstats | 22 ++++++++++++++
Documentation/ABI/testing/sysfs-block | 28 +++++++++++++++++++
Documentation/iostats.txt | 15 +++++++++-
3 files changed, 64 insertions(+), 1 deletion(-)
diff -urNp linux-2.6.orig/Documentation/ABI/testing/procfs-diskstats linux-2.6/Documentation/ABI/testing/procfs-diskstats
--- linux-2.6.orig/Documentation/ABI/testing/procfs-diskstats 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6/Documentation/ABI/testing/procfs-diskstats 2008-02-05 19:29:10.000000000 +0100
@@ -0,0 +1,22 @@
+What: /proc/diskstats
+Date: February 2008
+Contact: Jerome Marchand <[email protected]>
+Description:
+ The /proc/diskstats file displays the I/O statistics
+ of block devices. Each line contains the following 14
+ fields:
+ 1 - major number
+ 2 - minor number
+ 3 - device name
+ 4 - reads completed successfully
+ 5 - reads merged
+ 6 - sectors read
+ 7 - time spent reading (ms)
+ 8 - writes completed
+ 9 - writes merged
+ 10 - sectors written
+ 11 - time spent writing (ms)
+ 12 - I/Os currently in progress
+ 13 - time spent doing I/Os (ms)
+ 14 - weighted time spent doing I/Os (ms)
+ For more details refer to Documentation/iostats.txt
diff -urNp linux-2.6.orig/Documentation/ABI/testing/sysfs-block linux-2.6/Documentation/ABI/testing/sysfs-block
--- linux-2.6.orig/Documentation/ABI/testing/sysfs-block 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6/Documentation/ABI/testing/sysfs-block 2008-02-05 19:32:02.000000000 +0100
@@ -0,0 +1,28 @@
+What: /sys/block/<disk>/stat
+Date: February 2008
+Contact: Jerome Marchand <[email protected]>
+Description:
+ The /sys/block/<disk>/stat files display the I/O
+ statistics of disk <disk>. They contain 11 fields:
+ 1 - reads completed successfully
+ 2 - reads merged
+ 3 - sectors read
+ 4 - time spent reading (ms)
+ 5 - writes completed
+ 6 - writes merged
+ 7 - sectors written
+ 8 - time spent writing (ms)
+ 9 - I/Os currently in progress
+ 10 - time spent doing I/Os (ms)
+ 11 - weighted time spent doing I/Os (ms)
+ For more details refer to Documentation/iostats.txt
+
+
+What: /sys/block/<disk>/<part>/stat
+Date: February 2008
+Contact: Jerome Marchand <[email protected]>
+Description:
+ The /sys/block/<disk>/<part>/stat files display the
+ I/O statistics of partition <part>. The format is the
+ same as the /sys/block/<disk>/stat format described
+ above.
diff -urNp linux-2.6.orig/Documentation/iostats.txt linux-2.6/Documentation/iostats.txt
--- linux-2.6.orig/Documentation/iostats.txt 2008-02-05 19:29:44.000000000 +0100
+++ linux-2.6/Documentation/iostats.txt 2008-02-05 19:29:10.000000000 +0100
@@ -58,7 +58,7 @@ they should not wrap twice before you no
Each set of stats only applies to the indicated device; if you want
system-wide stats you'll have to find all the devices and sum them all up.
-Field 1 -- # of reads issued
+Field 1 -- # of reads completed
This is the total number of reads completed successfully.
Field 2 -- # of reads merged, field 6 -- # of writes merged
Reads and writes which are adjacent to each other may be merged for
@@ -132,6 +132,19 @@ words, the number of reads for partition
of queuing for partitions, and at completion for whole disks. This is
a subtle distinction that is probably uninteresting for most cases.
+More significant is the error induced by counting the numbers of
+reads/writes before merges for partitions and after for disks. Since a
+typical workload usually contains a lot of successive and adjacent requests,
+the number of reads/writes issued can be several times higher than the
+number of reads/writes completed.
+
+In 2.6.25, the full statistic set is again available for partitions and
+disk and partition statistics are consistent again. Since we still don't
+keep a record of the partition-relative address, an operation is attributed
+to the partition which contains the first sector of the request after any
+merges. As requests can be merged across partitions, this could lead
+to some (probably insignificant) inaccuracy.
+
Additional notes
----------------