Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757192Ab0DFRAC (ORCPT ); Tue, 6 Apr 2010 13:00:02 -0400 Received: from smtp-out.google.com ([216.239.44.51]:20915 "EHLO smtp-out.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757229Ab0DFQ7z convert rfc822-to-8bit (ORCPT ); Tue, 6 Apr 2010 12:59:55 -0400 DomainKey-Signature: a=rsa-sha1; s=beta; d=google.com; c=nofws; q=dns; h=mime-version:in-reply-to:references:from:date:message-id: subject:to:cc:content-type:content-transfer-encoding:x-system-of-record; b=RZFxH0HIQe6qHDvwr0cAF9NJqj1LT3UPOEsTorsUbk2MgpkaPdy/vGKRg6GUCIWrJ +vlH/w78okHVwxYCsYn1w== MIME-Version: 1.0 In-Reply-To: <20100406151621.GB3029@redhat.com> References: <20100406032630.23464.24685.stgit@austin.mtv.corp.google.com> <20100406033629.23464.90216.stgit@austin.mtv.corp.google.com> <20100406151621.GB3029@redhat.com> From: Divyesh Shah Date: Tue, 6 Apr 2010 09:59:32 -0700 Message-ID: Subject: Re: [PATCH 2/3][v3] blkio: Add io controller stats like To: Vivek Goyal Cc: jens.axboe@oracle.com, linux-kernel@vger.kernel.org, nauman@google.com, ctalbott@google.com Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT X-System-Of-Record: true Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 8710 Lines: 229 Vivek, I'll send out a v3.1 only for this patch since you've Acked the other 2 patches. Thanks a lot for the detailed reviews again! On Tue, Apr 6, 2010 at 8:16 AM, Vivek Goyal wrote: > On Mon, Apr 05, 2010 at 08:37:01PM -0700, Divyesh Shah wrote: >> - io_service_time (the time between request dispatch and completion for IOs >> ? in the cgroup) >> - io_wait_time (the time spent waiting in the IO shceduler queues before >> ? getting serviced) >> - io_serviced (number of IOs serviced from this blkio_group) >> - io_service_bytes (Number of bytes served for this cgroup) >> >> These stats are accumulated per operation type helping us to distinguish between >> read and write, and sync and async IO. This patch does not increment any of >> these stats. >> >> Signed-off-by: Divyesh Shah >> --- >> >> ?Documentation/cgroups/blkio-controller.txt | ? 40 +++++++ >> ?block/blk-cgroup.c ? ? ? ? ? ? ? ? ? ? ? ? | ?166 +++++++++++++++++++++++++--- >> ?block/blk-cgroup.h ? ? ? ? ? ? ? ? ? ? ? ? | ? 60 ++++++++-- >> ?block/cfq-iosched.c ? ? ? ? ? ? ? ? ? ? ? ?| ? ?3 - >> ?4 files changed, 239 insertions(+), 30 deletions(-) >> >> diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt >> index 630879c..087925b 100644 >> --- a/Documentation/cgroups/blkio-controller.txt >> +++ b/Documentation/cgroups/blkio-controller.txt >> @@ -77,7 +77,6 @@ Details of cgroup files >> ?======================= >> ?- blkio.weight >> ? ? ? - Specifies per cgroup weight. >> - >> ? ? ? ? Currently allowed range of weights is from 100 to 1000. >> >> ?- blkio.time >> @@ -92,6 +91,41 @@ Details of cgroup files >> ? ? ? ? third field specifies the number of sectors transferred by the >> ? ? ? ? group to/from the device. >> >> +- blkio.io_service_bytes >> + ? ? - Number of bytes transferred to/from the disk by the group. These >> + ? ? ? are further divided by the type of operation - read or write, sync >> + ? ? ? or async. First two fields specify the major and minor number of the >> + ? ? ? device, third field specifies the operation type and the fourth field >> + ? ? ? specifies the number of bytes. >> + >> +- blkio.io_serviced >> + ? ? - Number of IOs completed to/from the disk by the group. These >> + ? ? ? are further divided by the type of operation - read or write, sync >> + ? ? ? or async. First two fields specify the major and minor number of the >> + ? ? ? device, third field specifies the operation type and the fourth field >> + ? ? ? specifies the number of IOs. >> + > > Hi Divyesh, > > V3 looks much better. Just couple of minor nits. > >> +- blkio.io_service_time >> + ? ? - Total amount of time between request dispatch and request completion >> + ? ? ? for the IOs done by this cgroup. This is in nanoseconds to make it >> + ? ? ? meaningful for flash devices too. For devices with queue depth of 1, >> + ? ? ? this time represents the actual service time. When queue_depth > 1, >> + ? ? ? that is no longer true as requests may be served out of order. > > Like io_wait_time, can you also mention here that this time is cumulative > time and on NCQ disks it can be more than actual time elapsed. Will do. > > I did a quick run with your patches. I ran a sequential workload for 30s > in two groups of weight 100 and 200. Following is the output of > blkio.io_service_time (I have kept stats for only disk 253:3 in output). > > # cat test1/blkio.io_service_time > 253:3 Read 18019970625 > 253:3 Write 0 > 253:3 Sync 18019970625 > 253:3 Async 0 > 253:3 Total 18019970625 > > # cat test2/blkio.io_service_time > 253:3 Read 35479070171 > 253:3 Write 0 > 253:3 Sync 35479070171 > 253:3 Async 0 > > This storage supports NCQ. We see that though I ran test only for > 30 seconds, total service ime for cgroup is close to 18+35=53 seconds. Yes that is expected as from our discussion about NCQ as any IO can have the service time of multiple IOs coupled together when serviced out of order. > >> + ? ? ? This time is further divided by the type of operation - >> + ? ? ? read or write, sync or async. First two fields specify the major and >> + ? ? ? minor number of the device, third field specifies the operation type >> + ? ? ? and the fourth field specifies the io_service_time in ns. >> + >> +- blkio.io_wait_time >> + ? ? - Total amount of time the IO spent waiting in the scheduler queues for >> + ? ? ? service. This can be greater than the total time elapsed since it is >> + ? ? ? cumulative io_wait_time for all IOs. This is in nanoseconds to make it >> + ? ? ? meaningful for flash devices too. This time is further divided by the >> + ? ? ? type of operation - read or write, sync or async. First two fields >> + ? ? ? specify the major and minor number of the device, third field >> + ? ? ? specifies the operation type and the fourth field specifies the >> + ? ? ? io_wait_time in ns. >> + >> ?- blkio.dequeue >> ? ? ? - Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y. This >> ? ? ? ? gives the statistics about how many a times a group was dequeued >> @@ -99,6 +133,10 @@ Details of cgroup files >> ? ? ? ? and minor number of the device and third field specifies the number >> ? ? ? ? of times a group was dequeued from a particular device. >> >> +- blkio.reset_stats >> + ? ? - Writing an int to this file will result in resetting all the stats >> + ? ? ? for that cgroup. >> + > > Personally, I like adding a separate file to reset the stats. Now one does > not get surprised by the fact that writting to blkio.io_service_time, also > reset rest of the stats. > >> ?CFQ sysfs tunable >> ?================= >> ?/sys/block//queue/iosched/group_isolation >> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c >> index 5be3981..d585a05 100644 >> --- a/block/blk-cgroup.c >> +++ b/block/blk-cgroup.c >> @@ -17,6 +17,8 @@ >> ?#include >> ?#include "blk-cgroup.h" >> >> +#define MAX_KEY_LEN 100 >> + >> ?static DEFINE_SPINLOCK(blkio_list_lock); >> ?static LIST_HEAD(blkio_list); >> >> @@ -55,12 +57,21 @@ struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup) >> ?} >> ?EXPORT_SYMBOL_GPL(cgroup_to_blkio_cgroup); >> >> -void blkiocg_update_blkio_group_stats(struct blkio_group *blkg, >> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? unsigned long time) >> +void blkio_group_init(struct blkio_group *blkg) >> +{ >> + ? ? spin_lock_init(&blkg->stats_lock); >> +} >> +EXPORT_SYMBOL_GPL(blkio_group_init); >> + >> +void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time) >> ?{ >> - ? ? blkg->time += time; >> + ? ? unsigned long flags; >> + >> + ? ? spin_lock_irqsave(&blkg->stats_lock, flags); >> + ? ? blkg->stats.time += time; >> + ? ? spin_unlock_irqrestore(&blkg->stats_lock, flags); >> ?} >> -EXPORT_SYMBOL_GPL(blkiocg_update_blkio_group_stats); >> +EXPORT_SYMBOL_GPL(blkiocg_update_timeslice_used); >> >> ?void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg, >> ? ? ? ? ? ? ? ? ? ? ? struct blkio_group *blkg, void *key, dev_t dev) >> @@ -170,13 +181,107 @@ blkiocg_weight_write(struct cgroup *cgroup, struct cftype *cftype, u64 val) >> ? ? ? return 0; >> ?} >> >> -#define SHOW_FUNCTION_PER_GROUP(__VAR) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? \ >> +static int >> +blkiocg_reset_write(struct cgroup *cgroup, struct cftype *cftype, u64 val) >> +{ > > I guess we can rename this function to blkiocg_reset_stats(). Will do > > [..] >> @@ -217,16 +331,36 @@ struct cftype blkio_files[] = { >> ? ? ? }, >> ? ? ? { >> ? ? ? ? ? ? ? .name = "time", >> - ? ? ? ? ? ? .read_seq_string = blkiocg_time_read, >> + ? ? ? ? ? ? .read_map = blkiocg_time_read, >> ? ? ? }, >> ? ? ? { >> ? ? ? ? ? ? ? .name = "sectors", >> - ? ? ? ? ? ? .read_seq_string = blkiocg_sectors_read, >> + ? ? ? ? ? ? .read_map = blkiocg_sectors_read, >> + ? ? }, >> + ? ? { >> + ? ? ? ? ? ? .name = "io_service_bytes", >> + ? ? ? ? ? ? .read_map = blkiocg_io_service_bytes_read, >> + ? ? }, >> + ? ? { >> + ? ? ? ? ? ? .name = "io_serviced", >> + ? ? ? ? ? ? .read_map = blkiocg_io_serviced_read, >> + ? ? }, >> + ? ? { >> + ? ? ? ? ? ? .name = "io_service_time", >> + ? ? ? ? ? ? .read_map = blkiocg_io_service_time_read, >> + ? ? }, >> + ? ? { >> + ? ? ? ? ? ? .name = "io_wait_time", >> + ? ? ? ? ? ? .read_map = blkiocg_io_wait_time_read, >> + ? ? }, >> + ? ? { >> + ? ? ? ? ? ? .name = "reset_stats", >> + ? ? ? ? ? ? .write_u64 = blkiocg_reset_write, > > use blkiocg_reset_stats? Will do > > Thanks > Vivek > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/