2010-06-12 02:35:33

by Divyesh Shah

[permalink] [raw]
Subject: [PATCH 1/2] Remove preempt_enable/disable calls around sched_clock()

calls in the block layer. These were added only as a temporary fix.

Signed-off-by: Divyesh Shah <[email protected]>
---
include/linux/blkdev.h | 4 ----
1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 09a8402..ebe788e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1218,16 +1218,12 @@ int kblockd_schedule_work(struct request_queue *q, struct work_struct *work);
*/
static inline void set_start_time_ns(struct request *req)
{
- preempt_disable();
req->start_time_ns = sched_clock();
- preempt_enable();
}

static inline void set_io_start_time_ns(struct request *req)
{
- preempt_disable();
req->io_start_time_ns = sched_clock();
- preempt_enable();
}

static inline uint64_t rq_start_time_ns(struct request *req)


2010-06-12 02:35:42

by Divyesh Shah

[permalink] [raw]
Subject: [PATCH 2/2] Use ktime_get() instead of sched_clock() for blkio cgroup stats.

This will take care of the preemptible kernel issue and the unbounded TSC
drift problem. We will lose resolution in some cases, though.

Signed-off-by: Divyesh Shah <[email protected]>
---
block/blk-cgroup.c | 22 +++++++++++-----------
include/linux/blkdev.h | 4 ++--
2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index a680964..711766d 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -135,19 +135,19 @@ static void blkio_set_start_group_wait_time(struct blkio_group *blkg,
return;
if (blkg == curr_blkg)
return;
- blkg->stats.start_group_wait_time = sched_clock();
+ blkg->stats.start_group_wait_time = ktime_to_ns(ktime_get());
blkio_mark_blkg_waiting(&blkg->stats);
}

/* This should be called with the blkg->stats_lock held. */
static void blkio_update_group_wait_time(struct blkio_group_stats *stats)
{
- unsigned long long now;
+ u64 now;

if (!blkio_blkg_waiting(stats))
return;

- now = sched_clock();
+ now = ktime_to_ns(ktime_get());
if (time_after64(now, stats->start_group_wait_time))
stats->group_wait_time += now - stats->start_group_wait_time;
blkio_clear_blkg_waiting(stats);
@@ -156,12 +156,12 @@ static void blkio_update_group_wait_time(struct blkio_group_stats *stats)
/* This should be called with the blkg->stats_lock held. */
static void blkio_end_empty_time(struct blkio_group_stats *stats)
{
- unsigned long long now;
+ u64 now;

if (!blkio_blkg_empty(stats))
return;

- now = sched_clock();
+ now = ktime_to_ns(ktime_get());
if (time_after64(now, stats->start_empty_time))
stats->empty_time += now - stats->start_empty_time;
blkio_clear_blkg_empty(stats);
@@ -173,7 +173,7 @@ void blkiocg_update_set_idle_time_stats(struct blkio_group *blkg)

spin_lock_irqsave(&blkg->stats_lock, flags);
BUG_ON(blkio_blkg_idling(&blkg->stats));
- blkg->stats.start_idle_time = sched_clock();
+ blkg->stats.start_idle_time = ktime_to_ns(ktime_get());
blkio_mark_blkg_idling(&blkg->stats);
spin_unlock_irqrestore(&blkg->stats_lock, flags);
}
@@ -182,13 +182,13 @@ EXPORT_SYMBOL_GPL(blkiocg_update_set_idle_time_stats);
void blkiocg_update_idle_time_stats(struct blkio_group *blkg)
{
unsigned long flags;
- unsigned long long now;
+ u64 now;
struct blkio_group_stats *stats;

spin_lock_irqsave(&blkg->stats_lock, flags);
stats = &blkg->stats;
if (blkio_blkg_idling(stats)) {
- now = sched_clock();
+ now = ktime_to_ns(ktime_get());
if (time_after64(now, stats->start_idle_time))
stats->idle_time += now - stats->start_idle_time;
blkio_clear_blkg_idling(stats);
@@ -237,7 +237,7 @@ void blkiocg_set_start_empty_time(struct blkio_group *blkg)
return;
}

- stats->start_empty_time = sched_clock();
+ stats->start_empty_time = ktime_to_ns(ktime_get());
blkio_mark_blkg_empty(stats);
spin_unlock_irqrestore(&blkg->stats_lock, flags);
}
@@ -314,7 +314,7 @@ void blkiocg_update_completion_stats(struct blkio_group *blkg,
{
struct blkio_group_stats *stats;
unsigned long flags;
- unsigned long long now = sched_clock();
+ u64 now = ktime_to_ns(ktime_get());

spin_lock_irqsave(&blkg->stats_lock, flags);
stats = &blkg->stats;
@@ -464,7 +464,7 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val)
int i;
#ifdef CONFIG_DEBUG_BLK_CGROUP
bool idling, waiting, empty;
- unsigned long long now = sched_clock();
+ u64 now = ktime_to_ns(ktime_get());
#endif

blkcg = cgroup_to_blkio_cgroup(cgroup);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ebe788e..f174b34 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1218,12 +1218,12 @@ int kblockd_schedule_work(struct request_queue *q, struct work_struct *work);
*/
static inline void set_start_time_ns(struct request *req)
{
- req->start_time_ns = sched_clock();
+ req->start_time_ns = ktime_to_ns(ktime_get());
}

static inline void set_io_start_time_ns(struct request *req)
{
- req->io_start_time_ns = sched_clock();
+ req->io_start_time_ns = ktime_to_ns(ktime_get());
}

static inline uint64_t rq_start_time_ns(struct request *req)

2010-06-12 07:19:15

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/2] Use ktime_get() instead of sched_clock() for blkio cgroup stats.


* Divyesh Shah <[email protected]> wrote:

> This will take care of the preemptible kernel issue and the unbounded TSC
> drift problem. We will lose resolution in some cases, though.

> - blkg->stats.start_group_wait_time = sched_clock();
> + blkg->stats.start_group_wait_time = ktime_to_ns(ktime_get());

Ugh!

ktime_get() can have insanely high overhead. Peter has added local_clock();
you should use that instead and apply checks to make sure the result isn't
negative.
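
Something like this, roughly (an untested sketch against the blkdev.h helpers
touched by patch 2/2; local_clock() returns a u64 in nanoseconds):

static inline void set_start_time_ns(struct request *req)
{
	/* local_clock(): cheap per-CPU ns clock, safe with preemption enabled */
	req->start_time_ns = local_clock();
}

static inline void set_io_start_time_ns(struct request *req)
{
	req->io_start_time_ns = local_clock();
}

with the consumers keeping their time_after64() guards so a reading that
appears to go backwards across CPUs never produces a negative delta.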

Ingo

2010-06-12 07:20:22

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/2] Remove preempt_enable/disable calls around sched_clock()


* Divyesh Shah <[email protected]> wrote:

> calls in the block layer. These were added only as a temporary fix.

Not good: this commit reintroduces those ugly warnings if someone happens to
bisect straight into this commit.

Ingo

2010-06-12 15:53:12

by Divyesh Shah

[permalink] [raw]
Subject: Re: [PATCH 1/2] Remove preempt_enable/disable calls around sched_clock()

On Sat, Jun 12, 2010 at 12:19 AM, Ingo Molnar <[email protected]> wrote:
>
> * Divyesh Shah <[email protected]> wrote:
>
>> calls in the block layer. These were added only as a temporary fix.
>
> Not good: this commit reintroduces those ugly warnings if someone happens to
> bisect straight into this commit.

You're right. In the next version, I'll merge both patches into one so
this cannot happen.

>
>        Ingo
>

2010-06-12 16:02:52

by Divyesh Shah

[permalink] [raw]
Subject: Re: [PATCH 2/2] Use ktime_get() instead of sched_clock() for blkio cgroup stats.

On Sat, Jun 12, 2010 at 12:18 AM, Ingo Molnar <[email protected]> wrote:
>
> * Divyesh Shah <[email protected]> wrote:
>
>> This will take care of the preemptible kernel issue and the unbounded TSC
>> drift problem. We will lose resolution in some cases, though.
>
>> - blkg->stats.start_group_wait_time = sched_clock();
>> + blkg->stats.start_group_wait_time = ktime_to_ns(ktime_get());
>
> Ugh!
>
> ktime_get() can have insanely high overhead. Peter has added local_clock();
> you should use that instead and apply checks to make sure the result isn't
> negative.

Ok. The negative checks are already in the code.
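
E.g. blkio_update_group_wait_time() already does roughly this (shown here with
local_clock() substituted for ktime_get(), just as a sketch of the direction,
not a posted patch):

	u64 now = local_clock();

	if (time_after64(now, stats->start_group_wait_time))
		stats->group_wait_time += now - stats->start_group_wait_time;
	/*
	 * otherwise the clock looks like it went backwards across CPUs, so
	 * the sample is dropped and the accumulated stat never goes negative
	 */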

>
>        Ingo
>

2010-06-13 10:01:39

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/2] Remove preempt_enable/disable calls around sched_clock()


* Divyesh Shah <[email protected]> wrote:

> On Sat, Jun 12, 2010 at 12:19 AM, Ingo Molnar <[email protected]> wrote:
> >
> > * Divyesh Shah <[email protected]> wrote:
> >
> >> calls in the block layer. These were added only as a temporary fix.
> >
> > Not good: this commit reintroduces those ugly warnings if someone happens to
> > bisect straight into this commit.
>
> You're right. In the next version, I'll merge both patches into one so
> this cannot happen.

Yeah - or, if you really want to make a point of doing the two things in two
separate patches, you can first fix the API, _then_ remove the preemption
enable/disable.

Thanks,

Ingo