2019-06-14 11:45:33

by Pavel Begunkov

Subject: [PATCH 0/2] Fix misuse of blk_rq_stats in blk-iolatency

From: Pavel Begunkov <[email protected]>

There are implicit assumptions about struct blk_rq_stat which make it
very easy to misuse. The first patch fixes the consequences, and the
second employs the type system to prevent recurrences.


Pavel Begunkov (2):
blk-iolatency: Fix zero mean in previous stats
blk-stats: Introduce explicit stat staging buffers

block/blk-iolatency.c | 60 ++++++++++++++++++++++++++++++---------
block/blk-stat.c | 48 +++++++++++++++++++++++--------
block/blk-stat.h | 9 ++++--
include/linux/blk_types.h | 6 ++++
4 files changed, 94 insertions(+), 29 deletions(-)

--
2.22.0


2019-06-14 11:49:54

by Pavel Begunkov

Subject: [PATCH 1/2] blk-iolatency: Fix zero mean in previous stats

From: Pavel Begunkov <[email protected]>

struct blk_rq_stat has two implicit states it can be in:
(1) per-cpu intermediate (staging) stats
(2) final stats, i.e. an aggregation of (1) (see blk_rq_stat_collect)

The states use different sets of fields. E.g. (1) uses @batch but not
@mean, and vice versa for (2). So any function that takes a
struct blk_rq_stat makes implicit assumptions about which state it is in.

blk_rq_stat_sum() expects @src to be in state (1) and @dst in state (2).
iolatency_check_latencies() violates that by passing a struct blk_rq_stat
previously used as @dst (i.e. already aggregated, with @batch left at 0)
as @src. Since the mean is derived from @src->batch, the result is that
iolat->cur_stat.rqs.mean is always 0 for non-ssd devices.

Use two distinct functions instead: one to collect intermediate stats
(i.e. with a valid @batch), and another to merge already-aggregated
stats (i.e. with a valid @mean).
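
For illustration, here is a simplified userspace sketch of the two
operations (not the kernel code itself; min/max handling and kernel
types are omitted, the arithmetic mirrors what the helpers below do):

#include <stdint.h>

struct rq_stat {
	uint64_t batch;		/* state (1): running sum of samples */
	uint64_t mean;		/* state (2): aggregated mean */
	uint32_t nr_samples;
};

/* (1) -> (2): fold a per-cpu staging stat into an aggregate */
static void stat_collect(struct rq_stat *dst, const struct rq_stat *src)
{
	if (!src->nr_samples)
		return;
	dst->mean = (src->batch + dst->mean * dst->nr_samples) /
		    (dst->nr_samples + src->nr_samples);
	dst->nr_samples += src->nr_samples;
}

/* (2) + (2): merge two aggregates, weighting each mean by its samples */
static void stat_merge(struct rq_stat *dst, const struct rq_stat *src)
{
	if (!src->nr_samples)
		return;
	dst->mean = (src->mean * src->nr_samples +
		     dst->mean * dst->nr_samples) /
		    (dst->nr_samples + src->nr_samples);
	dst->nr_samples += src->nr_samples;
}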

Signed-off-by: Pavel Begunkov <[email protected]>
---
block/blk-iolatency.c | 21 ++++++++++++++++-----
block/blk-stat.c | 20 ++++++++++++++++++--
block/blk-stat.h | 3 ++-
3 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index d22e61bced86..fc8ce1a0ae21 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -199,7 +199,7 @@ static inline void latency_stat_init(struct iolatency_grp *iolat,
blk_rq_stat_init(&stat->rqs);
}

-static inline void latency_stat_sum(struct iolatency_grp *iolat,
+static inline void latency_stat_merge(struct iolatency_grp *iolat,
struct latency_stat *sum,
struct latency_stat *stat)
{
@@ -207,7 +207,18 @@ static inline void latency_stat_sum(struct iolatency_grp *iolat,
sum->ps.total += stat->ps.total;
sum->ps.missed += stat->ps.missed;
} else
- blk_rq_stat_sum(&sum->rqs, &stat->rqs);
+ blk_rq_stat_merge(&sum->rqs, &stat->rqs);
+}
+
+static inline void latency_stat_collect(struct iolatency_grp *iolat,
+ struct latency_stat *sum,
+ struct latency_stat *stat)
+{
+ if (iolat->ssd) {
+ sum->ps.total += stat->ps.total;
+ sum->ps.missed += stat->ps.missed;
+ } else
+ blk_rq_stat_collect(&sum->rqs, &stat->rqs);
}

static inline void latency_stat_record_time(struct iolatency_grp *iolat,
@@ -531,7 +542,7 @@ static void iolatency_check_latencies(struct iolatency_grp *iolat, u64 now)
for_each_online_cpu(cpu) {
struct latency_stat *s;
s = per_cpu_ptr(iolat->stats, cpu);
- latency_stat_sum(iolat, &stat, s);
+ latency_stat_collect(iolat, &stat, s);
latency_stat_init(iolat, s);
}
preempt_enable();
@@ -552,7 +563,7 @@ static void iolatency_check_latencies(struct iolatency_grp *iolat, u64 now)
/* Somebody beat us to the punch, just bail. */
spin_lock_irqsave(&lat_info->lock, flags);

- latency_stat_sum(iolat, &iolat->cur_stat, &stat);
+ latency_stat_merge(iolat, &iolat->cur_stat, &stat);
lat_info->nr_samples -= iolat->nr_samples;
lat_info->nr_samples += latency_stat_samples(iolat, &iolat->cur_stat);
iolat->nr_samples = latency_stat_samples(iolat, &iolat->cur_stat);
@@ -913,7 +924,7 @@ static size_t iolatency_ssd_stat(struct iolatency_grp *iolat, char *buf,
for_each_online_cpu(cpu) {
struct latency_stat *s;
s = per_cpu_ptr(iolat->stats, cpu);
- latency_stat_sum(iolat, &stat, s);
+ latency_stat_collect(iolat, &stat, s);
}
preempt_enable();

diff --git a/block/blk-stat.c b/block/blk-stat.c
index 940f15d600f8..78389182b5d0 100644
--- a/block/blk-stat.c
+++ b/block/blk-stat.c
@@ -26,7 +26,7 @@ void blk_rq_stat_init(struct blk_rq_stat *stat)
}

/* src is a per-cpu stat, mean isn't initialized */
-void blk_rq_stat_sum(struct blk_rq_stat *dst, struct blk_rq_stat *src)
+void blk_rq_stat_collect(struct blk_rq_stat *dst, struct blk_rq_stat *src)
{
if (!src->nr_samples)
return;
@@ -40,6 +40,21 @@ void blk_rq_stat_sum(struct blk_rq_stat *dst, struct blk_rq_stat *src)
dst->nr_samples += src->nr_samples;
}

+void blk_rq_stat_merge(struct blk_rq_stat *dst, struct blk_rq_stat *src)
+{
+ if (!src->nr_samples)
+ return;
+
+ dst->min = min(dst->min, src->min);
+ dst->max = max(dst->max, src->max);
+
+ dst->mean = div_u64(src->mean * src->nr_samples +
+ dst->mean * dst->nr_samples,
+ dst->nr_samples + src->nr_samples);
+
+ dst->nr_samples += src->nr_samples;
+}
+
void blk_rq_stat_add(struct blk_rq_stat *stat, u64 value)
{
stat->min = min(stat->min, value);
@@ -90,7 +105,8 @@ static void blk_stat_timer_fn(struct timer_list *t)

cpu_stat = per_cpu_ptr(cb->cpu_stat, cpu);
for (bucket = 0; bucket < cb->buckets; bucket++) {
- blk_rq_stat_sum(&cb->stat[bucket], &cpu_stat[bucket]);
+ blk_rq_stat_collect(&cb->stat[bucket],
+ &cpu_stat[bucket]);
blk_rq_stat_init(&cpu_stat[bucket]);
}
}
diff --git a/block/blk-stat.h b/block/blk-stat.h
index 17b47a86eefb..5597ecc34ef5 100644
--- a/block/blk-stat.h
+++ b/block/blk-stat.h
@@ -165,7 +165,8 @@ static inline void blk_stat_activate_msecs(struct blk_stat_callback *cb,
}

void blk_rq_stat_add(struct blk_rq_stat *, u64);
-void blk_rq_stat_sum(struct blk_rq_stat *, struct blk_rq_stat *);
+void blk_rq_stat_collect(struct blk_rq_stat *dst, struct blk_rq_stat *src);
+void blk_rq_stat_merge(struct blk_rq_stat *dst, struct blk_rq_stat *src);
void blk_rq_stat_init(struct blk_rq_stat *);

#endif
--
2.22.0

2019-06-14 11:50:33

by Pavel Begunkov

Subject: [PATCH 2/2] blk-stats: Introduce explicit stat staging buffers

From: Pavel Begunkov <[email protected]>

Split struct blk_rq_stat into two structs, so that each explicitly
represents one of the states described in the previous patch. That
duplicates some code, but:
1. prevents misuse (compile-time check by the type system; see the
   sketch below)
2. reduces the memory needed (including per-cpu memory)
3. makes it easier to extend the stats
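
As a rough sketch of point 1 (hypothetical caller, not part of this
patch): with two distinct types, mixing up the states is diagnosed by
the compiler as an incompatible pointer type.

static void example(struct blk_rq_stat *agg,
		    struct blk_rq_stat_staging *percpu)
{
	blk_rq_stat_collect(agg, percpu);	/* ok: staging -> aggregate */
	blk_rq_stat_merge(agg, agg);		/* ok: aggregate + aggregate */
	/*
	 * blk_rq_stat_collect(agg, agg); would now trigger an
	 * incompatible-pointer-type diagnostic for the second argument,
	 * catching the misuse at compile time.
	 */
}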

Signed-off-by: Pavel Begunkov <[email protected]>
---
block/blk-iolatency.c | 41 +++++++++++++++++++++++++++++----------
block/blk-stat.c | 30 +++++++++++++++++-----------
block/blk-stat.h | 8 +++++---
include/linux/blk_types.h | 6 ++++++
4 files changed, 61 insertions(+), 24 deletions(-)

diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index fc8ce1a0ae21..fbf986a0b8c2 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -130,9 +130,16 @@ struct latency_stat {
};
};

+struct latency_stat_staging {
+ union {
+ struct percentile_stats ps;
+ struct blk_rq_stat_staging rqs;
+ };
+};
+
struct iolatency_grp {
struct blkg_policy_data pd;
- struct latency_stat __percpu *stats;
+ struct latency_stat_staging __percpu *stats;
struct latency_stat cur_stat;
struct blk_iolatency *blkiolat;
struct rq_depth rq_depth;
@@ -199,6 +206,16 @@ static inline void latency_stat_init(struct iolatency_grp *iolat,
blk_rq_stat_init(&stat->rqs);
}

+static inline void latency_stat_init_staging(struct iolatency_grp *iolat,
+ struct latency_stat_staging *stat)
+{
+ if (iolat->ssd) {
+ stat->ps.total = 0;
+ stat->ps.missed = 0;
+ } else
+ blk_rq_stat_init_staging(&stat->rqs);
+}
+
static inline void latency_stat_merge(struct iolatency_grp *iolat,
struct latency_stat *sum,
struct latency_stat *stat)
@@ -212,7 +229,7 @@ static inline void latency_stat_merge(struct iolatency_grp *iolat,

static inline void latency_stat_collect(struct iolatency_grp *iolat,
struct latency_stat *sum,
- struct latency_stat *stat)
+ struct latency_stat_staging *stat)
{
if (iolat->ssd) {
sum->ps.total += stat->ps.total;
@@ -224,7 +241,8 @@ static inline void latency_stat_collect(struct iolatency_grp *iolat,
static inline void latency_stat_record_time(struct iolatency_grp *iolat,
u64 req_time)
{
- struct latency_stat *stat = get_cpu_ptr(iolat->stats);
+ struct latency_stat_staging *stat = get_cpu_ptr(iolat->stats);
+
if (iolat->ssd) {
if (req_time >= iolat->min_lat_nsec)
stat->ps.missed++;
@@ -540,10 +558,11 @@ static void iolatency_check_latencies(struct iolatency_grp *iolat, u64 now)
latency_stat_init(iolat, &stat);
preempt_disable();
for_each_online_cpu(cpu) {
- struct latency_stat *s;
+ struct latency_stat_staging *s;
+
s = per_cpu_ptr(iolat->stats, cpu);
latency_stat_collect(iolat, &stat, s);
- latency_stat_init(iolat, s);
+ latency_stat_init_staging(iolat, s);
}
preempt_enable();

@@ -922,7 +941,8 @@ static size_t iolatency_ssd_stat(struct iolatency_grp *iolat, char *buf,
latency_stat_init(iolat, &stat);
preempt_disable();
for_each_online_cpu(cpu) {
- struct latency_stat *s;
+ struct latency_stat_staging *s;
+
s = per_cpu_ptr(iolat->stats, cpu);
latency_stat_collect(iolat, &stat, s);
}
@@ -966,8 +986,8 @@ static struct blkg_policy_data *iolatency_pd_alloc(gfp_t gfp, int node)
iolat = kzalloc_node(sizeof(*iolat), gfp, node);
if (!iolat)
return NULL;
- iolat->stats = __alloc_percpu_gfp(sizeof(struct latency_stat),
- __alignof__(struct latency_stat), gfp);
+ iolat->stats = __alloc_percpu_gfp(sizeof(struct latency_stat_staging),
+ __alignof__(struct latency_stat_staging), gfp);
if (!iolat->stats) {
kfree(iolat);
return NULL;
@@ -990,9 +1010,10 @@ static void iolatency_pd_init(struct blkg_policy_data *pd)
iolat->ssd = false;

for_each_possible_cpu(cpu) {
- struct latency_stat *stat;
+ struct latency_stat_staging *stat;
+
stat = per_cpu_ptr(iolat->stats, cpu);
- latency_stat_init(iolat, stat);
+ latency_stat_init_staging(iolat, stat);
}

latency_stat_init(iolat, &iolat->cur_stat);
diff --git a/block/blk-stat.c b/block/blk-stat.c
index 78389182b5d0..d892ad2cb938 100644
--- a/block/blk-stat.c
+++ b/block/blk-stat.c
@@ -18,15 +18,22 @@ struct blk_queue_stats {
bool enable_accounting;
};

+void blk_rq_stat_init_staging(struct blk_rq_stat_staging *stat)
+{
+ stat->min = -1ULL;
+ stat->max = 0;
+ stat->batch = 0;
+ stat->nr_samples = 0;
+}
+
void blk_rq_stat_init(struct blk_rq_stat *stat)
{
stat->min = -1ULL;
stat->max = stat->nr_samples = stat->mean = 0;
- stat->batch = 0;
}

-/* src is a per-cpu stat, mean isn't initialized */
-void blk_rq_stat_collect(struct blk_rq_stat *dst, struct blk_rq_stat *src)
+void blk_rq_stat_collect(struct blk_rq_stat *dst,
+ struct blk_rq_stat_staging *src)
{
if (!src->nr_samples)
return;
@@ -55,7 +62,7 @@ void blk_rq_stat_merge(struct blk_rq_stat *dst, struct blk_rq_stat *src)
dst->nr_samples += src->nr_samples;
}

-void blk_rq_stat_add(struct blk_rq_stat *stat, u64 value)
+void blk_rq_stat_add(struct blk_rq_stat_staging *stat, u64 value)
{
stat->min = min(stat->min, value);
stat->max = max(stat->max, value);
@@ -67,7 +74,7 @@ void blk_stat_add(struct request *rq, u64 now)
{
struct request_queue *q = rq->q;
struct blk_stat_callback *cb;
- struct blk_rq_stat *stat;
+ struct blk_rq_stat_staging *stat;
int bucket;
u64 value;

@@ -101,13 +108,13 @@ static void blk_stat_timer_fn(struct timer_list *t)
blk_rq_stat_init(&cb->stat[bucket]);

for_each_online_cpu(cpu) {
- struct blk_rq_stat *cpu_stat;
+ struct blk_rq_stat_staging *cpu_stat;

cpu_stat = per_cpu_ptr(cb->cpu_stat, cpu);
for (bucket = 0; bucket < cb->buckets; bucket++) {
blk_rq_stat_collect(&cb->stat[bucket],
&cpu_stat[bucket]);
- blk_rq_stat_init(&cpu_stat[bucket]);
+ blk_rq_stat_init_staging(&cpu_stat[bucket]);
}
}

@@ -131,8 +138,9 @@ blk_stat_alloc_callback(void (*timer_fn)(struct blk_stat_callback *),
kfree(cb);
return NULL;
}
- cb->cpu_stat = __alloc_percpu(buckets * sizeof(struct blk_rq_stat),
- __alignof__(struct blk_rq_stat));
+ cb->cpu_stat = __alloc_percpu(
+ buckets * sizeof(struct blk_rq_stat_staging),
+ __alignof__(struct blk_rq_stat_staging));
if (!cb->cpu_stat) {
kfree(cb->stat);
kfree(cb);
@@ -155,11 +163,11 @@ void blk_stat_add_callback(struct request_queue *q,
int cpu;

for_each_possible_cpu(cpu) {
- struct blk_rq_stat *cpu_stat;
+ struct blk_rq_stat_staging *cpu_stat;

cpu_stat = per_cpu_ptr(cb->cpu_stat, cpu);
for (bucket = 0; bucket < cb->buckets; bucket++)
- blk_rq_stat_init(&cpu_stat[bucket]);
+ blk_rq_stat_init_staging(&cpu_stat[bucket]);
}

spin_lock(&q->stats->lock);
diff --git a/block/blk-stat.h b/block/blk-stat.h
index 5597ecc34ef5..e5c753fbd6e6 100644
--- a/block/blk-stat.h
+++ b/block/blk-stat.h
@@ -30,7 +30,7 @@ struct blk_stat_callback {
/**
* @cpu_stat: Per-cpu statistics buckets.
*/
- struct blk_rq_stat __percpu *cpu_stat;
+ struct blk_rq_stat_staging __percpu *cpu_stat;

/**
* @bucket_fn: Given a request, returns which statistics bucket it
@@ -164,9 +164,11 @@ static inline void blk_stat_activate_msecs(struct blk_stat_callback *cb,
mod_timer(&cb->timer, jiffies + msecs_to_jiffies(msecs));
}

-void blk_rq_stat_add(struct blk_rq_stat *, u64);
-void blk_rq_stat_collect(struct blk_rq_stat *dst, struct blk_rq_stat *src);
+void blk_rq_stat_add(struct blk_rq_stat_staging *stat, u64);
+void blk_rq_stat_collect(struct blk_rq_stat *dst,
+ struct blk_rq_stat_staging *src);
void blk_rq_stat_merge(struct blk_rq_stat *dst, struct blk_rq_stat *src);
void blk_rq_stat_init(struct blk_rq_stat *);
+void blk_rq_stat_init_staging(struct blk_rq_stat_staging *stat);

#endif
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index be418275763c..2db5a5fd318f 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -449,7 +449,13 @@ struct blk_rq_stat {
u64 min;
u64 max;
u32 nr_samples;
+};
+
+struct blk_rq_stat_staging {
+ u64 min;
+ u64 max;
u64 batch;
+ u32 nr_samples;
};

#endif /* __LINUX_BLK_TYPES_H */
--
2.22.0

2019-06-14 13:41:38

by Josef Bacik

Subject: Re: [PATCH 0/2] Fix misuse of blk_rq_stats in blk-iolatency

On Fri, Jun 14, 2019 at 02:44:11PM +0300, Pavel Begunkov (Silence) wrote:
> From: Pavel Begunkov <[email protected]>
>
> There are implicit assumptions about struct blk_rq_stat which make it
> very easy to misuse. The first patch fixes the consequences, and the
> second employs the type system to prevent recurrences.
>
>
> Pavel Begunkov (2):
> blk-iolatency: Fix zero mean in previous stats
> blk-stats: Introduce explicit stat staging buffers
>

I don't have a problem with this, but it's up to Jens I suppose

Acked-by: Josef Bacik <[email protected]>

Thanks,

Josef

2019-06-20 17:19:44

by Pavel Begunkov

Subject: Re: [PATCH 0/2] Fix misuse of blk_rq_stats in blk-iolatency

Hi,

Josef, thanks for taking a look.


That said, there is nothing critical yet -- just a non-working /
disabled optimisation -- but changes to the stats code could subtly
break it. E.g. grouping @batch and @mean into a union would inflate the
estimated average by several orders of magnitude.
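
A rough, made-up illustration of the scale: for three samples of 100us,
200us and 300us, @batch holds the running sum (600) while the true mean
is 200. If @batch and @mean aliased each other, the value read back as
the mean would be the sum, i.e. inflated by a factor of nr_samples, and
a stat window can easily hold thousands of samples.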

Jens, what do you think?



On 14/06/2019 16:40, Josef Bacik wrote:
> On Fri, Jun 14, 2019 at 02:44:11PM +0300, Pavel Begunkov (Silence) wrote:
>> From: Pavel Begunkov <[email protected]>
>>
>> There are implicit assumptions about struct blk_rq_stat which make it
>> very easy to misuse. The first patch fixes the consequences, and the
>> second employs the type system to prevent recurrences.
>>
>>
>> Pavel Begunkov (2):
>> blk-iolatency: Fix zero mean in previous stats
>> blk-stats: Introduce explicit stat staging buffers
>>
>
> I don't have a problem with this, but it's up to Jens I suppose
>
> Acked-by: Josef Bacik <[email protected]>
>
> Thanks,
>
> Josef
>

--
Yours sincerely,
Pavel Begunkov



2019-06-29 15:38:29

by Pavel Begunkov

Subject: Re: [PATCH 0/2] Fix misuse of blk_rq_stats in blk-iolatency

Ping?

On 20/06/2019 10:18, Pavel Begunkov wrote:
> Hi,
>
> Josef, thanks for taking a look.
>
>
> That said, there is nothing critical yet -- just a non-working /
> disabled optimisation -- but changes to the stats code could subtly
> break it. E.g. grouping @batch and @mean into a union would inflate the
> estimated average by several orders of magnitude.
>
> Jens, what do you think?
>
>
>
> On 14/06/2019 16:40, Josef Bacik wrote:
>> On Fri, Jun 14, 2019 at 02:44:11PM +0300, Pavel Begunkov (Silence) wrote:
>>> From: Pavel Begunkov <[email protected]>
>>>
>>> There are implicit assumptions about struct blk_rq_stat which make it
>>> very easy to misuse. The first patch fixes the consequences, and the
>>> second employs the type system to prevent recurrences.
>>>
>>>
>>> Pavel Begunkov (2):
>>> blk-iolatency: Fix zero mean in previous stats
>>> blk-stats: Introduce explicit stat staging buffers
>>>
>>
>> I don't have a problem with this, but it's up to Jens I suppose
>>
>> Acked-by: Josef Bacik <[email protected]>
>>
>> Thanks,
>>
>> Josef
>>
>

--
Yours sincerely,
Pavel Begunkov



2019-07-11 01:21:27

by Pavel Begunkov

Subject: Re: [PATCH 0/2] Fix misuse of blk_rq_stats in blk-iolatency

Hi,

Any thoughts? Is there something wrong with the patchset?


On 29/06/2019 18:37, Pavel Begunkov wrote:
> Ping?
>
> On 20/06/2019 10:18, Pavel Begunkov wrote:
>> Hi,
>>
>> Josef, thanks for taking a look.
>>
>>
>> That said, there is nothing critical yet -- just a non-working /
>> disabled optimisation -- but changes to the stats code could subtly
>> break it. E.g. grouping @batch and @mean into a union would inflate the
>> estimated average by several orders of magnitude.
>>
>> Jens, what do you think?
>>
>>
>>
>> On 14/06/2019 16:40, Josef Bacik wrote:
>>> On Fri, Jun 14, 2019 at 02:44:11PM +0300, Pavel Begunkov (Silence) wrote:
>>>> From: Pavel Begunkov <[email protected]>
>>>>
>>>> There are implicit assumptions about struct blk_rq_stat which make it
>>>> very easy to misuse. The first patch fixes the consequences, and the
>>>> second employs the type system to prevent recurrences.
>>>>
>>>>
>>>> Pavel Begunkov (2):
>>>> blk-iolatency: Fix zero mean in previous stats
>>>> blk-stats: Introduce explicit stat staging buffers
>>>>
>>>
>>> I don't have a problem with this, but it's up to Jens I suppose
>>>
>>> Acked-by: Josef Bacik <[email protected]>
>>>
>>> Thanks,
>>>
>>> Josef
>>>
>>
>

--
Yours sincerely,
Pavel Begunkov

2019-09-30 08:32:27

by Pavel Begunkov

Subject: Re: [RESEND RFC PATCH 0/2] Fix misuse of blk_rq_stats in blk-iolatency

Hi,

I claim that there is a bug which hopefully doesn't show itself apart
from a minor disabled optimisation. The struct is _too_ easy to misuse,
and if somebody tries to reuse it, that could lead to quite interesting
issues.

Could somebody finally take a look?
Thanks

On 25/07/2019 00:35, Pavel Begunkov (Silence) wrote:
> From: Pavel Begunkov <[email protected]>
>
> There are implicit assumptions about struct blk_rq_stat which make it
> very easy to misuse. The first patch fixes the consequences, and the
> second employs the type system to prevent recurrences.
>
> Acked-by: Josef Bacik <[email protected]>
>
> Pavel Begunkov (2):
> blk-iolatency: Fix zero mean in previous stats
> blk-stats: Introduce explicit stat staging buffers
>
> block/blk-iolatency.c | 60 ++++++++++++++++++++++++++++++---------
> block/blk-stat.c | 48 +++++++++++++++++++++++--------
> block/blk-stat.h | 9 ++++--
> include/linux/blk_types.h | 6 ++++
> 4 files changed, 94 insertions(+), 29 deletions(-)
>

--
Yours sincerely,
Pavel Begunkov

