2018-01-09 09:28:18

by Paolo Valente

[permalink] [raw]
Subject: [PATCH BUGFIX V2 0/2] block, bfq: fix two memory leaks related to cgroups

[There was a mistake in the subject of the second patch, sorry]

Hi Jens,
these two patches fix two related memory leaks, the first reported in
[1], and the second found by ourselves while fixing the first.

Thanks,
Paolo

[1] https://www.mail-archive.com/[email protected]/msg16258.html

Paolo Valente (2):
block, bfq: put async queues for root bfq groups too
block, bfq: release oom-queue ref to root group on exit

block/bfq-cgroup.c | 7 +++++--
block/bfq-iosched.c | 3 +++
2 files changed, 8 insertions(+), 2 deletions(-)

--
2.15.1


2018-01-09 09:28:29

by Paolo Valente

[permalink] [raw]
Subject: [PATCH BUGFIX V2 2/2] block, bfq: release oom-queue ref to root group on exit

On scheduler init, a reference to the root group, and a reference to
its corresponding blkg are taken for the oom queue. Yet these
references are not released on scheduler exit, which prevents these
objects from be freed. This commit adds the missing reference
releases.

Reported-by: Davide Ferrari <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index ea48b5c8f088..c56a495af2e8 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5015,6 +5015,9 @@ static void bfq_exit_queue(struct elevator_queue *e)

hrtimer_cancel(&bfqd->idle_slice_timer);

+ /* release oom-queue reference to root group */
+ bfqg_and_blkg_put(bfqd->root_group);
+
#ifdef CONFIG_BFQ_GROUP_IOSCHED
blkcg_deactivate_policy(bfqd->queue, &blkcg_policy_bfq);
#else
--
2.15.1

2018-01-09 09:29:22

by Paolo Valente

[permalink] [raw]
Subject: [PATCH BUGFIX V2 1/2] block, bfq: put async queues for root bfq groups too

For each pair [device for which bfq is selected as I/O scheduler,
group in blkio/io], bfq maintains a corresponding bfq group. Each such
bfq group contains a set of async queues, with each async queue
created on demand, i.e., when some I/O request arrives for it. On
creation, an async queue gets an extra reference, to make sure that
the queue is not freed as long as its bfq group exists. Accordingly,
to allow the queue to be freed after the group exited, this extra
reference must released on group exit.

The above holds also for a bfq root group, i.e., for the bfq group
corresponding to the root blkio/io root for a given device. Yet, by
mistake, the references to the existing async queues of a root group
are not released when the latter exits. This causes a memory leak when
the instance of bfq for a given device exits. In a similar vein,
bfqg_stats_xfer_dead is not executed for a root group.

This commit fixes bfq_pd_offline so that the latter executes the above
missing operations for a root group too.

Reported-by: Holger Hoffstätte <[email protected]>
Reported-by: Guoqing Jiang <[email protected]>
Signed-off-by: Davide Ferrari <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-cgroup.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index da1525ec4c87..d819dc77fe65 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -775,10 +775,11 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
unsigned long flags;
int i;

+ spin_lock_irqsave(&bfqd->lock, flags);
+
if (!entity) /* root group */
- return;
+ goto put_async_queues;

- spin_lock_irqsave(&bfqd->lock, flags);
/*
* Empty all service_trees belonging to this group before
* deactivating the group itself.
@@ -809,6 +810,8 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
}

__bfq_deactivate_entity(entity, false);
+
+put_async_queues:
bfq_put_async_queues(bfqd, bfqg);

spin_unlock_irqrestore(&bfqd->lock, flags);
--
2.15.1

2018-01-09 15:30:55

by Holger Hoffstätte

[permalink] [raw]
Subject: Re: [PATCH BUGFIX V2 0/2] block, bfq: fix two memory leaks related to cgroups

On 01/09/18 10:27, Paolo Valente wrote:
> [There was a mistake in the subject of the second patch, sorry]
>
> Hi Jens,
> these two patches fix two related memory leaks, the first reported in
> [1], and the second found by ourselves while fixing the first.
>
> Thanks,
> Paolo
>
> [1] https://www.mail-archive.com/[email protected]/msg16258.html
>
> Paolo Valente (2):
> block, bfq: put async queues for root bfq groups too
> block, bfq: release oom-queue ref to root group on exit
>
> block/bfq-cgroup.c | 7 +++++--
> block/bfq-iosched.c | 3 +++
> 2 files changed, 8 insertions(+), 2 deletions(-)

Gave this a try and can't reproduce the leak anymore, so for both patches:

Tested-by: Holger Hoffstätte <[email protected]>

cheers!
Holger

2018-01-10 01:41:30

by Guoqing Jiang

[permalink] [raw]
Subject: Re: [PATCH BUGFIX V2 1/2] block, bfq: put async queues for root bfq groups too



On 01/09/2018 05:27 PM, Paolo Valente wrote:
> For each pair [device for which bfq is selected as I/O scheduler,
> group in blkio/io], bfq maintains a corresponding bfq group. Each such
> bfq group contains a set of async queues, with each async queue
> created on demand, i.e., when some I/O request arrives for it. On
> creation, an async queue gets an extra reference, to make sure that
> the queue is not freed as long as its bfq group exists. Accordingly,
> to allow the queue to be freed after the group exited, this extra
> reference must released on group exit.
>
> The above holds also for a bfq root group, i.e., for the bfq group
> corresponding to the root blkio/io root for a given device. Yet, by
> mistake, the references to the existing async queues of a root group
> are not released when the latter exits. This causes a memory leak when
> the instance of bfq for a given device exits. In a similar vein,
> bfqg_stats_xfer_dead is not executed for a root group.
>
> This commit fixes bfq_pd_offline so that the latter executes the above
> missing operations for a root group too.
>
> Reported-by: Holger Hoffstätte <[email protected]>
> Reported-by: Guoqing Jiang <[email protected]>
> Signed-off-by: Davide Ferrari <[email protected]>
> Signed-off-by: Paolo Valente <[email protected]>
> ---
> block/bfq-cgroup.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> index da1525ec4c87..d819dc77fe65 100644
> --- a/block/bfq-cgroup.c
> +++ b/block/bfq-cgroup.c
> @@ -775,10 +775,11 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
> unsigned long flags;
> int i;
>
> + spin_lock_irqsave(&bfqd->lock, flags);
> +
> if (!entity) /* root group */
> - return;
> + goto put_async_queues;
>
> - spin_lock_irqsave(&bfqd->lock, flags);
> /*
> * Empty all service_trees belonging to this group before
> * deactivating the group itself.
> @@ -809,6 +810,8 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
> }
>
> __bfq_deactivate_entity(entity, false);
> +
> +put_async_queues:
> bfq_put_async_queues(bfqd, bfqg);
>
> spin_unlock_irqrestore(&bfqd->lock, flags);

With this change, bfqg_stats_xfer_dead will be called even entity is not
existed,
is it necessary? Thanks.

Regards,
Guoqing

2018-01-10 06:13:51

by Paolo Valente

[permalink] [raw]
Subject: Re: [PATCH BUGFIX V2 1/2] block, bfq: put async queues for root bfq groups too



> Il giorno 10 gen 2018, alle ore 02:41, Guoqing Jiang <[email protected]> ha scritto:
>
>
>
> On 01/09/2018 05:27 PM, Paolo Valente wrote:
>> For each pair [device for which bfq is selected as I/O scheduler,
>> group in blkio/io], bfq maintains a corresponding bfq group. Each such
>> bfq group contains a set of async queues, with each async queue
>> created on demand, i.e., when some I/O request arrives for it. On
>> creation, an async queue gets an extra reference, to make sure that
>> the queue is not freed as long as its bfq group exists. Accordingly,
>> to allow the queue to be freed after the group exited, this extra
>> reference must released on group exit.
>>
>> The above holds also for a bfq root group, i.e., for the bfq group
>> corresponding to the root blkio/io root for a given device. Yet, by
>> mistake, the references to the existing async queues of a root group
>> are not released when the latter exits. This causes a memory leak when
>> the instance of bfq for a given device exits. In a similar vein,
>> bfqg_stats_xfer_dead is not executed for a root group.
>>
>> This commit fixes bfq_pd_offline so that the latter executes the above
>> missing operations for a root group too.
>>
>> Reported-by: Holger Hoffstätte <[email protected]>
>> Reported-by: Guoqing Jiang <[email protected]>
>> Signed-off-by: Davide Ferrari <[email protected]>
>> Signed-off-by: Paolo Valente <[email protected]>
>> ---
>> block/bfq-cgroup.c | 7 +++++--
>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
>> index da1525ec4c87..d819dc77fe65 100644
>> --- a/block/bfq-cgroup.c
>> +++ b/block/bfq-cgroup.c
>> @@ -775,10 +775,11 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
>> unsigned long flags;
>> int i;
>> + spin_lock_irqsave(&bfqd->lock, flags);
>> +
>> if (!entity) /* root group */
>> - return;
>> + goto put_async_queues;
>> - spin_lock_irqsave(&bfqd->lock, flags);
>> /*
>> * Empty all service_trees belonging to this group before
>> * deactivating the group itself.
>> @@ -809,6 +810,8 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
>> }
>> __bfq_deactivate_entity(entity, false);
>> +
>> +put_async_queues:
>> bfq_put_async_queues(bfqd, bfqg);
>> spin_unlock_irqrestore(&bfqd->lock, flags);
>
> With this change, bfqg_stats_xfer_dead will be called even entity is not existed,

Actually, the entity associated with the bfq group being offlined
exists even if the local variable entity is NULL here. Simply, that
variable gets NULL if the bfq group is the bfq root group for a
device.

> is it necessary?

No, I opted for this solution to not shake the code too much, and
considering that
- bfqg_stats_xfer_dead simply does nothing if applied
to a root group
- who knows, in the future that function may need
do be invoked for a root group too.

Yet, if you guys think that it would be cleaner to not invoke
bfqg_stats_xfer_dead at all for a root group, I'll change the code
accordingly (this would introduce a little asymmetry with
cfq_pd_offline, which invokes cfqg_stats_xfer_dead unconditionally).

Thanks,
Paolo

> Thanks.
>
> Regards,
> Guoqing

2018-01-10 06:32:09

by Guoqing Jiang

[permalink] [raw]
Subject: Re: [PATCH BUGFIX V2 1/2] block, bfq: put async queues for root bfq groups too



On 01/10/2018 02:13 PM, Paolo Valente wrote:
>
>> Il giorno 10 gen 2018, alle ore 02:41, Guoqing Jiang <[email protected]> ha scritto:
>>
>>
>>
>> On 01/09/2018 05:27 PM, Paolo Valente wrote:
>>> For each pair [device for which bfq is selected as I/O scheduler,
>>> group in blkio/io], bfq maintains a corresponding bfq group. Each such
>>> bfq group contains a set of async queues, with each async queue
>>> created on demand, i.e., when some I/O request arrives for it. On
>>> creation, an async queue gets an extra reference, to make sure that
>>> the queue is not freed as long as its bfq group exists. Accordingly,
>>> to allow the queue to be freed after the group exited, this extra
>>> reference must released on group exit.
>>>
>>> The above holds also for a bfq root group, i.e., for the bfq group
>>> corresponding to the root blkio/io root for a given device. Yet, by
>>> mistake, the references to the existing async queues of a root group
>>> are not released when the latter exits. This causes a memory leak when
>>> the instance of bfq for a given device exits. In a similar vein,
>>> bfqg_stats_xfer_dead is not executed for a root group.
>>>
>>> This commit fixes bfq_pd_offline so that the latter executes the above
>>> missing operations for a root group too.
>>>
>>> Reported-by: Holger Hoffstätte <[email protected]>
>>> Reported-by: Guoqing Jiang <[email protected]>
>>> Signed-off-by: Davide Ferrari <[email protected]>
>>> Signed-off-by: Paolo Valente <[email protected]>
>>> ---
>>> block/bfq-cgroup.c | 7 +++++--
>>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
>>> index da1525ec4c87..d819dc77fe65 100644
>>> --- a/block/bfq-cgroup.c
>>> +++ b/block/bfq-cgroup.c
>>> @@ -775,10 +775,11 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
>>> unsigned long flags;
>>> int i;
>>> + spin_lock_irqsave(&bfqd->lock, flags);
>>> +
>>> if (!entity) /* root group */
>>> - return;
>>> + goto put_async_queues;
>>> - spin_lock_irqsave(&bfqd->lock, flags);
>>> /*
>>> * Empty all service_trees belonging to this group before
>>> * deactivating the group itself.
>>> @@ -809,6 +810,8 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)
>>> }
>>> __bfq_deactivate_entity(entity, false);
>>> +
>>> +put_async_queues:
>>> bfq_put_async_queues(bfqd, bfqg);
>>> spin_unlock_irqrestore(&bfqd->lock, flags);
>> With this change, bfqg_stats_xfer_dead will be called even entity is not existed,
> Actually, the entity associated with the bfq group being offlined
> exists even if the local variable entity is NULL here. Simply, that
> variable gets NULL if the bfq group is the bfq root group for a
> device.
>
>> is it necessary?
> No, I opted for this solution to not shake the code too much, and
> considering that
> - bfqg_stats_xfer_dead simply does nothing if applied
> to a root group
> - who knows, in the future that function may need
> do be invoked for a root group too.
>
> Yet, if you guys think that it would be cleaner to not invoke
> bfqg_stats_xfer_dead at all for a root group, I'll change the code
> accordingly (this would introduce a little asymmetry with
> cfq_pd_offline, which invokes cfqg_stats_xfer_dead unconditionally).

Thanks a lot for the explanation, I am fine with it.

Acked-by: Guoqing Jiang <[email protected]>

Regards,
Guoqing