2017-04-03 01:19:34

by NeilBrown

Subject: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.


When a filesystem is mounted from a loop device, writes are
throttled by balance_dirty_pages() twice: once when writing
to the filesystem and once when the loop_handle_cmd() writes
to the backing file. This double-throttling can trigger
positive feedback loops that create significant delays. The
throttling at the lower level is seen by the upper level as
a slow device, so it throttles extra hard.

The PF_LESS_THROTTLE flag was created to handle exactly this
circumstance, though with an NFS filesystem mounted from a
local NFS server. It reduces the throttling on the lower
layer so that it can proceed largely unthrottled.
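
(For reference, the flag works by raising the dirty thresholds that
balance_dirty_pages() applies to the flagged task by about 25%.
Paraphrasing mm/page-writeback.c from this era; not the exact code:

	/* in domain_dirty_limits(), after thresh/bg_thresh are computed */
	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
		bg_thresh += bg_thresh / 4;
		thresh += thresh / 4;
	}

so the flagged writer keeps making progress at dirty levels where
ordinary writers are already being throttled.)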

To demonstrate this, create a filesystem on a loop device
and write (e.g. with dd) several large files which combine
to consume significantly more than the limit set by
/proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
time taken.

When I do this directly on a device (no loop device) the
total time for several runs (mkfs, mount, write 200 files,
umount) is fairly stable: 28-35 seconds.
When I do this over a loop device the times are much worse
and less stable: 52-460 seconds, with half below 100 seconds
and half above.
When I apply this patch, the times become stable again,
though not as fast as the no-loop-back case: 53-72 seconds.

There may be room for further improvement as the total overhead still
seems too high, but this is a big improvement.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/block/loop.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0ecb6461ed81..a7e1dd215fc2 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1694,8 +1694,11 @@ static void loop_queue_work(struct kthread_work *work)
{
struct loop_cmd *cmd =
container_of(work, struct loop_cmd, work);
+ int oldflags = current->flags & PF_LESS_THROTTLE;

+ current->flags |= PF_LESS_THROTTLE;
loop_handle_cmd(cmd);
+ current->flags = (current->flags & ~PF_LESS_THROTTLE) | oldflags;
}

static int loop_init_request(void *data, struct request *rq,
--
2.12.0



2017-04-04 07:10:37

by Christoph Hellwig

Subject: Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.

Looks fine,

Reviewed-by: Christoph Hellwig <[email protected]>

But if you actually care about performance in any way I'd suggest
using the loop device in direct I/O mode...

2017-04-04 11:23:12

by Michal Hocko

Subject: Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.

On Mon 03-04-17 11:18:51, NeilBrown wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when the loop_handle_cmd() writes
> to the backing file. This double-throttling can trigger
> positive feedback loops that create significant delays. The
> throttling at the lower level is seen by the upper level as
> a slow device, so it throttles extra hard.
>
> The PF_LESS_THROTTLE flag was created to handle exactly this
> circumstance, though with an NFS filesystem mounted from a
> local NFS server. It reduces the throttling on the lower
> layer so that it can proceed largely unthrottled.
>
> To demonstrate this, create a filesystem on a loop device
> and write (e.g. with dd) several large files which combine
> to consume significantly more than the limit set by
> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
> time taken.
>
> When I do this directly on a device (no loop device) the
> total time for several runs (mkfs, mount, write 200 files,
> umount) is fairly stable: 28-35 seconds.
> When I do this over a loop device the times are much worse
> and less stable: 52-460 seconds, with half below 100 seconds
> and half above.
> When I apply this patch, the times become stable again,
> though not as fast as the no-loop-back case: 53-72 seconds.
>
> There may be room for further improvement as the total overhead still
> seems too high, but this is a big improvement.

Yes, this makes sense to me.

> Signed-off-by: NeilBrown <[email protected]>

Acked-by: Michal Hocko <[email protected]>

one nit below

> ---
> drivers/block/loop.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..a7e1dd215fc2 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1694,8 +1694,11 @@ static void loop_queue_work(struct kthread_work *work)
> {
> struct loop_cmd *cmd =
> container_of(work, struct loop_cmd, work);
> + int oldflags = current->flags & PF_LESS_THROTTLE;
>
> + current->flags |= PF_LESS_THROTTLE;
> loop_handle_cmd(cmd);
> + current->flags = (current->flags & ~PF_LESS_THROTTLE) | oldflags;

We have a helper for this: tsk_restore_flags(). It is not used
consistently, and maybe we want a dedicated API like we have for the
scoped NOIO/NOFS, but that is a separate thing. I would find
tsk_restore_flags() easier to read.
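
I.e. something like this (untested):

	static void loop_queue_work(struct kthread_work *work)
	{
		struct loop_cmd *cmd =
			container_of(work, struct loop_cmd, work);
		unsigned long pflags = current->flags;

		current->flags |= PF_LESS_THROTTLE;
		loop_handle_cmd(cmd);
		/* put only the PF_LESS_THROTTLE bit back to its old value */
		tsk_restore_flags(current, pflags, PF_LESS_THROTTLE);
	}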

--
Michal Hocko
SUSE Labs

2017-04-04 14:24:15

by Ming Lei

Subject: Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.

On Mon, Apr 3, 2017 at 9:18 AM, NeilBrown <[email protected]> wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when the loop_handle_cmd() writes
> to the backing file. This double-throttling can trigger
> positive feedback loops that create significant delays. The
> throttling at the lower level is seen by the upper level as
> a slow device, so it throttles extra hard.
>
> The PF_LESS_THROTTLE flag was created to handle exactly this
> circumstance, though with an NFS filesystem mounted from a
> local NFS server. It reduces the throttling on the lower
> layer so that it can proceed largely unthrottled.
>
> To demonstrate this, create a filesystem on a loop device
> and write (e.g. with dd) several large files which combine
> to consume significantly more than the limit set by
> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
> time taken.
>
> When I do this directly on a device (no loop device) the
> total time for several runs (mkfs, mount, write 200 files,
> umount) is fairly stable: 28-35 seconds.
> When I do this over a loop device the times are much worse
> and less stable: 52-460 seconds, with half below 100 seconds
> and half above.
> When I apply this patch, the times become stable again,
> though not as fast as the no-loop-back case: 53-72 seconds.
>
> There may be room for further improvement as the total overhead still
> seems too high, but this is a big improvement.
>
> Signed-off-by: NeilBrown <[email protected]>
> ---
> drivers/block/loop.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..a7e1dd215fc2 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1694,8 +1694,11 @@ static void loop_queue_work(struct kthread_work *work)
> {
> struct loop_cmd *cmd =
> container_of(work, struct loop_cmd, work);
> + int oldflags = current->flags & PF_LESS_THROTTLE;
>
> + current->flags |= PF_LESS_THROTTLE;
> loop_handle_cmd(cmd);
> + current->flags = (current->flags & ~PF_LESS_THROTTLE) | oldflags;
> }

You can do it against 'lo->worker_task' instead of doing it in each
loop_queue_work(), and the flag needn't be restored because the kernel
thread is dedicated to the loop device.


thanks,
Ming Lei

2017-04-05 04:27:47

by NeilBrown

Subject: Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.

On Tue, Apr 04 2017, Christoph Hellwig wrote:

> Looks fine,
>
> Reviewed-by: Christoph Hellwig <[email protected]>
>
> But if you actually care about performance in any way I'd suggest
> using the loop device in direct I/O mode...

The losetup on my test VM is too old to support that :-(
I guess it might be time to upgrade.

It seems that there is no "mount -o direct_loop" or similar, so you
have to do the losetup and the mount separately. Any thoughts on
whether that should be changed?
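
For reference: my understanding is that newer losetup's --direct-io
option boils down to the LOOP_SET_DIRECT_IO ioctl, so it can be done
by hand. A minimal sketch, assuming an already-attached /dev/loop0:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/loop.h>

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "/dev/loop0";
	int fd = open(dev, O_RDWR);

	if (fd < 0) {
		perror(dev);
		return 1;
	}
	/* 1 enables O_DIRECT against the backing file, 0 disables it */
	if (ioctl(fd, LOOP_SET_DIRECT_IO, 1UL) < 0)
		perror("LOOP_SET_DIRECT_IO");
	close(fd);
	return 0;
}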

Thanks,
NeilBrown



2017-04-05 04:31:30

by NeilBrown

Subject: Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.

On Tue, Apr 04 2017, Ming Lei wrote:

> On Mon, Apr 3, 2017 at 9:18 AM, NeilBrown <[email protected]> wrote:
>>
>> When a filesystem is mounted from a loop device, writes are
>> throttled by balance_dirty_pages() twice: once when writing
>> to the filesystem and once when the loop_handle_cmd() writes
>> to the backing file. This double-throttling can trigger
>> positive feedback loops that create significant delays. The
>> throttling at the lower level is seen by the upper level as
>> a slow device, so it throttles extra hard.
>>
>> The PF_LESS_THROTTLE flag was created to handle exactly this
>> circumstance, though with an NFS filesystem mounted from a
>> local NFS server. It reduces the throttling on the lower
>> layer so that it can proceed largely unthrottled.
>>
>> To demonstrate this, create a filesystem on a loop device
>> and write (e.g. with dd) several large files which combine
>> to consume significantly more than the limit set by
>> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
>> time taken.
>>
>> When I do this directly on a device (no loop device) the
>> total time for several runs (mkfs, mount, write 200 files,
>> umount) is fairly stable: 28-35 seconds.
>> When I do this over a loop device the times are much worse
>> and less stable: 52-460 seconds, with half below 100 seconds
>> and half above.
>> When I apply this patch, the times become stable again,
>> though not as fast as the no-loop-back case: 53-72 seconds.
>>
>> There may be room for further improvement as the total overhead still
>> seems too high, but this is a big improvement.
>>
>> Signed-off-by: NeilBrown <[email protected]>
>> ---
>> drivers/block/loop.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
>> index 0ecb6461ed81..a7e1dd215fc2 100644
>> --- a/drivers/block/loop.c
>> +++ b/drivers/block/loop.c
>> @@ -1694,8 +1694,11 @@ static void loop_queue_work(struct kthread_work *work)
>> {
>> struct loop_cmd *cmd =
>> container_of(work, struct loop_cmd, work);
>> + int oldflags = current->flags & PF_LESS_THROTTLE;
>>
>> + current->flags |= PF_LESS_THROTTLE;
>> loop_handle_cmd(cmd);
>> + current->flags = (current->flags & ~PF_LESS_THROTTLE) | oldflags;
>> }
>
> You can do it against 'lo->worker_task' instead of doing it in each
> loop_queue_work(), and the flag needn't be restored because the kernel
> thread is dedicated to the loop device.
>

Good point, I'll do that. Thanks,
NeilBrown



2017-04-05 04:34:00

by NeilBrown

Subject: [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.


When a filesystem is mounted from a loop device, writes are
throttled by balance_dirty_pages() twice: once when writing
to the filesystem and once when the loop_handle_cmd() writes
to the backing file. This double-throttling can trigger
positive feedback loops that create significant delays. The
throttling at the lower level is seen by the upper level as
a slow device, so it throttles extra hard.

The PF_LESS_THROTTLE flag was created to handle exactly this
circumstance, though with an NFS filesystem mounted from a
local NFS server. It reduces the throttling on the lower
layer so that it can proceed largely unthrottled.

To demonstrate this, create a filesystem on a loop device
and write (e.g. with dd) several large files which combine
to consume significantly more than the limit set by
/proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
time taken.

When I do this directly on a device (no loop device) the
total time for several runs (mkfs, mount, write 200 files,
umount) is fairly stable: 28-35 seconds.
When I do this over a loop device the times are much worse
and less stable: 52-460 seconds, with half below 100 seconds
and half above.
When I apply this patch, the times become stable again,
though not as fast as the no-loop-back case: 53-72 seconds.

There may be room for further improvement as the total overhead still
seems too high, but this is a big improvement.

Reviewed-by: Christoph Hellwig <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
---

I moved where the flag is set, thanks to a suggestion from
Ming Lei.
I've preserved the *-by: tags I was offered despite the code
being different, as the concept is identical.

Thanks,
NeilBrown


drivers/block/loop.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0ecb6461ed81..44b3506fd086 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
if (IS_ERR(lo->worker_task))
return -ENOMEM;
set_user_nice(lo->worker_task, MIN_NICE);
+ lo->worker_task->flags |= PF_LESS_THROTTLE;
return 0;
}

--
2.12.2



2017-04-05 05:05:37

by Ming Lei

Subject: Re: [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.

On Wed, Apr 5, 2017 at 12:33 PM, NeilBrown <[email protected]> wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when the loop_handle_cmd() writes
> to the backing file. This double-throttling can trigger
> positive feedback loops that create significant delays. The
> throttling at the lower level is seen by the upper level as
> a slow device, so it throttles extra hard.
>
> The PF_LESS_THROTTLE flag was created to handle exactly this
> circumstance, though with an NFS filesystem mounted from a
> local NFS server. It reduces the throttling on the lower
> layer so that it can proceed largely unthrottled.
>
> To demonstrate this, create a filesystem on a loop device
> and write (e.g. with dd) several large files which combine
> to consume significantly more than the limit set by
> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
> time taken.
>
> When I do this directly on a device (no loop device) the
> total time for several runs (mkfs, mount, write 200 files,
> umount) is fairly stable: 28-35 seconds.
> When I do this over a loop device the times are much worse
> and less stable: 52-460 seconds, with half below 100 seconds
> and half above.
> When I apply this patch, the times become stable again,
> though not as fast as the no-loop-back case: 53-72 seconds.
>
> There may be room for further improvement as the total overhead still
> seems too high, but this is a big improvement.
>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Acked-by: Michal Hocko <[email protected]>
> Signed-off-by: NeilBrown <[email protected]>

Reviewed-by: Ming Lei <[email protected]>

> ---
>
> I moved where the flag is set, thanks to a suggestion from
> Ming Lei.
> I've preserved the *-by: tags I was offered despite the code
> being different, as the concept is identical.
>
> Thanks,
> NeilBrown
>
>
> drivers/block/loop.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..44b3506fd086 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
> if (IS_ERR(lo->worker_task))
> return -ENOMEM;
> set_user_nice(lo->worker_task, MIN_NICE);
> + lo->worker_task->flags |= PF_LESS_THROTTLE;
> return 0;
> }
>
> --
> 2.12.2
>

2017-04-05 05:13:27

by Ming Lei

Subject: Re: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.

On Wed, Apr 5, 2017 at 12:27 PM, NeilBrown <[email protected]> wrote:
> On Tue, Apr 04 2017, Christoph Hellwig wrote:
>
>> Looks fine,
>>
>> Reviewed-by: Christoph Hellwig <[email protected]>
>>
>> But if you actually care about performance in any way I'd suggest
>> using the loop device in direct I/O mode...
>
> The losetup on my test VM is too old to support that :-(
> I guess it might be time to upgrade.
>
> It seems that there is no "mount -o direct_loop" or similar, so you
> have to do the losetup and the mount separately. Any thoughts on

I guess 'direct_loop' could be added to 'mount' directly, but I'm not
familiar with the mount utility.

> whether that should be changed?

There was a sysfs interface for controlling direct I/O in the initial
submission, but it looks like it was reviewed out. :-)


Thanks,
Ming Lei

2017-04-05 07:19:32

by Michal Hocko

Subject: Re: [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.

On Wed 05-04-17 14:33:50, NeilBrown wrote:
>
> When a filesystem is mounted from a loop device, writes are
> throttled by balance_dirty_pages() twice: once when writing
> to the filesystem and once when the loop_handle_cmd() writes
> to the backing file. This double-throttling can trigger
> positive feedback loops that create significant delays. The
> throttling at the lower level is seen by the upper level as
> a slow device, so it throttles extra hard.
>
> The PF_LESS_THROTTLE flag was created to handle exactly this
> circumstance, though with an NFS filesystem mounted from a
> local NFS server. It reduces the throttling on the lower
> layer so that it can proceed largely unthrottled.
>
> To demonstrate this, create a filesystem on a loop device
> and write (e.g. with dd) several large files which combine
> to consume significantly more than the limit set by
> /proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
> time taken.
>
> When I do this directly on a device (no loop device) the
> total time for several runs (mkfs, mount, write 200 files,
> umount) is fairly stable: 28-35 seconds.
> When I do this over a loop device the times are much worse
> and less stable: 52-460 seconds, with half below 100 seconds
> and half above.
> When I apply this patch, the times become stable again,
> though not as fast as the no-loop-back case: 53-72 seconds.
>
> There may be room for further improvement as the total overhead still
> seems too high, but this is a big improvement.
>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Acked-by: Michal Hocko <[email protected]>
> Signed-off-by: NeilBrown <[email protected]>
> ---
>
> I moved where the flag is set, thanks to a suggestion from
> Ming Lei.
> I've preserved the *-by: tags I was offered despite the code
> being different, as the concept is identical.
>
> Thanks,
> NeilBrown
>
>
> drivers/block/loop.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..44b3506fd086 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
> if (IS_ERR(lo->worker_task))
> return -ENOMEM;
> set_user_nice(lo->worker_task, MIN_NICE);
> + lo->worker_task->flags |= PF_LESS_THROTTLE;
> return 0;

As mentioned elsewhere, PF flags should be updated only on the current
task; otherwise there is a potential read-modify-write race. Is this
safe? The code runs concurrently with the worker thread.
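
To illustrate the class of problem, here is a deliberately racy
user-space sketch (hypothetical names, build with gcc -pthread;
whether the lost update fires on a given run is timing dependent):

/* Two threads doing a non-atomic read-modify-write on one flags word,
 * mimicking loop_prepare_queue() OR-ing a bit into another task's
 * ->flags while that task updates its own flags concurrently. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define PF_WORKER_BIT	(1u << 0)	/* bit the "worker" toggles itself */
#define PF_EXTERNAL_BIT	(1u << 1)	/* bit set from the outside */

static volatile unsigned int flags;
static volatile bool stop;

static void *worker(void *unused)
{
	(void)unused;
	while (!stop) {
		unsigned int tmp = flags;	/* read   */
		tmp |= PF_WORKER_BIT;		/* modify */
		flags = tmp;			/* write back a possibly stale copy */
		flags &= ~PF_WORKER_BIT;
	}
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, worker, NULL);
	for (int i = 0; i < 10000000; i++) {
		flags |= PF_EXTERNAL_BIT;	/* like ->flags |= PF_LESS_THROTTLE */
		if (!(flags & PF_EXTERNAL_BIT))
			printf("lost the externally-set bit at iteration %d\n", i);
		flags &= ~PF_EXTERNAL_BIT;
	}
	stop = true;
	pthread_join(t, NULL);
	return 0;
}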


--
Michal Hocko
SUSE Labs

2017-04-05 07:32:38

by Michal Hocko

Subject: Re: [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.

On Wed 05-04-17 09:19:27, Michal Hocko wrote:
> On Wed 05-04-17 14:33:50, NeilBrown wrote:
[...]
> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> > index 0ecb6461ed81..44b3506fd086 100644
> > --- a/drivers/block/loop.c
> > +++ b/drivers/block/loop.c
> > @@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
> > if (IS_ERR(lo->worker_task))
> > return -ENOMEM;
> > set_user_nice(lo->worker_task, MIN_NICE);
> > + lo->worker_task->flags |= PF_LESS_THROTTLE;
> > return 0;
>
> As mentioned elsewhere, PF flags should be updated only on the current
> task; otherwise there is a potential read-modify-write race. Is this
> safe? The code runs concurrently with the worker thread.

I believe you need something like this instead
---
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index f347285c67ec..07b2a909e4fb 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -844,10 +844,16 @@ static void loop_unprepare_queue(struct loop_device *lo)
kthread_stop(lo->worker_task);
}

+int loop_kthread_worker_fn(void *worker_ptr)
+{
+ current->flags |= PF_LESS_THROTTLE;
+ return kthread_worker_fn(worker_ptr);
+}
+
static int loop_prepare_queue(struct loop_device *lo)
{
kthread_init_worker(&lo->worker);
- lo->worker_task = kthread_run(kthread_worker_fn,
+ lo->worker_task = kthread_run(loop_kthread_worker_fn,
&lo->worker, "loop%d", lo->lo_number);
if (IS_ERR(lo->worker_task))
return -ENOMEM;
--
Michal Hocko
SUSE Labs

2017-04-06 02:24:56

by NeilBrown

Subject: Re: [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.

On Wed, Apr 05 2017, Michal Hocko wrote:

> On Wed 05-04-17 09:19:27, Michal Hocko wrote:
>> On Wed 05-04-17 14:33:50, NeilBrown wrote:
> [...]
>> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
>> > index 0ecb6461ed81..44b3506fd086 100644
>> > --- a/drivers/block/loop.c
>> > +++ b/drivers/block/loop.c
>> > @@ -852,6 +852,7 @@ static int loop_prepare_queue(struct loop_device *lo)
>> > if (IS_ERR(lo->worker_task))
>> > return -ENOMEM;
>> > set_user_nice(lo->worker_task, MIN_NICE);
>> > + lo->worker_task->flags |= PF_LESS_THROTTLE;
>> > return 0;
>>
>> As mentioned elsewhere, PF flags should be updated only on the current
>> task; otherwise there is a potential read-modify-write race. Is this
>> safe? The code runs concurrently with the worker thread.
>
> I believe you need something like this instead
> ---
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index f347285c67ec..07b2a909e4fb 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -844,10 +844,16 @@ static void loop_unprepare_queue(struct loop_device *lo)
> kthread_stop(lo->worker_task);
> }
>
> +int loop_kthread_worker_fn(void *worker_ptr)
> +{
> + current->flags |= PF_LESS_THROTTLE;
> + return kthread_worker_fn(worker_ptr);
> +}
> +
> static int loop_prepare_queue(struct loop_device *lo)
> {
> kthread_init_worker(&lo->worker);
> - lo->worker_task = kthread_run(kthread_worker_fn,
> + lo->worker_task = kthread_run(loop_kthread_worker_fn,
> &lo->worker, "loop%d", lo->lo_number);
> if (IS_ERR(lo->worker_task))
> return -ENOMEM;

Argh - of course.
How about we just split the kthread_create from the wake_up?

Thanks,
NeilBrown


From: NeilBrown <[email protected]>
Subject: [PATCH] loop: Add PF_LESS_THROTTLE to block/loop device thread.

When a filesystem is mounted from a loop device, writes are
throttled by balance_dirty_pages() twice: once when writing
to the filesystem and once when the loop_handle_cmd() writes
to the backing file. This double-throttling can trigger
positive feedback loops that create significant delays. The
throttling at the lower level is seen by the upper level as
a slow device, so it throttles extra hard.

The PF_LESS_THROTTLE flag was created to handle exactly this
circumstance, though with an NFS filesystem mounted from a
local NFS server. It reduces the throttling on the lower
layer so that it can proceed largely unthrottled.

To demonstrate this, create a filesystem on a loop device
and write (e.g. with dd) several large files which combine
to consume significantly more than the limit set by
/proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
time taken.

When I do this directly on a device (no loop device) the
total time for several runs (mkfs, mount, write 200 files,
umount) is fairly stable: 28-35 seconds.
When I do this over a loop device the times are much worse
and less stable: 52-460 seconds, with half below 100 seconds
and half above.
When I apply this patch, the times become stable again,
though not as fast as the no-loop-back case: 53-72 seconds.

There may be room for further improvement as the total overhead still
seems too high, but this is a big improvement.

Signed-off-by: NeilBrown <[email protected]>
---
drivers/block/loop.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0ecb6461ed81..95679d988725 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -847,10 +847,12 @@ static void loop_unprepare_queue(struct loop_device *lo)
static int loop_prepare_queue(struct loop_device *lo)
{
kthread_init_worker(&lo->worker);
- lo->worker_task = kthread_run(kthread_worker_fn,
+ lo->worker_task = kthread_create(kthread_worker_fn,
&lo->worker, "loop%d", lo->lo_number);
if (IS_ERR(lo->worker_task))
return -ENOMEM;
+ lo->worker_task->flags |= PF_LESS_THROTTLE;
+ wake_up_process(lo->worker_task);
set_user_nice(lo->worker_task, MIN_NICE);
return 0;
}
--
2.12.2



2017-04-06 06:53:39

by Michal Hocko

Subject: Re: [PATCH v2] loop: Add PF_LESS_THROTTLE to block/loop device thread.

On Thu 06-04-17 12:23:51, NeilBrown wrote:
[...]
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 0ecb6461ed81..95679d988725 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -847,10 +847,12 @@ static void loop_unprepare_queue(struct loop_device *lo)
> static int loop_prepare_queue(struct loop_device *lo)
> {
> kthread_init_worker(&lo->worker);
> - lo->worker_task = kthread_run(kthread_worker_fn,
> + lo->worker_task = kthread_create(kthread_worker_fn,
> &lo->worker, "loop%d", lo->lo_number);
> if (IS_ERR(lo->worker_task))
> return -ENOMEM;
> + lo->worker_task->flags |= PF_LESS_THROTTLE;
> + wake_up_process(lo->worker_task);
> set_user_nice(lo->worker_task, MIN_NICE);
> return 0;

This should work for the current implementation, because kthread_create
will return only after the full initialization has been done. No idea
whether we can rely on that in the future. I also think it would be
cleaner to set the flag on current and keep the current semantics that
only current changes its own flags.

So while I do not have a strong opinion on this, I think defining a
loop-specific thread function which sets PF_LESS_THROTTLE as the first
thing is more elegant and less error prone long term. A short comment
explaining why we use the flag there would also be preferred.
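
Something like this, for example (untested; the comment wording is
only illustrative):

static int loop_kthread_worker_fn(void *worker_ptr)
{
	/*
	 * The loop worker writes to the backing file on behalf of
	 * writers which balance_dirty_pages() is already throttling.
	 * Less throttling here avoids the double-throttle feedback
	 * loop described in the changelog.
	 */
	current->flags |= PF_LESS_THROTTLE;
	return kthread_worker_fn(worker_ptr);
}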

I will leave the decision to you.

Thanks.

--
Michal Hocko
SUSE Labs

2017-04-06 23:48:23

by NeilBrown

Subject: [PATCH v3] loop: Add PF_LESS_THROTTLE to block/loop device thread.


When a filesystem is mounted from a loop device, writes are
throttled by balance_dirty_pages() twice: once when writing
to the filesystem and once when the loop_handle_cmd() writes
to the backing file. This double-throttling can trigger
positive feedback loops that create significant delays. The
throttling at the lower level is seen by the upper level as
a slow device, so it throttles extra hard.

The PF_LESS_THROTTLE flag was created to handle exactly this
circumstance, though with an NFS filesystem mounted from a
local NFS server. It reduces the throttling on the lower
layer so that it can proceed largely unthrottled.

To demonstrate this, create a filesystem on a loop device
and write (e.g. with dd) several large files which combine
to consume significantly more than the limit set by
/proc/sys/vm/dirty_ratio or dirty_bytes. Measure the total
time taken.

When I do this directly on a device (no loop device) the
total time for several runs (mkfs, mount, write 200 files,
umount) is fairly stable: 28-35 seconds.
When I do this over a loop device the times are much worse
and less stable: 52-460 seconds, with half below 100 seconds
and half above.
When I apply this patch, the times become stable again,
though not as fast as the no-loop-back case: 53-72 seconds.

There may be room for further improvement as the total overhead still
seems too high, but this is a big improvement.

Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Suggested-by: Michal Hocko <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
---

Hi Jens,
I think this version meets with everyone's approval.

Thanks,
NeilBrown


drivers/block/loop.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0ecb6461ed81..035b8651b8bf 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -844,10 +844,16 @@ static void loop_unprepare_queue(struct loop_device *lo)
kthread_stop(lo->worker_task);
}

+static int loop_kthread_worker_fn(void *worker_ptr)
+{
+ current->flags |= PF_LESS_THROTTLE;
+ return kthread_worker_fn(worker_ptr);
+}
+
static int loop_prepare_queue(struct loop_device *lo)
{
kthread_init_worker(&lo->worker);
- lo->worker_task = kthread_run(kthread_worker_fn,
+ lo->worker_task = kthread_run(loop_kthread_worker_fn,
&lo->worker, "loop%d", lo->lo_number);
if (IS_ERR(lo->worker_task))
return -ENOMEM;
--
2.12.2

