2016-03-16 09:44:14

by Miklos Szeredi

Subject: Re: [fuse-devel] Horrible mmap write performance (kernel writeback issue?)

On Tue, Mar 15, 2016 at 8:55 AM, Jakob Unterwurzacher
<[email protected]> wrote:
> Just for anybody finding this thread: This still happens in v4.4, it
> just took longer to trigger.
>
> I have posted more details to linux-kernel (copy-pasted below),
> http://thread.gmane.org/gmane.linux.kernel/2132944

Okay, so you can reproduce this relatively quickly. Can you try "git
bisect" to find exactly which commit is responsible?

Thanks,
Miklos

>
> -------- copy of the email to linux-kernel -------------------
>
> 2016-01-22 21:10:59
>
> I have noticed an annoying regression that was introduced in 4.2 and is
> still there in 4.4. mmap writes to FUSE filesystems are throttled down
> to basically zero.
>
> Reproducer: https://github.com/rfjakob/mmapwrite , testing against encfs:
>
> $ mmapwrite /tmp/encfs-mnt/foo
> 1 .................................................. 107.01 MB/s
> 2 .................................................. 101.98 MB/s
> [...]
> 68 .................................................. 106.79 MB/s
> 69 .................................................. 105.09 MB/s
> 70 .................................................. 2.02 MB/s
> 71 .................................................. 1.77 MB/s
> 72 .................................................. 0.42 MB/s
> 73 .................................... (hangs)
>
> I have tested kernels from 4.0 and this seems to have been introduced in
> 4.2:
>
> 4.0 ....... 140MB/s permanent
> 4.1 ....... 140MB/s permanent
> 4.2 ....... 100MB/s at the start, sudden slowdown to 1MB/s after ~5GB
> 4.3 ....... 100MB/s at the start, sudden slowdown to 1MB/s after ~1.5GB
> 4.4-rc4 ... 100MB/s at the start, slowly ramps down, 0.3MB/s after ~2GB
> 4.4 ....... 100MB/s at the start, sudden slowdown after ~3GB
>
> Is there a way to disable the throttling? Or at least exempt FUSE until
> there is a proper fix?
>
> Thanks,
> Jakob
>
>
>
>
> On 17.12.2015 00:26, Jakob Unterwurzacher wrote:
>> This seems to be fixed in v4.4-rc5-18-gedb42dc. mmap writes now proceed
>> at solid 100MB/s with full CPU saturation.
>>
>> Thanks,
>> Jakob
>>
>> On Mon, Dec 14, 2015 at 9:06 AM, Jakob Unterwurzacher
>> <[email protected]> wrote:
>>
>> I am the developer of https://github.com/rfjakob/gocryptfs (an
>> encrypted overlay filesystem like EncFS)
>> and have ported xfstests over for regression testing
>> ( https://github.com/rfjakob/fuse-xfstests ).
>>
>> xfstests generic/074 is how I noticed that mmap write performance
>> plummeted when Fedora upgraded my kernel to 4.2.5.
>> It used to complete in 10 minutes and now it will probably take days.
>> I am on kernel 4.4-rc1 now and still seeing the same issue.
>>
>> It looks like at some point the kernel writeback
>> mechanism gets stuck (or throttles?).
>>
>> Testing against encfs:
>>
>> ./mmapwrite /tmp/e2/foo # https://github.com/rfjakob/mmapwrite
>> .................................................................................................... 98.91 MB/s
>> .................................................................................................... 93.74 MB/s
>> .................................................................................................... 103.89 MB/s
>> .................................................................................................... 100.20 MB/s
>> .................................................................................................... 104.03 MB/s
>> .................................................................................................... 98.06 MB/s
>> .................................................................................................... 10.17 MB/s
>> .................................................................................................... 9.50 MB/s
>> .................................. (hangs)
>>
>> At this point no write requests are submitted to encfs and the
>> mmapwrite process is stuck in the kernel in balance_dirty_pages.isra.22.
>>
>> Bisecting this will be a pain, I would appreciate any suggestions.
>>
>> Note that this seems to affect every FUSE filesystem, also ntfs-3g.
>>
>> Best regards,
>> Jakob
>>
>>
>
>
> --
> fuse-devel mailing list
> To unsubscribe or subscribe, visit https://lists.sourceforge.net/lists/listinfo/fuse-devel
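
For readers who want to see what kind of workload triggers the stall, here is a minimal Python sketch of an mmap write throughput test. It is a hypothetical reimplementation, not the code in https://github.com/rfjakob/mmapwrite. The 1 MiB store size, the 50 MiB report interval and the function name mmap_write_rates are assumptions. It maps a file shared and writable, dirties it chunk by chunk and prints one MB/s figure per interval, similar to the per-line rates in the output above.

```python
import mmap
import os
import tempfile
import time

MIB = 1024 * 1024  # assumed store granularity, not necessarily mmapwrite's


def mmap_write_rates(path: str, total_mib: int, report_every_mib: int = 50) -> list[float]:
    """Fill `path` with total_mib MiB through a shared mapping.

    Returns the throughput in MB/s for each report_every_mib-sized interval.
    (Hypothetical helper written for this sketch.)
    """
    size = total_mib * MIB
    rates: list[float] = []
    block = b"\xaa" * MIB
    with open(path, "w+b") as f:
        f.truncate(size)  # grow the file first so the whole mapping is backed by it
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_WRITE) as m:
            start = time.monotonic()
            for i in range(total_mib):
                # Each store dirties page-cache pages via a write fault; this is
                # where the kernel may throttle the task (balance_dirty_pages).
                m[i * MIB:(i + 1) * MIB] = block
                if (i + 1) % report_every_mib == 0:
                    elapsed = max(time.monotonic() - start, 1e-9)
                    rates.append(report_every_mib * MIB / 1e6 / elapsed)
                    start = time.monotonic()
            m.flush()  # msync(): ask the kernel to write the dirty pages back
    return rates


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for n, rate in enumerate(mmap_write_rates(os.path.join(d, "foo"), 200), 1):
            print(f"{n} {rate:.2f} MB/s")
```

To exercise a FUSE mount, point it at a file there instead, e.g. mmap_write_rates("/tmp/encfs-mnt/foo", 5000). On an affected kernel the per-interval rate would be expected to collapse as in the reports above.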


2016-03-26 21:42:49

by Jakob Unterwurzacher

Subject: Re: [fuse-devel] Horrible mmap write performance (kernel writeback issue?)

On 16.03.2016 10:44, Miklos Szeredi wrote:
> On Tue, Mar 15, 2016 at 8:55 AM, Jakob Unterwurzacher
> <[email protected]> wrote:
>> Just for anybody finding this thread: This still happens in v4.4, it
>> just took longer to trigger.
>>
>> I have posted more details to linux-kernel (copy-pasted below),
>> http://thread.gmane.org/gmane.linux.kernel/2132944
>
> Okay, so you can reproduce this relatively quickly. Can you try "git
> bisect" to find exactly which commit is responsible?
>
> Thanks,
> Miklos

That took a while, but it looks like I got it:

> commit 947e9762a8ddefda38aa21e249e6a4fec215cd12
> Author: Tejun Heo <[email protected]>
> Date: Fri May 22 18:23:32 2015 -0400
>
> writeback: update wb_over_bg_thresh() to use wb_domain aware operations

Note that this commit seems to only activate changes that happened in the
commit before, aa661bb:

> commit aa661bbe1e61ce80ca4ae98804f673ede94b0827
> Author: Tejun Heo <[email protected]>
> Date: Fri May 22 18:23:31 2015 -0400
>
> writeback: move over_bground_thresh() to mm/page-writeback.c


Anyway, I can reliably reboot between aa661bb and 947e976 and always get the
same results:

aa661bb passes
947e976 fails

If you want to reproduce, clone https://github.com/rfjakob/mmapwrite.git and
run ./encfs-test.sh (needs encfs installed). On the bad kernel, it will hang
within a few seconds.

Thanks,
Jakob
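
The search Miklos asked for is a binary search over history. Below is a minimal sketch of the idea behind `git bisect`, assuming a linear list of commits ordered oldest to newest where the first entry is known good, the last is known bad, and the regression persists once introduced. Real `git bisect` also handles merge history. The names first_bad and is_bad are hypothetical; in this thread, is_bad would mean "boot this kernel and run the reproducer".

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Invariant: commits[good] is good and commits[bad] is bad, so the
    first bad commit always lies in (good, bad].
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):  # e.g. boot this kernel, run ./encfs-test.sh
            bad = mid
        else:
            good = mid
    return commits[bad]


if __name__ == "__main__":
    # Toy linearized history using the commits named in this thread.
    history = ["v4.1", "aa661bb", "947e976", "v4.2"]
    print(first_bad(history, lambda c: c in {"947e976", "v4.2"}))
```

Each probe halves the remaining range, so a few thousand candidate commits need only around a dozen test boots.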

2016-03-28 19:45:56

by Miklos Szeredi

Subject: Re: [fuse-devel] Horrible mmap write performance (kernel writeback issue?)

On Sat, Mar 26, 2016 at 10:42 PM, Jakob Unterwurzacher
<[email protected]> wrote:
> On 16.03.2016 10:44, Miklos Szeredi wrote:
>> On Tue, Mar 15, 2016 at 8:55 AM, Jakob Unterwurzacher
>> <[email protected]> wrote:
>>> Just for anybody finding this thread: This still happens in v4.4, it
>>> just took longer to trigger.
>>>
>>> I have posted more details to linux-kernel (copy-pasted below),
>>> http://thread.gmane.org/gmane.linux.kernel/2132944
>>
>> Okay, so you can reproduce this relatively quickly. Can you try "git
>> bisect" to find exactly which commit is responsible?
>>
>> Thanks,
>> Miklos
>
> That took a while, but it looks like I got it:
>
>> commit 947e9762a8ddefda38aa21e249e6a4fec215cd12
>> Author: Tejun Heo <[email protected]>
>> Date: Fri May 22 18:23:32 2015 -0400
>>
>> writeback: update wb_over_bg_thresh() to use wb_domain aware operations
>

Tejun,

Any idea why this commit stalls fuse mmap writes?

Here's the start of this thread:

http://marc.info/?l=fuse-devel&m=145008058603261&w=2

Thanks,
Miklos


> Note that this commit seems to only activate changes that happened in the
> commit before, aa661bb:
>
>> commit aa661bbe1e61ce80ca4ae98804f673ede94b0827
>> Author: Tejun Heo <[email protected]>
>> Date: Fri May 22 18:23:31 2015 -0400
>>
>> writeback: move over_bground_thresh() to mm/page-writeback.c
>
>
> Anyway, I can reliably reboot between aa661bb and 947e976 and always get the
> same results:
>
> aa661bb passes
> 947e976 fails
>
> If you want to reproduce, clone https://github.com/rfjakob/mmapwrite.git and
> run ./encfs-test.sh (needs encfs installed). On the bad kernel, it will hang
> within a few seconds.
>
> Thanks,
> Jakob

2016-03-30 18:47:37

by Tejun Heo

Subject: Re: [fuse-devel] Horrible mmap write performance (kernel writeback issue?)

Hello,

On Mon, Mar 28, 2016 at 09:45:53PM +0200, Miklos Szeredi wrote:
> >> commit 947e9762a8ddefda38aa21e249e6a4fec215cd12
> >> Author: Tejun Heo <[email protected]>
> >> Date: Fri May 22 18:23:32 2015 -0400
> >>
> >> writeback: update wb_over_bg_thresh() to use wb_domain aware operations
> >
>
> Tejun,
>
> Any idea why this commit stalls fuse mmap writes?
>
> Here's the start of this thread:
>
> http://marc.info/?l=fuse-devel&m=145008058603261&w=2

Hmmm... cgroup writeback support shouldn't affect fuse at all as the
backing device doesn't enable cgroup support. I probably made some
silly mistake. Is there a simple reproducer I can play with?

Thanks.

--
tejun

2016-04-02 06:34:59

by Sedat Dilek

Subject: Re: [fuse-devel] Horrible mmap write performance (kernel writeback issue?)

On Wed, Mar 30, 2016 at 8:47 PM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Mon, Mar 28, 2016 at 09:45:53PM +0200, Miklos Szeredi wrote:
>> >> commit 947e9762a8ddefda38aa21e249e6a4fec215cd12
>> >> Author: Tejun Heo <[email protected]>
>> >> Date: Fri May 22 18:23:32 2015 -0400
>> >>
>> >> writeback: update wb_over_bg_thresh() to use wb_domain aware operations
>> >
>>
>> Tejun,
>>
>> Any idea why this commit stalls fuse mmap writes?
>>
>> Here's the start of this thread:
>>
>> http://marc.info/?l=fuse-devel&m=145008058603261&w=2
>
> Hmmm... cgroup writeback support shouldn't affect fuse at all as the
> backing device doesn't enable cgroup support. I probably made some
> silly mistake. Is there a simple reproducer I can play with?
>
> Thanks.
>

To quote Jakob from a previous email:

"If you want to reproduce, clone https://github.com/rfjakob/mmapwrite.git and
run ./encfs-test.sh (needs encfs installed). On the bad kernel, it will hang
within a few seconds."

Hope this helps.

- Sedat -

2016-04-11 08:04:49

by Jakob Unterwurzacher

Subject: Re: [fuse-devel] Horrible mmap write performance (kernel writeback issue?)

On 30.03.2016 20:47, Tejun Heo wrote:
> Hmmm... cgroup writeback support shouldn't affect fuse at all as the
> backing device doesn't enable cgroup support. I probably made some
> silly mistake. Is there a simple reproducer I can play with?

Hi Tejun! A simple reproducer is at https://github.com/rfjakob/mmapwrite .

What seems to be happening in the kernel is that the estimated device bandwidth
drops to zero. I'm not even sure how this works for FUSE, but that's what I
gathered from some printk debugging.

What I also found is that once mmapwrite is hung, you can unblock it for some
time by running something like

cat /dev/zero > /var/tmp/foo

mmapwrite will then steam ahead as long as cat is writing, even though encfs
writes to /tmp (tmpfs) and /var is on the ext4 disk.

Note that the hang happens regardless of the backing device, on both tmpfs and
ext4.

Best regards,
Jakob
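
One plausible way for the estimate to reach zero, sketched under assumptions not confirmed in this thread: if no writeback is issued for the device, nothing is counted as written, and a running bandwidth average fed by zero-write intervals decays toward zero. The update rule below is loosely modeled on the period-weighted average used by the write-bandwidth estimator in mm/page-writeback.c; the 3 s period and the name update_bandwidth are assumptions for illustration.

```python
def update_bandwidth(avg_bw: float, written_pages: int, elapsed: float,
                     period: float = 3.0) -> float:
    """Blend the rate measured over the last `elapsed` seconds into a running average.

    The new sample is weighted by the fraction of `period` it covers; an
    interval at least one period long replaces the average outright.
    (Simplified model, not the kernel code.)
    """
    measured = written_pages / elapsed
    w = min(elapsed / period, 1.0)
    return avg_bw * (1.0 - w) + measured * w


if __name__ == "__main__":
    bw = 25600.0  # roughly 100 MB/s expressed in 4 KiB pages per second
    for step in range(1, 51):
        # No writeback is issued, so every 200 ms interval reports zero pages written.
        bw = update_bandwidth(bw, written_pages=0, elapsed=0.2)
        if step % 10 == 0:
            print(f"after {step * 0.2:.0f} s with nothing written back: {bw:.0f} pages/s")
```

In this model the estimate shrinks by a constant factor per idle interval, so it only recovers once pages are actually written back again.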

2016-04-12 00:24:35

by Tejun Heo

Subject: Re: [fuse-devel] Horrible mmap write performance (kernel writeback issue?)

Hello,

On Mon, Apr 11, 2016 at 10:04:42AM +0200, Jakob Unterwurzacher wrote:
> What seems to be happening in the kernel is that the estimated device bandwidth
> drops to zero. I'm not even sure how this works for FUSE, but that's what I
> gathered from some printk debugging.

Yeah, writeback bw getting messed up is the most likely cause. Prolly
some silly bug. I can reproduce the problem. Looking into it.

Thanks.

--
tejun

2016-04-12 09:24:12

by Ashish Sangwan

Subject: Re: [fuse-devel] Horrible mmap write performance (kernel writeback issue?)

On Tue, Apr 12, 2016 at 5:54 AM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Mon, Apr 11, 2016 at 10:04:42AM +0200, Jakob Unterwurzacher wrote:
>> What seems to be happening in the kernel is that the estimated device bandwidth
>> drops to zero. I'm not even sure how this works for FUSE, but that's what I
>> gathered from some printk debugging.
>
> Yeah, writeback bw getting messed up is the most likely cause. Prolly
> some silly bug. I can reproduce the problem. Looking into it.

Probably you want to look into:
https://lkml.org/lkml/2016/3/10/21

The patch mentioned above solves the issue for me.

Thanks,
Ashish
>
> Thanks.
>
> --
> tejun

2016-04-12 11:09:09

by Tejun Heo

Subject: Re: [fuse-devel] Horrible mmap write performance (kernel writeback issue?)

Hello,

On Tue, Apr 12, 2016 at 02:54:08PM +0530, Ashish Sangwan wrote:
> On Tue, Apr 12, 2016 at 5:54 AM, Tejun Heo <[email protected]> wrote:
> > Hello,
> >
> > On Mon, Apr 11, 2016 at 10:04:42AM +0200, Jakob Unterwurzacher wrote:
> >> What seems to be happening in the kernel is that the estimated device bandwidth
> >> drops to zero. I'm not even sure how this works for FUSE, but that's what I
> >> gathered from some printk debugging.
> >
> > Yeah, writeback bw getting messed up is the most likely cause. Prolly
> > some silly bug. I can reproduce the problem. Looking into it.
>
> Probably you want to look into:
> https://lkml.org/lkml/2016/3/10/21
>
> The patch mentioned above solves the issue for me.

Heh, I tracked it down to wb_over_bg_thresh() and fell asleep. Yeah,
that is the right fix.

Thanks.

--
tejun
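
A toy model of how a wrong threshold in wb_over_bg_thresh() could produce a permanent stall. It rests on assumptions that go beyond what is said in this thread: that balance_dirty_pages() throttles a writer once dirty pages exceed the midpoint between the background threshold and the hard threshold and then wakes the writeback worker, and that the regression made the worker's check compare against the hard threshold instead of the background one. The page counts (200 and 100) and all names below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DirtyLimits:
    thresh: int     # hard dirty limit for the device, in pages
    bg_thresh: int  # background writeback should start above this

    @property
    def freerun(self) -> int:
        # Below the midpoint of the two limits, writers are not throttled.
        return (self.thresh + self.bg_thresh) // 2


def worker_writes_back(dirty: int, lim: DirtyLimits, buggy: bool) -> bool:
    """Does the background worker decide there is writeback to do?"""
    limit = lim.thresh if buggy else lim.bg_thresh
    return dirty > limit


def stalled(dirty: int, lim: DirtyLimits, buggy: bool) -> bool:
    """The writer is throttled, but the worker it wakes up does nothing."""
    return dirty > lim.freerun and not worker_writes_back(dirty, lim, buggy)


if __name__ == "__main__":
    lim = DirtyLimits(thresh=200, bg_thresh=100)
    for buggy in (True, False):
        stuck = [d for d in range(250) if stalled(d, lim, buggy)]
        print("buggy" if buggy else "fixed", "stalls at dirty counts:", stuck[:1], "to", stuck[-1:])
```

With the wrong comparison there is a band of dirty-page counts where the writer sleeps waiting for writeback that never starts; comparing against the background threshold closes that band.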

2016-04-13 07:21:04

by Sedat Dilek

Subject: Re: [fuse-devel] Horrible mmap write performance (kernel writeback issue?)

On Tue, Apr 12, 2016 at 1:09 PM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Tue, Apr 12, 2016 at 02:54:08PM +0530, Ashish Sangwan wrote:
>> On Tue, Apr 12, 2016 at 5:54 AM, Tejun Heo <[email protected]> wrote:
>> > Hello,
>> >
>> > On Mon, Apr 11, 2016 at 10:04:42AM +0200, Jakob Unterwurzacher wrote:
>> >> What seems to be happening in the kernel is that the estimated device bandwidth
>> >> drops to zero. I'm not even sure how this works for FUSE, but that's what I
>> >> gathered from some printk debugging.
>> >
>> > Yeah, writeback bw getting messed up is the most likely cause. Prolly
>> > some silly bug. I can reproduce the problem. Looking into it.
>>
>> Probably you want to look into:
>> https://lkml.org/lkml/2016/3/10/21
>>
>> The patch mentioned above solves the issue for me.
>
> Heh, I tracked it down to wb_over_bg_thresh() and fell asleep. Yeah,
> that is the right fix.
>

Feel free to add my...

Tested-by: Sedat Dilek <[email protected]>

Patch available from [1].

- sed@ -

[1] https://patchwork.kernel.org/patch/8554181/

2016-04-18 21:06:30

by Jakob Unterwurzacher

Subject: Re: [fuse-devel] Horrible mmap write performance (kernel writeback issue?)

On 12.04.2016 13:09, Tejun Heo wrote:
>>
>> Probably you want to look into:
>> https://lkml.org/lkml/2016/3/10/21
>>
>> The patch mentioned above solves the issue for me.
>
> Heh, I tracked it down to wb_over_bg_thresh() and fell asleep. Yeah,
> that is the right fix.

Works wonderfully now, thanks to everybody involved. Is it too late for 4.6?

Best regards,
Jakob

2016-04-20 01:35:41

by Howard Cochran

Subject: Re: [fuse-devel] Horrible mmap write performance (kernel writeback issue?)

On Mon, Apr 18, 2016 at 5:06 PM, Jakob Unterwurzacher
<[email protected]> wrote:
> On 12.04.2016 13:09, Tejun Heo wrote:
>>>
>>> Probably you want to look into:
>>> https://lkml.org/lkml/2016/3/10/21
>>>
>>> The patch mentioned above solves the issue for me.
>>
>> Heh, I tracked it down to wb_over_bg_thresh() and fell asleep. Yeah,
>> that is the right fix.
>
> Works wonderfully now, thanks to everybody involved. Is it too late for 4.6?
>
> Best regards,
> Jakob
>
Jakob, et al.

You're welcome. That performance problem stumped me for a couple of
weeks until I tracked it down and submitted the fix.

Howard

2016-04-25 08:07:53

by Sedat Dilek

Subject: Re: [fuse-devel] Horrible mmap write performance (kernel writeback issue?)

On Wed, Apr 20, 2016 at 3:35 AM, Howard Cochran
<[email protected]> wrote:
> On Mon, Apr 18, 2016 at 5:06 PM, Jakob Unterwurzacher
> <[email protected]> wrote:
>> On 12.04.2016 13:09, Tejun Heo wrote:
>>>>
>>>> Probably you want to look into:
>>>> https://lkml.org/lkml/2016/3/10/21
>>>>
>>>> The patch mentioned above solves the issue for me.
>>>
>>> Heh, I tracked it down to wb_over_bg_thresh() and fell asleep. Yeah,
>>> that is the right fix.
>>
>> Works wonderfully now, thanks to everybody involved. Is it too late for 4.6?
>>
>> Best regards,
>> Jakob
>>
> Jakob, et al.
>
> You're welcome. That performance problem stumped me for a couple of
> weeks until I tracked it down and submitted the fix.
>

What has happened to "writeback: Fix performance regression in
wb_over_bg_thresh()"?
I checked Linux v4.6-rc5 and it is not included.
Is it in another Git tree? If yes, where?

- sed@ -