On Tue, Mar 15, 2016 at 8:55 AM, Jakob Unterwurzacher
<[email protected]> wrote:
> Just for anybody finding this thread: This still happens in v4.4, it
> just took longer to trigger.
>
> I have posted more details to linux-kernel (copy-pasted below),
> http://thread.gmane.org/gmane.linux.kernel/2132944
Okay, so you can reproduce this relatively quickly. Can you try "git
bisect" to find exactly which commit is responsible?
Thanks,
Miklos
>
> -------- copy of the email to linux-kernel -------------------
>
> 2016-01-22 21:10:59
>
> I have noticed an annoying regression that was introduced in 4.2 and is
> still there in 4.4. mmap writes to FUSE filesystems are throttled down
> to basically zero.
>
> Reproducer: https://github.com/rfjakob/mmapwrite , testing against encfs:
>
> $ mmapwrite /tmp/encfs-mnt/foo
> 1 .................................................. 107.01 MB/s
> 2 .................................................. 101.98 MB/s
> [...]
> 68 .................................................. 106.79 MB/s
> 69 .................................................. 105.09 MB/s
> 70 .................................................. 2.02 MB/s
> 71 .................................................. 1.77 MB/s
> 72 .................................................. 0.42 MB/s
> 73 .................................... (hangs)
>
> I have tested kernels from 4.0 and this seems to have been introduced in
> 4.2:
>
> 4.0 ....... 140MB/s permanent
> 4.1 ....... 140MB/s permanent
> 4.2 ....... 100MB/s at the start, sudden slowdown to 1MB/s after ~5GB
> 4.3 ....... 100MB/s at the start, sudden slowdown to 1MB/s after ~1.5GB
> 4.4-rc4 ... 100MB/s at the start, slowly ramps down, 0.3MB/s after ~2GB
> 4.4 ....... 100MB/s at the start, sudden slowdown after ~3GB
>
> Is there a way to disable the throttling? Or at least exempt FUSE until
> there is a proper fix?
>
> Thanks,
> Jakob
>
>
>
>
> On 17.12.2015 00:26, Jakob Unterwurzacher wrote:
>> This seems to be fixed in v4.4-rc5-18-gedb42dc. mmap writes now proceed
>> at solid 100MB/s with full CPU saturation.
>>
>> Thanks,
>> Jakob
>>
>> On Mon, Dec 14, 2015 at 9:06 AM, Jakob Unterwurzacher
>> <[email protected]> wrote:
>>
>> I am the developer of https://github.com/rfjakob/gocryptfs (an
>> encrypted overlay filesystem like EncFS)
>> and have ported xfstests over for regression testing
>> ( https://github.com/rfjakob/fuse-xfstests ).
>>
>> xfstests generic/074 is how I noticed that mmap write performance
>> plummeted when Fedora upgraded my kernel to 4.2.5.
>> It used to complete in 10 minutes and now it will probably take days.
>> I am on kernel 4.4-rc1 now and still seeing the same issue.
>>
>> It looks like at some point the kernel writeback mechanism gets
>> stuck (or throttles?).
>>
>> Testing against encfs:
>>
>> ./mmapwrite /tmp/e2/foo #
>> https://github.com/rfjakob/mmapwrite .
>> ....................................................................................................
>> 98.91 MB/s
>> ....................................................................................................
>> 93.74 MB/s
>> ....................................................................................................
>> 103.89 MB/s
>> ....................................................................................................
>> 100.20 MB/s
>> ....................................................................................................
>> 104.03 MB/s
>> ....................................................................................................
>> 98.06 MB/s
>> ....................................................................................................
>> 10.17 MB/s
>> ....................................................................................................
>> 9.50 MB/s
>> .................................. (hangs)
>>
>> At this point no write requests are submitted to encfs and the
>> mmapwrite process is
>> stuck in the kernel in balance_dirty_pages.isra.22.
>>
>> Bisecting this will be a pain, I would appreciate any suggestions.
>>
>> Note that this seems to affect every FUSE filesystem, including ntfs-3g.
>>
>> Best regards,
>> Jakob
>>
>>
>
>
> --
> fuse-devel mailing list
> To unsubscribe or subscribe, visit https://lists.sourceforge.net/lists/listinfo/fuse-devel
On 16.03.2016 10:44, Miklos Szeredi wrote:
> On Tue, Mar 15, 2016 at 8:55 AM, Jakob Unterwurzacher
> <[email protected]> wrote:
>> Just for anybody finding this thread: This still happens in v4.4, it
>> just took longer to trigger.
>>
>> I have posted more details to linux-kernel (copy-pasted below),
>> http://thread.gmane.org/gmane.linux.kernel/2132944
>
> Okay, so you can reproduce this relatively quickly. Can you try "git
> bisect" to find exactly which commit is responsible?
>
> Thanks,
> Miklos
That took a while, but it looks like I got it:
> commit 947e9762a8ddefda38aa21e249e6a4fec215cd12
> Author: Tejun Heo <[email protected]>
> Date: Fri May 22 18:23:32 2015 -0400
>
> writeback: update wb_over_bg_thresh() to use wb_domain aware operations
Note that this commit seems to only activate changes that were made in the
previous commit, aa661bb:
> commit aa661bbe1e61ce80ca4ae98804f673ede94b0827
> Author: Tejun Heo <[email protected]>
> Date: Fri May 22 18:23:31 2015 -0400
>
> writeback: move over_bground_thresh() to mm/page-writeback.c
Anyway, I can reliably reboot between aa661bb and 947e976 and always get the
same results:
aa661bb passes
947e976 fails
If you want to reproduce, clone https://github.com/rfjakob/mmapwrite.git and
run ./encfs-test.sh (needs encfs installed). On the bad kernel, it will hang
within a few seconds.
Thanks,
Jakob
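
For readers without the reproducer at hand, the kind of test that mmapwrite
performs can be approximated as below. This is only a minimal Python sketch,
not the actual mmapwrite source: the function name mmap_write_pass and the
1 MiB chunk size are assumptions chosen for illustration. It maps a file,
dirties every page through the shared mapping, flushes, and returns the
throughput of that pass, which is the figure that collapses on an affected
kernel.

```python
import mmap
import os
import time

CHUNK = 1024 * 1024  # bytes written per step through the mapping (assumed value)


def mmap_write_pass(path, size):
    """Dirty `size` bytes of `path` through a shared mapping; return MB/s."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)  # the file must be at least as large as the mapping
        with mmap.mmap(fd, size) as m:  # shared, read/write mapping by default
            block = b"x" * CHUNK
            start = time.monotonic()
            for off in range(0, size, CHUNK):
                n = min(CHUNK, size - off)
                m[off:off + n] = block[:n]  # plain memory stores, no write() calls
            m.flush()  # ask the kernel to write the dirtied pages back
            elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    return size / (1024 * 1024) / max(elapsed, 1e-9)
```

Calling mmap_write_pass("/tmp/encfs-mnt/foo", 64 * CHUNK) in a loop and
printing the result after each pass gives output comparable in spirit to the
per-pass MB/s lines quoted earlier in this thread. The path is the encfs
mount point used in those examples; the pass size is arbitrary.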
On Sat, Mar 26, 2016 at 10:42 PM, Jakob Unterwurzacher
<[email protected]> wrote:
> On 16.03.2016 10:44, Miklos Szeredi wrote:
>> On Tue, Mar 15, 2016 at 8:55 AM, Jakob Unterwurzacher
>> <[email protected]> wrote:
>>> Just for anybody finding this thread: This still happens in v4.4, it
>>> just took longer to trigger.
>>>
>>> I have posted more details to linux-kernel (copy-pasted below),
>>> http://thread.gmane.org/gmane.linux.kernel/2132944
>>
>> Okay, so you can reproduce this relatively quickly. Can you try "git
>> bisect" to find exactly which commit is responsible?
>>
>> Thanks,
>> Miklos
>
> That took a while, but it looks like I got it:
>
>> commit 947e9762a8ddefda38aa21e249e6a4fec215cd12
>> Author: Tejun Heo <[email protected]>
>> Date: Fri May 22 18:23:32 2015 -0400
>>
>> writeback: update wb_over_bg_thresh() to use wb_domain aware operations
>
Tejun,
Any idea why this commit stalls fuse mmap writes?
Here's the start of this thread:
http://marc.info/?l=fuse-devel&m=145008058603261&w=2
Thanks,
Miklos
> Note that this commit seems to only activate changes that were made in the
> previous commit, aa661bb:
>
>> commit aa661bbe1e61ce80ca4ae98804f673ede94b0827
>> Author: Tejun Heo <[email protected]>
>> Date: Fri May 22 18:23:31 2015 -0400
>>
>> writeback: move over_bground_thresh() to mm/page-writeback.c
>
>
> Anyway, I can reliably reboot between aa661bb and 947e976 and always get the
> same results:
>
> aa661bb passes
> 947e976 fails
>
> If you want to reproduce, clone https://github.com/rfjakob/mmapwrite.git and
> run ./encfs-test.sh (needs encfs installed). On the bad kernel, it will hang
> within a few seconds.
>
> Thanks,
> Jakob
Hello,
On Mon, Mar 28, 2016 at 09:45:53PM +0200, Miklos Szeredi wrote:
> >> commit 947e9762a8ddefda38aa21e249e6a4fec215cd12
> >> Author: Tejun Heo <[email protected]>
> >> Date: Fri May 22 18:23:32 2015 -0400
> >>
> >> writeback: update wb_over_bg_thresh() to use wb_domain aware operations
> >
>
> Tejun,
>
> Any idea why this commit stalls fuse mmap writes?
>
> Here's the start of this thread:
>
> http://marc.info/?l=fuse-devel&m=145008058603261&w=2
Hmmm... cgroup writeback support shouldn't affect fuse at all as the
backing device doesn't enable cgroup support. I probably made some
silly mistake. Is there a simple reproducer I can play with?
Thanks.
--
tejun
On Wed, Mar 30, 2016 at 8:47 PM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Mon, Mar 28, 2016 at 09:45:53PM +0200, Miklos Szeredi wrote:
>> >> commit 947e9762a8ddefda38aa21e249e6a4fec215cd12
>> >> Author: Tejun Heo <[email protected]>
>> >> Date: Fri May 22 18:23:32 2015 -0400
>> >>
>> >> writeback: update wb_over_bg_thresh() to use wb_domain aware operations
>> >
>>
>> Tejun,
>>
>> Any idea why this commit stalls fuse mmap writes?
>>
>> Here's the start of this thread:
>>
>> http://marc.info/?l=fuse-devel&m=145008058603261&w=2
>
> Hmmm... cgroup writeback support shouldn't affect fuse at all as the
> backing device doesn't enable cgroup support. I probably made some
> silly mistake. Is there a simple reproducer I can play with?
>
> Thanks.
>
To quote Jakob from a previous email:
"If you want to reproduce, clone https://github.com/rfjakob/mmapwrite.git and
run ./encfs-test.sh (needs encfs installed). On the bad kernel, it will hang
within a few seconds."
Hope this helps.
- Sedat -
On 30.03.2016 20:47, Tejun Heo wrote:
> Hmmm... cgroup writeback support shouldn't affect fuse at all as the
> backing device doesn't enable cgroup support. I probably made some
> silly mistake. Is there a simple reproducer I can play with?
Hi Tejun! A simple reproducer is at https://github.com/rfjakob/mmapwrite .
What seems to be happening in the kernel is that the estimated device bandwidth
drops to zero. I'm not even sure how this works for FUSE, but that's what I
gathered from some printk debugging.
What I also found is that once mmapwrite is hung, you can unblock it for some
time by running something like
cat /dev/zero > /var/tmp/foo
mmapwrite will then steam ahead as long as cat is writing, even though encfs
writes to /tmp (tmpfs) and /var is on the ext4 disk.
Note that the hang happens regardless of the backing device, on both tmpfs and
ext4.
Best regards,
Jakob
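
One way to watch such a stall from user space, alongside printk debugging, is
to sample the dirty and writeback counters in /proc/meminfo while mmapwrite
runs. The following Python sketch is an illustration under assumptions: the
helper names and the one-second interval are made up here, and WritebackTmp
is included on the assumption that FUSE's temporary writeback pages are
accounted there. It shows only global counters, not the per-device bandwidth
estimate discussed in this thread.

```python
import time


def parse_meminfo(text, fields=("Dirty", "Writeback", "WritebackTmp")):
    """Return {field: kB} for the requested fields of /proc/meminfo text."""
    result = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in fields:
            result[name] = int(rest.split()[0])  # these values are reported in kB
    return result


def watch(interval=1.0, samples=10, path="/proc/meminfo"):
    """Print the selected counters every `interval` seconds (Linux only)."""
    for _ in range(samples):
        with open(path) as f:
            values = parse_meminfo(f.read())
        print(" ".join(f"{k}={v}kB" for k, v in sorted(values.items())), flush=True)
        time.sleep(interval)
```

Running watch() in a second terminal while the test hangs, and again while
the "cat /dev/zero" workaround is active, makes it possible to compare how
the counters move in the two situations.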
Hello,
On Mon, Apr 11, 2016 at 10:04:42AM +0200, Jakob Unterwurzacher wrote:
> What seems to be happening in the kernel is that the estimated device bandwidth
> drops to zero. I'm not even sure how this works for FUSE, but that's what I
> gathered from some printk debugging.
Yeah, writeback bw getting messed up is the most likely cause. Prolly
some silly bug. I can reproduce the problem. Looking into it.
Thanks.
--
tejun
On Tue, Apr 12, 2016 at 5:54 AM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Mon, Apr 11, 2016 at 10:04:42AM +0200, Jakob Unterwurzacher wrote:
>> What seems to be happening in the kernel is that the estimated device bandwidth
>> drops to zero. I'm not even sure how this works for FUSE, but that's what I
>> gathered from some printk debugging.
>
> Yeah, writeback bw getting messed up is the most likely cause. Prolly
> some silly bug. I can reproduce the problem. Looking into it.
Probably you want to look into:
https://lkml.org/lkml/2016/3/10/21
The patch mentioned above solves the issue for me.
Thanks,
Ashish
>
> Thanks.
>
> --
> tejun
Hello,
On Tue, Apr 12, 2016 at 02:54:08PM +0530, Ashish Sangwan wrote:
> On Tue, Apr 12, 2016 at 5:54 AM, Tejun Heo <[email protected]> wrote:
> > Hello,
> >
> > On Mon, Apr 11, 2016 at 10:04:42AM +0200, Jakob Unterwurzacher wrote:
> >> What seems to be happening in the kernel is that the estimated device bandwidth
> >> drops to zero. I'm not even sure how this works for FUSE, but that's what I
> >> gathered from some printk debugging.
> >
> > Yeah, writeback bw getting messed up is the most likely cause. Prolly
> > some silly bug. I can reproduce the problem. Looking into it.
>
> Probably you want to look into:
> https://lkml.org/lkml/2016/3/10/21
>
> The patch mentioned above solves the issue for me.
Heh, I tracked it down to wb_over_bg_thresh() and fell asleep. Yeah,
that is the right fix.
Thanks.
--
tejun
On Tue, Apr 12, 2016 at 1:09 PM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Tue, Apr 12, 2016 at 02:54:08PM +0530, Ashish Sangwan wrote:
>> On Tue, Apr 12, 2016 at 5:54 AM, Tejun Heo <[email protected]> wrote:
>> > Hello,
>> >
>> > On Mon, Apr 11, 2016 at 10:04:42AM +0200, Jakob Unterwurzacher wrote:
>> >> What seems to be happening in the kernel is that the estimated device bandwidth
>> >> drops to zero. I'm not even sure how this works for FUSE, but that's what I
>> >> gathered from some printk debugging.
>> >
>> > Yeah, writeback bw getting messed up is the most likely cause. Prolly
>> > some silly bug. I can reproduce the problem. Looking into it.
>>
>> Probably you want to look into:
>> https://lkml.org/lkml/2016/3/10/21
>>
>> The patch mentioned above solves the issue for me.
>
> Heh, I tracked it down to wb_over_bg_thresh() and fell asleep. Yeah,
> that is the right fix.
>
Feel free to add my...
Tested-by: Sedat Dilek <[email protected]>
Patch available from [1].
- sed@ -
[1] https://patchwork.kernel.org/patch/8554181/
On 12.04.2016 13:09, Tejun Heo wrote:
>>
>> Probably you want to look into:
>> https://lkml.org/lkml/2016/3/10/21
>>
>> The patch mentioned above solves the issue for me.
>
> Heh, I tracked it down to wb_over_bg_thresh() and fell asleep. Yeah,
> that is the right fix.
Works wonderfully now, thanks to everybody involved. Is it too late for 4.6?
Best regards,
Jakob
On Mon, Apr 18, 2016 at 5:06 PM, Jakob Unterwurzacher
<[email protected]> wrote:
> On 12.04.2016 13:09, Tejun Heo wrote:
>>>
>>> Probably you want to look into:
>>> https://lkml.org/lkml/2016/3/10/21
>>>
>>> The patch mentioned above solves the issue for me.
>>
>> Heh, I tracked it down to wb_over_bg_thresh() and fell asleep. Yeah,
>> that is the right fix.
>
> Works wonderfully now, thanks to everybody involved. Is it too late for 4.6?
>
> Best regards,
> Jakob
>
Jakob, et al.
You're welcome. That performance problem stumped me for a couple of
weeks until I tracked it down and submitted the fix.
Howard
On Wed, Apr 20, 2016 at 3:35 AM, Howard Cochran
<[email protected]> wrote:
> On Mon, Apr 18, 2016 at 5:06 PM, Jakob Unterwurzacher
> <[email protected]> wrote:
>> On 12.04.2016 13:09, Tejun Heo wrote:
>>>>
>>>> Probably you want to look into:
>>>> https://lkml.org/lkml/2016/3/10/21
>>>>
>>>> The patch mentioned above solves the issue for me.
>>>
>>> Heh, I tracked it down to wb_over_bg_thresh() and fell asleep. Yeah,
>>> that is the right fix.
>>
>> Works wonderfully now, thanks to everybody involved. Is it too late for 4.6?
>>
>> Best regards,
>> Jakob
>>
> Jakob, et al.
>
> You're welcome. That performance problem stumped me for a couple of
> weeks until I tracked it down and submitted the fix.
>
What has happened to "writeback: Fix performance regression in
wb_over_bg_thresh()"?
I checked Linux v4.6-rc5, it is not included.
Is it in another Git tree? If yes, where?
- sed@ -