2015-09-23 13:49:31

by Dexuan Cui

Subject: ext4: performance regression introduced by the cgroup writeback support

Hi all,
Since some point between July and September, I have been suffering from a strange "very slow write" issue. On Sep 9 I reported it to LKML (but got no reply): https://lkml.org/lkml/2015/9/9/290

The issue is: under high CPU and disk I/O pressure, *some* processes can suffer from a very slow write speed (e.g., <1MB/s or even only 20KB/s), while the normal write speed should be at least dozens of MB/s.
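
To put a number on the write speed described above, a small probe along the following lines could be run from an affected process. It is a minimal sketch, not the method used in this report: the function name `measure_write_mb_per_s`, the default sizes, and the choice to include `fsync` in the timing are all assumptions made for illustration.

```python
import os
import tempfile
import time


def measure_write_mb_per_s(directory, total_mb=64, chunk_kb=1024):
    """Write about total_mb of zeros to a temp file in `directory` and return MB/s.

    Hypothetical helper for illustration; sizes are arbitrary defaults.
    """
    chunk = b"\0" * (chunk_kb * 1024)
    chunks = (total_mb * 1024) // chunk_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.monotonic()
        with os.fdopen(fd, "wb") as f:
            for _ in range(chunks):
                f.write(chunk)
            f.flush()
            # fsync so the timing covers writeback, not only page-cache copies.
            os.fsync(f.fileno())
        elapsed = time.monotonic() - start
    finally:
        os.unlink(path)
    return total_mb / elapsed if elapsed > 0 else float("inf")
```

Calling `measure_write_mb_per_s("/path/on/ext4")` on the filesystem under test returns a rate that can be compared against the "dozens of MB/s" expected in the normal case.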

I think I identified the commit which introduced the regression:
ext4: implement cgroup writeback support (https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=001e4a8775f6e8ad52a89e0072f09aee47d5d252)

This commit is already in the mainline tree, and I can reproduce the issue there too:
with the latest mainline the issue appears; after I revert the patch, it no longer does.

When the issue happens:
1. The read speed is still normal, e.g., >100MB/s.
2. 'top' shows both 'user' and 'sys' utilization at about 0%, while IO-wait stays at about 100%.
3. 'iotop' shows a read speed of 0 (this is correct because there are indeed no read requests) and a very slow write speed (the average is <1MB/s or even 20KB/s).
4. Sometimes every new process suffers from the slow writes, but at other times it looks like only some of the new processes do.
5. The " WARNING: CPU: 7 PID: 6782 at fs/inode.c:390 ihold+0x30/0x40() " in my Sep-9 mail may be a separate issue.
6. To reproduce the issue, I need to run my workload for long enough (see below).
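
The IO-wait figure in item 2 can also be derived without 'top' by sampling the aggregate "cpu" line of /proc/stat twice. The following is a rough sketch under that assumption; the helper names `parse_cpu_line` and `iowait_percent` are made up for illustration and are not part of any tool mentioned in this thread.

```python
def parse_cpu_line(line):
    """Return the numeric fields of the aggregate 'cpu' line from /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted in user and nice, so only the
    # first eight fields are summed for the total.
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total else 0.0
```

In practice the two samples would be the first line of /proc/stat read a few seconds apart; a result close to 100 matches the symptom described in item 2.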

My workload is simple: I just repeatedly build the kernel source ("make clean; make -j16"). My kernel config is attached FYI.

I can reproduce the issue on a physical machine: e.g., in my kernel building test with my .config, each build took only ~5 minutes for the first 176 runs, but from the 177th run onward a build could take anywhere from 5 minutes to 10 hours, which is very unstable.

It looks like it's easier to reproduce the issue in a Hyper-V VM: there I can usually reproduce it within the first 10 or 20 runs.
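
Since the slowdown only shows up after many builds, it helps to time every run automatically. The loop below is one possible sketch of that; `time_runs` and its `slow_factor` threshold are hypothetical names and values chosen for illustration, not the script actually used for these tests.

```python
import subprocess
import time


def time_runs(command, runs, slow_factor=3.0):
    """Run `command` repeatedly; return (durations, indices of slow runs).

    A run counts as slow when it takes more than slow_factor times the
    fastest run seen so far. The threshold is an arbitrary assumption.
    """
    durations, slow = [], []
    for i in range(runs):
        start = time.monotonic()
        subprocess.run(command, shell=True, check=True)
        elapsed = time.monotonic() - start
        if durations and elapsed > slow_factor * min(durations):
            slow.append(i)
        durations.append(elapsed)
    return durations, slow
```

Run from the kernel source tree, `time_runs("make clean; make -j16", runs=200)` would record each build time and list the runs where the build suddenly became much slower.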

Any idea?

Thanks,
-- Dexuan


Attachments:
kernel-config.txt.gz (45.10 kB)

2015-09-23 16:13:59

by Chris Mason

Subject: Re: ext4: performance regression introduced by the cgroup writeback support

On Wed, Sep 23, 2015 at 01:49:31PM +0000, Dexuan Cui wrote:
> Hi all,
> Since some point between July and Sep, I have been suffered from a strange "very slow write" issue and on Sep 9 I reported it to LKML (but got no reply): https://lkml.org/lkml/2015/9/9/290
>
> The issue is: under high CPU and disk I/O pressure, *some* processes can suffer from a very slow write speed (e.g., <1MB/s or even only 20KB/s), while the normal write speed should be at least dozens of MB/s.
>
> I think I identified the commit which introduced the regression:
> ext4: implement cgroup writeback support (https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=001e4a8775f6e8ad52a89e0072f09aee47d5d252)
>
> This commit is already in the mainline tree, so I can reproduce the issue there too:
> With the latest mainline, I can reproduce the issue; after I revert the patch, I can't reproduce the issue.
>
> When the issue happens:
> 1. the read speed is pretty normal, e.g.. it's still >100MB/s.
> 2. 'top' shows both the 'user' and 'sys' utilization is about 0%, but the IO-wait is always about 100%.
> 3. 'iotop' shows the read speed is 0 (this is correct because there is indeed no read request) and the write speed is pretty slow (the average is <1MB/s or even 20KB/s).
> 4. when the issue happens, sometimes any new process suffers from the slow write issue, but sometimes it looks not all the new processes suffers from the issue.
> 5. The " WARNING: CPU: 7 PID: 6782 at fs/inode.c:390 ihold+0x30/0x40() " in my Sep-9 mail may be another different issue.
> 6. To reproduce the issue, I need to run my workload for enough long time (see the below).
>
> My workload is simple: I just repeatedly build the kernel source ("make clean; make -j16"). My kernel config is attached FYI.
>
> I can reproduce the issue on a physical machine: e.g., in my kernel building test with my .config, it took only ~5 minutes in the first 176 runs, but since the 177th run, it could take from 10 hours to 5 minutes - very unstable.
>
> It looks it's easier to reproduce the issue in a Hyper-V VM: usually I can reproduce the issue within the first 10 or 20 runs.
>
> Any idea?

Are you using cgroups? That patch really shouldn't impact load unless
there are actual IO controls in place.

-chris

2015-09-23 18:53:37

by Tejun Heo

Subject: Re: ext4: performance regression introduced by the cgroup writeback support

On Wed, Sep 23, 2015 at 12:13:59PM -0400, Chris Mason wrote:
> > The issue is: under high CPU and disk I/O pressure, *some* processes can suffer from a very slow write speed (e.g., <1MB/s or even only 20KB/s), while the normal write speed should be at least dozens of MB/s.

So, I think I know what caused this regression. Separate wb domains
shouldn't have been enabled on traditional hierarchies. It doesn't
work there and leads to multiple wb domains competing on the same
blkcg and the bw estimation would go completely haywire. Will update
soon.
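
For readers who want to check whether a process sits on traditional (v1) hierarchies, the content of /proc/<pid>/cgroup can be classified roughly as below. This is only a sketch and is not the fix mentioned above: it assumes the "hierarchy-ID:controller-list:path" format documented in cgroups(7), where the unified hierarchy appears as an entry starting with "0::". Kernels from the time of this thread, where the unified hierarchy was still experimental, may report it differently. The function name `classify_cgroup_entries` is made up for illustration.

```python
def classify_cgroup_entries(text):
    """Split /proc/<pid>/cgroup content into v1 (traditional) and v2 (unified) entries.

    Returns (v1, v2): v1 is a list of (controllers, path) tuples and
    v2 is a list of paths. Assumes the format described in cgroups(7).
    """
    v1, v2 = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        if hierarchy_id == "0" and controllers == "":
            v2.append(path)
        else:
            v1.append((controllers.split(","), path))
    return v1, v2
```

Passing the text of /proc/self/cgroup to this function and looking for a `blkio` entry in the first list shows whether the block I/O controller is attached to a traditional hierarchy for the current process.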

Thanks.

--
tejun

2015-09-24 00:12:30

by Dexuan Cui

Subject: RE: ext4: performance regression introduced by the cgroup writeback support

> -----Original Message-----
> From: Chris Mason
> Sent: Thursday, September 24, 2015 0:14
> To: Dexuan Cui
> Cc: Theodore Ts'o; Andreas Dilger; Tejun Heo
> Subject: Re: ext4: performance regression introduced by the cgroup writeback
> support
>
> Are you using cgroups? That patch really shouldn't impact load unless
> there are actual IO controls in place.
>
> -chris

I'm not using cgroups here.

Tejun has just found the root cause ("Separate wb domains
shouldn't have been enabled on traditional hierarchies") and supplied a fix.

Thanks,
-- Dexuan

2015-09-24 00:15:37

by Dexuan Cui

Subject: RE: ext4: performance regression introduced by the cgroup writeback support

> -----Original Message-----
> From: Tejun Heo
> Sent: Thursday, September 24, 2015 2:54
> To: Chris Mason; Dexuan Cui; Theodore Ts'o; Andreas Dilger
> Subject: Re: ext4: performance regression introduced by the cgroup writeback
> support
>
> On Wed, Sep 23, 2015 at 12:13:59PM -0400, Chris Mason wrote:
> > > The issue is: under high CPU and disk I/O pressure, *some* processes can suffer from a very slow write speed (e.g., <1MB/s or even only 20KB/s), while the normal write speed should be at least dozens of MB/s.
>
> So, I think I know what caused this regression. Separate wb domains
> shouldn't have been enabled on traditional hierarchies. It doesn't
> work there and leads to multiple wb domains competing on the same
> blkcg and the bw estimation would go completely haywire. Will update
> soon.
>
> Thanks.
>
> --
> tejun

Thanks a lot for the quick fix, Tejun!

I'll test the fix.
I'll report back if it doesn't fix the issue, though I think that is unlikely. :-)

-- Dexuan

2015-09-24 07:26:20

by Dexuan Cui

Subject: RE: ext4: performance regression introduced by the cgroup writeback support

> From: Dexuan Cui
> Sent: Thursday, September 24, 2015 8:16
> To: Tejun Heo; Chris Mason; Theodore Ts'o; Andreas Dilger
> Subject: RE: ext4: performance regression introduced by the cgroup writeback
> support
>
> Thanks a lot for the quick fix, Tejun!
>
> I'll test the fix.
> I'll report back in case it can't fix the issue --I think this is unlikely. :-)
>
> -- Dexuan

Hi Tejun,
Thank you!
Based on my test, I believe your patch fixes my issue.

-- Dexuan