2010-01-25 11:19:13

by bugzilla-daemon

Subject: [Bug 14830] When other IO is running sync times go to 10 to 20 minutes

http://bugzilla.kernel.org/show_bug.cgi?id=14830





--- Comment #11 from Michael Godfrey <[email protected]> 2010-01-25 11:19:12 ---
I have attached the iostat.log and log/messages output.

sync was started after about 4 cycles of iostat.

This was run on the FC12 ext4 system. No testing can
be done on the production system.

After kill -9 of the sync run it took about 20 minutes before
it died.

Michael



2010-01-27 13:38:07

by Andre Noll

Subject: Re: [Bug 14830] When other IO is running sync times go to 10 to 20 minutes

On 11:19, [email protected] wrote:
> After kill -9 of the sync run it took about 20 minutes before
> it died.

I was seeing similar behaviour on one of our servers, and changing
the io scheduler to noop fixed things for me. So it seems to be an
issue with cfq which is somehow triggered by ext4 but not by ext3.

To change the IO scheduler, just execute

echo noop > /sys/block/sda/queue/scheduler

(replace sda if necessary).
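
To confirm which scheduler is in effect before and after the switch, something like the following sketch may help. It relies on the sysfs scheduler file listing the available schedulers with the active one in square brackets (for example "noop deadline [cfq]"). The function name active_scheduler and its optional file argument are assumptions made for illustration, not part of any existing tool.

```shell
# Print the active I/O scheduler recorded in a sysfs scheduler file.
# The file looks like "noop deadline [cfq]"; the bracketed entry is active.
# $1: path to the scheduler file (assumed default: sda, as in the command above).
active_scheduler() {
    file="${1:-/sys/block/sda/queue/scheduler}"
    # Keep only the text between the square brackets.
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$file"
}
```

Calling active_scheduler with no argument reads the file for sda; pass another path to check a different device.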

Just my 2 cents
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe



2010-01-27 19:44:05

by Andreas Dilger

Subject: Re: [Bug 14830] When other IO is running sync times go to 10 to 20 minutes

On 2010-01-27, at 06:06, Andre Noll wrote:
> On 11:19, [email protected] wrote:
>> After kill -9 of the sync run it took about 20 minutes before
>> it died.
>
> I was seeing similar behaviour on one of our servers, and changing
> the io scheduler to noop fixed things for me. So it seems to be an
> issue with cfq which is somehow triggered by ext4 but not by ext3.
>
> To change the IO scheduler, just execute
>
> echo noop > /sys/block/sda/queue/scheduler

Andre, could you please also test deadline instead of noop? In
general, deadline has nearly the same IO behaviour as noop, but still
allows simple request merging and is generally a better option than
noop.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2010-01-28 08:35:23

by Theodore Ts'o

Subject: Re: [Bug 14830] When other IO is running sync times go to 10 to 20 minutes

On Wed, Jan 27, 2010 at 02:06:25PM +0100, Andre Noll wrote:
> On 11:19, [email protected] wrote:
> > After kill -9 of the sync run it took about 20 minutes before
> > it died.
>
> I was seeing similar behaviour on one of our servers, and changing
> the io scheduler to noop fixed things for me. So it seems to be an
> issue with cfq which is somehow triggered by ext4 but not by ext3.
>
> To change the IO scheduler, just execute
>
> echo noop > /sys/block/sda/queue/scheduler
>
> (replace sda if necessary).

Andre or Michael. If switching away from cfq helps, that's
definitely... interesting. Given that cfq is the default scheduler, I
definitely want to understand what might be going on here. Is either
of you able to run blktrace so we can get a sense of what is going on
under the cfq and deadline/noop I/O schedulers?

And in both of your cases, were you using a new file system freshly
created using mke2fs -t ext4, or was this an ext2/ext3 filesystem that
was converted for use under ext4?

Thanks,

- Ted

2010-01-28 10:24:27

by Andre Noll

Subject: Re: [Bug 14830] When other IO is running sync times go to 10 to 20 minutes

On 12:43, Andreas Dilger wrote:
> On 2010-01-27, at 06:06, Andre Noll wrote:
> >On 11:19, [email protected] wrote:
> >>After kill -9 of the sync run it took about 20 minutes before
> >>it died.
> >
> >I was seeing similar behaviour on one of our servers, and changing
> >the io scheduler to noop fixed things for me. So it seems to be an
> >issue with cfq which is somehow triggered by ext4 but not by ext3.
> >
> >To change the IO scheduler, just execute
> >
> > echo noop > /sys/block/sda/queue/scheduler
>
> Andre, could you please also test deadline instead of noop?

Sure. I just switched to deadline and the system still feels responsive
while rsync is running. With cfq a simple "ls" command took ages
to complete. I'll let you know if the system becomes sluggish again
after a while.

Thanks
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe



2010-01-28 10:25:39

by Andre Noll

Subject: Re: [Bug 14830] When other IO is running sync times go to 10 to 20 minutes

On 02:53, [email protected] wrote:
> On Wed, Jan 27, 2010 at 02:06:25PM +0100, Andre Noll wrote:
> > On 11:19, [email protected] wrote:
> > > After kill -9 of the sync run it took about 20 minutes before
> > > it died.
> >
> > I was seeing similar behaviour on one of our servers, and changing
> > the io scheduler to noop fixed things for me. So it seems to be an
> > issue with cfq which is somehow triggered by ext4 but not by ext3.
> >
> > To change the IO scheduler, just execute
> >
> > echo noop > /sys/block/sda/queue/scheduler
> >
> > (replace sda if necessary).
>
> Andre or Michael. If switching away from cfq helps, that's
> definitely... interesting. Given that cfq is the default scheduler, I
> definitely want to understand what might be going on here. Is either
> of you able to run blktrace so we can get a sense of what is going on
> under the cfq and deadline/noop I/O schedulers?

Yes, I can use that machine freely for testing purposes, including
reboots. It is just our fallback server which creates hardlink-based
snapshots using rsync.

However, I have to recompile the kernel to include debugfs which is
needed by blktrace and I'd like to wait until the currently running
rsync completes before rebooting. Would you like to see the output of

btrace /dev/mapper/...

or should I use more sophisticated command line options?
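
For reference, btrace is a thin wrapper around a blktrace/blkparse pipeline, so a time-limited trace could be sketched as below. The function only prints the commands instead of running them, since tracing needs root and a mounted debugfs. The name trace_cmd, the 30 second default and the example device are assumptions for illustration; the debugfs mount point shown is the conventional one and may differ on a given system.

```shell
# Build (but do not run) a blktrace pipeline for one device.
# $1: block device to trace (hypothetical example: /dev/sda)
# $2: seconds to trace (assumed default: 30)
trace_cmd() {
    dev="$1"
    secs="${2:-30}"
    # blktrace needs debugfs mounted, conventionally at /sys/kernel/debug.
    printf 'mount -t debugfs debugfs /sys/kernel/debug\n'
    # -w stops the trace after the given number of seconds;
    # "-o -" streams events to blkparse, which reads them from stdin via "-i -".
    printf 'blktrace -d %s -w %s -o - | blkparse -i -\n' "$dev" "$secs"
}
```

Running trace_cmd once per scheduler (cfq, then deadline or noop) and saving the output of the printed pipeline would give two traces to compare.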

> And in both of your cases, were you using a new file system freshly
> created using mke2fs -t ext4, or was this an ext2/ext3 filesystem that
> was converted for use under ext4?

The ext4 file system was created from scratch using -O
dir_index,uninit_bg,extent, a block size of 4096 and 32768 bytes
per inode.
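
Spelled out as a command line, those options would look roughly like the sketch below. It only prints the invocation; it does not create a file system. The "-t ext4" part and the device argument are assumptions, since the exact command and device are not given here, and the function name mkfs_cmd is made up for illustration. The second line of output shows the ratio the two sizes imply: one inode per eight blocks.

```shell
# Print an mke2fs invocation matching the options described above.
# $1: target device (placeholder; the real device is not named in the mail)
mkfs_cmd() {
    dev="$1"
    block_size=4096        # bytes per block, as stated above
    bytes_per_inode=32768  # bytes per inode, as stated above
    # "-t ext4" is an assumption; only the -O feature list is quoted in the mail.
    printf 'mke2fs -t ext4 -O dir_index,uninit_bg,extent -b %s -i %s %s\n' \
        "$block_size" "$bytes_per_inode" "$dev"
    # One inode is allocated per this many blocks.
    printf 'blocks per inode: %s\n' "$((bytes_per_inode / block_size))"
}
```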

Thanks
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe

