LinuxLists.cc - kswapd continuously active

2010-01-22 23:52:15

Subject: kswapd continuously active

Hi,

with 2.6.32.2 on sparc64 I am seeing that there is a sync(1) process
busy in D state, with the following trace:

sync D 000000000079299c 7552 4851 1 0x208061101000004
Call Trace:
[000000000053ca58] bdi_sched_wait+0xc/0x1c
[000000000079299c] __wait_on_bit+0x58/0xb8
[0000000000792a5c] out_of_line_wait_on_bit+0x60/0x74
[000000000053ca3c] bdi_sync_writeback+0x6c/0x7c
[000000000053ca78] sync_inodes_sb+0x10/0xfc
[0000000000540dd0] __sync_filesystem+0x50/0x88
[0000000000540ec8] sync_filesystems+0xc0/0x124
[0000000000540f80] sys_sync+0x1c/0x48
[0000000000406294] linux_sparc_syscall+0x34/0x44

kswapd is also active all the time, writing something to disk - LED is
blinking, and that's been going on for over half an hour despite the box
being not busy. How do I see what kswapd is still flushing to disk? Even
if all RAM (8 GB) was filled with dirty data, syncing it out would not
take that long - that is to say, the sync process should have long
exited.

2010-01-25 13:06:30

by Jens Axboe

[permalink] [raw]

Subject: Re: kswapd continuously active

On Sat, Jan 23 2010, Jan Engelhardt wrote:
> Hi,
>
>
> with 2.6.32.2 on sparc64 I am seeing that there is a sync(1) process
> busy in D state, with the following trace:
>
> sync D 000000000079299c 7552 4851 1 0x208061101000004
> Call Trace:
> [000000000053ca58] bdi_sched_wait+0xc/0x1c
> [000000000079299c] __wait_on_bit+0x58/0xb8
> [0000000000792a5c] out_of_line_wait_on_bit+0x60/0x74
> [000000000053ca3c] bdi_sync_writeback+0x6c/0x7c
> [000000000053ca78] sync_inodes_sb+0x10/0xfc
> [0000000000540dd0] __sync_filesystem+0x50/0x88
> [0000000000540ec8] sync_filesystems+0xc0/0x124
> [0000000000540f80] sys_sync+0x1c/0x48
> [0000000000406294] linux_sparc_syscall+0x34/0x44
>
> kswapd is also active all the time, writing something to disk - LED is
> blinking, and that's been going on for over half an hour despite the box
> being not busy. How do I see what kswapd is still flushing to disk? Even
> if all RAM (8 GB) was filled with dirty data, syncing it out would not
> take that long - that is to say, the sync process should have long
> exited.

That doesn't sound good. What does /proc/meminfo say? What file systems
are you using?

--
Jens Axboe

2010-01-25 13:22:41

by Jan Engelhardt

[permalink] [raw]

Subject: Re: kswapd continuously active

On Monday 2010-01-25 14:06, Jens Axboe wrote:
>>
>> with 2.6.32.2 on sparc64 I am seeing that there is a sync(1) process
>> busy in D state, with the following trace:
>>
>> sync D 000000000079299c 7552 4851 1 0x208061101000004
>> Call Trace:
>> [000000000053ca58] bdi_sched_wait+0xc/0x1c
>> [000000000079299c] __wait_on_bit+0x58/0xb8
>> [0000000000792a5c] out_of_line_wait_on_bit+0x60/0x74
>> [000000000053ca3c] bdi_sync_writeback+0x6c/0x7c
>> [000000000053ca78] sync_inodes_sb+0x10/0xfc
>> [0000000000540dd0] __sync_filesystem+0x50/0x88
>> [0000000000540ec8] sync_filesystems+0xc0/0x124
>> [0000000000540f80] sys_sync+0x1c/0x48
>> [0000000000406294] linux_sparc_syscall+0x34/0x44
>>
>> kswapd is also active all the time, writing something to disk - LED is
>> blinking, and that's been going on for over half an hour despite the box
>> being not busy. How do I see what kswapd is still flushing to disk? Even
>> if all RAM (8 GB) was filled with dirty data, syncing it out would not
>> take that long - that is to say, the sync process should have long
>> exited.
>
>That doesn't sound good. What does /proc/meminfo say? What file systems
>are you using?

Eventually that day, the sync finished; not sure what triggered that.
Usually, meminfo looks like when the box is doing something:

14:08 ares:~ > cat /proc/meminfo
MemTotal: 8166752 kB
MemFree: 3243552 kB
Buffers: 207968 kB
Cached: 2728216 kB
SwapCached: 0 kB
Active: 2203136 kB
Inactive: 2152544 kB
Active(anon): 1167256 kB
Inactive(anon): 252952 kB
Active(file): 1035880 kB
Inactive(file): 1899592 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 141624 kB
Writeback: 0 kB
AnonPages: 1421448 kB
Mapped: 49904 kB
Shmem: 680 kB
Slab: 429784 kB
SReclaimable: 315760 kB
SUnreclaim: 114024 kB
KernelStack: 9248 kB
PageTables: 6120 kB
Quicklists: 10560 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 4083376 kB
Committed_AS: 1626280 kB
VmallocTotal: 1069547520 kB
VmallocUsed: 28816 kB
VmallocChunk: 1069518664 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 4096 kB
14:16 ares:~ > w
14:17:13 up 6:26, 2 users, load average: 21.09, 24.41, 22.49

This is used as a compile box, and there are lots of files created
and deleted when there is work - a new chroot for every package
basically, with a total of about 300K such "floating" files.
Filesystem is ext4.

/dev/sda4 / ext4 rw,relatime,barrier=1,data=ordered 0 0

Now that I see barrier=1, maybe I should turn that off in the future
like I did with XFS around 2.6.17[1].

[1] http://lkml.indiana.edu/hypermail/linux/kernel/0605.2/1110.html

2010-02-05 12:35:15

by Jan Engelhardt

[permalink] [raw]

Subject: Re: kswapd continuously active

On Monday 2010-01-25 14:22, Jan Engelhardt wrote:
>On Monday 2010-01-25 14:06, Jens Axboe wrote:
>>>
>>> with 2.6.32.2 on sparc64 I am seeing that there is a sync(1) process
>>> busy in D state, with the following trace:
>>>
>>> sync D 000000000079299c 7552 4851 1 0x208061101000004
>>> Call Trace:
>>> [000000000053ca58] bdi_sched_wait+0xc/0x1c[...]
>>> [000000000053ca78] sync_inodes_sb+0x10/0xfc
>>>
>>> kswapd is also active all the time, writing something to disk[...]
>>
>>That doesn't sound good. What does /proc/meminfo say? What file systems
>>are you using?

>January 25 Feb-05
>MemTotal: 8166752 kB 8166752
>MemFree: 3243552 kB 3781776
>Buffers: 207968 kB 4912
>Cached: 2728216 kB 2684400
>SwapCached: 0 kB 0
>Active: 2203136 kB 495624
>Inactive: 2152544 kB 3263136
>Active(anon): 1167256 kB 488168
>Inactive(anon): 252952 kB 583912
>Active(file): 1035880 kB 7456
>Inactive(file): 1899592 kB 2679224
>Unevictable: 0 kB 0
>Mlocked: 0 kB 0
>SwapTotal: 0 kB 0
>SwapFree: 0 kB 0
>Dirty: 141624 kB 2662184
>Writeback: 0 kB ..

Today this happened again. So I looked at /proc/meminfo to paste today's
values next to those from January. That is when I noticed the "Dirty"
value - and thus I ran

watch -d -n 1 'grep Dirty /proc/meminfo'

What I see is that the dirty amount - a sync is currently running -
only decreases with at most 400 KB/sec, often less than that.

2010-02-05 13:00:06

by Jens Axboe

[permalink] [raw]

Subject: Re: kswapd continuously active

On Fri, Feb 05 2010, Jan Engelhardt wrote:
>
> On Monday 2010-01-25 14:22, Jan Engelhardt wrote:
> >On Monday 2010-01-25 14:06, Jens Axboe wrote:
> >>>
> >>> with 2.6.32.2 on sparc64 I am seeing that there is a sync(1) process
> >>> busy in D state, with the following trace:
> >>>
> >>> sync D 000000000079299c 7552 4851 1 0x208061101000004
> >>> Call Trace:
> >>> [000000000053ca58] bdi_sched_wait+0xc/0x1c[...]
> >>> [000000000053ca78] sync_inodes_sb+0x10/0xfc
> >>>
> >>> kswapd is also active all the time, writing something to disk[...]
> >>
> >>That doesn't sound good. What does /proc/meminfo say? What file systems
> >>are you using?
>
> >January 25 Feb-05
> >MemTotal: 8166752 kB 8166752
> >MemFree: 3243552 kB 3781776
> >Buffers: 207968 kB 4912
> >Cached: 2728216 kB 2684400
> >SwapCached: 0 kB 0
> >Active: 2203136 kB 495624
> >Inactive: 2152544 kB 3263136
> >Active(anon): 1167256 kB 488168
> >Inactive(anon): 252952 kB 583912
> >Active(file): 1035880 kB 7456
> >Inactive(file): 1899592 kB 2679224
> >Unevictable: 0 kB 0
> >Mlocked: 0 kB 0
> >SwapTotal: 0 kB 0
> >SwapFree: 0 kB 0
> >Dirty: 141624 kB 2662184
> >Writeback: 0 kB ..
>
> Today this happened again. So I looked at /proc/meminfo to paste today's
> values next to those from January. That is when I noticed the "Dirty"
> value - and thus I ran
>
> watch -d -n 1 'grep Dirty /proc/meminfo'
>
> What I see is that the dirty amount - a sync is currently running -
> only decreases with at most 400 KB/sec, often less than that.

I'm guessing the barriers and commits are what is killing your
performance. What happens with barrier=0?

--
Jens Axboe

2010-02-05 13:09:06

by Jan Engelhardt

[permalink] [raw]

Subject: Re: kswapd continuously active

On Friday 2010-02-05 14:00, Jens Axboe wrote:
>> >January 25 Feb-05
>> >MemTotal: 8166752 kB 8166752
>> >MemFree: 3243552 kB 3781776
>> >Buffers: 207968 kB 4912
>> >Cached: 2728216 kB 2684400
>> >SwapCached: 0 kB 0
>> >Active: 2203136 kB 495624
>> >Inactive: 2152544 kB 3263136
>> >Active(anon): 1167256 kB 488168
>> >Inactive(anon): 252952 kB 583912
>> >Active(file): 1035880 kB 7456
>> >Inactive(file): 1899592 kB 2679224
>> >Unevictable: 0 kB 0
>> >Mlocked: 0 kB 0
>> >SwapTotal: 0 kB 0
>> >SwapFree: 0 kB 0
>> >Dirty: 141624 kB 2662184
>> >Writeback: 0 kB ..
>>
>> Today this happened again. So I looked at /proc/meminfo to paste today's
>> values next to those from January. That is when I noticed the "Dirty"
>> value - and thus I ran
>>
>> watch -d -n 1 'grep Dirty /proc/meminfo'
>>
>> What I see is that the dirty amount - a sync is currently running -
>> only decreases with at most 400 KB/sec, often less than that.
>
>I'm guessing the barriers and commits are what is killing your
>performance. What happens with barrier=0?

The ext4 filesystem is already mounted with barrier=0. If there
is any block-level barriers I also can turn off, what would be
the command?

thanks,
Jan

2010-02-05 13:12:23

by Jens Axboe

[permalink] [raw]

Subject: Re: kswapd continuously active

On Fri, Feb 05 2010, Jan Engelhardt wrote:
>
> On Friday 2010-02-05 14:00, Jens Axboe wrote:
> >> >January 25 Feb-05
> >> >MemTotal: 8166752 kB 8166752
> >> >MemFree: 3243552 kB 3781776
> >> >Buffers: 207968 kB 4912
> >> >Cached: 2728216 kB 2684400
> >> >SwapCached: 0 kB 0
> >> >Active: 2203136 kB 495624
> >> >Inactive: 2152544 kB 3263136
> >> >Active(anon): 1167256 kB 488168
> >> >Inactive(anon): 252952 kB 583912
> >> >Active(file): 1035880 kB 7456
> >> >Inactive(file): 1899592 kB 2679224
> >> >Unevictable: 0 kB 0
> >> >Mlocked: 0 kB 0
> >> >SwapTotal: 0 kB 0
> >> >SwapFree: 0 kB 0
> >> >Dirty: 141624 kB 2662184
> >> >Writeback: 0 kB ..
> >>
> >> Today this happened again. So I looked at /proc/meminfo to paste today's
> >> values next to those from January. That is when I noticed the "Dirty"
> >> value - and thus I ran
> >>
> >> watch -d -n 1 'grep Dirty /proc/meminfo'
> >>
> >> What I see is that the dirty amount - a sync is currently running -
> >> only decreases with at most 400 KB/sec, often less than that.
> >
> >I'm guessing the barriers and commits are what is killing your
> >performance. What happens with barrier=0?
>
> The ext4 filesystem is already mounted with barrier=0. If there
> is any block-level barriers I also can turn off, what would be
> the command?

barrier=0 is enough. I do wonder why your writeback rate is that slow,
then. The disk has write back caching enabled?

--
Jens Axboe

2010-02-05 13:24:11

by Jan Engelhardt

[permalink] [raw]

Subject: Re: kswapd continuously active

On Friday 2010-02-05 14:12, Jens Axboe wrote:
>> >> Today this happened again. So I looked at /proc/meminfo to paste today's
>> >> values next to those from January. That is when I noticed the "Dirty"
>> >> value - and thus I ran
>> >>
>> >> watch -d -n 1 'grep Dirty /proc/meminfo'
>> >>
>> >> What I see is that the dirty amount - a sync is currently running -
>> >> only decreases with at most 400 KB/sec, often less than that.
>> >
>> >I'm guessing the barriers and commits are what is killing your
>> >performance. What happens with barrier=0?
>>
>> The ext4 filesystem is already mounted with barrier=0. If there
>> is any block-level barriers I also can turn off, what would be
>> the command?
>
>barrier=0 is enough. I do wonder why your writeback rate is that slow,
>then. The disk has write back caching enabled?

Yes, that seems to be the case at least. In fact the box is all fluffy
when nobody runs sync(1), which is what makes it so strange.

sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA

2010-02-07 10:50:42

by Jan Engelhardt

[permalink] [raw]

Subject: Re: kswapd continuously active

On Friday 2010-02-05 14:24, Jan Engelhardt wrote:
>>> The ext4 filesystem is already mounted with barrier=0. If there
>>> is any block-level barriers I also can turn off, what would be
>>> the command?
>>
>>barrier=0 is enough. I do wonder why your writeback rate is that slow,
>>then. The disk has write back caching enabled?
>
>Yes, that seems to be the case at least. In fact the box is all fluffy
>when nobody runs sync(1), which is what makes it so strange.
>
>sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
>support DPO or FUA

Here is an alternate trace that just happened.

INFO: task flush-8:0:343 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-8:0 D 0000000000555930 6664 343 2 0x18000000000
Call Trace:
[00000000005555f4] start_this_handle+0x324/0x4b0
[0000000000555930] jbd2_journal_start+0x94/0xc0
[000000000052ddb8] ext4_da_writepages+0x1e0/0x460
[000000000049bf30] do_writepages+0x28/0x48
[00000000004e6d58] writeback_single_inode+0xf0/0x330
[00000000004e7b24] writeback_inodes_wb+0x4c8/0x5d8
[00000000004e7da4] wb_writeback+0x170/0x1ec
[00000000004e8074] wb_do_writeback+0x188/0x1a4
[00000000004e80b8] bdi_writeback_task+0x28/0xa0
[00000000004a76c8] bdi_start_fn+0x64/0xc4
[0000000000475f84] kthread+0x58/0x6c
[000000000042ade0] kernel_thread+0x30/0x48
[0000000000475ee0] kthreadd+0xb8/0x104

2010-02-08 00:07:44

by Jan Engelhardt

[permalink] [raw]

Subject: Re: kswapd continuously active

On Sunday 2010-02-07 11:50, Jan Engelhardt wrote:
>On Friday 2010-02-05 14:24, Jan Engelhardt wrote:
>>>> The ext4 filesystem is already mounted with barrier=0. If there
>>>> is any block-level barriers I also can turn off, what would be
>>>> the command?
>>>
>>>barrier=0 is enough. I do wonder why your writeback rate is that slow,
>>>then. The disk has write back caching enabled?
>>
>>Yes, that seems to be the case at least. In fact the box is all fluffy
>>when nobody runs sync(1), which is what makes it so strange.
>>
>>sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
>>support DPO or FUA
>
>Here is an alternate trace that just happened.
>
>INFO: task flush-8:0:343 blocked for more than 120 seconds.
>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>flush-8:0 D 0000000000555930 6664 343 2 0x18000000000
>Call Trace:
> [00000000005555f4] start_this_handle+0x324/0x4b0
> [0000000000555930] jbd2_journal_start+0x94/0xc0
> [000000000052ddb8] ext4_da_writepages+0x1e0/0x460
> [000000000049bf30] do_writepages+0x28/0x48
> [00000000004e6d58] writeback_single_inode+0xf0/0x330
> [00000000004e7b24] writeback_inodes_wb+0x4c8/0x5d8
> [00000000004e7da4] wb_writeback+0x170/0x1ec
> [00000000004e8074] wb_do_writeback+0x188/0x1a4
> [00000000004e80b8] bdi_writeback_task+0x28/0xa0
> [00000000004a76c8] bdi_start_fn+0x64/0xc4
> [0000000000475f84] kthread+0x58/0x6c
> [000000000042ade0] kernel_thread+0x30/0x48
> [0000000000475ee0] kthreadd+0xb8/0x104

Could it be that there is something synchronize_rcu()-like in the
game that — as a result of how RCU works — just takes ages with 24
VCPUs?

2010-02-08 14:08:40

by Jens Axboe

[permalink] [raw]

Subject: Re: kswapd continuously active

On Mon, Feb 08 2010, Jan Engelhardt wrote:
>
> On Sunday 2010-02-07 11:50, Jan Engelhardt wrote:
> >On Friday 2010-02-05 14:24, Jan Engelhardt wrote:
> >>>> The ext4 filesystem is already mounted with barrier=0. If there
> >>>> is any block-level barriers I also can turn off, what would be
> >>>> the command?
> >>>
> >>>barrier=0 is enough. I do wonder why your writeback rate is that slow,
> >>>then. The disk has write back caching enabled?
> >>
> >>Yes, that seems to be the case at least. In fact the box is all fluffy
> >>when nobody runs sync(1), which is what makes it so strange.
> >>
> >>sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
> >>support DPO or FUA
> >
> >Here is an alternate trace that just happened.
> >
> >INFO: task flush-8:0:343 blocked for more than 120 seconds.
> >"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >flush-8:0 D 0000000000555930 6664 343 2 0x18000000000
> >Call Trace:
> > [00000000005555f4] start_this_handle+0x324/0x4b0
> > [0000000000555930] jbd2_journal_start+0x94/0xc0
> > [000000000052ddb8] ext4_da_writepages+0x1e0/0x460
> > [000000000049bf30] do_writepages+0x28/0x48
> > [00000000004e6d58] writeback_single_inode+0xf0/0x330
> > [00000000004e7b24] writeback_inodes_wb+0x4c8/0x5d8
> > [00000000004e7da4] wb_writeback+0x170/0x1ec
> > [00000000004e8074] wb_do_writeback+0x188/0x1a4
> > [00000000004e80b8] bdi_writeback_task+0x28/0xa0
> > [00000000004a76c8] bdi_start_fn+0x64/0xc4
> > [0000000000475f84] kthread+0x58/0x6c
> > [000000000042ade0] kernel_thread+0x30/0x48
> > [0000000000475ee0] kthreadd+0xb8/0x104
>
> Could it be that there is something synchronize_rcu()-like in the
> game that???? as a result of how RCU works???? just takes ages with 24
> VCPUs?

The only synchronize_rcu() involved in the writeback code happens when a
bdi exits, so you should not hit that. It'll do call_rcu() for work
completions, but 1) you should not see a lot of work entries, and 2)
lots of other kernel code will do that, too.

Are you seeing a lot of CPU usage? What does eg perf top -a say?

And what setup is this, I didn't realize you were running a virtualized
setup?

--
Jens Axboe

2010-02-10 15:18:14

by Jan Engelhardt

[permalink] [raw]

Subject: Re: kswapd continuously active

Hi,

the perf top -a dumps:

Light fs activity (/bin/rm old build roots)
------------------------------------------------------------------------------
PerfTop: 4800 irqs/sec kernel:94.1% [100000 cycles], (all, 24 CPUs)
------------------------------------------------------------------------------
12352.00 - 17.7% : __lock_acquire
4975.00 - 7.1% : lock_release_holdtime
4793.00 - 6.9% : lock_release
3586.00 - 5.1% : lock_acquired
3544.00 - 5.1% : stick_get_tick
2245.00 - 3.2% : sched_clock
1938.00 - 2.8% : sched_clock_cpu
1851.00 - 2.7% : lock_acquire
1800.00 - 2.6% : tl0_linux64
1475.00 - 2.1% : ext4_mark_iloc_dirty
1041.00 - 1.5% : fget_light
1035.00 - 1.5% : find_busiest_group
977.00 - 1.4% : getnstimeofday
939.00 - 1.3% : do_sys_poll
926.00 - 1.3% : perf_poll

(Starting real work - file creation etc - now)

Calling sync a minute later when Dirty > 200MB in /proc/meminfo:
------------------------------------------------------------------------------
PerfTop: 28822 irqs/sec kernel:22.2% [100000 cycles], (all, 24 CPUs)
------------------------------------------------------------------------------
samples pcnt kernel function
_______ _____ _______________

22952.00 - 14.9% : __lock_acquire
9142.00 - 5.9% : lock_release_holdtime
8881.00 - 5.8% : stick_get_tick
8375.00 - 5.4% : lock_release
7401.00 - 4.8% : lock_acquired
5522.00 - 3.6% : tl0_linux64
4325.00 - 2.8% : sched_clock
3634.00 - 2.4% : ext4_rec_len_from_disk
3600.00 - 2.3% : sched_clock_cpu
3261.00 - 2.1% : lock_acquire
3012.00 - 2.0% : NGcopy_to_user
2988.00 - 1.9% : ext4_check_dir_entry
2909.00 - 1.9% : getnstimeofday
2010.00 - 1.3% : __delay
1915.00 - 1.2% : cpu_clock

Stopping all write processes so that sync can complete:

------------------------------------------------------------------------------
PerfTop: 2679 irqs/sec kernel:92.6% [100000 cycles], (all, 24 CPUs)
------------------------------------------------------------------------------
samples pcnt kernel function
_______ _____ _______________

6102.00 - 19.7% : __lock_acquire
2504.00 - 8.1% : lock_release_holdtime
2346.00 - 7.6% : lock_release
1685.00 - 5.5% : stick_get_tick
1532.00 - 5.0% : lock_acquired
1135.00 - 3.7% : sched_clock
1068.00 - 3.5% : find_busiest_group
1034.00 - 3.3% : sched_clock_cpu
1023.00 - 3.3% : tl0_linux64
832.00 - 2.7% : lock_acquire
658.00 - 2.1% : fget_light
656.00 - 2.1% : perf_poll
629.00 - 2.0% : journal_clean_one_cp_list
572.00 - 1.9% : find_next_bit
525.00 - 1.7% : getnstimeofday
------------------------------------------------------------------------------
PerfTop: 3182 irqs/sec kernel:90.5% [100000 cycles], (all, 24 CPUs)
------------------------------------------------------------------------------
samples pcnt kernel function
_______ _____ _______________

9076.00 - 19.9% : __lock_acquire
3702.00 - 8.1% : lock_release_holdtime
3420.00 - 7.5% : lock_release
2522.00 - 5.5% : stick_get_tick
2198.00 - 4.8% : lock_acquired
1700.00 - 3.7% : sched_clock
1539.00 - 3.4% : find_busiest_group
1507.00 - 3.3% : sched_clock_cpu
1402.00 - 3.1% : tl0_linux64
1277.00 - 2.8% : lock_acquire
1089.00 - 2.4% : journal_clean_one_cp_list
1085.00 - 2.4% : fget_light
1004.00 - 2.2% : perf_poll
860.00 - 1.9% : do_sys_poll
782.00 - 1.7% : find_next_bit

There's that journal_clean_one_cp_list thing ..

I have a gut feeling that sys_sync just waits until all data has been
flushed. That however won't fly, because there's always new dirty data
arriving when the box is not idle.

Of course I am not calling /bin/sync on purpose, but some software
packages do; rrdtool for example has an autoconf runtime
check for fsync(2), so it's inevitable that sync is called at some
point by something.

I also did

# perf top -a -k vmlinux-... -p $(that pid of jbd2-sda6)
------------------------------------------------------------------------------
PerfTop: 105 irqs/sec kernel:100.0% [100000 cycles], (target_pid: 574)
------------------------------------------------------------------------------
samples pcnt kernel function
_______ _____ _______________

42.00 - 12.8% : journal_clean_one_cp_list
34.00 - 10.4% : __lock_acquire
33.00 - 10.1% : __jbd2_journal_clean_checkpoint_list
28.00 - 8.5% : find_busiest_group
20.00 - 6.1% : clear_bit
16.00 - 4.9% : lock_release
16.00 - 4.9% : stick_get_tick
14.00 - 4.3% : test_and_set_bit
13.00 - 4.0% : lock_release_holdtime
12.00 - 3.7% : lock_acquired
9.00 - 2.7% : find_next_bit
7.00 - 2.1% : lock_acquire
7.00 - 2.1% : jbd2_journal_write_revoke_records
6.00 - 1.8% : jbd2_journal_switch_revoke_table
6.00 - 1.8% : cpu_clock

2010-02-10 15:23:48

by Jan Engelhardt

[permalink] [raw]

Subject: Re: kswapd continuously active

Hi,

>On Mon, Feb 08 2010, Jan Engelhardt wrote:
>
>> Could it be that there is something synchronize_rcu()-like in the
>> game that-- as a result of how RCU works-- just takes ages with 24
>> VCPUs?
>
>The only synchronize_rcu() involved in the writeback code happens when
>a bdi exits, so you should not hit that. It'll do call_rcu() for work
>completions, but 1) you should not see a lot of work entries, and 2)
>lots of other kernel code will do that, too.
>
>Are you seeing a lot of CPU usage? What does eg perf top -a say?

top - 16:19:42 up 1:02, 3 users, load average: 3.05, 3.87, 3.51
Cpu(s): 0.0%us, 0.2%sy, 0.0%ni, 98.2%id, 1.6%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8063800k total, 6676472k used, 1387328k free, 767040k buffers
Swap: 0k total, 0k used, 0k free, 4775464k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26354 root 20 0 3472 1704 1184 R 4 0.0 0:00.33 top
26198 jengelh 20 0 2864 1368 1152 S 4 0.0 0:03.72 watch
574 root 20 0 0 0 0 D 3 0.0 0:52.20 jbd2/sda6-8
312 root 20 0 0 0 0 R 1 0.0 0:29.29 flush-8:0
1 root 20 0 2408 936 808 S 0 0.0 0:01.13 init

When it comes down hard, i.e. when there is always new data arriving
while sys_sync is working, the loadavg is usually in the range of
1.0*CPUs up to 2.0*CPUs. The top dump shown here is after quiescing
writers.

>And what setup is this, I didn't realize you were running a virtualized
>setup?

It is real; but I usually use the term VCPU because this iron has
1 CPU with 6 cores with 24 threads, not 24 CPU (packages).