2008-12-31 21:43:48

by Dave Johnson

[permalink] [raw]
Subject: [ext3] kjournald writing after each read despite noatime,commit=nnn


Since upgrading from 2.6.23.1 to 2.6.27.8, I'm now seeing kjournald
perform a write 2 seconds after each read to an ext3 fs.

laptop mode is on and filesystem is mounted with noatime and a large
commit time (6 hours).

Did I miss something or is this expected?


Note, there is no other activity reported by block_dump at this time
from any other mounted filesystem.

Some examples:

Dec 31 15:43:14 gw kernel: w(26329): READ block 756266 on hda1
Dec 31 15:43:16 gw kernel: kjournald(760): WRITE block 2152 on hda1

Dec 31 15:45:05 gw kernel: top(26366): READ block 756008 on hda1
Dec 31 15:45:05 gw kernel: top(26366): READ block 756016 on hda1
Dec 31 15:45:05 gw kernel: top(26366): READ block 756026 on hda1
Dec 31 15:45:05 gw kernel: top(26366): READ block 756098 on hda1
Dec 31 15:45:07 gw kernel: kjournald(760): WRITE block 2154 on hda1

Dec 31 15:48:43 gw kernel: cat(26430): READ block 502016 on hda1
Dec 31 15:48:43 gw kernel: cat(26430): READ block 502024 on hda1
Dec 31 15:48:43 gw kernel: cat(26430): READ block 502032 on hda1
Dec 31 15:48:43 gw kernel: cat(26430): READ block 502040 on hda1
Dec 31 15:48:43 gw kernel: cat(26430): READ block 502042 on hda1
Dec 31 15:48:43 gw kernel: cat(26430): READ block 502050 on hda1
Dec 31 15:48:43 gw kernel: cat(26430): READ block 502114 on hda1
Dec 31 15:48:45 gw kernel: kjournald(760): WRITE block 2156 on hda1

Dec 31 15:48:55 gw kernel: bash(26436): READ block 744586 on hda1
Dec 31 15:48:55 gw kernel: bash(26436): READ block 744594 on hda1
Dec 31 15:48:55 gw kernel: bash(26436): READ block 744602 on hda1
Dec 31 15:48:55 gw kernel: bash(26436): READ block 744610 on hda1
Dec 31 15:48:55 gw kernel: bash(26436): READ block 744612 on hda1
Dec 31 15:48:55 gw kernel: reset(26436): READ block 744620 on hda1
Dec 31 15:48:57 gw kernel: kjournald(760): WRITE block 2158 on hda1

$ grep ext3 /proc/mounts
/dev/root / ext3 rw,noatime,errors=remount-ro,commit=21600,data=ordered 0 0

$ for f in /proc/sys/vm/*; do printf "%-50s" $f; cat $f; done
/proc/sys/vm/block_dump 1
/proc/sys/vm/dirty_background_ratio 10
/proc/sys/vm/dirty_expire_centisecs 2160000
/proc/sys/vm/dirty_ratio 60
/proc/sys/vm/dirty_writeback_centisecs 2160000
/proc/sys/vm/drop_caches 0
/proc/sys/vm/laptop_mode 2
/proc/sys/vm/legacy_va_layout 0
/proc/sys/vm/lowmem_reserve_ratio 256 32
/proc/sys/vm/max_map_count 65536
/proc/sys/vm/min_free_kbytes 1440
/proc/sys/vm/nr_pdflush_threads 2
/proc/sys/vm/oom_dump_tasks 0
/proc/sys/vm/oom_kill_allocating_task 0
/proc/sys/vm/overcommit_memory 0
/proc/sys/vm/overcommit_ratio 50
/proc/sys/vm/page-cluster 3
/proc/sys/vm/panic_on_oom 0
/proc/sys/vm/percpu_pagelist_fraction 0
/proc/sys/vm/swappiness 60
/proc/sys/vm/vdso_enabled 2
/proc/sys/vm/vfs_cache_pressure 100
$

--
Dave


2009-01-01 14:22:40

by Bart Samwel

[permalink] [raw]
Subject: Re: [ext3] kjournald writing after each read despite noatime,commit=nnn

Hi Dave,

Dave Johnson wrote:
> Since upgrading from 2.6.23.1 to 2.6.27.8, I'm now seeing kjournald
> perform a write 2 seconds after each read to an ext3 fs.
>
> laptop mode is on and filesystem is mounted with noatime and a large
> commit time (6 hours).
>
> Did I miss something or is this expected?

This is the defined behaviour for laptop_mode. Whenever a *physical*
READ takes place, this is taken to indicate that the disk is spun up at
that time. The laptop_mode functionality then takes that opportunity to
sync any dirty data to disk, two seconds (or whatever value you put in
/proc/sys/vm/laptop_mode) after the physical disk activity has ceased.
The rationale behind this is that you want to sync your stuff when the
disk is spun up, and then you want to hold back writing back stuff for a
very long while. And the only way it can detect that the disk is spun up
is when there is physical disk activity.

This is exactly what happens in your case. The READ activity reported by
block_dump is *physical* read activity: some data was needed that was
not cached in memory. block_dump does not show you what data was
retrieved from the ext3 fs *without* having to access the disk, it only
shows actual physical disk I/O.

Cheers,
Bart

2009-01-01 16:49:43

by Dave Johnson

[permalink] [raw]
Subject: Re: [ext3] kjournald writing after each read despite noatime,commit=nnn

Bart Samwel writes:
> This is the defined behaviour for laptop_mode. Whenever a *physical*
> READ takes place, this is taken to indicate that the disk is spun up at
> that time. The laptop_mode functionality then takes that opportunity to
> sync any dirty data to disk, two seconds (or whatever value you put in
> /proc/sys/vm/laptop_mode) after the physical disk activity has ceased.
> The rationale behind this is that you want to sync your stuff when the
> disk is spun up, and then you want to hold back writing back stuff for a
> very long while. And the only way it can detect that the disk is spun up
> is when there is physical disk activity.
>
> This is exactly what happens in your case. The READ activity reported by
> block_dump is *physical* read activity: some data was needed that was
> not cached in memory. block_dump does not show you what data was
> retrieved from the ext3 fs *without* having to access the disk, it only
> shows actual physical disk I/O.

Yep sounds good, but this happens even if there is no dirty data
needing a sync back to disk.

$ grep 'Dirty\|Write' /proc/meminfo
Dirty: 0 kB
Writeback: 0 kB
WritebackTmp: 0 kB
$ cat /some/uncached/file >/dev/null

Jan 1 11:43:49 gw kernel: cat(6615): READ block 864408 on hda1
Jan 1 11:43:51 gw kernel: kjournald(760): WRITE block 2376 on hda1


Note, the reason I ask is this is a SSD so just because a physical
read has taken place recently unneeded writes should be avoided.

Turning laptop_mode to 0, but leaving other settings the same
resolves the uneeded write:

$ echo 0 > /proc/sys/vm/laptop_mode
$ grep 'Dirty\|Write' /proc/meminfo
Dirty: 0 kB
Writeback: 0 kB
WritebackTmp: 0 kB
$ cat /some/other/uncached/file >/dev/null

Jan 1 11:47:15 gw kernel: cat(6653): READ block 864258 on hda1


--
Dave

2009-01-01 18:59:57

by Bart Samwel

[permalink] [raw]
Subject: Re: [ext3] kjournald writing after each read despite noatime,commit=nnn

Hi Dave,

Dave Johnson wrote:
> Bart Samwel writes:
>> This is the defined behaviour for laptop_mode. Whenever a *physical*
>> READ takes place, this is taken to indicate that the disk is spun up at
>> that time. The laptop_mode functionality then takes that opportunity to
>> sync any dirty data to disk, two seconds (or whatever value you put in
>> /proc/sys/vm/laptop_mode) after the physical disk activity has ceased.
>> The rationale behind this is that you want to sync your stuff when the
>> disk is spun up, and then you want to hold back writing back stuff for a
>> very long while. And the only way it can detect that the disk is spun up
>> is when there is physical disk activity.
>>
>> This is exactly what happens in your case. The READ activity reported by
>> block_dump is *physical* read activity: some data was needed that was
>> not cached in memory. block_dump does not show you what data was
>> retrieved from the ext3 fs *without* having to access the disk, it only
>> shows actual physical disk I/O.
>
> Yep sounds good, but this happens even if there is no dirty data
> needing a sync back to disk.
>
> $ grep 'Dirty\|Write' /proc/meminfo
> Dirty: 0 kB
> Writeback: 0 kB
> WritebackTmp: 0 kB
> $ cat /some/uncached/file >/dev/null
>
> Jan 1 11:43:49 gw kernel: cat(6615): READ block 864408 on hda1
> Jan 1 11:43:51 gw kernel: kjournald(760): WRITE block 2376 on hda1

This looks like it's a generic property of syncing an ext3 file system.
Try turning off laptop_mode and then running "sync". You will probably
see the same behaviour.

> Note, the reason I ask is this is a SSD so just because a physical
> read has taken place recently unneeded writes should be avoided.
>
> Turning laptop_mode to 0, but leaving other settings the same
> resolves the uneeded write:

For your SSD I guess you need to get rid of the
sync-after-disk-activity, but keep the other VM behaviours of
laptop_mode (such as avoiding swapping out pages / writing back dirty
pages in order to free memory as long as it is also possible to just
drop pages that are not dirty).
You can probably achieve this by:
- having a large commit interval etc., like you have now
- setting laptop_mode to a very large value, e.g. a couple of hours.
That will trigger a sync if and only if there has been *no* disk
activity at all for hours on end -- i.e., pretty much never. And the
other write-reducing VM features of laptop_mode will still be enabled.

It would perhaps be a good thing to split these mechanisms into separate
knobs. Write batching (the sync-after-disk-activity stuff and also the
dirty_ratio / dirty_background_ratio changes) are a completely separate
mechanism from write avoidance (the other mechanism I mentioned).

Cheers,
Bart

2009-01-01 19:18:31

by Dave Johnson

[permalink] [raw]
Subject: Re: [ext3] kjournald writing after each read despite noatime,commit=nnn

Bart Samwel writes:
> This looks like it's a generic property of syncing an ext3 file system.
> Try turning off laptop_mode and then running "sync". You will probably
> see the same behaviour.

yep, every sync() causes a gratuitous write to disk N seconds later
even when no data has been read or written since the last sync().

$ while sleep 5; do sync; done
^C
$

Jan 1 14:09:31 gw kernel: kjournald(760): WRITE block 2662 on hda1
Jan 1 14:09:36 gw kernel: kjournald(760): WRITE block 2664 on hda1
Jan 1 14:09:42 gw kernel: kjournald(760): WRITE block 2666 on hda1
Jan 1 14:09:47 gw kernel: kjournald(760): WRITE block 2668 on hda1
Jan 1 14:09:52 gw kernel: kjournald(760): WRITE block 2670 on hda1

> > Note, the reason I ask is this is a SSD so just because a physical
> > read has taken place recently unneeded writes should be avoided.
> >
> > Turning laptop_mode to 0, but leaving other settings the same
> > resolves the uneeded write:
>
> For your SSD I guess you need to get rid of the
> sync-after-disk-activity, but keep the other VM behaviours of
> laptop_mode (such as avoiding swapping out pages / writing back dirty
> pages in order to free memory as long as it is also possible to just
> drop pages that are not dirty).
>
> You can probably achieve this by:
> - having a large commit interval etc., like you have now
> - setting laptop_mode to a very large value, e.g. a couple of hours.
> That will trigger a sync if and only if there has been *no* disk
> activity at all for hours on end -- i.e., pretty much never. And the
> other write-reducing VM features of laptop_mode will still be enabled.
>
> It would perhaps be a good thing to split these mechanisms into separate
> knobs. Write batching (the sync-after-disk-activity stuff and also the
> dirty_ratio / dirty_background_ratio changes) are a completely separate
> mechanism from write avoidance (the other mechanism I mentioned).

I'm going to run without laptop mode for a while to see how this
goes. This system is a small router/firewall with most writes going
to a tmpfs so the number of writes to the ext3 fs is only a few a
month (provided no one logs into the system to do a config change).

Thanks for the help.

--
Dave

2009-01-01 19:35:25

by Dave Johnson

[permalink] [raw]
Subject: Re: [ext3] kjournald writing after each read despite noatime,commit=nnn

Dave Johnson writes:
> Bart Samwel writes:
> > This looks like it's a generic property of syncing an ext3 file system.
> > Try turning off laptop_mode and then running "sync". You will probably
> > see the same behaviour.
>
> yep, every sync() causes a gratuitous write to disk N seconds later
> even when no data has been read or written since the last sync().

Woops, correction, every sync() call causes a single gratuitous write,
no delay.

$ for n in 1 2 3 4 5; do sync; done

Jan 1 14:32:54 gw kernel: kjournald(760): WRITE block 2682 on hda1
Jan 1 14:32:54 gw kernel: kjournald(760): WRITE block 2684 on hda1
Jan 1 14:32:54 gw kernel: kjournald(760): WRITE block 2686 on hda1
Jan 1 14:32:54 gw kernel: kjournald(760): WRITE block 2688 on hda1
Jan 1 14:32:54 gw kernel: kjournald(760): WRITE block 2690 on hda1

--
Dave

2009-01-01 20:16:47

by Eric Sandeen

[permalink] [raw]
Subject: Re: [ext3] kjournald writing after each read despite noatime,commit=nnn

Dave Johnson wrote:
> Dave Johnson writes:
>> Bart Samwel writes:
>>> This looks like it's a generic property of syncing an ext3 file system.
>>> Try turning off laptop_mode and then running "sync". You will probably
>>> see the same behaviour.
>> yep, every sync() causes a gratuitous write to disk N seconds later
>> even when no data has been read or written since the last sync().
>
> Woops, correction, every sync() call causes a single gratuitous write,
> no delay.
>
> $ for n in 1 2 3 4 5; do sync; done
>
> Jan 1 14:32:54 gw kernel: kjournald(760): WRITE block 2682 on hda1
> Jan 1 14:32:54 gw kernel: kjournald(760): WRITE block 2684 on hda1
> Jan 1 14:32:54 gw kernel: kjournald(760): WRITE block 2686 on hda1
> Jan 1 14:32:54 gw kernel: kjournald(760): WRITE block 2688 on hda1
> Jan 1 14:32:54 gw kernel: kjournald(760): WRITE block 2690 on hda1
>

This may be http://bugzilla.kernel.org/show_bug.cgi?id=12224

-Eric