2013-02-17 04:04:17

by Subranshu Patel

Subject: Large buffer cache in EXT4

I created two filesystems on my system (RHEL 6.3, kernel version 2.6.32),
XFS and EXT4, and mounted them.

On both filesystems I ran the open-source mdtest tool with 64
concurrent processes. Each process performed the following:
- Create a large number of directories
- Remove all the directories

While the test ran, I monitored the system's memory usage with the sar
command, looking at three components: kbmemused, kbbuffers and
kbcached.

kbmemused - Amount of used memory in kilobytes. This does not take
into account memory used by the kernel itself.
kbbuffers - buffer cache
kbcached - page cache
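A minimal sketch for sampling the same counters without sar, assuming (as I understand sysstat to work) that kbbuffers and kbcached come from the Buffers: and Cached: fields of /proc/meminfo. The helper name meminfo_kb is hypothetical; Dirty: is included because it comes up later in the thread.

```python
from pathlib import Path

def meminfo_kb(text, fields=("MemTotal", "MemFree", "Buffers", "Cached", "Dirty")):
    """Parse /proc/meminfo-style text and return the requested fields in kB."""
    values = {}
    for line in text.splitlines():
        # Lines look like "Buffers:          390999 kB"
        name, _, rest = line.partition(":")
        if name in fields:
            values[name] = int(rest.split()[0])
    return values

if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        print(meminfo_kb(meminfo.read_text()))
```

Calling this periodically while mdtest runs gives a time series of Buffers and Cached comparable to what sar samples.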

While kbmemused and kbcached were roughly similar for EXT4 and XFS
(XFS being a little higher), kbbuffers showed a completely different
trend.

kbbuffers    dir creation    dir removal
EXT4         390999 KB       364803 KB
XFS          1701 KB         2738 KB

In kernel 2.6, the buffer cache and the page cache are merged. The page
cache caches pages of files. The buffer cache caches disk blocks, which
consist mainly of metadata (not file data).

Why is the buffer cache so large with EXT4, and what is stored in
the buffer cache?


2013-02-17 06:28:56

by Andreas Dilger

Subject: Re: Large buffer cache in EXT4

On 2013-02-16, at 21:04, Subranshu Patel wrote:

> Why is the buffer cache large in case of EXT4 and what is stored in
> the buffer cache?

XFS does not use the buffer cache, while ext[234] does.

This is just a different code design. Ext4 uses the buffer cache to track metadata for journaling.

Cheers, Andreas

2013-02-17 10:19:13

by Martin Steigerwald

Subject: Re: Large buffer cache in EXT4

On Sunday, 17 February 2013, Andreas Dilger wrote:
> XFS does not use buffer cache, while ext[234] does use buffer cache.
>
> This is just a different code design. Ext4 uses the buffer cache to track
> metadata for journaling.

Doesn't XFS use its own mechanism, with the xfsbufd kernel thread?

Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

2013-02-17 10:25:41

by Martin Steigerwald

Subject: Re: Large buffer cache in EXT4

On Sunday, 17 February 2013, Subranshu Patel wrote:
> Why is the buffer cache large in case of EXT4 and what is stored in
> the buffer cache?

What is stored in the buffer cache? An interesting question. I also wondered
about it.

I always thought it held filesystem metadata that is to be written to
disk, as opposed to dirty pages, which are counted in the Dirty: field of
/proc/meminfo.

Then, when I was asked about it in a Linux Performance Analysis and
Tuning training I held, some participants with a bit of Linux kernel
hacking experience looked at the source and, as far as I understood
them, found that it is a disk block buffer. And indeed, when doing
dd if=/dev/zero of=/dev/somedevice bs=1M or similar, the buffer count
rises considerably.

What I never really understood is the clear distinction between
dirty pages and disk block buffers. Why isn't everything that is about to be
written to disk in one cache?

Can anybody enlighten me?

PS: buffers=0 with BTRFS also.

Thanks,

2013-02-18 04:35:23

by Theodore Ts'o

Subject: Re: Large buffer cache in EXT4

On Sun, Feb 17, 2013 at 11:25:39AM +0100, Martin Steigerwald wrote:
>
> What I never really understand was what is the clear distinction between
> dirty pages and disk block buffers. Why isn't anything that is about to be
> written to disk in one cache?

The buffer cache is indexed by physical block number, and each buffer
in the buffer cache is the size of the block size used for I/O to the
device.

The page cache is indexed by <inode, page index>, and each page
is the size of a VM page (e.g., 4k for x86 systems, 16k for Power
systems, etc.)
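To make the two indexing schemes concrete, here is a toy model (not kernel code; all names are hypothetical). A buffer is keyed by physical block number and is one block in size; a page-cache page is keyed by (inode, page index) and is one VM page in size. Since buffer data ultimately lives in page-cache pages of the block device itself, the sketch also computes which page of the device's mapping holds a given block, assuming a 4k page size and a block size that divides the page size (so a 4k page holds four 1k buffers).

```python
PAGE_SIZE = 4096  # assumption: x86-style 4k pages

def bdev_page_for_block(block_nr, block_size, page_size=PAGE_SIZE):
    """Return (page_index, offset_in_page) of a block inside the block
    device's page cache, for a block size that divides the page size."""
    if page_size % block_size:
        raise ValueError("block size must divide the page size")
    byte_offset = block_nr * block_size
    return byte_offset // page_size, byte_offset % page_size

# Buffer cache: key is the physical block number, value is one block of data.
buffer_cache = {}
# Page cache: key is (inode number, page index in the file), value is one page.
page_cache = {}

def cache_metadata_block(block_nr, data, block_size):
    assert len(data) == block_size, "a buffer is exactly one block"
    buffer_cache[block_nr] = data

def cache_file_page(inode, page_index, data):
    assert len(data) == PAGE_SIZE, "a page is exactly one VM page"
    page_cache[(inode, page_index)] = data
```

With 1k blocks, block 5 starts 5120 bytes into the device, i.e. page 1 at offset 1024.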

Certain file systems, including ext3, ext4, and ocfs2, use the jbd or
jbd2 layer to handle their physical block journalling, and this layer
fundamentally uses the buffer cache, since it is concerned with
controlling when specific file system blocks are allowed to be
written back to the hard drive.
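A highly simplified sketch of that idea, assuming a single running transaction and ignoring most checkpointing details: metadata buffers that join a transaction are pinned, so they may not be written to their home location until the transaction has been committed to the journal. The class and method names are hypothetical and do not mirror the jbd2 API.

```python
class ToyJournal:
    """Toy model of physical block journalling (not the jbd2 API)."""

    def __init__(self):
        self.running = {}   # block_nr -> data modified in the open transaction
        self.journal = []   # committed transactions, each a dict of blocks
        self.disk = {}      # home locations of blocks on the device

    def modify(self, block_nr, data):
        # The buffer joins the running transaction and stays pinned in memory.
        self.running[block_nr] = data

    def may_write_back(self, block_nr):
        # Write-back is forbidden while the block is in an uncommitted transaction.
        return block_nr not in self.running

    def commit(self):
        # First copy the whole transaction into the journal...
        self.journal.append(dict(self.running))
        self.running = {}

    def checkpoint(self):
        # ...and only afterwards write the blocks to their home locations.
        for txn in self.journal:
            self.disk.update(txn)
        self.journal.clear()
```

Because this bookkeeping is per physical block, a cache indexed by block number is the natural fit for it.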

Other file systems may not support file system blocks smaller than 4k.
This may make it easier for them to use the page cache for their
metadata blocks, although I don't know what happens if you try to
mount a btrfs file system formatted with 4k blocks on an architecture
such as Power which has 16k pages. I don't know if it will work, or
blow up in a spectacular display of sparks. :-)

In practice, it really doesn't matter. The actual data storage for
the buffer cache (i.e., where the b_data field points to in the struct
buffer_head) is actually in the page cache, so from a space
perspective it doesn't really matter. File systems like ext3 and ext4
which use the buffer cache for metadata blocks need to be careful that
when a directory (which is metadata) is deleted, the blocks in
the buffer cache are zapped, so that if the space on disk is reused for
a data file (which is cached in the page cache), the stale entries
in the buffer cache aren't at risk of being written back to the disk.
But that's just a tiny implementation detail....
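The hazard described above can be illustrated with another toy model (hypothetical names, not kernel code): if a freed directory block stays dirty in the buffer cache and the same physical block is later reused for file data, a late write-back of the stale buffer overwrites the new data. Discarding the buffer when the block is freed avoids this.

```python
class ToyBlockCache:
    def __init__(self):
        self.buffers = {}   # block_nr -> (data, dirty): cached metadata buffers
        self.disk = {}      # block_nr -> data on the device

    def write_metadata(self, block_nr, data):
        self.buffers[block_nr] = (data, True)

    def free_metadata_block(self, block_nr, forget=True):
        # When a directory block is freed, discard its buffer so that it
        # can never be written back later.
        if forget:
            self.buffers.pop(block_nr, None)

    def write_file_data(self, block_nr, data):
        # In this model, file data bypasses the buffer cache and goes to disk.
        self.disk[block_nr] = data

    def flush(self):
        # Write back every dirty buffer, then mark all buffers clean.
        for block_nr, (data, dirty) in self.buffers.items():
            if dirty:
                self.disk[block_nr] = data
        self.buffers = {b: (d, False) for b, (d, _) in self.buffers.items()}
```

Running the same sequence with forget=False shows the stale directory contents clobbering the file data on flush.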

- Ted

2013-02-18 13:16:28

by Martin Steigerwald

Subject: Re: Large buffer cache in EXT4

On Monday, 18 February 2013, Theodore Ts'o wrote:
> On Sun, Feb 17, 2013 at 11:25:39AM +0100, Martin Steigerwald wrote:
> > What I never really understand was what is the clear distinction
> > between dirty pages and disk block buffers. Why isn't anything that is
> > about to be written to disk in one cache?
>
> The buffer cache is indexed by physical block number, and each buffer
> in the buffer cache is the size of the block size used for I/O to the
> device.
>
> The page cache is indexed by <inode, page index>, and each page
> is the size of a VM page (e.g., 4k for x86 systems, 16k for Power
> systems, etc.)
>
> Certain file systems, including ext3, ext4, and ocfs2, use the jbd or
> jbd2 layer to handle their physical block journalling, and this layer
> fundamentally uses the buffer cache, since it is concerned with
> controlling when specific file system blocks are allowed to be
> written back to the hard drive.

Thank you for the explanation, Ted.

2013-02-18 15:22:59

by Eric Sandeen

Subject: Re: Large buffer cache in EXT4

On 2/17/13 12:28 AM, Andreas Dilger wrote:
> XFS does not use buffer cache, while ext[234] does use buffer cache.
>
> This is just a different code design. Ext4 uses the buffer cache to track metadata for journaling.

Use slabtop or similar to see which XFS slab caches grow during the test; look at xfs_buf, for example.
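For a scripted alternative to watching slabtop, the sketch below reads /proc/slabinfo (usually readable only by root) and estimates the memory held by selected caches as num_objs * objsize. It assumes the version 2.1 column layout (name, active_objs, num_objs, objsize, ...); the function name slab_usage is hypothetical, and xfs_buf is simply the default cache to watch.

```python
from pathlib import Path

def slab_usage(text, names=("xfs_buf",)):
    """Return {cache_name: approx_bytes} from /proc/slabinfo-style text."""
    usage = {}
    for line in text.splitlines():
        if line.startswith(("slabinfo", "#")):
            continue  # skip the version line and the column header
        fields = line.split()
        if fields and fields[0] in names:
            num_objs, objsize = int(fields[2]), int(fields[3])
            usage[fields[0]] = num_objs * objsize
    return usage

if __name__ == "__main__":
    try:
        print(slab_usage(Path("/proc/slabinfo").read_text()))
    except OSError as err:  # missing file or insufficient permissions
        print("cannot read /proc/slabinfo:", err)
```

The estimate counts allocated objects, not pages held by partially empty slabs, so it is a lower bound on the cache's footprint.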

-Eric