On Wed, 15 Jul 2015 12:26:26 +0200 Jan Kara <[email protected]> wrote:
> From: Jan Kara <[email protected]>
>
> The functionality of ext3 is fully supported by the ext4 driver, and
> major distributions (SUSE, RedHat) have been using the ext4 driver to
> handle ext3 filesystems for quite some time. There is some ugliness in
> mm resulting from jbd cleaning buffers in a dirty page without clearing
> the page's dirty bit, and the block layer's support for buffer bouncing
> when stable pages are required exists only because of jbd. So let's
> remove the ext3 driver.
Does this imply that ext4 doesn't do the
secretly-clean-the-page-via-buffers thing? If so, how?
The comment in shrink_page_list() says the blockdev mapping will do
this as well, although I can't imagine how - there's no means of
getting to those buffer_heads except via the page. So maybe the "even
if the page is PageDirty()" is no longer true. It was added by:
commit 493f4988d640a73337df91f2c63e94c78ecd5e97
Author: Andrew Morton <[email protected]>
Date: Mon Jun 17 20:20:53 2002 -0700
[PATCH] allow GFP_NOFS allocators to perform swapcache writeout
One weakness which was introduced when the buffer LRU went away was
that GFP_NOFS allocations became equivalent to GFP_NOIO. Because all
writeback goes via writepage/writepages, which requires entry into the
filesystem.
However now that swapout no longer calls bmap(), we can honour
GFP_NOFS's intent for swapcache pages. So if the allocation request
specifies __GFP_IO and !__GFP_FS, we can wait on swapcache pages and we
can perform swapcache writeout.
This should strengthen the VM somewhat.
I wonder what I was thinking.
Also, what's the status of ext4's data=journal? It's the hardest ext3
mode for the rest of the kernel to support and I suspect hardly anyone
uses it.
> 46 files changed, 54 insertions(+), 28109 deletions(-)
Heroic.
On 2015-07-15 12:58, Andrew Morton wrote:
> Also, what's the status of ext4's data=journal? It's the hardest ext3
> mode for the rest of the kernel to support and I suspect hardly anyone
> uses it.
I use it, as do some other people I know, but only for stuff that pretty
much needs to be 100% guaranteed to be consistent after a power failure.
It is _very_ slow, but it does work correctly based on my recent
experience.
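For anyone who wants to try it: the mode is selected per filesystem at
mount time, e.g. with an fstab line along these lines (the device and
mount point below are just placeholders, use whatever fits your setup):

/dev/sdXN  /srv/mail  ext4  data=journal,errors=remount-ro  0  2

As far as I know the data journalling mode can't be changed on a plain
remount, so it needs to be in place when the filesystem is mounted.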
On Wed 15-07-15 09:58:22, Andrew Morton wrote:
> On Wed, 15 Jul 2015 12:26:26 +0200 Jan Kara <[email protected]> wrote:
>
> > From: Jan Kara <[email protected]>
> >
> > The functionality of ext3 is fully supported by the ext4 driver, and
> > major distributions (SUSE, RedHat) have been using the ext4 driver to
> > handle ext3 filesystems for quite some time. There is some ugliness in
> > mm resulting from jbd cleaning buffers in a dirty page without clearing
> > the page's dirty bit, and the block layer's support for buffer bouncing
> > when stable pages are required exists only because of jbd. So let's
> > remove the ext3 driver.
>
> Does this imply that ext4 doesn't do the
> secretly-clean-the-page-via-buffers thing? If so, how?
The biggest offender that was cleaning pages via buffers was the JBD commit
code writing back data=ordered buffers. I have modified JBD2 to do this
via generic_writepages() instead of through buffer heads (which required a
locking overhaul in JBD2), so JBD2 hasn't done this for quite a few years.
That being said, the JBD2 checkpointing code will still clean pages via
buffer heads, so the blockdev mapping may still have silently cleaned
pages, and in data=journal mode this can be the case even for other
mappings. Luckily, locking isn't an issue in these cases and fixing this
is relatively straightforward. I'm just looking for an elegant way to do
this inside JBD2 - I'm hoping for something better than just getting the
page from the bh, locking it, and calling clear_page_dirty_for_io() and
->writepage(). That works but looks ugly...
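For concreteness, the straightforward variant looks roughly like the
sketch below. This is not actual JBD2 code - the helper name is made up
and error handling is minimal - it just shows the get-page-from-bh, lock,
clear_page_dirty_for_io(), ->writepage() sequence described above:

#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/writeback.h>
#include <linux/buffer_head.h>

/* Sketch only: write a checkpoint buffer out through its page so the
 * page dirty bit gets cleared together with the buffer. */
static int jbd2_write_bh_via_page(struct buffer_head *bh)
{
        struct page *page = bh->b_page;
        struct address_space *mapping;
        struct writeback_control wbc = {
                .sync_mode = WB_SYNC_ALL,
                .nr_to_write = 1,
        };
        int ret = 0;

        lock_page(page);
        mapping = page->mapping;
        if (mapping && clear_page_dirty_for_io(page))
                /* ->writepage() unlocks the page when it returns */
                ret = mapping->a_ops->writepage(page, &wbc);
        else
                unlock_page(page);
        return ret;
}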
> The comment in shrink_page_list() says the blockdev mapping will do
> this as well, although I can't imagine how - there's no means of
> getting to those buffer_heads except via the page. So maybe the "even
> if the page is PageDirty()" is no longer true. It was added by:
>
> commit 493f4988d640a73337df91f2c63e94c78ecd5e97
> Author: Andrew Morton <[email protected]>
> Date: Mon Jun 17 20:20:53 2002 -0700
>
> [PATCH] allow GFP_NOFS allocators to perform swapcache writeout
>
> One weakness which was introduced when the buffer LRU went away was
> that GFP_NOFS allocations became equivalent to GFP_NOIO. Because all
> writeback goes via writepage/writepages, which requires entry into the
> filesystem.
>
> However now that swapout no longer calls bmap(), we can honour
> GFP_NOFS's intent for swapcache pages. So if the allocation request
> specifies __GFP_IO and !__GFP_FS, we can wait on swapcache pages and we
> can perform swapcache writeout.
>
> This should strengthen the VM somewhat.
>
> I wonder what I was thinking.
Well, e.g. sync_mapping_buffers() from fs/buffer.c will write out buffer
heads without cleaning the page, and so does the checkpointing code in
JBD/JBD2. So for blockdev mappings this really happens rather frequently,
I'd say.
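Just to make the mechanism concrete, direct buffer-head writeback looks
roughly like the sketch below (illustrative only, not actual fs/buffer.c
or JBD code; it mirrors what sync_dirty_buffer() and friends do). The
point is that the buffer ends up clean while PageDirty() on the page it
sits in stays set:

#include <linux/buffer_head.h>

/* Illustrative only: write a single dirty buffer directly. */
static void write_one_buffer_directly(struct buffer_head *bh)
{
        lock_buffer(bh);
        if (test_clear_buffer_dirty(bh)) {
                get_bh(bh);
                bh->b_end_io = end_buffer_write_sync;
                /* I/O is submitted for the buffer only; the dirty bit
                 * of the containing page is never touched. */
                submit_bh(WRITE_SYNC, bh);
                wait_on_buffer(bh);
        } else {
                unlock_buffer(bh);
        }
}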
> Also, what's the status of ext4's data=journal? It's the hardest ext3
> mode for the rest of the kernel to support and I suspect hardly anyone
> uses it.
As this thread shows, there are people using it (and I occasionally see
bug reports for it as well). It would simplify things if we could get rid
of it, but I don't think it's currently an option...
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Wed, Jul 15, 2015 at 01:35:09PM -0400, Austin S Hemmelgarn wrote:
> On 2015-07-15 12:58, Andrew Morton wrote:
> > Also, what's the status of ext4's data=journal? It's the hardest ext3
> > mode for the rest of the kernel to support and I suspect hardly anyone
> > uses it.
> I use it, as do some other people I know, but only for stuff that pretty
> much needs to be 100% guaranteed to be consistent after a power failure.
> It is _very_ slow, but it does work correctly based on my recent
> experience.
It is also a benefit for workloads that are heavy on sync writes to
(relatively) small files, such as mail servers. At least, it used to be
considerably faster than data=ordered the last time I benchmarked it.
--
Bruce Guenter <[email protected]> http://untroubled.org/