2022-12-07 11:36:58

by Jan Kara

[permalink] [raw]
Subject: [PATCH v4 0/13] ext4: Stop using ext4_writepage() for writeout of ordered data

Hello,

this patch series modifies ext4 so that we stop using ext4_writepage() for
writeout of ordered data during transaction commit (through
generic_writepages() from jbd2_journal_submit_inode_data_buffers()). Instead we
directly call ext4_writepages() from the
ext4_journal_submit_inode_data_buffers(). This is part of Christoph's effort
to get rid of the .writepage() callback in all filesystems.

I have also modified ext4_writepages() to use write_cache_pages() instead of
generic_writepages() so now we don't expose .writepage hook at all. We still
keep ext4_writepage() as a callback for write_cache_pages(). We should refactor
that path as well and get rid of ext4_writepage() completely but that is for a
separate cleanup. Also note that jbd2 still uses generic_writepages() in its
jbd2_journal_submit_inode_data_buffers() helper because it is still used from
OCFS2. Again, something to be dealt with in a separate patchset.

Changes since v3:
* Added export of buffer_migrate_page_norefs()
* Clarified commit message about page migration a bit

Changes since v2:
* Added Reviewed-by tags from Christoph
* Added WARN_ON to verify we're not trying to start transaction from
transaction commit writeback
* Converted fastcommit path to use ext4_writepages() for data writeout
* Some minor tweaks suggested by Christoph

Changes since v1:
* Added Reviewed-by tags from Ritesh
* Added patch to get rid of generic_writepages() in ext4_writepages()
* Added patch to get rid of .writepage hook

Honza
Previous versions:
Link: http://lore.kernel.org/r/[email protected] # v1
Link: http://lore.kernel.org/r/[email protected] # v2
Link: http://lore.kernel.org/r/[email protected] # v3


2022-12-07 11:38:21

by Jan Kara

[permalink] [raw]
Subject: [PATCH v4 06/13] ext4: Provide ext4_do_writepages()

Provide ext4_do_writepages() function that takes mpage_da_data as an
argument and make ext4_writepages() just a simple wrapper around it. No
functional changes.

Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
---
fs/ext4/inode.c | 96 +++++++++++++++++++++++++++----------------------
1 file changed, 54 insertions(+), 42 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9ba843b7a92c..99c66c768cd0 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1543,9 +1543,12 @@ void ext4_da_release_space(struct inode *inode, int to_free)
*/

struct mpage_da_data {
+ /* These are input fields for ext4_do_writepages() */
struct inode *inode;
struct writeback_control *wbc;
+ unsigned int can_map:1; /* Can writepages call map blocks? */

+ /* These are internal state of ext4_do_writepages() */
pgoff_t first_page; /* The first page to write */
pgoff_t next_page; /* Current page to examine */
pgoff_t last_page; /* Last page to examine */
@@ -1557,7 +1560,6 @@ struct mpage_da_data {
struct ext4_map_blocks map;
struct ext4_io_submit io_submit; /* IO submission data */
unsigned int do_map:1;
- unsigned int can_map:1; /* Can writepages call map blocks? */
unsigned int scanned_until_end:1;
};

@@ -2703,16 +2705,16 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
return err;
}

-static int ext4_writepages(struct address_space *mapping,
- struct writeback_control *wbc)
+static int ext4_do_writepages(struct mpage_da_data *mpd)
{
+ struct writeback_control *wbc = mpd->wbc;
pgoff_t writeback_index = 0;
long nr_to_write = wbc->nr_to_write;
int range_whole = 0;
int cycled = 1;
handle_t *handle = NULL;
- struct mpage_da_data mpd;
- struct inode *inode = mapping->host;
+ struct inode *inode = mpd->inode;
+ struct address_space *mapping = inode->i_mapping;
int needed_blocks, rsv_blocks = 0, ret = 0;
struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb);
struct blk_plug plug;
@@ -2787,19 +2789,18 @@ static int ext4_writepages(struct address_space *mapping,
writeback_index = mapping->writeback_index;
if (writeback_index)
cycled = 0;
- mpd.first_page = writeback_index;
- mpd.last_page = -1;
+ mpd->first_page = writeback_index;
+ mpd->last_page = -1;
} else {
- mpd.first_page = wbc->range_start >> PAGE_SHIFT;
- mpd.last_page = wbc->range_end >> PAGE_SHIFT;
+ mpd->first_page = wbc->range_start >> PAGE_SHIFT;
+ mpd->last_page = wbc->range_end >> PAGE_SHIFT;
}

- mpd.inode = inode;
- mpd.wbc = wbc;
- ext4_io_submit_init(&mpd.io_submit, wbc);
+ ext4_io_submit_init(&mpd->io_submit, wbc);
retry:
if (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages)
- tag_pages_for_writeback(mapping, mpd.first_page, mpd.last_page);
+ tag_pages_for_writeback(mapping, mpd->first_page,
+ mpd->last_page);
blk_start_plug(&plug);

/*
@@ -2808,28 +2809,27 @@ static int ext4_writepages(struct address_space *mapping,
* in the block layer on device congestion while having transaction
* started.
*/
- mpd.do_map = 0;
- mpd.scanned_until_end = 0;
- mpd.can_map = 1;
- mpd.io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
- if (!mpd.io_submit.io_end) {
+ mpd->do_map = 0;
+ mpd->scanned_until_end = 0;
+ mpd->io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
+ if (!mpd->io_submit.io_end) {
ret = -ENOMEM;
goto unplug;
}
- ret = mpage_prepare_extent_to_map(&mpd);
+ ret = mpage_prepare_extent_to_map(mpd);
/* Unlock pages we didn't use */
- mpage_release_unused_pages(&mpd, false);
+ mpage_release_unused_pages(mpd, false);
/* Submit prepared bio */
- ext4_io_submit(&mpd.io_submit);
- ext4_put_io_end_defer(mpd.io_submit.io_end);
- mpd.io_submit.io_end = NULL;
+ ext4_io_submit(&mpd->io_submit);
+ ext4_put_io_end_defer(mpd->io_submit.io_end);
+ mpd->io_submit.io_end = NULL;
if (ret < 0)
goto unplug;

- while (!mpd.scanned_until_end && wbc->nr_to_write > 0) {
+ while (!mpd->scanned_until_end && wbc->nr_to_write > 0) {
/* For each extent of pages we use new io_end */
- mpd.io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
- if (!mpd.io_submit.io_end) {
+ mpd->io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
+ if (!mpd->io_submit.io_end) {
ret = -ENOMEM;
break;
}
@@ -2854,16 +2854,16 @@ static int ext4_writepages(struct address_space *mapping,
"%ld pages, ino %lu; err %d", __func__,
wbc->nr_to_write, inode->i_ino, ret);
/* Release allocated io_end */
- ext4_put_io_end(mpd.io_submit.io_end);
- mpd.io_submit.io_end = NULL;
+ ext4_put_io_end(mpd->io_submit.io_end);
+ mpd->io_submit.io_end = NULL;
break;
}
- mpd.do_map = 1;
+ mpd->do_map = 1;

- trace_ext4_da_write_pages(inode, mpd.first_page, mpd.wbc);
- ret = mpage_prepare_extent_to_map(&mpd);
- if (!ret && mpd.map.m_len)
- ret = mpage_map_and_submit_extent(handle, &mpd,
+ trace_ext4_da_write_pages(inode, mpd->first_page, wbc);
+ ret = mpage_prepare_extent_to_map(mpd);
+ if (!ret && mpd->map.m_len)
+ ret = mpage_map_and_submit_extent(handle, mpd,
&give_up_on_write);
/*
* Caution: If the handle is synchronous,
@@ -2878,12 +2878,12 @@ static int ext4_writepages(struct address_space *mapping,
if (!ext4_handle_valid(handle) || handle->h_sync == 0) {
ext4_journal_stop(handle);
handle = NULL;
- mpd.do_map = 0;
+ mpd->do_map = 0;
}
/* Unlock pages we didn't use */
- mpage_release_unused_pages(&mpd, give_up_on_write);
+ mpage_release_unused_pages(mpd, give_up_on_write);
/* Submit prepared bio */
- ext4_io_submit(&mpd.io_submit);
+ ext4_io_submit(&mpd->io_submit);

/*
* Drop our io_end reference we got from init. We have
@@ -2893,11 +2893,11 @@ static int ext4_writepages(struct address_space *mapping,
* up doing unwritten extent conversion.
*/
if (handle) {
- ext4_put_io_end_defer(mpd.io_submit.io_end);
+ ext4_put_io_end_defer(mpd->io_submit.io_end);
ext4_journal_stop(handle);
} else
- ext4_put_io_end(mpd.io_submit.io_end);
- mpd.io_submit.io_end = NULL;
+ ext4_put_io_end(mpd->io_submit.io_end);
+ mpd->io_submit.io_end = NULL;

if (ret == -ENOSPC && sbi->s_journal) {
/*
@@ -2917,8 +2917,8 @@ static int ext4_writepages(struct address_space *mapping,
blk_finish_plug(&plug);
if (!ret && !cycled && wbc->nr_to_write > 0) {
cycled = 1;
- mpd.last_page = writeback_index - 1;
- mpd.first_page = 0;
+ mpd->last_page = writeback_index - 1;
+ mpd->first_page = 0;
goto retry;
}

@@ -2928,7 +2928,7 @@ static int ext4_writepages(struct address_space *mapping,
* Set the writeback_index so that range_cyclic
* mode will write it back later
*/
- mapping->writeback_index = mpd.first_page;
+ mapping->writeback_index = mpd->first_page;

out_writepages:
trace_ext4_writepages_result(inode, wbc, ret,
@@ -2937,6 +2937,18 @@ static int ext4_writepages(struct address_space *mapping,
return ret;
}

+static int ext4_writepages(struct address_space *mapping,
+ struct writeback_control *wbc)
+{
+ struct mpage_da_data mpd = {
+ .inode = mapping->host,
+ .wbc = wbc,
+ .can_map = 1,
+ };
+
+ return ext4_do_writepages(&mpd);
+}
+
static int ext4_dax_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
--
2.35.3

2022-12-07 11:39:28

by Jan Kara

[permalink] [raw]
Subject: [PATCH v4 12/13] ext4: Stop providing .writepage hook

Now we don't need .writepage hook for anything anymore. Reclaim is fine
with relying on .writepages to clean pages and we often couldn't do much
from the .writepage callback anyway. We only need to provide
.migrate_folio callback for the ext4_journalled_aops - let's use
buffer_migrate_page_norefs() there so that buffers cannot be modified
under jdb2's hands as that can cause data corruption. For example when
commit code does writeout of transaction buffers in
jbd2_journal_write_metadata_buffer(), we don't hold page lock or have
page writeback bit set or have the buffer locked. So page migration code
would go and happily migrate the page elsewhere while the copy is
running thus corrupting data.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
---
fs/ext4/inode.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f3b3792c1c96..acf9d23c1cfb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3718,7 +3718,6 @@ static int ext4_iomap_swap_activate(struct swap_info_struct *sis,
static const struct address_space_operations ext4_aops = {
.read_folio = ext4_read_folio,
.readahead = ext4_readahead,
- .writepage = ext4_writepage,
.writepages = ext4_writepages,
.write_begin = ext4_write_begin,
.write_end = ext4_write_end,
@@ -3736,7 +3735,6 @@ static const struct address_space_operations ext4_aops = {
static const struct address_space_operations ext4_journalled_aops = {
.read_folio = ext4_read_folio,
.readahead = ext4_readahead,
- .writepage = ext4_writepage,
.writepages = ext4_writepages,
.write_begin = ext4_write_begin,
.write_end = ext4_journalled_write_end,
@@ -3745,6 +3743,7 @@ static const struct address_space_operations ext4_journalled_aops = {
.invalidate_folio = ext4_journalled_invalidate_folio,
.release_folio = ext4_release_folio,
.direct_IO = noop_direct_IO,
+ .migrate_folio = buffer_migrate_folio_norefs,
.is_partially_uptodate = block_is_partially_uptodate,
.error_remove_page = generic_error_remove_page,
.swap_activate = ext4_iomap_swap_activate,
@@ -3753,7 +3752,6 @@ static const struct address_space_operations ext4_journalled_aops = {
static const struct address_space_operations ext4_da_aops = {
.read_folio = ext4_read_folio,
.readahead = ext4_readahead,
- .writepage = ext4_writepage,
.writepages = ext4_writepages,
.write_begin = ext4_da_write_begin,
.write_end = ext4_da_write_end,
--
2.35.3

2022-12-07 11:41:37

by Jan Kara

[permalink] [raw]
Subject: [PATCH v4 05/13] ext4: Add support for writepages calls that cannot map blocks

Add support for calls to ext4_writepages() than cannot map blocks. These
will be issued from jbd2 transaction commit code.

Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
---
fs/ext4/inode.c | 62 ++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 48 insertions(+), 14 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 43eb175d0c1c..9ba843b7a92c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1557,6 +1557,7 @@ struct mpage_da_data {
struct ext4_map_blocks map;
struct ext4_io_submit io_submit; /* IO submission data */
unsigned int do_map:1;
+ unsigned int can_map:1; /* Can writepages call map blocks? */
unsigned int scanned_until_end:1;
};

@@ -2549,18 +2550,33 @@ static int ext4_da_writepages_trans_blocks(struct inode *inode)
MAX_WRITEPAGES_EXTENT_LEN + bpp - 1, bpp);
}

+/* Return true if the page needs to be written as part of transaction commit */
+static bool ext4_page_nomap_can_writeout(struct page *page)
+{
+ struct buffer_head *bh, *head;
+
+ bh = head = page_buffers(page);
+ do {
+ if (buffer_dirty(bh) && buffer_mapped(bh) && !buffer_delay(bh))
+ return true;
+ } while ((bh = bh->b_this_page) != head);
+ return false;
+}
+
/*
* mpage_prepare_extent_to_map - find & lock contiguous range of dirty pages
- * and underlying extent to map
+ * needing mapping, submit mapped pages
*
* @mpd - where to look for pages
*
* Walk dirty pages in the mapping. If they are fully mapped, submit them for
- * IO immediately. When we find a page which isn't mapped we start accumulating
- * extent of buffers underlying these pages that needs mapping (formed by
- * either delayed or unwritten buffers). We also lock the pages containing
- * these buffers. The extent found is returned in @mpd structure (starting at
- * mpd->lblk with length mpd->len blocks).
+ * IO immediately. If we cannot map blocks, we submit just already mapped
+ * buffers in the page for IO and keep page dirty. When we can map blocks and
+ * we find a page which isn't mapped we start accumulating extent of buffers
+ * underlying these pages that needs mapping (formed by either delayed or
+ * unwritten buffers). We also lock the pages containing these buffers. The
+ * extent found is returned in @mpd structure (starting at mpd->lblk with
+ * length mpd->len blocks).
*
* Note that this function can attach bios to one io_end structure which are
* neither logically nor physically contiguous. Although it may seem as an
@@ -2651,14 +2667,30 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
if (mpd->map.m_len == 0)
mpd->first_page = page->index;
mpd->next_page = page->index + 1;
- /* Add all dirty buffers to mpd */
- lblk = ((ext4_lblk_t)page->index) <<
- (PAGE_SHIFT - blkbits);
- head = page_buffers(page);
- err = mpage_process_page_bufs(mpd, head, head, lblk);
- if (err <= 0)
- goto out;
- err = 0;
+ /*
+ * Writeout for transaction commit where we cannot
+ * modify metadata is simple. Just submit the page.
+ */
+ if (!mpd->can_map) {
+ if (ext4_page_nomap_can_writeout(page)) {
+ err = mpage_submit_page(mpd, page);
+ if (err < 0)
+ goto out;
+ } else {
+ unlock_page(page);
+ mpd->first_page++;
+ }
+ } else {
+ /* Add all dirty buffers to mpd */
+ lblk = ((ext4_lblk_t)page->index) <<
+ (PAGE_SHIFT - blkbits);
+ head = page_buffers(page);
+ err = mpage_process_page_bufs(mpd, head, head,
+ lblk);
+ if (err <= 0)
+ goto out;
+ err = 0;
+ }
left--;
}
pagevec_release(&pvec);
@@ -2778,6 +2810,7 @@ static int ext4_writepages(struct address_space *mapping,
*/
mpd.do_map = 0;
mpd.scanned_until_end = 0;
+ mpd.can_map = 1;
mpd.io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
if (!mpd.io_submit.io_end) {
ret = -ENOMEM;
@@ -2801,6 +2834,7 @@ static int ext4_writepages(struct address_space *mapping,
break;
}

+ WARN_ON_ONCE(!mpd->can_map);
/*
* We have two constraints: We find one extent to map and we
* must always write out whole page (makes a difference when
--
2.35.3

2022-12-07 11:41:46

by Jan Kara

[permalink] [raw]
Subject: [PATCH v4 07/13] ext4: Move percpu_rwsem protection into ext4_writepages()

Move protection by percpu_rwsem from ext4_do_writepages() to
ext4_writepages(). We will not want to grab this protection during
transaction commits as that would be prone to deadlocks and the
protection is not needed. Move the shutdown state checking as well since
we want to be able to complete commit while the shutdown is in progress.

Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
---
fs/ext4/inode.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 99c66c768cd0..4f8f4959524c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2720,10 +2720,6 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
struct blk_plug plug;
bool give_up_on_write = false;

- if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
- return -EIO;
-
- percpu_down_read(&sbi->s_writepages_rwsem);
trace_ext4_writepages(inode, wbc);

/*
@@ -2933,20 +2929,28 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
out_writepages:
trace_ext4_writepages_result(inode, wbc, ret,
nr_to_write - wbc->nr_to_write);
- percpu_up_read(&sbi->s_writepages_rwsem);
return ret;
}

static int ext4_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
+ struct super_block *sb = mapping->host->i_sb;
struct mpage_da_data mpd = {
.inode = mapping->host,
.wbc = wbc,
.can_map = 1,
};
+ int ret;
+
+ if (unlikely(ext4_forced_shutdown(EXT4_SB(sb))))
+ return -EIO;

- return ext4_do_writepages(&mpd);
+ percpu_down_read(&EXT4_SB(sb)->s_writepages_rwsem);
+ ret = ext4_do_writepages(&mpd);
+ percpu_up_read(&EXT4_SB(sb)->s_writepages_rwsem);
+
+ return ret;
}

static int ext4_dax_writepages(struct address_space *mapping,
--
2.35.3

2022-12-07 11:43:51

by Jan Kara

[permalink] [raw]
Subject: [PATCH v4 04/13] ext4: Drop pointless IO submission from ext4_bio_write_page()

We submit outstanding IO in ext4_bio_write_page() if we find a buffer we
are not going to write. This is however pointless because we already
handle submission of previous IO in case we detect newly added buffer
head is discontiguous. So just delete the pointless IO submission call.

Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
---
fs/ext4/page-io.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 2bdfb8a046d9..beaec6d81074 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -489,8 +489,6 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
redirty_page_for_writepage(wbc, page);
keep_towrite = true;
}
- if (io->io_bio)
- ext4_io_submit(io);
continue;
}
if (buffer_new(bh))
--
2.35.3

2022-12-07 11:55:14

by Jan Kara

[permalink] [raw]
Subject: [PATCH v4 02/13] ext4: Move keep_towrite handling to ext4_bio_write_page()

When we are writing back page but we cannot for some reason write all
its buffers (e.g. because we cannot allocate blocks in current context) we
have to keep TOWRITE tag set in the mapping as otherwise racing
WB_SYNC_ALL writeback that could write these buffers can skip the page
and result in data loss. We will need this logic for writeback during
transaction commit so move the logic from ext4_writepage() to
ext4_bio_write_page().

Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
---
fs/ext4/ext4.h | 3 +--
fs/ext4/inode.c | 6 ++----
fs/ext4/page-io.c | 36 +++++++++++++++++++++---------------
3 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 8d5453852f98..1b3bffc04fd0 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3756,8 +3756,7 @@ extern void ext4_end_io_rsv_work(struct work_struct *work);
extern void ext4_io_submit(struct ext4_io_submit *io);
extern int ext4_bio_write_page(struct ext4_io_submit *io,
struct page *page,
- int len,
- bool keep_towrite);
+ int len);
extern struct ext4_io_end_vec *ext4_alloc_io_end_vec(ext4_io_end_t *io_end);
extern struct ext4_io_end_vec *ext4_last_io_end_vec(ext4_io_end_t *io_end);

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2b5ef1b64249..43eb175d0c1c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2009,7 +2009,6 @@ static int ext4_writepage(struct page *page,
struct buffer_head *page_bufs = NULL;
struct inode *inode = page->mapping->host;
struct ext4_io_submit io_submit;
- bool keep_towrite = false;

if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) {
folio_invalidate(folio, 0, folio_size(folio));
@@ -2067,7 +2066,6 @@ static int ext4_writepage(struct page *page,
unlock_page(page);
return 0;
}
- keep_towrite = true;
}

if (PageChecked(page) && ext4_should_journal_data(inode))
@@ -2084,7 +2082,7 @@ static int ext4_writepage(struct page *page,
unlock_page(page);
return -ENOMEM;
}
- ret = ext4_bio_write_page(&io_submit, page, len, keep_towrite);
+ ret = ext4_bio_write_page(&io_submit, page, len);
ext4_io_submit(&io_submit);
/* Drop io_end reference we got from init */
ext4_put_io_end_defer(io_submit.io_end);
@@ -2118,7 +2116,7 @@ static int mpage_submit_page(struct mpage_da_data *mpd, struct page *page)
len = size & ~PAGE_MASK;
else
len = PAGE_SIZE;
- err = ext4_bio_write_page(&mpd->io_submit, page, len, false);
+ err = ext4_bio_write_page(&mpd->io_submit, page, len);
if (!err)
mpd->wbc->nr_to_write--;
mpd->first_page++;
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 4e68ace86f11..4f9ecacd10aa 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -430,8 +430,7 @@ static void io_submit_add_bh(struct ext4_io_submit *io,

int ext4_bio_write_page(struct ext4_io_submit *io,
struct page *page,
- int len,
- bool keep_towrite)
+ int len)
{
struct page *bounce_page = NULL;
struct inode *inode = page->mapping->host;
@@ -441,14 +440,11 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
int nr_submitted = 0;
int nr_to_submit = 0;
struct writeback_control *wbc = io->io_wbc;
+ bool keep_towrite = false;

BUG_ON(!PageLocked(page));
BUG_ON(PageWriteback(page));

- if (keep_towrite)
- set_page_writeback_keepwrite(page);
- else
- set_page_writeback(page);
ClearPageError(page);

/*
@@ -483,12 +479,17 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
if (!buffer_mapped(bh))
clear_buffer_dirty(bh);
/*
- * Keeping dirty some buffer we cannot write? Make
- * sure to redirty the page. This happens e.g. when
- * doing writeout for transaction commit.
+ * Keeping dirty some buffer we cannot write? Make sure
+ * to redirty the page and keep TOWRITE tag so that
+ * racing WB_SYNC_ALL writeback does not skip the page.
+ * This happens e.g. when doing writeout for
+ * transaction commit.
*/
- if (buffer_dirty(bh) && !PageDirty(page))
- redirty_page_for_writepage(wbc, page);
+ if (buffer_dirty(bh)) {
+ if (!PageDirty(page))
+ redirty_page_for_writepage(wbc, page);
+ keep_towrite = true;
+ }
if (io->io_bio)
ext4_io_submit(io);
continue;
@@ -500,6 +501,10 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
nr_to_submit++;
} while ((bh = bh->b_this_page) != head);

+ /* Nothing to submit? Just unlock the page... */
+ if (!nr_to_submit)
+ goto unlock;
+
bh = head = page_buffers(page);

/*
@@ -550,6 +555,11 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
}
}

+ if (keep_towrite)
+ set_page_writeback_keepwrite(page);
+ else
+ set_page_writeback(page);
+
/* Now submit buffers to write */
do {
if (!buffer_async_write(bh))
@@ -558,11 +568,7 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
bounce_page ? bounce_page : page, bh);
nr_submitted++;
} while ((bh = bh->b_this_page) != head);
-
unlock:
unlock_page(page);
- /* Nothing submitted - we have to end page writeback */
- if (!nr_submitted)
- end_page_writeback(page);
return ret;
}
--
2.35.3

2022-12-07 11:55:20

by Jan Kara

[permalink] [raw]
Subject: [PATCH v4 03/13] ext4: Remove nr_submitted from ext4_bio_write_page()

nr_submitted is the same as nr_to_submit. Drop one of them.

Reviewed-by: Ritesh Harjani (IBM) <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
---
fs/ext4/page-io.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 4f9ecacd10aa..2bdfb8a046d9 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -437,7 +437,6 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
unsigned block_start;
struct buffer_head *bh, *head;
int ret = 0;
- int nr_submitted = 0;
int nr_to_submit = 0;
struct writeback_control *wbc = io->io_wbc;
bool keep_towrite = false;
@@ -566,7 +565,6 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
continue;
io_submit_add_bh(io, inode,
bounce_page ? bounce_page : page, bh);
- nr_submitted++;
} while ((bh = bh->b_this_page) != head);
unlock:
unlock_page(page);
--
2.35.3

2022-12-07 11:55:58

by Jan Kara

[permalink] [raw]
Subject: [PATCH v4 11/13] mm: Export buffer_migrate_folio_norefs()

Ext4 needs this function to allow safe migration for journalled data
pages.

Signed-off-by: Jan Kara <[email protected]>
---
mm/migrate.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index dff333593a8a..5e4ca21da712 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -820,6 +820,7 @@ int buffer_migrate_folio_norefs(struct address_space *mapping,
{
return __buffer_migrate_folio(mapping, dst, src, mode, true);
}
+EXPORT_SYMBOL_GPL(buffer_migrate_folio_norefs);
#endif

int filemap_migrate_folio(struct address_space *mapping,
--
2.35.3

2022-12-07 11:56:43

by Jan Kara

[permalink] [raw]
Subject: [PATCH v4 09/13] jbd2: Switch jbd2_submit_inode_data() to use fs-provided hook for data writeout

jbd2_submit_inode_data() hardcoded use of
jbd2_journal_submit_inode_data_buffers() for submission of data pages.
Make it use j_submit_inode_data_buffers hook instead. This effectively
switches ext4 fastcommits to use ext4_writepages() for data writeout
instead of generic_writepages().

Signed-off-by: Jan Kara <[email protected]>
---
fs/ext4/fast_commit.c | 2 +-
fs/jbd2/commit.c | 5 ++---
include/linux/jbd2.h | 2 +-
3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 0f6d0a80467d..7c6694593497 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -986,7 +986,7 @@ static int ext4_fc_submit_inode_data_all(journal_t *journal)
finish_wait(&ei->i_fc_wait, &wait);
}
spin_unlock(&sbi->s_fc_lock);
- ret = jbd2_submit_inode_data(ei->jinode);
+ ret = jbd2_submit_inode_data(journal, ei->jinode);
if (ret)
return ret;
spin_lock(&sbi->s_fc_lock);
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 885a7a6cc53e..4810438b7856 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -207,14 +207,13 @@ int jbd2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode)
}

/* Send all the data buffers related to an inode */
-int jbd2_submit_inode_data(struct jbd2_inode *jinode)
+int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode)
{
-
if (!jinode || !(jinode->i_flags & JI_WRITE_DATA))
return 0;

trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
- return jbd2_journal_submit_inode_data_buffers(jinode);
+ return journal->j_submit_inode_data_buffers(jinode);

}
EXPORT_SYMBOL(jbd2_submit_inode_data);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 0b7242370b56..2170e0cc279d 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1662,7 +1662,7 @@ int jbd2_fc_begin_commit(journal_t *journal, tid_t tid);
int jbd2_fc_end_commit(journal_t *journal);
int jbd2_fc_end_commit_fallback(journal_t *journal);
int jbd2_fc_get_buf(journal_t *journal, struct buffer_head **bh_out);
-int jbd2_submit_inode_data(struct jbd2_inode *jinode);
+int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode);
int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode);
int jbd2_fc_wait_bufs(journal_t *journal, int num_blks);
int jbd2_fc_release_bufs(journal_t *journal);
--
2.35.3

2022-12-07 12:05:47

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v4 11/13] mm: Export buffer_migrate_folio_norefs()

Looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2022-12-08 15:33:00

by Ritesh Harjani

[permalink] [raw]
Subject: Re: [PATCH v4 09/13] jbd2: Switch jbd2_submit_inode_data() to use fs-provided hook for data writeout

On 22/12/07 12:27PM, Jan Kara wrote:
> jbd2_submit_inode_data() hardcoded use of
> jbd2_journal_submit_inode_data_buffers() for submission of data pages.
> Make it use j_submit_inode_data_buffers hook instead. This effectively
> switches ext4 fastcommits to use ext4_writepages() for data writeout
> instead of generic_writepages().

Very neat!! I agree, that jbd2_submit_inode_data() should have always used
journal->j_submit_inode_data_buffers().

Looks good to me. Please feel free to add:
Reviewed-by: Ritesh Harjani (IBM) <[email protected]>


>
> Signed-off-by: Jan Kara <[email protected]>
> ---
> fs/ext4/fast_commit.c | 2 +-
> fs/jbd2/commit.c | 5 ++---
> include/linux/jbd2.h | 2 +-
> 3 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index 0f6d0a80467d..7c6694593497 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -986,7 +986,7 @@ static int ext4_fc_submit_inode_data_all(journal_t *journal)
> finish_wait(&ei->i_fc_wait, &wait);
> }
> spin_unlock(&sbi->s_fc_lock);
> - ret = jbd2_submit_inode_data(ei->jinode);
> + ret = jbd2_submit_inode_data(journal, ei->jinode);
> if (ret)
> return ret;
> spin_lock(&sbi->s_fc_lock);
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 885a7a6cc53e..4810438b7856 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -207,14 +207,13 @@ int jbd2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode)
> }
>
> /* Send all the data buffers related to an inode */
> -int jbd2_submit_inode_data(struct jbd2_inode *jinode)
> +int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode)
> {
> -
> if (!jinode || !(jinode->i_flags & JI_WRITE_DATA))
> return 0;
>
> trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
> - return jbd2_journal_submit_inode_data_buffers(jinode);
> + return journal->j_submit_inode_data_buffers(jinode);
>
> }
> EXPORT_SYMBOL(jbd2_submit_inode_data);
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 0b7242370b56..2170e0cc279d 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -1662,7 +1662,7 @@ int jbd2_fc_begin_commit(journal_t *journal, tid_t tid);
> int jbd2_fc_end_commit(journal_t *journal);
> int jbd2_fc_end_commit_fallback(journal_t *journal);
> int jbd2_fc_get_buf(journal_t *journal, struct buffer_head **bh_out);
> -int jbd2_submit_inode_data(struct jbd2_inode *jinode);
> +int jbd2_submit_inode_data(journal_t *journal, struct jbd2_inode *jinode);
> int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode);
> int jbd2_fc_wait_bufs(journal_t *journal, int num_blks);
> int jbd2_fc_release_bufs(journal_t *journal);
> --
> 2.35.3
>

2023-05-08 18:35:21

by youling 257

[permalink] [raw]
Subject: [PATCH v4 12/13] ext4: Stop providing .writepage hook

I using linux mainline kernel on android. https://github.com/youling257/android-mainline/commits/6.4 https://github.com/youling257/android-mainline/commits/6.3
"ext4: Stop providing .writepage hook" cause some android app unable to read storage/emulated/0 files, i need to say android esdfs file system storage/emulated is ext4 data/media bind mount.
I want to ask, why android storage/emulated need .writepage hook?

2023-05-09 00:31:54

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v4 12/13] ext4: Stop providing .writepage hook

Hello,

On Tue 09-05-23 01:51:08, youling257 wrote:
> I using linux mainline kernel on android.
> https://github.com/youling257/android-mainline/commits/6.4
> https://github.com/youling257/android-mainline/commits/6.3 "ext4: Stop
> providing .writepage hook" cause some android app unable to read
> storage/emulated/0 files, i need to say android esdfs file system
> storage/emulated is ext4 data/media bind mount. I want to ask, why
> android storage/emulated need .writepage hook?

Honestly, I don't know. I guess you need to look into implementation of
esdfs in the Android kernel and why it needs filesystem's .writepage
hook...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2023-05-09 05:16:17

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v4 12/13] ext4: Stop providing .writepage hook

On Tue, May 09, 2023 at 01:51:08AM +0800, youling257 wrote:
> I using linux mainline kernel on android. https://github.com/youling257/android-mainline/commits/6.4 https://github.com/youling257/android-mainline/commits/6.3
> "ext4: Stop providing .writepage hook" cause some android app unable to read storage/emulated/0 files, i need to say android esdfs file system storage/emulated is ext4 data/media bind mount.
> I want to ask, why android storage/emulated need .writepage hook?

"esdfs" doesn't exist upstream, so linux-ext4 can't provide support for it.

Also, it doesn't exist in the Android Common Kernels either, so the Android team
cannot help you either.

- Eric

2023-05-09 19:00:28

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v4 12/13] ext4: Stop providing .writepage hook

On Mon, May 08, 2023 at 10:02:27PM -0700, Eric Biggers wrote:
> On Tue, May 09, 2023 at 01:51:08AM +0800, youling257 wrote:
> > I using linux mainline kernel on android. https://github.com/youling257/android-mainline/commits/6.4 https://github.com/youling257/android-mainline/commits/6.3
> > "ext4: Stop providing .writepage hook" cause some android app unable to read storage/emulated/0 files, i need to say android esdfs file system storage/emulated is ext4 data/media bind mount.
> > I want to ask, why android storage/emulated need .writepage hook?
>
> "esdfs" doesn't exist upstream, so linux-ext4 can't provide support for it.
>
> Also, it doesn't exist in the Android Common Kernels either, so the Android team
> cannot help you either.

The problem with esdfs is that it's based on the old stackable file
system paradigm which is filled with races and is inherently
unreliable (just for fun, try running fsstress on the upper and lower
file systems of a stackable file system simultaneously, and watch the
kernel crash and burn). For that reason, some number of us have been
working for a while to eliminate the need for stacking file systems,
such as sdcardfs. esdfs, etc. from the Android kernel.

The other thing I would add is that upstream has been working[1] on
getting rid of writepage function. So out-of-tree file systems are
going to need to adapt --- or die.

[1] https://lore.kernel.org/all/[email protected]/

It looks like esdfs is coming from the Chromium kernel? The latest
Chromium kernel I can find is 5.15 based, and it has esdfs in it. I'm
sad to see that esdfs hasn't been removed from the Chromium kernel
yet, and replaced with something more stable and reliable, but maybe
we can find someone who is more familiar with the Chromium kernel to
comment.

Cheers,

- Ted

2023-05-10 05:22:07

by youling 257

[permalink] [raw]
Subject: Re: [PATCH v4 12/13] ext4: Stop providing .writepage hook

It isn't android storage emulated esdfs or sdcardfs problem, i test
mount bind /data/media on /storage/emulated, can reproduce my problem.
Let me say clear my problem, "ext4: Stop providing .writepage hook"
cause android gallery app unable read pictures thumbnails.

I test linux kernel 6.3 Revert "ext4: Stop providing .writepage hook",
android gallery can read pictures thumbnails,

this is mount bind /data/media on /storage/emulated
/dev/nvme0n1p1 on /storage/emulated type ext4 (rw,seclabel,noatime)
/dev/nvme0n1p1 on /mnt/runtime/default/emulated type ext4 (rw,seclabel,noatime)
/dev/nvme0n1p1 on /mnt/runtime/read/emulated type ext4 (rw,seclabel,noatime)
/dev/nvme0n1p1 on /mnt/runtime/write/emulated type ext4 (rw,seclabel,noatime)
/dev/nvme0n1p1 on /mnt/runtime/full/emulated type ext4 (rw,seclabel,noatime)

this is esdfs,
/data/media on /mnt/runtime/default/emulated type esdfs
(rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:1015:771:771,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
/data/media on /storage/emulated type esdfs
(rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:1015:771:771,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
/data/media on /mnt/runtime/read/emulated type esdfs
(rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:9997:750:750,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
/data/media on /mnt/runtime/write/emulated type esdfs
(rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:9997:770:770,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
/data/media on /mnt/runtime/full/emulated type esdfs
(rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:9997:770:770,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)

if i do mount bind data/media on storage/emulated, take a screenshot,
will create /data/media/0/Pictures/Screenshots/Screenshot_20230510-130020.png
file,
chmod -R 0777 /data/media/0/Pictures/Screenshots,
on linux 6.3 Revert "ext4: Stop providing .writepage hook", android
gallery app can read
/storage/emulated/0/Pictures/Screenshots/Screenshot_20230510-130020.png
thumbnail,
linux 6.4 removed .writepage hook, android gallery unable read thumbnail.


2023-05-10 2:36 GMT+08:00, Theodore Ts'o <[email protected]>:
> On Mon, May 08, 2023 at 10:02:27PM -0700, Eric Biggers wrote:
>> On Tue, May 09, 2023 at 01:51:08AM +0800, youling257 wrote:
>> > I using linux mainline kernel on android.
>> > https://github.com/youling257/android-mainline/commits/6.4
>> > https://github.com/youling257/android-mainline/commits/6.3
>> > "ext4: Stop providing .writepage hook" cause some android app unable to
>> > read storage/emulated/0 files, i need to say android esdfs file system
>> > storage/emulated is ext4 data/media bind mount.
>> > I want to ask, why android storage/emulated need .writepage hook?
>>
>> "esdfs" doesn't exist upstream, so linux-ext4 can't provide support for
>> it.
>>
>> Also, it doesn't exist in the Android Common Kernels either, so the
>> Android team
>> cannot help you either.
>
> The problem with esdfs is that it's based on the old stackable file
> system paradigm which is filled with races and is inherently
> unreliable (just for fun, try running fsstress on the upper and lower
> file systems of a stackable file system simultaneously, and watch the
> kernel crash and burn). For that reason, some number of us have been
> working for a while to eliminate the need for stacking file systems,
> such as sdcardfs. esdfs, etc. from the Android kernel.
>
> The other thing I would add is that upstream has been working[1] on
> getting rid of writepage function. So out-of-tree file systems are
> going to need to adapt --- or die.
>
> [1] https://lore.kernel.org/all/[email protected]/
>
> It looks like esdfs is coming from the Chromium kernel? The latest
> Chromium kernel I can find is 5.15 based, and it has esdfs in it. I'm
> sad to see that esdfs hasn't been removed from the Chromium kernel
> yet, and replaced with something more stable and reliable, but maybe
> we can find someone who is more familiar with the Chromium kernel to
> comment.
>
> Cheers,
>
> - Ted
>

2023-05-10 06:08:00

by youling 257

[permalink] [raw]
Subject: Re: [PATCH v4 12/13] ext4: Stop providing .writepage hook

I do more test, it is android esdfs or sdcardfs
/storage/emulated/0/Android/data problem,
"ext4: Stop providing .writepage hook" cause
/storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.0
/storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.1
/storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.idx
unable read,

on linux 6.4, i use mount bind data/media on storage/emulated, chmod
-R 0777 /data/media/0, rm
/storage/emulated/0/Android/data/com.android.gallery3d/cache/*, open
gallery app can read pictures thumbnail,
/storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.idx
/storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.0
/storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.1
available read.


2023-05-10 13:17 GMT+08:00, youling 257 <[email protected]>:
> It isn't android storage emulated esdfs or sdcardfs problem, i test
> mount bind /data/media on /storage/emulated, can reproduce my problem.
> Let me say clear my problem, "ext4: Stop providing .writepage hook"
> cause android gallery app unable read pictures thumbnails.
>
> I test linux kernel 6.3 Revert "ext4: Stop providing .writepage hook",
> android gallery can read pictures thumbnails,
>
> this is mount bind /data/media on /storage/emulated
> /dev/nvme0n1p1 on /storage/emulated type ext4 (rw,seclabel,noatime)
> /dev/nvme0n1p1 on /mnt/runtime/default/emulated type ext4
> (rw,seclabel,noatime)
> /dev/nvme0n1p1 on /mnt/runtime/read/emulated type ext4
> (rw,seclabel,noatime)
> /dev/nvme0n1p1 on /mnt/runtime/write/emulated type ext4
> (rw,seclabel,noatime)
> /dev/nvme0n1p1 on /mnt/runtime/full/emulated type ext4
> (rw,seclabel,noatime)
>
> this is esdfs,
> /data/media on /mnt/runtime/default/emulated type esdfs
> (rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:1015:771:771,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
> /data/media on /storage/emulated type esdfs
> (rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:1015:771:771,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
> /data/media on /mnt/runtime/read/emulated type esdfs
> (rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:9997:750:750,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
> /data/media on /mnt/runtime/write/emulated type esdfs
> (rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:9997:770:770,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
> /data/media on /mnt/runtime/full/emulated type esdfs
> (rw,nosuid,nodev,noexec,noatime,lower=1023:1023:664:775,upper=0:9997:770:770,derive=multi,noconfine,derive_gid,default_normal,unshared_obb)
>
> if i do mount bind data/media on storage/emulated, take a screenshot,
> will create
> /data/media/0/Pictures/Screenshots/Screenshot_20230510-130020.png
> file,
> chmod -R 0777 /data/media/0/Pictures/Screenshots,
> on linux 6.3 Revert "ext4: Stop providing .writepage hook", android
> gallery app can read
> /storage/emulated/0/Pictures/Screenshots/Screenshot_20230510-130020.png
> thumbnail,
> linux 6.4 removed .writepage hook, android gallery unable read thumbnail.
>
>
> 2023-05-10 2:36 GMT+08:00, Theodore Ts'o <[email protected]>:
>> On Mon, May 08, 2023 at 10:02:27PM -0700, Eric Biggers wrote:
>>> On Tue, May 09, 2023 at 01:51:08AM +0800, youling257 wrote:
>>> > I using linux mainline kernel on android.
>>> > https://github.com/youling257/android-mainline/commits/6.4
>>> > https://github.com/youling257/android-mainline/commits/6.3
>>> > "ext4: Stop providing .writepage hook" cause some android app unable
>>> > to
>>> > read storage/emulated/0 files, i need to say android esdfs file system
>>> > storage/emulated is ext4 data/media bind mount.
>>> > I want to ask, why android storage/emulated need .writepage hook?
>>>
>>> "esdfs" doesn't exist upstream, so linux-ext4 can't provide support for
>>> it.
>>>
>>> Also, it doesn't exist in the Android Common Kernels either, so the
>>> Android team
>>> cannot help you either.
>>
>> The problem with esdfs is that it's based on the old stackable file
>> system paradigm which is filled with races and is inherently
>> unreliable (just for fun, try running fsstress on the upper and lower
>> file systems of a stackable file system simultaneously, and watch the
>> kernel crash and burn). For that reason, some number of us have been
>> working for a while to eliminate the need for stacking file systems,
>> such as sdcardfs. esdfs, etc. from the Android kernel.
>>
>> The other thing I would add is that upstream has been working[1] on
>> getting rid of writepage function. So out-of-tree file systems are
>> going to need to adapt --- or die.
>>
>> [1] https://lore.kernel.org/all/[email protected]/
>>
>> It looks like esdfs is coming from the Chromium kernel? The latest
>> Chromium kernel I can find is 5.15 based, and it has esdfs in it. I'm
>> sad to see that esdfs hasn't been removed from the Chromium kernel
>> yet, and replaced with something more stable and reliable, but maybe
>> we can find someone who is more familiar with the Chromium kernel to
>> comment.
>>
>> Cheers,
>>
>> - Ted
>>
>

2023-05-10 06:55:09

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH v4 12/13] ext4: Stop providing .writepage hook

On Wed, May 10, 2023 at 01:47:58PM +0800, youling 257 wrote:
> I do more test, it is android esdfs or sdcardfs
> /storage/emulated/0/Android/data problem,
> "ext4: Stop providing .writepage hook" cause
> /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.0
> /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.1
> /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.idx
> unable read,
>
> on linux 6.4, i use mount bind data/media on storage/emulated, chmod
> -R 0777 /data/media/0, rm
> /storage/emulated/0/Android/data/com.android.gallery3d/cache/*, open
> gallery app can read pictures thumbnail,
> /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.idx
> /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.0
> /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.1
> available read.

Maybe try reverting your commit that added esdfs to your kernel? It should not
be needed at all.

- Eric

2023-05-10 22:08:56

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v4 12/13] ext4: Stop providing .writepage hook

On Tue, May 09, 2023 at 11:50:36PM -0700, Eric Biggers wrote:
> On Wed, May 10, 2023 at 01:47:58PM +0800, youling 257 wrote:
> > I do more test, it is android esdfs or sdcardfs
> > /storage/emulated/0/Android/data problem,
> > "ext4: Stop providing .writepage hook" cause
> > /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.0
> > /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.1
> > /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.idx
> > unable read,
> >
> > on linux 6.4, i use mount bind data/media on storage/emulated, chmod
> > -R 0777 /data/media/0, rm
> > /storage/emulated/0/Android/data/com.android.gallery3d/cache/*, open
> > gallery app can read pictures thumbnail,
> > /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.idx
> > /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.0
> > /storage/emulated/0/Android/data/com.android.gallery3d/cache/imgcache.1
> > available read.
>
> Maybe try reverting your commit that added esdfs to your kernel? It should not
> be needed at all.

Youling, what version of Android are you trying to run with the latest
bleeding edge kernel? Starting with Android 11, sdcardfs was
deprecated[1].

SDCardFS is deprecated on devices that launch with Android 11 or
higher and run kernel version 5.4 or higher. On such devices, VTS
testing doesn't allow mounted file systems listed as
SDCardFS. Devices that launch with Android 11 or higher but run
kernel version 4.19 or lower can continue to use SDCardFS, but
Google doesn't provide additional support.

[1] https://source.android.com/docs/core/storage/sdcardfs-deprecate

With newer versions of Android, use of something like sdcardfs or
esdfs is not necessary. If you are using an older version of Android,
and you insist on use a bleeding edge kernel where the writepage is
getting deprecated, then someone will need to update esdfs or deal
with the changing internal interfaces of the upstream kernel. This is
not the ext4 upstream developer's problem.

Personally, I would recommend that you *not* try to fix esdfs; that's
because stacking file systems like sdcardfs and esdfs are inherently
unreliable. See the section in [1] entitled, "Why deprecate sdcardfs?".

Instead, what I would recommend is upgrading to a newer version of
Android, and then dropping esdfs from your kernel repository.

- Ted