Add some words about the page traversal order and the on-stack
pagepool usage.
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
---
fs/erofs/zdata.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index cb4d0889eca9..054b9839e9db 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1424,6 +1424,16 @@ static void z_erofs_readahead(struct readahead_control *rac)
f.readahead = true;
f.headoffset = readahead_pos(rac);
+ /*
+ * All pages are locked in the forward order in advance, so traverse
+ * them in the reverse order instead, since:
+ * 1) it's more efficient: the extent start offset alone is enough to
+ * calculate the partial decompressed length, while traversing in
+ * the forward order would require fetching each full extent
+ * length one-by-one (which costs extra metadata I/O);
+ * 2) the submission chain can then be kept in the forward order,
+ * since pclusters are all inserted at the head.
+ */
while ((page = readahead_page(rac))) {
prefetchw(&page->flags);
@@ -1460,7 +1470,7 @@ static void z_erofs_readahead(struct readahead_control *rac)
if (f.map.mpage)
put_page(f.map.mpage);
- /* clean up the remaining free pages */
+ /* drain the on-stack pagepool of all unused non-LRU temporary pages */
put_pages_list(&pagepool);
}
--
2.20.1
In that way, pages can be accessed directly via the xarray.
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
---
Just an open-coded PoC, since this is what EROFS actually needs, but
without an iteration API (see [PATCH 1/2] as well.)
fs/erofs/zdata.c | 32 +++++++++++++-------------------
1 file changed, 13 insertions(+), 19 deletions(-)
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 054b9839e9db..210b2965ecc4 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1416,7 +1416,8 @@ static void z_erofs_readahead(struct readahead_control *rac)
bool sync = (sbi->ctx.readahead_sync_decompress &&
nr_pages <= sbi->ctx.max_sync_decompress_pages);
struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
- struct page *page, *head = NULL;
+ struct page *page;
+ pgoff_t index;
LIST_HEAD(pagepool);
trace_erofs_readpages(inode, readahead_index(rac), nr_pages, false);
@@ -1434,26 +1435,19 @@ static void z_erofs_readahead(struct readahead_control *rac)
* 2) the submission chain can then be kept in the forward order,
* since pclusters are all inserted at the head.
*/
- while ((page = readahead_page(rac))) {
- prefetchw(&page->flags);
-
- /*
- * A pure asynchronous readahead is indicated if
- * a PG_readahead marked page is hitted at first.
- * Let's also do asynchronous decompression for this case.
- */
- sync &= !(PageReadahead(page) && !head);
-
- set_page_private(page, (unsigned long)head);
- head = page;
- }
-
- while (head) {
- struct page *page = head;
+ index = rac->_index + rac->_nr_pages;
+ while (rac->_nr_pages) {
+ struct page *head;
int err;
- /* traversal in reverse order */
- head = (void *)page_private(page);
+ --rac->_nr_pages;
+ page = xa_load(&rac->mapping->i_pages, --index);
+ /* XXX: very incomplete thp support */
+ head = thp_head(page);
+ if (head != page) {
+ index -= page->index - head->index;
+ page = head;
+ }
err = z_erofs_do_read_page(&f, page, &pagepool);
if (err)
--
2.20.1
On Tue, Jul 06, 2021 at 02:32:53AM +0800, Gao Xiang wrote:
> In that way, pages can be accessed directly via the xarray.
I didn't mean "open code readahead_page()". I meant "Wouldn't it be
great if z_erofs_do_read_page() used readahead_expand() in order to
allocate the extra pages in the extents that cover the start and end of
the requested chunk". That way the pages would already be in the page
cache for subsequent reads.
Hi Matthew,
On Mon, Jul 05, 2021 at 09:15:26PM +0100, Matthew Wilcox wrote:
> On Tue, Jul 06, 2021 at 02:32:53AM +0800, Gao Xiang wrote:
> > In that way, pages can be accessed directly via the xarray.
>
> I didn't mean "open code readahead_page()". I meant "Wouldn't it be
> great if z_erofs_do_read_page() used readahead_expand() in order to
> allocate the extra pages in the extents that cover the start and end of
> the requested chunk". That way the pages would already be in the page
> cache for subsequent reads.
Yes, I understand the idea. readahead_expand() can be used to cache
the whole requested decompressed chunk. Currently, if cache mode is
enabled, EROFS instead tends to cache the compressed data of
incompletely requested chunks in the managed inode, since LZ4
compressed data saves more I/O for caching than uncompressed data,
considering the LZ4 decompression speed (somewhat like zcache.)
For now, EROFS LZMA support is also on the way (although recently I
have to work on the DAX support first.) LZMA is a somewhat slow
algorithm but with a much better compression ratio, so I think it
should decompress all requested chunks up front to save the overall
decompression time. readahead_expand() is therefore quite useful for
the EROFS LZMA cases, and I will investigate it for those...
BTW, judging from patch [2/2] itself, is there some chance to extend
readahead_page() or to add some new API like this, so that the
original dance can be avoided?
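For example, something like the following in pagemap.h (the name is
arbitrary, and it ignores THPs just like the PoC does):

	/*
	 * Hypothetical helper: return the previous page of the
	 * readahead request, or NULL once all pages are consumed.
	 * Pages were added (and locked) in the forward order, so this
	 * walks them backwards without chaining through page_private().
	 * XXX: would need thp_head() handling as in the PoC.
	 */
	static inline struct page *
	readahead_page_reverse(struct readahead_control *rac)
	{
		struct page *page;

		if (!rac->_nr_pages)
			return NULL;
		page = xa_load(&rac->mapping->i_pages,
			       rac->_index + --rac->_nr_pages);
		prefetchw(&page->flags);
		return page;
	}

Then the open-coded loop in patch [2/2] would simply become
"while ((page = readahead_page_reverse(rac))) { ... }".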
Some comments on the iteration behavior itself are always useful,
anyway...
Thanks,
Gao Xiang