2022-06-01 19:59:43

by Hsin-Yi Wang

Subject: [PATCH v4 0/3] Implement readahead for squashfs

Commit c1f6925e1091 ("mm: put readahead pages in cache earlier") requires
filesystems to implement a readahead callback. Otherwise there will be a
performance regression.

Commit 9eec1d897139 ("squashfs: provide backing_dev_info in order to
disable read-ahead") mitigates the performance drop for squashfs
by disabling read-ahead for it.

This series implements the readahead callback for squashfs. The previous
discussions are in [1] and [2].

[1] https://lore.kernel.org/all/CAJMQK-g9G6KQmH-V=BRGX0swZji9Wxe_2c7ht-MMAapdFy2pXw@mail.gmail.com/T/
[2] https://lore.kernel.org/linux-mm/[email protected]/t/#m4af4473b94f98a4996cb11756b633a07e5e059d1

Hsin-Yi Wang (2):
Revert "squashfs: provide backing_dev_info in order to disable
read-ahead"
squashfs: implement readahead

Phillip Lougher (1):
squashfs: always build "file direct" version of page actor

fs/squashfs/Makefile | 4 +-
fs/squashfs/file.c | 97 +++++++++++++++++++++++++++++++++++++++-
fs/squashfs/page_actor.h | 41 -----------------
fs/squashfs/super.c | 33 --------------
4 files changed, 98 insertions(+), 77 deletions(-)

--
2.36.1.255.ge46751e96f-goog



2022-06-01 20:39:55

by Hsin-Yi Wang

Subject: [PATCH v4 3/3] squashfs: implement readahead

Implement the readahead callback for squashfs. It will read datablocks
which cover the pages in the readahead request. In a few cases it will
not mark pages as uptodate, including:
- file end is 0.
- zero filled blocks.
- the current batch of pages isn't in the same datablock, or there aren't
enough pages for a datablock.
- decompressor error.
Otherwise pages will be marked as uptodate. The unhandled pages will be
updated by readpage later.

Suggested-by: Matthew Wilcox <[email protected]>
Signed-off-by: Hsin-Yi Wang <[email protected]>
Reported-by: Matthew Wilcox <[email protected]>
Reported-by: Phillip Lougher <[email protected]>
Reported-by: Xiongwei Song <[email protected]>
---
v3->v4: Fix a few variable types and their locations.
v3: https://lore.kernel.org/lkml/[email protected]/
v2: https://lore.kernel.org/lkml/[email protected]/
v1: https://lore.kernel.org/lkml/[email protected]/
---
fs/squashfs/file.c | 97 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 96 insertions(+), 1 deletion(-)

diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index a8e495d8eb86..df7ad4b3e99c 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -39,6 +39,7 @@
#include "squashfs_fs_sb.h"
#include "squashfs_fs_i.h"
#include "squashfs.h"
+#include "page_actor.h"

/*
* Locate cache slot in range [offset, index] for specified inode. If
@@ -495,7 +496,101 @@ static int squashfs_read_folio(struct file *file, struct folio *folio)
return 0;
}

+static void squashfs_readahead(struct readahead_control *ractl)
+{
+ struct inode *inode = ractl->mapping->host;
+ struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
+ size_t mask = (1UL << msblk->block_log) - 1;
+ unsigned short shift = msblk->block_log - PAGE_SHIFT;
+ loff_t start = readahead_pos(ractl) &~ mask;
+ size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
+ struct squashfs_page_actor *actor;
+ unsigned int nr_pages = 0;
+ struct page **pages;
+ int i, file_end = i_size_read(inode) >> msblk->block_log;
+ unsigned int max_pages = 1UL << shift;
+
+ readahead_expand(ractl, start, (len | mask) + 1);
+
+ if (file_end == 0)
+ return;
+
+ pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
+ if (!pages)
+ return;
+
+ actor = squashfs_page_actor_init_special(pages, max_pages, 0);
+ if (!actor)
+ goto out;
+
+ for (;;) {
+ pgoff_t index;
+ int res, bsize;
+ u64 block = 0;
+ unsigned int expected;
+
+ nr_pages = __readahead_batch(ractl, pages, max_pages);
+ if (!nr_pages)
+ break;
+
+ if (readahead_pos(ractl) >= i_size_read(inode) ||
+ nr_pages < max_pages)
+ goto skip_pages;
+
+ index = pages[0]->index >> shift;
+ if ((pages[nr_pages - 1]->index >> shift) != index)
+ goto skip_pages;
+
+ expected = index == file_end ?
+ (i_size_read(inode) & (msblk->block_size - 1)) :
+ msblk->block_size;
+
+ bsize = read_blocklist(inode, index, &block);
+ if (bsize == 0)
+ goto skip_pages;
+
+ res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
+ actor);
+
+ if (res == expected) {
+ int bytes;
+
+ /* Last page may have trailing bytes not filled */
+ bytes = res % PAGE_SIZE;
+ if (bytes) {
+ void *pageaddr;
+
+ pageaddr = kmap_atomic(pages[nr_pages - 1]);
+ memset(pageaddr + bytes, 0, PAGE_SIZE - bytes);
+ kunmap_atomic(pageaddr);
+ }
+
+ for (i = 0; i < nr_pages; i++)
+ SetPageUptodate(pages[i]);
+ }
+
+ for (i = 0; i < nr_pages; i++) {
+ unlock_page(pages[i]);
+ put_page(pages[i]);
+ }
+ }
+
+ kfree(actor);
+ kfree(pages);
+ return;
+
+skip_pages:
+ for (i = 0; i < nr_pages; i++) {
+ unlock_page(pages[i]);
+ put_page(pages[i]);
+ }
+
+ kfree(actor);
+out:
+ kfree(pages);
+}

const struct address_space_operations squashfs_aops = {
- .read_folio = squashfs_read_folio
+ .read_folio = squashfs_read_folio,
+ .readahead = squashfs_readahead
};
--
2.36.1.255.ge46751e96f-goog


2022-06-01 21:26:59

by Hsin-Yi Wang

Subject: [PATCH v4 2/3] squashfs: always build "file direct" version of page actor

From: Phillip Lougher <[email protected]>

Squashfs_readahead() uses the "file direct" version of the page
actor, so build it unconditionally.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Phillip Lougher <[email protected]>
Signed-off-by: Hsin-Yi Wang <[email protected]>
---
fs/squashfs/Makefile | 4 ++--
fs/squashfs/page_actor.h | 41 ----------------------------------------
2 files changed, 2 insertions(+), 43 deletions(-)

diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile
index 7bd9b8b856d0..477c89a519ee 100644
--- a/fs/squashfs/Makefile
+++ b/fs/squashfs/Makefile
@@ -5,9 +5,9 @@

obj-$(CONFIG_SQUASHFS) += squashfs.o
squashfs-y += block.o cache.o dir.o export.o file.o fragment.o id.o inode.o
-squashfs-y += namei.o super.o symlink.o decompressor.o
+squashfs-y += namei.o super.o symlink.o decompressor.o page_actor.o
squashfs-$(CONFIG_SQUASHFS_FILE_CACHE) += file_cache.o
-squashfs-$(CONFIG_SQUASHFS_FILE_DIRECT) += file_direct.o page_actor.o
+squashfs-$(CONFIG_SQUASHFS_FILE_DIRECT) += file_direct.o
squashfs-$(CONFIG_SQUASHFS_DECOMP_SINGLE) += decompressor_single.o
squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI) += decompressor_multi.o
squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU) += decompressor_multi_percpu.o
diff --git a/fs/squashfs/page_actor.h b/fs/squashfs/page_actor.h
index 2e3073ace009..26e07373af8a 100644
--- a/fs/squashfs/page_actor.h
+++ b/fs/squashfs/page_actor.h
@@ -6,46 +6,6 @@
* Phillip Lougher <[email protected]>
*/

-#ifndef CONFIG_SQUASHFS_FILE_DIRECT
-struct squashfs_page_actor {
- void **page;
- int pages;
- int length;
- int next_page;
-};
-
-static inline struct squashfs_page_actor *squashfs_page_actor_init(void **page,
- int pages, int length)
-{
- struct squashfs_page_actor *actor = kmalloc(sizeof(*actor), GFP_KERNEL);
-
- if (actor == NULL)
- return NULL;
-
- actor->length = length ? : pages * PAGE_SIZE;
- actor->page = page;
- actor->pages = pages;
- actor->next_page = 0;
- return actor;
-}
-
-static inline void *squashfs_first_page(struct squashfs_page_actor *actor)
-{
- actor->next_page = 1;
- return actor->page[0];
-}
-
-static inline void *squashfs_next_page(struct squashfs_page_actor *actor)
-{
- return actor->next_page == actor->pages ? NULL :
- actor->page[actor->next_page++];
-}
-
-static inline void squashfs_finish_page(struct squashfs_page_actor *actor)
-{
- /* empty */
-}
-#else
struct squashfs_page_actor {
union {
void **buffer;
@@ -76,4 +36,3 @@ static inline void squashfs_finish_page(struct squashfs_page_actor *actor)
actor->squashfs_finish_page(actor);
}
#endif
-#endif
--
2.36.1.255.ge46751e96f-goog


2022-06-03 18:59:41

by Marek Szyprowski

Subject: Re: [PATCH v4 3/3] squashfs: implement readahead

Hi,

On 01.06.2022 12:39, Hsin-Yi Wang wrote:
> Implement readahead callback for squashfs. It will read datablocks
> which cover pages in readahead request. For a few cases it will
> not mark page as uptodate, including:
> - file end is 0.
> - zero filled blocks.
> - current batch of pages isn't in the same datablock or not enough in a
> datablock.
> - decompressor error.
> Otherwise pages will be marked as uptodate. The unhandled pages will be
> updated by readpage later.
>
> Suggested-by: Matthew Wilcox <[email protected]>
> Signed-off-by: Hsin-Yi Wang <[email protected]>
> Reported-by: Matthew Wilcox <[email protected]>
> Reported-by: Phillip Lougher <[email protected]>
> Reported-by: Xiongwei Song <[email protected]>
> ---

This patch landed recently in linux-next as commit 95f7a26191de
("squashfs: implement readahead"). I've noticed that it causes serious
issues on my test systems (various ARM 32bit and 64bit based boards).
The easiest way to observe is udev timeout 'waiting for /dev to be fully
populated' and prolonged booting time. I'm using squashfs for deploying
kernel modules via initrd. Reverting aeefca9dfae7 & 95f7a26191de on
top of next-20220603 fixes the issue.

Let me know how I can help debug this issue. There is no hurry,
though, because next week I will be on holidays.

> v3->v4: Fix a few variable type and their locations.
> v3: https://lore.kernel.org/lkml/[email protected]/
> v2: https://lore.kernel.org/lkml/[email protected]/
> v1: https://lore.kernel.org/lkml/[email protected]/
> ---
> fs/squashfs/file.c | 97 +++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 96 insertions(+), 1 deletion(-)
>
> diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> index a8e495d8eb86..df7ad4b3e99c 100644
> --- a/fs/squashfs/file.c
> +++ b/fs/squashfs/file.c
> @@ -39,6 +39,7 @@
> #include "squashfs_fs_sb.h"
> #include "squashfs_fs_i.h"
> #include "squashfs.h"
> +#include "page_actor.h"
>
> /*
> * Locate cache slot in range [offset, index] for specified inode. If
> @@ -495,7 +496,101 @@ static int squashfs_read_folio(struct file *file, struct folio *folio)
> return 0;
> }
>
> +static void squashfs_readahead(struct readahead_control *ractl)
> +{
> + struct inode *inode = ractl->mapping->host;
> + struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
> + size_t mask = (1UL << msblk->block_log) - 1;
> + unsigned short shift = msblk->block_log - PAGE_SHIFT;
> + loff_t start = readahead_pos(ractl) &~ mask;
> + size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
> + struct squashfs_page_actor *actor;
> + unsigned int nr_pages = 0;
> + struct page **pages;
> + int i, file_end = i_size_read(inode) >> msblk->block_log;
> + unsigned int max_pages = 1UL << shift;
> +
> + readahead_expand(ractl, start, (len | mask) + 1);
> +
> + if (file_end == 0)
> + return;
> +
> + pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
> + if (!pages)
> + return;
> +
> + actor = squashfs_page_actor_init_special(pages, max_pages, 0);
> + if (!actor)
> + goto out;
> +
> + for (;;) {
> + pgoff_t index;
> + int res, bsize;
> + u64 block = 0;
> + unsigned int expected;
> +
> + nr_pages = __readahead_batch(ractl, pages, max_pages);
> + if (!nr_pages)
> + break;
> +
> + if (readahead_pos(ractl) >= i_size_read(inode) ||
> + nr_pages < max_pages)
> + goto skip_pages;
> +
> + index = pages[0]->index >> shift;
> + if ((pages[nr_pages - 1]->index >> shift) != index)
> + goto skip_pages;
> +
> + expected = index == file_end ?
> + (i_size_read(inode) & (msblk->block_size - 1)) :
> + msblk->block_size;
> +
> + bsize = read_blocklist(inode, index, &block);
> + if (bsize == 0)
> + goto skip_pages;
> +
> + res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
> + actor);
> +
> + if (res == expected) {
> + int bytes;
> +
> + /* Last page may have trailing bytes not filled */
> + bytes = res % PAGE_SIZE;
> + if (bytes) {
> + void *pageaddr;
> +
> + pageaddr = kmap_atomic(pages[nr_pages - 1]);
> + memset(pageaddr + bytes, 0, PAGE_SIZE - bytes);
> + kunmap_atomic(pageaddr);
> + }
> +
> + for (i = 0; i < nr_pages; i++)
> + SetPageUptodate(pages[i]);
> + }
> +
> + for (i = 0; i < nr_pages; i++) {
> + unlock_page(pages[i]);
> + put_page(pages[i]);
> + }
> + }
> +
> + kfree(actor);
> + kfree(pages);
> + return;
> +
> +skip_pages:
> + for (i = 0; i < nr_pages; i++) {
> + unlock_page(pages[i]);
> + put_page(pages[i]);
> + }
> +
> + kfree(actor);
> +out:
> + kfree(pages);
> +}
>
> const struct address_space_operations squashfs_aops = {
> - .read_folio = squashfs_read_folio
> + .read_folio = squashfs_read_folio,
> + .readahead = squashfs_readahead
> };

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2022-06-04 07:35:59

by Matthew Wilcox

Subject: Re: [PATCH v4 3/3] squashfs: implement readahead

On Fri, Jun 03, 2022 at 10:55:01PM +0800, Hsin-Yi Wang wrote:
> On Fri, Jun 3, 2022 at 10:10 PM Marek Szyprowski
> <[email protected]> wrote:
> >
> > Hi Matthew,
> >
> > On 03.06.2022 14:59, Matthew Wilcox wrote:
> > > On Fri, Jun 03, 2022 at 02:54:21PM +0200, Marek Szyprowski wrote:
> > >> On 01.06.2022 12:39, Hsin-Yi Wang wrote:
> > >>> Implement readahead callback for squashfs. It will read datablocks
> > >>> which cover pages in readahead request. For a few cases it will
> > >>> not mark page as uptodate, including:
> > >>> - file end is 0.
> > >>> - zero filled blocks.
> > >>> - current batch of pages isn't in the same datablock or not enough in a
> > >>> datablock.
> > >>> - decompressor error.
> > >>> Otherwise pages will be marked as uptodate. The unhandled pages will be
> > >>> updated by readpage later.
> > >>>
> > >>> Suggested-by: Matthew Wilcox <[email protected]>
> > >>> Signed-off-by: Hsin-Yi Wang <[email protected]>
> > >>> Reported-by: Matthew Wilcox <[email protected]>
> > >>> Reported-by: Phillip Lougher <[email protected]>
> > >>> Reported-by: Xiongwei Song <[email protected]>
> > >>> ---
> > >> This patch landed recently in linux-next as commit 95f7a26191de
> > >> ("squashfs: implement readahead"). I've noticed that it causes serious
> > >> issues on my test systems (various ARM 32bit and 64bit based boards).
> > >> The easiest way to observe is udev timeout 'waiting for /dev to be fully
> > >> populated' and prolonged booting time. I'm using squashfs for deploying
> > >> kernel modules via initrd. Reverting aeefca9dfae7 & 95f7a26191deon on
> > >> top of the next-20220603 fixes the issue.
> > > How large are these files? Just a few kilobytes?
> >
> > Yes, they are small, most of them are smaller than 16KB, some about
> > 128KB and a few about 256KB. I've sent a detailed list in private mail.
> >
>
> Hi Marek,
>
> Are there any obvious squashfs errors in dmesg? Did you enable
> CONFIG_SQUASHFS_FILE_DIRECT or CONFIG_SQUASHFS_FILE_CACHE?

I don't think it's an error problem. I think it's a short file problem.

As I understand the current code (and apologies for not keeping up
to date with how the patch is progressing), if the file is less than
msblk->block_size bytes, we'll leave all the pages as !uptodate, leaving
them to be brought uptodate by squashfs_read_folio(). So Marek is hitting
the worst case scenario where we re-read the entire block for each page
in it. I think we have to handle this tail case in ->readahead().

2022-06-05 18:04:42

by Marek Szyprowski

Subject: Re: [PATCH v4 3/3] squashfs: implement readahead

Hi Matthew,

On 03.06.2022 14:59, Matthew Wilcox wrote:
> On Fri, Jun 03, 2022 at 02:54:21PM +0200, Marek Szyprowski wrote:
>> On 01.06.2022 12:39, Hsin-Yi Wang wrote:
>>> Implement readahead callback for squashfs. It will read datablocks
>>> which cover pages in readahead request. For a few cases it will
>>> not mark page as uptodate, including:
>>> - file end is 0.
>>> - zero filled blocks.
>>> - current batch of pages isn't in the same datablock or not enough in a
>>> datablock.
>>> - decompressor error.
>>> Otherwise pages will be marked as uptodate. The unhandled pages will be
>>> updated by readpage later.
>>>
>>> Suggested-by: Matthew Wilcox <[email protected]>
>>> Signed-off-by: Hsin-Yi Wang <[email protected]>
>>> Reported-by: Matthew Wilcox <[email protected]>
>>> Reported-by: Phillip Lougher <[email protected]>
>>> Reported-by: Xiongwei Song <[email protected]>
>>> ---
>> This patch landed recently in linux-next as commit 95f7a26191de
>> ("squashfs: implement readahead"). I've noticed that it causes serious
>> issues on my test systems (various ARM 32bit and 64bit based boards).
>> The easiest way to observe is udev timeout 'waiting for /dev to be fully
>> populated' and prolonged booting time. I'm using squashfs for deploying
>> kernel modules via initrd. Reverting aeefca9dfae7 & 95f7a26191deon on
>> top of the next-20220603 fixes the issue.
> How large are these files? Just a few kilobytes?

Yes, they are small, most of them are smaller than 16KB, some about
128KB and a few about 256KB. I've sent a detailed list in private mail.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2022-06-06 03:38:58

by Matthew Wilcox

Subject: Re: [PATCH v4 3/3] squashfs: implement readahead

On Fri, Jun 03, 2022 at 02:54:21PM +0200, Marek Szyprowski wrote:
> Hi,
>
> On 01.06.2022 12:39, Hsin-Yi Wang wrote:
> > Implement readahead callback for squashfs. It will read datablocks
> > which cover pages in readahead request. For a few cases it will
> > not mark page as uptodate, including:
> > - file end is 0.
> > - zero filled blocks.
> > - current batch of pages isn't in the same datablock or not enough in a
> > datablock.
> > - decompressor error.
> > Otherwise pages will be marked as uptodate. The unhandled pages will be
> > updated by readpage later.
> >
> > Suggested-by: Matthew Wilcox <[email protected]>
> > Signed-off-by: Hsin-Yi Wang <[email protected]>
> > Reported-by: Matthew Wilcox <[email protected]>
> > Reported-by: Phillip Lougher <[email protected]>
> > Reported-by: Xiongwei Song <[email protected]>
> > ---
>
> This patch landed recently in linux-next as commit 95f7a26191de
> ("squashfs: implement readahead"). I've noticed that it causes serious
> issues on my test systems (various ARM 32bit and 64bit based boards).
> The easiest way to observe is udev timeout 'waiting for /dev to be fully
> populated' and prolonged booting time. I'm using squashfs for deploying
> kernel modules via initrd. Reverting aeefca9dfae7 & 95f7a26191deon on
> top of the next-20220603 fixes the issue.

How large are these files? Just a few kilobytes?

2022-06-06 03:39:08

by Marek Szyprowski

Subject: Re: [PATCH v4 3/3] squashfs: implement readahead

Hi Matthew,

On 03.06.2022 17:29, Matthew Wilcox wrote:
> On Fri, Jun 03, 2022 at 10:55:01PM +0800, Hsin-Yi Wang wrote:
>> On Fri, Jun 3, 2022 at 10:10 PM Marek Szyprowski
>> <[email protected]> wrote:
>>> Hi Matthew,
>>>
>>> On 03.06.2022 14:59, Matthew Wilcox wrote:
>>>> On Fri, Jun 03, 2022 at 02:54:21PM +0200, Marek Szyprowski wrote:
>>>>> On 01.06.2022 12:39, Hsin-Yi Wang wrote:
>>>>>> Implement readahead callback for squashfs. It will read datablocks
>>>>>> which cover pages in readahead request. For a few cases it will
>>>>>> not mark page as uptodate, including:
>>>>>> - file end is 0.
>>>>>> - zero filled blocks.
>>>>>> - current batch of pages isn't in the same datablock or not enough in a
>>>>>> datablock.
>>>>>> - decompressor error.
>>>>>> Otherwise pages will be marked as uptodate. The unhandled pages will be
>>>>>> updated by readpage later.
>>>>>>
>>>>>> Suggested-by: Matthew Wilcox <[email protected]>
>>>>>> Signed-off-by: Hsin-Yi Wang <[email protected]>
>>>>>> Reported-by: Matthew Wilcox <[email protected]>
>>>>>> Reported-by: Phillip Lougher <[email protected]>
>>>>>> Reported-by: Xiongwei Song <[email protected]>
>>>>>> ---
>>>>> This patch landed recently in linux-next as commit 95f7a26191de
>>>>> ("squashfs: implement readahead"). I've noticed that it causes serious
>>>>> issues on my test systems (various ARM 32bit and 64bit based boards).
>>>>> The easiest way to observe is udev timeout 'waiting for /dev to be fully
>>>>> populated' and prolonged booting time. I'm using squashfs for deploying
>>>>> kernel modules via initrd. Reverting aeefca9dfae7 & 95f7a26191deon on
>>>>> top of the next-20220603 fixes the issue.
>>>> How large are these files? Just a few kilobytes?
>>> Yes, they are small, most of them are smaller than 16KB, some about
>>> 128KB and a few about 256KB. I've sent a detailed list in private mail.
>>>
>> Hi Marek,
>>
>> Are there any obvious squashfs errors in dmesg? Did you enable
>> CONFIG_SQUASHFS_FILE_DIRECT or CONFIG_SQUASHFS_FILE_CACHE?
> I don't think it's an error problem. I think it's a short file problem.
>
> As I understand the current code (and apologies for not keeping up
> to date with how the patch is progressing), if the file is less than
> msblk->block_size bytes, we'll leave all the pages as !uptodate, leaving
> them to be brough uptodate by squashfs_read_folio(). So Marek is hitting
> the worst case scenario where we re-read the entire block for each page
> in it. I think we have to handle this tail case in ->readahead().

I'm not sure if this is related to reading small files. There are
only 50 modules being loaded from the squashfs volume. I did a quick
test of reading the files.

Simple file read with this patch:

root@target:~# time find /initrd/ -type f | while read f; do cat $f
>/dev/null; done

real    0m5.865s
user    0m2.362s
sys     0m3.844s

Without:

root@target:~# time find /initrd/ -type f | while read f; do cat $f
>/dev/null; done

real    0m6.619s
user    0m2.112s
sys     0m4.827s

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2022-06-06 04:34:04

by Hsin-Yi Wang

Subject: Re: [PATCH v4 3/3] squashfs: implement readahead

On Fri, Jun 3, 2022 at 10:10 PM Marek Szyprowski
<[email protected]> wrote:
>
> Hi Matthew,
>
> On 03.06.2022 14:59, Matthew Wilcox wrote:
> > On Fri, Jun 03, 2022 at 02:54:21PM +0200, Marek Szyprowski wrote:
> >> On 01.06.2022 12:39, Hsin-Yi Wang wrote:
> >>> Implement readahead callback for squashfs. It will read datablocks
> >>> which cover pages in readahead request. For a few cases it will
> >>> not mark page as uptodate, including:
> >>> - file end is 0.
> >>> - zero filled blocks.
> >>> - current batch of pages isn't in the same datablock or not enough in a
> >>> datablock.
> >>> - decompressor error.
> >>> Otherwise pages will be marked as uptodate. The unhandled pages will be
> >>> updated by readpage later.
> >>>
> >>> Suggested-by: Matthew Wilcox <[email protected]>
> >>> Signed-off-by: Hsin-Yi Wang <[email protected]>
> >>> Reported-by: Matthew Wilcox <[email protected]>
> >>> Reported-by: Phillip Lougher <[email protected]>
> >>> Reported-by: Xiongwei Song <[email protected]>
> >>> ---
> >> This patch landed recently in linux-next as commit 95f7a26191de
> >> ("squashfs: implement readahead"). I've noticed that it causes serious
> >> issues on my test systems (various ARM 32bit and 64bit based boards).
> >> The easiest way to observe is udev timeout 'waiting for /dev to be fully
> >> populated' and prolonged booting time. I'm using squashfs for deploying
> >> kernel modules via initrd. Reverting aeefca9dfae7 & 95f7a26191deon on
> >> top of the next-20220603 fixes the issue.
> > How large are these files? Just a few kilobytes?
>
> Yes, they are small, most of them are smaller than 16KB, some about
> 128KB and a few about 256KB. I've sent a detailed list in private mail.
>

Hi Marek,

Are there any obvious squashfs errors in dmesg? Did you enable
CONFIG_SQUASHFS_FILE_DIRECT or CONFIG_SQUASHFS_FILE_CACHE?

Thanks

> Best regards
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>

2022-06-06 04:47:52

by Marek Szyprowski

Subject: Re: [PATCH v4 3/3] squashfs: implement readahead

Hi,

On 03.06.2022 16:55, Hsin-Yi Wang wrote:
> On Fri, Jun 3, 2022 at 10:10 PM Marek Szyprowski
> <[email protected]> wrote:
>> On 03.06.2022 14:59, Matthew Wilcox wrote:
>>> On Fri, Jun 03, 2022 at 02:54:21PM +0200, Marek Szyprowski wrote:
>>>> On 01.06.2022 12:39, Hsin-Yi Wang wrote:
>>>>> Implement readahead callback for squashfs. It will read datablocks
>>>>> which cover pages in readahead request. For a few cases it will
>>>>> not mark page as uptodate, including:
>>>>> - file end is 0.
>>>>> - zero filled blocks.
>>>>> - current batch of pages isn't in the same datablock or not enough in a
>>>>> datablock.
>>>>> - decompressor error.
>>>>> Otherwise pages will be marked as uptodate. The unhandled pages will be
>>>>> updated by readpage later.
>>>>>
>>>>> Suggested-by: Matthew Wilcox <[email protected]>
>>>>> Signed-off-by: Hsin-Yi Wang <[email protected]>
>>>>> Reported-by: Matthew Wilcox <[email protected]>
>>>>> Reported-by: Phillip Lougher <[email protected]>
>>>>> Reported-by: Xiongwei Song <[email protected]>
>>>>> ---
>>>> This patch landed recently in linux-next as commit 95f7a26191de
>>>> ("squashfs: implement readahead"). I've noticed that it causes serious
>>>> issues on my test systems (various ARM 32bit and 64bit based boards).
>>>> The easiest way to observe is udev timeout 'waiting for /dev to be fully
>>>> populated' and prolonged booting time. I'm using squashfs for deploying
>>>> kernel modules via initrd. Reverting aeefca9dfae7 & 95f7a26191deon on
>>>> top of the next-20220603 fixes the issue.
>>> How large are these files? Just a few kilobytes?
>> Yes, they are small, most of them are smaller than 16KB, some about
>> 128KB and a few about 256KB. I've sent a detailed list in private mail.
>>
> Hi Marek,
>
> Are there any obvious squashfs errors in dmesg? Did you enable
> CONFIG_SQUASHFS_FILE_DIRECT or CONFIG_SQUASHFS_FILE_CACHE?

Here are related config entries from my .config:

CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_FILE_CACHE=y
# CONFIG_SQUASHFS_FILE_DIRECT is not set
CONFIG_SQUASHFS_DECOMP_SINGLE=y
# CONFIG_SQUASHFS_DECOMP_MULTI is not set
# CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU is not set
# CONFIG_SQUASHFS_XATTR is not set
CONFIG_SQUASHFS_ZLIB=y
# CONFIG_SQUASHFS_LZ4 is not set
CONFIG_SQUASHFS_LZO=y
CONFIG_SQUASHFS_XZ=y
# CONFIG_SQUASHFS_ZSTD is not set
# CONFIG_SQUASHFS_4K_DEVBLK_SIZE is not set
# CONFIG_SQUASHFS_EMBEDDED is not set
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3

(I've used standard ARM 32bit multi_v7_defconfig)

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2022-06-06 06:10:39

by Phillip Lougher

Subject: Re: [PATCH v4 3/3] squashfs: implement readahead

On 03/06/2022 16:58, Marek Szyprowski wrote:
> Hi Matthew,
>
> On 03.06.2022 17:29, Matthew Wilcox wrote:
>> On Fri, Jun 03, 2022 at 10:55:01PM +0800, Hsin-Yi Wang wrote:
>>> On Fri, Jun 3, 2022 at 10:10 PM Marek Szyprowski
>>> <[email protected]> wrote:
>>>> Hi Matthew,
>>>>
>>>> On 03.06.2022 14:59, Matthew Wilcox wrote:
>>>>> On Fri, Jun 03, 2022 at 02:54:21PM +0200, Marek Szyprowski wrote:
>>>>>> On 01.06.2022 12:39, Hsin-Yi Wang wrote:
>>>>>>> Implement readahead callback for squashfs. It will read datablocks
>>>>>>> which cover pages in readahead request. For a few cases it will
>>>>>>> not mark page as uptodate, including:
>>>>>>> - file end is 0.
>>>>>>> - zero filled blocks.
>>>>>>> - current batch of pages isn't in the same datablock or not enough in a
>>>>>>> datablock.
>>>>>>> - decompressor error.
>>>>>>> Otherwise pages will be marked as uptodate. The unhandled pages will be
>>>>>>> updated by readpage later.
>>>>>>>
>>>>>>> Suggested-by: Matthew Wilcox <[email protected]>
>>>>>>> Signed-off-by: Hsin-Yi Wang <[email protected]>
>>>>>>> Reported-by: Matthew Wilcox <[email protected]>
>>>>>>> Reported-by: Phillip Lougher <[email protected]>
>>>>>>> Reported-by: Xiongwei Song <[email protected]>
>>>>>>> ---
>>>>>> This patch landed recently in linux-next as commit 95f7a26191de
>>>>>> ("squashfs: implement readahead"). I've noticed that it causes serious
>>>>>> issues on my test systems (various ARM 32bit and 64bit based boards).
>>>>>> The easiest way to observe is udev timeout 'waiting for /dev to be fully
>>>>>> populated' and prolonged booting time. I'm using squashfs for deploying
>>>>>> kernel modules via initrd. Reverting aeefca9dfae7 & 95f7a26191deon on
>>>>>> top of the next-20220603 fixes the issue.
>>>>> How large are these files? Just a few kilobytes?
>>>> Yes, they are small, most of them are smaller than 16KB, some about
>>>> 128KB and a few about 256KB. I've sent a detailed list in private mail.
>>>>
>>> Hi Marek,
>>>
>>> Are there any obvious squashfs errors in dmesg? Did you enable
>>> CONFIG_SQUASHFS_FILE_DIRECT or CONFIG_SQUASHFS_FILE_CACHE?
>> I don't think it's an error problem. I think it's a short file problem.
>>
>> As I understand the current code (and apologies for not keeping up
>> to date with how the patch is progressing), if the file is less than
>> msblk->block_size bytes, we'll leave all the pages as !uptodate, leaving
>> them to be brough uptodate by squashfs_read_folio(). So Marek is hitting
>> the worst case scenario where we re-read the entire block for each page
>> in it. I think we have to handle this tail case in ->readahead().
>
> I'm not sure if this is related to reading of small files. There are
> only 50 modules being loaded from squashfs volume. I did a quick test of
> reading the files.
>
> Simple file read with this patch:
>
> root@target:~# time find /initrd/ -type f | while read f; do cat $f
> >/dev/null; done
>
> real    0m5.865s
> user    0m2.362s
> sys     0m3.844s
>
> Without:
>
> root@target:~# time find /initrd/ -type f | while read f; do cat $f
> >/dev/null; done
>
> real    0m6.619s
> user    0m2.112s
> sys     0m4.827s
>

It has been a four day holiday in the UK (Queen's Platinum Jubilee),
hence the delay in responding.

The above read use-case is sequential (only one thread/process),
whereas the use-case where the slow-down is observed may be
parallel (multiple threads/processes entering Squashfs).

In the above sequential use-case, if the small files are held in
fragments, caching behaviour will ameliorate the case where the
same block is repeatedly re-read for each page in it: each time
Squashfs is re-entered handling only a single page, the
decompressed block will be found in the fragment
cache, eliminating a block decompression for each page.

In a parallel use-case the decompressed fragment block
may be evicted from the cache by other reading
processes, forcing the block to be repeatedly
decompressed.

Hence the slow-down will be much more noticeable with a
parallel use-case than with a sequential one. That may
also be why this slipped through testing, if the test cases
are purely sequential in nature.

So Matthew's previous comment is still the most likely
explanation for the slow-down.

Phillip

> Best regards
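The eviction effect described above can be sketched as a small simulation (purely illustrative: the slot count 3 matches CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3 from Marek's .config, but the FIFO eviction policy and all names here are assumptions, not the real fragment-cache implementation):

```c
#include <assert.h>

#define CACHE_SLOTS 3  /* assumed: mirrors SQUASHFS_FRAGMENT_CACHE_SIZE=3 */

static int cache[CACHE_SLOTS];
static int cache_used;
static long decompressions;

static void cache_reset(void)
{
	cache_used = 0;
	decompressions = 0;
}

/* Model one page read hitting a decompressed-block cache (FIFO eviction). */
static void read_page_of_block(int block)
{
	for (int i = 0; i < cache_used; i++)
		if (cache[i] == block)
			return; /* hit: no decompression needed */

	decompressions++; /* miss: decompress the block, then cache it */
	if (cache_used < CACHE_SLOTS) {
		cache[cache_used++] = block;
	} else {
		for (int i = 1; i < CACHE_SLOTS; i++)
			cache[i - 1] = cache[i]; /* evict the oldest entry */
		cache[CACHE_SLOTS - 1] = block;
	}
}
```

Under this model, one process reading 6 blocks of 8 pages each sequentially decompresses every block exactly once (6 total), while the same page reads interleaved round-robin across the 6 blocks miss the 3-slot cache on every access (48 total), matching the sequential-vs-parallel asymmetry described above.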

2022-06-06 10:17:53

by Hsin-Yi Wang

Subject: Re: [PATCH v4 3/3] squashfs: implement readahead

On Mon, Jun 6, 2022 at 11:54 AM Phillip Lougher <[email protected]> wrote:
>
> On 03/06/2022 16:58, Marek Szyprowski wrote:
> > Hi Matthew,
> >
> > On 03.06.2022 17:29, Matthew Wilcox wrote:
> >> On Fri, Jun 03, 2022 at 10:55:01PM +0800, Hsin-Yi Wang wrote:
> >>> On Fri, Jun 3, 2022 at 10:10 PM Marek Szyprowski
> >>> <[email protected]> wrote:
> >>>> Hi Matthew,
> >>>>
> >>>> On 03.06.2022 14:59, Matthew Wilcox wrote:
> >>>>> On Fri, Jun 03, 2022 at 02:54:21PM +0200, Marek Szyprowski wrote:
> >>>>>> On 01.06.2022 12:39, Hsin-Yi Wang wrote:
> >>>>>>> [...]
> >>>>>> This patch landed recently in linux-next as commit 95f7a26191de
> >>>>>> ("squashfs: implement readahead"). I've noticed that it causes serious
> >>>>>> issues on my test systems (various ARM 32bit and 64bit based boards).
> >>>>>> The easiest way to observe is udev timeout 'waiting for /dev to be fully
> >>>>>> populated' and prolonged booting time. I'm using squashfs for deploying
> >>>>>> kernel modules via initrd. Reverting aeefca9dfae7 & 95f7a26191de on
> >>>>>> top of the next-20220603 fixes the issue.
> >>>>> How large are these files? Just a few kilobytes?
> >>>> Yes, they are small, most of them are smaller than 16KB, some about
> >>>> 128KB and a few about 256KB. I've sent a detailed list in private mail.
> >>>>
> >>> Hi Marek,
> >>>
> >>> Are there any obvious squashfs errors in dmesg? Did you enable
> >>> CONFIG_SQUASHFS_FILE_DIRECT or CONFIG_SQUASHFS_FILE_CACHE?
> >> I don't think it's an error problem. I think it's a short file problem.
> >>
> >> As I understand the current code (and apologies for not keeping up
> >> to date with how the patch is progressing), if the file is less than
> >> msblk->block_size bytes, we'll leave all the pages as !uptodate, leaving
> >> them to be brought uptodate by squashfs_read_folio(). So Marek is hitting
> >> the worst case scenario where we re-read the entire block for each page
> >> in it. I think we have to handle this tail case in ->readahead().
> >
> > I'm not sure if this is related to reading of small files. There are
> > only 50 modules being loaded from squashfs volume. I did a quick test of
> > reading the files.
> >
> > Simple file read with this patch:
> >
> > root@target:~# time find /initrd/ -type f | while read f; do cat $f
> > >/dev/null; done
> >
> > real 0m5.865s
> > user 0m2.362s
> > sys 0m3.844s
> >
> > Without:
> >
> > root@target:~# time find /initrd/ -type f | while read f; do cat $f
> > >/dev/null; done
> >
> > real 0m6.619s
> > user 0m2.112s
> > sys 0m4.827s
> >
>
> It has been a four day holiday in the UK (Queen's Platinum Jubilee),
> hence the delay in responding.
>
> The above read use-case is sequential (only one thread/process),
> whereas the use-case where the slow-down is observed may be
> parallel (multiple threads/processes entering Squashfs).
>
> [...]
>
Thanks for the pointers. To deal with the short file case (nr_pages <
max_pages), can we refer to squashfs_fill_page() as used in
squashfs_read_cache(), similar to the case where there are missing
pages in the block?

Directly calling squashfs_read_data() on short files leads to a crash:

Unable to handle kernel paging request at virtual address:
[ 19.244654] zlib_inflate+0xba4/0x10c8
[ 19.244658] zlib_uncompress+0x150/0x1bc
[ 19.244662] squashfs_decompress+0x6c/0xb4
[ 19.244669] squashfs_read_data+0x1a8/0x298
[ 19.244673] squashfs_readahead+0x2cc/0x4cc

I also noticed that the function previously didn't call
flush_dcache_page() before SetPageUptodate().

Put these 2 issues together:

diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index 658fb98af0cd..27519f1f9045 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -532,8 +532,7 @@ static void squashfs_readahead(struct readahead_control *ractl)
         if (!nr_pages)
             break;

-        if (readahead_pos(ractl) >= i_size_read(inode) ||
-            nr_pages < max_pages)
+        if (readahead_pos(ractl) >= i_size_read(inode))
             goto skip_pages;

         index = pages[0]->index >> shift;
@@ -548,6 +547,23 @@ static void squashfs_readahead(struct readahead_control *ractl)
         if (bsize == 0)
             goto skip_pages;

+        if (nr_pages < max_pages) {
+            struct squashfs_cache_entry *buffer;
+
+            buffer = squashfs_get_datablock(inode->i_sb, block,
+                            bsize);
+            if (!buffer->error) {
+                for (i = 0; i < nr_pages && expected > 0; i++,
+                        expected -= PAGE_SIZE) {
+                    int avail = min_t(int, expected, PAGE_SIZE);
+
+                    squashfs_fill_page(pages[i], buffer,
+                               i * PAGE_SIZE, avail);
+                }
+            }
+            squashfs_cache_put(buffer);
+            goto skip_pages;
+        }
+
         res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
                      actor);

@@ -564,8 +580,10 @@ static void squashfs_readahead(struct readahead_control *ractl)
             kunmap_atomic(pageaddr);
         }

-        for (i = 0; i < nr_pages; i++)
+        for (i = 0; i < nr_pages; i++) {
+            flush_dcache_page(pages[i]);
             SetPageUptodate(pages[i]);
+        }
     }


> Phillip
>
> > Best regards
>

2022-06-06 11:24:47

by Hsin-Yi Wang

[permalink] [raw]
Subject: Re: [PATCH v4 3/3] squashfs: implement readahead

On Mon, Jun 6, 2022 at 5:55 PM Hsin-Yi Wang <[email protected]> wrote:
>
> [...]
>
> Put these 2 issues together:
>

The patch here is not correct. Please ignore it for now. Sorry for the noise.

> [...]

2022-06-06 15:22:03

by Hsin-Yi Wang

[permalink] [raw]
Subject: Re: [PATCH v4 3/3] squashfs: implement readahead

On Mon, Jun 6, 2022 at 7:09 PM Hsin-Yi Wang <[email protected]> wrote:
>
> [...]
>
> The patch here is not correct. Please ignore it for now. Sorry for the noise.
>
Hi all,

The correct version is sent as v5:
https://lore.kernel.org/lkml/[email protected]/T/#t

Note that this is based on next-20220513, which doesn't have v4 applied.
I also squashed a fix to a checkpatch error in this version.

Thanks


> > [...]

2022-06-07 11:52:36

by Phillip Lougher

[permalink] [raw]
Subject: Re: [PATCH v4 3/3] squashfs: implement readahead

On 03/06/2022 13:54, Marek Szyprowski wrote:
> Hi,
>
> On 01.06.2022 12:39, Hsin-Yi Wang wrote:
>> Implement readahead callback for squashfs. It will read datablocks
>> which cover pages in readahead request. For a few cases it will
>> not mark page as uptodate, including:
>> - file end is 0.
>> - zero filled blocks.
>> - current batch of pages isn't in the same datablock or not enough in a
>> datablock.
>> - decompressor error.
>> Otherwise pages will be marked as uptodate. The unhandled pages will be
>> updated by readpage later.
>>
>> Suggested-by: Matthew Wilcox <[email protected]>
>> Signed-off-by: Hsin-Yi Wang <[email protected]>
>> Reported-by: Matthew Wilcox <[email protected]>
>> Reported-by: Phillip Lougher <[email protected]>
>> Reported-by: Xiongwei Song <[email protected]>
>> ---
>
> This patch landed recently in linux-next as commit 95f7a26191de
> ("squashfs: implement readahead"). I've noticed that it causes serious
> issues on my test systems (various ARM 32bit and 64bit based boards).
> The easiest way to observe is udev timeout 'waiting for /dev to be fully
> populated' and prolonged booting time. I'm using squashfs for deploying
> kernel modules via initrd. Reverting aeefca9dfae7 & 95f7a26191de on
> top of the next-20220603 fixes the issue.
>
> Let me know how I can help debugging this issue. There is no hurry
> though, because the next week I will be on holidays.

Hi Marek,

Can you supply an example Squashfs filesystem and script that
reproduces the slow-down? Failing that, can you supply a copy
of your initrd/root-filesystem that can be run under emulation
to reproduce the issue? (I don't have any modern ARM embedded
systems).

Again failing that, are you happy to test some debug code?

Thanks

Phillip (Squashfs maintainer and author).

>
>> v3->v4: Fix a few variable type and their locations.
>> v3: https://lore.kernel.org/lkml/[email protected]/
>> v2: https://lore.kernel.org/lkml/[email protected]/
>> v1: https://lore.kernel.org/lkml/[email protected]/
>> ---
>> fs/squashfs/file.c | 97 +++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 96 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
>> index a8e495d8eb86..df7ad4b3e99c 100644
>> --- a/fs/squashfs/file.c
>> +++ b/fs/squashfs/file.c
>> @@ -39,6 +39,7 @@
>> #include "squashfs_fs_sb.h"
>> #include "squashfs_fs_i.h"
>> #include "squashfs.h"
>> +#include "page_actor.h"
>>
>> /*
>> * Locate cache slot in range [offset, index] for specified inode. If
>> @@ -495,7 +496,101 @@ static int squashfs_read_folio(struct file *file, struct folio *folio)
>> return 0;
>> }
>>
>> +static void squashfs_readahead(struct readahead_control *ractl)
>> +{
>> + struct inode *inode = ractl->mapping->host;
>> + struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
>> + size_t mask = (1UL << msblk->block_log) - 1;
>> + unsigned short shift = msblk->block_log - PAGE_SHIFT;
>> + loff_t start = readahead_pos(ractl) &~ mask;
>> + size_t len = readahead_length(ractl) + readahead_pos(ractl) - start;
>> + struct squashfs_page_actor *actor;
>> + unsigned int nr_pages = 0;
>> + struct page **pages;
>> + int i, file_end = i_size_read(inode) >> msblk->block_log;
>> + unsigned int max_pages = 1UL << shift;
>> +
>> + readahead_expand(ractl, start, (len | mask) + 1);
>> +
>> + if (file_end == 0)
>> + return;
>> +
>> + pages = kmalloc_array(max_pages, sizeof(void *), GFP_KERNEL);
>> + if (!pages)
>> + return;
>> +
>> + actor = squashfs_page_actor_init_special(pages, max_pages, 0);
>> + if (!actor)
>> + goto out;
>> +
>> + for (;;) {
>> + pgoff_t index;
>> + int res, bsize;
>> + u64 block = 0;
>> + unsigned int expected;
>> +
>> + nr_pages = __readahead_batch(ractl, pages, max_pages);
>> + if (!nr_pages)
>> + break;
>> +
>> + if (readahead_pos(ractl) >= i_size_read(inode) ||
>> + nr_pages < max_pages)
>> + goto skip_pages;
>> +
>> + index = pages[0]->index >> shift;
>> + if ((pages[nr_pages - 1]->index >> shift) != index)
>> + goto skip_pages;
>> +
>> + expected = index == file_end ?
>> + (i_size_read(inode) & (msblk->block_size - 1)) :
>> + msblk->block_size;
>> +
>> + bsize = read_blocklist(inode, index, &block);
>> + if (bsize == 0)
>> + goto skip_pages;
>> +
>> + res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
>> + actor);
>> +
>> + if (res == expected) {
>> + int bytes;
>> +
>> + /* Last page may have trailing bytes not filled */
>> + bytes = res % PAGE_SIZE;
>> + if (bytes) {
>> + void *pageaddr;
>> +
>> + pageaddr = kmap_atomic(pages[nr_pages - 1]);
>> + memset(pageaddr + bytes, 0, PAGE_SIZE - bytes);
>> + kunmap_atomic(pageaddr);
>> + }
>> +
>> + for (i = 0; i < nr_pages; i++)
>> + SetPageUptodate(pages[i]);
>> + }
>> +
>> + for (i = 0; i < nr_pages; i++) {
>> + unlock_page(pages[i]);
>> + put_page(pages[i]);
>> + }
>> + }
>> +
>> + kfree(actor);
>> + kfree(pages);
>> + return;
>> +
>> +skip_pages:
>> + for (i = 0; i < nr_pages; i++) {
>> + unlock_page(pages[i]);
>> + put_page(pages[i]);
>> + }
>> +
>> + kfree(actor);
>> +out:
>> + kfree(pages);
>> +}
>>
>> const struct address_space_operations squashfs_aops = {
>> - .read_folio = squashfs_read_folio
>> + .read_folio = squashfs_read_folio,
>> + .readahead = squashfs_readahead
>> };
>
> Best regards

2022-06-13 17:23:24

by Marek Szyprowski

[permalink] [raw]
Subject: Re: [PATCH v4 3/3] squashfs: implement readahead

Hi Phillip,

On 07.06.2022 09:35, Phillip Lougher wrote:
> On 03/06/2022 13:54, Marek Szyprowski wrote:
>> [...]
>
> Hi Marek,
>
> Can you supply an example Squashfs filesystem and script that
> reproduces the slow-down?  Failing that, can you supply a copy
> of your initrd/root-filesystem that can be run under emulation
> to reproduce the issue? (I don't have any modern ARM embedded
> systems).
>
> Again failing that, are you happy to test some debug code?
>
> Thanks
>
> Phillip (Squashfs maintainer and author).

I've just got back from my holidays.

Is this still relevant? I've noticed that v6 has been posted, but I
failed to apply it on top of next-20220610 as mentioned in the
cover-letter to test. I've also tried to apply the mentioned
'Squashfs: handle missing pages decompressing into page cache' patchset.
On the other hand, next-20220610 seems to be working fine on my setup now.


Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2022-06-13 18:35:10

by Hsin-Yi Wang

[permalink] [raw]
Subject: Re: [PATCH v4 3/3] squashfs: implement readahead

On Mon, Jun 13, 2022 at 8:08 PM Marek Szyprowski
<[email protected]> wrote:
>
> Hi Phillip,
>
> [...]
>
> I've just got back from my holidays.
>
> Is this still relevant? I've noticed that v6 has been posted, but I
> failed to apply it on top of next-20220610 as mentioned in the
> cover-letter to test. I've also tried to apply the mentioned
> 'Squashfs: handle missing pages decompressing into page cache' patchset.
> On the other hand, next-20220610 seems to be working fine on my setup now.
>

hi Marek,

next-20220610 contains v5 of the series. To apply v6, first revert
ca1505bf4805 ("squashfs: implement readahead") and 9d58b94aa73a
("squashfs: always build "file direct" version of page actor"), then
apply the 'Squashfs: handle missing pages decompressing into page
cache' patchset, and finally apply v6, which depends on that patchset.

Thanks.

>
> Best regards
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>