From: Hsin-Yi Wang
Date: Mon, 6 Jun 2022 23:08:47 +0800
Subject: Re: [PATCH v4 3/3] squashfs: implement readahead
To: Phillip Lougher
Cc: Marek Szyprowski, Matthew Wilcox, Xiongwei Song, Zheng Liang,
 Zhang Yi, Hou Tao, Miao Xie, Andrew Morton, linux-mm@kvack.org,
 squashfs-devel@lists.sourceforge.net, linux-kernel@vger.kernel.org
References: <20220601103922.1338320-1-hsinyi@chromium.org>
 <20220601103922.1338320-4-hsinyi@chromium.org>
 <90b228ea-1b0e-d2e8-62be-9ad5802dcce7@samsung.com>
 <0e84fe64-c993-7f43-ca52-8fee735b0372@squashfs.org.uk>
net" , linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jun 6, 2022 at 7:09 PM Hsin-Yi Wang wrote: > > On Mon, Jun 6, 2022 at 5:55 PM Hsin-Yi Wang wrote: > > > > On Mon, Jun 6, 2022 at 11:54 AM Phillip Lougher wrote: > > > > > > On 03/06/2022 16:58, Marek Szyprowski wrote: > > > > Hi Matthew, > > > > > > > > On 03.06.2022 17:29, Matthew Wilcox wrote: > > > >> On Fri, Jun 03, 2022 at 10:55:01PM +0800, Hsin-Yi Wang wrote: > > > >>> On Fri, Jun 3, 2022 at 10:10 PM Marek Szyprowski > > > >>> wrote: > > > >>>> Hi Matthew, > > > >>>> > > > >>>> On 03.06.2022 14:59, Matthew Wilcox wrote: > > > >>>>> On Fri, Jun 03, 2022 at 02:54:21PM +0200, Marek Szyprowski wrote: > > > >>>>>> On 01.06.2022 12:39, Hsin-Yi Wang wrote: > > > >>>>>>> Implement readahead callback for squashfs. It will read datablocks > > > >>>>>>> which cover pages in readahead request. For a few cases it will > > > >>>>>>> not mark page as uptodate, including: > > > >>>>>>> - file end is 0. > > > >>>>>>> - zero filled blocks. > > > >>>>>>> - current batch of pages isn't in the same datablock or not enough in a > > > >>>>>>> datablock. > > > >>>>>>> - decompressor error. > > > >>>>>>> Otherwise pages will be marked as uptodate. The unhandled pages will be > > > >>>>>>> updated by readpage later. > > > >>>>>>> > > > >>>>>>> Suggested-by: Matthew Wilcox > > > >>>>>>> Signed-off-by: Hsin-Yi Wang > > > >>>>>>> Reported-by: Matthew Wilcox > > > >>>>>>> Reported-by: Phillip Lougher > > > >>>>>>> Reported-by: Xiongwei Song > > > >>>>>>> --- > > > >>>>>> This patch landed recently in linux-next as commit 95f7a26191de > > > >>>>>> ("squashfs: implement readahead"). I've noticed that it causes serious > > > >>>>>> issues on my test systems (various ARM 32bit and 64bit based boards). > > > >>>>>> The easiest way to observe is udev timeout 'waiting for /dev to be fully > > > >>>>>> populated' and prolonged booting time. I'm using squashfs for deploying > > > >>>>>> kernel modules via initrd. Reverting aeefca9dfae7 & 95f7a26191deon on > > > >>>>>> top of the next-20220603 fixes the issue. > > > >>>>> How large are these files? Just a few kilobytes? > > > >>>> Yes, they are small, most of them are smaller than 16KB, some about > > > >>>> 128KB and a few about 256KB. I've sent a detailed list in private mail. > > > >>>> > > > >>> Hi Marek, > > > >>> > > > >>> Are there any obvious squashfs errors in dmesg? Did you enable > > > >>> CONFIG_SQUASHFS_FILE_DIRECT or CONFIG_SQUASHFS_FILE_CACHE? > > > >> I don't think it's an error problem. I think it's a short file problem. > > > >> > > > >> As I understand the current code (and apologies for not keeping up > > > >> to date with how the patch is progressing), if the file is less than > > > >> msblk->block_size bytes, we'll leave all the pages as !uptodate, leaving > > > >> them to be brough uptodate by squashfs_read_folio(). So Marek is hitting > > > >> the worst case scenario where we re-read the entire block for each page > > > >> in it. I think we have to handle this tail case in ->readahead(). 
> > > >
> > > > I'm not sure if this is related to reading of small files. There are
> > > > only 50 modules being loaded from the squashfs volume. I did a quick
> > > > test of reading the files.
> > > >
> > > > Simple file read with this patch:
> > > >
> > > > root@target:~# time find /initrd/ -type f | while read f; do cat $f >/dev/null; done
> > > >
> > > > real    0m5.865s
> > > > user    0m2.362s
> > > > sys     0m3.844s
> > > >
> > > > Without:
> > > >
> > > > root@target:~# time find /initrd/ -type f | while read f; do cat $f >/dev/null; done
> > > >
> > > > real    0m6.619s
> > > > user    0m2.112s
> > > > sys     0m4.827s
> > > >
> > >
> > > It has been a four-day holiday in the UK (Queen's Platinum Jubilee),
> > > hence the delay in responding.
> > >
> > > The above read use-case is sequential (only one thread/process),
> > > whereas the use-case where the slow-down is observed may be
> > > parallel (multiple threads/processes entering Squashfs).
> > >
> > > If the small files in the above sequential use-case are held in
> > > fragments, it will exhibit caching behaviour that ameliorates the
> > > case where the same block is repeatedly re-read for each page in it:
> > > each time Squashfs is re-entered to handle a single page, the
> > > decompressed block will be found in the fragment cache, eliminating
> > > a block decompression for each page.
> > >
> > > In a parallel use-case the decompressed fragment block may be
> > > evicted from the cache (by other reading processes), forcing the
> > > block to be repeatedly decompressed.
> > >
> > > Hence the slow-down will be much more noticeable with a parallel
> > > use-case than a sequential one. It may also be why this slipped
> > > through testing, if the test cases are purely sequential in nature.
> > >
> > > So Matthew's previous comment is still the most likely explanation
> > > for the slow-down.
> > >
> > Thanks for the pointers. To deal with the short-file case (nr_pages <
> > max_pages), can we refer to squashfs_fill_page() as used in
> > squashfs_read_cache(), similar to the case where there are missing
> > pages in the block?
> >
> > Directly calling squashfs_read_data() on short files leads to a crash:
> >
> > Unable to handle kernel paging request at virtual address:
> > [ 19.244654] zlib_inflate+0xba4/0x10c8
> > [ 19.244658] zlib_uncompress+0x150/0x1bc
> > [ 19.244662] squashfs_decompress+0x6c/0xb4
> > [ 19.244669] squashfs_read_data+0x1a8/0x298
> > [ 19.244673] squashfs_readahead+0x2cc/0x4cc
> >
> > I also noticed that the function previously didn't call
> > flush_dcache_page() before SetPageUptodate().
> >
> > Putting these two issues together:
> >
> The patch here is not correct. Please ignore it for now. Sorry for the
> noise.
>

Hi all,

The correct version has been sent as v5:
https://lore.kernel.org/lkml/20220606150305.1883410-1-hsinyi@chromium.org/T/#t

Note that it is based on next-20220513, which doesn't have v4 applied.
I also squashed a fix for a checkpatch error into this version.

Thanks
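
On the flush_dcache_page() observation above: when a filesystem fills a
page cache page through a kernel mapping, the usual kernel pattern is to
flush the kernel-side cache alias before marking the page uptodate, so
that userspace mappings on aliasing caches (for example some 32-bit ARM
parts, like the boards in Marek's report) see the new data. A minimal
generic sketch of that ordering follows; fill_and_publish() is a
hypothetical helper for illustration, not squashfs code:

#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/string.h>

/* Fill a page cache page from an in-kernel buffer, then publish it. */
static void fill_and_publish(struct page *page, const void *src, size_t len)
{
	void *dst = kmap_atomic(page);

	memcpy(dst, src, len);
	memset(dst + len, 0, PAGE_SIZE - len);	/* zero the tail */
	kunmap_atomic(dst);

	/*
	 * On aliasing caches the kernel mapping and future userspace
	 * mappings may not be coherent: write back the kernel-side
	 * alias first...
	 */
	flush_dcache_page(page);
	/* ...and only then mark the contents valid for readers. */
	SetPageUptodate(page);
}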
> > diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
> > index 658fb98af0cd..27519f1f9045 100644
> > --- a/fs/squashfs/file.c
> > +++ b/fs/squashfs/file.c
> > @@ -532,8 +532,7 @@ static void squashfs_readahead(struct readahead_control *ractl)
> >                 if (!nr_pages)
> >                         break;
> >
> > -               if (readahead_pos(ractl) >= i_size_read(inode) ||
> > -                   nr_pages < max_pages)
> > +               if (readahead_pos(ractl) >= i_size_read(inode))
> >                         goto skip_pages;
> >
> >                 index = pages[0]->index >> shift;
> > @@ -548,6 +547,23 @@ static void squashfs_readahead(struct readahead_control *ractl)
> >                 if (bsize == 0)
> >                         goto skip_pages;
> >
> > +               if (nr_pages < max_pages) {
> > +                       struct squashfs_cache_entry *buffer;
> > +
> > +                       buffer = squashfs_get_datablock(inode->i_sb, block,
> > +                                                       bsize);
> > +                       if (!buffer->error) {
> > +                               for (i = 0; i < nr_pages && expected > 0; i++,
> > +                                               expected -= PAGE_SIZE) {
> > +                                       int avail = min_t(int, expected, PAGE_SIZE);
> > +
> > +                                       squashfs_fill_page(pages[i], buffer, i * PAGE_SIZE, avail);
> > +                               }
> > +                       }
> > +                       squashfs_cache_put(buffer);
> > +                       goto skip_pages;
> > +               }
> > +
> >                 res = squashfs_read_data(inode->i_sb, block, bsize, NULL,
> >                                          actor);
> >
> > @@ -564,8 +580,10 @@ static void squashfs_readahead(struct readahead_control *ractl)
> >                         kunmap_atomic(pageaddr);
> >                 }
> >
> > -               for (i = 0; i < nr_pages; i++)
> > +               for (i = 0; i < nr_pages; i++) {
> > +                       flush_dcache_page(pages[i]);
> >                         SetPageUptodate(pages[i]);
> > +               }
> >
> > > Phillip
> > >
> > > > Best regards
> > >
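
For reference, squashfs_fill_page(), which the quoted hunk routes short
files through, roughly does the following in kernels of this vintage.
This is a paraphrased sketch of the helper in fs/squashfs/file.c with
comments added, not a verbatim copy; squashfs_cache_entry and
squashfs_copy_data() come from the squashfs internal headers:

void squashfs_fill_page(struct page *page,
		struct squashfs_cache_entry *buffer, int offset, int avail)
{
	int copied;
	void *pageaddr;

	pageaddr = kmap_atomic(page);
	/* Copy up to 'avail' bytes of decompressed data into the page. */
	copied = squashfs_copy_data(pageaddr, buffer, offset, avail);
	/* Zero whatever part of the page the block didn't cover. */
	memset(pageaddr + copied, 0, PAGE_SIZE - copied);
	kunmap_atomic(pageaddr);

	flush_dcache_page(page);
	if (copied == avail)
		SetPageUptodate(page);	/* full copy: publish the page */
	else
		SetPageError(page);	/* short copy: flag an error */
}

Routing the short-file tail through this helper is also what picks up
the flush_dcache_page() before SetPageUptodate() ordering noted earlier
in the thread.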