Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20;
Date:   Wed, 15 Jun 2022 01:00:47 +0800
From:   Gao Xiang <hsiangkao@linux.alibaba.com>
To:     Daeho Jeong <daeho43@gmail.com>
Cc:     Eric Biggers <ebiggers@kernel.org>,
        Daeho Jeong <daehojeong@google.com>,
        Nathan Huckleberry <nhuck@google.com>, kernel-team@android.com,
        linux-kernel@vger.kernel.org,
        linux-f2fs-devel@lists.sourceforge.net
Subject: Re: [f2fs-dev] [PATCH] f2fs: handle decompress only post processing
 in softirq
Message-ID: <Yqi+vyY4K0mzEdeP@B-P7TQMD6M-0146.local>
References: <20220613155612.402297-1-daeho43@gmail.com>
 <Yqge0XS7jbSnNWvq@sol.localdomain>
 <YqhRBZMYPp/kyxoe@B-P7TQMD6M-0146.local>
 <CACOAw_wjCyTmwusY6S4+NgMuLOZm9fwGfrvCT272GJ01-RP6PQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <CACOAw_wjCyTmwusY6S4+NgMuLOZm9fwGfrvCT272GJ01-RP6PQ@mail.gmail.com>
Precedence: bulk

Hi Daeho,

On Tue, Jun 14, 2022 at 09:46:50AM -0700, Daeho Jeong wrote:
> >
> > Some my own previous thoughts about this strategy:
> >
> >  - If we allocate all memory and map these before I/Os, all inflight I/Os
> >    will keep such temporary pages all the time until decompression is
> >    finished. In contrast, if we allocate or reuse such pages just before
> >    decompression, it would minimize the memory footprints.
> >
> >    I think it will impact the memory numbers at least on the very
> >    low-ended devices with slow storage. (I've seen f2fs has some big
> >    mempool already)
> >
> >  - Many compression algorithms are not suitable in the softirq contexts,
> >    also I vaguely remembered if softirq context lasts for > 2ms, it will
> >    push into ksoftirqd instead so it's actually another process context.
> >    And it may delay other important interrupt handling.
> >
> >  - Go back to the non-deterministic scheduling of workqueues. I guess it
> >    may be just due to scheduling punishment due to a lot of CPU consuming
> >    due to decompression before so the priority becomes low, but that is
> >    just a pure guess. May be we need to use RT scheduling policy instead.
> >
> >    At least with WQ_HIGHPRI for dm-verity at least, but I don't find
> >    WQ_HIGHPRI mark for dm-verity.
> >
> > Thanks,
> > Gao Xiang
> 
> I totally understand what you are worried about. However, in the real
> world, non-determinism from workqueues is more harsh than we expected.
> As you know, reading I/Os in the system are critical paths most of the
> time and now I/O variations with workqueue are too bad.
> 
> I also think it's better that we have RT scheduling like things here.
> We could think about it more.

Yeah, I heard that you folks have really suffered from the scheduling
issues. But from my own previous experience, extra memory footprint is
really critical in Android low-memory scenarios (whether on low-end
devices or under artificial workloads), and it troubled me a lot. So I
finally introduced a lot of inplace I/O to handle/minimize that,
including inplace I/O for compressed pages and temporary pages.

But I'm not quite sure what's happening now, since we didn't see such
non-deterministic workqueue behaviour before, and I haven't heard of it
from other vendors with landed products.  I think it'd be better to
analyse what's going on with these kworkers from a scheduling POV and
why they aren't scheduled in time.

I also have an idea, much like what I'm doing now for sync
decompression: just before lock_page() and ->read_folio(), we can
trigger some decompression in addition to kworker decompression. It
needs some MM modification though, as below:

   !PageUptodate(page)

   some callback to decompress in addition to kworker

   lock_page()
   ->read_folio()
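
To make the ordering above a bit more concrete, here is a small
userspace C model of it. It is only a sketch: struct model_page,
try_decompress(), kworker_decompress() and reader_get_page() are
hypothetical names made up for illustration, not existing MM or f2fs
interfaces, and the slow path simply completes the I/O synchronously
instead of really blocking in lock_page().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a page cache page. */
struct model_page {
	bool uptodate;		/* models PageUptodate(page) */
	bool locked;		/* models the page lock */
	bool io_done;		/* compressed data has arrived from storage */
	int decompressed_by;	/* who did the decompression, see below */
};

enum { BY_NOBODY = 0, BY_KWORKER = 1, BY_READER = 2 };

/*
 * Hypothetical hook: decompress only if the compressed data is already
 * there and nobody has finished the page yet. Returns true if this call
 * did the work.
 */
static bool try_decompress(struct model_page *p, int who)
{
	if (p->uptodate || !p->io_done)
		return false;
	p->decompressed_by = who;
	p->uptodate = true;
	return true;
}

/* Models the existing path: a kworker decompresses after I/O completion. */
static void kworker_decompress(struct model_page *p)
{
	try_decompress(p, BY_KWORKER);
}

/*
 * Models the reader side. Returns true if the reader had to fall back
 * to lock_page() + ->read_folio(), false if the page was made uptodate
 * without taking that path.
 */
static bool reader_get_page(struct model_page *p)
{
	/* !PageUptodate(page): extra callback in addition to the kworker */
	if (!p->uptodate)
		try_decompress(p, BY_READER);
	if (p->uptodate)
		return false;

	/* lock_page() */
	assert(!p->locked);
	p->locked = true;
	/* ->read_folio(): the model completes the I/O synchronously */
	p->io_done = true;
	try_decompress(p, BY_READER);
	p->locked = false;
	return true;
}
```

In this model, a reader that finds the compressed data already read but
not yet handled by the kworker finishes the page itself, so it no longer
depends on when the kworker gets scheduled. If the kworker got there
first, the callback is a no-op.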

If mm folks don't like it, I think an RT thread is also fine, once we
have analysed the root cause of the kworker delay.

Thanks,
Gao Xiang

> 
> Thanks,