Date: Tue, 4 Jan 2022 13:06:32 -0500
From: Josef Bacik
To: Yongji Xie
Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche,
 linux-block@vger.kernel.org, nbd@other.debian.org, linux-kernel
Subject: Re: [PATCH v2] nbd: Don't use workqueue to handle recv work
References: <20211227091241.103-1-xieyongji@bytedance.com>

On Tue, Jan 04, 2022 at 01:31:47PM +0800, Yongji Xie wrote:
> On Tue, Jan 4, 2022 at 12:10 AM Josef Bacik wrote:
> >
> > On Thu, Dec 30, 2021 at 12:01:23PM +0800, Yongji Xie wrote:
> > > On Thu, Dec 30, 2021 at 1:35 AM Christoph Hellwig wrote:
> > > >
> > > > On Mon, Dec 27, 2021 at 05:12:41PM +0800, Xie Yongji wrote:
> > > > > The rescuer thread might take over the works queued on
> > > > > the workqueue when the worker thread creation timed out.
> > > > > If this happens, we have no chance to create multiple
> > > > > recv threads which causes I/O hung on this nbd device.
> > > >
> > > > If a workqueue is used there aren't really 'receive threads'.
> > > > What is the deadlock here?
> > >
> > > We might have multiple recv works, and those recv works won't quit
> > > unless the socket is closed. If the rescuer thread takes over those
> > > works, only the first recv work can run. The I/O needed to be
> > > handled in other recv works would be hung since no thread can
> > > handle them.
> > >
> >
> > I'm not following this explanation.  What is the rescuer thread you're talking
>
> https://www.kernel.org/doc/html/latest/core-api/workqueue.html#c.rescuer_thread
>

Ahhh ok, now I see, thanks, I didn't know this was how this worked.

So what happens is we do the queue_work(), the pool needs to spawn a new
worker thread to run the work, that thread creation blocks on a GFP_KERNEL
allocation, and when it can't make progress in time the pending work gets
pushed onto the rescuer thread instead.  Then the rescuer thread can't be
used for anything else in the future because it's stuck running this long
lived work.

I think the correct thing to do here is simply drop the WQ_MEM_RECLAIM
bit.  That flag makes sense for workqueues whose works are short lived and
sit in the memory reclaim path.  That's not what these workers are doing:
yes, they are in the reclaim path, but they run the entire time the device
is up, and the actual work happens as they process incoming requests.
AFAICT WQ_MEM_RECLAIM doesn't affect the allocations the worker thread
itself needs to do, which I think was the intention in using it here, and
that isn't really what the flag is for.

tl;dr, just remove the WQ_MEM_RECLAIM flag completely and I think that's
good enough?  Thanks,

Josef
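
P.S. In case it helps, here's an untested sketch of the kind of change I
mean.  I'm assuming the recv workqueue is still allocated in
nbd_start_device() with WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND; if it
has moved in your tree, adjust accordingly.  Sketch only, not
compile-tested:

--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ static int nbd_start_device(struct nbd_device *nbd)
-	nbd->recv_workq = alloc_workqueue("knbd%d-recv",
-					  WQ_MEM_RECLAIM | WQ_HIGHPRI |
-					  WQ_UNBOUND, 0, nbd->index);
+	/*
+	 * No WQ_MEM_RECLAIM: the recv works run for the whole lifetime
+	 * of the device, so if the rescuer ever took one over it would
+	 * starve the recv works for every other connection.
+	 */
+	nbd->recv_workq = alloc_workqueue("knbd%d-recv",
+					  WQ_HIGHPRI | WQ_UNBOUND,
+					  0, nbd->index);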