From: Yongji Xie
Date: Fri, 21 Jan 2022 16:34:11 +0800
Subject: Re: [PATCH v2] nbd: Don't use workqueue to handle recv work
To: Josef Bacik
Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block@vger.kernel.org, nbd@other.debian.org, linux-kernel
References: <20211227091241.103-1-xieyongji@bytedance.com>
List-ID: linux-kernel@vger.kernel.org

Ping.

On Wed, Jan 5, 2022 at 1:36 PM Yongji Xie wrote:
>
> On Wed, Jan 5, 2022 at 2:06 AM Josef Bacik wrote:
> >
> > On Tue, Jan 04, 2022 at 01:31:47PM +0800, Yongji Xie wrote:
> > > On Tue, Jan 4, 2022 at 12:10 AM Josef Bacik wrote:
> > > >
> > > > On Thu, Dec 30, 2021 at 12:01:23PM +0800, Yongji Xie wrote:
> > > > > On Thu, Dec 30, 2021 at 1:35 AM Christoph Hellwig wrote:
> > > > > >
> > > > > > On Mon, Dec 27, 2021 at 05:12:41PM +0800, Xie Yongji wrote:
> > > > > > > The rescuer thread might take over the works queued on
> > > > > > > the workqueue when worker thread creation times out.
> > > > > > > If this happens, we have no chance to create multiple
> > > > > > > recv threads, which causes I/O to hang on this nbd device.
> > > > > >
> > > > > > If a workqueue is used there aren't really 'receive threads'.
> > > > > > What is the deadlock here?
> > > > >
> > > > > We might have multiple recv works, and those recv works won't quit
> > > > > unless the socket is closed. If the rescuer thread takes over those
> > > > > works, only the first recv work can run. The I/O that needs to be
> > > > > handled by the other recv works hangs, since no thread can handle it.
> > > >
> > > > I'm not following this explanation. What is the rescuer thread you're
> > > > talking about?
> > >
> > > https://www.kernel.org/doc/html/latest/core-api/workqueue.html#c.rescuer_thread
> >
> > Ahhh ok, now I see, thanks. I didn't know this is how this worked.
> >
> > So what happens is we do the queue_work(), this needs to do a GFP_KERNEL
> > allocation internally, we are unable to satisfy this, and thus the work
> > gets pushed onto the rescuer thread.
> >
> > Then the rescuer thread can't be used in the future because it's doing
> > this long-running thing.
>
> Yes.
>
> > I think the correct thing to do here is simply drop the WQ_MEM_RECLAIM
> > bit. It makes sense for workqueues that handle short-lived works in the
> > memory reclaim path. That's not what these workers are doing; yes, they
> > are in the reclaim path, but they run the entire time the device is up.
> > The actual work happens as they process incoming requests. AFAICT
> > WQ_MEM_RECLAIM doesn't affect the actual allocations that the worker
> > thread needs to do, which is what I think the intention was in using
> > WQ_MEM_RECLAIM, and that isn't really what it's for.
> >
> > tl;dr, just remove the WQ_MEM_RECLAIM flag completely and I think that's
> > good enough? Thanks,
>
> In the reconnect case, we still need to call queue_work() while the
> device is running. So it looks like we can't simply remove the
> WQ_MEM_RECLAIM flag.
>
> Thanks,
> Yongji