From: Yongji Xie
Date: Tue, 15 Feb 2022 21:17:37 +0800
Subject: Re: [PATCH v2] nbd: Don't use workqueue to handle recv work
To: Josef Bacik
Cc: Christoph Hellwig, Jens Axboe, Bart Van Assche, linux-block@vger.kernel.org, nbd@other.debian.org, linux-kernel
References: <20211227091241.103-1-xieyongji@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Ping again.

Hi Josef, could you take a look?

On Fri, Jan 21, 2022 at 4:34 PM Yongji Xie wrote:
>
> Ping.
>
> On Wed, Jan 5, 2022 at 1:36 PM Yongji Xie wrote:
> >
> > On Wed, Jan 5, 2022 at 2:06 AM Josef Bacik wrote:
> > >
> > > On Tue, Jan 04, 2022 at 01:31:47PM +0800, Yongji Xie wrote:
> > > >
> > > > On Tue, Jan 4, 2022 at 12:10 AM Josef Bacik wrote:
> > > > >
> > > > > On Thu, Dec 30, 2021 at 12:01:23PM +0800, Yongji Xie wrote:
> > > > > >
> > > > > > On Thu, Dec 30, 2021 at 1:35 AM Christoph Hellwig wrote:
> > > > > > >
> > > > > > > On Mon, Dec 27, 2021 at 05:12:41PM +0800, Xie Yongji wrote:
> > > > > > > > The rescuer thread might take over the works queued on
> > > > > > > > the workqueue when the worker thread creation timed out.
> > > > > > > > If this happens, we have no chance to create multiple
> > > > > > > > recv threads which causes I/O to hang on this nbd device.
> > > > > > >
> > > > > > > If a workqueue is used there aren't really 'receive threads'.
> > > > > > > What is the deadlock here?
> > > > > >
> > > > > > We might have multiple recv works, and those recv works won't quit
> > > > > > unless the socket is closed. If the rescuer thread takes over those
> > > > > > works, only the first recv work can run. The I/O needed to be handled
> > > > > > in other recv works would be hung since no thread can handle them.
> > > > > >
> > > > >
> > > > > I'm not following this explanation.  What is the rescuer thread you're talking
> > > >
> > > > https://www.kernel.org/doc/html/latest/core-api/workqueue.html#c.rescuer_thread
> > > >
> > >
> > > Ahhh ok now I see, thanks, I didn't know this is how this worked.
> > >
> > > So what happens is we do the queue_work(), this needs to do a GFP_KERNEL
> > > allocation internally, we are unable to satisfy this, and thus the work gets
> > > pushed onto the rescuer thread.
> > >
> > > Then the rescuer thread can't be used in the future because it's doing this long
> > > running thing.
> > >
> >
> > Yes.
> >
> > > I think the correct thing to do here is simply drop the WQ_MEM_RECLAIM bit.  It
> > > makes sense for workqueues that are handling the work of short-lived works that
> > > are in the memory reclaim path.  That's not what these workers are doing, yes
> > > they are in the reclaim path, but they run the entire time the device is up.
> > > The actual work happens as they process incoming requests.  AFAICT
> > > WQ_MEM_RECLAIM doesn't affect the actual allocations that the worker thread
> > > needs to do, which is what I think the intention was in using WQ_MEM_RECLAIM,
> > > which isn't really what it's used for.
> > >
> > > tl;dr, just remove the WQ_MEM_RECLAIM flag completely and I think that's good
> > > enough?  Thanks,
> > >
> >
> > In the reconnect case, we still need to call queue_work() while the
> > device is running. So it looks like we can't simply remove the
> > WQ_MEM_RECLAIM flag.
> >
> > Thanks,
> > Yongji
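
The hang described in the quoted thread can be illustrated with a toy model. This is only a sketch in Python, not kernel code: simulate() and its parameters are hypothetical names made up for the illustration. It assumes just what the thread states, namely that each recv work keeps running until its socket is closed, and that the rescuer thread runs the works it takes over one at a time.

```python
from collections import deque


def simulate(num_conns, executors, requests):
    """Toy model of recv works on a workqueue (hypothetical, not kernel code).

    num_conns: number of connections, each with one long-running recv work
    executors: how many works can run at once; 1 models the rescuer thread
               running queued works serially, num_conns models one worker
               thread per work
    requests:  list of (request_id, conn_index) pairs waiting for a reply

    Returns (done, hung): request ids that complete and ids that never do.
    """
    # Works in the order they were queued.
    queued = deque(range(num_conns))
    running = set()

    # A recv work only returns when its socket is closed, so an executor
    # that picks one up never becomes free again while the device is up.
    while queued and len(running) < executors:
        running.add(queued.popleft())

    # A reply can only be received by the recv work of its own connection.
    done = [rid for rid, conn in requests if conn in running]
    hung = [rid for rid, conn in requests if conn not in running]
    return done, hung


if __name__ == "__main__":
    reqs = [(0, 0), (1, 1), (2, 2), (3, 3)]
    print("rescuer only:", simulate(4, 1, reqs))
    print("one worker per work:", simulate(4, 4, reqs))
```

With a single executor, only the request on connection 0 completes and the other three are reported as hung, which matches "only the first recv work can run". With one executor per connection, all four complete.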