Date: Thu, 15 Apr 2021 10:37:44 +0200 (CEST)
From: Miroslav Benes
To: Josef Bacik
cc: xiaojun.zhao141@gmail.com, linux-kernel@vger.kernel.org, live-patching@vger.kernel.org
Subject: Re: the qemu-nbd process automatically exit with the commit 43347d56c 'livepatch: send a fake signal to all blocking tasks'

On Wed, 14 Apr 2021, Josef Bacik wrote:

> On 4/14/21 11:21 AM, xiaojun.zhao141@gmail.com wrote:
> > On Wed, 14 Apr 2021 13:27:43 +0200 (CEST)
> > Miroslav Benes wrote:
> >
> >> Hi,
> >>
> >> On Wed, 14 Apr 2021, xiaojun.zhao141@gmail.com wrote:
> >>
> >>> I found that the qemu-nbd process (started with qemu-nbd -t -c /dev/nbd0
> >>> nbd.qcow2) automatically exits when I patch functions of the nbd driver with
> >>> livepatch.
> >>>
> >>> The relevant nbd source:
> >>>
> >>> static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *bdev)
> >>> {
> >>> 	struct nbd_config *config = nbd->config;
> >>> 	int ret;
> >>>
> >>> 	ret = nbd_start_device(nbd);
> >>> 	if (ret)
> >>> 		return ret;
> >>>
> >>> 	if (max_part)
> >>> 		bdev->bd_invalidated = 1;
> >>> 	mutex_unlock(&nbd->config_lock);
> >>> 	ret = wait_event_interruptible(config->recv_wq,
> >>> 				       atomic_read(&config->recv_threads) == 0);
> >>> 	if (ret)
> >>> 		sock_shutdown(nbd);
> >>> 	flush_workqueue(nbd->recv_workq);
> >>>
> >>> 	mutex_lock(&nbd->config_lock);
> >>> 	nbd_bdev_reset(bdev);
> >>> 	/* user requested, ignore socket errors */
> >>> 	if (test_bit(NBD_RT_DISCONNECT_REQUESTED, &config->runtime_flags))
> >>> 		ret = 0;
> >>> 	if (test_bit(NBD_RT_TIMEDOUT, &config->runtime_flags))
> >>> 		ret = -ETIMEDOUT;
> >>> 	return ret;
> >>> }
> >>
> >> So my understanding is that nbd spawns a number
> >> (config->recv_threads) of workqueue jobs and then waits for them to
> >> finish. It waits interruptibly. Now, any signal would make
> >> wait_event_interruptible() return -ERESTARTSYS. The livepatch fake
> >> signal is no exception there. The error is then propagated back to
> >> userspace, unless a user requested a disconnection or there is a
> >> timeout set. How does userspace then react to it? Is
> >> _interruptible there because userspace sends a signal when
> >> NBD_RT_DISCONNECT_REQUESTED is set? How does userspace handle
> >> ordinary signals? This all sounds a bit strange, but I could easily
> >> be missing something.
> >>
> >>> When nbd waits for atomic_read(&config->recv_threads) == 0, klp
> >>> sends it a fake signal and the qemu-nbd process then exits.
> >>> The sysfs attribute that controlled this signal was removed in
> >>> commit 10b3d52790e 'livepatch: Remove signal sysfs attribute'. Are
> >>> there other ways to control this behaviour? How?
> >>
> >> No, there is no way currently.
> >> We send a fake signal automatically.
> >>
> >> Regards
> >> Miroslav
> >
> > An I/O error occurs on the nbd device when I livepatch nbd, and I
> > suspect that a livepatch of any other kernel code may cause the same
> > I/O error. For now, I have decided to work around this problem by
> > adding a livepatch for klp itself that disables the automatic fake signal.
>
> Would wait_event_killable() fix this problem? I'm not sure any client
> implementations depend on being able to send other signals to the client
> process, so it should be safe from that standpoint. Not sure if the livepatch
> thing would still get an error at that point, though. Thanks,

wait_event_killable() means that you would sleep uninterruptibly (while
still reacting to fatal signals), so the fake signal from livepatch would
not be sent at all. set_notify_signal() handles TASK_INTERRUPTIBLE tasks.
There would be no disruption for userspace, and it would fix this problem.

There is a catch on the livepatch side of things. If there is a live patch
for nbd_start_device_ioctl(), the transition process would get stuck until
the task leaves the function (that is, until all workqueue jobs are
processed). I gather it is unlikely to be indefinite, so we can live with
that, I think.

Miroslav
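A simplified model of why switching to wait_event_killable() avoids the fake signal: a wakeup only succeeds when the sleeper's state overlaps the state mask passed by the waker. wait_event_killable() sleeps in TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE), which has no TASK_INTERRUPTIBLE bit. The SKETCH_* constants and the helper functions below are hypothetical illustrations loosely modelled on the 5.x bits in include/linux/sched.h and on what the signal wakeup path roughly does; this is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sleep-state bits, modelled on include/linux/sched.h (5.x). */
enum {
	SKETCH_TASK_INTERRUPTIBLE   = 0x0001,
	SKETCH_TASK_UNINTERRUPTIBLE = 0x0002,
	SKETCH_TASK_WAKEKILL        = 0x0100,
	/* State used by wait_event_killable(). */
	SKETCH_TASK_KILLABLE        = SKETCH_TASK_WAKEKILL | SKETCH_TASK_UNINTERRUPTIBLE,
};

/* A wakeup succeeds only if the sleeper's state overlaps the wake mask. */
static bool sketch_wakes(unsigned int sleep_state, unsigned int wake_mask)
{
	return (sleep_state & wake_mask) != 0;
}

/* Fake (livepatch) signal: assumed to wake TASK_INTERRUPTIBLE sleepers only. */
static bool fake_signal_wakes(unsigned int sleep_state)
{
	return sketch_wakes(sleep_state, SKETCH_TASK_INTERRUPTIBLE);
}

/* Fatal signal: assumed to also wake sleepers that set TASK_WAKEKILL. */
static bool fatal_signal_wakes(unsigned int sleep_state)
{
	return sketch_wakes(sleep_state,
			    SKETCH_TASK_INTERRUPTIBLE | SKETCH_TASK_WAKEKILL);
}
```

With this model, a task in wait_event_interruptible() is woken by the fake signal, while a task in wait_event_killable() is not, yet it can still be killed.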
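The catch mentioned above follows from the per-task transition rule: a task can be switched to the patched state only when none of the functions being patched are on its call stack. The toy check below is a hypothetical sketch of that rule (the function name and the string-based stacks are made up for illustration). It shows why a task parked inside nbd_start_device_ioctl() holds the transition back until it returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for the livepatch stack check:
 * a task may transition only if no function being patched appears
 * anywhere in its call stack (old code must not still be running).
 */
static bool sketch_task_can_transition(const char *const stack[], size_t depth,
				       const char *const patched[], size_t npatched)
{
	for (size_t i = 0; i < depth; i++)
		for (size_t j = 0; j < npatched; j++)
			if (strcmp(stack[i], patched[j]) == 0)
				return false;	/* patched function still in use */
	return true;
}
```

The real check works on return addresses compared against the address ranges of the functions being replaced rather than on names, and a task that fails it is simply tried again later, which is why a long wait only delays the transition.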