Subject: Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
From: Jason Wang
To: wangyunjian, "Michael S. Tsirkin"
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, caihe
Date: Thu, 1 Dec 2016 11:13:11 +0800
References: <1480507857-22976-1-git-send-email-wangyunjian@huawei.com> <20161130152004-mutt-send-email-mst@kernel.org> <34EFBCA9F01B0748BEB6B629CE643AE60B0A7B68@szxeml561-mbx.china.huawei.com>
In-Reply-To: <34EFBCA9F01B0748BEB6B629CE643AE60B0A7B68@szxeml561-mbx.china.huawei.com>

On 2016-12-01 10:48, wangyunjian wrote:
>> -----Original Message-----
>> From: Michael S. Tsirkin [mailto:mst@redhat.com]
>> Sent: Wednesday, November 30, 2016 9:41 PM
>> To: wangyunjian
>> Cc: jasowang@redhat.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; caihe
>> Subject: Re: [PATCH net] vhost_net: don't continue to call the recvmsg when meet errors
>>
>> On Wed, Nov 30, 2016 at 08:10:57PM +0800, Yunjian Wang wrote:
>>> When we meet an error(err=-EBADFD) recvmsg,
>> How do you get EBADFD? Won't vhost_net_rx_peek_head_len
>> return 0 in this case, breaking the loop?
> We started many guest VMs while attaching/detaching some virtio-net nics in a loop.
> A soft lockup might happen. The err is -EBADFD.

How did you do the attaching/detaching? AFAIK, -EBADFD can only happen
when you delete the tun device during vhost_net transmission.

>
> message log:
> kernel:[609608.510180]BUG: soft lockup - CPU#18 stuck for 23s! [vhost-60898:126093]
> call trace:
> []vhost_get_vq_desc+0x1e7/0x984 [vhost]
> []handle_rx+0x226/0x810 [vhost_net]
> []handle_rx_net+0x15/0x20 [vhost_net]
> []vhost_worker+0xfb/0x1e0 [vhost]
> []? vhost_dev_reset_owner+0x50/0x50 [vhost]
> []kthread+0xcf/0xe0
> []? kthread_create_on_node+0x140/0x140
> []ret_from_fork+0x58/0x90
> []? kthread_create_on_node+0x140/0x140
>
>>> the error handling in vhost
>>> handle_rx() will continue. This will cause a soft CPU lockup in vhost thread.
>>>
>>> Signed-off-by: Yunjian Wang
>>> ---
>>> drivers/vhost/net.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>> index 5dc128a..edc470b 100644
>>> --- a/drivers/vhost/net.c
>>> +++ b/drivers/vhost/net.c
>>> @@ -717,6 +717,9 @@ static void handle_rx(struct vhost_net *net)
>>> pr_debug("Discarded rx packet: "
>>> " len %d, expected %zd\n", err, sock_len);
>>> vhost_discard_vq_desc(vq, headcount);
>>> + /* Don't continue to do, when meet errors. */
>>> + if (err < 0)
>>> + goto out;
>> You might get e.g. EAGAIN and I think you need to retry
>> in this case.
>>
>>> continue;
>>> }
>>> /* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
>>> --
>>> 1.9.5.msysgit.1
>>>
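
For illustration only, here is a minimal user-space sketch of the control flow discussed in this thread: leave the receive loop on a persistent error such as -EBADFD, but retry on a transient one such as -EAGAIN. It is not the vhost_net code. The names fake_sock, fake_recvmsg, rx_stats and rx_loop are hypothetical, the scripted return values stand in for recvmsg results, and the fallback value for EBADFD is an assumption for systems whose errno.h does not define it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EBADFD
#define EBADFD 77 /* assumption: Linux value, for systems that lack it */
#endif

/* Hypothetical stand-in for a socket: replays scripted return values
 * (packet lengths, or negative errno codes). */
struct fake_sock {
	const int *results; /* scripted return values */
	size_t n_results;   /* number of scripted values, must be > 0 */
	size_t pos;         /* index of the next value to return */
};

/* Hypothetical stand-in for recvmsg. Once the script is exhausted it
 * keeps returning the last value, the way a removed backend would keep
 * failing with the same error. */
static int fake_recvmsg(struct fake_sock *s)
{
	size_t i;

	assert(s->n_results > 0);
	i = s->pos < s->n_results ? s->pos : s->n_results - 1;
	if (s->pos < s->n_results)
		s->pos++;
	return s->results[i];
}

struct rx_stats {
	int received;  /* packets delivered */
	int retried;   /* transient errors (-EAGAIN) that were retried */
	int calls;     /* total calls to fake_recvmsg */
	int fatal_err; /* first non-transient error, 0 if none */
};

/* Receive loop sketch: retry on -EAGAIN, stop on any other error.
 * The budget bounds the loop so that a persistent -EAGAIN cannot
 * spin forever either. */
static struct rx_stats rx_loop(struct fake_sock *s, int budget)
{
	struct rx_stats st = {0};

	assert(budget > 0);
	while (st.calls < budget) {
		int err = fake_recvmsg(s);

		st.calls++;
		if (err == -EAGAIN) {
			st.retried++;
			continue; /* transient: try again */
		}
		if (err < 0) {
			st.fatal_err = err;
			break; /* persistent: do not keep calling */
		}
		st.received++;
	}
	return st;
}
```

With a script of { 100, -EAGAIN, 200, -EBADFD }, rx_loop() delivers two packets, retries once, and stops at the fourth call. Without the break on err < 0, the loop would keep calling fake_recvmsg until the budget ran out, which is the user-space analogue of the soft lockup reported above.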