Date: Tue, 21 Jan 2020 09:31:42 -0500
From: "Michael S. Tsirkin"
To: Stefan Hajnoczi
Cc: Stefano Garzarella, David Miller, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Jorgen Hansen, Jason Wang, kvm, virtualization@lists.linux-foundation.org, linux-hyperv@vger.kernel.org, Dexuan Cui
Subject: Re: [PATCH net-next 1/3] vsock: add network namespace support
Message-ID: <20200121093104-mutt-send-email-mst@kernel.org>
References: <20200116172428.311437-2-sgarzare@redhat.com>
 <20200120.100610.546818167633238909.davem@davemloft.net>
 <20200120101735.uyh4o64gb4njakw5@steredhat>
 <20200120060601-mutt-send-email-mst@kernel.org>
 <20200120110319-mutt-send-email-mst@kernel.org>
 <20200120170120-mutt-send-email-mst@kernel.org>
 <20200121135907.GA641751@stefanha-x1.localdomain>
In-Reply-To: <20200121135907.GA641751@stefanha-x1.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 21, 2020 at 01:59:07PM +0000, Stefan Hajnoczi wrote:
> On Tue, Jan 21, 2020 at 10:07:06AM +0100, Stefano Garzarella wrote:
> > On Mon, Jan 20, 2020 at 11:02 PM Michael S. Tsirkin wrote:
> > > On Mon, Jan 20, 2020 at 05:53:39PM +0100, Stefano Garzarella wrote:
> > > > On Mon, Jan 20, 2020 at 5:04 PM Michael S. Tsirkin wrote:
> > > > > On Mon, Jan 20, 2020 at 02:58:01PM +0100, Stefano Garzarella wrote:
> > > > > > On Mon, Jan 20, 2020 at 1:03 PM Michael S. Tsirkin wrote:
> > > > > > > On Mon, Jan 20, 2020 at 11:17:35AM +0100, Stefano Garzarella wrote:
> > > > > > > > On Mon, Jan 20, 2020 at 10:06:10AM +0100, David Miller wrote:
> > > > > > > > > From: Stefano Garzarella
> > > > > > > > > Date: Thu, 16 Jan 2020 18:24:26 +0100
> > > > > > > > >
> > > > > > > > > > This patch adds 'netns' module param to enable this new feature
> > > > > > > > > > (disabled by default), because it changes vsock's behavior with
> > > > > > > > > > network namespaces and could break existing applications.
> > > > > > > > >
> > > > > > > > > Sorry, no.
> > > > > > > > >
> > > > > > > > > I wonder if you can even design a legitimate, reasonable, use case
> > > > > > > > > where these netns changes could break things.
> > > > > > > >
> > > > > > > > I forgot to mention the use case.
> > > > > > > > I tried the RFC with Kata containers and we found that Kata shim-v1
> > > > > > > > doesn't work (Kata shim-v2 works as is) because there are the following
> > > > > > > > processes involved:
> > > > > > > > - kata-runtime (runs in the init_netns) opens /dev/vhost-vsock and
> > > > > > > >   passes it to qemu
> > > > > > > > - kata-shim (runs in a container) wants to talk with the guest but the
> > > > > > > >   vsock device is assigned to the init_netns and kata-shim runs in a
> > > > > > > >   different netns, so the communication is not allowed
> > > > > > > > But, as you said, this could be a wrong design, indeed they already
> > > > > > > > found a fix, but I was not sure if others could have the same issue.
> > > > > > > >
> > > > > > > > In this case, do you think it is acceptable to make this change in
> > > > > > > > the vsock's behavior with netns and ask the user to change the design?
> > > > > > > >
> > > > > > > David's question is what would be a usecase that's broken
> > > > > > > (as opposed to fixed) by enabling this by default.
> > > > > > >
> > > > > > Yes, I got that. Thanks for clarifying.
> > > > > > I just reported a broken example that can be fixed with a different
> > > > > > design (due to the fact that before this series, vsock devices were
> > > > > > accessible to all netns).
> > > > > >
> > > > > > >
> > > > > > > If it does exist, you need a way for userspace to opt-in,
> > > > > > > module parameter isn't that.
> > > > > >
> > > > > > Okay, but I honestly can't find a case that can't be solved.
> > > > > > So I don't know whether to add an option (ioctl, sysfs ?) or wait for
> > > > > > a real case to come up.
> > > > > >
> > > > > > I'll try to see better if there's any particular case where we need
> > > > > > to disable netns in vsock.
> > > > > >
> > > > > > Thanks,
> > > > > > Stefano
> > > > > >
> > > > > Me neither. so what did you have in mind when you wrote:
> > > > > "could break existing applications"?
> > > > >
> > > > I had in mind:
> > > > 1. the Kata case. It is fixable (the fix is not merged on kata), but
> > > >    older versions will not work with newer Linux.
> > > >
> > > meaning they will keep not working, right?
> > >
> > Right, I mean without this series they work, with this series they work
> > only if the netns support is disabled or with a patch proposed but not
> > merged in kata.
> >
> > >
> > > > 2. a single process running on init_netns that wants to communicate with
> > > >    VMs handled by VMMs running in different netns, but this case can be
> > > >    solved opening the /dev/vhost-vsock in the same netns of the process
> > > >    that wants to communicate with the VMs (init_netns in this case), and
> > > >    passing it to the VMM.
> > > >
> > > again right now they just don't work, right?
> > >
> > Right, as above.
> >
> > What do you recommend I do?
>
> Existing userspace applications must continue to work.
>
> Guests are fine because G2H transports are always in the initial network
> namespace.
>
> On the host side we have a real case where Kata Containers and other
> vsock users break.
> Existing applications run in other network
> namespaces and assume they can communicate over vsock (it's only
> available in the initial network namespace by default).
>
> It seems we cannot isolate new network namespaces from the initial
> network namespace by default because it will break existing
> applications. That's a bummer.
>
> There is one solution that maintains compatibility:
>
> Introduce a per-namespace vsock isolation flag that can only transition
> from false to true. Once it becomes true it cannot be reset to false
> anymore (for security).
>
> When vsock isolation is false the initial network namespace is used for
> addressing.
>
> When vsock isolation is true the current namespace is used for
> addressing.
>
> I guess the vsock isolation flag would be set via a rtnetlink message,
> but I haven't checked.
>
> The upshot is: existing software doesn't benefit from namespaces for
> vsock isolation but it continues to work! New software makes 1 special
> call after creating the namespace to opt in to vsock isolation.
>
> This approach is secure because whoever sets up namespaces can
> transition the flag from false to true and know that it can never be
> reset to false anymore.
>
> Does this make sense to everyone?
>
> Stefan

Anything wrong with a separate device?
whoever opens it decides whether netns will work ...

-- 
MST