Received: by 2002:a05:6358:e9c4:b0:b2:91dc:71ab with SMTP id hc4csp4867824rwb; Mon, 8 Aug 2022 08:15:46 -0700 (PDT) X-Google-Smtp-Source: AA6agR68/5w8u0cKGpNSKZLgGOAP/QKxWOseeioo4UYmsF+PCUTRsIlgu3Fd6UTy+DgFgWQqWnNC X-Received: by 2002:a05:6402:3552:b0:43d:a57d:22e9 with SMTP id f18-20020a056402355200b0043da57d22e9mr18116575edd.119.1659971746565; Mon, 08 Aug 2022 08:15:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1659971746; cv=none; d=google.com; s=arc-20160816; b=bM6CYVc9L1lkQXzju5Nvsk23gUqiEkXaP5yw8L+zVMV7Ka1zECx42Ypgu43ZbCLPHN k2a+F5SPxe6np/3a7rMCdGjRcNB+sSuMwgbDORlQvmejSviAFakmOOqs8jG0bzocPhlu 0t3ZVMI6O+uGrEF0lObYSGl+fzFk6IsNjHa0SrjVeuIV1wX4hSkjTbKlqu+ej0gXhUWJ ddxKexu7myNCqDk0n8X5Dp/Y0Uq/gI5ddfH2nd+knh2RtS8+tizK03XZwrPK7pdKOqkX +V5zDxrMmm/6XO2Bib/y26WtMgjsQd4xQiKGgtpgMGaFoZ4ze03i+cqkdxqeIn/OjX5a lPGg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=/1TLUgVlg2OyYCJ+DZaddQK6OMJhjwa3jOmWOJ1SScI=; b=oR+TiTaDRRjqkwF59Y2Wgui66jGSQTEP4KEpcGRKcbInvOTc07sPa6vOzdPfLS/bHy TRSchMTzI3qQV2AjsQhwZPL4JGDwLr3FKqlYQxQgO/zX/q3/NZdgjNmFTsPRalEBnQ+s SjjQCR1FPhhbNLjetNxAYARdsnIVz7MHlnNWDWL+V1Z5oMPwEc2hG/VHfBe/v0dR759y 7sczuGwTJMiLLvgIJ/vM2hf+ZuHVuA5yhhEB7gmO93Avg/3CrMdlZNM+a7YC3OuwBYFy FnX1bo3A4j5Ubmmq1wu+cVoNHISExpYMODzzdrWKc2F2ZKRFw3JgISbp4qal95jGgb/N OSNA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=CzBnDYHg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id dd20-20020a1709069b9400b00730a159b6f6si11500424ejc.790.2022.08.08.08.15.20; Mon, 08 Aug 2022 08:15:46 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=CzBnDYHg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243423AbiHHOdX (ORCPT + 99 others); Mon, 8 Aug 2022 10:33:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58872 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231475AbiHHOdW (ORCPT ); Mon, 8 Aug 2022 10:33:22 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id E751C11803 for ; Mon, 8 Aug 2022 07:33:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1659969199; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=/1TLUgVlg2OyYCJ+DZaddQK6OMJhjwa3jOmWOJ1SScI=; b=CzBnDYHg00fkiP2nI9ZM8nxuQ2k5bQifO1OU38JZukIHNr7dT8JPl3Nqt3vrQHkbdX89pM JNkrB6Fgj2owiaHm967HNtHgqwU/FG/9hz525Wc7ZGdlt46RCpZx/DmKZM5hjf3IkkKMfU c1VYEC/XLufh/AeV0XkQ3L8Pqt8x8y4= Received: from mail-lf1-f70.google.com (mail-lf1-f70.google.com [209.85.167.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-551-kvogdGC4Oie-jdDllxR_fQ-1; Mon, 08 Aug 2022 10:33:18 -0400 X-MC-Unique: kvogdGC4Oie-jdDllxR_fQ-1 Received: by mail-lf1-f70.google.com with SMTP id y6-20020a196406000000b0048ae6ac68adso2218480lfb.21 for ; Mon, 08 Aug 2022 07:33:18 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc; bh=/1TLUgVlg2OyYCJ+DZaddQK6OMJhjwa3jOmWOJ1SScI=; b=LKkmVMTtNopRWg7YCb4ALEe0MMCAOLklr7F3NSAsfmoIi6TDHf6LFrtalgBQ8LF1Ze GPqvIlEIBb2sc5morA3xMfY1PvJQLrOJwV4D8JrYjopTpdnjU25wM/s3cO+EahvQpHWB Ri9hE8qjOKlrDpO1kYva2BIrpLlfLn7AdnreUupnGsIQtaNtXyXWouMDISwjZuEiw5xu cr0SSi9PKAnQMJKqEtgxuQbYi/dTN+MjwquXi8ShF8iPAVhrYijfIFUSuwD8iKYvsudF IxfI/1VUfqYC9WvwKj1Akdzh7GOcpQ5NhiY8q3efm+l+l8PlGSL1AEj3i8bZ8OfcgQq6 6uzQ== X-Gm-Message-State: ACgBeo3PkSQzuGWyn1VD4F60qZ10NQZkhGhIk6oCwxx1goWEOeZ2FWNK iomCCZcNRKusQ9BFjb9CkNWnA8sfi2yHOnPfDfk8G3OFWoq9yfu+OltpvdNxWRsXouOzBNhYpFk SqfnOGl0sqHCVgH/0Iq10AiFx0gWt9E9LAHRwLezI X-Received: by 2002:a05:6512:68a:b0:48b:9d3d:b19b with SMTP id t10-20020a056512068a00b0048b9d3db19bmr3474985lfe.174.1659969196984; Mon, 08 Aug 2022 07:33:16 -0700 (PDT) X-Received: by 2002:a05:6512:68a:b0:48b:9d3d:b19b with SMTP id t10-20020a056512068a00b0048b9d3db19bmr3474977lfe.174.1659969196714; Mon, 08 Aug 2022 07:33:16 -0700 (PDT) MIME-Version: 1.0 References: <20220805181105.GA29848@willie-the-truck> <20220807042408-mutt-send-email-mst@kernel.org> <20220808101850.GA31984@willie-the-truck> <20220808083958-mutt-send-email-mst@kernel.org> In-Reply-To: <20220808083958-mutt-send-email-mst@kernel.org> From: Stefan Hajnoczi Date: Mon, 8 Aug 2022 10:33:05 -0400 Message-ID: Subject: Re: IOTLB support for vhost/vsock breaks crosvm on Android To: "Michael S. Tsirkin" Cc: Will Deacon , Stefan Hajnoczi , Jason Wang , torvalds@linux-foundation.org, ascull@google.com, maz@kernel.org, keirf@google.com, jiyong@google.com, kernel-team@android.com, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, kvm@vger.kernel.org, crosvm-dev@chromium.org Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Aug 8, 2022 at 8:46 AM Michael S. Tsirkin wrote: > > On Mon, Aug 08, 2022 at 11:18:50AM +0100, Will Deacon wrote: > > Hi Michael, > > > > On Sun, Aug 07, 2022 at 09:14:43AM -0400, Michael S. Tsirkin wrote: > > > Will, thanks very much for the analysis and the writeup! > > > > No problem, and thanks for following up. > > > > > On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote: > > > > So how should we fix this? One possibility is for us to hack crosvm to > > > > clear the VIRTIO_F_ACCESS_PLATFORM flag when setting the vhost features, > > > > but others here have reasonably pointed out that they didn't expect a > > > > kernel change to break userspace. On the flip side, the offending commit > > > > in the kernel isn't exactly new (it's from the end of 2020!) and so it's > > > > likely that others (e.g. QEMU) are using this feature. > > > > > > Exactly, that's the problem. > > > > > > vhost is reusing the virtio bits and it's only natural that > > > what you are doing would happen. > > > > > > To be precise, this is what we expected people to do (and what QEMU does): > > > > > > > > > #define QEMU_VHOST_FEATURES ((1 << VIRTIO_F_VERSION_1) | > > > (1 << VIRTIO_NET_F_RX_MRG) | .... ) > > > > > > VHOST_GET_FEATURES(... &host_features); > > > host_features &= QEMU_VHOST_FEATURES > > > VHOST_SET_FEATURES(host_features & guest_features) > > > > > > > > > Here QEMU_VHOST_FEATURES are the bits userspace knows about. > > > > > > Our assumption was that whatever userspace enables, it > > > knows what the effect on vhost is going to be. > > > > > > But yes, I understand absolutely how someone would instead just use the > > > guest features. It is unfortunate that we did not catch this in time. > > > > > > > > > In hindsight, we should have just created vhost level macros > > > instead of reusing virtio ones. Would address the concern > > > about naming: PLATFORM_ACCESS makes sense for the > > > guest since there it means "whatever access rules platform has", > > > but for vhost a better name would be VHOST_F_IOTLB. > > > We should have also taken greater pains to document what > > > we expect userspace to do. I remember now how I thought about something > > > like this but after coding this up in QEMU I forgot to document this :( > > > Also, I suspect given the history the GET/SET features ioctl and burned > > > wrt extending it and we have to use a new when we add new features. > > > All this we can do going forward. > > > > Makes sense. The crosvm developers are also pretty friendly in my > > experience, so I'm sure they wouldn't mind being involved in discussions > > around any future ABI extensions. Just be aware that they _very_ recently > > moved their mailing lists, so I think it lives here now: > > > > https://groups.google.com/a/chromium.org/g/crosvm-dev > > > > > But what can we do about the specific issue? > > > I am not 100% sure since as Will points out, QEMU and other > > > userspace already rely on the current behaviour. > > > > > > Looking at QEMU specifically, it always sends some translations at > > > startup, this in order to handle device rings. > > > > > > So, *maybe* we can get away with assuming that if no IOTLB ioctl was > > > ever invoked then this userspace does not know about IOTLB and > > > translation should ignore IOTLB completely. > > > > There was a similar suggestion from Stefano: > > > > https://lore.kernel.org/r/20220806105225.crkui6nw53kbm5ge@sgarzare-redhat > > > > about spotting the backend ioctl for IOTLB and using that to enable > > the negotiation of F_ACCESS_PLATFORM. Would that work for qemu? > > Hmm I would worry that this disables the feature for old QEMU :( > > > > > I am a bit nervous about breaking some *other* userspace which actually > > > wants device to be blocked from accessing memory until IOTLB > > > has been setup. If we get it wrong we are making guest > > > and possibly even host vulnerable. > > > And of course just revering is not an option either since there > > > are now whole stacks depending on the feature. > > > > Absolutely, I'm not seriously suggesting the revert. I just did it locally > > to confirm the issue I was seeing. > > > > > Will I'd like your input on whether you feel a hack in the kernel > > > is justified here. > > > > If we can come up with something that we have confidence in and won't be a > > pig to maintain, then I think we should do it, but otherwise we can go ahead > > and change crosvm to mask out this feature flag on the vhost side for now. > > We mainly wanted to raise the issue to illustrate that this flag continues > > to attract problems in the hope that it might inform further usage and/or > > spec work in this area. > > > > In any case, I'm happy to test any kernel patches with our setup if you > > want to give it a shot. > > Thanks! > I'm a bit concerned that the trick I proposed changes the configuration > where iotlb was not set up from "access to memory not allowed" to > "access to all memory allowed". This just might have security > implications if some application assumed the former. > And the one Stefano proposed disables IOTLB for old QEMU versions. Adding hacks to vhost in order to work around userspace applications that misunderstand the vhost model seems like a it will lead to problems. Userspace applications need to follow the vhost model: vhost is designed for virtqueue passthrough, but the rest of the vhost interface is not suitable for pass through. It's similar to how VFIO PCI passthrough needs to do a significant amount of stuff in userspace to emulate a PCI configuration space and it won't work properly if you pass through the physical PCI device's PCI configuration space. The emulator has to mediate between the guest device and vhost device because it still emulates the VIRTIO transport, configuration space, device lifecycle, etc even when all virtqueues are passed through. Let's document this for vhost and vDPA because it is not obvious. Stefan