References: <20220805181105.GA29848@willie-the-truck> <20220807042408-mutt-send-email-mst@kernel.org>
In-Reply-To: <20220807042408-mutt-send-email-mst@kernel.org>
From: Jason Wang
Date: Tue, 9 Aug 2022 11:12:39 +0800
Subject: Re: IOTLB support for vhost/vsock breaks crosvm on Android
To: "Michael S. Tsirkin"
Cc: Will Deacon, Stefan Hajnoczi, Linus Torvalds, ascull@google.com, Marc Zyngier, Keir Fraser, jiyong@google.com, kernel-team@android.com, linux-kernel, virtualization, kvm
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Aug 7, 2022 at 9:14 PM Michael S. Tsirkin wrote:
>
> Will, thanks very much for the analysis and the writeup!
>
> On Fri, Aug 05, 2022 at 07:11:06PM +0100, Will Deacon wrote:
> > So how should we fix this? One possibility is for us to hack crosvm to
> > clear the VIRTIO_F_ACCESS_PLATFORM flag when setting the vhost features,
> > but others here have reasonably pointed out that they didn't expect a
> > kernel change to break userspace. On the flip side, the offending commit
> > in the kernel isn't exactly new (it's from the end of 2020!) and so it's
> > likely that others (e.g. QEMU) are using this feature.
>
> Exactly, that's the problem.
>
> vhost is reusing the virtio bits and it's only natural that
> what you are doing would happen.
>
> To be precise, this is what we expected people to do (and what QEMU does):
>
>
> #define QEMU_VHOST_FEATURES ((1 << VIRTIO_F_VERSION_1) |
>                              (1 << VIRTIO_NET_F_RX_MRG) | .... )
>
> VHOST_GET_FEATURES(... &host_features);
> host_features &= QEMU_VHOST_FEATURES
> VHOST_SET_FEATURES(host_features & guest_features)
>
>
> Here QEMU_VHOST_FEATURES are the bits userspace knows about.
>
> Our assumption was that whatever userspace enables, it
> knows what the effect on vhost is going to be.
>
> But yes, I understand absolutely how someone would instead just use the
> guest features.
> It is unfortunate that we did not catch this in time.
>
>
> In hindsight, we should have just created vhost level macros
> instead of reusing virtio ones. Would address the concern
> about naming: PLATFORM_ACCESS makes sense for the
> guest since there it means "whatever access rules platform has",
> but for vhost a better name would be VHOST_F_IOTLB.

Yes, in the original patch it is actually called VHOST_F_DEVICE_IOTLB, to
make it differ from virtio, like VHOST_F_LOG_ALL etc. And I remember I
tried to post a patch to avoid the bit duplication, but the conclusion
was that it's too late for the changes.

> We should have also taken greater pains to document what
> we expect userspace to do. I remember now how I thought about something
> like this but after coding this up in QEMU I forgot to document this :(
> Also, I suspect given the history the GET/SET features ioctls are burned
> wrt extending them and we have to use a new one when we add new features.
> All this we can do going forward.
>
>
> But what can we do about the specific issue?
> I am not 100% sure since as Will points out, QEMU and other
> userspace already rely on the current behaviour.
>
> Looking at QEMU specifically, it always sends some translations at
> startup, this in order to handle device rings.
>
> So, *maybe* we can get away with assuming that if no IOTLB ioctl was
> ever invoked then this userspace does not know about IOTLB and
> translation should ignore IOTLB completely.

I think this breaks the security assumptions of some setups.

>
> I am a bit nervous about breaking some *other* userspace which actually
> wants device to be blocked from accessing memory until IOTLB
> has been setup. If we get it wrong we are making guest
> and possibly even host vulnerable.

Yes.

> And of course just reverting is not an option either since there
> are now whole stacks depending on the feature.
>
> Will I'd like your input on whether you feel a hack in the kernel
> is justified here.
>
> Also yes, I think it's a good idea to change crosvm anyway. While the
> workaround I describe might make sense upstream I don't think it's a
> reasonable thing to do in stable kernels.

+1

Thanks

> I think I'll prepare a patch documenting the legal vhost features
> as a 1st step even though crosvm is Rust so it's not importing
> the header directly, right?
>
> --
> MST
>
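
For reference, the masking pattern quoted above (GET features, mask with the bits userspace knows about, then SET) can be sketched in plain C. This is only an illustration under stated assumptions: VMM_VHOST_FEATURES and vhost_features_to_set() are hypothetical names that appear in neither QEMU nor crosvm, and the actual VHOST_GET_FEATURES / VHOST_SET_FEATURES ioctl calls are left out so the snippet stands alone. The bit numbers are the ones from the virtio specification. Note the 1ULL: feature bits 32 and above do not fit in a plain int shift.

```c
#include <stdint.h>

/* Feature bit numbers as defined by the virtio specification. */
#define VIRTIO_NET_F_MRG_RXBUF   15
#define VIRTIO_F_VERSION_1       32
#define VIRTIO_F_ACCESS_PLATFORM 33

/*
 * Hypothetical allow-list: the only bits this (imaginary) VMM understands
 * on the vhost side. VIRTIO_F_ACCESS_PLATFORM is deliberately absent, so
 * vhost is never asked to enable the device IOTLB.
 */
#define VMM_VHOST_FEATURES \
    ((1ULL << VIRTIO_F_VERSION_1) | (1ULL << VIRTIO_NET_F_MRG_RXBUF))

/*
 * host_features:  what VHOST_GET_FEATURES would report.
 * guest_features: what the guest driver acknowledged.
 * Returns the value that would be passed to VHOST_SET_FEATURES.
 */
static uint64_t vhost_features_to_set(uint64_t host_features,
                                      uint64_t guest_features)
{
    return host_features & VMM_VHOST_FEATURES & guest_features;
}
```

With this mask, a guest that acknowledges VIRTIO_F_ACCESS_PLATFORM does not cause that bit to reach vhost. Passing the guest features through unmasked would, which is the situation described for crosvm in this thread.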