Received: by 2002:a25:4158:0:0:0:0:0 with SMTP id o85csp4551405yba; Tue, 9 Apr 2019 23:10:31 -0700 (PDT) X-Google-Smtp-Source: APXvYqyYj6142GZ5wxLxscUYHpI8rUYtM0djps9xQsQXhtmy1XYz6CO+J5ihKeVxW/m8bDMpKRyy X-Received: by 2002:aa7:8092:: with SMTP id v18mr40500564pff.35.1554876631028; Tue, 09 Apr 2019 23:10:31 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1554876631; cv=none; d=google.com; s=arc-20160816; b=N5g3DHFgQ4R2lzcNWnnTi6ibME1A4cLPYn/HKWaRvI61dRFCcbY3MKfhc10nwFSjGU UG08YNqQCTuEL59Y+5jsfIvJScgNxfJerg/xzCF8/KN9v7Fgyf7v4XYaxXvbEsDqvM0l vlDRbghsUIYhygPvRFA9jS71TQRXw541jLzuF79FdVBgtYrEA62BN8IgoKikwjzAdyH9 Jy8HDVp6c/oWlHfYPQSMs3nSRPZu33VVKLdjx2G7CyZZPWTRNMPOyv+RjmcAr+b7JNBi vl6nQ/l0on+KwgfplrcYUAutRvj1gbL96IVZJekcWGjYBV+sy4XnGSZIKUhbNZ4bvPjn /bjQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature; bh=vBGEx1u4ETjzLctlghoHy+pgeY6OEoIag8vqsKRSLCI=; b=I/VYUmqaSYrgr5+t6W4xHentq0vpXgJpU8FE/rpI5X6+2ouakmu2W5Y4vHeeC5w4sg w2mrTkg/zbRNkyuUid4ESSInn4K+XNBDOsv9YUk1TN4wbPVbpdTluty7qUTBfykBN6AC mPF4VoFggxE6oIxstLOrMTWvk1IyR6NUxn8sPYRIuBWCMl69hj/B3EG4Dc+jguXZ4Uhg IpbhXzepkKuJ7WbuiGDrWHmJM7KK144gX0qcm8/0IMqQodOL/BFbkqGbsm7suYNB7IkL 6lZsMfj1ASXyj14CRquLyaAsibaCpcz72l1eb9rWIyAcdq3ByB1ho8IluEMAxL6YRk7I W2sQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2018-07-02 header.b=oJfsGKOK; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 3si32342670plx.386.2019.04.09.23.10.14; Tue, 09 Apr 2019 23:10:31 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@oracle.com header.s=corp-2018-07-02 header.b=oJfsGKOK; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=oracle.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727242AbfDJFvn (ORCPT + 99 others); Wed, 10 Apr 2019 01:51:43 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:33238 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726230AbfDJFvn (ORCPT ); Wed, 10 Apr 2019 01:51:43 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x3A5iaFV045325; Wed, 10 Apr 2019 05:50:46 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : to : cc : references : from : message-id : date : mime-version : in-reply-to : content-type : content-transfer-encoding; s=corp-2018-07-02; bh=vBGEx1u4ETjzLctlghoHy+pgeY6OEoIag8vqsKRSLCI=; b=oJfsGKOKw5eZF6yFQGebwkDVOtg694sZmAnkSWdRfjoJ68GK9y2BYAdDJ7/o7OME/Ohu EtdEfGmlPlGmQdQvmX6NlVkouUPiEhmQRiZgh8LQnUPyyNcIQnpZTaRydcVO4sfAgpqc 2e207HuNdcAaiOyYXNQEijZ5CW5/zZR059Y5LSCE85zIszt6hdrp5uJpQggCIGfeBLzZ sXemhv1Bi2X1kKYF3CULBIDLxCtL40uMmsfS4QeWpLidygb4FTbZsp6q0nIPJVBgfPhi pCleZ/NLwkuOrdt0ddxJHyC/ZsKFCXJc6VlAE6k8IoG0Q8hH8TJKFiDQkBl+gKMP8MPw 6g== Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70]) by userp2130.oracle.com with ESMTP id 2rpkht0xa0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 10 Apr 2019 05:50:45 +0000 Received: from pps.filterd (aserp3020.oracle.com [127.0.0.1]) by aserp3020.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x3A5ngbV042641; Wed, 10 Apr 2019 05:50:45 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by aserp3020.oracle.com with ESMTP id 2rpytc1r6w-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 10 Apr 2019 05:50:44 +0000 Received: from abhmp0002.oracle.com (abhmp0002.oracle.com [141.146.116.8]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id x3A5oeeJ022577; Wed, 10 Apr 2019 05:50:41 GMT Received: from [10.159.253.119] (/10.159.253.119) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 09 Apr 2019 22:50:40 -0700 Subject: Re: [Xen-devel] [PATCH RFC 00/39] x86/KVM: Xen HVM guest support To: Stefano Stabellini , Joao Martins Cc: Juergen Gross , Artem_Mygaiev@epam.com, xen-devel@lists.xenproject.org, kvm@vger.kernel.org, =?UTF-8?B?UmFkaW0gS3LEjW3DocWZ?= , x86@kernel.org, linux-kernel@vger.kernel.org, Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Paolo Bonzini , Boris Ostrovsky , Thomas Gleixner References: <20190220201609.28290-1-joao.m.martins@oracle.com> <35051310-c497-8ad5-4434-1b8426a317d2@redhat.com> <8b1f4912-4f92-69ae-ae01-d899d5640572@oracle.com> <3ee91f33-2973-c2db-386f-afbf138081b4@redhat.com> <59676804-786d-3df8-7752-8e45dec6d65b@oracle.com> <94738323-ebdf-d58e-55b6-313e27c923b0@oracle.com> <585163c2-8dea-728d-7556-9cb3559f0eca@suse.com> <97808492-58ee-337f-c894-900b34b7b1a5@oracle.com> From: Ankur Arora Message-ID: <8300d893-40a3-2b5b-e510-cd5955c72670@oracle.com> Date: Tue, 9 Apr 2019 22:50:33 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9222 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904100042 X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9222 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1011 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1904100042 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2019-04-08 5:35 p.m., Stefano Stabellini wrote: > On Mon, 8 Apr 2019, Joao Martins wrote: >> On 4/8/19 11:42 AM, Juergen Gross wrote: >>> On 08/04/2019 12:36, Joao Martins wrote: >>>> On 4/8/19 7:44 AM, Juergen Gross wrote: >>>>> On 12/03/2019 18:14, Joao Martins wrote: >>>>>> On 2/22/19 4:59 PM, Paolo Bonzini wrote: >>>>>>> On 21/02/19 12:45, Joao Martins wrote: >>>>>>>> On 2/20/19 9:09 PM, Paolo Bonzini wrote: >>>>>>>>> On 20/02/19 21:15, Joao Martins wrote: >>>>>>>>>> 2. PV Driver support (patches 17 - 39) >>>>>>>>>> >>>>>>>>>> We start by redirecting hypercalls from the backend to routines >>>>>>>>>> which emulate the behaviour that PV backends expect i.e. grant >>>>>>>>>> table and interdomain events. Next, we add support for late >>>>>>>>>> initialization of xenbus, followed by implementing >>>>>>>>>> frontend/backend communication mechanisms (i.e. grant tables and >>>>>>>>>> interdomain event channels). Finally, introduce xen-shim.ko, >>>>>>>>>> which will setup a limited Xen environment. This uses the added >>>>>>>>>> functionality of Xen specific shared memory (grant tables) and >>>>>>>>>> notifications (event channels). >>>>>>>>> >>>>>>>>> I am a bit worried by the last patches, they seem really brittle and >>>>>>>>> prone to breakage. I don't know Xen well enough to understand if the >>>>>>>>> lack of support for GNTMAP_host_map is fixable, but if not, you have to >>>>>>>>> define a completely different hypercall. >>>>>>>>> >>>>>>>> I guess Ankur already answered this; so just to stack this on top of his comment. >>>>>>>> >>>>>>>> The xen_shim_domain() is only meant to handle the case where the backend >>>>>>>> has/can-have full access to guest memory [i.e. netback and blkback would work >>>>>>>> with similar assumptions as vhost?]. For the normal case, where a backend *in a >>>>>>>> guest* maps and unmaps other guest memory, this is not applicable and these >>>>>>>> changes don't affect that case. >>>>>>>> >>>>>>>> IOW, the PV backend here sits on the hypervisor, and the hypercalls aren't >>>>>>>> actual hypercalls but rather invoking shim_hypercall(). The call chain would go >>>>>>>> more or less like: >>>>>>>> >>>>>>>> >>>>>>>> gnttab_map_refs(map_ops, pages) >>>>>>>> HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,...) >>>>>>>> shim_hypercall() >>>>>>>> shim_hcall_gntmap() >>>>>>>> >>>>>>>> Our reasoning was that given we are already in KVM, why mapping a page if the >>>>>>>> user (i.e. the kernel PV backend) is himself? The lack of GNTMAP_host_map is how >>>>>>>> the shim determines its user doesn't want to map the page. Also, there's another >>>>>>>> issue where PV backends always need a struct page to reference the device >>>>>>>> inflight data as Ankur pointed out. >>>>>>> >>>>>>> Ultimately it's up to the Xen people. It does make their API uglier, >>>>>>> especially the in/out change for the parameter. If you can at least >>>>>>> avoid that, it would alleviate my concerns quite a bit. >>>>>> >>>>>> In my view, we have two options overall: >>>>>> >>>>>> 1) Make it explicit, the changes the PV drivers we have to make in >>>>>> order to support xen_shim_domain(). This could mean e.g. a) add a callback >>>>>> argument to gnttab_map_refs() that is invoked for every page that gets looked up >>>>>> successfully, and inside this callback the PV driver may update it's tracking >>>>>> page. Here we no longer have this in/out parameter in gnttab_map_refs, and all >>>>>> shim_domain specific bits would be a little more abstracted from Xen PV >>>>>> backends. See netback example below the scissors mark. Or b) have sort of a >>>>>> translate_gref() and put_gref() API that Xen PV drivers use which make it even >>>>>> more explicit that there's no grant ops involved. The latter is more invasive. >>>>>> >>>>>> 2) The second option is to support guest grant mapping/unmapping [*] to allow >>>>>> hosting PV backends inside the guest. This would remove the Xen changes in this >>>>>> series completely. But it would require another guest being used >>>>>> as netback/blkback/xenstored, and less performance than 1) (though, in theory, >>>>>> it would be equivalent to what does Xen with grants/events). The only changes in >>>>>> Linux Xen code is adding xenstored domain support, but that is useful on its own >>>>>> outside the scope of this work. >>>>>> >>>>>> I think there's value on both; 1) is probably more familiar for KVM users >>>>>> perhaps (as it is similar to what vhost does?) while 2) equates to implementing >>>>>> Xen disagregation capabilities in KVM. >>>>>> >>>>>> Thoughts? Xen maintainers what's your take on this? >>>>> >>>>> What I'd like best would be a new handle (e.g. xenhost_t *) used as an >>>>> abstraction layer for this kind of stuff. It should be passed to the >>>>> backends and those would pass it on to low-level Xen drivers (xenbus, >>>>> event channels, grant table, ...). >>>>> >>>> So if IIRC backends would use the xenhost layer to access grants or frames >>>> referenced by grants, and that would grok into some of this. IOW, you would have >>>> two implementors of xenhost: one for nested remote/local events+grants and >>>> another for this "shim domain" ? >>> >>> As I'd need that for nested Xen I guess that would make it 3 variants. >>> Probably the xen-shim variant would need more hooks, but that should be >>> no problem. >>> >> I probably messed up in the short description but "nested remote/local >> events+grants" was referring to nested Xen (FWIW remote meant L0 and local L1). >> So maybe only 2 variants are needed? >> >>>>> I was planning to do that (the xenhost_t * stuff) soon in order to add >>>>> support for nested Xen using PV devices (you need two Xenstores for that >>>>> as the nested dom0 is acting as Xen backend server, while using PV >>>>> frontends for accessing the "real" world outside). >>>>> >>>>> The xenhost_t should be used for: >>>>> >>>>> - accessing Xenstore >>>>> - issuing and receiving events >>>>> - doing hypercalls >>>>> - grant table operations >>>>> >>>> >>>> In the text above, I sort of suggested a slice of this on 1.b) with a >>>> translate_gref() and put_gref() API -- to get the page from a gref. This was >>>> because of the flags|host_addr hurdle we depicted above wrt to using using grant >>>> maps/unmaps. You think some of the xenhost layer would be ammenable to support >>>> this case? >>> >>> I think so, yes. >>> >>>> >>>>> So exactly the kind of stuff you want to do, too. >>>>> >>>> Cool idea! >>> >>> In the end you might make my life easier for nested Xen. :-) >>> >> Hehe :) >> >>> Do you want to have a try with that idea or should I do that? I might be >>> able to start working on that in about a month. >>> >> Ankur (CC'ed) will give a shot at it, and should start a new thread on this >> xenhost abstraction layer. > > If you are up for it, it would be great to write some documentation too. > We are starting to have decent docs for some PV protocols, describing a > specific PV interface, but we are lacking docs about the basic building > blocks to bring up PV drivers in general. They would be extremely Agreed. These would be useful. > useful. Given that you have just done the work, you are in a great > position to write those docs. Even bad English would be fine, I am sure > somebody else could volunteer to clean-up the language. Anything would > help :-) Can't make any promises on this yet but I will see if I can convert notes I made into something useful for the community. Ankur > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xenproject.org > https://lists.xenproject.org/mailman/listinfo/xen-devel >