Date: Tue, 30 Jun 2020 09:32:18 -0700
From: Sean Christopherson
To: Vitaly Kuznetsov
Cc: Vivek Goyal, kvm@vger.kernel.org,
	virtio-fs@redhat.com, pbonzini@redhat.com,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] kvm,x86: Exit to user space in case of page fault error
Message-ID: <20200630163218.GF7733@linux.intel.com>
References: <20200626150303.GC195150@redhat.com>
 <874kqtd212.fsf@vitty.brq.redhat.com>
 <20200629220353.GC269627@redhat.com>
 <87sgecbs9w.fsf@vitty.brq.redhat.com>
 <20200630145303.GB322149@redhat.com>
 <87mu4kbn7x.fsf@vitty.brq.redhat.com>
 <20200630152529.GC322149@redhat.com>
 <87k0zobltx.fsf@vitty.brq.redhat.com>
 <20200630155028.GE7733@linux.intel.com>
 <87h7usbkhq.fsf@vitty.brq.redhat.com>
In-Reply-To: <87h7usbkhq.fsf@vitty.brq.redhat.com>

On Tue, Jun 30, 2020 at 06:12:49PM +0200, Vitaly Kuznetsov wrote:
> Sean Christopherson writes:
>
> > On Tue, Jun 30, 2020 at 05:43:54PM +0200, Vitaly Kuznetsov wrote:
> >> Vivek Goyal writes:
> >>
> >> > On Tue, Jun 30, 2020 at 05:13:54PM +0200, Vitaly Kuznetsov wrote:
> >> >>
> >> >> > - If you retry in the kernel, we completely change the context of
> >> >> >   who was trying to access the gfn in question. We want to retain
> >> >> >   the real context and the information about who was trying to
> >> >> >   access that gfn.
> >> >>
> >> >> (Just so I understand the idea better) does the guest context matter
> >> >> to the host? Or, more specifically, are we going to do anything
> >> >> besides get_user_pages() which will actually analyze who triggered
> >> >> the access *in the guest*?
> >> >
> >> > When we exit to user space, qemu prints a bunch of register state. I
> >> > am wondering what that state represents. Does some of it trace back
> >> > to the process which was trying to access that hva? I don't know.
> >>
> >> We can get the full CPU state when the fault happens if we need to,
> >> but generally we are not analyzing it. I can imagine looking at CPL,
> >> for example, but trying to distinguish the guest's 'process A' from
> >> 'process B' may not be simple.
> >>
> >> >
> >> > I think keeping a cache of error gfns might not be too bad from an
> >> > implementation point of view. I will give it a try and see how bad
> >> > it looks.
> >>
> >> Right; I'm only worried about the fact that every cache (or hash) has
> >> a limited size, and under certain circumstances we may overflow it.
> >> When an overflow happens, we will follow the APF path again, and this
> >> can repeat over and over. Maybe we can punch a hole in the EPT/NPT,
> >> making the PFN reserved/not-present, so that when the guest tries to
> >> access it again we trap the access in KVM and, if the error persists,
> >> don't follow the APF path?
> >
> > Just to make sure I'm somewhat keeping track, is the problem we're
> > trying to solve that the guest may not immediately retry the "bad" GPA,
> > and so KVM may not detect that the async #PF already came back as
> > -EFAULT or whatever?
>
> Yes. In Vivek's patch there's a single 'error_gfn' per vCPU which serves
> as an indicator of whether to follow the APF path or not.

A thought along the lines of your "punch a hole in the page tables" idea
would be to invalidate the SPTE (in the unlikely case it's present but
not writable) and tag it as being invalid for async #PF.  E.g. for !EPT,
there are 63 bits available for metadata.  For EPT, there's a measly 60,
assuming we want to avoid using SUPPRESS_VE.
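Something like this completely untested sketch; the SPTE_APF_DISALLOWED
name, the helpers, and the bit choice are all invented purely for
illustration (any SW-available bit in a !present SPTE would do, avoiding
SUPPRESS_VE for EPT):

/*
 * Untested sketch, names and bit choice made up.  Steal a SW-available
 * bit in a zapped (!present) SPTE to remember that faulting in the GFN
 * failed on the host side, so the fault paths can skip async #PF.
 */
#define SPTE_APF_DISALLOWED	BIT_ULL(60)

/* Zap the SPTE, keeping only the "don't do async #PF" hint. */
static void mark_spte_apf_disallowed(u64 *sptep)
{
	WRITE_ONCE(*sptep, SPTE_APF_DISALLOWED);
}

/* True if this GFN previously failed to fault in on the host side. */
static bool is_spte_apf_disallowed(u64 spte)
{
	return !is_shadow_present_pte(spte) && (spte & SPTE_APF_DISALLOWED);
}

That would sidestep the size/eviction problem of a dedicated cache since
the hint lives in the SPTE itself; if the SPTE is zapped for unrelated
reasons the hint is lost, but the worst case is one extra APF round trip.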
The fully !present case would be straightforward, but the !writable case
would require extra work, especially for shadow paging.

With the SPTE tagged, it'd "just" be a matter of hooking into the page
fault paths to detect the flag and disable async #PF.  For TDP that's not
too bad, e.g. pass a flag into fast_page_fault() and propagate it to
try_async_pf().  Not sure how to handle shadow paging; that code makes my
head hurt just looking at it.  It'd require tweaking
is_shadow_present_pte() to be more precise, but that's probably a good
thing, and peanuts compared to handling the faults.
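For the TDP side, the plumbing could look something like the below.
Again untested: the signature only approximates today's try_async_pf(),
the 'apf_disallowed' parameter is invented (the caller would compute it
from the old SPTE via the is_spte_apf_disallowed() sketch above), and the
async-#PF doublefault/halt handling is omitted to keep it short.

static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
			 gpa_t cr2_or_gpa, kvm_pfn_t *pfn, bool write,
			 bool *writable, bool apf_disallowed)
{
	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
	bool async = false;

	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, &async, write, writable);
	if (!async)
		return false; /* *pfn already holds the result */

	/* Skip the async path entirely if the GFN is tagged as bad. */
	if (!prefault && !apf_disallowed && kvm_can_do_async_pf(vcpu) &&
	    kvm_arch_setup_async_pf(vcpu, cr2_or_gpa, gfn))
		return true;

	/*
	 * Fault the page in synchronously so that a persistent error is
	 * seen in the current context and can be punted to userspace.
	 */
	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, NULL, write, writable);
	return false;
}

Shadow paging would presumably need the same check in its fault handler
once is_shadow_present_pte() can tell a tagged SPTE from a plain !present
one.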