Subject: Re: [RFC] [PATCH v2 0/5] Intel Virtual PMU Optimization
To: Andi Kleen, Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, like.xu@intel.com, wei.w.wang@intel.com, Kan Liang, Ingo Molnar, Paolo Bonzini, Thomas Gleixner
References: <1553350688-39627-1-git-send-email-like.xu@linux.intel.com> <20190323172800.GD6058@hirez.programming.kicks-ass.net> <20190323231543.GE18020@tassilo.jf.intel.com>
From: Like Xu
Organization: Intel OTC
Message-ID: <0559a810-c0e6-16f6-fa15-7baec707a2ad@linux.intel.com>
Date: Mon, 25 Mar 2019 14:07:17 +0800
User-Agent: Mozilla/5.0 (Windows
 NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.6.0
In-Reply-To: <20190323231543.GE18020@tassilo.jf.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2019/3/24 7:15, Andi Kleen wrote:
>>> We optimize the current vPMU to work in this manner:
>>>
>>> (1) rely on the existing host perf (perf_event_create_kernel_counter)
>>> to allocate counters for in-use vPMC and always try to reuse events;
>>> (2) vPMU captures guest accesses to the eventsel and fixctrl msr directly
>>> to the hardware msr that the corresponding host event is scheduled on
>>> and avoid pollution from host is also needed in its partial runtime;
>>
>> If you do pass-through; how do you deal with event constraints?
>
> The guest has to deal with them. It already needs to know
> the model number to program the right events, can as well know
> the constraints too.
>
> For architectural events that don't need the model number it's
> not a problem because they don't have constraints.
>
> -Andi
>

I agree that this version deliberately does not keep an eye on host perf
event constraints:

1. Based on my limited knowledge, I assume the model number means hwc->idx.

2. The guest event constraints would be encoded into the hwc->config_base
   value, which is pmc->eventsel and pmu->fixed_ctr_ctrl from KVM's point
   of view.

3. The guest PMU has the same semantic model for virtual hardware
   limitations as the host has for the real PMU (the related
   CPUID/PERF_MSR expose this information to the guest).

4. The guest perf scheduler would make sure the guest event constraints
   fit the right guest model number.

5. The vPMU would make sure the guest vPMC gets the right guest model
   number by hard-coding EVENT_PINNED, or else just fail at creation.

6. This patch directly applies the guest hwc->config_base value to the
   host-assigned hardware without consent from host perf (a bit deceptive,
   but practical for reducing the number of reprogram calls).

=== OR ====

If we insist on passing guest event constraints to host perf, this
proposal may need the following changes:

Because the guest configuration of hwc->config_base mostly only toggles
the enable bit of eventsel or fixctrl, it is not necessary to do
reprogram_counter, since it is still serving the same guest perf event.

Event creation is only needed when the guest writes a completely new
value to eventsel or fixctrl. The code for the guest MSR_P6_EVNTSEL0 trap,
for example, may be modified to look like this:

	/* bits that differ between the current and the newly written value */
	u64 diff = pmc->eventsel ^ data;

	/* anything other than a lone enable-bit toggle: save and stop first */
	if (intel_pmc_is_assigned(pmc)
	    && diff != ARCH_PERFMON_EVENTSEL_ENABLE) {
		intel_pmu_save_guest_pmc(pmu, pmc->idx);
		intel_pmc_stop_counter(pmc);
	}
	reprogram_gp_counter(pmc, data);

Does this seem to satisfy our needs?
Please correct me if I'm wrong; that would make everything easier.
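
To make the XOR check above easier to reason about outside the kernel, here is a minimal user-space sketch. It assumes the enable bit is bit 22 of the event select register, as in the kernel's ARCH_PERFMON_EVENTSEL_ENABLE. Both helper names are hypothetical and not part of the patch. The second helper is an alternative masking variant shown only for comparison, not something the proposal uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's ARCH_PERFMON_EVENTSEL_ENABLE (bit 22). */
#define EVENTSEL_ENABLE (1ULL << 22)

/*
 * Hypothetical helper mirroring the comparison in the snippet:
 * returns true unless the write flips exactly the enable bit and
 * nothing else. Note that an unchanged value (diff == 0) also
 * returns true here.
 */
static bool eventsel_not_pure_toggle(uint64_t old_val, uint64_t new_val)
{
	uint64_t diff = old_val ^ new_val; /* set bits mark changed positions */

	return diff != EVENTSEL_ENABLE;
}

/*
 * Hypothetical alternative: ignore the enable bit entirely and report
 * whether any other configuration bit changed. An unchanged value
 * returns false here.
 */
static bool eventsel_config_changed(uint64_t old_val, uint64_t new_val)
{
	return ((old_val ^ new_val) & ~EVENTSEL_ENABLE) != 0;
}
```

The two helpers agree on a pure enable toggle and on a real event change. They differ only when the guest rewrites the same value: the direct comparison treats that as a change, while the masked variant does not. Which behaviour is wanted is a question for the patch itself.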