Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752237AbdI0Woz (ORCPT ); Wed, 27 Sep 2017 18:44:55 -0400 Received: from aserp1040.oracle.com ([141.146.126.69]:29849 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751970AbdI0Woy (ORCPT ); Wed, 27 Sep 2017 18:44:54 -0400 Subject: Re: [PATCH v4 2/3] x86/xen/time: setup vcpu 0 time info page To: Joao Martins References: <20170927134623.3147-1-joao.m.martins@oracle.com> <20170927134623.3147-3-joao.m.martins@oracle.com> <0f58594c-ee42-7813-a0c2-1cbbc1e2a576@oracle.com> <2fb49000-5f07-8cb9-f16d-1701c36b6c49@oracle.com> Cc: linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , x86@kernel.org, Juergen Gross , Andy Lutomirski From: Boris Ostrovsky Message-ID: Date: Wed, 27 Sep 2017 18:44:21 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <2fb49000-5f07-8cb9-f16d-1701c36b6c49@oracle.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-Source-IP: aserv0021.oracle.com [141.146.126.233] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3927 Lines: 97 On 09/27/2017 04:57 PM, Joao Martins wrote: > On 09/27/2017 09:22 PM, Boris Ostrovsky wrote: >> On 09/27/2017 11:26 AM, Joao Martins wrote: >>> On 09/27/2017 03:40 PM, Boris Ostrovsky wrote: >>>>> +static void xen_setup_vsyscall_time_info(void) >>>>> +{ >>>>> + struct vcpu_register_time_memory_area t; >>>>> + struct pvclock_vsyscall_time_info *ti; >>>>> + struct pvclock_vcpu_time_info *pvti; >>>>> + int ret; >>>>> + >>>>> + pvti = &__this_cpu_read(xen_vcpu)->time; >>>>> + >>>>> + /* >>>>> + * We check ahead on the primary time info if this >>>>> + * bit is supported hence speeding up Xen clocksource. >>>>> + */ >>>>> + if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)) >>>>> + return; >>>>> + >>>>> + pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT); >>>> Is it OK to have this flag set if anything below fails? >>>> >>> Yes - if anything below fails it will only affect userspace mapped page. >> Then should it be set somewhere else, like in xen_time_init()? >> > Hm, I could move it if you think it's better - but given the importance of the > bit we are checking and its direct correlation to whether or not we can setup > VCLOCK_PVCLOCK then I find it cleaner to have it here in the same routine. One > thing I failed to mention before is that checking ahead like above, let us also > avoid allocating a page plus an hypercall to register the pvti just to check the > one bit of info we need for using VCLOCK_PVCLOCK. > > It is very unlikely with current Xen code that 1) the secondary copy register > below fails, or 2) master and secondary don't have the same bits set. So in case > you're reconsidering the "shortcut" check above I can move it like we had in v1 > and have pvclock_set_flags right before pvclock_set_pvti_cpu0_va(). I think it would be more logical to move it to the end like in v1. But can you explain again why this flag should not be set in xen_time_init()? It seems to me that it would be useful not just for vDSO but for xen_clocksource_read()->pvclock_clocksource_read() as well. -boris > >>> What I >>> do above is just allowing xen clocksource to use/check that bit (consequently >>> speeding up sched_clock) given the necessary support is there in the master >>> copy. The secondary copy (i.e. what's being set up below, mapped/used in vdso) >>> has the same data from the master copy, just separate memory regions. The checks >>> below are just for the unlikely cases of failing to register the secondary copy >>> or if its content were to differ from master copy in future releases - and >>> therefore we handle those more gracefully. >>> >>>> (I can see in the changelog that apparently at some point I've asked >>>> about this at v1 but I can't remember/find what exactly it was) >>>> >>>>> + >>>>> + ti = (struct pvclock_vsyscall_time_info *)get_zeroed_page(GFP_KERNEL); >>>>> + if (!ti) >>>>> + return; >>>>> + >>>>> + t.addr.v = &ti->pvti; >>>>> + >>>>> + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, 0, &t); >>>>> + if (ret) { >>>>> + pr_notice("xen: VCLOCK_PVCLOCK not supported (err %d)\n", ret); >>>>> + free_page((unsigned long)ti); >>>>> + return; >>>>> + } >>>>> + >>>>> + /* >>>>> + * If the check above succedded this one should too since it's the >>>>> + * same data on both primary and secondary time infos just different >>>>> + * memory regions. But we still check it in case hypervisor is buggy. >>>>> + */ >>>>> + pvti = &ti->pvti; >>>>> + if (!(pvti->flags & PVCLOCK_TSC_STABLE_BIT)) { >>>>> + t.addr.v = NULL; >>>>> + ret = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_time_memory_area, >>>>> + 0, &t); >>>>> + if (!ret) >>>>> + free_page((unsigned long)ti); >>>>> + >>>>> + pr_notice("xen: VCLOCK_PVCLOCK not supported (tsc unstable)\n"); >>>>> + return; >>>>> + } >>>>> + >>>>> + xen_clock = ti; >>>>> + pvclock_set_pvti_cpu0_va(xen_clock); >>>>> + >>>>> + xen_clocksource.archdata.vclock_mode = VCLOCK_PVCLOCK; >>>>> +} >>>>> +