Received: by 2002:a25:d7c1:0:0:0:0:0 with SMTP id o184csp2237302ybg; Sun, 27 Oct 2019 14:06:29 -0700 (PDT) X-Google-Smtp-Source: APXvYqx19pjI84WBU4HmtbrXhNM5AXc/BvHPSB3DjakJ0zElElTE5la7q/b5k/KKDqvQWQBhTx8G X-Received: by 2002:a17:906:3ecc:: with SMTP id d12mr13662212ejj.52.1572210299309; Sun, 27 Oct 2019 14:04:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1572210299; cv=none; d=google.com; s=arc-20160816; b=BveWjmNjhBZF2MPGdiMXR3qwXryEPXBW+n0NezP5X8u2QwoiAEeG5BTpPjfUALM9RU 8GfBh+iEprx6MqbOJPDpPM/QIfvE0OX24XGZkMDf5lXUBRLI6aSZyliDHi+iYx3Qx98f 3GLPlbX8Nhzf/Tt2FsCy7lETVCBpE5wbhhqyKZgWikvDBXuYpbNVcTcH2DtX07giKz3k /VktZnioOj+VRo+yCwjEh1vvU1B7Aon3eqOHRj7cZqorIh9C3fuNGIhJqHuNWPRFtfQj HV+DB/6oQpRox6VmC+a27RhE/d03nH6fWaeJs3Cs/r/5F7m4/4KuyE1DWSAfTmmEfcLc t5Qg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-transfer-encoding:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=NV4ELFtTvrSX4QGUFe1JBzX1SsdaObCQhTLfz11H6BI=; b=0wrhPsxeQh+3hTQKV777vH5mzRiUCznjPLhQ8l798Rz37trXu/gDdoU4AJsPURTBVR YGPEaOD4gixPTygk0wlGLbmTCmFeNws7V22boJhjIzFBdfV/0wNDuF/5N8VYREH/Dh6x qhTtQhBA54x02yDW7yD6xWRJXlx54KW++aCEUToSmxVHj7UQzE0qnpKg9HqHxlrka8dh YdAiaQpbQogIlvl3V3cKzlGvc2KaYsEw74G7Lvsex4EP2WA3i8zSDM/KQYoQWYRxfO7E P9iTE7UZxf0UdTR2iBN+vZJWw25eKjs6KHh0rfG+4lTcNJe0vv3u2o9myIyIgey5WolA VShA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id t15si3775282edt.264.2019.10.27.14.04.15; Sun, 27 Oct 2019 14:04:59 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727947AbfJ0THu (ORCPT + 99 others); Sun, 27 Oct 2019 15:07:50 -0400 Received: from gate.crashing.org ([63.228.1.57]:57536 "EHLO gate.crashing.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726974AbfJ0THu (ORCPT ); Sun, 27 Oct 2019 15:07:50 -0400 Received: from gate.crashing.org (localhost.localdomain [127.0.0.1]) by gate.crashing.org (8.14.1/8.14.1) with ESMTP id x9RJ7Apj017967; Sun, 27 Oct 2019 14:07:10 -0500 Received: (from segher@localhost) by gate.crashing.org (8.14.1/8.14.1/Submit) id x9RJ79gY017964; Sun, 27 Oct 2019 14:07:09 -0500 X-Authentication-Warning: gate.crashing.org: segher set sender to segher@kernel.crashing.org using -f Date: Sun, 27 Oct 2019 14:07:09 -0500 From: Segher Boessenkool To: Christophe Leroy Cc: Thomas Gleixner , linux-kernel@vger.kernel.org, Paul Mackerras , luto@kernel.org, vincenzo.frascino@arm.com, linuxppc-dev@lists.ozlabs.org Subject: Re: [RFC PATCH] powerpc/32: Switch VDSO to C implementation. Message-ID: <20191027190709.GZ28442@gate.crashing.org> References: <8ce3582f7f7da9ff0286ced857e5aa2e5ae6746e.1571662378.git.christophe.leroy@c-s.fr> <95bd2367-8edc-29db-faa3-7729661e05f2@c-s.fr> <439bce37-9c2c-2afe-9c9e-2f500472f9f8@c-s.fr> <20191026230609.GY28442@gate.crashing.org> <8e4d0b82-a7a1-b7f1-308e-df871b32d317@c-s.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <8e4d0b82-a7a1-b7f1-308e-df871b32d317@c-s.fr> User-Agent: Mutt/1.4.2.3i Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Oct 27, 2019 at 10:21:25AM +0100, Christophe Leroy wrote: > Le 27/10/2019 ? 01:06, Segher Boessenkool a ?crit?: > >The hand-optimised asm code will pretty likely win handsomely, whatever > >you do. Especially on cores like the 885 (no branch prediction, single > >issue, small caches, etc.: every instruction counts). > > > >Is there any reason to replace this hand-optimised code? It was written > >for exacty this reason? These functions are critical and should be as > >fast as possible. > > Well, all this started with COARSE clocks not being supported by PPC32 > VDSO. I first submitted a series with a set of optimisations including > the implementation of COARSE clocks > (https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=126779) > > Then after a comment received on patch 4 of the series from Santosh > Sivaraj asking for a common implementation of it for PPC32 and PPC64, I > started looking into making the whole VDSO source code common to PPC32 > and PPC64. Most functions are similar. Time functions are also rather > similar but unfortunately don't use the same registers. They also don't > cover all possible clocks. And getres() is also buggy, see series > https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=110321 That is all nice work :-) > So instead of reworking the existing time functions, I started > investigating whether we could plug powerpc to the generic > implementation. One drawback of PPC is that we need to setup an ASM > trampoline to handle the SO bit as it can't be handled from C directly, > can it ? There is no way to say what CR bits to return. The ABI requires some of those bits to be preserved, and some are volatile. System calls use a different ABI, one the compiler knows nothing about, so you cannot even show system calls as calls to the compiler. > How critical are these functions ? Although we have a slight degration > with the C implementation, they are still way faster than the > corresponding syscall. "Slight": With current powerpc/32 ASM VDSO: gettimeofday: vdso: 750 nsec/call clock-getres-realtime: vdso: 382 nsec/call clock-gettime-realtime: vdso: 928 nsec/call clock-getres-monotonic: vdso: 382 nsec/call clock-gettime-monotonic: vdso: 1033 nsec/call Once switched to C implementation: gettimeofday: vdso: 1533 nsec/call clock-getres-realtime: vdso: 853 nsec/call clock-gettime-realtime: vdso: 1570 nsec/call clock-getres-monotonic: vdso: 835 nsec/call clock-gettime-monotonic: vdso: 1605 nsec/call ---> Those that are not more than two times slower are almost that. <--- This also needs measurements on more representative PowerPC cores, say some G3 or G4; and on modern CPUs (Power7/8/9). It also needs context with those measurements: what CPU core is it? Running at what frequency clock? > Another thing I was wondering, is it worth using the 64 bit timebase on > PPC32 ? As far as I understand, the timebase is there to calculate a > linear date update since last VDSO datapage update. How often is the > VDSO datapage updated ? On the 885 clocked at 132Mhz, the timebase is at > 8.25 Mhz, which means it needs more than 8 minutes to loop over 32 bits. On most PowerPC cores the time base is incremented significantly faster. Usual speeds for older cores are 50MHz to 100MHz, and for newer cores ten times that. Recommended frequency is currently 512MHz, so you'll wrap the low 32 bits in 8s or so on those, and in about a minute on many powermac etc. machines already. How can you know this long hasn't passed since the last time you read the high half of the time base? Without reading that high part? The current (assembler) code already optimises converting this to some other scale quite well, better than a compiler can (see __do_get_tspec). Segher