Subject: Re: [PATCH] powerpc/32: Don't use lmw/stmw for saving/restoring non volatile regs
To: Segher Boessenkool
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
References: <316c543b8906712c108985c8463eec09c8db577b.1629732542.git.christophe.leroy@csgroup.eu> <20210823184648.GY1583@gate.crashing.org>
From: Christophe Leroy
Message-ID: <9bbc9797-cfc7-1484-90ad-2146ff1a5e18@csgroup.eu>
Date: Tue, 24 Aug 2021 07:54:22 +0200
In-Reply-To: <20210823184648.GY1583@gate.crashing.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 23/08/2021 at 20:46, Segher Boessenkool wrote:
> On Mon, Aug 23, 2021 at 03:29:12PM +0000, Christophe Leroy wrote:
>> Instructions lmw/stmw are interesting for functions that are rarely
>> used and not in the cache, because only one instruction is to be
>> copied into the instruction cache instead of 19. However those
>> instruction are less performant than 19x raw lwz/stw as they require
>> synchronisation plus one additional cycle.
>
> lmw takes N+2 cycles for loading N words on 603/604/750/7400, and N+3 on
> 7450. stmw takes N+1 cycles for storing N words on 603, N+2 on 604/750/
> 7400, and N+3 on 7450 (load latency is 3 instead of 2 on 7450).
>
> There is no synchronisation needed, although there is some serialisation,
> which of course doesn't mean much since there can be only 6 or 8 or so
> insns executing at once anyway.

Yes, I meant serialisation. Isn't it the same as synchronisation?

>
> So, these insns are almost never slower, they can easily win cycles back
> because of the smaller code, too.
>
> What 32-bit core do you see where load/store multiple are more than a
> fraction of a cycle (per memory access) slower?
>
>> SAVE_NVGPRS / REST_NVGPRS are used in only a few places which are
>> mostly in interrupts entries/exits and in task switch so they are
>> likely already in the cache.
>
> Nothing is likely in the cache on the older cores (except in
> microbenchmarks), the caches are not big enough for that!

Even the syscall entry/exit paths and/or the most frequent interrupt entries and exits?

>
>> Using standard lwz improves null_syscall selftest by:
>> - 10 cycles on mpc832x.
>> - 2 cycles on mpc8xx.
>
> And in real benchmarks?

I don't know. What benchmark should I use to evaluate syscall entry/exit if the 'null_syscall' selftest is not relevant?

>
> On mpccore both lmw and stmw are only N+1 btw. But the serialization
> might cost another cycle here?
>

That is consistent with the MPC8xx, where the difference is only 2 cycles. But on the mpc832x, which has an e300c2 core, I see a difference of 10 cycles. Is anything wrong?

Christophe
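
For reference, the lmw/stmw cycle counts quoted in this thread can be tabulated for the 19-word case. The following is only a rough sketch: the per-core overheads are copied from Segher's figures above, N = 19 is taken from the "instead of 19" in the patch description, and the names (LMW_OVERHEAD, STMW_OVERHEAD, lmw_cycles, stmw_cycles) are made up for illustration. It says nothing about what 19 individual lwz/stw would cost, since the thread gives no figure for that.

```python
# Overhead k in "N + k cycles", as quoted in the thread (not measured here).
LMW_OVERHEAD = {"603": 2, "604": 2, "750": 2, "7400": 2, "7450": 3, "mpccore": 1}
STMW_OVERHEAD = {"603": 1, "604": 2, "750": 2, "7400": 2, "7450": 3, "mpccore": 1}

# Assumption: 19 words are moved, per "instead of 19" in the patch description.
NVGPR_COUNT = 19


def lmw_cycles(core, n_words=NVGPR_COUNT):
    """Cycles for one lmw loading n_words, using the quoted N + k figure."""
    return n_words + LMW_OVERHEAD[core]


def stmw_cycles(core, n_words=NVGPR_COUNT):
    """Cycles for one stmw storing n_words, using the quoted N + k figure."""
    return n_words + STMW_OVERHEAD[core]


if __name__ == "__main__":
    # Print one line per core named in the thread.
    for core in LMW_OVERHEAD:
        print(f"{core:>8}: lmw {lmw_cycles(core)} cycles, stmw {stmw_cycles(core)} cycles")
```

With these figures the fixed overhead of a load/store multiple is between 1 and 3 cycles per instruction, whatever the number of registers, which is the scale the 2-cycle and 10-cycle null_syscall differences are being compared against.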