Received: by 10.223.185.116 with SMTP id b49csp1092130wrg; Fri, 23 Feb 2018 11:47:47 -0800 (PST) X-Google-Smtp-Source: AH8x227jPrBlAVGDDcRsYGyZzY83ZuPgMJXB726XZ7kp95ZmvYY6PF1Se+v2/Hxb2QRHPdkLa53M X-Received: by 10.101.91.3 with SMTP id y3mr2277576pgq.149.1519415267177; Fri, 23 Feb 2018 11:47:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1519415267; cv=none; d=google.com; s=arc-20160816; b=oKMcTwctqMeBCF9svQXj0h0SGTsPRoUY9EYUbDnrLFrrA0azL3hwCnI7cJz4bUra48 VmbyzqHvd/Q8XFXqBZPrEAmD2Ax7r078eWbSEknMI5BLKXLFkP/QDhqheuN67lQoQKG7 IkUw3JKCYzcgdahBKPB013psV2DCyUhfiBPMnNeWizrd1VNKcXJoARtmwPg5SuU4lJeL dmWZ4ZJZs9o35FFR8wh2UDkhUp5K3m/Glo5jobX0GIyoOSEhRHX32fXrTGW+f+rpU9me K38YMgs2/shSYZG5Js2PSoestdy2QGzBu8b40za3p1N2RBzTHrDeZgNletvCXKJtHeUf R2+Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :in-reply-to:message-id:date:subject:cc:to:from :arc-authentication-results; bh=oqUoHKrMKf/x/pa28dSxzaKUXHU7a1WGtYHSZsey1sM=; b=FEQUucxGkQJ1oSTzS5kiraIeoMv0Fycd7hj+EXEFHfORb4q1QYke7MIC8lbLyyIOlq 5vihCfUfvEd+jX7aisTxTk2KE03PRU6s+/NvkCDH2jt0tMLezfc0T1AJySfGNxGQPNHf 0628gz+0x5KDFlNNxy4JG0S93I/bDaFOLmHPsdPL0a4cTHJVBsfNguNNKwEHUOI6ERf1 Zs/A9VYPkajO+JqKkcu70eERqfYFdznucI45qSBCK25T3BQvXNhcN8uMqT1qo3zgoHr9 OIB/FL/TH/dyG3Nr5+5as8dMCnqmY6TMFEvd+RJci7q7cxNM2TzXU8n09AzPNuKWdScx RP2g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id n10si1875256pgp.343.2018.02.23.11.47.32; Fri, 23 Feb 2018 11:47:47 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754985AbeBWTqK (ORCPT + 99 others); Fri, 23 Feb 2018 14:46:10 -0500 Received: from mail.linuxfoundation.org ([140.211.169.12]:44742 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934264AbeBWSr6 (ORCPT ); Fri, 23 Feb 2018 13:47:58 -0500 Received: from localhost (LFbn-1-12258-90.w90-92.abo.wanadoo.fr [90.92.71.90]) by mail.linuxfoundation.org (Postfix) with ESMTPSA id 348F3FDC; Fri, 23 Feb 2018 18:47:57 +0000 (UTC) From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Nicholas Piggin , Michael Ellerman Subject: [PATCH 4.9 089/145] powerpc/64s: Improve RFI L1-D cache flush fallback Date: Fri, 23 Feb 2018 19:26:35 +0100 Message-Id: <20180223170736.276371585@linuxfoundation.org> X-Mailer: git-send-email 2.16.2 In-Reply-To: <20180223170724.669759283@linuxfoundation.org> References: <20180223170724.669759283@linuxfoundation.org> User-Agent: quilt/0.65 X-stable: review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 4.9-stable review patch. If anyone has any objections, please let me know. ------------------ From: Nicholas Piggin commit bdcb1aefc5b3f7d0f1dc8b02673602bca2ff7a4b upstream. The fallback RFI flush is used when firmware does not provide a way to flush the cache. It's a "displacement flush" that evicts useful data by displacing it with an uninteresting buffer. The flush has to take care to work with implementation specific cache replacment policies, so the recipe has been in flux. The initial slow but conservative approach is to touch all lines of a congruence class, with dependencies between each load. It has since been determined that a linear pattern of loads without dependencies is sufficient, and is significantly faster. Measuring the speed of a null syscall with RFI fallback flush enabled gives the relative improvement: P8 - 1.83x P9 - 1.75x The flush also becomes simpler and more adaptable to different cache geometries. Signed-off-by: Nicholas Piggin [mpe: Backport to 4.9] Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/include/asm/paca.h | 3 - arch/powerpc/kernel/asm-offsets.c | 3 - arch/powerpc/kernel/exceptions-64s.S | 76 ++++++++++++++++------------------- arch/powerpc/kernel/setup_64.c | 13 ----- 4 files changed, 39 insertions(+), 56 deletions(-) --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -212,8 +212,7 @@ struct paca_struct { */ u64 exrfi[13] __aligned(0x80); void *rfi_flush_fallback_area; - u64 l1d_flush_congruence; - u64 l1d_flush_sets; + u64 l1d_flush_size; #endif }; --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -242,8 +242,7 @@ int main(void) DEFINE(PACA_IN_MCE, offsetof(struct paca_struct, in_mce)); DEFINE(PACA_RFI_FLUSH_FALLBACK_AREA, offsetof(struct paca_struct, rfi_flush_fallback_area)); DEFINE(PACA_EXRFI, offsetof(struct paca_struct, exrfi)); - DEFINE(PACA_L1D_FLUSH_CONGRUENCE, offsetof(struct paca_struct, l1d_flush_congruence)); - DEFINE(PACA_L1D_FLUSH_SETS, offsetof(struct paca_struct, l1d_flush_sets)); + DEFINE(PACA_L1D_FLUSH_SIZE, offsetof(struct paca_struct, l1d_flush_size)); #endif DEFINE(PACAHWCPUID, offsetof(struct paca_struct, hw_cpu_id)); DEFINE(PACAKEXECSTATE, offsetof(struct paca_struct, kexec_state)); --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -1602,39 +1602,37 @@ rfi_flush_fallback: std r9,PACA_EXRFI+EX_R9(r13) std r10,PACA_EXRFI+EX_R10(r13) std r11,PACA_EXRFI+EX_R11(r13) - std r12,PACA_EXRFI+EX_R12(r13) - std r8,PACA_EXRFI+EX_R13(r13) mfctr r9 ld r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13) - ld r11,PACA_L1D_FLUSH_SETS(r13) - ld r12,PACA_L1D_FLUSH_CONGRUENCE(r13) - /* - * The load adresses are at staggered offsets within cachelines, - * which suits some pipelines better (on others it should not - * hurt). - */ - addi r12,r12,8 + ld r11,PACA_L1D_FLUSH_SIZE(r13) + srdi r11,r11,(7 + 3) /* 128 byte lines, unrolled 8x */ mtctr r11 DCBT_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */ /* order ld/st prior to dcbt stop all streams with flushing */ sync -1: li r8,0 - .rept 8 /* 8-way set associative */ - ldx r11,r10,r8 - add r8,r8,r12 - xor r11,r11,r11 // Ensure r11 is 0 even if fallback area is not - add r8,r8,r11 // Add 0, this creates a dependency on the ldx - .endr - addi r10,r10,128 /* 128 byte cache line */ + + /* + * The load adresses are at staggered offsets within cachelines, + * which suits some pipelines better (on others it should not + * hurt). + */ +1: + ld r11,(0x80 + 8)*0(r10) + ld r11,(0x80 + 8)*1(r10) + ld r11,(0x80 + 8)*2(r10) + ld r11,(0x80 + 8)*3(r10) + ld r11,(0x80 + 8)*4(r10) + ld r11,(0x80 + 8)*5(r10) + ld r11,(0x80 + 8)*6(r10) + ld r11,(0x80 + 8)*7(r10) + addi r10,r10,0x80*8 bdnz 1b mtctr r9 ld r9,PACA_EXRFI+EX_R9(r13) ld r10,PACA_EXRFI+EX_R10(r13) ld r11,PACA_EXRFI+EX_R11(r13) - ld r12,PACA_EXRFI+EX_R12(r13) - ld r8,PACA_EXRFI+EX_R13(r13) GET_SCRATCH0(r13); rfid @@ -1645,39 +1643,37 @@ hrfi_flush_fallback: std r9,PACA_EXRFI+EX_R9(r13) std r10,PACA_EXRFI+EX_R10(r13) std r11,PACA_EXRFI+EX_R11(r13) - std r12,PACA_EXRFI+EX_R12(r13) - std r8,PACA_EXRFI+EX_R13(r13) mfctr r9 ld r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13) - ld r11,PACA_L1D_FLUSH_SETS(r13) - ld r12,PACA_L1D_FLUSH_CONGRUENCE(r13) - /* - * The load adresses are at staggered offsets within cachelines, - * which suits some pipelines better (on others it should not - * hurt). - */ - addi r12,r12,8 + ld r11,PACA_L1D_FLUSH_SIZE(r13) + srdi r11,r11,(7 + 3) /* 128 byte lines, unrolled 8x */ mtctr r11 DCBT_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */ /* order ld/st prior to dcbt stop all streams with flushing */ sync -1: li r8,0 - .rept 8 /* 8-way set associative */ - ldx r11,r10,r8 - add r8,r8,r12 - xor r11,r11,r11 // Ensure r11 is 0 even if fallback area is not - add r8,r8,r11 // Add 0, this creates a dependency on the ldx - .endr - addi r10,r10,128 /* 128 byte cache line */ + + /* + * The load adresses are at staggered offsets within cachelines, + * which suits some pipelines better (on others it should not + * hurt). + */ +1: + ld r11,(0x80 + 8)*0(r10) + ld r11,(0x80 + 8)*1(r10) + ld r11,(0x80 + 8)*2(r10) + ld r11,(0x80 + 8)*3(r10) + ld r11,(0x80 + 8)*4(r10) + ld r11,(0x80 + 8)*5(r10) + ld r11,(0x80 + 8)*6(r10) + ld r11,(0x80 + 8)*7(r10) + addi r10,r10,0x80*8 bdnz 1b mtctr r9 ld r9,PACA_EXRFI+EX_R9(r13) ld r10,PACA_EXRFI+EX_R10(r13) ld r11,PACA_EXRFI+EX_R11(r13) - ld r12,PACA_EXRFI+EX_R12(r13) - ld r8,PACA_EXRFI+EX_R13(r13) GET_SCRATCH0(r13); hrfid --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -745,19 +745,8 @@ static void init_fallback_flush(void) memset(l1d_flush_fallback_area, 0, l1d_size * 2); for_each_possible_cpu(cpu) { - /* - * The fallback flush is currently coded for 8-way - * associativity. Different associativity is possible, but it - * will be treated as 8-way and may not evict the lines as - * effectively. - * - * 128 byte lines are mandatory. - */ - u64 c = l1d_size / 8; - paca[cpu].rfi_flush_fallback_area = l1d_flush_fallback_area; - paca[cpu].l1d_flush_congruence = c; - paca[cpu].l1d_flush_sets = c / 128; + paca[cpu].l1d_flush_size = l1d_size; } }