In-Reply-To: <02a62db83282b5ef3e0e8281fdc46fa91beffc86.1518382747.git.christophe.leroy@c-s.fr>
References: <02a62db83282b5ef3e0e8281fdc46fa91beffc86.1518382747.git.christophe.leroy@c-s.fr>
From: Christophe Leroy
Subject: [RFC REBASED 2/5] powerpc/mm/slice: implement a slice mask cache
To: Nicholas Piggin
Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Date: Mon, 12 Feb 2018 19:12:24 +0100 (CET)

Calculating the slice mask can become a significant overhead for
get_unmapped_area.

This patch adds a struct slice_mask for each page size in the
mm_context, and keeps these in sync with the slices psize arrays and
slb_addr_limit. This saves about 30% kernel time on a single-page
mmap/munmap micro benchmark.

Signed-off-by: Nicholas Piggin
Signed-off-by: Christophe Leroy
---
 arch/powerpc/include/asm/book3s/64/mmu.h |  20 ++++++-
 arch/powerpc/include/asm/mmu-8xx.h       |  16 ++++-
 arch/powerpc/mm/slice.c                  | 100 ++++++++++++++++++++++++++-----
 3 files changed, 118 insertions(+), 18 deletions(-)
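(Illustrative note, not part of the patch: the caching pattern can be
sketched in plain user-space C. All names below, calc_mask(),
recalc_cache(), mask_for_size(), slice_psize[], are simplified,
hypothetical stand-ins for the kernel code in the diff that follows:
recompute one mask per page size whenever the slice layout changes,
then turn every lookup into a cheap pointer return.)

	#include <stdint.h>
	#include <stdio.h>

	#define NPSIZES	 2	/* pretend two page sizes are supported */
	#define NSLICES	16	/* pretend the address space has 16 slices */

	struct example_mask {
		uint64_t low_slices;	/* one bit per slice */
	};

	static uint8_t slice_psize[NSLICES];		/* psize of each slice */
	static struct example_mask cache[NPSIZES];	/* cached mask per psize */

	/* Slow path: scan every slice. Before this patch, each lookup did this. */
	static void calc_mask(int psize, struct example_mask *m)
	{
		m->low_slices = 0;
		for (int i = 0; i < NSLICES; i++)
			if (slice_psize[i] == psize)
				m->low_slices |= 1ULL << i;
	}

	/* Run once whenever slice_psize[] changes (under slice_convert_lock
	 * in the real code), like recalc_slice_mask_cache() in the diff. */
	static void recalc_cache(void)
	{
		for (int p = 0; p < NPSIZES; p++)
			calc_mask(p, &cache[p]);
	}

	/* Fast path: a lookup is now just a pointer to the cached mask,
	 * like the reworked slice_mask_for_size() in the diff. */
	static const struct example_mask *mask_for_size(int psize)
	{
		return &cache[psize];
	}

	int main(void)
	{
		slice_psize[3] = slice_psize[5] = 1;
		recalc_cache();
		printf("mask for psize 1: %#llx\n",	/* prints 0x28 */
		       (unsigned long long)mask_for_size(1)->low_slices);
		return 0;
	}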
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 0abeb0e2d616..b6d136fd8ffd 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -80,6 +80,16 @@ struct spinlock;
 /* Maximum possible number of NPUs in a system. */
 #define NV_MAX_NPUS 8
 
+/*
+ * One bit per slice. We have lower slices which cover 256MB segments
+ * upto 4G range. That gets us 16 low slices. For the rest we track slices
+ * in 1TB size.
+ */
+struct slice_mask {
+	u64 low_slices;
+	DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH);
+};
+
 typedef struct {
 	mm_context_id_t id;
 	u16 user_psize;		/* page size index */
@@ -91,9 +101,17 @@ typedef struct {
 	struct npu_context *npu_context;
 
 #ifdef CONFIG_PPC_MM_SLICES
+	unsigned long slb_addr_limit;
 	u64 low_slices_psize;	/* SLB page size encodings */
 	unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
-	unsigned long slb_addr_limit;
+# ifdef CONFIG_PPC_64K_PAGES
+	struct slice_mask mask_64k;
+# endif
+	struct slice_mask mask_4k;
+# ifdef CONFIG_HUGETLB_PAGE
+	struct slice_mask mask_16m;
+	struct slice_mask mask_16g;
+# endif
 #else
 	u16 sllp;		/* SLB page size encoding */
 #endif
diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h
index b324ab46d838..b97d4ed3dddf 100644
--- a/arch/powerpc/include/asm/mmu-8xx.h
+++ b/arch/powerpc/include/asm/mmu-8xx.h
@@ -187,15 +187,29 @@
 #define M_APG3		0x00000060
 
 #ifndef __ASSEMBLY__
+struct slice_mask {
+	u64 low_slices;
+	DECLARE_BITMAP(high_slices, 0);
+};
+
 typedef struct {
 	unsigned int id;
 	unsigned int active;
 	unsigned long vdso_base;
#ifdef CONFIG_PPC_MM_SLICES
+	unsigned long slb_addr_limit;
 	u16 user_psize;		/* page size index */
 	u64 low_slices_psize;	/* page size encodings */
 	unsigned char high_slices_psize[0];
-	unsigned long slb_addr_limit;
+# ifdef CONFIG_PPC_16K_PAGES
+	struct slice_mask mask_16k;
+# else
+	struct slice_mask mask_4k;
+# endif
+# ifdef CONFIG_HUGETLB_PAGE
+	struct slice_mask mask_512k;
+	struct slice_mask mask_8m;
+# endif
 #endif
 } mm_context_t;
 
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index db1278ac21c2..ddf015d2d05b 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -37,15 +37,6 @@
 #include
 
 static DEFINE_SPINLOCK(slice_convert_lock);
-/*
- * One bit per slice. We have lower slices which cover 256MB segments
- * upto 4G range. That gets us 16 low slices. For the rest we track slices
- * in 1TB size.
- */
-struct slice_mask {
-	u64 low_slices;
-	DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH);
-};
 
 #ifdef DEBUG
 int _slice_debug = 1;
@@ -147,7 +138,7 @@ static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
 		__set_bit(i, ret->high_slices);
 }
 
-static void slice_mask_for_size(struct mm_struct *mm, int psize,
+static void calc_slice_mask_for_size(struct mm_struct *mm, int psize,
 			struct slice_mask *ret,
 			unsigned long high_limit)
 {
@@ -176,6 +167,72 @@ static void slice_mask_for_size(struct mm_struct *mm, int psize,
 	}
 }
 
+#ifdef CONFIG_PPC_BOOK3S_64
+static void recalc_slice_mask_cache(struct mm_struct *mm)
+{
+	unsigned long l = mm->context.slb_addr_limit;
+	calc_slice_mask_for_size(mm, MMU_PAGE_4K, &mm->context.mask_4k, l);
+#ifdef CONFIG_PPC_64K_PAGES
+	calc_slice_mask_for_size(mm, MMU_PAGE_64K, &mm->context.mask_64k, l);
+#endif
+#ifdef CONFIG_HUGETLB_PAGE
+	calc_slice_mask_for_size(mm, MMU_PAGE_16M, &mm->context.mask_16m, l);
+	calc_slice_mask_for_size(mm, MMU_PAGE_16G, &mm->context.mask_16g, l);
+#endif
+}
+
+static const struct slice_mask *slice_mask_for_size(struct mm_struct *mm, int psize)
+{
+#ifdef CONFIG_PPC_64K_PAGES
+	if (psize == MMU_PAGE_64K)
+		return &mm->context.mask_64k;
+#endif
+	if (psize == MMU_PAGE_4K)
+		return &mm->context.mask_4k;
+#ifdef CONFIG_HUGETLB_PAGE
+	if (psize == MMU_PAGE_16M)
+		return &mm->context.mask_16m;
+	if (psize == MMU_PAGE_16G)
+		return &mm->context.mask_16g;
+#endif
+	BUG();
+}
+#elif defined(CONFIG_PPC_8xx)
+static void recalc_slice_mask_cache(struct mm_struct *mm)
+{
+	unsigned long l = mm->context.slb_addr_limit;
+#ifdef CONFIG_PPC_16K_PAGES
+	calc_slice_mask_for_size(mm, MMU_PAGE_16K, &mm->context.mask_16k, l);
+#else
+	calc_slice_mask_for_size(mm, MMU_PAGE_4K, &mm->context.mask_4k, l);
+#endif
+#ifdef CONFIG_HUGETLB_PAGE
+	calc_slice_mask_for_size(mm, MMU_PAGE_512K, &mm->context.mask_512k, l);
+	calc_slice_mask_for_size(mm, MMU_PAGE_8M, &mm->context.mask_8m, l);
+#endif
+}
+
+static const struct slice_mask *slice_mask_for_size(struct mm_struct *mm, int psize)
+{
+#ifdef CONFIG_PPC_16K_PAGES
+	if (psize == MMU_PAGE_16K)
+		return &mm->context.mask_16k;
+#else
+	if (psize == MMU_PAGE_4K)
+		return &mm->context.mask_4k;
+#endif
+#ifdef CONFIG_HUGETLB_PAGE
+	if (psize == MMU_PAGE_512K)
+		return &mm->context.mask_512k;
+	if (psize == MMU_PAGE_8M)
+		return &mm->context.mask_8m;
+#endif
+	BUG();
+}
+#else
+#error "Must define the slice masks for page sizes supported by the platform"
+#endif
+
 static int slice_check_fit(struct mm_struct *mm,
 			   const struct slice_mask *mask,
 			   const struct slice_mask *available)
@@ -251,6 +308,8 @@ static void slice_convert(struct mm_struct *mm,
 		  (unsigned long)mm->context.low_slices_psize,
 		  (unsigned long)mm->context.high_slices_psize);
 
+	recalc_slice_mask_cache(mm);
+
 	spin_unlock_irqrestore(&slice_convert_lock, flags);
 
 	copro_flush_all_slbs(mm);
@@ -449,7 +508,14 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	}
 
 	if (high_limit > mm->context.slb_addr_limit) {
+		unsigned long flags;
+
 		mm->context.slb_addr_limit = high_limit;
+
+		spin_lock_irqsave(&slice_convert_lock, flags);
+		recalc_slice_mask_cache(mm);
+		spin_unlock_irqrestore(&slice_convert_lock, flags);
+
 		on_each_cpu(slice_flush_segments, mm, 1);
 	}
 
@@ -488,7 +554,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 	/* First make up a "good" mask of slices that have the right size
 	 * already
 	 */
-	slice_mask_for_size(mm, psize, &good_mask, high_limit);
+	good_mask = *slice_mask_for_size(mm, psize);
 	slice_print_mask(" good_mask", &good_mask);
 
 	/*
@@ -513,7 +579,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 #ifdef CONFIG_PPC_64K_PAGES
 	/* If we support combo pages, we can allow 64k pages in 4k slices */
 	if (psize == MMU_PAGE_64K) {
-		slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask, high_limit);
+		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
 		if (fixed)
 			slice_or_mask(&good_mask, &compat_mask);
 	}
@@ -695,7 +761,7 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
 		goto bail;
 
 	mm->context.user_psize = psize;
-	wmb();
+	wmb(); /* Why? */
 
 	lpsizes = mm->context.low_slices_psize;
 	for (i = 0; i < SLICE_NUM_LOW; i++)
@@ -722,6 +788,9 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
 		  (unsigned long)mm->context.low_slices_psize,
 		  (unsigned long)mm->context.high_slices_psize);
 
+	recalc_slice_mask_cache(mm);
+	spin_unlock_irqrestore(&slice_convert_lock, flags);
+	return;
 bail:
 	spin_unlock_irqrestore(&slice_convert_lock, flags);
 }
@@ -762,18 +831,17 @@ int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr,
 {
 	struct slice_mask mask, available;
 	unsigned int psize = mm->context.user_psize;
-	unsigned long high_limit = mm->context.slb_addr_limit;
 
 	if (radix_enabled())
 		return 0;
 
 	slice_range_to_mask(addr, len, &mask);
-	slice_mask_for_size(mm, psize, &available, high_limit);
+	available = *slice_mask_for_size(mm, psize);
 #ifdef CONFIG_PPC_64K_PAGES
 	/* We need to account for 4k slices too */
 	if (psize == MMU_PAGE_64K) {
 		struct slice_mask compat_mask;
-		slice_mask_for_size(mm, MMU_PAGE_4K, &compat_mask, high_limit);
+		compat_mask = *slice_mask_for_size(mm, MMU_PAGE_4K);
 		slice_or_mask(&available, &compat_mask);
 	}
 #endif
-- 
2.13.3