From: Dennis Zhou <dennisz@fb.com>
To: Tejun Heo, Christoph Lameter
Cc: Dennis Zhou
Subject: [PATCH 04/10] percpu: update the header comment and pcpu_build_alloc_info comments
Date: Sat, 15 Jul 2017 22:23:09 -0400
Message-ID: <20170716022315.19892-5-dennisz@fb.com>
X-Mailer: git-send-email 2.13.1
In-Reply-To: <20170716022315.19892-1-dennisz@fb.com>
References: <20170716022315.19892-1-dennisz@fb.com>
MIME-Version: 1.0
Content-Type: text/plain
X-Mailing-List: linux-kernel@vger.kernel.org

From: "Dennis Zhou (Facebook)"

The header comment for percpu memory is a little hard to parse and is
not very clear about how the first chunk is managed. This adds a little
more clarity to the situation.

There is also quite a bit of tricky logic in pcpu_build_alloc_info().
This restructures one of its comments to add a little more information.
Unfortunately, you will still have to piece together a handful of other
comments too, but this should help direct you to the meaningful ones.

Signed-off-by: Dennis Zhou
---
 mm/percpu.c | 58 ++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 32 insertions(+), 26 deletions(-)
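As a rough illustration of the unit/offset addressing scheme the updated
header comment below describes: each possible cpu maps to one unit inside a
chunk, and a percpu allocation is just an offset that is valid in every unit.
The userspace sketch below models that with made-up names (NR_UNITS,
UNIT_SIZE, unit_of_cpu, per_cpu_slot); the kernel's real machinery instead
uses per-cpu base registers, pcpu_unit_size and per_cpu_ptr().

/* Standalone model of "allocation by offset into a unit's address space".
 * NR_UNITS, UNIT_SIZE and unit_of_cpu[] are invented for the example. */
#include <stdio.h>
#include <stdlib.h>

#define NR_CPUS   4
#define NR_UNITS  4
#define UNIT_SIZE 8192                       /* stand-in for pcpu_unit_size */

static char *chunk_base;                     /* one chunk: NR_UNITS units */
static const int unit_of_cpu[NR_CPUS] = { 0, 1, 2, 3 };  /* cpu -> unit map */

/* a "percpu pointer" is just an offset; resolve it for a given cpu */
static void *per_cpu_slot(size_t offset, int cpu)
{
        return chunk_base + (size_t)unit_of_cpu[cpu] * UNIT_SIZE + offset;
}

int main(void)
{
        size_t off = 512;                    /* an area allocated at offset 512 */
        int cpu;

        chunk_base = calloc(NR_UNITS, UNIT_SIZE);
        if (!chunk_base)
                return 1;

        /* the same offset exists once per unit, i.e. once per possible cpu */
        for (cpu = 0; cpu < NR_CPUS; cpu++)
                *(int *)per_cpu_slot(off, cpu) = 100 + cpu;

        for (cpu = 0; cpu < NR_CPUS; cpu++)
                printf("cpu%d sees %d at offset %zu\n",
                       cpu, *(int *)per_cpu_slot(off, cpu), off);

        free(chunk_base);
        return 0;
}

Resolving the same offset against a different unit base is, conceptually, all
per_cpu_ptr() does; the NUMA grouping only changes which unit a cpu maps to.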
diff --git a/mm/percpu.c b/mm/percpu.c
index 9ec5fd4..5bb90d8 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -4,36 +4,35 @@
  * Copyright (C) 2009  SUSE Linux Products GmbH
  * Copyright (C) 2009  Tejun Heo <tj@kernel.org>
  *
- * This file is released under the GPLv2.
+ * This file is released under the GPLv2 license.
  *
- * This is percpu allocator which can handle both static and dynamic
- * areas.  Percpu areas are allocated in chunks.  Each chunk is
- * consisted of boot-time determined number of units and the first
- * chunk is used for static percpu variables in the kernel image
- * (special boot time alloc/init handling necessary as these areas
- * need to be brought up before allocation services are running).
- * Unit grows as necessary and all units grow or shrink in unison.
- * When a chunk is filled up, another chunk is allocated.
+ * The percpu allocator handles both static and dynamic areas.  Percpu
+ * areas are allocated in chunks which are divided into units.  There is
+ * a 1-to-1 mapping for units to possible cpus.  These units are grouped
+ * based on NUMA properties of the machine.
  *
  *  c0                           c1                         c2
  *  -------------------          -------------------        ------------
  * | u0 | u1 | u2 | u3 |        | u0 | u1 | u2 | u3 |      | u0 | u1 | u
  *  -------------------  ......  -------------------  ....  ------------
+ *
+ * Allocation is done by offsets into a unit's address space.  Ie., an
+ * area of 512 bytes at 6k in c1 occupies 512 bytes at 6k in c1:u0,
+ * c1:u1, c1:u2, etc.  On NUMA machines, the mapping may be non-linear
+ * and even sparse.  Access is handled by configuring percpu base
+ * registers according to the cpu to unit mappings and offsetting the
+ * base address using pcpu_unit_size.
+ *
+ * There is special consideration for the first chunk which must handle
+ * the static percpu variables in the kernel image as allocation services
+ * are not online yet.  In short, the first chunk is structured like so:
  *
- * Allocation is done in offset-size areas of single unit space.  Ie,
- * an area of 512 bytes at 6k in c1 occupies 512 bytes at 6k of c1:u0,
- * c1:u1, c1:u2 and c1:u3.  On UMA, units corresponds directly to
- * cpus.  On NUMA, the mapping can be non-linear and even sparse.
- * Percpu access can be done by configuring percpu base registers
- * according to cpu to unit mapping and pcpu_unit_size.
- *
- * There are usually many small percpu allocations many of them being
- * as small as 4 bytes.  The allocator organizes chunks into lists
- * according to free size and tries to allocate from the fullest one.
- * Each chunk keeps the maximum contiguous area size hint which is
- * guaranteed to be equal to or larger than the maximum contiguous
- * area in the chunk.  This helps the allocator not to iterate the
- * chunk maps unnecessarily.
+ *
+ *  <Static | [Reserved] | Dynamic>
+ * The static data is copied from the original section managed by the
+ * linker.  The reserved section, if non-zero, primarily manages static
+ * percpu variables from kernel modules.  Finally, the dynamic section
+ * takes care of normal allocations.
  *
  * Allocation state in each chunk is kept using an array of integers
  * on chunk->map.  A positive value in the map represents a free
@@ -43,6 +42,12 @@
  * Chunks can be determined from the address using the index field
  * in the page struct.  The index field contains a pointer to the chunk.
  *
+ * These chunks are organized into lists according to free_size and the
+ * allocator tries to allocate from the fullest chunk first.  Each chunk
+ * maintains a maximum contiguous area size hint which is guaranteed to
+ * be equal to or larger than the maximum contiguous area in the chunk.
+ * This helps prevent the allocator from iterating over chunks unnecessarily.
+ *
  * To use this allocator, arch code should do the following:
  *
  * - define __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr() to translate
@@ -1842,6 +1847,7 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
          */
         min_unit_size = max_t(size_t, size_sum, PCPU_MIN_UNIT_SIZE);
 
+        /* determine the maximum # of units that can fit in an allocation */
         alloc_size = roundup(min_unit_size, atom_size);
         upa = alloc_size / min_unit_size;
         while (alloc_size % upa || (offset_in_page(alloc_size / upa)))
@@ -1868,9 +1874,9 @@ static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
         }
 
         /*
-         * Expand unit size until address space usage goes over 75%
-         * and then as much as possible without using more address
-         * space.
+         * Wasted space is caused by a ratio imbalance of upa to group_cnt.
+         * Expand the unit_size until we use >= 75% of the units allocated.
+         * Related to atom_size, which could be much larger than the unit_size.
          */
         last_allocs = INT_MAX;
         for (upa = max_upa; upa; upa--) {
-- 
2.9.3
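The hunks touching pcpu_build_alloc_info() above only add comments; the
calculation they annotate is visible in the surrounding context lines.  Below
is a minimal standalone sketch of that "maximum # of units per allocation"
step using assumed example values (4K pages, a 1MB atom_size, a 44K
min_unit_size); the real values come from the architecture and from the
static + reserved + dynamic size sum, and roundup_ul() here stands in for the
kernel's roundup()/offset_in_page() helpers.

/* Sketch of the upa (units-per-allocation) sizing loop shown in the diff
 * context above.  PAGE_SIZE, atom_size and min_unit_size are assumed
 * example values, not the kernel's. */
#include <stdio.h>

#define PAGE_SIZE 4096UL

static unsigned long roundup_ul(unsigned long x, unsigned long to)
{
        return ((x + to - 1) / to) * to;
}

int main(void)
{
        unsigned long atom_size = 1UL << 20;   /* allocation granule */
        unsigned long min_unit_size = 45056;   /* static + reserved + dynamic */
        unsigned long alloc_size, upa;

        /* determine the maximum # of units that can fit in an allocation */
        alloc_size = roundup_ul(min_unit_size, atom_size);
        upa = alloc_size / min_unit_size;
        /* shrink upa until the units divide the allocation evenly and each
         * unit is page aligned (offset_in_page() == 0 in the kernel) */
        while (alloc_size % upa || (alloc_size / upa) % PAGE_SIZE)
                upa--;

        printf("alloc_size=%lu upa=%lu unit_size=%lu\n",
               alloc_size, upa, alloc_size / upa);
        return 0;
}

With these example numbers the loop settles on upa = 16, i.e. a 64K unit;
the later ">= 75%" pass in the function then, roughly speaking, weighs
smaller upa values against the units wasted per group.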