Received: by 10.223.185.116 with SMTP id b49csp3988775wrg; Tue, 6 Mar 2018 08:07:34 -0800 (PST) X-Google-Smtp-Source: AG47ELtCtUFwIOAAN6+8Xb7gtl1EJfLZojdGy1lnb1KlqKTmjPcy7qDl/BTwe3LzFOX68SB1OwOy X-Received: by 2002:a17:902:8501:: with SMTP id bj1-v6mr17640753plb.110.1520352454102; Tue, 06 Mar 2018 08:07:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1520352454; cv=none; d=google.com; s=arc-20160816; b=GpPRQQOgbqe5zSsBNVvp+w6s7Nc+ZfuyNXZmqAZBYthnVZd1u+6PSX1eeTSHTQxIol uo/gkt4uSNOW7iLv0jazMhI8AaFHRoUCHB30YEabdpDEGdreSEu1JTtYnDiRnSxaya7I w2025ut6oGU4iQ7/a1ubKglSvxeJCYKQCcc4cGBk4JGIYGqqkD2WYQBvq/BrGkdxrtDG JkDHkhJhoiXzghnObNkXUnYBq+p2kIy/h5wUng03rLUE70X5lzpkGkWsZKb5q1fs4uGG SiMaLBmcoBstBYH7brN4Y7xGQV1Vjs6bwJU1IfpXYviXblLF3o86t6vLuiZnqXQ8D67q 0AJA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:arc-authentication-results; bh=YDTJC6WUKwAY53JzA8e8qG8563JPhMdYYc5N5vfgN5g=; b=JWtyq5JZn1hCEtFXDa4s66oG/Y+OzgKO9jeXsGcjOe/Z+D61ZK3WY7YHHnlSixlvyr OgaV6Zel87IyBSGBhQbebdvWW87O3uZC1hV+c0fZDDQ+yaybpT5PGPFrtbtUcFgLc+Jy 0TiXDscL3b0tN7TQexs99tfcetVpO6BJL509A/U0WMcrGoPMPXCbFu4Y0YQ4SxMJo0MN oR/LLgDlkqvkJm6N+eS2hwNBwRQFCsHFUiX9o0oRnbwdTodRkyIM8ZN2AAJdgtjLWfhS 3tNoOq4/M7evuMMrgP087t9u4TSh1YdiaHeEwOg6YKI99VT8x20HgapK/FjlymQpViNN MQtA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id c10si10029046pgf.230.2018.03.06.08.07.16; Tue, 06 Mar 2018 08:07:34 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753559AbeCFQFj (ORCPT + 99 others); Tue, 6 Mar 2018 11:05:39 -0500 Received: from lhrrgout.huawei.com ([194.213.3.17]:28466 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750838AbeCFQFi (ORCPT ); Tue, 6 Mar 2018 11:05:38 -0500 Received: from LHREML713-CAH.china.huawei.com (unknown [172.18.7.107]) by Forcepoint Email with ESMTP id EB6A0ACCEC7CB; Tue, 6 Mar 2018 16:05:30 +0000 (GMT) Received: from [10.202.185.64] (10.202.185.64) by smtpsuk.huawei.com (10.201.108.36) with Microsoft SMTP Server (TLS) id 14.3.382.0; Tue, 6 Mar 2018 16:05:21 +0000 Subject: Re: [PATCH 1/7] genalloc: track beginning of allocations To: Matthew Wilcox CC: , , , , , , , References: <20180228200620.30026-1-igor.stoppa@huawei.com> <20180228200620.30026-2-igor.stoppa@huawei.com> <20180306141047.GB13722@bombadil.infradead.org> From: Igor Stoppa Message-ID: <6d27845d-a8f3-607b-1b6b-8464de65162c@huawei.com> Date: Tue, 6 Mar 2018 18:05:20 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: <20180306141047.GB13722@bombadil.infradead.org> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [10.202.185.64] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/03/2018 16:10, Matthew Wilcox wrote: > On Wed, Feb 28, 2018 at 10:06:14PM +0200, Igor Stoppa wrote: >> + * Encoding of the bitmap tracking the allocations >> + * ----------------------------------------------- >> + * >> + * The bitmap is composed of units of allocations. >> + * >> + * Each unit of allocation is represented using 2 consecutive bits. >> + * >> + * This makes it possible to encode, for each unit of allocation, >> + * information about: >> + * - allocation status (busy/free) >> + * - beginning of a sequennce of allocation units (first / successive) >> + * >> + * >> + * Dictionary of allocation units (msb to the left, lsb to the right): >> + * >> + * 11: first allocation unit in the allocation >> + * 10: any subsequent allocation unit (if any) in the allocation >> + * 00: available allocation unit >> + * 01: invalid >> + * >> + * Example, using the same notation as above - MSb.......LSb: >> + * >> + * ...000010111100000010101011 <-- Read in this direction. >> + * \__|\__|\|\____|\______| >> + * | | | | \___ 4 used allocation units >> + * | | | \___________ 3 empty allocation units >> + * | | \_________________ 1 used allocation unit >> + * | \___________________ 2 used allocation units >> + * \_______________________ 2 empty allocation units >> + * >> + * The encoding allows for lockless operations, such as: >> + * - search for a sufficiently large range of allocation units >> + * - reservation of a selected range of allocation units >> + * - release of a specific allocation >> + * >> + * The alignment at which to perform the research for sequence of empty >> + * allocation units (marked as zeros in the bitmap) is 2^1. >> + * >> + * This means that an allocation can start only at even places >> + * (bit 0, bit 2, etc.) in the bitmap. >> + * >> + * Therefore, the number of zeroes to look for must be twice the number >> + * of desired allocation units. >> + * >> + * When it's time to free the memory associated to an allocation request, >> + * it's a matter of checking if the corresponding allocation unit is >> + * really the beginning of an allocation (both bits are set to 1). >> + * >> + * Looking for the ending can also be performed locklessly. >> + * It's sufficient to identify the first mapped allocation unit >> + * that is represented either as free (00) or busy (11). >> + * Even if the allocation status should change in the meanwhile, it >> + * doesn't matter, since it can only transition between free (00) and >> + * first-allocated (11). > > This seems unnecessarily complicated. TBH it seemed to me a natural extension of the existing encoding :-) > Why not handle it like this: > > - Double the bitmap in size (as you have done) but > - The first half of the bits are unchanged from the existing implementation > - The second half of the bits are used for determining the length Wouldn't that mean a less tight loop and less localized data? The implementation from this patch does not have to jump elsewhere, when (un)marking the allocation units and the start. > On allocation, you look for a sufficiently-large string of 0 bits in > the first-half. When you find it, you set all of them to 1, and set one > bit in the second-half to indicate where the tail of the allocation is > (you might actually want to use an rbtree or something to handle this ... > using all these bits seems pretty inefficient). 1 bit maps to 1 unit of allocation, which is very seldom 1 byte. For pmalloc use, I expect that the average allocation is likely to be 2-4 units, where 1 unit equals either a 32 or 64 bits word. So it's probably likely that for every couple of allocation units, one is marked as start-of-allocation. In other cases where genalloc is used, like the tracking of uncached pages, 1 unit of allocation equals to 1 page. I would expect the rbtree to end up generating a far larger footprint. For the same reasons, since the bitmap is implemented using unsigned longs, chances are high that one allocation will fit in one bitmap "word", which means that if the "beginning" bit and the "occupied" bit are adjacent, one write is sufficient. In the case you describe, it would be almost always at least 2. I do not have factual evidence to back my reasoning, but it seems more likely to be the case, from my analysis of data types that could belong to pools (both existing users of genalloc and my experiments with SELinux data structures and pmalloc). Even in the XFS case, if I understood correctly, it was about protecting 1 or 2 pages at a time, which seems to fit what I have empirically observed. What makes you think otherwise? -- igor