References: <20230308094106.227365-1-rppt@kernel.org> <20230308094106.227365-2-rppt@kernel.org> <20230518152354.GD4967@kernel.org>
From: Song Liu
Date: Thu, 18 May 2023 13:03:28 -0700
Subject: Re: [RFC PATCH 1/5] mm: intorduce __GFP_UNMAPPED and unmapped_alloc()
To: Kent Overstreet
Cc: Mike Rapoport, linux-mm@kvack.org, Andrew Morton, Dave Hansen, Peter Zijlstra, Rick Edgecombe, Thomas Gleixner, Vlastimil Babka, linux-kernel@vger.kernel.org, x86@kernel.org
List-ID: linux-kernel@vger.kernel.org

On Thu, May 18, 2023 at 12:15 PM Kent Overstreet wrote:
>
> On Thu, May 18, 2023 at 12:03:03PM -0700, Song Liu wrote:
> > On Thu, May 18, 2023 at 11:47 AM Song Liu wrote:
> > >
> > > On Thu, May 18, 2023 at 10:24 AM Kent Overstreet
> > > wrote:
> > > >
> > > > On Thu, May 18, 2023 at 10:00:39AM -0700, Song Liu wrote:
> > > > > On Thu, May 18, 2023 at 9:48 AM Kent Overstreet
> > > > > wrote:
> > > > > >
> > > > > > On Thu, May 18, 2023 at 09:33:20AM -0700, Song Liu wrote:
> > > > > > > I am working on patches based on the discussion in [1]. I am planning to
> > > > > > > send v1 for review in a week or so.
> > > > > >
> > > > > > Hey Song, I was reviewing that thread too,
> > > > > >
> > > > > > Are you taking a different approach based on Thomas's feedback? I think
> > > > > > he had some fair points in that thread.
> > > > >
> > > > > Yes, the API is based on Thomas's suggestion, like 90% from the discussions.
> > > > >
> > > > > >
> > > > > > My own feeling is that the buddy allocator is our tool for allocating
> > > > > > larger variable sized physically contiguous allocations, so I'd like to
> > > > > > see something based on that - I think we could do a hybrid buddy/slab
> > > > > > allocator approach, like we have for regular memory allocations.
> > > > >
> > > > > I am planning to implement the allocator based on this (reuse
> > > > > vmap_area logic):
> > > >
> > > > Ah, you're still doing vmap_area approach.
> > > >
> > > > Mike's approach looks like it'll be _much_ lighter weight and higher
> > > > performance, to me. vmalloc is known to be slow compared to the buddy
> > > > allocator, and with Mike's approach we're only modifying mappings once
> > > > per 2 MB chunk.
> > > >
> > > > I don't see anything in your code for sub-page sized allocations too, so
> > > > perhaps I should keep going with my slab allocator.
> > >
> > > The vmap_area approach handles sub-page allocations. In 5/5 of set [2],
> > > we showed that multiple BPF programs share the same page with some
> > > kernel text (_etext).
> > >
> > > > Could you share your thoughts on your approach vs. Mike's?
> > > > I'm newer to this area of the code than you two so maybe there's an angle I've
> > > > missed :)
> > >
> > > AFAICT, tree based solution (vmap_area) is more efficient than bitmap
> > > based solution.
>
> Tree based requires quite a bit of overhead for the rbtree pointers, and
> additional vmap_area structs.
>
> With a buddy allocator based approach, there's no additional state that
> needs to be allocated, since it all fits in struct page.
>
> > > First, for 2MiB page with 64B chunk size, we need a bitmap of
> > >      2MiB / 64B = 32k bit = 4k bytes
> > > While the tree based solution can adapt to the number of allocations within
> > > this 2MiB page. Also, searching a free range within 4kB of bitmap may
> > > actually be slower than searching in the tree.
> > >
> > > Second, bitmap based solution cannot handle > 2MiB allocation cleanly,
> > > while tree based solution can. For example, if a big driver uses 3MiB, the
> > > tree based allocator can allocate 4MiB for it, and use the rest 1MiB for
> > > smaller allocations.
>
> We're not talking about a bitmap based solution for >= PAGE_SIZE
> allocations, the alternative is a buddy allocator - so no searching,
> just per power-of-two freelists.
>
> >
> > Missed one:
> >
> > Third, bitmap based solution requires a "size" parameter in free(). It is an
> > overhead for the user. Tree based solution doesn't have this issue.
>
> No, we can recover the size of the allocation via compound_order() -
> hasn't historically been done for alloc_pages() allocations to avoid
> setting up the state in each page for compound head/tail, but it perhaps
> should be (and is with folios, which we've generally been switching to).

If we use compound_order(), we will round up to a power of 2 for all
allocations. Does this mean we will use 4MiB for a 2.1MiB allocation?

Thanks,
Song
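
The two figures argued over in this thread (the 4k-byte bitmap for a 2MiB
page with 64B chunks, and the 4MiB footprint of a 2.1MiB request under
power-of-two rounding) can be checked with a small userspace sketch. This is
only arithmetic, not kernel code: the sketch_* names and SKETCH_PAGE_SHIFT
are hypothetical, a 4KiB base page is assumed, and sketch_order() merely
mirrors the idea behind the kernel's get_order().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages (as on x86-64). Hypothetical name. */
#define SKETCH_PAGE_SHIFT 12u

/*
 * Bytes of bitmap needed to track `region` bytes in `chunk`-byte units,
 * one bit per chunk.
 */
static inline size_t sketch_bitmap_bytes(size_t region, size_t chunk)
{
	assert(chunk > 0 && region % chunk == 0);
	size_t bits = region / chunk;
	return (bits + 7) / 8;
}

/*
 * Smallest order such that (1 << order) base pages cover `size` bytes.
 * Same idea as get_order(), written out as a plain loop.
 */
static inline unsigned int sketch_order(size_t size)
{
	assert(size > 0);
	size_t page = (size_t)1 << SKETCH_PAGE_SHIFT;
	size_t pages = (size + page - 1) >> SKETCH_PAGE_SHIFT;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/* Bytes consumed if a `size`-byte request is served as one 2^order block. */
static inline size_t sketch_pow2_footprint(size_t size)
{
	return (size_t)1 << (sketch_order(size) + SKETCH_PAGE_SHIFT);
}
```

With these helpers, sketch_bitmap_bytes(2MiB, 64) gives 4096, and a request
of 2MiB + 100KiB (roughly 2.1MiB) lands on order 10, i.e. a 4MiB block.
Whether an allocator built on compound pages would keep the whole block or
hand the tail back to the free lists is exactly the open question above; the
sketch does not answer it.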