Date: Mon, 31 Jul 2023 20:24:26 +0800
Subject: Re: [PATCH 04/11] maple_tree: Introduce interfaces __mt_dup() and mt_dup()
To: "Liam R. Howlett"
References: <20230726080916.17454-1-zhangpeng.00@bytedance.com> <20230726080916.17454-5-zhangpeng.00@bytedance.com> <20230726160354.konsgq6hidj7gr5u@revolver>
From: Peng Zhang
Cc: Peng Zhang, willy@infradead.org, michael.christie@oracle.com, surenb@google.com, npiggin@gmail.com, corbet@lwn.net, mathieu.desnoyers@efficios.com, avagin@gmail.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, akpm@linux-foundation.org, brauner@kernel.org, peterz@infradead.org
In-Reply-To: <20230726160354.konsgq6hidj7gr5u@revolver>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2023/7/27 00:03, Liam R. Howlett wrote:
> * Peng Zhang [230726 04:10]:
>> Introduce interfaces __mt_dup() and mt_dup(), which are used to
>> duplicate a maple tree. Compared with traversing the source tree and
>> reinserting entry by entry in the new tree, it has better performance.
>> The difference between __mt_dup() and mt_dup() is that mt_dup() holds
>> an internal lock.
>>
>> Signed-off-by: Peng Zhang
>> ---
>>  include/linux/maple_tree.h |   3 +
>>  lib/maple_tree.c           | 211 +++++++++++++++++++++++++++++++++++++
>>  2 files changed, 214 insertions(+)
>>
>> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
>> index c962af188681..229fe78e4c89 100644
>> --- a/include/linux/maple_tree.h
>> +++ b/include/linux/maple_tree.h
>> @@ -327,6 +327,9 @@ int mtree_store(struct maple_tree *mt, unsigned long index,
>>  		void *entry, gfp_t gfp);
>>  void *mtree_erase(struct maple_tree *mt, unsigned long index);
>>
>> +int mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp);
>> +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp);
>> +
>>  void mtree_destroy(struct maple_tree *mt);
>>  void __mt_destroy(struct maple_tree *mt);
>>
>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> index da3a2fb405c0..efac6761ae37 100644
>> --- a/lib/maple_tree.c
>> +++ b/lib/maple_tree.c
>> @@ -6595,6 +6595,217 @@ void *mtree_erase(struct maple_tree *mt, unsigned long index)
>>  }
>>  EXPORT_SYMBOL(mtree_erase);
>>
>> +/*
>> + * mt_dup_free() - Free the nodes of a incomplete maple tree.
>> + * @mt: The incomplete maple tree
>> + * @node: Free nodes from @node
>> + *
>> + * This function frees all nodes starting from @node in the reverse order of
>> + * mt_dup_build(). At this point we don't need to hold the source tree lock.
>> + */
>> +static void mt_dup_free(struct maple_tree *mt, struct maple_node *node)
>> +{
>> +	void **slots;
>> +	unsigned char offset;
>> +	struct maple_enode *enode;
>> +	enum maple_type type;
>> +	unsigned char count = 0, i;
>> +
>
> Can we make these labels inline functions and try to make this a loop?

I did this just to make things easier. Refer to the implementation of
walk_tg_tree_from() in sched/core.c. Using some loops and inline
functions probably doesn't simplify things. I'll try to do that and give
up if it complicates things.
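
For illustration only, the loop-based shape asked for above can be sketched on a toy tree. Everything here (struct toy_node, TOY_SLOTS, the toy_* helpers) is hypothetical and much simpler than the real maple tree code: it frees a complete tree in post-order, whereas mt_dup_free() also has to cope with a partially built tree. It is a sketch of the control flow under those assumptions, not a proposed replacement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_SLOTS 4	/* hypothetical fan-out, unrelated to maple tree limits */

/* Hypothetical toy node; not the kernel's struct maple_node. */
struct toy_node {
	struct toy_node *parent;
	unsigned char parent_slot;	/* index of this node in parent->slots */
	unsigned char nr_children;	/* 0 means this node is a leaf */
	struct toy_node *slots[TOY_SLOTS];
};

/* Allocate a node and link it into @parent at @slot (test/demo helper). */
static struct toy_node *toy_new_node(struct toy_node *parent, unsigned char slot)
{
	struct toy_node *node = calloc(1, sizeof(*node));

	if (!node)
		return NULL;
	node->parent = parent;
	node->parent_slot = slot;
	if (parent) {
		assert(slot < TOY_SLOTS);
		parent->slots[slot] = node;
		if (parent->nr_children <= slot)
			parent->nr_children = slot + 1;
	}
	return node;
}

/* Follow the last child of each node until a leaf is reached. */
static inline struct toy_node *toy_descend_last(struct toy_node *node)
{
	while (node->nr_children)
		node = node->slots[node->nr_children - 1];
	return node;
}

/*
 * Free the whole tree in post-order using loops only (no gotos, no
 * recursion). Returns the number of nodes freed.
 */
static size_t toy_free_tree(struct toy_node *root)
{
	struct toy_node *node, *parent;
	unsigned char slot;
	size_t freed = 0;

	if (!root)
		return 0;

	node = toy_descend_last(root);
	for (;;) {
		/* Read the links before the node is freed. */
		parent = node->parent;
		slot = node->parent_slot;
		free(node);
		freed++;

		if (!parent)
			break;		/* the root was just freed */
		if (slot)
			/* Move to the previous sibling's last leaf. */
			node = toy_descend_last(parent->slots[slot - 1]);
		else
			/* All children are gone, the parent is next. */
			node = parent;
	}
	return freed;
}
```

Called on a root with two children, toy_free_tree(root) frees the last leaf first and the root last, and every node is read only before it is freed.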
>
>> +try_ascend:
>> +	if (ma_is_root(node)) {
>> +		mt_free_one(node);
>> +		return;
>> +	}
>> +
>> +	offset = ma_parent_slot(node);
>> +	type = ma_parent_type(mt, node);
>> +	node = ma_parent(node);
>> +	if (!offset)
>> +		goto free;
>> +
>> +	offset--;
>> +
>> +descend:
>> +	slots = (void **)ma_slots(node, type);
>> +	enode = slots[offset];
>> +	if (mte_is_leaf(enode))
>> +		goto free;
>> +
>> +	type = mte_node_type(enode);
>> +	node = mte_to_node(enode);
>> +	offset = ma_nonleaf_data_end_nocheck(node, type);
>> +	goto descend;
>> +
>> +free:
>> +	slots = (void **)ma_slots(node, type);
>> +	count = ma_nonleaf_data_end_nocheck(node, type) + 1;
>> +	for (i = 0; i < count; i++)
>> +		((unsigned long *)slots)[i] &= ~MAPLE_NODE_MASK;
>> +
>> +	/* Cast to __rcu to avoid sparse checker complaining. */
>> +	mt_free_bulk(count, (void __rcu **)slots);
>> +	goto try_ascend;
>> +}
>> +
>> +/*
>> + * mt_dup_build() - Build a new maple tree from a source tree
>> + * @mt: The source maple tree to copy from
>> + * @new: The new maple tree
>> + * @gfp: The GFP_FLAGS to use for allocations
>> + * @to_free: Free nodes starting from @to_free if the build fails
>> + *
>> + * This function builds a new tree in DFS preorder. If it fails due to memory
>> + * allocation, @to_free will store the last failed node to free the incomplete
>> + * tree. Use mt_dup_free() to free nodes.
>> + *
>> + * Return: 0 on success, -ENOMEM if memory could not be allocated.
>> + */
>> +static inline int mt_dup_build(struct maple_tree *mt, struct maple_tree *new,
>> +			       gfp_t gfp, struct maple_node **to_free)
>
> I am trying to change the functions to be two tabs of indent for
> arguments from now on. It allows for more to fit on a single line and
> still maintains a clear separation between code and argument lists.

I'm not too concerned about code formatting. . . At least in this
patchset.
>
>> +{
>> +	struct maple_enode *enode;
>> +	struct maple_node *new_node, *new_parent = NULL, *node;
>> +	enum maple_type type;
>> +	void __rcu **slots;
>> +	void **new_slots;
>> +	unsigned char count, request, i, offset;
>> +	unsigned long *set_parent;
>> +	unsigned long new_root;
>> +
>> +	mt_init_flags(new, mt->ma_flags);
>> +	enode = mt_root_locked(mt);
>> +	if (unlikely(!xa_is_node(enode))) {
>> +		rcu_assign_pointer(new->ma_root, enode);
>> +		return 0;
>> +	}
>> +
>> +	new_node = mt_alloc_one(gfp);
>> +	if (!new_node)
>> +		return -ENOMEM;
>> +
>> +	new_root = (unsigned long)new_node;
>> +	new_root |= (unsigned long)enode & MAPLE_NODE_MASK;
>> +
>> +copy_node:
>
> Can you make copy_node, descend, ascend inline functions instead of the
> goto jumping please? It's better to have loops over jumping around a
> lot. Gotos are good for undoing things and retry, but constructing
> loops with them makes it difficult to follow.

Same as above.

>
>> +	node = mte_to_node(enode);
>> +	type = mte_node_type(enode);
>> +	memcpy(new_node, node, sizeof(struct maple_node));
>> +
>> +	set_parent = (unsigned long *)&(new_node->parent);
>> +	*set_parent &= MAPLE_NODE_MASK;
>> +	*set_parent |= (unsigned long)new_parent;
>
> Maybe make a small inline to set the parent instead of this?
>
> There are some defined helpers for setting the types like
> ma_parent_ptr() and ma_enode_ptr() to make casting more type-safe.

Ok, I'll try to do that.
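
The small inline suggested above might look roughly like the following. This is a self-contained sketch with hypothetical names (struct toy_pnode, TOY_NODE_MASK, toy_set_parent(), toy_get_parent()); it assumes that the low 8 bits of the parent word hold metadata and that nodes are 256-byte aligned, mirroring what the quoted code does with MAPLE_NODE_MASK. A real helper would presumably go through ma_parent_ptr() for the cast.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: the low 8 bits of the parent word are metadata. */
#define TOY_NODE_MASK ((uintptr_t)0xFF)

/* Hypothetical toy node, aligned so that pointers to it have 8 free low bits. */
struct toy_pnode {
	_Alignas(256) uintptr_t parent;	/* parent pointer bits | metadata bits */
};

/*
 * Replace the pointer part of @node's parent word with @parent while
 * preserving the metadata already stored in the low bits.
 */
static inline void toy_set_parent(struct toy_pnode *node,
		const struct toy_pnode *parent)
{
	uintptr_t val = (uintptr_t)parent;

	/* The new pointer must not overlap the metadata bits. */
	assert((val & TOY_NODE_MASK) == 0);
	node->parent = (node->parent & TOY_NODE_MASK) | val;
}

/* Strip the metadata bits and return the parent pointer. */
static inline struct toy_pnode *toy_get_parent(const struct toy_pnode *node)
{
	return (struct toy_pnode *)(node->parent & ~TOY_NODE_MASK);
}
```

Keeping the mask-and-or in one place means a copy loop only has to call the helper after memcpy(), and the open-coded unsigned long pointer is no longer needed at the call site.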
>
>> +	if (ma_is_leaf(type))
>> +		goto ascend;
>> +
>> +	new_slots = (void **)ma_slots(new_node, type);
>> +	slots = ma_slots(node, type);
>> +	request = ma_nonleaf_data_end(mt, node, type) + 1;
>> +	count = mt_alloc_bulk(gfp, request, new_slots);
>> +	if (!count) {
>> +		*to_free = new_node;
>> +		return -ENOMEM;
>> +	}
>> +
>> +	for (i = 0; i < count; i++)
>> +		((unsigned long *)new_slots)[i] |=
>> +			((unsigned long)mt_slot_locked(mt, slots, i) &
>> +			 MAPLE_NODE_MASK);
>> +	offset = 0;
>> +
>> +descend:
>> +	new_parent = new_node;
>> +	enode = mt_slot_locked(mt, slots, offset);
>> +	new_node = mte_to_node(new_slots[offset]);
>> +	goto copy_node;
>> +
>> +ascend:
>> +	if (ma_is_root(node)) {
>> +		new_node = mte_to_node((void *)new_root);
>> +		new_node->parent = ma_parent_ptr((unsigned long)new |
>> +						 MA_ROOT_PARENT);
>> +		rcu_assign_pointer(new->ma_root, (void *)new_root);
>> +		return 0;
>> +	}
>> +
>> +	offset = ma_parent_slot(node);
>> +	type = ma_parent_type(mt, node);
>> +	node = ma_parent(node);
>> +	new_node = ma_parent(new_node);
>> +	if (offset < ma_nonleaf_data_end(mt, node, type)) {
>> +		offset++;
>> +		new_slots = (void **)ma_slots(new_node, type);
>> +		slots = ma_slots(node, type);
>> +		goto descend;
>> +	}
>> +
>> +	goto ascend;
>> +}
>> +
>> +/**
>> + * __mt_dup(): Duplicate a maple tree
>> + * @mt: The source maple tree
>> + * @new: The new maple tree
>> + * @gfp: The GFP_FLAGS to use for allocations
>> + *
>> + * This function duplicates a maple tree using a faster method than traversing
>> + * the source tree and inserting entries into the new tree one by one. The user
>> + * needs to lock the source tree manually. Before calling this function, @new
>> + * must be an empty tree or an uninitialized tree. If @mt uses an external lock,
>> + * we may also need to manually set @new's external lock using
>> + * mt_set_external_lock().
>> + *
>> + * Return: 0 on success, -ENOMEM if memory could not be allocated.
>> + */
>> +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp)
>
> We use mas_ for things that won't handle the locking and pass in a maple
> state. Considering the leaves need to be altered once this is returned,
> I would expect passing in a maple state should be feasible?

But we don't really need mas here. What do you think the state of mas
should be when this function returns? Make it point to the first entry,
or the last entry?

>
>> +{
>> +	int ret;
>> +	struct maple_node *to_free = NULL;
>> +
>> +	ret = mt_dup_build(mt, new, gfp, &to_free);
>> +
>> +	if (unlikely(ret == -ENOMEM)) {
>
> On other errors, will the half constructed tree be returned? Is this
> safe?

Of course, mt_dup_free() is carefully designed to handle this.

>
>> +		if (to_free)
>> +			mt_dup_free(new, to_free);
>> +	}
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(__mt_dup);
>> +
>> +/**
>> + * mt_dup(): Duplicate a maple tree
>> + * @mt: The source maple tree
>> + * @new: The new maple tree
>> + * @gfp: The GFP_FLAGS to use for allocations
>> + *
>> + * This function duplicates a maple tree using a faster method than traversing
>> + * the source tree and inserting entries into the new tree one by one. The
>> + * function will lock the source tree with an internal lock, and the user does
>> + * not need to manually handle the lock. Before calling this function, @new must
>> + * be an empty tree or an uninitialized tree. If @mt uses an external lock, we
>> + * may also need to manually set @new's external lock using
>> + * mt_set_external_lock().
>> + *
>> + * Return: 0 on success, -ENOMEM if memory could not be allocated.
>> + */
>> +int mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp)
>
> mtree_ ususually used to indicate locking is handled.

Before unifying mtree_* and mt_*, I don't think I can see any difference
between them. At least mt_set_in_rcu() and mt_clear_in_rcu() will hold
the lock.
>
>> +{
>> +	int ret;
>> +	struct maple_node *to_free = NULL;
>> +
>> +	mtree_lock(mt);
>> +	ret = mt_dup_build(mt, new, gfp, &to_free);
>> +	mtree_unlock(mt);
>> +
>> +	if (unlikely(ret == -ENOMEM)) {
>> +		if (to_free)
>> +			mt_dup_free(new, to_free);
>
> Again, is a half constructed tree safe to return? Since each caller
> checks to_free is NULL, could that be in mt_dup_free() instead?

Yes, this check can be put in mt_dup_free().

>
>> +	}
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(mt_dup);
>> +
>>  /**
>>   * __mt_destroy() - Walk and free all nodes of a locked maple tree.
>>   * @mt: The maple tree
>> --
>> 2.20.1
>>
>>