Message-ID: <66e526c8-9d06-460b-b5df-92697634106b@redhat.com>
Date: Tue, 28 Nov 2023 21:35:34 -0500
Subject: Re: [PATCH] mm/kmemleak: Add cond_resched() to kmemleak_free_percpu()
From: Waiman Long
To: Catalin Marinas
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20231127194153.289626-1-longman@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/28/23 11:04, Catalin Marinas wrote:
> On Mon, Nov 27, 2023 at 02:41:53PM -0500, Waiman Long wrote:
>>  /**
>>   * kmemleak_free_percpu - unregister a previously registered __percpu object
>>   * @ptr:	__percpu pointer to beginning of the object
>>   *
>>   * This function is called from the kernel percpu allocator when an object
>> - * (memory block) is freed (free_percpu).
>> + * (memory block) is freed (free_percpu). Since this function is inherently
>> + * slow especially on systems with a large number of CPUs, defer the actual
>> + * removal of kmemleak objects associated with the percpu pointer to a
>> + * workqueue if it is not in a task context.
>>   */
>>  void __ref kmemleak_free_percpu(const void __percpu *ptr)
>>  {
>> -	unsigned int cpu;
>> -
>>  	pr_debug("%s(0x%px)\n", __func__, ptr);
>>
>> -	if (kmemleak_free_enabled && ptr && !IS_ERR(ptr))
>> -		for_each_possible_cpu(cpu)
>> -			delete_object_full((unsigned long)per_cpu_ptr(ptr,
>> -								      cpu));
>> +	if (!kmemleak_free_enabled || !ptr || IS_ERR(ptr))
>> +		return;
>> +
>> +	if (!in_task()) {
>> +		struct kmemleak_percpu_addr *addr;
>> +
>> +		addr = kzalloc(sizeof(*addr), GFP_ATOMIC);
>> +		if (addr) {
>> +			INIT_WORK(&addr->work, kmemleak_free_percpu_workfn);
>> +			addr->ptr = ptr;
>> +			queue_work(system_long_wq, &addr->work);
>> +			return;
>> +		}
> We can't defer this freeing. It can mess up the kmemleak metadata if the
> per-cpu pointer is re-allocated before kmemleak removed it from its
> object tree.

You are right. In fact, it is possible for kmemleak_free_percpu() to be
called from softIRQ context. And if the system has hundreds of CPUs, it
will take a long time to process all the free requests.

> The problem is looking up the object tree for each per-cpu offset. We
> can make the percpu pointer handling O(1) since freeing is only done by
> the main __percpu pointer, so that's the only one needing a look-up.
> So far the per-cpu pointers are not tracked for leaking, only scanned.
>
> We could just add the per_cpu_ptr(ptr, 0) to the kmemleak
> object_tree_root but when scanning we don't have an inverse function to
> get the __percpu pointer back and calculate the pointers for the other
> CPUs (well, we could with some hacks but they are probably fragile).

We could keep a separate tree to track the percpu area. We will know the
max percpu offset in each percpu area. The base of the percpu area is
just per_cpu_ptr(0, cpu).

> What I came up with is a separate object_percpu_tree_root similar to the
> object_phys_tree_root. The only reason for these additional trees is to
> look up the kmemleak metadata when needed (usually freeing). They don't
> contain objects that are tracked for actual leaking, only scanned. A
> briefly tested patch below. I need to go through it again, update some
> comments and write a commit log:

That sounds like a good idea, similar to what I suggested above. I will
do a more careful review of the change tomorrow as it is getting late
for me today.

Cheers,
Longman
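
For readers following the thread, here is a minimal userspace sketch of the separate-tree idea discussed above. It is not the kmemleak patch and not kernel code. All names in it (`pcpu_object`, `percpu_track()`, `percpu_untrack()`, `per_cpu_addr()`, `cpu_offset[]`, the `lookups` counter and the `NR_CPUS` value) are hypothetical, and a plain unbalanced binary search tree stands in for whatever tree the real implementation uses. The point it tries to show is that keying the metadata on the main percpu pointer makes the free path one lookup instead of one per possible CPU, while the per-CPU addresses can still be derived from the key when scanning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_CPUS 4	/* hypothetical CPU count for the sketch */

/* Hypothetical per-CPU base offsets; a stand-in for per_cpu_ptr(). */
static const uintptr_t cpu_offset[NR_CPUS] = {
	0x10000, 0x20000, 0x30000, 0x40000
};

/* Address of one CPU's copy, derived from the percpu key (scan path). */
static uintptr_t per_cpu_addr(uintptr_t pcpu, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return pcpu + cpu_offset[cpu];
}

/* Metadata node keyed by the main percpu pointer only. */
struct pcpu_object {
	uintptr_t pcpu;
	size_t size;
	struct pcpu_object *left, *right;
};

static struct pcpu_object *percpu_root;	/* separate lookup tree */
static unsigned long lookups;		/* tree lookups done by frees */

/* Register a percpu allocation: one node, however many CPUs exist. */
static int percpu_track(uintptr_t pcpu, size_t size)
{
	struct pcpu_object **link = &percpu_root;

	while (*link) {
		if ((*link)->pcpu == pcpu)
			return -1;	/* already tracked */
		link = pcpu < (*link)->pcpu ? &(*link)->left : &(*link)->right;
	}
	*link = calloc(1, sizeof(**link));
	if (!*link)
		return -1;
	(*link)->pcpu = pcpu;
	(*link)->size = size;
	return 0;
}

/* Free path: a single lookup by the main pointer, no per-CPU loop. */
static int percpu_untrack(uintptr_t pcpu)
{
	struct pcpu_object **link = &percpu_root, *obj;

	lookups++;
	while (*link && (*link)->pcpu != pcpu)
		link = pcpu < (*link)->pcpu ? &(*link)->left : &(*link)->right;
	obj = *link;
	if (!obj)
		return -1;	/* not tracked */

	if (!obj->left) {
		*link = obj->right;
	} else if (!obj->right) {
		*link = obj->left;
	} else {
		/* Two children: splice in the in-order successor. */
		struct pcpu_object **succ = &obj->right, *s;

		while ((*succ)->left)
			succ = &(*succ)->left;
		s = *succ;
		*succ = s->right;
		s->left = obj->left;
		s->right = obj->right;
		*link = s;
	}
	free(obj);
	return 0;
}
```

In this sketch a caller would invoke `percpu_track()` when a percpu object is allocated and `percpu_untrack()` when it is freed. Because the node is removed synchronously in the free path, a later allocation that reuses the same pointer cannot collide with stale metadata, which is the concern raised against deferring the removal to a workqueue.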