Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp1252761pxj; Fri, 18 Jun 2021 03:06:01 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxzH+hbDhGN7WIisYkf9TLy+no1aRZKK6b1H3EpzlWgt9mToqKJi7tPmYG/QiuHP3w0zs0k X-Received: by 2002:a05:6402:51:: with SMTP id f17mr3761573edu.359.1624010760576; Fri, 18 Jun 2021 03:06:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1624010760; cv=none; d=google.com; s=arc-20160816; b=go7dGW93Ad1C6z196nbd9ogZca43z0olC+dNnjcdQnRIb+TiDYCD0RZN6C+v4hi75m xOs2PGlazx8UWbMW6olRhSMv2EtIFBAmvwj1q2j6AY186aXaB0EOyhkAqZpwZs6F2EsE +FJu/VDexmJkoIYlZfZg2oRcGzJrjQpSUCqJ7SI01kPfZtFbpOlE75ildnF/H8vZNTqL mh2hGTDpMm//HiL+J1rbmFjHQjdm1ghvL0EdpewTd3g3czyAiidLMeV49+0Ax3J9vUXO obI7IFHXiA/0koPBchTIpmpakQt71qM6KiF3tyjIdiqEFUMk4aSXLJQfUGg83v5L0Vga S1RA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:date:cc:to:from:subject :message-id:dkim-signature; bh=ZWdhnm5vDGxSTwGrEAnS5oQkzBaIx+HjqYEBbEhXdB4=; b=k2+kYiMIN0SK8namw0jJ2udwbq/DF+Ye2Of4UW4/rjI/GuckZBpvcogolVnrI4KREo UfS8PN2iHk+xFZ9gISgY8H6rrHu47mdYNgQu0apXFrT/UjRtZORC2j0WOly2Y1t8N6Yk sxCv7bDV8B+lLkSomIoGa/7FyZnXdprDaDusoQhcjAXvEags0e5AMs5QYi25n2SJLAG7 /nq91gO4cSvVnNuNwWSNA7E6yh1VC9W/OdkteMKLXOG/CHqYmWQ+Gr7noSrDnXKfDjXG jHsIg6lFpYd1A0ref4pIC2xHrGU3cpAy7P543wUWODHaqckOvuXZdD9G/IucXUzwDPie UaqA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=GuJplkfC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g21si8373813edb.13.2021.06.18.03.05.36; Fri, 18 Jun 2021 03:06:00 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=GuJplkfC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231465AbhFRKD4 (ORCPT + 99 others); Fri, 18 Jun 2021 06:03:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41430 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232792AbhFRKDq (ORCPT ); Fri, 18 Jun 2021 06:03:46 -0400 Received: from mail-ed1-x534.google.com (mail-ed1-x534.google.com [IPv6:2a00:1450:4864:20::534]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 651D8C061574; Fri, 18 Jun 2021 03:01:32 -0700 (PDT) Received: by mail-ed1-x534.google.com with SMTP id i13so7834773edb.9; Fri, 18 Jun 2021 03:01:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=message-id:subject:from:to:cc:date:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=ZWdhnm5vDGxSTwGrEAnS5oQkzBaIx+HjqYEBbEhXdB4=; b=GuJplkfChAmI3RkGHuxkbjYR20jdrmahoJ7VjXV556/0O16AEz1DwguHfzTbsSi58+ cA81CxgEWsohbp8gx8jPNpI8eXmvvx2MgZl3SxiKl1XbIjyeXgEVqHvKPVVI+MCzMAut Bnx1qUT/Uf582+qR3+HZGfRE8fsiEylLdgXLTLMi/sxzre9EJP8ofeL1UvForHvpmc12 o+FwxS7BGyZmAMT3Pe5reI/HgyDlNmRua/NpzEPUmaWWE5bwaWe4xgFGE3xMyLDqdVOm HUSum0i/yLrCs2e0bdZA5zrrHKDpE60LRXPJiEMtv7BM+avYr9J8HqZdqVra61I1DqB9 SljQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:subject:from:to:cc:date:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=ZWdhnm5vDGxSTwGrEAnS5oQkzBaIx+HjqYEBbEhXdB4=; b=Zi3b3fLI6qknlXEw8Ni7YJiolfZa3kMC9Nil1fiFcgeMPKupbnzsjlXiOuZWAzhmuq iWexJxu1qe5W7CInXf8Tk+s2v5CKiD6cnTaypba32j576o3S/6MnabHRvyeLryIuBTQz Z+K9JqIzyEToFfMUUPdGRlu6isnYkV85Bs71prROUu03ekNFmyKH+mFx0GE3P62NAKPD CzKwQeW0ESDUI1z77Ib+zDnbFoCuiIDDnIH83wsE7Rn4vid4kkFbpUJWYmqq6LrbSLlP iSIsGre2usgXpu11rYUukNFxSoHJz0wI6ziy+S2j2w1/UF9H/FzuNt2F4y8hFylX0jG0 P1ew== X-Gm-Message-State: AOAM53281YldytoKlWX1Hi2ZpwJgWmpocPQuLGa5yjgKCz4+H91G7Od1 tdpGiu3Sqt6Prjkeog3cAXw= X-Received: by 2002:a05:6402:22a1:: with SMTP id cx1mr1399725edb.338.1624010490862; Fri, 18 Jun 2021 03:01:30 -0700 (PDT) Received: from [192.168.1.15] ([151.29.187.239]) by smtp.gmail.com with ESMTPSA id m18sm842797ejx.56.2021.06.18.03.01.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Jun 2021 03:01:30 -0700 (PDT) Message-ID: Subject: Re: [PATCH] perf ksymbol: fix memory leak: decrease refcount of map and dso From: Riccardo Mancini To: Arnaldo Carvalho de Melo Cc: Ian Rogers , Namhyung Kim , Peter Zijlstra , Ingo Molnar , Mark Rutland , Alexander Shishkin , Jiri Olsa , Adrian Hunter , Andi Kleen , Tommi Rantala , linux-perf-users , LKML Date: Fri, 18 Jun 2021 12:01:28 +0200 In-Reply-To: References: <20210602231052.317048-1-rickyman7@gmail.com> <3b8c7c2c5de492c7fbf86df73c43cdb0fbb453df.camel@gmail.com> Content-Type: text/plain; charset="UTF-8" User-Agent: Evolution 3.40.2 (3.40.2-1.fc34) MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, On Fri, 2021-06-04 at 15:29 -0300, Arnaldo Carvalho de Melo wrote:   > > > But looking at this code now I realize that maps__find() should grab a > > > refcount for the map it returns, because in this > > > machine__process_ksymbol_register() function we use reference that 'map' > > > after the if block, i.e. we use it if it came from maps__find() or if we > > > created it machine__process_ksymbol_register, so there is a possible > > > race where other thread removes it from the list and map__put()s it > > > ending up in map__delete() while we still use it in > > > machine__process_ksymbol_register(), right? > > > > Agree. It should be placed before up_read to avoid races, right? > > Yes, we have to grab a refcount while we are sure its not going away, > then return that as the lookup result, whoever receives that refcounted > entry should use it and then drop the refcount. > > > Then we would need to see where it's called and add the appropriate > > map__put. > > yes This function has quite a number of callers (direct and indirect) so the the patch is becoming huge. One of these callers is thread__find_map, which returns an addr_location (actually it's an output argument). This addr_location holds references to map, maps and thread without getting any refcnt (actually in one function it gets it on the thread and a comment tells to put it once done). If I'm not wrong, this addr_location is never malloced (always a local variable) and, is should be present in parts of the code where there should be a refcnt on the thread. Therefore, maybe it does not get the refcnts since it assumes that thread (upon which depends maps and as a consequence map) is always refcnted in its context. However, I think that it should get all refcnts anyways for clarity and to prevent possible misuses (if I understood correctly, Ian is of the same opinion). My solution would be to add the refcnt grabbing for map, maps and thread in thread__find_map, releasing them in addr_location__put, and then making sure all callers call it when no longer in use. Following the same reasoning, I added refcnt grabbing also to mem_info, branch_info (map was already refcnted, I added it also to maps for coherency), map_symbol (as in branch_info, I added it to maps), and in other places in which I saw a pointer was passed without refcounting. Most changes are quite trivial, however, the changelog is huge: 48 files changed, 472 insertions(+), 157 deletions(-) Most of them are just returns converted to goto for calling the __put functions. Doing so, I managed to remove memory leaks caused by refcounting also in perf- report (I wanted to try also perf top but I encountered another memory-related issue). However, the changelog is huge and testing all of it is challenging (especially since I can test missing puts only with ASan's LeakSanitizer and its reports are usually full of leaks, which I am trying to fix along the way, I will send some patches in the following days). How would you go about it? Do you have any suggestions? >   > > In addition, having a look at other possible concurrency issues in map.c: > > Its good to have new eyes looking at this, exactly at a time we're > discussing further parallelizing perf :-) > > >  - maps__for_each_entry should always be called with either read or write > > lock, > > am I right? It looks like this is not done in certain parts of the code. If > > such > > Right. > > > lock is taken, then grabbing the refcount on the looping variable is not > > needed > > unless we need to return it, right? > > Right, returning an entry needs to take a refcount. > > >  - maps__first and map__next do not grab a refcount and neither a lock. If > > they're used through a lock-protected loop, it's not a problem, but maybe > > it's > > yes > > > worth making explicit that they are not to be used directly (through either > > a > > comment or adding some underscores in their names). > > yes, __ in front means, in kernel style, that it does less than the non > __ prefixed, same name, function. > > >  - maps__empty: should probably take a reader lock. > > Indeed. > > >  - maps__find_symbol: the returned symbol is not protected (the caller does > > not > > receive a refcount to neither map or dso, so if dso is deleted, his > > reference to > > the symbol gets invalidated). Depending on how it's being used it might not > > be a > > problem, but in the general scenario I think it's not thread-safe. > > Yes, that function is also problematic. This issue is easier to solve than expected since the map is returned as **mapp, so it's just a matter of making sure that the caller always passes it and then puts the refcnt. Thanks, Riccardo > > Thanks for looking into this, please consider sending patches for these > issues, > > - Arnaldo