From: Jann Horn
Date: Wed, 13 Jan 2021 23:37:34 +0100
Subject: Re: SLUB: percpu partial object count is highly inaccurate, causing
 some memory wastage and maybe also worse tail latencies?
To: Vlastimil Babka
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
 Andrew Morton, Linux-MM, kernel list, Thomas Gleixner,
 Sebastian Andrzej Siewior, Roman Gushchin, Johannes Weiner,
 Shakeel Butt, Suren Baghdasaryan, Minchan Kim, Michal Hocko
In-Reply-To: <2f0f46e8-2535-410a-1859-e9cfa4e57c18@suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 13, 2021 at 8:14 PM Vlastimil Babka wrote:
> On 1/12/21 12:12 AM, Jann Horn wrote:
> It doesn't help that slabinfo (global or per-memcg) is also
> inaccurate as it cannot count free objects on per-cpu partial slabs and
> thus reports them as active.

Maybe SLUB could be taught to track how many objects are in the
percpu machinery, and then print that number separately so that you
can at least know how much data you're missing without having to
collect data with IPIs...

> > It might be a good idea to figure out whether it is possible to
> > efficiently keep track of a more accurate count of the free objects on
>
> As long as there are some inuse objects, it shouldn't matter much if the
> slab is sitting on the per-cpu partial list or the per-node list, as it
> can't be freed anyway. It becomes a real problem only after the slab
> becomes fully free. If we detected that in __slab_free() also for
> already-frozen slabs, we would need to know which CPU this slab belongs
> to (currently that's not tracked afaik),

Yeah, but at least on 64-bit systems we still have 32 completely unused
bits in the counter field that's updated via cmpxchg_double on struct
page. (On 32-bit systems the bitfields are also wider than they strictly
need to be, I think, at least if the system has 4K page size.) So at
least on 64-bit systems, we could squeeze a CPU number in there, and then
you'd know to which CPU the page belonged at the time the object was
freed. (A rough sketch of that layout follows below.)

> and send it an
> IPI to do some light version of unfreeze_partials() that would only
> remove empty slabs. The trick would be not to cause too many IPI's by
> this, obviously :/

Some brainstorming:

Maybe you could have an atomic counter in kmem_cache_cpu that tracks
the number of empty frozen pages that are associated with a specific
CPU?
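Picking up the unused-bits idea from above, a rough sketch (not a real
patch) of the counter word with an owner CPU squeezed into the upper
half; inuse/objects/frozen mirror the existing v5.10-era SLUB layout in
struct page, and only the "cpu" field is the hypothetical addition:

#include <stdint.h>

/*
 * Sketch only: the existing SLUB bitfields cover just 32 of the 64 bits
 * of the "counters" word that __slab_free() rewrites via
 * cmpxchg_double() on 64-bit, so the upper half is free for a CPU id.
 */
union slub_counters {
	uint64_t counters;		/* word updated via cmpxchg_double() */
	struct {
		unsigned inuse:16;	/* objects currently allocated */
		unsigned objects:15;	/* total objects in the slab */
		unsigned frozen:1;	/* slab owned by a CPU's percpu list */
		unsigned cpu:32;	/* hypothetical: CPU that froze the slab */
	};
};

Since the free slowpath already rewrites the whole counters word in its
cmpxchg_double, the CPU id would get read and carried along in the same
atomic update, essentially for free.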
So the freeing slowpath would do its cmpxchg_double, and if the new
state after a successful cmpxchg_double is "inuse == 0 && frozen == 1"
with a valid CPU number, you afterwards do
"atomic_long_inc(&per_cpu_ptr(cache->cpu_slab, cpu)->empty_partial_pages)".
I think it should be possible to implement that such that the
empty_partial_pages count, while not immediately completely accurate,
would be eventually consistent; and readers on the CPU owning the
kmem_cache_cpu should never see a number that is too large, only one
that is too small.

You could additionally have a plain percpu counter, not tied to the
kmem_cache, and increment it by 1<
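To put the empty_partial_pages bookkeeping from above into code, a
sketch only: the struct definitions are trimmed to the relevant
members, and note_empty_frozen_slab() is a made-up helper name, not an
existing kernel function.

#include <linux/atomic.h>
#include <linux/cpumask.h>
#include <linux/percpu.h>

struct kmem_cache_cpu {
	/* ... existing freelist / tid / page / partial members ... */
	atomic_long_t empty_partial_pages;	/* hypothetical counter */
};

struct kmem_cache {
	struct kmem_cache_cpu __percpu *cpu_slab;
	/* ... */
};

/*
 * Would run at the end of the __slab_free() slowpath, after a
 * successful cmpxchg_double, with the slab's new state and the CPU id
 * recovered from the counters word.
 */
static void note_empty_frozen_slab(struct kmem_cache *s,
				   unsigned int inuse, unsigned int frozen,
				   unsigned int cpu)
{
	/*
	 * The slab just became fully free while still frozen on some
	 * CPU's percpu partial list, so bump that CPU's count of empty
	 * frozen slabs. The count is only eventually consistent, but
	 * the owning CPU should never read a value that is too large,
	 * only one that is too small.
	 */
	if (inuse == 0 && frozen == 1 && cpu < nr_cpu_ids)
		atomic_long_inc(&per_cpu_ptr(s->cpu_slab,
					     cpu)->empty_partial_pages);
}

A throttled cleanup could then peek at this counter cheaply before
deciding whether an IPI for the light unfreeze_partials() variant is
worth it.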