Subject: Re: [tip: sched/core] sched: Fix performance regression introduced by mm_cid
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: Thu, 22 Jun 2023 10:33:13 -0400
To: John Johansen, Swapnil Sapkal, Peter Zijlstra, Sebastian Andrzej Siewior
Cc: linux-kernel@vger.kernel.org, linux-tip-commits@vger.kernel.org, Aaron Lu, x86@kernel.org, Andrew Morton, Thomas Gleixner
Message-ID: <287c33e1-acb7-62db-7267-227cbcc54707@efficios.com>

On 6/21/23 19:59, John Johansen wrote:
> On 6/21/23 14:41, Mathieu Desnoyers wrote:
>> On 6/21/23 14:51, Mathieu Desnoyers wrote:
>>> On 6/21/23 12:36, Swapnil Sapkal wrote:
>>>> Hello Mathieu,
>>>>
>>> [...]
>>>>>
>>>>> I suspect the regression is caused by the mm_count cache line
>>>>> bouncing.
>>>>>
>>>>> Please try with this additional patch applied:
>>>>>
>>>>> https://lore.kernel.org/lkml/20230515143536.114960-1-mathieu.desnoyers@efficios.com/
>>>>
>>>> Thanks for the suggestion. I tried with the patch you suggested, and
>>>> I see an improvement in the hackbench numbers with mm_count padding,
>>>> but it does not match what we achieved by reverting the new mm_cid
>>>> patch.
>>>>
>>>> Below are the results on the 1 Socket 4th Generation EPYC Processor
>>>> (1 x 96C/192T):
>>>>
>>>> Threads:
>>>>
>>>> Test:            Base (v6.4-rc1)   Base + new_mmcid_reverted   Base + mm_count_padding
>>>>  1-groups:        5.23 (0.00 pct)   4.61 (11.85 pct)            5.11 (2.29 pct)
>>>>  2-groups:        4.99 (0.00 pct)   4.72 (5.41 pct)             5.00 (-0.20 pct)
>>>>  4-groups:        5.96 (0.00 pct)   4.87 (18.28 pct)            5.86 (1.67 pct)
>>>>  8-groups:        6.58 (0.00 pct)   5.44 (17.32 pct)            6.20 (5.77 pct)
>>>> 16-groups:       11.48 (0.00 pct)   8.07 (29.70 pct)           10.68 (6.96 pct)
>>>>
>>>> Processes:
>>>>
>>>> Test:            Base (v6.4-rc1)   Base + new_mmcid_reverted   Base + mm_count_padding
>>>>  1-groups:        5.19 (0.00 pct)   4.90 (5.58 pct)             5.19 (0.00 pct)
>>>>  2-groups:        5.44 (0.00 pct)   5.39 (0.91 pct)             5.39 (0.91 pct)
>>>>  4-groups:        5.69 (0.00 pct)   5.64 (0.87 pct)             5.64 (0.87 pct)
>>>>  8-groups:        6.08 (0.00 pct)   6.01 (1.15 pct)             6.04 (0.65 pct)
>>>> 16-groups:       10.87 (0.00 pct)  10.83 (0.36 pct)            10.93 (-0.55 pct)
>>>>
>>>> The IBS profile shows the function __switch_to_asm() at the top of
>>>> the baseline run; it is not seen with the mm_count padding patch.
>>>> Below is the full IBS profile data for all 3 runs:
>>>>
>>>> # Base (v6.4-rc1)
>>>> Threads:
>>>> Total time: 11.486 [sec]
>>>>
>>>>     5.15%  sched-messaging  [kernel.vmlinux]  [k] __switch_to_asm
>>>>     4.31%  sched-messaging  [kernel.vmlinux]  [k] copyout
>>>>     4.29%  sched-messaging  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
>>>>     4.22%  sched-messaging  [kernel.vmlinux]  [k] copyin
>>>>     3.92%  sched-messaging  [kernel.vmlinux]  [k] apparmor_file_permission
>>>>     2.91%  sched-messaging  [kernel.vmlinux]  [k] __schedule
>>>>     2.34%  swapper          [kernel.vmlinux]  [k] __switch_to_asm
>>>>     2.10%  sched-messaging  [kernel.vmlinux]  [k] prepare_to_wait_event
>>>>     2.10%  sched-messaging  [kernel.vmlinux]  [k] try_to_wake_up
>>>>     2.07%  sched-messaging  [kernel.vmlinux]  [k] finish_task_switch.isra.0
>>>>     2.00%  sched-messaging  [kernel.vmlinux]  [k] pipe_write
>>>>     1.82%  sched-messaging  [kernel.vmlinux]  [k] check_preemption_disabled
>>>>     1.73%  sched-messaging  [kernel.vmlinux]  [k] exit_to_user_mode_prepare
>>>>     1.52%  sched-messaging  [kernel.vmlinux]  [k] __entry_text_start
>>>>     1.49%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
>>>>     1.45%  sched-messaging  libc.so.6         [.] write
>>>>     1.44%  swapper          [kernel.vmlinux]  [k] native_sched_clock
>>>>     1.38%  sched-messaging  [kernel.vmlinux]  [k] psi_group_change
>>>>     1.38%  sched-messaging  [kernel.vmlinux]  [k] pipe_read
>>>>     1.37%  sched-messaging  libc.so.6         [.] read
>>>>     1.06%  sched-messaging  [kernel.vmlinux]  [k] vfs_read
>>>>     1.01%  swapper          [kernel.vmlinux]  [k] psi_group_change
>>>>     1.00%  sched-messaging  [kernel.vmlinux]  [k] update_curr
>>>>
>>>> # Base + mm_count_padding
>>>> Threads:
>>>> Total time: 11.384 [sec]
>>>>
>>>>     4.43%  sched-messaging  [kernel.vmlinux]  [k] copyin
>>>>     4.39%  sched-messaging  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
>>>>     4.07%  sched-messaging  [kernel.vmlinux]  [k] apparmor_file_permission
>>>>     4.07%  sched-messaging  [kernel.vmlinux]  [k] copyout
>>>>     2.49%  sched-messaging  [kernel.vmlinux]  [k] entry_SYSCALL_64
>>>>     2.37%  sched-messaging  [kernel.vmlinux]  [k] update_cfs_group
>>>>     2.19%  sched-messaging  [kernel.vmlinux]  [k] pipe_write
>>>>     2.00%  sched-messaging  [kernel.vmlinux]  [k] check_preemption_disabled
>>>>     1.93%  swapper          [kernel.vmlinux]  [k] update_load_avg
>>>>     1.81%  sched-messaging  [kernel.vmlinux]  [k] exit_to_user_mode_prepare
>>>>     1.69%  sched-messaging  [kernel.vmlinux]  [k] try_to_wake_up
>>>>     1.58%  sched-messaging  libc.so.6         [.] write
>>>>     1.53%  sched-messaging  [kernel.vmlinux]  [k] psi_group_change
>>>>     1.50%  sched-messaging  libc.so.6         [.] read
>>>>     1.50%  sched-messaging  [kernel.vmlinux]  [k] pipe_read
>>>>     1.39%  sched-messaging  [kernel.vmlinux]  [k] update_load_avg
>>>>     1.39%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
>>>>     1.30%  sched-messaging  [kernel.vmlinux]  [k] update_curr
>>>>     1.28%  swapper          [kernel.vmlinux]  [k] psi_group_change
>>>>     1.16%  sched-messaging  [kernel.vmlinux]  [k] vfs_read
>>>>     1.12%  sched-messaging  [kernel.vmlinux]  [k] vfs_write
>>>>     1.10%  sched-messaging  [kernel.vmlinux]  [k] entry_SYSRETQ_unsafe_stack
>>>>     1.09%  sched-messaging  [kernel.vmlinux]  [k] __switch_to_asm
>>>>     1.08%  sched-messaging  [kernel.vmlinux]  [k] do_syscall_64
>>>>     1.06%  sched-messaging  [kernel.vmlinux]  [k] select_task_rq_fair
>>>>     1.03%  swapper          [kernel.vmlinux]  [k] update_cfs_group
>>>>     1.00%  swapper          [kernel.vmlinux]  [k] rb_insert_color
>>>>
>>>> # Base + reverted_new_mm_cid
>>>> Threads:
>>>> Total time: 7.847 [sec]
>>>>
>>>>    12.14%  sched-messaging  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
>>>>     8.86%  swapper          [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
>>>>     6.13%  sched-messaging  [kernel.vmlinux]  [k] copyin
>>>>     5.54%  sched-messaging  [kernel.vmlinux]  [k] apparmor_file_permission
>>>>     3.59%  sched-messaging  [kernel.vmlinux]  [k] copyout
>>>>     2.61%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
>>>>     2.48%  sched-messaging  [kernel.vmlinux]  [k] pipe_write
>>>>     2.33%  sched-messaging  [kernel.vmlinux]  [k] exit_to_user_mode_prepare
>>>>     2.01%  sched-messaging  [kernel.vmlinux]  [k] check_preemption_disabled
>>>>     1.96%  sched-messaging  [kernel.vmlinux]  [k] __entry_text_start
>>>>     1.91%  sched-messaging  libc.so.6         [.] write
>>>>     1.77%  sched-messaging  libc.so.6         [.] read
>>>>     1.64%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner
>>>>     1.58%  sched-messaging  [kernel.vmlinux]  [k] pipe_read
>>>>     1.52%  sched-messaging  [kernel.vmlinux]  [k] try_to_wake_up
>>>>     1.38%  sched-messaging  [kernel.vmlinux]  [k] ktime_get_coarse_real_ts64
>>>>     1.35%  sched-messaging  [kernel.vmlinux]  [k] vfs_write
>>>>     1.28%  sched-messaging  [kernel.vmlinux]  [k] entry_SYSRETQ_unsafe_stack
>>>>     1.28%  sched-messaging  [kernel.vmlinux]  [k] vfs_read
>>>>     1.25%  sched-messaging  [kernel.vmlinux]  [k] do_syscall_64
>>>>     1.22%  sched-messaging  [kernel.vmlinux]  [k] __fget_light
>>>>     1.18%  sched-messaging  [kernel.vmlinux]  [k] mutex_lock
>>>>     1.12%  sched-messaging  [kernel.vmlinux]  [k] file_update_time
>>>>     1.04%  sched-messaging  [kernel.vmlinux]  [k] _copy_from_iter
>>>>     1.01%  sched-messaging  [kernel.vmlinux]  [k] current_time
>>>>
>>>> So with the reverted new_mm_cid patch, a lot of time is spent in
>>>> native_queued_spin_lock_slowpath, and yet hackbench finishes faster.
>>>>
>>>> I will keep digging into this; please let me know if you have any
>>>> pointers for me.
>>>
>>> Do you have CONFIG_SECURITY_APPARMOR=y? Can you try without?
>>>
>>> I notice that apparmor_file_permission appears near the top of your
>>> profiles, and apparmor uses an internal aa_buffers_lock spinlock,
>>> which could possibly explain the top hits for
>>> native_queued_spin_lock_slowpath. My current suspicion is that the
>>> raw spinlock that was taken by "Base + reverted_new_mm_cid" changed
>>> the contention pattern on the apparmor lock enough to speed things
>>> up by pure accident.
>>
>> If apparmor happens to be the culprit here, we should have a hard look
>> at this commit:
>>
>> commit df323337e507 ("apparmor: Use a memory pool instead per-CPU caches")
>>
>> which turned a per-cpu cache into a global memory pool protected by a
>> spinlock. It may benefit RT, but it does not appear to be so great at
>> scaling.
>>
> It is not. And I have a patch that needs some more formal testing for
> some stats. Ubuntu pulled it in last cycle, so it has gotten a fair bit
> of use and is looking good on that end. There are probably some tweaks
> that can be done to improve it; the backoff in particular has mostly
> been adjusted in response to some basic benchmarking.
>
> Anyway, patch below.

I don't understand why all these heuristics are needed at all. What was
fundamentally wrong with the per-cpu caches before commit df323337e507,
other than being non-RT friendly? Was the only purpose of that commit
to reduce the duration of preempt-off critical sections, or is there a
bigger-picture concern it was taking care of by introducing a global
pool?

Introducing per-cpu memory pools, dealing with migration by giving
entries back to the right cpu's pool (taking into account the cpu each
entry belongs to), and using a per-cpu, lock-free data structure that
allows pushing an entry back onto a remote cpu's pool should do the
trick without locking and without long preempt-off critical sections.
A rough sketch of what I have in mind follows below.

The only downside I see for per-cpu memory pools is a slightly larger
memory overhead on large multi-core systems. But is that really a
concern?

What am I missing here?
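For the sake of discussion, here is a rough, untested sketch of the
kind of scheme I have in mind (the type and function names are invented
for illustration, and pool sizing/trimming is left out entirely). Each
entry remembers its home cpu, and freeing is a lock-free llist push, so
an entry freed after migration finds its way back to its home pool
without taking any lock:

#include <linux/llist.h>
#include <linux/percpu.h>
#include <linux/slab.h>
#include <linux/smp.h>

struct pool_entry {
        struct llist_node node;
        int home_cpu;           /* cpu whose pool owns this entry */
        char buffer[];          /* payload handed out to callers */
};

/* A zero-initialized percpu llist_head is a valid empty list. */
static DEFINE_PER_CPU(struct llist_head, buffer_pool);

static char *pool_alloc(size_t size)
{
        struct llist_head *pool;
        struct llist_node *node;
        struct pool_entry *entry;

        /*
         * Only the owning cpu pops from its own pool, with preemption
         * disabled, so llist_del_first() has a single consumer and
         * needs no lock. (This assumes no allocation from irq context;
         * that case would need llist_del_all() or a local lock.)
         */
        pool = get_cpu_ptr(&buffer_pool);
        node = llist_del_first(pool);
        put_cpu_ptr(&buffer_pool);
        if (node)
                return llist_entry(node, struct pool_entry, node)->buffer;

        /*
         * Pool empty: allocate a new entry owned by the current cpu.
         * Preemption is enabled here, so home_cpu is only an ownership
         * hint; correctness does not depend on it being exact.
         */
        entry = kmalloc(sizeof(*entry) + size, GFP_KERNEL);
        if (!entry)
                return NULL;
        entry->home_cpu = raw_smp_processor_id();
        return entry->buffer;
}

static void pool_free(char *buf)
{
        struct pool_entry *entry =
                container_of(buf, struct pool_entry, buffer[0]);

        /*
         * llist_add() is a lock-free (cmpxchg-based) push which is safe
         * from any cpu, so a task which migrated between alloc and free
         * still gives the entry back to its home cpu's pool.
         */
        llist_add(&entry->node, per_cpu_ptr(&buffer_pool, entry->home_cpu));
}

Both fast paths stay lock-free: a local pop with preemption disabled on
allocation, and a remote-safe push on free. No spinlock, and no
preempt-off section longer than the pop itself.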
Thanks,

Mathieu

> commit e057e9b47f1749882ea0efb4427d6b9671c761ab
> Author: John Johansen
> Date:   Tue Oct 25 01:18:41 2022 -0700
>
>     apparmor: cache buffers on percpu list if there is lock contention
>
>     df323337e507 ("apparmor: Use a memory pool instead per-CPU caches")
>     changed buffer allocation to use a memory pool; however, on a
>     heavily loaded machine there can be lock contention on the global
>     buffers lock. Add a percpu list to cache buffers on when lock
>     contention is encountered.
>
>     When allocating buffers, attempt to use cached buffers first,
>     before taking the global buffers lock. When freeing buffers, try to
>     put them back on the global list, but if contention is encountered,
>     put the buffer on the percpu list.
>
>     The length of time a buffer is held on the percpu list is
>     dynamically adjusted based on lock contention. The hold time is
>     increased rapidly and ramped down slowly.
>
>     v4:
>     - fix percpu ->count buffer count which had been spliced across a
>       debug patch.
>     - introduce define for MAX_LOCAL_COUNT
>     - rework count check and locking around it.
>     - update commit message to reference commit that introduced the
>       memory pool.
>     v3:
>     - limit the number of buffers that can be pushed onto the percpu
>       list. This avoids a problem on some kernels where one percpu list
>       can inherit buffers from another cpu after a reschedule, causing
>       more kernel memory to be used than is necessary. Under normal
>       conditions this should eventually return to normal, but under
>       pathological conditions the extra memory consumption may have
>       been unbounded.
>     v2:
>     - dynamically adjust buffer hold time on percpu list based on
>       lock contention.
>     v1:
>     - cache buffers on percpu list on lock contention
>
>     Signed-off-by: John Johansen
>
> diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
> index e6671a4a89c4..ea3af769af5a 100644
> --- a/security/apparmor/lsm.c
> +++ b/security/apparmor/lsm.c
> @@ -55,12 +55,21 @@ union aa_buffer {
>      char buffer[1];
>  };
>
> +struct aa_local_cache {
> +    unsigned int contention;
> +    unsigned int hold;
> +    unsigned int count;
> +    struct list_head head;
> +};
> +
> +#define MAX_LOCAL_COUNT 2
>  #define RESERVE_COUNT 2
>  static int reserve_count = RESERVE_COUNT;
>  static int buffer_count;
>
>  static LIST_HEAD(aa_global_buffers);
>  static DEFINE_SPINLOCK(aa_buffers_lock);
> +static DEFINE_PER_CPU(struct aa_local_cache, aa_local_buffers);
>
>  struct kmem_cache *aa_audit_slab;
>
> @@ -2029,14 +2038,45 @@ static int param_set_mode(const char *val, const struct kernel_param *kp)
>      return 0;
>  }
>
> +static void update_contention(struct aa_local_cache *cache)
> +{
> +    cache->contention += 3;
> +    if (cache->contention > 9)
> +        cache->contention = 9;
> +    cache->hold += 1 << cache->contention;        /* 8, 64, 512 */
> +}
> +
>  char *aa_get_buffer(bool in_atomic)
>  {
>      union aa_buffer *aa_buf;
> +    struct aa_local_cache *cache;
>      bool try_again = true;
>      gfp_t flags = (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
>
> +    /* use per cpu cached buffers first */
> +    cache = get_cpu_ptr(&aa_local_buffers);
> +    if (!list_empty(&cache->head)) {
> +        aa_buf = list_first_entry(&cache->head, union aa_buffer, list);
> +        list_del(&aa_buf->list);
> +        cache->hold--;
> +        cache->count--;
> +        put_cpu_ptr(&aa_local_buffers);
> +        return &aa_buf->buffer[0];
> +    }
> +    put_cpu_ptr(&aa_local_buffers);
> +
> +    if (!spin_trylock(&aa_buffers_lock)) {
> +        cache = get_cpu_ptr(&aa_local_buffers);
> +        update_contention(cache);
> +        put_cpu_ptr(&aa_local_buffers);
> +        spin_lock(&aa_buffers_lock);
> +    } else {
> +        cache = get_cpu_ptr(&aa_local_buffers);
> +        if (cache->contention)
> +            cache->contention--;
> +        put_cpu_ptr(&aa_local_buffers);
> +    }
>  retry:
> -    spin_lock(&aa_buffers_lock);
>      if (buffer_count > reserve_count ||
>          (in_atomic && !list_empty(&aa_global_buffers))) {
>          aa_buf = list_first_entry(&aa_global_buffers, union aa_buffer,
> @@ -2062,6 +2102,7 @@ char *aa_get_buffer(bool in_atomic)
>      if (!aa_buf) {
>          if (try_again) {
>              try_again = false;
> +            spin_lock(&aa_buffers_lock);
>              goto retry;
>          }
>          pr_warn_once("AppArmor: Failed to allocate a memory buffer.\n");
> @@ -2073,15 +2114,42 @@ char *aa_get_buffer(bool in_atomic)
>  void aa_put_buffer(char *buf)
>  {
>      union aa_buffer *aa_buf;
> +    struct aa_local_cache *cache;
>
>      if (!buf)
>          return;
>      aa_buf = container_of(buf, union aa_buffer, buffer[0]);
>
> -    spin_lock(&aa_buffers_lock);
> -    list_add(&aa_buf->list, &aa_global_buffers);
> -    buffer_count++;
> -    spin_unlock(&aa_buffers_lock);
> +    cache = get_cpu_ptr(&aa_local_buffers);
> +    if (!cache->hold) {
> +        bool must_lock = cache->count >= MAX_LOCAL_COUNT;
> +
> +        put_cpu_ptr(&aa_local_buffers);
> +
> +        if (must_lock) {
> +            spin_lock(&aa_buffers_lock);
> +            goto locked;
> +        } else if (spin_trylock(&aa_buffers_lock)) {
> +        locked:
> +            /* put back on global list */
> +            list_add(&aa_buf->list, &aa_global_buffers);
> +            buffer_count++;
> +            spin_unlock(&aa_buffers_lock);
> +            cache = get_cpu_ptr(&aa_local_buffers);
> +            if (cache->contention)
> +                cache->contention--;
> +            put_cpu_ptr(&aa_local_buffers);
> +            return;
> +        }
> +        /* contention on global list, fallback to percpu */
> +        cache = get_cpu_ptr(&aa_local_buffers);
> +        update_contention(cache);
> +    }
> +
> +    /* cache in percpu list */
> +    list_add(&aa_buf->list, &cache->head);
> +    cache->count++;
> +    put_cpu_ptr(&aa_local_buffers);
>  }
>
>  /*
> @@ -2123,6 +2191,16 @@ static int __init alloc_buffers(void)
>      union aa_buffer *aa_buf;
>      int i, num;
>
> +    /*
> +     * per cpu set of cached allocated buffers used to help reduce
> +     * lock contention
> +     */
> +    for_each_possible_cpu(i) {
> +        per_cpu(aa_local_buffers, i).contention = 0;
> +        per_cpu(aa_local_buffers, i).hold = 0;
> +        per_cpu(aa_local_buffers, i).count = 0;
> +        INIT_LIST_HEAD(&per_cpu(aa_local_buffers, i).head);
> +    }
>      /*
>       * A function may require two buffers at once. Usually the buffers are
>       * used for a short period of time and are shared. On UP kernel buffers

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com