Received: by 2002:a05:6a10:17d3:0:0:0:0 with SMTP id hz19csp2767597pxb; Mon, 19 Apr 2021 13:27:11 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyi4n7lhAXCLFYSYich+VgJP011WhFkSdX6fSSLwyVYj19JT9AaF7V0aYDMm0na8igQI3sT X-Received: by 2002:a63:e405:: with SMTP id a5mr9852195pgi.89.1618864031265; Mon, 19 Apr 2021 13:27:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1618864031; cv=none; d=google.com; s=arc-20160816; b=xCEUbNvhWb2pqYrtZOF4HKiLz3adF6AJgcdVgDhe7St8uXDl0xkpRI1Zk9bPqvVeUq Yp+90w+0/PVKNR3uckCrE34tKaoaOFASQr+XlxjCJ7QFvzd+e2tZoNsRTxRXJC/3tTlt nbQJ5BKwMpO/Tuq+MPEkcgglxfUsKn0gKkhEMQbs9TuSo993Ik8ljOVRFNoJPpdj61T4 36s3ljHIxcsRqat5bRbxYLgHXUr4GeRik1x1zLwhKxA45sDwThtRbdT6DJNEhHJITVK5 uyqmbja3dkMwZI9fjRL88YIAEErXAnUWnQTie1ZytY1JxfNE7gAzZL2K6X3YsjfQxmO3 qXEQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=Zk/sp40ccYRx2WOjOuziU8/bUWKXs2YHeaI5ZDdPDPI=; b=fZT4quvMrKpEShh+AaGtMlSdUeqFTsiMfN+1bZV7QkFSKdh70PfYZ9fqFsmDO8faoD 7eLfB2vqcpza4TvHwQ0r386vXcFih3LZOb6vXC3HraWOLDwKJFHP8Bja14BDG+dcjzTh TGyPPbLuXL0TUOfAVLdT43ZyGW/00mR5Kpm22p7ztkm0sfbHpXyd+KyhcSe8/13vpWAe lHZIwQgf0l7vcyw+JFBpmmQC0nch4hb4zGYkbN4VppB11ECT94KFVCyPervF+849neY4 wf2ByOhSHO67BDerhxFHXQ9/7bqNxC+CgpKV/RKLCVzP9y4JTH6E0wisuVypofv5PaYK O9UQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=myzKF9uZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id 99si19826408plc.316.2021.04.19.13.26.41; Mon, 19 Apr 2021 13:27:11 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=myzKF9uZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233014AbhDSQzk (ORCPT + 99 others); Mon, 19 Apr 2021 12:55:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50614 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231674AbhDSQzj (ORCPT ); Mon, 19 Apr 2021 12:55:39 -0400 Received: from mail-qt1-x829.google.com (mail-qt1-x829.google.com [IPv6:2607:f8b0:4864:20::829]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8D630C06174A for ; Mon, 19 Apr 2021 09:55:09 -0700 (PDT) Received: by mail-qt1-x829.google.com with SMTP id u8so26535974qtq.12 for ; Mon, 19 Apr 2021 09:55:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=Zk/sp40ccYRx2WOjOuziU8/bUWKXs2YHeaI5ZDdPDPI=; b=myzKF9uZT1H+cgfSxn1jOdSsvWzrqzudomITjhtn9FDtGADN3tXQke1hCP4wlgdRf0 5HX95g51uPseo0+nFjTTF2PR1JjCHWrLLpUzFZYeFAbo1JoMD8a17o+GGMvHl+PKhat+ l/uMpPTaRajd7H8iACtsfOyNCo6Td/JG4FvjTmoEsl+F4rfAT/MD34dsRPHeZjSR7RkP BBRdAhyiLjoi5tHbTD2NQOclMK0esAxezvnR2mMICCqC0BZuZJUkqHLQC084nW96WXF3 14EfLu58GyE8MKxe8cNl1fASHtfj5wRcCQk2iiishUDfEIOJ+29yhqunDEzgFo5eH/Zg Lcjw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=Zk/sp40ccYRx2WOjOuziU8/bUWKXs2YHeaI5ZDdPDPI=; b=DNoUNI3EoEfsImj9NeumsBPu8ro7TbZS5Wukq7qKDBfx0fZnu7ESKnMBuHttFVRdj3 dbFa6AnVGFZncYbAVGA1aH7qh8A+SUimPSP1TF7bC+wgJMhJbgTIuS+2msF3lMNDj3Ju Uz1qm0XHJ8FNvkpXGDgWtqBKK9zpS8DiNvezaMSVC1bp2rl177aojFnGJYzsu+J2GuOp 4si5eWJL2iH5IZf1s8ot/DLbks11+d1KbqTtLz37gDkpqF1WnknbWiTUOidalVLVA2Wd tFkrkfcOVrPG9Ef+Yrurx9zVYNkfjkhtAS8WAcI9Id0OjVvUZMP0q+fbjB142DCKO61Y N0ug== X-Gm-Message-State: AOAM531tiix3vTHnduiFoBsib/l01QqGMiHYuYvkFuQ+M69kQTzJUTn7 ZTUSzr1JZtKdQZqFb7wGCOX0sQ== X-Received: by 2002:a05:622a:301:: with SMTP id q1mr12502788qtw.48.1618851308782; Mon, 19 Apr 2021 09:55:08 -0700 (PDT) Received: from localhost (70.44.39.90.res-cmts.bus.ptd.net. [70.44.39.90]) by smtp.gmail.com with ESMTPSA id w13sm6004476qts.17.2021.04.19.09.55.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 19 Apr 2021 09:55:07 -0700 (PDT) Date: Mon, 19 Apr 2021 12:55:07 -0400 From: Johannes Weiner To: Waiman Long Cc: Michal Hocko , Vladimir Davydov , Andrew Morton , Tejun Heo , Christoph Lameter , Pekka Enberg , David Rientjes , Joonsoo Kim , Vlastimil Babka , Roman Gushchin , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Shakeel Butt , Muchun Song , Alex Shi , Chris Down , Yafang Shao , Wei Yang , Masayoshi Mizuma , Xing Zhengjun , Matthew Wilcox Subject: Re: [PATCH v4 4/5] mm/memcg: Save both reclaimable & unreclaimable bytes in object stock Message-ID: References: <20210419000032.5432-1-longman@redhat.com> <20210419000032.5432-5-longman@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210419000032.5432-5-longman@redhat.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Apr 18, 2021 at 08:00:31PM -0400, Waiman Long wrote: > Currently, the object stock structure caches either reclaimable vmstat > bytes or unreclaimable vmstat bytes in its object stock structure. The > hit rate can be improved if both types of vmstat data can be cached > especially for single-node system. > > This patch supports the cacheing of both type of vmstat data, though > at the expense of a slightly increased complexity in the caching code. > For large object (>= PAGE_SIZE), vmstat array is done directly without > going through the stock caching step. > > On a 2-socket Cascade Lake server with instrumentation enabled, the > miss rates are shown in the table below. > > Initial bootup: > > Kernel __mod_objcg_state mod_objcg_state %age > ------ ----------------- --------------- ---- > Before patch 634400 3243830 19.6% > After patch 419810 3182424 13.2% > > Parallel kernel build: > > Kernel __mod_objcg_state mod_objcg_state %age > ------ ----------------- --------------- ---- > Before patch 24329265 142512465 17.1% > After patch 24051721 142445825 16.9% > > There was a decrease of miss rate after initial system bootup. However, > the miss rate for parallel kernel build remained about the same probably > because most of the touched kmemcache objects were reclaimable inodes > and dentries. > > Signed-off-by: Waiman Long > --- > mm/memcontrol.c | 79 +++++++++++++++++++++++++++++++------------------ > 1 file changed, 51 insertions(+), 28 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index c13502eab282..a6dd18f6d8a8 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -2212,8 +2212,8 @@ struct obj_stock { > struct obj_cgroup *cached_objcg; > struct pglist_data *cached_pgdat; > unsigned int nr_bytes; > - int vmstat_idx; > - int vmstat_bytes; > + int reclaimable_bytes; /* NR_SLAB_RECLAIMABLE_B */ > + int unreclaimable_bytes; /* NR_SLAB_UNRECLAIMABLE_B */ How about int nr_slab_reclaimable_b; int nr_slab_unreclaimable_b; so you don't need the comments? > #else > int dummy[0]; > #endif > @@ -3217,40 +3217,56 @@ void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat, > enum node_stat_item idx, int nr) > { > unsigned long flags; > - struct obj_stock *stock = get_obj_stock(&flags); > + struct obj_stock *stock; > + int *bytes, *alt_bytes, alt_idx; > + > + /* > + * Directly update vmstat array for big object. > + */ > + if (unlikely(abs(nr) >= PAGE_SIZE)) > + goto update_vmstat; This looks like an optimization independent of the vmstat item split? > + stock = get_obj_stock(&flags); > + if (idx == NR_SLAB_RECLAIMABLE_B) { > + bytes = &stock->reclaimable_bytes; > + alt_bytes = &stock->unreclaimable_bytes; > + alt_idx = NR_SLAB_UNRECLAIMABLE_B; > + } else { > + bytes = &stock->unreclaimable_bytes; > + alt_bytes = &stock->reclaimable_bytes; > + alt_idx = NR_SLAB_RECLAIMABLE_B; > + } > > /* > - * Save vmstat data in stock and skip vmstat array update unless > - * accumulating over a page of vmstat data or when pgdat or idx > + * Try to save vmstat data in stock and skip vmstat array update > + * unless accumulating over a page of vmstat data or when pgdat > * changes. > */ > if (stock->cached_objcg != objcg) { > /* Output the current data as is */ > - } else if (!stock->vmstat_bytes) { > - /* Save the current data */ > - stock->vmstat_bytes = nr; > - stock->vmstat_idx = idx; > - stock->cached_pgdat = pgdat; > - nr = 0; > - } else if ((stock->cached_pgdat != pgdat) || > - (stock->vmstat_idx != idx)) { > - /* Output the cached data & save the current data */ > - swap(nr, stock->vmstat_bytes); > - swap(idx, stock->vmstat_idx); > + } else if (stock->cached_pgdat != pgdat) { > + /* Save the current data and output cached data, if any */ > + swap(nr, *bytes); > swap(pgdat, stock->cached_pgdat); > + if (*alt_bytes) { > + __mod_objcg_state(objcg, pgdat, alt_idx, > + *alt_bytes); > + *alt_bytes = 0; > + } As per the other email, I really don't think optimizing the pgdat switch (in a percpu cache) is worth this level of complexity. > } else { > - stock->vmstat_bytes += nr; > - if (abs(stock->vmstat_bytes) > PAGE_SIZE) { > - nr = stock->vmstat_bytes; > - stock->vmstat_bytes = 0; > + *bytes += nr; > + if (abs(*bytes) > PAGE_SIZE) { > + nr = *bytes; > + *bytes = 0; > } else { > nr = 0; > } > } > - if (nr) > - __mod_objcg_state(objcg, pgdat, idx, nr); > - > put_obj_stock(flags); > + if (!nr) > + return; > +update_vmstat: > + __mod_objcg_state(objcg, pgdat, idx, nr); > } > > static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes) > @@ -3303,12 +3319,19 @@ static void drain_obj_stock(struct obj_stock *stock) > /* > * Flush the vmstat data in current stock > */ > - if (stock->vmstat_bytes) { > - __mod_objcg_state(old, stock->cached_pgdat, stock->vmstat_idx, > - stock->vmstat_bytes); > + if (stock->reclaimable_bytes || stock->unreclaimable_bytes) { > + int bytes; > + > + if ((bytes = stock->reclaimable_bytes)) > + __mod_objcg_state(old, stock->cached_pgdat, > + NR_SLAB_RECLAIMABLE_B, bytes); > + if ((bytes = stock->unreclaimable_bytes)) > + __mod_objcg_state(old, stock->cached_pgdat, > + NR_SLAB_UNRECLAIMABLE_B, bytes); The int bytes indirection isn't necessary. It's easier to read even with the extra lines required to repeat the long stock member names, because that is quite a common pattern (if (stuff) frob(stuff)). __mod_objcg_state() also each time does rcu_read_lock() toggling and a memcg lookup that could be batched, which I think is further proof that it should just be inlined here.