Subject: Re: [PATCH v3 0/2] ext4: increase mbcache scalability
From: Andreas Dilger
Date: Wed, 4 Sep 2013 14:00:44 -0600
To: T Makphaibulchoke
Cc: "Theodore Ts'o", Al Viro, "linux-ext4@vger.kernel.org List", Linux Kernel Mailing List, "linux-fsdevel@vger.kernel.org Devel", aswin@hp.com, Linus Torvalds, aswin_proj@groups.hp.com
In-Reply-To: <1378312756-68597-1-git-send-email-tmac@hp.com>
References: <1374108934-50550-1-git-send-email-tmac@hp.com> <1378312756-68597-1-git-send-email-tmac@hp.com>

On 2013-09-04, at 10:39 AM, T Makphaibulchoke wrote:
> This patch intends to improve the scalability of an ext filesystem,
> particularly ext4.

In the past, I've raised the question of whether mbcache is even useful on real-world systems. Essentially, it provides a "deduplication" service for ext2/3/4 xattr blocks that are identical. The question is how often this is actually the case in modern use.

The original design was for allowing external ACL blocks to be shared between inodes, at a time when ACLs were pretty much the only xattrs stored on inodes. The question now is whether there are common uses where all of the xattrs stored on multiple inodes are identical. If that is not the case, mbcache is just adding overhead and should be disabled entirely, rather than just made to add less overhead.
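A minimal user-space sketch of the deduplication idea described above, assuming a toy fixed-size block and an FNV-1a content hash (neither is what ext4 uses; every name here is hypothetical and this is not the kernel mbcache API). Identical blocks are located by hash and confirmed with memcmp(), so inodes share one refcounted block, and each lookup is counted as a hit or a miss.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical model of a content-keyed xattr block cache.
 * NOT the kernel mbcache API: it only shows how byte-identical
 * xattr blocks can be shared between inodes.
 */
#define NBUCKETS   64
#define BLOCK_SIZE 64   /* toy block size, not a real ext4 block size */

struct xattr_block {
	unsigned char data[BLOCK_SIZE];
	uint32_t hash;
	unsigned refcount;          /* number of inodes sharing this block */
	struct xattr_block *next;   /* hash-bucket chain */
};

struct block_cache {
	struct xattr_block *buckets[NBUCKETS];
	unsigned long hits, misses;
};

/* FNV-1a over the block contents (a stand-in for the real xattr hash). */
static uint32_t block_hash(const unsigned char *data)
{
	uint32_t h = 2166136261u;
	for (size_t i = 0; i < BLOCK_SIZE; i++) {
		h ^= data[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Return a cached block with identical contents (a hit), or allocate,
 * cache and return a new block (a miss). Returns NULL on allocation failure.
 */
static struct xattr_block *cache_get_block(struct block_cache *c,
					   const unsigned char *data)
{
	uint32_t h = block_hash(data);
	struct xattr_block **head = &c->buckets[h % NBUCKETS];

	for (struct xattr_block *b = *head; b; b = b->next) {
		/* The hash only narrows the search; memcmp confirms a match. */
		if (b->hash == h && memcmp(b->data, data, BLOCK_SIZE) == 0) {
			b->refcount++;
			c->hits++;
			return b;
		}
	}

	struct xattr_block *b = calloc(1, sizeof(*b));
	if (!b)
		return NULL;
	memcpy(b->data, data, BLOCK_SIZE);
	b->hash = h;
	b->refcount = 1;
	b->next = *head;
	*head = b;
	c->misses++;
	return b;
}

/* Fraction of lookups that found a shareable block. */
static double cache_hit_rate(const struct block_cache *c)
{
	unsigned long total = c->hits + c->misses;
	return total ? (double)c->hits / (double)total : 0.0;
}

/* Free every cached block and reset the buckets. */
static void cache_destroy(struct block_cache *c)
{
	for (size_t i = 0; i < NBUCKETS; i++) {
		struct xattr_block *b = c->buckets[i];
		while (b) {
			struct xattr_block *next = b->next;
			free(b);
			b = next;
		}
		c->buckets[i] = NULL;
	}
}
```

When many inodes carry the same ACL block, every lookup after the first is a hit and the blocks are shared. When the xattr blocks differ per inode, every lookup is a miss and the hashing, lookup and (in the kernel) locking are pure overhead. The hit/miss ratio, roughly what counting NULL vs. non-NULL returns from ext4_xattr_cache_find() would measure, is what decides whether the cache is worth keeping.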
There aren't good statistics on the hit rate for mbcache, but it might be possible to generate some with systemtap or similar to see how often ext4_xattr_cache_find() returns NULL vs. non-NULL.

Cheers, Andreas

> The patch consists of two parts. The first part introduces a higher
> degree of parallelism to the usages of the mb_cache and
> mb_cache_entries and impacts all ext filesystems.
>
> The second part of the patch further increases the scalability of
> an ext4 filesystem by having each ext4 filesystem allocate and use
> its own private mbcache structure, instead of sharing a single
> mbcache structure across all ext4 filesystems.
>
> Here are some of the benchmark results with the changes.
>
> On a 90 core machine:
>
> Here are the performance improvements in some of the aim7 workloads:
>
> ----------------------------
> |             | % increase |
> ----------------------------
> | alltests    |   11.85    |
> ----------------------------
> | custom      |   14.42    |
> ----------------------------
> | fserver     |   21.36    |
> ----------------------------
> | new_dbase   |    5.59    |
> ----------------------------
> | new_fserver |   21.45    |
> ----------------------------
> | shared      |   12.84    |
> ----------------------------
>
> For the Swingbench dss workload, with a 16 GB database:
>
> -------------------------------------------------------------------------------
> | Users        | 100  | 200  | 300  | 400  | 500  | 600  | 700  | 800  | 900  |
> -------------------------------------------------------------------------------
> | % improvement| 8.46 | 8.00 | 7.35 |-0.313| 1.09 | 0.69 | 0.30 | 2.18 | 5.23 |
> -------------------------------------------------------------------------------
> | % improvement|45.66 |47.62 |34.54 |25.15 |15.29 | 3.38 | -8.7 |-4.98 |-7.86 |
> | without using|      |      |      |      |      |      |      |      |      |
> | shared memory|      |      |      |      |      |      |      |      |      |
> -------------------------------------------------------------------------------
>
> For SPECjbb2013, composite run:
>
> --------------------------------------------
> |               | max-jOPS | critical-jOPS |
> --------------------------------------------
> | % improvement |   5.99   |      N/A      |
> --------------------------------------------
>
>
> On an 80 core machine:
>
> The aim7 results for most of the workloads turn out to be the same.
>
> Here are the results of the Swingbench dss workload:
>
> -------------------------------------------------------------------------------
> | Users        | 100  | 200  | 300  | 400  | 500  | 600  | 700  | 800  | 900  |
> -------------------------------------------------------------------------------
> | % improvement|-1.79 | 0.37 | 1.36 | 0.08 | 1.66 | 2.09 | 1.16 | 1.48 | 1.92 |
> -------------------------------------------------------------------------------
>
> The changes have been tested with ext4 xfstests to verify that no regression
> has been introduced.
>
> Changed in v3:
>   - New diff summary
>
> Changed in v2:
>   - New performance data
>   - New diff summary
>
> T Makphaibulchoke (2):
>   mbcache: decoupling the locking of local from global data
>   ext4: each filesystem creates and uses its own mb_cache
>
>  fs/ext4/ext4.h          |   1 +
>  fs/ext4/super.c         |  24 ++--
>  fs/ext4/xattr.c         |  51 ++++----
>  fs/ext4/xattr.h         |   6 +-
>  fs/mbcache.c            | 306 +++++++++++++++++++++++++++++++++++------------
>  include/linux/mbcache.h |  10 +-
>  6 files changed, 277 insertions(+), 121 deletions(-)
>
> --
> 1.7.11.3