From: Waiman Long
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, Jason Low, Davidlohr Bueso,
    Scott J Norton, Douglas Hatch, Waiman Long
Subject: [PATCH v4 0/2] locking/rwsem: optimize rwsem_wakeup()
Date: Thu, 30 Apr 2015 17:12:15 -0400
Message-Id: <1430428337-16802-1-git-send-email-Waiman.Long@hp.com>

v3->v4:
 - Break out the active writer check into a separate patch and move it
   from __rwsem_do_wake() to rwsem_wake().
 - Use smp_rmb() instead of the incorrect smp_mb__after_atomic() as
   suggested by PeterZ.

v2->v3:
 - Fix errors in commit log.

v1->v2:
 - Add a memory barrier before calling spin_trylock for proper memory
   ordering.

This patch set aims to reduce spinlock contention in the wait_lock due
to excessive activity in the rwsem_wake code path. This, in turn,
reduces up_write/up_read latency and improves performance when the
rwsem is heavily contended.

On an 8-socket Westmere-EX server (80 cores, HT off), running AIM7's
high_systime workload (1000 users) on a vanilla 4.0 kernel produced the
following perf profile for spinlock contention:

  9.23%    reaim  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
              |--97.39%-- rwsem_wake
              |--0.69%-- try_to_wake_up
              |--0.52%-- release_pages
               --1.40%-- [...]

  1.70%    reaim  [kernel.kallsyms]  [k] _raw_spin_lock_irq
              |--96.61%-- rwsem_down_write_failed
              |--2.03%-- __schedule
              |--0.50%-- run_timer_softirq
               --0.86%-- [...]

Here the contended rwsems are the mmap_sem (mm_struct) and the
i_mmap_rwsem (address_space) with mostly write locking. With a patched
4.0 kernel, the perf profile became:

  1.87%    reaim  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
              |--87.64%-- rwsem_wake
              |--2.80%-- release_pages
              |--2.56%-- try_to_wake_up
              |--1.10%-- __wake_up
              |--1.06%-- pagevec_lru_move_fn
              |--0.93%-- prepare_to_wait_exclusive
              |--0.71%-- free_pid
              |--0.58%-- get_page_from_freelist
              |--0.57%-- add_device_randomness
               --2.04%-- [...]

  0.80%    reaim  [kernel.kallsyms]  [k] _raw_spin_lock_irq
              |--92.49%-- rwsem_down_write_failed
              |--4.24%-- __schedule
              |--1.37%-- run_timer_softirq
               --1.91%-- [...]

The table below shows the % improvement in throughput (1100-2000 users)
in the various AIM7 workloads:

  Workload      % increase in throughput
  --------      ------------------------
  custom                 3.8%
  five-sec               3.5%
  fserver                4.1%
  high_systime          22.2%
  shared                 2.1%
  short                 10.1%

Waiman Long (2):
  locking/rwsem: reduce spinlock contention in wakeup after
    up_read/up_write
  locking/rwsem: check for active writer before wakeup

 include/linux/osq_lock.h    |    5 +++
 kernel/locking/rwsem-xadd.c |   65 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 68 insertions(+), 2 deletions(-)
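
For readers comparing the two perf profiles above, the per-caller lines can
be turned into absolute shares of all samples. The snippet below is only a
rough sketch: it assumes the call-graph percentages are relative to the
parent symbol's total (perf's relative call-graph mode), and the helper name
absolute_share() is made up for this illustration. The input numbers are
copied from the profiles in this cover letter.

```python
def absolute_share(symbol_pct: float, caller_pct: float) -> float:
    """Percent of all samples spent in a symbol on behalf of one caller.

    Assumes caller_pct is relative to symbol_pct (hypothetical helper,
    not part of perf).
    """
    return symbol_pct * caller_pct / 100.0


# _raw_spin_lock_irqsave called from rwsem_wake
vanilla_wake = absolute_share(9.23, 97.39)
patched_wake = absolute_share(1.87, 87.64)

# _raw_spin_lock_irq called from rwsem_down_write_failed
vanilla_write = absolute_share(1.70, 96.61)
patched_write = absolute_share(0.80, 92.49)

print(f"rwsem_wake:              {vanilla_wake:.2f}% -> {patched_wake:.2f}% "
      f"({vanilla_wake / patched_wake:.1f}x lower)")
print(f"rwsem_down_write_failed: {vanilla_write:.2f}% -> {patched_write:.2f}% "
      f"({vanilla_write / patched_write:.1f}x lower)")
```

Under that assumption, the spinlock time attributable to rwsem_wake drops
from roughly 9% of all samples to under 2%, which is consistent with
high_systime showing the largest throughput gain in the table.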