Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp48442pxk; Tue, 15 Sep 2020 17:25:15 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwPKEYRaQcYFwlx9QYH9XzyJsq6lr7t41AXWahy/cBxto+CluOftxCwCrtdecDYFGqzlnNS X-Received: by 2002:a17:906:4a07:: with SMTP id w7mr22682322eju.366.1600215915160; Tue, 15 Sep 2020 17:25:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1600215915; cv=none; d=google.com; s=arc-20160816; b=khH+05QF9Fz/qkYNiCAM3B/w3LwRyVvAIDqBh1gS2lzPG44zVwGAVcDBio/35FxJRv 1gHG07jfvv4NdBwvsipT21MukoxVxTRiz7scgUgQU/REWOYYWU/8vSinzDQBOvlGsdFP 2TeuuxgbA+oDxIf5ApDjyFp+lq/bW7NuRiRtWcr0E9TVr4LoQoR86fupQJRk4G1e7Kbk kFqA1nVwadJpwuYusXQXttadL9I9dEGF8NtaDz0cvTQfW7gW6J30VWFod+mFaoreDER2 OzNuDC+y6+dhTWeVHoC2geCdgS9XOnWzOJKNJ/2zOBVz/8cCM6+ecZQboW80DWTIVFin KpuQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from; bh=bb/2nP6dbMDHxSWqPMeaJuwwWH+l/Eyia1gre2bZZYs=; b=znPDH4NMXtRUGTHPAZdiqrOT4e57ES3jIzcaEQIgMaKUlBGPQPSwDj7AkQ8qg8co5v k8PEZDKUHreudkPrb+nfbRh6mnVe/KPh4jeORalo0PDJf+WNuKvchvwpGCS7s0MqrIL1 v9QypfE6lFUPxhQOdvcX+QwfyLXhrz/rxS5L7/7+mfSsDJaROdAGEPgR4CsvfR7RVsAd t3xftoA9dUjCgxQBtspCpi4VSGmOujXRfq/oVouIRDmghoXue2KB8UrpINnUALyD/QVQ IJZSI6MV1ksdmVK0ahai8DcaQM9JAfMwjKg0zSGD7AloIFGQbRLz2ULDea7a8dchd+bb pZkg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id gj3si10902398ejb.728.2020.09.15.17.24.52; Tue, 15 Sep 2020 17:25:15 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727525AbgIPAYO (ORCPT + 99 others); Tue, 15 Sep 2020 20:24:14 -0400 Received: from szxga05-in.huawei.com ([45.249.212.191]:12274 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726734AbgIOOUY (ORCPT ); Tue, 15 Sep 2020 10:20:24 -0400 Received: from DGGEMS403-HUB.china.huawei.com (unknown [172.30.72.59]) by Forcepoint Email with ESMTP id 7D46D5D0ED035B7F8EAF; Tue, 15 Sep 2020 22:01:02 +0800 (CST) Received: from huawei.com (10.90.53.225) by DGGEMS403-HUB.china.huawei.com (10.3.19.203) with Microsoft SMTP Server id 14.3.487.0; Tue, 15 Sep 2020 22:00:55 +0800 From: Hou Tao To: Peter Zijlstra , Ingo Molnar , Oleg Nesterov , Will Deacon CC: Dennis Zhou , Tejun Heo , "Christoph Lameter" , , , Subject: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count Date: Tue, 15 Sep 2020 22:07:50 +0800 Message-ID: <20200915140750.137881-1-houtao1@huawei.com> X-Mailer: git-send-email 2.25.0.4.g0ad7144999 MIME-Version: 1.0 Content-Transfer-Encoding: 7BIT Content-Type: text/plain; charset=US-ASCII X-Originating-IP: [10.90.53.225] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Under aarch64, __this_cpu_inc() is neither IRQ-safe nor atomic, so when percpu_up_read() is invoked under IRQ-context (e.g. aio completion), and it interrupts the process on the same CPU which is invoking percpu_down_read(), the decreasement on read_count may lost and the final value of read_count on the CPU will be unexpected as shown below: CPU 0 CPU 0 io_submit_one __sb_start_write percpu_down_read __this_cpu_inc // there is already an inflight IO, so // reading *raw_cpu_ptr(&pcp) returns 1 // half complete, then being interrupted *raw_cpu_ptr(&pcp)) += 1 nvme_irq nvme_complete_cqes blk_mq_complete_request nvme_pci_complete_rq nvme_complete_rq blk_mq_end_request blk_update_request bio_endio dio_bio_end_aio aio_complete_rw __sb_end_write percpu_up_read *raw_cpu_ptr(&pcp)) -= 1 // *raw_cpu_ptr(&pcp) is 0 // the decreasement is overwritten by the increasement *raw_cpu_ptr(&pcp)) += 1 // the final value is 1 + 1 = 2 instead of 1 Fixing it by using the IRQ-safe helper this_cpu_inc|dec() for operations on read_count. Another plausible fix is to state that percpu-rwsem can NOT be used under IRQ context and convert all users which may use it under IRQ context. Signed-off-by: Hou Tao --- include/linux/percpu-rwsem.h | 8 ++++---- kernel/locking/percpu-rwsem.c | 4 ++-- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h index 5e033fe1ff4e9..5fda40f97fe91 100644 --- a/include/linux/percpu-rwsem.h +++ b/include/linux/percpu-rwsem.h @@ -60,7 +60,7 @@ static inline void percpu_down_read(struct percpu_rw_semaphore *sem) * anything we did within this RCU-sched read-size critical section. */ if (likely(rcu_sync_is_idle(&sem->rss))) - __this_cpu_inc(*sem->read_count); + this_cpu_inc(*sem->read_count); else __percpu_down_read(sem, false); /* Unconditional memory barrier */ /* @@ -79,7 +79,7 @@ static inline bool percpu_down_read_trylock(struct percpu_rw_semaphore *sem) * Same as in percpu_down_read(). */ if (likely(rcu_sync_is_idle(&sem->rss))) - __this_cpu_inc(*sem->read_count); + this_cpu_inc(*sem->read_count); else ret = __percpu_down_read(sem, true); /* Unconditional memory barrier */ preempt_enable(); @@ -103,7 +103,7 @@ static inline void percpu_up_read(struct percpu_rw_semaphore *sem) * Same as in percpu_down_read(). */ if (likely(rcu_sync_is_idle(&sem->rss))) { - __this_cpu_dec(*sem->read_count); + this_cpu_dec(*sem->read_count); } else { /* * slowpath; reader will only ever wake a single blocked @@ -115,7 +115,7 @@ static inline void percpu_up_read(struct percpu_rw_semaphore *sem) * aggregate zero, as that is the only time it matters) they * will also see our critical section. */ - __this_cpu_dec(*sem->read_count); + this_cpu_dec(*sem->read_count); rcuwait_wake_up(&sem->writer); } preempt_enable(); diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c index 8bbafe3e5203d..70a32a576f3f2 100644 --- a/kernel/locking/percpu-rwsem.c +++ b/kernel/locking/percpu-rwsem.c @@ -45,7 +45,7 @@ EXPORT_SYMBOL_GPL(percpu_free_rwsem); static bool __percpu_down_read_trylock(struct percpu_rw_semaphore *sem) { - __this_cpu_inc(*sem->read_count); + this_cpu_inc(*sem->read_count); /* * Due to having preemption disabled the decrement happens on @@ -71,7 +71,7 @@ static bool __percpu_down_read_trylock(struct percpu_rw_semaphore *sem) if (likely(!atomic_read_acquire(&sem->block))) return true; - __this_cpu_dec(*sem->read_count); + this_cpu_dec(*sem->read_count); /* Prod writer to re-evaluate readers_active_check() */ rcuwait_wake_up(&sem->writer); -- 2.25.0.4.g0ad7144999