Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752107AbdHAQIu (ORCPT ); Tue, 1 Aug 2017 12:08:50 -0400
Received: from mail-pg0-f67.google.com ([74.125.83.67]:34757 "EHLO mail-pg0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751878AbdHAQI3 (ORCPT ); Tue, 1 Aug 2017 12:08:29 -0400
MIME-Version: 1.0
In-Reply-To:
References: <1499962492-8931-1-git-send-email-akinobu.mita@gmail.com> <20170801130907.GB3359@fnst>
From: Akinobu Mita
Date: Wed, 2 Aug 2017 01:08:07 +0900
Message-ID:
Subject: Re: [PATCH -mm] fault-inject: avoid unwanted data race to task->fail_nth
To: Dmitry Vyukov
Cc: Lu Fengqi, Andrew Morton, LKML
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4865
Lines: 120

2017-08-02 0:54 GMT+09:00 Akinobu Mita:
> 2017-08-01 22:45 GMT+09:00 Dmitry Vyukov:
>> On Tue, Aug 1, 2017 at 3:09 PM, Lu Fengqi wrote:
>>> On Fri, Jul 14, 2017 at 01:14:52AM +0900, Akinobu Mita wrote:
>>>>The fault-inject-make-fail-nth-read-write-interface-symmetric.patch in
>>>>-mm tree allows users to set task->fail_nth for non current task by procfs.
>>>>On the other hand, the current task's fail_nth is decreased to zero in
>>>>fault-injection path without any specific locks.
>>>>
>>>>So we need to prevent the task->fail_nth from being unexpected value by
>>>>data races (for example, setting task->fail_nth to zero while decreasing
>>>>the current->fail_nth). In this fix, we use READ_ONCE() and WRITE_ONCE()
>>>>to prevent the compiler from creating unsolicited accesses.
>>>>
>>>>Cc: Dmitry Vyukov
>>>>Reported-by: Dmitry Vyukov
>>>>Signed-off-by: Akinobu Mita
>>>>---
>>>> fs/proc/base.c     | 5 +++--
>>>> lib/fault-inject.c | 7 +++++--
>>>> 2 files changed, 8 insertions(+), 4 deletions(-)
>>>>
>>>>diff --git a/fs/proc/base.c b/fs/proc/base.c
>>>>index ecc8a25..719c2e9 100644
>>>>--- a/fs/proc/base.c
>>>>+++ b/fs/proc/base.c
>>>>@@ -1370,7 +1370,7 @@ static ssize_t proc_fail_nth_write(struct file *file, const char __user *buf,
>>>> 	task = get_proc_task(file_inode(file));
>>>> 	if (!task)
>>>> 		return -ESRCH;
>>>>-	task->fail_nth = n;
>>>>+	WRITE_ONCE(task->fail_nth, n);
>>>> 	put_task_struct(task);
>>>>
>>>> 	return count;
>>>>@@ -1386,7 +1386,8 @@ static ssize_t proc_fail_nth_read(struct file *file, char __user *buf,
>>>> 	task = get_proc_task(file_inode(file));
>>>> 	if (!task)
>>>> 		return -ESRCH;
>>>>-	len = snprintf(numbuf, sizeof(numbuf), "%u\n", task->fail_nth);
>>>>+	len = snprintf(numbuf, sizeof(numbuf), "%u\n",
>>>>+			READ_ONCE(task->fail_nth));
>>>> 	len = simple_read_from_buffer(buf, count, ppos, numbuf, len);
>>>> 	put_task_struct(task);
>>>>
>>>>diff --git a/lib/fault-inject.c b/lib/fault-inject.c
>>>>index 09ac73c1..7d315fd 100644
>>>>--- a/lib/fault-inject.c
>>>>+++ b/lib/fault-inject.c
>>>>@@ -107,9 +107,12 @@ static inline bool fail_stacktrace(struct fault_attr *attr)
>>>>
>>>> bool should_fail(struct fault_attr *attr, ssize_t size)
>>>> {
>>>>-	if (in_task() && current->fail_nth) {
>>>>-		if (--current->fail_nth == 0)
>>>>+	if (in_task()) {
>>>>+		unsigned int fail_nth = READ_ONCE(current->fail_nth);
>>>>+
>>>>+		if (fail_nth && !WRITE_ONCE(current->fail_nth, fail_nth - 1))
>>>> 			goto fail;
>>>>+
>>>> 		return false;
>>>> 	}
>>>>
>>>>--
>>>>2.7.4
>>>>
>>>>
>>>>
>>> hi
>>>
>>> I'm a btrfs developer. I found that fail_make_request didn't produce the
>>> expected IO ERROR when running xfstests on linux 4.13-rc1.
>>>
>>> That testcase enable fail_make_request by the following commands:
>>> # echo 100 > /sys/kernel/debug/fail_make_request/probability
>>> # echo 2 > /sys/kernel/debug/fail_make_request/times
>>> # echo 0 > /sys/kernel/debug/fail_make_request/verbose
>>> # echo 1 > /sys/block/sda/sda1/make-it-fail
>>> # dd if=/dev/zero of=/dev/sda1 bs=128K count=1 oflag=direct
>>>
>>> As I understand it, after applying this patch, I have to write
>>> /proc//file-nth firstly so that dd process can catch the IO ERROR.
>>> However, the dd process is so fast that I can't write file-nth.
>>>
>>> So, could you tell me how to produce IO ERROR under these circumstances?
>>
>> Hi,
>>
>> fail-nth is orthogonal to the existing mechanisms, so if you have a
>> setup that fails all sites with certain probability, that should
>> continue to work.
>
> Lu's setting for fail_make_request is fine before introducing systematic
> fault injection and they want to inject fail_make_request only.
>
> So I think we need a global parameter to turn on/off the systematic fault
> injection. (e.g. /sys/kernel/debug/systematic-fault-inject/enable)

Oops. That is simply a bug in my patch. Correct should_fail() is below.

bool should_fail(struct fault_attr *attr, ssize_t size)
{
	if (in_task()) {
		unsigned int fail_nth = READ_ONCE(current->fail_nth);

		if (fail_nth) {
			if (!WRITE_ONCE(current->fail_nth, fail_nth - 1))
				goto fail;

			return false;
		}
	}
...

>> If you are writing a new facility and want to use fail-nth, then the
>> test process itself needs to cooperate and write fail-nth accordingly.
>> See the original patch for an example of how to do it:
>> https://groups.google.com/d/msg/syzkaller/DbB4rjYd82s/3MHDwtcqCAAJ
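
To make the difference between the two should_fail() prologues concrete,
here is a small user-space model of just the fail_nth branch. It is only a
sketch and not kernel code: struct model_task, enum nth_result and both
nth_check_*() helpers are hypothetical names made up for illustration, plain
loads and stores stand in for READ_ONCE()/WRITE_ONCE(), and in_task() is
assumed to be true.

```c
/* Hypothetical stand-in for the fail_nth field of task_struct. */
struct model_task {
	unsigned int fail_nth;
};

/* What the fail_nth prologue decides in this model. */
enum nth_result {
	NTH_FAIL,        /* counter just reached zero: the "goto fail" path */
	NTH_NO_FAIL,     /* return false without consulting other settings */
	NTH_FALLTHROUGH, /* fail_nth unused: go on to probability/times checks */
};

/* Prologue as in the quoted patch: never falls through in task context. */
static enum nth_result nth_check_buggy(struct model_task *t)
{
	unsigned int fail_nth = t->fail_nth;

	if (fail_nth && !(t->fail_nth = fail_nth - 1))
		return NTH_FAIL;

	return NTH_NO_FAIL;
}

/* Prologue as in the correction: short-circuits only when fail_nth != 0. */
static enum nth_result nth_check_fixed(struct model_task *t)
{
	unsigned int fail_nth = t->fail_nth;

	if (fail_nth) {
		if (!(t->fail_nth = fail_nth - 1))
			return NTH_FAIL;

		return NTH_NO_FAIL;
	}

	return NTH_FALLTHROUGH;
}
```

With fail_nth == 0, which is the case for a dd process that never wrote
fail-nth, the first variant reports "no failure" before the
probability/times settings are ever looked at, while the second one falls
through to them.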