Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S932894AbdC1NDP (ORCPT ); Tue, 28 Mar 2017 09:03:15 -0400
Received: from mail-vk0-f45.google.com ([209.85.213.45]:34729 "EHLO
        mail-vk0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S932808AbdC1NDN (ORCPT ); Tue, 28 Mar 2017 09:03:13 -0400
MIME-Version: 1.0
In-Reply-To:
References: <20170324200837.82451-1-dvyukov@google.com>
From: Dmitry Vyukov
Date: Tue, 28 Mar 2017 15:02:36 +0200
Message-ID:
Subject: Re: [PATCH] fault-inject: support systematic fault injection
To: Akinobu Mita
Cc: Andrew Morton , syzkaller , LKML , "linux-mm@kvack.org"
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2178
Lines: 43

On Sat, Mar 25, 2017 at 10:54 AM, Akinobu Mita wrote:
> 2017-03-25 5:08 GMT+09:00 Dmitry Vyukov :
>> Add /sys/kernel/debug/fail_once file that allows failing 0-th, 1-st, 2-nd
>> and so on calls systematically. Excerpt from the added documentation:
>>
>> ===
>> Write to this file of integer N makes N-th call in the current task fail
>> (N is 0-based). Read from this file returns a single char 'Y' or 'N'
>> that says if the fault setup with a previous write to this file was
>> injected or not, and disables the fault if it wasn't yet injected.
>> Note that this file enables all types of faults (slab, futex, etc).
>> This setting takes precedence over all other generic settings like
>> probability, interval, times, etc. But per-capability settings
>> (e.g. fail_futex/ignore-private) take precedence over it.
>> This feature is intended for systematic testing of faults in a single
>> system call. See an example below.
>> ===
>
> The "/sys/kernel/debug/fail_once" contains per-task data.
>
> Should we introduce new per-task file like "/proc//fail-nth"
> instead of adding a single global debugfs file?

Mailed v2 that uses /proc/self/task/tid/fail-nth.

>> Why adding new setting:
>> 1. Existing settings are global rather than per-task.
>>    So parallel testing is not possible.
>> 2. attr->interval is close but it depends on attr->count
>>    which is not reset to 0, so interval does not work as expected.
>> 3. Trying to model this with existing settings requires manipulations
>>    of all of probability, interval, times, space, task-filter and
>>    unexposed count and per-task make-it-fail files.
>> 4. Existing settings are per-failure-type, and the set of failure
>>    types is potentially expanding.
>> 5. make-it-fail can't be changed by unprivileged user and aggressive
>>    stress testing better be done from an unprivileged user.
>>    Similarly, this would require opening the debugfs files to the
>>    unprivileged user, as he would need to reopen at least times file
>>    (not possible to pre-open before dropping privs).
>>
>> The proposed interface solves all of the above (see the example).
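
For readers following the thread, here is a rough user-space model of the
semantics described in the quoted documentation excerpt: writing N arms a
fault on the N-th call (0-based), and a read returns 'Y' or 'N' and disarms a
fault that has not fired yet. It is only a sketch in Python and does not
touch the kernel. The names FailOnceModel, fake_syscall and explore are made
up for illustration, and the details of the v2 /proc interface may differ.

```python
class FailOnceModel:
    """User-space model of the quoted semantics; NOT the kernel interface."""

    def __init__(self):
        self.armed = None      # fault sites left to skip before failing, or None
        self.injected = False  # whether the armed fault has fired

    def write(self, n):
        # Models "write integer N": arm a fault on the N-th call (0-based).
        self.armed = n
        self.injected = False

    def read(self):
        # Models "read": report 'Y'/'N' and disarm a fault that has not fired.
        result = "Y" if self.injected else "N"
        self.armed = None
        return result

    def should_fail(self):
        # Consulted at every fault-injection site (slab, futex, ...).
        if self.armed is None:
            return False
        if self.armed == 0:
            self.armed = None
            self.injected = True
            return True
        self.armed -= 1
        return False


def fake_syscall(inj, sites=3):
    """Hypothetical operation that passes through `sites` fault sites."""
    for i in range(sites):
        if inj.should_fail():
            return f"failed at site {i}"
    return "ok"


def explore(inj, op):
    """Fail the 0th, 1st, 2nd, ... site in turn until no fault is injected."""
    outcomes = []
    n = 0
    while True:
        inj.write(n)
        outcome = op(inj)
        if inj.read() == "N":
            break  # op has no N-th fault site, so every site was covered
        outcomes.append(outcome)
        n += 1
    return outcomes


if __name__ == "__main__":
    print(explore(FailOnceModel(), fake_syscall))
```

The loop stops at the first N for which the read reports 'N', which is what
makes the exploration systematic: each fault site of the operation is failed
exactly once, with no probability or interval tuning involved.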