Date: Tue, 28 Mar 2017 14:40:03 -0700
From: Andrew Morton
To: Dmitry Vyukov
Cc: akinobu.mita@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2] fault-inject: support systematic fault injection
Message-Id: <20170328144003.b7f7b699f3d22616064e8f7e@linux-foundation.org>
In-Reply-To: <20170328130128.101773-1-dvyukov@google.com>
References: <20170328130128.101773-1-dvyukov@google.com>

On Tue, 28 Mar 2017 15:01:28 +0200 Dmitry Vyukov wrote:

> Add /proc/self/task//fail-nth file that allows failing
> 0-th, 1-st, 2-nd and so on calls systematically.
> Excerpt from the added documentation:
>
> ===
> Write to this file of integer N makes N-th call in the current task fail
> (N is 0-based). Read from this file returns a single char 'Y' or 'N'
> that says if the fault setup with a previous write to this file was
> injected or not, and disables the fault if it wasn't yet injected.
> Note that this file enables all types of faults (slab, futex, etc).
> This setting takes precedence over all other generic settings like
> probability, interval, times, etc. But per-capability settings
> (e.g. fail_futex/ignore-private) take precedence over it.
> This feature is intended for systematic testing of faults in a single
> system call. See an example below.
> ===
>
> Why adding new setting:
> 1. Existing settings are global rather than per-task.
>    So parallel testing is not possible.
> 2. attr->interval is close but it depends on attr->count
>    which is non reset to 0, so interval does not work as expected.
> 3. Trying to model this with existing settings requires manipulations
>    of all of probability, interval, times, space, task-filter and
>    unexposed count and per-task make-it-fail files.
> 4. Existing settings are per-failure-type, and the set of failure
>    types is potentially expanding.
> 5. make-it-fail can't be changed by unprivileged user and aggressive
>    stress testing better be done from an unprivileged user.
>    Similarly, this would require opening the debugfs files to the
>    unprivileged user, as he would need to reopen at least times file
>    (not possible to pre-open before dropping privs).
>
> The proposed interface solves all of the above (see the example).

Seems reasonable.

> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1897,6 +1897,7 @@ struct task_struct {
>  #endif
>  #ifdef CONFIG_FAULT_INJECTION
>  	int make_it_fail;
> +	int fail_nth;
>  #endif

Nit: fail_nth should really be unsigned.  And make_it_fail could be made
a single bit which shares storage with brk_randomized (for example).
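
To illustrate the nit, here is a minimal userspace sketch, not kernel
code. The struct names task_flags_int and task_flags_bit and the two
helper functions are made up for illustration; only the field names
come from the patch and the comment above. It compares the layout from
the patch with one where make_it_fail is a single bit next to
brk_randomized and fail_nth is unsigned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the layout in the patch: two full ints. */
struct task_flags_int {
	unsigned brk_randomized:1;
	int make_it_fail;
	int fail_nth;
};

/* Hypothetical layout following the nit. */
struct task_flags_bit {
	unsigned brk_randomized:1;
	unsigned make_it_fail:1;	/* shares a storage unit with brk_randomized */
	unsigned int fail_nth;		/* a call index is never negative */
};

/* Bytes saved by the bitfield layout on the ABI this is compiled for. */
size_t bytes_saved(void)
{
	/* Guard against unsigned underflow on an unusual ABI. */
	assert(sizeof(struct task_flags_bit) <= sizeof(struct task_flags_int));
	return sizeof(struct task_flags_int) - sizeof(struct task_flags_bit);
}

/* Returns 1 if setting make_it_fail leaves the neighbouring fields alone. */
int bits_are_independent(void)
{
	struct task_flags_bit t = { 0 };

	t.make_it_fail = 1;
	return t.make_it_fail == 1 && t.brk_randomized == 0 && t.fail_nth == 0;
}
```

On common ABIs with a 4-byte int, bytes_saved() typically reports one
int's worth of space, though the C standard does not guarantee it. One
caveat the sketch does not cover: adjacent bitfields share a storage
unit, so concurrent writers to the two bits would need the same
serialization, and whether make_it_fail can meet that is a separate
question.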