Changes from last version
- use lightweight random simulator instead of get_random_int()
- added per-queue filter for disk IO failures
(/sys/block/sda/sda1/make-it-fail, /sys/block/sda/make-it-fail)
- added process filter
(/debug/{failslab,fail_page_alloc,fail_make_request}/process-filter,
/proc/<pid>/make-it-fail)
---
This patch set provides some fault-injection capabilities.
- kmalloc failures
- alloc_pages() failures
- disk IO errors
We can see what really happens if those failures happen.
In order to enable these fault-injection capabilities:
1. Enable relevant config options (CONFIG_FAILSLAB, CONFIG_PAGE_ALLOC,
CONFIG_MAKE_REQUEST) and runtime configuration kernel module
(CONFIG_SHOULD_FAIL_KNOBS)
2. Build and boot this kernel
3. modprobe should_fail_knob
4. Configure the behavior of the fault-injection capabilities through debugfs
For example, for kmalloc failures:
/debug/failslab/probability
specifies how often an allocation should fail, in percent.
/debug/failslab/interval
specifies the interval between failures.
/debug/failslab/times
specifies how many times failures may happen at most.
/debug/failslab/space
specifies the size, in bytes, of the free space within which memory
can be allocated safely.
/debug/failslab/process-filter
enables the process filter.
5. See what really happens.
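
To make the knobs listed in step 4 more concrete, here is a minimal
userspace sketch in C of how a failure decision might combine
probability, interval, times and space. It is not the code from the
patch set: struct fail_attr, toy_random() and should_fail() as written
here are hypothetical, the exact meaning of interval and of a negative
times value are assumptions, and the linear congruential generator only
stands in for the lightweight random simulator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the knobs; field names mirror the debugfs files. */
struct fail_attr {
	unsigned int probability; /* chance of failure in percent, 0..100 */
	unsigned int interval;    /* assumption: only every interval-th call may fail */
	int times;                /* failures still allowed; assumption: negative = no limit */
	size_t space;             /* bytes that may still be allocated safely */
	unsigned long count;      /* calls considered so far */
	uint32_t rng_state;       /* state of the toy generator below */
};

/* Toy linear congruential generator, a stand-in for the real random source. */
static uint32_t toy_random(struct fail_attr *attr)
{
	attr->rng_state = attr->rng_state * 1664525u + 1013904223u;
	return attr->rng_state >> 16;
}

/* Returns true if an allocation of 'size' bytes should be made to fail. */
static bool should_fail(struct fail_attr *attr, size_t size)
{
	assert(attr != NULL);

	/* The failure budget is used up. */
	if (attr->times == 0)
		return false;

	/* While safe space remains, consume it and let the call succeed. */
	if (attr->space > size) {
		attr->space -= size;
		return false;
	}

	/* Skip every call that is not a multiple of the interval. */
	attr->count++;
	if (attr->interval > 1 && attr->count % attr->interval != 0)
		return false;

	/* Fail with the configured probability. */
	if (attr->probability <= toy_random(attr) % 100)
		return false;

	if (attr->times > 0)
		attr->times--;
	return true;
}
```

Under these assumptions, probability=100, interval=2 and times=3 make
every second call fail until three failures have been injected, and a
non-zero space delays the first failure until that many bytes have been
handed out.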
On Thu, 2006-08-31 at 19:07 +0900, Akinobu Mita wrote:
> This patch set provides some fault-injection capabilities.
>
> - kmalloc failures
>
> - alloc_pages() failures
>
> - disk IO errors
>
> We can see what really happens if those failures happen.
Looks very useful for testing error paths; nice work.
Should this perhaps taint the kernel when used?
- Josh Triplett
On Aug 31, 2006, at 13:29:20, Josh Triplett wrote:
> On Thu, 2006-08-31 at 19:07 +0900, Akinobu Mita wrote:
>> This patch set provides some fault-injection capabilities.
>>
>> - kmalloc failures
>>
>> - alloc_pages() failures
>>
>> - disk IO errors
>>
>> We can see what really happens if those failures happen.
>
> Looks very useful for testing error paths; nice work.
>
> Should this perhaps taint the kernel when used?
It shouldn't. These are all failures that could quite possibly happen
during normal operation even without this enabled; they're just a few
orders of magnitude less likely (in most situations).
Cheers,
Kyle Moffett