Cleaned up, fixed unclear logic, and removed the RFC tag.
This version should be easier to read.
When a forkbomb occurs, it tends to be fatal.
When a user creates a forkbomb (which sometimes hits the ulimit),
there are two cases:
- If the system is not in OOM, the admin may be able to kill all threads by
hand, but the forkbomb may be faster than the admin's pkill.
- If the system is in OOM, the admin needs to reboot the system.
The OOM killer is slower than the forkbomb.
So I think a forkbomb killer would be appreciated. It is better than a reboot.
When implementing a forkbomb killer, one difficult case looks like this:
# forkbomb(){ forkbomb|forkbomb & } ; forkbomb
With this, parent tasks exit() before the system goes into OOM,
so it is difficult to see the whole picture of the forkbomb.
This patch series introduces a subsystem that tracks the history of each mm
and keeps the record even after the task exits. (Records are flushed periodically.)
I tested several forkbomb cases and this series seems to work fine.
More heuristics could probably be added, but I think this simple
one works well enough. Any comments are welcome.
Thanks,
-Kame
Kconfig and Documentation for forkbomb killer.
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
Documentation/vm/forkbomb.txt | 62 ++++++++++++++++++++++++++++++++++++++++++
mm/Kconfig | 16 ++++++++++
2 files changed, 78 insertions(+)
Index: mm-work2/Documentation/vm/forkbomb.txt
===================================================================
--- /dev/null
+++ mm-work2/Documentation/vm/forkbomb.txt
@@ -0,0 +1,62 @@
+Forkbomb.txt
+
+1. Introduction
+ Many programmers have probably written a fork-bomb program at some point.
+
+ One kind of fork-bomb makes the system unstable through the memory
+ pressure caused by the sheer number of tasks. This kind of fork-bomb
+ can be limited by ulimit (max user processes). If it happens, the user
+ who owns the forkbomb will not be able to do anything, but other users
+ (e.g. the admin) may have a chance to kill the tasks. (Of course, if
+ the forkbomb is created by root, we have no chance to recover.)
+
+ Another kind of fork-bomb eats a lot of memory. This kind of forkbomb
+ causes heavy swapout, makes the system slow and finally leads to OOM.
+ On a swapless system, OOM is reached quickly. To prevent this type of
+ bomb, memory cgroup or overcommit_memory can help. But trouble tends
+ to happen when we don't expect it.
+
+ To recover from a fork-bomb, we generally need to kill all tasks in
+ the forkbomb tree. But if the system is in an OOM state, killing them
+ all tends to be difficult.
+
+2. Forkbomb Killer
+ The kernel provides a forkbomb killer (see FORKBOMB_KILLER in mm/Kconfig).
+ If enabled, the forkbomb killer provides two sysfs files:
+
+ /sys/kernel/mm/oom/mm_tracker_enable
+ /sys/kernel/mm/oom/mm_tracker_reset_interval_msecs
+
+
+ If /sys/kernel/mm/oom/mm_tracker_enable is "enabled", the kernel records
+ all fork/vfork/exec information in a structure separate from the usual
+ task management. This information is used for tracking a task tree.
+ Unlike the process tree, it keeps parent<->children information even
+ when the parent exits before its children and leaves them as orphans.
+ Thanks to this, even with the following script, tracking information is
+ preserved and we have a chance to chase all processes in a fork bomb.
+
+ (example) # forkbomb(){ forkbomb|forkbomb & } ; forkbomb
+
+ This tracking adds a small overhead at fork/vfork/exec/exit.
+ Default is enabled.
+
+ /sys/kernel/mm/oom/mm_tracker_reset_interval_msecs
+
+ Because we cannot keep all information since system boot, we need to
+ forget old information. The forkbomb killer checks the system status
+ once per period. What is checked now is
+ 1. the number of processes.
+ 2. the number of kswapd runs.
+ 3. the number of alloc stalls (memory reclaim).
+ If none of 1, 2, 3 increased during mm_tracker_reset_interval_msecs,
+ all tracking information recorded before the previous period is
+ removed.
+ IOW, by making mm_tracker_reset_interval_msecs larger, you can watch
+ for a forkbomb over a longer period, at the cost of more overhead. By
+ making it smaller, tracking records are removed earlier and fewer tasks
+ are killed by the forkbomb killer (so you avoid unnecessary kills).
+ Default is 30 secs.
+
+
+
Index: mm-work2/mm/Kconfig
===================================================================
--- mm-work2.orig/mm/Kconfig
+++ mm-work2/mm/Kconfig
@@ -274,6 +274,22 @@ config HWPOISON_INJECT
depends on MEMORY_FAILURE && DEBUG_KERNEL && PROC_FS
select PROC_PAGE_MONITOR
+config FORKBOMB_KILLER
+ bool "Killing a tree of tasks when a forkbomb is found"
+ depends on EXPERIMENTAL
+ default n
+ select MM_OWNER
+ help
+ Provide a fork-bomb killer, which is triggered at OOM.
+ Normally, the OOM killer kills a memory-hungry process. But it
+ kills tasks conservatively and is of little help while a
+ forkbomb is running. The admin may need to reboot the system
+ if the influence of the bomb cannot be limited by rlimits or
+ security settings. The forkbomb killer kills a tree of processes
+ that started recently and eat a lot of memory. Please see
+ Documentation/vm/forkbomb.txt for details. If unsure, say N.
+
+
config NOMMU_INITIAL_TRIM_EXCESS
int "Turn on mmap() excess space trimming before booting"
depends on !MMU
This patch adds a subsystem for recording the history of mm_structs.
It records the relationship between mm_structs and
keeps them in a tree. A new record is added at fork()
and exec(). A record is removed at exit() once all of its
children have disappeared.
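The bookkeeping described above can be modelled in user space. The following is only a minimal sketch, not the patch code: `struct hist`, `hist_track()` and `hist_delete()` are hypothetical names, locking is omitted, a child counter stands in for the children list, and an `alive` flag stands in for the `mm` pointer. It shows that a record whose owner has exited stays in the tree while it still has children, and that dead ancestors are pruned once their last child goes away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space model of one history record. */
struct hist {
	struct hist *parent;
	int nr_children;	/* stands in for the children list */
	bool alive;		/* stands in for hist->mm != NULL */
};

static struct hist root = { .parent = &root, .alive = true };
static int nr_records;	/* allocated records, for illustration only */

/* Called at fork()/exec(): hang a new record under its parent. */
static struct hist *hist_track(struct hist *parent)
{
	struct hist *h = calloc(1, sizeof(*h));

	if (!h)
		return NULL;
	h->parent = parent ? parent : &root;
	h->alive = true;
	h->parent->nr_children++;
	nr_records++;
	return h;
}

/*
 * Called at exit(): free the record only if it has no children,
 * then walk up and free dead ancestors that became childless.
 */
static void hist_delete(struct hist *h)
{
	h->alive = false;
	while (h != &root && !h->alive && h->nr_children == 0) {
		struct hist *p = h->parent;

		p->nr_children--;
		free(h);
		nr_records--;
		h = p;
	}
}
```

With a chain a -> b -> c, deleting a and b first frees nothing; deleting c then frees all three records in one walk towards the root.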
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
fs/exec.c | 1
include/linux/mm_types.h | 3 +
include/linux/oom.h | 14 ++++++++
kernel/fork.c | 3 +
mm/oom_kill.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 96 insertions(+)
Index: mm-work2/include/linux/oom.h
===================================================================
--- mm-work2.orig/include/linux/oom.h
+++ mm-work2/include/linux/oom.h
@@ -72,5 +72,19 @@ extern struct task_struct *find_lock_tas
extern int sysctl_oom_dump_tasks;
extern int sysctl_oom_kill_allocating_task;
extern int sysctl_panic_on_oom;
+
+#ifdef CONFIG_FORKBOMB_KILLER
+extern void track_mm_history(struct mm_struct *new, struct mm_struct *old);
+extern void delete_mm_history(struct mm_struct *mm);
+#else
+static inline void
+track_mm_history(struct mm_struct *new, struct mm_struct *old)
+{
+}
+static inline void delete_mm_history(struct mm_struct *mm)
+{
+}
+#endif
+
#endif /* __KERNEL__*/
#endif /* _INCLUDE_LINUX_OOM_H */
Index: mm-work2/mm/oom_kill.c
===================================================================
--- mm-work2.orig/mm/oom_kill.c
+++ mm-work2/mm/oom_kill.c
@@ -761,3 +761,78 @@ void pagefault_out_of_memory(void)
if (!test_thread_flag(TIF_MEMDIE))
schedule_timeout_uninterruptible(1);
}
+
+#ifdef CONFIG_FORKBOMB_KILLER
+
+struct mm_history {
+ spinlock_t lock;
+ struct mm_struct *mm;
+ struct mm_history *parent;
+ struct list_head siblings;
+ struct list_head children;
+ /* scores */
+ unsigned long start_time;
+ unsigned long score;
+ unsigned int family;
+ int need_to_kill;
+};
+
+struct mm_history init_hist = {
+ .parent = &init_hist,
+ .lock = __SPIN_LOCK_UNLOCKED(init_hist.lock),
+ .siblings = LIST_HEAD_INIT(init_hist.siblings),
+ .children = LIST_HEAD_INIT(init_hist.children),
+};
+
+void track_mm_history(struct mm_struct *new, struct mm_struct *parent)
+{
+ struct mm_history *hist, *phist;
+
+ hist = kmalloc(sizeof(*hist), GFP_KERNEL);
+ if (!hist)
+ return;
+ spin_lock_init(&hist->lock);
+ INIT_LIST_HEAD(&hist->children);
+ hist->mm = new;
+ hist->start_time = jiffies;
+ if (parent)
+ phist = parent->history;
+ else
+ phist = NULL;
+ if (!phist)
+ phist = &init_hist;
+ new->history = hist;
+ hist->parent = phist;
+ spin_lock(&phist->lock);
+ list_add_tail(&hist->siblings, &phist->children);
+ spin_unlock(&phist->lock);
+ return;
+}
+
+void delete_mm_history(struct mm_struct *mm)
+{
+ struct mm_history *hist, *phist;
+ bool nochild;
+
+ if (!mm->history)
+ return;
+ hist = mm->history;
+ spin_lock(&hist->lock);
+ nochild = list_empty(&hist->children);
+ mm->history = NULL;
+ hist->mm = NULL;
+ spin_unlock(&hist->lock);
+ /* delete if we have no child */
+ while (nochild && hist != &init_hist) {
+ phist = hist->parent;
+ spin_lock(&phist->lock);
+ list_del(&hist->siblings);
+ /* delete parent if it's dead & no more child other than me.*/
+ nochild = (phist->mm == NULL && list_empty(&phist->children));
+ spin_unlock(&phist->lock);
+ kfree(hist);
+ hist = phist;
+ }
+}
+
+#endif
Index: mm-work2/fs/exec.c
===================================================================
--- mm-work2.orig/fs/exec.c
+++ mm-work2/fs/exec.c
@@ -802,6 +802,7 @@ static int exec_mmap(struct mm_struct *m
}
task_unlock(tsk);
arch_pick_mmap_layout(mm);
+ track_mm_history(mm, old_mm);
if (old_mm) {
up_read(&old_mm->mmap_sem);
BUG_ON(active_mm != old_mm);
Index: mm-work2/kernel/fork.c
===================================================================
--- mm-work2.orig/kernel/fork.c
+++ mm-work2/kernel/fork.c
@@ -559,6 +559,7 @@ void mmput(struct mm_struct *mm)
ksm_exit(mm);
khugepaged_exit(mm); /* must run before exit_mmap */
exit_mmap(mm);
+ delete_mm_history(mm);
set_mm_exe_file(mm, NULL);
if (!list_empty(&mm->mmlist)) {
spin_lock(&mmlist_lock);
@@ -706,6 +707,8 @@ struct mm_struct *dup_mm(struct task_str
if (mm->binfmt && !try_module_get(mm->binfmt->module))
goto free_pt;
+ track_mm_history(mm, oldmm);
+
return mm;
free_pt:
Index: mm-work2/include/linux/mm_types.h
===================================================================
--- mm-work2.orig/include/linux/mm_types.h
+++ mm-work2/include/linux/mm_types.h
@@ -317,6 +317,9 @@ struct mm_struct {
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
pgtable_t pmd_huge_pte; /* protected by page_table_lock */
#endif
+#ifdef CONFIG_FORKBOMB_KILLER
+ struct mm_history *history;
+#endif
};
/* Future-safe accessor for struct mm_struct's cpu_vm_mask. */
This patch adds code for scanning the mm_history tree. Later patches need
to scan all mm_history records in the children->parent direction.
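To illustrate the children->parent order, here is a small user-space sketch. It is an assumption-laden model, not the patch code: `struct node` uses hypothetical first-child/next-sibling pointers instead of kernel list_heads, and `scan_under()` is a made-up helper. The successor rule is the same idea as in the patch: go to the deepest first child of the next sibling, or step up to the parent when there is no sibling left.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: first-child / next-sibling representation. */
struct node {
	struct node *parent;
	struct node *first_child;
	struct node *next_sibling;
	int id;
};

/* Follow first-child links down to a leaf. */
static struct node *deepest_first(struct node *n)
{
	while (n->first_child)
		n = n->first_child;
	return n;
}

/*
 * Successor in children-first order: descend into the next sibling
 * if there is one, otherwise step up to the parent.
 */
static struct node *scan_next(struct node *pos)
{
	if (pos->next_sibling)
		return deepest_first(pos->next_sibling);
	return pos->parent;
}

/*
 * Visit every node below root, children before parents. The root
 * itself is not visited. Stores ids in out[] and returns the count.
 */
static int scan_under(struct node *root, int *out)
{
	int n = 0;
	struct node *pos;

	for (pos = deepest_first(root); pos != root; pos = scan_next(pos))
		out[n++] = pos->id;
	return n;
}
```

Because every child is visited before its parent, per-child results can be added into the parent during a single pass.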
This patch also adds a global lock, which is required for scanning.
Because scanning is not called frequently, the lock is implemented as
an rwsem with the help of a percpu variable.
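The idea behind the lock can be sketched in user space as follows. This is a simplified model under stated assumptions, not the patch code: the slot array stands in for the percpu counters, the function names are hypothetical, C11 sequentially consistent atomics are used for simplicity, and the sleeping rwsem part is left out (a caller that gets `false` would back off and retry). Updaters announce themselves on their own slot and then check the scanner's flag; the scanner raises the flag first and then waits until every slot has drained.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_SLOTS 4	/* stands in for the number of CPUs */

static atomic_ulong slot_users[NR_SLOTS];
static atomic_int scanner_wants_lock;

/*
 * Updater fast path: count ourselves in, then check that no scanner
 * has asked for exclusive access. Returns false if the caller must
 * back off and retry later.
 */
static bool updater_try_enter(int slot)
{
	atomic_fetch_add(&slot_users[slot], 1);
	if (atomic_load(&scanner_wants_lock)) {
		atomic_fetch_sub(&slot_users[slot], 1);
		return false;
	}
	return true;
}

static void updater_exit(int slot)
{
	atomic_fetch_sub(&slot_users[slot], 1);
}

/* Scanner side: raise the flag before looking at the counters. */
static void scanner_request(void)
{
	atomic_store(&scanner_wants_lock, 1);
}

/* True once no updater is inside its critical section. */
static bool scanner_may_proceed(void)
{
	for (int i = 0; i < NR_SLOTS; i++)
		if (atomic_load(&slot_users[i]))
			return false;
	return true;
}

static void scanner_release(void)
{
	atomic_store(&scanner_wants_lock, 0);
}
```

The design choice is that the frequent path (fork/exec/exit) only touches its own counter, while the rare scanner pays the cost of checking all of them.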
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
mm/oom_kill.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 116 insertions(+)
Index: mm-work2/mm/oom_kill.c
===================================================================
--- mm-work2.orig/mm/oom_kill.c
+++ mm-work2/mm/oom_kill.c
@@ -31,6 +31,7 @@
#include <linux/memcontrol.h>
#include <linux/mempolicy.h>
#include <linux/security.h>
+#include <linux/cpu.h>
int sysctl_panic_on_oom;
int sysctl_oom_kill_allocating_task;
@@ -764,6 +765,58 @@ void pagefault_out_of_memory(void)
#ifdef CONFIG_FORKBOMB_KILLER
+static DEFINE_PER_CPU(unsigned long, pcpu_history_lock);
+static DECLARE_RWSEM(hist_rwsem);
+static int need_global_history_lock;
+
+static void update_history_lock(void)
+{
+retry:
+ preempt_disable();
+ this_cpu_inc(pcpu_history_lock);
+ smp_rmb();
+ if (need_global_history_lock) {
+ this_cpu_dec(pcpu_history_lock);
+ preempt_enable();
+ down_read(&hist_rwsem);
+ up_read(&hist_rwsem);
+ goto retry;
+ }
+}
+
+static void update_history_unlock(void)
+{
+ this_cpu_dec(pcpu_history_lock);
+ preempt_enable();
+}
+
+static void scan_history_lock(void)
+{
+ int cpu;
+ bool loop;
+
+ down_write(&hist_rwsem);
+ need_global_history_lock++;
+ do {
+ loop = false;
+ get_online_cpus();
+ for_each_online_cpu(cpu)
+ if (per_cpu(pcpu_history_lock, cpu)) {
+ loop = true;
+ break;
+ }
+ put_online_cpus();
+ cpu_relax();
+ } while (loop);
+}
+
+static void scan_history_unlock(void)
+{
+ need_global_history_lock--;
+ up_write(&hist_rwsem);
+}
+
+
struct mm_history {
spinlock_t lock;
struct mm_struct *mm;
@@ -791,6 +844,7 @@ void track_mm_history(struct mm_struct *
hist = kmalloc(sizeof(*hist), GFP_KERNEL);
if (!hist)
return;
+ update_history_lock();
spin_lock_init(&hist->lock);
INIT_LIST_HEAD(&hist->children);
hist->mm = new;
@@ -806,6 +860,7 @@ void track_mm_history(struct mm_struct *
spin_lock(&phist->lock);
list_add_tail(&hist->siblings, &phist->children);
spin_unlock(&phist->lock);
+ update_history_unlock();
return;
}
@@ -816,6 +871,7 @@ void delete_mm_history(struct mm_struct
if (!mm->history)
return;
+ update_history_lock();
hist = mm->history;
spin_lock(&hist->lock);
nochild = list_empty(&hist->children);
@@ -833,6 +889,66 @@ void delete_mm_history(struct mm_struct
kfree(hist);
hist = phist;
}
+ update_history_unlock();
}
+/* The global scan lock is held, so no per-record lock is needed while scanning. */
+static struct mm_history* __first_child(struct mm_history *p)
+{
+ if (list_empty(&p->children))
+ return NULL;
+ return list_first_entry(&p->children, struct mm_history, siblings);
+}
+
+static struct mm_history* __next_sibling(struct mm_history *p)
+{
+ if (p->siblings.next == &p->parent->children)
+ return NULL;
+ return list_first_entry(&p->siblings, struct mm_history, siblings);
+}
+
+static struct mm_history *first_deepest_child(struct mm_history *p)
+{
+ struct mm_history *tmp;
+
+ do {
+ tmp = __first_child(p);
+ if (!tmp)
+ return p;
+ p = tmp;
+ } while (1);
+}
+
+static struct mm_history *mm_history_scan_start(struct mm_history *hist)
+{
+ return first_deepest_child(hist);
+}
+
+static struct mm_history *mm_history_scan_next(struct mm_history *pos)
+{
+ struct mm_history *tmp;
+
+ tmp = __next_sibling(pos);
+ if (!tmp)
+ return pos->parent;
+ pos = tmp;
+ pos = first_deepest_child(pos);
+ return pos;
+}
+
+#define for_each_mm_history_under(pos, root)\
+ for (pos = mm_history_scan_start(root);\
+ pos != root;\
+ pos = mm_history_scan_next(pos))
+
+#define for_each_mm_history_safe_under(pos, root, tmp)\
+ for (pos = mm_history_scan_start(root),\
+ tmp = mm_history_scan_next(pos);\
+ pos != root;\
+ pos = tmp, tmp = mm_history_scan_next(pos))
+
+#define for_each_mm_history(pos) for_each_mm_history_under((pos), &init_hist)
+#define for_each_mm_history_safe(pos, tmp)\
+ for_each_mm_history_safe_under((pos), &init_hist, (tmp))
+
#endif
First, this patch adds a control knob to enable/disable mm_history
tracking.
Second, when tracking mm history for forkbomb detection, information
about processes that are unlikely to matter for forkbomb detection
is just noise.
This patch adds a knob for forgetting such information, together with
a periodic check routine.
Every 30 secs (configurable), it checks that
1. nr_processes does not increase,
2. kswapd does not run,
3. allocstall does not occur.
If none of these happens, mm_history records older than 30 secs are cleared.
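The periodic check above can be sketched as plain C. This is a user-space illustration only: `struct sample`, `quiet_period()`, `tick_before()` and `should_forget()` are hypothetical names, and `tick_before()` merely imitates the wrap-safe comparison style of the kernel's time_before() on a free-running tick counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the three counters sampled each period. */
struct sample {
	unsigned long nr_procs;
	unsigned long pageout_run;	/* kswapd runs */
	unsigned long allocstall;	/* direct reclaim stalls */
};

/*
 * True when the system looked quiet between two samples: the number
 * of processes did not grow and neither reclaim counter moved.
 */
static bool quiet_period(const struct sample *last, const struct sample *now)
{
	if (now->nr_procs > last->nr_procs)
		return false;
	if (now->pageout_run != last->pageout_run)
		return false;
	if (now->allocstall != last->allocstall)
		return false;
	return true;
}

/* Wrap-safe "a is before b" for a free-running tick counter. */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * A record is dropped when the period was quiet and the record was
 * created more than one interval ago.
 */
static bool should_forget(bool quiet, unsigned long start,
			  unsigned long now, unsigned long interval)
{
	return quiet && tick_before(start, now - interval);
}
```

Note that a drop in the number of processes still counts as quiet; only growth, or any reclaim activity, keeps the records.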
Note: the objects in the Makefile had to be reordered because
mm_kobj's initcall must run before oom's.
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
mm/Makefile | 4 -
mm/oom_kill.c | 144 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
2 files changed, 139 insertions(+), 9 deletions(-)
Index: mm-work2/mm/oom_kill.c
===================================================================
--- mm-work2.orig/mm/oom_kill.c
+++ mm-work2/mm/oom_kill.c
@@ -768,6 +768,7 @@ void pagefault_out_of_memory(void)
static DEFINE_PER_CPU(unsigned long, pcpu_history_lock);
static DECLARE_RWSEM(hist_rwsem);
static int need_global_history_lock;
+static int mm_tracking_enabled = 1;
static void update_history_lock(void)
{
@@ -841,6 +842,9 @@ void track_mm_history(struct mm_struct *
{
struct mm_history *hist, *phist;
+ if (!mm_tracking_enabled)
+ return;
+
hist = kmalloc(sizeof(*hist), GFP_KERNEL);
if (!hist)
return;
@@ -864,19 +868,19 @@ void track_mm_history(struct mm_struct *
return;
}
-void delete_mm_history(struct mm_struct *mm)
+static void __delete_mm_history(struct mm_history *hist, bool check_ancestors)
{
- struct mm_history *hist, *phist;
+ struct mm_history *phist;
bool nochild;
- if (!mm->history)
+ if (!hist)
return;
- update_history_lock();
- hist = mm->history;
spin_lock(&hist->lock);
nochild = list_empty(&hist->children);
- mm->history = NULL;
- hist->mm = NULL;
+ if (hist->mm) {
+ hist->mm->history = NULL;
+ hist->mm = NULL;
+ }
spin_unlock(&hist->lock);
/* delete if we have no child */
while (nochild && hist != &init_hist) {
@@ -887,8 +891,16 @@ void delete_mm_history(struct mm_struct
nochild = (phist->mm == NULL && list_empty(&phist->children));
spin_unlock(&phist->lock);
kfree(hist);
+ if (!check_ancestors)
+ break;
hist = phist;
}
+}
+
+void delete_mm_history(struct mm_struct *mm)
+{
+ update_history_lock();
+ __delete_mm_history(mm->history, true);
update_history_unlock();
}
@@ -951,4 +963,122 @@ static struct mm_history *mm_history_sca
#define for_each_mm_history_safe(pos, tmp)\
for_each_mm_history_safe_under((pos), &init_hist, (tmp))
+static unsigned long reset_interval_jiffies = 30*HZ;
+unsigned long last_nr_procs;
+unsigned long last_pageout_run;
+unsigned long last_allocstall;
+static void reset_mm_tracking(struct work_struct *w);
+DECLARE_DELAYED_WORK(reset_mm_tracking_work, reset_mm_tracking);
+
+static void reset_mm_tracking(struct work_struct *w)
+{
+ struct mm_history *pos, *tmp;
+ unsigned long nr_procs;
+ unsigned long events[NR_VM_EVENT_ITEMS];
+ bool forget = true;
+
+ nr_procs = nr_processes();
+ if (nr_procs > last_nr_procs)
+ forget = false;
+ last_nr_procs = nr_procs;
+
+ all_vm_events(events);
+ if (last_pageout_run != events[PAGEOUTRUN])
+ forget = false;
+ last_pageout_run = events[PAGEOUTRUN];
+ if (last_allocstall != events[ALLOCSTALL])
+ forget = false;
+ last_allocstall = events[ALLOCSTALL];
+
+ if (forget) {
+ unsigned long thresh = jiffies - reset_interval_jiffies;
+ scan_history_lock();
+ for_each_mm_history_safe(pos, tmp) {
+ if (time_before(pos->start_time, thresh))
+ __delete_mm_history(pos, false);
+ }
+ scan_history_unlock();
+ }
+ if (mm_tracking_enabled)
+ schedule_delayed_work(&reset_mm_tracking_work,
+ reset_interval_jiffies);
+ return;
+}
+
+#define OOM_ATTR(_name)\
+ static struct kobj_attribute _name##_attr =\
+ __ATTR(_name, 0644, _name##_show, _name##_store)
+
+static ssize_t mm_tracker_reset_interval_msecs_show(struct kobject *obj,
+ struct kobj_attribute *attr, char *buf)
+{
+ return sprintf(buf, "%u", jiffies_to_msecs(reset_interval_jiffies));
+}
+
+static ssize_t mm_tracker_reset_interval_msecs_store(struct kobject *obj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ unsigned long msecs;
+ int err;
+
+ err = strict_strtoul(buf, 10, &msecs);
+ if (err || msecs > UINT_MAX)
+ return -EINVAL;
+
+ reset_interval_jiffies = msecs_to_jiffies(msecs);
+ return count;
+}
+OOM_ATTR(mm_tracker_reset_interval_msecs);
+
+static ssize_t mm_tracker_enable_show(struct kobject *obj,
+ struct kobj_attribute *attr, char *buf)
+{
+ if (mm_tracking_enabled)
+ return sprintf(buf, "enabled");
+ return sprintf(buf, "disabled");
+}
+
+static ssize_t mm_tracker_enable_store(struct kobject *obj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ if (!memcmp("disable", buf, min(sizeof("disable")-1, count)))
+ mm_tracking_enabled = 0;
+ else if (!memcmp("enable", buf, min(sizeof("enable")-1, count)))
+ mm_tracking_enabled = 1;
+ else
+ return -EINVAL;
+ if (mm_tracking_enabled
+ && !delayed_work_pending(&reset_mm_tracking_work))
+ schedule_delayed_work(&reset_mm_tracking_work,
+ reset_interval_jiffies);
+
+ return count;
+}
+OOM_ATTR(mm_tracker_enable);
+
+static struct attribute *oom_attrs[] = {
+ &mm_tracker_reset_interval_msecs_attr.attr,
+ &mm_tracker_enable_attr.attr,
+ NULL,
+};
+
+static struct attribute_group oom_attr_group = {
+ .attrs = oom_attrs,
+ .name = "oom",
+};
+
+static int __init init_mm_history(void)
+{
+ int err = 0;
+
+#ifdef CONFIG_SYSFS
+ err = sysfs_create_group(mm_kobj, &oom_attr_group);
+ if (err)
+ printk(KERN_ERR
+ "failed to register mm history tracking for oom\n");
+#endif
+ schedule_delayed_work(&reset_mm_tracking_work, reset_interval_jiffies);
+ return 0;
+}
+module_init(init_mm_history);
#endif
Index: mm-work2/mm/Makefile
===================================================================
--- mm-work2.orig/mm/Makefile
+++ mm-work2/mm/Makefile
@@ -7,11 +7,11 @@ mmu-$(CONFIG_MMU) := fremap.o highmem.o
mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
vmalloc.o pagewalk.o pgtable-generic.o
-obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
+obj-y := mm_init.o filemap.o mempool.o oom_kill.o fadvise.o \
maccess.o page_alloc.o page-writeback.o \
readahead.o swap.o truncate.o vmscan.o shmem.o \
prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
- page_isolation.o mm_init.o mmu_context.o percpu.o \
+ page_isolation.o mmu_context.o percpu.o \
$(mmu-y)
obj-y += init-mm.o
A forkbomb killer implementation.
This patch implements a forkbomb killer that makes use of the mm_history
records. It calculates the badness of each mm_history tree and kills
all live processes in the worst tree. It assumes that the mm_history
records of all innocent tasks have already been removed.
Tested with several known types of forkbombs; it works well.
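The scoring can be illustrated with a small user-space model. It is a sketch under assumptions, not the patch code: `struct rec` and `find_bomb()` are hypothetical, records are stored in an array where a parent always has a smaller index than its children (so walking the array backwards visits children first), and `badness` is a plain number standing in for oom_badness(). As in the patch, a live child adds its score plus one to its parent, and a tree is treated as a bomb only if its family has at least 10 live members.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAMILY_THRESHOLD 10	/* same cut-off as the patch */

/*
 * Hypothetical flattened record. parent < own index; -1 means the
 * record hangs directly under the root and propagates nowhere.
 */
struct rec {
	int parent;
	bool alive;
	unsigned long badness;	/* stand-in for oom_badness() */
	unsigned long score;
	unsigned int family;
};

/* Returns the index of the suspected bomb root, or -1 if none. */
static int find_bomb(struct rec *r, int n)
{
	unsigned long max_score = 0;
	int bomb = -1;
	int i;

	/* reset, then seed each live record with its own badness */
	for (i = 0; i < n; i++) {
		r[i].score = r[i].alive ? r[i].badness : 0;
		r[i].family = r[i].alive ? 1 : 0;
	}
	/* walk children before parents */
	for (i = n - 1; i >= 0; i--) {
		int p = r[i].parent;

		if (r[i].score > max_score) {
			max_score = r[i].score;
			bomb = i;
		}
		if (p < 0)
			continue;
		/* a live child costs its parent one extra point */
		r[p].score += r[i].score + (r[i].alive ? 1 : 0);
		r[p].family += r[i].family;
	}
	if (bomb < 0 || r[bomb].family < FAMILY_THRESHOLD)
		return -1;
	return bomb;
}
```

In this model a dead parent with eleven small live children outscores a single larger unrelated task, which is the situation the forkbomb killer is meant to catch.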
Note:
This doesn't have memory cgroup support because
1. it's difficult.
2. memory cgroup has oom_notify and oom_disable. A userland
management daemon can do a better job than the kernel.
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
---
mm/oom_kill.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 123 insertions(+)
Index: mm-work2/mm/oom_kill.c
===================================================================
--- mm-work2.orig/mm/oom_kill.c
+++ mm-work2/mm/oom_kill.c
@@ -83,6 +83,18 @@ static bool has_intersects_mems_allowed(
}
#endif /* CONFIG_NUMA */
+#ifdef CONFIG_FORKBOMB_KILLER
+static bool fork_bomb_killer(unsigned long totalpages, struct mem_cgroup *mem,
+ const nodemask_t *nodemask);
+#else
+static bool fork_bomb_killer(unsigned long totalpages, struct mem_cgroup *mem,
+ const nodemask_t *nodemask)
+{
+ return false;
+}
+#endif
+
+
/*
* If this is a system OOM (not a memcg OOM) and the task selected to be
* killed is not already running at high (RT) priorities, speed up the
@@ -705,6 +717,10 @@ void out_of_memory(struct zonelist *zone
mpol_mask = (constraint == CONSTRAINT_MEMORY_POLICY) ? nodemask : NULL;
check_panic_on_oom(constraint, gfp_mask, order, mpol_mask);
+ if (!sysctl_oom_kill_allocating_task)
+ if (fork_bomb_killer(totalpages, NULL, mpol_mask))
+ return;
+
read_lock(&tasklist_lock);
if (sysctl_oom_kill_allocating_task &&
!oom_unkillable_task(current, NULL, nodemask) &&
@@ -963,6 +979,113 @@ static struct mm_history *mm_history_sca
#define for_each_mm_history_safe(pos, tmp)\
for_each_mm_history_safe_under((pos), &init_hist, (tmp))
+atomic_t forkbomb_killing;
+bool nobomb = false;
+
+void clear_forkbomb_killing(struct work_struct *w)
+{
+ atomic_set(&forkbomb_killing, 0);
+ nobomb = false;
+}
+DECLARE_DELAYED_WORK(fork_bomb_work, clear_forkbomb_killing);
+
+void reset_forkbomb_killing(void)
+{
+ schedule_delayed_work(&fork_bomb_work, 10*HZ);
+}
+
+static void get_badness_score(struct mm_history *pos, struct mem_cgroup *mem,
+ const nodemask_t *nodemask, unsigned long totalpages)
+{
+ struct task_struct *task;
+
+ if (!pos->mm)
+ return;
+ /* task struct is freed by RCU and we're under rcu_read_lock() */
+ task = pos->mm->owner;
+ if (task && !oom_unkillable_task(task, mem, nodemask))
+ pos->score += oom_badness(task, mem, nodemask, totalpages);
+}
+
+static void propagate_oom_info(struct mm_history *pos)
+{
+ struct mm_history *ppos;
+
+ ppos = pos->parent;
+ if (ppos == &init_hist) /* deadlink by timeout */
+ return;
+ /* +1 means that the child is a burden of the parent */
+ if (pos->mm) {
+ ppos->score += pos->score + 1;
+ ppos->family += pos->family;
+ } else {
+ ppos->score += pos->score;
+ ppos->family += pos->family;
+ }
+}
+
+static bool fork_bomb_killer(unsigned long totalpages, struct mem_cgroup *mem,
+ const nodemask_t *nodemask)
+{
+ struct mm_history *pos, *bomb;
+ unsigned long max_score;
+ struct task_struct *p;
+
+ if (nobomb || !mm_tracking_enabled)
+ return false;
+
+ if (atomic_inc_return(&forkbomb_killing) != 1)
+ return true;
+ /* reset information */
+ scan_history_lock();
+ nobomb = false;
+ pr_err("forkbomb detection running....\n");
+ for_each_mm_history(pos) {
+ pos->score = 0;
+ if (pos->mm)
+ pos->family = 1;
+ pos->need_to_kill = 0;
+ }
+ max_score = 0;
+ bomb = NULL;
+ for_each_mm_history(pos) {
+ get_badness_score(pos, mem, nodemask, totalpages);
+ propagate_oom_info(pos);
+ if (pos->score > max_score) {
+ bomb = pos;
+ max_score = pos->score;
+ }
+ }
+ if (!bomb || bomb->family < 10) {
+ scan_history_unlock();
+ nobomb = true;
+ reset_forkbomb_killing();
+ pr_err("no forkbomb found \n");
+ return false;
+ }
+
+ pr_err("Possible forkbomb. Killing _all_ doubtful tasks\n");
+ for_each_mm_history_under(pos, bomb) {
+ pos->need_to_kill = 1;
+ }
+ read_lock(&tasklist_lock);
+ for_each_process(p) {
+ if (!p->mm || oom_unkillable_task(p, mem, nodemask))
+ continue;
+ if (p->signal->oom_score_adj == -1000)
+ continue;
+ if (p->mm->history && p->mm->history->need_to_kill) {
+ pr_err("kill %d(%s)->%lu\n", task_pid_nr(p),
+ p->comm, p->mm->history->score);
+ force_sig(SIGKILL, p);
+ }
+ }
+ read_unlock(&tasklist_lock);
+ scan_history_unlock();
+ reset_forkbomb_killing();
+ return true;
+}
+
static unsigned long reset_interval_jiffies = 30*HZ;
unsigned long last_nr_procs;
unsigned long last_pageout_run;
Hi Kame,
On Thu, Mar 24, 2011 at 06:22:40PM +0900, KAMEZAWA Hiroyuki wrote:
>
> Cleaned up and fixed unclear logics. and removed RFC.
> Maybe this version is easy to be read.
>
>
> When we see forkbomb, it tends can be a fatal one.
>
> When A user makes a forkbomb (and sometimes reaches ulimit....
> In this case,
> - If the system is not in OOM, the admin may be able to kill all threads by
> hand..but forkbomb may be faster than pkill() by admin.
> - If the system is in OOM, the admin needs to reboot system.
> OOM killer is slow than forkbomb.
>
> So, I think forkbomb killer is appreciated. It's better than reboot.
>
> At implementing forkbomb killer, one of difficult case is like this
>
> # forkbomb(){ forkbomb|forkbomb & } ; forkbomb
>
> With this, parent tasks will exit() before the system goes under OOM.
> So, it's difficult to know the whole image of forkbomb.
>
> This patch introduce a subsystem to track mm's history and records it
> even after the task exit. (It will be flushed periodically.)
>
> I tested with several forkbomb cases and this patch seems work fine.
>
> Maybe some more 'heuristics' can be added....but I think this simple
> one works enough. Any comments are welcome.
Sorry for the late review. Recently I haven't had enough time to review patches.
I haven't started reviewing this series yet, but I want to.
It's one of the features I'm interested in. :)
Before digging into the code, though, I would like to reach a consensus with
others on whether this feature is needed. Let's Cc others.
My concern is "cost (frequent case) vs. effectiveness (very rare case)",
as you probably expected. :)
1. At least, I haven't met any fork-bomb for a few years. My primary Linux usage
is desktop and development environments, NOT servers. The only ones I have seen
are ltp or intentional fork-bomb tests like hackbench. AFAIR, the ltp case was fixed
a few years ago. Even if it happens suddenly, a reboot on a desktop isn't as
critical as on a server.
2. I don't know server environments, but I think applications running on servers
are selected carefully by the admin, so a virus-like program such as a fork-bomb
is unlikely there. (Maybe I am wrong. You know better than me.)
If some normal program unexpectedly becomes a fork-bomb, it's critical.
The admin should select applications very carefully, with lots of testing. But I
don't know the reality. :(
Of course, even after such efforts, he could hit an OOM hang.
In that case, he can't avoid rebooting. Sad. But to help him, should we pay a cost
in the normal situation? (Again, I haven't started looking at your code, so
I can't estimate the cost, but it's at least more than as-is.)
It could encourage the development of more virus programs and make admins careless.
It's just my private opinion.
I don't have enough experience, so I hope to hear others' opinions
about a generic fork-bomb killer, not memcg.
I don't intend to dismiss your effort; I want both your effort and mine to be well justified.
Thanks for your effort, Kame. :)
--
Kind regards,
Minchan Kim
On Thu, 24 Mar 2011 19:52:22 +0900
Minchan Kim <[email protected]> wrote:
> Hi Kame,
>
Hi.
> On Thu, Mar 24, 2011 at 06:22:40PM +0900, KAMEZAWA Hiroyuki wrote:
> >
> > I tested with several forkbomb cases and this patch seems work fine.
> >
> > Maybe some more 'heuristics' can be added....but I think this simple
> > one works enough. Any comments are welcome.
>
> Sorry for the late review. Recently I dont' have enough time to review patches.
> Even I didn't start to review this series but I want to review this series.
> It's one of my interest features. :)
>
> But before digging in code, I would like to make a consensus to others to
> need this feature. Let's Cc others.
>
> What I think is that about "cost(frequent case) VS effectiveness(very rare case)"
> as you expected. :)
>
> 1. At least, I don't meet any fork-bomb case for a few years. My primary linux usage
> is just desktop and developement enviroment, NOT server. Only thing I have seen is
> just ltp or intentional fork-bomb test like hackbench. AFAIR, ltp case was fixed
> a few years ago. Although it happens suddenly, reboot in desktop isn't critical
> as much as server's one.
>
Personally, I've hit forkbombs several times by typing "make -j" ...by mistake.
I hit a forkbomb on a production system once, caused by a buggy script.
It happened because
1. $PATH included "."
2. a programmer wrote a script named "date" and called "date" inside the script.
Maybe this is one of the typical forkbomb cases. I needed to dig through the crashdump
to find fragments of page cache and see what had happened... But I guess that if an
apparent forkbomb happens, the issue will not be sent to my team, because we're the
2nd line support team and the 1st line should block it ;).
So I'm not sure how many forkbombs happen in the server world in a year. But I guess
forkbombs still happen on many development systems because there is no guard
against them.
> 2. I don't know server enviroment but I think applications executing on server
> are selected by admin carefully. So virus program like fork-bomb is unlikely in there.
> (Maybe I am wrong. You know than me).
> If some normal program becomes fork-bomb unexpectedly, it's critical.
> Admin should select application with much testing very carefully. But I don't know
> the reality. :(
>
Yes, admins select applications carefully. But there is no 100% protection by human hands.
> Of course, although he did such efforts, he could meet OOM hang situation.
> In the case, he can't avoid rebooting. Sad. But for helping him, should we pay cost
> in normal situation?(Again said, I didn't start looking at your code so
> I can't expect the cost but at least it's more than as-is).
> It could help developing many virus program and to make careless admins.
>
> It's just my private opinion.
> I don't have enough experience so I hope listen other's opinions
> about generic fork-bomb killer, not memcg.
>
> I don't intend to ignore your effort but justify your and my effort rightly.
>
To me, the fact that "the system _can_ be broken by a normal user program" is the most
terrible thing. With Andrey's case or make -j, a user doesn't need to be an admin.
I believe it's worth paying the cost.
(And I made this function configurable; it can be turned off via sysfs.)
While testing Andrey's case, I ended up using KVM because the cost of rebooting was small.
My development server is in another building, and I need to push the server's button
to reboot it when a forkbomb happens ;)
In some environments, the cost of rebooting is not small even for a development system.
Thanks,
-Kame
On Fri, Mar 25, 2011 at 9:04 AM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Thu, 24 Mar 2011 19:52:22 +0900
> Minchan Kim <[email protected]> wrote:
>
>> Hi Kame,
>>
> Hi.
>
>> On Thu, Mar 24, 2011 at 06:22:40PM +0900, KAMEZAWA Hiroyuki wrote:
>> >
>> > I tested with several forkbomb cases and this patch seems work fine.
>> >
>> > Maybe some more 'heuristics' can be added....but I think this simple
>> > one works enough. Any comments are welcome.
>>
>> Sorry for the late review. Recently I dont' have enough time to review patches.
>> Even I didn't start to review this series but I want to review this series.
>> It's one of my interest features. :)
>>
>> But before digging in code, I would like to make a consensus to others to
>> need this feature. Let's Cc others.
>>
>> What I think is that about "cost(frequent case) VS effectiveness(very rare case)"
>> as you expected. :)
>>
>> 1. At least, I don't meet any fork-bomb case for a few years. My primary linux usage
>> is just desktop and developement enviroment, NOT server. Only thing I have seen is
>> just ltp or intentional fork-bomb test like hackbench. AFAIR, ltp case was fixed
>> a few years ago. Although it happens suddenly, reboot in desktop isn't critical
>> as much as server's one.
>>
>
> Personally, I've met forkbombs several times by typing "make -j" .....by mistake.
>
> I met a forkbomb on production system by buggy script, once.
> That happens because
> 1. $PATH includes "."
> 2. a programmer write a scirpt "date" and call "date" in the script.
>
> Maybe this is a one of typical case of forkbomb. I needed to dig crashdump to find
> fragile of page-caches and see what happens...But, I guess, if appearent forkbomb
> happens, the issue will not be sent to my team because we're 2nd line support team
> and 1st line should block it ;).
>
> So, I'm not sure how many forkbombs happens in server world in a year. But I guess
> forkbomb still happens in many development systems because there is no guard
> against it.
>
>
>> 2. I don't know server enviroment but I think applications executing on server
>> are selected by admin carefully. So virus program like fork-bomb is unlikely in there.
>> (Maybe I am wrong. You know than me).
>> If some normal program becomes fork-bomb unexpectedly, it's critical.
>> Admin should select application with much testing very carefully. But I don't know
>> the reality. :(
>>
>
> Yes, admin selects applications carefully. There is no 100% protection by human's hand.
>
>
>> Of course, although he did such efforts, he could meet OOM hang situation.
>> In the case, he can't avoid rebooting. Sad. But for helping him, should we pay cost
>> in normal situation?(Again said, I didn't start looking at your code so
>> I can't expect the cost but at least it's more than as-is).
>> It could help developing many virus program and to make careless admins.
>>
>> It's just my private opinion.
>> I don't have enough experience so I hope listen other's opinions
>> about generic fork-bomb killer, not memcg.
>>
>> I don't intend to ignore your effort but justify your and my effort rightly.
>>
>
> To me, the fact "the system _can_ be broken by a normal user program" is the most
> terrible thing. With Andrey's case or make -j, a user doesn't need to be an admin.
> I believe it's worth to pay costs.
> (and I made this function configurable and can be turned off by sysfs.)
>
> And while testing Andrey's case, I used KVM finaly becasue cost of rebooting was small.
> My development server is on other building and I need to push server's button
> to reboot it when forkbomb happens ;)
> In some environement, cost of rebooting is not small even if it's a development system.
>
A forkbomb is a very rare case in normal operation, but if it happens, the
cost (such as a reboot) is big. So we need such a facility. I agree.
(But I don't know why others aren't interested if it is an important
task. Maybe they are busy due to rc1.)
My only concern is cost.
One approach is to enhance your approach to minimize the cost, but
apparently that has a limit.
Another approach is to provide a new rescue facility.
What I have in mind is a new sysrq for killing fork-bombs.
When we trigger the new sysrq, the kernel freezes all tasks so the forkbomb
can't execute any more, and the kernel is ready to receive commands to show
the system state. The admin can investigate which tasks are the fork-bomb and
then kill them. Finally, the admin thaws all processes with the new sysrq
and the processes which received SIGKILL start to die.
This approach offloads the kernel's heuristic forkbomb detection to the admin
and avoids runtime cost in normal operation.
I don't have any code implementing the above concept, so it might be ridiculous.
What do you think about it?
>
> Thanks,
> -Kame
--
Kind regards,
Minchan Kim
On Fri, 25 Mar 2011 11:38:19 +0900
Minchan Kim <[email protected]> wrote:
> On Fri, Mar 25, 2011 at 9:04 AM, KAMEZAWA Hiroyuki
> <[email protected]> wrote:
> > On Thu, 24 Mar 2011 19:52:22 +0900
> > Minchan Kim <[email protected]> wrote:
> > To me, the fact "the system _can_ be broken by a normal user program" is the most
> > terrible thing. With Andrey's case or make -j, a user doesn't need to be an admin.
> > I believe it's worth to pay costs.
> > (and I made this function configurable and can be turned off by sysfs.)
> >
> > And while testing Andrey's case, I used KVM finaly becasue cost of rebooting was small.
> > My development server is on other building and I need to push server's button
> > to reboot it when forkbomb happens ;)
> > In some environement, cost of rebooting is not small even if it's a development system.
> >
>
> Forkbomb is very rare case in normal situation but if it happens, the
> cost like reboot would be big. So we need the such facility. I agree.
> (But I don't know why others don't have a interest if it is important
> task. Maybe they are so busy due to rc1)
> Just a concern is cost.
me, too.
> The approach is we can enhance your approach to minimize the cost but
> apparently it would have a limitation.
>
agreed. "tracking" always costs.
> Other approach is we can provide new rescue facility.
> What I have thought is new sysrq about killing fork-bomb.
>
Mine works fine with Sysrq+f. But I need to go to another building
to push Sysrq.....
> If we execute the new sysrq, the kernel freezes all tasks so forkbomb
> can't execute any more and kernel ready to receive the command to show
> the system state. Admin can investigate which is fork-bomb and then he
> kill the tasks. At last, admin restarts all processes with new sysrq
> and processes which received SIGKILL start to die.
>
> This approach offloads kernel's heuristic forkbomb detection to admin
> and avoid runtime cost in normal situation.
> I don't have any code to implement above the concept so it might be ridiculous.
>
> What do you think about it?
>
For the usual user, a forkbomb killer works better than a special console for
a fatally broken system.
I can think of 2 similar works. One is Windows's TaskManager. You can kill tasks
with it (and I guess TaskManager is always in memory...). Another one is
"guarantee" or "preserve XXXX for special apps", which clustering guys want for
quick server failover.
If trouble happens,
- freeze all apps other than HA apps.
- open the gate for hidden preserved resources (of memory / disks)
- do safe failover to the other server.
- do necessary jobs and reboot.
So, you need to preserve some resources for recovery...IOW, you have to pay costs.
BTW, Sysrq/TaskManager/Failover doesn't help me, since I use the development system via network.
Thanks,
-Kame
On Fri, Mar 25, 2011 at 11:54 AM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Fri, 25 Mar 2011 11:38:19 +0900
> Minchan Kim <[email protected]> wrote:
>
>> On Fri, Mar 25, 2011 at 9:04 AM, KAMEZAWA Hiroyuki
>> <[email protected]> wrote:
>> > On Thu, 24 Mar 2011 19:52:22 +0900
>> > Minchan Kim <[email protected]> wrote:
>> > To me, the fact "the system _can_ be broken by a normal user program" is the most
>> > terrible thing. With Andrey's case or make -j, a user doesn't need to be an admin.
>> > I believe it's worth to pay costs.
>> > (and I made this function configurable and can be turned off by sysfs.)
>> >
>> > And while testing Andrey's case, I used KVM finaly becasue cost of rebooting was small.
>> > My development server is on other building and I need to push server's button
>> > to reboot it when forkbomb happens ;)
>> > In some environement, cost of rebooting is not small even if it's a development system.
>> >
>>
>> Forkbomb is very rare case in normal situation but if it happens, the
>> cost like reboot would be big. So we need the such facility. I agree.
>> (But I don't know why others don't have a interest if it is important
>> task. Maybe they are so busy due to rc1)
>> Just a concern is cost.
>
> me, too.
>
>> The approach is we can enhance your approach to minimize the cost but
>> apparently it would have a limitation.
>>
> agreed. "tracking" always costs.
>
>> Other approach is we can provide new rescue facility.
>> What I have thought is new sysrq about killing fork-bomb.
>>
> Mine works fine with Sysrq+f. But, I need to go to other building
> for pushing Sysrq.....
>
>> If we execute the new sysrq, the kernel freezes all tasks so forkbomb
>> can't execute any more and kernel ready to receive the command to show
>> the system state. Admin can investigate which is fork-bomb and then he
>> kill the tasks. At last, admin restarts all processes with new sysrq
>> and processes which received SIGKILL start to die.
>>
>> This approach offloads kernel's heuristic forkbomb detection to admin
>> and avoid runtime cost in normal situation.
>> I don't have any code to implement above the concept so it might be ridiculous.
>>
>> What do you think about it?
>>
> For usual user, forkbmob killer works better, rather than special console for
> fatal system.
>
> I can think of 2 similar works. One is Windows's TaskManager. You can kill tasks
> with it (and I guess TaskManager is always on memory...) Another one is
> "guarantee" or "preserve XXXX for special apps." which clustering guys wants for
> quick server failover.
>
> If trouble happens,
> - freeze all apps other than HA apps.
> - open the gate for hidden preserved resources (of memory / disks)
> - do safe failover to other server.
> - do necessary jobs and reboot.
>
> So, you need to preserve some resources for recover...IOW, have to pay costs.
>
> BTW, Sysrq/TaskManager/Failover doesn't help me, using development system via network.
Okay. Each approach has pros and cons, and so far nobody else has
offered any method or comments, but I agree it is needed (e.g., a
careless or lazy admin could need it strongly). Let us wait a little
bit more. Maybe google guys or redhat/suse guys have an opinion.
Regardless, I will review the series when I have spare time.
Thanks, Kame.
--
Kind regards,
Minchan Kim
Is there anything here that couldn't be solved with a proper cgroups
configuration? In fact, wasn't this type of problem the original
cgroups motivation?
If you set up cpu+memory cgroups properly, I think it works well.
For well scheduled servers or some production devices, all applications
and the relationships between them can be designed properly and you can
find the best cgroup set.
For a desktop environment like mine, which has 1-4G of memory, I think a
user doesn't want to divide resources (limiting memory) for emergencies,
because I want to use the full resources of my poor host. Of course, I use
memcg when I handle a very big file or a lot of memory in an application
and can foresee bad effects from that.
And, from experience on the ML.... I've advised "please use memcg" when I see
emails/questions about OOM....but there are still periodic OOM reports to the ML ;)
Maybe usual users don't pay costs to avoid some emergency by themselves.
(Some good daemon software should do that.)
I feel the kernel itself should have a last resort to get out of
hard-to-recover states.
Thanks,
-Kame
On Fri, Mar 25, 2011 at 01:05:50PM +0900, Minchan Kim wrote:
> Okay. Each approach has a pros and cons and at least, now anyone
> doesn't provide any method and comments but I agree it is needed(ex,
> careless and lazy admin could need it strongly). Let us wait a little
> bit more. Maybe google guys or redhat/suse guys would have a opinion.
I haven't heard of fork bombs being an issue for us (and it's not been
for me on my desktop, either).
Also, I want to point out that there is a classical userspace solution
for this, as implemented by killall5 for example. One can do
kill(-1, SIGSTOP) to stop all processes that they can send
signals to (except for init and itself). Target processes
can never catch or ignore the SIGSTOP. This stops the fork bomb
from causing further damage. Then, one can look at the process
tree and do whatever is appropriate - including killing by uid,
by cgroup or whatever policies one wants to implement in userspace.
Finally, the remaining processes can be restarted using SIGCONT.
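A minimal sketch of the freeze/thaw step described above, not a full tool. The helper names (freeze_tasks, thaw_tasks, spawn_idle_child) are hypothetical and only for illustration. Passing -1 as the target gives the kill(-1, SIGSTOP) behaviour; the idle child exists only so the helpers can be tried without stopping a whole session.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Stop `target`. Pass -1 to stop every process the caller may signal;
 * on Linux, kill(-1, ...) skips init and the calling process itself.
 */
static int freeze_tasks(pid_t target)
{
	/* Never stop our own process (or our own group via pid 0). */
	assert(target != 0 && target != getpid());
	return kill(target, SIGSTOP);
}

/* Resume whatever freeze_tasks() stopped and the admin chose to keep. */
static int thaw_tasks(pid_t target)
{
	assert(target != 0);
	return kill(target, SIGCONT);
}

/* Spawn a child that only sleeps, as a safe target for experiments. */
static pid_t spawn_idle_child(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();
	}
	return pid;	/* child pid in the parent, -1 on failure */
}
```

Between the freeze and the thaw, the caller would walk the process tree and send SIGKILL to whatever its policy selects; a stopped task still dies on SIGKILL without needing SIGCONT first.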
--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
2011/3/26 Michel Lespinasse <[email protected]>:
> On Fri, Mar 25, 2011 at 01:05:50PM +0900, Minchan Kim wrote:
>> Okay. Each approach has a pros and cons and at least, now anyone
>> doesn't provide any method and comments but I agree it is needed(ex,
>> careless and lazy admin could need it strongly). Let us wait a little
>> bit more. Maybe google guys or redhat/suse guys would have a opinion.
>
> I haven't heard of fork bombs being an issue for us (and it's not been
> for me on my desktop, either).
>
> Also, I want to point out that there is a classical userspace solution
> for this, as implemented by killall5 for example. One can do
> kill(-1, SIGSTOP) to stop all processes that they can send
> signals to (except for init and itself). Target processes
> can never catch or ignore the SIGSTOP. This stops the fork bomb
> from causing further damage. Then, one can look at the process
> tree and do whatever is appropriate - including killing by uid,
> by cgroup or whatever policies one wants to implement in userspace.
> Finally, the remaining processes can be restarted using SIGCONT.
>
Can that solution work even under an OOM situation, without new logins/commands?
Please show us your solution, how to avoid Andrey's Bomb your way.
Then, we can add Documentation, at least. Or you can show us your tool.
Maybe it is....
- running as a daemon. (because it has to lock its work memory before OOM.)
- mlockall its own memory to work under OOM.
- It can show the process tree of users/admin or do everything automatically
with the user's policy.
- tells us which process is guilty.
- wakes up automatically when OOM happens.....IOW, OOM should have some notifier
to userland.
- never allocates any memory while running. (maybe it can't use libc.)
- is never blocked by any locks, for example, some other task's mmap_sem.
One of the typical mistakes of admins at OOM is typing 'ps' to see what
happens.....
- Can be used even with a GUI system, which can't show a console.
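As a rough illustration of the "lock its work memory before OOM" and "never allocate at run time" items above, here is a hedged sketch of the start-up step such a daemon might perform. The names rescue_buf, rescue_prepare and rescue_scratch are hypothetical, and the 64 KiB size is an arbitrary assumption. It does not address the notifier, lock-avoidance or GUI items.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>

#define RESCUE_BUF_SIZE (64 * 1024)	/* arbitrary size for this sketch */

/* Working memory reserved up front so nothing is allocated during OOM. */
static char rescue_buf[RESCUE_BUF_SIZE];

/*
 * Fault in the buffer, then pin all current and future pages.
 * Returns 0 on success, or -1 with errno set by mlockall() (commonly
 * EPERM or ENOMEM when RLIMIT_MEMLOCK is too small).
 */
static int rescue_prepare(void)
{
	memset(rescue_buf, 0xA5, sizeof(rescue_buf));
	return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Hand out the preallocated buffer instead of calling malloc(). */
static char *rescue_scratch(size_t len)
{
	assert(len <= sizeof(rescue_buf));
	return rescue_buf;
}
```

MCL_FUTURE also pins mappings created later, which matters if a library allocates behind the daemon's back; whether that is acceptable depends on the memlock limit of the account running it.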
Thanks,
-Kame
On Sat, Mar 26, 2011 at 05:48:45PM +0900, Hiroyuki Kamezawa wrote:
> 2011/3/26 Michel Lespinasse <[email protected]>:
> > On Fri, Mar 25, 2011 at 01:05:50PM +0900, Minchan Kim wrote:
> >> Okay. Each approach has a pros and cons and at least, now anyone
> >> doesn't provide any method and comments but I agree it is needed(ex,
> >> careless and lazy admin could need it strongly). Let us wait a little
> >> bit more. Maybe google guys or redhat/suse guys would have a opinion.
> >
> > I haven't heard of fork bombs being an issue for us (and it's not been
> > for me on my desktop, either).
> >
> > Also, I want to point out that there is a classical userspace solution
> > for this, as implemented by killall5 for example. One can do
> > kill(-1, SIGSTOP) to stop all processes that they can send
> > signals to (except for init and itself). Target processes
> > can never catch or ignore the SIGSTOP. This stops the fork bomb
> > from causing further damage. Then, one can look at the process
> > tree and do whatever is appropriate - including killing by uid,
> > by cgroup or whatever policies one wants to implement in userspace.
> > Finally, the remaining processes can be restarted using SIGCONT.
> >
>
> Can that solution work even under OOM situation without new login/commands ?
> Please show us your solution, how to avoid Andrey's Bomb with your way.
> Then, we can add Documentation, at least. Or you can show us your tool.
>
> Maybe it is....
> - running as a daemon. (because it has to lock its work memory before OOM.)
> - mlockall its own memory to work under OOM.
> - It can show process tree of users/admin or do all in automatic way
> with user's policy.
> - tell us which process is guilty.
> - wakes up automatically when OOM happens.....IOW, OOM should have some notifier
> to userland.
> - never allocate any memory at running. (maybe it can't use libc.)
> - never be blocked by any locks, for example, some other task's mmap_sem.
> One of typical mistakes of admins at OOM is typing 'ps' to see what
> happens.....
> - Can be used even with GUI system, which can't show console.
Hi Kame,
I am worried about run-time cost.
Should we handle users' mistakes for the robustness of the OS?
Mostly yes, but we can't handle all user mistakes, so we need an admin.
For example, what happens if the admin executes "rm -rf /"?
To avoid that, we have the solution of "backup" for critical data.
In the same manner, if a forkbomb is very critical for the system,
the admin should guard against it using memcg, virtualization, ulimit and so on.
If he doesn't want to, he becomes the hard worker who has to
cross over to the other building to reboot it. Even though he is a diligent man,
a reboot isn't good. So I suggest the following patch, which is just an RFC.
To make it a formal patch, I have to add more comments and modify sysrq.txt.
From 51bec44086a6b6c0e56ea978a2eb47e995236b47 Mon Sep 17 00:00:00 2001
From: Minchan Kim <[email protected]>
Date: Tue, 29 Mar 2011 00:52:20 +0900
Subject: [PATCH] [RFC] Prevent livelock by forkbomb
Recently, we discussed how to prevent forkbombs.
The issue is a trade-off between cost and effect.
A forkbomb is a _rare_ case which happens by someone's mistake,
so if we have to pay a cost in the fast path (e.g., fork, exec, exit),
that's not good.
Now, sysrq + I kills all processes. When I tested it, I still
needed to reboot to get my system working really well (e.g., X start)
although the console works. I don't know why we need such a sysrq (kill
all processes, and then what can we do?).
So I decided to change sysrq + I to meet our goal of preventing
forkbombs. The rationale is as follows.
A forkbomb means something repeatedly creates tasks in a short time, so
the system has no free pages and ends up in an almost livelocked state.
This patch uses that characteristic of a forkbomb.
When you push sysrq + I, it kills recently created tasks
(in this version, within 1 minute). Probably all processes, including
the forkbomb tasks, are killed. If the system doesn't return to a normal
state after you push sysrq + I, you can try once more. It can kill tasks
further back (e.g., 2 minutes).
You can continue to do it until your system returns to a normal state.
Signed-off-by: Minchan Kim <[email protected]>
---
drivers/tty/sysrq.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
include/linux/sched.h | 6 ++++++
2 files changed, 48 insertions(+), 3 deletions(-)
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 81f1395..6fb7e18 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -329,6 +329,45 @@ static void send_sig_all(int sig)
}
}
+static void send_sig_recent(int sig)
+{
+ struct task_struct *p;
+ unsigned long task_jiffies, last_jiffies = 0;
+ bool kill = false;
+
+retry:
+ for_each_process_reverse(p) {
+ if (p->mm && !is_global_init(p) && !fatal_signal_pending(p)) {
+ /* recent created task */
+ last_jiffies = timeval_to_jiffies(p->real_start_time);
+ force_sig(sig, p);
+ break;
+ }
+ }
+
+ for_each_process_reverse(p) {
+ if (p->mm && !is_global_init(p)) {
+ task_jiffies = timeval_to_jiffies(p->real_start_time);
+ /*
+ * Kill all processes which were created recently
+ * (e.g., within the last minute)
+ */
+ if (task_jiffies > (last_jiffies - 60 * HZ)) {
+ force_sig(sig, p);
+ kill = true;
+ }
+ else
+ break;
+ }
+ }
+
+ /*
+ * If we can't kill anything, restart with next group.
+ */
+ if (!kill)
+ goto retry;
+}
+
static void sysrq_handle_term(int key)
{
send_sig_all(SIGTERM);
@@ -374,13 +413,13 @@ static struct sysrq_key_op sysrq_thaw_op = {
static void sysrq_handle_kill(int key)
{
- send_sig_all(SIGKILL);
+ send_sig_recent(SIGKILL);
console_loglevel = 8;
}
static struct sysrq_key_op sysrq_kill_op = {
.handler = sysrq_handle_kill,
- .help_msg = "kill-all-tasks(I)",
- .action_msg = "Kill All Tasks",
+ .help_msg = "kill-recent-tasks(I)",
+ .action_msg = "Kill Recent Tasks",
.enable_mask = SYSRQ_ENABLE_SIGNAL,
};
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 777d8a5..ddd0a40 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2194,12 +2194,18 @@ static inline unsigned long wait_task_inactive(struct task_struct *p,
}
#endif
+#define prev_task(p) \
+ list_entry_rcu((p)->tasks.prev, struct task_struct, tasks)
+
#define next_task(p) \
list_entry_rcu((p)->tasks.next, struct task_struct, tasks)
#define for_each_process(p) \
for (p = &init_task ; (p = next_task(p)) != &init_task ; )
+#define for_each_process_reverse(p) \
+ for (p = &init_task ; (p = prev_task(p)) != &init_task ; )
+
extern bool current_is_single_threaded(void);
/*
--
1.7.1
>
> Thanks,
> -Kame
--
Kind regards,
Minchan Kim
On Sat, Mar 26, 2011 at 1:48 AM, Hiroyuki Kamezawa
<[email protected]> wrote:
> 2011/3/26 Michel Lespinasse <[email protected]>:
>> I haven't heard of fork bombs being an issue for us (and it's not been
>> for me on my desktop, either).
>>
>> Also, I want to point out that there is a classical userspace solution
>> for this, as implemented by killall5 for example. One can do
>> kill(-1, SIGSTOP) to stop all processes that they can send
>> signals to (except for init and itself). Target processes
>> can never catch or ignore the SIGSTOP. This stops the fork bomb
>> from causing further damage. Then, one can look at the process
>> tree and do whatever is appropriate - including killing by uid,
>> by cgroup or whatever policies one wants to implement in userspace.
>> Finally, the remaining processes can be restarted using SIGCONT.
>>
>
> Can that solution work even under OOM situation without new login/commands ?
> Please show us your solution, how to avoid Andrey's Bomb with your way.
> Then, we can add Documentation, at least. Or you can show us your tool.
To be clear, I don't have a full solution. I just think that the
problem is approachable from userspace by freezing processes and then
sorting them out. The killall5 utility is an example of that, though
you would possibly want to add more smarts to it. If we want to
include a kernel solution, I do like the simplicity of Minchan's
proposal, too. But, I don't have a strong opinion on this matter, so
feel free to ignore me if this is not useful feedback.
--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
On Tue, 29 Mar 2011 01:21:37 +0900
Minchan Kim <[email protected]> wrote:
> On Sat, Mar 26, 2011 at 05:48:45PM +0900, Hiroyuki Kamezawa wrote:
> > 2011/3/26 Michel Lespinasse <[email protected]>:
> > > On Fri, Mar 25, 2011 at 01:05:50PM +0900, Minchan Kim wrote:
> > >> Okay. Each approach has a pros and cons and at least, now anyone
> > >> doesn't provide any method and comments but I agree it is needed(ex,
> > >> careless and lazy admin could need it strongly). Let us wait a little
> > >> bit more. Maybe google guys or redhat/suse guys would have a opinion.
> > >
> > > I haven't heard of fork bombs being an issue for us (and it's not been
> > > for me on my desktop, either).
> > >
> > > Also, I want to point out that there is a classical userspace solution
> > > for this, as implemented by killall5 for example. One can do
> > > kill(-1, SIGSTOP) to stop all processes that they can send
> > > signals to (except for init and itself). Target processes
> > > can never catch or ignore the SIGSTOP. This stops the fork bomb
> > > from causing further damage. Then, one can look at the process
> > > tree and do whatever is appropriate - including killing by uid,
> > > by cgroup or whatever policies one wants to implement in userspace.
> > > Finally, the remaining processes can be restarted using SIGCONT.
> > >
> >
> > Can that solution work even under OOM situation without new login/commands ?
> > Please show us your solution, how to avoid Andrey's Bomb with your way.
> > Then, we can add Documentation, at least. Or you can show us your tool.
> >
> > Maybe it is....
> > - running as a daemon. (because it has to lock its work memory before OOM.)
> > - mlockall its own memory to work under OOM.
> > - It can show process tree of users/admin or do all in automatic way
> > with user's policy.
> > - tell us which process is guilty.
> > - wakes up automatically when OOM happens.....IOW, OOM should have some notifier
> > to userland.
> > - never allocate any memory at running. (maybe it can't use libc.)
> > - never be blocked by any locks, for example, some other task's mmap_sem.
> > One of typical mistakes of admins at OOM is typing 'ps' to see what
> > happens.....
> > - Can be used even with GUI system, which can't show console.
>
> Hi Kame,
>
> I am worried about run-time cost.
> Should we care of mistake of users for robustness of OS?
> Mostly right but we can't handle all mistakes of user so we need admin.
> For exampe, what happens if admin execute "rm -rf /"?
> For avoiding it, we get a solution "backup" about critical data.
>
Then, my patch is configurable and has control knobs....never invasive for
people who don't want it. And simple and very low cost. It will have
no visible performance/resource usage impact for usual guys.
> In the same manner, if the system is very critical of forkbomb,
> admin should consider it using memcg, virtualization, ulimit and so on.
> If he don't want it, he should become a hard worker who have to
> cross over other building to reboot it. Although he is a diligent man,
> Reboot isn't good. So I suggest following patch which is just RFC.
> For making formal patch, I have to add more comment and modify sysrq.txt.
>
For me, sysrq is of no use, as I explained.
> From 51bec44086a6b6c0e56ea978a2eb47e995236b47 Mon Sep 17 00:00:00 2001
> From: Minchan Kim <[email protected]>
> Date: Tue, 29 Mar 2011 00:52:20 +0900
> Subject: [PATCH] [RFC] Prevent livelock by forkbomb
>
> Recently, We discussed how to prevent forkbomb.
> The thing is a trade-off between cost VS effect.
>
> Forkbomb is a _race_ case which happes by someone's mistake
> so if we have to pay cost in fast path(ex, fork, exec, exit),
> It's a not good.
>
> Now, sysrq + I kills all processes. When I tested it, I still
> need rebooting to work my system really well(ex, x start)
> although console works. I don't know why we need such sysrq(kill
> all processes and then what we can do?)
>
> So I decide to change sysrq + I to meet our goal which prevent
> forkbomb. The rationale is following as.
>
> Forkbomb means somethings makes repeately tasks in a short time so
> system don't have a free page then it become almost livelock state.
> This patch uses the characteristc of forkbomb.
>
> When you push sysrq + I, it kills recent created tasks.
> (In this version, 1 minutes). Maybe all processes included
> forkbomb tasks are killed. If you can't get normal state of system
> after you push sysrq + I, you can try one more. It can kill futher
> recent tasks(ex, 2 minutes).
>
> You can continue to do it until your system becomes normal state.
>
> Signed-off-by: Minchan Kim <[email protected]>
> ---
> drivers/tty/sysrq.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
> include/linux/sched.h | 6 ++++++
> 2 files changed, 48 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> index 81f1395..6fb7e18 100644
> --- a/drivers/tty/sysrq.c
> +++ b/drivers/tty/sysrq.c
> @@ -329,6 +329,45 @@ static void send_sig_all(int sig)
> }
> }
>
> +static void send_sig_recent(int sig)
> +{
> + struct task_struct *p;
> + unsigned long task_jiffies, last_jiffies = 0;
> + bool kill = false;
> +
> +retry:
You need the tasklist lock to scan in reverse.
> + for_each_process_reverse(p) {
> + if (p->mm && !is_global_init(p) && !fatal_signal_pending(p)) {
> + /* recent created task */
> + last_jiffies = timeval_to_jiffies(p->real_start_time);
> + force_sig(sig, p);
> + break;
Why break? You need to kill all younger tasks. And what is the relationship with the loop below?
> + }
> + }
> +
> + for_each_process_reverse(p) {
> + if (p->mm && !is_global_init(p)) {
> + task_jiffies = timeval_to_jiffies(p->real_start_time);
> + /*
> + * Kill all processes which are created recenlty
> + * (ex, 1 minutes)
> + */
> + if (task_jiffies > (last_jiffies - 60 * HZ)) {
> + force_sig(sig, p);
> + kill = true;
> + }
> + else
> + break;
> + }
> + }
> +
> + /*
> + * If we can't kill anything, restart with next group.
> + */
> + if (!kill)
> + goto retry;
> +}
This is not useful under an OOM situation. We cannot use 'jiffies' to find younger
tasks, because "memory reclaim -> livelock" can very easily last for several minutes.
So, I used other metrics. I think you are making the same mistake I made before;
this doesn't work.
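To make the objection concrete, here is a small userspace model (not kernel code) of selecting tasks by a start-time window. The function count_recent and the sample numbers are hypothetical. Start times are in seconds, oldest first, standing in for the order a reverse tasklist walk would rely on.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many of the newest tasks started within `window` seconds of
 * the most recently started task, i.e. how many a time-window kill hits.
 * start[] must be sorted oldest first.
 */
static size_t count_recent(const unsigned long *start, size_t n,
			   unsigned long window)
{
	unsigned long newest, cutoff;
	size_t hit = 0;

	assert(start != NULL || n == 0);
	if (n == 0)
		return 0;

	newest = start[n - 1];
	/* Avoid unsigned wrap-around: a window this wide covers everything. */
	if (newest <= window)
		return n;
	cutoff = newest - window;

	for (size_t i = n; i-- > 0; ) {
		if (start[i] <= cutoff)
			break;	/* older than the window, stop walking */
		hit++;
	}
	return hit;
}
```

With one old daemon started at 10 and a bomb that forked at 1000, 1060, 1120, 1180, 1240 and 1300 while the box was livelocked, a 60-second window hits only the newest task and a 300-second window hits five of the six bomb tasks. A window wide enough to cover the whole bomb would also reach tasks that started shortly before it, which is the limitation described above.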
Thanks,
-Kame
On Tue, Mar 29, 2011 at 8:50 AM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Tue, 29 Mar 2011 01:21:37 +0900
> Minchan Kim <[email protected]> wrote:
>
>> On Sat, Mar 26, 2011 at 05:48:45PM +0900, Hiroyuki Kamezawa wrote:
>> > 2011/3/26 Michel Lespinasse <[email protected]>:
>> > > On Fri, Mar 25, 2011 at 01:05:50PM +0900, Minchan Kim wrote:
>> > >> Okay. Each approach has a pros and cons and at least, now anyone
>> > >> doesn't provide any method and comments but I agree it is needed(ex,
>> > >> careless and lazy admin could need it strongly). Let us wait a little
>> > >> bit more. Maybe google guys or redhat/suse guys would have a opinion.
>> > >
>> > > I haven't heard of fork bombs being an issue for us (and it's not been
>> > > for me on my desktop, either).
>> > >
>> > > Also, I want to point out that there is a classical userspace solution
>> > > for this, as implemented by killall5 for example. One can do
>> > > kill(-1, SIGSTOP) to stop all processes that they can send
>> > > signals to (except for init and itself). Target processes
>> > > can never catch or ignore the SIGSTOP. This stops the fork bomb
>> > > from causing further damage. Then, one can look at the process
>> > > tree and do whatever is appropriate - including killing by uid,
>> > > by cgroup or whatever policies one wants to implement in userspace.
>> > > Finally, the remaining processes can be restarted using SIGCONT.
>> > >
>> >
>> > Can that solution work even under OOM situation without new login/commands ?
>> > Please show us your solution, how to avoid Andrey's Bomb with your way.
>> > Then, we can add Documentation, at least. Or you can show us your tool.
>> >
>> > Maybe it is....
>> > - running as a daemon. (because it has to lock its work memory before OOM.)
>> > - mlockall its own memory to work under OOM.
>> > - It can show process tree of users/admin or do all in automatic way
>> > with user's policy.
>> > - tell us which process is guilty.
>> > - wakes up automatically when OOM happens.....IOW, OOM should have some notifier
>> > to userland.
>> > - never allocate any memory at running. (maybe it can't use libc.)
>> > - never be blocked by any locks, for example, some other task's mmap_sem.
>> > One of typical mistakes of admins at OOM is typing 'ps' to see what
>> > happens.....
>> > - Can be used even with GUI system, which can't show console.
>>
>> Hi Kame,
>>
>> I am worried about run-time cost.
>> Should we care of mistake of users for robustness of OS?
>> Mostly right but we can't handle all mistakes of user so we need admin.
>> For exampe, what happens if admin execute "rm -rf /"?
>> For avoiding it, we get a solution "backup" about critical data.
>>
>
> Then, my patch is configurable and has control knobs....never invasive for
> people who don't want it. And simple and very low cost. It will have
> no visible performance/resource usage impact for usual guys.
>
>
>
>> In the same manner, if the system is very critical of forkbomb,
>> admin should consider it using memcg, virtualization, ulimit and so on.
>> If he don't want it, he should become a hard worker who have to
>> cross over other building to reboot it. Although he is a diligent man,
>> Reboot isn't good. So I suggest following patch which is just RFC.
>> For making formal patch, I have to add more comment and modify sysrq.txt.
>>
>
> For me, sysrq is of-no-use as I explained.
Go to the other building and log in again?
I think that if a server is important enough for this problem to matter,
it should have a solution in place. The solution can be (1) careful
administration, (2) a serial console for sysrq, or (3) your forkbomb
killer. We have been using sysrq as a local solution of last resort. In
that context, the sysrq solution isn't bad, I think.
If you can't provide 1 and 2, your forkbomb killer would be the last resort.
But some people can solve the problem with just careful administration or
sysrq. In that case, they can disable the forkbomb killer so that it
doesn't affect system performance at all.
So maybe it could be a separate topic.
>
>> From 51bec44086a6b6c0e56ea978a2eb47e995236b47 Mon Sep 17 00:00:00 2001
>> From: Minchan Kim <[email protected]>
>> Date: Tue, 29 Mar 2011 00:52:20 +0900
>> Subject: [PATCH] [RFC] Prevent livelock by forkbomb
>>
>> Recently, We discussed how to prevent forkbomb.
>> The thing is a trade-off between cost VS effect.
>>
>> Forkbomb is a _race_ case which happes by someone's mistake
>> so if we have to pay cost in fast path(ex, fork, exec, exit),
>> It's a not good.
>>
>> Now, sysrq + I kills all processes. When I tested it, I still
>> need rebooting to work my system really well(ex, x start)
>> although console works. I don't know why we need such sysrq(kill
>> all processes and then what we can do?)
>>
>> So I decide to change sysrq + I to meet our goal which prevent
>> forkbomb. The rationale is following as.
>>
>> Forkbomb means somethings makes repeately tasks in a short time so
>> system don't have a free page then it become almost livelock state.
>> This patch uses the characteristc of forkbomb.
>>
>> When you push sysrq + I, it kills recent created tasks.
>> (In this version, 1 minutes). Maybe all processes included
>> forkbomb tasks are killed. If you can't get normal state of system
>> after you push sysrq + I, you can try one more. It can kill futher
>> recent tasks(ex, 2 minutes).
>>
>> You can continue to do it until your system becomes normal state.
>>
>> Signed-off-by: Minchan Kim <[email protected]>
>> ---
>> drivers/tty/sysrq.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
>> include/linux/sched.h | 6 ++++++
>> 2 files changed, 48 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
>> index 81f1395..6fb7e18 100644
>> --- a/drivers/tty/sysrq.c
>> +++ b/drivers/tty/sysrq.c
>> @@ -329,6 +329,45 @@ static void send_sig_all(int sig)
>> }
>> }
>>
>> +static void send_sig_recent(int sig)
>> +{
>> + struct task_struct *p;
>> + unsigned long task_jiffies, last_jiffies = 0;
>> + bool kill = false;
>> +
>> +retry:
>
> you need tasklist lock for scanning reverse.
Okay. I will look at it.
>
>> + for_each_process_reverse(p) {
>> + if (p->mm && !is_global_init(p) && !fatal_signal_pending(p)) {
>> + /* recent created task */
>> + last_jiffies = timeval_to_jiffies(p->real_start_time);
>> + force_sig(sig, p);
>> + break;
>
> why break ? you need to kill all youngers. And what is the relationship with below ?
It's for selecting the _youngest_ task that is not a kthread, not init,
and not already handled by the loop below. The loop below then sends the
KILL signal to processes that were created within 1 minute of the
_youngest_ process's creation time.
>
>
>> + }
>> + }
>> +
>> + for_each_process_reverse(p) {
>> + if (p->mm && !is_global_init(p)) {
>> + task_jiffies = timeval_to_jiffies(p->real_start_time);
>> + /*
>> + * Kill all processes which are created recenlty
>> + * (ex, 1 minutes)
>> + */
>> + if (task_jiffies > (last_jiffies - 60 * HZ)) {
>> + force_sig(sig, p);
>> + kill = true;
>> + }
>> + else
>> + break;
>> + }
>> + }
>> +
>> + /*
>> + * If we can't kill anything, restart with next group.
>> + */
>> + if (!kill)
>> + goto retry;
>> +}
>
> This is not useful under OOM situation, we cannot use 'jiffies' to find younger tasks
> because "memory reclaim-> livelock" can take some amount of minutes very easily.
> So, I used other metrics. I think you do the same mistake I made before,
> this doesn't work.
As far as I understand, p->real_start_time is the creation time, not jiffies.
What I want is to kill all processes created recently, not all
processes like the old sysrq + I does.
Am I missing something?
>
> Thanks,
> -Kame
>
>
>
--
Kind regards,
Minchan Kim
On Mon, 28 Mar 2011 16:46:42 -0700
Michel Lespinasse <[email protected]> wrote:
> On Sat, Mar 26, 2011 at 1:48 AM, Hiroyuki Kamezawa
> <[email protected]> wrote:
> > 2011/3/26 Michel Lespinasse <[email protected]>:
> >> I haven't heard of fork bombs being an issue for us (and it's not been
> >> for me on my desktop, either).
> >>
> >> Also, I want to point out that there is a classical userspace solution
> >> for this, as implemented by killall5 for example. One can do
> >> kill(-1, SIGSTOP) to stop all processes that they can send
> >> signals to (except for init and itself). Target processes
> >> can never catch or ignore the SIGSTOP. This stops the fork bomb
> >> from causing further damage. Then, one can look at the process
> >> tree and do whatever is appropriate - including killing by uid,
> >> by cgroup or whatever policies one wants to implement in userspace.
> >> Finally, the remaining processes can be restarted using SIGCONT.
> >>
> >
> > Can that solution work even under OOM situation without new login/commands ?
> > Please show us your solution, how to avoid Andrey's Bomb with your way.
> > Then, we can add Documentation, at least. Or you can show us your tool.
>
> To be clear, I don't have a full solution. I just think that the
> problem is approachable from userspace by freezing processes and then
> sorting them out. The killall5 utility is an example of that, though
> you would possibly want to add more smarts to it. If we want to
> include a kernel solution, I do like the simplicity of Minchan's
> proposal, too. But, I don't have a strong opinion on this matter, so
> feel free to ignore me if this is not useful feedback.
>
I don't have a strong opinion, either. I just think a kernel that a user
application can break so easily is not ideal. Going to another building
to press the reset button is good for my health, at least. I implemented a
solution and it seems to work well, so I just wanted to ask how my patch
looks. But no one has looked at the patches, and it seems this feature is
not welcome. I'll continue to walk, or just use virtual machines for
testing OOM.
Thanks,
-Kame
On Tue, 29 Mar 2011 09:24:30 +0900
Minchan Kim <[email protected]> wrote:
> On Tue, Mar 29, 2011 at 8:50 AM, KAMEZAWA Hiroyuki
> <[email protected]> wrote:
> > On Tue, 29 Mar 2011 01:21:37 +0900
> > Minchan Kim <[email protected]> wrote:
> >
> >> On Sat, Mar 26, 2011 at 05:48:45PM +0900, Hiroyuki Kamezawa wrote:
> >> > 2011/3/26 Michel Lespinasse <[email protected]>:
> >> > > On Fri, Mar 25, 2011 at 01:05:50PM +0900, Minchan Kim wrote:
> >> > >> Okay. Each approach has a pros and cons and at least, now anyone
> >> > >> doesn't provide any method and comments but I agree it is needed(ex,
> >> > >> careless and lazy admin could need it strongly). Let us wait a little
> >> > >> bit more. Maybe google guys or redhat/suse guys would have a opinion.
> >> > >
> >> > > I haven't heard of fork bombs being an issue for us (and it's not been
> >> > > for me on my desktop, either).
> >> > >
> >> > > Also, I want to point out that there is a classical userspace solution
> >> > > for this, as implemented by killall5 for example. One can do
> >> > > kill(-1, SIGSTOP) to stop all processes that they can send
> >> > > signals to (except for init and itself). Target processes
> >> > > can never catch or ignore the SIGSTOP. This stops the fork bomb
> >> > > from causing further damage. Then, one can look at the process
> >> > > tree and do whatever is appropriate - including killing by uid,
> >> > > by cgroup or whatever policies one wants to implement in userspace.
> >> > > Finally, the remaining processes can be restarted using SIGCONT.
> >> > >
> >> >
> >> > Can that solution work even under OOM situation without new login/commands ?
> >> > Please show us your solution, how to avoid Andrey's Bomb with your way.
> >> > Then, we can add Documentation, at least. Or you can show us your tool.
> >> >
> >> > Maybe it is....
> >> > - running as a daemon. (because it has to lock its work memory before OOM.)
> >> > - mlockall its own memory to work under OOM.
> >> > - It can show process tree of users/admin or do all in automatic way
> >> > with user's policy.
> >> > - tell us which process is guilty.
> >> > - wakes up automatically when OOM happens.....IOW, OOM should have some notifier
> >> > to userland.
> >> > - never allocate any memory at running. (maybe it can't use libc.)
> >> > - never be blocked by any locks, for example, some other task's mmap_sem.
> >> > One of typical mistakes of admins at OOM is typing 'ps' to see what
> >> > happens.....
> >> > - Can be used even with GUI system, which can't show console.
> >>
> >> Hi Kame,
> >>
> >> I am worried about run-time cost.
> >> Should we care of mistake of users for robustness of OS?
> >> Mostly right but we can't handle all mistakes of user so we need admin.
> >> For exampe, what happens if admin execute "rm -rf /"?
> >> For avoiding it, we get a solution "backup" about critical data.
> >>
> >
> > Then, my patch is configurable and has control knobs....never invasive for
> > people who don't want it. And simple and very low cost. It will have
> > no visible performance/resource usage impact for usual guys.
> >
> >
> >
> >> In the same manner, if the system is very critical of forkbomb,
> >> admin should consider it using memcg, virtualization, ulimit and so on.
> >> If he don't want it, he should become a hard worker who have to
> >> cross over other building to reboot it. Although he is a diligent man,
> >> Reboot isn't good. So I suggest following patch which is just RFC.
> >> For making formal patch, I have to add more comment and modify sysrq.txt.
> >>
> >
> > For me, sysrq is of-no-use as I explained.
>
> Go to other building and new login?
>
I cannot log in when the system is near OOM.
> I think if server is important on such problem, it should have a solution.
> The solution can be careful admin step or console with serial for
> sysrq step or your forkbomb killer. We have been used sysrq with local
> solution of last resort. In such context, sysrq solution ins't bad, I
> think.
>
Mine works with Sysrq-f, and this one works more poorly than mine.
> If you can't provide 1 and 2, your forkbomb killer would be a last resort.
> But someone can solve the problem in just careful admin or sysrq.
> In that case, the user can disable forkbomb killer then it doesn't
> affect system performance at all.
> So maybe It could be separate topic.
>
> >
> >> From 51bec44086a6b6c0e56ea978a2eb47e995236b47 Mon Sep 17 00:00:00 2001
> >> From: Minchan Kim <[email protected]>
> >> Date: Tue, 29 Mar 2011 00:52:20 +0900
> >> Subject: [PATCH] [RFC] Prevent livelock by forkbomb
> >>
> >> Recently, We discussed how to prevent forkbomb.
> >> The thing is a trade-off between cost VS effect.
> >>
> >> Forkbomb is a _race_ case which happes by someone's mistake
> >> so if we have to pay cost in fast path(ex, fork, exec, exit),
> >> It's a not good.
> >>
> >> Now, sysrq + I kills all processes. When I tested it, I still
> >> need rebooting to work my system really well(ex, x start)
> >> although console works. I don't know why we need such sysrq(kill
> >> all processes and then what we can do?)
> >>
> >> So I decide to change sysrq + I to meet our goal which prevent
> >> forkbomb. The rationale is following as.
> >>
> >> Forkbomb means somethings makes repeately tasks in a short time so
> >> system don't have a free page then it become almost livelock state.
> >> This patch uses the characteristc of forkbomb.
> >>
> >> When you push sysrq + I, it kills recent created tasks.
> >> (In this version, 1 minutes). Maybe all processes included
> >> forkbomb tasks are killed. If you can't get normal state of system
> >> after you push sysrq + I, you can try one more. It can kill futher
> >> recent tasks(ex, 2 minutes).
> >>
> >> You can continue to do it until your system becomes normal state.
> >>
> >> Signed-off-by: Minchan Kim <[email protected]>
> >> ---
> >> drivers/tty/sysrq.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
> >> include/linux/sched.h | 6 ++++++
> >> 2 files changed, 48 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> >> index 81f1395..6fb7e18 100644
> >> --- a/drivers/tty/sysrq.c
> >> +++ b/drivers/tty/sysrq.c
> >> @@ -329,6 +329,45 @@ static void send_sig_all(int sig)
> >> }
> >> }
> >>
> >> +static void send_sig_recent(int sig)
> >> +{
> >> + struct task_struct *p;
> >> + unsigned long task_jiffies, last_jiffies = 0;
> >> + bool kill = false;
> >> +
> >> +retry:
> >
> > you need tasklist lock for scanning reverse.
>
> Okay. I will look at it.
>
> >
> >> + for_each_process_reverse(p) {
> >> + if (p->mm && !is_global_init(p) && !fatal_signal_pending(p)) {
> >> + /* recent created task */
> >> + last_jiffies = timeval_to_jiffies(p->real_start_time);
> >> + force_sig(sig, p);
> >> + break;
> >
> > why break ? you need to kill all youngers. And what is the relationship with below ?
>
> It's for selecting recent _youngest_ task which are not kthread, not
> init, not handled by below loop. In below loop, it start to send KILL
> signal processes which are created within 1 minutes from _youngest_
> process creation time.
>
> >
> >
> >> + }
> >> + }
> >> +
> >> + for_each_process_reverse(p) {
> >> + if (p->mm && !is_global_init(p)) {
> >> + task_jiffies = timeval_to_jiffies(p->real_start_time);
> >> + /*
> >> + * Kill all processes which are created recenlty
> >> + * (ex, 1 minutes)
> >> + */
> >> + if (task_jiffies > (last_jiffies - 60 * HZ)) {
> >> + force_sig(sig, p);
> >> + kill = true;
> >> + }
> >> + else
> >> + break;
> >> + }
> >> + }
> >> +
> >> + /*
> >> + * If we can't kill anything, restart with next group.
> >> + */
> >> + if (!kill)
> >> + goto retry;
> >> +}
> >
> > This is not useful under OOM situation, we cannot use 'jiffies' to find younger tasks
> > because "memory reclaim-> livelock" can take some amount of minutes very easily.
> > So, I used other metrics. I think you do the same mistake I made before,
> > this doesn't work.
>
> As far as I understand right, p->real_start_time is create time, not jiffies.
> What I want is that kill all processes created recently, not all
> process like old sysrq + I.
>
> Am I miss something?
>
When you run 'make -j' or 'Andrey's case' with swap, you'll see that
1 minute is too short and no task will be killed.
Determining this 60*HZ value is difficult. I think no one can determine it:
1 minute is too short, 10 minutes is too long. So I used a different
approach, which seems to work well.
Thanks,
-Kame
On Tue, Mar 29, 2011 at 9:32 AM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Tue, 29 Mar 2011 09:24:30 +0900
> Minchan Kim <[email protected]> wrote:
>
>> On Tue, Mar 29, 2011 at 8:50 AM, KAMEZAWA Hiroyuki
>> <[email protected]> wrote:
>> > On Tue, 29 Mar 2011 01:21:37 +0900
>> > Minchan Kim <[email protected]> wrote:
>> >
>> >> On Sat, Mar 26, 2011 at 05:48:45PM +0900, Hiroyuki Kamezawa wrote:
>> >> > 2011/3/26 Michel Lespinasse <[email protected]>:
>> >> > > On Fri, Mar 25, 2011 at 01:05:50PM +0900, Minchan Kim wrote:
>> >> > >> Okay. Each approach has a pros and cons and at least, now anyone
>> >> > >> doesn't provide any method and comments but I agree it is needed(ex,
>> >> > >> careless and lazy admin could need it strongly). Let us wait a little
>> >> > >> bit more. Maybe google guys or redhat/suse guys would have a opinion.
>> >> > >
>> >> > > I haven't heard of fork bombs being an issue for us (and it's not been
>> >> > > for me on my desktop, either).
>> >> > >
>> >> > > Also, I want to point out that there is a classical userspace solution
>> >> > > for this, as implemented by killall5 for example. One can do
>> >> > > kill(-1, SIGSTOP) to stop all processes that they can send
>> >> > > signals to (except for init and itself). Target processes
>> >> > > can never catch or ignore the SIGSTOP. This stops the fork bomb
>> >> > > from causing further damage. Then, one can look at the process
>> >> > > tree and do whatever is appropriate - including killing by uid,
>> >> > > by cgroup or whatever policies one wants to implement in userspace.
>> >> > > Finally, the remaining processes can be restarted using SIGCONT.
>> >> > >
>> >> >
>> >> > Can that solution work even under OOM situation without new login/commands ?
>> >> > Please show us your solution, how to avoid Andrey's Bomb with your way.
>> >> > Then, we can add Documentation, at least. Or you can show us your tool.
>> >> >
>> >> > Maybe it is....
>> >> > - running as a daemon. (because it has to lock its work memory before OOM.)
>> >> > - mlockall its own memory to work under OOM.
>> >> > - It can show process tree of users/admin or do all in automatic way
>> >> > with user's policy.
>> >> > - tell us which process is guilty.
>> >> > - wakes up automatically when OOM happens.....IOW, OOM should have some notifier
>> >> > to userland.
>> >> > - never allocate any memory at running. (maybe it can't use libc.)
>> >> > - never be blocked by any locks, for example, some other task's mmap_sem.
>> >> > One of typical mistakes of admins at OOM is typing 'ps' to see what
>> >> > happens.....
>> >> > - Can be used even with GUI system, which can't show console.
>> >>
>> >> Hi Kame,
>> >>
>> >> I am worried about run-time cost.
>> >> Should we care of mistake of users for robustness of OS?
>> >> Mostly right but we can't handle all mistakes of user so we need admin.
>> >> For exampe, what happens if admin execute "rm -rf /"?
>> >> For avoiding it, we get a solution "backup" about critical data.
>> >>
>> >
>> > Then, my patch is configurable and has control knobs....never invasive for
>> > people who don't want it. And simple and very low cost. It will have
>> > no visible performance/resource usage impact for usual guys.
>> >
>> >
>> >
>> >> In the same manner, if the system is very critical of forkbomb,
>> >> admin should consider it using memcg, virtualization, ulimit and so on.
>> >> If he don't want it, he should become a hard worker who have to
>> >> cross over other building to reboot it. Although he is a diligent man,
>> >> Reboot isn't good. So I suggest following patch which is just RFC.
>> >> For making formal patch, I have to add more comment and modify sysrq.txt.
>> >>
>> >
>> > For me, sysrq is of-no-use as I explained.
>>
>> Go to other building and new login?
>>
> I cannot login when the system is near happens.
I understand, which is why I said your solution would be a last resort.
>
>> I think if server is important on such problem, it should have a solution.
>> The solution can be careful admin step or console with serial for
>> sysrq step or your forkbomb killer. We have been used sysrq with local
>> solution of last resort. In such context, sysrq solution ins't bad, I
>> think.
>>
>
> Mine works with Sysrq-f and this works poorly than mine.
>
>> If you can't provide 1 and 2, your forkbomb killer would be a last resort.
>> But someone can solve the problem in just careful admin or sysrq.
>> In that case, the user can disable forkbomb killer then it doesn't
>> affect system performance at all.
>> So maybe It could be separate topic.
>>
>> >
>> >> From 51bec44086a6b6c0e56ea978a2eb47e995236b47 Mon Sep 17 00:00:00 2001
>> >> From: Minchan Kim <[email protected]>
>> >> Date: Tue, 29 Mar 2011 00:52:20 +0900
>> >> Subject: [PATCH] [RFC] Prevent livelock by forkbomb
>> >>
>> >> Recently, We discussed how to prevent forkbomb.
>> >> The thing is a trade-off between cost VS effect.
>> >>
>> >> Forkbomb is a _race_ case which happes by someone's mistake
>> >> so if we have to pay cost in fast path(ex, fork, exec, exit),
>> >> It's a not good.
>> >>
>> >> Now, sysrq + I kills all processes. When I tested it, I still
>> >> need rebooting to work my system really well(ex, x start)
>> >> although console works. I don't know why we need such sysrq(kill
>> >> all processes and then what we can do?)
>> >>
>> >> So I decide to change sysrq + I to meet our goal which prevent
>> >> forkbomb. The rationale is following as.
>> >>
>> >> Forkbomb means somethings makes repeately tasks in a short time so
>> >> system don't have a free page then it become almost livelock state.
>> >> This patch uses the characteristc of forkbomb.
>> >>
>> >> When you push sysrq + I, it kills recent created tasks.
>> >> (In this version, 1 minutes). Maybe all processes included
>> >> forkbomb tasks are killed. If you can't get normal state of system
>> >> after you push sysrq + I, you can try one more. It can kill futher
>> >> recent tasks(ex, 2 minutes).
>> >>
>> >> You can continue to do it until your system becomes normal state.
>> >>
>> >> Signed-off-by: Minchan Kim <[email protected]>
>> >> ---
>> >> drivers/tty/sysrq.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
>> >> include/linux/sched.h | 6 ++++++
>> >> 2 files changed, 48 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
>> >> index 81f1395..6fb7e18 100644
>> >> --- a/drivers/tty/sysrq.c
>> >> +++ b/drivers/tty/sysrq.c
>> >> @@ -329,6 +329,45 @@ static void send_sig_all(int sig)
>> >> }
>> >> }
>> >>
>> >> +static void send_sig_recent(int sig)
>> >> +{
>> >> + struct task_struct *p;
>> >> + unsigned long task_jiffies, last_jiffies = 0;
>> >> + bool kill = false;
>> >> +
>> >> +retry:
>> >
>> > you need tasklist lock for scanning reverse.
>>
>> Okay. I will look at it.
>>
>> >
>> >> + for_each_process_reverse(p) {
>> >> + if (p->mm && !is_global_init(p) && !fatal_signal_pending(p)) {
>> >> + /* recent created task */
>> >> + last_jiffies = timeval_to_jiffies(p->real_start_time);
>> >> + force_sig(sig, p);
>> >> + break;
>> >
>> > why break ? you need to kill all youngers. And what is the relationship with below ?
>>
>> It's for selecting recent _youngest_ task which are not kthread, not
>> init, not handled by below loop. In below loop, it start to send KILL
>> signal processes which are created within 1 minutes from _youngest_
>> process creation time.
>>
>> >
>> >
>> >> + }
>> >> + }
>> >> +
>> >> + for_each_process_reverse(p) {
>> >> + if (p->mm && !is_global_init(p)) {
>> >> + task_jiffies = timeval_to_jiffies(p->real_start_time);
>> >> + /*
>> >> + * Kill all processes which are created recenlty
>> >> + * (ex, 1 minutes)
>> >> + */
>> >> + if (task_jiffies > (last_jiffies - 60 * HZ)) {
>> >> + force_sig(sig, p);
>> >> + kill = true;
>> >> + }
>> >> + else
>> >> + break;
>> >> + }
>> >> + }
>> >> +
>> >> + /*
>> >> + * If we can't kill anything, restart with next group.
>> >> + */
>> >> + if (!kill)
>> >> + goto retry;
>> >> +}
>> >
>> > This is not useful under OOM situation, we cannot use 'jiffies' to find younger tasks
>> > because "memory reclaim-> livelock" can take some amount of minutes very easily.
>> > So, I used other metrics. I think you do the same mistake I made before,
>> > this doesn't work.
>>
>> As far as I understand right, p->real_start_time is create time, not jiffies.
>> What I want is that kill all processes created recently, not all
>> process like old sysrq + I.
>>
>> Am I miss something?
>>
> When you run 'make -j' or 'Andrey's case' with "swap". You'll see 1minutes is too
> short and no task will be killed.
>
> To determine this 60*HZ is diffuclut. I think no one cannot detemine this.
> 1 minute is too short, 10 minutes are too long. So, I used a different manner,
> which seems to work well.
Okay. I can handle it. How about this?
retry:
	old_time = youngest_task->start_time;
	for_each_process_reverse(p) {
		time = p->start_time;
		if (time > old_time - 60 * HZ)
			kill(p);
	}
	/*
	 * If the user pushes sysrq again within 1 minute of the last
	 * push, we kill more processes.
	 */
	if (call_time < (now - 60 * HZ))
		goto retry;
	call_time = now;
	return;
So each time the user pushes sysrq, older tasks would be killed, and
eventually the root forkbomb task would be killed.
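To make the intended behaviour concrete, here is a small userspace
simulation. It is only a sketch under assumptions that are not in the
proposal itself: it reads "kill more on a repeated push" as widening the
window by one step per quick repeat, it measures the window from the
youngest surviving task, and it uses plain seconds in place of jiffies.
The names struct killer_state, kill_recent() and WINDOW_STEP are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW_STEP 60UL	/* seconds; stands in for 60 * HZ */

struct killer_state {
	unsigned long last_call;	/* time of the previous key push */
	unsigned long window;		/* current kill window; 0 = never used */
};

/*
 * start[] holds task start times sorted oldest first, alive[] marks
 * the tasks that have not been killed yet. Marks every live task that
 * started within the current window of the youngest live task as
 * killed and returns the number of tasks killed by this push.
 */
static size_t kill_recent(struct killer_state *st, unsigned long now,
			  const unsigned long *start, int *alive, size_t n)
{
	unsigned long youngest = 0;
	size_t i, killed = 0;
	int found = 0;

	/* Widen the window on a quick repeat push, otherwise reset it. */
	if (st->window && now - st->last_call < WINDOW_STEP)
		st->window += WINDOW_STEP;
	else
		st->window = WINDOW_STEP;
	st->last_call = now;

	/* Pass 1: find the youngest task that is still alive. */
	for (i = n; i-- > 0; ) {
		if (alive[i]) {
			youngest = start[i];
			found = 1;
			break;
		}
	}
	if (!found)
		return 0;

	/*
	 * Pass 2: walk from youngest to oldest. Comparing ages instead
	 * of computing "youngest - window" avoids unsigned wrap-around
	 * when the youngest task started soon after boot.
	 */
	for (i = n; i-- > 0; ) {
		if (!alive[i])
			continue;
		assert(start[i] <= youngest);
		if (youngest - start[i] >= st->window)
			break;
		alive[i] = 0;
		killed++;
	}
	return killed;
}
```

With start times {10, 500, 940, 960, 990, 1000}, a first push at time 1000
kills the three tasks started after 940, and a second push ten seconds
later widens the window to 120 and kills the task started at 940. Whether
a time-based window is a usable metric under heavy reclaim remains the
point of disagreement in this thread.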
>
> Thanks,
> -Kmae
>
>
>
>
>
--
Kind regards,
Minchan Kim
On Tue, 29 Mar 2011 10:12:31 +0900
Minchan Kim <[email protected]> wrote:
> On Tue, Mar 29, 2011 at 9:32 AM, KAMEZAWA Hiroyuki
> <[email protected]> wrote:
> > On Tue, 29 Mar 2011 09:24:30 +0900
> > Minchan Kim <[email protected]> wrote:
> >
> >> On Tue, Mar 29, 2011 at 8:50 AM, KAMEZAWA Hiroyuki
> >> <[email protected]> wrote:
> >> > On Tue, 29 Mar 2011 01:21:37 +0900
> >> > Minchan Kim <[email protected]> wrote:
> >> >
> >> >> On Sat, Mar 26, 2011 at 05:48:45PM +0900, Hiroyuki Kamezawa wrote:
> >> >> > 2011/3/26 Michel Lespinasse <[email protected]>:
> >> >> > > On Fri, Mar 25, 2011 at 01:05:50PM +0900, Minchan Kim wrote:
> >> >> > >> Okay. Each approach has a pros and cons and at least, now anyone
> >> >> > >> doesn't provide any method and comments but I agree it is needed(ex,
> >> >> > >> careless and lazy admin could need it strongly). Let us wait a little
> >> >> > >> bit more. Maybe google guys or redhat/suse guys would have a opinion.
> >> >> > >
> >> >> > > I haven't heard of fork bombs being an issue for us (and it's not been
> >> >> > > for me on my desktop, either).
> >> >> > >
> >> >> > > Also, I want to point out that there is a classical userspace solution
> >> >> > > for this, as implemented by killall5 for example. One can do
> >> >> > > kill(-1, SIGSTOP) to stop all processes that they can send
> >> >> > > signals to (except for init and itself). Target processes
> >> >> > > can never catch or ignore the SIGSTOP. This stops the fork bomb
> >> >> > > from causing further damage. Then, one can look at the process
> >> >> > > tree and do whatever is appropriate - including killing by uid,
> >> >> > > by cgroup or whatever policies one wants to implement in userspace.
> >> >> > > Finally, the remaining processes can be restarted using SIGCONT.
> >> >> > >
> >> >> >
> >> >> > Can that solution work even under OOM situation without new login/commands ?
> >> >> > Please show us your solution, how to avoid Andrey's Bomb with your way.
> >> >> > Then, we can add Documentation, at least. Or you can show us your tool.
> >> >> >
> >> >> > Maybe it is....
> >> >> > - running as a daemon. (because it has to lock its work memory before OOM.)
> >> >> > - mlockall its own memory to work under OOM.
> >> >> > - It can show process tree of users/admin or do all in automatic way
> >> >> > with user's policy.
> >> >> > - tell us which process is guilty.
> >> >> > - wakes up automatically when OOM happens.....IOW, OOM should have some notifier
> >> >> > to userland.
> >> >> > - never allocate any memory at running. (maybe it can't use libc.)
> >> >> > - never be blocked by any locks, for example, some other task's mmap_sem.
> >> >> > One of typical mistakes of admins at OOM is typing 'ps' to see what
> >> >> > happens.....
> >> >> > - Can be used even with GUI system, which can't show console.
> >> >>
> >> >> Hi Kame,
> >> >>
> >> >> I am worried about run-time cost.
> >> >> Should we care of mistake of users for robustness of OS?
> >> >> Mostly right but we can't handle all mistakes of user so we need admin.
> >> >> For exampe, what happens if admin execute "rm -rf /"?
> >> >> For avoiding it, we get a solution "backup" about critical data.
> >> >>
> >> >
> >> > Then, my patch is configurable and has control knobs....never invasive for
> >> > people who don't want it. And simple and very low cost. It will have
> >> > no visible performance/resource usage impact for usual guys.
> >> >
> >> >
> >> >
> >> >> In the same manner, if the system is very critical of forkbomb,
> >> >> admin should consider it using memcg, virtualization, ulimit and so on.
> >> >> If he don't want it, he should become a hard worker who have to
> >> >> cross over other building to reboot it. Although he is a diligent man,
> >> >> Reboot isn't good. So I suggest following patch which is just RFC.
> >> >> For making formal patch, I have to add more comment and modify sysrq.txt.
> >> >>
> >> >
> >> > For me, sysrq is of-no-use as I explained.
> >>
> >> Go to other building and new login?
> >>
> > I cannot login when the system is near happens.
>
> I understand so I said your solution would be a last resort.
>
> >
> >> I think if server is important on such problem, it should have a solution.
> >> The solution can be careful admin step or console with serial for
> >> sysrq step or your forkbomb killer. We have been used sysrq with local
> >> solution of last resort. In such context, sysrq solution ins't bad, I
> >> think.
> >>
> >
> > Mine works with Sysrq-f and this works poorly than mine.
> >
> >> If you can't provide 1 and 2, your forkbomb killer would be a last resort.
> >> But someone can solve the problem in just careful admin or sysrq.
> >> In that case, the user can disable forkbomb killer then it doesn't
> >> affect system performance at all.
> >> So maybe It could be separate topic.
> >>
> >> >
> >> >> From 51bec44086a6b6c0e56ea978a2eb47e995236b47 Mon Sep 17 00:00:00 2001
> >> >> From: Minchan Kim <[email protected]>
> >> >> Date: Tue, 29 Mar 2011 00:52:20 +0900
> >> >> Subject: [PATCH] [RFC] Prevent livelock by forkbomb
> >> >>
> >> >> Recently, We discussed how to prevent forkbomb.
> >> >> The thing is a trade-off between cost VS effect.
> >> >>
> >> >> Forkbomb is a _race_ case which happes by someone's mistake
> >> >> so if we have to pay cost in fast path(ex, fork, exec, exit),
> >> >> It's a not good.
> >> >>
> >> >> Now, sysrq + I kills all processes. When I tested it, I still
> >> >> need rebooting to work my system really well(ex, x start)
> >> >> although console works. I don't know why we need such sysrq(kill
> >> >> all processes and then what we can do?)
> >> >>
> >> >> So I decide to change sysrq + I to meet our goal which prevent
> >> >> forkbomb. The rationale is following as.
> >> >>
> >> >> Forkbomb means somethings makes repeately tasks in a short time so
> >> >> system don't have a free page then it become almost livelock state.
> >> >> This patch uses the characteristc of forkbomb.
> >> >>
> >> >> When you push sysrq + I, it kills recent created tasks.
> >> >> (In this version, 1 minutes). Maybe all processes included
> >> >> forkbomb tasks are killed. If you can't get normal state of system
> >> >> after you push sysrq + I, you can try one more. It can kill futher
> >> >> recent tasks(ex, 2 minutes).
> >> >>
> >> >> You can continue to do it until your system becomes normal state.
> >> >>
> >> >> Signed-off-by: Minchan Kim <[email protected]>
> >> >> ---
> >> >> drivers/tty/sysrq.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
> >> >> include/linux/sched.h | 6 ++++++
> >> >> 2 files changed, 48 insertions(+), 3 deletions(-)
> >> >>
> >> >> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> >> >> index 81f1395..6fb7e18 100644
> >> >> --- a/drivers/tty/sysrq.c
> >> >> +++ b/drivers/tty/sysrq.c
> >> >> @@ -329,6 +329,45 @@ static void send_sig_all(int sig)
> >> >> }
> >> >> }
> >> >>
> >> >> +static void send_sig_recent(int sig)
> >> >> +{
> >> >> + struct task_struct *p;
> >> >> + unsigned long task_jiffies, last_jiffies = 0;
> >> >> + bool kill = false;
> >> >> +
> >> >> +retry:
> >> >
> >> > you need tasklist lock for scanning reverse.
> >>
> >> Okay. I will look at it.
> >>
> >> >
> >> >> + for_each_process_reverse(p) {
> >> >> + if (p->mm && !is_global_init(p) && !fatal_signal_pending(p)) {
> >> >> + /* recent created task */
> >> >> + last_jiffies = timeval_to_jiffies(p->real_start_time);
> >> >> + force_sig(sig, p);
> >> >> + break;
> >> >
> >> > Why break? You need to kill all the younger tasks. And what is the relationship with the loop below?
> >>
> >> It's for selecting the _youngest_ task which is not a kthread, not
> >> init, and not already handled by the loop below. The loop below then sends
> >> the KILL signal to processes which were created within 1 minute of the
> >> _youngest_ process's creation time.
> >>
> >> >
> >> >
> >> >> + }
> >> >> + }
> >> >> +
> >> >> + for_each_process_reverse(p) {
> >> >> + if (p->mm && !is_global_init(p)) {
> >> >> + task_jiffies = timeval_to_jiffies(p->real_start_time);
> >> >> + /*
> >> >> + * Kill all processes which were created recently
> >> >> + * (e.g. within 1 minute)
> >> >> + */
> >> >> + if (task_jiffies > (last_jiffies - 60 * HZ)) {
> >> >> + force_sig(sig, p);
> >> >> + kill = true;
> >> >> + }
> >> >> + else
> >> >> + break;
> >> >> + }
> >> >> + }
> >> >> +
> >> >> + /*
> >> >> + * If we can't kill anything, restart with next group.
> >> >> + */
> >> >> + if (!kill)
> >> >> + goto retry;
> >> >> +}
> >> >
> >> > This is not useful in an OOM situation: we cannot use 'jiffies' to find younger tasks
> >> > because "memory reclaim -> livelock" can very easily take several minutes.
> >> > So, I used other metrics. I think you are making the same mistake I made before;
> >> > this doesn't work.
> >>
> >> As far as I understand, p->real_start_time is the creation time, not jiffies.
> >> What I want is to kill all processes created recently, not all
> >> processes like the old sysrq + I.
> >>
> >> Am I missing something?
> >>
> > When you run 'make -j' or 'Andrey's case' with swap, you'll see that 1 minute is too
> > short and no task will be killed.
> >
> > Determining this 60*HZ is difficult. I think no one can determine it.
> > 1 minute is too short, 10 minutes is too long. So, I used a different approach,
> > which seems to work well.
>
> Okay. I can handle it. How about this?
>
> retry:
> old_time = youngest_task->start_time;
> for_each_process_reverse(p) {
> time = p->start_time;
> if (time > old_time - 60 * HZ)
> kill(p);
> }
>
> /*
> * If the user pushes sysrq again within 1 minute of the last push,
> * we kill more processes.
> */
> if (call_time < (now - 60 * HZ))
> goto retry;
>
> call_time = now;
> return;
>
> So whenever the user pushes sysrq, older tasks would be killed and, at last,
> the root forkbomb task would be killed.
>
Maybe good for a single-user system where the user can send Sysrq.
But I myself am not very excited about this new feature because I need to
run to push Sysrq ....
Please do as you like; I think the idea itself is interesting.
But I love automatic ones. I do other jobs.
Thanks,
-Kame
On Tue, Mar 29, 2011 at 10:12 AM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Tue, 29 Mar 2011 10:12:31 +0900
> Minchan Kim <[email protected]> wrote:
>
>> On Tue, Mar 29, 2011 at 9:32 AM, KAMEZAWA Hiroyuki
>> <[email protected]> wrote:
>> > On Tue, 29 Mar 2011 09:24:30 +0900
>> > Minchan Kim <[email protected]> wrote:
>> >
>> >> On Tue, Mar 29, 2011 at 8:50 AM, KAMEZAWA Hiroyuki
>> >> <[email protected]> wrote:
>> >> > On Tue, 29 Mar 2011 01:21:37 +0900
>> >> > Minchan Kim <[email protected]> wrote:
>> >> >
>> >> >> On Sat, Mar 26, 2011 at 05:48:45PM +0900, Hiroyuki Kamezawa wrote:
>> >> >> > 2011/3/26 Michel Lespinasse <[email protected]>:
>> >> >> > > On Fri, Mar 25, 2011 at 01:05:50PM +0900, Minchan Kim wrote:
>> >> >> > >> Okay. Each approach has pros and cons, and so far no one
>> >> >> > >> has provided any other method or comments, but I agree it is needed (e.g.
>> >> >> > >> a careless or lazy admin could need it strongly). Let us wait a little
>> >> >> > >> bit more. Maybe Google guys or Red Hat/SUSE guys would have an opinion.
>> >> >> > >
>> >> >> > > I haven't heard of fork bombs being an issue for us (and it's not been
>> >> >> > > for me on my desktop, either).
>> >> >> > >
>> >> >> > > Also, I want to point out that there is a classical userspace solution
>> >> >> > > for this, as implemented by killall5 for example. One can do
>> >> >> > > kill(-1, SIGSTOP) to stop all processes that they can send
>> >> >> > > signals to (except for init and itself). Target processes
>> >> >> > > can never catch or ignore the SIGSTOP. This stops the fork bomb
>> >> >> > > from causing further damage. Then, one can look at the process
>> >> >> > > tree and do whatever is appropriate - including killing by uid,
>> >> >> > > by cgroup or whatever policies one wants to implement in userspace.
>> >> >> > > Finally, the remaining processes can be restarted using SIGCONT.
>> >> >> > >
>> >> >> >
>> >> >> > Can that solution work even in an OOM situation without a new login or new commands?
>> >> >> > Please show us your solution: how to avoid Andrey's bomb your way.
>> >> >> > Then we can add Documentation, at least. Or you can show us your tool.
>> >> >> >
>> >> >> > Maybe it is....
>> >> >> > - running as a daemon. (because it has to lock its work memory before OOM.)
>> >> >> > - mlockall its own memory to work under OOM.
>> >> >> > - It can show process tree of users/admin or do all in automatic way
>> >> >> > with user's policy.
>> >> >> > - tell us which process is guilty.
>> >> >> > - wakes up automatically when OOM happens.....IOW, OOM should have some notifier
>> >> >> > to userland.
>> >> >> > - never allocate any memory at running. (maybe it can't use libc.)
>> >> >> > - never be blocked by any locks, for example, some other task's mmap_sem.
>> >> >> > One of typical mistakes of admins at OOM is typing 'ps' to see what
>> >> >> > happens.....
>> >> >> > - Can be used even with GUI system, which can't show console.
>> >> >>
>> >> >> Hi Kame,
>> >> >>
>> >> >> I am worried about run-time cost.
>> >> >> Should we care about users' mistakes for the robustness of the OS?
>> >> >> Mostly yes, but we can't handle all user mistakes, so we need an admin.
>> >> >> For example, what happens if the admin executes "rm -rf /"?
>> >> >> To avoid that, we have a solution: "backup" of critical data.
>> >> >>
>> >> >
>> >> > Then, my patch is configurable and has control knobs....never invasive for
>> >> > people who don't want it. And simple and very low cost. It will have
>> >> > no visible performance/resource usage impact for usual guys.
>> >> >
>> >> >
>> >> >
>> >> >> In the same manner, if forkbombs are critical for the system, the
>> >> >> admin should consider using memcg, virtualization, ulimit and so on.
>> >> >> If he doesn't want that, he has to become a hard worker who
>> >> >> crosses over to the other building to reboot it. Even for a diligent man,
>> >> >> a reboot isn't good. So I suggest the following patch, which is just an RFC.
>> >> >> To make a formal patch, I have to add more comments and modify sysrq.txt.
>> >> >>
>> >> >
>> >> > For me, sysrq is of-no-use as I explained.
>> >>
>> >> Go to other building and new login?
>> >>
>> > I cannot login when the system is near happens.
>>
>> I understand so I said your solution would be a last resort.
>>
>> >
>> >> I think if the server is important enough for such a problem, it should have a solution.
>> >> The solution can be a careful admin, a serial console for
>> >> sysrq, or your forkbomb killer. We have been using sysrq as a local
>> >> solution of last resort. In that context, the sysrq solution isn't bad, I
>> >> think.
>> >>
>> >
>> > Mine works with Sysrq-f, and this works more poorly than mine.
>> >
>> >> If you can't provide 1 and 2, your forkbomb killer would be a last resort.
>> >> But someone can solve the problem with just a careful admin or sysrq.
>> >> In that case, the user can disable the forkbomb killer so that it doesn't
>> >> affect system performance at all.
>> >> So maybe it could be a separate topic.
>> >>
>> >> >
>> >> >> From 51bec44086a6b6c0e56ea978a2eb47e995236b47 Mon Sep 17 00:00:00 2001
>> >> >> From: Minchan Kim <[email protected]>
>> >> >> Date: Tue, 29 Mar 2011 00:52:20 +0900
>> >> >> Subject: [PATCH] [RFC] Prevent livelock by forkbomb
>> >> >>
>> >> >> Recently, We discussed how to prevent forkbomb.
>
> Maybe good for a single-user system where the user can send Sysrq.
> But I myself am not very excited about this new feature because I need to
> run to push Sysrq ....
>
> Please do as you like; I think the idea itself is interesting.
> But I love automatic ones. I do other jobs.
Okay. Thanks for the comment, Kame.
I hope Andrew or someone gives feedback on the forkbomb problem itself before
we dive into this.
--
Kind regards,
Minchan Kim
Hi, Minchan, Kamezawa-san,
May I ask the current status of this thread? I'm unhappy if our kernel keeps
having this forkbomb weakness. ;)
Can we consider taking either or both ideas?
On Thu, 14 Apr 2011 09:20:41 +0900 (JST)
KOSAKI Motohiro <[email protected]> wrote:
> May I ask the current status of this thread? I'm unhappy if our kernel keeps
> having this forkbomb weakness. ;)
I've stopped updating but can restart at any time. (And I found a bug ;)
> Can we consider taking either or both ideas?
>
I think yes, both ideas can be used.
One idea is
- kill all recent threads by Sysrq. The user can use Sysrq multiple times
until the forkbomb stops.
Another (mine) is
- kill all problematic tasks automatically. This adds some tracking costs but
can be configurable.
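The Sysrq idea can be illustrated with a small user-space simulation. This is only a sketch, not Minchan's patch: `struct sim_task`, `sysrq_kill_recent()` and `WINDOW_SECS` are hypothetical names, times are plain seconds rather than jiffies, and "kill" just sets a flag. Each repeated press widens the window measured back from the youngest task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few task fields the idea needs. */
struct sim_task {
    int pid;
    unsigned long start_time;   /* creation time, in seconds */
    bool killed;                /* set instead of sending a real signal */
};

/* The "1 minute" used in the thread; the right value is disputed there. */
#define WINDOW_SECS 60UL

/*
 * Simulate the Nth consecutive Sysrq press (presses >= 1): mark every
 * not-yet-killed task, except pid 1, that was created within
 * presses * WINDOW_SECS of the youngest task.
 * Returns the number of tasks newly marked.
 */
size_t sysrq_kill_recent(struct sim_task *tasks, size_t n, unsigned long presses)
{
    unsigned long youngest = 0, window, cutoff;
    size_t i, newly = 0;

    assert(presses > 0);

    /* Find the creation time of the youngest task. */
    for (i = 0; i < n; i++)
        if (tasks[i].start_time > youngest)
            youngest = tasks[i].start_time;

    window = presses * WINDOW_SECS;
    cutoff = youngest > window ? youngest - window : 0; /* no unsigned underflow */

    for (i = 0; i < n; i++) {
        if (tasks[i].pid != 1 && !tasks[i].killed &&
            tasks[i].start_time > cutoff) {
            tasks[i].killed = true;
            newly++;
        }
    }
    return newly;
}
```

The sketch inherits the weakness discussed earlier in the thread: it relies on creation times, and under heavy reclaim a fixed window may be too short to catch anything.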
Thanks,
-Kame
Hi, KOSAKI and Kame.
On Thu, Apr 14, 2011 at 9:35 AM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Thu, 14 Apr 2011 09:20:41 +0900 (JST)
> KOSAKI Motohiro <[email protected]> wrote:
>
>> May I ask the current status of this thread? I'm unhappy if our kernel keeps
>> having this forkbomb weakness. ;)
>
> I've stopped updating but can restart at any time. (And I found a bug ;)
>
>> Can we consider taking either or both ideas?
>>
> I think yes, both ideas can be used.
> One idea is
> - kill all recent threads by Sysrq. The user can use Sysrq multiple times
> until the forkbomb stops.
> Another (mine) is
> - kill all problematic tasks automatically. This adds some tracking costs but
> can be configurable.
>
> Thanks,
> -Kame
>
>
Unfortunately, we didn't have a slot to discuss the oom and forkbomb.
So, personally, I talked about it with someone (whom we know very well :) )
for a moment during lunch time at LSF/MM. It seems he doesn't feel
strongly that we really need it, and I am still not sure about it, either.
Now the most important thing is to listen to others' opinions about whether
we really need it and whether we need it in the kernel.
And I have an idea to implement mine in an automatic way, too. :)
Thanks for your interest.
--
Kind regards,
Minchan Kim
On Thu, 14 Apr 2011, Minchan Kim wrote:
> Unfortunately, we didn't have a slot to discuss the oom and forkbomb.
> So, personally, I talked about it with someone (whom we know very well :) )
> for a moment during lunch time at LSF/MM. It seems he doesn't feel
> strongly that we really need it, and I am still not sure about it, either.
>
I'm not sure who you're referring to here, but I don't think we should
ignore forkbomb vulnerabilities that exist in the kernel because you
talked to a guy and he doesn't think we need it. I know you have
particularly taken an interest in this thread, so I also know that's not
what you're saying, but I'm not sure what you meant by the above. I think
we _must_ address forkbomb issues, whether it's in the oom killer or
elsewhere, if it causes negative effects for other users on the machine as
it appears is possible in Andrey's test case.
When I was doing the oom killer rewrite, I included my own forkbomb killer
in early revisions and removed it because there was a thought that it
would negatively impact webservers or other processes that fork thousands
of threads for a very legitimate purpose. The old oom killer also
attempted to prefer killing children of a forkbomb first, but its method
was error-prone because it factored the size of each child's VM into the
parent and that could unfairly penalize the parent for high priority work.
It seems like there are a few common principles that everyone would agree
with:
- forkbombs need only be addressed when oom,
- forkbombs don't need complex handling when isolated to a memcg,
- forkbombs should be handled automatically without mandatory
intervention by the admin, and
- forkbombs should result in the entire process tree being killed.
If that's the case, then the appropriate place for such a feature would be
in the oom killer by extending oom_badness() to detect forkbombs and then
in oom_kill_process() to kill the parent process and all children instead
of its default of sacrificing a child first.
The absolute simplest form would be to implement a threshold similar to
what is done in Kame's patchset where previous history is declared as
forgotten. Then, add a jiffies member to struct task_struct and, on
fork(), one of two things would happen:
- if the jiffies value is less than a system-wide predefined forkbomb
threshold, increment a counter in the same struct, or
- if the jiffies value is greater than the threshold, clear the counter
and update the jiffies value.
This is lightweight and approximates how many children a parent has forked
in the most recent time period. On oom, a preliminary tasklist scan could
accumulate all of the counts and charge them up its ancestry as long as
each successive parent has a jiffies value less than the forkbomb
threshold.
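
The per-fork bookkeeping described above might look roughly like the following user-space sketch. It is not kernel code: `struct fork_stat`, `note_fork()` and `FORKBOMB_WINDOW` are hypothetical names, ticks stand in for jiffies, and "jiffies value less than the threshold" is read here as "time elapsed since the stored value is less than the threshold", which is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-task bookkeeping; task_struct has no such fields. */
struct fork_stat {
    uint64_t window_start;  /* tick at which the current window began */
    unsigned int forks;     /* forks counted since window_start */
};

/* Assumed system-wide window length in ticks; the thread fixes no value. */
#define FORKBOMB_WINDOW 1000u

/* Called for the parent on every fork(); `now` is the current tick. */
void note_fork(struct fork_stat *fs, uint64_t now)
{
    assert(now >= fs->window_start);

    if (now - fs->window_start < FORKBOMB_WINDOW) {
        /* Still inside the window: count this fork. */
        fs->forks++;
    } else {
        /* Window expired: forget the history and start a new window. */
        fs->forks = 0;
        fs->window_start = now;
    }
}
```

The cost on the fork path is one comparison and one store, which is the point of the proposal; whether the fork that opens a new window should itself be counted is a detail left open here.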
If a task has a cumulative fork count that exceeds a threshold, it is
declared as a forkbomb and specially handled. (Once the forkbomb is
identified, it would be trivial to SIGKILL it and all of its children to
limit the damage.) If no task exceeds the threshold, the forkbomb killer
is a no-op and the oom killer proceeds as it does today.
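
As a rough illustration of the oom-time scan, the sketch below charges each task's recent fork count to itself and to every ancestor whose own window is still recent, then reports the task with the largest cumulative count above a limit. Everything here is assumed for the example: `struct sim_task`, `find_forkbomb()`, the array-index parent links (which must form a tree), and both threshold values. It is not how oom_badness() works today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FORKBOMB_WINDOW 1000u   /* assumed window length, in ticks */

/* Hypothetical flattened view of the task list. */
struct sim_task {
    int parent;             /* index of the parent in the array, -1 for none */
    uint64_t window_start;  /* start of this task's fork-count window */
    unsigned int forks;     /* forks counted in that window */
    unsigned long charged;  /* scratch: own plus descendants' forks */
};

/* A task's history counts only if its window has not expired. */
static int is_recent(const struct sim_task *t, uint64_t now)
{
    assert(now >= t->window_start);
    return now - t->window_start < FORKBOMB_WINDOW;
}

/*
 * Returns the index of the task with the largest cumulative fork count
 * strictly above `limit`, or -1 if none qualifies (the no-op case).
 */
int find_forkbomb(struct sim_task *tasks, size_t n, uint64_t now,
                  unsigned long limit)
{
    size_t i;
    int a, worst = -1;

    for (i = 0; i < n; i++)
        tasks[i].charged = 0;

    for (i = 0; i < n; i++) {
        if (!is_recent(&tasks[i], now))
            continue;   /* stale history is ignored */
        /* Charge this task and each successive recent ancestor. */
        for (a = (int)i; a >= 0 && is_recent(&tasks[a], now);
             a = tasks[a].parent)
            tasks[a].charged += tasks[i].forks;
    }

    for (i = 0; i < n; i++)
        if (tasks[i].charged > limit &&
            (worst < 0 || tasks[i].charged > tasks[worst].charged))
            worst = (int)i;

    return worst;
}
```

Stopping the walk at the first ancestor with an expired window keeps long-lived parents such as init or a login shell from being blamed for a bomb started below them.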
The key is to implement the correct thresholds, especially the threshold
to identify a parent as a forkbomb. That's not trivial: is 1,000 forks in
one second a forkbomb? 10,000? If the system is oom and a process and
its children have forked 10,000 threads in the past second, I think it
would be sane to kill it even if another process is using 95% of RAM, for
example, since the loss of work is relatively small and if we really do
want to start that thread with 10,000 forks/sec in oom conditions, then it
places the burden of freeing enough memory to do so on the user instead of
the kernel where it is more appropriate.