Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp7034554imu; Mon, 3 Dec 2018 06:48:13 -0800 (PST) X-Google-Smtp-Source: AFSGD/Xlod/3Q+BrlmBR5r6kHmDrWTBYHNXrOGD0Zb6Ie8IlvQLSBnF95cgJzXtv/hCQMlGMcefR X-Received: by 2002:a17:902:7791:: with SMTP id o17mr15931951pll.60.1543848493304; Mon, 03 Dec 2018 06:48:13 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1543848493; cv=none; d=google.com; s=arc-20160816; b=gW1b0KLnOjLKhNED3P3k4jH9KPRtYSOHCzz/TVtLiQk9LLVtb100ODXXJj4Q1H9gUS maQd7mqJGMxivO3xZO1CwOqw2m3LKDMX6ZBsyhtKNKciL8mAVZSR11IUXn8BNAicJYGQ 800FGdZFKHmVJXl0cssMFqniYIsQ2sHwjLAxBoToRtbq0/qeQtBQNZFl15MGIhtDPTiC VgBljha2yc9ImAZQe69GAhpnWUNs8gBtN0t5ibb8DY6FbFSur7lHy79WcI+kIePZmrWn xv2zrip+TSolMZ966SfpnfWAmZGIvTEfT0yuvxulDdy45JecbIHRsnxMAP6qX6+Tfgcx t8Fg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=477t/Wr+sB5kFXgyXtIfDS5FoMPpkwQ59l1KKArMw1M=; b=iKYdZBw+9079PVy0Zg4Dq44dfh8yrESmgGOZkpso2LaxMN31eH9CibaRD7ccR+GBK1 2YqP+Ht1lXHSwPPnRw5fBpueFjzSnLNmQyYQ3UZorxf8rE1mcDn5u8Ay9mQJe0gAlBJ/ Myri8+hV5RbtlQrBi1wcj/LxFBrDZsqJoxO9s3tfpx9l9z4cUwuXP58NAr8ecb9lnjr5 4OAxBwkMmCSBr5BYTGHDA4RhO0bzjrcZQF1puABFgr13FLYU3gRrx8lW1w1CUxB55X1u LSBscvLbvpfPJQEMtw87HGmYzUoooRRFwbewp+fI+46kTBHekZD0hj91eXIVf3pXbirC 7bPA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id x186si14518513pfx.269.2018.12.03.06.47.58; Mon, 03 Dec 2018 06:48:13 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726596AbeLCOsi (ORCPT + 99 others); Mon, 3 Dec 2018 09:48:38 -0500 Received: from mx1.redhat.com ([209.132.183.28]:55348 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726116AbeLCOsi (ORCPT ); Mon, 3 Dec 2018 09:48:38 -0500 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E7DB430020A7; Mon, 3 Dec 2018 14:47:20 +0000 (UTC) Received: from dhcp-27-174.brq.redhat.com (unknown [10.43.17.12]) by smtp.corp.redhat.com (Postfix) with SMTP id 1A97B2A2CC; Mon, 3 Dec 2018 14:47:18 +0000 (UTC) Received: by dhcp-27-174.brq.redhat.com (nbSMTP-1.00) for uid 1000 oleg@redhat.com; Mon, 3 Dec 2018 15:47:20 +0100 (CET) Date: Mon, 3 Dec 2018 15:47:18 +0100 From: Oleg Nesterov To: Roman Gushchin Cc: Tejun Heo , Dan Carpenter , Mike Rapoport , cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, Roman Gushchin Subject: Re: [PATCH v4 4/7] cgroup: cgroup v2 freezer Message-ID: <20181203144717.GD31795@redhat.com> References: <20181130234745.6756-1-guro@fb.com> <20181130234745.6756-5-guro@fb.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20181130234745.6756-5-guro@fb.com> User-Agent: Mutt/1.5.24 (2015-08-30) X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.44]); Mon, 03 Dec 2018 14:47:21 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org To be honest, I fail to understand this patch. At least after a quick glance, I will try to read it again tomorrow but so far I do not even understand the desired semantics wrt signals/ptrace. On 11/30, Roman Gushchin wrote: > > @@ -368,6 +369,8 @@ static inline int signal_pending_state(long state, struct task_struct *p) > return 0; > if (!signal_pending(p)) > return 0; > + if (unlikely(cgroup_task_frozen(p))) > + return __fatal_signal_pending(p); Oh, this is not nice. And doesn't look right. > +/* > + * Entry path into frozen state. > + * If the task was not frozen before, counters are updated and the cgroup state > + * is revisited. Otherwise, the task is put into the TASK_KILLABLE sleep. > + */ > +void cgroup_enter_frozen(void) > +{ > + if (!current->frozen) { > + struct cgroup *cgrp; > + > + spin_lock_irq(&css_set_lock); > + current->frozen = true; > + cgrp = task_dfl_cgroup(current); > + cgrp->freezer.nr_frozen_tasks++; > + WARN_ON_ONCE(cgrp->freezer.nr_frozen_tasks > > + cgrp->freezer.nr_tasks_to_freeze); > + cgroup_update_frozen(cgrp, true); > + spin_unlock_irq(&css_set_lock); > + } > + > + __set_current_state(TASK_INTERRUPTIBLE); > + schedule(); The comment above says TASK_KILLABLE, very confusing. Probably this pairs with the change in signal_pending_state() above. So this schedule() should actually "work" in that it won't return if signal_pending(). But this can't protect from another signal_wake_up(). Yes, iiuc in this case cgroup_enter_frozen() will be called again "soon" but this all looks strange. > --- a/kernel/ptrace.c > +++ b/kernel/ptrace.c > @@ -410,6 +410,13 @@ static int ptrace_attach(struct task_struct *task, long request, > > spin_lock(&task->sighand->siglock); > > + /* > + * If the process is frozen, let's wake it up to give it a chance > + * to enter the ptrace trap. > + */ > + if (cgroup_task_frozen(task)) > + wake_up_process(task); And why this can't race with cgroup_enter_frozen() ? Or think of PTRACE_INTERRUPT. It can race with cgroup_enter_frozen() too, the tracee can miss this request because of that change in signal_pending_state(). > static void do_jobctl_trap(void) > { > + struct sighand_struct *sighand = current->sighand; > struct signal_struct *signal = current->signal; > int signr = current->jobctl & JOBCTL_STOP_SIGMASK; > > - if (current->ptrace & PT_SEIZED) { > - if (!signal->group_stop_count && > - !(signal->flags & SIGNAL_STOP_STOPPED)) > - signr = SIGTRAP; > - WARN_ON_ONCE(!signr); > - ptrace_do_notify(signr, signr | (PTRACE_EVENT_STOP << 8), > - CLD_STOPPED); > - } else { > - WARN_ON_ONCE(!signr); > - ptrace_stop(signr, CLD_STOPPED, 0, NULL); > - current->exit_code = 0; > + if (current->jobctl & (JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)) { > + if (current->ptrace & PT_SEIZED) { > + if (!signal->group_stop_count && > + !(signal->flags & SIGNAL_STOP_STOPPED)) > + signr = SIGTRAP; > + WARN_ON_ONCE(!signr); > + ptrace_do_notify(signr, > + signr | (PTRACE_EVENT_STOP << 8), > + CLD_STOPPED); > + } else { > + WARN_ON_ONCE(!signr); > + ptrace_stop(signr, CLD_STOPPED, 0, NULL); > + current->exit_code = 0; > + } > + } else if (current->jobctl & JOBCTL_TRAP_FREEZE) { > + /* > + * Enter the frozen state, unless the task is about to exit. > + */ > + if (fatal_signal_pending(current)) { > + current->jobctl &= ~JOBCTL_TRAP_FREEZE; > + } else { > + spin_unlock_irq(&sighand->siglock); > + cgroup_enter_frozen(); > + spin_lock_irq(&sighand->siglock); > + } > } > } > > @@ -2401,12 +2420,23 @@ bool get_signal(struct ksignal *ksig) > do_signal_stop(0)) > goto relock; > > - if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) { > + if (unlikely(current->jobctl & > + (JOBCTL_TRAP_MASK | JOBCTL_TRAP_FREEZE))) { > do_jobctl_trap(); Cosmetic nit, but can't you add another helper? To me something like if (JOBCTL_TRAP_MASK) do_jobctl_trap(); else if (JOBCTL_TRAP_FREEZE) do_jobctl_freeze(); will look more clean, but I won't insist. > + /* > + * If the task is leaving the frozen state, let's update > + * cgroup counters and reset the frozen bit. > + */ > + if (unlikely(cgroup_task_frozen(current))) { > + spin_unlock_irq(&sighand->siglock); > + cgroup_leave_frozen(); > + spin_lock_irq(&sighand->siglock); looks like this needs another "goto relock", no? And perhaps this another reason for the new do_jobctl_freeze() helper which could absorb this leave_frozen() ? Oleg.