Received: by 2002:a17:90a:8582:0:0:0:0 with SMTP id m2csp2365562pjn; Tue, 2 Apr 2019 15:10:33 -0700 (PDT) X-Google-Smtp-Source: APXvYqw3uSuRaAC0I7NajqYFKC8KBkJZCUOtZe9yifsGHKELGVuVJU6qL7W0Y3CfPMMsn4E+rmUV X-Received: by 2002:aa7:91d5:: with SMTP id z21mr39930479pfa.222.1554243032964; Tue, 02 Apr 2019 15:10:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1554243032; cv=none; d=google.com; s=arc-20160816; b=sfAHxkbnDc3rE0YiIXQBlpBmc6Jc5W05bmlLSPOLuxfU1GM//YEK1FQVud5jT+LsSH GGgOMsjPmPMRcJ1bkNsf/b8nvvCQWyiRnjmXmvPZWoNLPW613dIPRSk6Spyk6AE93NjO kl2uiy3Jf6/k/+lpmrGSxBrldLawE3MCGZrEwps+y2dJWz+QvE13WwpjUfkLj25/p3Yo p3OhG+ByW19gU6BbI3ep8A88hmzJCV63lSd4y4wPihHFF7uzJ4WYvDy3rPHK0gFYM2TF 19y5wYM1Z3ZtAQnK/vXM8rk70T8kcDqWWqdCmFWI6umLmzIbzWLbaTh8fVA2D9Rh2Z93 rfAg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:content-transfer-encoding :content-id:content-language:accept-language:in-reply-to:references :message-id:date:thread-index:thread-topic:subject:cc:to:from :dkim-signature:dkim-signature; bh=SwsKDjjj7SRUR25hjJd7eLdumKNnAozD6h0qL1PCTxU=; b=HQtiOl10jKPxxL1Ee0j9nO5sA7/AC2i1+VCIdeWOFboYSpyqij8UXwOs9bN4/6K1aw qL/1TKpxgqf1nWwG1f3d0RGTOexfLpjIZhdwxHAFP9WLhS9qSmo54vR30yGiN1c1khU6 ruIaA9PvbcjaT8bN3wsxI8+ct6c3ADMiTxH8go3GVqi+C5LYgI7sdEN7YTXeDT/WAaAC j3X4eTspjggI9Q2p90RvcKRTUQ+f4YPerAHqyNQXVbfFyFs73mRXFieFeyfRV92pwcyW 9+CYIBUzrG+6sMi96YBB1EBoOLyveCEhsbQly9BI4OkpsfLSEu63KQYZqrKOVE+RyzSh Iz0g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@fb.com header.s=facebook header.b=FgHm9rtv; dkim=pass header.i=@fb.onmicrosoft.com header.s=selector1-fb-com header.b=DD2G0TBX; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=fb.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id t10si11908435plr.229.2019.04.02.15.10.17; Tue, 02 Apr 2019 15:10:32 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@fb.com header.s=facebook header.b=FgHm9rtv; dkim=pass header.i=@fb.onmicrosoft.com header.s=selector1-fb-com header.b=DD2G0TBX; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=fb.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726584AbfDBWJG (ORCPT + 99 others); Tue, 2 Apr 2019 18:09:06 -0400 Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:35276 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726078AbfDBWJG (ORCPT ); Tue, 2 Apr 2019 18:09:06 -0400 Received: from pps.filterd (m0109334.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x32M7Pno032502; Tue, 2 Apr 2019 15:09:01 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : references : in-reply-to : content-type : content-id : content-transfer-encoding : mime-version; s=facebook; bh=SwsKDjjj7SRUR25hjJd7eLdumKNnAozD6h0qL1PCTxU=; b=FgHm9rtvZ7xKbY+J9DIloiRRh2Ps2y0kpK0lwuhVEcp6H7etCSneQC2MWdgdkAo4RjB0 54v0h3FzYwku0zpA08sVnSsYtIeQXE2WIh/sS7t+v06A01tfU6KePq8uLa1W0YSTUakW 1CPj6wgJnb2G5yPOVThG+q4x6sVT+V/R09U= Received: from maileast.thefacebook.com ([199.201.65.23]) by mx0a-00082601.pphosted.com with ESMTP id 2rmdvj8j6e-4 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT); Tue, 02 Apr 2019 15:09:01 -0700 Received: from frc-mbx05.TheFacebook.com (2620:10d:c0a1:f82::29) by frc-hub06.TheFacebook.com (2620:10d:c021:18::176) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.1.1713.5; Tue, 2 Apr 2019 15:08:44 -0700 Received: from frc-hub04.TheFacebook.com (2620:10d:c021:18::174) by frc-mbx05.TheFacebook.com (2620:10d:c0a1:f82::29) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.1.1713.5; Tue, 2 Apr 2019 15:08:44 -0700 Received: from NAM04-SN1-obe.outbound.protection.outlook.com (192.168.183.28) by o365-in.thefacebook.com (192.168.177.74) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.1.1713.5 via Frontend Transport; Tue, 2 Apr 2019 15:08:44 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.onmicrosoft.com; s=selector1-fb-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=SwsKDjjj7SRUR25hjJd7eLdumKNnAozD6h0qL1PCTxU=; b=DD2G0TBXw3/TEhkcUhWGkhMT5K274ncosRlbkbWSvpMjMGIrsAkMpO/vj/RVPdSD+ixpd4kNRaDnbtWyji5duUsy0qCaI0Vz/6cZ2pBn6l2fUUxC/ZKZwAg/XdSYc78ZL4uC4mT51wk/ymc9omsnCsDYZppItlCt5BTKu4C5luE= Received: from BYAPR15MB2631.namprd15.prod.outlook.com (20.179.156.24) by BYAPR15MB2437.namprd15.prod.outlook.com (52.135.198.153) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1750.17; Tue, 2 Apr 2019 22:08:42 +0000 Received: from BYAPR15MB2631.namprd15.prod.outlook.com ([fe80::790e:7294:b086:9ded]) by BYAPR15MB2631.namprd15.prod.outlook.com ([fe80::790e:7294:b086:9ded%3]) with mapi id 15.20.1750.017; Tue, 2 Apr 2019 22:08:42 +0000 From: Roman Gushchin To: Oleg Nesterov CC: Roman Gushchin , Tejun Heo , Kernel Team , "cgroups@vger.kernel.org" , "linux-kernel@vger.kernel.org" Subject: Re: [PATCH v9 4/9] cgroup: cgroup v2 freezer Thread-Topic: [PATCH v9 4/9] cgroup: cgroup v2 freezer Thread-Index: AQHU3CHllXzybPyDGEG0vLR4lYlRjKYpIryAgABmQIA= Date: Tue, 2 Apr 2019 22:08:42 +0000 Message-ID: <20190402220836.GA16417@tower.DHCP.thefacebook.com> References: <20190316175812.6787-1-guro@fb.com> <20190316175812.6787-5-guro@fb.com> <20190402160241.GA10425@redhat.com> In-Reply-To: <20190402160241.GA10425@redhat.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-clientproxiedby: BYAPR04CA0006.namprd04.prod.outlook.com (2603:10b6:a03:40::19) To BYAPR15MB2631.namprd15.prod.outlook.com (2603:10b6:a03:152::24) x-ms-exchange-messagesentrepresentingtype: 1 x-originating-ip: [2620:10d:c090:200::3:923f] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: ab2d55be-4944-4d2d-0260-08d6b7b7c43f x-microsoft-antispam: BCL:0;PCL:0;RULEID:(2390118)(7020095)(4652040)(8989299)(5600139)(711020)(4605104)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(2017052603328)(7193020);SRVR:BYAPR15MB2437; x-ms-traffictypediagnostic: BYAPR15MB2437: x-microsoft-antispam-prvs: x-forefront-prvs: 0995196AA2 x-forefront-antispam-report: SFV:NSPM;SFS:(10019020)(136003)(346002)(376002)(396003)(366004)(39860400002)(189003)(199004)(106356001)(6116002)(316002)(99286004)(25786009)(6486002)(6916009)(7736002)(6436002)(229853002)(256004)(14444005)(6506007)(386003)(105586002)(1076003)(478600001)(14454004)(2906002)(97736004)(4326008)(71200400001)(11346002)(6246003)(71190400001)(46003)(102836004)(52116002)(446003)(54906003)(33656002)(5660300002)(76176011)(6512007)(186003)(81166006)(9686003)(8936002)(305945005)(81156014)(53936002)(476003)(68736007)(86362001)(8676002)(486006);DIR:OUT;SFP:1102;SCL:1;SRVR:BYAPR15MB2437;H:BYAPR15MB2631.namprd15.prod.outlook.com;FPR:;SPF:None;LANG:en;PTR:InfoNoRecords;MX:1;A:1; received-spf: None (protection.outlook.com: fb.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: TqgamV6gO78tIdTQf38QBcoNwqux7T0leOyqcE3AU7CjAQHB/jnO6VyZqlOg4nrw1023FRNs5gUfggHC1bB1RBhcL5ObaJuVVLzSf2GU5gM01TWUsqbEDkurVrWg3Zdz9UVeEGYIEEEi8nkGatvsoZ0F9O2adtfbel5r+yYx7FSd2BGtUy1loh7uUH9hWxtoGvwZvrQ4h+/gzF+7kqPlesiv49kK3YCl3iF7Az6wxjqrH78OXhWHGq0fDKZkNvoY8QK7EOwVn6cOuYobhuHPFg6gm+QBogMMTxLt1d8VC3tZf31vdkSqAvUzCiqO3jVY52/TWfAVgXDyqJKf1T2mEUBO7+WLI5nMWWOditX9ZqMprmBt8YE+Xf9uw3y8us/9jbH2KC34oCLhFPOc13bDQ6kvBYPGuqVkPbOVUkqN21g= Content-Type: text/plain; charset="us-ascii" Content-ID: Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Exchange-CrossTenant-Network-Message-Id: ab2d55be-4944-4d2d-0260-08d6b7b7c43f X-MS-Exchange-CrossTenant-originalarrivaltime: 02 Apr 2019 22:08:42.3599 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 8ae927fe-1255-47a7-a2af-5f3a069daaa2 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-Transport-CrossTenantHeadersStamped: BYAPR15MB2437 X-OriginatorOrg: fb.com X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2019-04-02_10:,, signatures=0 X-Proofpoint-Spam-Reason: safe X-FB-Internal: Safe Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Oleg, Thank you very much for your time! Really appreciate it. I'll rebase the patchset, address your comments and send v10 soon-ish. Please, find some notes below. On Tue, Apr 02, 2019 at 06:02:41PM +0200, Oleg Nesterov wrote: > Hi Roman, >=20 > let me apologize again for the huge delay. >=20 > I see nothing really wrong in this version, so no objections from me. >=20 > However, 4/9 doesn't apply, so it seems you will need to make v10 anyway > to adapt these changes to the recent changes in kernel/signal.c ;) >=20 > Just a couple of minor nits below... >=20 > On 03/16, Roman Gushchin wrote: > > > > + * If always_leave is not set, and the cgroup is freezing, > > + * we're racing with the cgroup freezing. In this case, we don't > > + * drop the frozen counter to avoid a transient switch to > > + * the unfrozen state. To make sure that the task won't go > > + * to the userspace before reaching the signal handler loop, > > + * let's set TIF_SIGPENDING flag. > > + */ > > +void cgroup_leave_frozen(bool always_leave) > > +{ > > + struct cgroup *cgrp; > > + > > + spin_lock_irq(&css_set_lock); > > + cgrp =3D task_dfl_cgroup(current); > > + if (always_leave || !test_bit(CGRP_FREEZE, &cgrp->flags)) { > > + cgroup_dec_frozen_cnt(cgrp); > > + cgroup_update_frozen(cgrp); > > + WARN_ON_ONCE(!current->frozen); > > + current->frozen =3D false; > > + } else { > > + set_tsk_thread_flag(current, TIF_SIGPENDING); >=20 > The setting of TIF_SIGPENDING looks unnecessary and even not correct; bec= ause > this flag must not be updated without ->siglock held (even if "set" is mo= re > or less safe). >=20 > If JOBCTL_TRAP_FREEZE is already set, then TIF_SIGPENDING must be set too= . >=20 > Otherwise set_tsk_thread_flag(TIF_SIGPENDING) can't help because the task= can > do recalc_sigpending() at any moment. >=20 > In particular, get_signal() does dequeue_signal()->recalc_sigpending() ri= ght > after cgroup_leave_frozen(), so I fail to understand why do we need to se= t > TIF_SIGPENDING. So this "else" part is only for vfork path. What I'm trying to do, is to avoid flipping the cgroup state from frozen to unfrozen and back to frozen. You're right, TIF_SIGPENDING isn't enough, we need proper cgroup_freeze_tas= k() with some additional synchronization. >=20 >=20 > > @@ -912,6 +912,10 @@ static struct task_struct *dup_task_struct(struct = task_struct *orig, int node) > > tsk->fail_nth =3D 0; > > #endif > > > > +#ifdef CONFIG_CGROUPS > > + tsk->frozen =3D 0; > > +#endif >=20 > Hmm, do we really need this? How can a cgroup_task_frozen() task call > copy_process() ? We shouldn't, but let me double check it. >=20 > > +static void do_freezer_trap(void) > > + __releases(¤t->sighand->siglock) > > +{ > > + /* > > + * If a fatal signal is pending, there is no way back for the process= , > > + * so let it escape from the freezer trap and exit. > > + * If the task has been frozen, cgroup_leave_frozen() will be invoked > > + * to update the cgroup state, if necessary. > > + */ > > + if (fatal_signal_pending(current)) { > > + current->jobctl &=3D ~JOBCTL_TRAP_FREEZE; > > + spin_unlock_irq(¤t->sighand->siglock); > > + return; > > + } > > + > > + /* > > + * If there are other trap bits pending except JOBCTL_TRAP_FREEZE, > > + * let's make another loop to give it a chance to be handled. > > + * In any case, we'll return back. > > + */ > > + if (((current->jobctl & (JOBCTL_PENDING_MASK | JOBCTL_TRAP_FREEZE)) != =3D > > + JOBCTL_TRAP_FREEZE) || fatal_signal_pending(current)) { > ^^^^^^^^^^^^^^^^^^^^ >=20 > We have already checked fatal_signal_pending() at the start? Yeah, I'll remove the second call. Thanks! >=20 > And in fact, you can probably remove fatal_signal_pending() altogether... > Note that with recent changes get_signal() does >=20 > if (signal_group_exit(signal)) { > ksig->info.si_signo =3D signr =3D SIGKILL; > sigdelset(¤t->pending.signal, SIGKILL); > recalc_sigpending(); > goto fatal; > } >=20 > before the main loop, so afaics fatal_signal_pending() =3D=3D T in do_fre= ezer_trap() > is simply impossible. This means that you can't clear JOBCTL_TRAP_FREEZE,= but > this is probably fine... if not, you can add jobctl &=3D ~JOBCTL_TRAP_FRE= EZE into > the "if (signal_group_exit(signal))" above. I'll try. >=20 > > @@ -2401,12 +2453,27 @@ bool get_signal(struct ksignal *ksig) > > do_signal_stop(0)) > > goto relock; > > =20 > > - if (unlikely(current->jobctl & JOBCTL_TRAP_MASK)) { > > - do_jobctl_trap(); > > - spin_unlock_irq(&sighand->siglock); > > + if (unlikely(current->jobctl & > > + (JOBCTL_TRAP_MASK | JOBCTL_TRAP_FREEZE))) { > > + if (current->jobctl & JOBCTL_TRAP_MASK) { > > + do_jobctl_trap(); > > + spin_unlock_irq(&sighand->siglock); > > + } else if (current->jobctl & JOBCTL_TRAP_FREEZE) > > + do_freezer_trap(); > > + > > goto relock; > > } > > =20 > > + /* > > + * If the task is leaving the frozen state, let's update > > + * cgroup counters and reset the frozen bit. > > + */ > > + if (unlikely(cgroup_task_frozen(current))) { > > + spin_unlock_irq(&sighand->siglock); > > + cgroup_leave_frozen(true); > > + spin_lock_irq(&sighand->siglock); >=20 > I'd suggest to do "goto relock" rather than spin_lock_irq(&sighand->siglo= ck). > To ensure we can't miss SIGKILL which can come right after we drop sigloc= k, > note again the new signal_group_exit() check above. Sure, will switch to "goto relock". Thank you! Roman