Date: Thu, 14 Nov 2019 20:16:57 +0100
From: Michal Hocko
To: Roman Gushchin
Cc: Michal Koutný, "linux-mm@kvack.org", Andrew Morton, Johannes Weiner,
 "linux-kernel@vger.kernel.org", Kernel Team, "stable@vger.kernel.org",
 Tejun Heo
Subject: Re: [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
Message-ID: <20191114191657.GN20866@dhcp22.suse.cz>
References: <20191106225131.3543616-1-guro@fb.com>
 <20191113162934.GF19372@blackbody.suse.cz>
 <20191113170823.GA12464@castle.DHCP.thefacebook.com>
In-Reply-To: <20191113170823.GA12464@castle.DHCP.thefacebook.com>

On Wed 13-11-19 17:08:29, Roman Gushchin wrote:
> On Wed, Nov 13, 2019 at 05:29:34PM +0100, Michal Koutný wrote:
> > Hi.
> >
> > On Wed, Nov 06, 2019 at 02:51:30PM -0800, Roman Gushchin wrote:
> > > Let's fix it by switching from css_tryget_online() to css_tryget().
> > Is this a safe thing to do? The stack captures a kmem charge path, with
> > css_tryget() it may happen it gets an offlined memcg and carry out
> > charge into it. What happens when e.g. memcg_deactivate_kmem_caches is
> > skipped as a consequence?
>
> The thing here is that css_tryget_online() cannot pin the online state,
> so even if returned true, the cgroup can be offline at the return from
> the function. So if we rely somewhere on it, it's already broken.

Then what is the point of this function and what about all other users?

> Generally speaking, it's better to reduce its usage to the bare minimum.

If it doesn't have any sensible semantics then I would argue it should go
altogether, otherwise we are going to chase new users again and again.

> > > The problem is caused by an exiting task which is associated with
> > > an offline memcg. We're iterating over and over in the
> > > do {} while (!css_tryget_online()) loop, but obviously the memcg won't
> > > become online and the exiting task won't be migrated to a live memcg.
> > As discussed in other replies, the task is not yet exiting. However, the
> > access to memcg isn't through `current` but `mm->owner`, i.e. another
> > task of a threadgroup may have got stuck in an offlined memcg (I don't
> > have a good explanation for that though).

The trace, however, points to current->mm or current->active_memcg. Is it
possible that we have a stale active_memcg?
-- 
Michal Hocko
SUSE Labs
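
For readers following the thread, the retry loop under discussion can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: struct model_css, the model_css_*() helpers, model_get_with_retry() and the max_tries bound are all hypothetical names invented for the illustration. It assumes, as a simplification, that css_tryget() succeeds while the reference count is non-zero and that css_tryget_online() additionally requires the group to be online at the moment of the call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; NOT the kernel's struct cgroup_subsys_state. */
struct model_css {
    int  refcnt;  /* > 0 while the object may still be referenced */
    bool online;  /* cleared once the group has been taken offline */
};

/* Models css_tryget(): succeeds as long as the refcount is not zero. */
static bool model_css_tryget(struct model_css *css)
{
    if (css->refcnt <= 0)
        return false;
    css->refcnt++;
    return true;
}

/*
 * Models css_tryget_online(): also checks the online flag, but only at
 * call time. Nothing here keeps the flag set after the function returns.
 */
static bool model_css_tryget_online(struct model_css *css)
{
    if (!css->online)
        return false;
    return model_css_tryget(css);
}

/* Drops one reference taken by a successful tryget. */
static void model_css_put(struct model_css *css)
{
    assert(css->refcnt > 0);
    css->refcnt--;
}

/*
 * Retry loop shaped like the "do {} while (!css_tryget_online())" loop
 * quoted above. max_tries is an assumption added so the demo terminates;
 * the loop in the thread has no such bound.
 * Returns the number of attempts used, or -1 if every attempt failed.
 */
static int model_get_with_retry(struct model_css *css,
                                bool (*tryget)(struct model_css *),
                                int max_tries)
{
    int tries = 0;

    do {
        if (++tries > max_tries)
            return -1;
    } while (!tryget(css));

    return tries;
}
```

In this model, calling model_get_with_retry() with model_css_tryget_online on a group whose online flag is already false exhausts max_tries and returns -1, which corresponds to the endless iteration described in the patch. Passing model_css_tryget instead succeeds on the first attempt. Clearing online after a successful model_css_tryget_online() leaves the caller holding a reference to an offline group, which is the "cannot pin the online state" point.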