Subject: Re: pids.current with invalid value for hours [5.0.0 rc3 git]
From: Arkadiusz Miśkiewicz
To: Roman Gushchin
Cc: Tejun Heo, cgroups@vger.kernel.org, Aleksa Sarai, Jay Kamat, Michal Hocko, Johannes Weiner, linux-kernel@vger.kernel.org
Date: Sat, 26 Jan 2019 03:28:10 +0100
Message-ID: <38f5f498-0a0f-aa9e-c2b0-88c4d3f2e2fb@gmail.com>
In-Reply-To: <20190126014055.GA25864@castle.DHCP.thefacebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 26/01/2019 02:41, Roman Gushchin wrote:
> On Fri, Jan 25, 2019 at 08:47:57PM +0100, Arkadiusz Miśkiewicz wrote:
>> On 25/01/2019 17:37, Tejun Heo wrote:
>>> On Fri, Jan 25, 2019 at 08:52:11AM +0100, Arkadiusz Miśkiewicz wrote:
>>>> On 24/01/2019 12:21, Arkadiusz Miśkiewicz wrote:
>>>>> On 17/01/2019 14:17, Arkadiusz Miśkiewicz wrote:
>>>>>> On 17/01/2019 13:25, Aleksa Sarai wrote:
>>>>>>> On 2019-01-17, Arkadiusz Miśkiewicz wrote:
>>>>>>>> Using kernel 4.19.13.
>>>>>>>>
>>>>>>>> For one cgroup I noticed weird behaviour:
>>>>>>>>
>>>>>>>> # cat pids.current
>>>>>>>> 60
>>>>>>>> # cat cgroup.procs
>>>>>>>> #
>>>>>>>
>>>>>>> Are there any zombies in the cgroup? pids.current is linked up directly
>>>>>>> to __put_task_struct (so exit(2) won't decrease it, only the task_struct
>>>>>>> actually being freed will decrease it).
>>>>>>>
>>>>>>
>>>>>> There are no zombie processes.
>>>>>>
>>>>>> In mean time the problem shows on multiple servers and so far saw it
>>>>>> only in cgroups that were OOMed.
>>>>>>
>>>>>> What has changed on these servers (yesterday) is turning on
>>>>>> memory.oom.group=1 for all cgroups and changing memory.high from 1G to
>>>>>> "max" (leaving memory.max=2G limit only).
>>>>>>
>>>>>> Previously there was no such problem.
>>>>>>
>>>>>
>>>>> I'm attaching reproducer. This time tried on different distribution
>>>>> kernel (arch linux).
>>>>>
>>>>> After 60s pids.current still shows 37 processes even if there are no
>>>>> processes running (according to ps aux).
>>>>
>>>>
>>>> The same test on 5.0.0-rc3-00104-gc04e2a780caf and it's easy to
>>>> reproduce bug. No processes in cgroup but pids.current reports 91.
>>>
>>> Can you please see whether the problem can be reproduced on the
>>> current linux-next?
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>>
>> I can reproduce on next (5.0.0-rc3-next-20190125), too:
>
> How reliably you can reproduce it? I've tried to run your reproducer
> several times with different parameters, but wasn't lucky so far.

Hmm, I'm able to reproduce it every time using that python3 script.

> What's yours cpu number and total ram size?

On different machines:

1) old PC, 1 x Intel E6600, 4GB of RAM (Arch Linux 4.20.3 distro kernel),
   using the python script
2) VirtualBox VM, 1 CPU with 2 cores, 8GB of RAM (PLD kernel and custom-built
   kernels like 5.0rc3 and 5.0rc3 next), using the python script
3) server with 2 x Intel E5405, 32GB of RAM (4.19.13), with a real-life
   scenario
4) server with 1 x Intel E5-2630 v2, 64GB of RAM (4.19.15), with a real-life
   scenario

>
> Can you, please, provide the corresponding dmesg output?

From my VirtualBox VM, where after 60s pids.current reports 7 despite there
being no processes:
http://ixion.pld-linux.org/~arekm/cgroup-oom-1.txt

Kernel config:
http://ixion.pld-linux.org/~arekm/cgroup-oom-kernelconf-1.txt

That VM is running 5.0.0-rc3-next-20190125.

> I've checked the code again, and my wild guess is that these missing
> tasks are waiting (maybe hopelessly) for the OOM reaper. Dmesg output
> might be very useful here.
>
> Thanks!
>

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
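
For anyone wanting to check a cgroup for the symptom described in this thread (pids.current non-zero while no tasks are listed), here is a minimal sketch in Python. It is not the reproducer attached earlier in the thread. It assumes a cgroup v2 hierarchy where each cgroup directory exposes pids.current and cgroup.threads, and it counts tasks over the whole subtree on the assumption that pids.current is hierarchical. The function names are made up for illustration.

```python
import os
from pathlib import Path


def count_listed_tasks(cgroup_dir):
    """Count thread IDs listed in cgroup.threads for a cgroup and its descendants."""
    total = 0
    for root, _dirs, files in os.walk(cgroup_dir):
        if "cgroup.threads" in files:
            # One thread ID per line; an empty file yields an empty list.
            total += len(Path(root, "cgroup.threads").read_text().split())
    return total


def pids_discrepancy(cgroup_dir):
    """Return pids.current minus the number of tasks actually listed.

    A value that stays above zero long after all tasks have exited matches
    the behaviour reported in this thread.
    """
    current = int(Path(cgroup_dir, "pids.current").read_text().strip())
    return current - count_listed_tasks(cgroup_dir)
```

Calling pids_discrepancy() on the affected cgroup directory a few times, some seconds apart, should help separate a transient difference (tasks still being torn down) from a counter that stays stuck.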