Date: Thu, 6 Sep 2018 07:57:42 +0200
From: Michal Hocko
To: Tetsuo Handa
Cc: David Rientjes, Tejun Heo, Roman Gushchin, Johannes Weiner, Vladimir Davydov, Andrew Morton, Linus Torvalds, linux-mm, LKML
Subject: Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().
Message-ID: <20180906055742.GL14951@dhcp22.suse.cz>
References: <81cc1f29-e42e-7813-dc70-5d6d9e999dd1@i-love.sakura.ne.jp> <20180905140451.GG14951@dhcp22.suse.cz> <201809060100.w86100i6060716@www262.sakura.ne.jp>
In-Reply-To: <201809060100.w86100i6060716@www262.sakura.ne.jp>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 06-09-18 10:00:00, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Wed 05-09-18 22:53:33, Tetsuo Handa wrote:
> > > On 2018/09/05 22:40, Michal Hocko wrote:
> > > > Changelog said
> > > >
> > > > "Although this is possible in principle let's wait for it to actually
> > > > happen in real life before we make the locking more complex again."
> > > >
> > > > So what is the real life workload that hits it? The log you have pasted
> > > > below doesn't tell much.
> > >
> > > Nothing special. I just ran a multi-threaded memory eater on a CONFIG_PREEMPT=y kernel.
> >
> > I strongly suspec that your test doesn't really represent or simulate
> > any real and useful workload. Sure it triggers a rare race and we kill
> > another oom victim. Does this warrant to make the code more complex?
> > Well, I am not convinced, as I've said countless times.
>
> Yes. Below is an example from a machine running Apache Web server/Tomcat AP server/PostgreSQL DB server.
> An memory eater needlessly killed Tomcat due to this race.

What prevents you from modifying your mem eater in a way that keeps
Tomcat or the others from being the primary oom victim choice? In other
words, yeah, it is not optimal to lose the race, but if it is rare enough
then this is something to live with because it can hardly be considered
a new DoS vector AFAICS. Remember that this is always going to be racy
land and we are not going to plug all possible races because this is
simply not viable.
But I am pretty sure we have been through all this many times already.
Oh well...

> I assert that we should fix af5679fbc669f31f.

If you can come up with a reasonable patch which doesn't complicate the
code and is a clear win for both this particular workload and others,
then why not.
-- 
Michal Hocko
SUSE Labs
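
For illustration only, and not part of the original message: one way a
test program could keep other services from being the primary oom victim
choice is to raise its own oom_score_adj before it starts allocating.
The sketch below assumes a Linux system exposing
/proc/self/oom_score_adj (valid range -1000 to 1000, where 1000 makes
the task the most preferred victim). The function name
prefer_as_oom_victim and its path parameter are hypothetical and exist
only so the sketch can be tried against a scratch file. It does not
address the race discussed in this thread; it only biases victim
selection.

```python
OOM_SCORE_ADJ_MIN = -1000  # task is effectively exempt from OOM killing
OOM_SCORE_ADJ_MAX = 1000   # task is the most preferred OOM victim


def prefer_as_oom_victim(adj=OOM_SCORE_ADJ_MAX, path="/proc/self/oom_score_adj"):
    """Write `adj` to the oom_score_adj file at `path` and return it.

    `path` defaults to the calling process's own file; it is a parameter
    (an assumption of this sketch) so the function can be exercised
    without touching /proc.
    """
    if not OOM_SCORE_ADJ_MIN <= adj <= OOM_SCORE_ADJ_MAX:
        raise ValueError("oom_score_adj must be within [-1000, 1000]")
    with open(path, "w") as f:
        f.write(str(adj))
    return adj


# Intended use at the very start of a memory-eater test program:
#     prefer_as_oom_victim()
#     ... then begin allocating memory ...
```

Raising the value for one's own process needs no special privilege,
whereas lowering it (for example to shield a service) requires
CAP_SYS_RESOURCE, which is why the sketch adjusts the test program
rather than the services it runs next to.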