Date: Mon, 30 Jul 2018 21:10:05 +0200
From: Michal Hocko
To: Tejun Heo
Cc: Tetsuo Handa, Roman Gushchin, Johannes Weiner, Vladimir Davydov,
    David Rientjes, Andrew Morton, Linus Torvalds, linux-mm, LKML
Subject: Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().
Message-ID: <20180730191005.GC24267@dhcp22.suse.cz>
In-Reply-To: <20180730185110.GB24267@dhcp22.suse.cz>
References: <20180726113958.GE28386@dhcp22.suse.cz>
 <55c9da7f-e448-964a-5b50-47f89a24235b@i-love.sakura.ne.jp>
 <20180730093257.GG24267@dhcp22.suse.cz>
 <9158a23e-7793-7735-e35c-acd540ca59bf@i-love.sakura.ne.jp>
 <20180730144647.GX24267@dhcp22.suse.cz>
 <20180730145425.GE1206094@devbig004.ftw2.facebook.com>
 <0018ac3b-94ee-5f09-e4e0-df53d2cbc925@i-love.sakura.ne.jp>
 <20180730154424.GG1206094@devbig004.ftw2.facebook.com>
 <20180730185110.GB24267@dhcp22.suse.cz>

This change has been posted several times with some concerns about the
changelog. Originally I thought it was more of a "nice to have" thing
rather than a bug fix; later Tetsuo took it over, but the changelog was
not really comprehensible, so I have reworded it. Let's see if this is
better.

From 9bbea6516bb99615aff5ba5699865aa2d48333cc Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Thu, 26 Jul 2018 14:40:03 +0900
Subject: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at
 should_reclaim_retry().

Tetsuo Handa has reported that it is possible to bypass the short sleep
for PF_WQ_WORKER threads which was introduced by commit 373ccbe5927034b5
("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't
make any progress") and lock up the system under OOM. The primary reason
is that WQ_MEM_RECLAIM workqueues are not guaranteed to run even when
they have a rescuer available. Those workers might be essential for
reclaim to make forward progress, however. If we are unlucky enough, all
allocation requests can get stuck waiting for a WQ_MEM_RECLAIM work item
and the system is essentially stuck in an OOM condition without much
hope to move on.
Tetsuo has seen the reclaim stuck on drain_local_pages_wq or
xlog_cil_push_work (xfs). There might be others.

Since should_reclaim_retry() should be a natural reschedule point,
let's do the short sleep for PF_WQ_WORKER threads unconditionally in
order to guarantee that other pending work items are started. This
works around the problem and is less fragile than hunting down each
place where the sleep is missed. E.g. we used to have a sleeping point
in the oom path, but it has been removed recently because it caused
other issues. Having a single sleeping point is more robust.

Reported-and-debugged-by: Tetsuo Handa
Signed-off-by: Michal Hocko
Cc: Roman Gushchin
Cc: Johannes Weiner
Cc: Vladimir Davydov
Cc: David Rientjes
Cc: Tejun Heo
---
 mm/page_alloc.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100f1e63..f56cc0958d09 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3922,6 +3922,7 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 {
 	struct zone *zone;
 	struct zoneref *z;
+	bool ret = false;
 
 	/*
 	 * Costly allocations might have made a progress but this doesn't mean
@@ -3985,25 +3986,26 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 			}
 		}
 
-		/*
-		 * Memory allocation/reclaim might be called from a WQ
-		 * context and the current implementation of the WQ
-		 * concurrency control doesn't recognize that
-		 * a particular WQ is congested if the worker thread is
-		 * looping without ever sleeping. Therefore we have to
-		 * do a short sleep here rather than calling
-		 * cond_resched().
-		 */
-		if (current->flags & PF_WQ_WORKER)
-			schedule_timeout_uninterruptible(1);
-		else
-			cond_resched();
-
-		return true;
+		ret = true;
+		goto out;
 	}
 }
 
-	return false;
+out:
+	/*
+	 * Memory allocation/reclaim might be called from a WQ
+	 * context and the current implementation of the WQ
+	 * concurrency control doesn't recognize that
+	 * a particular WQ is congested if the worker thread is
+	 * looping without ever sleeping. Therefore we have to
+	 * do a short sleep here rather than calling
+	 * cond_resched().
+	 */
+	if (current->flags & PF_WQ_WORKER)
+		schedule_timeout_uninterruptible(1);
+	else
+		cond_resched();
+	return ret;
 }
 
 static inline bool
-- 
2.18.0

-- 
Michal Hocko
SUSE Labs