Date: Mon, 30 Jul 2018 12:14:23 -0700
From: Tejun Heo
To: Michal Hocko
Cc: Tetsuo Handa, Roman Gushchin, Johannes Weiner, Vladimir Davydov, David Rientjes, Andrew Morton, Linus Torvalds, linux-mm, LKML
Subject: Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at should_reclaim_retry().
Message-ID: <20180730191423.GN1206094@devbig004.ftw2.facebook.com>
In-Reply-To: <20180730185110.GB24267@dhcp22.suse.cz>

Hello, Michal.

On Mon, Jul 30, 2018 at 08:51:10PM +0200, Michal Hocko wrote:
> > Yeah, workqueue can choke on things like that and kthread indefinitely
> > busy looping doesn't do anybody any good.
>
> Yeah, I do agree. But this is much easier said than done ;) Sure
> we have that hack that does sleep rather than cond_resched in the
> page allocator. We can and will "fix" it to be unconditional in the
> should_reclaim_retry [1] but this whole thing is really subtle. It just
> takes one misbehaving worker and something which is really important to
> run will get stuck.

Oh yeah, I'm not saying the current behavior is ideal or anything, but
since the behavior was put in many years ago, it has become a problem
only a couple of times, and all cases were rather easy and obvious fixes
on the wq user side. It shouldn't be difficult to add a timer mechanism
on top. We might be able to simply extend the hang detection mechanism
to kick off all pending rescuers after detecting a wq stall. I'm wary
about making it a part of normal operation (i.e. silent timeout).
Per-cpu kworkers really shouldn't busy loop for an extended period of
time.

Thanks.

-- 
tejun
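
For readers following the thread, here is a rough userspace toy model of why a busy-looping per-cpu worker can starve other work items, and why sleeping instead of calling cond_resched helps. It is only a sketch under simplifying assumptions, not kernel code: `struct pool`, `maybe_run_pending`, `worker_yield` and `simulate` are hypothetical names invented for illustration, and the single assumption modeled is that the pool starts another work item only when no worker is counted as running.

```c
/* Toy model only: none of these names exist in the kernel. */

enum yield_kind {
    YIELD_COND_RESCHED, /* worker yields the CPU but stays "running" */
    YIELD_SLEEP         /* worker actually sleeps for a short while  */
};

struct pool {
    int nr_running; /* workers the pool currently counts as running */
    int pending;    /* queued work items waiting for a worker       */
    int completed;  /* queued work items that got to run            */
};

/* Assumed rule: pending work starts only when no worker is running. */
static void maybe_run_pending(struct pool *p)
{
    while (p->nr_running == 0 && p->pending > 0) {
        p->pending--;
        p->completed++;
    }
}

/* One iteration of a worker stuck in a retry loop. */
static void worker_yield(struct pool *p, enum yield_kind kind)
{
    if (kind == YIELD_SLEEP) {
        p->nr_running--;      /* a sleeping worker drops out of the count */
        maybe_run_pending(p); /* so the pool can make progress            */
        p->nr_running++;      /* worker wakes up and retries              */
    } else {
        /* The worker never leaves the running count, so nothing starts. */
        maybe_run_pending(p);
    }
}

/* Returns how many of 3 pending items completed while one worker loops. */
static int simulate(enum yield_kind kind, int iterations)
{
    struct pool p = { .nr_running = 1, .pending = 3, .completed = 0 };

    for (int i = 0; i < iterations; i++)
        worker_yield(&p, kind);
    return p.completed;
}
```

In this model, `simulate(YIELD_COND_RESCHED, 100)` leaves all three queued items stuck, while `simulate(YIELD_SLEEP, 100)` lets them run on the first iteration. The timer or stall-detection idea discussed above would be a separate fallback for workers that never sleep at all; it is not modeled here.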