Received: by 10.213.65.68 with SMTP id h4csp384441imn; Tue, 3 Apr 2018 23:33:14 -0700 (PDT) X-Google-Smtp-Source: AIpwx4+fvUu+33NTAhBk5BwsfQC+8guSL6+QYWTD+dpRSVurFaTGHy4yidKF/idpPFwxJnuGpmtk X-Received: by 10.99.0.213 with SMTP id 204mr4192957pga.256.1522823594765; Tue, 03 Apr 2018 23:33:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1522823594; cv=none; d=google.com; s=arc-20160816; b=V+MmZVUo44scMndyF5XMU8SPjFesL+RHP1jcZ6akIMSMRIJJMJ+UxUiEMMTcK9go4+ z8musvzbtEFryZW9BFtqOFOKS6eEi6+YJO/Upm4w33bjVRKIkhHL3E9V0MVXknlhdS73 mHDfZnRBoclYpW1/yhE5yhiT2SzcegwWWrPt1TqLCjHJ2rKsdWBsPtgjmUEJTFCCelO3 Tjw6DUojjiQcOxpU4lDiUcFTi0dmt+JPNyb/A9WyulyTa0r3OjlCF1oscLnhFIBR6NGO xvju1P5DL8/8RD0R4p/7X1/z4SBKGSdBqdqCntF8Ul6G0idus62Nn0bJVcIpxGM3PSMy 8GMQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature:dkim-filter :arc-authentication-results; bh=cY/Yo2m8tExQf6Z1jIu+DTrnNfF8n/wVs0zuKX55SUQ=; b=YCaUP19m7xtcjxCE7thjQrIkVtr5t+zRG6gPJaKx0oB/M7wBvf5fADNtgNCeftZlgd YfqbdQyFsEUZiztVEf7IYiIF7xKaFi9k4yV+w+JydVQC/l2MOu5JT9UPewkpaAcjSmHV OUyqTHCRcSvs2xQlKX3kE665XMFaR0kWBnJ6uibOYMS5K80cUsBBGO9EgIBtCfzGaUU1 bVM8s6oS+/scgYYpYuosclbOt5keVpQ5Dpmo8GUg+8gXIHvlKQqOgUT3/P9KLBZ/+PuG 0849i9mzZCaeRp0SNDJlgzY1c4v2K2S57awbdZ5DkGEeUV2PJWnD/V6ADZHE6nA45WR5 Qiog== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@meituan.com header.s=20130113 header.b=JMBR9U7t; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=meituan.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id u10si3168349pgp.45.2018.04.03.23.33.00; Tue, 03 Apr 2018 23:33:14 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@meituan.com header.s=20130113 header.b=JMBR9U7t; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=meituan.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750854AbeDDGby (ORCPT + 99 others); Wed, 4 Apr 2018 02:31:54 -0400 Received: from mx-fe5-210.meituan.com ([103.37.138.210]:48640 "EHLO mx02.meituan.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750714AbeDDGbw (ORCPT ); Wed, 4 Apr 2018 02:31:52 -0400 Received: from localhost (localhost [127.0.0.1]) by dx-it-mx02.dx.sankuai.com (Postfix) with ESMTP id 98E372976621; Wed, 4 Apr 2018 14:31:48 +0800 (CST) Received: from mx02.meituan.com ([127.0.0.1]) by localhost (dx-it-mx02.dx.sankuai.com [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id RyI6waNj4xlm; Wed, 4 Apr 2018 14:31:48 +0800 (CST) Received: from localhost (localhost [127.0.0.1]) by dx-it-mx02.dx.sankuai.com (Postfix) with ESMTP id 6548B297675C; Wed, 4 Apr 2018 14:31:48 +0800 (CST) DKIM-Filter: OpenDKIM Filter v2.9.2 dx-it-mx02.dx.sankuai.com 6548B297675C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meituan.com; s=20130113; t=1522823508; bh=cY/Yo2m8tExQf6Z1jIu+DTrnNfF8n/wVs0zuKX55SUQ=; h=Subject:To:From:Message-ID:Date:MIME-Version:Content-Type: Content-Transfer-Encoding; b=JMBR9U7tgTXh2W50gxRgjELcic7qG7M4JZKlmGzTiz/SjvUS4f4JFWeg0DUmYuT3y EcONV45iIGe5Lm8XVHcyM55ZMYKrHvGqj1K/iPHIdJYIzsXgrfgFY9gLSPneUXVviU 9Ymoeg1shbjQGUSqUVZv8tG+M9L4JtVMJJaXABdA= X-Virus-Scanned: amavisd-new at dx-it-mx02.dx.sankuai.com Received: from mx02.meituan.com ([127.0.0.1]) by localhost (dx-it-mx02.dx.sankuai.com [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id pzmOlZ5ERJ7G; Wed, 4 Apr 2018 14:31:48 +0800 (CST) Received: from wanglongs-MacBook-Pro.local (unknown [103.37.140.18]) by dx-it-mx02.dx.sankuai.com (Postfix) with ESMTPSA id A78352976621; Wed, 4 Apr 2018 14:31:36 +0800 (CST) Subject: Re: [RFC] Is it correctly that the usage for spin_{lock|unlock}_irq in clear_page_dirty_for_io To: Greg Thelen , Michal Hocko Cc: Tejun Heo , Johannes Weiner , akpm@osdl.org, LKML , npiggin@gmail.com References: <157ed606-4a61-508b-d26a-2f5d638f39bb@meituan.com> <20180403120312.GS5501@dhcp22.suse.cz> From: Wang Long Message-ID: <2cb713cd-0b9b-594c-31db-b4582f8ba822@meituan.com> Date: Wed, 4 Apr 2018 14:31:36 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4/4/2018 7:12 AM, Greg Thelen wrote: > On Tue, Apr 3, 2018 at 5:03 AM Michal Hocko wrote: > >> On Mon 02-04-18 19:50:50, Wang Long wrote: >>> Hi, Johannes Weiner and Tejun Heo >>> >>> I use linux-4.4.y to test the new cgroup controller io and the current >>> stable kernel linux-4.4.y has the follow logic >>> >>> >>> int clear_page_dirty_for_io(struct page *page){ >>> ... >>> ... >>> memcg = mem_cgroup_begin_page_stat(page); ----------(a) >>> wb = unlocked_inode_to_wb_begin(inode, &locked); > ---------(b) >>> if (TestClearPageDirty(page)) { >>> mem_cgroup_dec_page_stat(memcg, > MEM_CGROUP_STAT_DIRTY); >>> dec_zone_page_state(page, NR_FILE_DIRTY); >>> dec_wb_stat(wb, WB_RECLAIMABLE); >>> ret =1; >>> } >>> unlocked_inode_to_wb_end(inode, locked); -----------(c) >>> mem_cgroup_end_page_stat(memcg); -------------(d) >>> return ret; >>> ... >>> ... >>> } >>> >>> >>> when memcg is moving, and I_WB_SWITCH flags for inode is set. the logic >>> is the following: >>> >>> >>> spin_lock_irqsave(&memcg->move_lock, flags); -------------(a) >>> spin_lock_irq(&inode->i_mapping->tree_lock); ------------(b) >>> spin_unlock_irq(&inode->i_mapping->tree_lock); -----------(c) >>> spin_unlock_irqrestore(&memcg->move_lock, flags); -----------(d) >>> >>> >>> after (c) , the local irq is enabled. I think it is not correct. >>> >>> We get a deadlock backtrace after (c), the cpu get an softirq and in the >>> irq it also call mem_cgroup_begin_page_stat to lock the same >>> memcg->move_lock. >>> >>> Since the conditions are too harsh, this scenario is difficult to >>> reproduce. But it really exists. >>> >>> So how about change (b) (c) to spin_lock_irqsave/spin_lock_irqrestore? >> Yes, it seems we really need this even for the current tree. Please note >> that At least clear_page_dirty_for_io doesn't lock memcg anymore. >> __cancel_dirty_page still uses lock_page_memcg though (former >> mem_cgroup_begin_page_stat). >> -- >> Michal Hocko >> SUSE Labs > I agree the issue looks real in 4.4 stable and upstream. It seems like > unlocked_inode_to_wb_begin/_end should use spin_lock_irqsave/restore. > > I'm testing a little patch now. Thanks. When fix it on upstream. The longterm kernel 4.9 and 4.14 also need to fix.