Received: by 2002:a25:c205:0:0:0:0:0 with SMTP id s5csp6835810ybf; Fri, 6 Mar 2020 05:31:29 -0800 (PST) X-Google-Smtp-Source: ADFU+vtPJuWGBT9AnbiPFEZNLEFV/q4bkKl49YM/O7PVb5k9JJtzFlin3fmu1fGT2N2VVymXmiMa X-Received: by 2002:a05:6808:8d0:: with SMTP id k16mr2582502oij.68.1583501488957; Fri, 06 Mar 2020 05:31:28 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1583501488; cv=none; d=google.com; s=arc-20160816; b=JlutG+KPnrYis3BhHVaKDT5pDZ/BfFleKpTJHhe5Us9FgQae4rI71/cDTLR1yTaSJB /HzKLE2+KgiuNO0P6OzSRCWyCsktdsdxg0sz7DCI/9QEVCe7jYQ4Sudw3FvaImsNKlle xOeQOs2jUnZzpTIS9vikqXutIpcASxfld646ESFHLRPejtv/ed4VABdJN9kh72MrUKFn hnxaUlDel9ngmugJLqKLHM5sa0OhldF7EQh5xyF2HGs2jObVEYUmadjjkC5O9f+Rsm1a p5YnSk3eyvHu9V4WAhNzSBT2WUHFCIYO4v4GegVjQEpEVNbozAOv0yC3N5Hv55JKis35 PeOA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:references:cc:to :subject; bh=SWB/YleoOu9UTjA6KJcvFV8SuPE7yN6QR7Wv5n0FzH8=; b=Ql4lR24fqi+ou2J1hDxpIdHDK5bUVMv3p6E3+w1s4K5WC/HTVgtsdkO8ZqtrYSN0l4 ME5tn9GlXoO1LaD7dAbT4LblKkvobO1ddlakqNEXlGtdYQ9LexycZiYPBcuFvI7ITalC 4FOR5N8Rqmi7Pp+s+7Rzv4I5w1MukI1ePjIw9/iDAR7scr/qCR9/s/xlraorvcXppzlZ 63usWcz5aBpdt6ODCTZlIiPDd4SM1F14dLuPmVeaK6fDoFvRzejWcasw/1YTtH0PSanj pwEeEbTzZSc8MpEcR2snxOJYtIIufWyhokjNHPH1Rau7W4gXsKdWJkXNnzsOyFn241LL tQ9g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id e31si1293826ote.47.2020.03.06.05.31.14; Fri, 06 Mar 2020 05:31:28 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=alibaba.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727206AbgCFNao (ORCPT + 99 others); Fri, 6 Mar 2020 08:30:44 -0500 Received: from out30-54.freemail.mail.aliyun.com ([115.124.30.54]:59917 "EHLO out30-54.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727023AbgCFNan (ORCPT ); Fri, 6 Mar 2020 08:30:43 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R161e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01f04452;MF=alex.shi@linux.alibaba.com;NM=1;PH=DS;RN=17;SR=0;TI=SMTPD_---0TrpcGV2_1583501424; Received: from IT-FVFX43SYHV2H.local(mailfrom:alex.shi@linux.alibaba.com fp:SMTPD_---0TrpcGV2_1583501424) by smtp.aliyun-inc.com(127.0.0.1); Fri, 06 Mar 2020 21:30:25 +0800 Subject: Re: [failures] mm-vmscan-remove-unnecessary-lruvec-adding.patch removed from -mm tree To: Hugh Dickins , Qian Cai Cc: Matthew Wilcox , LKML , Andrew Morton , aarcange@redhat.com, daniel.m.jordan@oracle.com, hannes@cmpxchg.org, khlebnikov@yandex-team.ru, kirill@shutemov.name, kravetz@us.ibm.com, mhocko@kernel.org, mm-commits@vger.kernel.org, tj@kernel.org, vdavydov.dev@gmail.com, yang.shi@linux.alibaba.com, linux-mm@kvack.org References: <20200306025041.rERhvnYmB%akpm@linux-foundation.org> <211632B1-2D6F-4BFA-A5A0-3030339D3D2A@lca.pw> <20200306033850.GO29971@bombadil.infradead.org> <97EE83E1-FEC9-48B6-98E8-07FB3FECB961@lca.pw> From: Alex Shi Message-ID: Date: Fri, 6 Mar 2020 21:30:24 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:68.0) Gecko/20100101 Thunderbird/68.4.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 在 2020/3/6 下午12:17, Hugh Dickins 写道: >>> >>> Subject: Re: [PATCH v9 00/21] per lruvec lru_lock for memcg >> >> I don’t see it on lore.kernel or anywhere. Private email? > > You're right, sorry I didn't notice, lots of ccs but > neither lkml nor linux-mm were on that thread from the start: My fault, I thought people would often give comments on each patch, will care this from now on. > > And now the bad news. > > Andrew, please revert those six (or seven as they ended up in mmotm). > 5.6-rc4-mm1 without them runs my tmpfs+loop+swapping+memcg+ksm kernel > build loads fine (did four hours just now), but 5.6-rc4-mm1 itself > crashed just after starting - seconds or minutes I didn't see, > but it did not complete an iteration. > > I thought maybe those six would be harmless (though I've not looked > at them at all); but knew already that the full series is not good yet: > I gave it a try over 5.6-rc4 on Monday, and crashed very soon on simpler > testing, in different ways from what hits mmotm. > > The first thing wrong with the full set was when I tried tmpfs+loop+ > swapping kernel builds in "mem=700M cgroup_disabled=memory", of course > with CONFIG_DEBUG_LIST=y. That soon collapsed in a splurge of OOM kills > and list_del corruption messages: __list_del_entry_valid < list_del < > __page_cache_release < __put_page < put_page < __try_to_reclaim_swap < > free_swap_and_cache < shmem_free_swap < shmem_undo_range. I have been run kernel build with a "mem=700M cgroup_disabled=memory" qemu-kvm with a swapfile for 3 hours, Hope I could catch sth while waiting for your kindly reproduce scripts. Thanks Hugh! > > When I next tried with "mem=1G" and memcg enabled (but not being used), > that managed some iterations, no OOM kills, no list_del warnings (was > it swapping? perhaps, perhaps not, I was trying to go easy on it just > to see if "cgroup_disabled=memory" had been the problem); but when > rebooting after that, again list_del corruption messages and crash > (I didn't note them down). > > So I didn't take much notice of what the mmotm crash backtrace showed > (but IIRC shmem and swap were in it). Is there some place to get mmotm's crash backtrace? > > Alex, I'm afraid you're focusing too much on performance results, > without doing the basic testing needed - I thought we had given you > some hints on the challenging areas (swapping, move_charge_at_immigrate, > page migration) when we attached a *correctly working* 5.3 version back > on 23rd August: > > https://lore.kernel.org/linux-mm/alpine.LSU.2.11.1908231736001.16920@eggly.anvils/ > > (Correctly working, except missing two patches I'd mistakenly dropped > as unnecessary in earlier rebases: but our discussions with Johannes > later showed to be very necessary, though their races rarely seen.) > Did you mean the Johannes's question of race on page->memcg in previous email? "> I don't see what prevents the lruvec from changing under compaction, > neither in your patches nor in Hugh's. Maybe I'm missing something?" https://lkml.org/lkml/2019/11/22/2153 From then on, I have tired 2 solutions to protect page->memcg, first use lock_page_memcg(wrong) and 2nd new solution, taking PageLRU bit as page isoltion precondition which may work for memcg migration, and page migration in compaction etc. Could you like to give some comments on this? > I have not had the time (and do not expect to have the time) to review > your series: maybe it's one or two small fixes away from being complete, > or maybe it's still fundamentally flawed, I do not know. I had naively > hoped that you would help with a patchset that worked, rather than > cutting it down into something which does not.> Sorry, Hugh, I didn't know you have per memcg lru_lock patchset before I sent out my first verion. > Submitting your series to routine testing is much easier for me than > reviewing it: but then, yes, it's a pity that I don't find the time > to report the results on intervening versions, which also crashed. > > What I have to do now, is set aside time today and tomorrow, to package > up the old scripts I use, describe them and their environment, and send > them to you (cc akpm in case I fall under a bus): so that you can > reproduce the crashes for yourself, and get to work on them. > Thanks advance for your coming testing scripts, I believe it will help a lot. BTW, I try my best to orgnize this patches to make it stright, a senior experts like you, won't cost much time to go through whole patches. and give some precious comment! I am looking forward to hear comments from you. :) Thanks Alex