Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp857007imu; Fri, 16 Nov 2018 11:20:09 -0800 (PST) X-Google-Smtp-Source: AJdET5fKiBgIGWrmoXCj5dx4U+Ri6LBZm4WlSZ3fAeHG8VUXnPh5eGl5xFSsyx8qZkulHsPKbtR9 X-Received: by 2002:a17:902:4523:: with SMTP id m32-v6mr11420237pld.276.1542396009483; Fri, 16 Nov 2018 11:20:09 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1542396009; cv=none; d=google.com; s=arc-20160816; b=f014WAZZQqsg+SbDm1PYOhVll54E0IXXZrOEsD9E2eGMii0J/PSl8edbFoEuVcWhK8 4gpEM5ujw2RSc90mVQPmsFbPbs8+nwnAsoQygCKb0vKk/ub+9pvHwyH5jRxeMqClnkja RAMIEnlYunkIkKhwM44kFL5LKytGT+kIGvfsIbzYe/7o+9nfauSWH8yCRBHDMRr+Te+V rHMeUV0ZVvFLxXPY1nGJz7HNRbFieSjSQHWEHrnciXvm1IdBqEu+l5XRvLYGdgSwBzfz sgOsTQY6lppQPX9pOizTyXhURKGXUNIAvmwYzhcK02WycUm2oRwoSoA+qEryW/BsowVJ pm3w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=4D5ho2grypApMeWN1cVkErHE+xzNY1v2wUsV5EvZCoM=; b=c38SQGsynVLxt4wjnhyjl4iVC9jYXSdAVjURlB3tj/Z34WYBM3Peg6NpxBospC0DQX HE2AAnctD8IBwyoBrAty0GzoEsHSfHm0/lBFhuWxMiKXBFcWTtwTMcix5grGxOADcAEe BsDMRKdtsis3cqGKtQ8zDpjU3n0KwbKsfThXBJoRftcFKkhZi9LqDiFvuRTjVYz/I42V +GoZw2YotFOeK6j+Ih4QYhGWt+kbW7vf8WYNn3hsw2cuwfkDSTxGy1YBzGMLcNEU+KVy dJy6vyXjWahc2O9/6IwIMdD5072G6sq3l8kQKpR7bV23TUOlLeSXuIjs9VbceWoSHTfe N+yA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=1cpRBReK; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f16si30510955pgb.140.2018.11.16.11.19.52; Fri, 16 Nov 2018 11:20:09 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=1cpRBReK; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726135AbeKQFcx (ORCPT + 99 others); Sat, 17 Nov 2018 00:32:53 -0500 Received: from mail.kernel.org ([198.145.29.99]:33180 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725729AbeKQFcw (ORCPT ); Sat, 17 Nov 2018 00:32:52 -0500 Received: from localhost (unknown [64.114.255.114]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 1D4622086C; Fri, 16 Nov 2018 19:19:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1542395953; bh=mS6hubojQZQVo5ec4podIWhkMcmddk3rdV4jE4nAa2k=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=1cpRBReKaDIgOT1csTA2dn1aX9cIZF8R3wvG9nG2cXguAv97x3vtnpnAdWdvZM5/l 92SOJkIukVPCmmO+vsY7gEO3LVND7XpRHHbaW5U7CS8Ybmwh+mukOsI6c65I1cslK2 bQIrZEzq/DDujvK+GNJ1BjI/UKVXmhslTncUs6So= Date: Fri, 16 Nov 2018 14:19:10 -0500 From: Sasha Levin To: Michal Hocko Cc: Andrew Morton , stable@vger.kernel.org, linux-kernel@vger.kernel.org, Jann Horn , Davidlohr Bueso , Oleg Nesterov , Christoph Lameter , Kemi Wang , Andy Lutomirski , Ingo Molnar , Linus Torvalds , linux-mm@kvack.org Subject: Re: [PATCH AUTOSEL 3.18 8/9] mm/vmstat.c: assert that vmstat_text is in sync with stat_items_size Message-ID: <20181116191910.GJ1706@sasha-vm> References: <20181113055252.79406-1-sashal@kernel.org> <20181113055252.79406-8-sashal@kernel.org> <20181115140810.e3292c83467544f6a1d82686@linux-foundation.org> <20181115223718.GB1706@sasha-vm> <20181115144719.d26dc7a2d47fade8d41a83d5@linux-foundation.org> <20181115230118.GC1706@sasha-vm> <20181116085525.GC14706@dhcp22.suse.cz> <20181116181904.GH1706@sasha-vm> <20181116184457.GA11906@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: <20181116184457.GA11906@dhcp22.suse.cz> User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Nov 16, 2018 at 07:44:57PM +0100, Michal Hocko wrote: >On Fri 16-11-18 13:19:04, Sasha Levin wrote: >> On Fri, Nov 16, 2018 at 09:55:25AM +0100, Michal Hocko wrote: >[...] >> > > Race condition with memory hotplug due to missing locks: >> > > >> > > https://marc.info/?l=linux-mm&m=154211934011188&w=2 >> > >> > Memory hotplug locking is dubious at best and this patch doesn't really >> > fix it. It fixes a theoretical problem. I am not aware anybody would be >> > hitting in practice. We need to rework the locking quite extensively. >> >> The word "theoretical" used in the stable rules file does not mean >> that we need to have actual reports of users hitting bugs before we >> start backporting the relevant patch, it simply suggests that there >> needs to be a reasonable explanation of how this issue can be hit. >> >> For this memory hotplug patch in particular, I use the hv_balloon driver >> at this very moment (running a linux guest on windows, with "dynamic >> memory" enabled). Should I wait for it to crash before I can fix it? >> >> Is the upstream code perfect? No, but that doesn't mean that it's not >> working at all, and if there are users they expect to see fixes going in >> and not just sitting idly waiting for a big rewrite that will come in a >> few years. >> >> Memory hotplug fixes are not something you think should go to stable? >> Andrew sent a few of them to stable, so that can't be the case. > >I am not arguing about hotplug fixes in general. I was arguing that this >particular one is a theoretical one and hotplug locking is quite subtle. >E.g. 381eab4a6ee mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock >http://lkml.kernel.org/r/20181114070909.GB2653@MiWiFi-R3L-srv >So in general unless the issue is really triggered easily I am rather >conservative. We have millions of machines running linux, everything is triggered "easily" at that scale. >> > > Raising an OOM event that causes issues in userspace when no OOM has >> > > actually occured: >> > > >> > > https://marc.info/?l=linux-mm&m=154211939811582&w=2 >> > >> > The patch makes sense I just do not think this is a stable material. The >> > semantic of the event was and still is suboptimal. >> >> I really fail to understand your reasoning about -stable here. This >> patch is something people actually hit in the field, spent time on >> triaging and analysing it, and submitting a fix which looks reasonably >> straightforward. >> >> That fix was acked by quite a few folks (including yourself) and merged >> in. And as far as we can tell, it actually fixed the problem. >> >> Why is it not stable material? > >Because the semantic of the OOM event is quite tricky itself. We have >discussed this patch and concluded that the updated one is more >sensible. But it is not yet clear whether this is actually what other >users expect as well. That to me does sound quite risky for a stable >kernel. So there's another patch following this one that fixes it? Sure - can I take both? Users expect to not have their containers die randomly, if you're saying that you're still working on a fix for that then that is a different story than saying "we fixed it, but it should not go to stable". And let's also draw a line there, users will not wait for the OOM event logic to be perfect before they can expect their workloads to run without issues. >> My understanding is that you're concerned with the patch itself being >> "suboptimal", but in that case - why did you ack it? >> >> > > I think that all 3 cases represent a "real" bug users can hit, and I >> > > honestly don't know why they were not tagged for stable. >> > >> > It would be much better to ask in the respective email thread rather >> > than spamming mailing with AUTOSEL patches which rarely get any >> > attention. >> >> I actually tried it, but the comments I got is that it gets in the way >> and people preferred something they can filter. > >which means that AUTOSEL just goes to /dev/null... Or just not get mixed with the process? for some people it's easier to see AUTOSEL mails with the way it works now rather than if they suddenly show up as a continuation of a weeks old thread. >> > We have been through this discussion several times already and I thought >> > we have agreed that those subsystems which are seriously considering stable >> > are opted out from the AUTOSEL automagic. Has anything changed in that >> > regards. >> >> I checked in with Andrew to get his input on this, he suggested that >> these patches should be sent to linux-mm and he'll give it a close look. > >If Andrew is happy to get AUTOSEL patches then I will not object of >course but let's not merge these patches without and expclicit OK. This is fair. I think that the process has caused some unnecessary friction: we all want the same result but just disagree on the means :) I won't merge any mm/ AUTOSEL patches until this gets clearer. -- Thanks, Sasha