Subject: Re: [PATCH v2] fs: fsnotify: account fsnotify metadata to kmemcg
To: Jan Kara
Cc: amir73il@gmail.com, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, mhocko@suse.cz
References:
 <1509128538-50162-1-git-send-email-yang.s@alibaba-inc.com>
 <20171030124358.GF23278@quack2.suse.cz>
 <76a4d544-833a-5f42-a898-115640b6783b@alibaba-inc.com>
 <20171031101238.GD8989@quack2.suse.cz>
From: "Yang Shi"
Date: Wed, 01 Nov 2017 00:44:18 +0800
In-Reply-To: <20171031101238.GD8989@quack2.suse.cz>

On 10/31/17 3:12 AM, Jan Kara wrote:
> On Tue 31-10-17 00:39:58, Yang Shi wrote:
>> On 10/30/17 5:43 AM, Jan Kara wrote:
>>> On Sat 28-10-17 02:22:18, Yang Shi wrote:
>>>> If some process generates events into a huge or unlimit event queue, but no
>>>> listener read them, they may consume significant amount of memory silently
>>>> until oom happens or some memory pressure issue is raised.
>>>> It'd better to account those slab caches in memcg so that we can get heads
>>>> up before the problematic process consume too much memory silently.
>>>>
>>>> But, the accounting might be heuristic if the producer is in the different
>>>> memcg from listener if the listener doesn't read the events. Due to the
>>>> current design of kmemcg, who does the allocation, who gets the accounting.
>>>>
>>>> Signed-off-by: Yang Shi
>>>> ---
>>>> v1 --> v2:
>>>> * Updated commit log per Amir's suggestion
>>>
>>> I'm sorry but I don't think this solution is acceptable. I understand that
>>> in some cases (and you likely run one of these) the result may *happen* to
>>> be the desired one but in other cases, you might be charging wrong memcg
>>> and so misbehaving process in memcg A can effectively cause a DoS attack on
>>> a process in memcg B.
>>
>> Yes, as what I discussed with Amir in earlier review, current memcg design
>> just accounts memory to the allocation process, but has no idea who is
>> consumer process.
>>
>> Although it is not desirable to DoS a memcg, it still sounds better than DoS
>> the whole machine due to potential oom. This patch is aimed to avoid such
>> case.
>
> Thinking about this even more, your solution may have even worse impact -
> due to allocations failing, some applications may avoid generation of fs
> notification events for actions they do. And that maybe a security issue in
> case there are other applications using fanotify for security enforcement,
> virus scanning, or whatever... In such cases it is better to take the
> whole machine down than to let it run.

My guess (and it is only a guess) is that Amir's patch could solve this,
right? An overflow or error event would be queued, so the consumer
applications could handle the error more gracefully or exit more softly.

Actually, the event is already dropped on -ENOMEM regardless of my patch. As
Amir said, if my understanding is right, this patch may just amplify that
problem.

Thanks,
Yang

>
>>> If you have a setup in which notification events can consume considerable
>>> amount of resources, you are doing something wrong I think. Standard event
>>> queue length is limited, overall events are bounded to consume less than 1
>>> MB. If you have unbounded queue, the process has to be CAP_SYS_ADMIN and
>>> presumably it has good reasons for requesting unbounded queue and it should
>>> know what it is doing.
>>
>> Yes, I agree it does mean something is going wrong. So, it'd better to be
>> accounted in order to get some heads up early before something is going
>> really bad. The limit will not be set too high since fsnotify metadata will
>> not consume too much memory in *normal* case.
>>
>> I agree we should trust admin user, but kernel should be responsible for the
>> last defense when something is really going wrong. And, we can't guarantee
>> admin process will not do something wrong, the code might be not reviewed
>> thoroughly, the test might not cover some extreme cases.
>>
>>>
>>> So maybe we could come up with some better way to control amount of
>>> resources consumed by notification events but for that we lack more
>>> information about your use case. And I maintain that the solution should
>>> account events to the consumer, not the producer...
>>
>> I do agree it is not fair and not neat to account to producer rather than
>> misbehaving consumer, but current memcg design looks not support such use
>> case. And, the other question is do we know who is the listener if it
>> doesn't read the events?
>
> So you never know who will read from the notification file descriptor but
> you can simply account that to the process that created the notification
> group and that is IMO the right process to account to.
>
> I agree that current SLAB memcg accounting does not allow to account to a
> different memcg than the one of the running process. However I *think* it
> should be possible to add such interface. Michal?
>
> Honza
>