From: ebiederm@xmission.com (Eric W. Biederman)
To: Nikolay Borisov
Cc: Jan Kara, John McCutchan, Eric Paris, Alexander Viro, "Serge E. Hallyn", Andrey Vagin, LKML, Linux Containers
References: <1475837161-4626-1-git-send-email-kernel@kyup.com> <8737k86n7q.fsf@x220.int.ebiederm.org> <57FB38C3.9090803@kyup.com> <20161010164046.GG24081@quack2.suse.cz> <87eg3o3p6l.fsf@x220.int.ebiederm.org>
Date: Mon, 10 Oct 2016 17:39:07 -0500
In-Reply-To: (Nikolay Borisov's message of "Tue, 11 Oct 2016 00:54:04 +0300")
Message-ID: <87r37n25is.fsf@x220.int.ebiederm.org>
Subject: Re: [PATCH] inotify: Convert to using per-namespace limits

Nikolay Borisov writes:

> On Mon, Oct 10, 2016 at 11:49 PM, Eric W. Biederman
> wrote:
>> Jan Kara writes:
>>
>>> On Mon 10-10-16 09:44:19, Nikolay Borisov wrote:
>>>> On 10/07/2016 09:14 PM, Eric W. Biederman wrote:
>>>> > Nikolay Borisov writes:
>>>> >
>>>> >> This patchset converts inotify to using the newly introduced
>>>> >> per-userns sysctl infrastructure.
>>>> >>
>>>> >> Currently the inotify instances/watches are being accounted in the
>>>> >> user_struct structure. This means that in setups where multiple
>>>> >> users in unprivileged containers map to the same underlying
>>>> >> real user (i.e. pointing to the same user_struct) the inotify limits
>>>> >> are going to be shared as well, allowing one user (or application) to
>>>> >> exhaust all others' limits.
>>>> >>
>>>> >> Fix this by switching the inotify sysctls to using the
>>>> >> per-namespace/per-user limits. This will allow the server admin to
>>>> >> set sensible global limits, which can further be tuned inside every
>>>> >> individual user namespace.
>>>> >>
>>>> >> Signed-off-by: Nikolay Borisov
>>>> >> ---
>>>> >> Hello Eric,
>>>> >>
>>>> >> I saw you've finally sent your pull request for 4.9 and it
>>>> >> includes your implementation of the ucount infrastructure. So
>>>> >> here is my respin of the inotify patches using that.
>>>> >
>>>> > Thanks. I will take a good hard look at this after -rc1 when things are
>>>> > stable enough that I can start a new development branch.
>>>> >
>>>> > I am a little concerned that the old sysctls have gone away. If no one
>>>> > cares it is fine, but if someone depends on them existing that may count
>>>> > as an unnecessary userspace regression. But otherwise skimming through
>>>> > this code it looks good.
>>>>
>>>> So indeed this is a real issue and I meant to write something about
>>>> it. Anyway, in order to preserve those sysctls, what can be done is to
>>>> hook them up with a custom sysctl handler taking the ns from the proc
>>>> mount and the euid of current? I think this is a good approach, but
>>>> let's wait and see if anyone will have objections to completely
>>>> eliminating those sysctls.
>>>
>>> Well, I believe just discarding those sysctls is not an option - I'm pretty
>>> sure there are scripts out there which tune these sysctls and those would
>>> stop working. IMO not an acceptable regression.
>>
>> Nikolay, there is your objection.
>>
>> So since it should be straightforward, let's preserve the existing
>> sysctls. Then this change doesn't need to prove there are no scripts
>> that tweak those sysctls.
>>
>> We are just talking about changing the values in the initial user
>> namespace, so it should be completely compatible and straightforward
>> to implement unless I am missing something.
>
> Well I'm not so sure about this. Let's say those sysctls are going to
> modify the ucount values in the init_user_ns. That's fine; however, for
> which particular user should they do this? Should it be hardcoded for
> kuid 0, or current_euid? I personally think they should be changing
> the values for the current_euid.

Unless I have missed something, the limits are per user namespace.
The counts are per user in that namespace.

Certainly that is what the rest of the ucount infrastructure is doing.

At that point, having the existing sysctls simply update the limit in
the initial user namespace should result in no change in behavior.

Eric
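To make the "limits per namespace, counts per user" split concrete, here is a toy model. It is a hypothetical Python sketch, not kernel code: the class `UserNamespace`, its `charge_instance` method and the `legacy_sysctl_write` helper are invented for illustration, and the model ignores locking and any interaction between parent and child namespaces. It assumes only what the message states: one limit stored in the user namespace and one count kept per user inside it, so a write to the legacy sysctl replaces the initial namespace's limit without having to choose between kuid 0 and current_euid.

```python
from collections import defaultdict


class UserNamespace:
    """Hypothetical toy model (not kernel code): the limit belongs to the
    namespace, while counts are kept per uid inside that namespace."""

    def __init__(self, max_inotify_instances: int) -> None:
        self.max_inotify_instances = max_inotify_instances  # one limit per namespace
        self.instances = defaultdict(int)                   # uid -> instances in use

    def charge_instance(self, uid: int) -> bool:
        """Account one more inotify instance for uid; refuse once the
        namespace-wide limit is reached for that uid."""
        if self.instances[uid] >= self.max_inotify_instances:
            return False
        self.instances[uid] += 1
        return True


def legacy_sysctl_write(init_ns: UserNamespace, value: int) -> None:
    """Hypothetical handler for a write to the old
    fs.inotify.max_user_instances sysctl: it only replaces the initial
    namespace's limit, so the writer's euid never matters."""
    init_ns.max_inotify_instances = value


if __name__ == "__main__":
    init_user_ns = UserNamespace(max_inotify_instances=1)
    print(init_user_ns.charge_instance(0))     # True
    print(init_user_ns.charge_instance(0))     # False: uid 0 is at the limit
    print(init_user_ns.charge_instance(1000))  # True: uid 1000 has its own count
    legacy_sysctl_write(init_user_ns, 2)
    print(init_user_ns.charge_instance(0))     # True: the new limit applies to every uid
```

Because the count is keyed by uid while the limit is keyed by namespace, a single write to the legacy knob raises the ceiling for every user in the initial namespace at once, which is the behavior the old per-user sysctls already had.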