From: ebiederm@xmission.com (Eric W. Biederman)
To: Michal Hocko
Cc: "Michael Kerrisk \(man-pages\)", Linux API, Andrew Morton, Oleg Nesterov, LKML
Subject: Re: [RFC PATCH] fork: report pid reservation failure properly
Date: Tue, 03 Feb 2015 14:44:31 -0600
Message-ID: <87bnlalo7k.fsf@x220.int.ebiederm.org>
In-Reply-To: <20150203155248.GD8907@dhcp22.suse.cz> (Michal Hocko's message of "Tue, 3 Feb 2015 16:52:48 +0100")
References: <20150203150557.GB8907@dhcp22.suse.cz> <20150203155248.GD8907@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

Michal Hocko writes:

> On Tue 03-02-15 16:33:03, Michael Kerrisk wrote:
>> Hi Michal,
>>
>>
>> On 3 February 2015 at 16:05, Michal Hocko wrote:
>> > Hi,
>> > while debugging an unexpected ENOMEM from fork (there was no memory
>> > pressure and OVERCOMMIT_ALWAYS) I have found out that fork returns
>> > ENOMEM even when not short on memory.
>> >
>> > In this particular case it was due to depleted pid space which is
>> > documented to return EAGAIN in man pages.
>> >
>> > Here is a quick fix up.
>>
>> Could you summarize briefly what the user-space visible change is
>> here?
>
> The user visible change is that the userspace will get EAGAIN when
> calling fork and the pid space is depleted because of a system wide
> limit as per man page description rather than ENOMEM which we return
> currently.

I don't think that EAGAIN is any better than ENOMEM, nor do I know that
it is safe to return EBUSY from fork. What nonsense will applications
do when they see an unexpected error code?

>> It is not so obvious from your message. I believe you're turning
>> some cases of ENOMEM into EAGAIN, right?
>
> Yes, except for the case mentioned below which discusses a potential
> error code for pid namespace triggered failures.
>
>> Note, by the way, that if I understand what you intend, this change
>> would bring the implementation closer to POSIX, which specifies:
>
> True.
>
> HTH.
>
>> EAGAIN The system lacked the necessary resources to create
>>        another process, or the system-imposed limit on the total
>>        number of processes under execution system-wide or by a
>>        single user {CHILD_MAX} would be exceeded.
>>

Note: all of the errors documented to return EAGAIN are the kind
where, if you wait a while, you can reasonably expect fork to succeed
later.

With respect to dealing with errors from fork, fork is a major pain.
Fork has only two return codes documented, and fork is one of the most
complicated system calls in the kernel, with the most failure modes of
any system call I am familiar with. Mapping a plethora of failure
modes onto two error codes is always going to be problematic from some
viewpoint.

EAGAIN is a bad idea in general because it means "try again", and if
you have hit a fixed limit, trying again is wrong. Frankly I think
POSIX is probably borked to recommend EAGAIN instead of ENOMEM.

Everyone in the world uses fork, which makes it quite tricky to figure
out which assumptions about the return values of fork exist in the
wild, so it is not clear whether it is safe to add new, more
descriptive return codes.
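
To make the retry concern above concrete, here is a minimal sketch of
how a user-space caller might treat EAGAIN as transient for a bounded
number of attempts while giving up immediately on anything else, such
as ENOMEM. The names fork_errno_is_transient and fork_with_retry, and
the retry count and delay parameters, are hypothetical and chosen only
for illustration; nothing like this was proposed in the thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical policy: only EAGAIN is considered worth retrying. */
bool fork_errno_is_transient(int err)
{
	return err == EAGAIN;
}

/*
 * Hypothetical wrapper around fork(): try at most max_tries times,
 * sleeping delay_ms between attempts, and retry only on EAGAIN.
 * Any other errno (for example ENOMEM) is reported immediately.
 * Returns whatever the last fork() returned; errno is preserved.
 */
pid_t fork_with_retry(int max_tries, long delay_ms)
{
	struct timespec delay = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};
	pid_t pid = -1;

	if (max_tries <= 0) {
		errno = EINVAL;
		return -1;
	}

	for (int i = 0; i < max_tries; i++) {
		pid = fork();
		if (pid >= 0 || !fork_errno_is_transient(errno))
			return pid;
		if (i + 1 < max_tries) {
			int saved = errno;	/* nanosleep may clobber errno */
			nanosleep(&delay, NULL);
			errno = saved;
		}
	}
	return pid;
}
```

A caller written this way keeps retrying in vain if EAGAIN is returned
for a fixed limit, which is exactly the hazard described above; the
bound on max_tries only limits the damage.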

With respect to the case where PIDNS_HASH_ADDING would cause fork to
fail, that only happens after init has exited in a pid namespace, so it
is very much a permanent failure: there are no longer any processes in
that pid namespace, nor will there ever be any more. EINVAL might
actually make sense. Of course a sensible error code from fork does
not seem to be allowed.

Of the two return codes that are allowed for fork, EAGAIN and ENOMEM,
ENOMEM seems to be better as it indicates a more permanent failure.

I agree it is a little confusing, but I don't see anything that is
other than a little confusing. Other than someone doing:

        unshare(CLONE_NEWPID);
        pid = fork();
        waitpid(pid);
        fork(); /* returns ENOMEM */

Was there any other real world issue that started this goal to fix
fork?

I think there is a reasonable argument for digging into the fork return
code situation. Perhaps it is just a matter of returning exotic return
codes for the weird and strange cases like trying to create a pid in a
dead pid namespace.

But what we have works, and I don't know of anything bad that happens
except that people developing new code get confused.

Further we can't count on people to read their man pages because this
behavior of returning ENOMEM is documented in pid_namespaces(7). Which
makes me really think changing the code to match the manpage is more
likely to break code than to fix code.

Eric