Date: Wed, 25 Feb 2015 11:30:09 +0800
From: Fam Zheng
To: Ingo Molnar
Cc: Jonathan Corbet, linux-kernel@vger.kernel.org, Thomas Gleixner,
    Ingo Molnar, "H. Peter Anvin", x86@kernel.org, Alexander Viro,
    Andrew Morton, Kees Cook, Andy Lutomirski, David Herrmann,
    Alexei Starovoitov, Miklos Szeredi, David Drysdale, Oleg Nesterov,
    "David S. Miller", Vivek Goyal, Mike Frysinger, "Theodore Ts'o",
    Heiko Carstens, Rasmus Villemoes, Rashika Kheria, Hugh Dickins,
    Mathieu Desnoyers, Peter Zijlstra, linux-fsdevel@vger.kernel.org,
    linux-api@vger.kernel.org, Josh Triplett,
    "Michael Kerrisk (man-pages)", Paolo Bonzini, Omar Sandoval
Subject: Re: [PATCH RFC v3 0/7] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
Message-ID: <20150225033009.GA20485@ad.nay.redhat.com>
In-Reply-To: <20150218184934.GA7493@gmail.com>

On Wed, 02/18 19:49, Ingo Molnar wrote:
>
> * Fam Zheng wrote:
>
> > On Sun, 02/15 15:00, Jonathan Corbet wrote:
> > > On Fri, 13 Feb 2015 17:03:56 +0800
> > > Fam Zheng wrote:
> > >
> > > > SYNOPSIS
> > > >
> > > >        #include
> > > >
> > > >        int epoll_pwait1(int epfd, int flags,
> > > >                         struct epoll_event *events,
> > > >                         int maxevents,
> > > >                         struct epoll_wait_params *params);
> > >
> > > Quick, possibly dumb question: might it make sense to also pass in
> > > sizeof(struct epoll_wait_params)?  That way, when somebody wants to add
> > > another parameter in the future, the kernel can tell which version is in
> > > use and they won't have to do an epoll_pwait2()?
> > >
> >
> > Flags can be used for that, if the change is not
> > radically different.
>
> Passing in size is generally better than flags, because
> that way an extension of the ABI (new field[s])
> automatically signals towards the kernel what to do with
> old binaries - while extending the functionality of new
> binaries, without sacrificing functionality.
>
> With flags you are either limited to the same structure
> size - or have to decode a 'size' value from the flags
> value - which is fragile (and in which case a real 'size'
> parameter is better).
>
> In the perf ABI we use something like that: there's a
> perf_attr.size parameter that iterates the ABI forward,
> while still being binary compatible with older software.
>
> If old binaries pass in a smaller structure to a newer
> kernel then the kernel pads the new fields with zero by
> default - that way the kernel internals are never burdened
> with compatibility details and data format versions.
>
> If new user-space passes in a larger structure than the
> kernel can handle then the kernel returns an error - this
> way user-space can transparently support conditional
> features and fallback logic.
>
> It works really well, we've done literally a hundred perf
> ABI extensions this way in the last 4+ years, in a pretty
> natural fashion, without littering the kernel (or
> user-space) with version legacies and without breaking
> existing perf tooling.
>
> Other syscall ABIs already get painful when trying to
> handle 2-3 data structure versions, so people either give
> up, or add flags kludges or go to new syscall entries:
> which is painful in its own fashion and adds unnecessary
> latency to feature introduction as well.
>
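
For illustration only, the size-based scheme described above could be
sketched in plain user-space C roughly as follows. This is not code from
the patch series or from perf: the struct layouts, the field names
(timeout_ns, clock_id) and the helper copy_wait_params() are all
hypothetical, and memcpy() stands in for the copy from user memory that a
real syscall would do.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical first version of the parameter block (what old binaries use). */
struct wait_params_v1 {
	int64_t timeout_ns;
};

/* Hypothetical current version: same leading fields, one field appended. */
struct wait_params {
	int64_t  timeout_ns;
	uint64_t clock_id;	/* new field; 0 selects the old behaviour */
};

/*
 * Copy a caller-supplied parameter block of 'usize' bytes into the
 * current layout.  Returns 0 on success or a negative errno value.
 */
static int copy_wait_params(struct wait_params *dst, const void *src,
			    size_t usize)
{
	/* Smaller than the first published version: malformed. */
	if (usize < sizeof(struct wait_params_v1))
		return -EINVAL;

	/* Caller knows a newer layout than we do: report an error so it
	 * can fall back to a smaller structure. */
	if (usize > sizeof(*dst))
		return -E2BIG;

	/* Older caller: fields it does not know about default to zero. */
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, usize);
	return 0;
}
```

With this shape, a binary built against wait_params_v1 passes
sizeof(struct wait_params_v1) and gets clock_id == 0, while a caller that
passes a structure larger than the callee understands gets -E2BIG back.
The rest of the callee only ever sees the current layout.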

Excellent. This now makes a lot of sense to me, thanks to your explanations,
Ingo.

I'll add the "size" field in the next revision.

Thanks,
Fam