Date: Wed, 18 Feb 2015 19:49:34 +0100
From: Ingo Molnar
To: Fam Zheng
Cc: Jonathan Corbet, linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", x86@kernel.org, Alexander Viro, Andrew Morton, Kees Cook, Andy Lutomirski, David Herrmann, Alexei Starovoitov, Miklos Szeredi, David Drysdale, Oleg Nesterov, "David S. Miller", Vivek Goyal, Mike Frysinger, "Theodore Ts'o", Heiko Carstens, Rasmus Villemoes, Rashika Kheria, Hugh Dickins, Mathieu Desnoyers, Peter Zijlstra, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org, Josh Triplett, "Michael Kerrisk (man-pages)", Paolo Bonzini, Omar Sandoval
Subject: Re: [PATCH RFC v3 0/7] epoll: Introduce new syscalls, epoll_ctl_batch and epoll_pwait1
Message-ID: <20150218184934.GA7493@gmail.com>
References: <1423818243-15410-1-git-send-email-famz@redhat.com> <20150215150011.0340686c@lwn.net> <20150216010224.GA32421@ad.nay.redhat.com>
In-Reply-To: <20150216010224.GA32421@ad.nay.redhat.com>

* Fam Zheng wrote:

> On Sun, 02/15 15:00, Jonathan Corbet wrote:
> > On Fri, 13 Feb 2015 17:03:56 +0800
> > Fam Zheng wrote:
> >
> > > SYNOPSIS
> > >
> > >     #include
> > >
> > >     int epoll_pwait1(int epfd, int flags,
> > >                      struct epoll_event *events,
> > >                      int maxevents,
> > >                      struct epoll_wait_params *params);
> >
> > Quick, possibly dumb question: might it make sense to also pass in
> > sizeof(struct epoll_wait_params)? That way, when somebody wants to add
> > another parameter in the future, the kernel can tell which version is in
> > use and they won't have to do an epoll_pwait2()?
> >
>
> Flags can be used for that, if the change is not
> radically different.

Passing in the size is generally better than flags, because that way an
extension of the ABI (new field[s]) automatically tells the kernel how to
treat old binaries - while new binaries get the extended functionality,
without sacrificing any existing functionality.

With flags you are either limited to the same structure size - or you have
to decode a 'size' value from the flags value - which is fragile (and in
that case a real 'size' parameter is better anyway).

In the perf ABI we use something like that: there's a perf_event_attr.size
field that moves the ABI forward, while still staying binary compatible
with older software.

If old binaries pass in a smaller structure to a newer kernel, then the
kernel zero-fills the new fields by default - that way the kernel
internals are never burdened with compatibility details and data format
versions.

If new user-space passes in a larger structure than the kernel can
handle, then the kernel returns an error - this way user-space can
transparently support conditional features and fallback logic.

It works really well: we've done literally a hundred perf ABI extensions
this way in the last 4+ years, in a pretty natural fashion, without
littering the kernel (or user-space) with version legacies and without
breaking existing perf tooling.

Other syscall ABIs already get painful when trying to handle 2-3 data
structure versions, so people either give up, add flag kludges, or add
new syscall entries - which is painful in its own way and adds
unnecessary latency to feature introduction as well.
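As a hedged illustration of the size-based scheme above (not code from the perf or epoll patches), here is a minimal user-space sketch. Assumptions: `struct wait_params`, `copy_wait_params()`, `WAIT_PARAMS_SIZE_VER0` and `demo_abi_evolution()` are hypothetical names invented for this example (not the proposed epoll_wait_params layout), and a plain memcpy() stands in for copy_from_user(). It refines the "larger structure" case slightly: a larger structure is accepted if every byte the kernel does not know about is zero, and -E2BIG is returned otherwise. This is believed to match what perf_copy_attr() does and what the later copy_struct_from_user() helper generalizes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical parameter struct, as the *current* kernel knows it.
 * Version 0 shipped with the first two fields (16 bytes); 'extra' was
 * appended later. All fields are 64-bit so there is no hidden padding.
 */
struct wait_params {
	uint64_t timeout_ns;	/* since version 0 */
	uint64_t flags;		/* since version 0 */
	uint64_t extra;		/* appended in version 1 */
};

#define WAIT_PARAMS_SIZE_VER0	16	/* size of the first published layout */

/*
 * Copy a caller-supplied struct of 'usize' bytes into the kernel's
 * current layout. Returns 0 on success or a negative errno value.
 */
static int copy_wait_params(struct wait_params *dst, const void *src,
			    size_t usize)
{
	const size_t ksize = sizeof(*dst);

	if (usize < WAIT_PARAMS_SIZE_VER0)
		return -EINVAL;		/* smaller than any layout ever shipped */

	if (usize > ksize) {
		/* Newer user-space: refuse if it relies on fields we don't know. */
		const unsigned char *tail = (const unsigned char *)src + ksize;

		for (size_t i = 0; i < usize - ksize; i++)
			if (tail[i] != 0)
				return -E2BIG;
		usize = ksize;		/* unknown tail is all zero: ignore it */
	}

	/* Older (or same-version) user-space: copy what we got, zero-fill the rest. */
	memset(dst, 0, ksize);
	memcpy(dst, src, usize);
	return 0;
}

/* The three cases from the mail, as executable checks. */
static void demo_abi_evolution(void)
{
	struct wait_params k;
	const uint64_t old_bin[2] = { 1000, 1 };	/* version-0 binary */
	const uint64_t new_bin[4] = { 1000, 1, 7, 0 };	/* future binary, unknown field zero */
	const uint64_t newer[4]   = { 1000, 1, 7, 42 };	/* future binary using an unknown field */
	int rc;

	/* Old binary on a newer kernel: the missing field reads as zero. */
	rc = copy_wait_params(&k, old_bin, sizeof(old_bin));
	assert(rc == 0 && k.timeout_ns == 1000 && k.extra == 0);

	/* Newer binary that does not use anything unknown: accepted. */
	rc = copy_wait_params(&k, new_bin, sizeof(new_bin));
	assert(rc == 0 && k.extra == 7);

	/* Newer binary relying on a field this kernel lacks: error. */
	rc = copy_wait_params(&k, newer, sizeof(newer));
	assert(rc == -E2BIG);
}
```

On -E2BIG, user-space can retry with a smaller size and without the newer fields, which is the conditional-feature and fallback path described above; the kernel side never has to know which "version" it is talking to, only how many bytes it was given.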
Thanks,

	Ingo