Date: Thu, 2 Jul 2015 15:44:38 +0000
From: Joseph Myers
Subject: [PATCH 0/8] math-emu: Update kernel math-emu code from current glibc soft-fp

The include/math-emu code (used for alpha powerpc sh sparc, and to a
very limited extent for s390) was taken from an old version of glibc's
soft-fp code around 15 years ago (in the pre-git era, anyway, and some
of the initial code may have been developed around 1997-9 with a view
to being used in both places). Since then, there have only been a
handful of small changes in the kernel version, while the glibc version
has been extensively developed, with many bug fixes, performance
improvements and miscellaneous cleanups, and is also now used in
libgcc, including for __float128 on x86_64 since GCC 4.3 (see for more
information regarding performance improvements and use in libgcc).

Thus the kernel version is missing those various improvements and it
would make sense to update it to include them (as was noted back in
2006 when a large group of changes went into glibc).

I believe it also makes sense to aim to have *exactly* the same code in
both places to simplify future updates of the kernel version.
(And in particular, as external code imported largely verbatim into the
kernel, include/math-emu has never followed the kernel coding style and
it doesn't make sense for it to do so.)

I made an analysis of what kernel-local changes there were to this code
in , and since then have added the various missing features to the
glibc version so that it is feature-complete regarding features used in
the kernel and so that exactly the same code is usable in both places.
This patch series updates the include/math-emu code, and its kernel
users, so that the shared code is identical to glibc's current soft-fp
code.

Regarding what testing seems appropriate for this patch series, see my
notes in . I've done that testing for powerpc (both e500 and emulation
of classic hard float). For reports of testing on other architectures,
see (alpha), (s390), (sh), (sparc); the fixes indicated in those
reports as needed on particular architectures have been integrated into
this version.

The bulk of the changes are updating the code from glibc, and a
detailed review of that part probably does not make sense in this
context if you want to aim for the same code in both places. The
trickier part is the architecture updates for the various API changes
in soft-fp since the version used by the kernel.

The following changes have occurred in the soft-fp API since the
version used in the kernel and so are addressed in the architecture
updates in this patch series. (This list only includes changes
relating to features used in the kernel, not pure new features that
aren't relevant to updating existing code, and not pure bug fixes.)

*

  - Semi-raw unpacking is added, as something intermediate between raw
    and cooked unpacking, for efficiency.

  - Addition and subtraction are changed to work on semi-raw values.
    Thus, cooked results of multiplication can't be passed directly
    into addition, as was done in some kernel emulations of fused
    multiply-add, but that isn't a proper fused operation anyway (a
    proper fused operation involves using the unrounded multiplication
    result in twice the input precision, not an intermediate value in
    input precision plus three working bits); the appropriate fix is to
    use the new fused multiply-add support in soft-fp.

  - Conversions from one floating-point type to another now use
    FP_EXTEND (raw) and FP_TRUNC (semi-raw) instead of FP_CONV
    (cooked). Those operations now deal with quieting signaling NaNs.

  - Conversions from floating-point to integer now use raw inputs, and
    require the integer variable passed to the FP_TO_INT macros to have
    unsigned type.

  - Conversions from integer to floating-point now use raw outputs.

*

  - Conversions from integer to floating-point now pass the name of an
    unsigned type to the FP_FROM_INT macros, not a signed type to which
    "unsigned" is added in the macro definition.

*

  - soft-fp supports the reversed quiet NaN convention used on MIPS and
    HPPA; sfp-machine.h must define _FP_QNANNEGATEDP (to 0, for
    architectures using the normal convention; to 1, for architectures
    using the MIPS convention).

*

  - Negation now works on raw values.

*

  - soft-fp now supports after-rounding tininess detection for
    architectures where that is the defined way in which tiny results
    are detected (of the architectures for which the Linux kernel uses
    this code, that's Alpha and SH). sfp-machine.h must define
    _FP_TININESS_AFTER_ROUNDING to either 0 or 1.

*

  - FP_CLEAR_EXCEPTIONS is removed; all uses in the Linux kernel are no
    longer needed as, now unpacking only occurs in the correct format,
    exceptions are already clear at that point.

*

  - The FP_CMP macros have an extra argument to specify when exceptions
    should be set (0 for no exception setting, 1 for exceptions only
    for signaling NaNs, 2 for exceptions for all NaNs).
    In the old version in the kernel, it was necessary for the caller
    to handle all exception setting for comparisons.

*

  - FP_DENORM_ZERO does not set "inexact" when flushing to zero, as
    that does not appear to match the documented semantics for either
    of the architectures (Alpha and SH) for which the kernel uses
    FP_DENORM_ZERO. FP_DENORM_ZERO is also checked for comparisons
    (the documentation for both Alpha and SH is explicit that their
    corresponding control bits do apply to comparisons).

*

  - The more precise FP_EX_INVALID_* exceptions include more cases than
    in the kernel version (in particular, FP_EX_INVALID_IMZ_FMA is
    split out from FP_EX_INVALID_IMZ, so if only the latter is defined
    then fma using the new fma support would not raise that exception
    any more - except that this doesn't actually affect powerpc because
    it hardcodes setting various exceptions in powerpc-specific code
    despite also defining FP_EX_INVALID_*).

Generally this patch series only does cleanups and bug fixes to
architecture-specific code when they are closely connected to API
changes in the new code (either required by such API changes, or the
new API means the idiomatic way to do something has changed). Where
something was already odd with the old version of the code, or
apparently did not match documented instruction set semantics, it's not
changed if that seems unconnected to the update from glibc. I've noted
various such cases (especially for powerpc) that may be addressed in
followup patch series once the main upgrade is in (or, where the fix
seems more complicated and difficult to fix without convenient access
to the architecture for testing, I may just list the issues on the
relevant architecture mailing list).
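
As a rough illustration of the exception-setting argument to the FP_CMP
macros described in the list above, here is a minimal standalone sketch
in C. It is not soft-fp code: the sk_* functions, SK_EX_INVALID and
SK_UNORDERED are hypothetical stand-ins, it only handles IEEE binary32
bit patterns, and it assumes the normal quiet NaN convention (the case
where _FP_QNANNEGATEDP would be defined to 0).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration only; not soft-fp identifiers. */
#define SK_EX_INVALID 0x1 /* stand-in for an "invalid" exception flag */
#define SK_UNORDERED  2   /* result when either operand is a NaN */

/* NaN: exponent field all ones and a nonzero fraction. */
static int sk_is_nan(uint32_t u)
{
    return (u & 0x7f800000u) == 0x7f800000u && (u & 0x007fffffu) != 0;
}

/* Signaling NaN under the normal convention: top fraction bit clear. */
static int sk_is_snan(uint32_t u)
{
    return sk_is_nan(u) && (u & 0x00400000u) == 0;
}

/* Reinterpret raw binary32 bits as a float without changing them. */
static float sk_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/*
 * Compare two raw binary32 values: returns -1, 0, 1 or SK_UNORDERED.
 * ex plays the role of the extra comparison argument:
 *   0: never set an exception
 *   1: set invalid only if an operand is a signaling NaN
 *   2: set invalid if an operand is any NaN
 */
static int sk_cmp(uint32_t ua, uint32_t ub, int ex, int *flags)
{
    if (sk_is_nan(ua) || sk_is_nan(ub)) {
        if (ex == 2 || (ex == 1 && (sk_is_snan(ua) || sk_is_snan(ub))))
            *flags |= SK_EX_INVALID;
        return SK_UNORDERED;
    }
    /* Neither operand is a NaN, so an ordinary comparison is safe. */
    float a = sk_from_bits(ua), b = sk_from_bits(ub);
    return (a > b) - (a < b);
}

/* Small usage example: 0x3f800000 is 1.0f. */
static void sk_cmp_example(void)
{
    int flags = 0;

    /* Quiet NaN with ex == 1: unordered, no exception set. */
    assert(sk_cmp(0x7fc00000u, 0x3f800000u, 1, &flags) == SK_UNORDERED);
    assert(flags == 0);

    /* Signaling NaN with ex == 1: unordered, invalid is set. */
    assert(sk_cmp(0x7fa00000u, 0x3f800000u, 1, &flags) == SK_UNORDERED);
    assert(flags == SK_EX_INVALID);
}
```

With ex set to 1, a quiet NaN operand gives an unordered result without
touching the flags, while a signaling NaN sets the invalid flag; with ex
set to 2 any NaN sets it, and with 0 the caller is left to handle
exception setting itself, as in the old kernel version.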

The following architecture-specific cleanups or bug fixes (that might
change how the emulation behaves, or that go beyond mechanical
conversion to new APIs) are included in this patch series because of
their close connection to the API changes:

* Alpha and SH now use after-rounding tininess detection.

* On Alpha, extensions from single to double now use FP_EXTEND with raw
  unpacking instead of the previous hardcoded code with cooked
  unpacking; these should be equivalent and the new code, with the
  optimizations in FP_EXTEND relative to the old FP_CONV, should be as
  efficient as the previous hardcoded code.

* On PowerPC and SH, fused multiply-add operations now use the new
  soft-fp fma support (meaning they are properly fused rather than only
  having 3 extra bits precision on the intermediate result of the
  multiplication).

* On PowerPC for SPE floating-point emulation, the pre-existing bug of
  comparisons using cooked unpacking is fixed (as the structure of the
  code meant unpacking types naturally needed specifying explicitly for
  all operations). This should not in fact change how the emulation
  behaves, other than making it more efficient. Various operations
  that should not have unpacked at all now no longer unpack instead of
  using cooked unpacking, so avoiding spurious exceptions on signaling
  NaNs (on the other case of arguments that are actually a different
  floating-point type but would wrongly be interpreted as signaling
  NaNs by the unpacking, FP_CLEAR_EXCEPTIONS may have avoided the
  issue).

* On SPARC, comparisons now use raw unpacking (this should not in fact
  change how the emulation behaves, just make it more efficient).

Signed-off-by: Joseph Myers
---

Compared to the previous version , this patch series is split into
eight patches so that each architecture's changes can be reviewed
individually and the only patch that changes all affected architectures
together is the first mechanical one moving math-emu to math-emu-old.
Patch 2 depends on patch 1, patches 3-7 depend on patches 1 and 2 but
are independent of each other, patch 8 depends on all the other
patches. Applying all eight patches gives an identical tree to
applying the previous monolithic patch.

--
Joseph S. Myers
joseph@codesourcery.com