Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753910AbdIYIqa (ORCPT ); Mon, 25 Sep 2017 04:46:30 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:43444 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751521AbdIYIqX (ORCPT ); Mon, 25 Sep 2017 04:46:23 -0400 From: Claudio Imbrenda To: linux-kernel@vger.kernel.org Cc: borntraeger@de.ibm.com, kvm@vger.kernel.org, linux-mm@kvack.org, nefelim4ag@gmail.com, akpm@linux-foundation.org, aarcange@redhat.com, mingo@kernel.org, zhongjiang@huawei.com, kirill.shutemov@linux.intel.com, arvind.yadav.cs@gmail.com, solee@os.korea.ac.kr, ak@linux.intel.com Subject: [RFC v1 0/2] Per-arch page checksumming and comparison Date: Mon, 25 Sep 2017 10:46:12 +0200 X-Mailer: git-send-email 2.7.4 X-TM-AS-GCONF: 00 x-cbid: 17092508-0040-0000-0000-000003FCC439 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17092508-0041-0000-0000-0000209E09C7 Message-Id: <1506329174-19265-1-git-send-email-imbrenda@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-09-25_03:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=1 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1707230000 definitions=main-1709250137 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3866 Lines: 89 Since we now have two different proposals on how to speed up KSM, I thought I'd share what I had done too, so we can now have three :) I have analysed the performance of KSM, and I have found out that both the checksum and the memcmp take up a significant amount of time. Depending on the content of the pages, either function can be the "bottleneck". I did some synthetic benchmarks, using different checksum functions and with different page content scenarios. Only in the best case (e.g. pages differing at the very beginning) was the checksum consuming more CPU time than the memcmps. Using a simpler function (like CRC32 or even just a simple sum) significantly reduced the CPU load. In other scenarios, like when the pages differ in the middle or at the end, the biggest offender is the memcmp. Still, using simpler checksums lowers the overall CPU load. The idea I had in this patchseries was to provide arch-overridable functions to checksum and compare whole pages. Depending on the arch, the best memcmp/checksum to use in the specialized case of comparing/checksumming one whole page might not necessarily be the one that is the best in the general case. So what I did here was to factor out the old code and make it generic, and then provide an s390-specific implementation for the checksum using the CKSM instruction, which is also used to calculate the checksum of IP headers, the idea being that other architectures can then follow and use their preferred checksum. I like Sioh Lee's proposal of using the crypto API to choose a fast but good checksum, since this can be made arch-dependant too, and CRC32 is also almost as fast as the simple checksum. Also, I had underestimated how many more collisions the simple checksum could potentially cause (although I did not see any performance regressions in my tests). While there is a crypto API to choose between different hash functions, there is nothing like that for page comparison. I think at this point we need to coordinate a little, to avoid reinventing the wheel three times and in different ways. Claudio Imbrenda (2): VS1544 KSM generic memory comparison functions VS1544 KSM s390-specific memory comparison functions arch/alpha/include/asm/Kbuild | 1 + arch/arc/include/asm/Kbuild | 1 + arch/arm/include/asm/Kbuild | 1 + arch/arm64/include/asm/Kbuild | 1 + arch/blackfin/include/asm/Kbuild | 1 + arch/c6x/include/asm/Kbuild | 1 + arch/cris/include/asm/Kbuild | 1 + arch/frv/include/asm/Kbuild | 1 + arch/h8300/include/asm/Kbuild | 1 + arch/hexagon/include/asm/Kbuild | 1 + arch/ia64/include/asm/Kbuild | 1 + arch/m32r/include/asm/Kbuild | 1 + arch/m68k/include/asm/Kbuild | 1 + arch/metag/include/asm/Kbuild | 1 + arch/microblaze/include/asm/Kbuild | 1 + arch/mips/include/asm/Kbuild | 1 + arch/mn10300/include/asm/Kbuild | 1 + arch/nios2/include/asm/Kbuild | 1 + arch/openrisc/include/asm/Kbuild | 1 + arch/parisc/include/asm/Kbuild | 1 + arch/powerpc/include/asm/Kbuild | 1 + arch/s390/include/asm/page_memops.h | 18 ++++++++++++++++++ arch/score/include/asm/Kbuild | 1 + arch/sh/include/asm/Kbuild | 1 + arch/sparc/include/asm/Kbuild | 1 + arch/tile/include/asm/Kbuild | 1 + arch/um/include/asm/Kbuild | 1 + arch/unicore32/include/asm/Kbuild | 1 + arch/x86/include/asm/Kbuild | 1 + arch/xtensa/include/asm/Kbuild | 1 + include/asm-generic/page_memops.h | 31 +++++++++++++++++++++++++++++++ mm/ksm.c | 27 +++------------------------ 32 files changed, 81 insertions(+), 24 deletions(-) create mode 100644 arch/s390/include/asm/page_memops.h create mode 100644 include/asm-generic/page_memops.h -- 2.7.4