From: Waiman Long
Subject: Re: [mm/gup] 57efa1fe59: will-it-scale.per_thread_ops -9.2% regression
To: Linus Torvalds, Feng Tang
Cc: Jason Gunthorpe, kernel test robot, John Hubbard, Jan Kara, Peter Xu,
    Andrea Arcangeli, "Aneesh Kumar K.V", Christoph Hellwig, Hugh Dickins,
    Jann Horn, Kirill Shutemov, Kirill Tkhai, Leon Romanovsky, Michal Hocko,
    Oleg Nesterov, Andrew Morton, LKML, lkp@lists.01.org, kernel test robot,
    "Huang, Ying", zhengjun.xing@intel.com
References: <20210525031636.GB7744@xsang-OptiPlex-9020>
    <20210604070411.GA8221@shbuild999.sh.intel.com>
    <20210604075220.GA40621@shbuild999.sh.intel.com>
    <20210606101623.GA48020@shbuild999.sh.intel.com>
Date: Sun, 6 Jun 2021 18:13:59 -0400

On 6/6/21 3:20 PM, Linus Torvalds wrote:
> [ Adding Waiman Long to the participants, because this seems to be a
> very specific cacheline alignment behavior of rwsems, maybe Waiman has
> some comments ]
>
> On Sun, Jun 6, 2021 at 3:16 AM Feng Tang wrote:
>> * perf-c2c: The hotspots(HITM) for 2 kernels are different due to the
>>   data structure change
>>
>>   - old kernel
>>
>>     - first cacheline
>>       mmap_lock->count (75%)
>>       mm->mapcount (14%)
>>
>>     - second cacheline
>>       mmap_lock->owner (97%)
>>
>>   - new kernel
>>
>>     mainly in the cacheline of 'mmap_lock'
>>
>>     mmap_lock->count (~2%)
>>     mmap_lock->owner (95%)
> Oooh.
>
> It looks like pretty much all the contention is on mmap_lock, and the
> difference is that the old kernel just _happened_ to split the
> mmap_lock rwsem at *exactly* the right place.
>
> The rw_semaphore structure looks like this:
>
>         struct rw_semaphore {
>                 atomic_long_t count;
>                 atomic_long_t owner;
>                 struct optimistic_spin_queue osq; /* spinner MCS lock */
>                 ...
>
> and before the addition of the 'write_protect_seq' field, the mmap_sem
> was at offset 120 in 'struct mm_struct'.
>
> Which meant that count and owner were in two different cachelines, and
> then when you have contention and spend time in
> rwsem_down_write_slowpath(), this is probably *exactly* the kind of
> layout you want.
>
> Because first the rwsem_write_trylock() will do a cmpxchg on the first
> cacheline (for the optimistic fast-path), and then in the case of
> contention, rwsem_down_write_slowpath() will just access the second
> cacheline.
>
> Which is probably just optimal for a load that spends a lot of time
> contended - new waiters touch that first cacheline, and then they
> queue themselves up on the second cacheline. Waiman, does that sound
> believable?

Yes, I think so.

The count field is accessed when a task tries to acquire the rwsem or
when the owner releases the lock. If the trylock fails, the writer will
go into the slowpath and do optimistic spinning on the owner field. As a
result, a lot of reads of owner are issued relative to the reads/writes
of count. Normally, there should be only one spinner, the one holding
the OSQ lock, spinning on owner, so the 9% performance degradation seems
a bit high to me. In the rare case that the head waiter in the wait
queue sets the handoff flag, that waiter may also spin on owner, causing
a bit more contention on the owner cacheline. I will investigate this
possibility further when I have time.

Cheers,
Longman