Subject: Re: [PATCH v2 3/4] mm/slub: Fix another circular locking dependency in slab_attr_store()
From: Waiman Long
Organization: Red Hat
To: Qian Cai
Cc: Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes,
    Joonsoo Kim, Johannes Weiner, Michal Hocko, Vladimir Davydov,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    cgroups@vger.kernel.org, Juri Lelli
References: <20200427235621.7823-4-longman@redhat.com>
Message-ID: <638f59c0-60f1-2279-fea6-28b2980720f4@redhat.com>
Date: Mon, 18 May 2020 18:05:00 -0400

On 5/16/20 10:19 PM, Qian Cai wrote:
>
>> On Apr 27, 2020, at 7:56 PM, Waiman Long wrote:
>>
>> It turns out that switching from slab_mutex to memcg_cache_ids_sem in
>> slab_attr_store() does not completely eliminate the circular locking
>> dependency, as shown by the following lockdep splat when the system
>> is shut down:
>>
>> [ 2095.079697] Chain exists of:
>> [ 2095.079697]   kn->count#278 --> memcg_cache_ids_sem --> slab_mutex
>> [ 2095.079697]
>> [ 2095.090278]  Possible unsafe locking scenario:
>> [ 2095.090278]
>> [ 2095.096227]        CPU0                    CPU1
>> [ 2095.100779]        ----                    ----
>> [ 2095.105331]   lock(slab_mutex);
>> [ 2095.108486]                                lock(memcg_cache_ids_sem);
>> [ 2095.114961]                                lock(slab_mutex);
>> [ 2095.120649]   lock(kn->count#278);
>> [ 2095.124068]
>> [ 2095.124068]  *** DEADLOCK ***
>
> Can you show the full splat?
>
>> To eliminate this possibility, we have to use trylock to acquire
>> memcg_cache_ids_sem. Unlike slab_mutex, which can be acquired in
>> many places, the memcg_cache_ids_sem write lock is only acquired
>> in memcg_alloc_cache_id() to double the size of memcg_nr_cache_ids.
>> So the chance of successive calls to memcg_alloc_cache_id() within
>> a short time is pretty low. As a result, we can retry the read lock
>> acquisition a few times if the first attempt fails.
>>
>> Signed-off-by: Waiman Long
>
> The code looks a bit hacky and probably not that robust. Since it is
> the shutdown path, which is not all that important without lockdep,
> maybe you could drop this single patch for now until there is a
> better solution?

That is true. Unlike with slab_mutex, the chance of failing to acquire
a read lock on memcg_cache_ids_sem is pretty low. Maybe just print a
one-time warning if that happens (rough sketch below).

Thanks,
Longman
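Something along these lines, perhaps (completely untested, just to
illustrate the idea; the helper name and retry count are made up):

static bool memcg_cache_ids_tryread(void)
{
	int retries = 3;	/* arbitrary retry count */

	while (retries--) {
		if (down_read_trylock(&memcg_cache_ids_sem))
			return true;
		/*
		 * The write lock is only held briefly in
		 * memcg_alloc_cache_id(), so give the writer a
		 * moment to finish before trying again.
		 */
		msleep(10);
	}
	pr_warn_once("slab_attr_store: failed to take memcg_cache_ids_sem read lock\n");
	return false;
}

slab_attr_store() would then skip the memcg cache propagation when this
returns false, and do up_read(&memcg_cache_ids_sem) as usual when it
returns true.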