From: Thomas Gleixner
To: Nico Pache, Peter Zijlstra
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Rafael Aquini,
    Waiman Long, Baoquan He, Christoph von Recklinghausen, Don Dutile,
    "Herton R . Krzesinski", David Rientjes, Michal Hocko,
    Andrea Arcangeli, Andrew Morton, Davidlohr Bueso, Ingo Molnar,
    Joel Savitz, Darren Hart, stable@kernel.org
Subject: Re: [PATCH v8] oom_kill.c: futex: Don't OOM reap the VMA containing the robust_list_head
In-Reply-To: <1a7944c7-d717-d5af-f71d-92326f7bb7f6@redhat.com>
References: <20220408032809.3696798-1-npache@redhat.com>
    <20220408081549.GM2731@worktop.programming.kicks-ass.net>
    <87k0bzk7e5.ffs@tglx>
    <1a7944c7-d717-d5af-f71d-92326f7bb7f6@redhat.com>
Date: Tue, 12 Apr 2022 18:20:40 +0200
Message-ID: <87h76yff3b.ffs@tglx>

On Mon, Apr 11 2022 at 19:51, Nico Pache wrote:
> On 4/8/22 09:54, Thomas Gleixner wrote:
>> The
>> below reproduces the problem nicely, i.e. the lock() in the parent
>> times out. So why would the OOM killer fail to cause the same problem
>> when it reaps the private anon mapping where the private futex sits?
>>
>> If you reverse the lock order in the child the robust muck works.
>
> Thanks for the reproducer Thomas :)
>
> I think I need to re-up my knowledge around COW and how it affects
> that stack. There are increased oddities when you add the pthread
> library that I can't fully wrap my head around at the moment.

The pthread library functions are just a convenience so I did not have to
hand-code the futex and robust list handling.

> My confusion lies in how the parent/child share a robust list here, but they
> obviously do. In my mind the mut_s would be different in the child/parent after
> the fork and pthread_mutex_init (and friends) are done in the child.

They don't share a robust list; each thread has its own.

The shared mutex mut_s is initialized in the parent before fork. It has
the same address in the child and is not COWed because the mapping is
MAP_SHARED.

The child allocates private memory and initializes the private mutex in
that private mapping.

So now the child does:

    pthread_mutex_lock(mut_s);

That's the mutex in the memory shared with the parent. After that the
child's robust list head points to mut_s::robust_list.

Now it does:

    pthread_mutex_lock(mut_p);

After that the child's robust list head points to mut_p::robust_list and
mut_p::robust_list points to mut_s::robust_list.

So now the child unmaps the private memory and exits.

The kernel tries to walk the robust list and faults when trying to
access mut_p. End of walk, and mut_s stays locked.

So now think about the OOM case. The killed process has a shared mapping
with some other unrelated process (file, shmem) where mut_s sits.
It gets killed after:

    pthread_mutex_lock(mut_s);
    pthread_mutex_lock(mut_p);

So the OOM reaper rips the VMA which contains mut_p and therefore breaks
the chain which is necessary to reach mut_s.

See?

Thanks,

        tglx
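For anyone who wants to try the lock-order effect described above, the following is a minimal sketch in C. It is not the reproducer from the earlier mail, which is not part of this message; it is reconstructed from the description only. The names run_case() and init_robust(), the MAP_LEN size, the one-second timeout and the ROBUST_DEMO_MAIN build switch are assumptions made for illustration, and the sketch assumes Linux with glibc.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAP_LEN 4096	/* assumed size; one page is plenty for a mutex */

/* Initialize a robust mutex; pshared selects process-shared or private. */
static int init_robust(pthread_mutex_t *m, int pshared)
{
	pthread_mutexattr_t a;
	int rc = pthread_mutexattr_init(&a);

	if (rc)
		return rc;
	pthread_mutexattr_setrobust(&a, PTHREAD_MUTEX_ROBUST);
	pthread_mutexattr_setpshared(&a, pshared);
	rc = pthread_mutex_init(m, &a);
	pthread_mutexattr_destroy(&a);
	return rc;
}

/*
 * Hypothetical helper: returns the result of the parent's timed lock on
 * mut_s after the child has died, or -1 if the setup failed.
 */
int run_case(int private_last)
{
	pthread_mutex_t *mut_s, *mut_p;
	struct timespec ts;
	pid_t pid;
	int rc;

	/* mut_s lives in a MAP_SHARED mapping, so it is not COWed on fork */
	mut_s = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
		     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (mut_s == MAP_FAILED || init_robust(mut_s, PTHREAD_PROCESS_SHARED))
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* mut_p lives in memory private to the child */
		mut_p = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
			     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (mut_p == MAP_FAILED ||
		    init_robust(mut_p, PTHREAD_PROCESS_PRIVATE))
			_exit(1);
		if (private_last) {
			pthread_mutex_lock(mut_s);
			pthread_mutex_lock(mut_p);	/* head -> mut_p -> mut_s */
		} else {
			pthread_mutex_lock(mut_p);
			pthread_mutex_lock(mut_s);	/* head -> mut_s -> mut_p */
		}
		munmap(mut_p, MAP_LEN);	/* mut_p is gone before the exit-time walk */
		_exit(0);
	}
	waitpid(pid, NULL, 0);

	/* The child is dead; try to take the shared mutex with a timeout. */
	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += 1;
	rc = pthread_mutex_timedlock(mut_s, &ts);
	if (rc == EOWNERDEAD)
		pthread_mutex_consistent(mut_s);
	if (rc == 0 || rc == EOWNERDEAD)
		pthread_mutex_unlock(mut_s);
	munmap(mut_s, MAP_LEN);
	return rc;
}

#ifdef ROBUST_DEMO_MAIN	/* hypothetical build switch for standalone use */
int main(void)
{
	printf("private mutex locked last:  %s\n", strerror(run_case(1)));
	printf("private mutex locked first: %s\n", strerror(run_case(0)));
	return 0;
}
#endif
```

Built with something like "cc -pthread -DROBUST_DEMO_MAIN", the first case should time out in the parent (ETIMEDOUT) because the walk stops at the unmapped mut_p, while the second should report EOWNERDEAD because mut_s is reached before the walk hits the hole. That is the expectation derived from the explanation above, not a captured log.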