From: Valentin Schneider
To: Jens Axboe
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel@vger.kernel.org, Stefano Garzarella
Subject: Re: [PATCH] sched/fair: don't NUMA balance for kthreads
In-Reply-To: <865de121-8190-5d30-ece5-3b097dc74431@kernel.dk>
Date: Tue, 26 May 2020 17:40:06 +0100

On 26/05/20 16:38, Jens Axboe wrote:
> Stefano reported a crash when using SQPOLL with io_uring:
>
> BUG: kernel NULL pointer dereference, address: 00000000000003b0
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 800000046c042067 P4D 800000046c042067 PUD 461fcf067 PMD 0
> Oops: 0000 [#1] SMP PTI
> CPU: 2 PID: 1307 Comm: io_uring-sq Not tainted 5.7.0-rc7 #11
> Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.2.6 06/08/2015
> RIP: 0010:task_numa_work+0x4f/0x2c0
> Code: 18 4c 8b 25 e3 f0 8e 01 49 8b 9f 00 08 00 00 4d 8b af c8 00 00 00 49 39 c7 0f 85 e8 01 00 00 48 89 6d 00 41 f6 47 24 04 75 67 <48> 8b ab b0 03 00 00 48 85 ed 75 16 8b 3d 6f 68 94 01 e8 aa fb 04
> RSP: 0018:ffffaaa98415be10 EFLAGS: 00010246
> RAX: ffff953ee36b8000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000001 RSI: ffff953ee36b8000 RDI: ffff953ee36b8dc8
> RBP: ffff953ee36b8dc8 R08: 00000000001200db R09: ffff9542e3ad2e08
> R10: ffff9542ecd20070 R11: 0000000000000000 R12: 00000000fffca35b
> R13: 000000012a06a949 R14: ffff9542e3ad2c00 R15: ffff953ee36b8000
> FS:  0000000000000000(0000) GS:ffff953eefc40000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000003b0 CR3: 000000046bbd0002 CR4: 00000000001606e0
> Call Trace:
>  task_work_run+0x68/0xa0
>  io_sq_thread+0x252/0x3d0
>  ? finish_wait+0x80/0x80
>  kthread+0xf9/0x130
>  ? __ia32_sys_io_uring_enter+0x370/0x370
>  ? kthread_park+0x90/0x90
>  ret_from_fork+0x35/0x40
>
> which is task_numa_work() oopsing on current->mm being NULL. The task
> work is queued by task_tick_numa(), which checks if current->mm is NULL
> at the time of the call. But this state isn't necessarily persistent if
> the kthread is using use_mm() to temporarily adopt the mm of a task.
>
> Change the task_tick_numa() check to exclude kernel threads in general,
> as it doesn't make sense to attempt to balance for kthreads anyway.
>

Does it? (this isn't a rhetorical question) Suppose a given kthread ends
up doing more accesses to some pages (via use_mm()) than the other
threads that access them; wouldn't it make sense to take that into
account when it comes to NUMA balancing?

AFAIA that's not something that really happens in the wild, and as you
say we don't have a (guaranteed) persistent state to work with in those
cases, so it's more of a pipe dream. With that in mind, I think the
proposed change is fine.

Reviewed-by: Valentin Schneider

> Reported-by: Stefano Garzarella
> Signed-off-by: Jens Axboe
>
> ---
>
> This should go into 5.7 imho
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 02f323b85b6d..403556fc16d4 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2908,7 +2908,7 @@ static void task_tick_numa(struct rq *rq, struct task_struct *curr)
>  	/*
>  	 * We don't care about NUMA placement if we don't have memory.
>  	 */
> -	if (!curr->mm || (curr->flags & PF_EXITING) || work->next != work)
> +	if ((curr->flags & (PF_EXITING | PF_KTHREAD)) || work->next != work)
>  		return;
>
>  	/*
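
As a side note, to make the ordering concrete: below is a tiny userspace
model of the interleaving as I understand it. Purely illustrative -- the
*_model() names, struct and globals are made up for this sketch and are
not the actual kernel code paths; use_mm()/unuse_mm() and task_work_run()
are the real functions they stand in for.

	#include <stdio.h>
	#include <stddef.h>

	/* Stand-in for the fields of mm_struct we care about. */
	struct mm_model { int numa_scan_seq; };

	static struct mm_model *current_mm;     /* models current->mm           */
	static void (*pending_task_work)(void); /* models the queued task_work  */

	/* Models task_numa_work(): the real one dereferences p->mm. */
	static void task_numa_work_model(void)
	{
		if (!current_mm) {
			printf("current->mm is NULL by now -> this is where the kernel oopses\n");
			return;
		}
		printf("numa_scan_seq = %d\n", current_mm->numa_scan_seq);
	}

	/* Models the old task_tick_numa() check: only samples ->mm at tick time. */
	static void task_tick_numa_model(void)
	{
		if (!current_mm)
			return;
		pending_task_work = task_numa_work_model;
	}

	int main(void)
	{
		struct mm_model borrowed_mm = { .numa_scan_seq = 0 };

		current_mm = &borrowed_mm;  /* kthread did use_mm()                */
		task_tick_numa_model();     /* tick fires while the mm is borrowed */
		current_mm = NULL;          /* kthread did unuse_mm()              */

		if (pending_task_work)      /* models task_work_run() in io_sq_thread */
			pending_task_work();
		return 0;
	}

With the old !curr->mm check, the tick queues the work because the
borrowed mm happens to be visible at that instant; by the time the task
work runs, the mm is gone again. The PF_KTHREAD test sidesteps that
window entirely, since the flag (unlike ->mm) is stable for the SQPOLL
thread's lifetime.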