Received: by 2002:a89:413:0:b0:1fd:dba5:e537 with SMTP id m19csp1649916lqs; Sat, 15 Jun 2024 20:47:58 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCUMrUeFnd9TBB1k3JrLmtvFZmnQvhAkm4bVFcFMav11gvi+0iTFji2H4AP+ZCp132CLY5dvTAJ5LWxmFsqhGO9AaGVfV2rBi1D0dl692A== X-Google-Smtp-Source: AGHT+IEvbvqEyeX4YmQMqKoFvqiHM3MKxSV5uQdQbwsnkpJd+MS9E+7R2xuVcfySvS9bcUOY/Jse X-Received: by 2002:a05:622a:588:b0:441:382b:75c with SMTP id d75a77b69052e-4421687bef3mr92788371cf.20.1718509677744; Sat, 15 Jun 2024 20:47:57 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1718509677; cv=pass; d=google.com; s=arc-20160816; b=mbJ/TLt2auDDAr3yKZ8gJ3nqqMzEzLq5JuWFsd+u8vdHBpEB18Gk6DF6wtcCv673AA as15f7B6+IwtfnM4v1f3pOgd12RBxxIyqtAMbvCuO5GUgonwXmcg6JFBkMWe1ts1PQaQ PoC9oz2N+Ezu7Vm+p7ymdE8vBRCQOViY9GwLYPnuH20nLRx2mu43nypqdk1Ka/m9OYP7 J/IePrU1fQs9QMkEZpScgUxXG6RyXgfd0wGyrBrbV4ShonLO0zUSnmHirTwSkLXacBqP 9JwdSVH38wrpvZur+HpQ2mqsvjqN7O051Hjm/N101Zvps+iHXXfwvYxHW2kfi1hr9tWv r6LQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:in-reply-to:cc:from:content-language :references:to:subject:user-agent:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:date:message-id:dkim-signature; bh=fHRM7ivjzV6OzBGsuvzci4doGaQpriteZPMqHJY7J1I=; fh=4mrcNo8/4At+Bh5UHQTDJmSDMsdJkxAucPEhxgIm8hQ=; b=tWQEuZAOPjm+YkESHT3Uk4P/DenmSTi3Hu7FkkavTwcpvhY4IkG7sziwxYV0P4zl6P zls4m70aepBmPYzSSYIT+uN74ws+pJKOB0cKf9FPdhPj5IRdM8ALdhf35JF+zesi28M6 crcBiSX6qBDm92fbLIFW/vQr7fKa11xN1NNSDj72GDncspXv6g/nfwIENUQ5NxWkmnNB 0mDsNaNoSFgDRj44n1Tq3UagdQ9219r0iIEUTvQrwY9UoIoKvlhlhKFu4/wvM2tT+2DZ yLrygrbyQTMT2aB0tvdj4r+2SV8MOEz2YKx7nWjH7jd6r2VikHAcCVPaYPe3PgBVr3Jb Rhmw==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=OQo6Fyvq; arc=pass (i=1 spf=pass spfdomain=intel.com dkim=pass dkdomain=intel.com dmarc=pass fromdomain=intel.com); spf=pass (google.com: domain of linux-kernel+bounces-216104-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-216104-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id d75a77b69052e-441f2feb4e1si69444161cf.610.2024.06.15.20.47.57 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 15 Jun 2024 20:47:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-216104-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=OQo6Fyvq; arc=pass (i=1 spf=pass spfdomain=intel.com dkim=pass dkdomain=intel.com dmarc=pass fromdomain=intel.com); spf=pass (google.com: domain of linux-kernel+bounces-216104-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-216104-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 68E961C20F2C for ; Sun, 16 Jun 2024 03:47:57 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id C8604169ADA; Sun, 16 Jun 2024 03:47:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="OQo6Fyvq" Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.14]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 99C73A20; Sun, 16 Jun 2024 03:47:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.14 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718509667; cv=none; b=oqqPfrv60+8ko2JM8x3Fv+XbQYqGb0m1JEraVtrXSDFF74QAqXopg3Pl5lhnbIeqwTNWF708H17KlPTObAYO3CXys8EAXc6M65SxQrU/RaSlURHoxAiQrm7YsiX8zz4AvHM5KAGn4DXrJeNJ6bdmQR7uPI+KRddyRDCYCvBqheg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1718509667; c=relaxed/simple; bh=xtp4Gs5IDSmT3zH0M+U3sMn0gDatcq8ADESTWYEKAmY=; h=Message-ID:Date:MIME-Version:Subject:To:References:From:Cc: In-Reply-To:Content-Type; b=rqVGofDAN4C1h7KRxqIxLHwGebzhItlv3MKyx9YEds8UVthqQMH2P+QrhWWUh1oV0lwp+Qh2HBV42xQJHC7aI51sr5e0CmHd0spHOTPJDGgMZJNy/oLQZn82rsVqsFmH/jcpRkBkaqUSH/09fMwSHjGn0AXPP1SJU7DtpxrhhbE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=OQo6Fyvq; arc=none smtp.client-ip=198.175.65.14 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1718509666; x=1750045666; h=message-id:date:mime-version:subject:to:references:from: cc:in-reply-to:content-transfer-encoding; bh=xtp4Gs5IDSmT3zH0M+U3sMn0gDatcq8ADESTWYEKAmY=; b=OQo6FyvqWIWQW4IOkaLsz+Zdn8kk6eE5DVpnQg5X1zIv1QAMxmvcfxUj /+f+koxR5mgaTeecl0kBv3qLHp5x6sdyN58zljfu79Cu3/OXF6yLqj8u+ fl/J7FBDmiXxrqJlYaWDC6/DHjOtLdZenhZv4WN591xwWTg1qc8D/nyYS 7EsK+iX1jEbGSzK2TXTz8dhMIL4ZIq5ISaksKTTkjEd+owauwDQb+CalY 01WMgKUORsAyCEi4h86R3/mVsDhdgYGOV3m22FzuOBj3QVsYmfln7Q/1b 5F/oI4yxEyIPkXCMouaruHfe5dRsMINZz5faC1ooO9XDmnjSmJY3yyiVI w==; X-CSE-ConnectionGUID: UjDAxiHHT5+nJhcoIvS3NQ== X-CSE-MsgGUID: e41aFWGrQ6WtdORwZpRcUQ== X-IronPort-AV: E=McAfee;i="6700,10204,11104"; a="19193104" X-IronPort-AV: E=Sophos;i="6.08,241,1712646000"; d="scan'208";a="19193104" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by orvoesa106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jun 2024 20:47:45 -0700 X-CSE-ConnectionGUID: y0d1gb/7RjWBRdKavdjBAQ== X-CSE-MsgGUID: dJHAnFYNQ0+QcxqCCs9VeA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,241,1712646000"; d="scan'208";a="45992625" Received: from yma27-mobl.ccr.corp.intel.com (HELO [10.124.232.251]) ([10.124.232.251]) by orviesa004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Jun 2024 20:47:42 -0700 Message-ID: Date: Sun, 16 Jun 2024 11:47:40 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 3/3] fs/file.c: move sanity_check from alloc_fd() to put_unused_fd() To: Mateusz Guzik References: <20240614163416.728752-1-yu.ma@intel.com> <20240614163416.728752-4-yu.ma@intel.com> Content-Language: en-US From: "Ma, Yu" Cc: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, tim.c.chen@linux.intel.com, tim.c.chen@intel.com, pan.deng@intel.com, tianyou.li@intel.com, yu.ma@intel.com In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 6/15/2024 12:41 PM, Mateusz Guzik wrote: > On Fri, Jun 14, 2024 at 12:34:16PM -0400, Yu Ma wrote: >> alloc_fd() has a sanity check inside to make sure the FILE object mapping to the > Total nitpick: FILE is the libc thing, I would refer to it as 'struct > file'. See below for the actual point. Good point, not nitpick at all ;) , will update the word in commit message. >> Combined with patch 1 and 2 in series, pts/blogbench-1.1.0 read improved by >> 32%, write improved by 15% on Intel ICX 160 cores configuration with v6.8-rc6. >> >> Reviewed-by: Tim Chen >> Signed-off-by: Yu Ma >> --- >> fs/file.c | 14 ++++++-------- >> 1 file changed, 6 insertions(+), 8 deletions(-) >> >> diff --git a/fs/file.c b/fs/file.c >> index a0e94a178c0b..59d62909e2e3 100644 >> --- a/fs/file.c >> +++ b/fs/file.c >> @@ -548,13 +548,6 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags) >> else >> __clear_close_on_exec(fd, fdt); >> error = fd; >> -#if 1 >> - /* Sanity check */ >> - if (rcu_access_pointer(fdt->fd[fd]) != NULL) { >> - printk(KERN_WARNING "alloc_fd: slot %d not NULL!\n", fd); >> - rcu_assign_pointer(fdt->fd[fd], NULL); >> - } >> -#endif >> > I was going to ask when was the last time anyone seen this fire and > suggest getting rid of it if enough time(tm) passed. Turns out it does > show up sometimes, latest result I found is 2017 vintage: > https://groups.google.com/g/syzkaller-bugs/c/jfQ7upCDf9s/m/RQjhDrZ7AQAJ > > So you are moving this to another locked area, but one which does not > execute in the benchmark? The consideration here as mentioned is to reduce the performance impact (if to reserve the sanity check, and have the same functionality) by moving it from critical path to non-critical, as put_unused_fd() is mostly used for error handling when fd is allocated successfully but struct file failed to obtained in the next step. > > Patch 2/3 states 28% read and 14% write increase, this commit message > claims it goes up to 32% and 15% respectively -- pretty big. I presume > this has to do with bouncing a line containing the fd. > > I would argue moving this check elsewhere is about as good as removing > it altogether, but that's for the vfs overlords to decide > > All that aside, looking at disasm of alloc_fd it is pretty clear there > is time to save, for example: > > if (unlikely(nr >= fdt->max_fds)) { > if (fd >= end) { > error = -EMFILE; > goto out; > } > error = expand_files(fd, fd); > if (error < 0) > goto out; > if (error) > goto repeat; > } > > This elides 2 branches and a func call in the common case. Completely > untested, maybe has some brainfarts, feel free to take without credit > and further massage the routine. > > Moreover my disasm shows that even looking for a bit results in > a func call(!) to _find_next_zero_bit -- someone(tm) should probably > massage it into another inline. > > After the above massaging is done and if it turns out the check has to > stay, you can plausibly damage-control it with prefetch -- issue it > immediately after finding the fd number, before any other work. > > All that said, by the above I'm confident there is still *some* > performance left on the table despite the lock. Thank you Guzik for such quick check and good suggestions :) Yes, there are extra branches and func call can be reduced for better performance, considering the fix for fast path, how about flow as below draft (besides the massage to find_next_fd()):         error = -EMFILE;         if (fd < fdt->max_fds) {                 if (~fdt->open_fds[0]) {                         fd = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, fd);                         goto fastreturn;                 }                 fd = find_next_fd(fdt, fd);         }         if (unlikely(fd >= fdt->max_fds)) {                 error = expand_files(files, fd);                 if (error < 0)                         goto out;                 if (error)                         goto repeat;         } fastreturn:         if (unlikely(fd >= end))                 goto out;         if (start <= files->next_fd)                 files->next_fd = fd + 1;        .... >> out: >> spin_unlock(&files->file_lock); >> @@ -572,7 +565,7 @@ int get_unused_fd_flags(unsigned flags) >> } >> EXPORT_SYMBOL(get_unused_fd_flags); >> >> -static void __put_unused_fd(struct files_struct *files, unsigned int fd) >> +static inline void __put_unused_fd(struct files_struct *files, unsigned int fd) >> { >> struct fdtable *fdt = files_fdtable(files); >> __clear_open_fd(fd, fdt); >> @@ -583,7 +576,12 @@ static void __put_unused_fd(struct files_struct *files, unsigned int fd) >> void put_unused_fd(unsigned int fd) >> { >> struct files_struct *files = current->files; >> + struct fdtable *fdt = files_fdtable(files); >> spin_lock(&files->file_lock); >> + if (unlikely(rcu_access_pointer(fdt->fd[fd]))) { >> + printk(KERN_WARNING "put_unused_fd: slot %d not NULL!\n", fd); >> + rcu_assign_pointer(fdt->fd[fd], NULL); >> + } >> __put_unused_fd(files, fd); >> spin_unlock(&files->file_lock); >> }