From: Amir Goldstein
Date: Tue, 14 Jul 2020 16:10:33 +0300
Subject: Re: soft lockup in fanotify_read
To: Francesco Ruggeri
Cc: linux-kernel, linux-fsdevel, Jan Kara
In-Reply-To: <20200714025417.A25EB95C0339@us180.sjc.aristanetworks.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 14, 2020 at 5:54 AM Francesco Ruggeri wrote:
>
> We are getting this soft lockup in fanotify_read.
> The reason is that this code does not seem to scale to cases where there
> are big bursts of events generated by fanotify_handle_event.
> fanotify_read acquires group->notification_lock for each event.
> fanotify_handle_event uses the lock to add one event, which also involves
> fanotify_merge, which scans the whole list trying to find an event to
> merge the new one with.

Yes, that is a terribly inefficient merge algorithm.
If it helps I am carrying a quick brown paper bag fix for this issue
in my tree:

@@ -65,6 +74,8 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
 {
 	struct fsnotify_event *test_event;
 	struct fanotify_event *new;
+	int limit = 128;
+	int i = 0;

 	pr_debug("%s: list=%p event=%p\n", __func__, list, event);
 	new = FANOTIFY_E(event);
@@ -78,6 +89,9 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
 		return 0;

 	list_for_each_entry_reverse(test_event, list, list) {
+		/* Event merges are expensive so should be limited */
+		if (++i > limit)
+			break;
 		if (should_merge(test_event, event)) {

It's somewhere down my TODO list to fix this properly with a hash table.

> In our case fanotify_read is invoked with a buffer big enough for 200
> events, and what happens is that every time fanotify_read dequeues an
> event and releases the lock, fanotify_handle_event adds several more,
> scanning a longer and longer list. This causes fanotify_read to wait
> longer and longer for the lock, and the soft lockup happens before
> fanotify_read can reach 200 events.
> Is it intentional for fanotify_read to acquire the lock for each event,
> rather than batching together a user buffer worth of events?

I think it is meant to allow for multiple reader threads to read events
with fairness, but not sure.
Even if it was fine to read a batch of events on every spinlock acquire
making the code in the fanotify_read() loop behave well in case of an
error in an event after reading a bunch of good events looks challenging,
but I didn't try.
Anyway, the root cause of the issue seems to be the inefficient merge and
not the spinlock taken per one event read.

Thanks,
Amir.
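
For anyone who wants to see the effect of the scan limit outside the kernel, the following is a minimal userspace sketch of the same idea. It is a simplified model and not the fanotify code: `struct event`, `struct queue`, `queue_event()` and `burst_cost()` are hypothetical names made up for illustration, merging is reduced to "same object id", and there is no locking. It only counts how many comparisons the reverse merge scan performs for a burst of events that never merge, with and without a cap.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queued event; not the kernel's struct. */
struct event {
	unsigned long object_id;	/* what the event refers to */
	unsigned int mask;		/* accumulated event bits */
	struct event *prev;		/* next older event, NULL at the oldest */
};

struct queue {
	struct event *tail;		/* newest event, NULL when empty */
	unsigned long comparisons;	/* merge comparisons done so far */
};

/*
 * Queue one event, first trying to merge it into a recent event on the
 * same object. At most `limit` of the newest events are examined;
 * limit == 0 means no cap (scan the whole queue).
 * Returns 1 if merged, 0 if queued as a new event, -1 if out of memory.
 */
static int queue_event(struct queue *q, unsigned long object_id,
		       unsigned int mask, unsigned int limit)
{
	struct event *e;
	unsigned int i = 0;

	assert(q != NULL);
	for (e = q->tail; e; e = e->prev) {
		if (limit && ++i > limit)
			break;		/* give up merging, queue a new event */
		q->comparisons++;
		if (e->object_id == object_id) {
			e->mask |= mask;
			return 1;
		}
	}

	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->object_id = object_id;
	e->mask = mask;
	e->prev = q->tail;
	q->tail = e;
	return 0;
}

/*
 * Worst case for merging: a burst of n events on n distinct objects,
 * so no scan ever finds a match. Returns the total comparison count.
 */
static unsigned long burst_cost(unsigned long n, unsigned int limit)
{
	struct queue q = { NULL, 0 };
	struct event *e;
	unsigned long id;

	for (id = 0; id < n; id++)
		if (queue_event(&q, id, 1u, limit) < 0)
			break;

	while ((e = q.tail) != NULL) {	/* free the queue */
		q.tail = e->prev;
		free(e);
	}
	return q.comparisons;
}
```

In this model, `burst_cost(1000, 0)` performs 499500 comparisons (quadratic in the burst size), while `burst_cost(1000, 128)` performs 119744 (at most 128 per queued event). The price of the cap is that an event older than the newest 128 entries is no longer found, so a duplicate can be queued instead of merged.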