Received: by 10.192.165.148 with SMTP id m20csp4055322imm; Mon, 30 Apr 2018 10:53:28 -0700 (PDT) X-Google-Smtp-Source: AB8JxZpEZduy5DFe86An4hUpwunqCW5INCs3IrkyVyXm9GRjT/kkOfcRJvwpt7LYCmTvuuNtuNwI X-Received: by 2002:a65:498e:: with SMTP id r14-v6mr10450218pgs.78.1525110808795; Mon, 30 Apr 2018 10:53:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1525110808; cv=none; d=google.com; s=arc-20160816; b=TrgF+MY0Z9C+Zabb568GGT1PnabK+8y6OrJQonGArKZ+e41ZojD2vr9/BHqMVwqfuS xrOp2ZdrvElih37peyIgZOowtsH9qAAESk0GggFZyALRfEFI5rhuQAMGGEoT36oxrjoz 1Dui8VGEwVsI4IrkyLcVHJuKesk20NulTjaL+M7X/Ch2IhUwiKzNl0pG2c5OEtgZfRDn ss31lijnUMv8OIdzQGBp7iyknOHBKnSuqrAJZ4Dxtyq3AW23HPKBVN8/6j/dL4XVC+oQ Hzogl4PjgSYDinkS4wUhUbBOYEIN2qWnncZHwMjIpu1dpWRaoBcoeeDEO1dVMVU1Xjf+ fmDw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :organization:references:in-reply-to:message-id:subject:cc:to:from :date:arc-authentication-results; bh=dqwoStTgCnqKB+qXABpZNFTLemE7aLy6Gjpsk/YSjII=; b=U99aG39OTTWjko8KBWM2QzMibMf/wZ2XuMF93rNSC1gJyAi7wlczb4UAqEbYDdPTVt EpZZmEZVLQ4ynM4koV50bbLfpoE+2EeL032IVGi0JGNQoduv4gXWophURMKMoxrUOjE3 kuJZbBh0BwJfskGKbixRVnDCg9BkpzIVbbyUBLc1/kPjSQYVRC3ZWAakzSvDTJPuCNgy jIZv/+TUmTe5Jf4sAKLg6OAZop8Mtm9frvCKoYhrC/GTe+W3GnCFR088RuR4XQfgAV7q X4E/x3e3DOoAW5LdMdkQ1Yqq7f4CkatZzORWea4k7yq7evaM8YOPgqN2/fwrEVeHiRXb 1WZw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id k3-v6si6594275pgn.634.2018.04.30.10.53.14; Mon, 30 Apr 2018 10:53:28 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754183AbeD3Rvs (ORCPT + 99 others); Mon, 30 Apr 2018 13:51:48 -0400 Received: from mga01.intel.com ([192.55.52.88]:6315 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753425AbeD3Rvr (ORCPT ); Mon, 30 Apr 2018 13:51:47 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 30 Apr 2018 10:51:46 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.49,346,1520924400"; d="scan'208";a="52039507" Received: from jacob-builder.jf.intel.com (HELO jacob-builder) ([10.7.199.155]) by orsmga001.jf.intel.com with ESMTP; 30 Apr 2018 10:51:46 -0700 Date: Mon, 30 Apr 2018 10:54:24 -0700 From: Jacob Pan To: Jean-Philippe Brucker Cc: "iommu@lists.linux-foundation.org" , LKML , Joerg Roedel , David Woodhouse , Greg Kroah-Hartman , Alex Williamson , Rafael Wysocki , "Liu, Yi L" , "Tian, Kevin" , Raj Ashok , Christoph Hellwig , Lu Baolu , jacob.jun.pan@linux.intel.com Subject: Re: [PATCH v4 14/22] iommu: handle page response timeout Message-ID: <20180430105424.3f12f792@jacob-builder> In-Reply-To: References: <1523915351-54415-1-git-send-email-jacob.jun.pan@linux.intel.com> <1523915351-54415-15-git-send-email-jacob.jun.pan@linux.intel.com> <20180423153622.GC38106@ostrya.localdomain> <20180425083711.222202e7@jacob-builder> Organization: OTC X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.30; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 30 Apr 2018 11:58:10 +0100 Jean-Philippe Brucker wrote: > On 25/04/18 16:37, Jacob Pan wrote: > >> In the other cases (unsupported PRI or rogue guest) then disabling > >> PRI using a FAILURE status might be the right thing to do. However, > >> assuming the device follows the PCI spec it will stop sending page > >> requests once there are as many PPRs in flight as the allocated > >> credit. > >> > > Agreed, here I am not taking any actions. There may be need to drain > > in-fly requests. > > Right, as long as we first ensure that no new fault is generated (by > using a Response Failure). Though in my opinion not taking action > might be the safest option :) > > Another thought: currently the comment in iommu.h says > "@IOMMU_FAULT_STATUS_FAILURE: General error. Drop all subsequent > faults from this device if possible. This is "Response Failure" in > PCI PRI." > > I wonder if we should simply say "Drop all subsequent faults from the > device". Even if the PCI device doesn't properly implement PRI, the > IOMMU driver should set a "PRI disabled" bit in the device data that > prevents it from from reporting new faults and flooding the queue. > Anyway, it's a small detail that could go in a future patch series. > right, we should disable PRI and let future PRQ response pending on re-enabling of PRI on the device. I will leave that to future enhancement. > >> If there isn't any possibility of memory leak or abusing > >> resources, I don't think it's our problem that the guest is > >> excessively slow at handling page requests. Setting an upper bound > >> to page request latency might do more harm than good. Ensuring > >> that devices respect the number of allocated in-flight PPRs is > >> more important in my opinion. > > How about we have a really long timeout, e.g. 1 min similar to > > device invalidate response timeout in ATS spec., just for basic > > safety and diagnosis. Optionally, we could have quota in parallel. > > I agree that for development a timeout is useful. It might be worth > adding it as an option to the IOMMU module instead of a define. > Perhaps a number of seconds, 10 being the default and 0 disabling the > timeout? Otherwise we would probably end up with a succession of > patches incrementing the timeout by arbitrary values, if people find > it inconvenient. > make sense. will do.