From: Timothy Normand Miller
Date: Mon, 26 Mar 2018 17:08:46 -0400
Subject: HELP PLEASE: Without ugly hacks, no interrupt delivery at all to our driver; 3.10.0 kernel (RHEL7.4) on Intel 82X38/X48 chipset, Shuttle (SX38/FX38, Core 2 Duo)
To: linux-kernel@vger.kernel.org

Hi, Everyone,

I have really been struggling with interrupt delivery from one of our PCI devices to our kernel driver, and I would really appreciate some feedback on what we're doing wrong here. I did manage to get it working, but I had to do some things that I would expect to be handled automatically if I were doing all of this correctly.

** Background and previously working drivers

We've had a RHEL6 driver (RHEL 6.5, 2.6.32-431.el6.i686) working for a long time. So when we started porting it to RHEL7 (RHEL 7.4, 3.10.0-693.17.1.el7.x86_64) and couldn't get interrupts to work, we assumed it was that our driver required updates for the newer kernel version.

However, we installed RHEL 6.5 on hardware more similar to the machine on which we're running RHEL 7.4, and we're having interrupt problems there too; in that case interrupts are delivered intermittently, probably because the driver is designed to work with level-triggered interrupts, but our device got connected to an edge-triggered interrupt line.
Of course, one of the first things I did was rewrite the driver so as to make the hardware interrupt signal more edge-friendly. Unfortunately, this had no effect on the Shuttle machine we're developing on.

Before I get into details, if you want info about a system on which interrupts work fine, please see my stackoverflow post at "https://stackoverflow.com/questions/49459207/rhel7-4-on-x86-with-intel-82x38-x48-express-chipset-completely-unable-to-get". There is also a bit more info there about the system we're having problems with.

** What our device is like

I suspect a large part of the problem is that our device isn't really a PCIe device. It's a PCI device retrofitted with a TI XIO2000(A)/XIO2200A PCI Express-to-PCI Bridge. Large numbers of this product are out in the field, and we have to continue to support them so that air traffic controllers can continue to help pilots safely land planes. :)

So when it comes to interrupts, the only option we have is the "legacy" Assert_INTx and Deassert_INTx messages.

Here is some info about how our card appears in the PCI hierarchy. "/proc/interrupts" tells us that it's on the APIC interrupt line 10 as an edge-triggered interrupt. Line 10 is consistent with what we read from PCI config space.

# cat /proc/interrupts:
           CPU0       CPU1
...
 10:          0          0   IO-APIC-edge      rapafp

# lspci -vvv
...
02:00.0 Display controller: Tech-Source Device 0043
        Subsystem: Tech-Source Device 0043
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B+ DisINTx-
        Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- irq=10

Mar 26 12:09:32 localhost kernel: raptor_attach: pci_enable_device succeeded!
Mar 26 12:09:32 localhost kernel: After pci_enable_device: irq_byte=10, dev->irq=16

The effects of this make no sense.
This is how I dump that info:

	u8 irq_byte;	/* pci_read_config_byte() takes a u8 pointer */

	pci_read_config_byte(handle, PCI_INTERRUPT_LINE, &irq_byte);
	printk(KERN_INFO "After pci_read_config_byte: irq_byte=%d, dev->irq=%d\n",
	       (int)irq_byte, pTspci->pdev->irq);

So, the pci_dev struct has had irq updated to 16, and lspci reports that the IRQ line has been updated to 16 in the hardware. So why is it that when I read PCI config space directly, I get the old value? In fact, lspci is apparently LYING about this! Here's what I get from a raw dump of PCI config space:

02:00.0 Display controller: Tech-Source Device 0043
00: 27 12 43 00 02 02 a0 02 00 00 80 03 00 00 00 00
10: 08 00 ff bf 08 00 00 a0 08 00 00 b8 00 00 00 00
20: 08 00 fc bf 00 00 00 00 00 00 00 00 27 12 43 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00

In that last line (bytes 0a 01 at offsets 0x3c and 0x3d), you can clearly see that INTA is still routed to line 10. Apparently, "lspci -vvv" is getting its "interpreted" config space info from "/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:00.0/", where the irq node reports 16.

Doesn't it seem like a kernel bug that the IRQ number is changed in data structures, but PCI config space is not automatically updated with the correct line? And since request_irq doesn't take a pci_dev pointer as an argument, there is no opportunity for any sanity checks, no way for request_irq to fix the incorrect setting in PCI config space, and no way for request_irq to return an error due to the mismatch. If there's some call I'm failing to make that would update the hardware properly, I can't figure out where it is.

I've struggled to find good documentation on the proper way to do all this stuff. When I google things, I mostly find really old stuff, and some of it is from theoretical discussions on LKML (e.g. https://patchwork.kernel.org/patch/7469831/).
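
For anyone who wants to check the raw dump above by hand, here is a minimal userspace sketch that decodes the fields in question. The offsets are the standard PCI header ones (command at 0x04, Interrupt Line at 0x3c, Interrupt Pin at 0x3d, DisINTx as bit 10 of the command word). The names cfg_read16, intx_disabled, print_irq_fields and the CFG_* macros are made up for this illustration and are not kernel or driver API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Standard PCI header offsets, redefined locally for a userspace sketch. */
#define CFG_VENDOR_ID        0x00
#define CFG_DEVICE_ID        0x02
#define CFG_COMMAND          0x04
#define CFG_INTERRUPT_LINE   0x3c
#define CFG_INTERRUPT_PIN    0x3d
#define CFG_CMD_INTX_DISABLE 0x0400	/* bit 10 of the command register */
#define CFG_DUMP_LEN         64

/* First 64 bytes of config space for 02:00.0, copied from the dump above. */
static const uint8_t raptor_cfg[CFG_DUMP_LEN] = {
	0x27, 0x12, 0x43, 0x00, 0x02, 0x02, 0xa0, 0x02,
	0x00, 0x00, 0x80, 0x03, 0x00, 0x00, 0x00, 0x00,
	0x08, 0x00, 0xff, 0xbf, 0x08, 0x00, 0x00, 0xa0,
	0x08, 0x00, 0x00, 0xb8, 0x00, 0x00, 0x00, 0x00,
	0x08, 0x00, 0xfc, 0xbf, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x27, 0x12, 0x43, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x0a, 0x01, 0x00, 0x00,
};

/* Read a little-endian 16-bit register from the dumped bytes. */
static inline uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
	assert(off + 1 < CFG_DUMP_LEN);
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Nonzero if the DisINTx bit is set in the command register. */
static inline int intx_disabled(const uint8_t *cfg)
{
	return (cfg_read16(cfg, CFG_COMMAND) & CFG_CMD_INTX_DISABLE) != 0;
}

/* Print the interrupt-related fields, lspci-style (+ set, - clear). */
static inline void print_irq_fields(const uint8_t *cfg)
{
	printf("vendor=%04x device=%04x command=%04x line=%u pin=%u DisINTx%c\n",
	       (unsigned int)cfg_read16(cfg, CFG_VENDOR_ID),
	       (unsigned int)cfg_read16(cfg, CFG_DEVICE_ID),
	       (unsigned int)cfg_read16(cfg, CFG_COMMAND),
	       (unsigned int)cfg[CFG_INTERRUPT_LINE],
	       (unsigned int)cfg[CFG_INTERRUPT_PIN],
	       intx_disabled(cfg) ? '+' : '-');
}
```

Calling print_irq_fields(raptor_cfg) from a small main() reports the Interrupt Line byte as stored in the device, which is what the raw dump shows, independent of whatever the kernel keeps in dev->irq.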
I have tried looking at lots of in-kernel drivers, but most of them don't seem to make most of the calls mentioned in posts like that one and don't seem to do anything about adapting to altered IRQ numbers.

Before just jamming the correct IRQ number into the graphics chip's config space, I went looking for a proper way of doing this. I found pcibios_fixup_irqs, but that seems to want to alter every device in the system, which is probably not something we should be doing this long after booting. There seems to be a global function pointer pcibios_enable_irq that is called from pcibios_enable_device.

And this brings me to the question of what the difference is between pci_ calls and pcibios_ calls. One thing I can see is that pci_enable_device explicitly excludes interrupts from the flags it passes to __pci_enable_device_flags. Another thing I notice is that there is no mention of pcibios_enable_device in the kernel Documentation directory, while pci_enable_device IS. In fact, only two pcibios_ calls are mentioned anywhere in the docs. When googling this, I instead find discussions of deprecating pcibios_ calls. There are a few places in "arch/x86/pci" that write to PCI_INTERRUPT_LINE, but the only relevant one is pcibios_update_irq, which I assume is deprecated.

The Documentation on PCI mentions using the pci_enable_msi and pci_enable_msix calls. That and other text makes it pretty clear that INTx interrupts are the *default*, and MSI has to be enabled explicitly. Looking at do_pci_enable_device, if msi and msix are disabled, then it will clear PCI_COMMAND_INTX_DISABLE on the device, but it doesn't ascend the bridge hierarchy fixing those too. It also calls pcibios_enable_device.

From what I can see, if both msi_enabled and msix_enabled are false, then it should fall back to INTx interrupts. So before calling pci_enable_device, I first printed out the msi flags in my pci_dev structure and then just for kicks called pci_disable_msi and pci_disable_msix.
This had no impact. It wasn't until after I had added the code to manually fix the interrupt line in config space that I started receiving interrupts:

[ 4898.207690] pci_dev->msi_enabled=0 pci_dev->msix_enabled=0
[ 4898.207839] raptor_attach: pci_enable_device succeeded!
[ 4898.207844] After pci_enable_device: irq_byte=10, dev->irq=16
[ 4898.207846] PCI_INTERRUPT_LINE set wrong in hardware -- fixing
[ 4898.207855] raptor_enable_intx: DisINTx already clear for device 0000:02:00.0
[ 4898.207861] raptor_enable_intx: DisINTx already clear for device 0000:01:00.0
[ 4898.207865] raptor_enable_intx: Successfully cleared DisINTx for device 0000:00:01.0
[ 4898.207867] raptor_attach: configured 1 bridges
[ 4898.207870] After raptor_enable_intx: irq_byte=16, dev->irq=16
[ 4898.207875] raptor_attach: calling request_irq.
[ 4898.207884] raptor_attach: request_irq(16) succeeded!
[ 4898.207990] After request_irq: irq_byte=16, dev->irq=16

So finally, I have managed to get this driver kinda working. It needs more testing to verify that it fully behaves properly, but we have something we can work with for now.

This driver has a long history, ported from OS to OS and Linux version to Linux version. It can be built for Tru64, AIX, Solaris, and several other UNIXes, thanks to the C preprocessor. It is also able to support a variety of our graphics cards going pretty far back. So it's complicated, and there is a lot of legacy code in there. What I'd really like to do is bring it up to date and use all the proper API calls without any iffy hacks that will break with the next chipset or kernel version.

I'm sure I haven't provided enough info, so feel free to ask questions, and I would really appreciate any feedback I can get.

Thanks a million!

Here's my code that walks the device tree.

static int raptor_enable_intx(struct pci_dev *dev, TspciPtr pTspci)
{
	int num_en = 0;
	u16 cmd, old_cmd;

	/* Walk from the device up through each upstream bridge. */
	while (dev) {
		pci_read_config_word(dev, PCI_COMMAND, &old_cmd);
		pci_intx(dev, true);	/* clears DisINTx in PCI_COMMAND */
		pci_read_config_word(dev, PCI_COMMAND, &cmd);

		if (cmd & PCI_COMMAND_INTX_DISABLE) {
			printk(KERN_INFO "raptor_enable_intx: Could not clear DisINTx for device %s\n",
			       pci_name(dev));
		} else {
			printk(KERN_INFO "raptor_enable_intx: Successfully cleared DisINTx for device %s\n",
			       pci_name(dev));
			/* Count only devices whose DisINTx bit was actually set before. */
			if (old_cmd & PCI_COMMAND_INTX_DISABLE)
				num_en++;
		}

		dev = pci_upstream_bridge(dev);
	}

	return num_en;
}

-- 
Timothy Normand Miller, PhD
Principal Engineer, Eizo Rugged Solutions