Subject: Re: Regression in v4.2-rc1: vmalloc_to_page with ioremap
From: Toshi Kani
To: Ashutosh Dixit
Cc: "linux-kernel@vger.kernel.org", "Dutt, Sudeep", "Rao, Nikhil", "Williams, Dan J"
Date: Tue, 21 Jul 2015 14:39:10 -0600
Message-ID: <1437511150.3214.230.camel@hp.com>

On Tue, 2015-07-21 at 08:17 -0700, Ashutosh Dixit wrote:
> On Mon, Jul 20 2015 at 12:21:18 PM, Toshi Kani wrote:
> > > > Can you send me outputs of the following files?  If the driver
> > > > fails to load in v4.2-rc1, you can obtain the info in v4.1.
> > > >
> > > >   /proc/mtrr
> > > >   /proc/iomem
> > > >   /proc/vmallocinfo
> > > >   /sys/kernel/debug/kernel_page_tables (need CONFIG_X86_PTDUMP set)
> > >
> > > Since the outputs are large I have sent you the outputs in a separate
> > > mail outside the mailing list.
> >
> > Did you collect the info with your driver loaded?  I did not see ioremap
> > requests from your driver in the vmallocinfo output.
>
> Sorry, yes, the driver was loaded.  The ioremap itself is not done by
> scif.ko but by mic_host.ko, and then the addresses are passed to scif.ko.
> So I think what you are looking for is this:
>
> 0xffffc90040000000-0xffffc90240001000 8589938688 mic_probe+0x281/0x5c0 \
>     [mic_host] phys=3c7e00000000 ioremap
> 0xffffc90280000000-0xffffc90480001000 8589938688 mic_probe+0x281/0x5c0 \
>     [mic_host] phys=3c7c00000000 ioremap

OK, this confirms that they had 4KB mappings to 8GB ranges before.

0xffffc90040000000-0xffffc90240000000   8G   RW  PWT  GLB NX  pte
0xffffc90280000000-0xffffc90480000000   8G   RW  PWT  GLB NX  pte

And these ranges are from PCI MMIO.

3c7c00000000-3c7dffffffff : PCI Bus 0000:84
  3c7c00000000-3c7dffffffff : 0000:84:00.0
    3c7c00000000-3c7dffffffff : mic
3c7e00000000-3c7fffffffff : PCI Bus 0000:83
  3c7e00000000-3c7fffffffff : 0000:83:00.0
    3c7e00000000-3c7fffffffff : mic

> > > > Also, does the driver map a regular memory range with ioremap?  If
> > > > not, how does 'struct page' get allocated for the range (since
> > > > vmalloc_to_page returns a page pointer)?
> > >
> > > No, the driver does not map regular memory with ioremap, only device
> > > memory.  vmalloc_to_page was returning a valid 'struct page' in this
> > > case too.  It appears it can do this correctly using pte_page as long
> > > as all four page table levels (pgd, pud, pmd, pte) are present, and
> > > the problem seems to be happening because in the case of huge pages
> > > they are not.  For us the BAR size is 8GB, so we think the new ioremap
> > > maps the BARs using 1GB pages.
> >
> > Well, it is probably not a valid 'struct page' pointer you got from
> > vmalloc_to_page...  pte_page(pte) simply adds a pfn to vmemmap.  So,
> > yes, you get a pointer, which can be put back to the pfn by subtracting
> > vmemmap.  It may look fine, but this pointer does not point to a struct
> > page unless it is part of regular memory (or you somehow allocated a
> > struct page for the range).  Can you check to see if the struct page
> > pointer from vmalloc_to_page actually points to a struct page entry?
> >
> >   /* memmap is virtually contiguous. */
> >   #define __pfn_to_page(pfn)    (vmemmap + (pfn))
> >   #define __page_to_pfn(page)   (unsigned long)((page) - vmemmap)
>
> Yes, you are correct, the 'struct page' pointer returned by
> vmalloc_to_page does not point to a "real" struct page entry.  Neither
> have we allocated a struct page for the range.  However, because we pass
> the returned pointer to the streaming DMA mapping APIs
> (dma_map_ops->map_sg or dma_map_ops->map_page), and all those functions
> do is call page_to_phys, i.e. they only care about the physical address,
> it used to work.
>
> Would it be possible to have a different API which can do this, or can
> vmalloc_to_page be updated to handle huge ioremaps without crashing?  Or
> would you have a suggestion for doing this differently?

You can do the following instead.  If you have the physical address
already (i.e. the address you passed to ioremap), you can skip
slow_virt_to_phys().  pfn_to_page() is a hack for the time being so that
you can use the same DMA mapping APIs.

  phys = slow_virt_to_phys(vaddr);
  page = pfn_to_page(phys >> PAGE_SHIFT);
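In case it is useful, here is a rough, untested sketch of how this could
look in the DMA setup path.  The function name mic_map_bar_region() and
its arguments are made up for illustration; they are not from your driver:

  #include <linux/dma-mapping.h>
  #include <linux/mm.h>       /* offset_in_page() */
  #include <asm/pgtable.h>    /* slow_virt_to_phys() on x86 */

  /*
   * vaddr points into an ioremap'ed BAR.  With huge ioremap mappings,
   * vmalloc_to_page() can no longer walk down to a pte, so resolve the
   * physical address directly.  If the driver still has the physical
   * address it passed to ioremap(), it can use that and skip
   * slow_virt_to_phys().
   */
  static dma_addr_t mic_map_bar_region(struct device *dev, void *vaddr,
                                       size_t size)
  {
          phys_addr_t phys = slow_virt_to_phys(vaddr);
          /* hack: fabricate a page pointer from the pfn for the DMA API */
          struct page *page = pfn_to_page(phys >> PAGE_SHIFT);
          dma_addr_t dma;

          dma = dma_map_page(dev, page, offset_in_page(vaddr), size,
                             DMA_BIDIRECTIONAL);
          if (dma_mapping_error(dev, dma))
                  return 0;   /* caller treats 0 as failure in this sketch */

          return dma;
  }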
Dan is working on the change to introduce __pfn_t.  With this change, you
can pass a pfn, instead of a fake page pointer, to the APIs.  You may want
to check whether the APIs you use are covered by this change.

https://lkml.org/lkml/2015/6/5/802

Thanks,
-Toshi