Date: Fri, 09 Apr 2004 15:56:51 -0700
From: "Martin J. Bligh"
To: Hugh Dickins
cc: linux-kernel@vger.kernel.org, Andrew Morton, Rajesh Venkatasubramanian
Subject: Re: [PATCH] anobjrmap 9 priority mjb tree
Message-ID: <5220000.1081551411@[10.10.2.4]>

>> > This anobjrmap 9 (or anon_mm9) patch adds Rajesh's radix priority search
>> > tree on top of Martin's 2.6.5-rc3-mjb2 tree, making a priority mjb tree!
>> > Approximately equivalent to Andrea's 2.6.5-aa1, but using anonmm instead
>> > of anon_vma, and of course each tree has its own additional features.
>>
>> This slows down kernel compile a little, but worse, it slows down SDET
>> by about 25% (on the 16x). I think you did something horrible to sem
>> contention ... presumably i_shared_sem, which SDET was fighting with
>> as it was anyway ;-(
>>
>> Diffprofile shows:
>>
>>     122626    15.7% total
>>      44129   790.0% __down
>>      20988     4.1% default_idle
>
> Many thanks for the good news, Martin ;)
> Looks like I've done something very stupid, perhaps a mismerge.
> Not found it yet, I'll carry on looking tomorrow.

I applied Andrew's highly sophisticated proprietary semtrace technology.
The common ones are:

Call Trace:
 [] __down+0x96/0x10c
 [] default_wake_function+0x0/0x1c
 [] __down_failed+0x8/0xc
 [] .text.lock.mmap+0x39/0x12a
 [] do_mmap_pgoff+0x4cf/0x60c
 [] old_mmap+0x108/0x144
 [] syscall_call+0x7/0xb

That's the vma_link call from do_mmap_pgoff here:

    /* Can addr have changed??
     *
     * Answer: Yes, several device drivers can do it in their
     *         f_op->mmap method. -DaveM
     */
    addr = vma->vm_start;

    if (!file || !rb_parent || !vma_merge(mm, prev, rb_parent, addr,
                            addr + len, vma->vm_flags, file, pgoff)) {
            vma_link(mm, vma, prev, rb_link, rb_parent);
            if (correct_wcount)
                    atomic_inc(&inode->i_writecount);

vma_link takes i_shared_sem.

Call Trace:
 [] __down+0x96/0x10c
 [] default_wake_function+0x0/0x1c
 [] __down_failed+0x8/0xc
 [] .text.lock.mmap+0xc7/0x12a
 [] do_munmap+0xbc/0x128
 [] do_mmap_pgoff+0x2b9/0x60c
 [] old_mmap+0x108/0x144
 [] syscall_call+0x7/0xb

That's the call to split_vma from do_munmap here:

    /*
     * If we need to split any vma, do it now to save pain later.
     *
     * Note: mremap's move_vma VM_ACCOUNT handling assumes a partially
     * unmapped vm_area_struct will remain in use: so lower split_vma
     * places tmp vma above, and higher split_vma places tmp vma below.
     */
    if (start > mpnt->vm_start) {
            if (split_vma(mm, mpnt, start, 0))
                    return -ENOMEM;
            prev = mpnt;
    }

split_vma takes i_shared_sem, then takes page_table_lock inside it
(which probably isn't helping either ;-)):
    if (mapping)
            down(&mapping->i_shared_sem);
    spin_lock(&mm->page_table_lock);

Call Trace:
 [] __down+0x96/0x10c
 [] default_wake_function+0x0/0x1c
 [] __down_failed+0x8/0xc
 [] .text.lock.mmap+0x5/0x12a
 [] exit_mmap+0x191/0x1d0
 [] mmput+0x50/0x70
 [] do_exit+0x1b9/0x330
 [] do_group_exit+0x9e/0xa0
 [] sys_exit_group+0xe/0x14
 [] syscall_call+0x7/0xb

That's remove_shared_vm_struct taking i_shared_sem.

Call Trace:
 [] __down+0x96/0x10c
 [] default_wake_function+0x0/0x1c
 [] __down_failed+0x8/0xc
 [] .text.lock.mmap+0x5/0x12a
 [] unmap_vma+0x44/0x78
 [] unmap_vma_list+0x14/0x20
 [] do_munmap+0x115/0x128
 [] do_mmap_pgoff+0x2b9/0x60c
 [] old_mmap+0x108/0x144
 [] syscall_call+0x7/0xb

That's remove_shared_vm_struct again, but called from unmap_vma this time.

Call Trace:
 [] __down+0x96/0x10c
 [] default_wake_function+0x0/0x1c
 [] __down_failed+0x8/0xc
 [] .text.lock.fork+0x79/0x125
 [] copy_process+0x61c/0xa6c
 [] do_fork+0x76/0x16f
 [] sys_clone+0x29/0x30
 [] syscall_call+0x7/0xb

That's dup_mmap taking i_shared_sem here:

    /* insert tmp into the share list, just after mpnt */
    down(&file->f_mapping->i_shared_sem);
    __vma_prio_tree_add(tmp, mpnt);
    up(&file->f_mapping->i_shared_sem);
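For illustration, here is a minimal user-space sketch of the insert-after step in that dup_mmap excerpt. It assumes, as the passing of mpnt suggests, that a freshly forked child vma maps exactly the same file range as its parent, so it can be chained next to the parent's entry without walking the priority tree. struct vma_node, vma_init, vma_add_after and vma_list_len are hypothetical names for this model only, not kernel code. The point it makes is that the insertion itself is O(1): the time in the traces above is spent waiting in __down for i_shared_sem, not in the list operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model (not kernel code): vmas mapping an identical file
 * range are kept together on one circular list, so a same-range vma can
 * be added next to an existing one without any tree search. */
struct vma_node {
    unsigned long pgoff_start;   /* first file page mapped */
    unsigned long pgoff_end;     /* last file page mapped */
    struct vma_node *next;       /* circular list of same-range vmas */
    struct vma_node *prev;
};

/* Initialise a vma as a one-element circular list. */
void vma_init(struct vma_node *v, unsigned long start, unsigned long end)
{
    v->pgoff_start = start;
    v->pgoff_end = end;
    v->next = v;
    v->prev = v;
}

/* Insert tmp just after old in O(1). Only valid when both cover the same
 * range, as a freshly forked child vma does in this model. Returns 0 on
 * success, -1 if the ranges differ (a full tree insert would be needed). */
int vma_add_after(struct vma_node *tmp, struct vma_node *old)
{
    if (tmp->pgoff_start != old->pgoff_start ||
        tmp->pgoff_end != old->pgoff_end)
        return -1;
    tmp->prev = old;
    tmp->next = old->next;
    old->next->prev = tmp;
    old->next = tmp;
    return 0;
}

/* Count the vmas on the same-range list that v belongs to. */
size_t vma_list_len(const struct vma_node *v)
{
    size_t n = 1;
    for (const struct vma_node *p = v->next; p != v; p = p->next)
        n++;
    return n;
}
```

In this model a vma with a different range is simply rejected with -1; a real implementation would have to fall back to a normal priority-tree insertion in that case, and in the kernel either path would still run under i_shared_sem.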