From: Andiry Xu
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvdimm@lists.01.org
Cc: dan.j.williams@intel.com, andy.rudoff@intel.com, coughlan@redhat.com,
	swanson@cs.ucsd.edu, david@fromorbit.com, jack@suse.com,
	swhiteho@redhat.com, miklos@szeredi.hu, andiry.xu@gmail.com,
	Andiry Xu
Subject: [RFC v2 20/83] Pmem block allocation routines.
Date: Sat, 10 Mar 2018 10:18:01 -0800
Message-Id: <1520705944-6723-21-git-send-email-jix024@eng.ucsd.edu>
In-Reply-To: <1520705944-6723-1-git-send-email-jix024@eng.ucsd.edu>
References: <1520705944-6723-1-git-send-email-jix024@eng.ucsd.edu>

From: Andiry Xu

Upon an allocation request, NOVA first tries the free list on the
current CPU. If that list does not have enough free blocks, NOVA falls
back to the free list with the most free blocks. The caller can specify
the allocation direction: from low addresses or from high addresses.

Signed-off-by: Andiry Xu
---
(A short usage sketch of the two new allocation entry points is
appended after the patch.)

 fs/nova/balloc.c | 270 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nova/balloc.h |  10 +++
 2 files changed, 280 insertions(+)

diff --git a/fs/nova/balloc.c b/fs/nova/balloc.c
index 9108721..8e99215 100644
--- a/fs/nova/balloc.c
+++ b/fs/nova/balloc.c
@@ -441,6 +441,276 @@ int nova_free_log_blocks(struct super_block *sb,
 	return ret;
 }
 
+static int not_enough_blocks(struct free_list *free_list,
+	unsigned long num_blocks, enum alloc_type atype)
+{
+	struct nova_range_node *first = free_list->first_node;
+	struct nova_range_node *last = free_list->last_node;
+
+	if (free_list->num_free_blocks < num_blocks || !first || !last) {
+		nova_dbgv("%s: num_free_blocks=%ld; num_blocks=%ld; first=0x%p; last=0x%p",
+			  __func__, free_list->num_free_blocks, num_blocks,
+			  first, last);
+		return 1;
+	}
+
+	return 0;
+}
+
+/* Return how many blocks allocated */
+static long nova_alloc_blocks_in_free_list(struct super_block *sb,
+	struct free_list *free_list, unsigned short btype,
+	enum alloc_type atype, unsigned long num_blocks,
+	unsigned long *new_blocknr, enum nova_alloc_direction from_tail)
+{
+	struct rb_root *tree;
+	struct nova_range_node *curr, *next = NULL, *prev = NULL;
+	struct rb_node *temp, *next_node, *prev_node;
+	unsigned long curr_blocks;
+	bool found = 0;
+	unsigned long step = 0;
+
+	if (!free_list->first_node || free_list->num_free_blocks == 0) {
+		nova_dbgv("%s: Can't alloc. free_list->first_node=0x%p free_list->num_free_blocks = %lu",
+			  __func__, free_list->first_node,
+			  free_list->num_free_blocks);
+		return -ENOSPC;
+	}
+
+	if (atype == LOG && not_enough_blocks(free_list, num_blocks, atype)) {
not_enough_blocks() == true", + __func__); + return -ENOSPC; + } + + tree = &(free_list->block_free_tree); + if (from_tail == ALLOC_FROM_HEAD) + temp = &(free_list->first_node->node); + else + temp = &(free_list->last_node->node); + + while (temp) { + step++; + curr = container_of(temp, struct nova_range_node, node); + + curr_blocks = curr->range_high - curr->range_low + 1; + + if (num_blocks >= curr_blocks) { + /* Superpage allocation must succeed */ + if (btype > 0 && num_blocks > curr_blocks) + goto next; + + /* Otherwise, allocate the whole blocknode */ + if (curr == free_list->first_node) { + next_node = rb_next(temp); + if (next_node) + next = container_of(next_node, + struct nova_range_node, node); + free_list->first_node = next; + } + + if (curr == free_list->last_node) { + prev_node = rb_prev(temp); + if (prev_node) + prev = container_of(prev_node, + struct nova_range_node, node); + free_list->last_node = prev; + } + + rb_erase(&curr->node, tree); + free_list->num_blocknode--; + num_blocks = curr_blocks; + *new_blocknr = curr->range_low; + nova_free_blocknode(sb, curr); + found = 1; + break; + } + + /* Allocate partial blocknode */ + if (from_tail == ALLOC_FROM_HEAD) { + *new_blocknr = curr->range_low; + curr->range_low += num_blocks; + } else { + *new_blocknr = curr->range_high + 1 - num_blocks; + curr->range_high -= num_blocks; + } + + found = 1; + break; +next: + if (from_tail == ALLOC_FROM_HEAD) + temp = rb_next(temp); + else + temp = rb_prev(temp); + } + + if (free_list->num_free_blocks < num_blocks) { + nova_dbg("%s: free list %d has %lu free blocks, but allocated %lu blocks?\n", + __func__, free_list->index, + free_list->num_free_blocks, num_blocks); + return -ENOSPC; + } + + if (found == 1) + free_list->num_free_blocks -= num_blocks; + else { + nova_dbgv("%s: Can't alloc. 
found = %d", __func__, found); + return -ENOSPC; + } + + NOVA_STATS_ADD(alloc_steps, step); + + return num_blocks; +} + +/* Find out the free list with most free blocks */ +static int nova_get_candidate_free_list(struct super_block *sb) +{ + struct nova_sb_info *sbi = NOVA_SB(sb); + struct free_list *free_list; + int cpuid = 0; + int num_free_blocks = 0; + int i; + + for (i = 0; i < sbi->cpus; i++) { + free_list = nova_get_free_list(sb, i); + if (free_list->num_free_blocks > num_free_blocks) { + cpuid = i; + num_free_blocks = free_list->num_free_blocks; + } + } + + return cpuid; +} + +static int nova_new_blocks(struct super_block *sb, unsigned long *blocknr, + unsigned int num, unsigned short btype, int zero, + enum alloc_type atype, int cpuid, enum nova_alloc_direction from_tail) +{ + struct free_list *free_list; + void *bp; + unsigned long num_blocks = 0; + unsigned long new_blocknr = 0; + long ret_blocks = 0; + int retried = 0; + timing_t alloc_time; + + num_blocks = num * nova_get_numblocks(btype); + if (num_blocks == 0) { + nova_dbg_verbose("%s: num_blocks == 0", __func__); + return -EINVAL; + } + + NOVA_START_TIMING(new_blocks_t, alloc_time); + if (cpuid == ANY_CPU) + cpuid = smp_processor_id(); + +retry: + free_list = nova_get_free_list(sb, cpuid); + spin_lock(&free_list->s_lock); + + if (not_enough_blocks(free_list, num_blocks, atype)) { + nova_dbgv("%s: cpu %d, free_blocks %lu, required %lu, blocknode %lu\n", + __func__, cpuid, free_list->num_free_blocks, + num_blocks, free_list->num_blocknode); + + if (retried >= 2) + /* Allocate anyway */ + goto alloc; + + spin_unlock(&free_list->s_lock); + cpuid = nova_get_candidate_free_list(sb); + retried++; + goto retry; + } +alloc: + ret_blocks = nova_alloc_blocks_in_free_list(sb, free_list, btype, atype, + num_blocks, &new_blocknr, from_tail); + + if (ret_blocks > 0) { + if (atype == LOG) { + free_list->alloc_log_count++; + free_list->alloc_log_pages += ret_blocks; + } else if (atype == DATA) { + free_list->alloc_data_count++; + free_list->alloc_data_pages += ret_blocks; + } + } + + spin_unlock(&free_list->s_lock); + NOVA_END_TIMING(new_blocks_t, alloc_time); + + if (ret_blocks <= 0 || new_blocknr == 0) { + nova_dbg_verbose("%s: not able to allocate %d blocks. ret_blocks=%ld; new_blocknr=%lu", + __func__, num, ret_blocks, new_blocknr); + return -ENOSPC; + } + + if (zero) { + bp = nova_get_block(sb, nova_get_block_off(sb, + new_blocknr, btype)); + memset_nt(bp, 0, PAGE_SIZE * ret_blocks); + } + *blocknr = new_blocknr; + + nova_dbg_verbose("Alloc %lu NVMM blocks 0x%lx\n", ret_blocks, *blocknr); + return ret_blocks / nova_get_numblocks(btype); +} + +// Allocate data blocks. The offset for the allocated block comes back in +// blocknr. Return the number of blocks allocated. 
+inline int nova_new_data_blocks(struct super_block *sb,
+	struct nova_inode_info_header *sih, unsigned long *blocknr,
+	unsigned long start_blk, unsigned int num,
+	enum nova_alloc_init zero, int cpu,
+	enum nova_alloc_direction from_tail)
+{
+	int allocated;
+	timing_t alloc_time;
+
+	NOVA_START_TIMING(new_data_blocks_t, alloc_time);
+	allocated = nova_new_blocks(sb, blocknr, num,
+			sih->i_blk_type, zero, DATA, cpu, from_tail);
+	NOVA_END_TIMING(new_data_blocks_t, alloc_time);
+	if (allocated < 0) {
+		nova_dbgv("FAILED: Inode %lu, start blk %lu, alloc %d data blocks from %lu to %lu\n",
+			  sih->ino, start_blk, allocated, *blocknr,
+			  *blocknr + allocated - 1);
+	} else {
+		nova_dbgv("Inode %lu, start blk %lu, alloc %d data blocks from %lu to %lu\n",
+			  sih->ino, start_blk, allocated, *blocknr,
+			  *blocknr + allocated - 1);
+	}
+	return allocated;
+}
+
+
+// Allocate log blocks. The offset for the allocated block comes back in
+// blocknr. Return the number of blocks allocated.
+inline int nova_new_log_blocks(struct super_block *sb,
+	struct nova_inode_info_header *sih,
+	unsigned long *blocknr, unsigned int num,
+	enum nova_alloc_init zero, int cpu,
+	enum nova_alloc_direction from_tail)
+{
+	int allocated;
+	timing_t alloc_time;
+
+	NOVA_START_TIMING(new_log_blocks_t, alloc_time);
+	allocated = nova_new_blocks(sb, blocknr, num,
+			sih->i_blk_type, zero, LOG, cpu, from_tail);
+	NOVA_END_TIMING(new_log_blocks_t, alloc_time);
+	if (allocated < 0) {
+		nova_dbgv("%s: ino %lu, failed to alloc %d log blocks",
+			  __func__, sih->ino, num);
+	} else {
+		nova_dbgv("%s: ino %lu, alloc %d of %d log blocks %lu to %lu\n",
+			  __func__, sih->ino, allocated, num, *blocknr,
+			  *blocknr + allocated - 1);
+	}
+	return allocated;
+}
+
 /* We do not take locks so it's inaccurate */
 unsigned long nova_count_free_blocks(struct super_block *sb)
 {
diff --git a/fs/nova/balloc.h b/fs/nova/balloc.h
index 249eb72..463fbac 100644
--- a/fs/nova/balloc.h
+++ b/fs/nova/balloc.h
@@ -73,6 +73,16 @@ extern int nova_free_data_blocks(struct super_block *sb,
 	struct nova_inode_info_header *sih, unsigned long blocknr, int num);
 extern int nova_free_log_blocks(struct super_block *sb,
 	struct nova_inode_info_header *sih, unsigned long blocknr, int num);
+extern inline int nova_new_data_blocks(struct super_block *sb,
+	struct nova_inode_info_header *sih, unsigned long *blocknr,
+	unsigned long start_blk, unsigned int num,
+	enum nova_alloc_init zero, int cpu,
+	enum nova_alloc_direction from_tail);
+extern int nova_new_log_blocks(struct super_block *sb,
+	struct nova_inode_info_header *sih,
+	unsigned long *blocknr, unsigned int num,
+	enum nova_alloc_init zero, int cpu,
+	enum nova_alloc_direction from_tail);
 int nova_find_free_slot(struct nova_sb_info *sbi,
 	struct rb_root *tree, unsigned long range_low,
 	unsigned long range_high, struct nova_range_node **prev,
-- 
2.7.4
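
Usage sketch (editorial addition, not part of the patch): the fragment
below shows how a caller might drive the two entry points added above.
It requests two data pages for an inode and frees them again.
ALLOC_NO_INIT is assumed to be the "don't zero" member of
enum nova_alloc_init (only the zero/non-zero use of that parameter is
visible in this hunk), and example_prealloc() is an invented helper;
nova_new_data_blocks(), nova_free_data_blocks(), ANY_CPU and
ALLOC_FROM_HEAD come from the patch and the balloc.h context above.

/*
 * Hypothetical caller sketch; assumes fs/nova/balloc.h and its
 * dependencies are included.  It requests two data pages starting from
 * the local CPU's free list; if that list runs short, nova_new_blocks()
 * falls back to the free list with the most free blocks before giving
 * up with -ENOSPC.
 */
static int example_prealloc(struct super_block *sb,
	struct nova_inode_info_header *sih, unsigned long start_blk)
{
	unsigned long blocknr = 0;
	int allocated;

	/* ANY_CPU: start from smp_processor_id()'s free list */
	allocated = nova_new_data_blocks(sb, sih, &blocknr, start_blk, 2,
					 ALLOC_NO_INIT, ANY_CPU,
					 ALLOC_FROM_HEAD);
	if (allocated < 0)
		return allocated;	/* -ENOSPC or -EINVAL */

	/* Blocks blocknr .. blocknr + allocated - 1 now belong to sih. */

	/* Give the blocks back, e.g. on a later error path. */
	return nova_free_data_blocks(sb, sih, blocknr, allocated);
}

Note that the allocator may return fewer blocks than requested:
nova_alloc_blocks_in_free_list() hands back a whole range node when it
is smaller than the request, so callers should treat the return value
as the actual count and retry if they need more.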