2018-11-08 08:25:33

by Arun KS

Subject: [PATCH v3 0/4] mm: convert totalram_pages, totalhigh_pages and managed pages to atomic

This series converts totalram_pages, totalhigh_pages and
zone->managed_pages to atomic variables.

The series was compile tested on x86 (x86_64_defconfig & i386_defconfig)
on 4.20-rc1. Memory hotplug was tested on arm64, but on an older version
of the kernel.

totalram_pages, zone->managed_pages and totalhigh_pages updates
are protected by managed_page_count_lock, but readers never care
about it. Convert these variables to atomic to avoid readers
potentially seeing a store tear.

The main motivation was that managed_page_count_lock handling was
complicating things. It was discussed at length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785
It seems better to remove the lock and convert the variables
to atomic. With the change, preventing potential store-to-read
tearing comes as a bonus.
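
For reference, the end state is that each counter becomes an
atomic_long_t behind small inline helpers; condensed from patch 3
(include/linux/mm.h), roughly:

        extern atomic_long_t _totalram_pages;

        static inline unsigned long totalram_pages(void)
        {
                return (unsigned long)atomic_long_read(&_totalram_pages);
        }

        static inline void totalram_pages_add(long count)
        {
                atomic_long_add(count, &_totalram_pages);
        }

Readers call totalram_pages(), writers use the _add/_inc/_dec/_set
helpers, and no lock is involved.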

Changes in v3:
- Fixed kbuild test robot errors.
- Modified changelogs to be clearer.
- EXPORT_SYMBOL for _totalram_pages and _totalhigh_pages.

Arun KS (4):
mm: reference totalram_pages and managed_pages once per function
mm: convert zone->managed_pages to atomic variable
mm: convert totalram_pages and totalhigh_pages variables to atomic
mm: Remove managed_page_count spinlock

arch/csky/mm/init.c | 4 +-
arch/powerpc/platforms/pseries/cmm.c | 10 ++--
arch/s390/mm/init.c | 2 +-
arch/um/kernel/mem.c | 3 +-
arch/x86/kernel/cpu/microcode/core.c | 5 +-
drivers/char/agp/backend.c | 4 +-
drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 2 +-
drivers/gpu/drm/i915/i915_gem.c | 2 +-
drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 4 +-
drivers/hv/hv_balloon.c | 19 +++----
drivers/md/dm-bufio.c | 2 +-
drivers/md/dm-crypt.c | 2 +-
drivers/md/dm-integrity.c | 2 +-
drivers/md/dm-stats.c | 2 +-
drivers/media/platform/mtk-vpu/mtk_vpu.c | 2 +-
drivers/misc/vmw_balloon.c | 2 +-
drivers/parisc/ccio-dma.c | 4 +-
drivers/parisc/sba_iommu.c | 4 +-
drivers/staging/android/ion/ion_system_heap.c | 2 +-
drivers/xen/xen-selfballoon.c | 6 +--
fs/ceph/super.h | 2 +-
fs/file_table.c | 7 +--
fs/fuse/inode.c | 2 +-
fs/nfs/write.c | 2 +-
fs/nfsd/nfscache.c | 2 +-
fs/ntfs/malloc.h | 2 +-
fs/proc/base.c | 2 +-
include/linux/highmem.h | 28 ++++++++++-
include/linux/mm.h | 27 +++++++++-
include/linux/mmzone.h | 15 +++---
include/linux/swap.h | 1 -
kernel/fork.c | 5 +-
kernel/kexec_core.c | 5 +-
kernel/power/snapshot.c | 2 +-
lib/show_mem.c | 2 +-
mm/highmem.c | 5 +-
mm/huge_memory.c | 2 +-
mm/kasan/quarantine.c | 2 +-
mm/memblock.c | 6 +--
mm/mm_init.c | 2 +-
mm/oom_kill.c | 2 +-
mm/page_alloc.c | 72 +++++++++++++--------------
mm/shmem.c | 7 +--
mm/slab.c | 2 +-
mm/swap.c | 2 +-
mm/util.c | 2 +-
mm/vmalloc.c | 4 +-
mm/vmstat.c | 4 +-
mm/workingset.c | 2 +-
mm/zswap.c | 4 +-
net/dccp/proto.c | 7 +--
net/decnet/dn_route.c | 2 +-
net/ipv4/tcp_metrics.c | 2 +-
net/netfilter/nf_conntrack_core.c | 7 +--
net/netfilter/xt_hashlimit.c | 5 +-
net/sctp/protocol.c | 7 +--
security/integrity/ima/ima_kexec.c | 2 +-
57 files changed, 195 insertions(+), 142 deletions(-)

--
1.9.1



2018-11-08 08:24:15

by Arun KS

Subject: [PATCH v3 1/4] mm: reference totalram_pages and managed_pages once per function

This patch is in preparation for a later patch which converts totalram_pages
and zone->managed_pages to atomic variables. Please note that re-reading
the value in the middle of a function might return a different value and,
as such, could lead to unexpected behavior. There are no known bugs as a
result of the current code, but it is better to prevent them in principle.
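
The pattern applied throughout is to take one snapshot of the global and
use that local copy for every subsequent test and message, as in this
condensed example from the microcode_write() hunk below:

        unsigned long totalram_pgs = totalram_pages;    /* read once */

        if ((len >> PAGE_SHIFT) > totalram_pgs) {
                /* the message reports the same value the check used */
                pr_err("too much data (max %ld pages)\n", totalram_pgs);
                return ret;
        }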

Signed-off-by: Arun KS <[email protected]>
Reviewed-by: Konstantin Khlebnikov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
---
arch/um/kernel/mem.c | 3 +--
arch/x86/kernel/cpu/microcode/core.c | 5 +++--
drivers/hv/hv_balloon.c | 19 ++++++++++---------
fs/file_table.c | 7 ++++---
kernel/fork.c | 5 +++--
kernel/kexec_core.c | 5 +++--
mm/page_alloc.c | 5 +++--
mm/shmem.c | 3 ++-
net/dccp/proto.c | 7 ++++---
net/netfilter/nf_conntrack_core.c | 7 ++++---
net/netfilter/xt_hashlimit.c | 5 +++--
net/sctp/protocol.c | 7 ++++---
12 files changed, 44 insertions(+), 34 deletions(-)

diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 1067469..134d3fd 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -51,8 +51,7 @@ void __init mem_init(void)

/* this will put all low memory onto the freelists */
memblock_free_all();
- max_low_pfn = totalram_pages;
- max_pfn = totalram_pages;
+ max_pfn = max_low_pfn = totalram_pages;
mem_init_print_info(NULL);
kmalloc_ok = 1;
}
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 2637ff0..99c67ca 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -434,9 +434,10 @@ static ssize_t microcode_write(struct file *file, const char __user *buf,
size_t len, loff_t *ppos)
{
ssize_t ret = -EINVAL;
+ unsigned long totalram_pgs = totalram_pages;

- if ((len >> PAGE_SHIFT) > totalram_pages) {
- pr_err("too much data (max %ld pages)\n", totalram_pages);
+ if ((len >> PAGE_SHIFT) > totalram_pgs) {
+ pr_err("too much data (max %ld pages)\n", totalram_pgs);
return ret;
}

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 4163151..cac4945 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1090,6 +1090,7 @@ static void process_info(struct hv_dynmem_device *dm, struct dm_info_msg *msg)
static unsigned long compute_balloon_floor(void)
{
unsigned long min_pages;
+ unsigned long totalram_pgs = totalram_pages;
#define MB2PAGES(mb) ((mb) << (20 - PAGE_SHIFT))
/* Simple continuous piecewiese linear function:
* max MiB -> min MiB gradient
@@ -1102,16 +1103,16 @@ static unsigned long compute_balloon_floor(void)
* 8192 744 (1/16)
* 32768 1512 (1/32)
*/
- if (totalram_pages < MB2PAGES(128))
- min_pages = MB2PAGES(8) + (totalram_pages >> 1);
- else if (totalram_pages < MB2PAGES(512))
- min_pages = MB2PAGES(40) + (totalram_pages >> 2);
- else if (totalram_pages < MB2PAGES(2048))
- min_pages = MB2PAGES(104) + (totalram_pages >> 3);
- else if (totalram_pages < MB2PAGES(8192))
- min_pages = MB2PAGES(232) + (totalram_pages >> 4);
+ if (totalram_pgs < MB2PAGES(128))
+ min_pages = MB2PAGES(8) + (totalram_pgs >> 1);
+ else if (totalram_pgs < MB2PAGES(512))
+ min_pages = MB2PAGES(40) + (totalram_pgs >> 2);
+ else if (totalram_pgs < MB2PAGES(2048))
+ min_pages = MB2PAGES(104) + (totalram_pgs >> 3);
+ else if (totalram_pgs < MB2PAGES(8192))
+ min_pages = MB2PAGES(232) + (totalram_pgs >> 4);
else
- min_pages = MB2PAGES(488) + (totalram_pages >> 5);
+ min_pages = MB2PAGES(488) + (totalram_pgs >> 5);
#undef MB2PAGES
return min_pages;
}
diff --git a/fs/file_table.c b/fs/file_table.c
index e49af4c..6e3c088 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -380,10 +380,11 @@ void __init files_init(void)
void __init files_maxfiles_init(void)
{
unsigned long n;
- unsigned long memreserve = (totalram_pages - nr_free_pages()) * 3/2;
+ unsigned long totalram_pgs = totalram_pages;
+ unsigned long memreserve = (totalram_pgs - nr_free_pages()) * 3/2;

- memreserve = min(memreserve, totalram_pages - 1);
- n = ((totalram_pages - memreserve) * (PAGE_SIZE / 1024)) / 10;
+ memreserve = min(memreserve, totalram_pgs - 1);
+ n = ((totalram_pgs - memreserve) * (PAGE_SIZE / 1024)) / 10;

files_stat.max_files = max_t(unsigned long, n, NR_FILE);
}
diff --git a/kernel/fork.c b/kernel/fork.c
index 07cddff..7823f31 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -739,15 +739,16 @@ void __init __weak arch_task_cache_init(void) { }
static void set_max_threads(unsigned int max_threads_suggested)
{
u64 threads;
+ unsigned long totalram_pgs = totalram_pages;

/*
* The number of threads shall be limited such that the thread
* structures may only consume a small part of the available memory.
*/
- if (fls64(totalram_pages) + fls64(PAGE_SIZE) > 64)
+ if (fls64(totalram_pgs) + fls64(PAGE_SIZE) > 64)
threads = MAX_THREADS;
else
- threads = div64_u64((u64) totalram_pages * (u64) PAGE_SIZE,
+ threads = div64_u64((u64) totalram_pgs * (u64) PAGE_SIZE,
(u64) THREAD_SIZE * 8UL);

if (threads > max_threads_suggested)
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 86ef06d..dff217c 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -152,6 +152,7 @@ int sanity_check_segment_list(struct kimage *image)
int i;
unsigned long nr_segments = image->nr_segments;
unsigned long total_pages = 0;
+ unsigned long totalram_pgs = totalram_pages;

/*
* Verify we have good destination addresses. The caller is
@@ -217,13 +218,13 @@ int sanity_check_segment_list(struct kimage *image)
* wasted allocating pages, which can cause a soft lockup.
*/
for (i = 0; i < nr_segments; i++) {
- if (PAGE_COUNT(image->segment[i].memsz) > totalram_pages / 2)
+ if (PAGE_COUNT(image->segment[i].memsz) > totalram_pgs / 2)
return -EINVAL;

total_pages += PAGE_COUNT(image->segment[i].memsz);
}

- if (total_pages > totalram_pages / 2)
+ if (total_pages > totalram_pgs / 2)
return -EINVAL;

/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a919ba5..173312b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7245,6 +7245,7 @@ static void calculate_totalreserve_pages(void)
for (i = 0; i < MAX_NR_ZONES; i++) {
struct zone *zone = pgdat->node_zones + i;
long max = 0;
+ unsigned long managed_pages = zone->managed_pages;

/* Find valid and maximum lowmem_reserve in the zone */
for (j = i; j < MAX_NR_ZONES; j++) {
@@ -7255,8 +7256,8 @@ static void calculate_totalreserve_pages(void)
/* we treat the high watermark as reserved pages. */
max += high_wmark_pages(zone);

- if (max > zone->managed_pages)
- max = zone->managed_pages;
+ if (max > managed_pages)
+ max = managed_pages;

pgdat->totalreserve_pages += max;

diff --git a/mm/shmem.c b/mm/shmem.c
index ea26d7a..6b91eab 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -114,7 +114,8 @@ static unsigned long shmem_default_max_blocks(void)

static unsigned long shmem_default_max_inodes(void)
{
- return min(totalram_pages - totalhigh_pages, totalram_pages / 2);
+ unsigned long totalram_pgs = totalram_pages;
+ return min(totalram_pgs - totalhigh_pages, totalram_pgs / 2);
}
#endif

diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 43733ac..f27daa1 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -1131,6 +1131,7 @@ static inline void dccp_mib_exit(void)
static int __init dccp_init(void)
{
unsigned long goal;
+ unsigned long totalram_pgs = totalram_pages;
int ehash_order, bhash_order, i;
int rc;

@@ -1154,10 +1155,10 @@ static int __init dccp_init(void)
*
* The methodology is similar to that of the buffer cache.
*/
- if (totalram_pages >= (128 * 1024))
- goal = totalram_pages >> (21 - PAGE_SHIFT);
+ if (totalram_pgs >= (128 * 1024))
+ goal = totalram_pgs >> (21 - PAGE_SHIFT);
else
- goal = totalram_pages >> (23 - PAGE_SHIFT);
+ goal = totalram_pgs >> (23 - PAGE_SHIFT);

if (thash_entries)
goal = (thash_entries *
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index e92e749..cd233f6 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -2251,6 +2251,7 @@ static __always_inline unsigned int total_extension_size(void)

int nf_conntrack_init_start(void)
{
+ unsigned long totalram_pgs = totalram_pages;
int max_factor = 8;
int ret = -ENOMEM;
int i;
@@ -2270,11 +2271,11 @@ int nf_conntrack_init_start(void)
* >= 4GB machines have 65536 buckets.
*/
nf_conntrack_htable_size
- = (((totalram_pages << PAGE_SHIFT) / 16384)
+ = (((totalram_pgs << PAGE_SHIFT) / 16384)
/ sizeof(struct hlist_head));
- if (totalram_pages > (4 * (1024 * 1024 * 1024 / PAGE_SIZE)))
+ if (totalram_pgs > (4 * (1024 * 1024 * 1024 / PAGE_SIZE)))
nf_conntrack_htable_size = 65536;
- else if (totalram_pages > (1024 * 1024 * 1024 / PAGE_SIZE))
+ else if (totalram_pgs > (1024 * 1024 * 1024 / PAGE_SIZE))
nf_conntrack_htable_size = 16384;
if (nf_conntrack_htable_size < 32)
nf_conntrack_htable_size = 32;
diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 3e7d259..6cb9a74 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -274,14 +274,15 @@ static int htable_create(struct net *net, struct hashlimit_cfg3 *cfg,
struct xt_hashlimit_htable *hinfo;
const struct seq_operations *ops;
unsigned int size, i;
+ unsigned long totalram_pgs = totalram_pages;
int ret;

if (cfg->size) {
size = cfg->size;
} else {
- size = (totalram_pages << PAGE_SHIFT) / 16384 /
+ size = (totalram_pgs << PAGE_SHIFT) / 16384 /
sizeof(struct hlist_head);
- if (totalram_pages > 1024 * 1024 * 1024 / PAGE_SIZE)
+ if (totalram_pgs > 1024 * 1024 * 1024 / PAGE_SIZE)
size = 8192;
if (size < 16)
size = 16;
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 9b277bd..3bdade2 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1368,6 +1368,7 @@ static __init int sctp_init(void)
int status = -EINVAL;
unsigned long goal;
unsigned long limit;
+ unsigned long totalram_pgs;
int max_share;
int order;
int num_entries;
@@ -1426,10 +1427,10 @@ static __init int sctp_init(void)
* The methodology is similar to that of the tcp hash tables.
* Though not identical. Start by getting a goal size
*/
- if (totalram_pages >= (128 * 1024))
- goal = totalram_pages >> (22 - PAGE_SHIFT);
+ if (totalram_pgs >= (128 * 1024))
+ goal = totalram_pgs >> (22 - PAGE_SHIFT);
else
- goal = totalram_pages >> (24 - PAGE_SHIFT);
+ goal = totalram_pgs >> (24 - PAGE_SHIFT);

/* Then compute the page order for said goal */
order = get_order(goal);
--
1.9.1


2018-11-08 08:24:19

by Arun KS

Subject: [PATCH v3 4/4] mm: Remove managed_page_count spinlock

Now that totalram_pages and managed_pages are atomic variables, there is
no need for the managed_page_count spinlock.
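
Each of these counters is now updated with a single atomic
read-modify-write, so concurrent adjusters cannot lose updates even
without the lock. Illustrative sketch only (page1/page2 stand in for
whatever pages the callers adjust):

        /* e.g. memory hotplug on CPU A ... */
        adjust_managed_page_count(page1, 512);
        /* ... racing with a ballooning driver on CPU B */
        adjust_managed_page_count(page2, -256);

        /*
         * atomic_long_add()/totalram_pages_add() are atomic RMWs, so the
         * net effect is +256 pages regardless of interleaving, and no
         * reader expects a consistent snapshot across totalram_pages,
         * totalhigh_pages and zone->managed_pages.
         */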

Signed-off-by: Arun KS <[email protected]>
Reviewed-by: Konstantin Khlebnikov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
include/linux/mmzone.h | 6 ------
mm/page_alloc.c | 5 -----
2 files changed, 11 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e73dc31..c71b4d9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -428,12 +428,6 @@ struct zone {
* Write access to present_pages at runtime should be protected by
* mem_hotplug_begin/end(). Any reader who can't tolerant drift of
* present_pages should get_online_mems() to get a stable value.
- *
- * Read access to managed_pages should be safe because it's unsigned
- * long. Write access to zone->managed_pages and totalram_pages are
- * protected by managed_page_count_lock at runtime. Idealy only
- * adjust_managed_page_count() should be used instead of directly
- * touching zone->managed_pages and totalram_pages.
*/
atomic_long_t managed_pages;
unsigned long spanned_pages;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f8b64cc..26c5e14 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -122,9 +122,6 @@
};
EXPORT_SYMBOL(node_states);

-/* Protect totalram_pages and zone->managed_pages */
-static DEFINE_SPINLOCK(managed_page_count_lock);
-
atomic_long_t _totalram_pages __read_mostly;
EXPORT_SYMBOL(_totalram_pages);
unsigned long totalreserve_pages __read_mostly;
@@ -7065,14 +7062,12 @@ static int __init cmdline_parse_movablecore(char *p)

void adjust_managed_page_count(struct page *page, long count)
{
- spin_lock(&managed_page_count_lock);
atomic_long_add(count, &page_zone(page)->managed_pages);
totalram_pages_add(count);
#ifdef CONFIG_HIGHMEM
if (PageHighMem(page))
totalhigh_pages_add(count);
#endif
- spin_unlock(&managed_page_count_lock);
}
EXPORT_SYMBOL(adjust_managed_page_count);

--
1.9.1


2018-11-08 08:24:25

by Arun KS

Subject: [PATCH v3 3/4] mm: convert totalram_pages and totalhigh_pages variables to atomic

totalram_pages and totalhigh_pages are converted to atomic variables,
accessed through static inline functions.
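
A typical call site then changes from reading the variable to calling
the helper, e.g. (taken from the mm/vmalloc.c hunk below):

- if (count > totalram_pages)
+ if (count > totalram_pages())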

Suggested-by: Michal Hocko <[email protected]>
Suggested-by: Vlastimil Babka <[email protected]>
Signed-off-by: Arun KS <[email protected]>
Reviewed-by: Konstantin Khlebnikov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
---
Coccinelle script used to make most of the changes,

@@
declarer name EXPORT_SYMBOL;
symbol totalram_pages;
expression e;
@@
(
EXPORT_SYMBOL(totalram_pages);
|
- totalram_pages = e
+ totalram_pages_set(e)
|
- totalram_pages += e
+ totalram_pages_add(e)
|
- totalram_pages++
+ totalram_pages_inc()
|
- totalram_pages--
+ totalram_pages_dec()
|
- totalram_pages
+ totalram_pages()
)

@@
symbol totalhigh_pages;
expression e;
@@
(
EXPORT_SYMBOL(totalhigh_pages);
|
- totalhigh_pages = e
+ totalhigh_pages_set(e)
|
- totalhigh_pages += e
+ totalhigh_pages_add(e)
|
- totalhigh_pages++
+ totalhigh_pages_inc()
|
- totalhigh_pages--
+ totalhigh_pages_dec()
|
- totalhigh_pages
+ totalhigh_pages()
)

Manually apply all changes in the following files,

include/linux/highmem.h
include/linux/mm.h
include/linux/swap.h
mm/highmem.c

and for mm/page_alloc.c, manually apply only the changes below,

#include <linux/stddef.h>
#include <linux/mm.h>
+#include <linux/highmem.h>
#include <linux/swap.h>
#include <linux/interrupt.h>
#include <linux/pagemap.h>

/* Protect totalram_pages and zone->managed_pages */
static DEFINE_SPINLOCK(managed_page_count_lock);

-unsigned long totalram_pages __read_mostly;
+atomic_long_t _totalram_pages __read_mostly;
unsigned long totalreserve_pages __read_mostly;
unsigned long totalcma_pages __read_mostly;

---
---
arch/csky/mm/init.c | 4 ++--
arch/powerpc/platforms/pseries/cmm.c | 10 +++++-----
arch/s390/mm/init.c | 2 +-
arch/um/kernel/mem.c | 2 +-
arch/x86/kernel/cpu/microcode/core.c | 2 +-
drivers/char/agp/backend.c | 4 ++--
drivers/gpu/drm/i915/i915_gem.c | 2 +-
drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 4 ++--
drivers/hv/hv_balloon.c | 2 +-
drivers/md/dm-bufio.c | 2 +-
drivers/md/dm-crypt.c | 2 +-
drivers/md/dm-integrity.c | 2 +-
drivers/md/dm-stats.c | 2 +-
drivers/media/platform/mtk-vpu/mtk_vpu.c | 2 +-
drivers/misc/vmw_balloon.c | 2 +-
drivers/parisc/ccio-dma.c | 4 ++--
drivers/parisc/sba_iommu.c | 4 ++--
drivers/staging/android/ion/ion_system_heap.c | 2 +-
drivers/xen/xen-selfballoon.c | 6 +++---
fs/ceph/super.h | 2 +-
fs/file_table.c | 2 +-
fs/fuse/inode.c | 2 +-
fs/nfs/write.c | 2 +-
fs/nfsd/nfscache.c | 2 +-
fs/ntfs/malloc.h | 2 +-
fs/proc/base.c | 2 +-
include/linux/highmem.h | 28 +++++++++++++++++++++++++--
include/linux/mm.h | 27 +++++++++++++++++++++++++-
include/linux/swap.h | 1 -
kernel/fork.c | 2 +-
kernel/kexec_core.c | 2 +-
kernel/power/snapshot.c | 2 +-
mm/highmem.c | 5 ++---
mm/huge_memory.c | 2 +-
mm/kasan/quarantine.c | 2 +-
mm/memblock.c | 4 ++--
mm/mm_init.c | 2 +-
mm/oom_kill.c | 2 +-
mm/page_alloc.c | 20 ++++++++++---------
mm/shmem.c | 8 ++++----
mm/slab.c | 2 +-
mm/swap.c | 2 +-
mm/util.c | 2 +-
mm/vmalloc.c | 4 ++--
mm/workingset.c | 2 +-
mm/zswap.c | 4 ++--
net/dccp/proto.c | 2 +-
net/decnet/dn_route.c | 2 +-
net/ipv4/tcp_metrics.c | 2 +-
net/netfilter/nf_conntrack_core.c | 2 +-
net/netfilter/xt_hashlimit.c | 2 +-
security/integrity/ima/ima_kexec.c | 2 +-
52 files changed, 129 insertions(+), 80 deletions(-)

diff --git a/arch/csky/mm/init.c b/arch/csky/mm/init.c
index dc07c07..66e5970 100644
--- a/arch/csky/mm/init.c
+++ b/arch/csky/mm/init.c
@@ -71,7 +71,7 @@ void free_initrd_mem(unsigned long start, unsigned long end)
ClearPageReserved(virt_to_page(start));
init_page_count(virt_to_page(start));
free_page(start);
- totalram_pages++;
+ totalram_pages_inc();
}
}
#endif
@@ -88,7 +88,7 @@ void free_initmem(void)
ClearPageReserved(virt_to_page(addr));
init_page_count(virt_to_page(addr));
free_page(addr);
- totalram_pages++;
+ totalram_pages_inc();
addr += PAGE_SIZE;
}

diff --git a/arch/powerpc/platforms/pseries/cmm.c b/arch/powerpc/platforms/pseries/cmm.c
index 25427a4..e8d63a6 100644
--- a/arch/powerpc/platforms/pseries/cmm.c
+++ b/arch/powerpc/platforms/pseries/cmm.c
@@ -208,7 +208,7 @@ static long cmm_alloc_pages(long nr)

pa->page[pa->index++] = addr;
loaned_pages++;
- totalram_pages--;
+ totalram_pages_dec();
spin_unlock(&cmm_lock);
nr--;
}
@@ -247,7 +247,7 @@ static long cmm_free_pages(long nr)
free_page(addr);
loaned_pages--;
nr--;
- totalram_pages++;
+ totalram_pages_inc();
}
spin_unlock(&cmm_lock);
cmm_dbg("End request with %ld pages unfulfilled\n", nr);
@@ -291,7 +291,7 @@ static void cmm_get_mpp(void)
int rc;
struct hvcall_mpp_data mpp_data;
signed long active_pages_target, page_loan_request, target;
- signed long total_pages = totalram_pages + loaned_pages;
+ signed long total_pages = totalram_pages() + loaned_pages;
signed long min_mem_pages = (min_mem_mb * 1024 * 1024) / PAGE_SIZE;

rc = h_get_mpp(&mpp_data);
@@ -322,7 +322,7 @@ static void cmm_get_mpp(void)

cmm_dbg("delta = %ld, loaned = %lu, target = %lu, oom = %lu, totalram = %lu\n",
page_loan_request, loaned_pages, loaned_pages_target,
- oom_freed_pages, totalram_pages);
+ oom_freed_pages, totalram_pages());
}

static struct notifier_block cmm_oom_nb = {
@@ -581,7 +581,7 @@ static int cmm_mem_going_offline(void *arg)
free_page(pa_curr->page[idx]);
freed++;
loaned_pages--;
- totalram_pages++;
+ totalram_pages_inc();
pa_curr->page[idx] = pa_last->page[--pa_last->index];
if (pa_last->index == 0) {
if (pa_curr == pa_last)
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 76d0708..5038819 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -59,7 +59,7 @@ static void __init setup_zero_pages(void)
order = 7;

/* Limit number of empty zero pages for small memory sizes */
- while (order > 2 && (totalram_pages >> 10) < (1UL << order))
+ while (order > 2 && (totalram_pages() >> 10) < (1UL << order))
order--;

empty_zero_page = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 134d3fd..64b62a8 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -51,7 +51,7 @@ void __init mem_init(void)

/* this will put all low memory onto the freelists */
memblock_free_all();
- max_pfn = max_low_pfn = totalram_pages;
+ max_pfn = max_low_pfn = totalram_pages();
mem_init_print_info(NULL);
kmalloc_ok = 1;
}
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 99c67ca..8594641 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -434,7 +434,7 @@ static ssize_t microcode_write(struct file *file, const char __user *buf,
size_t len, loff_t *ppos)
{
ssize_t ret = -EINVAL;
- unsigned long totalram_pgs = totalram_pages;
+ unsigned long totalram_pgs = totalram_pages();

if ((len >> PAGE_SHIFT) > totalram_pgs) {
pr_err("too much data (max %ld pages)\n", totalram_pgs);
diff --git a/drivers/char/agp/backend.c b/drivers/char/agp/backend.c
index 38ffb28..004a3ce 100644
--- a/drivers/char/agp/backend.c
+++ b/drivers/char/agp/backend.c
@@ -115,9 +115,9 @@ static int agp_find_max(void)
long memory, index, result;

#if PAGE_SHIFT < 20
- memory = totalram_pages >> (20 - PAGE_SHIFT);
+ memory = totalram_pages() >> (20 - PAGE_SHIFT);
#else
- memory = totalram_pages << (PAGE_SHIFT - 20);
+ memory = totalram_pages() << (PAGE_SHIFT - 20);
#endif
index = 1;

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0c8aa57..6ed0e75 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2539,7 +2539,7 @@ static int i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
* If there's no chance of allocating enough pages for the whole
* object, bail early.
*/
- if (page_count > totalram_pages)
+ if (page_count > totalram_pages())
return -ENOMEM;

st = kmalloc(sizeof(*st), GFP_KERNEL);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 8e2e269..91a8fa4 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -170,7 +170,7 @@ static int igt_ppgtt_alloc(void *arg)
* This should ensure that we do not run into the oomkiller during
* the test and take down the machine wilfully.
*/
- limit = totalram_pages << PAGE_SHIFT;
+ limit = totalram_pages() << PAGE_SHIFT;
limit = min(ppgtt->vm.total, limit);

/* Check we can allocate the entire range */
@@ -1244,7 +1244,7 @@ static int exercise_mock(struct drm_i915_private *i915,
u64 hole_start, u64 hole_end,
unsigned long end_time))
{
- const u64 limit = totalram_pages << PAGE_SHIFT;
+ const u64 limit = totalram_pages() << PAGE_SHIFT;
struct i915_gem_context *ctx;
struct i915_hw_ppgtt *ppgtt;
IGT_TIMEOUT(end_time);
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index cac4945..99bd058 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1090,7 +1090,7 @@ static void process_info(struct hv_dynmem_device *dm, struct dm_info_msg *msg)
static unsigned long compute_balloon_floor(void)
{
unsigned long min_pages;
- unsigned long totalram_pgs = totalram_pages;
+ unsigned long totalram_pgs = totalram_pages();
#define MB2PAGES(mb) ((mb) << (20 - PAGE_SHIFT))
/* Simple continuous piecewiese linear function:
* max MiB -> min MiB gradient
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index dc385b7..8b0b628 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -1887,7 +1887,7 @@ static int __init dm_bufio_init(void)
dm_bufio_allocated_vmalloc = 0;
dm_bufio_current_allocated = 0;

- mem = (__u64)mult_frac(totalram_pages - totalhigh_pages,
+ mem = (__u64)mult_frac(totalram_pages() - totalhigh_pages(),
DM_BUFIO_MEMORY_PERCENT, 100) << PAGE_SHIFT;

if (mem > ULONG_MAX)
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index b8eec51..f3f2ac0 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -2158,7 +2158,7 @@ static int crypt_wipe_key(struct crypt_config *cc)

static void crypt_calculate_pages_per_client(void)
{
- unsigned long pages = (totalram_pages - totalhigh_pages) * DM_CRYPT_MEMORY_PERCENT / 100;
+ unsigned long pages = (totalram_pages() - totalhigh_pages()) * DM_CRYPT_MEMORY_PERCENT / 100;

if (!dm_crypt_clients_n)
return;
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index bb3096b..c12fa01 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -2843,7 +2843,7 @@ static int create_journal(struct dm_integrity_c *ic, char **error)
journal_pages = roundup((__u64)ic->journal_sections * ic->journal_section_sectors,
PAGE_SIZE >> SECTOR_SHIFT) >> (PAGE_SHIFT - SECTOR_SHIFT);
journal_desc_size = journal_pages * sizeof(struct page_list);
- if (journal_pages >= totalram_pages - totalhigh_pages || journal_desc_size > ULONG_MAX) {
+ if (journal_pages >= totalram_pages() - totalhigh_pages() || journal_desc_size > ULONG_MAX) {
*error = "Journal doesn't fit into memory";
r = -ENOMEM;
goto bad;
diff --git a/drivers/md/dm-stats.c b/drivers/md/dm-stats.c
index 21de30b..45b92a3 100644
--- a/drivers/md/dm-stats.c
+++ b/drivers/md/dm-stats.c
@@ -85,7 +85,7 @@ static bool __check_shared_memory(size_t alloc_size)
a = shared_memory_amount + alloc_size;
if (a < shared_memory_amount)
return false;
- if (a >> PAGE_SHIFT > totalram_pages / DM_STATS_MEMORY_FACTOR)
+ if (a >> PAGE_SHIFT > totalram_pages() / DM_STATS_MEMORY_FACTOR)
return false;
#ifdef CONFIG_MMU
if (a > (VMALLOC_END - VMALLOC_START) / DM_STATS_VMALLOC_FACTOR)
diff --git a/drivers/media/platform/mtk-vpu/mtk_vpu.c b/drivers/media/platform/mtk-vpu/mtk_vpu.c
index 616f78b..b660249 100644
--- a/drivers/media/platform/mtk-vpu/mtk_vpu.c
+++ b/drivers/media/platform/mtk-vpu/mtk_vpu.c
@@ -855,7 +855,7 @@ static int mtk_vpu_probe(struct platform_device *pdev)
/* Set PTCM to 96K and DTCM to 32K */
vpu_cfg_writel(vpu, 0x2, VPU_TCM_CFG);

- vpu->enable_4GB = !!(totalram_pages > (SZ_2G >> PAGE_SHIFT));
+ vpu->enable_4GB = !!(totalram_pages() > (SZ_2G >> PAGE_SHIFT));
dev_info(dev, "4GB mode %u\n", vpu->enable_4GB);

if (vpu->enable_4GB) {
diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c
index 9b0b3fa..e6126a4 100644
--- a/drivers/misc/vmw_balloon.c
+++ b/drivers/misc/vmw_balloon.c
@@ -570,7 +570,7 @@ static int vmballoon_send_get_target(struct vmballoon *b)
unsigned long status;
unsigned long limit;

- limit = totalram_pages;
+ limit = totalram_pages();

/* Ensure limit fits in 32-bits */
if (limit != (u32)limit)
diff --git a/drivers/parisc/ccio-dma.c b/drivers/parisc/ccio-dma.c
index 701a7d6..358e380 100644
--- a/drivers/parisc/ccio-dma.c
+++ b/drivers/parisc/ccio-dma.c
@@ -1251,7 +1251,7 @@ void __init ccio_cujo20_fixup(struct parisc_device *cujo, u32 iovp)
** Hot-Plug/Removal of PCI cards. (aka PCI OLARD).
*/

- iova_space_size = (u32) (totalram_pages / count_parisc_driver(&ccio_driver));
+ iova_space_size = (u32) (totalram_pages() / count_parisc_driver(&ccio_driver));

/* limit IOVA space size to 1MB-1GB */

@@ -1290,7 +1290,7 @@ void __init ccio_cujo20_fixup(struct parisc_device *cujo, u32 iovp)

DBG_INIT("%s() hpa 0x%p mem %luMB IOV %dMB (%d bits)\n",
__func__, ioc->ioc_regs,
- (unsigned long) totalram_pages >> (20 - PAGE_SHIFT),
+ (unsigned long) totalram_pages() >> (20 - PAGE_SHIFT),
iova_space_size>>20,
iov_order + PAGE_SHIFT);

diff --git a/drivers/parisc/sba_iommu.c b/drivers/parisc/sba_iommu.c
index c1e599a..e065594 100644
--- a/drivers/parisc/sba_iommu.c
+++ b/drivers/parisc/sba_iommu.c
@@ -1414,7 +1414,7 @@ static int setup_ibase_imask_callback(struct device *dev, void *data)
** for DMA hints - ergo only 30 bits max.
*/

- iova_space_size = (u32) (totalram_pages/global_ioc_cnt);
+ iova_space_size = (u32) (totalram_pages()/global_ioc_cnt);

/* limit IOVA space size to 1MB-1GB */
if (iova_space_size < (1 << (20 - PAGE_SHIFT))) {
@@ -1439,7 +1439,7 @@ static int setup_ibase_imask_callback(struct device *dev, void *data)
DBG_INIT("%s() hpa 0x%lx mem %ldMB IOV %dMB (%d bits)\n",
__func__,
ioc->ioc_hpa,
- (unsigned long) totalram_pages >> (20 - PAGE_SHIFT),
+ (unsigned long) totalram_pages() >> (20 - PAGE_SHIFT),
iova_space_size>>20,
iov_order + PAGE_SHIFT);

diff --git a/drivers/staging/android/ion/ion_system_heap.c b/drivers/staging/android/ion/ion_system_heap.c
index 548bb02..6cb0eeb 100644
--- a/drivers/staging/android/ion/ion_system_heap.c
+++ b/drivers/staging/android/ion/ion_system_heap.c
@@ -110,7 +110,7 @@ static int ion_system_heap_allocate(struct ion_heap *heap,
unsigned long size_remaining = PAGE_ALIGN(size);
unsigned int max_order = orders[0];

- if (size / PAGE_SIZE > totalram_pages / 2)
+ if (size / PAGE_SIZE > totalram_pages() / 2)
return -ENOMEM;

INIT_LIST_HEAD(&pages);
diff --git a/drivers/xen/xen-selfballoon.c b/drivers/xen/xen-selfballoon.c
index 5165aa8..246f612 100644
--- a/drivers/xen/xen-selfballoon.c
+++ b/drivers/xen/xen-selfballoon.c
@@ -189,7 +189,7 @@ static void selfballoon_process(struct work_struct *work)
bool reset_timer = false;

if (xen_selfballooning_enabled) {
- cur_pages = totalram_pages;
+ cur_pages = totalram_pages();
tgt_pages = cur_pages; /* default is no change */
goal_pages = vm_memory_committed() +
totalreserve_pages +
@@ -227,7 +227,7 @@ static void selfballoon_process(struct work_struct *work)
if (tgt_pages < floor_pages)
tgt_pages = floor_pages;
balloon_set_new_target(tgt_pages +
- balloon_stats.current_pages - totalram_pages);
+ balloon_stats.current_pages - totalram_pages());
reset_timer = true;
}
#ifdef CONFIG_FRONTSWAP
@@ -569,7 +569,7 @@ int xen_selfballoon_init(bool use_selfballooning, bool use_frontswap_selfshrink)
* much more reliably and response faster in some cases.
*/
if (!selfballoon_reserved_mb) {
- reserve_pages = totalram_pages / 10;
+ reserve_pages = totalram_pages() / 10;
selfballoon_reserved_mb = PAGES2MB(reserve_pages);
}
schedule_delayed_work(&selfballoon_worker, selfballoon_interval * HZ);
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index c005a54..9a2d861 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -808,7 +808,7 @@ static inline int default_congestion_kb(void)
* This allows larger machines to have larger/more transfers.
* Limit the default to 256M
*/
- congestion_kb = (16*int_sqrt(totalram_pages)) << (PAGE_SHIFT-10);
+ congestion_kb = (16*int_sqrt(totalram_pages())) << (PAGE_SHIFT-10);
if (congestion_kb > 256*1024)
congestion_kb = 256*1024;

diff --git a/fs/file_table.c b/fs/file_table.c
index 6e3c088..ee1bb23 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -380,7 +380,7 @@ void __init files_init(void)
void __init files_maxfiles_init(void)
{
unsigned long n;
- unsigned long totalram_pgs = totalram_pages;
+ unsigned long totalram_pgs = totalram_pages();
unsigned long memreserve = (totalram_pgs - nr_free_pages()) * 3/2;

memreserve = min(memreserve, totalram_pgs - 1);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 0b94b23..2121e71 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -824,7 +824,7 @@ static struct dentry *fuse_get_parent(struct dentry *child)
static void sanitize_global_limit(unsigned *limit)
{
if (*limit == 0)
- *limit = ((totalram_pages << PAGE_SHIFT) >> 13) /
+ *limit = ((totalram_pages() << PAGE_SHIFT) >> 13) /
sizeof(struct fuse_req);

if (*limit >= 1 << 16)
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 586726a..4f15665 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -2121,7 +2121,7 @@ int __init nfs_init_writepagecache(void)
* This allows larger machines to have larger/more transfers.
* Limit the default to 256M
*/
- nfs_congestion_kb = (16*int_sqrt(totalram_pages)) << (PAGE_SHIFT-10);
+ nfs_congestion_kb = (16*int_sqrt(totalram_pages())) << (PAGE_SHIFT-10);
if (nfs_congestion_kb > 256*1024)
nfs_congestion_kb = 256*1024;

diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index e2fe0e9..da52b59 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -99,7 +99,7 @@ static unsigned long nfsd_reply_cache_scan(struct shrinker *shrink,
nfsd_cache_size_limit(void)
{
unsigned int limit;
- unsigned long low_pages = totalram_pages - totalhigh_pages;
+ unsigned long low_pages = totalram_pages() - totalhigh_pages();

limit = (16 * int_sqrt(low_pages)) << (PAGE_SHIFT-10);
return min_t(unsigned int, limit, 256*1024);
diff --git a/fs/ntfs/malloc.h b/fs/ntfs/malloc.h
index ab172e5..5becc8a 100644
--- a/fs/ntfs/malloc.h
+++ b/fs/ntfs/malloc.h
@@ -47,7 +47,7 @@ static inline void *__ntfs_malloc(unsigned long size, gfp_t gfp_mask)
return kmalloc(PAGE_SIZE, gfp_mask & ~__GFP_HIGHMEM);
/* return (void *)__get_free_page(gfp_mask); */
}
- if (likely((size >> PAGE_SHIFT) < totalram_pages))
+ if (likely((size >> PAGE_SHIFT) < totalram_pages()))
return __vmalloc(size, gfp_mask, PAGE_KERNEL);
return NULL;
}
diff --git a/fs/proc/base.c b/fs/proc/base.c
index ce34654..d7fd1ca 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -530,7 +530,7 @@ static ssize_t lstats_write(struct file *file, const char __user *buf,
static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns,
struct pid *pid, struct task_struct *task)
{
- unsigned long totalpages = totalram_pages + total_swap_pages;
+ unsigned long totalpages = totalram_pages() + total_swap_pages;
unsigned long points = 0;

points = oom_badness(task, NULL, NULL, totalpages) *
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 0690679..cea3a01 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -36,7 +36,31 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)

/* declarations for linux/mm/highmem.c */
unsigned int nr_free_highpages(void);
-extern unsigned long totalhigh_pages;
+extern atomic_long_t _totalhigh_pages;
+static inline unsigned long totalhigh_pages(void)
+{
+ return (unsigned long)atomic_long_read(&_totalhigh_pages);
+}
+
+static inline void totalhigh_pages_inc(void)
+{
+ atomic_long_inc(&_totalhigh_pages);
+}
+
+static inline void totalhigh_pages_dec(void)
+{
+ atomic_long_dec(&_totalhigh_pages);
+}
+
+static inline void totalhigh_pages_add(long count)
+{
+ atomic_long_add(count, &_totalhigh_pages);
+}
+
+static inline void totalhigh_pages_set(long val)
+{
+ atomic_long_set(&_totalhigh_pages, val);
+}

void kmap_flush_unused(void);

@@ -51,7 +75,7 @@ static inline struct page *kmap_to_page(void *addr)
return virt_to_page(addr);
}

-#define totalhigh_pages 0UL
+static inline unsigned long totalhigh_pages(void) { return 0UL; }

#ifndef ARCH_HAS_KMAP
static inline void *kmap(struct page *page)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fcf9cc9..d2c1646 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -48,7 +48,32 @@ static inline void set_max_mapnr(unsigned long limit)
static inline void set_max_mapnr(unsigned long limit) { }
#endif

-extern unsigned long totalram_pages;
+extern atomic_long_t _totalram_pages;
+static inline unsigned long totalram_pages(void)
+{
+ return (unsigned long)atomic_long_read(&_totalram_pages);
+}
+
+static inline void totalram_pages_inc(void)
+{
+ atomic_long_inc(&_totalram_pages);
+}
+
+static inline void totalram_pages_dec(void)
+{
+ atomic_long_dec(&_totalram_pages);
+}
+
+static inline void totalram_pages_add(long count)
+{
+ atomic_long_add(count, &_totalram_pages);
+}
+
+static inline void totalram_pages_set(long val)
+{
+ atomic_long_set(&_totalram_pages, val);
+}
+
extern void * high_memory;
extern int page_cluster;

diff --git a/include/linux/swap.h b/include/linux/swap.h
index d8a07a4..ea66108 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -308,7 +308,6 @@ struct vma_swap_readahead {
} while (0)

/* linux/mm/page_alloc.c */
-extern unsigned long totalram_pages;
extern unsigned long totalreserve_pages;
extern unsigned long nr_free_buffer_pages(void);
extern unsigned long nr_free_pagecache_pages(void);
diff --git a/kernel/fork.c b/kernel/fork.c
index 7823f31..ba2c517 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -739,7 +739,7 @@ void __init __weak arch_task_cache_init(void) { }
static void set_max_threads(unsigned int max_threads_suggested)
{
u64 threads;
- unsigned long totalram_pgs = totalram_pages;
+ unsigned long totalram_pgs = totalram_pages();

/*
* The number of threads shall be limited such that the thread
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index dff217c..7c50f56 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -152,7 +152,7 @@ int sanity_check_segment_list(struct kimage *image)
int i;
unsigned long nr_segments = image->nr_segments;
unsigned long total_pages = 0;
- unsigned long totalram_pgs = totalram_pages;
+ unsigned long totalram_pgs = totalram_pages();

/*
* Verify we have good destination addresses. The caller is
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index b0308a2..640b203 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -105,7 +105,7 @@ void __init hibernate_reserved_size_init(void)

void __init hibernate_image_size_init(void)
{
- image_size = ((totalram_pages * 2) / 5) * PAGE_SIZE;
+ image_size = ((totalram_pages() * 2) / 5) * PAGE_SIZE;
}

/*
diff --git a/mm/highmem.c b/mm/highmem.c
index 59db322..107b10f 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -105,9 +105,8 @@ static inline wait_queue_head_t *get_pkmap_wait_queue_head(unsigned int color)
}
#endif

-unsigned long totalhigh_pages __read_mostly;
-EXPORT_SYMBOL(totalhigh_pages);
-
+atomic_long_t _totalhigh_pages __read_mostly;
+EXPORT_SYMBOL(_totalhigh_pages);

EXPORT_PER_CPU_SYMBOL(__kmap_atomic_idx);

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 55478ab..6e88f72 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -420,7 +420,7 @@ static int __init hugepage_init(void)
* where the extra memory used could hurt more than TLB overhead
* is likely to save. The admin can still enable it through /sys.
*/
- if (totalram_pages < (512 << (20 - PAGE_SHIFT))) {
+ if (totalram_pages() < (512 << (20 - PAGE_SHIFT))) {
transparent_hugepage_flags = 0;
return 0;
}
diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index b209dba..5835c0f 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -236,7 +236,7 @@ void quarantine_reduce(void)
* Update quarantine size in case of hotplug. Allocate a fraction of
* the installed memory to quarantine minus per-cpu queue limits.
*/
- total_size = (READ_ONCE(totalram_pages) << PAGE_SHIFT) /
+ total_size = (totalram_pages() << PAGE_SHIFT) /
QUARANTINE_FRACTION;
percpu_quarantines = QUARANTINE_PERCPU_SIZE * num_online_cpus();
new_quarantine_size = (total_size < percpu_quarantines) ?
diff --git a/mm/memblock.c b/mm/memblock.c
index bbd82ab..2aa1598 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1576,7 +1576,7 @@ void __init __memblock_free_late(phys_addr_t base, phys_addr_t size)

for (; cursor < end; cursor++) {
memblock_free_pages(pfn_to_page(cursor), cursor, 0);
- totalram_pages++;
+ totalram_pages_inc();
}
}

@@ -1978,7 +1978,7 @@ unsigned long __init memblock_free_all(void)
reset_all_zones_managed_pages();

pages = free_low_memory_core_early();
- totalram_pages += pages;
+ totalram_pages_add(pages);

return pages;
}
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 6838a53..3391710 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -146,7 +146,7 @@ static void __meminit mm_compute_batch(void)
s32 batch = max_t(s32, nr*2, 32);

/* batch size set to 0.4% of (total memory/#cpus), or max int32 */
- memsized_batch = min_t(u64, (totalram_pages/nr)/256, 0x7fffffff);
+ memsized_batch = min_t(u64, (totalram_pages()/nr)/256, 0x7fffffff);

vm_committed_as_batch = max_t(s32, memsized_batch, batch);
}
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 6589f60..21d4877 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -269,7 +269,7 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc)
}

/* Default to all available memory */
- oc->totalpages = totalram_pages + total_swap_pages;
+ oc->totalpages = totalram_pages() + total_swap_pages;

if (!IS_ENABLED(CONFIG_NUMA))
return CONSTRAINT_NONE;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 22e6645..f8b64cc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -16,6 +16,7 @@

#include <linux/stddef.h>
#include <linux/mm.h>
+#include <linux/highmem.h>
#include <linux/swap.h>
#include <linux/interrupt.h>
#include <linux/pagemap.h>
@@ -124,7 +125,8 @@
/* Protect totalram_pages and zone->managed_pages */
static DEFINE_SPINLOCK(managed_page_count_lock);

-unsigned long totalram_pages __read_mostly;
+atomic_long_t _totalram_pages __read_mostly;
+EXPORT_SYMBOL(_totalram_pages);
unsigned long totalreserve_pages __read_mostly;
unsigned long totalcma_pages __read_mostly;

@@ -4748,11 +4750,11 @@ long si_mem_available(void)

void si_meminfo(struct sysinfo *val)
{
- val->totalram = totalram_pages;
+ val->totalram = totalram_pages();
val->sharedram = global_node_page_state(NR_SHMEM);
val->freeram = global_zone_page_state(NR_FREE_PAGES);
val->bufferram = nr_blockdev_pages();
- val->totalhigh = totalhigh_pages;
+ val->totalhigh = totalhigh_pages();
val->freehigh = nr_free_highpages();
val->mem_unit = PAGE_SIZE;
}
@@ -7065,10 +7067,10 @@ void adjust_managed_page_count(struct page *page, long count)
{
spin_lock(&managed_page_count_lock);
atomic_long_add(count, &page_zone(page)->managed_pages);
- totalram_pages += count;
+ totalram_pages_add(count);
#ifdef CONFIG_HIGHMEM
if (PageHighMem(page))
- totalhigh_pages += count;
+ totalhigh_pages_add(count);
#endif
spin_unlock(&managed_page_count_lock);
}
@@ -7111,9 +7113,9 @@ unsigned long free_reserved_area(void *start, void *end, int poison, char *s)
void free_highmem_page(struct page *page)
{
__free_reserved_page(page);
- totalram_pages++;
+ totalram_pages_inc();
atomic_long_inc(&page_zone(page)->managed_pages);
- totalhigh_pages++;
+ totalhigh_pages_inc();
}
#endif

@@ -7162,10 +7164,10 @@ void __init mem_init_print_info(const char *str)
physpages << (PAGE_SHIFT - 10),
codesize >> 10, datasize >> 10, rosize >> 10,
(init_data_size + init_code_size) >> 10, bss_size >> 10,
- (physpages - totalram_pages - totalcma_pages) << (PAGE_SHIFT - 10),
+ (physpages - totalram_pages() - totalcma_pages) << (PAGE_SHIFT - 10),
totalcma_pages << (PAGE_SHIFT - 10),
#ifdef CONFIG_HIGHMEM
- totalhigh_pages << (PAGE_SHIFT - 10),
+ totalhigh_pages() << (PAGE_SHIFT - 10),
#endif
str ? ", " : "", str ? str : "");
}
diff --git a/mm/shmem.c b/mm/shmem.c
index 6b91eab..649a144 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -109,13 +109,13 @@ struct shmem_falloc {
#ifdef CONFIG_TMPFS
static unsigned long shmem_default_max_blocks(void)
{
- return totalram_pages / 2;
+ return totalram_pages() / 2;
}

static unsigned long shmem_default_max_inodes(void)
{
- unsigned long totalram_pgs = totalram_pages;
- return min(totalram_pgs - totalhigh_pages, totalram_pgs / 2);
+ unsigned long totalram_pgs = totalram_pages();
+ return min(totalram_pgs - totalhigh_pages(), totalram_pgs / 2);
}
#endif

@@ -3275,7 +3275,7 @@ static int shmem_parse_options(char *options, struct shmem_sb_info *sbinfo,
size = memparse(value,&rest);
if (*rest == '%') {
size <<= PAGE_SHIFT;
- size *= totalram_pages;
+ size *= totalram_pages();
do_div(size, 100);
rest++;
}
diff --git a/mm/slab.c b/mm/slab.c
index 2a5654b..bc3de2f 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1248,7 +1248,7 @@ void __init kmem_cache_init(void)
* page orders on machines with more than 32MB of memory if
* not overridden on the command line.
*/
- if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
+ if (!slab_max_order_set && totalram_pages() > (32 << 20) >> PAGE_SHIFT)
slab_max_order = SLAB_MAX_ORDER_HI;

/* Bootstrap is tricky, because several objects are allocated
diff --git a/mm/swap.c b/mm/swap.c
index aa48371..a87bd4c 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -1023,7 +1023,7 @@ unsigned pagevec_lookup_range_nr_tag(struct pagevec *pvec,
*/
void __init swap_setup(void)
{
- unsigned long megs = totalram_pages >> (20 - PAGE_SHIFT);
+ unsigned long megs = totalram_pages() >> (20 - PAGE_SHIFT);

/* Use a smaller cluster for small-memory machines */
if (megs < 16)
diff --git a/mm/util.c b/mm/util.c
index 8bf08b5..4df23d6 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -593,7 +593,7 @@ unsigned long vm_commit_limit(void)
if (sysctl_overcommit_kbytes)
allowed = sysctl_overcommit_kbytes >> (PAGE_SHIFT - 10);
else
- allowed = ((totalram_pages - hugetlb_total_pages())
+ allowed = ((totalram_pages() - hugetlb_total_pages())
* sysctl_overcommit_ratio / 100);
allowed += total_swap_pages;

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 97d4b25..871e41c 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1634,7 +1634,7 @@ void *vmap(struct page **pages, unsigned int count,

might_sleep();

- if (count > totalram_pages)
+ if (count > totalram_pages())
return NULL;

size = (unsigned long)count << PAGE_SHIFT;
@@ -1739,7 +1739,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
unsigned long real_size = size;

size = PAGE_ALIGN(size);
- if (!size || (size >> PAGE_SHIFT) > totalram_pages)
+ if (!size || (size >> PAGE_SHIFT) > totalram_pages())
goto fail;

area = __get_vm_area_node(size, align, VM_ALLOC | VM_UNINITIALIZED |
diff --git a/mm/workingset.c b/mm/workingset.c
index d46f8c9..dcb994f 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -549,7 +549,7 @@ static int __init workingset_init(void)
* double the initial memory by using totalram_pages as-is.
*/
timestamp_bits = BITS_PER_LONG - EVICTION_SHIFT;
- max_order = fls_long(totalram_pages - 1);
+ max_order = fls_long(totalram_pages() - 1);
if (max_order > timestamp_bits)
bucket_order = max_order - timestamp_bits;
pr_info("workingset: timestamp_bits=%d max_order=%d bucket_order=%u\n",
diff --git a/mm/zswap.c b/mm/zswap.c
index cd91fd9..a4e4d36 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -219,8 +219,8 @@ struct zswap_tree {

static bool zswap_is_full(void)
{
- return totalram_pages * zswap_max_pool_percent / 100 <
- DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE);
+ return totalram_pages() * zswap_max_pool_percent / 100 <
+ DIV_ROUND_UP(zswap_pool_total_size, PAGE_SIZE);
}

static void zswap_update_total_size(void)
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index f27daa1..1b4d39b 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -1131,7 +1131,7 @@ static inline void dccp_mib_exit(void)
static int __init dccp_init(void)
{
unsigned long goal;
- unsigned long totalram_pgs = totalram_pages;
+ unsigned long totalram_pgs = totalram_pages();
int ehash_order, bhash_order, i;
int rc;

diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 1c002c0..950613e 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1866,7 +1866,7 @@ void __init dn_route_init(void)
dn_route_timer.expires = jiffies + decnet_dst_gc_interval * HZ;
add_timer(&dn_route_timer);

- goal = totalram_pages >> (26 - PAGE_SHIFT);
+ goal = totalram_pages() >> (26 - PAGE_SHIFT);

for(order = 0; (1UL << order) < goal; order++)
/* NOTHING */;
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 03b51cd..b467a7c 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -1000,7 +1000,7 @@ static int __net_init tcp_net_metrics_init(struct net *net)

slots = tcpmhash_entries;
if (!slots) {
- if (totalram_pages >= 128 * 1024)
+ if (totalram_pages() >= 128 * 1024)
slots = 16 * 1024;
else
slots = 8 * 1024;
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index cd233f6..b4e4dfd 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -2251,7 +2251,7 @@ static __always_inline unsigned int total_extension_size(void)

int nf_conntrack_init_start(void)
{
- unsigned long totalram_pgs = totalram_pages;
+ unsigned long totalram_pgs = totalram_pages();
int max_factor = 8;
int ret = -ENOMEM;
int i;
diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 6cb9a74..2df06c4f 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -274,7 +274,7 @@ static int htable_create(struct net *net, struct hashlimit_cfg3 *cfg,
struct xt_hashlimit_htable *hinfo;
const struct seq_operations *ops;
unsigned int size, i;
- unsigned long totalram_pgs = totalram_pages;
+ unsigned long totalram_pgs = totalram_pages();
int ret;

if (cfg->size) {
diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c
index 16bd187..d6f3280 100644
--- a/security/integrity/ima/ima_kexec.c
+++ b/security/integrity/ima/ima_kexec.c
@@ -106,7 +106,7 @@ void ima_add_kexec_buffer(struct kimage *image)
kexec_segment_size = ALIGN(ima_get_binary_runtime_size() +
PAGE_SIZE / 2, PAGE_SIZE);
if ((kexec_segment_size == ULONG_MAX) ||
- ((kexec_segment_size >> PAGE_SHIFT) > totalram_pages / 2)) {
+ ((kexec_segment_size >> PAGE_SHIFT) > totalram_pages() / 2)) {
pr_err("Binary measurement list too large.\n");
return;
}
--
1.9.1


2018-11-08 08:24:49

by Arun KS

Subject: [PATCH v3 2/4] mm: convert zone->managed_pages to atomic variable

totalram_pages, zone->managed_pages and totalhigh_pages updates
are protected by managed_page_count_lock, but readers never care
about it. Convert these variables to atomic to avoid readers
potentially seeing a store tear.

This patch converts zone->managed_pages. Subsequent patches will
convert totalram_pages and totalhigh_pages, and eventually
managed_page_count_lock will be removed.
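
Readers switch from the raw field to a zone_managed_pages() helper that
wraps atomic_long_read(); a typical converted call site, taken from the
zone_batchsize() hunk below:

- batch = zone->managed_pages / 1024;
+ batch = zone_managed_pages(zone) / 1024;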

Suggested-by: Michal Hocko <[email protected]>
Suggested-by: Vlastimil Babka <[email protected]>
Signed-off-by: Arun KS <[email protected]>
Reviewed-by: Konstantin Khlebnikov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>

---
The main motivation was that managed_page_count_lock handling was
complicating things. It was discussed at length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785
So it seems better to remove the lock and convert the variables
to atomic, with prevention of potential store-to-read tearing as
a bonus.

Most of the changes are done by the coccinelle script below,

@@
struct zone *z;
expression e1;
@@
(
- z->managed_pages = e1
+ atomic_long_set(&z->managed_pages, e1)
|
- e1->managed_pages++
+ atomic_long_inc(&e1->managed_pages)
|
- z->managed_pages
+ zone_managed_pages(z)
)

@@
expression e,e1;
@@
- e->managed_pages += e1
+ atomic_long_add(e1, &e->managed_pages)

@@
expression z;
@@
- z.managed_pages
+ zone_managed_pages(&z)

Then, manually apply the following change to
include/linux/mmzone.h,

- unsigned long managed_pages;
+ atomic_long_t managed_pages;

+static inline unsigned long zone_managed_pages(struct zone *zone)
+{
+ return (unsigned long)atomic_long_read(&zone->managed_pages);
+}

---
---
drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 2 +-
include/linux/mmzone.h | 9 +++++--
lib/show_mem.c | 2 +-
mm/memblock.c | 2 +-
mm/page_alloc.c | 44 +++++++++++++++++------------------
mm/vmstat.c | 4 ++--
6 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 56412b0..c0e55bb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -848,7 +848,7 @@ static int kfd_fill_mem_info_for_cpu(int numa_node_id, int *avail_size,
*/
pgdat = NODE_DATA(numa_node_id);
for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
- mem_in_bytes += pgdat->node_zones[zone_type].managed_pages;
+ mem_in_bytes += zone_managed_pages(&pgdat->node_zones[zone_type]);
mem_in_bytes <<= PAGE_SHIFT;

sub_type_hdr->length_low = lower_32_bits(mem_in_bytes);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 847705a..e73dc31 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -435,7 +435,7 @@ struct zone {
* adjust_managed_page_count() should be used instead of directly
* touching zone->managed_pages and totalram_pages.
*/
- unsigned long managed_pages;
+ atomic_long_t managed_pages;
unsigned long spanned_pages;
unsigned long present_pages;

@@ -524,6 +524,11 @@ enum pgdat_flags {
PGDAT_RECLAIM_LOCKED, /* prevents concurrent reclaim */
};

+static inline unsigned long zone_managed_pages(struct zone *zone)
+{
+ return (unsigned long)atomic_long_read(&zone->managed_pages);
+}
+
static inline unsigned long zone_end_pfn(const struct zone *zone)
{
return zone->zone_start_pfn + zone->spanned_pages;
@@ -814,7 +819,7 @@ static inline bool is_dev_zone(const struct zone *zone)
*/
static inline bool managed_zone(struct zone *zone)
{
- return zone->managed_pages;
+ return zone_managed_pages(zone);
}

/* Returns true if a zone has memory */
diff --git a/lib/show_mem.c b/lib/show_mem.c
index 0beaa1d..eefe67d 100644
--- a/lib/show_mem.c
+++ b/lib/show_mem.c
@@ -28,7 +28,7 @@ void show_mem(unsigned int filter, nodemask_t *nodemask)
continue;

total += zone->present_pages;
- reserved += zone->present_pages - zone->managed_pages;
+ reserved += zone->present_pages - zone_managed_pages(zone);

if (is_highmem_idx(zoneid))
highmem += zone->present_pages;
diff --git a/mm/memblock.c b/mm/memblock.c
index 7df468c..bbd82ab 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1950,7 +1950,7 @@ void reset_node_managed_pages(pg_data_t *pgdat)
struct zone *z;

for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
- z->managed_pages = 0;
+ atomic_long_set(&z->managed_pages, 0);
}

void __init reset_all_zones_managed_pages(void)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 173312b..22e6645 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1279,7 +1279,7 @@ static void __init __free_pages_boot_core(struct page *page, unsigned int order)
__ClearPageReserved(p);
set_page_count(p, 0);

- page_zone(page)->managed_pages += nr_pages;
+ atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
set_page_refcounted(page);
__free_pages(page, order);
}
@@ -2258,7 +2258,7 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
* Limit the number reserved to 1 pageblock or roughly 1% of a zone.
* Check is race-prone but harmless.
*/
- max_managed = (zone->managed_pages / 100) + pageblock_nr_pages;
+ max_managed = (zone_managed_pages(zone) / 100) + pageblock_nr_pages;
if (zone->nr_reserved_highatomic >= max_managed)
return;

@@ -4662,7 +4662,7 @@ static unsigned long nr_free_zone_pages(int offset)
struct zonelist *zonelist = node_zonelist(numa_node_id(), GFP_KERNEL);

for_each_zone_zonelist(zone, z, zonelist, offset) {
- unsigned long size = zone->managed_pages;
+ unsigned long size = zone_managed_pages(zone);
unsigned long high = high_wmark_pages(zone);
if (size > high)
sum += size - high;
@@ -4769,7 +4769,7 @@ void si_meminfo_node(struct sysinfo *val, int nid)
pg_data_t *pgdat = NODE_DATA(nid);

for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
- managed_pages += pgdat->node_zones[zone_type].managed_pages;
+ managed_pages += zone_managed_pages(&pgdat->node_zones[zone_type]);
val->totalram = managed_pages;
val->sharedram = node_page_state(pgdat, NR_SHMEM);
val->freeram = sum_zone_node_page_state(nid, NR_FREE_PAGES);
@@ -4778,7 +4778,7 @@ void si_meminfo_node(struct sysinfo *val, int nid)
struct zone *zone = &pgdat->node_zones[zone_type];

if (is_highmem(zone)) {
- managed_highpages += zone->managed_pages;
+ managed_highpages += zone_managed_pages(zone);
free_highpages += zone_page_state(zone, NR_FREE_PAGES);
}
}
@@ -4985,7 +4985,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
K(zone_page_state(zone, NR_ZONE_UNEVICTABLE)),
K(zone_page_state(zone, NR_ZONE_WRITE_PENDING)),
K(zone->present_pages),
- K(zone->managed_pages),
+ K(zone_managed_pages(zone)),
K(zone_page_state(zone, NR_MLOCK)),
zone_page_state(zone, NR_KERNEL_STACK_KB),
K(zone_page_state(zone, NR_PAGETABLE)),
@@ -5645,7 +5645,7 @@ static int zone_batchsize(struct zone *zone)
* The per-cpu-pages pools are set to around 1000th of the
* size of the zone.
*/
- batch = zone->managed_pages / 1024;
+ batch = zone_managed_pages(zone) / 1024;
/* But no more than a meg. */
if (batch * PAGE_SIZE > 1024 * 1024)
batch = (1024 * 1024) / PAGE_SIZE;
@@ -5756,7 +5756,7 @@ static void pageset_set_high_and_batch(struct zone *zone,
{
if (percpu_pagelist_fraction)
pageset_set_high(pcp,
- (zone->managed_pages /
+ (zone_managed_pages(zone) /
percpu_pagelist_fraction));
else
pageset_set_batch(pcp, zone_batchsize(zone));
@@ -6311,7 +6311,7 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx, int nid,
unsigned long remaining_pages)
{
- zone->managed_pages = remaining_pages;
+ atomic_long_set(&zone->managed_pages, remaining_pages);
zone_set_nid(zone, nid);
zone->name = zone_names[idx];
zone->zone_pgdat = NODE_DATA(nid);
@@ -7064,7 +7064,7 @@ static int __init cmdline_parse_movablecore(char *p)
void adjust_managed_page_count(struct page *page, long count)
{
spin_lock(&managed_page_count_lock);
- page_zone(page)->managed_pages += count;
+ atomic_long_add(count, &page_zone(page)->managed_pages);
totalram_pages += count;
#ifdef CONFIG_HIGHMEM
if (PageHighMem(page))
@@ -7112,7 +7112,7 @@ void free_highmem_page(struct page *page)
{
__free_reserved_page(page);
totalram_pages++;
- page_zone(page)->managed_pages++;
+ atomic_long_inc(&page_zone(page)->managed_pages);
totalhigh_pages++;
}
#endif
@@ -7245,7 +7245,7 @@ static void calculate_totalreserve_pages(void)
for (i = 0; i < MAX_NR_ZONES; i++) {
struct zone *zone = pgdat->node_zones + i;
long max = 0;
- unsigned long managed_pages = zone->managed_pages;
+ unsigned long managed_pages = zone_managed_pages(zone);

/* Find valid and maximum lowmem_reserve in the zone */
for (j = i; j < MAX_NR_ZONES; j++) {
@@ -7281,7 +7281,7 @@ static void setup_per_zone_lowmem_reserve(void)
for_each_online_pgdat(pgdat) {
for (j = 0; j < MAX_NR_ZONES; j++) {
struct zone *zone = pgdat->node_zones + j;
- unsigned long managed_pages = zone->managed_pages;
+ unsigned long managed_pages = zone_managed_pages(zone);

zone->lowmem_reserve[j] = 0;

@@ -7299,7 +7299,7 @@ static void setup_per_zone_lowmem_reserve(void)
lower_zone->lowmem_reserve[j] =
managed_pages / sysctl_lowmem_reserve_ratio[idx];
}
- managed_pages += lower_zone->managed_pages;
+ managed_pages += zone_managed_pages(lower_zone);
}
}
}
@@ -7318,14 +7318,14 @@ static void __setup_per_zone_wmarks(void)
/* Calculate total number of !ZONE_HIGHMEM pages */
for_each_zone(zone) {
if (!is_highmem(zone))
- lowmem_pages += zone->managed_pages;
+ lowmem_pages += zone_managed_pages(zone);
}

for_each_zone(zone) {
u64 tmp;

spin_lock_irqsave(&zone->lock, flags);
- tmp = (u64)pages_min * zone->managed_pages;
+ tmp = (u64)pages_min * zone_managed_pages(zone);
do_div(tmp, lowmem_pages);
if (is_highmem(zone)) {
/*
@@ -7339,7 +7339,7 @@ static void __setup_per_zone_wmarks(void)
*/
unsigned long min_pages;

- min_pages = zone->managed_pages / 1024;
+ min_pages = zone_managed_pages(zone) / 1024;
min_pages = clamp(min_pages, SWAP_CLUSTER_MAX, 128UL);
zone->watermark[WMARK_MIN] = min_pages;
} else {
@@ -7356,7 +7356,7 @@ static void __setup_per_zone_wmarks(void)
* ensure a minimum size on small systems.
*/
tmp = max_t(u64, tmp >> 2,
- mult_frac(zone->managed_pages,
+ mult_frac(zone_managed_pages(zone),
watermark_scale_factor, 10000));

zone->watermark[WMARK_LOW] = min_wmark_pages(zone) + tmp;
@@ -7486,8 +7486,8 @@ static void setup_min_unmapped_ratio(void)
pgdat->min_unmapped_pages = 0;

for_each_zone(zone)
- zone->zone_pgdat->min_unmapped_pages += (zone->managed_pages *
- sysctl_min_unmapped_ratio) / 100;
+ zone->zone_pgdat->min_unmapped_pages += (zone_managed_pages(zone) *
+ sysctl_min_unmapped_ratio) / 100;
}


@@ -7514,8 +7514,8 @@ static void setup_min_slab_ratio(void)
pgdat->min_slab_pages = 0;

for_each_zone(zone)
- zone->zone_pgdat->min_slab_pages += (zone->managed_pages *
- sysctl_min_slab_ratio) / 100;
+ zone->zone_pgdat->min_slab_pages += (zone_managed_pages(zone) *
+ sysctl_min_slab_ratio) / 100;
}

int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *table, int write,
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 6038ce5..9fee037 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -227,7 +227,7 @@ int calculate_normal_threshold(struct zone *zone)
* 125 1024 10 16-32 GB 9
*/

- mem = zone->managed_pages >> (27 - PAGE_SHIFT);
+ mem = zone_managed_pages(zone) >> (27 - PAGE_SHIFT);

threshold = 2 * fls(num_online_cpus()) * (1 + fls(mem));

@@ -1569,7 +1569,7 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
high_wmark_pages(zone),
zone->spanned_pages,
zone->present_pages,
- zone->managed_pages);
+ zone_managed_pages(zone));

seq_printf(m,
"\n protection: (%ld",
--
1.9.1


2018-11-08 08:33:51

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v3 2/4] mm: convert zone->managed_pages to atomic variable

On Thu 08-11-18 13:53:16, Arun KS wrote:
> totalram_pages, zone->managed_pages and totalhigh_pages updates
> are protected by managed_page_count_lock, but readers never care
> about it. Convert these variables to atomic to avoid readers
> potentially seeing a store tear.
>
> This patch converts zone->managed_pages. Subsequent patches will
> convert totalram_pages and totalhigh_pages, and eventually
> managed_page_count_lock will be removed.
>
> Suggested-by: Michal Hocko <[email protected]>
> Suggested-by: Vlastimil Babka <[email protected]>
> Signed-off-by: Arun KS <[email protected]>
> Reviewed-by: Konstantin Khlebnikov <[email protected]>
> Acked-by: Michal Hocko <[email protected]>
> Acked-by: Vlastimil Babka <[email protected]>
>
> ---
> The main motivation was that managed_page_count_lock handling was
> complicating things. It was discussed at length here,
> https://lore.kernel.org/patchwork/patch/995739/#1181785
> So it seems better to remove the lock and convert the variables
> to atomic, with prevention of potential store-to-read tearing as
> a bonus.

Do not be afraid to put this into the changelog. It is much better to
have it there in case anybody wonders in the future and uses git blame,
rather than chasing an email archive to find it in a footnote. The same
applies to the meta patch.

> Most of the changes are done by the coccinelle script below,
>
> @@
> struct zone *z;
> expression e1;
> @@
> (
> - z->managed_pages = e1
> + atomic_long_set(&z->managed_pages, e1)
> |
> - e1->managed_pages++
> + atomic_long_inc(&e1->managed_pages)
> |
> - z->managed_pages
> + zone_managed_pages(z)
> )
>
> @@
> expression e,e1;
> @@
> - e->managed_pages += e1
> + atomic_long_add(e1, &e->managed_pages)
>
> @@
> expression z;
> @@
> - z.managed_pages
> + zone_managed_pages(&z)
>
> Then, manually apply the following change,
> include/linux/mmzone.h
>
> - unsigned long managed_pages;
> + atomic_long_t managed_pages;
>
> +static inline unsigned long zone_managed_pages(struct zone *zone)
> +{
> + return (unsigned long)atomic_long_read(&zone->managed_pages);
> +}
>
--
Michal Hocko
SUSE Labs
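
For readers less familiar with coccinelle, the net effect of the rewrite at a
typical call site can be seen in the following minimal userspace sketch. It is
illustrative only and not part of the patch: struct zone is trimmed to the one
converted field, C11 stdatomic stands in for the kernel's atomic_long_t, and
batch_for() is a made-up caller standing in for the converted sites.

#include <stdatomic.h>
#include <stdio.h>

/* Trimmed stand-in for struct zone: only the converted field is kept. */
struct zone {
	atomic_long managed_pages;	/* was: unsigned long managed_pages; */
};

/*
 * Mirrors the new zone_managed_pages() helper: one lock-free read, cast
 * back to unsigned long so converted call sites keep their old unsigned
 * arithmetic.
 */
static inline unsigned long zone_managed_pages(struct zone *zone)
{
	return (unsigned long)atomic_load(&zone->managed_pages);
}

/*
 * Hypothetical call site: before the rewrite it read
 * zone->managed_pages / 1024; after the coccinelle rule it goes through
 * the helper instead.
 */
static unsigned long batch_for(struct zone *zone)
{
	return zone_managed_pages(zone) / 1024;
}

int main(void)
{
	struct zone z;

	/* Writer side; in the kernel this is atomic_long_set()/_add(). */
	atomic_store(&z.managed_pages, 262144L);
	printf("batch = %lu\n", batch_for(&z));
	return 0;
}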

2018-11-08 08:35:03

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] mm: Remove managed_page_count spinlock

On Thu 08-11-18 13:53:18, Arun KS wrote:
> Now totalram_pages and managed_pages are atomic variables. No need
> for the managed_page_count spinlock.

As explained earlier. Please add a motivation here. Feel free to reuse
wording from http://lkml.kernel.org/r/[email protected]

>
> Signed-off-by: Arun KS <[email protected]>
> Reviewed-by: Konstantin Khlebnikov <[email protected]>
> Acked-by: Michal Hocko <[email protected]>
> Acked-by: Vlastimil Babka <[email protected]>
> ---
> include/linux/mmzone.h | 6 ------
> mm/page_alloc.c | 5 -----
> 2 files changed, 11 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index e73dc31..c71b4d9 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -428,12 +428,6 @@ struct zone {
> * Write access to present_pages at runtime should be protected by
> * mem_hotplug_begin/end(). Any reader who can't tolerant drift of
> * present_pages should get_online_mems() to get a stable value.
> - *
> - * Read access to managed_pages should be safe because it's unsigned
> - * long. Write access to zone->managed_pages and totalram_pages are
> - * protected by managed_page_count_lock at runtime. Idealy only
> - * adjust_managed_page_count() should be used instead of directly
> - * touching zone->managed_pages and totalram_pages.
> */
> atomic_long_t managed_pages;
> unsigned long spanned_pages;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f8b64cc..26c5e14 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -122,9 +122,6 @@
> };
> EXPORT_SYMBOL(node_states);
>
> -/* Protect totalram_pages and zone->managed_pages */
> -static DEFINE_SPINLOCK(managed_page_count_lock);
> -
> atomic_long_t _totalram_pages __read_mostly;
> EXPORT_SYMBOL(_totalram_pages);
> unsigned long totalreserve_pages __read_mostly;
> @@ -7065,14 +7062,12 @@ static int __init cmdline_parse_movablecore(char *p)
>
> void adjust_managed_page_count(struct page *page, long count)
> {
> - spin_lock(&managed_page_count_lock);
> atomic_long_add(count, &page_zone(page)->managed_pages);
> totalram_pages_add(count);
> #ifdef CONFIG_HIGHMEM
> if (PageHighMem(page))
> totalhigh_pages_add(count);
> #endif
> - spin_unlock(&managed_page_count_lock);
> }
> EXPORT_SYMBOL(adjust_managed_page_count);
>
> --
> 1.9.1

--
Michal Hocko
SUSE Labs

2018-11-08 10:05:25

by Arun KS

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] mm: Remove managed_page_count spinlock

On 2018-11-08 14:04, Michal Hocko wrote:
> On Thu 08-11-18 13:53:18, Arun KS wrote:
>> Now totalram_pages and managed_pages are atomic variables. No need
>> for the managed_page_count spinlock.
>
> As explained earlier. Please add a motivation here. Feel free to reuse
> wording from
> http://lkml.kernel.org/r/[email protected]

Sure. Will add in next spin.

Regards,
Arun
>
>>
>> Signed-off-by: Arun KS <[email protected]>
>> Reviewed-by: Konstantin Khlebnikov <[email protected]>
>> Acked-by: Michal Hocko <[email protected]>
>> Acked-by: Vlastimil Babka <[email protected]>
>> ---
>> include/linux/mmzone.h | 6 ------
>> mm/page_alloc.c | 5 -----
>> 2 files changed, 11 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index e73dc31..c71b4d9 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -428,12 +428,6 @@ struct zone {
>> * Write access to present_pages at runtime should be protected by
>> * mem_hotplug_begin/end(). Any reader who can't tolerant drift of
>> * present_pages should get_online_mems() to get a stable value.
>> - *
>> - * Read access to managed_pages should be safe because it's unsigned
>> - * long. Write access to zone->managed_pages and totalram_pages are
>> - * protected by managed_page_count_lock at runtime. Idealy only
>> - * adjust_managed_page_count() should be used instead of directly
>> - * touching zone->managed_pages and totalram_pages.
>> */
>> atomic_long_t managed_pages;
>> unsigned long spanned_pages;
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index f8b64cc..26c5e14 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -122,9 +122,6 @@
>> };
>> EXPORT_SYMBOL(node_states);
>>
>> -/* Protect totalram_pages and zone->managed_pages */
>> -static DEFINE_SPINLOCK(managed_page_count_lock);
>> -
>> atomic_long_t _totalram_pages __read_mostly;
>> EXPORT_SYMBOL(_totalram_pages);
>> unsigned long totalreserve_pages __read_mostly;
>> @@ -7065,14 +7062,12 @@ static int __init cmdline_parse_movablecore(char *p)
>>
>> void adjust_managed_page_count(struct page *page, long count)
>> {
>> - spin_lock(&managed_page_count_lock);
>> atomic_long_add(count, &page_zone(page)->managed_pages);
>> totalram_pages_add(count);
>> #ifdef CONFIG_HIGHMEM
>> if (PageHighMem(page))
>> totalhigh_pages_add(count);
>> #endif
>> - spin_unlock(&managed_page_count_lock);
>> }
>> EXPORT_SYMBOL(adjust_managed_page_count);
>>
>> --
>> 1.9.1

2018-11-08 10:16:21

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] mm: Remove managed_page_count spinlock

On Thu 08-11-18 15:33:06, Arun KS wrote:
> On 2018-11-08 14:04, Michal Hocko wrote:
> > On Thu 08-11-18 13:53:18, Arun KS wrote:
> > > Now totalram_pages and managed_pages are atomic variables. No need
> > > for the managed_page_count spinlock.
> >
> > As explained earlier. Please add a motivation here. Feel free to reuse
> > wording from
> > http://lkml.kernel.org/r/[email protected]
>
> Sure. Will add in next spin.

Andrew usually updates changelogs if you give him the full wording.
I would wait a few days before resubmitting, if that is needed at all.
0day will throw a lot of random configs which can reveal some leftovers.
--
Michal Hocko
SUSE Labs

2018-11-08 10:31:58

by Arun KS

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] mm: Remove managed_page_count spinlock

On 2018-11-08 15:44, Michal Hocko wrote:
> On Thu 08-11-18 15:33:06, Arun KS wrote:
>> On 2018-11-08 14:04, Michal Hocko wrote:
>> > On Thu 08-11-18 13:53:18, Arun KS wrote:
>> > > Now totalram_pages and managed_pages are atomic variables. No need
>> > > for the managed_page_count spinlock.
>> >
>> > As explained earlier. Please add a motivation here. Feel free to reuse
>> > wording from
>> > http://lkml.kernel.org/r/[email protected]
>>
>> Sure. Will add in next spin.
>
> Andrew usually updates changelogs if you give him the full wording.
> I would wait a few days before resubmitting, if that is needed at all.

mm: Remove managed_page_count spinlock

Now that totalram_pages and managed_pages are atomic variables, there is no
need for the managed_page_count spinlock. The lock provided only a weak
consistency guarantee anyway: it was never used for anything but the updates,
and no reader actually cares about all the values being updated in sync.

Signed-off-by: Arun KS <[email protected]>
Reviewed-by: Konstantin Khlebnikov <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
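
To make the weak-consistency argument concrete, here is a minimal userspace
sketch (illustrative only, not part of the series; C11 stdatomic stands in for
the kernel's atomic_long_t, and the per-zone counter is left out):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the two global counters after the conversion. */
static atomic_long _totalram_pages;
static atomic_long _totalhigh_pages;

/*
 * Loosely mirrors adjust_managed_page_count() once the spinlock is gone:
 * each counter update is atomic on its own, so no store can be torn, but a
 * concurrent reader may still observe one counter already adjusted and the
 * other not yet.  That transient skew is acceptable because readers never
 * took managed_page_count_lock in the first place.
 */
static void adjust_counts(long count, bool is_highmem)
{
	atomic_fetch_add(&_totalram_pages, count);
	if (is_highmem)
		atomic_fetch_add(&_totalhigh_pages, count);
}

int main(void)
{
	adjust_counts(256, true);	/* e.g. a highmem range coming online */
	printf("totalram=%ld totalhigh=%ld\n",
	       atomic_load(&_totalram_pages), atomic_load(&_totalhigh_pages));
	return 0;
}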


> 0day will throw a lot of random configs which can reveal some
> leftovers.

Yeah. Fixed a few of them during v3.

Regards,
Arun

2018-11-08 11:45:42

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] mm: reference totalram_pages and managed_pages once per function

On 11/8/18 9:23 AM, Arun KS wrote:
> This patch is in preparation to a later patch which converts totalram_pages
> and zone->managed_pages to atomic variables. Please note that re-reading
> the value might lead to a different value and as such it could lead to
> unexpected behavior. There are no known bugs as a result of the current code
> but it is better to prevent them in principle.

..., which will happen after the atomic conversion in the next patch.

> Signed-off-by: Arun KS <[email protected]>
> Reviewed-by: Konstantin Khlebnikov <[email protected]>
> Acked-by: Michal Hocko <[email protected]>

Acked-by: Vlastimil Babka <[email protected]>

2018-11-09 11:35:07

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v3 1/4] mm: reference totalram_pages and managed_pages once per function

Hi Arun,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.20-rc1 next-20181109]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Arun-KS/mm-convert-totalram_pages-totalhigh_pages-and-managed-pages-to-atomic/20181109-184653
config: i386-randconfig-x003-201844 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

All warnings (new ones prefixed by >>):

net//sctp/protocol.c: In function 'sctp_init':
>> net//sctp/protocol.c:1430:5: warning: 'totalram_pgs' may be used uninitialized in this function [-Wmaybe-uninitialized]
if (totalram_pgs >= (128 * 1024))
^

vim +/totalram_pgs +1430 net//sctp/protocol.c

1363
1364 /* Initialize the universe into something sensible. */
1365 static __init int sctp_init(void)
1366 {
1367 int i;
1368 int status = -EINVAL;
1369 unsigned long goal;
1370 unsigned long limit;
1371 unsigned long totalram_pgs;
1372 int max_share;
1373 int order;
1374 int num_entries;
1375 int max_entry_order;
1376
1377 sock_skb_cb_check_size(sizeof(struct sctp_ulpevent));
1378
1379 /* Allocate bind_bucket and chunk caches. */
1380 status = -ENOBUFS;
1381 sctp_bucket_cachep = kmem_cache_create("sctp_bind_bucket",
1382 sizeof(struct sctp_bind_bucket),
1383 0, SLAB_HWCACHE_ALIGN,
1384 NULL);
1385 if (!sctp_bucket_cachep)
1386 goto out;
1387
1388 sctp_chunk_cachep = kmem_cache_create("sctp_chunk",
1389 sizeof(struct sctp_chunk),
1390 0, SLAB_HWCACHE_ALIGN,
1391 NULL);
1392 if (!sctp_chunk_cachep)
1393 goto err_chunk_cachep;
1394
1395 status = percpu_counter_init(&sctp_sockets_allocated, 0, GFP_KERNEL);
1396 if (status)
1397 goto err_percpu_counter_init;
1398
1399 /* Implementation specific variables. */
1400
1401 /* Initialize default stream count setup information. */
1402 sctp_max_instreams = SCTP_DEFAULT_INSTREAMS;
1403 sctp_max_outstreams = SCTP_DEFAULT_OUTSTREAMS;
1404
1405 /* Initialize handle used for association ids. */
1406 idr_init(&sctp_assocs_id);
1407
1408 limit = nr_free_buffer_pages() / 8;
1409 limit = max(limit, 128UL);
1410 sysctl_sctp_mem[0] = limit / 4 * 3;
1411 sysctl_sctp_mem[1] = limit;
1412 sysctl_sctp_mem[2] = sysctl_sctp_mem[0] * 2;
1413
1414 /* Set per-socket limits to no more than 1/128 the pressure threshold*/
1415 limit = (sysctl_sctp_mem[1]) << (PAGE_SHIFT - 7);
1416 max_share = min(4UL*1024*1024, limit);
1417
1418 sysctl_sctp_rmem[0] = SK_MEM_QUANTUM; /* give each asoc 1 page min */
1419 sysctl_sctp_rmem[1] = 1500 * SKB_TRUESIZE(1);
1420 sysctl_sctp_rmem[2] = max(sysctl_sctp_rmem[1], max_share);
1421
1422 sysctl_sctp_wmem[0] = SK_MEM_QUANTUM;
1423 sysctl_sctp_wmem[1] = 16*1024;
1424 sysctl_sctp_wmem[2] = max(64*1024, max_share);
1425
1426 /* Size and allocate the association hash table.
1427 * The methodology is similar to that of the tcp hash tables.
1428 * Though not identical. Start by getting a goal size
1429 */
> 1430 if (totalram_pgs >= (128 * 1024))
1431 goal = totalram_pgs >> (22 - PAGE_SHIFT);
1432 else
1433 goal = totalram_pgs >> (24 - PAGE_SHIFT);
1434
1435 /* Then compute the page order for said goal */
1436 order = get_order(goal);
1437
1438 /* Now compute the required page order for the maximum sized table we
1439 * want to create
1440 */
1441 max_entry_order = get_order(MAX_SCTP_PORT_HASH_ENTRIES *
1442 sizeof(struct sctp_bind_hashbucket));
1443
1444 /* Limit the page order by that maximum hash table size */
1445 order = min(order, max_entry_order);
1446
1447 /* Allocate and initialize the endpoint hash table. */
1448 sctp_ep_hashsize = 64;
1449 sctp_ep_hashtable =
1450 kmalloc_array(64, sizeof(struct sctp_hashbucket), GFP_KERNEL);
1451 if (!sctp_ep_hashtable) {
1452 pr_err("Failed endpoint_hash alloc\n");
1453 status = -ENOMEM;
1454 goto err_ehash_alloc;
1455 }
1456 for (i = 0; i < sctp_ep_hashsize; i++) {
1457 rwlock_init(&sctp_ep_hashtable[i].lock);
1458 INIT_HLIST_HEAD(&sctp_ep_hashtable[i].chain);
1459 }
1460
1461 /* Allocate and initialize the SCTP port hash table.
1462 * Note that order is initalized to start at the max sized
1463 * table we want to support. If we can't get that many pages
1464 * reduce the order and try again
1465 */
1466 do {
1467 sctp_port_hashtable = (struct sctp_bind_hashbucket *)
1468 __get_free_pages(GFP_KERNEL | __GFP_NOWARN, order);
1469 } while (!sctp_port_hashtable && --order > 0);
1470
1471 if (!sctp_port_hashtable) {
1472 pr_err("Failed bind hash alloc\n");
1473 status = -ENOMEM;
1474 goto err_bhash_alloc;
1475 }
1476
1477 /* Now compute the number of entries that will fit in the
1478 * port hash space we allocated
1479 */
1480 num_entries = (1UL << order) * PAGE_SIZE /
1481 sizeof(struct sctp_bind_hashbucket);
1482
1483 /* And finish by rounding it down to the nearest power of two
1484 * this wastes some memory of course, but its needed because
1485 * the hash function operates based on the assumption that
1486 * that the number of entries is a power of two
1487 */
1488 sctp_port_hashsize = rounddown_pow_of_two(num_entries);
1489
1490 for (i = 0; i < sctp_port_hashsize; i++) {
1491 spin_lock_init(&sctp_port_hashtable[i].lock);
1492 INIT_HLIST_HEAD(&sctp_port_hashtable[i].chain);
1493 }
1494
1495 status = sctp_transport_hashtable_init();
1496 if (status)
1497 goto err_thash_alloc;
1498
1499 pr_info("Hash tables configured (bind %d/%d)\n", sctp_port_hashsize,
1500 num_entries);
1501
1502 sctp_sysctl_register();
1503
1504 INIT_LIST_HEAD(&sctp_address_families);
1505 sctp_v4_pf_init();
1506 sctp_v6_pf_init();
1507 sctp_sched_ops_init();
1508
1509 status = register_pernet_subsys(&sctp_defaults_ops);
1510 if (status)
1511 goto err_register_defaults;
1512
1513 status = sctp_v4_protosw_init();
1514 if (status)
1515 goto err_protosw_init;
1516
1517 status = sctp_v6_protosw_init();
1518 if (status)
1519 goto err_v6_protosw_init;
1520
1521 status = register_pernet_subsys(&sctp_ctrlsock_ops);
1522 if (status)
1523 goto err_register_ctrlsock;
1524
1525 status = sctp_v4_add_protocol();
1526 if (status)
1527 goto err_add_protocol;
1528
1529 /* Register SCTP with inet6 layer. */
1530 status = sctp_v6_add_protocol();
1531 if (status)
1532 goto err_v6_add_protocol;
1533
1534 if (sctp_offload_init() < 0)
1535 pr_crit("%s: Cannot add SCTP protocol offload\n", __func__);
1536
1537 out:
1538 return status;
1539 err_v6_add_protocol:
1540 sctp_v4_del_protocol();
1541 err_add_protocol:
1542 unregister_pernet_subsys(&sctp_ctrlsock_ops);
1543 err_register_ctrlsock:
1544 sctp_v6_protosw_exit();
1545 err_v6_protosw_init:
1546 sctp_v4_protosw_exit();
1547 err_protosw_init:
1548 unregister_pernet_subsys(&sctp_defaults_ops);
1549 err_register_defaults:
1550 sctp_v4_pf_exit();
1551 sctp_v6_pf_exit();
1552 sctp_sysctl_unregister();
1553 free_pages((unsigned long)sctp_port_hashtable,
1554 get_order(sctp_port_hashsize *
1555 sizeof(struct sctp_bind_hashbucket)));
1556 err_bhash_alloc:
1557 sctp_transport_hashtable_destroy();
1558 err_thash_alloc:
1559 kfree(sctp_ep_hashtable);
1560 err_ehash_alloc:
1561 percpu_counter_destroy(&sctp_sockets_allocated);
1562 err_percpu_counter_init:
1563 kmem_cache_destroy(sctp_chunk_cachep);
1564 err_chunk_cachep:
1565 kmem_cache_destroy(sctp_bucket_cachep);
1566 goto out;
1567 }
1568

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (8.47 kB)
.config.gz (29.96 kB)
Download all attachments
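
Judging from the listing above, the warning looks genuine in this
configuration: totalram_pgs is declared at line 1371 but never assigned before
the check at line 1430. The minimal userspace sketch below shows the
"reference once per function" pattern patch 1/4 aims for and the kind of
single assignment the warning points at; it is illustrative only, and
totalram_pages_snapshot() and pick_hash_goal() are made-up names, not the
eventual fix.

#include <stdatomic.h>
#include <stdio.h>

/* Stand-in for the global counter once it becomes atomic later in the series. */
static atomic_long _totalram_pages;

/* Read-side helper, analogous to the accessor style the series moves toward. */
static unsigned long totalram_pages_snapshot(void)
{
	return (unsigned long)atomic_load(&_totalram_pages);
}

/*
 * Take a single snapshot of the global and use only that, so the threshold
 * check and the later shift cannot see two different values.  The assignment
 * on the first line is exactly what the listing above is missing for
 * totalram_pgs.
 */
static unsigned long pick_hash_goal(void)
{
	unsigned long totalram_pgs = totalram_pages_snapshot();	/* single read */

	if (totalram_pgs >= 128UL * 1024)
		return totalram_pgs >> 10;	/* stands in for (22 - PAGE_SHIFT) */
	return totalram_pgs >> 12;		/* stands in for (24 - PAGE_SHIFT) */
}

int main(void)
{
	atomic_store(&_totalram_pages, 4L << 20);	/* pretend 4M pages of RAM */
	printf("goal = %lu\n", pick_hash_goal());
	return 0;
}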

2018-11-09 15:45:20

by Arun KS

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] mm: Remove managed_page_count spinlock

On 2018-11-08 15:44, Michal Hocko wrote:
> On Thu 08-11-18 15:33:06, Arun KS wrote:
>> On 2018-11-08 14:04, Michal Hocko wrote:
>> > On Thu 08-11-18 13:53:18, Arun KS wrote:
>> > > Now totalram_pages and managed_pages are atomic variables. No need
>> > > for the managed_page_count spinlock.
>> >
>> > As explained earlier. Please add a motivation here. Feel free to reuse
>> > wording from
>> > http://lkml.kernel.org/r/[email protected]
>>
>> Sure. Will add in next spin.
>
> Andrew usually updates changelogs if you give him the full wording.
> I would wait a few days before resubmitting, if that is needed at all.
> 0day will throw a lot of random configs which can reveal some
> leftovers.

0day reported one more failure. Will fix that and resend one more version.

Regards,
Arun