Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c - Analyzing memory mapping of a process with pmap. [stack]

I'm trying to understand how the stack works in Linux. I read the AMD64 ABI sections about the stack and process initialization, and it is not clear how the stack should be mapped. Here is the relevant quote (3.4.1):

Stack State

This section describes the machine state that exec (BA_OS) creates for new processes.

and

It is unspecified whether the data and stack segments are initially mapped with execute permissions or not. Applications which need to execute code on the stack or data segments should take proper precautions, e.g., by calling mprotect().

So I can deduce from the quotes above that the stack is mapped (it is unspecified if PROT_EXEC is used to create the mapping). Also the mapping is created by exec.

The question is whether the "main thread"'s stack uses MAP_GROWSDOWN | MAP_STACK mapping or maybe even via sbrk?

Looking at pmap -x <pid> the stack is marked with [stack] as

00007ffc04c78000     132      12      12 rw---   [ stack ]

Creating a mapping as

mmap(NULL, 4096,
     PROT_READ | PROT_WRITE,
     MAP_ANONYMOUS | MAP_PRIVATE | MAP_STACK,
     -1, 0);

simply creates an anonymous mapping, which pmap -x <pid> shows as

00007fb6e42fa000       4       0       0 rw---   [ anon ]

1 Reply


I can deduce from the quotes above that the stack is mapped

That literally just means that memory is allocated. i.e. that there is a logical mapping from those virtual addresses to physical pages. We know this because you can use a push or call instruction in _start without making a system call from user-space to allocate a stack.

In fact the x86-64 System V ABI specifies that argc, argv, and envp are on the stack at process startup.

The question is whether the "main thread"'s stack uses MAP_GROWSDOWN | MAP_STACK mapping or maybe even via sbrk?

The ELF binary loader sets the VM_GROWSDOWN flag for the main thread's stack, but not the MAP_STACK flag. This is code inside the kernel, and it does not go through the regular mmap system call interface.

(Nothing in user-space uses mmap(MAP_GROWSDOWN), so normally the main thread's stack is the only mapping that has the VM_GROWSDOWN flag inside the kernel.)

Inside the kernel, the flag on the stack's virtual memory area (VMA) is named VM_GROWSDOWN. In case you're interested, here are all the flags that are used for the main thread's stack: VM_GROWSDOWN, VM_READ, VM_WRITE, VM_MAYREAD, VM_MAYWRITE, and VM_MAYEXEC. In addition, if the ELF binary is specified to have an executable stack (e.g., by compiling with gcc -z execstack), the VM_EXEC flag is also used. Note that on architectures that support stacks that grow upwards, VM_GROWSUP is used instead of VM_GROWSDOWN if the kernel was compiled with CONFIG_STACK_GROWSUP defined. The line of code where these flags are specified in the Linux kernel can be found here.

/proc/.../maps and pmap don't use the VM_GROWSDOWN flag; they rely on address comparison instead. Therefore they may not be able to determine the exact range of the virtual address space that the main thread's stack occupies (see an example). On the other hand, /proc/.../smaps looks for the VM_GROWSDOWN flag and marks each memory region that has this flag as gd. (Although it seems to ignore VM_GROWSUP.)

All of these tools/files ignore the MAP_STACK flag. In fact, the whole Linux kernel ignores this flag (which is probably why the program loader doesn't set it). User-space only passes it for future-proofing, in case the kernel does want to start treating thread-stack allocations specially.


sbrk makes no sense here; the stack isn't contiguous with the "break", and the brk heap grows upward toward the stack anyway. Linux puts the stack very near the top of virtual address space. So of course the primary stack couldn't be allocated with (the in-kernel equivalent of) sbrk.


And no, nothing uses MAP_GROWSDOWN, not even secondary thread stacks, because it can't in general be used safely.

The mmap(2) man page, which says MAP_GROWSDOWN is "used for stacks", is laughably out of date and misleading. See How to mmap the stack for the clone() system call on linux?. Ulrich Drepper explained in 2008 that code using MAP_GROWSDOWN is typically broken, and proposed removing the flag from Linux mmap and from glibc headers. (This obviously didn't happen, but pthreads hasn't used it since well before then, if ever.)


MAP_GROWSDOWN sets the VM_GROWSDOWN flag for the mapping inside the kernel. The main thread's stack also uses that flag to enable the growth mechanism, so a thread stack may be able to grow the same way the main stack does: arbitrarily far (up to ulimit -s?) if the stack pointer is below the page fault location. (Linux does not require "stack probes" to touch every page of a large multi-page stack array or alloca.)

(Thread stacks are fully allocated up front; only normal lazy allocation of physical pages to back that virtual allocation avoids wasting space for thread stacks.)

A MAP_GROWSDOWN mapping can also grow the way the mmap man page describes: access to the "guard page" below the lowest mapped page will also trigger growth, even if that's below the bottom of the red zone.

But the main thread's stack has magic you don't get with mmap(MAP_GROWSDOWN). It reserves the growth space up to ulimit -s to prevent random choice of mmap address from creating a roadblock to stack growth. That magic is only available to the in-kernel program-loader which maps the main thread's stack during execve(), making it safe from an mmap(NULL, ...) randomly blocking future stack growth.

mmap(MAP_FIXED) could still create a roadblock for the main stack, but if you use MAP_FIXED you're 100% responsible for not breaking anything. (Perhaps an unlimited stack cannot grow beyond the initial 132 KiB if MAP_FIXED is involved?) MAP_FIXED will replace existing mappings and reservations, but anything else will treat the main thread's stack-growth space as reserved. (I think that's true; it's worth testing with MAP_FIXED_NOREPLACE or just a non-NULL hint address.)


pthread_create doesn't use MAP_GROWSDOWN for thread stacks, and neither should anyone else. In general, avoid the flag. Linux pthreads by default allocates the full size for a thread stack. This costs virtual address space but (until it's actually touched) not physical pages.

The inconsistent results in comments on Why is MAP_GROWSDOWN mapping does not grow? (some people finding it works, some finding it still segfaults when touching the return value and the page below) sound like https://bugs.centos.org/view.php?id=4767. MAP_GROWSDOWN may even be buggy when used outside of the way the standard main-stack VM_GROWSDOWN mapping is used.

