Review of ptmalloc2
Since we are done with our own dynamic memory allocator in the malloc lab a while ago, it’s good to look at the real one used in GNU C Library (glibc): ptmalloc2
. It will help us better understand how real world allocator worked compare to our simple naive allocator. But I have to admit that the lab is designed so well so that we could make it really close to the real world allocator if we spend two more weeks in this lab to optimize the performace.
Allocator Overview
There are varies dynamic memory allocator in use. Alough every allocators claim they are fast and general use allocators, not all allocator could fit your application. Or in other words, each of them have slightly different implementations which may have different performance issues. But in this blog, we will only talk about glibc allocator ptmalloc2
1 | dlmalloc – General purpose allocator |
ptmalloc2
was forked form dlmalloc
. After fork, multi-thread concurrent allocation support was added to it and released at 2006. After the official release, ptmalloc2
was integrated/merged into glibc source code. Since then, code changes were made directly to glibc malloc source code itself.
Heap
Generally speaking, heap is a part of memory (or a memory segment) that avaliable for program to dynamically allocate or free. Many run time data structures are stored in the heap because their size is highly depend on run time workload. In the context of currency, each thread will have their own heap managed by the heap allocator so threads will not interupt each other when allocating memory.
There are two ways to ask operating system for more heap space: brk&sbrk
and mmap
. The details of these two system calls can be found on their man page: brk&sbrk and mmap.
Specifically, in ptmalloc2
‘s data structure, heap usually indicate a part of contiguous memory containing malloc_chunk
. heap_info
indicates the start of a heap, or a heap header. It contains all the information for this part of contiguous blocks of memory.
1 | /* A heap is a single contiguous memory region holding (coalesceable) |
Arena
Arena contains the information for a heap allocation of a thread (could be main thread or any other thread) or process, for example, information about bins, top chunk, last remainder chunk… There are two different arena used in ptmalloc2
: main_arena
and non_main_arena
, also mean thread_arena
.
Each arena is connected by a circular linked list. They use mutex lock to ensure that only one thread will be access this arena. When a thread call malloc
to allocate memory, ptmalloc2
will first check if that thread already have an arena. If it does have its own thread arena, it will try to access that arena and lock it. If it fails, it will traverse the circular linked list to find any possiable arena that not being locked by other threads. If it still can’t find it, a new arena will be created and inserted to the circular linked list.
1 | struct malloc_state |
Chunks
According to glibc source code (latest version glibc 2.37), a malloc chunk have mainly three parts: size
for both previous chunk and current chunk, ptr
for doubly linked free list, and next size
for adjacent (in the doubly linked list) chunks in doubly linked list.
1 | /* |
The struct definition is a bit abstract and misleading for us to truly understand how the chunk is manipulated in the memory. So glibc also provide us with a more detailed documentation for allocated chunks and free chunks.
Allocated Chunks
1 | chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
prev_size
: Previous chunk size. This part will be user data if the previous chunk is allocated, which indicate by the P
bit.
size
: Current chunk size
To increase CPU performance when access memory, ptmalloc2
choose to use 8 byte alignment, which gives us 3 unuse bits for bit indicators:
P
(PREV_INUSE) bit: if the previous block is usedM
(IS_MMAPED) bit: if the chunk allocated viammap
A
(NON_MAIN_ARENA) bit: if this chunk is not inmain_arena
. If it is set, this chunk belongs tothread_arena
Free Chunks
1 | chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
Similar to allocated chunks, free chunks also have the similar fields:
prev_size
: Previous chunk size. This part will be user data if the previous chunk is allocated, which indicate by the P
bit.
size
: Current chunk size
fd_ptr
& bk_ptr
: indicate adjacent free chunks in the doubly linked free list.
Bit map is the same.
Bins
Bins are used to locate free or unallocated chunks, or explicit free lists. As shown in the Free Chunks section, each bin is doubly linked. There are total 128 bins for different free chunk sizes. Free chunks in the bins are kept in size order. Since each bin is relatively small, the traversal cost can be ignored. For more details, we could check the comments in the source code:
1 | /* |
We could see that it is very similar to the free list we have done in the malloc lab.
1 | /* |
There are 4 kinds of bins maintained by ptmalloc2
:
- Fast bin
- Small bin
- Large bin
- Unsorted bin
TCache & Fast Bin
1 | /* |
Small Bin
Large Bin
Unsorted Bin
Allocation
Free
Concurrent Allocation
Reference
Source code:
https://elixir.bootlin.com/glibc/glibc-2.37/source/malloc/malloc.c
https://elixir.bootlin.com/glibc/glibc-2.37/source/malloc/arena.c
Other’s blog for glibc malloc:
https://sploitfun.wordpress.com/2015/02/10/understanding-glibc-malloc/
https://azeria-labs.com/heap-exploitation-part-2-glibc-heap-free-bins/
https://ctf-wiki.org/en/pwn/linux/user-mode/heap/ptmalloc2/implementation/overview/