I am writing this blog after spending two days working on the malloclab. Finally came out with a piece of shit. I got 85/100 in this lab, which almost “totally references” others’ implementations. I should do more work on C programming. I still have a long way to go. :(
Goals
In this lab, we are going to write our own dynamic memory allocator, including the functionalities of what we usually use: malloc, free, and realloc. We will have to finish four functions in the provided lab handout: mm_init, mm_malloc, mm_free, and mm_realloc. Similar to the original malloc (ptmalloc), our dynamic allocator is an immediate general-purpose allocator, which (1) we may not make any assumption on the data structure we are going to store. (2) have to make allocation immediately after calling the allocation function. (3) data should align with the 8-byte alignment.
We will have a check program mdriver.c, after we do make in the handout directory, we are able to run the driver and test our code on different functionalities. (Caution: the trace file does not provide us with the handout, you may search on the web or use what I found in my CSAPP github repo)
Similar to the syscall that manages the virtual memory: brk, sbrk, and mmap (which we are not going to use), the handout provides us some syscall-like functions in memlib.c: mem_sbrk, mem_heap_lo, mem_heap_hi, mem_heapsize, and mem_pagesize.
Implementations
Implicit Free List
Implicit free list implementation is actually provided by the code in the book. The data structure used in this implementation is as follows:
| header | data block & padding | footer | another header | ... |
By the way, this project doesn’t allow us to define any explicitly stated structure like above, so we may first find out a good way to do the pointer arithmetic.
We are suggested to use the macro to solve this issue. The reason not to use different helper functions to implement pointer arithmetic is that function calls are expensive. Each pointer arithmetic is really simple, but to make a function call needs to allocate a stack, jump to the function body, and jump back. This lab will be going to measure both the memory utilization of the implementation and the throughput. Macros, on the other hand, will do the pointer arithmetic in the compile time. The compiler will transform different macros to the corresponding assembly instructions and replace them everywhere in the binary ELF file.
The book provides us with the macro below. With those macros, we can easily manipulate the heap block data structure.
/* Basic constants and macros */ #define WSIZE 4 /* Word and header/footer size (bytes) */ #define DSIZE 8 /* Double word size (bytes) */ #define CHUNKSIZE (1 << 12) /* Extend heap by this amount (bytes) */
#define MAX(x, y) ((x) > (y) ? (x) : (y))
/* Pack a size and allocated bit into a word */ #define PACK(size, alloc) ((size) | (alloc))
/* Read and write a word at address p */ #define GET(p) (*(unsigned int *)(p)) #define PUT(p, val) (*(unsigned int *)(p) = (val))
/* Read the size and allocated fields from address p */ #define GET_SIZE(p) (GET(p) & ~0x7) #define GET_ALLOC(p) (GET(p) & 0x1)
/* Given block ptr bp, compute address of its header and footer */ #define HDRP(bp) ((char *)(bp) - WSIZE) #define FTRP(bp) ((char *)(bp) + GET_SIZE(HDRP(bp)) - DSIZE)
/* Given block ptr bp, compute address of next and previous blocks */ #define NEXT_BLKP(bp) ((char *)(bp) + GET_SIZE(((char *)(bp) - WSIZE))) #define PREV_BLKP(bp) ((char *)(bp) - GET_SIZE(((char *)(bp) - DSIZE)))
The implementation details will be discussed through the sequence of the calling routine.
mm_init function is straightforward. First open part of the memory region for a default heap data structure: prologue and epilogue. These two data blocks are used to constrain the search will not go ahead of the heap start (prologue) and will not go beyond the heap end (epilogue). Then we extend a CHUNKSIZE of the heap to store values.
/* Extend the empty heap with a free block of CHUNKSIZE bytes */ if (extend_heap(CHUNKSIZE / WSIZE) == NULL) return-1; return0; }
1
| padding | prologue header | prologue footer | heap start from here | ... | epilogue header |
extend_heap function mainly does two things: (1) use sbrk to open new space for heap. (2) make sure the newly opened space will be consistent with the original blocks, which we will use coalesce function for the newly inserted free block.
/* * extend_heap - Extend heap with free block and return its block pointer */ staticvoid *extend_heap(size_t words) { char *bp; size_t size;
/* Allocate an even number of words to maintain alignment */ size = (words % 2) ? (words + 1) * WSIZE : words * WSIZE; if ((long)(bp = mem_sbrk(size)) == -1) returnNULL;
/* Coalesce if the previous block was free */ return coalesce(bp); }
coalesce function will coalesce the previous and next free block to make a continuous big free block. Discuss the condition case by case based on the allocation bit that is stored in the header and footer of the previous and next blocks will be very easy to implement.
mm_malloc function will deal with first aligning the size (including header, footer, and data block size) provided by the user, trying to find a fit block of this size. There are two different search algorithms that will be discussed later: First Fit Search and Best Fit Search. If it fails, try to extend the heap by at least a CHUNKSIZE and return the new ptr as the allocated block.
/* * mm_malloc - Allocate a block with at least size bytes of payload */ void *mm_malloc(size_t size) { size_t asize; /* Adjusted block size */ size_t extendsize; /* Amount to extend heap if no fit */ char *bp;
if (heap_listp == 0) mm_init();
/* Ignore spurious requests */ if (size == 0) returnNULL;
/* Adjust block size to include overhead and alignment reqs. */ asize = ALIGN(size + DSIZE);
/* Search the free list for a fit */ if ((bp = find_fit(asize)) != NULL) { place(bp, asize); return bp; }
/* No fit found. Get more memory and place the block */ extendsize = MAX(asize, CHUNKSIZE); if ((bp = extend_heap(extendsize / WSIZE)) == NULL) returnNULL;
place(bp, asize); return bp; }
place function just to place the newly allocated block, mark it as allocated and do the split if the block is too large to fit.
/* * place - Place block of asize bytes at start of free block bp * and split if remainder would be at least minimum block size */ staticvoidplace(void *bp, size_t asize) { size_t csize = GET_SIZE(HDRP(bp));
/* * mm_realloc - Implemented simply in terms of mm_malloc and mm_free */ void *mm_realloc(void *ptr, size_t size) { size_t oldsize; void *newptr;
/* If size == 0 then this is just free, and we return NULL. */ if (size == 0) { mm_free(ptr); returnNULL; }
/* If oldptr is NULL, then this is just malloc. */ if (ptr == NULL) { return mm_malloc(size); }
/* Copy the old data. */ oldsize = GET_SIZE(HDRP(ptr)); if (ALIGN(size + DSIZE) > oldsize) { newptr = mm_malloc(size); /* If realloc() fails the original block is left untouched */ if (!newptr) { returnNULL; }
The next fit search version of find_fit helper function is shown below. We will need an additional global rover to store what is the last allocated block. We search from that block to the end, then search from the beginning until around back to rover block.
/* * find_fit - Find a fit for a block with asize bytes */ staticvoid *find_fit(size_t asize) { /* Next fit search */ char *oldrover = rover;
/* Search from the rover to the end of list */ for (; GET_SIZE(HDRP(rover)) > 0; rover = NEXT_BLKP(rover)) if (!GET_ALLOC(HDRP(rover)) && (asize <= GET_SIZE(HDRP(rover)))) return rover;
/* search from start of list to old rover */ for (rover = heap_listp; rover < oldrover; rover = NEXT_BLKP(rover)) if (!GET_ALLOC(HDRP(rover)) && (asize <= GET_SIZE(HDRP(rover)))) return rover;
returnNULL; /* no fit found */ }
The result of the test shows that next fit is a lot better than the first fit search (probably because most of the test trace are written in malloc a lot of block at the beginning)
I didn’t implement the best fit search, but I did implement it on my wisc easier version of malloc lab. So the logic is easy and can add on to both the first fit and next fit search. Have an additional variable to store the best size fit and the best fit block pointer. If we find another block fits better than the current one, we replace it with the new one. Once we find a perfect fit (same size), then we terminate the search and return that block.
Segregate Free List
Rather than using the implicit free list, we could make the free list visible to our program so that we can search through the free list instead of the. Though there are still many different implementations on the free list. I choose a relatively more sufficient implementation: segregate free list.
Segregate free list, by definition, is to put multiple free blocks within a range of size into a separate free list. We could have multiple ranges of sizes, here for malloclab, I choose the range cutoff: 16 32 64 128 256 512 1024 2048 4096. Each free block will have a new data structure as follow:
Since new data structure is introduced, we will need new macrod to deal with the free list.
1 2 3 4 5
/* Given free block ptr fbp, read and write prev and next ptr */ #define GET_PREV(fbp) (GET(fbp)) #define SET_PREV(fbp, prev) (PUT((fbp), (prev))) #define GET_NEXT(fbp) (GET((char *)(fbp) + WSIZE)) #define SET_NEXT(fbp, next) (PUT((char *)(fbp) + WSIZE, (next)))
In order to store the free list, we will need additional space in the heap block to store it, so we will first make more room in the heap block to store the list.
sgrgt_list_init function is used to initialize the segregate list before we initialize any prologue and epilogue block in the mm_init function.
/* Shift the start of the heap */ heap_listp += 10 * WSIZE; }
sgrgt_list_insrt and sgrgt_list_remov functions are used to insert and remove a block to/from the segregate list. The implementation is easy, just a double-linked list insert and remove. A block should be inserted if it was coalesced in coalesce function and removed if it was to place as an allocated block in place function or coalesced in coalesce function to make a bigger free block.
if (prev == NULL) { if (next != NULL) SET_PREV(next, NULL); PUT(root, next); } else { if (next != NULL) SET_PREV(next, prev); SET_NEXT(prev, next); } }
With all those helper functions above (and some small logic like setting the prev and next pointer when a free a block, remove the redundant pointer after alloc the block. Check out the CSAPP github repo), we could successfully implement the segregate list version of malloc. The improvement is huge compared to the first version.
gprof is a great tool to find out where we could do to improve the program. It will give a detailed call graph of the program and also some analysis on it. Here is what I have on the segregate free list version
% the percentage of the total running time of the time program used by this function.
cumulative a running sum of the number of seconds accounted seconds for by this function and those listed above it.
self the number of seconds accounted for by this seconds function alone. This is the major sort for this listing.
calls the number of times this function was invoked, if this function is profiled, else blank.
self the average number of milliseconds spent in this ms/call function per call, if this function is profiled, else blank.
total the average number of milliseconds spent in this ms/call function and its descendents per call, if this function is profiled, else blank.
name the name of the function. This is the minor sort for this listing. The index shows the location of the function in the gprof listing. If the index is in parenthesis it shows where it would appear in the gprof listing if it were to be printed.
Copyright (C) 2012-2020 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification, are permitted in any medium without royalty provided the copyright notice and this notice are preserved.
Call graph (explanation follows)
granularity: each sample hit covers 4 byte(s) for 0.78% of 1.28 seconds
This table describes the call tree of the program, and was sorted by the total amount of time spent in each function and its children.
Each entry in this table consists of several lines. The line with the index number at the left hand margin lists the current function. The lines above it list the functions that called this function, and the lines below it list the functions this one called. This line lists: index A unique number given to each element of the table. Index numbers are sorted numerically. The index number is printed next to every function name so it is easier to look up where the function is in the table.
% time This is the percentage of the `total' time that was spent in this function and its children. Note that due to different viewpoints, functions excluded by options, etc, these numbers will NOT add up to 100%.
self This is the total amount of time spent in this function.
children This is the total amount of time propagated into this function by its children.
called This is the number of times the function was called. If the function called itself recursively, the number only includes non-recursive calls, and is followed by a `+' and the number of recursive calls.
name The name of the current function. The index number is printed after it. If the function is a member of a cycle, the cycle number is printed between the function's name and the index number.
For the function's parents, the fields have the following meanings:
self This is the amount of time that was propagated directly from the function into this parent.
children This is the amount of time that was propagated from the function's children into this parent.
called This is the number of times this parent called the function `/' the total number of times the function was called. Recursive calls to the function are not included in the number after the `/'.
name This is the name of the parent. The parent's index number is printed after it. If the parent is a member of a cycle, the cycle number is printed between the name and the index number.
If the parents of the function cannot be determined, the word `<spontaneous>' is printed in the `name' field, and all the other fields are blank.
For the function's children, the fields have the following meanings:
self This is the amount of time that was propagated directly from the child into the function.
children This is the amount of time that was propagated from the child's children to the function.
called This is the number of times the function called this child `/' the total number of times the child was called. Recursive calls by the child are not listed in the number after the `/'.
name This is the name of the child. The child's index number is printed after it. If the child is a member of a cycle, the cycle number is printed between the name and the index number.
If there are any cycles (circles) in the call graph, there is an entry for the cycle-as-a-whole. This entry shows who called the cycle (as parents) and the members of the cycle (as children.) The `+' recursive calls entry shows the number of function calls that were internal to the cycle, and the calls entry for each member shows, for that member, how many times it was called from other members of the cycle.
Copyright (C) 2012-2020 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification, are permitted in any medium without royalty provided the copyright notice and this notice are preserved.
Malloclab is inevitably the hardest lab among all labs in CSAPP. But I learned a lot from it (though my code is still a piece of shit). The next step will be to take look at what real libc ptmalloc is doing by studying its source code. Then we could head to the last lab - proxylab.