Monday 27 February 2012

w9d1. Locked blocks and unrecoverable fragmentation

CompactLock1 seed 1330343837



O(0xb7cf1008)XXXXXXXXXXXXXXXXo(0xb7cf0ff8)_________________________________o(0xb7cf0fe8)____________________O(0xb7cf0fd8)XXXXXXXXXXXXXXXXXXXXXXXXXO(0xb7cf0fc8)XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXO(0xb7cf0fb8)|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||O(0xb7cf0fa8)XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXO(0xb7cf0f98)XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXo(0xb7cf0f88)______________________________________O(0xb7cf0f78)||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||o(0xb7cf0f68)_____________________________________O(0xb7cf0f58)O(0xb7cf0f48)||||||O(0xb7cf0f38)||||||||||||||||||||||||||||||||||||||o(0xb7cf0f28)________________________________________________________________o(0xb7cf0f18)_____________________________________o(0xb7cf0ea8)___________________________________O(0xb7cf0f08)||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||o(0xb7cf0ef8)__________________o(0xb7cf0ee8)_____________________________________o(0xb7cf0ed8)______________________________________________O(0xb7cf0ec8)|||||||||O(0xb7cf0eb8)||||||||||||||||||||||||||||||||||||||||||||||||O(0xb7cf0e98)|||||||||||||||

The output following the dump:
  1. First free block 0xb7cf0ff8, last free 0xb7cf0fe8, total free 58036
  2. First locked block 0xb7cf0fd8, last locked 0xb7cf0fc8
  3. First used block = 0xb7cf0fb8, last used block = 0xb7cf0fb8, found_used_block = 0
  4. Free and used blocks are not adjacent.
  5. no unlocked blocks
Explanation:
  1. 0xb7cf1008 locked block skipped
  2. 0xb7cf0ff8 - 0xb7cf0fe8 free
  3. 0xb7cf0fd8 - 0xb7cf0fc8 locked
  4. 0xb7cf0fb8 used too large
  5. stop.
The compactor needs to look further for smaller used blocks to move (skipping over potential free blocks!), then continue again from the first free block and re-link the moved blocks. Relinking blocks is always messy.
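
A minimal sketch of that search, with hypothetical types and names (the real block layout differs)::

    #include <stddef.h>

    typedef struct block {
        struct block *next;   /* address-ordered chain of all blocks */
        size_t size;
        int used;             /* 0 = free, 1 = used */
        int locked;           /* locked blocks must not move */
    } block_t;

    /* Given the first free block, walk the rest of the heap and pick the
     * first unlocked used block small enough to fit into the free gap,
     * skipping locked blocks and any free blocks along the way. */
    static block_t *find_movable_used_block(block_t *first_free)
    {
        for (block_t *b = first_free->next; b != NULL; b = b->next) {
            if (!b->used || b->locked)
                continue;                 /* skip free and locked blocks */
            if (b->size <= first_free->size)
                return b;                 /* fits into the gap: move it */
        }
        return NULL;                      /* nothing movable: stop, as in step 5 */
    }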

Monday 13 February 2012

w7d7: Free list became free block slot array



A bunch of commits relating to the free list. Since the top-most block can be locked, there has to be a fallback.  That's handled with a buddy-style slot array of free lists of the corresponding sizes.  Blocks that have become too large for their slot (via merging) are moved to the correct slot by alloc(), but could in theory be handled completely by the compactor. Benchmark task!
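
A minimal sketch of the slot-array idea, with hypothetical names mirroring g_free_block_slots (the real layout lives in the repo)::

    #include <stddef.h>

    #define NUM_SLOTS 32

    typedef struct free_block {
        struct free_block *next;
        size_t size;
    } free_block_t;

    /* slot k holds free blocks of size 2^k <= size < 2^(k+1) */
    static free_block_t *g_free_block_slots[NUM_SLOTS];

    static int log2_floor(size_t n)
    {
        int k = 0;
        while (n >>= 1)
            k++;
        return k;
    }

    /* Insert a freed (or merged) block into the slot matching its size.
     * Merging can grow a block past its slot's upper bound; alloc()
     * re-slots such blocks by simply inserting them again. */
    static void freeblock_insert(free_block_t *block)
    {
        int k = log2_floor(block->size);
        block->next = g_free_block_slots[k];
        g_free_block_slots[k] = block;
    }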


commit 5d1f7237d0d8e800007be6b2960bb8c3cf33c01c
Author: Mikael Jansson <mail@mikael.jansson.be>
Date:   Fri Feb 10 18:33:57 2012 +0100
Replaced g_free_list with g_free_block_slots
Blocks are now stored in linked lists indexed by log2(size), where blocks of 2^k to 2^(k+1) bytes can be found.
There can also exist blocks larger than 2^(k+1), in which case they must be moved (by alloc()) to the appropriate slot immediately. Then, continue with the next item in the list.


commit 1992d2d09ad3b4647d17ff93afb34b2b7908f140
Author: Mikael Jansson <mail@mikael.jansson.be>
Date:   Sun Feb 12 12:51:49 2012 +0100
freeblock_find()
* move too large objects out of the way
* if no other blocks found, use what we did find: shrink, insert rest, return block.

commit c5921bc442996f1e522a9c5aaf7794aa801bd7c2
Author: Mikael Jansson <mail@mikael.jansson.be>
Date:   Sun Feb 12 16:01:50 2012 +0100
freeblock_find()
* new shrink layout [rest|block] instead of [block|rest] to avoid aliasing by code referring to block.
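
A sketch of what that shrink might look like, reusing the hypothetical free_block_t and freeblock_insert() from the slot-array sketch above. With the [rest|block] split, the remainder keeps the original address, so anything still referring to the old block address sees free memory rather than the block handed out::

    /* Split 'block' so the remainder ("rest") keeps the front of the
     * range and the returned block takes the tail: [rest|block].
     * Assumes wanted >= sizeof(free_block_t), the minimum block size. */
    static free_block_t *freeblock_shrink(free_block_t *block, size_t wanted)
    {
        if (block->size < wanted + sizeof(free_block_t))
            return block;             /* too small to split: use it whole */

        free_block_t *rest = block;   /* stays at the original address */
        free_block_t *out =
            (free_block_t *)((char *)block + (block->size - wanted));
        out->size  = wanted;
        rest->size = block->size - wanted;
        freeblock_insert(rest);       /* put the remainder back in its slot */
        return out;
    }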

Wednesday 8 February 2012

w7d3: Current issues with compacting

The free list is stored as an unsorted singly-linked list to save a) space (minimum block size) and b) time (insertion).  Used blocks are linked to from the header list, which is also unsorted.
There are two ways of doing compacting: sliding and arbitrary.

Arbitrary compacting: blocks are moved at random to free spaces. Order of blocks is not preserved.
Sliding compacting: blocks are pushed together (moved by offset) to the free space. Order of blocks is preserved.

Preserving block order gives better cache locality, on the premise that related objects are often allocated close together in time.

Ideally, preserving block order is the better approach. However, there are two constraints that make this hard:

1. Memory is allocated from the top of the heap and no other location.
2. Compacting is allowed to take time, but it needs to be incremental so it can run in a given time slice.

(1) is a perfect match for arbitrary compacting, where the top-most object (if unlocked) is moved to the bottom of the heap, freeing memory. However:

Issues with allocating objects at top of heap
Any object that is locked will effectively limit how much heap is available. Ponder the following allocation (_ = free, O = unlocked, X = locked)::


    [_______OOOO__OOOO_XOOX_______]

In this situation, there will never be a block available larger than what's left to the right of the last X, regardless of how well-compacted the rest of the heap is.  Therefore, simply increasing the top pointer is not a good idea. All available memory needs to be utilized, regardless of compacting strategy.

Greedy sliding, to free up as much memory as possible::


    1. [_______OOOO__OOOO_XOOX_______]
    2. [_________OOOOOOOO_XOOX_______]
    3. [__________OOOOOOOOXOOX_______]

When choosing between segments 1 and 2 (3 is locked between the Xs), pick the one that yields the largest contiguous block of free memory after the move, i.e. move segment 1 to the right.  Disregard any segment with one or more Xs in it. With a slightly different layout, it looks like this::


    1. [_______OOOO__OOOO___OO__X____]: g1 left
    2. [_________OOOOOOOO___OO__X____]: g3 left
    3. [_________OOOOOOOOOO_____X____]: done.

This all requires sorted lists of used and free blocks in order to find the largest contiguous areas not interspersed with locked blocks.  Insertion can be done sorted, for O(log N) in malloc()/free(), or the lists can be sorted inside compact() and re-sorted on each entry. On each subsequent compact step, the list would have to be sorted again, but it would also always be nearly sorted.  Thus, an algorithm can be picked that performs best when the list is already in near-order (e.g. insertion sort).
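
A toy sketch of that greedy choice, under a deliberately simplified model (hypothetical names; each candidate segment is a run of unlocked used blocks bounded by two free gaps, and sliding it into either gap merges the two gaps into one)::

    #include <stddef.h>

    typedef struct segment {
        size_t left_gap;    /* free bytes before the segment */
        size_t right_gap;   /* free bytes after the segment  */
        int    has_locked;  /* contains one or more Xs: immovable */
    } segment_t;

    /* Return the index of the segment whose move yields the largest
     * contiguous free area, or -1 if nothing can be moved. */
    static int pick_segment(const segment_t *segs, int n)
    {
        int best = -1;
        size_t best_free = 0;
        for (int i = 0; i < n; i++) {
            if (segs[i].has_locked)
                continue;                   /* disregard segments with Xs */
            size_t merged = segs[i].left_gap + segs[i].right_gap;
            if (merged > best_free) {
                best_free = merged;
                best = i;
            }
        }
        return best;
    }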

Start at bottom also problematic
It's not certain, however, that we will have an ideal situation; in fact, the first objects allocated tend to live the longest, e.g.:

    1. [X______OOOO__OOOO___OO__X____]: g1 left

which means we cannot allocate from the start.  Instead, we have to allocate from within the free blocks. Which blocks to choose? Just the first one, splitting it into smaller blocks? Would it, possibly, be a better idea to look for the free block closest in size to the requested block?  Searching for a block in an unordered list, however, is O(n). That is too slow.  So, keep a 2^k array (2^k ... heap_size) of linked lists where blocks of *at least* that size are linked in. Possibly kept sorted (binary tree? red-black tree? what techniques are there?).

Then, finding a suitable block would be heap[log2(size)]: grab the first block, split it between used and free, relink the list, done.  Similar to the buddy allocator.  However, since there will be a split in any event, why even bother looking for the smallest block? Might as well take any block. No: the block has to fit.  That's why blocks have to be stored in the 2^k array, but the internal order of each list doesn't matter.
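
A sketch of that lookup, reusing the hypothetical free_block_t, log2_floor() and NUM_SLOTS definitions from the w7d7 sketch above (note that the first block in slot log2(size) can still be smaller than the request, hence the fit check and the fall-through to higher slots, where every block is guaranteed large enough)::

    static free_block_t *heap[NUM_SLOTS];   /* heap[log2(size)] in the text */

    static free_block_t *find_block(size_t size)
    {
        for (int k = log2_floor(size); k < NUM_SLOTS; k++) {
            free_block_t **prev = &heap[k];
            for (free_block_t *b = *prev; b != NULL; prev = &b->next, b = *prev) {
                if (b->size >= size) {
                    *prev = b->next;    /* unlink from its slot */
                    return b;           /* caller splits used/free, relinks */
                }
            }
        }
        return NULL;
    }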

Friday 3 February 2012

w6d5: Header fragmentation

Continuously growing header list
Growing the header list is O(1) but will cause holes when headers are in state FREE_BLOCK (kept around for later compacting) or UNUSED. Since handles are mapped 1:1 with headers, the headers cannot be compacted.  Scanning the list for a free header to use in malloc() would take O(n) time.

By spending an extra integer in the header structure, a monotonically increasing number can be assigned to each new handle and copied into its header.  When compacting, the headers will be sorted by that key.  Looking up a handle in lock()/unlock() will then be O(log n). Is that acceptable?
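
A sketch of that lookup under the stated assumptions (hypothetical types; compact() is assumed to keep the header array sorted by key, e.g. with qsort())::

    #include <stdlib.h>

    typedef struct header {
        unsigned key;        /* monotonically increasing, set at alloc */
        void *memory;
        size_t size;
    } header_t;

    /* bsearch() comparator: key object first, array element second. */
    static int cmp_key(const void *k, const void *elem)
    {
        unsigned ka = *(const unsigned *)k;
        unsigned kb = ((const header_t *)elem)->key;
        return (ka > kb) - (ka < kb);
    }

    /* lock()/unlock() resolve a handle's integer to its header in O(log n). */
    static header_t *header_lookup(header_t *headers, size_t count, unsigned key)
    {
        return bsearch(&key, headers, count, sizeof(header_t), cmp_key);
    }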

According to http://jfdube.wordpress.com/2012/01/13/memory-management-part-4-allocators-overhead-waste-and-fragmentation/, the overhead is 8-24 bytes per chunk. I'm up to 20 bytes right now.

Could the address of the header be cached in the handle for as long as a compact hasn't happened yet? Then, when the compact happens, all handles are updated with the new addresses.  But caching the address alongside the integer requires actual storage somewhere, and we're back to square one with an actual lookup having to be done. Rats.

A hash function mapping <handle-integer> to <header> in O(1) that is rewritten on each compact? See http://burtleburtle.net/bob/hash/doobs.html for more on hashes -- it has a thorough discussion. Also Paul Hsieh's quick hash, used in WebKit: http://www.azillionmonkeys.com/qed/hash.html

To consider: to get rid of the extra integer, is it possible to combine the handle and memory block addresses and use that for addressing, updating the handle according to an invariant, e.g. that only one of them will ever be moved at a time, or similar?

Don't grow the header list, don't scan
Since we don't care about any particular order of the headers, only that the entire table remains compact, getting any header is fine.

Therefore, use the fact that UNUSED headers don't use their pointer field to keep track of memory blocks, and instead maintain a header free list by reusing that pointer as the next pointer!  This requires two extra pointers: one each to track the start and the end of the free list.  New headers are grabbed from the start and freed ones are appended to the end.  If the free list is empty, grow the header list as before.

At the compact() step, all FREE_BLOCK headers turn into UNUSED and are inserted into the free list.
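
A sketch of that free list with hypothetical names (the pointer field doubles as the next link while a header is UNUSED)::

    #include <stddef.h>

    enum { UNUSED, USED, FREE_BLOCK };

    typedef struct header {
        void *memory;    /* block pointer, or next free header when UNUSED */
        size_t size;
        int state;
    } header_t;

    static header_t *g_free_header_head;   /* grab new headers here  */
    static header_t *g_free_header_tail;   /* append freed ones here */

    static header_t *header_new(void)
    {
        header_t *h = g_free_header_head;
        if (h == NULL)
            return NULL;               /* empty: grow the list as before */
        g_free_header_head = (header_t *)h->memory;
        if (g_free_header_head == NULL)
            g_free_header_tail = NULL;
        return h;
    }

    /* compact() calls this for every FREE_BLOCK header it retires. */
    static void header_release(header_t *h)
    {
        h->state = UNUSED;
        h->memory = NULL;
        if (g_free_header_tail != NULL)
            g_free_header_tail->memory = h;
        else
            g_free_header_head = h;
        g_free_header_tail = h;
    }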

w6d4: Workings and growing header list

Memory layout and algorithm
memory layout (see the sketch after these lists):
- header list grows from top of heap and downwards
- objects grow from bottom of heap and upwards

header:
- lock type: unlocked = 0, locked = 1, weak = 2
- pointer to memory block
- size of memory block

free memory block:
- 4 bytes at N-4: header address, points back to this block
- 4 bytes at N-8: next free memory block.

handle:
- alias of the header, an opaque type. This means the header block cannot be compacted!

header list:
- insert new at the first free location

possibly:
- keep a bitmap (64 bytes) to narrow down the search for the first free header
- 64*8 = 512 bits, initially 8 headers per bit = 4096 entries.
- bitmap scaled to the number of objects: when the limit is reached, double the headers per bit and scale down (b0 = b0*b1, b2 = b2*b3, ...).
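
A sketch of those structures, assuming the 32-bit build that the 4-byte offsets imply (names illustrative, not the repo's)::

    #include <stdint.h>

    typedef struct header {
        uint32_t lock;     /* unlocked = 0, locked = 1, weak = 2 */
        void    *memory;   /* pointer to the memory block */
        uint32_t size;     /* size of the memory block */
    } header_t;

    /* A free memory block of N bytes ends with two 4-byte fields:
     * [ ... | next free block @ N-8 | header back-pointer @ N-4 ] */
    static header_t *freeblock_header(void *block, uint32_t n)
    {
        return *(header_t **)((char *)block + n - 4);
    }

    static void *freeblock_next(void *block, uint32_t n)
    {
        return *(void **)((char *)block + n - 8);
    }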

Current issues
The header is mapped 1:1 with the handle. There must be a header for each block in the free list (for identity when merging with the previous block).  This means that between compacts, the header list will always grow, since there are by definition never any unused headers.

This means that in the current setup, the header list will never be fully compacted.