Tuesday, January 31, 2012

w6d1: Cold.

No work done, save for thinking of the data model.

Friday, January 27, 2012

w5d5: balloc: compacting; related work

Related work
http://www.usenix.org/event/usenix08/tech/full_papers/craciunas/craciunas_html/

Similar to what I want, except I don't have hard real-time requirements, nor do I want to limit the amount of memory available or use size classes. It was presented around the same time as I was given the task. Their implementation, incl. future work, is similar to what I had thought out at the beginning of my thesis work (circa 2008), in particular storing the handle table in the free list to save memory.

Compacting
Trivial moving compaction (it doesn't yet respect locked handles) that works only within the same k level. It scans for the first (i.e. lowest starting memory address) free slot of the globally largest size, checks for a chunk with a free and a used nibble, moves the used one to the free slot, and merges the two parts of the chunk into a new (partial) chunk of size (k-1).
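
A rough sketch of that compaction step; find_lowest_free_nibble(), find_partial_chunk(), move_used_nibble() and merge_free_halves() are illustrative placeholders, not the prototype's actual functions:

  #include <stddef.h>

  typedef struct chunk  chunk_t;    /* see the nomenclature entry below */
  typedef struct nibble nibble_t;

  extern nibble_t *find_lowest_free_nibble(int k);   /* assumed helpers */
  extern chunk_t  *find_partial_chunk(int k);
  extern void      move_used_nibble(chunk_t *src, nibble_t *dst);
  extern void      merge_free_halves(chunk_t *c);

  /* One compaction step within a single k level, as described above. */
  static void compact_level(int k)
  {
      nibble_t *dst = find_lowest_free_nibble(k);  /* lowest-addressed free slot */
      chunk_t  *src = find_partial_chunk(k);       /* one used + one free nibble */

      if (dst == NULL || src == NULL)
          return;                      /* nothing to compact at this level */

      /* NOTE: locked handles are not yet respected here. */
      move_used_nibble(src, dst);      /* relocate the used nibble */
      merge_free_halves(src);          /* both halves now free: merge them */
  }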

Thursday, January 19, 2012

w4d3: More alloc/free tests and handles.

Test to randomly allocate and free data, verifying that the internal free/used nibble count matches allocations.
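
Roughly along these lines; the used_nibble_count() hook is an assumption about how the internal counters are exposed to the test, and one nibble per allocation is assumed:

  #include <assert.h>
  #include <stddef.h>
  #include <stdlib.h>

  extern void  *balloc(size_t bytes);
  extern void   bfree(void *p);
  extern size_t used_nibble_count(void);   /* assumed test hook */

  /* Randomly allocate and free, checking the internal used count each step. */
  static void test_random_alloc_free(void)
  {
      void  *ptrs[128] = { 0 };
      size_t live = 0;

      for (int i = 0; i < 10000; i++) {
          int slot = rand() % 128;
          if (ptrs[slot] != NULL) {
              bfree(ptrs[slot]);
              ptrs[slot] = NULL;
              live--;
          } else if ((ptrs[slot] = balloc(1 + rand() % 8192)) != NULL) {
              live++;
          }
          assert(used_nibble_count() == live);
      }
  }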

Introduced handles.

Monday, January 16, 2012

w4d1: balloc() lower bounds

Allocation does what it's supposed to, according to the printout of the memory layout. Good!

Chunks and calculating memory usage
Writing a test case for how much RAM is available is not straightforward. Simply counting blocks of the size used for each request does not work, e.g. balloc(5000) uses an 8192-byte nibble, so subtract that from the total amount of memory available and keep going until available memory < the requested nibble size. Consider the following series (starting with 16 MB of RAM):

  1. allocated 5316418 bytes to nibble 0xa25e0008 (k 11)
  2. allocated 3142922 bytes to nibble 0xa2de0008 (k 10)
  3. allocated 479360 bytes to nibble 0xa31e0008 (k 7)
  4. allocated 1359988 bytes to nibble 0xa33e0008 (k 9)
  5. allocated 55739 bytes to nibble 0xa3260008 (k 4)
  6. allocated 232232 bytes to nibble 0xa32a0008 (k 6)
  7. allocated 160914 bytes to nibble 0xa32e0008 (k 6)
  8. allocated 23031 bytes to nibble 0xa3270008 (k 3)
  9. allocated 284332 bytes to nibble 0xa3360008 (k 7)
  10. allocated 208094 bytes to nibble 0xa3320008 (k 6)
  11. allocated 121497 bytes to nibble 0xa3280008 (k 5)
  12. allocated 4092 bytes to nibble 0xa3278008 (k 0)
  13. allocated 11350 bytes to nibble 0xa327c008 (k 2)
  14. allocated 2033 bytes to nibble 0xa3279008 (k 0)
  15. allocated 6196 bytes to nibble 0xa327a008 (k 1)
  16. allocated 431 bytes to nibble (nil) (k 0)
After #15, there are 2048 bytes left and a block of size k=0 is requested. However, as I write this, I realize that you must check for available memory >= the minimum nibble size (4 kB), or the allocation will, of course, fail. But that is not enough. There are situations in which there is enough memory available just from counting allocated blocks, but because of how the blocks are fitted into memory (greedily), there aren't any spare blocks of the appropriate size available. Knapsack problem.

There are lower and upper bounds on how much spill memory (i.e. internal fragmentation) you can have before you can't allocate another minimum-nibble-sized block. My guess is that it's not only related to allocated memory vs. memory left, but also to the distribution of the earlier block allocations. However, as this is still the buddy allocator prototype, I'll test allocation until a semi-arbitrary threshold of block size*2-1, i.e. available memory + the maximum spill in one chunk.

And in the end, it all turned out to be my boundary checking: bytes = up2(size); balloc(bytes); without checking for bytes < MIN_CHUNK_SIZE. Test now passes.
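
For reference, a minimal sketch of the corrected check, assuming up2() rounds a request up to the next power of two and MIN_CHUNK_SIZE is the 4 kB minimum nibble size (alloc_rounded() is just an illustrative wrapper, not a prototype function):

  #include <stddef.h>

  #define MIN_CHUNK_SIZE 4096           /* minimum nibble size, 4 kB */

  extern void *balloc(size_t bytes);

  /* round a request up to the next power of two */
  static size_t up2(size_t size)
  {
      size_t n = 1;
      while (n < size)
          n <<= 1;
      return n;
  }

  static void *alloc_rounded(size_t size, size_t *bytes_out)
  {
      size_t bytes = up2(size);
      if (bytes < MIN_CHUNK_SIZE)       /* the check that was missing */
          bytes = MIN_CHUNK_SIZE;
      *bytes_out = bytes;               /* what the test subtracts from available memory */
      return balloc(bytes);
  }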

Visualizing
...as a flat image is tricky since earlier memory addresses are modified (= split) as smaller chunks are requested. The final memory layout can easily be dumped and drawn, but how useful will it be for performance evaluation?  It gives the layout at the end, but could miss interesting data in the middle.

Thursday, January 5, 2012

w3d3: Meeting and roadmap adjustment

Short-term roadmap adjustments
The prioritized items for the prototype are as follows:
  1. bmalloc()/bfree()
  2. compacting
  3. remove internal dependency on system malloc()/free()
bmalloc()/bfree() will be built using request_chunk(), merge_nibbles() and some glue code.

Compacting requires the four operations bmalloc(), block(), bunlock() and bfree(); handle_t is to be introduced.
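
A sketch of what that interface could look like; only the four operation names come from the plan above, while handle_t being an opaque index into the handle table is my assumption:

  #include <stddef.h>

  typedef unsigned int handle_t;    /* assumed: an index into the handle table */

  handle_t bmalloc(size_t bytes);   /* allocate a movable block */
  void    *block(handle_t h);       /* lock the block, return its current address */
  void     bunlock(handle_t h);     /* unlock; the compactor may move it again */
  void     bfree(handle_t h);       /* release the block and its handle */

With that split, the compactor is free to relocate any block whose handle is not currently locked.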

The internal dependency is for the structures used for data storage. My supervisor pointed out that those are unimportant for the prototype and should therefore be left out.
No code will be kept, so the actual solutions can be ugly/inefficient, as long as the key concepts are there, which, in the case of the prototype buddy allocator, they are.

Benchmarking
There are different ways of measuring the efficiency of the allocator, and there are many definitions of efficiency (with t = time, n = number of alloc/free operations so far):
  • fragmentation over n
  • n before OOM
  • n/t, i.e. how long memory operations take
It's interesting to plot these in the same graph so that design choices giving trade-offs in time can be compared to e.g. n before OOM, if that is a sought-after goal.
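
A minimal sketch of how such numbers could be collected, assuming the pointer-returning balloc()/bfree() prototype API; the request sizes, free ratio and sampling interval are arbitrary choices for illustration:

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  extern void *balloc(size_t bytes);
  extern void  bfree(void *p);

  /* Allocate (and sometimes free) until OOM, printing n and elapsed time
   * so that n before OOM and n/t can be plotted afterwards. */
  static void benchmark(void)
  {
      clock_t start = clock();
      unsigned long n = 0;

      for (;;) {
          void *p = balloc(1 + rand() % 65536);
          if (p == NULL)
              break;                /* OOM: n is the "n before OOM" metric */
          if (rand() % 2)
              bfree(p);             /* free roughly half, keep the rest live */
          n++;
          if (n % 1000 == 0)
              printf("%lu ops, %.3f s\n", n,
                     (double)(clock() - start) / CLOCKS_PER_SEC);
      }
  }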

w3d2: Merge nibbles test case

free(foo) followed immediately by malloc(sizeof(foo)) gives the new pointer the same address as the old one.
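
For reference, the behaviour can be reproduced with the system allocator like this; whether the two addresses actually match is implementation-dependent, which is worth remembering when pointer equality shows up in test expectations:

  #include <stdio.h>
  #include <stdlib.h>

  int main(void)
  {
      void *foo = malloc(64);
      printf("first:  %p\n", foo);
      free(foo);
      void *bar = malloc(64);       /* often hands back the same address */
      printf("second: %p\n", bar);
      free(bar);
      return 0;
  }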

MergeOneLevel (for merge_nibbles) test case done.

Monday, January 2, 2012

w3d1: Merging nibbles

Chunks with nibbles or one nibble per chunk?
Nomenclature:
Block = info and free lists for chunks of size k
Chunk = two nibbles, a and b, each of size k
Nibble = a memory block of size k
k = size index; the size in bytes is MIN_BLOCK_SIZE*2^k
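
A sketch of how this nomenclature could map onto data structures; the flag names appear later in this entry, but the exact layout is a guess, not the prototype's:

  #define MIN_BLOCK_SIZE 4096

  /* flags discussed below */
  #define A_USED   (1u << 0)
  #define B_USED   (1u << 1)
  #define A_SPLIT  (1u << 2)
  #define B_SPLIT  (1u << 3)

  /* Chunk: two nibbles, a and b, each of size MIN_BLOCK_SIZE * 2^k. */
  typedef struct chunk {
      void         *a;          /* nibble a */
      void         *b;          /* nibble b */
      unsigned int  flags;      /* A_USED, B_USED, A_SPLIT, B_SPLIT */
      struct chunk *next;       /* next chunk in the block's free list */
  } chunk_t;

  /* Block: bookkeeping and free list for all chunks of a given k. */
  typedef struct block {
      int      k;               /* size index */
      chunk_t *free_list;
  } block_t;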

Having A and B inside one chunk was the easiest thing, instead of returning two chunks. However, when merging nibbles that come from two different chunks (i.e. merging a B | A' b'), the first and second nibble's parents will be locked up as follows in scenario 1.

p->a => c1->a, c1->b
p->b => c2->a, c2->b
release(c1->b), release(c2->a) => new parent chunk p2 can be created

At this point, p->a will be used as c1->a and p->b will be used as c2->b. The new p2 contains a full k-sized chunk. There are now two (k-1) chunks and one k chunk.

Scenario 2, in which chunks only have one memory blob instead of A and B:

p1->mem => c1->mem, c2->mem
p2->mem => c3->mem, c4->mem
release(c2->mem), release(c3->mem) => new parent chunk p3 can be created
p1->mem blocked by c1
p2->mem blocked by c4

My fear that this would lead to bits of memory never being restored (blocked by new parent chunk) was unfounded, since the same thing will happen in both scenarios.  The advantage of not having A/B in one chunk is the somewhat easier management.

Merging nibbles to a new chunk
Until now there has been a relationship between parent and child chunks, i.e. p->a = c1->a, c1->b. However, as discussed in the scenario above, partially restoring the parent and creating a new one would look like this:

p->a => c1->a, c1->b
p->b => c2->a, c2->b
release(c1->b), release(c2->a) => new parent chunk p2 can be created
p->flags |= A_SECOND_HALF_USED
p->flags |= B_FIRST_HALF_USED

This is because p->a and p->b are only partially free (p->a's first half and p->b's second half) and cannot be used to create new child chunks, and because of that, the flags are only useful if/when they're eventually combined, i.e. A_SECOND_HALF_USED|A_FIRST_HALF_USED. While possible, the occurrence is most likely too rare to justify the extra logic and memory for bookkeeping. Instead, the relationship between the parent and the child is ignored, and the flags are still A_SPLIT, B_SPLIT, A_USED and B_USED. The parent chunk will continue to hold A_SPLIT and B_SPLIT, and the new parent chunk will have blank flags, ready for use. Then, at an appropriate point in time (alloc? free? compact?), each chunk can be checked for children and, if needed, discarded.


In other words, a new parent will be created on each merge with no respect to the previous parent of the merged nibbles, except when the nibbles come from the same chunk. Garbage collection happens at a later point, to keep the logic simpler.
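
To make that concrete, a sketch reusing the hypothetical chunk_t layout from the nomenclature sketch above; merge_into_new_parent() is an illustrative stand-in, not the prototype's merge_nibbles(), and the same-chunk case is my reading of the exception mentioned above:

  #include <stdlib.h>

  chunk_t *merge_into_new_parent(chunk_t *c1, chunk_t *c2)
  {
      if (c1 == c2) {
          /* Both free nibbles belong to the same chunk: keep the existing
           * parent relationship and just mark the chunk fully free. */
          c1->flags &= ~(A_USED | B_USED);
          return c1;
      }

      /* Nibbles from different chunks: create a fresh parent with blank
       * flags.  The old parents keep their A_SPLIT/B_SPLIT flags and are
       * garbage-collected later (on alloc, free or compact). */
      chunk_t *p2 = calloc(1, sizeof *p2);
      if (p2 == NULL)
          return NULL;
      p2->a = c1->b;            /* the freed second half of the first chunk */
      p2->b = c2->a;            /* the freed first half of the second chunk */
      return p2;
  }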