Sunday, September 8, 2013

Memory block access locking heuristics

Locking and unlocking objects is the main reason for rmalloc's existence. Skipping locking in the benchmarks would skew the results favorably, since every block could then be moved at will and defragmentation would always be perfect.

Figuring out when an object should be locked within its lifetime (its "micro lifetime") is tricky. What I've done so far over the whole lifetime (the "macro lifetime") is to compute the ratio of an object's own ops to other objects' ops, producing a number between 0 and 1 that gives an average over the entire lifetime. With a threshold at 50%, all objects below the threshold are never locked and all objects above it are always locked.
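In code, that heuristic boils down to roughly the following (a minimal sketch; the names are mine, not rmalloc's actual code):

def macro_lock_decision(own_ops, other_ops, threshold=0.5):
    # own_ops: number of ops touching this object during its lifetime
    # other_ops: number of ops touching other objects in the same span
    ratio = own_ops / float(own_ops + other_ops)
    return ratio >= threshold   # True: always locked, False: never locked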

That's a gross simplification. What I want to do with the micro lifetime is to lock and unlock the object several times within its lifetime. That can be reduced to the following problem: in which intervals does an object T "dominate" over non-T objects? E.g. in a series of ops looking like this:

TLTLLLTTLTLTLTTTTTTTTTTTLLLLLLLLTTLLTTTLLLLLLTLLLLLLTT

Using fixed-width windows, this is easy, but that's not what I want to do. Thinking about how to make the windows dynamic in length, I considered letting the window "float" and "sink" depending on the current op. After a bit of tinkering, I came up with:

FLOAT_SPEED = 1.0   # how fast the window rises on an op touching the needle
SINK_SPEED = 0.5    # how fast it sinks on any other op
NEEDLE = 0          # the value marking ops on the object of interest

def dominance(xs):
    # For each op in xs, mark whether the needle object dominates (1) or not (0).
    life = 0
    dom = []
    for x in xs:
        if x == NEEDLE:
            life += FLOAT_SPEED
        else:
            life -= SINK_SPEED
        if life < 0:
            life = 0

        # The object dominates as long as the window stays above the surface.
        dom.append(1 if life > 0 else 0)

    return dom

Applying this to a series of random values in [0, 3], I get the following pretty graph:


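For reference, a graph like that can be produced along these lines, reusing the dominance() function above (the plotting code is my reconstruction, assuming matplotlib; it's not from rmalloc):

import random
import matplotlib.pyplot as plt

# Ops drawn uniformly from [0, 3]; NEEDLE (0) is the object of interest,
# so on average a quarter of the ops touch it.
xs = [random.randint(0, 3) for _ in range(200)]
dom = dominance(xs)

# 1 = the object dominates (keep it locked), 0 = it doesn't (unlock).
plt.step(range(len(dom)), dom, where='post')
plt.ylim(-0.1, 1.1)
plt.xlabel('op')
plt.ylabel('dominance')
plt.show()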
Alas, this needs to be done for each needle (i.e. each handle) over its lifetime. That's horribly slow (O(n²)), and my previous optimization of caching the number of ops within an object's entire lifetime doesn't apply, since I want the data explicitly for each point in time, i.e. for each operation. Luckily, Python has a multiprocessing module that creates a pool of workers, to which I can hand each object's [start, end] period, i.e. from malloc to free. This speeds things up; according to my preliminary results it should take about six hours on an Intel i7 quad-core to calculate the locking list for all objects. Workers save their output into one file per object, which I can later pick up and use for the locking benchmark.
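A sketch of how the worker pool might be wired up (the job format and file naming are my assumptions, not the actual tooling; dominance() from above is assumed to be defined at module level so the workers can see it):

import multiprocessing

def compute_locking(job):
    # job is a (handle, ops) pair, where ops is the slice of the recorded
    # op stream covering that object's [start, end] period (malloc to free).
    handle, ops = job
    dom = dominance(ops)
    # One output file per object, picked up later by the locking benchmark.
    with open("dominance-%d.txt" % handle, "w") as f:
        f.write("".join(str(d) for d in dom))

if __name__ == "__main__":
    # Stand-in data; the real jobs would be built from the recorded op trace.
    jobs = [(1, [0, 1, 0, 0, 2]), (2, [3, 0, 0, 1, 0])]
    pool = multiprocessing.Pool()   # defaults to one worker per core
    pool.map(compute_locking, jobs)
    pool.close()
    pool.join()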
