måndag 9 september 2013

Fine-grained locking not feasible due to time constraints

Calculating lock statutus per handle and op takes too long time to be feasible.  It would give a better picture of how memory is actually used, but unfortunately it cannot be calculated in something close to realtime - even pushing it out to many workers on a fast CPU takes well over two days and yields data proportional to the size of the number of ops.

Coming up with a better way to calculate "micro" lifetime is something for the future work-section.

söndag 8 september 2013

Memory block access locking heuristics

Locking and unlocking objects is the main reason for rmalloc's existance. Not applying locking in benchmarking will skew results favorably since defragmentation will always be perfect.

Figuring out when an object should be locked within its lifetime ("micro lifetime") is tricky. What I've done so far with lifetime ("macro lifetime") is a ratio of own ops vs others ops, producing a number between 0 and 1 that tells us an average over the entire lifetime.  By having a threshold at 50% we can say that all objects below that threshold are never locked and above is always locked.

That's a gross simplification. What I want to do with the micro lifetime is to be able to lock/unlock the object several times within the object's lifetime. Doing so can be reduced to the following problem: in what intervals does an object T "dominate" over non-T objects, e.g. in a series of ops looking like this:


Using fixed-width windows, this is easy. But that's not what I want to do.  Thinking first about how to make the windows dynamic in length, I thought about letting the window "float" and "sink" depending on the current op.  After a bit of tinkering, I came up with:

def dominance(xs):
    life = 0
    dom = []
    for x in xs:
        if x == NEEDLE:
            life += FLOAT_SPEED
            life -= SINK_SPEED
        if life < 0:
            life = 0

        if life > 0: dom.append(1)
        else: dom.append(0)

    return dom

Applying this to a series of random values [0, 3], I get the following pretty graph:

Alas, this needs to be done for each needle (i.e. handle) during its lifetime. That's horribly slow (O(n2)), and my previous optimization with caching the number of ops within an object's entire lifetime is not possible since I want to have the data explicitly for each point in time, i.e. operation.  Luckily, there's a multiprocessing module in Python that creates a pool of workers that I can throw each object's [start, end] period, i.e. from malloc to free. This makes things faster, and according to my preliminary results it should take about six hours on an intel i7 quad-core to calculate the locking list for all objects. Workers save their output into one file per object, which I can later pick up and use for the locking benchmark.

fredag 6 september 2013

Making sense of histograms and fine-grained locking

The histograms are now properly generated. The main cause for histograms turning out incorrect was because matplotlib stores state in its convenience module pyplot.  Adding code to clear the drawing after each plot fixed the issue.  I can now visualize parts of histograms. This is a macro graph of coverage from 1%-100% (since 0% is a huge spike which skews the diagram):

I've shown this before.  Again, we clearly see a big divide between long-lived objects and short-lived objects. This is good! Zooming in on the even more short-lived objects between 0% and 0.1%:

Okay, there's a huge amount of extremely short-lived objects, pushing down all the others.  Just moving slightly past those, into 0.01% to 2%, we see a more nuanced view:

Still, most objects are clustered into <= ~20% or >= 80%, with not many being equally used, and a lot of them with essentially no lifetime at all. This makes it easier to reason about locked objects.

Leads us into next problem.  While I can get the lifetime of a handle and determine if it's locked/not locked, I'd prefer to get finer-grained locking information.  Specifically, what I need to do is to for each point in time store if a handle is in a locked or unlocked state.  This'll be a boolean value, in the form of:

locking[handle] = [bool]*op_count_during_handle_lifetime

The bool will most likely be a threshold from the activity measurement (own ops / other ops, most likely), but I also hope to see a pattern where activity is clustered in <<50% and >>50%.  While I might not come up with something better than what I have now (locked during its entire lifetime), the structure will still serve me when doing rmalloc benchmarking. (Just with an array filled with True).

Checking out after 6h58m,

Generating histograms of large data

Continuing from where I left off at yesterday's histogram generation.  It's now running on a large data set. Even though it's a lot quicker to run the application now (15 minutes vs probably 15 days on the Opera loading google.com dataset), it's still a bit of a chore to regenerate each time.

Especially since the data is going to be used as input for my measurements of rmalloc, specifically when to insert lock/unlock operations.  Lifetime data is now cached as a json blob in a file to be used by other tools. The histogram tool checks the contents of that file, and if it can be loaded, it is used instead of the ops file.

Checking out after 3h5m,

onsdag 4 september 2013

Generating histograms

Replaced the broken, but quick, handle lifetime generating code and got correct results from the ops file. Which is good, since I don't have to regenerate the ops file each time.

However, the histogram generation code is, as noted yesterday, naive and runs in O(n2), which turns out to take a lot of time processing 8 Gb data.  I modified the original code based on what I had now - working - and got code that runs in O(n) instead, still generating the same results for the small dataset of LibreOffice- opening an empty database file.

Checking out after 2h48m,

tisdag 3 september 2013

Three Week Sprint Starts Now

First out is cleaning up. Valgrind needs patching up to build & run on latest Ubuntu. Translators from customized memcheck format needs updating for the new format I switched to. The translators themselves need to be cleaned up and renamed.

It became a great mess because I had to do partial processing due to lack of available RAM and not being on a 64-bit system. And reinstalling my main computer's OS wasn't something I wanted to do. (Processing the 8Gb Opera allocation memtrace (output of instrumentation of access/malloc/free into memcheck) turned out to require loads of RAM. Luckily, I now have access to a 64-bit system with lots of RAM for such tasks.)

Remainder of today's goals is to fix histogram generation. There are two different versions: one that works but is slow, and one that is faster but does not yield the same results. It is possible I discard the optimized version now that I have a quicker system to run the emulations on. After that, I'm back to a state where I can automatically regenerate all plots and graphs from a memtrace run.

Work so far is pushed to the repository as https://github.com/mikaelj/rmalloc/commit/ab98b32ce9a2a67cf21e51b62b352fc7a08d2d7e

Checking out after 8h15m,