Friday, December 29, 2017

~Mysterious deadlock~

Since I recently started devoting more time to Playform, one of the first things I did was update my Rust compiler. All of a sudden, my Playform wouldn't run for more than a few seconds without hanging and becoming unresponsive to input. Reverting my recent minor-looking refactoring commits didn't help, so I reverted more and more aggressively to no avail.

I printf-debugged the mutex locks that looked suspicious. After several rounds of printfs, it turned out the problem wasn't in my code at all - the program was hanging in Vec::push. What.

Then I started downgrading my compiler. I wasn't sure what version I had before I updated, so this was a a bit of a search using my "rustup" commits as a guide. Eventually I found that the change in behavior happened between rust nightly-07-05 and nightly-07-06.

Okay, so this was clearly a rust bug. Being pessimistic about the odds of a bugfix if I just dumped my current findings into a Github issue, I started pulling out code to narrow down a repro (I also posted a vague complaint in IRC and on the rust user forum, in the hopes that someone would take enough of an interest to help out). I happened to notice a bug in my initial sizing of a short-lived vector, and bizarrely, fixing the initial size made the bug appear in a totally different patch of code. So it wasn't about the amount of memory I was allocating (I wasn't, for instance, triggering some new heap size limit), it was about the allocation operation.

Given that, it was hard to narrow down a good repro. Naively pulling out just about any code meant removing a bunch of allocations (there's no reason other than laziness for Playform to be allocating that much, but that's a separate issue). I figured maybe I could do something clever by keeping similar allocation patterns and amounts, but slowly replace all the types with integers and the "real" input data with a deterministic set of inputs.

On another train of thought, I also put some time into narrowing down the exact commit that broke Playform. After rediscovering the pain of building the rust compiler, I eventually made it to this commit, which just looks like it might have done something subtle and Playform-breaking.

But then a new wrinkle. I noticed if I ran the Playform server and client as separate binaries (which I wanted to do to make the repro easier), then the bug manifested even before nightly-07-05. What. Ouch. Okay, maybe the compiler change just revealed a pre-existing bug.

At this point, I decided to give up trying to be a super sleuth and just ask what was actually happening. After an exciting time learning about code signing and the exact right way to do this in recent MacOS releases in order to make gdb work (muttering terminal incantations, rebooting, dragging-and-dropping, and typing in an admin password a lot), I finally got gdb to run the Playform client. Better yet, it was still showing the bug!

The backtraces were not good. They were all question marks except for one thing: jemalloc_mutex_lock. Well, at least I felt pretty confident that this wasn't my fault. I spent some time googling around, trying to figure out if I should blame MacOS, jemalloc, rust, or Rememberus, the god of memory allocations.

One easy thing to try was to stop Playform from using jemalloc and seeing if the problem persisted. So I followed some instructions to change the allocator for the client binary and.. nope, still the same bug. Ugh.

But, oddly enough, the backtrace was still showing jemalloc. I was using the right rust incantation. Maybe I didn't understand some important subtlety about libraries vs binaries. After more googling, I found a Github issue that revealed that the rust compiler commit I'd tracked down had introduced a silent change in syntax for switching allocators.

Using the right syntax, suddenly my bug was gone! I went to patch it into the standalone server+client binary and found out that it already contained patch of code using the old syntax to opt out of jemalloc. I wonder why that was there.

So apparently what happened was that the syntax change introduced in nightly-07-06 had silently undone a careful allocator-switch I'd added and forgotten about, uncovering a bug I'd also entirely forgotten about.

Someday I'll try to find out why jemalloc is deadlocking, but for now I'm just happy that Playform runs again without using a compiler that's about 200 versions behind.

It's alive!

I haven't done any significant work on Playform in quite a while. I believe this is in part because it's hard to get anything significant done, so I'm working on making Playform development a bit more forgiving.

The most immediate thing that needs improving is performance. Playform teeters on the edge of "fast enough to be testable" on my machine, but is really far less performant than it should be. This is something that I've been talking about for a while, but is now my primary focus before anything else happens. I'm separating parts of Playform (e.g. terrain generation, isosurface extraction, brush operations) enough so that they can be tested and benchmarked independently. That way, it will be easier to keep track of performance over time, to notice changes negative performance impacts, and to elucidate which areas of the Playform pipeline (which has a lot of buffers and a lot of pushback) are in need of improvement.

Monday, August 1, 2016

A little grass

Playform has had grass for a while now, but I kept putting off the blog post. It still needs a lot of work, to be sure, but I think even shoddy grass can do a lot to help the sense of detail in a virtual world.

The approach is roughly taken from this GPU Gems article (which does a much better job, just for the record). The basic method is to arrange 2D grass billboards into cross-shaped patterns (see this diagram).

The billboards work equally well in each direction, so we turn off backface culling when we render grass (instead of placing redundant billboards facing in the opposite directions). The resulting tuft maintains a reasonable 3D look even as you walk around it, although they do look odd from above.

To start off, we just place one of these "tufts" pointing straight up, at the center of every polygon that supports grass:

This already works pretty well, but there are some problems. First of all, the grass in the background doesn't properly cover the terrain. We'll deal with that later. Another problem is that the tufts don't connect properly to the terrain! Because billboards are placed in cross-shaped patterns, the base of each tuft is 2D; if we place a tuft on a polygon, the base of the tuft should sit flat on the polygon, but that conflicts with our goal of having grass point straight up.

First of all, let's rotate the tufts so that they sit neatly on their underlying terrain polygons:

That also adds some variation into the grass direction, which improves the realism a bit. But now the grass always points straight out from the terrain surface, which creates a weird effect on steeper slopes.

To control where grass points without rotating the base, we apply a shear effect to each tuft. First, we decide which direction we want the tuft to point, in world coordinates. Directly upward seems like a good start. Then we transform that direction back into model space, and find the shear that would cause the normal vector (which, in model space, is straight up) to become that desired model-space vector (or some multiple of it).

There's no viable shear when the desired direction is more than 90 degrees away from the surface normal. To deal with this in a somewhat reasonable way, we find the angle between the normal and the desired direction, and feed that through a function to smoothly clamp it back to [0, 90). The particular function probably isn't that important; I just use acos(e^(cos(x)-1).

Shearing a vector changes its length, stretching the grass. Let's reset vector lengths after shearing:

We still have the problem that the grass tufts in the background are too small. More generally, if we're placing one grass tuft per terrain polygon, we want to make sure it roughly covers the polygon. So we apply a shader effect to scale the grass tuft based on the size of the polygon (again, the specific function probably doesn't make a huge difference).

Lastly, we can add a wind effect. I'm lazy, and I have this WebGL noise function lying around (which produces smooth noise given a point). We construct the input using the grass tuft position and the current time (plus some arbitrary offsets and trial-and-errored scaling factors), and we can get back two random angles, one clamped to [-180, 180], and the other clamped to [45, 135]. Taken together, these two angles can describe a vector that points within 45 degrees of straight up. Then we just apply a shear effect to make the grass point that way, similar to the one we're already applying to make it point upward.

And now we have rolling hills of waving grass! It's not winning any award, and there are definitely areas for improvement, but I think it will do the job for at least a little while.

Sunday, July 31, 2016

The worst kind of crabgrass

I was experimenting with a shader effect that would make grass billboards grow bigger if they were farther from the camera, so lots of faraway terrain could be covered with relatively few grass billboards.

That was the idea anyway. I overshot it a little:

I had the scaling effect set so dramatically that as grass spawned further from the camera, it would gain back more than that distance by growing bigger. So it crept back towards the camera by getting huge.

I say "crept", but it wasn't a very subtle process.

Things didn't get much better from there.

Friday, July 29, 2016

Bugs! In *my* grass?

I'm still preparing the post on grass blades. Part of the slowness is because I'm trying to go through each step incrementally, and show screenshots of the improvement. Because this means commenting out and tweaking parts of the grass code, I've ended up rewriting it a few times. The latest rewrite made the code simpler and saved some math, but it introduced an interesting bug. See if you can spot what's wrong:

If you guessed "those weird streaks shouldn't be there", then consider yourself a programmer, because they should almost definitely probably not be there.

Although they are pretty dramatically placed...

And they look kind of exciting...

I got too close. I guess this is my life now.

Thursday, July 28, 2016

Fixing/improving the grass illumination

As part of some mundane tech debt cleanup, I ended up tweaking the code that creates and places grass billboards (also: Playform has 3D grass now. Post on that soon). In doing that, I noticed that when a "tuft" of grass was placed on top of a polygon, the grass code was recalculating the surface normal. That information's already available from the polygon structure, so it's definitely a little silly to be recalculating it when it's sitting right there.

But the worse part is that I was doing it badly. The surface normal was just a cross-product of the polygon's sides. The surface normals stored in the polygon struct itself are averaged vertex normals, which make for smoother lighting. And they're sitting right there.

So long story short I took a little time out and made the code interpolate the vertex normals instead. Here's what the changed normals look like with grass debugging shaders:


And in some non-debug shots:


The grass itself has changed too, so it's hard to see the difference in the foreground. But the slope in the background really shows improvement.

Monday, February 8, 2016

Terrain loading design

Current Design

The Playform client divides the world into discrete, fixed-size chunks. A chunk is either entirely loaded or entirely unloaded, at some LOD. When a chunk is needed, the client requests all the voxels of the given LOD in that chunk, as well as neighboring voxels of that LOD. Each chunk in the client then "waits" to receive all the voxels it needs, before generating the mesh and loading it. Chunks are not unloaded until they are either further than the view distance, or a new chunk is being loaded at the same position.

Between adjacent chunks of different LODs, seam meshes need to be loaded to stitch together the different LODs. This is not done.

Ideal Design

Mesh generation happens around edges - every edge between four adjacent voxels can generate a mesh fragment. It doesn't require any chunking, and in fact, chunking adds annoying edge cases (heh) and things to be careful of. Ideally, voxels would just be requested at a given LOD based on their distance from the player. When the server sends a voxel back, we could examine all adjacent edges to see if any new mesh fragments should be loaded/unloaded.

There are some extra wrinkles with this approach: one is that a voxel may be received at a higher LOD than we had before, which means the new voxel is smaller. If we unload all the mesh fragments including the old voxel, in order to load the ones including the new voxel, we will have created holes. We face a similar problem right now, when chunk LODs need to be switched, which we solve by putting off unloading until all the new voxels have been received, at which point we can generate all the mesh fragments at a higher LOD. We can do something similar here, but it will be more complex, since chunks had fixed sizes and a more obvious set of legal positions.

Another related difficulty is the problem of determining which mesh fragments describe the same area. Fragments of different LODs are extracted from edges of different sizes, so we can't use any kind of equivalence relation (e.g. hashing) to find conflicting edges quickly. For example, if we receive a new voxel of size 1, we might look at its adjacent edges and find that some of them are able to generate mesh fragments that should be loaded. In this case, we would need to unload any mesh fragments associated with the same regions of space, but mesh fragments could be associated with that region not only via edges of size 1, but also of size 1/2, 1/4, 2, 4, etc. We need a data structure that gives a reasonably efficient way of finding which edges conflict with one another.