Saturday, September 26, 2015

A handful of tweaks

The updates for this weekend (so far) have been a few smaller changes, none of which individually merit a post of their own. Here's what's been happening:

Lag spikes
Moving around in the world caused noticeably jumpy behavior. After spending weeks adding timers and looking at inscrutable performance graphs, I finally decided to go with my gut. At least one category of lag spikes was pretty obviously happening when the player crossed block boundaries (the terrain in Playform is divided into blocks of 8m x 8m x 8m), so I experimented with effectively disabling different parts of the code that were sensitive to block transitions. It didn't take long to find the major offender: the code that updates the render state. Playform has a separate thread to deal with OS I/O - specifically, with all SDL-specific things, since it's not uncommon for systems to require that all SDL and OpenGL calls happen on a single thread - which includes things like mouse movement, window focus, and rendering.

Because of the single thread (and the message-passing style used to communicate with it), the "move the camera to where the player is" messages weren't being treated differently from the "now start displaying this piece of terrain" messages. Basically, crossing the boundary of a terrain block caused some LOD (Level-Of-Detail) switching to happen, which slowed down the processing of camera movement events (and other, less noticeable events, like sun position updates). The solution was to split the updates into two separate queues, and to give terrain changes their own, lower-priority queue. The code is really hacky right now (I mostly just copy-pasted code and added "0" and "1" suffixes to things). I also capped the amount of time (1ms) that a thread can spend on a given task before it "leaves work for next time". The code is starting to look a little repetitive, and I'm considering creating some abstraction for the general idea of "prioritized budgeted processing", where you have several code blocks that share some total budget, and that run in some specified (prioritized) order.
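If I do build that abstraction, I imagine it'd look something like the sketch below (all the names here are made up, and this isn't Playform's actual code): each task does one small chunk of work per call, and a driver runs the tasks in priority order, each within a time budget.

```rust
use std::time::{Duration, Instant};

// One unit of interruptible work: step() does a small chunk and
// returns true while more work remains.
trait BudgetedTask {
    fn step(&mut self) -> bool;
}

// Run tasks in priority order; each may consume time until its budget
// expires, at which point it "leaves work for next time".
fn run_prioritized(tasks: &mut [&mut dyn BudgetedTask], budget_per_task: Duration) {
    for task in tasks.iter_mut() {
        let start = Instant::now();
        while start.elapsed() < budget_per_task {
            if !task.step() {
                break; // out of work; fall through to the next task
            }
        }
    }
}

// A toy task for illustration: counts up to a limit, one step at a time.
struct Counter { n: u32, limit: u32 }

impl BudgetedTask for Counter {
    fn step(&mut self) -> bool {
        self.n += 1;
        self.n < self.limit
    }
}

fn main() {
    let mut camera_updates = Counter { n: 0, limit: 3 };
    let mut terrain_updates = Counter { n: 0, limit: 1_000_000 };
    // The camera queue runs first; terrain only gets its own 1ms budget
    // after the camera queue has drained.
    run_prioritized(
        &mut [&mut camera_updates, &mut terrain_updates],
        Duration::from_millis(1),
    );
    assert_eq!(camera_updates.n, 3);
}
```

The nice part is that "two queues with suffixes 0 and 1" just becomes "a slice of tasks in priority order".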

Loading Surroundings
When you start up Playform, your player hovers in the air for a long time while the terrain loads. I'd like to say I fixed this, but the fix adds some incredible (and unacceptable) lag. I still think the basic idea is sound, though: the server prioritizes "give me this block" requests from the client based on how far the block is from the player. Ideally, the load priority would also favor more recent requests within a given priority "tranche".
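To make the idea concrete, here's a hypothetical sketch of that prioritization (none of these names are Playform's actual types): squared distance from the player picks the tranche, and a sequence number breaks ties in favor of more recent requests.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Illustrative names, not Playform's actual types.
type BlockId = (i32, i32, i32);
const BLOCK_SIZE: f32 = 8.0;

// Coarse distance-based priority "tranche": blocks nearer the player
// sort into lower-numbered tranches and load first.
fn tranche(player: [f32; 3], block: BlockId) -> u32 {
    let center = |c: i32| (c as f32 + 0.5) * BLOCK_SIZE;
    let dx = center(block.0) - player[0];
    let dy = center(block.1) - player[1];
    let dz = center(block.2) - player[2];
    ((dx * dx + dy * dy + dz * dz) / (BLOCK_SIZE * BLOCK_SIZE)) as u32
}

struct LoadQueue {
    seq: u64,
    heap: BinaryHeap<(Reverse<u32>, u64, BlockId)>,
}

impl LoadQueue {
    fn new() -> Self {
        LoadQueue { seq: 0, heap: BinaryHeap::new() }
    }

    fn push(&mut self, player: [f32; 3], block: BlockId) {
        self.seq += 1;
        // BinaryHeap is a max-heap: Reverse makes the smallest tranche pop
        // first, and the plain sequence number favors more recent requests
        // within a tranche.
        self.heap.push((Reverse(tranche(player, block)), self.seq, block));
    }

    fn pop(&mut self) -> Option<BlockId> {
        self.heap.pop().map(|(_, _, block)| block)
    }
}

fn main() {
    let player = [0.0, 0.0, 0.0];
    let mut queue = LoadQueue::new();
    queue.push(player, (10, 0, 0)); // far away
    queue.push(player, (0, 0, 0));  // right under the player
    assert_eq!(queue.pop(), Some((0, 0, 0)));
    assert_eq!(queue.pop(), Some((10, 0, 0)));
}
```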

Biomes, Mountains
I've added a preliminary stone material for the preliminary mountain biome. This is the kind of thing that feels very productive, even though the code changes are minuscule (I'll pat myself on the back here for making some solid abstractions).



I also accidentally placed a tree.



Internal
Other than those, I've made some internal changes: I've created a "voxel" crate, which separates out the generic voxel-y parts of the code, and a "terrain" crate, which separates out the terrain data structures and terrain-gen code. I'm drawing the crate boundaries around a loose idea of "code I want to exist even if I don't actually use it, for completionist purposes" - for instance, the different biome code, or the different voxel materials. Playform is built with a specific biome in mind, but I still want the code for other biomes lying around in the terrain crate.

Saturday, September 5, 2015

Trees are back!

After a long absence, trees have made it back into Playform!

Development
One of the big requirements was the voxel material system, because I didn't just want to have tree-shaped dirt. The leaves are still shaded using the grass shader though, because I was too lazy to work on a better texture.

On that note, you might notice that the terrain has a more textured look! I put some time into making passable-looking procedural texture shaders, instead of the embarrassing sine-wave-based mixing between plain green and plain brown. The grassiness of the terrain is controlled by a noise function (huge thanks to ashima/webgl-noise!), and also the slope of the terrain - the steeper the terrain, the less grass it has, on average.
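The actual mixing lives in GLSL, but the gist can be sketched as a plain function (hypothetical names; not the real shader): a noise sample scaled down by how steep the surface is.

```rust
// A sketch (not the actual shader code) of how the grass/dirt mix could
// be driven: a noise sample in [0, 1] times a "flatness" factor taken
// from the surface normal. The steeper the terrain, the less grass.
fn grassiness(noise_sample: f32, normal_y: f32) -> f32 {
    // normal_y is the vertical component of the unit surface normal:
    // 1.0 on flat ground, near 0.0 on a cliff face.
    let flatness = normal_y.max(0.0);
    (noise_sample * flatness).max(0.0).min(1.0)
}

fn main() {
    assert_eq!(grassiness(1.0, 1.0), 1.0); // flat ground, grassy noise: full grass
    assert_eq!(grassiness(1.0, 0.0), 0.0); // sheer cliff: no grass
    assert_eq!(grassiness(0.5, 1.0), 0.5); // flat ground, middling noise
}
```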

All this texturing is done in the shaders right now. I do want to eventually move these into real textures, and let the user "paint" the terrain.

The trees themselves are defined through the same code that defines the terrain and the sphere brushes, although that code's changed a little with the material system. You map every point in space to a material. Since the trees are placed with a brush, it's also valid to say "I don't touch this point". This code isn't the nicest to write: even with the trees as simple as they are right now, there's a fair amount of "wait, am I in this part of the tree? Oh, I'm not? Alright, how about THIS one?" I plan to put some work into composing shapes like trees out of smaller shapes like spheres and columns.
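The "map every point to a material, or decline to touch it" idea looks roughly like this sketch (the shapes and dimensions are made up; Playform's real brush code is more involved):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Material {
    Bark,
    Leaves,
}

// A brush maps every point in space to a material, or to None:
// "I don't touch this point".
trait Brush {
    fn material_at(&self, p: [f32; 3]) -> Option<Material>;
}

// A crude illustrative tree: a bark column with a leaf sphere on top.
struct Tree {
    base: [f32; 3],
}

impl Brush for Tree {
    fn material_at(&self, p: [f32; 3]) -> Option<Material> {
        let dx = p[0] - self.base[0];
        let dy = p[1] - self.base[1];
        let dz = p[2] - self.base[2];
        // "Wait, am I in the trunk? No? Alright, how about the leaves?"
        if dx * dx + dz * dz <= 0.5 && dy >= 0.0 && dy <= 4.0 {
            return Some(Material::Bark);
        }
        let ly = dy - 5.0; // leaf sphere centered above the trunk
        if dx * dx + ly * ly + dz * dz <= 4.0 {
            return Some(Material::Leaves);
        }
        None // the brush leaves this point alone
    }
}

fn main() {
    let tree = Tree { base: [0.0, 0.0, 0.0] };
    assert_eq!(tree.material_at([0.0, 1.0, 0.0]), Some(Material::Bark));
    assert_eq!(tree.material_at([0.0, 5.0, 0.0]), Some(Material::Leaves));
    assert_eq!(tree.material_at([10.0, 0.0, 0.0]), None);
}
```

Composing trees out of smaller shapes would mean writing `Brush` implementations for spheres and columns and combining them, instead of the hand-rolled chain of `if`s above.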

TODOs
The trees don't spawn naturally right now; the user can place them with a left click.

They also all look exactly the same. Branches and better leaves are a must-have. Different kinds of trees would also be nice.

Sunday, August 30, 2015

Procedural texturing

Lots of hours have been spent on shaping Playform's terrain: playing with noise functions, isosurface extraction, and brushes. Lately I've been realizing how much of an impact the finer visual details make, starting with the introduction of depth fog. They help break up the monotony of an otherwise pretty homogeneous landscape. I plan to bring in landscape features like trees to help too.

Procedural texturing means not having hand-crafted textures ahead of time. Because the terrain can shift and change organically, mapping these textures seamlessly is a bit of a headache. It's not impossible by any stretch (e.g. via Wang tiles). But I'm not convinced I want preconstructed assets at all - I think eventually, I'd like the user to be able to paint the terrain, much like how they can sculpt it. I want some cheap, easy, and decent-looking texturing there as a base, but I don't want the user married to it.

Noise
So how do we make a cheap, easy, and decent-looking texture from nothing? The solution is to find some smooth noise function, and twist and turn it into the desired texture. Lots of noise functions exist (Perlin noise is probably the most well-known), and there are even some built right into GLSL, the OpenGL shader language, but I ended up using this one for performance reasons, and because I'm not too nitpicky about the details as long as it's smooth and kind of bubbly looking. Here's a little patch of noise:


I'm still figuring out how to transform this basic thing, and how to think about stacks of transformations, but I've adopted Miguel Cepero's approach of adding layers until it looks halfway-decent. Here's what I've got so far. I suggest clicking to see the full-size images.

Dirt
Grass

Bark

I'm still having trouble getting things to look more jagged. The bark texture is the closest I've gotten (and I wasn't even trying to make bark - I was going for a dirt texture like this). Maybe I should just be using a different noise function for more jagged things. I'm not well-read on noise functions, so to my smooth bubbly mind, everything looks like a sinusoidal nail.

The code is up at github.com/bfops/procedural-textures. I encourage you to play around and try and make some better-looking things!

Saturday, August 29, 2015

Voxel brushes

Playform now comes equipped with voxel brushes! They are very rough and very buggy, and the only one exposed directly to the user is a sphere brush, but the core functionality is still there.



This has been slowly in the works for over a month, and things were crashing pretty consistently for most of that time.

One big source of crashes was the code that turns the terrain voxels into a surface mesh (see this post). One invariant of that code is that if a voxel edge crosses the terrain surface, all the voxels containing that edge must be aware that they too cross the terrain, and keep extra data around to help build the mesh. It turns out it's really easy to violate that invariant if you're not being careful, and I wasn't. Another source of problems was that sometimes the "eraser" brush would expose parts of the world that hadn't been evaluated yet, because they'd been buried underground, and therefore weren't relevant to creating the mesh. Yet another series of crashes came up when a brush ended exactly between two voxels, because then they would "disagree" about whether they were crossing the terrain surface. After lots of debug logs and code comments, a version was reached that didn't constantly crash. It's still far from perfect, but it kindasortamaybesometimes works enough that I'm considering it functional for now, and doing other things for a while.

I started with a cube brush, because it's easy to define a cube and test whether it intersects various other things (especially points and cube-shaped voxels). This ended up being great since it exercised a lot of edge cases (pun intended), because the numbers were "too perfect". The sphere brush gave me an unreasonable amount of headache, because I couldn't find a clever way to place points inside some voxel that rested on the surface of some sphere (said another way: it's easy to pick some point inside a given voxel that rests on the surface of a given sphere, but it's a lot harder to find a good point that will make the mesh look nice). In the end, I just approached it from the point of view of "you have some shape, find a point on it", and reworked the terrain generation code that has to deal with exactly the same problem.

Terrain, voxels, and meshes


Defining the terrain
Playform's terrain is defined using some cheap and easy noise functions. The basis is Perlin noise, which is a cheap and easy way of generating smooth, bubbly-looking output, like hilly terrain. A fractional Brownian model is applied to this noise - a term I learned just now, and one I probably won't need again for at least another year. It basically just means that I take my Perlin noise, generate it at several amplitudes and frequencies in a fractal pattern (e.g. halving the amplitude while doubling the frequency at each step), and then add all the results together.
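The layering step is simple enough to sketch in a few lines. This is a 1D toy version (real terrain uses 2D/3D Perlin noise, and `noise` here is a stand-in for any smooth noise function), but the octave loop is the whole idea:

```rust
// Fractional Brownian motion over some base noise function: sample at
// several octaves, halving the amplitude and doubling the frequency
// each time, and sum the results.
fn fbm<F: Fn(f32) -> f32>(noise: F, x: f32, octaves: u32) -> f32 {
    let mut total = 0.0;
    let mut amplitude = 1.0;
    let mut frequency = 1.0;
    for _ in 0..octaves {
        total += amplitude * noise(x * frequency);
        amplitude *= 0.5; // each octave contributes smaller bumps...
        frequency *= 2.0; // ...at a finer scale
    }
    total
}

fn main() {
    // With the identity function standing in for noise, each octave
    // contributes exactly 1.0 at x = 1:
    // 1.0 * 1.0 + 0.5 * 2.0 + 0.25 * 4.0 = 3.0
    assert_eq!(fbm(|x| x, 1.0, 3), 3.0);
}
```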

This lets you roughly describe lots of different terrain features pretty easily - mountains are high amplitude and low frequency, where bumps and ditches are low amplitude and high frequency. A nice thing about these fancy-sounding noise functions is that they're popular and well-understood, so finding libraries to generate them (and generate them quickly) isn't hard. Perlin noise can also be used to generate things like textures, which is a thing I've been meaning to work on for a while.

Storing the terrain
Playform's world is divided into a big 3D grid, and each cell can be called a "voxel" (which really just means a 3D pixel). Minecraft and many other games use this idea; often, each cell is simply considered filled or empty (or equivalently, filled with air). This world representation lets you add and remove pieces "anywhere" in the world without a whole lot of work.

Voxels don't just have to contain simple "full or empty" data; Playform's voxels contain more data so that we know whether each vertex in the grid is inside or outside the terrain. This means that any given grid edge can be entirely inside or outside the terrain (if both its vertices are), or it can be crossing the terrain. The crossing edges are the interesting ones.

This same idea applies to entire grid cells: if all the corners of a cell are on the same side of the terrain, we can just say that the whole voxel is inside/outside, but if it has some corners inside and some outside (or equivalently, if any of its edges cross the terrain), then the voxel intersects the surface of the terrain. Each intersecting voxel stores some extra data to help build the mesh: it stores a point inside that voxel that lies on the surface of the terrain (it doesn't have to be perfect, just approximate). It also stores a surface normal at that point, for lighting and whatnot.
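That classification boils down to counting corners. A minimal sketch (illustrative names; the real voxel types carry more data than this):

```rust
#[derive(PartialEq, Debug)]
enum VoxelKind {
    Inside,   // all eight corners are inside the terrain
    Outside,  // all eight corners are outside
    Crossing, // mixed: this voxel intersects the surface, and stores a
              // surface point and normal to help build the mesh
}

// Classify a voxel from the inside/outside state of its eight corners.
fn classify(corner_inside: [bool; 8]) -> VoxelKind {
    let inside = corner_inside.iter().filter(|&&c| c).count();
    match inside {
        8 => VoxelKind::Inside,
        0 => VoxelKind::Outside,
        _ => VoxelKind::Crossing,
    }
}

fn main() {
    assert_eq!(classify([true; 8]), VoxelKind::Inside);
    assert_eq!(classify([false; 8]), VoxelKind::Outside);
    let mut mixed = [false; 8];
    mixed[0] = true;
    assert_eq!(classify(mixed), VoxelKind::Crossing);
}
```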

Lastly, this grid isn't stored as a big 3D array; it's stored as an octree (the 3D analogue of a binary tree). This lets you have bigger cells where less detail is needed (e.g. huge open areas, or huge chunks of space that are entirely inside the terrain), and it gives you the option of having "arbitrarily" smaller cells where more detail is needed.
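The shape of that storage, reduced to its essence (a sketch, not Playform's actual octree, which carries bounds and more per-voxel data):

```rust
// A leaf covers its whole cube with one value (e.g. "entirely inside
// the terrain"); a branch splits the cube into eight octants that can
// each carry more detail.
enum Octree<V> {
    Leaf(V),
    Branch(Box<[Octree<V>; 8]>),
}

impl<V> Octree<V> {
    // Count the leaves, as a stand-in for "walk the tree".
    fn leaf_count(&self) -> usize {
        match self {
            Octree::Leaf(_) => 1,
            Octree::Branch(children) => children.iter().map(|c| c.leaf_count()).sum(),
        }
    }
}

fn main() {
    // A huge uniform region is a single leaf; subdividing it once makes eight.
    let big_empty: Octree<bool> = Octree::Leaf(false);
    assert_eq!(big_empty.leaf_count(), 1);

    let subdivided: Octree<bool> = Octree::Branch(Box::new([
        Octree::Leaf(false), Octree::Leaf(false), Octree::Leaf(false), Octree::Leaf(false),
        Octree::Leaf(false), Octree::Leaf(false), Octree::Leaf(false), Octree::Leaf(true),
    ]));
    assert_eq!(subdivided.leaf_count(), 8);
}
```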

From voxels to polys
The bulk of the credit for Playform's voxel-to-mesh algorithm goes to this Siggraph 2002 paper and to Miguel Cepero's blog post about exactly this problem.

Whenever we find an edge that intersects the terrain, we generate a little bit of mesh to represent the part of the terrain that is being crossed. We do this by looking at the four grid cells that the edge is a part of. Since the edge is crossing the terrain surface, all of those grid cells must be crossing it too, so they all store a point. We use those four points to construct a little patch of mesh.

This choice of what data to store in each voxel is nice for several reasons. It's pretty simple - a big portion of the work is deciding whether or not a given point is inside the terrain. Deciding where to place points inside the voxels isn't that important algorithmically; it just affects the visual quality of the resulting mesh - you can just put every point in the very center of its "parent" voxel. Curves won't look nice, but it's really easy to calculate. You can calculate surface normals using cross products, since once you have a few points, you also have vectors between those points that are roughly "lying on" the surface. You can make the normals a little smoother by averaging adjacent ones together.
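The cross-product trick fits in a few lines. A toy version (not the real mesh code):

```rust
// Cross product of two 3-vectors.
fn cross(a: [f32; 3], b: [f32; 3]) -> [f32; 3] {
    [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]
}

// Approximate surface normal from three nearby surface points: the edge
// vectors roughly "lie on" the surface, so their cross product points
// away from it.
fn surface_normal(p0: [f32; 3], p1: [f32; 3], p2: [f32; 3]) -> [f32; 3] {
    let e1 = [p1[0] - p0[0], p1[1] - p0[1], p1[2] - p0[2]];
    let e2 = [p2[0] - p0[0], p2[1] - p0[1], p2[2] - p0[2]];
    let n = cross(e1, e2);
    let len = (n[0] * n[0] + n[1] * n[1] + n[2] * n[2]).sqrt();
    [n[0] / len, n[1] / len, n[2] / len]
}

fn main() {
    // Three points on flat ground: the normal should point straight up.
    let n = surface_normal([0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]);
    assert_eq!(n, [0.0, 1.0, 0.0]);
}
```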

The mesh points can be placed roughly anywhere inside the voxel, which means you can faithfully represent all kinds of different surfaces. Flat, curvy, pointy, the algorithm really doesn't care, as long as it has some points to work with. One thing I'd like to do is have a mesh import feature (again, credit to Miguel Cepero at Voxel Farm); as long as the grid is high-resolution enough that every mesh vertex is inside its own grid cell, the mesh could be imported without any loss of data, and then made a part of the world!

Depth fog - it makes a difference

Depth fog is the obscuring of objects based on how far they are from the viewer, eventually blending them into the background entirely. Here's a picture of Playform in all its depth-foggy glory:

Ahh.. Playform at dawn

Here's a roughly side-by-side example of the change. I'm unreasonably proud of how this looks.

Without depth fog

With depth fog

Depth fog adds a lot! The code change is incredibly shallow (heh), with only a few extra lines in the shader for a simple exponential drop-off in visibility. I spent much more time tweaking the constants and the curve than I spent writing the code, and that's a nice place to be working in.
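The heart of the effect, sketched outside the shader (made-up constants; the shader then blends the fragment color toward the fog color by this amount):

```rust
// Simple exponential drop-off in visibility: how much fog to blend in
// at a given depth. 0.0 right at the camera, approaching 1.0 far away.
fn fog_amount(depth: f32, density: f32) -> f32 {
    1.0 - (-density * depth).exp()
}

fn main() {
    // No fog right at the camera...
    assert_eq!(fog_amount(0.0, 0.05), 0.0);
    // ...and (almost) fully fogged at the far edge of the view distance.
    assert!(fog_amount(500.0, 0.05) > 0.999);
}
```

Tweaking `density` (and, ideally, the far cutoff) is exactly the knob-turning mentioned above.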

It definitely helps me get a sense of distance, and ideally the fog would be tweaked such that the objects at the furthest edge of the viewing distance aren't visible at all. Then they would fade in gradually as you moved closer, instead of aggressively popping in because they've suddenly been loaded. Right now, the bottleneck on view distance is memory consumption - the GPU crunches through the whole mesh in less than a tenth of a millisecond, but the amount of mesh that fits in memory is a little lackluster.

Stay tuned! Coming up hopefully-soon is voxel brushes, better texturing, and voxel materials. I hear trees are also planning a comeback tour.

Sunday, March 15, 2015

Cap'n Proto Rust Macros

I've found Rust is pretty good at designing powerfully for the common use case while still smoothly supporting uncommon ones. unsafe is a great example: Rust provides tons of tools to keep you from shooting yourself in the foot in ways that are common in C++, like null pointers and pointers to expired data. But if what Rust provides doesn't work for you, for whatever reason (e.g. your data ownership is complicated enough that you can't explain it to the compiler), then you can opt out of these things using unsafe. You, as the programmer, accept the responsibility for the things you've given up.

The macro system is another good example. Macros are pretty effective for a lot of the macro-y use cases: you can use them as inline functions, you can use them to parse varargs, and you can use them to create tree-style DSLs, like this example taken from the Rust guide:

  let mut out = String::new();

  write_html!(&mut out,
      html[
          head[title["Macros guide"]]
          body[h1["Macros are the best!"]]
      ]);

  assert_eq!(out,
      "<html><head><title>Macros guide</title></head>\
       <body><h1>Macros are the best!</h1></body></html>");

I used macros to make a DSL like that for Cap'n Proto message construction code, and I'm pretty happy with the results. This is the example from capnpc-rust:

let mut message = MallocMessageBuilder::new_default();
{
    let address_book = message.init_root::<address_book::Builder>();

    let mut people = address_book.init_people(2);

    {
        let mut alice = people.borrow().get(0);
        alice.set_id(123);
        alice.set_name("Alice");
        alice.set_email("alice@example.com");
        {
            let mut alice_phones = alice.borrow().init_phones(1);
            alice_phones.borrow().get(0).set_number("555-1212");
            alice_phones.borrow().get(0).set_type(person::phone_number::Type::Mobile);
        }
        alice.get_employment().set_school("MIT");
    }

    {
        let mut bob = people.get(1);
        bob.set_id(456);
        bob.set_name("Bob");
        bob.set_email("bob@example.com");
        {
            let mut bob_phones = bob.borrow().init_phones(2);
            bob_phones.borrow().get(0).set_number("555-4567");
            bob_phones.borrow().get(0).set_type(person::phone_number::Type::Home);
            bob_phones.borrow().get(1).set_number("555-7654");
            bob_phones.borrow().get(1).set_type(person::phone_number::Type::Work);
        }
        bob.get_employment().set_unemployed(());
    }
}

And here's how it looks using capnpc-macros:

let mut message =
    capnpc_new!(
        address_book::Builder =>
        [array init_people 2 =>
            [
                [set_id 123]
                [set_name "Alice"]
                [set_email "alice@example.com"]
                [array init_phones 1 =>
                  [
                      [set_number "555-1212"]
                      [set_type {person::phone_number::Type::Mobile}]
                  ]
                ]
                [init_employment => [set_school "MIT"]]
            ]

            [
                [set_id 456]
                [set_name "Bob"]
                [set_email "bob@example.com"]
                [array init_phones 2 =>
                    [
                        [set_number "555-4567"]
                        [set_type {person::phone_number::Type::Home}]
                    ]
                    [
                        [set_number "555-7654"]
                        [set_type {person::phone_number::Type::Work}]
                    ]
                ]
                [init_employment => [set_unemployed ()]]
            ]
        ]
    );
I think that's pretty cool.

Update: Client/Server Refactoring

The client/server refactoring is done for now! It took about a month, but Playform has been split into separate server and client binaries that communicate over sockets. Hypothetically, this should work over LAN too, but I haven't tested it.

Async IO Design
A huge portion of the work went into creating a predictable and performant async IO design that lets you explicitly prioritize different kinds of events (e.g. prioritizing world updates and client inputs over terrain loading).

The Journey: For a while, I tried using an "intuitive" approach to threading, where each conceptually independent line of work (e.g. terrain loading, client update processing, world updating) ran in a separate thread. But it was hard to control which thread ran at which time, so terrain loading would interrupt world updating at unhelpful times. It's not really reasonable or helpful for all those independent lines of work to be treated equally. It's not at all impossible to coordinate threads with one another (e.g. by using condition variables to have threads signal each other), but it just seemed like too much work. I'd be trying to describe the relationships between threads by defining individual actions between them; I imagine that's like trying to draw a picture in Conway's game of life: doable, but unnecessarily hard. If you want to describe a picture, just describe the picture.

The Result: There's that old rule of thumb about multithreading (Amdahl's law) that says if any nontrivial portion of the work you do is serial, then the speedup you get from multithreading will be incremental. That basic idea lingered in my head throughout the whole redesign. I felt like although I could try to jump straight to an inherently-multithreaded design, it might work better to get a design that works pretty well in a single thread and can have more threads added to it, i.e. a design where the work being done is independent of the threads doing it. I went back to having a single (main) thread for the client and server, plus message-handoff threads that look like:
  // A handoff thread: block on the socket, then forward each incoming
  // message to the main thread's queue.
  let mut listen_socket =
    ReceiveSocket::new(
      listen_url.clone().as_slice(),
      Some(Duration::seconds(30)),
    );
  loop {
    let msg = listen_socket.read();
    server_recv_thread_send.send(msg).unwrap();
  }

These threads are basically always blocked, since they're either waiting for an incoming message (so blocked on interrupts) or sending a message (so probably blocked on a memcpy). They don't spend a whole lot of time actually doing anything, so I pretty much ignore them (let me know if that'll get me into trouble).

The server's main thread looks like:
  in_series!(
    {
      let updates =
        server.update_timer.lock().unwrap().update(
          time::precise_time_ns()
        );
      if updates > 0 {
        update_world(
          ...
        );
        true
      } else {
        false
      }
    },
    {
      listen_thread_recv.lock().unwrap().try_recv_opt()
        .map_to_bool(|up| {
          let up = binary::decode(up.as_slice()).unwrap();
          apply_client_update(
            ...
          );
        })
    },
    {
      gaia_thread_recv.lock().unwrap().try_recv_opt()
        .map_to_bool(|up| {
          update_gaia(
            ...
          )
        })
    },
  );

The in_series macro runs its parameter "actions" in top-down order until one of them succeeds, then returns to the top of the list. If none of the functions succeeds, it explicitly yields the thread. The result is that it runs the top action until it fails, then for as long as it fails, it runs the next action until it fails, and so on. So the server's main thread updates the world at a regular interval (e.g. update players, mobs, move the sun, etc.). If there's time available, it'll process events from the listen socket. With any remaining time, we process terrain load requests. Terrain requests are separate from world updates because they're less critical and cost more CPU time. That said, the world updates (top priority) decide what terrain to load, and load solid placeholder blocks until the terrain is loaded, so, for instance, the player won't fall through the world if the terrain gen is too slow.
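In case the shape of it isn't clear, here's a minimal sketch of the idea behind in_series! (not the actual macro): each action's block evaluates to true if it found work to do; we stop at the first success (the surrounding loop returns us to the top), and yield if nothing had work.

```rust
// A toy version of in_series!: try each action in priority order, stop
// at the first one that reports progress; if none did, yield the thread.
macro_rules! in_series {
    ($($action:expr),* $(,)?) => {{
        let mut did_work = false;
        $(
            if !did_work {
                if $action {
                    did_work = true;
                }
            }
        )*
        if !did_work {
            std::thread::yield_now();
        }
    }};
}

fn main() {
    let mut log = Vec::new();
    // The first action succeeds, so the second never runs this pass.
    in_series!(
        { log.push("world update"); true },
        { log.push("terrain request"); true },
    );
    assert_eq!(log, vec!["world update"]);

    log.clear();
    // The first action has no work, so the second gets a turn.
    in_series!(
        { log.push("world update"); false },
        { log.push("terrain request"); true },
    );
    assert_eq!(log, vec!["world update", "terrain request"]);
}
```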

The client has a similar main thread. If you wanted to process two of the actions above at the same priority, you could just merge the message queues into a single queue. So far, that hasn't been particularly necessary: realistically, we should be "running out" of world updates and client updates on a really frequent basis (i.e. several times a second), so none of the actions get starved. If we assume that's happening, then the exact ordering doesn't matter anyway.

This design is also easy to scale to multiple threads; I can just add a new thread with the same in_series-based code; the message queues and server data are Mutex-wrapped to be thread-safe, so the threads can run pretty independently (I still want to put in more work on the locking side of things, so threads are less likely to subtly/unintentionally block one another). The one major wrinkle is when some things need to be single-threaded, like OpenGL calls in the client (I've been operating under the assumption that the OpenCL calls for terrain generation work the same way). In that case, you can just remove those actions from the in_series invocations of all but one thread.

The last 20%
The async IO redesign was a big chunk of the client/server refactoring, but a few more things needed to happen to get performance back to where it was before - I wanted to be able to run a local server and local client and have the same smoothness I had when they were bundled together in the same process.

Message-passing: A big bottleneck that came up was the message-passing between client and server: they communicate over sockets (since, ideally, I want this to also work over at least LAN), so where before we could just pass data by passing a pointer, now we actually need to send data, byte-by-byte, in a serialized format (no pointers for things like dynamic arrays!). I was using JSON originally, just because it's easy and built in to the standard library, but it was too verbose and too slow. I tried to use Cap'n Proto, but the refactoring to make it work was getting to be too much (I did get a cool new repo out of it, though). I ended up just writing my own "Flatten" trait to turn data types into "flat" memory blocks, and using that.
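A guess at the spirit of "Flatten" (the real trait's details surely differ): types write themselves into a flat byte buffer with no pointers, and read themselves back out. Dynamic arrays become a length prefix followed by their elements.

```rust
use std::convert::TryInto;

// A sketch of a pointer-free serialization trait.
trait Flatten: Sized {
    fn flatten(&self, out: &mut Vec<u8>);
    fn unflatten(bytes: &mut &[u8]) -> Self;
}

impl Flatten for u32 {
    fn flatten(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_le_bytes());
    }
    fn unflatten(bytes: &mut &[u8]) -> Self {
        let (head, rest) = bytes.split_at(4);
        *bytes = rest;
        u32::from_le_bytes(head.try_into().unwrap())
    }
}

// A dynamic array becomes a length prefix plus its elements - no
// pointers anywhere in the buffer.
impl<T: Flatten> Flatten for Vec<T> {
    fn flatten(&self, out: &mut Vec<u8>) {
        (self.len() as u32).flatten(out);
        for x in self {
            x.flatten(out);
        }
    }
    fn unflatten(bytes: &mut &[u8]) -> Self {
        let len = u32::unflatten(bytes) as usize;
        (0..len).map(|_| T::unflatten(bytes)).collect()
    }
}

fn main() {
    let ids: Vec<u32> = vec![1, 2, 3];
    let mut buf = Vec::new();
    ids.flatten(&mut buf);
    assert_eq!(buf.len(), 16); // 4-byte length prefix + 3 * 4 bytes

    let mut slice = &buf[..];
    let back = Vec::<u32>::unflatten(&mut slice);
    assert_eq!(back, vec![1, 2, 3]);
}
```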

Rate limiting: The client asks the server to send it all the blocks of terrain it wants to load. The end goal for the client is "load all blocks within some radius R". We could just send a "gimme all blocks within radius R" packet to the server, but if we move a little to the right, we don't want to get all those blocks again. Ideally, we could just ask for the new "slice" of blocks, but we also don't have any guarantee that all the previous ones finished loading, or what order they loaded in (since the server could really send them back in any order, regardless of the requested order - we want to load blocks as soon as they're available, don't we?) So although it might be possible to avoid individually requesting blocks all the time, the amount of work we'd have to do in order to request in bigger "chunks" doesn't seem worth it. Instead, let's just handle that huge stream of requests better. More compact, efficient serialization was a big part of it. Handling terrain loading as a lower-priority action was another big step, since handling client terrain requests as part of the client update action would lag all the event processing from the client (e.g. you'd press W and the player wouldn't move because the server was still busy dealing with your client's terrain requests). But there were still too many actions; the client updates were still being slowed down by all the terrain requests. The obvious, simple, functional solution was just to rate-limit the client. Ideally, this would be done on the server's end (since we have to do that eventually for security reasons anyway), but for now, rate limiting on the client's end is good enough - just don't be as eager about sending requests.
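The client-side limiter doesn't need to be anything fancier than this sketch (made-up names; Playform's actual numbers and bookkeeping will differ): allow at most some number of requests per update tick, and hold the rest for later.

```rust
// A bare-bones client-side rate limiter: send at most `per_tick` block
// requests per client update tick; don't be eager about the rest.
struct RequestLimiter {
    per_tick: usize,
    sent_this_tick: usize,
}

impl RequestLimiter {
    fn new(per_tick: usize) -> Self {
        RequestLimiter { per_tick, sent_this_tick: 0 }
    }

    // Ask permission to send one more request right now.
    fn try_send(&mut self) -> bool {
        if self.sent_this_tick < self.per_tick {
            self.sent_this_tick += 1;
            true
        } else {
            false // save it for a later tick
        }
    }

    // Called once per client update tick.
    fn tick(&mut self) {
        self.sent_this_tick = 0;
    }
}

fn main() {
    let mut limiter = RequestLimiter::new(2);
    assert!(limiter.try_send());
    assert!(limiter.try_send());
    assert!(!limiter.try_send()); // over budget this tick
    limiter.tick();
    assert!(limiter.try_send()); // budget refilled
}
```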

Don't be an idiot: The last optimization that made a big difference was about the lowest-hanging fruit there is: at some point during the switch (or maybe it's been this way the whole time), I accidentally removed the code that would filter out requests for blocks that were already loaded. When the client moved a few steps in any direction, we'd re-request roughly render_distance^3 blocks, instead of roughly render_distance^2. Fixing that made a pretty noticeable difference, and gave the performance that last push back to smoothness!

Monday, March 2, 2015

Stripping Down the Scheduler

In my last post, I laid out a high-level client/server design based around a custom scheduler enqueuing and prioritizing closures. The basic idea is still sound, but somewhat surprisingly (at least to me) the "closure" part of this design is the least relevant one.

All the queued closures would originate from somewhere: either from incoming messages, or from timed signals. If we allocate a closure for every single event, it means a whole lot of small, repetitive heap allocations. We can take down the churn a little by batching up messages, but we sacrifice some flexibility and latency.

But more importantly, why use closures at all? The set of things that cause a closure to be queued is small and well-defined - we can keep the priority- and deadline-based approach, but we can implement it on run-of-the-mill message queues too. We have to add a few enums to describe the messages, but that's okay. Then a fixed-size task pool can pop and process the prioritized events just like before.
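Concretely, "a few enums plus a priority queue" could look like this sketch (the message variants are illustrative, not Playform's actual set):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Plain enums instead of heap-allocated closures.
#[derive(PartialEq, Eq, PartialOrd, Ord, Debug)]
enum ServerMessage {
    PlayerUpdate { player_id: u32 },
    TerrainRequest { block: (i32, i32, i32) },
}

// Lower tier number = more urgent.
fn tier(msg: &ServerMessage) -> u8 {
    match msg {
        ServerMessage::PlayerUpdate { .. } => 0,
        ServerMessage::TerrainRequest { .. } => 2,
    }
}

struct MessageQueue {
    seq: u64,
    heap: BinaryHeap<(Reverse<u8>, Reverse<u64>, ServerMessage)>,
}

impl MessageQueue {
    fn new() -> Self {
        MessageQueue { seq: 0, heap: BinaryHeap::new() }
    }

    fn push(&mut self, msg: ServerMessage) {
        self.seq += 1;
        // Max-heap + Reverse: the lowest tier pops first, FIFO within a tier.
        self.heap.push((Reverse(tier(&msg)), Reverse(self.seq), msg));
    }

    fn pop(&mut self) -> Option<ServerMessage> {
        self.heap.pop().map(|(_, _, msg)| msg)
    }
}

fn main() {
    let mut queue = MessageQueue::new();
    queue.push(ServerMessage::TerrainRequest { block: (0, 0, 0) });
    queue.push(ServerMessage::PlayerUpdate { player_id: 7 });
    // The player update jumps the earlier terrain request in line.
    assert_eq!(queue.pop(), Some(ServerMessage::PlayerUpdate { player_id: 7 }));
}
```

The task pool's workers just call `pop` in a loop; no per-event allocation required.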

Once again, I find myself bitten by trying to think too generically.

Saturday, February 28, 2015

High-Level Client/Server Design

The server has to be able to send things like player updates at a relatively high rate, with a relatively low latency. Ideally, the "reaction time" between a keypress and movement is <10ms (a nice round number for human-noticeable latency), not counting network latency. That would mean we could just run a local one-player server for singleplayer.

The idea of tiered priorities to organize async IO is becoming more and more appealing. Each tier holds a priority queue of closures, ordered by completion deadline. We do not process lower-tier actions before a higher tier is complete. (Note that actions may schedule other actions).

Warnings should be added for when the queue is growing too quickly, or when too many things are being processed too late.

We should also audit these actions for things that can be processed greedily (i.e. before their actual deadline). Interval-based actions (e.g. rendering) can be responsible for scheduling their next iterations, so that the overall rate of an action is unaffected by greedy processing.

Client
  • Priority 0
    • Receives blocks from the server.
      • Queues blocks for loading. 
    • Receives world updates.
      • Moves the client.
        • Removes obsolete surroundings from render state.
        • Requests blocks from the server.
        • Updates the render state.
    • Receives entity updates.
      • Updates the render state.
    • Receives user input events.
      • Sends the server player updates.
      • Updates the render state.
  • Priority 1
    • Renders at a regular interval.
  • Priority 2
    • Loads queued blocks.
Server
  • Priority 0
    • Acks new clients.
      • Register client listen addresses.
      • Open a new connection.
      • Sends clients initial world state.
    • Queues terrain block requests.
    • Applies player updates from clients.
  • Priority 1
    • Updates the world at a regular interval.
      • Sends world updates to clients.
      • Unload blocks from the octree.
      • Requests terrain blocks.
  • Priority 2
    • Loads/generates queued blocks.
      • Sends blocks to client.
      • Loads blocks into octree.

Tuesday, February 24, 2015

Async IO: Latency and Consistency

The degree of control I have over the threads in Playform is lacking. It seems that when the server should be flushing messages to the client, it's loading terrain instead. That's not always a bad call; it's just not great if the client is waiting on player updates (low latency is kind of key in realtime games). Potentially, we could just say "process all incoming events before doing anything else" by using a condition variable, but that doesn't really give any kind of enforcement between "anything else"s. World updates need to be considered too. Trying to organize the relationships between these threads using condition variables is going to become a headache quickly, even with only a few of them.

I could put a whole lot of things back into a single thread, but there's a reason we separated them in the first place: if they can run concurrently, they should (e.g. generating more terrain in the background). And putting everything into a single loop tends to impose a strict sequence, e.g. incoming events -> world updates -> generate terrain, although of course you could write more code to allow a more dynamic ordering.

Which brings me to the new approach I'm considering: implement threading via a priority queue of closures. A fixed-size ThreadPool constantly executes the highest-priority closures on the queue. Currently, our threads are all initialization blocks for shared state, followed by a message-processing loop. The loop can become queued closures by scheduling a single iteration, and having it schedule itself again when done. The shared state becomes a parameter to the closure (which it then passes to itself as part of its recursive scheduling). What about priority? The next iteration's deadline works as a base priority for the loops; it can become more complicated as needed.
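A minimal single-threaded sketch of this "priority queue of closures" idea, with made-up names (a real version would guard the queue with a Mutex/Condvar and drain it from a fixed-size thread pool): each task carries a priority, and a loop body reschedules itself when it runs.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

// A task is a closure that gets the queue back, so it can reschedule itself.
type Task = Box<dyn FnOnce(&mut TaskQueue)>;

pub struct TaskQueue {
    // Min-heap on (priority, insertion id): lower numbers run first,
    // ties run in FIFO order.
    heap: BinaryHeap<Reverse<(u64, u64)>>,
    tasks: HashMap<u64, Task>,
    next_id: u64,
}

impl TaskQueue {
    pub fn new() -> Self {
        TaskQueue { heap: BinaryHeap::new(), tasks: HashMap::new(), next_id: 0 }
    }

    pub fn schedule(&mut self, priority: u64, task: Task) {
        let id = self.next_id;
        self.next_id += 1;
        self.heap.push(Reverse((priority, id)));
        self.tasks.insert(id, task);
    }

    // One worker step: pop and run the highest-priority closure.
    pub fn run_one(&mut self) -> bool {
        match self.heap.pop() {
            None => false,
            Some(Reverse((_, id))) => {
                let task = self.tasks.remove(&id).unwrap();
                task(self);
                true
            }
        }
    }
}
```

The point of handing the queue back into the closure is exactly the self-scheduling loop described above: an "iteration" of a message loop runs, then pushes its next iteration with a fresh priority.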

Now this is a switch from interrupt-based to polling-based IO, so instead of blocking on message queue reads, each loop body executes at a regular interval and checks whether messages are available. If these loops are dealing with incoming events, the average latency for dealing with an incoming event is proportional to the polling interval (i.e. inversely proportional to how often the closure is scheduled). If we have events that require low-latency, low-throughput processing, we'll be doing more spinning than we should be. I think this is a fair trade to make for a game though: most events are high throughput, and a check for "are there messages available" is pretty damn cheap (even including the context switch) compared to some of the other loop bodies.
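The polling-style loop body boils down to "drain whatever's available, then yield". A sketch using std's mpsc channel (the `drain` helper is illustrative, not Playform's code):

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

// Handle all currently-available messages without blocking; returns how
// many were handled. In the scheme above, this would be one scheduled
// iteration of a message loop, rescheduled for `now + interval` when done.
fn drain<T, F: FnMut(T)>(rx: &Receiver<T>, mut handle: F) -> usize {
    let mut n = 0;
    loop {
        match rx.try_recv() {
            Ok(msg) => {
                handle(msg);
                n += 1;
            }
            // Empty: nothing to do until the next scheduled iteration.
            // Disconnected: the sender went away; also nothing to do here.
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => return n,
        }
    }
}
```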

If anybody has thoughts on this, let me know. Async IO is hard and I would love for someone to have already dealt with my use case.

Monday, February 23, 2015

Async IO

Today, I added SendSocket<T> and ReceiveSocket<T> structs to Playform. These replace and subsume the spark_socket_sender and spark_socket_receiver functions that were there before, which would spawn a new thread to pump messages between a socket and an async queue. I briefly debated adding support for a kill signal, sent from the parent, so the threads could cleanly kill themselves. But since the socket threads can be blocked indefinitely (e.g. waiting on a disconnected client), some mechanism for the parent to kill them externally needs to exist anyway, so we might as well use that all the time. So far, I've been returning the socket endpoints to the caller of spark_socket_{sender,receiver}, and the caller can shut them down, which causes the sparked thread to die. But that's far from clean, and the resources that were moved into the sparked threads would get cleaned up messily.

Now we have SendSocket::spawn and ReceiveSocket::spawn. They do the same thing, but return a SendSocket/ReceiveSocket, which hold all the state created. This requires that the threads spawned have a more constrained lifetime (so they can be put inside a data structure), which turns out to actually be a good thing: it means that the compiler now lets those threads make use of values that aren't around for as long, e.g. sockets created at runtime.
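To make the shape concrete, here's a hedged sketch of what a `SendSocket::spawn` along these lines could look like. This is my illustration, not Playform's actual API: the handle owns both the pump thread and a shutdown channel (using an `Option<T>` message where `None` means "stop"), so closing the handle cleanly ends the thread.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

pub struct SendSocket<T: Send + 'static> {
    // None acts as the shutdown signal.
    pub sender: Sender<Option<T>>,
    thread: Option<JoinHandle<()>>,
}

impl<T: Send + 'static> SendSocket<T> {
    // `write` stands in for "push one message onto the actual socket".
    pub fn spawn<F: FnMut(T) + Send + 'static>(mut write: F) -> Self {
        let (tx, rx) = channel();
        let thread = thread::spawn(move || {
            // Pump queued messages until told to stop (or the sender drops).
            while let Ok(Some(msg)) = rx.recv() {
                write(msg);
            }
        });
        SendSocket { sender: tx, thread: Some(thread) }
    }

    // Consuming `self` means the pump thread's resources die with the handle.
    pub fn close(mut self) {
        let _ = self.sender.send(None);
        if let Some(t) = self.thread.take() {
            t.join().unwrap();
        }
    }
}
```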

Adding these types meant changing a lot of signatures, though. A lot of Playform's threads look something like:
/// Keep the terrain around the client loaded.
pub fn surroundings_thread(
  client: &Client,
  ups_to_view: &Mutex<Sender<ClientToView>>,
  ups_to_server: &Mutex<Sender<ClientToServer>>,
) {
  ...
}
Because the high-level design is still evolving, these types change a lot. Sometimes references get added or removed, sometimes Mutexes get added or removed, and here, I'm changing from a Sender to a SendSocket, a different type entirely. This is annoying.

My solution was to let the threads just say "I want to fire off an event of type X", without caring much about exactly how. I replaced the offending signatures with things like:
/// Keep the terrain around the client loaded.
pub fn surroundings_thread<UpdateView, UpdateServer>(
  client: &Client,
  update_view: &mut UpdateView,
  update_server: &mut UpdateServer,
) where
  UpdateView: FnMut(ClientToView),
  UpdateServer: FnMut(ClientToServer),
{
  ...
}
It's longer and clunkier, but it means I don't have to change these signatures nearly as much. What this says is that surroundings_thread can be run with any two functions that accept ClientToView and ClientToServer respectively. The exact nature of these functions isn't important (e.g. whether each is a plain ol' Rust function, a closure on the stack, or something else that just works like a function), which is why the function types themselves are parameters (<UpdateView, UpdateServer>).

At use time, we specify what exactly it means to apply our updates:
thread::spawn(move || {
  surroundings_thread::surroundings_thread(
    client,
    &mut |msg| { client_to_view_send.lock().unwrap().send(msg).unwrap() },
    &mut |msg| { ups_to_server.lock().unwrap().send(msg).unwrap() },
  );
})

Saturday, February 21, 2015

Terrain loading design

We can't keep an entire world loaded, but we need to make sure that the useful parts of the world (i.e. the parts near players) are loaded. How do we do this?

First, the world is divided into a regular cubic grid. Each cell on the grid is loaded and unloaded contiguously. Each cell is generated at several levels of detail (LODs), and the appropriate version is selected depending on distance from the viewer. This is called a geometry clipmap.
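The "appropriate version selected by distance" step is a simple threshold lookup. A sketch with made-up LOD names and cutoffs (Playform's actual thresholds may differ):

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Lod {
    Fine,
    Medium,
    Coarse,
}

// `distance` is measured in grid cells from the viewer's cell.
// Thresholds are illustrative only.
fn lod_for_distance(distance: u32) -> Lod {
    match distance {
        0..=2 => Lod::Fine,
        3..=8 => Lod::Medium,
        _ => Lod::Coarse,
    }
}
```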

A SurroundingsIter struct iterates through all the cells in an NxNxN block; it iterates from the center outward in cubic "layers", so that terrain loads from near to far around a player.
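The near-to-far order comes from the fact that a cell sits on layer r exactly when its Chebyshev distance (max of the per-axis distances) from the center is r. A standalone sketch of the idea (the real SurroundingsIter yields cells lazily instead of collecting them):

```rust
type Cell = (i32, i32, i32);

// All cells on the cubic "shell" at radius r around `center`.
fn shell(center: Cell, r: i32) -> Vec<Cell> {
    let mut cells = Vec::new();
    for dx in -r..=r {
        for dy in -r..=r {
            for dz in -r..=r {
                // On shell r iff the Chebyshev distance is exactly r.
                if dx.abs().max(dy.abs()).max(dz.abs()) == r {
                    cells.push((center.0 + dx, center.1 + dy, center.2 + dz));
                }
            }
        }
    }
    cells
}

// All cells within radius `max_r`, ordered near-to-far by shell.
fn surroundings(center: Cell, max_r: i32) -> Vec<Cell> {
    (0..=max_r).flat_map(|r| shell(center, r)).collect()
}
```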

This SurroundingsIter is wrapped up in a SurroundingsLoader struct, which is responsible for keeping the cubic area around some (moving) point loaded. Its update function takes in a new central point and produces Load and Unload events, each designating a cell to be loaded at a specific LOD or unloaded. These events are handled differently in different places: the client responds by either requesting blocks from the server (for Loads) or by removing the block data from VRAM (for Unloads); the server responds to Loads by inserting the physical mesh data into the world for collision detection (note also that because the server doesn't care about visuals, it doesn't need to keep more blocks loaded than just the ones that players touch).
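At its core, the update step is a set difference between the old and new surroundings. A sketch (the real loader also attaches an LOD to each Load; this version omits that, and the names are illustrative):

```rust
use std::collections::HashSet;

type Cell = (i32, i32, i32);

#[derive(Debug, PartialEq)]
enum Event {
    Load(Cell),
    Unload(Cell),
}

// The set of cells within Chebyshev radius r of `center`.
fn in_range(center: Cell, r: i32) -> HashSet<Cell> {
    let mut s = HashSet::new();
    for dx in -r..=r {
        for dy in -r..=r {
            for dz in -r..=r {
                s.insert((center.0 + dx, center.1 + dy, center.2 + dz));
            }
        }
    }
    s
}

// Diff old surroundings against new: cells that fell out of range get
// Unload events, cells that came into range get Load events.
fn update(old: Cell, new: Cell, r: i32) -> Vec<Event> {
    let before = in_range(old, r);
    let after = in_range(new, r);
    let mut events: Vec<Event> =
        before.difference(&after).map(|&c| Event::Unload(c)).collect();
    events.extend(after.difference(&before).map(|&c| Event::Load(c)));
    events
}
```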

Now, what if generating terrain takes too long? Should the server block? Probably not. Other updates and clients should be kept alive and well, regardless of terrain gen. Just load it when it's ready? What if the terrain is under the player? They'll fall through the world if it takes too long to load.

My solution so far has been to make unloaded areas solid near players; you simply can't move into a cell until it's loaded. This has some interesting side effects, though - what if one block isn't loaded, but the empty block above it is? Then I can use the solid unloaded block as a step to reach that chest I couldn't reach before (hypothetically).

The solidifying logic is surprisingly code-costly too: in the original client, blocks were either loaded both physically AND graphically (at some LOD), or not at all. The concept of "solid as a placeholder until terrain gen is done" had to be threaded through all the code and data structures that dealt with loading, unloading, and levels of detail. In the end, the "cleanest" thing ended up being to add Placeholder as a somewhat special minimal LOD. This still exists in the server code, but thankfully the client doesn't worry about physics, and doesn't have to deal with this particular quirk.

I'd like to remove this logic, although there's no ideal replacement - if you stop anything from moving until the terrain around it is loaded, that can be exploited to, say, fire at an opponent from above. Minecraft just kind of pretends that unloaded areas are air until proven otherwise, and resolves inconsistencies by suffocating players that end up inside something. Although the idea of directly punishing players for slow servers tickles me, I feel like there's a better solution. Maybe the best solution is just to work hard at keeping players from encountering unloaded areas, and then it becomes less important what happens when they do.

The server code for keeping track of loaded blocks is overcomplicated too; since it doesn't care about graphics, it really only loads blocks at one LOD (max) for collision logic. The data structure(s) it uses are left over from the old client, though, and they can reason about many different owners requesting blocks at different LODs. Although right now Placeholder is still a LOD, so this functionality is kindasorta used, it really can get a lot simpler (which almost definitely means faster, too).