Notes from Strange Loop 2015: Day One

Unconventional Programming with Chemical Computing

  • Carin Meier
  • Living Clojure
  • @Cognitect
  • Inspired by the book Unconventional Programming Paradigms
  • “the grass is computing”
    • all living things process information via chemical reactions on molecular level
    •  hormones
    • immune system
    • bacteria signal processing
  • will NOT be programming with chemicals; using metaphor of molecules and reactions to do computing
    • nothing currently in the wild using chemical computing
  • at the heart of chemical programming: the reaction
  • will calculate primes two ways:
    • traditional
    • with prime reaction
  • uses clojure for the examples
  • prime reaction
    • think of the integers as molecules
    • simple rule: take a vector of 2 integers, divide them, if the mod is zero, return the result of the division, otherwise, return the vector unchanged
    • name of this procedure: gamma chemical programming
    • reaction is a condition + action
    • execute: replacement of original elements by resulting element
    • solution is known when it results in a steady state (hence, for prime reaction, have to churn over lists of integers multiple times to filter out all the non-primes)
  • possible advantages:
    • modeling probabilistic systems
    • drive a computation towards a global max or min
  • higher order
    • make the functions molecules as well
    • fn could “capture” integer molecules to use as args
    • what does it do?
    • it “hatches” => yields original fn and result of applying fn to the captured arguments
    • reducing reaction fn: return fewer arguments than is taken in
    • two fns interacting: allow to exchange captured values (leads to more “stirring” in the chem sims)
  • no real need for sequential processing; can do things in any order and still get the “right” answer
  • dining philosophers problem
    • something chemical programming handles well
    • two forks: eating philosopher
    • one fork or no forks: thinking philosopher
    • thinking philosopher + 2 forks reacting with EAT => eating philosopher
  • “self organizing”: simple behaviors combine to create what look like complex behaviors
  • mail system: messages, servers, networks, mailboxes, membranes
    • membranes control reactions, keep molecules sorted
    • passage through membranes controlled by servers and network
    • “self organizing”
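
The prime reaction described above can be sketched in a few lines. The talk used Clojure; this is my own rough Python translation of the same idea (the pair rule and function names are a reconstruction, not the speaker's actual code):

```python
def prime_reaction(a, b):
    """Gamma-style reaction: if the larger molecule is divisible by the
    smaller one, replace it with the quotient; otherwise no change."""
    if a > b and a % b == 0:
        return a // b, b
    return a, b

def react_until_stable(molecules):
    """Keep 'stirring' pairs until a full pass produces no reaction
    (the steady state). Primes survive; composites get divided down."""
    molecules = list(molecules)
    changed = True
    while changed:
        changed = False
        for i in range(len(molecules)):
            for j in range(len(molecules)):
                if i == j:
                    continue
                a, b = prime_reaction(molecules[i], molecules[j])
                if (a, b) != (molecules[i], molecules[j]):
                    molecules[i], molecules[j] = a, b
                    changed = True
    return molecules

result = react_until_stable(range(2, 20))
print(sorted(set(result)))  # -> [2, 3, 5, 7, 11, 13, 17, 19]
```

Note that the order of reactions doesn't matter: each reaction strictly shrinks one molecule, so the system always reaches the same steady state, which is the "no need for sequential processing" point from the talk.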

How Machine Learning helps Cancer Research

  • Evelina Gabasova
  • University of Cambridge
  • cost per human genome has gone down from $100 million (2001) to a few thousand dollars (methodology change in mid-2000s paid big dividends)
  • cancer is not a single disease; underlying cause is mutations in the genetic code that regulates protein formation inside the cell
  • brca1 and brca2 are guardians; they check the chromosomes for mistakes and kill cells that have them, so suppress tumor growth; when they stop working correctly or get mutated, you can have tumors
  • clustering: finding groups in data that are more similar to each other than to other data points
    • example: clustering customers
    • but: clustering might vary based on the attributes chosen (or the way those attributes are lumped together)?
    • yes: but choose projection based on which ones give the most variance between data points
    • can use in cancer research by plotting genes and their expression and looking for grouping
  • want to be able to craft more targeted responses to the diagnosis of cancer based on the patient and how they will react
  • collaborative filtering
    • used in netflix recommendation engine
    • filling in cells in a matrix
    • compute as the product of two smaller matrices
    • in cancer research, can help because the number of people with certain mutations is small, leading to a sparsely populated database
  • theorem proving
    • basically prolog-style programming, constraints plus relations leading to single (or multiple) solutions
    • can use to model cancer systems
    • was used to show that chronic myeloid leukemia is a very stable system, that just knocking out one part will not be enough to kill the bad cell and slow the disease; helps with drug and treatment design
    • data taken from academic papers reporting the results of different treatments on different populations
  • machine learning not just for targeted ads or algorithmic trading
  • will become more important in the future as more and more data becomes available
  • Q: how long does the calculation take for stabilization sims?
    • A: for very simple systems, can take milliseconds
  • Q: how much discovery is involved, to find the data?
    • A: actually, whole teams developing text mining techniques for extracting data from academic papers (!)
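
The collaborative-filtering idea above — filling in missing matrix cells by approximating the matrix as a product of two smaller ones — can be sketched with plain stochastic gradient descent. This is an illustrative toy of my own, not the method from the research; the data and parameters are made up:

```python
import random

# Toy observed matrix with missing cells (None). In the talk's framing,
# rows might be patients and columns mutations; here it's made-up data.
R = [
    [5.0, 3.0, None, 1.0],
    [4.0, None, None, 1.0],
    [1.0, 1.0, None, 5.0],
    [1.0, None, 5.0, 4.0],
]

def factorize(R, k=2, steps=3000, lr=0.01, reg=0.02):
    """Fit R ~= P x Q (P is n x k, Q is k x m) using only the
    observed cells, via stochastic gradient descent."""
    random.seed(0)
    n, m = len(R), len(R[0])
    P = [[random.random() for _ in range(k)] for _ in range(n)]
    Q = [[random.random() for _ in range(m)] for _ in range(k)]
    for _ in range(steps):
        for i in range(n):
            for j in range(m):
                if R[i][j] is None:
                    continue  # only observed cells drive the fit
                err = R[i][j] - sum(P[i][f] * Q[f][j] for f in range(k))
                for f in range(k):
                    P[i][f] += lr * (err * Q[f][j] - reg * P[i][f])
                    Q[f][j] += lr * (err * P[i][f] - reg * Q[f][j])
    return P, Q

P, Q = factorize(R)
# The product P x Q now has a value for EVERY cell, including the ones
# that were missing -- those are the predictions/recommendations.
predicted = [[sum(P[i][f] * Q[f][j] for f in range(2)) for j in range(4)]
             for i in range(4)]
```

The sparsity point from the talk is exactly why this helps: even with most cells empty, the two small factor matrices can be fit from the few observed entries.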

When Worst is Best

  • Peter Bailis
  • what if we designed computer systems for the worst-case scenarios?
  • website that served 7.3 billion simultaneous users; would on average have lots of idle resources
  • hardware: what if we built this chip for the mars rover? would lead to very expensive packaging (and a lot of R&D to handle low-power low-weight environments)
  • security: all our devs are malicious; makes code deployment harder
  • designing for the worst case often penalizes the average case
  • could we break the curve? design for the worst case and improve the average case too
  • distributed systems
    • almost everything non-trivial is distributed these days
    • operate over a network
    • networks make designs hard
      • packets can be delayed
      • packets may be dropped
    • async network: can’t tell if message has been delayed or dropped
      • handle this by adding replicas that can respond to any request at any time
      • network interruptions don’t stop service
  • no coordination means even when everything is fine, we don’t have to talk
    • possible infinite service scale-out
  • coordinated multi-server transactions pay large penalty as we add more servers (from locks); get more throughput if we let access be uncoordinated
  • don’t care about latency if you don’t have to send messages everywhere
  • but what about the CAP theorem?
    • inktomi from eric brewer: for large scale services, have to trade off between always giving an answer and always giving the right answer
    • takeaway: certain properties of a system (like serializability) require unavailability
    • original paper: seth gilbert and nancy lynch
    • common conclusion: availability is too expensive, and we have to give up too much, and it only matters during failures, so forget about it
  • if you use worst case as design tool, you skew toward coordination-avoiding databases
    • high coordination is legacy of old db design
    • coordination-free designs are possible
  • example: read committed isolation
    • goal: never read uncommitted data
    • legacy implementation: lock records during access (coordination)
    • one way: copy on write (x -> x’, do stuff -> write back to x)
    • or: versioning
    • for more detail, see martin’s talk on saturday about transactions
  • research on coordination-free systems has potential for huge speedups
  • other situations where worst-case thinking yields good results
    • replication for fault tolerance can also increase your request-serving capacity
    • fail-over can help deployments/upgrades: if it’s automatic, you can shut off the primary whenever you want and know that the backups will take over, then bring the primary back up when your work is done
    • tail latency in services:
      • avg of 1.2ms (not bad) can mean 0.1% of requests take 100ms (which is terrible)
      • if you’re one of many services being used to fulfill a front-end request, your worst case is more likely to happen, and so drag down the avg latency for the end-user
  • universal design: designing well for everyone; ex: curb cuts, subtitles on netflix
  • sometimes best is brittle: global maximum can sit on top of a very narrow peak, where any little change in the inputs can drive it away from the optimum
  • defining normal defines our designs; considering a different edge case as normal can open up new design spaces
  • hardware: what happens if we have bit flips?
  • clusters: what’s our scale-out strategy?
  • security: how do we audit data access?
  • examine your biases
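
The read-committed example above — avoiding record locks by writing new versions instead of mutating in place — can be sketched as follows. This is my own minimal illustration of the copy-on-write idea, not any database's actual implementation:

```python
import threading

class VersionedStore:
    """Minimal sketch of read committed without read locks: writers
    buffer new versions privately and install them atomically at commit,
    so readers only ever see committed values."""
    def __init__(self):
        self._committed = {}          # key -> last committed value
        self._commit_lock = threading.Lock()

    def read(self, key):
        # Readers never see uncommitted data: they only look at the
        # committed versions, no coordination with writers needed.
        return self._committed.get(key)

    def transaction(self):
        return _Txn(self)

class _Txn:
    def __init__(self, store):
        self._store = store
        self._writes = {}             # private copy-on-write buffer

    def write(self, key, value):
        self._writes[key] = value     # invisible to readers until commit

    def commit(self):
        with self._store._commit_lock:
            self._store._committed.update(self._writes)

store = VersionedStore()
txn = store.transaction()
txn.write("x", 42)
print(store.read("x"))   # -> None (uncommitted write is invisible)
txn.commit()
print(store.read("x"))   # -> 42
```

The one lock here guards only the commit install, not reads, which is the "coordination-avoiding" shape the talk is pointing at.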

All In with Determinism for Performance and Testing in Distributed Systems

  • John Hugg
  • VoltDB
  • so you need a replicated setup?
    • could run primary and secondary
    • could allow writes to 2 servers, do conflict detection, and merge all writes
    • NOPE
  • active-active: state a + deterministic op = state b
    • if do same ops across all servers, should end up with the same state
    • have client that sends A B C to coordination system, which then sends ABC to all replicas, which do the ops in order
    • ABC: a logical log, the ordering is what’s important
    • can write log to disk, for later replay
    • can replicate log to all servers, for constant active-active updates
    • can also send log across network for cluster replication
  • look out for non-determinism
    • random numbers
    • wall-clock time
    • record order
    • external systems (ping noaa for weather)
    • bad memory
    • libraries that use randomness for security
  • how to protect from non-determinism?
    • make sure sql is as deterministic as possible
    • 100% of their DML is deterministic
    • read-write transactions are hard to make deterministic, have to do a little more planning (swap row-scan for tree-index scan)
    • use seeded random-number generators that are lists created in advance
    • hash up the write ops, and require replicas to send back their computed hashes once the ops are done so the coordinator can confirm the ops were deterministic
    • can also hash the whole replica state when doing a transactional snapshot
    • reduce latency by sending condensed representation of ops instead of all the steps (the recipe name, not the recipe)
  • why do it?
    • replicate faster, reduces concerns for latency
    • persist everything faster: start logging when the work is requested, not when the work is completed
    • bounded sizes: the work comes in as fast as the network allows, so the log will be written no faster than the network (no firehose)
  • trade-offs?
    • it’s more work: testing, enforcing determinism
    • running mixed versions is scary: if you fix a bug, and you’re running different versions of the software between the replicas, you no longer have deterministic transactions
    • if you trip the safety checks, we shut down the cluster
  • testing?
    • multi-pronged approach: acid, sql correctness, etc
    • simulation a la FoundationDB not as useful for them, since they have more states
    • message/state-machine fuzzing
    • unit tests
    • smoke tests
    • self-checking workload (best value)
      • everything written gets self-checked; so to check a read value, write it back out and see if it comes back unchanged
    • use “nefarious app”: application that runs a lot of nasty transactions, checks for ACID failures
    • nasty transactions:
      • read values, hash them, write them back
      • add huge blobs to rows to slow down processing
      • add mayhem threads that run ad-hoc sql doing updates
      • multi-table joins
        • read and write multiple values
      • do it all many many times within the same transaction
    • mix up all different kinds of environment tweaks
    • different jvms
    • different VM hosts
    • different OSes
    • inject latency, disk faults, etc
  • client knows last sent and last acknowledged transaction, checker can be sure recovered data (shut down and restart) contains all the acknowledged transactions
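
The logical-log idea — every replica applies the same ordered ops, then hashes its state so the coordinator can confirm determinism — can be sketched like this. The counter state machine here is a toy of my own, not VoltDB code:

```python
import hashlib

def apply_op(state, op):
    """A deterministic op: (key, delta) applied to a dict of counters."""
    key, delta = op
    state[key] = state.get(key, 0) + delta
    return state

def state_hash(state):
    """Hash the replica state in a canonical order so identical states
    always produce identical hashes, regardless of insertion order."""
    canonical = repr(sorted(state.items())).encode()
    return hashlib.sha256(canonical).hexdigest()

def replay(log):
    """A replica is just the ordered logical log applied to empty state;
    the same log can be written to disk, replayed, or shipped across
    the network for cluster replication."""
    state = {}
    for op in log:
        apply_op(state, op)
    return state

log = [("a", 1), ("b", 2), ("a", 3)]      # the ordered logical log
replica1 = replay(log)
replica2 = replay(log)
# Same ordered ops -> same state -> same hash on every replica, which is
# how the coordinator confirms the ops were deterministic.
assert state_hash(replica1) == state_hash(replica2)
print(replica1)  # -> {'a': 4, 'b': 2}
```

If an op were non-deterministic (say, it read the wall clock), the replica hashes would diverge and the safety check would trip.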

Scaling Stateful Services

  • Caitie McCaffrey
  • been using stateless services for a long time, depending on db to store and coordinate our state
  • has worked for a long time, but got to place where one db wasn’t enough, so we went to no-sql and sharded dbs
  • data shipping paradigm: client makes request, service fetches data, sends data to client, throws away “stale” data
  • will talk about stateful services, and their benefits, but WARNING: NOT A MAGIC BULLET
  • data locality: keep the fetched data on the service machine
    • lower latency
    • good for data intensive ops where client needs quick responses to operations on large amounts of data
  • sticky connections and consistency
    • using sticky connections and stateful services gives you more consistency models to use: pipelined random access memory, read your write, etc
  • blog post from werner vogels: eventually consistent, revisited
  • building sticky connections
    • client connecting to a cluster always gets routed to the same server
  • easiest way: persistent connections
    • but: no stickiness once connection breaks
    • also: mucks with your load balancing (connections might not all last the same amount of time, can end up with one machine holding everything)
    • will need backpressure on the machines so they can break connections when they need to
  • next easiest: routing logic in cluster
    • but: how do you know who’s in the cluster?
    • and: how do you ensure the work is evenly distributed?
    • static cluster membership: dumbest thing that might work; not very fault tolerant; painful to expand;
    • next better: dynamic cluster membership
      • gossip protocols: machines chat about who is alive and dead, each machine on its own decides who’s in the cluster and who’s not; works so long as system is relatively stable, but can lead to split-brain pretty quickly
      • consensus systems: better consistency; but if the consensus truth holder goes down, the whole cluster goes down
  • work distribution: random placement
    • write anywhere
    • read from everywhere
    • not sticky connection, but stateful service
  • work distribution: consistent hashing
    • deterministic request placement
    • nodes in cluster get placed on a ring, request gets mapped to spot in the ring
    • can still have hot spots form, since different requests will have different work that needs to be done, can have a lot of heavy work requests placed on one node
    • work around the hot spots by having larger cluster, but that’s more expensive
  • work distribution: distributed hash table
    • non-deterministic placement
  • stateful services in the real world
  • scuba:
    • in-memory db from facebook
    • believed to be static cluster membership
    • random fan-out on write
    • reads from every machine in cluster
    • results get composed by machine running query
    • results include a completeness metric
  • uber ringpop
    • nodejs library that does application-layer sharding for their dispatching services
    • swim gossip protocol for cluster membership
    • consistent hashing for work distribution
  • orleans
    • from Microsoft Research
    • used for Halo4
    • runtime and programming model for building distributed systems based on Actor Model
    • gossip protocol for cluster membership
    • consistent hashing + distributed hash table for work distribution
    • actors can take request and:
      • update their state
      • return their state
      • create a new Actor
    • request comes in to any machine in cluster, it applies hash to find where the DHT is for that client, then that DHT machine routes the request to the right Actor
    • if a machine fails, the DHT is updated to point new requests to a different Actor
    • can also update the DHT if it detects a hot machine
  • cautions
    • unbounded data structures (huge requests, clients asking for too much data, having to hold a lot of things in memory, etc)
    • memory management (get ready to make friends with the garbage collector profiler)
    • reloading state: recovering from crashes, deploying a new node, the very first connection of a session (no data, have to fetch it all)
    • sometimes can get away with lazy loading, because even if the first connection fails, you know the client’s going to come back and ask for the same data anyway
    • fast restarts at facebook: with lots of data in memory, shutting down your process and restarting causes a long wait time for the data to come back up; had success decoupling memory lifetime from process lifetime, would write data to shared memory before shutting process down and then bring new process up and copy over the data from shared to the process’ memory
  • should i read papers? YES!
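
The consistent-hashing scheme described above can be sketched in a few lines of Python. This is an illustrative toy of my own (the node names and virtual-node count are made up), not Ringpop's or Orleans' actual code:

```python
import bisect
import hashlib

def _hash(value):
    """Map a string to a point on the ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Nodes are placed on a ring; a request is routed to the first node
    clockwise from its hash, so adding or removing a node only remaps
    the keys in one segment of the ring."""
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes          # virtual nodes smooth out hot spots
        self._ring = []               # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}:{i}"), node))

    def route(self, request_key):
        h = _hash(request_key)
        idx = bisect.bisect(self._ring, (h, ""))
        if idx == len(self._ring):
            idx = 0                   # wrap around the ring
        return self._ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
# The same request key always routes to the same node: deterministic,
# sticky placement without any per-connection state.
assert ring.route("user-42") == ring.route("user-42")
```

The hot-spot caveat from the notes still applies: placement is even over keys, not over work, so a few expensive keys can still overload one node.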

How to Fix Riddick

I love Pitch Black. It’s an almost perfect B movie to me, all horror and snark and very little fat left on the bone.

After the bloat of Chronicles of Riddick, I was hoping the third movie would be a return to form, stripping away the mythology of the sequel to reveal the basics that made the original great.

Instead, Riddick is just another male power fantasy, embracing every cliche possible, from “one man against the wilderness” to “masculine man of manliness converts lesbian to heterosexuality.”

What a mess.

But it’s not hopeless. There’s a good movie buried in there. We get flashes of it in the dialog given to the grunt mercs, which is cynical and darkly funny. We see more of it in the early scenes of Riddick hunting the mercs down, a horror film where Riddick is the monster.

It’s this film we need to strengthen.

We start by dropping the entire first third of the movie. I don’t care how Riddick ended up marooned on the world. The fact that he is marooned is what’s important, and that it happened after the events of Chronicles of Riddick. But I can learn he’s marooned there from the mercs’ dialog when they talk about someone setting off the emergency beacon, and I can deduce this is happening after Chronicles when I see Riddick wearing his Necromonger armor.

Instead of starting with backstory, the movie should open with the mercs landing. By starting there, all the mystery they encounter gives the movie tension. We know (or think we know) Riddick’s going to show up at some point, but we don’t know where or when or how. And when we find out he called the mercs there, and we read his note, we wonder when the bodies will start to fall.

The entire first half of the movie should be given over to this Alien-like horror sequence, with the mercs pitted against Riddick, the monster in the night.

Given more room to breathe, this part can tell us all we need to know about Riddick’s time on the planet. We can see him use his dog to trick the mercs. We can watch him use the water monsters’ poison to kill one or two of the others (and let him explain in an off-hand remark that he’s immune to their venom). By using the planet as part of his arsenal, we’ll get the sense that Riddick’s been there a while, that he knows his way around, and that the mercs face an uphill battle.

For the final half, we can introduce the rain storm. This twist forces Riddick to reach out to the (reduced to maybe one or two remaining) mercs for a truce, and now we get the scenes of a captured Riddick escaping and the tension of the mistrust between the two groups.

Finally, Dahl’s character should have a consistent sexuality. Either she should be — and remain — a lesbian, and the sexual talk between her and Riddick rewritten into a form of oddly respectful banter, or her line to Santana should be changed to “I don’t f— little boys,” and it made clear that she’s attracted to men who could hold their own against her in a fight (maybe by hitting on Diaz). Either way, their lines to each other need to be rewritten to show some chemistry — either friendly or otherwise — between the two.

Pride and Prejudice by Jane Austen

The second of the set of classics I’ve decided to finally go back and read.

As with Heart of Darkness, this book deserves its status. It’s oddly written from a modern perspective, violating rules left and right — telling instead of showing, switching from third to first person narration at the end of the book, having significant action happen off-screen — but is an absolute delight to read. The characters are all distinct and interesting, the dialog often made me laugh out loud, and despite the gulf of two hundred years — and a good deal of class status — made me relate to and care about the happiness of the Bennets.

Three things I learned about writing:

  • Verbal tags (e.g., he shouted, she sighed) aren’t as necessary as I thought. Austen uses almost none, yet since we know so much about each character’s personality, we can infer the tone and intent.
  • Description can be dropped for a book set in the same time period as the audience. Austen didn’t need to describe a drawing room, or a coach, or any of the characters’ clothes. Cutting all that description gave her more room for dialog and inner thoughts, which was more time for us to spend getting to know and care about her characters.
  • Don’t feel constrained by time. Austen zooms in and out of events as she pleases, summarizing a ball but giving a single conversation blow-by-blow. Skipping over events let her cover a lot of ground in a single novel.

Dropping Threads

Novel’s made it to 43,593 words.

Starting to worry that pantsing it means I’m dropping plot threads. I’ve already noticed a major one that just completely fell off my radar, and two more that are smaller but also haven’t been addressed in a while.

Not sure if I should slow down and try to fill them in, work the missing threads back into the book, or keep moving forward, and worry about fixing it later.

This might even be a good thing, a sign that these plot elements don’t belong, and should be cut, not reinforced.

It’s hard to tell which is right. I think it’s too late for the major plot point; that’ll have to wait for the second draft. The minor ones, though, I think I can fill in as I go, and take care not to leave them behind. I guess if I get stuck somewhere further into the book, and it’s because of these missing threads, I’ll know to be more careful in the future.

Aliens vs Predator: Which is the Better Movie?

A friend of mine last week insisted that Aliens was a better movie than Predator. Having fond memories of both of these movies from my younger days, I didn’t believe her at first. I thought the movies were very different but equally good sci-fi films.

I re-watched both movies to test her thesis, and man, was I wrong. Aliens is far and away the better movie, and not just because Sigourney Weaver can out-act the former governor.

Both movies turn out to be very similar to each other, but the writing and structure of Aliens is much, much better.

How They’re Similar

Both movies follow a military team into an uncertain situation. This uncertain situation turns out to contain an alien threat.

The alien threat in both cases clearly outmatches the resources of the team.

Both squads have an Outsider Who Is In Charge along with them (Dillon in Predator, Burke in Aliens). This Outsider has a different moral code than the rest of the team, being concerned with either profit or enemy intelligence above everything else.

The original mission in both movies is supposed to be rescue, but we find out the team has been tricked, and they’re really there to advance the Outsider’s agenda.

The Outsider is karmically punished for their betrayal of the team by the alien threat.

The climax of both movies is a one-on-one fight between the protagonist and the main alien threat.

What Aliens Does Better

Almost everything.

The Team

Let’s start with the team, since that’s who we spend most of the movie with. This is supposed to be a tight-knit group of people who have worked together for a long time, and we’re supposed to root for them throughout. So the film needs to take every chance it has to communicate that to us.

Aliens succeeds. Its marines seem to actually like each other, and function as a team. We get to see them joking and talking as they come out of hyper-sleep and while they’re eating before the mission briefing. They continue to banter using their radios as the mission starts (before things go haywire).

We also get a clear sense of the hierarchy and role for each member of the team: we know who the sergeant is, which people are carrying the heavy guns, who’s got the radar for spotting, etc.

Predator fails to do any of this. The members of the team don’t seem to like each other at all. We don’t see them bantering, but we do see them do some macho posturing, which is not a substitute.

What’s more, none of the team members really seem to have a clear role. They all carry basically the same weapons, they don’t work in groups, and they all have the same skills.

The one exception is Billy, the tracker, but he’s so close to the “wise Native American hunter” stereotype that it doesn’t serve to flesh out his character, it just makes him more of a caricature.

The Betrayal

Next, the “turn” or “betrayal” moment, when we find out the Outsider has tricked the team.

In Aliens this is a real betrayal. Burke locked two of them in with an alien in the hopes it would impregnate one of them, and was ready to kill the others so he could take off on his own (with the alien and its host). The Outsider turns out to be a real threat to the team, and there’s conflict generated both in overcoming his betrayal and deciding how to punish him for it.

Predator’s betrayal is much lower key. The team’s capture of the rebel camp seems effortless, with not much risk to any of the team members. Dillon’s betrayal is just an ulterior motive for getting in the camp. He never directly puts anyone’s lives in danger, and so the protagonist’s treatment of him feels overblown and melodramatic. There’s no real punch to it.

It would have been much better to make Dillon’s betrayal more serious. Imagine if Dutch’s team made it to the camp only to find that everyone was dead, with Anna the only survivor. She won’t talk, but they decide to take her back with them anyway. As they head back to the evac point, the team start getting picked off by the Predator. Eventually only Dutch, Anna, and Dillon are left.

Dillon finally confesses what’s really happening: he knew about the Predator, and contracted Dutch’s team under false pretenses because his first pick got wiped out by the alien. He wants to capture it, which is why he hasn’t been shooting to kill when he sees it. He’s ready to admit that he was wrong, though, and wants to help kill it so they can all get home.

Now Dutch has got a real moral problem: should he trust Dillon and work with him to defeat the predator? Or should he punish him for betraying his team and getting most of them killed?

Either choice is interesting, and would have a significant impact on the plot.

The Climax

Finally, the climax of Aliens is done better. I don’t just mean the robot-on-alien action (which is objectively awesome).

I mean that in the Predator climax, the alien gets progressively dumber. He starts out as this advanced warrior, but eventually ditches all his advantages — his armor, his gun, his helmet — to take on the protagonist in one-on-one combat. Against such a willfully dumb and weakened adversary, how could the protagonist lose?

In Aliens, the alien queen gets smarter as the fight goes on. We originally see her as just an egg laying machine. But she escapes from the power station before it blows up, stowing away on the ship. Once on the ship, she waits until they’re docked with the main one before emerging, and when she does she goes after the humans for food (Well, and maybe a little revenge. She does seem pissed off). She uses every advantage she has, all her strength and cunning, which makes Ripley’s victory even more impressive.

The Discovery of Middle Earth: Mapping the Lost World of the Celts by Graham Robb

An odd book. The author’s main thesis — that the Celts knew enough about geometry and astronomy to align their cities with the path of the sun — is convincing, once his evidence is laid out. But along the way he falls into claims that sound more like an “aliens built the pyramids” book, such as when he says all Celtic art was based on complex geometric designs.

It’s hard to fault him too much, though; the central idea is inspiring, and his excitement at getting to share it bleeds through.

Just a few of the things I learned from this book:

  • The Druids — and the Celts in general — were not illiterate, though writing down druidic knowledge was taboo. Most of their writing was done using the Greek alphabet.
  • There were several large Celtic migrations in the 4th and 3rd centuries BCE, that were apparently well-planned (Caesar relates one that was planned two years ahead of time). Many of these ended up in northern Italy; both Bologna and Milan were founded by migrating Gauls.
  • The Roman conception of Gaul’s geography was terrible. Tacitus thought Ireland was just off the coast of Spain (!). Caesar had to rely completely on local knowledge to navigate the terrain. In contrast, a Gaul from Marseille (Pytheas) circumnavigated Europe in the 320s BCE (Mediterranean to Atlantic Coast to Britain to Baltic to Black Sea back to Mediterranean), taking accurate latitude readings the whole way.

Writing Through It

Novel’s grown to 41,169 words.

This week’s writing has been done not in spite of stress, or without it, but because of it.

A lot of things I thought were settled suddenly popped back up again: my mother-in-law has been in and out of the hospital, the buyers for our house seem to be having second thoughts, and my day job turned into slamming my head into a brick wall over and over again, for eight hours.

On top of that, the time for me to pack up the house and move is getting closer, so I’ve got that prep to deal with: going through years of accumulated memories in an empty house and sorting through which ones get to come with us and which ones get left behind.

I thought it would prove too much, and that I’d have to stop writing again. I did take off an extra day this week, spent it watching movies instead of working on the book.

But the next day I got back into it, and was surprised to find that writing the novel — at this point, at least — is the easiest way to take my mind off of all the stress. It’s hard to feel lonely when I’m writing dialog, or worry about my house selling when I’m trying to work through a character’s alibi.

I’m not sure why it’s so different now than back in July. Perhaps it’s because I’ve loosened my grip on my outline, so I don’t have to think so far ahead?

Whatever the cause, I’m grateful for it.