Clojure/West 2015: Notes from Day Two

Data Science in Clojure

  • Soren Macbeth; yieldbot
  • yieldbot: similar to how adwords works, but not google and not on search results
  • 1 billion pageviews per week: lots of data
  • end up using almost all of the big data tools out there
  • EXCEPT HADOOP: no more hadoop for them
  • lots of machine learning
  • always used clojure, never had or used anything else
  • why clojure?
    • most of the large distributed processing systems run on the jvm
    • repl great for data exploration
    • no delta between prototyping and production code
  • cascalog: was great, enabled them to write hadoop code without hating it the whole time, but still grew to hate hadoop (running hadoop) over time
  • december: finally got rid of last hadoop job, now life is great
  • replaced with: storm
  • marceline: clojure dsl (open-source) on top of the trident java library
  • writing trident in clojure much better than using the java examples
  • flambo: clojure dsl on top of spark’s java api
    • renamed, expanded version of climate corp’s clj-spark

Pattern Matching in Clojure

  • Sean Johnson; path.com
  • runs remote engineering team at Path
  • history of pattern matching
    • SNOBOL: 60s and 70s, pattern matching around strings
    • Prolog: 1972; unification at its core
    • lots of functional and pattern matching work in the 70s and 80s
    • 87: Erlang -> from prolog to telecoms; functional
    • 90s: standard ml, haskell…
    • clojure?
  • prolog: unification does spooky things
    • bound match unbound
    • unbound match bound
    • unbound match unbound
  • clojurific ways: core.logic, miniKanren, Learn Prolog Now
  • erlang: one way pattern matching: bound match unbound, unbound match bound
  • what about us? macros!
  • pattern matching all around us
    • destructuring is a mini pattern matching language
    • multimethods dispatch based on pattern matching
    • case: simple pattern matching macro
  • but: we have macros, we can use them to create the language that we want
  • core.match
  • dennis’ library defun: macros all the way down: a macro that wraps the core.match macro
    • pattern matching macro for defining functions just like erlang
    • (defun say-hi
      ([“Dennis”] “Hi Dennis!”)
      ([:catty] “Morning, Catty!”))
    • can also use the :guard syntax from core.match in defining your functions’ pattern matching
    • not in clojurescript yet…
  • but: how well does this work in practice?
    • falkland CMS, SEACAT -> incidental use
    • POSThere.io -> deliberate use (the sweet spot)
    • clj-json-ld, filter-map -> maximal use
  • does it hurt? ever?
  • limitations
    • guards only accept one argument, workaround with tuples
  • best practices
    • use to eliminate conditionals at the top of a function
    • use to eliminate nested conditionals
    • handle multiple function inputs (think map that might have different keys in it?)
    • recursive function pattern: one def for the start, one def for the work, one def for the finish
      • used all over erlang
      • not as explicit in idiomatic clojure

Clojure/West 2015: Notes from Day One

Life of a Clojure Expression

  • John Hume, duelinmarkers.com (DRW trading)
  • a quick tour of clojure internals
  • giving the talk in org mode (!)
  • disclaimers: no expert, internals can change, excerpts have been mangled for readability
  • most code will be java, not clojure
  • (defn m [v] {:foo “bar” :baz v})
  • minor differences: calculated key, constant values, more than 8 key/value pairs
  • MapReader called from static array of IFns used to track macros; triggered by ‘{‘ character
  • PersistentArrayMap used for less than 8 objects in map
  • eval treats forms wrapped in (do..) as a special case
  • if form is non-def bit of code, eval will wrap it in a 0-arity function and invoke it
  • eval’s macroexpand will turn our form into (def m (fn [v] {:foo “bar :baz v}))
  • checks for duplicate keys twice: once on read, once on analyze, since forms for keys might have been evaluated into duplicates
  • java class emitted at the end with name of our fn tacked on, like: class a_map$m
  • intelli-j will report a lot of unused methods in the java compiler code, but what’s happening is the methods are getting invoked, but at load time via some asm method strings
  • no supported api for creating small maps with compile-time constant keys; array-map is slow and does a lot of work it doesn’t need to do

Clojure Parallelism: Beyond Futures

  • Leon Barrett, the climate corporation
  • climate corp: model weather and plants, give advice to farmers
  • wrote Claypoole, a parallelism library
  • map/reduce to compute average: might use future to shove computation of the average divisor (inverse of # of items) off at the beginning, then do the map work, then deref the future at the end
  • future -> future-call: sends fn-wrapped body to an Agent/soloExecutor
  • concurrency vs parallelism: concurrency means things could be re-ordered arbitrarily, parallelism means multiple things happen at once
  • thread pool: recycle a set number of threads to avoid constantly incurring the overhead of creating a new thread
  • agent thread pool: used for agents and futures; program will not exit while threads are there; lifetime of 60 sec
  • future limitations
    • tasks too small for the overhead
    • exceptions get wrapped in ExecutionException, so your try/catches won’t work normally anymore
  • pmap: just a parallel map; lazy; runs N-cpu + 3 tasks in futures
    • generates threads as needed; could have problems if you’re creating multiple pmaps at once
    • slow task can stall it, since it waits for the first task in the sequence to complete for each trip through
    • also wraps exceptions just like future
  • laziness and parallelism: don’t mix
  • core.async
    • channels and coroutines
    • reads like go
    • fixed-size thread pool
    • handy when you’ve got a lot of callbacks in your code
    • mostly for concurrency, not parallelism
    • can use pipeline for some parallelism; it’s like a pmap across a channel
    • exceptions can kill coroutines
  • claypoole
    • pmap that uses a fixed-size thread pool
    • with-shutdown! will clean up thread pool when done
    • eager by default
    • output is an eagerly streaming sequence
    • also get pfor (parallel for)
    • lazy versions are available; can be better for chaining (fast pmap into slow pmap would have speed mismatch with eagerness)
    • exceptions are re-thrown properly
    • no chunking worries
    • can have priorities on your tasks
  • reducers
    • uses fork/join pool
    • good for cpu-bound tasks
    • gives you a parallel reduce
  • tesser
    • distributable on hadoop
    • designed to max out cpu
    • gives parallel reduce as well (fold)
  • tools for working with parallelism:
    • promises to block the state of the world and check things
    • yorkit (?) for jvm profiling

Boot Can Build It

  • Alan Dipert and Micha Niskin, adzerk
  • why a new build tool?
    • build tooling hasn’t kept up with the complexity of deploys
    • especially for web applications
    • builds are processes, not specifications
    • most tools: maven, ant, oriented around configuration instead of programming
  • boot
    • many independent parts that do one thing well
    • composition left to the user
    • maven for dependency resolution
    • builds clojure and clojurescript
    • sample boot project has main method (they used java project for demo)
    • uses ‘–‘ for piping tasks together (instead of the real |)
    • filesets are generated and passed to a task, then output of task is gathered up and sent to the next task in the chain (like ring middleware)
  • boot has a repl
    • can do most boot tasks from the repl as well
    • can define new build tasks via deftask macro
    • (deftask build …)
    • (boot (watch) (build))
  • make build script: (build.boot)
    • #!/usr/bin/env boot
    • write in the clojure code defining and using your boot tasks
    • if it’s in build.boot, boot will find it on command line for help and automatically write the main fn for you
  • FileSet: immutable snapshot of the current files; passed to task, new one created and returned by that task to be given to the next one; task must call commit! to commit changes to it (a la git)
  • dealing with dependency hell (conflicting dependencies)
    • pods
    • isolated runtimes, with own dependencies
    • some things can’t be passed between pods (such as the things clojure runtime creates for itself when it starts up)
    • example: define pod with env that uses clojure 1.5.1 as a dependency, can then run code inside that pod and it’ll only see clojure 1.5.1

One Binder to Rule Them All: Introduction to Trapperkeeper

  • Ruth Linehan and Nathaniel Smith; puppetlabs
  • back-end service engineers at puppetlabs
  • service framework for long-running applications
  • basis for all back-end services at puppetlabs
  • service framework:
    • code generalization
    • component reuse
    • state management
    • lifecycle
    • dependencies
  • why trapperkeeper?
    • influenced by clojure reloaded pattern
    • similar to component and jake
    • puppetlabs ships on-prem software
    • need something for users to configure, may not have any clojure experience
    • needs to be lightweight: don’t want to ship jboss everywhere
  • features
    • turn on and off services via config
    • multiple web apps on a single web server
    • unified logging and config
    • simple config
  • existing services that can be used
    • config service: for parsing config files
    • web server service: easily add ring handler
    • nrepl service: for debugging
    • rpc server service: nathaniel wrote
  • demo app: github -> trapperkeeper-demo
  • anatomy of service
    • protocol: specifies the api contract that that service will have
    • can have any number of implementations of the contract
    • can choose between implementations at runtime
  • defservice: like defining a protocol implementation, one big series of defs of fns: (init [this context] (let …)))
    • handle dependencies in defservice by vector after service name: [[:ConfigService get-in-config] [:MeowService meow]]
    • lifecycle of the service: what happens when initialized, started, stopped
    • don’t have to implement every part of the lifecycle
  • config for the service: pulled from file
    • supports .json, .edn, .conf, .ini, .properties, .yaml
    • can specify single file or an entire directory on startup
    • they prefer .conf (HOCON)
    • have to use the config service to get the config values
    • bootstrap.cfg: the config file that controls which services get picked up and loaded into app
    • order is irrelevant: will be decided based on parsing of the dependencies
  • context: way for service to store and access state locally not globally
  • testing
    • should write code as plain clojure
    • pass in context/config as plain maps
    • trapperkeeper provides helper utilities for starting and stopping services via code
    • with-app-with-config macro: offers symbol to bind the app to, plus define config as a map, code will be executed with that app binding and that config
  • there’s a lein template for trapperkeeper that stubs out working application with web server + test suite + repl
  • repl utils:
    • start, stop, inspect TK apps from the repl: (go); (stop)
    • don’t need to restart whole jvm to see changes: (reset)
    • can print out the context: (:MeowService (context))
  • trapperkeeper-rpc
    • macro for generating RPC versions of existing trapperkeeper protocols
    • supports https
    • defremoteservice
    • with web server on one jvm and core logic on a different one, can scale them independently; can keep web server up even while swapping out or starting/stopping the core logic server
    • future: rpc over ssl websockets (using message-pack in transit for data transmission); metrics, function retrying; load balancing

Domain-Specific Type Systems

  • Nathan Sorenson, sparkfund
  • you can type-check your dsls
  • libraries are often examples of dsls: not necessarily macros involved, but have opinionated way of working within a domain
  • many examples pulled from “How to Design Programs”
  • domain represented as data, interpreted as information
  • type structure: syntactic means of enforcing abstraction
  • abstraction is a map to help a user navigate a domain
    • audience is important: would give different map to pedestrian than to bus driver
  • can also think of abstraction as specification, as dictating what should be built or how many things should be built to be similar
  • showing inception to programmers is like showing jaws to a shark
  • fable: parent trap over complex analysis
  • moral: types are not data structures
  • static vs dynamic specs
    • static: types; things as they are at compile time; definitions and derivations
    • dynamic: things as they are at runtime; unit tests and integration tests; expressed as falsifiable conjectures
  • types not always about enforcing correctness, so much as describing abstractions
  • simon peyton jones: types are the UML of functional programming
  • valuable habit: think of the types involved when designing functions
  • spec-tacular: more structure for datomic schemas
    • from sparkfund
    • the type system they wanted for datomic
    • open source but not quite ready for public consumption just yet
    • datomic too flexible: attributes can be attached to any entity, relationships can happen between any two entities, no constraints
    • use specs to articulate the constraints
    • (defspec Lease [lesse :is-a Corp] [clauses :is-many String] [status :is-a Status])
    • (defenum Status …)
    • wrote query language that’s aware of the defined types
    • uses bi-directional type checking: github.com/takeoutweight/bidirectional
    • can write sensical error messages: Lease has no field ‘lesee’
    • can pull type info from their type checker and feed it into core.typed and let core.typed check use of that data in other code (enforce types)
    • does handle recursive types
    • no polymorphism
  • resources
    • practical foundations for programming languages: robert harper
    • types and programming languages: benjamin c pierce
    • study haskell or ocaml; they’ve had a lot of time to work through the problems of types and type theory
  • they’re using spec-tacular in production now, even using it to generate type docs that are useful for non-technical folks to refer to and discuss; but don’t feel the code is at the point where other teams could pull it in and use it easily

ClojureScript Update

  • David Nolen
  • ambly: cljs compiled for iOS
  • uses bonjour and webdav to target ios devices
  • creator already has app in app store that was written entirely in clojurescript
  • can connect to device and use repl to write directly on it (!)

Clojure Update

  • Alex Miller
  • clojure 1.7 is at 1.7.0-beta1 -> final release approaching
  • transducers coming
  • define a transducer as a set of operations on a sequence/stream
    • (def xf (comp (filter? odd) (map inc) (take 5)))
  • then apply transducer to different streams
    • (into [] xf (range 1000))
    • (transduce xf + 0 (range 1000))
    • (sequence xf (range 1000))
  • reader conditionals
    • portable code across clj platforms
    • new extension: .cljc
    • use to select out different expressions based on platform (clj vs cljs)
    • #?(:clj (java.util.Date.)
      :cljs (js/Date.))
    • can fall through the conditionals and emit nothing (not nil, but literally don’t emit anything to be read by the reader)
  • performance has also been a big focus
    • reduced class lookups for faster compile times
    • iterator-seq is now chunked
    • multimethod default value dispatch is now cached

Back to the Outline of the Future

The presentation’s done, the conference is over. I can start turning my attention back to writing.

But writing what? The book I’m outlining now is near-future science fiction, something that’s usually difficult to pin down. It’ll almost certainly be thought of as a prediction of what’s to come, instead of what it really is: an excuse to indulge some of my programming daydreams without moving too far away from the known and familiar.

So how do I balance the whimsy I want to put in there and the reality-grounding I know it’ll need? How do I gel a coherent story out of all the ideas bubbling around in my head for such a setting?

There’s no way to know except to write it, to set something down and then work with it until it reaches the shape I want. So I’m outlining, sketching characters and situations out, building up the scaffolding I’ll need before starting to write in earnest.

Here we go again.

Rewatching: The Matrix Trilogy

I was happy to find that the first movie still holds up. I think part of why it works is because it is largely set in a 1999 that is frozen in time. It also helps that the basic structure of the movie is classic: naive youngster is shown a wider world, told of a prophecy where they will save the world, then begins to fulfill that prophecy.

The relative roles of the other main characters in the story bothered me this time, though, where they didn’t before. Morpheus is still amazing, but step back a bit and he’s one more wise black man guiding a white kid to a greatness that he can’t achieve. Trinity kicks ass, but squint and she’s a kung-fu wielding female who’s only there to fall in love with and support the male hero. For a film set in the future with a nominal theme of breaking down mental boundaries, these elements feel distinctly old and out of place.

The second and third movies are still complete failures, though. I enjoyed the second movie at the time, and remember hating the third one along with everyone else. But rewatching them showed me how much the two final films are really one film, and it’s not a good one.

I think a large part of the problem with the last two movies is that they violate the narrative expectations set by the first movie. The trilogy sequence setup in The Matrix was: a nobody becomes special (first film), then learns more about their specialness (second film), then pursues and achieves the mission for which they became special (third film).

But they skipped the second step, and padded the third step out over two movies. Mistake.

There’s a whole chunk of story missing, where we’d normally see Neo rescuing people from the Matrix — maybe getting frustrated that he can’t convince more people it’s fake? — and learning about what he can and can’t do. For example, he can’t do the Keymaker’s trick with doors: if we saw other people do it for an entire movie, but didn’t know how they were doing it, the Keymaker reveal would have a lot more punch.

Skipping that piece of the story prevents us from watching Neo learn and grow, and drops an opportunity to deepen the characterization of the crew of the Nebuchadnezzar along the way.

Given the current structure — and the large narrative hole in it — the last 2 movies should have been compressed into one. Cut out Zion, the attack on Zion subplot, the scenes with the last stand on the docks, etc. Stick to the thread of Neo and his crew chasing down the Keymaker and getting to the Source, then Neo taking a ship to the Machine City and ending the war.

Everything else — the machines attacking the docks, the sabotage of the ships by the Smith-infested human — can come to us as reports that Morpheus relays to the crew. This lets us keep the focus on Neo’s story, since we don’t have time to give the other plots and characters their due. Trying to squeeze them in — like the second and third films do — weakens Neo’s plot and doesn’t deliver any emotional heft. There’s simply not enough screen time.

The best course would have been to make the second bridging movie that’s missing, and then made the trimmed down third movie to wrap it up. Instead, we get one and a half good movies: the original Matrix, and the half a movie buried inside the latter two.

Feeling the Itch

Not writing is starting to get under my skin.

The two weeks I was going to take off has turned into a month.

At first it was so I could catch up on all the house work I’ve been putting off: hardscaping the front yard in response to the drought, cleaning up the back yard, fixing both fences.

More recently I’ve held off writing because of a talk I’m giving at a conference for work next week. I knew starting on any novel would quickly occupy whatever spare head space I have, and I wanted to keep that free to work on the presentation.

Both good reasons. But it doesn’t stop me from missing it. When I was working on the novel, I felt like I had a purpose, a mission to fulfill. Without another novel to work on, I feel more relaxed, true, but also a little empty, a little directionless, a little smaller now that I don’t have any characters spouting dialogue into my head.

I’m trying to be patient, to keep notes on the books I’m debating working on, to stay focused on the other goals I’ve set for myself for this month. But I miss the work, and need to get back to it.

Wednesday Grab Bag: Sad Puppies

Background:

http://terribleminds.com/ramble/2015/04/06/the-hugo-awards-gamergate-edition-2015/

http://whatever.scalzi.com/2015/04/07/human-shields-cabals-and-poster-boys/

http://grrm.livejournal.com/420090.html

I think if these tactics had been used to ensure that only women got nominated for the Hugos this year, or that only PoC did, the Sad Puppies wouldn’t see that as right or fair.

I also think that they had — still have, I guess — a chance to act on their feelings of rejection in a positive way, by starting their own convention. No one could fault them if they started a Con that promoted the authors they prefer, nor would anyone be this mad if they’d launched their own awards at that Con.

But who will read it?

First draft of novel’s done, writing vacation is winding down.

I’ve got an urge to start editing the novel now, to go back and fix the mistakes I know are there, and find the ones I don’t yet know about.

But I’m holding off. I’m not ready to treat it objectively yet. While printing it off for my wife to read I read a few pages, and liked it a little too much.

I need fresh eyes on it, eyes that haven’t seen anything but the words on the page, and so will notice if something’s missing or inconsistent or out of tune. Thus the printing run for my wife, so she can read it while soaking in the tub. And thus my new search for beta readers, for those willing to slog through the mess that is the first draft.

Wish me luck.