Clojure/West 2015: Notes from Day Three

Everything Will Flow

  • Zach Tellman, Factual
  • queues: didn’t deal with directly in clojure until core.async
  • queues are everywhere: even software threads have queues for their execution, and correspond to hardware threads that have their own buffers (queues)
  • queueing theory: a lot of math, ignore most
  • performance modeling and design of computer systems: queueing theory in action
  • closed systems: when produce something, must wait for consumer to deal with it before we can produce something else
    • ex: repl, web browser
  • open systems: requests come in without regard for how fast the consumer is using them
    • adding consumers makes the open systems we build more robust
  • but: because we’re often adding producers and consumers, our systems may respond well for a good while, but then suddenly fall over (can keep up better for longer, but when gets unstable, does so rapidly)
  • lesson: unbounded queues are fundamentally broken
  • three responses to too much incoming data:
    • drop: valid if new data overrides old data, or if don’t care
    • reject: often the only choice for an application
    • pause (backpressure): often the only choice for a closed system, or sub-system (can’t be sure that dropping or rejecting would be the right choice for the system as a whole)
    • this is why core.async has the puts buffer in front of their normal channel buffer
  • in fact, queues don’t need buffer, so much as they need the puts and takes buffers; which is the default channel you get from core.async

Clojure At Scale

  • Anthony Moocar, Walmart Labs
  • redis and cassandra plus clojure
  • 20 services, 70 lein projects, 50K lines of code
  • prefer component over global state

Clojure/West 2015: Notes from Day Two

Data Science in Clojure

  • Soren Macbeth; yieldbot
  • yieldbot: similar to how adwords works, but not google and not on search results
  • 1 billion pageviews per week: lots of data
  • end up using almost all of the big data tools out there
  • EXCEPT HADOOP: no more hadoop for them
  • lots of machine learning
  • always used clojure, never had or used anything else
  • why clojure?
    • most of the large distributed processing systems run on the jvm
    • repl great for data exploration
    • no delta between prototyping and production code
  • cascalog: was great, enabled them to write hadoop code without hating it the whole time, but still grew to hate hadoop (running hadoop) over time
  • december: finally got rid of last hadoop job, now life is great
  • replaced with: storm
  • marceline: clojure dsl (open-source) on top of the trident java library
  • writing trident in clojure much better than using the java examples
  • flambo: clojure dsl on top of spark’s java api
    • renamed, expanded version of climate corp’s clj-spark

Pattern Matching in Clojure

  • Sean Johnson; path.com
  • runs remote engineering team at Path
  • history of pattern matching
    • SNOBOL: 60s and 70s, pattern matching around strings
    • Prolog: 1972; unification at its core
    • lots of functional and pattern matching work in the 70s and 80s
    • 87: Erlang -> from prolog to telecoms; functional
    • 90s: standard ml, haskell…
    • clojure?
  • prolog: unification does spooky things
    • bound match unbound
    • unbound match bound
    • unbound match unbound
  • clojurific ways: core.logic, miniKanren, Learn Prolog Now
  • erlang: one way pattern matching: bound match unbound, unbound match bound
  • what about us? macros!
  • pattern matching all around us
    • destructuring is a mini pattern matching language
    • multimethods dispatch based on pattern matching
    • case: simple pattern matching macro
  • but: we have macros, we can use them to create the language that we want
  • core.match
  • dennis’ library defun: macros all the way down: a macro that wraps the core.match macro
    • pattern matching macro for defining functions just like erlang
    • (defun say-hi
      ([“Dennis”] “Hi Dennis!”)
      ([:catty] “Morning, Catty!”))
    • can also use the :guard syntax from core.match in defining your functions’ pattern matching
    • not in clojurescript yet…
  • but: how well does this work in practice?
    • falkland CMS, SEACAT -> incidental use
    • POSThere.io -> deliberate use (the sweet spot)
    • clj-json-ld, filter-map -> maximal use
  • does it hurt? ever?
  • limitations
    • guards only accept one argument, workaround with tuples
  • best practices
    • use to eliminate conditionals at the top of a function
    • use to eliminate nested conditionals
    • handle multiple function inputs (think map that might have different keys in it?)
    • recursive function pattern: one def for the start, one def for the work, one def for the finish
      • used all over erlang
      • not as explicit in idiomatic clojure

Clojure/West 2015: Notes from Day One

Life of a Clojure Expression

  • John Hume, duelinmarkers.com (DRW trading)
  • a quick tour of clojure internals
  • giving the talk in org mode (!)
  • disclaimers: no expert, internals can change, excerpts have been mangled for readability
  • most code will be java, not clojure
  • (defn m [v] {:foo “bar” :baz v})
  • minor differences: calculated key, constant values, more than 8 key/value pairs
  • MapReader called from static array of IFns used to track macros; triggered by ‘{‘ character
  • PersistentArrayMap used for less than 8 objects in map
  • eval treats forms wrapped in (do..) as a special case
  • if form is non-def bit of code, eval will wrap it in a 0-arity function and invoke it
  • eval’s macroexpand will turn our form into (def m (fn [v] {:foo “bar :baz v}))
  • checks for duplicate keys twice: once on read, once on analyze, since forms for keys might have been evaluated into duplicates
  • java class emitted at the end with name of our fn tacked on, like: class a_map$m
  • intelli-j will report a lot of unused methods in the java compiler code, but what’s happening is the methods are getting invoked, but at load time via some asm method strings
  • no supported api for creating small maps with compile-time constant keys; array-map is slow and does a lot of work it doesn’t need to do

Clojure Parallelism: Beyond Futures

  • Leon Barrett, the climate corporation
  • climate corp: model weather and plants, give advice to farmers
  • wrote Claypoole, a parallelism library
  • map/reduce to compute average: might use future to shove computation of the average divisor (inverse of # of items) off at the beginning, then do the map work, then deref the future at the end
  • future -> future-call: sends fn-wrapped body to an Agent/soloExecutor
  • concurrency vs parallelism: concurrency means things could be re-ordered arbitrarily, parallelism means multiple things happen at once
  • thread pool: recycle a set number of threads to avoid constantly incurring the overhead of creating a new thread
  • agent thread pool: used for agents and futures; program will not exit while threads are there; lifetime of 60 sec
  • future limitations
    • tasks too small for the overhead
    • exceptions get wrapped in ExecutionException, so your try/catches won’t work normally anymore
  • pmap: just a parallel map; lazy; runs N-cpu + 3 tasks in futures
    • generates threads as needed; could have problems if you’re creating multiple pmaps at once
    • slow task can stall it, since it waits for the first task in the sequence to complete for each trip through
    • also wraps exceptions just like future
  • laziness and parallelism: don’t mix
  • core.async
    • channels and coroutines
    • reads like go
    • fixed-size thread pool
    • handy when you’ve got a lot of callbacks in your code
    • mostly for concurrency, not parallelism
    • can use pipeline for some parallelism; it’s like a pmap across a channel
    • exceptions can kill coroutines
  • claypoole
    • pmap that uses a fixed-size thread pool
    • with-shutdown! will clean up thread pool when done
    • eager by default
    • output is an eagerly streaming sequence
    • also get pfor (parallel for)
    • lazy versions are available; can be better for chaining (fast pmap into slow pmap would have speed mismatch with eagerness)
    • exceptions are re-thrown properly
    • no chunking worries
    • can have priorities on your tasks
  • reducers
    • uses fork/join pool
    • good for cpu-bound tasks
    • gives you a parallel reduce
  • tesser
    • distributable on hadoop
    • designed to max out cpu
    • gives parallel reduce as well (fold)
  • tools for working with parallelism:
    • promises to block the state of the world and check things
    • yorkit (?) for jvm profiling

Boot Can Build It

  • Alan Dipert and Micha Niskin, adzerk
  • why a new build tool?
    • build tooling hasn’t kept up with the complexity of deploys
    • especially for web applications
    • builds are processes, not specifications
    • most tools: maven, ant, oriented around configuration instead of programming
  • boot
    • many independent parts that do one thing well
    • composition left to the user
    • maven for dependency resolution
    • builds clojure and clojurescript
    • sample boot project has main method (they used java project for demo)
    • uses ‘–‘ for piping tasks together (instead of the real |)
    • filesets are generated and passed to a task, then output of task is gathered up and sent to the next task in the chain (like ring middleware)
  • boot has a repl
    • can do most boot tasks from the repl as well
    • can define new build tasks via deftask macro
    • (deftask build …)
    • (boot (watch) (build))
  • make build script: (build.boot)
    • #!/usr/bin/env boot
    • write in the clojure code defining and using your boot tasks
    • if it’s in build.boot, boot will find it on command line for help and automatically write the main fn for you
  • FileSet: immutable snapshot of the current files; passed to task, new one created and returned by that task to be given to the next one; task must call commit! to commit changes to it (a la git)
  • dealing with dependency hell (conflicting dependencies)
    • pods
    • isolated runtimes, with own dependencies
    • some things can’t be passed between pods (such as the things clojure runtime creates for itself when it starts up)
    • example: define pod with env that uses clojure 1.5.1 as a dependency, can then run code inside that pod and it’ll only see clojure 1.5.1

One Binder to Rule Them All: Introduction to Trapperkeeper

  • Ruth Linehan and Nathaniel Smith; puppetlabs
  • back-end service engineers at puppetlabs
  • service framework for long-running applications
  • basis for all back-end services at puppetlabs
  • service framework:
    • code generalization
    • component reuse
    • state management
    • lifecycle
    • dependencies
  • why trapperkeeper?
    • influenced by clojure reloaded pattern
    • similar to component and jake
    • puppetlabs ships on-prem software
    • need something for users to configure, may not have any clojure experience
    • needs to be lightweight: don’t want to ship jboss everywhere
  • features
    • turn on and off services via config
    • multiple web apps on a single web server
    • unified logging and config
    • simple config
  • existing services that can be used
    • config service: for parsing config files
    • web server service: easily add ring handler
    • nrepl service: for debugging
    • rpc server service: nathaniel wrote
  • demo app: github -> trapperkeeper-demo
  • anatomy of service
    • protocol: specifies the api contract that that service will have
    • can have any number of implementations of the contract
    • can choose between implementations at runtime
  • defservice: like defining a protocol implementation, one big series of defs of fns: (init [this context] (let …)))
    • handle dependencies in defservice by vector after service name: [[:ConfigService get-in-config] [:MeowService meow]]
    • lifecycle of the service: what happens when initialized, started, stopped
    • don’t have to implement every part of the lifecycle
  • config for the service: pulled from file
    • supports .json, .edn, .conf, .ini, .properties, .yaml
    • can specify single file or an entire directory on startup
    • they prefer .conf (HOCON)
    • have to use the config service to get the config values
    • bootstrap.cfg: the config file that controls which services get picked up and loaded into app
    • order is irrelevant: will be decided based on parsing of the dependencies
  • context: way for service to store and access state locally not globally
  • testing
    • should write code as plain clojure
    • pass in context/config as plain maps
    • trapperkeeper provides helper utilities for starting and stopping services via code
    • with-app-with-config macro: offers symbol to bind the app to, plus define config as a map, code will be executed with that app binding and that config
  • there’s a lein template for trapperkeeper that stubs out working application with web server + test suite + repl
  • repl utils:
    • start, stop, inspect TK apps from the repl: (go); (stop)
    • don’t need to restart whole jvm to see changes: (reset)
    • can print out the context: (:MeowService (context))
  • trapperkeeper-rpc
    • macro for generating RPC versions of existing trapperkeeper protocols
    • supports https
    • defremoteservice
    • with web server on one jvm and core logic on a different one, can scale them independently; can keep web server up even while swapping out or starting/stopping the core logic server
    • future: rpc over ssl websockets (using message-pack in transit for data transmission); metrics, function retrying; load balancing

Domain-Specific Type Systems

  • Nathan Sorenson, sparkfund
  • you can type-check your dsls
  • libraries are often examples of dsls: not necessarily macros involved, but have opinionated way of working within a domain
  • many examples pulled from “How to Design Programs”
  • domain represented as data, interpreted as information
  • type structure: syntactic means of enforcing abstraction
  • abstraction is a map to help a user navigate a domain
    • audience is important: would give different map to pedestrian than to bus driver
  • can also think of abstraction as specification, as dictating what should be built or how many things should be built to be similar
  • showing inception to programmers is like showing jaws to a shark
  • fable: parent trap over complex analysis
  • moral: types are not data structures
  • static vs dynamic specs
    • static: types; things as they are at compile time; definitions and derivations
    • dynamic: things as they are at runtime; unit tests and integration tests; expressed as falsifiable conjectures
  • types not always about enforcing correctness, so much as describing abstractions
  • simon peyton jones: types are the UML of functional programming
  • valuable habit: think of the types involved when designing functions
  • spec-tacular: more structure for datomic schemas
    • from sparkfund
    • the type system they wanted for datomic
    • open source but not quite ready for public consumption just yet
    • datomic too flexible: attributes can be attached to any entity, relationships can happen between any two entities, no constraints
    • use specs to articulate the constraints
    • (defspec Lease [lesse :is-a Corp] [clauses :is-many String] [status :is-a Status])
    • (defenum Status …)
    • wrote query language that’s aware of the defined types
    • uses bi-directional type checking: github.com/takeoutweight/bidirectional
    • can write sensical error messages: Lease has no field ‘lesee’
    • can pull type info from their type checker and feed it into core.typed and let core.typed check use of that data in other code (enforce types)
    • does handle recursive types
    • no polymorphism
  • resources
    • practical foundations for programming languages: robert harper
    • types and programming languages: benjamin c pierce
    • study haskell or ocaml; they’ve had a lot of time to work through the problems of types and type theory
  • they’re using spec-tacular in production now, even using it to generate type docs that are useful for non-technical folks to refer to and discuss; but don’t feel the code is at the point where other teams could pull it in and use it easily

ClojureScript Update

  • David Nolen
  • ambly: cljs compiled for iOS
  • uses bonjour and webdav to target ios devices
  • creator already has app in app store that was written entirely in clojurescript
  • can connect to device and use repl to write directly on it (!)

Clojure Update

  • Alex Miller
  • clojure 1.7 is at 1.7.0-beta1 -> final release approaching
  • transducers coming
  • define a transducer as a set of operations on a sequence/stream
    • (def xf (comp (filter? odd) (map inc) (take 5)))
  • then apply transducer to different streams
    • (into [] xf (range 1000))
    • (transduce xf + 0 (range 1000))
    • (sequence xf (range 1000))
  • reader conditionals
    • portable code across clj platforms
    • new extension: .cljc
    • use to select out different expressions based on platform (clj vs cljs)
    • #?(:clj (java.util.Date.)
      :cljs (js/Date.))
    • can fall through the conditionals and emit nothing (not nil, but literally don’t emit anything to be read by the reader)
  • performance has also been a big focus
    • reduced class lookups for faster compile times
    • iterator-seq is now chunked
    • multimethod default value dispatch is now cached