Clojure/West 2015: Notes from Day Three

Everything Will Flow

  • Zach Tellman, Factual
  • queues: didn’t deal with directly in clojure until core.async
  • queues are everywhere: even software threads have queues for their execution, and correspond to hardware threads that have their own buffers (queues)
  • queueing theory: a lot of math, ignore most
  • performance modeling and design of computer systems: queueing theory in action
  • closed systems: when produce something, must wait for consumer to deal with it before we can produce something else
    • ex: repl, web browser
  • open systems: requests come in without regard for how fast the consumer is using them
    • adding consumers makes the open systems we build more robust
  • but: because we’re often adding producers and consumers, our systems may respond well for a good while, but then suddenly fall over (can keep up better for longer, but when gets unstable, does so rapidly)
  • lesson: unbounded queues are fundamentally broken
  • three responses to too much incoming data:
    • drop: valid if new data overrides old data, or if don’t care
    • reject: often the only choice for an application
    • pause (backpressure): often the only choice for a closed system, or sub-system (can’t be sure that dropping or rejecting would be the right choice for the system as a whole)
    • this is why core.async has the puts buffer in front of their normal channel buffer
  • in fact, queues don’t need buffer, so much as they need the puts and takes buffers; which is the default channel you get from core.async

Clojure At Scale

  • Anthony Moocar, Walmart Labs
  • redis and cassandra plus clojure
  • 20 services, 70 lein projects, 50K lines of code
  • prefer component over global state

Clojure/West 2015: Notes from Day Two

Data Science in Clojure

  • Soren Macbeth; yieldbot
  • yieldbot: similar to how adwords works, but not google and not on search results
  • 1 billion pageviews per week: lots of data
  • end up using almost all of the big data tools out there
  • EXCEPT HADOOP: no more hadoop for them
  • lots of machine learning
  • always used clojure, never had or used anything else
  • why clojure?
    • most of the large distributed processing systems run on the jvm
    • repl great for data exploration
    • no delta between prototyping and production code
  • cascalog: was great, enabled them to write hadoop code without hating it the whole time, but still grew to hate hadoop (running hadoop) over time
  • december: finally got rid of last hadoop job, now life is great
  • replaced with: storm
  • marceline: clojure dsl (open-source) on top of the trident java library
  • writing trident in clojure much better than using the java examples
  • flambo: clojure dsl on top of spark’s java api
    • renamed, expanded version of climate corp’s clj-spark

Pattern Matching in Clojure

  • Sean Johnson;
  • runs remote engineering team at Path
  • history of pattern matching
    • SNOBOL: 60s and 70s, pattern matching around strings
    • Prolog: 1972; unification at its core
    • lots of functional and pattern matching work in the 70s and 80s
    • 87: Erlang -> from prolog to telecoms; functional
    • 90s: standard ml, haskell…
    • clojure?
  • prolog: unification does spooky things
    • bound match unbound
    • unbound match bound
    • unbound match unbound
  • clojurific ways: core.logic, miniKanren, Learn Prolog Now
  • erlang: one way pattern matching: bound match unbound, unbound match bound
  • what about us? macros!
  • pattern matching all around us
    • destructuring is a mini pattern matching language
    • multimethods dispatch based on pattern matching
    • case: simple pattern matching macro
  • but: we have macros, we can use them to create the language that we want
  • core.match
  • dennis’ library defun: macros all the way down: a macro that wraps the core.match macro
    • pattern matching macro for defining functions just like erlang
    • (defun say-hi
      ([“Dennis”] “Hi Dennis!”)
      ([:catty] “Morning, Catty!”))
    • can also use the :guard syntax from core.match in defining your functions’ pattern matching
    • not in clojurescript yet…
  • but: how well does this work in practice?
    • falkland CMS, SEACAT -> incidental use
    • -> deliberate use (the sweet spot)
    • clj-json-ld, filter-map -> maximal use
  • does it hurt? ever?
  • limitations
    • guards only accept one argument, workaround with tuples
  • best practices
    • use to eliminate conditionals at the top of a function
    • use to eliminate nested conditionals
    • handle multiple function inputs (think map that might have different keys in it?)
    • recursive function pattern: one def for the start, one def for the work, one def for the finish
      • used all over erlang
      • not as explicit in idiomatic clojure

Clojure/West 2015: Notes from Day One

Life of a Clojure Expression

  • John Hume, (DRW trading)
  • a quick tour of clojure internals
  • giving the talk in org mode (!)
  • disclaimers: no expert, internals can change, excerpts have been mangled for readability
  • most code will be java, not clojure
  • (defn m [v] {:foo “bar” :baz v})
  • minor differences: calculated key, constant values, more than 8 key/value pairs
  • MapReader called from static array of IFns used to track macros; triggered by ‘{‘ character
  • PersistentArrayMap used for less than 8 objects in map
  • eval treats forms wrapped in (do..) as a special case
  • if form is non-def bit of code, eval will wrap it in a 0-arity function and invoke it
  • eval’s macroexpand will turn our form into (def m (fn [v] {:foo “bar :baz v}))
  • checks for duplicate keys twice: once on read, once on analyze, since forms for keys might have been evaluated into duplicates
  • java class emitted at the end with name of our fn tacked on, like: class a_map$m
  • intelli-j will report a lot of unused methods in the java compiler code, but what’s happening is the methods are getting invoked, but at load time via some asm method strings
  • no supported api for creating small maps with compile-time constant keys; array-map is slow and does a lot of work it doesn’t need to do

Clojure Parallelism: Beyond Futures

  • Leon Barrett, the climate corporation
  • climate corp: model weather and plants, give advice to farmers
  • wrote Claypoole, a parallelism library
  • map/reduce to compute average: might use future to shove computation of the average divisor (inverse of # of items) off at the beginning, then do the map work, then deref the future at the end
  • future -> future-call: sends fn-wrapped body to an Agent/soloExecutor
  • concurrency vs parallelism: concurrency means things could be re-ordered arbitrarily, parallelism means multiple things happen at once
  • thread pool: recycle a set number of threads to avoid constantly incurring the overhead of creating a new thread
  • agent thread pool: used for agents and futures; program will not exit while threads are there; lifetime of 60 sec
  • future limitations
    • tasks too small for the overhead
    • exceptions get wrapped in ExecutionException, so your try/catches won’t work normally anymore
  • pmap: just a parallel map; lazy; runs N-cpu + 3 tasks in futures
    • generates threads as needed; could have problems if you’re creating multiple pmaps at once
    • slow task can stall it, since it waits for the first task in the sequence to complete for each trip through
    • also wraps exceptions just like future
  • laziness and parallelism: don’t mix
  • core.async
    • channels and coroutines
    • reads like go
    • fixed-size thread pool
    • handy when you’ve got a lot of callbacks in your code
    • mostly for concurrency, not parallelism
    • can use pipeline for some parallelism; it’s like a pmap across a channel
    • exceptions can kill coroutines
  • claypoole
    • pmap that uses a fixed-size thread pool
    • with-shutdown! will clean up thread pool when done
    • eager by default
    • output is an eagerly streaming sequence
    • also get pfor (parallel for)
    • lazy versions are available; can be better for chaining (fast pmap into slow pmap would have speed mismatch with eagerness)
    • exceptions are re-thrown properly
    • no chunking worries
    • can have priorities on your tasks
  • reducers
    • uses fork/join pool
    • good for cpu-bound tasks
    • gives you a parallel reduce
  • tesser
    • distributable on hadoop
    • designed to max out cpu
    • gives parallel reduce as well (fold)
  • tools for working with parallelism:
    • promises to block the state of the world and check things
    • yorkit (?) for jvm profiling

Boot Can Build It

  • Alan Dipert and Micha Niskin, adzerk
  • why a new build tool?
    • build tooling hasn’t kept up with the complexity of deploys
    • especially for web applications
    • builds are processes, not specifications
    • most tools: maven, ant, oriented around configuration instead of programming
  • boot
    • many independent parts that do one thing well
    • composition left to the user
    • maven for dependency resolution
    • builds clojure and clojurescript
    • sample boot project has main method (they used java project for demo)
    • uses ‘–‘ for piping tasks together (instead of the real |)
    • filesets are generated and passed to a task, then output of task is gathered up and sent to the next task in the chain (like ring middleware)
  • boot has a repl
    • can do most boot tasks from the repl as well
    • can define new build tasks via deftask macro
    • (deftask build …)
    • (boot (watch) (build))
  • make build script: (build.boot)
    • #!/usr/bin/env boot
    • write in the clojure code defining and using your boot tasks
    • if it’s in build.boot, boot will find it on command line for help and automatically write the main fn for you
  • FileSet: immutable snapshot of the current files; passed to task, new one created and returned by that task to be given to the next one; task must call commit! to commit changes to it (a la git)
  • dealing with dependency hell (conflicting dependencies)
    • pods
    • isolated runtimes, with own dependencies
    • some things can’t be passed between pods (such as the things clojure runtime creates for itself when it starts up)
    • example: define pod with env that uses clojure 1.5.1 as a dependency, can then run code inside that pod and it’ll only see clojure 1.5.1

One Binder to Rule Them All: Introduction to Trapperkeeper

  • Ruth Linehan and Nathaniel Smith; puppetlabs
  • back-end service engineers at puppetlabs
  • service framework for long-running applications
  • basis for all back-end services at puppetlabs
  • service framework:
    • code generalization
    • component reuse
    • state management
    • lifecycle
    • dependencies
  • why trapperkeeper?
    • influenced by clojure reloaded pattern
    • similar to component and jake
    • puppetlabs ships on-prem software
    • need something for users to configure, may not have any clojure experience
    • needs to be lightweight: don’t want to ship jboss everywhere
  • features
    • turn on and off services via config
    • multiple web apps on a single web server
    • unified logging and config
    • simple config
  • existing services that can be used
    • config service: for parsing config files
    • web server service: easily add ring handler
    • nrepl service: for debugging
    • rpc server service: nathaniel wrote
  • demo app: github -> trapperkeeper-demo
  • anatomy of service
    • protocol: specifies the api contract that that service will have
    • can have any number of implementations of the contract
    • can choose between implementations at runtime
  • defservice: like defining a protocol implementation, one big series of defs of fns: (init [this context] (let …)))
    • handle dependencies in defservice by vector after service name: [[:ConfigService get-in-config] [:MeowService meow]]
    • lifecycle of the service: what happens when initialized, started, stopped
    • don’t have to implement every part of the lifecycle
  • config for the service: pulled from file
    • supports .json, .edn, .conf, .ini, .properties, .yaml
    • can specify single file or an entire directory on startup
    • they prefer .conf (HOCON)
    • have to use the config service to get the config values
    • bootstrap.cfg: the config file that controls which services get picked up and loaded into app
    • order is irrelevant: will be decided based on parsing of the dependencies
  • context: way for service to store and access state locally not globally
  • testing
    • should write code as plain clojure
    • pass in context/config as plain maps
    • trapperkeeper provides helper utilities for starting and stopping services via code
    • with-app-with-config macro: offers symbol to bind the app to, plus define config as a map, code will be executed with that app binding and that config
  • there’s a lein template for trapperkeeper that stubs out working application with web server + test suite + repl
  • repl utils:
    • start, stop, inspect TK apps from the repl: (go); (stop)
    • don’t need to restart whole jvm to see changes: (reset)
    • can print out the context: (:MeowService (context))
  • trapperkeeper-rpc
    • macro for generating RPC versions of existing trapperkeeper protocols
    • supports https
    • defremoteservice
    • with web server on one jvm and core logic on a different one, can scale them independently; can keep web server up even while swapping out or starting/stopping the core logic server
    • future: rpc over ssl websockets (using message-pack in transit for data transmission); metrics, function retrying; load balancing

Domain-Specific Type Systems

  • Nathan Sorenson, sparkfund
  • you can type-check your dsls
  • libraries are often examples of dsls: not necessarily macros involved, but have opinionated way of working within a domain
  • many examples pulled from “How to Design Programs”
  • domain represented as data, interpreted as information
  • type structure: syntactic means of enforcing abstraction
  • abstraction is a map to help a user navigate a domain
    • audience is important: would give different map to pedestrian than to bus driver
  • can also think of abstraction as specification, as dictating what should be built or how many things should be built to be similar
  • showing inception to programmers is like showing jaws to a shark
  • fable: parent trap over complex analysis
  • moral: types are not data structures
  • static vs dynamic specs
    • static: types; things as they are at compile time; definitions and derivations
    • dynamic: things as they are at runtime; unit tests and integration tests; expressed as falsifiable conjectures
  • types not always about enforcing correctness, so much as describing abstractions
  • simon peyton jones: types are the UML of functional programming
  • valuable habit: think of the types involved when designing functions
  • spec-tacular: more structure for datomic schemas
    • from sparkfund
    • the type system they wanted for datomic
    • open source but not quite ready for public consumption just yet
    • datomic too flexible: attributes can be attached to any entity, relationships can happen between any two entities, no constraints
    • use specs to articulate the constraints
    • (defspec Lease [lesse :is-a Corp] [clauses :is-many String] [status :is-a Status])
    • (defenum Status …)
    • wrote query language that’s aware of the defined types
    • uses bi-directional type checking:
    • can write sensical error messages: Lease has no field ‘lesee’
    • can pull type info from their type checker and feed it into core.typed and let core.typed check use of that data in other code (enforce types)
    • does handle recursive types
    • no polymorphism
  • resources
    • practical foundations for programming languages: robert harper
    • types and programming languages: benjamin c pierce
    • study haskell or ocaml; they’ve had a lot of time to work through the problems of types and type theory
  • they’re using spec-tacular in production now, even using it to generate type docs that are useful for non-technical folks to refer to and discuss; but don’t feel the code is at the point where other teams could pull it in and use it easily

ClojureScript Update

  • David Nolen
  • ambly: cljs compiled for iOS
  • uses bonjour and webdav to target ios devices
  • creator already has app in app store that was written entirely in clojurescript
  • can connect to device and use repl to write directly on it (!)

Clojure Update

  • Alex Miller
  • clojure 1.7 is at 1.7.0-beta1 -> final release approaching
  • transducers coming
  • define a transducer as a set of operations on a sequence/stream
    • (def xf (comp (filter? odd) (map inc) (take 5)))
  • then apply transducer to different streams
    • (into [] xf (range 1000))
    • (transduce xf + 0 (range 1000))
    • (sequence xf (range 1000))
  • reader conditionals
    • portable code across clj platforms
    • new extension: .cljc
    • use to select out different expressions based on platform (clj vs cljs)
    • #?(:clj (java.util.Date.)
      :cljs (js/Date.))
    • can fall through the conditionals and emit nothing (not nil, but literally don’t emit anything to be read by the reader)
  • performance has also been a big focus
    • reduced class lookups for faster compile times
    • iterator-seq is now chunked
    • multimethod default value dispatch is now cached

Trust is Critical to Building Software

So much of software engineering is built on trust.

I have to trust that the other engineers on my team will pull me back from the brink if i start to spend too much time chasing down a bug. I have to trust that they’ll catch the flaws in my code during code review, and show me how to do it better. When reviewing their code, at some point I have to trust that they’ve at least tested things locally, and written something that works, even if doesn’t work well.

Beyond my team, I have to trust the marketing and sales folks to bring in new customers so we can grow the company. I’ve got to trust the customer support team to keep our current customers happy, and to report bugs they discover that I need to fix. I have to trust the product guys to know what features the customer wants next, so we don’t waste our time building things nobody needs.

And every time I use test fixture someone else wrote, I’m trusting the engineers that worked here in the past. When I push new code, I’m trusting our CI builds to run the tests properly and catch anything that might have broken. By trusting those tests, I’m trusting everyone that wrote them, too.

Every new line of code I write, every test I create, adds to that chain of trust, and brings me into it. As an engineer, I strive to be worthy of that trust, to build software that is a help, and not a burden, to those that rely on it.

Building a Bridge with Scrum

If you built a river bridge with Scrum:

The roadway would be built first, because that’s the Product Owner’s highest priority. Unfortunately, it would sink to the bottom of the river once deployed. This experience would be called “learning from failure.”

Realizing they needed to get above the waterline using some supports, the team would build a short set of pillars on top of the sunken roadway. The pillars wouldn’t reach above the waterline, but they would be valuable for the experience in building support pillars for this particular river.

Once the first set of pillars had been sunk, the team would use their new knowledge to build a second set of pillars that just breached the waterline, and construct¬†a second roadway on top of that. This would make the Product Owner very happy, because “customers could use it” for the first time.

After the first week, they’d realize they’d built a leaky dam, not a bridge, and that all boat traffic was being blocked.

Working seven days a week, 14 hours a day, the team would scramble to fix the bridge. They’d cut channels in the roadway to let some boat traffic through, then build a third set of supports and a third roadway at the proper height.

Burnt out and frazzled, the team would begin pointing fingers, each specialty blaming the other for the death march at the end. Half would quit, and join a different company, building high rises. The other half would struggle to support the many bugs still left in the bridge, despite not having any experience with the things built by the members who’d left (what was the special concrete mix Nancy used?).

Meanwhile, the Product Owner would be praised for pulling off a miracle, and be put in charge of another bridge-building project.

Three years later, the whole thing would be bulldozed to make room for a ferry.

Software Engineering Lessons from NASA: Part One

Long before I became a programmer, I worked as an optical engineer at NASA’s Goddard Space Flight Center. I was part of the Optical Alignment and Test group, which is responsible for the integration and testing of science instruments for spacecraft like the Hubble Space Telescope, Swift, and the upcoming James Webb Space Telescope.

While most of the day-to-day engineering practices I learned didn’t transfer over to software engineering, I’ve found certain principles still hold true.¬†Here’s the first one:

The first version you build will be wrong.

At NASA, it was taken as axiomatic that the first time you built an instrument, you’d screw it up. Engineers always pressed for the budget to build an Engineering Test Unit (ETU), which was a fully-working mock-up of the instrument you ultimately wanted to launch. Since the ETU was thrown away afterward, you could use it to practice the techniques you’d use to assemble the real instrument.

And you needed the practice. The requirements were often so tight (to the thousandth of an inch) that no matter how well you’d planned it out, something always went wrong: a screw would prove too hard to reach, or a baffle would be too thin to block all the light.

No amount of planning and peer review could find all the problems. The only way to know for sure if it would work was to build it. By building an ETU, you could shake out all the potential flaws in the design, so the final instrument would be solid.

In software, I’ve found the same principle holds: the first time I solve I problem, it’s never the optimal solution. Oddly enough, I often can’t see the optimal solution until after I’ve solved the problem in some other way.

Knowing this, I treat the first solution as a learning experience, to be returned to down the line and further optimized. I still shoot for the best first solution I can get, but I don’t beat myself up when I look back at the finished product and see the flaws in my design.

Five Reasons Your Company Should Allow Remote Developers

If you don’t allow your software developers to work from home, you’re not just withholding a nice perk from your employees, you’re hurting your business.

Here’s five reasons letting developers work remotely should be the default:

1. Widen your talent pool

Good software engineers are already hard to find. On top of that, you’ve got to hire someone skilled in your chosen tech stack, which narrows the pool even further. Why restrict yourself to those engineers within driving distance to your office? Can you really afford to wait for talent to move close to you?

The CTO for my current gig lives an hour away from the office, and can’t move because of his wife’s job. We wouldn’t have been able to hire him if we didn’t let him work from home. Who are you missing out on because you won’t look at remote devs.

2. Reclaim commute time

The average commute time in the US is 30 minutes. That’s an hour a day your in-office developers aren’t working. If you let them work from home, they can start earlier and finish later.

Don’t think they will? The evidence shows they do: people working from home work longer hours than people in the office.

An extra hour of work a day is five more hours a week, or almost three more work days in the month. Why throw that time away?

3. Reduce sick leave

When I’ve got a cold or flu, I stay home. I usually spend the first day or two resting, but by the third day I’m usually able to work for a little, even though I’m too sick to go into the office.

If my boss didn’t let me work from home, he wouldn’t get those “sick hours” from me. I’d be forced into taking more time off, getting less done.

And when my wife’s sick, I don’t have to choose between taking care of her and getting my work done. By working from home, I can do both.

4. Increase employee retention

The only thing worse than not hiring a good dev is losing an engineer you’ve already got. When they leave, you’re not only losing a resource, you’re losing all the historical knowledge they have about the system: weird bugs that only show up once a quarter, why the team chose X db instead of another, etc.

Since the competition for talent is fierce, you don’t want to lose out to another company. Letting developers work from home sends a clear signal to your employees that they’re valued and that you appreciate work-life balance. And with many companies sticking to the old “gotta be in the office” way of thinking, you’re ensuring that any offer your employees get from another company won’t be as attractive.

5. Stay focused on work done

Finally, letting developers work from home forces the entire team to focus on what’s really important: getting work done. It doesn’t matter how many hours a developer spends in the office; if that time isn’t productive, it’s wasted.

What does matter is how much work a dev gets done: how many bugs squashed, how many new features completed, how many times they jump in to help another engineer. What you need from your developers is software for your business, not time spent in a cubicle. If they’re not producing, they should be let go. If they *are* producing, does it matter if they’re at a coffee shop downtown?

And if you don’t have a way of measuring developer productivity beyond hours spent at their desk, get one. You can’t improve something you don’t measure, and team productivity should be high on your list of things to always be improving.