Seven More Languages in Seven Weeks: Lua

Realized I haven’t learned any new programming languages in a while, so I picked up a copy of Seven More Languages in Seven Weeks.

Each chapter covers a different language. They’re broken up into ‘Days’, with each day’s exercises digging deeper into the language.

Here’s what I learned about the first language in the book, Lua:

Day One

Just a dip into basic syntax.

  • table based
  • embeddable
  • whitespace doesn’t matter
  • no integers, only floating-point (!)
  • comparison operators will not coerce their arguments, so you can’t do 42 < '43'
  • functions are first class
  • has tail-call-optimization (!)
  • extra args are ignored
  • omitted args just get nil
  • variables are global by default (!)
  • can use anything as key in table, including functions
  • array indexes start at 1 (!)
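
A minimal sketch (mine, not the book’s exercises; assumes Lua 5.2, the version used later in the chapter) illustrating a few of these points:

    -- every number is a double in Lua 5.2; there is no integer type
    print(3 / 2)                            --> 1.5

    -- comparisons don't coerce: uncommenting the next line raises
    -- "attempt to compare number with string"
    -- print(42 < '43')

    -- extra args are ignored, omitted args arrive as nil
    local function greet(name, greeting)
      greeting = greeting or "hello"        -- common idiom for default values
      return greeting .. ", " .. tostring(name)
    end
    print(greet("world", "hi", "ignored"))  --> hi, world
    print(greet())                          --> hello, nil

    -- variables are global unless declared local
    function leaky() x = 42 end
    leaky()
    print(x)                                --> 42

    -- tables are the only data structure; the array part is indexed from 1
    local t = { "a", "b", "c" }
    print(t[1], #t)                         --> a    3

    -- anything can be a table key, including a function
    local handlers = { [print] = "the print function" }
    print(handlers[print])                  --> the print function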

Day Two

Multithreading and OOP.

  • no multithreading, no threads at all
  • coroutines will only ever run on one core, so have to handle blocking and unblocking them manually
  • explicit over implicit, i guess?
  • since can use functions as values in tables, can build entire OO system from scratch using (self) passed in as first value to those functions
  • coroutines can also get you memoization, since yielding means the state of the fn is saved and resumed later
  • modules: can choose what gets exported, via a table returned at the bottom of the file
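
A sketch (mine, not the book’s code) of both ideas: an object built from a table of functions that take self, plus the module convention of returning an export table at the bottom of the file:

    -- counter.lua: hand-rolled OO plus the module-export convention
    local Counter = {}
    Counter.__index = Counter              -- method lookup falls back to Counter

    function Counter.new(start)
      return setmetatable({ count = start or 0 }, Counter)
    end

    -- 'function Counter:increment(...)' is sugar for
    -- 'function Counter.increment(self, ...)'; a colon at the call site
    -- passes the receiver as self for you
    function Counter:increment(step)
      self.count = self.count + (step or 1)
      return self.count
    end

    function Counter:value()
      return self.count
    end

    -- the module chooses what gets exported via the table returned at the bottom
    return { new = Counter.new }

A caller would then do something like local c = require("counter").new() followed by c:increment().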

Day Three

A very cool project (build a MIDI player in Lua with C++ interop) that was incredibly frustrating to get working. Nothing in the chapter was helpful. Learned more about C++ and Mac OS X audio than Lua.

  • had to add Homebrew’s Lua include directory (/usr/local/Cellar/lua/5.2.4_3/include) into include_directories command in CMakeLists.txt file
  • when compiling play.cpp, linker couldn’t find lua libs, so had to invoke the command by hand (after reading ld manual) with brew lua lib directory added to its search path via -L
  • basically, add this to CMakeFiles/play.dir/link.txt: -L /usr/local/Cellar/lua/5.2.4_3/lib -L /usr/local/Cellar/rtmidi/2.1.1/lib
  • adding those -L declarations will ensure make will find the right lib directories when doing its ld invocation (linking)
  • also had to go into the Audio MIDI Setup utility and set the IAC Driver to “Device is online” in order for any open ports to show up
  • AND then needed to be sure was running the Simplesynth application with the input set to the IAC Driver, to be able to hear the notes
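
For reference, the same search paths can presumably be set in CMakeLists.txt itself (via CMake’s include_directories and link_directories commands) instead of hand-editing the generated link.txt; a sketch using the Homebrew keg paths above, which will differ by version on other machines:

    # point the compiler at Homebrew's Lua headers and the linker at the
    # Lua and RtMidi lib directories (keg paths as noted above)
    include_directories(/usr/local/Cellar/lua/5.2.4_3/include)
    link_directories(
      /usr/local/Cellar/lua/5.2.4_3/lib
      /usr/local/Cellar/rtmidi/2.1.1/lib
    )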

Follow the Tweeting Bot

I have a problem.

No, not my fondness for singing 80s country in a bad twang during karaoke.

I mean a real, nerd-world problem: I have too many books to read.

I can’t leave any bookstore without buying at least one. For a good bookstore, I’ll walk out with half a dozen or more, balancing them in my arms, hoping none of them fall over.

I get them home and try to squeeze them into my bookshelf of “books I have yet to read” (not to be confused with my “books I’ve read and need to donate” or “books I’ve read and will re-read someday when I have the time” shelves). That shelf is full, floor to ceiling.

My list of books to read is already too long for me to remember them all. And that’s not counting the ones I have sitting in ebook format, waiting on my Kobo or iPhone for me to tap their cover art and dive in.

Faced with so much reading material, so many good books waiting to be read, my question is this: What do I read next?

I could pick based on mood. But that usually means me sitting in front of my physical books, picking out the one that grabs me. I could pick based on which ones I’ve bought most recently, which would probably narrow things down to just my ebooks.

But I want to be able to choose from all of my books, physical and virtual, at any time.

So I wrote a bot to help me.

It listens to my twitter stream for instructions. When I give it the right command, it pulls down my to-read shelf from Goodreads (yes, I put all of my books, real and electronic, into Goodreads. yes, it took much longer than I thought it would), ranks them in order of which ones I should read first, and then tweets back to me the top 3.

I’ve been following its recommendations for about a month now, and so far, it’s working. Footsteps in the Sky was great. Data and Goliath was eye-opening. The Aesthetic of Play changed the way I view art and games.

Now, if only I could train it to order books for me automatically…

Notes from Strange Loop 2015: Day Two

Pixie

  • lisp
  • own vm
  • compiled using RPython tool chain
  • RPython – restricted Python
    • used in PyPy project
    • has own tracing JIT
  • runs on os x, linux, ARM (!)
  • very similar to clojure, but deviates from it where he wanted to for performance reasons
  • has continuations, called stacklets
  • has an open-ended object system; deftype, etc
  • also wanted good foreign function interface (FFI) for calling C functions
  • wants to be able to do :import Math.h :refer cosine
  • ended up writing template that can be called recursively to define everything you want to import
  • writes a C file from the template with everything you need, compiles it, then uses the return values plus the type info, etc
  • you can actually call python from pixie, as well (if you want)
  • not ready for production yet, but a fun project and PRs welcome

History of Programming Languages for 2 Voices

  • David Nolen and Michael Bernstein
  • a programming language “mixtape”

Big Bang: The World, The Universe, and The Network in the Programming Language

  • Matthias Felleisen
  • worst thought you can have: your kids are in middle school
  • word problems in math are not interesting, they’re boring
  • can use image placement and substitution to create animations out of word problems
  • mistake to teach children programming per se. they should use programming to help their math, and math to help their programming. but no programming on its own
  • longitudinal study: understanding a function, even if you don’t do any other programming ever, means a higher income as an adult
  • can design curriculum taking kids from middle school (programming + math) to high school (scheme), college (design programs), to graduate work (folding network into the language)

Notes from Strange Loop 2015: Day One

Unconventional Programming with Chemical Computing

  • Carin Meier
  • Living Clojure
  • @Cognitect
  • Inspired by Book – unconventional programming paradigms
  • “the grass is computing”
    • all living things process information via chemical reactions on molecular level
    •  hormones
    • immune system
    • bacteria signal processing
  • will NOT be programming with chemicals; using the metaphor of molecules and reactions to do computing
    • nothing currently in the wild using chemical computing
  • at the heart of chemical programming: the reaction
  • will calculate primes two ways:
    • traditional
    • with prime reaction
  • uses clojure for the examples
  • prime reaction
    • think of the integers as molecules
    • simple rule: take a vector of 2 integers, divide them, if the mod is zero, return the result of the division, otherwise, return the vector unchanged
    • name of this procedure: gamma chemical programming
    • reaction is a condition + action
    • execute: replacement of original elements by resulting element
    • solution is known when it results in a steady state (hence, for prime reaction, have to churn over lists of integers multiple times to filter out all the non-primes; see the sketch after this list)
  • possible advantages:
    • modeling probabilistic systems
    • drive a computation towards a global max or min
  • higher order
    • make the functions molecules as well
    • fn could “capture” integer molecules to use as args
    • what does it do?
    • it “hatches” => yields original fn and result of applying fn to the captured arguments
    • reducing reaction fn: returns fewer arguments than it takes in
    • two fns interacting: allow to exchange captured values (leads to more “stirring” in the chem sims)
  • no real need for sequential processing; can do things in any order and still get the “right” answer
  • dining philosophers problem
    • something chemical programming handles well
    • two forks: eating philosopher
    • one fork or no forks: thinking philosopher
    • TP with 2 forks reacting with EAT => EP
  • “self organizing”: simple behaviors combine to create what look like complex behaviors
  • mail system: messages, servers, networks, mailboxes, membranes
    • membranes control reactions, keep molecules sorted
    • passage through membranes controlled by servers and network
    • “self organizing”
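
A sketch of the prime reaction (mine, in Lua rather than the talk’s Clojure): molecules are the integers 2..n, the reaction replaces a with a / b whenever b is a smaller divisor of a, and we churn until a full sweep changes nothing (the steady state). The distinct survivors are the primes up to n.

    local function react(a, b)
      if a > b and a % b == 0 then return a / b end
      return a
    end

    local function prime_soup(n)
      local soup = {}
      for i = 2, n do soup[#soup + 1] = i end
      local changed = true
      while changed do              -- steady state: a full sweep changes nothing
        changed = false
        for i = 1, #soup do
          for j = 1, #soup do
            if i ~= j then
              local r = react(soup[i], soup[j])
              if r ~= soup[i] then soup[i], changed = r, true end
            end
          end
        end
      end
      return soup
    end

    -- survivors are primes (composites get divided down, duplicating small primes)
    print(table.concat(prime_soup(20), " "))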

How Machine Learning helps Cancer Research

  • Evelina Gabasova
  • University of Cambridge
  • cost per human genome has gone down from $100mil (2001) to a few thousand dollars (methodology change in mid-2000s paid big dividends)
  • cancer is not a single disease; underlying cause is mutations in the genetic code that regulates protein formation inside the cell
  • BRCA1 and BRCA2 are guardians; they check the chromosomes for mistakes and kill cells that have them, so suppress tumor growth; when they stop working correctly or get mutated, you can have tumors
  • clustering: finding groups in data that are more similar to each other than to other data points
    • example: clustering customers
    • but: clustering might vary based on the attributes chosen (or the way those attributes are lumped together)?
    • yes: but choose projection based on which ones give the most variance between data points
    • can use in cancer research by plotting genes and their expression and looking for grouping
  • want to be able to craft more targeted responses to the diagnosis of cancer based on the patient and how they will react
  • collaborative filtering
    • used in netflix recommendation engine
    • filling in cells in a matrix
    • compute as the product of two smaller matrices (see the sketch after this list)
    • in cancer research, can help because the number of people with certain mutations is small, leading to a sparsely populated database
  • theorem proving
    • basically prolog-style programming, constraints plus relations leading to single (or multiple) solutions
    • can use to model cancer systems
    • was used to show that chronic myeloid leukemia is a very stable system, that just knocking out one part will not be enough to kill the bad cell and slow the disease; helps with drug and treatment design
    • data taken from academic papers reporting the results of different treatments on different populations
  • machine learning not just for targeted ads or algorithmic trading
  • will become more important in the future as more and more data becomes available
  • Q: how long does the calculation take for stabilization sims?
    • A: for very simple systems, can take milliseconds
  • Q: how much discovery is involved, to find the data?
    • A: actually, whole teams developing text mining techniques for extracting data from academic papers (!)
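
A toy sketch (mine, not from the talk) of the “product of two smaller matrices” idea: learn user and item factor matrices by gradient descent on the observed cells of a sparse ratings matrix, then read predictions for the missing cells off the product.

    local K, RATE, STEPS = 2, 0.05, 5000   -- latent factors, learning rate, sweeps

    -- observed cells of a 3x3 ratings matrix: {user, item, rating}; rest unknown
    local observed = {
      {1, 1, 5}, {1, 2, 3}, {2, 1, 4}, {2, 3, 1}, {3, 2, 2}, {3, 3, 5},
    }
    local USERS, ITEMS = 3, 3

    local function small_random_matrix(rows)
      local m = {}
      for i = 1, rows do
        m[i] = {}
        for k = 1, K do m[i][k] = math.random() * 0.1 end
      end
      return m
    end

    local U, V = small_random_matrix(USERS), small_random_matrix(ITEMS)

    local function predict(u, i)
      local s = 0
      for k = 1, K do s = s + U[u][k] * V[i][k] end
      return s
    end

    for _ = 1, STEPS do
      for _, cell in ipairs(observed) do
        local u, i, r = cell[1], cell[2], cell[3]
        local err = r - predict(u, i)
        for k = 1, K do              -- nudge both factors toward the observed rating
          U[u][k] = U[u][k] + RATE * err * V[i][k]
          V[i][k] = V[i][k] + RATE * err * U[u][k]
        end
      end
    end

    -- a cell that was never observed, filled in from the learned factors
    print(string.format("predicted rating for user 1, item 3: %.2f", predict(1, 3)))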

When Worst is Best

  • Peter Bailis
  • what if we designed computer systems for the worst-case scenarios?
  • a website built to serve 7.3 billion simultaneous users would on average have lots of idle resources
  • hardware: what if we built this chip for the mars rover? would lead to very expensive packaging (and a lot of R&D to handle low-power low-weight environments)
  • security: all our devs are malicious; makes code deployment harder
  • designing for the worst case often penalizes the average case
  • could we break the curve? design for the worst case and improve the average case too
  • distributed systems
    • almost everything non-trivial is distributed these days
    • operate over a network
    • networks make designs hard
      • packets can be delayed
      • packets may be dropped
    • async network: can’t tell if message has been delayed or dropped
      • handle this by adding replicas that can respond to any request at any time
      • network interruptions don’t stop service
  • no coordination means even when everything is fine, we don’t have to talk
    • possible infinite service scale-out
  • coordinated multi-server transactions pay large penalty as we add more servers (from locks); get more throughput if we let access be uncoordinated
  • don’t care about latency if you don’t have to send messages everywhere
  • but what about the CAP theorem?
    • inktomi from eric brewer: for large scale services, have to trade off between always giving an answer and always giving the right answer
    • takeaway: certain properties of a system (like serializability) require unavailability
    • original paper: Nancy Lynch (with Seth Gilbert)
    • common conclusion: availability is too expensive, and we have to give up too much, and it only matters during failures, so forget about it
  • if you use worst case as design tool, you skew toward coordination-avoiding databases
    • high coordination is legacy of old db design
    • coordination-free designs are possible
  • example: read committed isolation
    • goal: never read uncommitted data
    • legacy implementation: lock records during access (coordination)
    • one way: copy on write (x -> x’, do stuff -> write back to x; see the sketch after this list)
    • or: versioning
    • for more detail, see martin’s talk on saturday about transactions
  • research on coordination-free systems have potential for huge speedups
  • other situations where worst-case thinking yields good results
    • replication for fault tolerance can also increase your request-serving capacity
    • fail-over can help deployments/upgrades: if it’s automatic, you can shut off the primary whenever you want and know that the backups will take over, then bring the primary back up when your work is done
    • tail latency in services:
      • avg of 1.2ms (not bad) can mean 0.1% of requests have 100ms (which is terrible)
      • if you’re one of many services being used to fulfill a front-end request, your worst case is more likely to happen, and so drag down the avg latency for the end-user
  • universal design: designing well for everyone; ex: curb cuts, subtitles on netflix
  • sometimes best is brittle: global maximum can sit on top of a very narrow peak, where any little change in the inputs can drive it away from the optimum
  • defining normal defines our designs; considering a different edge case as normal can open up new design spaces
  • hardware: what happens if we have bit flips?
  • clusters: what’s our scale-out strategy?
  • security: how do we audit data access?
  • examine your biases
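
A tiny sketch (mine, not from the talk) of lock-free read committed via copy-on-write: readers always see the last committed version, and a writer mutates a private copy (x -> x’) that only becomes visible when it is swapped in at commit:

    local record = { committed = { balance = 100 } }

    local function read(rec)
      return rec.committed            -- never sees uncommitted work
    end

    local function begin_write(rec)
      local draft = {}
      for k, v in pairs(rec.committed) do draft[k] = v end   -- x -> x'
      return draft
    end

    local function commit(rec, draft)
      rec.committed = draft           -- single swap makes the new version visible
    end

    local draft = begin_write(record)
    draft.balance = draft.balance - 40           -- "do stuff" on the copy
    print(read(record).balance)                  --> 100: uncommitted write invisible
    commit(record, draft)
    print(read(record).balance)                  --> 60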

All In with Determinism for Performance and Testing in Distributed Systems

  • John Hugg
  • VoltDB
  • so you need a replicated setup?
    • could run primary and secondary
    • could allow writes to 2 servers, do conflict detection, and merge all writes
    • NOPE
  • active-active: state a + deterministic op = state b
    • if do same ops across all servers, should end up with the same state
    • have client that sends A B C to coordination system, which then sends ABC to all replicas, which do the ops in order
    • ABC: a logical log, the ordering is what’s important
    • can write log to disk, for later replay
    • can replicate log to all servers, for constant active-active updates
    • can also send log across network for cluster replication
  • look out for non-determinism
    • random numbers
    • wall-clock time
    • record order
    • external systems (ping noaa for weather)
    • bad memory
    • libraries that use randomness for security
  • how to protect from non-determinism?
    • make sure sql is as deterministic as possible
    • 100% of their DML is deterministic
    • rw transactions are hard to make deterministic, have to do a little more planning (swap row-scan for tree-index scan)
    • use seeded random-number generators that are lists created in advance
    • hash up the write ops, and require replicas to send back their computed hashes once the ops are done so the coordinator can confirm the ops were deterministic (see the sketch after this list)
    • can also hash the whole replica state when doing a transactional snapshot
    • reduce latency by sending condensed representation of ops instead of all the steps (the recipe name, not the recipe)
  • why do it?
    • replicate faster, reduces concerns for latency
    • persist everything faster: start logging when the work is requested, not when the work is completed
    • bounded sizes: the work comes in as fast as the network allows, so the log is written no faster than the network can deliver work (no firehose)
  • trade-offs?
    • it’s more work: testing, enforcing determinism
    • running mixed versions is scary: if you fix a bug, and you’re running different versions of the software between the replicas, you no longer have deterministic transactions
    • if you trip the safety checks, we shut down the cluster
  • testing?
    • multi-pronged approach: acid, sql correctness, etc
    • simulation a la foundationDB not as useful for them, since they have more states
    • message/state-machine fuzzing
    • unit tests
    • smoke tests
    • self-checking workload (best value)
      • everything written gets self-checked; so to check a read value, write it back out and see if it comes back unchanged
    • use “nefarious app”: application that runs a lot of nasty transactions, checks for ACID failures
    • nasty transactions:
      • read values, hash them, write them back
      • add huge blobs to rows to slow down processing
      • add mayhem threads that run ad-hoc sql doing updates
      • multi-table joins
        • read and write multiple values
      • do it all many many times within the same transaction
    • mix up all different kinds of environment tweaks
    • different jvms
    • different VM hosts
    • different OSes
    • inject latency, disk faults, etc
  • client knows last sent and last acknowledged transaction, checker can be sure recovered data (shut down and restart) contains all the acknowledged transactions
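
A toy sketch (mine, not VoltDB’s code) of the active-active idea plus the hash check: apply the same logical log of deterministic ops to each replica, then compare a digest of the resulting states; any non-determinism shows up as diverging digests.

    local function apply_log(state, log)
      for _, op in ipairs(log) do
        state[op.key] = (state[op.key] or 0) + op.delta   -- deterministic op
      end
      return state
    end

    -- toy "hash" of a state: fold keys in sorted order so the digest is stable
    local function digest(state)
      local keys = {}
      for k in pairs(state) do keys[#keys + 1] = k end
      table.sort(keys)
      local h = 0
      for _, k in ipairs(keys) do
        h = (h * 31 + #k + state[k]) % 2^31
      end
      return h
    end

    local log = { {key = "a", delta = 5}, {key = "b", delta = 2}, {key = "a", delta = -1} }

    local replica1 = apply_log({}, log)
    local replica2 = apply_log({}, log)
    print(digest(replica1) == digest(replica2))   --> true: replicas agree

    -- a non-deterministic op (wall-clock time, random numbers, ...) would make
    -- the digests diverge, which is the cue to shut the cluster down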

Scaling Stateful Services

  • Caitie McCaffrey
  • been using stateless services for a long time, depending on db to store and coordinate our state
  • has worked for a long time, but got to place where one db wasn’t enough, so we went to no-sql and sharded dbs
  • data shipping paradigm: client makes request, service fetches data, sends data to client, throws away “stale” data
  • will talk about stateful services, and their benefits, but WARNING: NOT A MAGIC BULLET
  • data locality: keep the fetched data on the service machine
    • lower latency
    • good for data intensive ops where client needs quick responses to operations on large amounts of data
  • sticky connections and consistency
    • using sticky connections and stateful services gives you more consistency models to work with: pipelined random access memory (PRAM), read-your-writes, etc
  • blog post from Werner Vogels: eventual consistency revisited
  • building sticky connections
    • client connecting to a cluster always gets routed to the same server
  • easiest way: persistent connections
    • but: no stickiness once connection breaks
    • also: mucks with your load balancing (connections might not all last the same amount of time, can end up with one machine holding everything)
    • will need backpressure on the machines so they can break connections when they need to
  • next easiest: routing logic in cluster
    • but: how do you know who’s in the cluster?
    • and: how do you ensure the work is evenly distributed?
    • static cluster membership: dumbest thing that might work; not very fault tolerant; painful to expand;
    • next better: dynamic cluster membership
      • gossip protocols: machines chat about who is alive and dead, each machine on its own decides who’s in the cluster and who’s not; works so long as system is relatively stable, but can lead to split-brain pretty quickly
      • consensus systems: better consistency; but if the consensus truth holder goes down, the whole cluster goes down
  • work distribution: random placement
    • write anywhere
    • read from everywhere
    • not sticky connection, but stateful service
  • work distribution: consistent hashing
    • deterministic request placement
    • nodes in cluster get placed on a ring, request gets mapped to spot in the ring (see the sketch after this list)
    • can still have hot spots form, since different requests will have different work that needs to be done, can have a lot of heavy work requests placed on one node
    • work around the hot spots by having larger cluster, but that’s more expensive
  • work distribution: distributed hash table
    • non-deterministic placement
  • stateful services in the real world
  • scuba:
    • in-memory db from facebook
    • believed to be static cluster membership
    • random fan-out on write
    • reads from every machine in cluster
    • results get composed by machine running query
    • results include a completeness metric
  • uber ringpop
    • nodejs library that does application-layer sharding for their dispatching services
    • swim gossip protocol for cluster membership
    • consistent hashing for work distribution
  • orleans
    • from Microsoft Research
    • used for Halo4
    • runtime and programming model for building distributed systems based on Actor Model
    • gossip protocol for cluster membership
    • consistent hashing + distributed hash table for work distribution
    • actors can take request and:
      • update their state
      • return their state
      • create a new Actor
    • request comes in to any machine in cluster, it applies hash to find where the DHT is for that client, then that DHT machine routes the request to the right Actor
    • if a machine fails, the DHT is updated to point new requests to a different Actor
    • can also update the DHT if it detects a hot machine
  • cautions
    • unbounded data structures (huge requests, clients asking for too much data, having to hold a lot of things in memory, etc)
    • memory management (get ready to make friends with the garbage collector profiler)
    • reloading state: recovering from crashes, deploying a new node, the very first connection of a session (no data, have to fetch it all)
    • sometimes can get away with lazy loading, because even if the first connection fails, you know the client’s going to come back and ask for the same data anyway
    • fast restarts at facebook: with lots of data in memory, shutting down your process and restarting causes a long wait time for the data to come back up; had success decoupling memory lifetime from process lifetime, would write data to shared memory before shutting process down and then bring new process up and copy over the data from shared to the process’ memory
  • should i read papers? YES!
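
A sketch (mine) of consistent hashing for request placement: each node gets a point on a ring, a request is hashed to a point, and the first node clockwise from that point owns the work. (Real systems add many virtual points per node to smooth out hot spots.)

    local RING_SIZE = 2^32

    -- toy string hash; a real ring would use something much stronger
    local function hash(s)
      local h = 0
      for i = 1, #s do
        h = (h * 31 + string.byte(s, i)) % RING_SIZE
      end
      return h
    end

    -- place each node on the ring
    local nodes = { "node-a", "node-b", "node-c" }
    local ring = {}
    for _, n in ipairs(nodes) do
      ring[#ring + 1] = { point = hash(n), node = n }
    end
    table.sort(ring, function(x, y) return x.point < y.point end)

    local function owner(request_key)
      local p = hash(request_key)
      for _, entry in ipairs(ring) do
        if entry.point >= p then return entry.node end  -- first node clockwise
      end
      return ring[1].node                               -- wrapped around the ring
    end

    for _, key in ipairs({ "user:17", "user:42", "user:99" }) do
      print(key, "->", owner(key))
    end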

Notes from LambdaConf 2015

Haskell and Power Series Brought to Life

  • not interested in convergence
  • laziness lets you handle infinite series
  • head/tail great for describing series
  • operator overloading lets you redefine things to work on a power series (list of Nums) as well as Nums
  • multiplication complication: can’t multiply power series by a scalar, since they’re not the same type
  • could define negation as: negate = map negate
    • instead of recursively: negate(x:xs) = negate x : negate xs
  • once we define the product of two power series, we get integer powers for free, since it’s defined in terms of the product
  • by using haskell’s head-tail notation, we can clear a forest of subscripts from our proofs
  • reversion, or functional inversion, can be written as one line in haskell when you take this approach:
    • revert (0:fs) = rs where rs = 0 : 1/(fs#rs)
  • can define integral and derivative in terms of zipWith over a power series
  • once we have integrals and derivatives, we can solve differential equations
  • can use to express generating functions, which lets us do things like pascal’s triangle
  • can change the default ordering of types used for constants in haskell to get rationals out of the formulas instead of floats
    • default (Integer, Rational, Double)
  • all formulas can be found on web page: ???
    • somewhere on dartmouth’s site
  • why not make a data type? why overload lists?
    • would have needed to define Input and Output for the new data type
    • but: for complex numbers, algebraic extensions, would need to define your own types to keep everything straight
    • also: looks prettier this way

How to Learn Haskell in Less than 5 Years

  • Chris Allen (bitemyapp)
  • title derives from how long it took him
    • though, he says he’s not particularly smart
  • not steady progress; kept skimming off the surface like a stone
  • is this talk a waste of time?
    • not teaching haskell
    • not teaching how to teach haskell
    • not convince you to learn haskell
    • WILL talk about problems encountered as a learner
  • there is a happy ending: uses haskell in production very happily
  • eventually made it
    • mostly working through exercises and working on own projects
    • spent too much time bouncing between different resources
    • DOES NOT teach haskell like he learned it
  • been teaching haskell for two years now
    • was REALLY BAD at it
    • started teaching it because knew couldn’t bring work on board unless could train up own coworkers
  • irc channel: #haskell-beginners
  • the guide: github.com/bitemyapp/learnhaskell
  • current recommendations: cis194 (spring ’13) followed by NICTA course
  • don’t start with the NICTA course; it’ll drive you to depression
  • experienced haskellers often fetishize difficult materials that they didn’t use to learn haskell
  • happy and productive user of haskell without understanding category theory
    • has no problem understanding advanced talks
    • totally not necessary to learn in order to understand haskell
    • perhaps for work on the frontiers of haskell
  • his materials are optimized around keeping people from dropping out
  • steers them away from popular materials because most of them are the worst ways to learn
  • “happy to work with any of the authors i’ve criticized to help them improve their materials”
  • people need multiple examples per concept to really get it, from multiple angles, for both good and bad ways to do things
  • doesn’t think haskell is really that difficult, but coming to it from other languages means you have to throw away most of what you already know
    • best to write haskell books for non-programmers
    • if you come to haskell from js, there’s almost nothing applicable
  • i/o and monad in haskell aren’t really related, but they’re often introduced together
  • language is still evolving; lots of the materials from 90s are good but leave out a lot of new (and useful!) things
  • how to learn: can’t just read, have to work
  • writing a book with Julie (?) @argumatronic that will teach haskell to non-programmers, should work for everyone else as well; will be very, very long (longer than Real World Haskell)
  • if onboarding new employee, would pair through tutorials for 2 weeks and then cut them loose
  • quit clojure because he and 4 other clojurians couldn’t debug a 250 line ns

Production Web App in Elm

  • app: web-based doc editor with offline capabilities: DreamWriter
  • wrote original version in GIMOJ: giant imperative mess of jquery
  • knew was in trouble when he broke paste; could no longer copy/paste text in the doc
  • in the midst of going through rewrite hell, saw the simple made easy talk by rich hickey
  • “simple is an objective notion” – rich hickey
    • measure of how intermingled the parts of a system are
  • easy is subjective, by contrast: just nearer to your current skillset
  • familiarity grows over time — but complexity is forever
  • simpler code is more maintainable
  • so how do we do this?
    • stateless functions minimize interleaving
    • dependencies are clear (so long as no side effects)
    • creates chunks of simpleness throughout the program
    • easier to keep track of what’s happening in your head
  • first rewrite: functional style in an imperative language (coffeescript)
    • fewer bugs
  • then react.js and flux came out, have a lot of the same principles, was able to use that to offload a lot of his rendering code
    • react uses virtual dom that gets passed around so you no longer touch the state of the real dom
  • got him curious: how far down the rabbit-hole could he go?
    • sometimes still got bugs due to mutated state (whether accidental on his part or from some third-party lib)
  • realized: been using discipline to do functional programming, instead of relying on invariants, which would be easier
  • over 200 languages compile to js (!)
  • how to decide?
  • deal-breakers
    • slow compiled js
    • poor interop with js libs (ex: lunar.js for notes)
    • unlikely to develop a community
  • js but less painful?
    • dart, typescript, coffeescript
    • was already using coffeescript, so not compelling
  • easily talks to js
    • elm, purescript, clojurescript
    • ruled out elm almost immediately because of rendering (!)
  • cljs
    • flourishing community
    • mutation allowed
    • trivial js interop
  • purescript
    • 100% immutability + type inference
    • js interop: just add type signature
    • functions cannot have side effects* (js interop means you can lie)
  • so, decision made: rewrite in purescript!
    • but: no react or flux equivalents in purescript (sad kitten)
  • but then: a new challenger: blazing fast html in elm (blog post)
    • react + flux style but even simpler and faster (benchmarked)
  • elm js interop: ports
    • client/server relationship, they only talk with data
    • pub/sub communication system
  • so, elm, hmm…
    • 100% immutability, type inference
    • js interop preserves immutability
    • time travelling debugger!!!
    • saves user inputs, can replay back and forth, edit the code and then replay with the same inputs, see the results
  • decision: rewrite in elm!
  • intermediate step of rewriting in functional coffeescript + react and flux was actually really helpful
    • could anticipate invariants
    • then translate those invariants over to the elm world
    • made the transition to elm easier
  • open-source: rtfeldman/dreamwriter and dreamwriter-coffee on github
  • code for sidebar looks like templating language, but is actually real elm (dsl)
  • elm programs are built of signals, which are just values that change over time
  • only functions that have access to a given signal have any chance of affecting it (or messing things up)
  • so how was it?
    • SO AWESOME
    • ridiculous performance
    • since you can depend on the function always giving you the same result for the same arguments, you can CACHE ALL THE THINGS (called lazy in Elm)
    • language usability: readable error messages from the compiler (as in, paragraphs of descriptive text)
    • refactoring is THE MOST FUN THING
    • semantic versioning is guaranteed. for every package. enforced by the compiler. yes, really.
    • diff tool for comparing public api for a lib
    • no runtime exceptions EVER
  • Elm is now his favorite language
  • Elm is also the simplest (!)
  • elm-lang.org

Clojure/West 2015: Notes from Day Three

Everything Will Flow

  • Zach Tellman, Factual
  • queues: didn’t deal with directly in clojure until core.async
  • queues are everywhere: even software threads have queues for their execution, and correspond to hardware threads that have their own buffers (queues)
  • queueing theory: a lot of math, ignore most
  • performance modeling and design of computer systems: queueing theory in action
  • closed systems: when produce something, must wait for consumer to deal with it before we can produce something else
    • ex: repl, web browser
  • open systems: requests come in without regard for how fast the consumer is using them
    • adding consumers makes the open systems we build more robust
  • but: because we’re often adding producers and consumers, our systems may respond well for a good while, but then suddenly fall over (can keep up better for longer, but when gets unstable, does so rapidly)
  • lesson: unbounded queues are fundamentally broken
  • three responses to too much incoming data (sketched after this list):
    • drop: valid if new data overrides old data, or if don’t care
    • reject: often the only choice for an application
    • pause (backpressure): often the only choice for a closed system, or sub-system (can’t be sure that dropping or rejecting would be the right choice for the system as a whole)
    • this is why core.async has the puts buffer in front of their normal channel buffer
  • in fact, queues don’t need a buffer so much as they need the puts and takes buffers, which is what the default channel from core.async gives you
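
A bounded-queue sketch (mine, not from the talk) of two of the three responses; the third, backpressure, means making the producer itself wait:

    local function new_queue(capacity, policy)
      return { items = {}, capacity = capacity, policy = policy }
    end

    local function put(q, item)
      if #q.items >= q.capacity then
        if q.policy == "reject" then
          return false                       -- tell the producer "no"
        elseif q.policy == "drop" then
          table.remove(q.items, 1)           -- newer data overrides old data
        end
      end
      q.items[#q.items + 1] = item
      return true
    end

    local q = new_queue(2, "drop")
    for i = 1, 4 do put(q, i) end
    print(table.concat(q.items, " "))        --> 3 4: oldest items were dropped

    q = new_queue(2, "reject")
    put(q, 1); put(q, 2)
    print(put(q, 3))                         --> false: queue full, producer refused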

Clojure At Scale

  • Anthony Marcar, Walmart Labs
  • redis and cassandra plus clojure
  • 20 services, 70 lein projects, 50K lines of code
  • prefer component over global state

Clojure/West 2015: Notes from Day Two

Data Science in Clojure

  • Soren Macbeth; yieldbot
  • yieldbot: similar to how adwords works, but not google and not on search results
  • 1 billion pageviews per week: lots of data
  • end up using almost all of the big data tools out there
  • EXCEPT HADOOP: no more hadoop for them
  • lots of machine learning
  • always used clojure, never had or used anything else
  • why clojure?
    • most of the large distributed processing systems run on the jvm
    • repl great for data exploration
    • no delta between prototyping and production code
  • cascalog: was great, enabled them to write hadoop code without hating it the whole time, but still grew to hate hadoop (running hadoop) over time
  • december: finally got rid of last hadoop job, now life is great
  • replaced with: storm
  • marceline: clojure dsl (open-source) on top of the trident java library
  • writing trident in clojure much better than using the java examples
  • flambo: clojure dsl on top of spark’s java api
    • renamed, expanded version of climate corp’s clj-spark

Pattern Matching in Clojure

  • Sean Johnson; path.com
  • runs remote engineering team at Path
  • history of pattern matching
    • SNOBOL: 60s and 70s, pattern matching around strings
    • Prolog: 1972; unification at its core
    • lots of functional and pattern matching work in the 70s and 80s
    • 87: Erlang -> from prolog to telecoms; functional
    • 90s: standard ml, haskell…
    • clojure?
  • prolog: unification does spooky things
    • bound match unbound
    • unbound match bound
    • unbound match unbound
  • clojurific ways: core.logic, miniKanren, Learn Prolog Now
  • erlang: one way pattern matching: bound match unbound, unbound match bound
  • what about us? macros!
  • pattern matching all around us
    • destructuring is a mini pattern matching language
    • multimethods dispatch based on pattern matching
    • case: simple pattern matching macro
  • but: we have macros, we can use them to create the language that we want
  • core.match
  • dennis’ library defun: macros all the way down: a macro that wraps the core.match macro
    • pattern matching macro for defining functions just like erlang
    • (defun say-hi
      (["Dennis"] "Hi Dennis!")
      ([:catty] "Morning, Catty!"))
    • can also use the :guard syntax from core.match in defining your functions’ pattern matching
    • not in clojurescript yet…
  • but: how well does this work in practice?
    • falkland CMS, SEACAT -> incidental use
    • POSThere.io -> deliberate use (the sweet spot)
    • clj-json-ld, filter-map -> maximal use
  • does it hurt? ever?
  • limitations
    • guards only accept one argument, workaround with tuples
  • best practices
    • use to eliminate conditionals at the top of a function
    • use to eliminate nested conditionals
    • handle multiple function inputs (think map that might have different keys in it?)
    • recursive function pattern: one def for the start, one def for the work, one def for the finish
      • used all over erlang
      • not as explicit in idiomatic clojure