Check out my post on my company blog about some of the tech we’re using at my startup.
Abstractions should be Extracted, not Designed
I attended RailsConf in 2019, and the quote above is the one thing that has stuck with me since. David Heinemeier Hansson (DHH) said it in his keynote. Of all people, the creator of Ruby on Rails should have something to say about abstractions.
I’ve understood this to mean that when I’m writing code that will require an abstraction, I should write a first implementation without using any abstractions (even if this means repeating some code). Then, once I’ve repeated myself two or three times, I should extract the abstraction from there in a refactor. This is in contrast to designing the abstraction ahead of time and doing all the coding within that framework.
The logic here is that if I design the abstraction without digging into the implementations, I’m very likely to get many things wrong. For example, maybe I thought each implementation of the abstraction would need to configure a certain field, but really they can all share one. Or I thought some value would always be an enum, but it really needs the flexibility of a string. And now, to make these changes, I have to change the abstraction as well as the implementations.
I follow this pattern religiously now. The underlying insight is that there’s no better way to research and design the abstraction than by getting your hands dirty with some implementations. The design is the extraction.
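To make this concrete, here’s a contrived sketch in Python (hypothetical code, not from any real project): two implementations are written out longhand first, and the shared helper is extracted only afterwards, once the duplication shows exactly what the abstraction needs to be.

# First pass: write the concrete implementations, duplication and all.
def export_users_csv(users):
    rows = [["id", "email"]] + [[u.id, u.email] for u in users]
    return "\n".join(",".join(str(cell) for cell in row) for row in rows)

def export_orders_csv(orders):
    rows = [["id", "total"]] + [[o.id, o.total] for o in orders]
    return "\n".join(",".join(str(cell) for cell in row) for row in rows)

# Later refactor: the abstraction is extracted from what the implementations
# actually turned out to share, not from what was guessed up front.
def export_csv(header, records, row_fn):
    rows = [header] + [row_fn(r) for r in records]
    return "\n".join(",".join(str(cell) for cell in row) for row in rows)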
I realized yesterday this pattern works for building a company as well. One of my friends co-founded Rentroom. He described how they built their application by automating the painful parts of their brick-and-mortar real estate management company. They are their own first, best customer. And, naturally, the problems they were solving (easier online payments, repair task management, security deposit holding) were being experienced by other landlords too. They extracted their SaaS product from their implementation.
My former company Flexport is another great example of this. Flexport started as a traditional freight forwarder (and in many ways still is one) but focused relentlessly on automating and streamlining the painful parts of the shipping process. Now Flexport needs to turn these internal automations into sellable SaaS products.
Finally, AWS might be the canonical example. Amazon realized they were duplicating much of their DevOps/infrastructure work across different business units; different teams were each managing their own servers and deployments. So Amazon created a team to offer DevOps as a service: just ask DevOps for a server or a database, and get back an endpoint and some credentials in return. This duplicated work was extracted from the implementers and now the details are abstracted away from them. And since it turns out just about every other web-connected person on the planet would like to abstract away these details, AWS now accounts for over 75% of Amazon’s profit.
Deployment with Docker and AWS Fargate
I’ve intended to write about our deployment stack with AWS Fargate for a while but kept putting it off. We’ve gotten tremendous value from Fargate and there’s a serious dearth of approachable material online. I think these are related: Fargate is scary to write about. It’s an abstraction over your entire deployment, so there’s necessarily a lot of magic going on under the hood. The documentation is filled with buzzwords like orchestration and serverless and – as with all AWS docs – is self-referential to an exponentially increasing number of other AWS docs for acronyms like ELB, EC2, VPC, and EBS. But without being experts we’ve managed to use Fargate to set up continuous, rolling deployment of multiple applications. These have been running for two months now without any downtime. So what follows is a beginner’s guide to Fargate, written by a beginner. Let’s start by establishing some background.
Deployment
Deploying is the process of getting the web applications you run locally during development running on the public internet (in this article, on AWS). This is harder than it sounds for a number of reasons.
- Resources: When running locally you take for granted your computer’s resources like CPU, RAM, and storage. These all have to be provisioned on some machine in the cloud. Traditionally this meant provisioning an EC2 instance.
- Operating System: Again taken for granted locally, but your provisioned instance needs to have an operating system – usually some Linux distro. This OS needs to be compatible with the technologies your application runs on.
- Publishing and running the code: you need to get your code onto the instance, either as the raw source or a compiled binary. Then you need to compile and run this application. And you want to seamlessly roll the new deploy over the old one, without any downtime. On top of all this you might have multiple applications you need to do this for.
- Reliability: your production deployment needs to keep running indefinitely. If some intermittent error occurs that crashes one of your applications you need that process to restart automatically or you’ll have downtime.
- Services: your application will almost certainly use some database like Postgres and maybe many others like Redis. These services need to be installed and run somewhere your instance can access them.
- Networking: when running the code locally all of your processes run on the same machine, making communication trivial. This will not be the case in the cloud, so you have to manage how they’ll talk to one another from different machines.
- Security: a deployed application is accessible to the world. All of your processes’ endpoints and internal communication need to be secure.
- Secrets: your applications will likely hold many API keys and tokens to authenticate with other services. These need to be available on each instance, but they are highly sensitive and so should not be transferred frequently or over insecure channels.
I’m sure there are many more that I’m missing, but this is already a daunting list. Traditionally each of these steps involved configuring something in the AWS Console UI or CLI for each service. In addition to being a huge pain in the ass, this is dangerous: it amounts to an enormous amount of manually managed state. You have no easy way to track, let alone revert, changes made in the UI. There’s no way to test changes before making them. If you need to scale, you have to manually provision new machines, take the old ones offline, and do all the network and secrets configuration anew. It’s almost impossible to do this without some scheduled downtime.
Serverless Deployment
AWS Fargate uses a different paradigm called Serverless Deployment. This is a bit of a misnomer since plenty of servers are still involved. But what’s meant here is that no EC2 server instances are ever provisioned or configured manually. Instead you describe in code what infrastructure and configuration you want, pass this code to Fargate, and let AWS handle the provisioning and setup.
There are huge benefits to this arrangement. Because the configuration is now in code that lives in version control, you can manage and audit changes through your normal PR review process. You can easily review and roll back any changes. You can set up tests to run in your CI to ensure nothing breaks.
More philosophically, we’ve switched from an imperative to a declarative paradigm. Instead of making a series of commands (imperatives) in the AWS Console that created a huge amount of state to manage, we’re now simply declaring once: “This is the correct state of the world.” A new deployment (or a rollback) is as straightforward as declaring the new (or the old) configuration.
The code that declares all of this lives in two places: one or more Dockerfiles and one or more Task Definitions.
Dockerfile
Docker is an incredible tool; entire books have been written about it. The short version is that Docker enables containerization: packaging your source code along with the requirements to run it. In a file called a Dockerfile you declare, in code, a virtual environment for your application to run in (for example, Linux with Python installed), any service dependencies like Postgres, and the steps to build and run your application. Any containerized application can then be run simply with docker run. This is an extremely powerful abstraction. It enables container orchestration tools like Kubernetes and Fargate to run and manage deployments of multiple applications without knowing anything about the internals of those apps.
Practically speaking, here’s what one of our Dockerfiles to deploy our Rust backend looks like:
FROM rustlang/rust:nightly-stretch
WORKDIR /usr/src/sheets/server
COPY Cargo.toml ./
COPY server ./server
RUN cargo build --release -p server
RUN cargo install --path server
EXPOSE 7000 8000
CMD ["/usr/local/cargo/bin/server"]
In order:
- Use a container image with the latest Rust nightly build installed. This image is based on Debian (“stretch”) and includes other basic dependencies.
- Set up a working directory in the container
- Copy in the needed source code to the container
- Compile the Rust code to a binary
- Install the Rust binary
- Expose two ports (one for WebSockets and one for HTTPS)
- Run the Rust binary
Now Fargate can deploy this application. Further, other developers can run this application themselves without worrying about installing anything on their machines or having mismatched dependencies.
Task Definitions
We use a task definition to define how Amazon Elastic Container Service (ECS) should run our Docker containers. This means defining most of the deployment steps from our original list that aren’t handled by the Dockerfile.
You can find plenty of templates in the official documentation and I’ve uploaded a redacted version of ours (we use the NestJS framework, hence the names). Most of it is boilerplate, but to highlight the interesting parts:
{
  "containerDefinitions": [
    {
      ...
      "portMappings": [
        {
          "hostPort": 80,
          "protocol": "tcp",
          "containerPort": 80
        }
      ],
      ...
      "environment": [
        {
          "name": "NEST_PORT",
          "value": "80"
        }
      ],
      ...
      "secrets": [
        {
          "name": "ASM_GOOGLE_SECRET",
          "valueFrom": "arn:aws:secretsmanager:us-west-1:12345:..."
        }
      ],
      "image": "12345.dkr.ecr.us-west-1.amazonaws.com/repo:latest",
      ...
    }
  ],
  "memory": "512",
  "cpu": "256",
  ...
}
In order, we are:
- Defining how to map our container ports
- Setting environment variables
- Setting up our secrets using Amazon Secret Manager
- Defining what container image to use (using Amazon Elastic Container Registry)
- Defining what resources the machine we deploy to should have (CPU, memory, etc.)
These are the steps that, in the old deployment methodology, we’d have to do manually each time we wanted to set up a new machine. We would need to manually provision an EC2 instance, set up the networking, and copy over the secrets and environment variables to that machine. Instead we declare all these steps in code and Fargate handles them for us.
Additional Benefits
This level of automation is hugely valuable on its own. But Fargate also gives us plenty of additional benefits “for free.”
Because Fargate fully understands how to deploy our containers, we can configure it to provision additional capacity automatically as necessary. So if our site suddenly comes under tremendous load (say, because of a press push), Fargate can automatically add new resources to handle the scale. This is an incredible feature for preventing downtime and slowness.
Fargate also does safe, rolling deployments. When we deploy new code there is no downtime; Fargate handles taking down the old version and only does so once the new deployment is running safely. If the new deployment fails its health check, the old code stays up, again preventing major downtime.
Conclusion
Our Fargate experience has been amazing. We’ve been doing continuous deployment, including adding new resources and services, without any downtime for months. Our code deploys every time a change merges; deploys take twelve minutes. We’ve been saved from downtime multiple times by the Fargate guard rails. We deploy with confidence, even right before important demos.
I fully recommend this deployment stack to anyone, even novice AWS users. Though the setup seems daunting, you derive a huge amount of value from the effort; recreating all of the features mentioned above by hand would take far more work. With this upfront cost paid, we’re ready to scale easily for the foreseeable future.
Ride of a Lifetime
The Ride of a Lifetime: Lessons Learned from 15 Years as CEO of the Walt Disney Company by Robert Iger
Fast, enjoyable read that mixes autobiography with a high-level history of ESPN, Disney, ABC, Capital Cities and the chain of acquisitions that took Iger through each. The book also has a self-improvement aspect; there’s plenty of Dale Carnegie in here, as well as some Warren Buffett folksy wisdom and a lot of suggestions for how to be a great manager, thinker, and person. Iger seems like a very humble guy who willed himself through dedication, hard work, and open-mindedness to the top of successive multinational corporations in different areas (sports, entertainment, technology).
The pace gets pretty rapid in the modern era once Iger becomes CEO of Disney; acquisitions of Pixar, Lucasfilm, Marvel, BAMTech, and 20th Century Fox go off in a flurry as Iger tries to navigate Disney into the digital distribution era with Disney+ and ESPN+. Very interesting to hear the inside baseball from a guy who just closed over $70 billion worth of acquisitions, though naturally you get the feeling most of the details are left out. We’ve yet to see how this strategy plays out, but Iger clearly subscribes to the “innovate or die” maxim and I bought some shares when I was finished.
Overall a fine but maybe unmemorable read; I’d probably give it a 3.5 if I could. I recommend starting with the Acquired episode that turned me on to this book, and opening Ride of a Lifetime if you’re looking for more.
Balkan Ghosts
Balkan Ghosts: A Journey Through History by Robert D. Kaplan
My rating: 5 of 5 stars
Balkan Ghosts is an epic history of the part of the world that’s synonymous with fractious, violent internecine conflict. The breadth of the topic is massive, covering centuries of history in the lands that now constitute Serbia, Croatia, Macedonia, Kosovo, Romania, Moldavia, Bulgaria, Hungary, and Greece. The peoples there were converted to Catholicism, Eastern Orthodoxy, Judaism, and Islam and subjected to conquests by the Byzantine Empire, the Ottoman Empire, the Habsburg Empire, Nazi Germany, and Soviet communism. This book took me almost a year to finish; it is massive, dense, and brutal.
The narrative frame of the book is a travelogue of the author’s trips throughout the region over twenty years; Kaplan befriends and interviews many colorful priests, politicians, and other characters in each nation. His goal, and the book’s, is to understand the historical and religious scars that underlie the violence and instability of the Balkans. To summarize: in a region with a thousand years of history and a multiplicity of languages, religions, and ethnicities, virtually every group has some rightful grievance against the others – times when their co-religionists were forcibly converted, or their ethnicity was ejected from its land, enslaved by a conquering empire, or murdered in a violent pogrom. Every group feels itself wronged, oppressed, the victim of history. Depressingly, no one interviewed even admits the possibility of ethnic healing and inter-ethnic coexistence; the only solutions they’ll entertain amount to genocide.
Indeed, this book was published in 1993. Six years later NATO would intervene in Yugoslavia to prevent the genocide of Kosovars by Serbs. I suppose the penetrating pessimism about the intractability of these problems is warranted. But in my news-watching lifetime the Balkans have not been in the news. Maybe this is cause for some optimism.
Book Review: And Then All Hell Broke Loose
And Then All Hell Broke Loose: Two Decades in the Middle East by Richard Engel
My rating: 4 of 5 stars
Semi-autobiographical narrative history of the modern Middle East by the journalist Richard Engel, who was embedded in Baghdad for the entirety of Operation Shock and Awe, in Israel and Lebanon during the Second Intifada, and in Egypt, Tunisia, Libya, and Syria during the Arab Spring. Some great accounts of life as a journalist embedded in a war zone (even being kidnapped in Syria by a nascent ISIS and rescued), a high-level overview of how the Middle East historically became such a disaster, and more recently how Bush and then Obama fucked up a fucked up situation even more.
Engel’s basic premise is that the Middle East is a powder keg of centuries-old blood feuds, with intermittent genocides and war crimes, between Shias, Sunnis, Kurds, Jews, and Christians. These groups have seemingly no chance of coming to peaceable terms with one another. After the fall of the Ottoman Empire in WWI the Middle East was “organized” into modern nation-states by the victorious Allied powers; these states often contained a volatile mixture of ethnicities, so to keep things stable and the oil flowing they’ve been run by a succession of “strong man” dictators in the mold of Saddam Hussein, Gadhafi, Assad, Mubarak, etc. To varying degrees these men run pseudo-Islamic ethno-police states with a strong cult of personality, disregard for human life and rights, and occasional internal purges and border wars. But most importantly (to the West) these leaders keep true Islamists suppressed, support a cold peace with Israel, and keep the oil flowing to international markets. In exchange we looked the other way on all but the most egregious abuses of power. After the Cold War the United States became the primary guarantor of Middle Eastern stability and the inheritor of this devil’s bargain.
George W. Bush violated these terms by toppling Saddam on changing and tenuous pretenses. The war was easy, but he massively underestimated the Pandora’s box he opened, naively expecting that by planting the “seeds of Democracy” he could grow a democracy in the desert and this would cure all evils. Instead, a recalcitrant Sunni minority launched a civil war against the new Shiite government that, for its part, seemed to have little commitment to its army or democratic institutions. The disaffected and delusional Sunnis, now out of power after years controlling the army and government, would become the foundation for ISIS and the destabilization of the region.
Obama muddled matters further by having no clear doctrine on the Middle East. He turned his back on longtime US ally Mubarak in Egypt, allowing him to be toppled by protestors in the Arab Spring. He went further in Libya, directing air power to defend rebels against another long-time US ally in Gadhafi. Citizens in Arab countries began believing they could count on US and NATO support in the event that they led a popular revolt against their own tyrannies. Yet when rebels in Syria did exactly this, Obama blanched; later he would draw a red line, but when it was crossed he still did nothing. Engel contends this was a massive failure by the Obama administration, essentially encouraging a Syrian civil war that he would not then help to end. I found this convincing, though it’s sad to think that Obama’s idealism and values would lead to his biggest foreign policy failure.
While these are what I’d call the “theses” of the book, most of the content describes Engel’s life as a journalist or gives a high-level overview of Middle Eastern and Islamic history. He covers topics like the early schism between Shiites and Sunnis, the Umayyad and Ottoman caliphates, the history of Israel, the founding of the House of Saud and Wahhabism, etc. He also describes working in a war zone and the work that entails: preparing safe houses, hiding $20k on your body, bribing police officers, getting smuggled over borders. He even recounts being kidnapped for ransom in Syria by an ISIS precursor. The narrative is gripping; Engel lived through some incredible moments and mixes his history lessons in with his travels.
Lately I’ve been really enjoying this type of narrative history; I find them easy and fun to read while also learning plenty (I had about twelve pages of notes from this book). Highly recommend for anyone who likes the same.
4/5: The Storm Before the Storm
The Storm Before the Storm: The Beginning of the End of the Roman Republic by Mike Duncan
The first book from Mike Duncan, the creator of the terrific The History of Rome podcast. It describes the relatively unexplored period of the late Roman Republic between two far more famous events: the conquest of Carthage and the rise of the Caesars and the Roman imperium. Duncan explores how Rome fell from a republican power controlling the known world to one oscillating between extremes of popular demagoguery and aristocratic oligarchy, leading to civil war and ultimately paving the way for the previously unthinkable ascent of a tyrant.
If you enjoyed the podcast as I did you’ll enjoy this book; the tone is conversational and narrative with liberal sprinklings of dry humor and modern analogies. In terms of events this time period has everything: the rise of an oligarchy in the Senate, the popular reaction in the Assembly, the introduction of mob violence, the creation of cults of personality and armies with personal loyalties, civil war, and finally tyranny being welcomed as a preferred alternative to anarchic chaos.
Roman history is long and the Roman empire was large, so by necessity this book moves very quickly and often unevenly, with some eras or careers covered in a page where others take chapters. There’s also an unfortunate paucity of sources about some key events, and Duncan does a great job of weaving a coherent, enjoyable narrative out of what is available. Even so, the number of names and events covered can be overwhelming and I have a massive set of Kindle highlights to go through.
My main criticism is I wish the themes were fleshed out more. The first half of the book sets up and returns to important themes: the abandonment of political norms, the introduction of mob violence, fights over suffrage and the expanding definition of “Roman”, increasing disregard for laws by those in power. Duncan gives plenty of examples of these but never builds them into a coherent framework or thesis. The second half of the book is largely narrative describing Marius’ and then Sulla’s paths to power, mostly eschewing analysis altogether. Entirely absent is an analysis of why this happened or how it could have been avoided.
That said, this was a hugely pleasurable read giving the joy of a fictional narrative but with non-fictional learning. I knew little about this fascinating time period that has so much to tell us about our own empire and polity.
Bad Science
After California voters passed Proposition 47 in 2014, recategorizing thefts of up to $950 in value as misdemeanors rather than felonies, the state saw an uptick in rates of larceny theft even as nationwide trends continued downward. By comparison, burglary rates, whose sentencing was unaffected by Prop 47, continued downward in line with national trends. Of course correlation does not prove causation. But there is a plausible theory for causation: if you lower the “cost” of stealing $950 worth of goods, for some cohort of people the cost/benefit ratio might shift in favor of risking petty theft where it did not before, leading to an increase in larcenies.
Therefore I was surprised to see a San Francisco Chronicle headline declaring that “Easing penalties on low-level offenses didn’t raise crime rate.” The article cites a 2018 study and quotes author Charis Kubrin, a professor of criminology at the University of California, Irvine, as saying “Our analysis tells us Prop. 47 was not responsible, so it must have been something else.”
This quote seemed striking for its certainty as much as for its conclusion. Despite the seeming correlation between the passage of the law and the spike in larcenies, proving definitively that Proposition 47 caused the observed increase in crime would be impossible. There are far too many confounding variables, especially in a state the size of California. Likewise proving that Proposition 47 did not cause the observed increase in larceny crime rates should be equally difficult.
Wanting to understand how this seemingly intractable problem was solved, I read through the study itself, published in the journal Criminology & Public Policy in August 2018. The authors modeled crime rates for each crime category (e.g., murder, burglary, larceny) in California as a weighted average of the rates in the other 49 US states. Since none of these other states passed a Proposition 47, the authors could then use this model to project theoretical crime rates in California from 2015 to 2017 if Proposition 47 had not passed.
To generate this model, the authors used FBI crime rate data from 1977 to 2014 for each state and each category of crime, with larceny being of particular interest. Each state’s crime rate in each category over these years is a potential contributing variable in the resulting model. The authors then used an algorithm to create the most accurate possible model of California’s crime rates expressed as a weighted sum of these variables.
For example, in the case of larceny the algorithm finds that Nevada and New York have crime rates that are most highly correlated with California’s, with Colorado correlating significantly less:
So the resulting function expresses California’s larceny crime rate as:
0.479 * [Nevada] + 0.406 * [New York] + 0.095 * [Colorado]
The authors can then use this function with crime rates from these states from more recent years (2015-2017) to model what would have happened in “counterfactual California”.
The intuition underlying this model is that if there were another proximate cause for a rise or fall in crime, these donor states would also be affected, and the model would thus capture the change. In the graphs above we can see that Nevada experiences many of the same spikes and falls as California. This makes intuitive sense since the two neighboring states share a large border, similar weather, and a fluid population moving between them.
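To make the arithmetic concrete, here is a toy calculation in Python of the synthetic larceny rate using the weights above; the state rates are made up purely for illustration and are not numbers from the study.

# Donor-state weights for larceny, taken from the formula above.
weights = {"Nevada": 0.479, "New York": 0.406, "Colorado": 0.095}

# Hypothetical 2015 larceny rates per 100,000 residents (illustrative only).
rates_2015 = {"Nevada": 1900.0, "New York": 1400.0, "Colorado": 2200.0}

# Synthetic California's 2015 rate is the weighted sum; the study's
# "post-intervention gap" is the actual California rate minus this value.
synthetic_california = sum(weights[state] * rates_2015[state] for state in weights)
print(round(synthetic_california, 1))  # 1687.5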
Employing this methodology the authors found Proposition 47 had a meaningful impact on the observed crime rates for larceny and motor vehicle thefts: “the post-intervention gaps suggest that larceny and motor vehicle thefts were less than 10% and roughly 20% higher, respectively, in 2015 than they would have been without Prop 47.”
That is to say, the study shows a statistically significant increase in rates for larceny and car thefts in observed rates compared to the synthetic control model. This is completely at odds with the conclusions drawn in the Chronicle and in the author’s own quotes! By the authors’ own methodology, the best model for California’s non-intervention larceny theft rate was lower than the actual larceny rate by a statistically significant degree.
How then did the researchers get from this finding to the point where one of the authors would declare “When we compared crime levels between these two California’s, they were very similar, indicating that Prop. 47 was not responsible for the increase”? From the study:
To determine whether the estimated larceny effect is sensitive to changes in Synthetic California’s composition (i.e., different donor pool weights), we iteratively exclude the donor pool state with the greatest weight (ω) until all of the original donor pool states with nonzero weights have been removed. Synthetic California is composed of four donor pool states with weights that are greater than zero: New York, Michigan, Nevada, and New Jersey. The version of Synthetic California that results from this procedure is composed of a set of donor pool states that are entirely different than our original model. If the estimated impact of Prop 47 on California’s crime rate persists under both compositions, we can be confident that our larceny estimate is not dependent on the contribution of certain donor pool states to Synthetic California. If our interpretation changes under Synthetic California’s new composition, however, the estimated effect is dependent on the contribution of certain donor pool states and the finding should be interpreted cautiously.
In short, the authors removed the states with the highest correlation to California’s larceny rates from the model and created a new model from the remaining (more poorly correlated) states. The authors assert that only if this new, necessarily less accurate model also demonstrates that there was a statistically significant impact from Prop 47 can we conclude that this is the case.
These stipulations and modifications to the model are puzzling. By removing a small cohort of states that best approximate crime rates in California, the authors degrade the model and increase its error rate relative to the real California crime rates. These modifications appear to have been made with the intention of finding any paradigm in which Proposition 47’s effects on crime rates fell within the margin of error of their changing synthetic model. Once this is done, they find:
For larceny, we find that Synthetic California requires at least one of the following states be included in the donor pool in order to sustain the effect: New York, Michigan, Nevada, and New Jersey… When these four donor pool units are excluded, the post-intervention gap disappears.
So if you exclude all four states that originally were used as the best approximation of California’s crime rates, the gap disappears. This apparently is enough to reach the authors’ conclusion:
This suggests that our valid causal interpretation of the Prop 47 effect on larceny rests on the validity of including these four states in our donor pool. Thus, larceny, our only nonzero, nontrivial effect estimate, appears to be dependent on the contribution of four specific states from our donor pool. This finding, therefore, should be interpreted with caution.
From a statistical standpoint these conclusions are difficult to understand. While the authors state that the dependence on these 4 states calls into question the validity of the model’s findings regarding larceny rates, they report no similar testing for their models of other crimes, such as rape and murder. The authors do not attempt to parse out how the effects on larceny vary with the inclusion of 1, 2, 3, or all 4 states. Finally, the authors make no mention of a Bonferroni correction or other statistical means of correcting for the implications of multiple comparisons. In other words, this methodology is concerning for p-hacking, the practice of trying different study designs until the desired outcome is found.
The graph of this model from the study itself shows just how facile this process is:
Compare the graphs of Synthetic California (Baseline, No Restrictions), Synthetic California (NY, NV, MI, & NJ excluded), and California (Actual). The baseline model does a good job of approximating the actual observed data. The model excluding the four states does not come close to plausibly modeling the larceny rates in California. And of course, once you remove enough of the states with the closest correlation to California’s crime rate you will eventually come up with a model that is bad enough that you can declare its findings statistically insignificant. Without an explanation for why the methodology required the removal of four states (rather than three, or two, or none) this simply reads like bad science.
More troubling than these experiment design issues, though, are the massive discrepancies between what is actually proven in the study and the claims made by the authors in both the study’s conclusion and in the media.
The study’s abstract states that “Larceny and motor vehicle thefts, however, seem to have increased moderately after Prop 47, but these results were both sensitive to alternative specifications of our synthetic control group and small enough that placebo testing cannot rule out spuriousness.” Yet by the end of the abstract they note that “our findings suggest that California can downsize its prisons and jails without compromising public safety.” This conclusion is in no way reached by the actual findings of the study, yet is presented as such by the authors.
The study did find that Prop 47 had a statistically significant effect on larceny rates, albeit with caveats about the fallibility of computational models. Yet the author states to the Chronicle that “Our analysis tells us Prop. 47 was not responsible, so it must have been something else.”
Again, this is in no way proven in the study. This statement is in complete contradiction of the study’s findings. And yet this conclusion is cited in multiple Chronicle articles, and even an editorial which adds “that analysis said the jump was matched in other states, suggest[ing] Prop. 47 wasn’t a difference maker” – despite this claim never appearing in the study (and in reality being contrary to fact).
Now this untruth has entered the public discourse as a fact with the rigor of a mathematical study behind it, despite being contrary to the findings of said study. We can trace this fuzziness from some questionable but obscure modeling decisions, the implications of which are magnified by the authors in the study’s abstract (at the expense of other, more certain findings), and then further expounded upon in the media, first by the author and then by journalists writ large.
I emailed the Chronicle months ago but have received no response; sadly this will remain part of the record. All I can recommend is that we be wary of what we accept as fact and make sure to read the fine print before admitting study conclusions as such.
Books I Loved in 2019
According to my Google doc I read 48 books in 2019; at about one per week this sounds about right. I started off reading a lot of old Western novels – I guess because I’d just moved out West – and overall read way more history books than in 2018 – I guess because it feels like we’re at such a precipitous moment in history. Full list at the bottom, but some of my favorites:
Traitor to His Class: The Privileged Life and Radical Presidency of Franklin Delano Roosevelt. Great biography of one of the great presidents, whom I knew too little about. FDR took the helm of an America that was resolutely isolationist, even amidst the rise of Fascism, and that “had yet to conceive its role as providing social services”, even in the depths of the Great Depression. Roosevelt defeated Hoover by campaigning for the basic level of social assistance we take for granted today: unemployment insurance, federal insurance of bank deposits, social security, and fair labor standards among them. Many of the most important social government institutions we have today were created by Roosevelt, and our conception of the state’s role in social affairs we owe entirely to him.
His greater challenge by far was getting the US into World War II. FDR saw immediately the moral and existential threat posed by Fascism but, like Cassandra, was all but powerless to move a staunchly isolationist America into the war. FDR took a two-pronged approach: helping the Allies indirectly by providing materiel through the Lend-Lease program and embargoing Japan, while slowly swaying the opinion of the country through his fireside chats. Eventually he all but forced Japan’s hand at Pearl Harbor, which gave him the power to finally declare the war he’d felt necessary for so long.
Season of the Witch: Amazing, fascinating history of San Francisco starting with the Summer of Love in 1967 and the corruption of that promise that followed. Free love, music, drug use, and anarchy in the Summer of Love promised a new, better world. For a brief moment this was true; but then free love turned into AIDS, anarchy turned into gangs and cults abusing the unprotected, and drug use turned into addiction. Jim Jones, Harvey Milk, and Dianne Feinstein are all covered – did you know Harvey Milk was a big defender of Jim Jones? Or that Feinstein was all but washed out of politics until she inherited the mayor’s office after the Moscone and Milk assassinations? Lots of good stuff in here.
On a personal level I was pleased to read about how the current tech wave in the Bay fits into San Francisco’s history of new arrivals changing the makeup of the city. The sixties brought in the hippies; the eighties brought in the gays; and now the 2000s bring in the tech bros. Each arrival triggered a reaction by the existing predominant culture worried about changing mores and their own group being sidelined, but the forces of change always win in SF.
Say Nothing: A True Story of Murder and Memory in Northern Ireland: has received plenty of coverage on “Best of 2019” lists so I’ll speak sparingly here. Another fascinating, recent part of our history that I had a complete blind spot for. I think of the Protestant vs. Catholic blood feud as occurring in ancient Europe, yet this story occurred in my parents’ lifetime. Scary to see how easily these sectarian tensions can be inflamed.
The Killer Angels: one of those great historical novelizations that reads like a novel but tells a true story. Narrates four days of the battle at Gettysburg from the perspective of the generals and captains leading the significant actions. Covers the purported as well as actual reasons given by both sides for fighting the war. There’s real tragedy here: the leaders on both sides know each other, often intimately, and fought as compatriots only ten years earlier. On the southern side many see the war as lost; on the Northern side there’s little faith in their commander; on both sides they know their actions could lead to the deaths of their friends. Yet everyone sees themselves as buffeted by forces beyond their control and forced into this tragedy. Some brutal descriptions of war.
Ruminations
Looking back over my reading list for 2019 I realize that I spent much of the year looking back; that is, reading more history books (and less fiction) compared to 2018. In part I think this was due to spending plenty of time with the terrific Revolutions podcast; this both fascinated me with European history and shocked me with how little I know about our proximate cultural history. I also ascribe this in part, as with seemingly everything in 2019, to Trump. At a time when previously immovable geopolitical boundaries are shifting and prevailing wisdom is being challenged, I started to wonder about how we arrived at this state of affairs and how fluid our world really is. What is the point of NATO? Why are Ukraine’s borders where they are? Why are we allied with Egypt, or Saudi Arabia? Why isn’t Russia more European?
Usually I find these histories comforting. There has been a massive amount of tumult in the history of the world, and the calm that we think exists or existed was mostly illusory. Countries fracture and are formed continuously. Alliances are formed and broken, sometimes at dizzying speed. Compared to the Depression, our economy is fine. Compared to WWII or the Cold War, our world is peaceful. Compared to the sixties or even the eighties our racial tensions are calm. The world usually feels itself to be at an unprecedented moment of peril, but it rarely is.
Full List
- Barbarians at the Gate
- The Great Partition: The Making of India and Pakistan
- Fascism: A Warning
- The Sirens of Titan
- Slaughterhouse Five
- Brave New World
- Cat’s Cradle
- Educated
- Bad Blood
- Boom Town
- Red Notice
- Lonesome Dove
- True Grit
- Dawn of D-Day
- The Sun Does Shine
- Killers of the Flower Moon
- American Dream
- Traitor to his Class
- Heavy
- Edge of Anarchy
- Dune
- Season of the Witch
- Charged (unfinished)
- Two Income Trap
- Deep Work
- Rules of Civility
- Say Nothing
- The Color of Money
- Ask Again, Yes
- Normal People
- The Call of the Wild
- Manufacturing Consent
- The Big Fella (unfinished)
- The French Revolution: A Very Short Introduction
- The Killer Angels
- Russian History: A Very Short Introduction
- Locking Up our Own
- Ender’s Shadow
- Operation Paperclip
- Ring of Steel
- Cherry
- Trafalgar: The Nelson Touch
- Our Man: George Packer
- Waterloo: Day of Battle
- 1066: The Year of Conquest
- The Odyssey
- The MVP Machine
Shortest Cycle Algorithm
One of the surprises of starting work as a software engineer after college was how little the coding I did professionally resembled the coding I did academically. I can count on one hand the number of times I’ve explicitly used the knowledge about algorithms and logic that I’d spent four hard years acquiring. Most of these “hard” problems are solved for us by libraries so the closest we usually come is plugging some package in where we need it.
So I was pretty happy when the opportunity recently arose to apply my knowledge of graph algorithms outside of the classroom. At Dimagi we allow users to define the data they want to collect using the XForm specification. Among other things, XForms allow users to define logical dependencies between questions. The value of question A might be a function of the answers to B and C; question D might or might not show up (in the parlance, be relevant) depending on the answer to question E; the answer to question F might be invalid depending on the answer to question G. Because XForms are declarative there isn’t a control flow; all questions are evaluated and re-evaluated until they are in the correct state. This means that cycles in XForms are illegal; they break this evaluation flow.
Until recently our parser was able to detect cycles and so declare the XForm illegal at validation time; however, while the algorithm was very fast at detecting a cycle, it could not output the exact path of the cycle, merely detect its existence. Of course this was pretty unhelpful to our users, who would sometimes have forms with hundreds of questions and then have to find a needle in the haystack.
After much grumbling from our field team I decided to tackle this problem at a hackathon. There are plenty of algorithms for accomplishing this, so I chose a simple option found on Stack Overflow. Put simply: perform a depth-first search from every node, keeping a reference to the starting node; if you ever encounter the node you started from, you’ve found a cycle. In our case, nodes are questions and the directed edges are the types of references listed above (calculations, relevancy conditions, and validation conditions).
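Here is a minimal sketch of that approach in Python. It’s a hypothetical illustration rather than our actual code: it assumes the dependency graph has already been flattened into a dict mapping each question name to the set of names it references.

from typing import Dict, List, Optional, Set

def find_cycle(graph: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of question names, or None if the graph is acyclic."""

    def dfs(start: str, current: str, path: List[str], seen: Set[str]) -> Optional[List[str]]:
        for neighbor in graph.get(current, set()):
            if neighbor == start:              # back at the starting node: cycle found
                return path + [neighbor]
            if neighbor in seen:               # already explored during this search
                continue
            seen.add(neighbor)
            cycle = dfs(start, neighbor, path + [neighbor], seen)
            if cycle:
                return cycle
        return None

    # Run a search rooted at every node and report the first cycle found.
    for node in graph:
        cycle = dfs(node, node, [node], set())
        if cycle:
            return cycle
    return None

For example, find_cycle({"A": {"B"}, "B": {"C"}, "C": {"A"}}) returns ["A", "B", "C", "A"], which is exactly the kind of path we wanted to show users.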
Easy enough! The algorithm was working fine until recently, when we got a couple of reports of the page where people edit their forms hanging indefinitely. Both forms were very large (over a thousand nodes) and, because they were clustered in large top-level groups, they were highly connected. (This is because if questions B and C are both sub-questions of group A, then their relevancy state naturally depends on the relevancy state of group A.) We narrowed in on the validate_form endpoint being the culprit. When I didn’t find any errors in Sentry, it became clear that the problem was one of efficiency, not incorrectness.
After replicating the bug locally, setting some break points, and observing the logic flow, I quickly homed in on the problem: repeated work. Because questions could be nested in five or ten layers of groups, the connectedness of the graph was massive. Further, if you had ten questions in a sub-group that itself was nested in four or five sub-groups, then those ten questions would be calculating almost identical dependency graphs. We were repeatedly walking the same walk.
The solution, as always, was caching. When we completed a depth-first search from a node we would save the set of nodes we had reached – that node’s reachable sub-graph. Then, while searching from a new node, before descending into a neighbor we would check whether we’d already cached that neighbor’s reachable set. If we had, and that set did not contain the node we were currently searching for cycles from, then we knew a cycle was not possible down that branch and we could short circuit.
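Continuing the hypothetical sketch from above, the caching change is small: remember the set of nodes each completed search reached, and consult that cache before descending into a neighbor. Again, this is an illustrative sketch, not our production code.

from typing import Dict, List, Optional, Set

def find_cycle_cached(graph: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Same contract as find_cycle above, but memoizes each node's reachable set."""
    # node -> every node reachable from it, recorded once a search rooted
    # at that node completes without finding a cycle.
    reachable: Dict[str, Set[str]] = {}

    def dfs(start: str, current: str, path: List[str], seen: Set[str]) -> Optional[List[str]]:
        for neighbor in graph.get(current, set()):
            if neighbor == start:
                return path + [neighbor]
            if neighbor in seen:
                continue
            if neighbor in reachable and start not in reachable[neighbor]:
                # Everything reachable from `neighbor` is already known and does
                # not include `start`, so no cycle through `start` lies this way.
                # Record those nodes as reachable from `start` and short circuit.
                seen.add(neighbor)
                seen |= reachable[neighbor]
                continue
            seen.add(neighbor)
            cycle = dfs(start, neighbor, path + [neighbor], seen)
            if cycle:
                return cycle
        return None

    for node in graph:
        seen: Set[str] = set()
        cycle = dfs(node, node, [node], seen)
        if cycle:
            return cycle
        reachable[node] = seen  # the completed search visited exactly these nodes
    return None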
After deploying the fix the issue was immediately resolved. You can see the actual code here; currently it’s quite coupled to our XForm implementation, but the algorithm itself operates only on Strings so it would be straightforward to repurpose.
One thought I had while implementing this was about the different emphasis placed on caching academically and professionally. I remember caching being mentioned mostly as an aside in college, and discussed thoroughly only in my Operating Systems class, where caching was essential to reasonable performance. For the most part the algorithms classes I took were interested in the correctness and complexity of an algorithm; by this standard, the original algorithm I’d written was completely correct. However, in the professional world that algorithm was not even close to correct; its slow performance caused a breaking error in our product. So often good caching is essential to our products working at all, let alone efficiently.
There’s another conversation I’ve been having recently about the divergence of academic and vocational coding that I want to write about soon, but that’s for another day!