Bad Science

After California voters passed Proposition 47 in 2014, recategorizing thefts of up to $950 in value as misdemeanors rather than felonies, the state saw an uptick in rates of larceny theft even as nationwide trends continued downward. By comparison, burglary rates, whose sentencing was unaffected by Prop 47, continued downward in line with national trends. Of course correlation does not prove causation. But there is a plausible theory for causation: if you lower the “cost” of stealing $950 worth of goods, for some cohort of people the cost/benefit ratio might shift in favor of risking petty theft where they did not before, leading to an increase in larcenies.

Therefore I was surprised to see a San Francisco Chronicle headline declaring that “Easing penalties on low-level offenses didn’t raise crime rate.” The article cites a 2018 study and quotes author Charis Kubrin, a professor of criminology at the University of California, Irvine, as saying “Our analysis tells us Prop. 47 was not responsible, so it must have been something else.”

This quote seemed striking for its certainty as much as for its conclusion. Despite the seeming correlation between the passage of the law and the spike in larcenies, proving definitively that Proposition 47 caused the observed increase in crime would be impossible. There are far too many confounding variables, especially in a state the size of California. Likewise proving that Proposition 47 did not cause the observed increase in larceny crime rates should be equally difficult.

Wanting to understand how this seemingly intractable problem was solved, I read through the study itself, published in the journal Criminology & Public Policy in August 2018. The authors modeled crime rates for each crime category (e.g., murder, burglary, larceny) in California as a weighted average of the rates in the other 49 US states. Since none of these other states passed a Proposition 47, the authors could then use this model to project theoretical crime rates in California during 2015 to 2017 had Proposition 47 not passed.

To generate this model, the authors used FBI crime rate data from 1977 to 2014 for each state and each category of crime, with larceny being of particular interest. Each state’s crime rate in each category over these years is a potential contributing variable in the resulting model. The authors then used an algorithm to create the most accurate possible model of California’s crime rates expressed as a weighted sum of these variables.

For example, in the case of larceny the algorithm finds that Nevada and New York have crime rates that are most highly correlated with California’s, with Colorado correlating significantly less:

[Graphs: California larceny rates compared with Nevada, New York, and Colorado]

So the resulting function expresses California’s larceny crime rate as:

 .479 * [Nevada] + .406 * [New York] + .095 * [Colorado]

The authors can then use this function with crime rates from these states for more recent years (2015–2017) to model what would have happened in “counterfactual California”.
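
To make the arithmetic concrete, here is a quick sketch of my own (not code from the study); the donor-state rates below are made-up placeholder numbers, not real FBI figures:

public class SyntheticCalifornia {
    public static void main(String[] args) {
        // Donor-state weights reported for the larceny model: Nevada, New York, Colorado.
        double[] weights = {0.479, 0.406, 0.095};
        // Hypothetical 2015 larceny rates per 100,000 residents (placeholders, not FBI data).
        double[] donorRates = {2100.0, 1550.0, 2300.0};

        double counterfactual = 0;
        for (int i = 0; i < weights.length; i++) {
            counterfactual += weights[i] * donorRates[i];
        }

        // The estimated Prop 47 effect is the gap between California's observed 2015
        // rate and this projected "counterfactual California" rate.
        System.out.printf("Projected larceny rate without Prop 47: %.1f per 100k%n", counterfactual);
    }
}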

The intuition underlying this model is that if there were another proximate cause for a rise or fall in crime, these states would also be affected and the model would capture the change. In the graphs above we can see that Nevada experiences many of the same spikes and falls as California. This makes intuitive sense, since these neighboring states share a long border, similar weather, and a fluid population moving between the two.

Employing this methodology, the authors found that Proposition 47 had a meaningful impact on the observed crime rates for larceny and motor vehicle thefts: “the post-intervention gaps suggest that larceny and motor vehicle thefts were less than 10% and roughly 20% higher, respectively, in 2015 than they would have been without Prop 47.”

That is to say, the study found that observed rates of larceny and car theft were higher, by a statistically significant margin, than those predicted by the synthetic control model. This is completely at odds with the conclusions drawn in the Chronicle and in the author’s own quotes! By the authors’ own methodology, the best model of California’s larceny rate absent the intervention was lower than the actual larceny rate by a statistically significant degree.

How then did the researchers get from this finding to the point where one of the authors would declare “When we compared crime levels between these two California’s, they were very similar, indicating that Prop. 47 was not responsible for the increase”? From the study:

To determine whether the estimated larceny effect is sensitive to changes in Synthetic California’s composition (i.e., different donor pool weights), we iteratively exclude the donor pool state with the greatest weight (ω) until all of the original donor pool states with nonzero weights have been removed. Synthetic California is composed of four donor pool states with weights that are greater than zero: New York, Michigan, Nevada, and New Jersey. The version of Synthetic California that results from this procedure is composed of a set of donor pool states that are entirely different than our original model. If the estimated impact of Prop 47 on California’s crime rate persists under both compositions, we can be confident that our larceny estimate is not dependent on the contribution of certain donor pool states to Synthetic California. If our interpretation changes under Synthetic California’s new composition, however, the estimated effect is dependent on the contribution of certain donor pool states and the finding should be interpreted cautiously.

In short, the authors removed the states with the highest correlation to California’s larceny rates from the model and created a new model from the remaining (more poorly correlated) states. The authors assert that we can be confident Prop 47 had a statistically significant impact only if this new, necessarily less accurate model also demonstrates one.
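
My rough reading of that procedure, sketched in code (this is an illustration of the idea, not the authors’ implementation; fitWeights() is a hypothetical stand-in for the synthetic control fitting step):

import java.util.*;

public class LeaveOutCheck {
    // Hypothetical stand-in for the synthetic control fit; in the study this solves
    // for the donor-state weights that best reproduce California's 1977-2014 rates.
    static Map<String, Double> fitWeights(Set<String> donors) {
        Map<String, Double> weights = new HashMap<>();
        for (String state : donors) {
            weights.put(state, 1.0 / donors.size());  // placeholder weights only
        }
        return weights;
    }

    public static void main(String[] args) {
        Set<String> donors = new HashSet<>(Arrays.asList(
                "New York", "Michigan", "Nevada", "New Jersey", "Colorado"));
        Map<String, Double> weights = fitWeights(donors);

        while (donors.size() > 1) {
            // Drop the donor state carrying the most weight and refit without it.
            String heaviest = Collections.max(
                    weights.entrySet(), Map.Entry.comparingByValue()).getKey();
            donors.remove(heaviest);
            weights = fitWeights(donors);
            // The study re-estimates the post-2014 larceny gap after each refit; the
            // effect "survives" only if the gap persists under every degraded model.
            System.out.println("Refit without " + heaviest + ": " + weights.keySet());
        }
    }
}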

These stipulations and modifications to the model are puzzling. By removing a small cohort of states that best approximate crime rates in California, the authors degrade the model and increase its error rate relative to the real California crime rates. These modifications appear to have been made with the intention of finding any paradigm in which Proposition 47’s effects on crime rates fell within the margin of error of their changing synthetic model. Once this is done, they find:

For larceny, we find that Synthetic California requires at least one of the following states be included in the donor pool in order to sustain the effect: New York, Michigan, Nevada, and New Jersey… When these four donor pool units are excluded, the post-intervention gap disappears. 

So if you exclude all four states that originally were used as the best approximation of California’s crime rates, the gap disappears. This apparently is enough to reach the authors’ conclusion:

This suggests that our valid causal interpretation of the Prop 47 effect on larceny rests on the validity of including these four states in our donor pool. Thus, larceny, our only nonzero, nontrivial effect estimate, appears to be dependent on the contribution of four specific states from our donor pool. This finding, therefore, should be interpreted with caution.

From a statistical standpoint these conclusions are difficult to understand. While the authors state that the inclusion of these four states calls into question the validity of the model’s findings regarding larceny rates, they include no similar testing, or even mention of it, for their models of other crimes, such as rape and murder. The authors do not attempt to parse out how the larceny effect varies with the inclusion of one, two, three, or all four of these states. Finally, the authors make no mention of performing a Bonferroni correction or any other statistical means of accounting for multiple comparisons. In other words, these methods raise concerns about p-hacking, the practice of trying different study designs until the desired outcome is found.
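
For reference, the simplest such correction is just arithmetic: with m comparisons, the per-test significance threshold becomes α / m. Testing, say, seven crime categories at the conventional α = 0.05 would then require p < 0.05 / 7 ≈ 0.007 for any single result to count as significant.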

The graph of this model from the study itself shows just how facile this process is:

Compare the graphs of Synthetic California (Baseline, No Restrictions), Synthetic California (NY, NV, MI, & NJ excluded), and California (Actual). The baseline model does a good job of approximating the actual observed data. The model excluding the four states does not come close to plausibly modeling the larceny rates in California. And of course, once you remove enough of the states most closely correlated with California’s crime rate, you will eventually come up with a model that is bad enough that you can declare its findings statistically insignificant. Without an explanation for why the methodology required the removal of four states (rather than three, or two, or none), this simply reads like bad science.

More troubling than these experiment design issues, though, are the massive discrepancies between what is actually proven in the study and the claims made by the authors in both the study’s conclusion and in the media.

The study’s abstract states that “Larceny and motor vehicle thefts, however, seem to have increased moderately after Prop 47, but these results were both sensitive to alternative specifications of our synthetic control group and small enough that placebo testing cannot rule out spuriousness.” Yet by the end of the abstract they note that “our findings suggest that California can downsize its prisons and jails without compromising public safety.” This conclusion is in no way reached by the actual findings of the study, yet is presented as such by the authors.

The study did find that Prop 47 had a statistically significant effect on larceny rates, albeit with caveats about the fallibility of computational models. Yet the author states to the Chronicle that “Our analysis tells us Prop. 47 was not responsible, so it must have been something else.”

Again, this is in no way proven in the study. The statement is in complete contradiction of the study’s findings. And yet this conclusion is cited in multiple Chronicle articles, and even an editorial which adds “that analysis said the jump was matched in other states, suggest[ing] Prop. 47 wasn’t a difference maker” – despite this claim never appearing in the study (and in reality being contrary to fact).

Now this untruth has entered the public discourse as a fact with the rigor of a mathematical study behind it, despite being contrary to the findings of said study. We can trace this fuzziness from some questionable but obscure modeling decisions, whose implications are magnified by the authors in the study’s abstract (at the expense of other, more certain findings), and then further expounded upon in the media by the author and by journalists writ large.

I emailed the Chronicle months ago but have received no response; sadly this will remain part of the record. All I can recommend is that we be wary of what we accept as fact and make sure to read the fine print before admitting study conclusions as such.

Books I Loved in 2019

According to my Google doc I read 48 books in 2019; at roughly one per week, this sounds about right. I started off reading a lot of old Western novels – I guess because I’d just moved out West – and overall read way more history books than in 2018 – I guess because we seem to be at such a precipitous moment in history. Full list at the bottom, but some of my favorites:

Traitor to His Class: The Privileged Life and Radical Presidency of Franklin Delano Roosevelt: Great biography of one of the great presidents that I knew too little about. FDR took the helm of an America that was dedicatedly isolationist, even amidst the rise of Fascism, and that “had yet to conceive its role as providing social services”, even in the depths of the Great Depression. Roosevelt defeated Hoover by campaigning for the basic level of social assistance we take for granted today: unemployment insurance, federal insurance of bank deposits, social security, and fair labor standards among them. Many of the most important social institutions of government we have today were created by Roosevelt, and our conception of the state’s role in social affairs we owe entirely to him.

His greater challenge by far was getting the US into World War II. FDR saw immediately the moral and existential threat posed by Fascism but, like Cassandra, was all but powerless to move a staunchly isolationist America into the war. FDR took a two-pronged approach: helping the Allies indirectly by providing materiel through the Lend-Lease program and embargoing Japan, and slowly swaying the opinion of the country through his fireside chats. Eventually he all but forced Japan’s hand at Pearl Harbor, which gave him the power to finally declare the war he’d felt necessary for so long.

Season of the Witch: Amazing, fascinating history of San Francisco starting with the Summer of Love in 1967 and the corruption of that promise that followed. Free love, music, drug use, and anarchy in the Summer of Love promised a new, better world. For a brief moment this was true; but then free love turned into AIDS, anarchy turned into gangs and cults abusing the unprotected, and drug use turned into addiction. Jim Jones, Harvey Milk, and Dianne Feinstein are all covered – did you know Harvey Milk was a big defender of Jim Jones? Or that Feinstein was all but washed out of politics until she inherited the mayor’s office after the assassinations of George Moscone and Harvey Milk? Lots of good stuff in here.

On a personal level I was pleased to read about how the current tech wave in the Bay fits into San Francisco’s history of new arrivals changing the makeup of the city. The sixties brought in the hippies; the eighties brought in the gays; and the 2000s brought in the tech bros. Each arrival triggered a reaction by the existing predominant culture, worried about changing mores and their own group being sidelined, but the forces of change always win in SF.

Say Nothing: A True Story of Murder and Memory in Northern Ireland has received plenty of coverage on “Best of 2019” lists so I’ll speak sparingly here. Another fascinating, recent part of our history that I had a complete blind spot for. I think of the Protestant vs. Catholic blood feud as belonging to ancient Europe, yet this story occurred in my parents’ lifetime. Scary to see how easily these sectarian tensions can be inflamed.

The Killer Angels: one of those great historical novelizations that reads like a novel but tells a true story. Narrates four days of the battle at Gettysburg from the perspective of the generals and captains leading the significant actions. Covers the purported as well as actual reasons given by both sides for fighting the war. There’s real tragedy here: the leaders on both sides know each other, often intimately, and fought as compatriots only ten years earlier. On the Southern side many see the war as lost; on the Northern side there’s little faith in their commander; on both sides they know their actions could lead to the deaths of their friends. Yet everyone sees themselves as buffeted by forces beyond their control and forced into this tragedy. Some brutal descriptions of war.

Ruminations

Looking back over my reading list for 2019 I realize that I spent much of the year looking back; that is, reading more history books (and less fiction) compared to 2018. In part I think this was due to spending plenty of time with the terrific Revolutions podcast; it both fascinated me with European history and shocked me with how little I know about our proximate cultural history. I also ascribe this in part, as with seemingly everything in 2019, to Trump. At a time when previously immovable geopolitical boundaries are shifting and prevailing wisdom is being challenged, I started to wonder how we arrived at this state of affairs and how fluid our world really is. What is the point of NATO? Why are Ukraine’s borders where they are? Why are we allied with Egypt, or Saudi Arabia? Why isn’t Russia more European?

Usually I find these histories comforting. There has been a massive amount of tumult in the history of the world, and the calm that we think exists or existed was mostly illusory. Countries fracture and are formed continuously. Alliances are formed and broken, sometimes at dizzying speed. Compared to the Depression, our economy is fine. Compared to WWII or the Cold War, our world is peaceful. Compared to the sixties or even the eighties, our racial tensions are calm. The world usually feels itself to be at an unprecedented moment of peril, but it rarely is.

Full List

  • Barbarians at the Gate
  • The Great Partition: The Making of India and Pakistan
  • Fascism: A Warning
  • The Sirens of Titan
  • Slaughterhouse Five
  • Brave New World
  • Cat’s Cradle
  • Educated
  • Bad Blood
  • Boom Town
  • Red Notice
  • Lonesome Dove
  • True Grit
  • Dawn of D-Day
  • Boom Town
  • The Sun Does Shine
  • Killers of the Flower Moon
  • American Dream
  • Traitor to his Class
  • Heavy
  • Edge of Anarchy
  • Dune
  • Season of the Witch
  • Charged (unfinished)
  • Two Income Trap
  • Deep Work
  • Rules of Civility
  • Say Nothing
  • The Color of Money
  • Ask Again, Yes
  • Normal People
  • The Call of the Wild
  • Manufacturing Consent
  • The Big Fella (unfinished)
  • The French Revolution: A Very Short Introduction
  • The Killer Angels
  • Russian History: A Very Short Introduction
  • Locking Up our Own
  • Ender’s Shadow
  • Operation Paperclip
  • Ring of Steel
  • Cherry
  • Trafalgar: The Nelson Touch
  • Our Man: George Packer
  • Waterloo: Day of Battle
  • 1066: The Year of Conquest
  • The Odyssey
  • The MVP Machine

Shortest Cycle Algorithm

One of the surprises of starting work as a software engineer after college was how little the coding I did professionally resembled the coding I did academically. I can count on one hand the number of times I’ve explicitly used the knowledge about algorithms and logic that I’d spent four hard years acquiring. Most of these “hard” problems are solved for us by libraries so the closest we usually come is plugging some package in where we need it.

So I was pretty happy when the opportunity recently arose to apply my knowledge of graph algorithms outside of the classroom. At Dimagi we allow users to define the data they want to collect using the XForm specification. Among other things, XForms allow users to define logical dependencies between questions. The value of question A might be a function of the answers to B and C; question D might or might not show up (in the parlance, be relevant) depending on the answer to question E; the answer to question F might be invalid depending on the answer to question G. Because XForms are declarative there is no control flow; all questions are evaluated and re-evaluated until they are in the correct state. This means that cycles in XForms are illegal; they break this evaluation flow.

Until recently our parser was able to detect cycles and so declare the XForm illegal at validation time; however, while the algorithm was very fast at detecting a cycle, it could not output the exact path of the cycle, merely report its existence. Of course this was pretty unhelpful to our users, who would sometimes have forms with hundreds of questions and then have to find a needle in a haystack.

After much grumbling from our field team I decided to tackle this problem at a hackathon. There are plenty of algorithms for accomplishing this, so I chose a simple option found on Stack Overflow. Put simply: perform a depth-first search from every node, keeping a reference to the starting node; if you ever encounter the node you started with, you’ve found a cycle. In our case, nodes would be questions and the directed edges would be the types of references listed above (calculations, relevancy conditions, and validation conditions).
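
Here is a minimal sketch of that approach (my own illustration, simplified from what we actually shipped): nodes are question IDs, the edge map encodes the references between them, and the DFS carries its path along so the full cycle can be reported back to the user.

import java.util.*;

public class CycleFinder {
    // question ID -> IDs of the questions it references (calculate, relevancy, validation)
    private final Map<String, Set<String>> edges;

    public CycleFinder(Map<String, Set<String>> edges) {
        this.edges = edges;
    }

    /** Returns the first cycle found as an ordered path of question IDs, or null if none exists. */
    public List<String> findCycle() {
        for (String start : edges.keySet()) {
            List<String> cycle = dfs(start, start, new ArrayList<>(), new HashSet<>());
            if (cycle != null) {
                return cycle;
            }
        }
        return null;
    }

    private List<String> dfs(String start, String current, List<String> path, Set<String> visited) {
        path.add(current);
        for (String next : edges.getOrDefault(current, Collections.emptySet())) {
            if (next.equals(start)) {
                return new ArrayList<>(path);  // found our way back to the start: the path is the cycle
            }
            if (visited.add(next)) {
                List<String> cycle = dfs(start, next, path, visited);
                if (cycle != null) {
                    return cycle;
                }
            }
        }
        path.remove(path.size() - 1);  // backtrack
        return null;
    }
}

For a graph like A → B → C → A, findCycle() returns the path [A, B, C], which is exactly the needle our users needed pointed out.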

Easy enough! The algorithm was working fine until recently, when we got a couple of reports of the page where people edit their forms hanging indefinitely. Both forms were very large (over a thousand nodes) and, because they were clustered in large top-level groups, they were highly connected. (This is because if questions B and C are both sub-questions of group A, then their relevancy state naturally depends on the relevancy state of group A.) We narrowed in on the validate_form endpoint being the culprit. When I didn’t find any errors in Sentry, it became clear that the problem was one of efficiency, not incorrectness.

After replicating the bug locally, setting some break points, and observing the logic flow, I quickly homed in on the problem: repeated work. Because questions could be nested in five or ten layers of groups, the connectedness of the graph was massive. Further, if you had ten questions in a sub-group that itself was nested in four or five sub-groups, then those ten questions would be calculating almost identical dependency graphs. We were repeatedly walking the same walk.

The solution, as always, was caching. When we completed a new depth-first search from a node, we would save the set of nodes reached from that node – the nodes in that node’s sub-graph. Then, before starting a DFS on a new node, we would check whether we’d already cached the reachable nodes for it. If so, and if that set did not contain the node we were currently searching for cycles from, then we knew a cycle was not possible through this sub-search and we could short-circuit.
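
Sketched as an extension of the CycleFinder illustration above (again, the idea rather than our exact production code):

    // Cache of node -> every node reachable from it, shared across all searches.
    private final Map<String, Set<String>> reachableCache = new HashMap<>();

    /** True if 'target' is reachable from 'from', reusing previously cached walks. */
    private boolean canReach(String from, String target) {
        Set<String> reached = reachableCache.get(from);
        if (reached == null) {
            reached = new HashSet<>();
            collectReachable(from, reached);
            reachableCache.put(from, reached);  // save this node's entire sub-graph
        }
        return reached.contains(target);
    }

    private void collectReachable(String node, Set<String> reached) {
        for (String next : edges.getOrDefault(node, Collections.emptySet())) {
            if (reached.add(next)) {
                collectReachable(next, reached);
            }
        }
    }

Before descending into a neighbor whose reachable set is already cached, the DFS can call canReach(neighbor, start); if it returns false, no cycle through the start node can pass that way and the whole sub-search is skipped.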

After deploying this code the issue was immediately resolved. You can see the code here; currently it’s quite coupled to our XForm implementation, but the algorithm itself operates only on Strings so it would be straightforward to repurpose.

One thought I had while implementing this was about the different emphasis placed on caching academically versus professionally. Specifically, I remember caching being mentioned mostly as an aside in college and discussed thoroughly only in my Operating Systems class, where caching was essential to reasonable performance. For the most part the algorithms classes I took were interested in the correctness and complexity of the algorithm; by this standard, the original algorithm I’d written was completely correct. However, in the professional world that algorithm was not even close to correct; its slow performance caused a breaking error in our product. So often good caching is essential to our products working at all, let alone efficiently.

There’s another conversation I’ve been having recently about the divergence of academic and vocational coding that I want to write about soon, but that’s for another day!

Bad Blood

I recently read Bad Blood (brilliant title) by John Carreyrou on the flight from Boston to San Francisco; in fact, I couldn’t put it down, even when I should have been prepping for interviews. The book is a piece of investigative journalism that reads like a thriller, mostly because you desperately want to see the bad guys finally get taken down. Yet, like Walter White, they keep getting away with it. The book can’t report their downfall because, fascinatingly, the book itself catalyzed the fall of Elizabeth Holmes and ‘Sunny’ Balwani. In a gonzo twist the author becomes a player in the story he’s reporting, threatened by the same Theranos enforcers that silenced internal whistleblowers and repelled investigators. Determined investigative journalism was required to expose the big lie that federal regulators turned a blind eye to and, sometimes, were complicit in.

In this way and others Bad Blood reminds me of The Best and the Brightest. In both Holmes and Lyndon Johnson you had charismatic but vindictive leaders that purged dissenters and promoted sycophants. In both cases you had the big lie – that Theranos technology worked, that the Vietnam War was necessary and we were winning – that could not be questioned for risk of retaliation. All ‘data’ was then fabricated to align with this outcome. This information was not independently verifiable because it was a trade secret or a national security risk. Once this ‘data’ generated by motivated reasoning existed it became further unassailable proof of the rightness of the original claim. Dissenters were not just ostracized but prosecuted – for violating their NDAs at Theranos, for treason at the White House – and thus silenced.

Theranos also featured a powerful cast of government figures on its board of directors – James Mattis, Henry Kissinger, Joe Biden, and most of all George Shultz – that provided cover in the federal bureaucracy and national media. Notably, they are all older men described as personally enthralled with the young, charismatic Holmes. From the Wall Street Journal:

“The brilliant young Stanford dropout behind the breakthrough invention was anointed ‘the next Steve Jobs or Bill Gates’ by no less than former secretary of state George Shultz”

Empire of Illusion comments that fame has become credential enough for one to be an authority on any issue. I can’t think of any other justification for having a Cold War-era Secretary of State weigh in on the merits of a medical device. We saw the same with Obama and Solyndra; because these figures are brilliant in one domain we assume they are expert in another. Medical VCs quickly homed in on Holmes’ inability to back up her claims – and her hostility when pressed to do so – and declined to invest. The legitimacy she couldn’t obtain from the experts she obtained from the famous and powerful, to whom we (the public) readily assigned the same expertise as the experts. There might be a short-selling strategy here: look for the companies boosted by famous people who have no idea what they’re talking about.

Holmes found a ready audience for this publicity blitz in a media that desperately wanted a female STEM success story.

“As much as she courted the attention, Elizabeth’s sudden fame wasn’t entirely her doing. Her emergence tapped into the public’s hunger to see a female entrepreneur break through in a technology world dominated by men. Women like Yahoo’s Marissa Mayer and Facebook’s Sheryl Sandberg had achieved a measure of renown in Silicon Valley, but they hadn’t created their own companies from scratch. In Elizabeth Holmes, the Valley had its first female billionaire tech founder.”

I suppose this is the other side of the coin whereby VC firms expect the next Mark Zuckerberg to look like Mark Zuckerberg. Parmenides: “The good and the true are not necessarily the same.”

One last idea I wanted to mention was how many harmful effects Non-Disclosure Agreements (NDAs) seem to have in our society today. The #MeToo movement brought to light countless monsters who protected themselves for years with NDAs, and no doubt there are thousands more. NDAs allowed Theranos to antagonize its employees and cover its lies for years. Trump’s NDAs may have safeguarded his election. So far as I can tell these contracts serve only those with secrets to hide who possess the legal power and wealth to enforce them.

A President or a Press Secretary

I finished David Halberstam’s The Best and the Brightest about a month ago and I’ve been trying to find a way to approach writing about it ever since. The book is so packed with fascinating history and insights that it’s hard to choose what to bite off first; worse, each approach ties inexorably into a few others, making disentangling one into a blog post nearly impossible.

I heard a quote on a podcast the other day (I think Vox’s The Weeds, but I can’t remember for certain) that succinctly captured one of the main messages and has been banging around in my head ever since; to paraphrase, “We think of our brain as a president, when really it is a press secretary.” That is: we imagine that we’re impartially weighing and evaluating each piece of evidence we read about the world and incorporating this into our updated worldview, when really our worldview is mostly fixed and the ‘evidence’ we hear is recast or highlighted so as to best fit into this worldview (or outright rejected if no fit can be made).

This isn’t a revolutionary idea and is in many ways just a restatement of confirmation bias; however, the idea and the metaphor itself aptly describe one of the main problems that befell the Kennedy and Johnson administrations.

At a high level: Kennedy and his academic and operationally minded technocracy of men like McGeorge Bundy, Robert McNamara, and Walt Rostow believed in the overwhelming power of their own open-minded brilliance and rationality to resolve any political matter; however, this self-same belief made them blind to their own biases and preconceptions and eventually outright hostile to any questioning of their essential rightness. The propaganda machinery supporting this self-deception eventually stretched from the reporting structure at the lowest levels of the military on the ground in Vietnam to the highest-ranking members of the presidential cabinet.

The mechanisms enforcing this conformity are myriad and take up many of the pages of this (massive) book. However, three (that magic number for enumeration) stick with me now: bureaucratic enforcement, patriotic pressure, and misinformation.

Bureaucratic pressure is one of the most obvious and pernicious mechanisms that was available to the administration. Obvious because this is a bureaucracy; pernicious because the pressure could be applied without administrators even realizing they had done so. At its meanest, men who disagreed with the presidential consensus on Vietnam and East Asia generally would be fired or removed to remote, insignificant posts in order to silence their criticism. More common, I think, would be the subtle pressure on military underlings to provide rosy assessments of the war to their superiors all the way up the chain, or likewise the pressure on State Department officials to sit on information that upset the administration’s worldview. Promotions for those with ‘good’ information; stalled careers for those with true information. In this way the administrators, intentionally and not, outright corrupted the data that was so essential to the proper function of their hyper-rationalist evaluations.

Before reading this book I had underestimated how much the shadow of McCarthyism hung over Washington for years after his death. Not only communists but anyone to the left of the prevailing hard-line anti-Communist demonology of the day was purged from political life and sometimes faced legal problems. Through the Johnson administration, Democrats in particular remained fearful of seeming ‘soft’ on Communism, even when it came to supporting liberal domestic causes. The way this worked to silence critics of the hard line on Vietnam cannot be overstated. Journalists were afraid to question the statements and assumptions of the administration; potential dissenters in Congress feared not “supporting the boys”; outsiders within State and the military had long been purged. Disagreeing that the United States should support the cruel and corrupt Diem regime in Vietnam, pointing out the danger and futility of saturation bombing in North Vietnam, questioning whether the “Communist Monolith” was real or imagined – these were not opinions meriting discussion; they were treason. This led to a nationwide purging of dissenting voices and created the illusion of consensus in the administration.

Finally, the Johnson (and to a lesser extent Kennedy) administration used the information and dissemination powers of the Executive to mislead the public, the press, and Congress. The executive understated the troop requirements given by the generals, painted an inanely optimistic picture of the war (which would always be “over by Christmas”), lied about funding requirements, and leaked when it benefited them but otherwise completely hid the internal deliberations and decisions of the administration. These machinations were part and parcel of the leaders’ belief in their own intellectual superiority and rationality; they held the public and Congress in such disdain that misleading them seemed necessary and right in order to bring them along on the correct course. Those who questioned and disagreed with the leadership were belittled and treated with hostility. Halberstam summarizes this beautifully:

Nor had they, leaders of a democracy, bothered to involve the people of their country in the course they had chosen: they knew the right path and they knew how much could be revealed, step by step along the way. They had manipulated the public, the Congress and the press from the start, told half truths, about why we were going in, how deeply we were going in, how much we were spending, and how long we were in for. When their predictions turned out to be hopelessly inaccurate, and when the public and the Congress, annoyed at being manipulated, soured on the war, then the architects had been aggrieved. They had turned on those very symbols of the democratic society they had once manipulated, criticizing them for their lack of fiber, stamina and lack of belief.

What are those of us who believe in rationality to do, then? The “Go with your Gut” instinct of the second Bush administration brought us Iraq, our modern Vietnam; the passions of the mob brought us Trump. Halberstam argues that humanism and political courage were the necessary but missing partners of Kennedy’s rationalism. The team was too willing to see napalm bombing as numbers on a spreadsheet modeling the balance of the war rather than as the act of human cruelty that it was. And even when Kennedy saw an unfortunate political truth, he lacked the courage to bear the political cost of his conviction – always waiting until the second term.

I’d argue that intellectual humility was also a crucial but missing element of this administration. Too many men who’d made a career of being perpetually correct and meteorically successful had no capacity for considering their own flaws and blind spots, and no tolerance for the less-credentialed who challenged the rightness of their betters. To my thinking Obama, most often compared to Kennedy, fused these traits of humanism, political courage, and humility nicely. Perhaps relatedly, he stands almost alone among his modern peers in not embroiling the United States in some foreign military debacle (though, somehow, Trump has a chance to be the most peace-loving president yet).

More important than prescriptions for society, though, is advice for ourselves. Remain humble, question your assumptions, and attempt to meet new information with the open mind of a true Executive rather than the fixed, debate-poised mind of a Press Secretary.

Cuyahoga National Park

After attending the wedding of a friend from Kindergarten (what a thing to say!) in Bedford, Pennsylvania this past weekend, my girlfriend and I took the opportunity to visit the youngest child of the National Parks system, Cuyahoga Valley in Ohio. The park is about two hours west of Pittsburgh Airport, and we stopped at Fallingwater on the way.

We weren’t sure what to expect from the Park; the imagery we found online lacked the kind of single defining feature that characterizes most parks (such as Half Dome in Yosemite or the arches in Arches), instead featuring various meadows, rivers, waterfalls, and train tracks. The map of the park hints at its unique layout, situated directly between Akron to the south and Cleveland to the north and with highways bisecting the park horizontally. Indeed we never felt too far from the road at any point in the park; as in many of the parks of the East, you won’t find the extreme solitude here that you will in the larger, remoter parks of the West.

In fact I must admit that we’d added this visit to our itinerary mostly for the sake of crossing this park off our list while we were in the neighborhood and had rather low expectations for the park itself.

These expectations were exceeded on our first hike around Ritchie Ledges, where we saw some wonderful rock formations (pictured) and beautiful meadows, and had a moderately strenuous four-mile walk. The rain alternated with sunshine throughout, the forest smelled rich, and creeks and rivers sounded continually along with the birds. The overlook wasn’t much of one, even for a cloudy day; the feeling was one of closeness with nature rather than the remote hugeness of the views in some other parks. We had the trail almost to ourselves and the ranger even encouraged us to go off trail to explore the crannies of the ledges; a national park first for us both.

[Photo: rock formations at the Ritchie Ledges]

Afterwards we hiked from the Stanford House to Brandywine Falls, the latter being the signature attraction of the park if there is one. Again this hike exceeded all expectations; there was enough elevation gain to work up a decent sweat, the trail frequently passes cool creeks, and the waterfall could hold its own against many of the others we’ve seen (though not those of proud Yosemite). We were both reminded of some of our hikes in Olympic National Park, where we had rainy weather, rich forests, and interesting water features (though Cuyahoga has neither the beaches nor the mountains). Brandywine Falls is accessible by car and as such was one of the few places on our hikes that was almost busy.

[Photo: Brandywine Falls]

Before leaving the following afternoon we did two more hikes: one around Kendall Lake and the Salt Run Trail, then another around Buttermilk Falls. I found myself thinking of home during these hikes; I found our first walk uninteresting besides the portions around the lake (pictured at the top), and Buttermilk was simply a lesser Brandywine, lacking the nice lead-up hike. At this point I became convinced that our 48 hours was sufficient for the park, at least for one visit.

 

The most interesting artifact of these hikes was a strangely placed tunnel along the lake loop that appeared to have been built to no purpose, sitting under an easily surmountable hill. A placard on the other side explained that the tunnel was built to allow guests to walk under a toboggan run that featured prominently at the lake in the area’s heyday of the 1930s and ’40s. Strange to think that an area now so desolate was so recently a hub of activity – how quickly a place of activity became a place of history!

Finally I have to mention the delightful AirBnB where we stayed during our time in Cuyahoga. Besides being a beautiful, peaceful property with a cozy apartment, a usable bedside fireplace, and a lovely patio, it came with a wonderful host, Wade. He was knowledgeable about restaurants and hikes, unfailingly sweet and thoughtful, and took us on a tour of his incredible antiques collection, including the Messerschmitt Kabinenroller pictured below. If you visit I can’t recommend this AirBnB enough; and please say “hi” to Wade for me.

[Photo: Wade’s Messerschmitt Kabinenroller]

One last note: the food at The Lounge at PIT is exceptional!

Cheers,

Will

Spring Cleaning with Android StrictMode

While monitoring Fabric after our most recent release we noticed a new non-fatal crash:

Non-fatal Exception: android.database.CursorWindowAllocationException: 
Cursor window allocation of 2048 kb failed. 
# Open Cursors=937 (# cursors opened by pid 4459=936) 
(# cursors opened by this proc=1)

being thrown when CommCare attempted to save a form. Some Googling confirmed this to have an out-of-memory condition at its root, and indeed the exception reports 936(!) open Cursors. After manually looking through our code I couldn’t find anywhere that we were obviously failing to clean up our Cursors. Further Googling turned up this StackOverflow response with an extremely helpful code snippet:

if (BuildConfig.DEBUG) {     
     StrictMode.setVmPolicy(new StrictMode.VmPolicy.Builder()
     .detectLeakedSqlLiteObjects()
     .detectLeakedClosableObjects()
     .penaltyLog()
     .penaltyDeath()
     .build());
}

This snippet (pasted into the onCreate of your main Activity or Application) enables VM-level analysis that will crash the application whenever one of the specified leaks occurs (for example, when we’ve lost the handle to a Cursor or InputStream without closing it). You could call this the fail-very-fast principle, and this is certainly not something you’d want on in production.

Booting up CommCare, I ran into four crashes before even reaching our launch activity. For the most part we were failing to close InputStream objects, and using a try-with-resources statement fixed the problem.
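
The fix looks something like this (the class, method, and asset name here are hypothetical, just to show the pattern):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import android.content.Context;

public class FormReader {
    // Both streams are declared in the try header, so they are closed automatically even
    // if an exception is thrown, and StrictMode stops flagging them as leaked Closeables.
    public static String readFormDefinition(Context context) throws IOException {
        try (InputStream in = context.getAssets().open("example_form.xml");
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            StringBuilder contents = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                contents.append(line).append('\n');
            }
            return contents.toString();
        }
    }
}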

Interestingly, the only dropped Cursor I turned up did not come from our application but from google_app_measurement_local.db – looks like a Google problem! Further hunting took me to this discussion implicating Firebase as the culprit, and updating our library resolved the issue.

After poking around our application for a while I was satisfied that I’d exposed and fixed every issue. However, after leaving the tablet dormant for a while a background process kicked in and triggered yet another StrictMode crash! We’re now debating whether to leave StrictMode on for all dev builds and looking into logging these events on production builds. I’d be interested to hear others’ best practices on this and I recommend giving this snippet a try for any dev build.

[Photo from the beautiful Sand Dunes National Park]

Temple Renovations

This weekend my parents and I retired to our family home in Falmouth, Massachusetts to begin demolition. The kitchen had grown out of date in both form and function, and with my sister now firmly graduated my parents can shift their attention to renovations.

Our Cape house and community are something of a secular religion to me. The home is the one thing I share with my paternal great-grandparents. Lifelong friendships were formed there, coming of age rites passed, our family grown and bonded. The idea of modifying this chapel felt borderline sacrilegious. How could we mere mortals improve upon so perfect a place? Though the updates were long anticipated, I was anxious and hesitant to begin.

I arrived late Friday night and already the character of the house was changed. My parents had already moved everything not bolted down from the kitchen into the living room, leaving only empty cupboards. My memories of the house are primarily of warmth, coziness, and unquestionable belonging. Now, we remarked, we felt like we were lamming it – strangers hiding in the skeleton of a home.

Desecration began the next morning. At first I moved gingerly, treating the room with well-worn respect. Then I slowly accepted my role as destroyer – using a sledgehammer and crowbar was unexpectedly joyful. Physical labor often becomes meditative. The final give of a cabinet coming off the wall was pure exuberance. However, there was something undeniably treacherous about gleefully taking to pieces the cabinets that had held together your childhood – holding your cereal, hiding your secret liquor. Cupboards that had served so well and so long as to become anachronisms now turned to kindling.

Yet in the destruction I felt a more intimate connection with the home than I’d ever had in twenty-seven years of living in it. To go from acting within the house to acting upon it brought a better sense of knowing than I’d thought possible. Instead of seeing the house as a completed, singular thing I saw the many constituents – the pipe carrying your water, the cables weaving under the stairs, the black steel supporting the second floor. These things that before seemed magic are now understood. My former knowledge of this place, the lodestar of my youth, now felt superficial. I came to understand how the home made such a beloved childhood possible, and to respect our home all the more.

Like archaeologists we also uncovered clues about the past generations – two generations of (to me) unknown Prides. I ripped up linoleum and learned their taste (or lack thereof) in tiling. Tearing down drywall, I saw the fifties’ fashion in wall paneling, or a mount where they’d hung a picture eighty years before. In one portion of the floor the wood was rotted – the result of some previous generation’s cataclysm. Here was proof that these ancestors had lived where we lived and renovated the place they loved, as we do now. And proof that this place has always changed in its constituents yet always remained the same in its sum.

This history and this intimacy would have remained hidden without our undertaking, lovingly, the demolition. And now we leave our own imprint on this sacred yet changing place.

Automated Android testing with Calabash, Jenkins, Pipeline, and Amazon Device Farm: Part 1

Around two years ago we began playing with the Calabash library to see if we could automate some portion of our mobile release QA process. Previously we’d sent an APK file and a spreadsheet of steps to India for every release, then engaged in a weeks-long back and forth as we hashed out bugs with the application and test plan. This process had become costly and burdensome as the size of our development team grew (and our feature set along with it). Moving to a UI-level automated testing suite offered some obvious advantages:

  • We would save the money we’d been paying contractors to run through the tests
  • We would find out immediately when something was broken, rather than all at once during release.
    • This allows the person who caused the bug to handle it rather than whoever is assigned to triage release bugs
  • We could test across a large set of devices and Android OS versions
  • We would be managing test code instead of managing a huge test spreadsheet

Unfortunately the road to this gold standard was a long one, beset on all sides by the tyrannies of poor documentation and rapidly evolving technology. While writing and running Calabash tests was straightforward, this wasn’t even half the battle. In order for tests to be useful we needed them to run and pass consistently, deliver meaningful and accessible reports, and plug into our notification workflows like Slack and GitHub. Oh, and we needed to write the tests.

This November we ran our first release with a significant portion of the test plan (more than 40 man-hours) automated. We have about forty “scenarios” running nightly builds on Amazon Web Services across three to five devices. Failure notifications end up in our Slack channel, and reports are emailed to us and stored on our Jenkins server. This is how we got there.

The Test Suite

Our test repository is a standard Calabash test suite. Most important are the files in the features/ folder. These contain Calabash tests written in Gherkin, a business-readable specification language. For example:


@Fixtures
Feature: Fixtures

  @AWS
  Scenario: Ensure that we error cleanly when missing a fixture
    Then I install the ccz app at "fixtures.ccz"
    Then I login with username "fixtures_fails" and password "123"
    Then I press start
    Then I select module "Fixtures"
    Then I select form "Fixtures Form"
    Then I wait for form to load
    Then Next
    # Should error
    Then I should see "Error Occurred"
    Then I should see "Make sure the"
    Then I should see "lookup table is available"

The biggest unit of a Calabash test is a “Feature” which can contain multiple scenarios. The Android sandbox is shared within a feature so if you add some persistable data in the first scenario it will still be available in the second. In this case we install a CommCare application, login with a dummy user, and then run through some menus and a form. Eventually we expect to see a clean failure. We add Tags (prefixed with ‘@’) to our tests as this is the primary unit of control for organizing test runs. We’ll be using those later.

We’ve also included in the repository some test resources that we need, such as CommCare applications, a spreadsheet, and even another Android application that we use for testing our mobile APIs.

The killer feature of Calabash is that this (weird-sounding) plain English becomes a runnable test. “Will, this is amazing, how is this possible?” Well, Calabash gives us a number of so-called “canned steps” for free. These canned steps are enough to manipulate an Android device in just about any way possible. You can touch a TextView based on its String content or resource ID, use the soft keyboard, assert some text exists, etc. We use the “Then I should see [x]” step above. Each step is backed by Ruby code that drives the Calabash test server running alongside your app on the device.

For more control and cleaner code we specify our own steps as well. For example, each of our tests must begin with the installation of some CommCare app. So we defined a step to install an app from a local resource:

# perform offline install using ccz file pushed from repository
Then (/^I install the ccz app at "([^\"]*)"$/) do |path|
  press_menu_button()
  tap_when_element_exists("* {text CONTAINS[c] 'Offline install'}")
  push("features/resource_files/ccz_apps/%s" % path, "/sdcard/%s" % path)
  step("I enter \"storage/emulated/0/%s\" into input field number 1" % path)
  hide_soft_keyboard()

  # get around bug where the install button is disabled after entering text
  perform_action('set_activity_orientation', 'landscape')
  perform_action('set_activity_orientation', 'portrait')
  sleep 1

  tap_when_element_exists("* {text CONTAINS[c] 'Install App'}")
  wait_for_element_exists("* id:'edit_password'", timeout: 6000)
end

This ought to give a better idea of what Calabash actually runs on the application. Here we navigate to the install screen, push our test app from the test resources into the application’s storage space, start the install, and then wait for the login screen to appear. At this level we’re writing Ruby rather than Gherkin, so we have much less readable but more precise code. Ideally our developers would be able to define enough robust steps that a technical non-developer could write their own tests. And indeed this has happened, with a member of our QA team writing out a large portion of our test plan.

The commented portion in the middle hints at a persistent weakness of Calabash that we’ve discovered; namely, that we frequently run into UI quirks such as views becoming “invisible” due to screen rotation or keyboard focus. Most frustratingly, these issues are often not consistent, so a test will pass a few times before failing when a device happens to run more slowly than before. We’ve always been able to work around these issues, and usually we can hide the fix in a step, so this isn’t show-stopping. However, these bugs do slow development time.

In Part 2 I’ll go into how these tests are packaged and run on Jenkins.