Reflections on 2021 and Interests Going Into 2022

Now that 2021 has wrapped up, I’ve been reflecting on the past year and thinking about the next. I was similarly reflective this time last year, when I wrote about how 2020, for me, was the Year of Emacs.

I will first note that in 2021, our second daughter was born, and our family is happy and healthy! This should not be under-appreciated given the current state of the world. I spent quite a bit of time reflecting on the nature of life, work, and family, but frankly I don’t think I have anything especially interesting to say about these topics. I am immensely grateful for my wonderful wife, for our daughters, and for the friends and family helping us get through this pandemic. I am grateful to have a job with such great work/life balance, and to be on a team where I feel so supported personally and professionally.

Now, less importantly, but perhaps more interestingly, I’d like to discuss the technical themes of 2021 and what is top of mind going into 2022.

I’m pleased to report I worked on several interesting convex optimization problems in 2021, I think more so than at any other point in my career. My work with A/B testing, and statistical inference more generally, focuses on Generalized Linear Models. It took me an embarrassingly long time to realize that the confidence region for the model parameters is convex (this is true whenever working with log-concave densities, since the likelihood-based region is then a superlevel set of a concave log-likelihood). What this means is that we can formulate certain associated problems as convex optimization problems, such as computing confidence intervals, assessing opportunity size, robust optimization, and power analysis.
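As a toy illustration of the convexity point (not the models from the work described above), here is a minimal sketch for the simplest GLM: a single binomial rate on the logit scale. The log-likelihood is concave, so the 95% likelihood-ratio region is an interval, and each endpoint can be found by bisection. The function names and the 30-out-of-100 data are made up for the example.

```python
import math

CHI2_95_1DF = 3.841458820694124  # 0.95 quantile of chi-square with 1 degree of freedom


def log_likelihood(theta, successes, trials):
    """Binomial log-likelihood in the logit parameter theta (concave in theta)."""
    # Numerically stable log(1 + exp(theta)).
    softplus = max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))
    return successes * theta - trials * softplus


def likelihood_ratio_interval(successes, trials, tol=1e-10):
    """Return (theta_hat, lower, upper) for the 95% likelihood-ratio region."""
    if not 0 < successes < trials:
        raise ValueError("need 0 < successes < trials for a finite estimate")
    theta_hat = math.log(successes / (trials - successes))
    # The region is {theta : log_likelihood(theta) >= floor}.
    floor = log_likelihood(theta_hat, successes, trials) - CHI2_95_1DF / 2

    def endpoint(direction):
        # Step outward from theta_hat until we leave the region...
        step = 1.0
        while log_likelihood(theta_hat + direction * step, successes, trials) >= floor:
            step *= 2
        inside, outside = theta_hat, theta_hat + direction * step
        # ...then bisect. Concavity guarantees a single crossing on each side.
        while abs(outside - inside) > tol:
            mid = (inside + outside) / 2
            if log_likelihood(mid, successes, trials) >= floor:
                inside = mid
            else:
                outside = mid
        return inside

    return theta_hat, endpoint(-1), endpoint(+1)


def sigmoid(theta):
    """Map a logit back to a probability."""
    return 1 / (1 + math.exp(-theta))


# Hypothetical data: 30 conversions out of 100 trials.
theta_hat, lower, upper = likelihood_ratio_interval(successes=30, trials=100)
print(f"rate: {sigmoid(theta_hat):.3f}, "
      f"95% interval: ({sigmoid(lower):.3f}, {sigmoid(upper):.3f})")
```

With more than one parameter the same idea applies, but the endpoints of an interval for a linear combination of coefficients come from minimizing and maximizing that combination over the convex region, which is a job for a convex solver rather than bisection.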

The second big theme for me was Causal Inference. I’ve been writing about Causal Inference, especially based on controlled experiments, for nearly 4 years(!), but this year I really dove into some of the more advanced techniques, especially those that combine observational and experimental approaches. I applied Instrumental Variables to a few problems where it wasn’t feasible to randomly assign the treatment I was interested in. I read a lot about Mediation Analysis, a generalization of Instrumental Variables that permits the instrument to influence the response directly. I modeled heterogeneous treatment effects to understand how a treatment affects different people differently. I learned a lot about Switchback Tests, which apply or withhold a treatment to the same units at random intervals to increase statistical power.
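To make the Instrumental Variables idea concrete, here is a minimal simulated sketch, assuming a single instrument, a single treatment, and linear effects. All names and numbers are invented for the illustration. An unobserved confounder biases the ordinary regression slope, while the ratio of covariances with the instrument recovers the true effect.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n, true_effect=2.0, seed=0):
    """Simulate (instrument, treatment, outcome) with an unobserved confounder."""
    rng = random.Random(seed)
    instrument, treatment, outcome = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # instrument: shifts the treatment, no direct path to the outcome
        u = rng.gauss(0, 1)  # unobserved confounder: drives both treatment and outcome
        x = z + u + rng.gauss(0, 1)
        y = true_effect * x + 3.0 * u + rng.gauss(0, 1)
        instrument.append(z)
        treatment.append(x)
        outcome.append(y)
    return instrument, treatment, outcome


z, x, y = simulate(20_000)

# Naive regression of outcome on treatment: biased upward by the confounder.
ols_estimate = covariance(x, y) / covariance(x, x)

# IV (Wald) estimator: only uses variation in the treatment induced by the instrument.
iv_estimate = covariance(z, y) / covariance(z, x)

print(f"true effect: 2.0, OLS: {ols_estimate:.2f}, IV: {iv_estimate:.2f}")
```

With one instrument and one treatment this ratio is the same estimate two-stage least squares would give. The sketch builds in the key assumption by construction: the instrument affects the outcome only through the treatment.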

I haven’t written about any of these topics because I haven’t worked with them enough to have my own perspectives, and I certainly see no value in summarizing what others have said better. That said, I read several great books this year that encompass these topics:

  • Causal Inference: The Mixtape by Scott Cunningham
  • Mostly Harmless Econometrics by Angrist and Pischke
  • Econometric Analysis of Cross Section and Panel Data by Jeffrey Wooldridge
  • Mediation Analysis by Tyler VanderWeele

(I actually read a lot this year, especially science fiction. I thought Project Hail Mary by Andy Weir was better than Artemis but not as good as The Martian. I read all seven books of the Foundation series by Isaac Asimov: the prequels are the best, the sequels are not very good, and the TV show is insulting. I’ve also started reading Asimov’s Robot series, which so far I prefer to Foundation. I haven’t read much fiction since high school, but the pandemic changed that. I’d like to think of a better way of keeping track of what I’ve read; I think I probably read at least 30 books this year but I couldn’t list them all.)

What am I interested in, going into 2022?

There’s plenty to follow up on, regarding all the topics mentioned above, especially Mediation Analysis and heterogeneous treatment effect estimation. I’d also like to learn more about mixed effects models. I feel like I’ve been dancing around these the last few years, but I also think the Bayesian Hierarchical approach has advantages. I’d love to feel more confident in modeling correlated observations, for example with cluster-robust standard errors.
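For a sense of why correlated observations matter, here is a minimal simulated sketch of a cluster-robust standard error in the simplest case, the mean of a clustered sample (an intercept-only regression). It uses the basic sandwich form with no small-sample correction, and the cluster counts and variances are made up for the example.

```python
import math
import random


def simulate_clusters(n_clusters=200, cluster_size=20, seed=0):
    """Each cluster shares a random effect, so observations within it are correlated."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        shared = rng.gauss(0, 1)
        clusters.append([shared + rng.gauss(0, 1) for _ in range(cluster_size)])
    return clusters


def naive_and_cluster_se(clusters):
    """Return (mean, naive SE, cluster-robust SE) for the overall sample mean."""
    values = [y for cluster in clusters for y in cluster]
    n = len(values)
    mean = sum(values) / n

    # Naive SE treats all n observations as independent.
    naive_var = sum((y - mean) ** 2 for y in values) / (n - 1) / n

    # Cluster-robust variance: sum residuals within each cluster first, then square,
    # so within-cluster correlation is not averaged away.
    meat = sum(sum(y - mean for y in cluster) ** 2 for cluster in clusters)
    cluster_var = meat / n ** 2

    return mean, math.sqrt(naive_var), math.sqrt(cluster_var)


mean, naive_se, cluster_se = naive_and_cluster_se(simulate_clusters())
print(f"mean: {mean:.3f}, naive SE: {naive_se:.4f}, cluster-robust SE: {cluster_se:.4f}")
```

In this simulation the naive standard error is several times too small, because 4,000 correlated observations carry much less information than 4,000 independent ones. A mixed effects or hierarchical model addresses the same problem by modeling the shared cluster effect explicitly.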

I’ve also found myself moving closer to economics, but I actually don’t know much! So I’d love to fill in some knowledge gaps this year. And here is a word cloud of things I think are interesting but that I don’t know much about: post-selection inference, multivariate methods, probabilistic graphical models, time series methods.

That’s a jumble, I know, but I wanted to write it all down so I can check back in a year and see where I ended up!

Bob Wilson
Data Scientist

The views expressed on this blog are Bob’s alone and do not necessarily reflect the positions of current or previous employers.