Posts

Viterbi Algorithm, Part 1: Likelihood

The Viterbi algorithm finds the most likely sequence of states given a sequence of observations emitted by those states, along with the transition and emission probabilities. It has applications in Natural Language Processing (for example, part-of-speech tagging), in decoding error-correcting codes, and more!
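
For a sense of the mechanics, here is a minimal Viterbi sketch in Python. It is not code from the post; the toy states, observations, and probability tables are invented purely for illustration.

    import numpy as np

    def viterbi(obs, start_p, trans_p, emit_p):
        """Most likely state sequence for a list of observation indices.

        start_p : (S,) initial state probabilities
        trans_p : (S, S) transition probabilities, trans_p[i, j] = P(state j | state i)
        emit_p  : (S, O) emission probabilities, emit_p[i, k] = P(observation k | state i)
        """
        S, T = len(start_p), len(obs)
        # Work in log space to avoid underflow on long sequences.
        log_delta = np.log(start_p) + np.log(emit_p[:, obs[0]])
        backptr = np.zeros((T, S), dtype=int)

        for t in range(1, T):
            scores = log_delta[:, None] + np.log(trans_p)  # (previous state, current state)
            backptr[t] = scores.argmax(axis=0)
            log_delta = scores.max(axis=0) + np.log(emit_p[:, obs[t]])

        # Trace the best path backwards from the best final state.
        path = [int(log_delta.argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(backptr[t, path[-1]]))
        return path[::-1]

    # Toy example: 2 hidden states, 3 possible observation symbols.
    start = np.array([0.6, 0.4])
    trans = np.array([[0.7, 0.3],
                      [0.4, 0.6]])
    emit = np.array([[0.5, 0.4, 0.1],
                     [0.1, 0.3, 0.6]])
    print(viterbi([0, 1, 2], start, trans, emit))  # -> [0, 0, 1]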

Minimum Edit Distance

Minimum Edit Distance is defined as the minimum number of edits (delete, insert, replace) needed to transform a source string into a target string. The algorithm uses dynamic programming both to calculate the minimum edit distance and to identify a corresponding sequence of edits.
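
To make the dynamic-programming idea concrete, here is a minimal Python sketch with unit costs for all three operations; the function name and example strings are mine, not the post’s, and it computes only the distance (recovering the edit sequence requires keeping backpointers).

    def min_edit_distance(source, target):
        """Minimum number of deletes, inserts, and replaces to turn source into target."""
        m, n = len(source), len(target)
        # dist[i][j] = minimum edits to transform source[:i] into target[:j]
        dist = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            dist[i][0] = i  # delete everything in source[:i]
        for j in range(1, n + 1):
            dist[0][j] = j  # insert everything in target[:j]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                replace_cost = 0 if source[i - 1] == target[j - 1] else 1
                dist[i][j] = min(dist[i - 1][j] + 1,                 # delete
                                 dist[i][j - 1] + 1,                 # insert
                                 dist[i - 1][j - 1] + replace_cost)  # replace (or keep)
        return dist[m][n]

    print(min_edit_distance("intention", "execution"))  # 5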

Getting Things Done: Projects List and Next Actions

Lately I’ve been practicing David Allen’s “Getting Things Done” framework, which consists of components for getting tasks out of your head and into a system to improve productivity and reduce stress. I wrote about the overall system here. In this post, I want to talk about my Projects list and my Next Actions agenda.

Spinning up PostgreSQL in Docker for Easy Analysis

My typical analysis workflow starts with data in some kind of database, perhaps Redshift or Snowflake. Often I’m working with millions or even billions of rows, but modern databases excel at operating on data at that scale. Moreover, SQL is an intuitive and powerful tool for combining, filtering, and aggregating data. I’ll often do as much of the work as I can in SQL, aggregate the data down to a manageable size, and then export it as a CSV to continue with more advanced statistical calculations in Python.
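
As a sketch of that hand-off, here is roughly what the last step looks like in Python; the connection string, table, and column names are placeholders rather than anything from the post.

    import pandas as pd
    import sqlalchemy

    # Placeholder credentials for a local PostgreSQL container.
    engine = sqlalchemy.create_engine(
        "postgresql://analyst:secret@localhost:5432/analysis"
    )

    # Let the database do the combining, filtering, and aggregating...
    query = """
        SELECT user_id,
               COUNT(*)     AS n_events,
               SUM(revenue) AS total_revenue
        FROM events
        GROUP BY user_id
    """

    # ...so only the small aggregated result lands in memory for further analysis.
    df = pd.read_sql(query, engine)
    print(df.describe())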

Timekeeping with Emacs and Org-Mode

Although I have been an Emacs user for 15 years, for the first 13 of those years I only used a handful of commands and one or two “modes”. A couple of years ago I went through the Emacs tutorial (within Emacs, type C-h t) to see if I was missing anything useful. I was not disappointed! Since then, I have gone through the entire Emacs manual, made full use of Elpy to create a rich Python IDE, adopted Magit to speed up my version control workflow, and more!

A/B Testing Best Practices

When I started this blog, my primary objective was less about teaching others A/B testing and more about clarifying my own thoughts on it. I had been running A/B tests for about a year, and I was starting to feel uncomfortable with some of the standard methodologies. It’s pretty common, for example, to use Student’s t-test to analyze A/B tests. One of the assumptions underlying that test is that the distributions are Gaussian. “What about A/B testing is Gaussian?”, I wondered. I knew there was a big difference between one-sided and two-sided tests, but I didn’t feel confident in my ability to choose the right one. And the multiple comparisons problem seemed to rear its ugly head at every turn: what was the best way to handle it?
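
For reference, the “standard methodology” in question looks roughly like the Python snippet below; the simulated metric values are made up purely to show the mechanics, not data from any real test.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    control = rng.normal(loc=10.0, scale=2.0, size=1000)    # group A metric
    treatment = rng.normal(loc=10.2, scale=2.0, size=1000)  # group B metric

    # Two-sided by default; choosing one-sided vs. two-sided is exactly the
    # kind of decision discussed in the post.
    t_stat, p_value = stats.ttest_ind(treatment, control)
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")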

Getting Things Done

Getting Things Done, or GTD, is a productivity framework introduced by David Allen. Since his book was first published in 2001, the paradigm has achieved something of a cult following, especially among Emacs users. In this post I will describe my very-much-in-progress implementation of the system.

Object Detection with Deep Learning

One of the most interesting topics in the Coursera Deep Learning specialization is the “YOLO” algorithm for object detection. I often find it helpful to describe algorithms in my own words to solidify my understanding, and that is precisely what I will do here. Readers will likely prefer the original paper and its sequel.

Thoughts on the Coursera Deep Learning Specialization

I recently completed the Deep Learning specialization on Coursera from deeplearning.ai. Its five courses cover generic neural networks, regularization, convolutional neural nets, and recurrent neural nets.

Having completed it, I would say the specialization is a great overview and a jumping-off point for learning more about particular techniques. I wouldn’t say I have an in-depth understanding of all the material, but I do feel like I could go off and read papers and understand them, which is perhaps all I could expect.

Distribution of Local Minima in Deep Neural Networks

The “unreasonable effectiveness of deep learning” has been much discussed. The puzzle is that the cost function of a deep network is non-convex, so any optimization procedure will in general find a local, non-global minimum.

In fact, algorithms like gradient descent typically terminate (perhaps because of early stopping) before even reaching a local minimum. To many experts in optimization, this seems like a bad thing: one would expect networks trained this way to perform much worse than other optimization-based systems, such as logistic regression, where we actually can find the global minimum.
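
As a toy illustration of that point (a stand-in quadratic “loss”, not a real network), gradient descent with early stopping can halt well before the training minimum:

    import numpy as np

    def train_grad(w):
        return 2 * (w - 3.0)       # gradient of the training loss (w - 3)^2

    def val_loss(w):
        return (w - 2.5) ** 2      # validation loss prefers a different w

    w, lr = 0.0, 0.1
    best_val, patience, bad_steps = np.inf, 5, 0
    for step in range(1000):
        w -= lr * train_grad(w)
        v = val_loss(w)
        if v < best_val:
            best_val, bad_steps = v, 0
        else:
            bad_steps += 1
            if bad_steps >= patience:
                break              # stop before reaching the training minimum w = 3
    print(step, round(w, 3))       # halts with w short of 3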