The Viterbi algorithm finds the most likely sequence of hidden states given a sequence of observations emitted by those states, together with the transition and emission probabilities. It has applications in Natural Language Processing like part-of-speech tagging, in error correction codes, and more!
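The dynamic program can be written in a few lines. Here is a minimal sketch on a toy health/symptom HMM; the states, observations, and all probabilities are invented for illustration.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations `obs`."""
    # V[t][s] = (prob of the best path ending in state s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = (prob, prev)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

states = ("Healthy", "Sick")
start_p = {"Healthy": 0.6, "Sick": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Sick": 0.3},
           "Sick": {"Healthy": 0.4, "Sick": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Sick": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path = viterbi(("normal", "cold", "dizzy"), states, start_p, trans_p, emit_p)
# → ["Healthy", "Healthy", "Sick"]
```

The key idea is that the best path to state `s` at time `t` must pass through the best path to *some* state at time `t - 1`, so we only keep one candidate per state per step instead of enumerating all state sequences.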
Lately I’ve been practicing David Allen’s “Getting Things Done” framework, which consists of components for getting tasks out of your head and into a system to improve productivity and reduce stress. I wrote about the overall system here. In this post, I want to talk about my Projects list and my Next Actions agenda.
My typical analysis workflow is to start with data in some kind of database, perhaps Redshift or Snowflake. Often I’m working with millions or even billions of rows, but modern databases excel at operating with data at scale. Moreover, SQL is an intuitive and powerful tool for combining, filtering, and aggregating data. I’ll often do as much as I can in SQL, aggregate the data as much as I can, then export the data as a CSV to continue more advanced statistical calculations in Python.
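The Python half of that workflow can be sketched in a few lines. The column names and numbers below are hypothetical stand-ins for a warehouse export, and I use only the standard library here; in practice the CSV would come from a SQL query that has already done the joins, filters, and most of the aggregation.

```python
import csv
import io
from collections import Counter

# Stand-in for a CSV exported from the warehouse, e.g. the result of
# something like: SELECT country, COUNT(*) AS signups FROM users GROUP BY 1;
export = io.StringIO("country,signups\nUS,120\nUS,80\nCA,50\n")

# Final aggregation happens in Python on the (already much smaller) export.
totals = Counter()
for row in csv.DictReader(export):
    totals[row["country"]] += int(row["signups"])
# totals → Counter({"US": 200, "CA": 50})
```

The point is the division of labor: the database reduces billions of rows to something that fits comfortably in memory, and Python takes over for the statistics the database can’t easily express.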
Although I have been an Emacs user for 15 years, for the first 13 of those years I only used a handful of commands and one or two “modes”. A couple of years ago I went through the Emacs tutorial (within Emacs, type C-h t) to see if I was missing anything useful. I was not disappointed! Since that time, I have gone through the entire Emacs manual, made full use of Elpy to create a rich Python IDE, adopted Magit to speed up my version control workflow, and more!
When I started this blog, my primary objective was less about teaching others A/B testing and more about clarifying my own thoughts on A/B testing. I had been running A/B tests for about a year, and I was starting to feel uncomfortable with some of the standard methodologies. It’s pretty common, for example, to use Student’s t-test to analyze A/B tests. One of the assumptions underlying that test is that the distributions are Gaussian. “What about A/B testing is Gaussian?”, I wondered. I knew there was a big difference between one-sided and two-sided tests, but I didn’t feel confident in my ability to choose the right one. And the multiple comparisons problem seemed to rear its ugly head at every turn: what was the best way to handle this?
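Writing the test statistic out by hand makes the assumptions concrete. This is a sketch of the pooled-variance (equal-variance) two-sample Student's t statistic using only the standard library; the conversion data is invented. Note that per-user conversions are 0/1 Bernoulli outcomes, which is exactly why the Gaussian assumption deserves scrutiny.

```python
import math
import statistics


def t_statistic(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    # Pooled variance assumes both groups share a common variance.
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical per-user conversion outcomes for two variants (1 = converted).
variant_a = [1, 0, 1, 1, 0, 1, 0, 1]
variant_b = [0, 0, 1, 0, 1, 0, 0, 1]
t = t_statistic(variant_a, variant_b)
```

The statistic itself is easy to compute; the hard questions the post alludes to (is a normal approximation appropriate for Bernoulli data? one-sided or two-sided? how to correct for multiple comparisons?) live in how you interpret it.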
One of the most interesting topics in the Coursera Deep Learning specialization is the “YOLO” algorithm for object detection. I often find it helpful to describe algorithms in my own words to solidify my understanding, and that is precisely what I will do here. Readers will likely prefer the original paper and its sequel.
I recently completed the Deep Learning specialization on Coursera from deeplearning.ai. Over five courses, the specialization covers generic neural networks, regularization, convolutional neural nets, and recurrent neural nets.
Having completed it, I would say the specialization is a great overview, and a jumping-off point for learning more about particular techniques. I wouldn’t say I have an in-depth understanding of all the material, but I do feel like I could go off and read papers and understand them, which is maybe all I could expect.
The “unreasonable effectiveness of deep learning” has been much discussed. The puzzle is this: because the cost function is non-convex, any optimization procedure will in general find a local, non-global, minimum.
In practice, algorithms like gradient descent will terminate (perhaps because of early stopping) before even reaching a local minimum. To many experts in optimization, this seems like a bad thing. Concretely, it suggests that networks trained this way should perform much worse than other optimization-based systems where we are in fact able to find the global minimum, such as logistic regression.
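The phenomenon is easy to reproduce in one dimension. Below is a toy sketch (the function and learning rate are invented for illustration): f(x) = x⁴ − 2x² + 0.5x has two basins, with its global minimum near x ≈ −1.06 and a shallower local minimum near x ≈ 0.93. Plain gradient descent started on the right converges to the local minimum and never sees the global one.

```python
def f(x):
    # A non-convex function with two basins.
    return x**4 - 2 * x**2 + 0.5 * x


def grad(x):
    # Derivative of f.
    return 4 * x**3 - 4 * x + 0.5


x = 2.0  # start in the right-hand basin
for _ in range(1000):
    x -= 0.01 * grad(x)

# x settles near the local minimum around 0.93, while the global
# minimum sits near -1.06 with a strictly lower value of f.
```

Gradient descent only follows the local slope, so which minimum it finds is determined entirely by the initial point; the surprise with deep networks is that the minima (or near-minima) it does find tend to generalize well anyway.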