📅 February 11, 2021 (Clubhouse Data & AI Group)
🎙️ Bilal Mahmood - Head of Product @ Amplitude (the product analytics SaaS company), Past: CEO ClearBrain (Causal Analytics Platform, acq. by Amplitude), PM Data Science @ Optimizely
🔑 Plaintext his. Italic comments mine.
Evaluate product-market fit without code.
The transformation and data cleaning layer that integrates with the data lake is the hardest technical problem to solve. Input sources and types vary widely, and non-obvious data problems must be removed or canonicalized before processing can run across entire datasets. Hire ML engineers before data scientists, because ETL problems are more pressing than investigating state-of-the-art ML models.
This echoes the classic paper 'Hidden Technical Debt in Machine Learning Systems'. The common trope about ML systems is their vast complexity: the actual model represents only a small fraction of the total infrastructure. The rest is held together with (hopefully not) pipeline sprawl and spaghetti code.
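To make the canonicalization point concrete, here is a minimal Python sketch of mapping differently shaped source records onto one schema. The field aliases, the target fields, and the rule that naive timestamps are treated as UTC are all assumptions made for illustration; nothing here comes from the talk.

```python
from datetime import datetime, timezone

# Hypothetical aliases: each source names the same field differently.
FIELD_ALIASES = {
    "user_id": ("user_id", "userId", "uid"),
    "email": ("email", "Email", "email_address"),
    "timestamp": ("timestamp", "ts", "event_time"),
}


def first_present(record, names):
    """Return the first non-empty value found under any of the given names."""
    for name in names:
        value = record.get(name)
        if value not in (None, ""):
            return value
    return None


def to_utc_iso(value):
    """Accept epoch seconds or an ISO 8601 string and return a UTC ISO string."""
    if isinstance(value, (int, float)):
        dt = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        dt = datetime.fromisoformat(str(value))
        if dt.tzinfo is None:
            # Assumption for this sketch: naive timestamps are already UTC.
            dt = dt.replace(tzinfo=timezone.utc)
        dt = dt.astimezone(timezone.utc)
    return dt.isoformat()


def canonicalize(record):
    """Map one raw source record onto a single canonical shape."""
    user_id = first_present(record, FIELD_ALIASES["user_id"])
    email = first_present(record, FIELD_ALIASES["email"])
    ts = first_present(record, FIELD_ALIASES["timestamp"])
    return {
        "user_id": None if user_id is None else str(user_id).strip(),
        "email": None if email is None else str(email).strip().lower(),
        "timestamp": None if ts is None else to_utc_iso(ts),
    }
```

In practice the alias table is where most of the effort goes, since every new source tends to add another spelling of the same field.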
Data idiosyncrasies cause crashes. ETL pipelines have a long tail of potential inputs. Compute on dataflows is hard (Apache Spark is a sharp, finicky knife). Handling the long tail of varied potential user input is a serious scalability issue for AI systems.
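One common way to keep a single odd record from crashing a whole batch is to set bad rows aside and keep going. The sketch below shows that pattern in plain Python; the JSON-lines input format, the required `user_id` field, and the function names are assumptions for illustration, not something described in the talk.

```python
import json


def parse_event(line):
    """Parse one JSON line and check the fields this sketch relies on."""
    event = json.loads(line)
    if not isinstance(event, dict):
        raise ValueError("event is not a JSON object")
    if "user_id" not in event:
        raise ValueError("missing user_id")
    return event


def process_batch(lines):
    """Return (good_events, rejected) so one bad row cannot fail the batch."""
    good, rejected = [], []
    for line_number, line in enumerate(lines, start=1):
        try:
            good.append(parse_event(line))
        except (ValueError, TypeError) as exc:
            # json.JSONDecodeError is a subclass of ValueError.
            rejected.append({"line": line_number, "raw": line, "error": str(exc)})
    return good, rejected
```

Keeping the rejected rows along with the error message makes the long tail visible, so the most frequent failure modes can be turned into explicit cleaning rules later.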
Identity resolution across multiple input sources is hard. Luckily, data warehouse APIs have standardized, similarly formatted schemas, removing the need for one-off pipelines. APIs help you scale. Avoid data lakes that lack a consistent taxonomy.
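As a rough illustration of why identity resolution is hard, the sketch below groups records that share any identifier, directly or through a chain of other records, using a union-find structure. The identifier keys and record shapes are hypothetical, and exact matching on identifier values is a simplifying assumption; real systems typically also need fuzzy matching and conflict rules.

```python
from collections import defaultdict

# Hypothetical identifier fields; real sources will expose different ones.
IDENTIFIER_KEYS = ("email", "device_id", "crm_id")


def resolve_identities(records):
    """Group record indexes that share any identifier value, transitively."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root to the smaller one.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}  # (key, value) -> index of the first record carrying it
    for index, record in enumerate(records):
        for key in IDENTIFIER_KEYS:
            value = record.get(key)
            if value is None:
                continue
            token = (key, value)
            if token in first_seen:
                union(index, first_seen[token])
            else:
                first_seen[token] = index

    groups = defaultdict(list)
    for index in range(len(records)):
        groups[find(index)].append(index)
    return sorted(groups.values())
```

Note that two records with no identifier in common can still land in the same group when a third record links them, which is exactly the cross-source case that one-off pipelines tend to miss.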
"Transforming data isn't a product, it's a job."
You need the whole end-to-end system. Marketers don't want a data engineering product, they want churn prediction at scale for their digital experiences.
No-code is still too hard for marketers. KISS for business audiences. The persona to work with is Product (the actual users). Marketers don't live in data tools. Changing workflows is hard: having to remember how to use functionality that is not ingrained is a great way to introduce friction. Build for the everyday user.