
A weekly roundup of the essential repositories, research, and conference talks on MLOps, hand-curated and delivered.


Cold showers - I recommend Wim Hof.


If you are in Europe, here is a template for an email you can send in the post-GDPR world - https://sixthvariable.com/?p=6


I was wondering how Featuretools differs from https://github.com/AxeldeRomblay/MLBox and https://github.com/crawles/automl_service, and from the proprietary, newly launched Driverless AI (from H2O).


Featuretools focuses on handling data with relational structure and timestamps. Here's an example to explain those two key points.

Imagine you have a relational database from a retail store with tables for customers, transactions, products, and stores.

Featuretools can make a feature matrix for any entity in the database using an algorithm called Deep Feature Synthesis. We wrote a blog post about it here: https://www.featurelabs.com/blog/deep-feature-synthesis/. Basically, it tries to stack dataset-agnostic "feature primitives" to construct features similar to what human data scientists would create. This means that a data scientist can go from building models about their customers to models about their stores in one line of code.
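Roughly, that workflow looks like the minimal sketch below. The tables, column names, and values are made up for illustration, and the exact call signatures have shifted across Featuretools versions:

    import pandas as pd
    import featuretools as ft

    # Toy stand-ins for the retail tables described above; all names here
    # are illustrative, not a real schema.
    customers_df = pd.DataFrame({
        "customer_id": [1, 2],
        "join_date": pd.to_datetime(["2017-01-01", "2017-02-15"]),
    })
    transactions_df = pd.DataFrame({
        "transaction_id": [10, 11, 12],
        "customer_id": [1, 1, 2],
        "amount": [25.0, 40.0, 10.0],
        "transaction_time": pd.to_datetime(["2017-03-01", "2017-04-01", "2017-03-20"]),
    })

    # Describe the relational structure in an EntitySet.
    es = ft.EntitySet(id="retail")
    es = es.entity_from_dataframe(entity_id="customers", dataframe=customers_df,
                                  index="customer_id", time_index="join_date")
    es = es.entity_from_dataframe(entity_id="transactions", dataframe=transactions_df,
                                  index="transaction_id", time_index="transaction_time")
    es = es.add_relationship(ft.Relationship(es["customers"]["customer_id"],
                                             es["transactions"]["customer_id"]))

    # Deep Feature Synthesis: build a feature matrix for one target entity.
    feature_matrix, feature_defs = ft.dfs(entityset=es, target_entity="customers")

Switching from customer-level to store-level models would then just be a matter of changing target_entity, assuming a stores table were wired into the EntitySet the same way.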

One aspect worth highlighting is that Featuretools can be extended with custom primitives to expand the set of features it can produce. As the repo of primitives grows, everyone in the community benefits because primitives aren't tied to a specific dataset or use case. Some of our demos highlight this functionality to increase scores on the Kaggle leaderboards.
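A sketch of what defining a custom primitive might look like, reusing the EntitySet es from the snippet above. The "range" primitive is hypothetical, and the helper shown follows the older make_agg_primitive style, which may differ in newer releases:

    import featuretools as ft
    from featuretools.primitives import make_agg_primitive
    from featuretools.variable_types import Numeric

    # A hypothetical aggregation primitive: the spread between the largest
    # and smallest value in a numeric column, e.g. MAX(amount) - MIN(amount).
    def amount_range(column):
        return column.max() - column.min()

    Range = make_agg_primitive(function=amount_range,
                               input_types=[Numeric],
                               return_type=Numeric,
                               name="range")

    # The custom primitive can be mixed with built-ins when running DFS.
    feature_matrix, feature_defs = ft.dfs(entityset=es,
                                          target_entity="customers",
                                          agg_primitives=["sum", "mean", Range])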

Featuretools is good at handling time. When performing feature engineering on completely raw data, it is important not to mix up time. When your data is timestamped, you can tell Featuretools to create features at any point in time and it automatically slices the data for you (even across relationships between tables!). You want to avoid situations similar to training a machine learning model on stock market data from 2017, testing that it works on data from 2016, and then deploying it and expecting to make money in 2018. You can read more about how Featuretools handles time here: https://docs.featuretools.com/automated_feature_engineering/...
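As a rough illustration of cutoff times, again reusing the EntitySet es from the earlier sketch (the customer ids and timestamps are invented):

    import pandas as pd
    import featuretools as ft

    # One row per customer, giving the point in time at which features should
    # be calculated; labels would come from after this time.
    cutoff_times = pd.DataFrame({
        "customer_id": [1, 2],
        "time": pd.to_datetime(["2017-03-15", "2017-03-25"]),
    })

    # DFS only uses rows stamped at or before each customer's cutoff time,
    # slicing the transactions table through the relationship automatically.
    feature_matrix, feature_defs = ft.dfs(entityset=es,
                                          target_entity="customers",
                                          cutoff_time=cutoff_times)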


>However, Google Cloud's compute pricing is reasonably similar to DigitalOcean (with sustained usage discounts) and from what I hear these companies will often negotiate discounts.

Given Dropbox's storage use case, what would the percentage savings have been if Dropbox had indeed gone with Google or DigitalOcean?


I might be misremembering, but I'm pretty sure that when they started, AWS was the only remotely reliable game in town.


An implementation of this paper - https://metacpan.org/pod/Bloom::Scalable


And addressing model interpretability?

