
An easy way I try to explain this to people:

You can't compress information until you have it in a format that is appropriate for compression.

That is:

You can't compress (apply/create algorithms) information (data) until you have it (instrumented data collection) in a format (schema) that is appropriate for efficient compression (structured logging/cleaning).
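To make the "format (schema)" and "structured logging" parts concrete, here's a minimal sketch in Python. The event type and its field names (`PageViewEvent`, `user_id`, `page`, `timestamp_ms`) are made up for illustration; the point is only that every logged record follows one declared schema, so downstream code can parse it without guessing.

```python
import json
from dataclasses import dataclass, asdict, fields


@dataclass(frozen=True)
class PageViewEvent:
    # Hypothetical schema: these field names are illustrative assumptions.
    user_id: str
    page: str
    timestamp_ms: int


def to_log_line(event: PageViewEvent) -> str:
    """Serialize an event as one JSON line with a stable key order."""
    return json.dumps(asdict(event), sort_keys=True)


def parse_log_line(line: str) -> PageViewEvent:
    """Parse a line back into the schema; reject missing or extra fields."""
    record = json.loads(line)
    expected = {f.name for f in fields(PageViewEvent)}
    if set(record) != expected:
        raise ValueError(f"schema mismatch: {sorted(set(record) ^ expected)}")
    return PageViewEvent(**record)
```

A line that fails `parse_log_line` is caught at ingestion rather than surfacing later as a silent gap in someone's model, which is roughly the work the comment is pointing at.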

99% of that is data engineering: building good engineering practices that treat good data practices as a priority.

For any organization with more than a handful of employees and more than one product, that is a non-trivial task, and it gets harder as the organization grows.



Totally agree. Non-tech companies that think they need "data science" should instead put the same effort into (data) engineering.

It's not quite 99% of the effort but close enough ;)

Search "data science hierarchy of needs"



