GitPrime is a great exemplar of a whole class of companies that are waiting to get built: domain-specific business intelligence (BI) tools. And it’s gotten radically easier to build something like GitPrime even in the last year. You can use an automated data pipeline like Fivetran (my company) to centralize all your users’ data in a single multi-tenant Snowflake data warehouse, and then build your analytics on top of that foundation. Finally, you can sell direct access to the curated data via data sharing as an enterprise feature.
It’s a highly repeatable pattern. There should be a GitPrime for every industry.
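To make that last step concrete, here's a minimal sketch of what "sell direct access to the curated data via data sharing" can look like on Snowflake, using the Python connector, once the pipeline has landed everything in one multi-tenant table. All of the names here (ANALYTICS_DB, CUSTOMER_METRICS, tenant_id, the ACME share and account, the service credentials) are hypothetical placeholders, not anything Fivetran or GitPrime actually ships:

    # Hypothetical sketch: expose one tenant's curated slice of a shared,
    # pipeline-populated table to that customer's own Snowflake account.
    import snowflake.connector

    conn = snowflake.connector.connect(
        user="SVC_ANALYTICS", password="<secret>", account="myorg-myaccount")
    cur = conn.cursor()

    # A secure view that filters the multi-tenant table down to one customer.
    cur.execute("""
        CREATE OR REPLACE SECURE VIEW ANALYTICS_DB.PUBLIC.ACME_METRICS AS
        SELECT * FROM ANALYTICS_DB.PUBLIC.CUSTOMER_METRICS
        WHERE tenant_id = 'acme'
    """)

    # Share that curated view directly with the customer's Snowflake account,
    # so they can query the data without any export step.
    cur.execute("CREATE SHARE IF NOT EXISTS ACME_SHARE")
    cur.execute("GRANT USAGE ON DATABASE ANALYTICS_DB TO SHARE ACME_SHARE")
    cur.execute("GRANT USAGE ON SCHEMA ANALYTICS_DB.PUBLIC TO SHARE ACME_SHARE")
    cur.execute("GRANT SELECT ON VIEW ANALYTICS_DB.PUBLIC.ACME_METRICS TO SHARE ACME_SHARE")
    cur.execute("ALTER SHARE ACME_SHARE ADD ACCOUNTS = ACME_CUSTOMER_ACCOUNT")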
Generally agree on your broad point, but not on the architecture. I'm biased, but I would suggest that if you follow GitPrime's lead, you architect with OSS components that you can deploy on-prem, since that was crucial to their success: https://blog.replicated.com/gitprime-enterprise-saas/
Trying to build a vertical analytics offering on top of OSS increases the level of difficulty by 100x. I’m not saying that it’s never the right answer, but I am saying there’s a large category of companies waiting to be built where the primary value is the expertise around how to analyze the data, not the infrastructure. If you’re analyzing data that comes from cloud data sources anyway, there’s really no sense in “deploying”. CIOs ask for it, you say no, you do the security certifications, and eventually they come around.
"If you’re analyzing data that comes from cloud data sources anyway, there’s really no sense in “deploying”."
- This is only true if the security controls that your team, application, and infrastructure have in place match those of the major cloud providers (e.g. Salesforce, Google, AWS, Microsoft). Even then, spreading your data around to 1,000 different SaaS vendors increases the surface area for attack/loss by 1000x.
"Trying to build a vertical analytics offering on top of OSS increases the level of difficulty by 100x"
- 100x is hyperbole. It was significantly harder before OSS was focused on operations, but now there is an HA Helm chart, or even a K8s operator, for most of the popular OSS components. It might still be slightly harder today, but organizations that want to pull insights from THEIR data often value the proprietary nature of that data.
Keep in mind you're replying to the CEO of a YC W13 company that wasn't acquired two months ago for $150M by Google (that was his competitor, Alooma [1]) and wasn't acquired in November for $60M by Talend (that was Stitch Data, another competitor [2]). Your points are absolutely valid, you're just barking up the wrong tree. Both of those were SaaS-based ETL connector companies that were acquired while executing a similar enough strategy, and they both saw nice exits.
If you look at his comment history, you'll notice he mentions Snowflake at every single opportunity. Snowflake raised $450M last October (~$1B valuation), so they have a nice war chest for strategic acquisitions.
I stand by 100x in this context. OSS is way behind the commercial analytical databases. And deploying a horizontally scalable analytical database is a completely different ballgame than, say, Postgres.
I don't have much experience with the exact topic, but anyone throwing around phrases like "100x more difficult" very quickly loses credibility with people.
How do you measure 100x? How come it isn't 90x or 112x?
Does that mean that what one engineer can do with a commercial database would take 100 people to do with OSS in the same time frame?
If you used a phrase like "difficulty on another level" or something descriptive like that instead, I'm sure people would be more interested to hear what you have to say.
This is really cool, I've been thinking about your comment for the past 15 minutes. I can imagine a swath of these BI tools, and also something that could be used in a consumer setting to manage your finances, health/exercise, mood, and more. Sort of like a personalOS.
While highly repeatable, the biggest issue I've seen with this pattern is actually the data pipeline component. Specifically:
- The pricing structures don't tend to be amenable to this operating model. Most of the newer SaaS ETL services like Fivetran and Alooma tend to structure their pricing discrimination/tiering to favor a higher volume of throughput through a low number of individual connectors/sources. At least the last time I was sourcing one of these solutions, trying to follow the above pattern (a high volume of individual connections with potentially low volume of events from each source) was almost impossible to negotiate without a substantial increase in cost for any given volume of events.
- The pricing structures also tend to be unfriendly to unpredictable and unknown event volume, particularly at the pre-launch phase. While that's not a problem if you're aiming towards the enterprise space or have investment capital to burn, it's a non-starter for people looking to bootstrap an idea that may or may not gain any market adoption. Industry- and domain-specific BI tools are inherently niche, and tend to have a market discovery phase as you align with that industry. Committing to minimal annual event volume for your data pipelines before you've even attempted to go to market pushes you towards rolling your own pipelines and away from using any of the tools like Fivetran that would have simplified the go to market process in the first place.
- Data security and legal compliance. It's quite possibly more of an issue with my current environment and client pool, which skews more towards the enterprise. But my current job is infinitely more difficult because SaaS-based ETL tools like Fivetran, Alooma, Funnel.io, Supermetrics, etc. are all absolute non-starters. Between GDPR, the new California privacy laws, and all of the press on data leaks over the last few years, any solution that includes "sending internal data to external companies" or "providing credentials for internal data sources to external parties" triggers instant lawyer and IT security review. And that includes going all the way down the rabbit hole - if we try to use a vendor, then we need to know and audit who they give access to, etc. Which basically means we have to roll our own pipelines and keep them internal or on a client's own cloud account. At least for a lot of interesting use cases for domain-specific BI tools, I could see that same caginess coming into play.
I can’t speak to the others, but Fivetran pricing is based on connectors, not volume, and we have a “powered by Fivetran” price sheet for multitenant scenarios.
The security reviews when there are 3 parties are a pain. They can be overcome; the benefit of getting the data infrastructure automated just has to justify the extra sales effort.
I think that was GP’s point - pricing by the connector means that younger companies with a lot of SaaS services but not a lot of data in each may struggle to find a workable price. (I’m not commenting on whether that’s true, but I think that’s what they meant.)
Not a fan of GitPrime. It's built for micromanagers and measures mostly meaningless metrics.
Encourages frequent small commits with low "churn". Trivial to game if you write a script to split your commits and artificially remove "churn" by making sure you don't touch your changes more than once. Also penalizes you for high "impact" commits. Adding a bunch of docs, renaming packages, or even running auto-format penalizes you in metrics.
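To illustrate how trivial the gaming is, something as dumb as this would do it. A rough sketch of my own (the helper and commit message are made up); only run it on a throwaway branch:

    # Rough sketch of gaming "small frequent commits": redo the last commit as
    # one tiny commit per file. Hypothetical helper; throwaway branches only.
    import subprocess

    def git(*args):
        return subprocess.run(["git", *args], capture_output=True,
                              text=True, check=True).stdout

    files = git("diff", "--name-only", "HEAD~1", "HEAD").splitlines()
    git("reset", "--soft", "HEAD~1")  # drop the commit, keep the changes staged
    git("reset")                      # unstage everything
    for path in files:
        git("add", "--", path)
        git("commit", "-m", f"refactor: touch up {path}")  # one small commit per file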
It's completely blind to the language. Doesn't understand the code one bit. Doesn't tell you anything about code quality or if your "rockstar" only writes so much code because he copy-pasted everything. Not kidding, almost all the code our "top coder", per GitPrime, wrote had to be redone after he got canned. He was copy-pasting trash everywhere.
I'm not against metrics but what it measures largely isn't meaningful. Ex. It's well known that lines of code across different languages are not equivalent.
I thought for sure this was a stock deal (i.e., basically a handshake). I didn't realize Pluralsight had $170 million in cash on hand. I guess that's why they give out such high-quality fidget spinners as swag at PyCon.
There is a ton of filler content on Pluralsight. Half of all the courses are entirely setup, beginner level bullshit. Great content can be found there and everywhere. Poor content too.
I have no inside info, but my guess is that GitPrime's product could help Pluralsight quantify its ROI. E.g. "developers that complete these three courses write 17% more code and have 8% fewer bugs. Can you really afford not to buy Pluralsight licenses for all of your developers??"
Pluralsight has been slowly getting into the analytics space to show management and companies that the training works and land bigger corporate deals.
It started with custom courses, certifications, training journeys, and completion scores, and I believe the intention here is to tie that to project velocity and code productivity.
I tried GitPrime and thought it was extremely well done. If you ignore the obvious use case of management using it for bean-counting, and you work with 10, 25, 50+ developers, it really reveals some enlightening patterns.
It is fairly expensive, however. It starts at around $10k/year, which can make it a tough budgetary sell.
I wonder if they'll price it in with Pluralsight as a package or keep it independent.
Might be interesting/feasible to build a GitPrime competitor and sell it for far less, making it approachable to much smaller businesses or teams. Since the data comes from Git, one assumes the analysis can be done asynchronously in the background, so there's no complex scalability engineering. I might well be wrong though.
It used some pretty interesting strategies around creating Git statistics, such as the concept of separating "new code" vs. "churn". They defined churn as "modifying a line that had been previously modified in the last month" (I forget whether it was a month or some other window).
It would chart whether an engineer made a large number of "churn" diffs vs. new code vs. "legacy/refactoring", which was modifying code that was last touched a long time ago.
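If you're curious how that kind of classification can be computed, here's a rough sketch of the mechanics as I'd guess at them (not GitPrime's actual algorithm; the 30-day window and all names are my own): for each old line a commit rewrites, blame the parent commit and call it "churn" if that line was last touched within the past month, otherwise "legacy" rework.

    # Rough guess at a churn vs. legacy-rework split; not GitPrime's real algorithm.
    # Assumes a local repo and a non-root commit. Purely added lines ("new code")
    # are not counted here.
    import subprocess, datetime

    CHURN_WINDOW = datetime.timedelta(days=30)

    def git(*args):
        return subprocess.run(["git", *args], capture_output=True,
                              text=True, check=True).stdout

    def last_touched(path, line_no, rev):
        # Author time of the commit that last touched this line, per `git blame`.
        for field in git("blame", "--porcelain", "-L", f"{line_no},{line_no}",
                         rev, "--", path).splitlines():
            if field.startswith("author-time "):
                return datetime.datetime.fromtimestamp(int(field.split()[1]))
        return None

    def classify(sha):
        commit_time = datetime.datetime.fromtimestamp(
            int(git("show", "-s", "--format=%at", sha)))
        churn = legacy = 0
        path = None
        for row in git("diff", "--unified=0", f"{sha}^", sha).splitlines():
            if row.startswith("--- a/"):
                path = row[len("--- a/"):]
            elif row.startswith("@@") and path:
                old = row.split()[1].lstrip("-")   # "start,count" of rewritten old lines
                start, _, count = old.partition(",")
                for n in range(int(start), int(start) + int(count or 1)):
                    t = last_touched(path, n, f"{sha}^")
                    if t and commit_time - t < CHURN_WINDOW:
                        churn += 1    # reworking recently written code
                    else:
                        legacy += 1   # reworking old code
        return churn, legacy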
I did a trial in 2018 and liked the product a lot. There were some simple metrics like "does a developer commit code every business day" and "how long do your PRs sit waiting for review". Other more complex things they produced from the git log were statistics about churn (how often you touch the same code), and new code vs. refactoring vs. bug fixing.
Overall, I thought they handled the "you can't make metrics for developers" type objections fairly well. But getting everyone on the dev team engaged and thinking about their metrics seemed like too big a hurdle given the cost per seat of GitPrime.
While the acquisition price is public, would love to learn more about revenue or # of customers. They must have had some serious revenue/traction to be bought for $170M.
"Pluralsight will acquire GitPrime for a total purchase price of $170 million to be paid in cash. This amount represents a multiple of approximately 5 times to 6 times expected 2020 billings."
That puts expected 2020 billings at roughly $28-34 million ($170M ÷ 6 ≈ $28M, $170M ÷ 5 = $34M). During the Q&A portion of the call, Pluralsight's CEO mentioned that "they [GitPrime] only have a few hundred customers".
> While the acquisition price is public, would love to learn more about revenue or # of customers. They must have had some serious revenue/traction to be bought for $170M.
Not exactly. I was at a startup doing only about $15M in annual revenue that got picked up for $350M. A lot of common P/E values in tech are well past 25x for public stocks, which seems (oddly) "normal" these days.
Usually the Earnings part of P/E is 0 in startup acquisitions, so technically the P/E ratio is undefined. 25x revenue isn't unheard of, though. It's not that odd either if you figure out what angle the acquirer is betting on. Often it involves a mix of:
- an opportunity to easily cross-sell to a new swath of the market
- continuing to grow the acquired business
- realizing economies of scale that the acquired business was working towards
A previous company was using it and I think it was largely useless. The manager knew it and was quite vocal about it, but we still went through the tool in our 1-2-1s.
It tells you what the reality in Git is - how many commits, how many lines of code changed, how much refactoring etc., so there's nothing really wrong with it. If that's what you're interested in, that's what you get.
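Those raw numbers really do come straight out of the log. A quick sketch of the kind of thing being aggregated (assuming a local repo; the format string and the counting choices are mine, not necessarily what GitPrime does):

    # Quick sketch: per-author commit counts and total lines touched, straight
    # from `git log --numstat`.
    import subprocess
    from collections import defaultdict

    log = subprocess.run(
        ["git", "log", "--numstat", "--format=AUTHOR:%ae"],
        capture_output=True, text=True, check=True).stdout

    commits = defaultdict(int)
    lines_touched = defaultdict(int)
    author = None
    for row in log.splitlines():
        if row.startswith("AUTHOR:"):
            author = row[len("AUTHOR:"):]
            commits[author] += 1
        elif row and row[0].isdigit():            # "added<TAB>deleted<TAB>path"
            added, deleted, _path = row.split("\t", 2)
            lines_touched[author] += int(added) + int(deleted)

    for a in sorted(commits, key=commits.get, reverse=True):
        print(f"{a}: {commits[a]} commits, {lines_touched[a]} lines touched")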
Agree, there is an old saying: "What gets measured gets managed." These numbers are handy (we look at similar numbers), but they absolutely do not tell the whole story, and it's dangerous to think they do. You can often find yourself optimizing the wrong area, and in large organizations even marginalizing your best performers because they don't fit the mold.
Sure, you could have a team of people running those git commands against every repository you have and compiling an Excel spreadsheet to create data visualizations for each and every engineer you've got on a team, and you might end up with something roughly where GitPrime was 3 years ago, while taking hours to do what it does in seconds.
Now scale it to 2,000 engineers for reports the CTO of Walmart can look at. Scale it to give insight at the team level.
You've basically said "oh AWS isn't anything special, you can do all of that with KVM."