


I have a fun anecdote. About 5-6 years ago, Elixir completely disappeared from the top 100 after spending some time in the top 50. People reached out to me and then I reached out to TIOBE to understand why and the reason given was "bad presence on Amazon".

After further investigation, the root cause seemed to be that we finally had enough published Elixir books. At the time, if you searched for "xyz programming" on Amazon and only found a few results, Amazon would pad those results with non-relevant entries. However, because Elixir reached about 20-30 books, we were no longer padded, so we suddenly got worse rankings than every other language with only a handful of books. This happened on every Amazon domain they searched on, so it compounded and effectively kicked us out of the top 100 altogether. This all happened at a time when Elixir's activity had already reached the top 25 on GitHub by PRs/stars.
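A minimal sketch of why the drop compounds when every Amazon domain is queried. It assumes (this is not TIOBE's published formula) that each engine or domain yields one share of hits per language and that the index averages those shares with equal weight. The engine names and all numbers are hypothetical.

```python
# Hypothetical model: the index averages a language's share of hits
# across every engine/domain it queries, with equal weight per engine.

def average_share(shares_by_engine: dict[str, float]) -> float:
    """Mean of per-engine shares (each share is hits_lang / hits_all)."""
    return sum(shares_by_engine.values()) / len(shares_by_engine)

# Illustrative numbers only: three Amazon domains plus two other engines.
before = {
    "amazon.com": 0.004, "amazon.co.uk": 0.004, "amazon.de": 0.004,
    "engine_a": 0.002, "engine_b": 0.002,
}
# Padding stops on every Amazon domain at once, so all three shares fall together.
after = dict(before, **{"amazon.com": 0.0005, "amazon.co.uk": 0.0005, "amazon.de": 0.0005})

print(f"before: {average_share(before):.4%}")
print(f"after:  {average_share(after):.4%}")
```

Because the Amazon domains are not independent inputs, one change in Amazon's search behaviour moves several of them at once; in this toy model the average falls by roughly two thirds even though the other engines did not change.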


First of all, kudos for making Elixir.

Secondly, about the "xyz programming" searches: to my understanding, if I searched "elixir programming" on Amazon back then, there weren't many books, so the results were padded, but once it reached 20-30 books they weren't padded anymore. How does that affect the ranking? I still can't comprehend how having more books can have a negative impact on a popularity index, and if an index like TIOBE works that way, then it's clearly messed up.


My understanding (which may be wrong) from the exchange is that they literally search for "elixir programming" on several websites, including Amazon. So it is very sensitive to whatever changes those websites do to their own search engines. I can no longer reproduce the behaviour from back then but it is very understandable that websites like Amazon are optimizing their search results for sales and other key metrics rather than term precision.


Your understanding is correct. Their methodology is really that silly and susceptible to wild swings.


Amazon is one of the search engines TIOBE uses.

It seems like Amazon showed other, unrelated things if you searched for "Elixir programming", to bolster the search results.

So you got more results than there were books. Maybe 50, 100, or even more results.

After a certain threshold Amazon stopped doing that, so you got fewer results.

Fewer results, lower TIOBE position.
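The non-monotonic effect can be sketched with a toy padding rule. The threshold and padded size below are assumptions for illustration; Amazon's actual behaviour is not documented anywhere in this thread.

```python
# Hypothetical model of result padding: below some threshold of relevant
# matches, the store fills the result set up to a fixed size with unrelated
# items; at or above it, only relevant matches are returned.

PAD_THRESHOLD = 20   # assumed: below this many relevant books, padding kicks in
PADDED_SIZE = 100    # assumed: size the result set is padded up to

def reported_count(relevant_books: int) -> int:
    """Number of results a face-value scraper would record."""
    if relevant_books < PAD_THRESHOLD:
        return PADDED_SIZE      # padded with unrelated entries
    return relevant_books       # no padding: only real matches

for books in (5, 15, 19, 20, 30):
    print(books, "books ->", reported_count(books), "results")
```

Under these assumptions, going from 19 to 20 relevant books drops the recorded count from 100 to 20, and a language with 30 real books scores lower than one with 5.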


Amazon (US) is right there, we can try it:

  "python programming" : 6526
  "uxntal programming" : 2
  "elixir programming" : 2085
  "kotlin programming" : 390
I tried a couple of very new/niche languages like granule/futhark/carbon/jasmin but got either no results, or only obviously unrelated junk. For the languages above I quickly scanned the top result and they looked relevant.
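Taking the four counts above at face value, a simple share-of-total normalization looks like this. The normalization is an assumption (not TIOBE's exact formula), and the shares are only relative to these four queries, not the full index.

```python
# Amazon (US) result counts quoted above, taken at face value.
counts = {
    "python": 6526,
    "uxntal": 2,
    "elixir": 2085,
    "kotlin": 390,
}

total = sum(counts.values())
# Share of the combined hits; padding or junk results count like real books.
shares = {lang: n / total for lang, n in counts.items()}

for lang, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang:8} {share:6.2%}")
```

From this one source, Elixir gets more than five times Kotlin's share, which is exactly the kind of number that gets fed into an index unexamined.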


> or only obviously unrelated junk

Does TIOBE detect that it's junk/padding, or do they just scrape the number and take it at face value?


The latter. That's precisely what they do. I emailed them and verified this.


That's pretty funny, since the search turned up downright confounding results, like books about programming with an author named "Jasmin". For "carbon programming" I got a ton of books about C, can't guess why, but it's surely not good data.
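For contrast, even a naive relevance filter would catch the author-name case. This is a hypothetical sketch (per the reply above, TIOBE does no such filtering), and the listings are made up.

```python
from dataclasses import dataclass

@dataclass
class Listing:
    title: str
    author: str

def relevant(listing: Listing, language: str) -> bool:
    """Keep a listing only if the language name appears as a word in its title.

    A match in the author field alone (e.g. an author named "Jasmin") is
    ignored, and word-level matching avoids substrings like "go" in "algorithms".
    """
    words = {w.strip(".,:;!?()").lower() for w in listing.title.split()}
    return language.lower() in words

# Hypothetical listings, loosely modelled on the junk results described above.
results = [
    Listing("Jasmin: A Practical Guide", "A. Writer"),
    Listing("Introduction to Programming in C", "Jasmin Example"),
    Listing("Algorithms Unlocked", "Someone"),
]

kept = [r for r in results if relevant(r, "jasmin")]
print(len(results), "raw results,", len(kept), "after filtering")
```

It would still misfire on ambiguous names like "Carbon", which is part of why raw counts are such poor data.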

Maybe I'll make a language called "Introduction to" or "Linear" and shoot to the top of the index.


I think he's saying that before, when you searched for "elixir books", you would get:

- Elixir for Dummies

- Elixir for Beginners

- Elixir Programming with Phoenix

- General programming book

- Some other general programming book

- Etc

And this list would end up being a full page.

After there were a few Elixir-specific books, you only got those, but the page was shorter. So the ranking was lower.


If you wanted accurate statistics for each language, you'd probably have to go closer to the source:

  - How many downloads do the compiler/runtime/toolchains have?
  - How many downloads do the packages on the package manager (if any) have?
  - How many downloads do base containers have? How popular are the SaaS/PaaS offerings geared towards the languages?
But of course, doing that for a bunch of stacks would be quite difficult and time consuming, so people feel confident in just looking at Google Trends or an equivalent (or aggregating similar surface level data from a bunch of providers) and just calling it a day.


> How many downloads do the packages on the package manager (if any) have?

This will overrepresent languages that rely heavily on external packages, such as JavaScript and Rust, while underrepresenting languages with a large standard library where packages are not needed as much.
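The bias can be shown with a back-of-the-envelope model. Both ecosystems and all numbers are hypothetical; the point is that equal usage produces a 10x gap in raw downloads.

```python
# Hypothetical ecosystems: same number of projects, different reliance on
# third-party packages. All numbers are made up for illustration.
ecosystems = {
    "package_heavy": {"projects": 1_000, "avg_deps_per_project": 50},
    "batteries_included": {"projects": 1_000, "avg_deps_per_project": 5},
}

def raw_downloads(e: dict) -> int:
    # Assume each project downloads each of its dependencies exactly once.
    return e["projects"] * e["avg_deps_per_project"]

for name, e in ecosystems.items():
    raw = raw_downloads(e)
    # Dividing by the average dependency count recovers the project count,
    # but only if you already know that average, which is the hard part.
    normalized = raw / e["avg_deps_per_project"]
    print(f"{name:20} raw={raw:>6} normalized={normalized:.0f}")
```

Correcting for this needs per-ecosystem data that is rarely published, which is the same comparability problem raised in the replies.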


Getting those stats seems practically impossible if you want to include as many languages as possible (I don't know how many TIOBE includes, they don't seem to state that anywhere on their site).

How do you measure the downloads on Github? Do you include only releases or also git clones? How do you compare languages with a package manager vs languages without one? What if the language compiler is hosted on a less popular git platform or maybe a personal website? Do you contact those regularly to give you the precise numbers? How do you know those numbers are reliable? How do you e.g. count the number of Rust toolchain installations without putting telemetry into rustup? Do you count nightly + stable + testing toolchains separately?

So it makes sense TIOBE only uses search results as those are comparable - or at least they seem to be, because search engines change their ranking and filtering methods over time and maybe personalize results.


I think those stats might not be easy to come by. I know you can find download stats for Rust at https://lib.rs/stats but I don’t think it’s easy to find a similar data set for other languages?


And some languages, like C (via gcc), Python, and Perl, are often installed by default with the OS or other tools.


Anything other than directly asking developers (e.g. SO posts, GitHub repositories, books...) ends up being extremely biased. The Stack Overflow Annual Developer Survey is the only source I check, and even there the population targets and questions are not free from bias. For instance, I've been adding OpenSCAD in the free text option for the last 5 years.


It's really great for identifying whether people are interested in the facts or were just reaching for justification of their pre-existing conclusion.


Very roughly, TIOBE gives you search popularity of a given language. So Ada is climbing some popularity chart. The question is what?


> TIOBE gives you search popularity of a given language

No it does not. It gives you the number of results returned by a search engine, which has nothing to do with how many people are searching for that term.


It definitely gives some sort of correlation to it. Otherwise, you wouldn't get Python at #1 for several years, in line with other metrics.
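One way to check "some sort of correlation" is Spearman's rank correlation between two rankings of the same languages. This sketch assumes no tied ranks, and the two rankings are hypothetical, not real TIOBE or PYPL data.

```python
def spearman_no_ties(rank_a: dict[str, int], rank_b: dict[str, int]) -> float:
    """Spearman's rho for two rankings of the same items, assuming no ties:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    if rank_a.keys() != rank_b.keys():
        raise ValueError("both rankings must cover the same languages")
    n = len(rank_a)
    d2 = sum((rank_a[k] - rank_b[k]) ** 2 for k in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical rankings from two different indices (1 = most popular).
index_a = {"python": 1, "c": 2, "java": 3, "javascript": 4, "go": 5}
index_b = {"python": 1, "java": 2, "javascript": 3, "c": 4, "go": 5}

print(f"rho = {spearman_no_ties(index_a, index_b):.2f}")
```

A high rho only says the orderings agree; it does not show that either index measures what it claims to.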


PYPL is more interesting, but it also shows Ada climbing fast:

https://pypl.github.io/PYPL.html


I just checked it for Rust:

Rank 18 (previously 17): Rust, 0.97% share, -0.20% trend


That is one of the few measurements that management listens to, regardless of how bad we think it is.


While one can choose to dismiss the TIOBE index (I don’t have any strong opinion about it), there was also a screenshot of PYPL showing a steady increase in Ada over recent months. Something positive is happening!



