BigQuery has two unusual technical features that make it uniquely suited to hosting public data like this:
1. It has a separate "storage tier" and "compute tier", so we can all run queries against the same dataset at the same time.
2. The "compute tier" is 100% shared, so I can use BigQuery even if I just want to run a few queries.
I hope we'll see more projects like this. Having public data available in SQL is a great thing. SQL is easy enough that a lot of people can figure it out, but powerful enough to do real work.
Also check out http://government-contracts.insidegov.com/ and http://government-contractors.insidegov.com/. The data is slightly outdated, but it is being updated as we speak to account for the recent changes USASpending.gov made to their file schema. We tied all subsidiaries and branches into summary profiles through DUNS numbers to give a truer estimate of the obligated contract amounts won by large corporations. Thanks for posting the other sites with this data. It's always great to check out how others handle these problems!
Full disclosure, I work for Graphiq which owns InsideGov.
And for those companies providing services (i.e. people doing work, rather than just SaaS) there's a lot of subcontracting that doesn't show up in this data.
I don't know much about this new explorer tool, but it's based on USASpending.gov, which is a bit fickle. That is to be expected given the messiness of the data involved (companies have varied names, subsidiaries, etc., among other real-life complexities).
But you can do an "advanced search" by registered company name. This is what comes up for "Microsoft" (it probably doesn't include any subsidiaries that don't have "Microsoft" in the name): ~$1.4 billion in awarded contracts.
You really have to have a groomed database to get reasonably accurate data. At a former employer (federal market research) we maintained a separate database of contractors mapping the relationship between companies based on DUNS number, which enabled rolling up totals for subsidiaries. Of course that doesn't account for JVs or subcontracting.
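For anyone curious what that kind of roll-up looks like, here is a minimal sketch, assuming you already have a child-to-parent DUNS mapping. The DUNS numbers, amounts, and function names below are all made up for illustration; this is not the schema or code of any real system mentioned in this thread.

```python
from collections import defaultdict

# Hypothetical child-DUNS -> parent-DUNS mapping (made-up numbers).
PARENT_OF = {
    "000000002": "000000001",  # subsidiary of 000000001
    "000000003": "000000002",  # subsidiary of a subsidiary
}

def ultimate_parent(duns, parent_of):
    """Follow parent links until a company with no known parent is reached."""
    seen = set()
    while duns in parent_of and duns not in seen:
        seen.add(duns)  # guard against accidental cycles in the mapping
        duns = parent_of[duns]
    return duns

def roll_up(awards, parent_of):
    """Sum award amounts under each company's ultimate parent."""
    totals = defaultdict(float)
    for duns, amount in awards:
        totals[ultimate_parent(duns, parent_of)] += amount
    return dict(totals)

# Made-up awards as (dunsnumber, dollars) pairs.
awards = [
    ("000000001", 100.0),
    ("000000002", 50.0),
    ("000000003", 25.0),
    ("000000009", 10.0),  # no known parent, stays on its own
]
totals = roll_up(awards, PARENT_OF)
print(totals)
```

As noted above, a mapping like this still says nothing about JVs or subcontracting; it only merges entities you already know are related.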
Government requires competition for purchases. So vendors like Dell, SHI, CDW, Carahsoft, etc. are all resellers who compete over pennies of margin to fulfill orders for Microsoft (and most other) software.
1. Agencies often buy from distributors and integrators, not publishers. In this case, original manufacturers may appear in the variable `descriptionofcontractrequirement`.
2. Spelling may differ. The variable `dunsnumber` is a better indicator of uniqueness than the vendor name.
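A rough sketch of both points, assuming rows shaped like dictionaries. Only `dunsnumber` and `descriptionofcontractrequirement` come from the dataset as described above; `vendorname`, `dollarsobligated`, and every value in the sample rows are assumptions made up for illustration.

```python
from collections import defaultdict

# Made-up sample rows; `vendorname` and `dollarsobligated` are assumed field names.
rows = [
    {"vendorname": "ACME RESELLER INC", "dunsnumber": "111111111",
     "descriptionofcontractrequirement": "MICROSOFT OFFICE LICENSES",
     "dollarsobligated": 1000.0},
    {"vendorname": "Acme Reseller, Inc.", "dunsnumber": "111111111",
     "descriptionofcontractrequirement": "Microsoft Windows Server renewal",
     "dollarsobligated": 500.0},
    {"vendorname": "OTHER CO", "dunsnumber": "222222222",
     "descriptionofcontractrequirement": "Office furniture",
     "dollarsobligated": 200.0},
]

def total_by_duns(rows):
    """Group by dunsnumber so differently spelled names still merge."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["dunsnumber"]] += row["dollarsobligated"]
    return dict(totals)

def mentioning(rows, keyword):
    """Rows whose requirement description mentions the keyword, ignoring case."""
    needle = keyword.lower()
    return [r for r in rows
            if needle in r["descriptionofcontractrequirement"].lower()]

by_duns = total_by_duns(rows)
microsoft_total = sum(r["dollarsobligated"] for r in mentioning(rows, "microsoft"))
print(by_duns, microsoft_total)
```

A plain substring match like this will miss abbreviations and misspellings in the description, so treat the result as a lower bound at best.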
That said, there are plenty of open source projects that scale horizontally and let you run SQL faster than a traditional RDBMS: Presto, Impala, Spark, Druid, Hive, etc.