A few months back I was pessimistic about AI; now I am the opposite. The perspective change happened when I realized that giving it an entire problem and expecting it to solve the whole thing is unrealistic. The real value comes from using AI at the right steps of your workflow or larger system.
I did a PhD in program synthesis (programming languages techniques), and one of the tricks there was to efficiently prune the space of programs. With LLMs it is much more likely that you start with an almost correct guess, so the burden shifts to lighter verification methods.
I still do not believe in the AGI hype. But I am genuinely excited. Computing has always been humans writing precise algorithms and getting correct answers. The current generation of LLMs is the opposite: you can be imprecise, but the answers can be wrong. We have to figure out what interesting systems we can build with that.
Do you mean Ruby lacks syntactic support for adding type annotations inline in your programs?
I am one of the authors of RDL (https://github.com/tupl-tufts/rdl), a research project that looked at type systems for Ruby before typing in Ruby became mainstream. We went for strings that looked nice but were parsed into a type signature. Sorbet, on the other hand, uses Ruby values in a DSL to define types. We were of the impression that many of our core ideas were absorbed by other projects and that Sorbet and RBS have pretty much gone mainstream. What is missing to get usable gradual types in Ruby?
My point isn't technical per se; it's more about the UX of actually trying to use gradual typing in a flesh-and-blood Ruby project.
Sorbet type annotations are noisy, verbose, and are much less easy to parse at a glance than an equivalent typesig in other languages. Sorbet itself feels... hefty. Incorporating Sorbet in an existing project seems like a substantial investment. RBS files are nuts from a DRY perspective, and generating them from e.g. RDoc is a second rate experience.
More broadly, the extensive use of runtime metaprogramming in Ruby gems severely limits static analysis in practice, and there seems to be a strong cultural resistance to gradual typing even where it would be possible and make sense, which I would - at least in part - attribute to the cumbersome UX of RBS/Sorbet, cf. something like Python's gradual typing.
Gradual typing isn't technically impossible in Ruby, it just feels... unwelcome.
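For contrast, this is roughly what inline gradual typing looks like on the Python side (a minimal sketch, not tied to any particular project): annotations sit next to the code, untyped code keeps working, and a checker like mypy can be adopted incrementally.

```python
# Inline, optional type hints: unannotated code still runs, and a checker
# like mypy only reports on what you have chosen to annotate so far.
from typing import Optional

def find_user(user_id: int, default: Optional[str] = None) -> str:
    return default or f"user-{user_id}"
```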
None of my customers ever asked for type definitions in Ruby (nor in Python). I'm pretty happy with the choice of hiding types under the carpet of a separate file. I think they made it deliberately because Ruby's core team didn't like type definitions but had to cave to the recent fashion. It will swing back, but I think this is a slow pendulum. Speaking for myself, I picked Ruby 20 years ago exactly because I didn't have to type types, so I'm not a fan of the projects you are working on, but I don't oppose them either. I just hope I'm never forced to define types.
Can you list the concrete problems a FastAPI approach will have, and what tools like Nvidia Triton do differently to get around it? I have no idea about running such models at scale.
- Dynamic batching while limiting latency to a set threshold
- Running multiple instances of a model, effectively load-balancing inference requests.
- Loading/unloading/running multiple versions of models dynamically, which is useful if you want to update (or roll back) your model while not interfering with existing inference requests.
Triton's client provides async inference APIs, so you can easily put a FastAPI-based API server in front and don't necessarily need a queue (like Celery).
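To make that concrete, here is a minimal sketch of a FastAPI endpoint forwarding requests to Triton with the async client. The model name and tensor names (`INPUT__0`, `OUTPUT__0`) are placeholders that must match your model's config, and it assumes a recent `tritonclient` release that ships the `aio` HTTP client.

```python
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

# Async HTTP client shipped with recent tritonclient releases (assumption:
# your installed version includes the aio module).
import tritonclient.http.aio as triton_aio
from tritonclient.http import InferInput, InferRequestedOutput

app = FastAPI()
client = triton_aio.InferenceServerClient(url="localhost:8000")

class PredictRequest(BaseModel):
    features: list[float]

@app.post("/predict")
async def predict(req: PredictRequest):
    # Tensor names, dtype, and shape are placeholders; they must match
    # the model's config.pbtxt in the Triton model repository.
    data = np.array([req.features], dtype=np.float32)
    inp = InferInput("INPUT__0", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    out = InferRequestedOutput("OUTPUT__0")

    result = await client.infer("my_model", inputs=[inp], outputs=[out])
    return {"prediction": result.as_numpy("OUTPUT__0").tolist()}
```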
FastAPI loads a model statically on startup. There are some hacks to reload versions and new models (via load balancers, etc.), but they're just that: hacks. There are also known issues, with TensorFlow in particular having poor memory management as request counts grow.
FastAPI is great but at the end of the day it’s Python and the performance reflects that (more on this later).
With Nvidia Triton you get:
- Automatic support for various model frameworks/formats: native PyTorch/TensorFlow, ONNX, and more.
- Dynamic batching. You can configure an SLA with a maximum additional latency for response time, where Triton will queue requests from multiple clients over a given time period and pass them through the model as a batch. If you have the VRAM (you should) it’s an instant performance multiplier.
- Even better performance: Triton can do things like automatically compile/convert a model to TensorRT on the runtime hardware. This allows you to deploy models across hardware families with optimized performance while not worrying about the specific compute architecture or dealing with TensorRT itself.
- Optimized and efficient use of multiple GPUs.
- Model version management. Triton has a model management API you can use to upload a new model/version and load it dynamically (see the client sketch after this list). It can hot load/reload a model and serve it instantly, with configuration options for always serving the latest model or allowing clients to request a specific version.
- Performance metrics. It has built in support for Prometheus.
- Other tools like Model Navigator and Performance Analyzer. You can pass a model to these tools and they will try every possible model format, batch size, etc., against an actual Triton server and produce a report and an optimized model configuration based on your selected parameters: requests per second, response time, even memory/compute utilization, power usage, and more.
- Out of the box, without any of these tricks, Triton is faster, uses less memory, less GPU compute, and less CPU compute. It’s written in C++ and optimized by Nvidia.
- It’s a single implementation (often container) that from the get go is smaller, lighter weight, and easier to manage than pip installing a bunch of dependencies and the entire runtime framework itself. It exists solely to serve models and serve them well.
When you add it up (as I mentioned) I’ve personally seen cases where requests per second increase by orders of magnitude with lower response times than a single request against FastAPI (or similar). Plus all of the mlops and metrics features.
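A rough sketch of the model-management side mentioned above, assuming the server was started with explicit model control (`--model-control-mode=explicit`) and using a placeholder model name:

```python
import tritonclient.http as triton_http

# Assumes tritonserver was started with --model-control-mode=explicit so
# models can be loaded/unloaded through the management API.
client = triton_http.InferenceServerClient(url="localhost:8000")

# Hot-load (or reload) after dropping a new version into the model repository.
client.load_model("my_model")
print(client.is_model_ready("my_model"))       # True once it is being served

# Inspect everything in the model repository and its load state.
for entry in client.get_model_repository_index():
    print(entry)

# Unload when retiring or rolling back a model.
client.unload_model("my_model")
```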
The idea that symbolic AI lost is uninformed. Symbolic AI essentially boils down to different kinds of modeling and constraint solving systems, which are very much in use today: linear programming, SMT solvers, datalog, etc.
Here is where symbolic AI lost: anything where you do not have a formal criterion of correctness (or goal) cannot be handled well by symbolic AI: for example, perception problems like vision, audio, robot locomotion, or natural language. It is very hard to encode such problems in a formal language, which in turn means symbolic AI is bad at these kinds of problems. In contrast, deep learning has won because it is good at exactly this set of things. Throw a symbolic problem at a deep neural network and it fails in unexpected ways (yes, I have read about neural networks that solve SAT problems, and no, a percentage accuracy is not good enough in domains where correctness is paramount).
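To make "formal criterion of correctness" concrete, here is the kind of problem symbolic tools are built for, sketched with Z3's Python bindings: the constraints are the specification, and any model the solver returns satisfies them exactly, not probabilistically.

```python
from z3 import Int, Solver, sat

# A toy constraint problem: the constraints ARE the spec, so any solution
# the solver returns is correct by construction.
x, y = Int("x"), Int("y")
s = Solver()
s.add(x + y == 10, x > y, x > 0, y > 0)

if s.check() == sat:
    m = s.model()
    print(m[x], m[y])   # e.g. 9 and 1 -- exact, not "right 98% of the time"
```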
The saying goes, anything that becomes common enough is not considered AI anymore. Symbolic AI went through that phase, and we use symbolic AI systems today without realizing we are using old-school AI. Deep learning is the current hype because it solves a class of problems that we couldn't solve before (not all problems). Once deep learning is common, we will stop considering it AI and move on to the next set of problems that require novel insights.
Today's symbolic software is just software that was written by humans. Software has existed for as long as there have been computers, and AI was never just another term for software. I don't think any human-written software today captures what proponents of symbolic AI wanted to achieve 50 to 60 years ago. Well, okay, it beat Kasparov at chess in 1996, but chess algorithms were old news even in 1970. I don't think Deep Blue used anything fundamentally new; it was not an AI breakthrough, it was a feat that showed how fast computers had become.
The fact is, "AI" was always about much higher ambitions, about solving truly fuzzy tasks. Recognizing handwritten digits is exactly such a problem that has been solved, even if you don't want to call it "AI" anymore because it has stopped to be impressive.
It is already here, to be honest. I know BrowserStack and other mobile testing platforms (at Facebook and Amazon) host real devices, both Android phones and iPhones, in server farms like this. Meta wrote a blog post about it: https://engineering.fb.com/2016/07/13/android/the-mobile-dev...
At one of my previous workplaces, we discussed running the Z3 theorem prover on an iPhone cluster, because it ran so much faster on A-series processors than on a desktop Intel machine.
This is exactly how compilers are taught at the University of Maryland. The class CMSC430 (https://www.cs.umd.edu/class/fall2021/cmsc430/) starts off with a small Scheme (a limited subset of Racket) and gradually grows the language to include more features. The first class compiles just numbers to x86 code, followed by arithmetic operations on numbers, building up to higher-level features like function calls, pattern matching, and so on. See the notes at: https://www.cs.umd.edu/class/fall2021/cmsc430/Notes.html
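Not the course's Racket code, but a toy sketch in Python of that first step (integer literals plus add1/sub1 compiled to x86-64 assembly text), just to show how small the starting language is before features get layered on:

```python
# Toy compiler sketch: the source "language" is an int literal or a nested
# ('add1', e) / ('sub1', e) form, compiled to x86-64 assembly text.

def compile_expr(e):
    if isinstance(e, int):
        return [f"  mov rax, {e}"]
    op, arg = e
    code = compile_expr(arg)
    if op == "add1":
        return code + ["  add rax, 1"]
    if op == "sub1":
        return code + ["  sub rax, 1"]
    raise ValueError(f"unknown form: {op!r}")

def compile_program(e):
    return "\n".join(["global entry", "entry:", *compile_expr(e), "  ret"])

print(compile_program(("add1", ("add1", 41))))   # result in rax: 43
```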
There have been attempts like you describe before. I can specifically point to work done in Ruby by my PhD advisor using exactly this profiling approach, and then deriving static types from it: http://www.cs.tufts.edu/~jfoster/papers/cs-tr-4935.pdf
> you're already executing the code for free anyway
Based on my experience working in a similar domain (type systems for Ruby, though not the exact approach you describe), this turns out to be the ultimate bottleneck. If you are instrumenting everything, code execution is very slow. A practical approach here is to abstract values in the interpreter (like representing all whole numbers as Int). However, this eliminates the specific cases where you can track "Dict[String->int] WHERE 'foo' in Dict and Dict['bar'] == 42". You could get some mileage out of singleton types, but there are still limitations on running arbitrary queries: how do you record a profile and run queries on open file or network handles later? How do you reconcile side effects between two program execution profiles? It is a tradeoff between how much information you can record in a profile vs. the cost of recording it.
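A rough Python sketch of what "abstracting values" during profiling means (the Ruby work instruments the interpreter itself; this only illustrates the trade-off): recording just type names keeps overhead and profile size manageable, but exact facts like Dict['bar'] == 42 are lost.

```python
from collections import defaultdict
from functools import wraps

# Profile-guided type recording: abstract each runtime value to its type
# name, so 3 and 42 both become "int" and specific values are not kept.
observed = defaultdict(set)   # function name -> {(arg type names, return type name)}

def record_types(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        arg_types = tuple(type(a).__name__ for a in args)
        observed[fn.__name__].add((arg_types, type(result).__name__))
        return result
    return wrapper

@record_types
def area(w, h):
    return w * h

area(3, 4)
area(2.5, 4.0)
print(observed["area"])
# {(('int', 'int'), 'int'), (('float', 'float'), 'float')}
```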
There is definitely scope here for longer-term studies that I have not seen yet. Is recording type information (or other facts from profiling) over the long term enough to cover all paths through the program? If so, as this discussion is about maintaining code long term, does it help developers refactor and maintain code as a code base undergoes bitrot and then gets minor updates? There is a gap between industry, which faces this problem but usually doesn't invest in such studies, and academia, which usually invests in such studies but doesn't have the same changing requirements as an industrial codebase.
I have experienced something similar as well. I was visiting NYC, and the place we booked did not look anything like the online listing. Regardless, the hosts were nice and helpful, so I left them a 5-star rating on all parameters except cleanliness. This was the first time I had rated something on Airbnb that was not 5 stars, and I have travelled quite a bit before.
A few weeks after that I received racist messages from the host on my personal cellphone. I am shocked Airbnb would share my personal details just like that. I reported it to Airbnb, but they were pretty clueless, as this was happening outside their messaging system. I finally had to share screenshots of the text messages, and they said they would take down the host's account. They did for some time, but a couple of months later that host's account is live again and his place is still accepting reservations. My review has been taken down, though.
This seems to be an instance of concolic execution which has seen some success in the fuzzing and testing research community. The key ideas originate from these papers:
The technique works well when parts of the program space (including paths through the program) can be easily represented in SMT solvers (booleans, bit vectors, arithmetic, and the like), and the remaining program space can be explored using random testing. I am excited to see this work being brought to Crosshair in Python via Hypothesis!
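A toy illustration of the concolic idea (not Crosshair's actual implementation): run the program on a concrete input, mirror the branches it took as SMT constraints over a symbolic twin of that input, then ask the solver for an input that flips the last branch so the next run explores a different path.

```python
from z3 import Int, Solver, Not, sat

def program(x):
    # The concrete program under test, with a small branch structure.
    if x > 10:
        if x % 2 == 0:
            return "big even"
        return "big odd"
    return "small"

def path_condition(x_sym, x_val):
    # Mirror the branches the concrete run took as symbolic constraints.
    conds = [x_sym > 10 if x_val > 10 else Not(x_sym > 10)]
    if x_val > 10:
        conds.append(x_sym % 2 == 0 if x_val % 2 == 0 else Not(x_sym % 2 == 0))
    return conds

x = Int("x")
concrete = 14                      # seed input (could come from random testing)
conds = path_condition(x, concrete)

s = Solver()
s.add(*conds[:-1])                 # keep the path prefix...
s.add(Not(conds[-1]))              # ...and flip the last branch
if s.check() == sat:
    nxt = s.model()[x].as_long()
    print("next input to try:", nxt, "->", program(nxt))
```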
That is a fair concern. The tool only guarantees correctness up to the level checked by the tests you provide. So if not all corner cases are covered, RbSyn will generate some program that passes the tests but might not pass unspecified corner cases. I suspect the missing cases will be clear with a manual audit of the synthesized code, and you can then update the synthesized code manually or add the requisite tests.
It is a hard task, for sure! But without any tests to convey your intent as a programmer, RbSyn also has no way of knowing what you intended a method's behavior to be.
The running thread here is: if you would have written a method and some tests to check that you indeed wrote the method correctly, RbSyn automates the task of writing the method for you. In other words, it is just like test-driven development, where the writing-the-code part is automated.
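A toy sketch of that idea (nothing to do with RbSyn's actual Ruby implementation): enumerate candidate programs and keep the first one that passes every test, which also shows why behavior on untested corner cases is left unconstrained.

```python
# Tests are the only specification; candidates are a tiny hand-written
# program space. Anything not pinned down by a test is fair game.
tests = [((2, 3), 5), ((0, 7), 7), ((4, 4), 8)]   # ((x, y), expected)

candidates = [
    ("x * y", lambda x, y: x * y),
    ("x - y", lambda x, y: x - y),
    ("max(x, y)", lambda x, y: max(x, y)),
    ("x + y", lambda x, y: x + y),
]

for src, fn in candidates:
    if all(fn(*args) == expected for args, expected in tests):
        print("synthesized:", src)
        break
```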
By manual audit, I meant you can verify if you need more tests to capture your intended behavior or just take a shorter route and update the synthesized code directly without any new tests for the newly added behavior. One could argue the latter is bad programming practice.