Cool, so are you actually using an LLM? If so, is it yours or are you borrowing someone else's? (You mentioned recent improvements in LLMs being a catalyst that made now the right time to tackle this.)
If not, I'd definitely like to hear more about your specific AI model.
Yes, we are using an LLM for some parts of the code generation, specifically GPT-4. In the medium term, we plan to go lower in the stack and build our own AI model. We broke the process down into modular steps so that we leverage LLMs only where they're most needed, and use rule-based methods in the other parts of the process (e.g. fixing compilation errors). This maximizes the accuracy of the transformations.
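To make the modular idea concrete, here is a minimal sketch of that kind of hybrid pipeline: an LLM-backed step handles the open-ended translation, and deterministic rule-based steps handle predictable fixes. All function names and the toy "translation" logic are illustrative assumptions, not the actual system.

```python
from typing import Callable, List

def llm_translate(source: str) -> str:
    """Stand-in for the LLM-backed translation step (e.g. a GPT-4 API call)."""
    # Placeholder: a real implementation would prompt an LLM here.
    return source.replace("PRINT", "print")  # toy "translation"

def rule_based_fixes(code: str) -> str:
    """Deterministic cleanup pass, e.g. fixing a known compilation issue."""
    if not code.endswith("\n"):
        code += "\n"  # rule: ensure a trailing newline
    return code

def pipeline(source: str, steps: List[Callable[[str], str]]) -> str:
    """Run the source through each modular step in order."""
    for step in steps:
        source = step(source)
    return source

result = pipeline('PRINT("hello")', [llm_translate, rule_based_fixes])
print(result)
```

Keeping each step a plain function makes it easy to swap an LLM call for a rule-based pass (or vice versa) without touching the rest of the pipeline.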
Yes, internally we have separate models that produce tests the final output has to pass before being presented to the user. In addition, you can define your own tests on the platform, and we will ensure the transformations produced pass those tests before deployment. We also have helpful versioning and backtesting features.
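The deployment gate described above can be sketched as follows: a candidate transformation is deployed only if every user-defined test passes. The function and test names here are hypothetical, not the platform's actual API.

```python
from typing import Callable, List

def deploy_if_passing(candidate: str,
                      tests: List[Callable[[str], bool]]) -> bool:
    """Approve the candidate transformation only if every test passes."""
    if all(test(candidate) for test in tests):
        return True   # a real system would trigger deployment here
    return False      # rejected: fails at least one user-defined test

# Example user-defined tests on the generated output
tests = [
    lambda out: "eval(" not in out,  # reject unsafe constructs
    lambda out: out.strip() != "",   # reject empty output
]
print(deploy_if_passing('print("ok")', tests))
```

Gating on `all(...)` means a single failing test blocks deployment, which matches the guarantee that transformations must pass every test before going live.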