Smithy: A language for defining services and SDKs (awslabs.github.io)
243 points by politician on May 7, 2021 | 73 comments


This looks pretty sweet; I've spent a lot of time the past few years writing a LOT of OpenAPI specifications and the thing that gets tiring is all of the boilerplate with each bit of the definition (paths -> get -> responses -> 200 -> application/json -> schema). It gets so exhausting and turns into nested YAML soup pretty quickly. Having better support for multiple files will also be nice; _technically_ you can split an OpenAPI specification into multiple files but the tooling never quite deals with it properly.

I'll definitely be keeping an eye on this; looking forward to TypeScript code generation! One of our main use cases for OpenAPI is to document our models and then generate TypeScript types from them (that we then use in Node and React apps)


I've used a bit of Smithy at work. Yes, composable models, projections, and transforms are super common. Think joining multiple microservices into a single public interface or splitting out different service/client models based on tags or traits. Most of this is handled by build infrastructure, so it takes some mental energy to set up, but then it “just works.”

Note: Principal at AWS.
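
To make the "splitting out models based on tags" idea above a bit more concrete, here is a minimal Python sketch. It is not Smithy's projection machinery; the dict layout, the tag names, and the `project` function are all assumptions made up for illustration.

```python
from typing import Dict, Iterable, List

# Hypothetical in-memory model: operation name -> list of tags.
Model = Dict[str, List[str]]


def project(model: Model, exclude_tags: Iterable[str]) -> Model:
    """Return a copy of the model without operations carrying any excluded tag."""
    excluded = set(exclude_tags)
    return {
        name: list(tags)
        for name, tags in model.items()
        if not excluded.intersection(tags)
    }


# One source model, from which a public-facing view is derived.
full_model = {
    "GetCity": ["public"],
    "ListCities": ["public"],
    "PurgeCache": ["internal"],
}
public_model = project(full_model, exclude_tags=["internal"])
```

The source model stays untouched, so the same definition could feed both an internal and a public client build.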


Hey TranquilMarmot. I am working on a TypeScript code generator for OpenAPI 2 and 3 that you might be interested in. It works on Node.js and the web, provides strong typing (types generated against the schemas in the OpenAPI definition) and runtime type-checking to prevent the runtime data from deviating from the TS types. We have been testing it in private for some time; clients generated using this code generator have already been installed ~50K times (and counting). I am looking for feedback, so I'd be happy if you could try it out. My email is in my profile.


We were bothered by the same thing :) Many plugins for OpenAPI had no codegens which would produce deterministic behavior for compatibility across various languages. Also we wanted to have something that could express complex business domains in definitions.

Ended up building protoforce.io for the very purpose. It does support typescript + nodejs, which we used for the website itself as well.


We use multiple files at work, but all of our make rules first bundle the spec into a single file. We also save that bundle in our VCS. CI will fail if the bundle hasn't been recreated since updating the split files
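
A CI check like the one described above could look roughly like this in Python. This is only a sketch under assumptions: the fragments are already parsed into dicts, the bundle is canonical JSON, and the names `build_bundle` and `bundle_is_fresh` are hypothetical, not part of any OpenAPI tool.

```python
import json
from typing import Dict


def build_bundle(fragments: Dict[str, dict]) -> str:
    """Merge spec fragments (keyed by file name) into one canonical JSON string."""
    merged: dict = {}
    for name in sorted(fragments):  # sorted so the result is deterministic
        for key, value in fragments[name].items():
            if key in merged:
                raise ValueError(f"duplicate top-level key {key!r} in {name}")
            merged[key] = value
    return json.dumps(merged, indent=2, sort_keys=True) + "\n"


def bundle_is_fresh(fragments: Dict[str, dict], committed_bundle: str) -> bool:
    """True if the committed bundle matches what the split files produce now."""
    return build_bundle(fragments) == committed_bundle
```

In CI, a `False` result would fail the build and ask the author to regenerate and commit the bundle.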


We've built a similar thing at https://www.protoforce.io, which auto-generates both the client and server side. It actually transpiles: it parses the model definitions and emits actual code with a bit of shared runtime.

Good that Amazon opened up their stuff; there should be more competition on this front.


This is some impressive stuff! Congrats on a sizable achievement. I just went through your website, have a few questions...

Questions:

1. SCALA: is the generated code ONLY scala? are other languages supported?

2. CODE-GEN: is it designed only for code generation for a target http framework or does it actually provide an "API Server" itself ( e.g. like graphQL )

3. COMPARISON: the list of frustrations with other solutions, listed in your intro/technology goals are somewhat high-level. what are the "leaky abstractions" or "non-deterministic behaviour"?

4. SETUP: does someone need to know scala to use this ? ( depends on #1 above )

5. DSL: is the protoforce code implemented as a DSL in Scala, or is it a "language" in itself?

6. IDE: how do you check/compile the code? Does it integrate with an IDE?

As a Scala Engineer myself ( though I mostly work in Kotlin now for Android/Server ), this looks great, but most Scala engineers i've met are focused on Spark, and using Play Framework/Http4S, etc. How big is the actual market for Scala API tools?


Thank you.

1. Scala, Typescript/Javascript, and Java at the moment.

2. It does provide a runtime which allows you to bootstrap a server easily. (You can check out this post, which has a modeling + Scala setup example at the bottom: https://www.protoforce.io/ProtoForce/post/extensive-guide-to...)

3. Please take a look at the documentation, it has a good outline of the features supported. There are many features, most are well documented there.

4. No, not really. You can use the other supported languages; it provides both client and server sides, so no other language is needed. Again, you can still generate client-side code for other languages and use it to connect to your server.

5. protoforce website was implemented using the protoforce DSL itself. The parser and transpilers are written in scala. The portal is written in typescript + react.

6. There is currently a sandbox on the website which you can experiment in. There is currently no integration with other IDEs, but a language server could be added a bit later, for VSCode for instance.

Hope this answers a bit :)


FYI - your website is an unreadable catastrophe on Firefox for Android.


We focused on the desktop because it is difficult to use from mobile due to it being an online IDE. I'll take a look, thank you for reporting.


I get your thought process, but two things:

1. If you use flexbox from the beginning, you can easily have basic readability on mobile.

2. A huge % of users will discover things on mobile, even if those things are desktop apps.

I wasn't even able to tell what your product is at a basic level.


Noted, thank you.


Reminds me a lot of this talk: https://www.youtube.com/watch?v=j6ow-UemzBc

I think it's a really powerful paradigm. I think an org adopting such patterns widespread leads to some really rad capabilities. GDPR compliance, for example, is really tough in SOA architectures in companies without a ton of engineering capacity, but this sort of data/api introspection could do a lot to change that.

Any plans to open source projection for languages other than Java?

edit: I see that typescript and rust are available on github as well


We expect to have Smithy code generation for every language we support as official AWS SDKs.


There’s a Go implementation buried within the V2 SDK.


AWS' new Rust SDK[0] is generated using Smithy models.

[0]: https://news.ycombinator.com/item?id=27080859


This is great. I'm happy to see more evolution in the service definition space, especially in using custom DSL's. The low signal-to-noise ratio and boilerplate in OpenAPI and RAML is a killer IMO.

We're building something similar with Taxi (https://docs.taxilang.org), which provides a rich way to describe services and data.

Similar to Smithy's CityId example, we provide the ability to semantically describe attributes. Our product - Vyne (https://vyne.co) can then use these Id's to automatically chain and orchestrate any services together, without having to write integration code.


I gotta say this is really neat. Definitely going to take it for a spin. I’ve always felt like a lot of these tools have too much ceremony - this feels light and simple.


It’s elegant. It doesn’t overreach.


How does this compare to Google’s protocol buffers? [0] It looks like smithy has a broader set of applications.

[0] https://developers.google.com/protocol-buffers


To me it seems like protocol buffers are a serialisation format while Smithy is just an idl used to describe services.

The services are free to serialize the data in any way be it json, XML or even protocol buffers.

So Smithy is more comparable to OpenAPI than to Protocol Buffers.


Protobuf is commonly used to refer to both the serialization format and the IDL. You can use protos to specify JSON/REST endpoints. In past companies I've also used it to specify DB schemas.


> while Smithy is just an idl used to describe services

What's the use of such descriptions aside from diagrams? I realize there's e.g. terraform that use similar kind of language to describe what to create/destroy.


The main use case is to auto generate clients for services and even create stubs for service implementations.

AWS sdks need to be implemented in dozens of languages so something like Smithy helps in code gen for multiple languages, avoiding the massively manual task of creating client apis.


Amazon has done this internally across several different iterations and it's fantastic. Being able to write a service definition and object model and then generate clients and service stubs makes it easy to model what client/server interaction will look like apart from implementation. When I was there, there were at least 3 protocol representations that you could expose for the same model, making it simple to support a variety of use cases.


I feel like OP might be thinking of gRPC, which seems to closely resemble the use case Smithy is tackling.


gRPC to my knowledge uses protocol buffers to define services. But it's more of a framework for doing RPC.

Smithy is a much more abstract notation for defining services independent of any implementation details, and it is resource based rather than message based.


Protobuf and gRPC are great, and AWS will continue to make sure developers can be successful using them with AWS. I’ll try to explain how we ended up at Smithy instead of using other existing tools.

We started working on Smithy around 2018 because we wanted to improve the scale of our API program and the AWS SDK team to deal with the growing number of services (over 250 now!) and languages we want to support in official AWS SDKs (like the newly released Rust SDK). We had a ton of existing services that we needed to be compatible with, but we also wanted to add new features to improve new services going forward too.

We needed a very flexible meta-model that allows us to continue to evolve the model to account for things like integrating with other systems and to model service-specific customizations that each AWS SDK team can implement independently. Smithy's meta-model is based on traits, a self-describing way to add more information to models. Lots of validation can be built in to custom traits, which helps to ensure that service teams are using traits properly and adhere to their specifications. Smithy's resource modeling helps us here too because it allows AWS service teams, as they adopt Smithy, to essentially automatically support CloudFormation resource schemas. Resources also help us to point service teams in the right direction to make their services work well over HTTP (which methods to use, URIs, safety, idempotency, etc).

We needed an integrated model validation, linting, and diff tool to keep services consistent and detect breaking changes, and it needed to support company-wide standards as well as service-specific standards. We use Smithy’s validation system to automatically enforce API standards, and service teams often create their own service-specific rules to keep their own internal consistency.

We needed built-in input validation constraints so that they're standard across services and clients (e.g., length, range, pattern, etc). We didn't want to rely on third-party extensions to provide this feature since validating inputs is important. AWS uses internal service frameworks that enforce these constraints and are compatible with Smithy models. We're working to create open source service frameworks for Smithy as well.

We also wanted to support various serialization formats so that clients work with all of our existing services spread across JSON, XML, query strings, RPC, and HTTP APIs, but we also wanted to be able to evolve our serialization formats in the future as new technology comes along. That's why Smithy is protocol agnostic (like gRPC actually). The serialization format is an implementation detail. Smithy has some support for MQTT as well.

And finally, we need our code generators to be really flexible to support service customizations. There's quite a few customizations across AWS services, and we needed a way to inject custom code generation logic in various parts of our generators.

Smithy is still in heavy development, and we're working on building out more of the tooling so it can be used easily outside of AWS SDKs too, including client and server code generation.
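
As a rough illustration of the built-in input constraints mentioned above (length, range, pattern), here is a Python sketch of a validator that client and server code could share. The `Constraints` class and `validate` function are hypothetical names, and using an unanchored regex search is an assumption of this sketch, not a statement about how Smithy or the internal AWS frameworks behave.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Constraints:
    """Hypothetical constraint set, loosely modelled on length/range/pattern."""
    min_length: Optional[int] = None
    max_length: Optional[int] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    pattern: Optional[str] = None


def validate(value: Union[str, int, float], c: Constraints) -> List[str]:
    """Return a list of violation messages; an empty list means the value is valid."""
    errors: List[str] = []
    if isinstance(value, str):
        if c.min_length is not None and len(value) < c.min_length:
            errors.append(f"length {len(value)} is below minimum {c.min_length}")
        if c.max_length is not None and len(value) > c.max_length:
            errors.append(f"length {len(value)} is above maximum {c.max_length}")
        # Assumption: the pattern is searched, so anchors must be written explicitly.
        if c.pattern is not None and re.search(c.pattern, value) is None:
            errors.append(f"value does not match pattern {c.pattern!r}")
    elif isinstance(value, (int, float)):
        if c.min_value is not None and value < c.min_value:
            errors.append(f"value {value} is below minimum {c.min_value}")
        if c.max_value is not None and value > c.max_value:
            errors.append(f"value {value} is above maximum {c.max_value}")
    return errors


city_id_rules = Constraints(min_length=1, max_length=5, pattern=r"^[a-z]+$")
```

Keeping the rules as data rather than hand-written checks is what lets the same constraints be enforced identically in every generated language.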


Thanks for the thorough response! This helps me understand the motivations behind Smithy. I’m going to dig into the project and keep an eye on it as the tooling develops.


The history and origins help frame the Smithy use case for others and where it may not apply. Thanks for sharing!


> Smithy's resource modeling helps us here too because it allows AWS service teams, as they adopt Smithy, to essentially automatically support CloudFormation resource schemas.

Hallelujah!


Does Smithy support describing OAuth?


There's not a trait for it yet, but Smithy is designed to be auth agnostic so new auth traits can be added by anyone. There's a meta-trait called authDefinition[0] that you can apply to your own trait to indicate that it's an auth trait. With that your trait would show up anywhere else auth traits are found in the Smithy tooling. We're designing the code generators to be extensible enough that you could then fairly easily implement just the necessary bits.

[0]: https://awslabs.github.io/smithy/1.0/spec/core/auth-traits.h...


Pardon my ignorance: once one gets to:

https://awslabs.github.io/smithy/quickstart.html#next-steps

well, what is it? What was autogenerated? I know JAVA from previous jobs, but not Gradle and whatever the Smithy generated artifacts are to be composed with is unclear to me.


It's not really a fully finished project yet, so not much. We shipped the AWS SDK for JS v3 with Smithy, the AWS SDK for Go v2 with Smithy, and just launched an alpha of the AWS SDK for Rust using Smithy. More are in the works. We're currently iterating on their code generators to make them easier to use outside the AWS SDKs. AWS SDKs are being built in a layered approach where there's a generic code generator that's really extensible, and then the AWS SDKs extend it to add AWS-specific stuff like regions and credential handling.

We're working to get projects like these to GA: https://github.com/awslabs/smithy-typescript, https://github.com/aws/smithy-go, and https://github.com/awslabs/smithy-rs. And we're also working on service code generation.
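
The layered approach described above (a generic, extensible generator with AWS-specific additions on top) might be pictured like this. It is a toy Python sketch: `ClientGenerator`, `AwsClientGenerator`, and the `constructor_params` hook are invented for illustration and do not reflect the actual smithy-typescript, smithy-go, or smithy-rs designs.

```python
import re
from typing import List


def snake_case(name: str) -> str:
    """Convert an operation name such as GetCity to get_city."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


class ClientGenerator:
    """Generic layer: knows only about a service name and its operations."""

    def __init__(self, service: str, operations: List[str]) -> None:
        self.service = service
        self.operations = operations

    def constructor_params(self) -> List[str]:
        # Extension hook: subclasses may add their own client settings.
        return ["endpoint"]

    def render(self) -> str:
        params = ", ".join(self.constructor_params())
        lines = [
            f"class {self.service}Client:",
            f"    def __init__(self, {params}): ...",
        ]
        for op in self.operations:
            lines.append(f"    def {snake_case(op)}(self, request): ...")
        return "\n".join(lines)


class AwsClientGenerator(ClientGenerator):
    """Vendor layer: reuses the generic output and adds region/credentials."""

    def constructor_params(self) -> List[str]:
        return super().constructor_params() + ["region", "credentials"]


generic_code = ClientGenerator("Weather", ["GetCity"]).render()
aws_code = AwsClientGenerator("Weather", ["GetCity"]).render()
```

The generic layer never mentions regions or credentials, so it stays usable for services that have nothing to do with AWS.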


I gather the impact of Smithy is to generate something like old-style CORBA client/server stubs and vocabulary types for a given serialization and communication schema (HTTP2, gRPC) in a target language? Is Smithy specific to interoperating with AWS services, or can it be used in pretty much any distributed system?


Pretty much, but Smithy can be used for anything and isn’t specific to AWS. The AWS modeling support in Smithy is all through extensions that aren’t part of the core.

It’s also protocol agnostic and can be used in a lot of applications (HTTP, MQTT, and we are even experimenting using Smithy to generate C ABI bindings for non client server stuff).


Python?


Python with interoperability with FastAPI and Pydantic models would be fantastic


I understand that codegen is not yet "mature"; nevertheless, it would be great to have some insight into how code generation is intended to be done. It would be really cool if there was a "simple" codegen example using a nice templating engine.

Any timeline for "basic" codegen availability (C++, JS, Go, Rust, etc.)? Will it be weeks, months, years?

Also an example for serialization would be nice. How would one define a service that can be interacted with via multiple transports (HTTP/MQTT/UDP in one description, for example)? My current guess would be that you have to establish traits specifying the details, or is there some equivalent to a Franca IDL deployment description?


Out of curiosity, and to test what I learned from CocoR (https://ssw.jku.at/Research/Projects/Coco/), I created a parser for Smithy that anyone can see/use here: https://github.com/awslabs/smithy/issues/793. It also includes the ABNF IDL grammar transformed to an EBNF accepted by https://www.bottlecaps.de/rr/ui


Starting a project like this from scratch must have been a bold move for the team involved. I imagine that, in a pitch, the outcome of this tool seemed too good to be true and they actually delivered it. Well done!


Smithy, Protoforce, Taxi - is the assumption synchronous request/response? OpenAPI has callbacks (https://swagger.io/docs/specification/callbacks/), I am not seeing that in these others.

The other thing I am not seeing is a service registry, eg something like UDDI. How do microservices know about each other? Is that a build over common IDL and a coordinated deploy?


Smithy has something called event streams that send async datagrams: https://awslabs.github.io/smithy/1.0/spec/core/stream-traits....

This is currently used in Amazon S3, Kinesis, Transcribe, and other services.

Smithy doesn’t have a service registry today. However, models can be vended and shared via Maven. Client codegen was designed explicitly to not require coordinated releases of clients and servers (that’s impossible for AWS SDKs).


Thank you. Some new vocabulary with Smithy: Prelude, Shapes, and Traits (https://awslabs.github.io/smithy/1.0/spec/core/model.html ). With all the evolution, the lineage story would find an audience, I bet.


It's not exactly what you're after, but Smithy has waiters: https://awslabs.github.io/smithy/1.0/spec/waiters.html

> Waiters are a client-side abstraction used to poll a resource until a desired state is reached, or until it is determined that the resource will never enter into the desired state.

I haven't seen them put behind an async interface, but it seems like a good match for a lot of StartWork, DescribeWork(token) patterns.
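
For readers unfamiliar with the pattern, the polling loop behind a waiter can be sketched in a few lines of Python. This is a generic illustration of the quoted description, not Smithy-generated code; `wait_until` and its parameters are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(
    describe: Callable[[], T],
    is_success: Callable[[T], bool],
    is_failure: Callable[[T], bool],
    max_attempts: int = 10,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Poll describe() until a success state, a terminal failure, or exhaustion."""
    for attempt in range(1, max_attempts + 1):
        state = describe()
        if is_success(state):
            return state
        if is_failure(state):
            # The resource can never reach the desired state, so stop early.
            raise RuntimeError(f"resource entered terminal state {state!r}")
        if attempt < max_attempts:
            sleep(delay_seconds)
    raise TimeoutError(f"gave up after {max_attempts} attempts")


# Stand-in for a DescribeWork(token) call that eventually reports completion.
states = iter(["PENDING", "RUNNING", "DONE"])
result = wait_until(
    describe=lambda: next(states),
    is_success=lambda s: s == "DONE",
    is_failure=lambda s: s == "FAILED",
    delay_seconds=0,
)
```

Passing `sleep` as a parameter keeps the loop easy to test and leaves room for swapping in an async or backoff-based delay.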


Taxi+Vyne gives you a complete service registry, so both users and systems can discover services and data.

Because Taxi lets you describe how data from services relate, Vyne can work out how to connect services together automatically, and handle the integration for you. This is a realisation of the UDDI concept, where systems can autonomously work out how to operate with each other.


Thank you. Your recent blog post (https://blog.vyne.co/rethinking-api-consumer-patterns/ ) was a positively interesting read. I wonder if there's a query analyzer a ways down the road ...


For protoforce, we provide completely asynchronous server and client SDKs for Scala and Node.js; we also support WebSockets as a transport.

Java sdk is built on Futures, so somehow it's async as well.

Also we support server-to-client calls, which, effectively, are a better alternative to callbacks.


Is this supposed to generate the language bindings too from the Smithy schema? How is that done? Can't seem to find info on that in the Guides or examples.


I wonder if Google is going to extend this https://fuchsia.dev/fuchsia-src/development/languages/fidl for other consoles...


"The primary difference between Smithy and OpenAPI is that Smithy is protocol-agnostic, allowing Smithy to describe a broader range of services, metadata, and capabilities. Smithy can be used alongside OpenAPI by converting Smithy models to OpenAPI."


Smithy structure definitions don't have field indexes like in protobuf or thrift. It also has required fields, where proto3 intentionally did away with them.

How do these differences impact backwards-compatibility-safety of Smithy schema changes?


Smithy today doesn't support any serialization formats that require fixed ordering. We do recommend that any additional members added to structures are added at the end to help with C++ and Rust codegen though.

That said, traits can be used in Smithy to enforce constraints on structures, so if you ever needed explicit indexing like that, it could be done via traits and protocols (Smithy's nomenclature for describing how clients and servers communicate). In fact, protocols are defined by traits, and traits can enforce requirements on the rest of the model using a DSL called selectors... Probably way too much info other than -- it's possible and easy to support this in Smithy if it's ever needed.

As for required vs optional -- today it's treated as server-side validation only and not used in client codegen. This allows service teams to remove the required trait from members if something ends up needing to be optional in future without breaking clients. We're working on some ideas too to see if we can generate even better code for SDKs in languages like Rust where optionality is very explicit, but without sacrificing the ability of being able to remove the required trait.

And, in general, backward compatibility issues are caught with Smithy diff, which also supports custom rules: https://github.com/awslabs/smithy/tree/main/smithy-diff


What's the difference between the bracketed syntax, e.g. [City] and `list`?


Smithy is very focused on codegen, so the model is highly normalized. So for example, defining a list of something needs to be done using a `list` shape. This kind of list is something you'd see directly serialized and sent over the wire. For example:

list Messages { member: Message }

Then you can reference Messages from other places in the model, like from a structure:

structure Something { messages: Messages }

In contrast, the `[City]` syntax is used in other places in the IDL to define a relationship to a shape. This isn't something that gets sent over the wire, it's just used to form essentially a relationship in the service graph from a service to resources, a resource to operations, an operation to errors, etc. For example:

service Weather { resources: [City, Sensors] }


I think, from reading around the examples/spec a bit, collections of items use brackets to bound the collection. The examples are collections of a single item though, so it's slightly confusing at first glance. One hint at that is the keys defining those collections seems to be pluralized in all the cases I've seen so far.


Couldn't find a way to generate models for Python; not sure if that is because it doesn't exist or I just can't find it. The codegen part seems a bit under-documented.


No Python codegen yet. Our plan is basically that as we migrate AWS SDKs to Smithy, we'll also offer generic client generators too. The Python migration hasn't started yet.


Is this how the Python `aws` CLI and boto is implemented?


Kind of. The AWS CLI uses an Amazon internal modeling format used to define services that's based on another Amazon internal modeling format that has been in use for about 15 years (and it's based on another internal model etc..). Smithy is basically the open source v2 of both, but with a public spec and tooling. Eventually all the AWS SDKs and the AWS CLI will adopt Smithy. (I work on the AWS SDKs and created Smithy)


Are there examples of utilizing Smithy with WebSockets as the transport? I found the documentation of MQTT bindings and the general information about event streams. However, I'm struggling to map it all together. I imagine there will need to exist WebSocket bindings?


We haven't built a WebSockets based protocol yet. And yeah, without actual server-side support, it is a little meta right now. We're working on it and hope to roll out a few languages this year.


Pretty much all APIs at Amazon, internal and external, are either defined using this or its precursor, and then per-language clients and server stub implementations are autogenerated based on the model. That’s true of boto3. Not sure how much of the CLI is autogenerated, but the CLI uses boto(core) under the hood so it’s involved one way or the other.


what ever did happen to wsdl?

ducks


Why duck? WSDL still beats the crap out of most newer would-be contract definition standards. OpenAPI is ok, but the tooling all comes from the same place.


soap was a really good interoperability protocol for adding kilobytes of overhead to simple text based rest rpc where interoperability was defined as being able to interoperate with clients and servers from the exact same software environment.

trying to get perl to talk to windows or windows to talk to java or java to talk to perl never really worked at all.

but wsdl was like this tempting thing. self describing services! (that only worked when you had a full definition of the service on the client side anyhow)

i'd take a wild bet that this stuff at amazon grew out of frustration with soap a long time ago...

edit: read new replies to thread. this is new as of 2018. soap was circa 2002. ah, well.


Apparently this is V2 or even v3 of something internal. So probably V1 did what you were saying :-)


Yup. Smithy is from a lineage of internal tools that have been in use at Amazon since the early 2000s. From my dive into software archaeology at Amazon (I work there), there was a bit of SOAP in use in the 2000s, and some other internal model formats that are now obsolete claimed to be SOAP like. As the years went on, other internal formats came out to replace other formats, until the internal model format became something distantly inspired by SOAP, but very practical and tuned for cross language code generation so it could power AWS (that’s my take at least). That was in use for well over a decade before we built Smithy to improve on and open source the internal format.


IIRC SOAP helped give the browser world XmlHttpRequest which eventually helped kickstart the whole AJAX web 2.0 / modern webapp world we love today. So in some ways it was an important stepping stone to get there, just a dead end from the general demise of XML tooling.


JSON


...and this thing looks like it scrapes out all the complexity and leaves the good bits that wsdl was supposed to be...


It is the good thing



