I guess I only know transformers, i.e. how BERT or GPT work, where there is a hard limit on the context length. With GPT you can certainly generate an unlimited number of tokens, but any earlier tokens beyond the maximum context length fall out of the context window. LLaMA has a 2k context, GPT-4 has 32k.
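Concretely, the sliding window works like this (a toy sketch, not any particular model's implementation; the 2048-token window is just an example):

```python
# Toy illustration of a fixed context window: as generation proceeds,
# tokens older than the window simply fall out of the model's view.

def visible_context(tokens, context_length=2048):
    """Return the slice of the token history the model can still attend to."""
    return tokens[-context_length:]

history = list(range(3000))          # pretend these are 3000 token ids
window = visible_context(history)    # only the last 2048 remain visible
assert window[0] == 952              # tokens 0..951 have been forgotten
```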
Are you saying I can give unlimited tokens to PaLM and have it generate an unlimited number of tokens? So PaLM doesn't have a context limit?
No, I am not saying that. Since PaLM 2 is a transformer model (they disclosed almost nothing about the architecture, but they did disclose that), it has a context length limit. What I am saying is that you can't infer that limit from the cap on the maxOutputTokens parameter in the API.
However, I couldn't find the context length of their model documented anywhere, and the API doesn't say how long the prompt can be.
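For what it's worth, one way around the missing documentation is to probe the limit empirically. Here's a minimal sketch, assuming you have some client call to the model (`call_model` is a placeholder, not a real PaLM client function) and that the API raises an error when the prompt is too long:

```python
# Hedged sketch: binary-search the largest prompt the API will accept,
# since the docs don't state the context length. `call_model` is a
# placeholder for your actual client call (e.g. a PaLM text request with
# max_output_tokens set to 1); the function name and error behavior are
# assumptions, not documented PaLM API facts.

def find_max_prompt_size(call_model, low=1, high=65536):
    """Return the largest number of filler words the API accepts as a prompt."""
    filler = "hello "  # roughly one token per repetition in most tokenizers
    while low < high:
        mid = (low + high + 1) // 2
        try:
            call_model(prompt=filler * mid)   # succeeds if the prompt fits
            low = mid                         # mid fits; try longer prompts
        except Exception:                     # assume an error on overflow
            high = mid - 1                    # too long; try shorter prompts
    return low
```

Note this would only give you the prompt limit with the output capped at one token; the total context budget is shared between prompt and output, which is also why maxOutputTokens alone tells you nothing about the context length.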