Here's the same scam from his stream 3 days ago: https://www.twitch.tv/videos/668914814?t=5h4m20s Abandoned Toyota Corolla on the south border of Texas with blood, _exactly_ 22 pounds of cocaine, cash. Multiple bank accounts and addresses registered with your SSN, funds wired out of the US.
I thought the entire point of the cloud was to make stuff like this not cause your site to go down, or did someone realize that only had to be part of the marketing?
True, but then the question is whether it's cheaper to build your own redundancy or to duplicate all/most of your infrastructure across multiple zones. Once that becomes the question, the cloud no longer looks so simple or cost-effective.
On the <|endoftext|> token: GPT-2 and this model were trained by sampling fixed-length segments of text from a set of web pages. If a sample happens to start near the end of one page, it fills the rest of its length with the beginning of another page, and the model learns to do the same. TalkToTransformer.com hides this by not showing anything that comes after the <|endoftext|> token.
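Here's a rough illustration of that sampling setup (characters standing in for BPE tokens, made-up page text), just to show why a segment that starts near the end of one page naturally contains <|endoftext|> followed by the start of another:

```python
# Rough illustration (characters instead of BPE tokens, made-up pages) of how
# fixed-length training segments end up spanning two documents, which is what
# the model learns to imitate after <|endoftext|>.
EOT = "<|endoftext|>"
pages = ["First web page text goes here.", "Second page starts differently.", "Third page."]
stream = EOT.join(pages)                 # pages concatenated into one long stream

block = 40                               # fixed segment length for training samples
segments = [stream[i:i + block] for i in range(0, len(stream), block)]

# A segment that starts near the end of one page contains <|endoftext|>
# followed by the beginning of the next page, so the model learns to start
# a fresh document after that token.
for s in segments:
    print(repr(s))
```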
>How exactly the large GPT-2 models are deployed is a mystery I really wish was open-sourced more.
TalkToTransformer.com uses preemptible P4 GPUs on Google Kubernetes Engine. Changing the number of workers and automatically restarting them when they're preempted is easy with Kubernetes.
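For what it's worth, here's a minimal sketch of what scaling those workers can look like with the official `kubernetes` Python client; the Deployment name and namespace are made up, and Kubernetes itself reschedules pods when a preemptible node is reclaimed:

```python
# A minimal sketch, not the site's actual setup: scaling a worker Deployment
# with the official `kubernetes` Python client. The Deployment name and
# namespace are hypothetical. Kubernetes handles restarting pods when a
# preemptible node is reclaimed; you only change the replica count.
from kubernetes import client, config

config.load_kube_config()                      # or load_incluster_config() in-cluster
apps = client.AppsV1Api()

apps.patch_namespaced_deployment_scale(
    name="gpt2-worker",                        # hypothetical Deployment name
    namespace="default",
    body={"spec": {"replicas": 4}},            # desired number of GPU workers
)
```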
To provide outputs incrementally rather than waiting for the entire sequence to be generated, I open a websocket to a worker and have it generate a few tokens at a time, sending the output back as it goes. GPT-2 tokens can end partway through a multi-byte character, so to make this work you need to send the raw UTF-8 bytes to the browser and have it concatenate them _before_ decoding the string.
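A tiny self-contained example (not the site's actual code) of why the bytes have to be concatenated before decoding: a multi-byte character split across two streamed chunks can't be decoded chunk-by-chunk:

```python
# Not the site's actual code, just the failure mode: a UTF-8 multi-byte
# character split across two streamed chunks can't be decoded chunk-by-chunk,
# but decodes fine once the raw bytes are concatenated.
text = "café"                          # 'é' is two bytes in UTF-8
data = text.encode("utf-8")            # b'caf\xc3\xa9'

chunk1, chunk2 = data[:4], data[4:]    # the split lands mid-character

try:
    chunk1.decode("utf-8")             # fails: chunk ends with half a character
except UnicodeDecodeError as e:
    print("partial chunk fails:", e)

print((chunk1 + chunk2).decode("utf-8"))   # concatenate first, then decode: café
```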
While my workers can batch requests from multiple users, the modest increase in performance is probably not worth the complexity in most cases.
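If you're curious what that complexity looks like, here's a rough asyncio sketch of the idea; `generate_batch` is a made-up stand-in for the real model call:

```python
# Rough sketch (hypothetical names; `generate_batch` stands in for the real
# model call) of batching concurrent requests: a single worker drains a queue
# and runs whatever prompts are waiting as one batch.
import asyncio

def generate_batch(prompts):
    # Stand-in for running one forward pass of the model over a padded batch.
    return [p + " ... generated text" for p in prompts]

async def worker(queue: asyncio.Queue, max_batch: int = 8):
    while True:
        batch = [await queue.get()]                 # wait for at least one request
        while not queue.empty() and len(batch) < max_batch:
            batch.append(queue.get_nowait())        # grab anything else waiting
        outputs = generate_batch([p for p, _ in batch])
        for (_, fut), out in zip(batch, outputs):
            fut.set_result(out)                     # resolve each caller's future

async def handle_request(queue: asyncio.Queue, prompt: str) -> str:
    fut = asyncio.get_running_loop().create_future()
    await queue.put((prompt, fut))
    return await fut

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    asyncio.create_task(worker(queue))
    results = await asyncio.gather(*(handle_request(queue, f"prompt {i}") for i in range(20)))
    print(results[:3])

asyncio.run(main())
```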
Thanks for the upvotes! The site is running quite a bit slower than planned, but I think I know why. I should be able to get it running at full speed sometime tomorrow.
You seem to be saying this work is based on convolutional neural networks. That's incorrect. It uses the same attention mechanisms from natural language processing, which involve no convolution operations.
Convolutions have a different set of weights for each position offset (with a fixed window size), and reuse those weights across the entire input space.
Transformer-based networks like this one compute attention functions between the current position's encoding and every previous position's, then use the outputs to compute a weighted sum of the encodings at those positions. Hence they can look at an arbitrarily large window, and the number of parameters they have is independent of the size of that window.
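A toy single-head causal self-attention in numpy makes that concrete: the only parameters are the three projection matrices, which stay the same size no matter how long the sequence gets:

```python
# Toy single-head causal self-attention in numpy. The parameters are just the
# three d_model x d_model projection matrices, so the parameter count is
# independent of seq_len; the same weights score every pair of positions.
import numpy as np

d_model, seq_len = 16, 50                      # seq_len can grow freely
rng = np.random.default_rng(0)

W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

x = rng.normal(size=(seq_len, d_model))        # encoding of each position
Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / np.sqrt(d_model)            # attention between every pair
future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[future] = -np.inf                       # causal: only look at the past

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True) # softmax over previous positions

out = weights @ V                              # weighted sum of value encodings
print(out.shape)                               # (50, 16)
```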
It seems that using "fixed attention" for text would encourage the network to periodically summarize the context so far and put it in that fixed column for the rows below to access.
Maybe the reason "strided attention" didn't work as well is that it would require the network to put this context summary in every column lest the rows below be unable to access it. That would waste features since the summary wouldn't vary much over time but would still be stored in full at each step.
If this is true, the approach they used for images might actually be inefficient in a similar way.
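For reference, here's my rough reconstruction of the two attention patterns being discussed (the details may differ slightly from the Sparse Transformer paper's exact masks); you can see the fixed "summary" columns that every later row can reach, versus the strided pattern:

```python
# Rough reconstruction (details may differ from the paper) of strided vs fixed
# sparse attention masks. 'x' = position i may attend to position j.
import numpy as np

n, l, c = 16, 4, 1                     # sequence length, block size, fixed columns

def strided_mask(n, l):
    m = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1):
            # previous l positions, plus every l-th position further back
            m[i, j] = (i - j < l) or ((i - j) % l == 0)
    return m

def fixed_mask(n, l, c):
    m = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1):
            # current block, plus the last c columns of every earlier block
            # (the "fixed" summary columns every later row can reach)
            m[i, j] = (j // l == i // l) or (j % l >= l - c)
    return m

for name, m in [("strided", strided_mask(n, l)), ("fixed", fixed_mask(n, l, c))]:
    print(name)
    print("\n".join("".join("x" if v else "." for v in row) for row in m))
```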
That line refers to training the model from scratch. You can still run the trained model very quickly with one "cheap" GPU.
That said, I'm not sure why one wouldn't get a similar result training on the EC2 or GCE instances that have 8 V100s. Or even training with fewer GPUs but accumulating gradients to get the same batch size.
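Gradient accumulation is straightforward in practice; here's a minimal PyTorch sketch with a made-up model and random data, where several micro-batch backward passes feed a single optimizer step:

```python
# Minimal PyTorch sketch of gradient accumulation, with a made-up model and
# random data: run several small forward/backward passes, then take one
# optimizer step, so the effective batch size matches a bigger-GPU setup.
import torch

model = torch.nn.Linear(128, 2)                    # stand-in for the real model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

accum_steps = 8                                    # micro-batches per update

opt.zero_grad()
for step in range(64):
    x = torch.randn(4, 128)                        # small micro-batch
    y = torch.randint(0, 2, (4,))
    loss = loss_fn(model(x), y) / accum_steps      # scale so gradients average
    loss.backward()                                # grads accumulate in .grad
    if (step + 1) % accum_steps == 0:
        opt.step()                                 # one update per accum_steps
        opt.zero_grad()
```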
For all the 3D diagrams I made (including the animated one at the end), I wrote code using https://threejs.org/ and my own custom library. It worked, but with a lot of hassle. In the future I'll likely try Blender instead.
I've no idea what Nvidia uses, but you could do this pretty easily in Blender with the Freestyle NPR renderer and the built-in "import images as planes" add-on.