Here’s my story about hosting Reflection 70B on @hyperbolic_labs:

On Sep 3, Matt Shumer reached out to us, saying he wanted to release a 70B LLM that should be the top OSS model (far ahead of 405B), and asked if we were interested in hosting it. At the time, I thought it was a fine-tuned model that surpassed 405B in certain areas like writing, and since we always want people to have easy access to open-source models, we agreed to host it.

Two days later, on the morning of Sep 5, Matt made the announcement and claimed the model outperformed closed-source models across several benchmarks. He uploaded the first version to Hugging Face. We downloaded and tested the model, but I didn’t see the <thinking> tags featured in his demo, so I messaged him on X to let him know. Later, I saw his tweet saying there was an issue with the tokenizer in the Hugging Face repo (x.com/mattshumer_/status/183…), so we patiently waited.

I woke up at 6 AM PST on Sep 6 and found a DM sent around 3 AM PST from Sahil Chaudhary, founder of Glaive AI. He told me the Reflection-70B weights had been reuploaded and were ready for deployment. I didn’t know him before, and that was the only message I received from him. At around 6:30 AM, I was added to a Slack channel with Matt to help streamline communication. I focused on deploying the model, and around 9 AM our API was live. Our tests showed the <thinking> and <reflection> tags were finally appearing as expected, so we announced that.

After we released the model, a few people commented that our API performed worse than Matt’s internal demo website (though I kept seeing error codes on their website, so I couldn’t compare the results), so we dug into everything to make sure it wasn’t a problem on our side. At 7 PM, Matt posted in the Slack channel, saying that with our API, “definitely something's a little off,” and asked if we could expose a completions endpoint so he could manually build prompts to diagnose the issue. I set that up for him within the hour.

There was no response from Matt until the following night, when he told us they were focusing on a retrain, which quite surprised me.

On Sunday morning, Sep 8, Matt told us they would upload the retrained weights to HF later that day and asked if we could host them once they were ready. I said yes and waited for the new models to be uploaded. Several hours later, someone on X pointed out that the ref_70_e3 model had been uploaded to HF, so I asked Matt if that was the one. He said it should be, and a while later he asked us to host it, so I quickly did. I notified @ArtificialAnlys and got on a call with their co-founder George in the afternoon. He told me the benchmarking results were not good, much worse than what they had measured through the internal API, and they later posted the results: x.com/ArtificialAnlys/status….

Matt told us that day that they had hosted the “OG weights” themselves and could give us access if we wanted to host them. I replied, “We will wait for the open-source one since we only host open-source models.” Since then, I’ve asked Matt several times when they plan to release the initial weights, but I haven’t received any response.

Over 30 hours have passed, and at this point I believe we should take down the Reflection API and allocate our GPUs to more useful models, once some people (@ikristoph) finish their benchmarking (not sure if it's still useful). This whole episode took an emotional toll on me because we spent so much time and energy on it, so I tweeted about what my face looked like over the weekend.
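(For anyone curious why a raw completions endpoint helps with debugging: it bypasses the server-side chat template, so you can hand-build the exact prompt string the model sees and check whether the special tags come back. Below is a rough sketch assuming an OpenAI-compatible /v1/completions route and a Llama-3-style template; the URL, system prompt, and parameters are illustrative assumptions, not our exact setup.)

```python
# Minimal sketch: query a raw completions endpoint with a hand-built prompt,
# so nothing on the server rewrites the template. All names below are
# placeholders, not our production configuration.
import requests

SYSTEM = (
    "You are a world-class AI system, capable of complex reasoning and "
    "reflection. Reason inside <thinking> tags, and correct any mistakes "
    "inside <reflection> tags."  # assumed system prompt for illustration
)

# Hand-built Llama-3-style chat template (assumed; check the model card).
prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    f"{SYSTEM}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "What is 9.11 minus 9.9?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

resp = requests.post(
    "https://api.example.com/v1/completions",  # placeholder endpoint
    json={
        "model": "mattshumer/Reflection-Llama-3.1-70B",
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 0.7,
    },
    timeout=60,
)
# If the deployment is healthy, the output should open with <thinking>...
print(resp.json()["choices"][0]["text"])
```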
But after Reflecting, I don’t regret hosting it. It helped the community identify the issues more quickly. I don’t want to guess what might have happened, but I think the key reflection is: Attention is not all you need.

Sep 10, 2024 · 10:05 PM UTC

I feel for you. Nothing worse than seeing honest contributors burned because of their trust. On the plus side, you can give DeepSeek 2.5 a try; it's a real model, at least!
Yes, DeepSeek 2.5 is in our plan! It's unfortunate that their release coincided with Reflection 70B.
...but after REFLECTING...
There was no problem on your side; you managed everything well, and the community knows it. Sleep like a baby tonight 😅.
Thank you! Finally I can! No more waiting for a newer Hugging Face model.
Thank you for the clarification, though it wasn't necessary; everything was OK on your side. Your stance, "Let's host this model and serve the community," deserves respect. The only sad thing is the burned GPU hours. It would be nice if @ArtificialAnlys did a similar clarification.
Why doesn't he just fess up atp? The jig is up.
he apologized