all 62 comments

[–]heyoniteglo 52 points53 points  (5 children)

This is a good reminder for all the new people showing up and maybe even a reminder to some of us who have been around a while. Sometimes old faithful is all you need. Sometimes it's all you need to fall back on. Thanks for your thoughts.

[–]utf16 12 points13 points  (4 children)

I was listening to a new AI engineer talk about how he was having difficulty getting ChatGPT to call a tool consistently at specific times. I then taught him the wonders of cron, and he was amazed.

Sometimes, all you need are simple tools.
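For anyone in the same boat, the whole thing fits in one crontab entry. A minimal sketch (the script path is a made-up example; point it at whatever actually calls your tool):

```shell
# Schedule a (hypothetical) tool-runner script with plain cron.
# Field order: minute hour day-of-month month day-of-week command.
ENTRY='0 9 * * * /usr/local/bin/run_agent_tool.sh'   # fires every day at 09:00

# Install it non-interactively, preserving any existing jobs:
( crontab -l 2>/dev/null; echo "$ENTRY" ) | crontab -

# Confirm it's there:
crontab -l | grep run_agent_tool.sh
```

No agent framework, no polling loop, no scheduler service to babysit.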

[–]beerpancakes1923 3 points4 points  (1 child)

This can’t be real 🤦🏻‍♂️

[–]IpppyCaccy 3 points4 points  (0 children)

We all have our blind spots. Also, sometimes you're so focused on a problem that you can't see the simple solution. This is why explaining it to the Teddy bear often helps.

Another effective technique is to explain your code to someone else. This will often cause you to explain the bug to yourself. Sometimes it takes no more than a few sentences, followed by an embarrassed “Never mind, I see what’s wrong. Sorry to bother you.” This works remarkably well; you can even use non-programmers as listeners. One university computer center kept a teddy bear near the help desk. Students with mysterious bugs were required to explain them to the bear before they could speak to a human counselor. --- Brian Kernighan and Rob Pike, in The Practice of Programming

[–]primordiaI_gloop 3 points4 points  (1 child)

Had a similar interaction just the other day. A guy was stressing over the best way to consistently crawl for new content on a blog—with an RSS feed and its feed link prominently displayed, complete with a nice, large icon at the top of the site. It struck me how some of the newer “devs,” armed with No/Low code tools and LangChain, could easily get confused by this. But hey, more power to them, right? If they can build what they need for their personal use case without any dev experience, I have nothing against that.

Though I couldn’t help but wonder why they didn’t ask GPT for one of the many many ways to achieve these types of goals.

...It’s almost as if people are forgetting that automation, data scraping, cloud functions, process batching, etc., were billion-dollar industries long before LLMs came into the picture.
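To underline the point, the entire "crawler" for a site with an RSS feed can be a handful of standard Unix tools. A rough sketch (the feed URL and file names are placeholders):

```shell
# Poll an RSS feed for new entries with curl, grep, sed, and comm.
# (https://example.com/feed.xml is a placeholder; use the blog's real feed.)
touch seen.txt
curl -s https://example.com/feed.xml \
  | grep -o '<link>[^<]*</link>' \
  | sed 's/<[^>]*>//g' \
  | sort > latest.txt

# Lines in latest.txt but not in seen.txt are new posts:
comm -13 seen.txt latest.txt

mv latest.txt seen.txt
```

Drop that in a cron job and you have a "consistent crawler" without a single line of LangChain.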

[–]ActiveAvailable2782 0 points1 point  (0 children)

Same as how some younger people can type on a screen but not on a physical keyboard.

[–]jollizee 58 points59 points  (3 children)

You left out what I personally consider the most important. Owning the model means you can tinker with it at will, i.e. finetune however you want on anything you want. Privacy is secondary to me.

OpenAI has limited finetuning, as does Gemini. Claude has none. (We're not counting the fact that you don't own your finetune since it runs on their servers and could go poof any time.)

I haven't had the bandwidth to try finetuning yet, but that's something I really want to get into... maybe if more companies like together.ai come up with easy finetuning-for-dummies platforms. I think you can do up to 8x7B MoE finetuning there.

[–]SomeOddCodeGuy[S] 24 points25 points  (2 children)

Oh man, that's a really good point. I can't believe I forgot that.

Yea, finetuning is especially useful for people with automated workflows, or perhaps someone who wants to automate some of their own writing (finetuning on their style).

The ownership also plays in when you don't want to finetune someone else's thing in your style/voice. I'm really not keen on training some website on how to talk and write like me =D

[–]ainz-sama619 13 points14 points  (1 child)

Fine-tuning is probably the biggest feature of open-source imo. Most people don't care about privacy that much, and privacy arguments have never turned off anybody from using a product. Having control over the model is what makes it exciting.

[–]SiEgE-F1 4 points5 points  (0 children)

Some people are not PC-literate enough to understand why the lack of privacy here is bad, or to see that they've been "dumped into the open sea" and that nothing they do will save them. Not only can they not protect themselves even if they knew where the punches were coming from, they are simply unable to see the attack vectors at all. They understand that sharing your bank account password with your cloud password manager is a bad idea, but they're left with nothing but hopes and prayers, closing their eyes to the fact that some bored guy might read through the database once or twice and "note a neat and easy password" to use a few years later, after he gets fired. All for the sake of convenience.

[–]elilev3 23 points24 points  (2 children)

Yeah. New people should realize how much more hopeless the open-source scene felt as recently as 2022. The best option back then was GPT-NeoX... which was hopelessly incoherent, and it still required insane hardware to run despite its low parameter count (there were no 2-bit/4-bit quants back then).

We went from that, to GPT-4 level performance on local hardware. If you showed Llama 3 8b to someone in 2022 they'd probably shit themselves.

[–]Merosian 1 point2 points  (1 child)

Let's not exaggerate; Llama 3 8B is nowhere near GPT-4 in terms of performance. It's the 70B model that's getting compared, and the unquantized version at that. It does not run on local hardware unless you have a lot of money to throw around.

[–]Due-Memory-6957 2 points3 points  (0 children)

I don't think they're comparing Llama 3 8b to GPT-4. GPT-4 was launched in 2023.

[–]Ylsid 7 points8 points  (0 children)

Yes, but there are definitely cases where open-source AI outperforms commercial AI.

[–]jerieljan 4 points5 points  (0 children)

Perfectly said. The only thing I'll add to these is that local models are free for you to use as you see fit as long as your hardware permits you to. You don't have to deal with token / mtok usage, budgeting between models, etc. Just let the model ramble on and on, and let your creativity flow from it.

[–]Eastwindy123 3 points4 points  (0 children)

Also if you're a startup or a business. Using a fine-tuned open source model at scale is still wayyy cheaper than using openai. And there's no reason not to enjoy both. AI is winning in general. There's so many options and you can pick and choose what you like.

[–]lywyu 4 points5 points  (1 child)

Great post, and you're absolutely right. But an "open source isn't going anywhere" mentality can lead to people taking things for granted. The open-source community needs involvement from people now more than ever, especially in AI-related research and development. While Meta is doing great work, it's ultimately still a public company that needs to post profits for its shareholders. I don't expect this free ride to continue for long.

[–]SomeOddCodeGuy[S] 1 point2 points  (0 children)

You're very right about that. I probably should be careful not to push complacency too much.

I don't expect this free ride to continue for long.

Very true. There are times I wish we could at least give donations to them. I don't use open-source models because they're free; I use open source because it's of higher value to me than closed. I'm glad it brings value as-is to companies, but I'd happily throw money at a company for a model, even as a donation.

[–]Red_Redditor_Reddit 20 points21 points  (13 children)

I don't mean to spoil the mood here, but why exactly are you writing like there's been some great defeat here? I'm being serious.

Maybe I'm out of the loop, but all I've seen recently is that GPT-4o thing. The voice thing would work with Llama if it weren't a bunch of Unix pipes all jerry-rigged together. I'm not really sure how the vision works, but I'm sure the output can be repeatedly analyzed or triggered by the LLM.

Like what am I missing here? I was way more impressed by the text to video demo.

[–]a_beautiful_rhind 9 points10 points  (4 children)

There are vision models around. There will be more.

A ton of people came over from singularity and started dunking on us because our <100B models don't have the gimmicks of 200-300B+ corporate models.

So we should just give up because we can't have low latency two way voice; even though most people type to AI rather than talking to it like some dingus.

[–]skrshawk 1 point2 points  (3 children)

Which, given how some of us use our LLMs, is probably a good thing, because another set of people would be trying to see how badly they could get the AI at the Wendy's drive-thru to behave.

As much as we responsible users of AI dislike guardrails and would much rather accept responsibility for our choices, some people will not do that and will make a public nuisance of themselves instead. In that scenario, the guardrails mean not letting them hold up the line.

Or letting kids interact with it. I remember Midnight Miqu got evaluated a while ago and randomly inserted "you horny devil" into a business memo. Funny as it was to us, that kind of thing needs to not happen when it's a five year old talking to it on their iPad.

[–]a_beautiful_rhind 2 points3 points  (2 children)

Kid-friendly LLMs need to be separate. Character.ai tried an "all ages" LLM and it's wrecking their service.

make a public nuisance of themselves

A lot of those people do that with no LLM at all. There are tons of videos of people pranking others, getting violent, etc. in public places. Using some TTS seems like the most benign form of it compared to starting fistfights with the people behind the counter.

randomly inserted "you horny devil" into a business memo

That's the bane of all LLMs. Check the outputs or you'll be sorry. If it's not horny devils, it's hallucinations. Some lawyer got caught citing nonexistent cases.

[–]skrshawk 4 points5 points  (1 child)

Oh, I agree. LLMs for specific sensitive use cases should be different, with the general-use model assuming an adult using it in good faith. Gardening tools don't need to be redesigned to prevent malicious use even though I can whack someone over the head with a shovel.

No disagreement either that people misbehaved in public before LLMs and that this isn't going to change, or that people do ignorant things with technology all the time. I know an older couple who've fallen for the MS tech-support scam three times already and lost over $10k to it. It's easy to say such people should be given a brick phone and have their internet access cut off, but when your doctor's office won't return phone calls and makes people go through their online portal, that's not very realistic.

We can get pretty biased around here because we're on the bleeding edge of tech, especially when we try to forget just how stupid people have always been.

[–]a_beautiful_rhind 0 points1 point  (0 children)

makes people go through their online portal,

I hate the appification of the world and fight it as much as possible. You shouldn't need a smartphone for basic life things. At the base of it, what if you lose it or it breaks?

I enjoy LLMs immensely, but don't want to see them and other models being misused for surveillance or manipulation. That I think is the real danger and not some nebulous AGI who will decide to take over the world.

The saddest thing for me is not that old people fall for scams but that the younger generation lacks computer literacy; they literally have no excuse. Their refusal to learn has negative consequences for the rest of us when ignorance is the majority.

[–]one-joule 3 points4 points  (1 child)

The voice thing would work with llama if it wasn't a bunch of Unix pipes all Jerry rigged together.

Why would it? It's not like you can get GPT-4o levels of conversational latency by tacking on STT and TTS models. You need a multi-modal model that directly understands and emits speech. That requires new training data for the new modality at minimum, as well as optimizing the design to drop latency to almost nothing to reach conversational levels end-to-end. And out of the 250-350ms budget for feeling conversational, you're going to lose 50ms at minimum just shuffling audio around.
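The arithmetic in that budget, spelled out as a quick sketch (the specific numbers come from the comment above, not a spec):

```shell
# Back-of-the-envelope budget for "conversational" voice latency.
budget_ms=300      # midpoint of the 250-350 ms window for feeling conversational
audio_io_ms=50     # minimum lost just shuffling audio around
left=$(( budget_ms - audio_io_ms ))
echo "left for STT + inference + TTS, end to end: ${left} ms"
```

That remainder has to cover every stage of a pipelined STT -> LLM -> TTS setup, which is why a natively multimodal model is the more plausible route.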

[–]smallfried 0 points1 point  (0 children)

You're right, we won't reach those very low latencies without a new base model. But anything below about 2 seconds latency is fine with me, which is doable right now. Reminds me of calling long distance in the past.

[–]AlanCarrOnline 2 points3 points  (2 children)

No great defeat but it's like that Raiders of the Lost Ark thing, where we're getting real good, really really good at the whole sword-fighting thing, and then that motherfucker over there pulled a revolver... and ka-plooey?

So while I'm one of the first to post "Where gguf?", I'm not posting "Where .revolver?" or "Where .9mm?" 'cos I know there ain't gonna be any for a long time.

That could cause a case (or many cases) of the sads?

I just ordered a $3K PC for local AI, which hasn't even arrived yet and I'm already getting people asking "So will your new PC do that talking thing, like my phone?"

Grrr....

[–]TwilightWinterEVE (koboldcpp) 1 point2 points  (1 child)

With an alltalk finetune, your PC can do that talking thing, like their phone. It won't be at the same level because open source TTS just isn't, but you can see how far it's come in the past 6 months so it won't be long.

[–]AlanCarrOnline 0 points1 point  (0 children)

I'm gonna have to list down all the things I wanna install on the beast, and that sounds like maybe one of them?

I have some vague idea what fine-tuning is but not really...

[–]SomeOddCodeGuy[S] 0 points1 point  (0 children)

When I get bored I browse through LocalLlama, and I've noticed a lot of new faces seem a little deflated about open source not being at that level, or are trying to understand why it isn't, etc.

There are a lot of new folks in the sub that are still finding their feet and looking for their personal value of open source. And depending on why they are here, I can absolutely get that. I think Llama3 really put this place on public radar.

But there's no defeat here: that's the whole point. Comparing Open Source to Proprietary is like comparing a pickup to a semi. The semi hauling a huge trailer isn't a defeat for the pickup; the pickup wasn't ever meant to do that. But it'll fit nice and snug into parking decks that the Semi never will, and it can tow a lot of other stuff. Open Source and Proprietary are a bit similar in that regard.

If folks come in expecting to download something that has even 80% of the capability of current proprietary models, they are in for a bad time. But if they come in looking for something stable, secure, and completely private, even if it's a good bit smaller, does less out of the box, and requires more tinkering to get it close? Well, that's what this sub's all about.

[–]MrVodnik 2 points3 points  (0 children)

It can be cheaper. There is no throttling (the famous "you've reached your daily limit", even for paid users on GPT-4). You can edit the model's replies, which allows a quick jailbreak of any model, or just a slight tweak to the conversation when needed. You can use it as a backend for your personal project without fear of being cut off from it for whatever reason (like an FB account ban, or Google killing another project).

And more, depending on your use case. If you need to be convinced about open-source / local AI, then you don't know enough about it yet.
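On the backend point: many local servers (llama.cpp's llama-server, Ollama, and others) expose an OpenAI-compatible HTTP API, so your project just talks to localhost instead of a vendor. A sketch, where the port and model name are assumptions for illustration:

```shell
# Query a local model through an OpenAI-compatible endpoint.
# (Assumes something like llama-server listening on localhost:8080.)
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "local-model",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```

Because the request shape matches the cloud APIs, swapping providers later is a one-line URL change rather than a rewrite.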

[–]windozeFanboi 2 points3 points  (0 children)

I don't know, man. Llama 3 is great at 70B, and Gemma 2, perhaps getting close at 27B, is not too shabby, but the promise of Phi 14B is unmatched in today's landscape...

If Phi 14B delivers according to its (preview) stats, then it's going to do at 14B what Gemma 2 does at 27B, which is close to Llama 3 70B...

And it's not that far-fetched to say that if a 14B Phi-3 can do so much just for text, we might as well get an incredible multimodal Phi-4o at 20B...

Or a Bitnet Phi-4o at 40B...

All of these might be a struggle for most people to run in 2024, but in 2025 it will be better, and in 2026 they will be a breeze.

[–]SiEgE-F1 2 points3 points  (0 children)

Compliance: you can find local models that won't immediately lecture you about animal cruelty when you ask how to kill a Python process. Sometimes you need an answer, not a lecture.

The whole "talking down to you" thing is the worst part. Just drop the connection or block the request. Period. Don't waste my token limits.
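For the record, the literal answer the cloud models dance around is one line of shell (my_script.py is a stand-in name):

```shell
# Find and stop a Python process by script name.
pgrep -af my_script.py      # list matching processes first, so you kill the right one
pkill -f my_script.py       # send SIGTERM to every match
pkill -9 -f my_script.py    # escalate to SIGKILL only if it refuses to die
```

`-f` matches against the full command line, which is what lets you target the script name rather than just `python3`.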

It's always available. There's no maintenance downtime or global outages.

And not just that. The ability of local AI to work in an off-grid environment, with no internet, no "cloud supercomputers", no big company's supervision, and no limit on the number of requests, is just TOO GOOD. People really, really underestimate that.

It's like having a luxury, full-leather, super quiet, super fast car available at the price of a second-hand, roughed-up Volkswagen, or even cheaper, except the car would refuse to go off-road or out of the city because your subscription doesn't support that, and would revoke itself if you tried.

AND we have yet to figure out the best use cases for LLMs, and their actual value. Their being available off-grid should become the very thing that helps us do so.

[–]AlanCarrOnline 5 points6 points  (0 children)

Totally agree with everything, and just feel you missed an important point that I'd like to add, if I may?

That being PLEASE HELP AND WELCOME THE NOOBS.

In fairness, that's what your post is about, but it can be stressed more: this new field is full of jargon, abbreviations, and concepts entirely new to most people. People asking about the hardware they need or how to navigate GitHub need help, not downvotes.

The more people embracing locally-run AI the better, for many, many reasons, from companies wanting the karma to consumer hardware development.

Noobs are friends, not food.

[–]danigoncalves (Llama 3) 1 point2 points  (0 children)

Nice post; I can only subscribe to what's written there. If we share and help each other, great things could happen, and who knows, maybe in the future it's OpenAI who looks at us and sees some features of their "preview" 🙂

[–]SikinAyylmao 1 point2 points  (2 children)

I prefer open models; however, the compute share you get from closed companies is worth it in its own right. Depending on your setup, closed models can be between 2x and 100x faster than your open model simply because of the compute share.

[–]SomeOddCodeGuy[S] 0 points1 point  (1 child)

Oh, I agree. Even though I'm staunchly pro-open-source, I've kept a ChatGPT sub this whole time simply because there are some technical questions that my local models just couldn't answer properly. Realistically, I always will keep one (not necessarily OpenAI, but some proprietary model).

I just treat their models a bit more carefully, only telling it things I'd be fine to post openly on reddit.

[–]TwilightWinterEVE (koboldcpp) 1 point2 points  (0 children)

I'm the same, I keep a Claude sub even though I prefer open source, because there are some things where I really need the accuracy and speed of Claude Opus vs the Q2 70B I can run locally... but like you, I don't like to tell it anything I wouldn't post on Reddit.

[–]Next_Program90 1 point2 points  (0 children)

Very well written. I feel the same way and it's good that you took the time to properly put it into words for everyone who needs to read them.

[–]hashmiabrar1 1 point2 points  (0 children)

Can you guide us toward making local LLMs?

[–]crazyenterpz 1 point2 points  (0 children)

OpenAI could train its model on a bazillion parameters, but it will never be good enough to use in our day-to-day work. For example, it will never know what Lydia from the warehouse ordered last week, especially when Jamal, our vendor manager, has contracted new prices. Yes, this data is in some big and dumb inventory management system, which is painful to use.

Only a homegrown RAG-based system will know that. And for that purpose, smaller Llama or Mistral models are good enough.
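The "homegrown RAG" doesn't even have to be fancy. A sketch of the shape of it, using plain grep over a hypothetical orders.csv export (file, columns, and names are all made up for illustration):

```shell
# Minimal retrieval-augmented prompt: pull matching rows out of a data
# export and paste them into the prompt ahead of the question.
question="What did Lydia from the warehouse order last week?"
context=$(grep -i 'lydia' orders.csv)
printf 'Context:\n%s\n\nQuestion: %s\n' "$context" "$question"
# Pipe the resulting prompt into your local Llama/Mistral endpoint of choice.
```

Real setups swap grep for embedding search, but the pattern is the same: retrieve your private rows, then let a small local model do the reading.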

[–]mikebrave 1 point2 points  (1 child)

Initially I thought this was going to be a list of model recommendations for specific tasks "this one is good at labelling art, this one is good at python code, this one at C# code, this one for math" etc.

[–]SomeOddCodeGuy[S] 0 points1 point  (0 children)

lol I bet that was disappointing. I can't offer much there, but I can give this for coding which I've been using =D

https://prollm.toqan.ai/leaderboard

Also, someone posted this for creative writing

https://www.reddit.com/r/LocalLLaMA/comments/1csj9w8/the_llm_creativity_benchmark_new_leader_4x_faster/

The short answer atm looks like it's "WizardLM-2-8x22b for both"

EDIT: And Deepseek 67b if you use Excel lol

[–]multiverse_fan 3 points4 points  (1 child)

I designed a crotch fan. It was... revolutionary. Thanks, LocalLLaMA 😄

[–]one-joule 5 points6 points  (0 children)

Does it have an off-center weight as part of the rotating mass by any chance?

[–]ab2377 (Llama 8B) 3 points4 points  (0 children)

💯 pin this post damnit.

[–]Lemgon-Ultimate 1 point2 points  (1 child)

There's nothing more dystopian than the thought of a single entity owning almost all AI capability, and that's not going to happen. I'm a bit scared because OpenAI recently stated their antipathy toward open source and their view that AI should be in the hands of a few corporations, which is really evil. I can only hope they change their stance and accept open source. Other than that, I view the new GPT-4o more as a preview of what's possible and what's coming to open source next year.

[–]TheMissingPremise 1 point2 points  (0 children)

I can only hope they change their stance on this and accept open source.

I'll give you another option: abandon that hope and push for regulations that preserve open source (or at least don't create a moat for OpenAI's and other AI companies' investors).

[–]New-Database-7703 0 points1 point  (2 children)

What makes open source better? I get using it locally, which is nice (on my $2500 PC), but other than that, what are the benefits?

[–]beerpancakes1923 1 point2 points  (1 child)

He just explained it

[–]smallfried 0 points1 point  (0 children)

But why male models?

[–]obsoletesatellite -1 points0 points  (0 children)

Open source is cool, but free software is better.

https://www.youtube.com/watch?v=fKUwfFcrVjU