Brian Roemmele · Jan 15, 2024 · 5:22 PM UTC

Brian Roemmele

AI training data. A quagmire. 99% of the training and fine-tuning data used for foundation LLM AI models comes from the internet. I have another system. In my garage I am training an AI model built fundamentally on magazines, newspapers and publications I have rescued from dumpsters. I have ~385,000 (maybe a lot more when I am done), and a majority of them have never been digitized. In fact, I may have the last copies. Most are on microfilm/microfiche. I train on EVERYTHING: written content, images, advertisements and more.

The early results from the models I am testing are absolutely astonishing and vastly unlike any current models. The effect on the ethos this model has is so dramatic that you just may begin to believe it is AGI. But why? See, from the late 1800s to the mid 1960s all of these archives have a narrative that is all but extinct today: a can-do ethos with a do-it-yourself mentality. When I prompt these models there is NOTHING they believe they cannot do. And frankly, with the millions of examples, from building a house to a gas mask, up to the various books and pamphlets that were sold in these magazines (I have about 45,000), there is no practical challenge these models cannot face.

No, you will not get “I am just a large language model and I can’t”; the model will synthesize an answer based on the millions of answers. No, you will not get lectures on dangers with your questions. But it will know you are asking “stupid questions” and have no problem telling you, like your great grandpa would have in his wood shop out back.

This is a slow process for me as I have no investors and it is just me, microfilm and my garage. However, I am debating releasing early versions before I can complete the project. If I do, it will be like all of my open source releases: it will be under an assumed name, not my own.

This is how I build AI models, and it is one answer to the question of why Human Resources at any large AI company freaks out over employees wanting me to lead their projects (you would find those conversations humorous). Either way, I want to say there is something coming your way that will be the sum total of the mentality and ethos that got us to the Moon, in a single LLM AI. It will be yours, on your computer. You and I and everyone will never be the same.

psyv · Jan 15, 2024 · 5:48 PM UTC

psyv

@psyv282j9d

Replying to @BrianRoemmele

Are you training a model(s), fine tuning, or building a RAG database?

Brian Roemmele · Jan 15, 2024 · 5:48 PM UTC

Brian Roemmele

@BrianRoemmele

Thanks for asking. Both, but testing is on a vector database.

AI with Noah · Jan 15, 2024 · 5:41 PM UTC

AI with Noah

@MyLearninCurv

Replying to @BrianRoemmele

It makes a lot of sense, as the training data is the essence of the whole thing. Curious to see how new personalities in the base material affect what models are able to come up with and how they do it.

Yishai · Jan 15, 2024 · 8:06 PM UTC

Yishai

@digitalirony

Replying to @BrianRoemmele

Can you perhaps split up the effort like SETI did back in the day? Deal out microfiche scans to the community? Outsource your garage?

Darby Bailey (McDonough) 🖍️💫 · Jan 16, 2024 · 3:11 AM UTC

Darby Bailey (McDonough) 🖍️💫

@DarbyBaileyXO

Replying to @BrianRoemmele

"extinct today: a can-do ethos with a do-it-yourself mentality." - Brian Roemmele

Raytional · Jan 15, 2024 · 5:32 PM UTC

Raytional

@Raytional

Replying to @BrianRoemmele

Is there a bit of Irish in you?

Sanctus Filius · Jan 15, 2024 · 5:50 PM UTC

Sanctus Filius

@JohnSan46851615

Replying to @BrianRoemmele

That's fantastic! Popular Mechanics, Popular Science, Nat Geo, even the old Scientific American, were filled with optimism and a sense of can-do. In terms of a more varied look at popular culture, every issue of Playboy from 1953 to 2018 (at least) can be found digitized (77.5 GB). They present an amazing time machine viewport on fashion, music, movies, politics, electronics, and so forth. Interestingly, it was a Playboy centerfold that was used for the development of the JPG standard in imaging. theatlantic.com/technology/a…

Kevin Tieman · Jan 15, 2024 · 9:22 PM UTC

Kevin Tieman

@PlutonianGray

Replying to @BrianRoemmele

Like others have said: 1. Need any help? 2. Want more old documents? Like National Geographics from the 1920s? Various other old materials? It'd be nice if you could get almost everything available. 3. This sounds like such an important project ...

William Blankenship · Jan 15, 2024 · 5:42 PM UTC

William Blankenship

@wjblankenship

Replying to @BrianRoemmele

The can-do attitude is infectious. After a few hours of reading Whole Earth, I wanted to build a buckyball greenhouse in my side yard fed by vermicomposting. wholeearth.info Excited to use your AI.

Whole Earth Index

Here lies a nearly-complete archive of Whole Earth publications, a series of journals and magazines descended from the Whole Earth Catalog, published by Stewart Brand and the POINT Foundation between...

Matt Dursh · Jan 15, 2024 · 7:38 PM UTC

Matt Dursh

@MattDursh

Replying to @BrianRoemmele

This is great. The best copy ever written shows up in magazines.