AI training data. A quagmire. 99% of the training and fine-tuning data used for foundation LLM AI models comes from the internet. I have another system. In my garage, I am training an AI model built fundamentally on magazines, newspapers and publications I have rescued from dumpsters. I have ~385,000 of them (maybe a lot more when I am done), and a majority have never been digitized. In fact, I may have the last copies. Most are on microfilm/microfiche. I train on EVERYTHING: written content, images, advertisements and more. The early results from the models I am testing are absolutely astonishing and vastly unlike any current models. The ethos of this model is so dramatic that you just may begin to believe it is AGI. But why? You see, from the late 1800s to the mid-1960s, all of these archives share a narrative that is all but extinct today: a can-do ethos with a do-it-yourself mentality. When I prompt these models, there is NOTHING they believe they cannot do. And frankly, with millions of examples, from building a house to making a gas mask, up to the various books and pamphlets that were sold in these magazines (I have about 45,000), there is no practical challenge these models cannot face. No, you will not get “I am just a large language model and I can’t”; the model will synthesize an answer based on the millions of answers. No, you will not get lectures on dangers with your questions. But it will know when you are asking “stupid questions” and tell you so, like your great-grandpa would have in his wood shop out back. This is a slow process for me, as I have no investors and it is just me, microfilm and my garage. However, I am debating whether to release early versions before I complete the project. If I do, it will be like all of my open source releases: under an assumed name, not my own.
This is how I build AI models, and it is one answer to the question of why Human Resources at large AI companies freak out when employees want me to lead their projects (you would find those conversations humorous). Either way, I want to say that something is coming your way that will be the sum total of the mentality and ethos that got us to the Moon, in a single LLM AI. It will be yours, on your computer. You and I and everyone will never be the same.

Jan 15, 2024 · 5:22 PM UTC

Replying to @BrianRoemmele
Are you training a model(s), fine tuning, or building a RAG database?
Thanks for asking. Both, but testing is on a vector database.
Replying to @BrianRoemmele
It makes a lot of sense, as the training data is the essence of the whole thing. Curious to see how new personalities in the base material affect what models are able to come up with and how they do it.
Replying to @BrianRoemmele
Can you perhaps split up the effort like SETI did back in the day? Deal out microfiche scans to the community? Outsource your garage?
Replying to @BrianRoemmele
"extinct today: a can-do ethos with a do-it-yourself mentality." - Brian Roemmele
Replying to @BrianRoemmele
Is there a bit of Irish in you?
Replying to @BrianRoemmele
That's fantastic! Popular Mechanics, Popular Science, Nat Geo, even the old Scientific American were filled with optimism and a sense of can-do. For a more varied look at popular culture, every issue of Playboy from 1953 to 2018 (at least) can be found digitized (77.5 GB). They present an amazing time-machine view of fashion, music, movies, politics, electronics and so forth. Interestingly, a Playboy centerfold was used as a test image in the development of the JPEG image standard. theatlantic.com/technology/a…
Replying to @BrianRoemmele
Like others have said:
1. Need any help?
2. Want more old documents? Like National Geographic issues from the 1920s? Various other old materials? It'd be nice if you could get almost everything available.
3. This sounds like such an important project ...
Replying to @BrianRoemmele
The can-do attitude is infectious. After a few hours of reading Whole Earth, I wanted to build a buckyball greenhouse in my side yard fed by vermicomposting. wholeearth.info Excited to use your AI!
Replying to @BrianRoemmele
This is great. The best copy ever written shows up in magazines.