To a degree, my question is: how do you feel about others being able to generate content, especially when it is limited in flexibility and quality?
Also, I'm curious whether you see the real potential market if you flipped the perspective, adopted the tech, and used it to your advantage. Maybe it is layering and backgrounds for composition, maybe it is full-on training to generate content, or maybe it is simply maximizing time by letting the AI rework images.
The typical image generation process most people think about turns a text prompt into an image. It starts from an image of mathematically random noise and turns it into a version of the prompt over a series of steps. There are other methods too. One method takes an image as input, overlays some noise, and then uses this as the baseline to generate an image from. Basically, you can add just a small amount of noise to a blurry or bad image and the AI can render it better. This isn't like photo filters or editing. I would be using this to my advantage. I would also look very carefully at what is hard to generate with AI right now and focus on making stuff that it cannot do well. There is a lot more generated content than I thought before I learned how this works and what AI does poorly.
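To make the "overlay some noise on an input image" idea concrete, here is a toy sketch. It is not how Stable Diffusion is implemented (real pipelines work on latents with a noise schedule); the `add_noise` function, the `strength` parameter, and the square-root blend are assumptions picked only for illustration.

```python
import math
import random

def add_noise(pixels, strength, seed=0):
    """Blend pixel values with Gaussian noise (illustrative only).

    strength=0.0 returns the input unchanged; strength=1.0 returns pure noise.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    rng = random.Random(seed)
    keep = math.sqrt(1.0 - strength)  # how much of the original survives
    mix = math.sqrt(strength)         # how much noise is mixed in
    return [keep * p + mix * rng.gauss(0.0, 1.0) for p in pixels]

image = [0.2, 0.5, 0.9, 0.1]  # hypothetical flattened grayscale pixels
lightly_noised = add_noise(image, strength=0.1)  # img2img-style starting point
fully_noised = add_noise(image, strength=1.0)    # text-to-image-style starting point
```

With a low strength most of the original image survives, so the model only has a little to "repair". With strength 1.0 nothing of the input is left, which is roughly the plain text-to-image case.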
Honestly, may I ask, how do you perceive this?
I have used images to help me learn how training works with AI. It is far easier to see that ass nipples are a mistake than it is to see that poor text training has resulted in a middle-aged woman with excessive hairiness and a passion for gardening now going by the name Harry Potter.
I may have a database of images and trained models that I have used to learn, not your content in particular, and not any particularly good results. I've mostly explored why labias are so bad with Stable Diffusion, and scraped a couple of ftv galleries. I wouldn't call myself a fan of anyone really. I'm certainly not a mark in this space. My real interest is in other AI applications. Posting trained models of people seems too gray area for me. At the same time, this is becoming a super powerful tool that essentially expands exposure and likely attracts the type of person that would pay for more. For example, the recent creation of Open Dream makes it possible to do image layering for complex composition. I'm curious about a content creator's take here.
It isn't too hard to read the way the scripts parse prompts. I haven't gone into much detail when it comes to Stable Diffusion. The GUIs written in Gradio, like Oobabooga for text or Automatic1111, are quite simple Python scripts. If you know the basics of code like variables, functions, and branching, you can likely figure out how the text is parsed. This is the most technically correct way to figure this stuff out. Users tend to share a lot of bad information, especially in the visual arts space, and even more so if they use Windows.
The prompt parsing method is part of the script. If we don't know what software you are using, it is hard to tell you what to do with certainty. I think most are compatible, but I don't know for sure. In the LLM text space, things like characters are parsed differently across various systems.
With Automatic1111, on the txt2img page, there is a small red icon under the image that opens up a menu in the GUI and lists all the LoRAs you have placed in the appropriate folder for LoRAs on the host system where you installed A1111. Most of the LoRAs you download that show up on the txt2img page will have a small circled "i" icon in one corner. This will usually contain a list of the text data that was used to train the LoRA. This text data was associated with each image. These are the keywords that will trigger certain LoRA attributes. When you have this LoRA menu open, if you click on any of the entries, it will automatically add the tag used to set the strength of the LoRA's influence on the prompt. This defaults to 1.0, but in my experience that is always too high. Most of the time 0.2-0.7 works okay. You also need the main keyword used to trigger the LoRA added somewhere in the prompt. This can be difficult to find unless you keep this information from the place you downloaded the LoRA from. Personally, I rename all of my LoRAs to whatever the keyword is. Also, you're likely going to get a lot of LoRAs eventually. Get in the habit of putting an image representative of what each LoRA does in the LoRAs folder. The image should be named the same as the LoRA itself. A1111 will automatically add this image to each entry you see in the GUI menu. LoRAs are not hard to train either. Try it some time. If you can generate images, you can train LoRAs.
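The tag that gets added when you click an entry looks like `<lora:name:weight>`. As a rough sketch, this is how you could build that tag yourself and check that the trigger keyword is also in the prompt. This is not A1111's own parser; the helper names, the `inkwash` LoRA, and the `triggers` dictionary are all made up for illustration.

```python
import re

# Matches tags of the form <lora:name:weight>.
LORA_TAG = re.compile(r"<lora:([^:>]+):([0-9]*\.?[0-9]+)>")

def lora_tag(name, weight=0.6):
    """Build a LoRA tag; 0.6 is just a starting point in the 0.2-0.7 range."""
    return f"<lora:{name}:{weight}>"

def find_loras(prompt):
    """Return (name, weight) pairs for every LoRA tag in the prompt."""
    return [(name, float(weight)) for name, weight in LORA_TAG.findall(prompt)]

def missing_triggers(prompt, triggers):
    """List LoRAs whose trigger keyword is absent from the prompt text.

    triggers is a hypothetical dict mapping LoRA name -> trigger keyword.
    Tags are stripped first so a LoRA named after its keyword does not
    count as its own trigger.
    """
    text = LORA_TAG.sub("", prompt).lower()
    return [name for name, _ in find_loras(prompt)
            if name in triggers and triggers[name].lower() not in text]

triggers = {"inkwash": "inkwash"}  # hypothetical LoRA renamed to its keyword
good = "portrait photo, inkwash, " + lora_tag("inkwash", 0.5)
bad = "portrait photo, " + lora_tag("inkwash", 1.0)
```

Here `missing_triggers(bad, triggers)` flags the LoRA because the keyword never appears outside the tag, which is the usual reason a LoRA seems to do nothing.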
It is easy to have too many cooks in the kitchen, but that is an easy problem to solve. Model decay is not a real problem if you understand how an LLM works. Overtraining is like burning a big dinner and ruining a meal. One doesn't stop cooking forever, or burn down the house and quit. You just cook another meal next time. If your model was trained on 100 trillion tokens, you're likely to try your very best to salvage your massive ruined dish, but in the end, it doesn't matter. You can easily tweak the recipe for next time. Models have no persistent memory. Context can be turned into data and used for training, but it is a totally separate thing from the model itself. As an oversimplification, an LLM is just a large database of categories mixed with a massive amount of language data that enables a statistical calculation of what word should come next. Everything else is censoring algorithms and illusions embedded into how humans use language. Really, this is a tool to access culture through language, and in the case of larger models, the culture embedded into many different human languages.
This is as much of a "fad" now as the internet was in the late 90's, and this is on par with that change. LLMs are no fad. This is a tool as disruptive as the public internet. For instance, in 10 years, Google will be a relic of the past. AI will completely replace it. Education will also completely change. It is possible to have entirely individualized education. Psychology will change, as an LLM can be tuned to address and help with many human social issues. This will change everything because it exists in the open source space already.
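The "statistical calculation of what word should come next" can be shown with a toy that is far simpler than any real LLM: count which word follows which, then pick the most common follower. Real models use neural networks over tokens, not word counts; the function names and the tiny corpus here are invented for the example.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, how often each other word follows it."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of word, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
guess = predict_next(model, "the")  # "cat" follows "the" twice, "mat" once
```

Note that nothing is stored between calls except the counts built at training time, which is the "no persistent memory" point in miniature: a bad training run just means rebuilding the counts.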
Today's the first and only time my main account has been useless. There have been minor issues, but nothing like this for me.
Might be the new admin callout for volunteers from a few days ago. If so, I think someone just failed the first day exam. The only way to deal with this is far, far more transparency about ineptitude, and someone who learns extremely quickly.
522, 502, now it is like someone has vindictively blocked my native account completely for not noting every trivial detail. This is the worst experience I've ever had on .world since I joined on June 9th.
They were the best of ~300 while I was trying different checkpoints and LoRAs. The first one was actually done by a Llama2 7B uncensored chat model in Oobabooga. I have no idea what final prompts it generated. This is one of the only ones without major errors, but the outputs were quite interesting. I'll share more of what it generated shortly. I made a NSFW chat character that liked "traveling and exhibitionism." It was a fun mix. It's like playing fax machine porn games, waiting for image outputs with offline text generation.
Why not require a negative prompt leading with child, kid, in the first 10 words? I'm not good at writing a bot, but it might even be possible to intercept images to check that this exists in the metadata before Lemmy wipes it... maybe. It might exclude a few ups, but it would greatly reduce the moderation load.
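A rough sketch of what that check could look like, assuming the bot already has the generation text pulled out of the image. The layout assumed here (prompt, then a `Negative prompt:` line, then a `Steps:` line) is the A1111-style parameters text; other tools may write something different, and reading it from the file is left out. The function names are made up.

```python
import re

def negative_prompt(parameters):
    """Extract the negative prompt from an A1111-style parameters string."""
    collected = []
    in_negative = False
    for line in parameters.splitlines():
        if line.startswith("Negative prompt:"):
            in_negative = True
            collected.append(line[len("Negative prompt:"):])
        elif in_negative and line.startswith("Steps:"):
            break  # settings line marks the end of the negative prompt
        elif in_negative:
            collected.append(line)  # negative prompt continued on a new line
    return " ".join(collected).strip()

def leads_with(parameters, required=("child", "kid"), window=10):
    """True if every required term is in the first `window` words."""
    words = re.findall(r"[a-z0-9']+", negative_prompt(parameters).lower())
    return all(term in words[:window] for term in required)

ok = "a beach at sunset\nNegative prompt: child, kid, blurry\nSteps: 20, Sampler: Euler a"
not_ok = "a beach at sunset\nSteps: 20, Sampler: Euler a"
```

A bot could hold or flag any upload where `leads_with` returns False. It would only catch honest metadata, since the text can be edited or stripped, so it reduces load rather than replacing moderators.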