
  • As I understand it, CLIP (and other text encoders in diffusion models) aren’t trained like LLMs, exactly. They’re trained on image/text pairings, which you get from the metadata creators upload with their photos to Adobe Stock. OpenAI trained CLIP on alt text from scraped images, but I assume Adobe would want to train their own text encoder on the more extensive tags on the stock images it’s already using.
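
    The pairing objective is contrastive: each image embedding gets pulled toward the embedding of its own caption/tags and pushed away from every other caption in the batch. Here’s a minimal sketch of that CLIP-style loss in PyTorch; the encoders, batch size, and temperature are placeholder assumptions for illustration, not Adobe’s (unpublished) setup.

    ```python
    # Minimal sketch of a CLIP-style contrastive loss over a batch of (image, caption) pairs.
    # The encoders that would produce these embeddings are omitted; in practice they're a
    # vision transformer and a text transformer. Shapes and temperature are made up.
    import torch
    import torch.nn.functional as F

    def clip_loss(image_emb, text_emb, temperature=0.07):
        # Cosine similarity between every image and every caption in the batch
        image_emb = F.normalize(image_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        logits = image_emb @ text_emb.T / temperature

        # The "right" caption for image i is caption i (the one it was uploaded with)
        targets = torch.arange(len(logits), device=logits.device)

        # Symmetric cross-entropy: match images to captions and captions to images
        return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

    # Toy batch: 8 pairs of 512-dim embeddings standing in for encoder outputs
    loss = clip_loss(torch.randn(8, 512), torch.randn(8, 512))
    ```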

    All that said, Adobe hasn’t published their entire architecture. And there were some reports during the training of Firefly 1 back in '22 that they weren’t filtering out AI-generated images in the training set. At the time, those made up ~5% of the full stock library. Currently, AI images make up about half of Adobe Stock, though filtering them out seems to work well. We don’t know whether they were included in the training data for later versions of Firefly. There’s an incentive for Adobe to filter them out, since AI trained on AI tends to lose its tails (the ability to handle edge cases well), and that would be pretty devastating for something like generative fill.
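
    The “losing its tails” effect shows up even in a toy version of that feedback loop: fit a simple model to heavy-tailed data, then let each generation train only on the previous generation’s output. The sketch below (distributions and sample sizes are invented for illustration) shows the rare extreme values vanishing once the model only ever sees its own samples.

    ```python
    # Toy illustration of "AI trained on AI loses its tails": each generation fits a
    # Gaussian to the previous generation's samples and resamples from that fit.
    # The heavy tails of the original data disappear almost immediately, and the
    # spread slowly drifts as sampling errors compound.
    import numpy as np

    rng = np.random.default_rng(0)
    samples = rng.standard_t(df=3, size=5000)             # heavy-tailed "real" data

    for gen in range(10):
        mu, sigma = samples.mean(), samples.std()         # this generation's "model"
        p999 = np.percentile(np.abs(samples), 99.9)       # how extreme the rare cases get
        print(f"gen {gen}: sigma={sigma:.2f}, 99.9th percentile of |x|={p999:.2f}")
        samples = rng.normal(mu, sigma, size=5000)        # next generation sees only generated data
    ```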

    I figure we want to encourage companies to do better, whatever that looks like. For a monopolistic giant, Adobe seems to have at least done better here. And at some point, they have to rely on the artists uploading stock photos to be honest. Not just about AI, but about release forms, photo-shoot working conditions, local laws being followed while shooting, etc. They do have some incentive to be honest, since Adobe pays them, but I don’t doubt there are issues there too.




  • Here’s a metaphor/framework I’ve found useful but am trying to refine, so feedback welcome.

    Visualize the deforming rubber sheet model commonly used to depict masses distorting spacetime. Your goal is to roll a ball onto the sheet from one side such that it rolls into a stable or slowly decaying orbit around a specific mass. You begin by aiming for a mass on the outer perimeter of the sheet, but with each roll you must aim for a mass further toward the center. The longer you play, the more masses sit between you and your goal, to be rolled past or slingshotted around. As soon as you fail to hit a goal, you lose; until then, you can keep playing indefinitely.

    The model’s latent space is the sheet. The way the prompt is worded is your aiming/rolling of the ball. The response is the path the ball takes. And the good (useful, correct, original, whatever your goal was) response/inference is the path that settles into an orbit around the mass you’re aiming for. As the context window grows, the path becomes more constrained, and there are more pitfalls the model can fall into. Eventually you lose: there’s a phase transition, and the model starts going way off the rails. This phase transition was formalized mathematically in this paper from August.
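
    Not the paper’s formalism, but here’s a toy picture of the “masses as attractors” part under made-up assumptions: a 2D stand-in for the latent space with a few wells, where two nearby starting points (two wordings of the same prompt) settle into different basins. The well positions and start points are invented purely for illustration.

    ```python
    # Toy basin-of-attraction picture: gradient descent on a surface with several
    # "masses" (Gaussian wells). Where the ball starts determines which well it
    # settles into, the way small prompt changes can steer a response.
    import numpy as np

    wells = np.array([[0.0, 0.0], [3.0, 1.0], [-2.0, 2.5]])   # hypothetical attractor centres

    def grad_U(x, width=1.0):
        """Gradient of U(x) = -sum_i exp(-|x - w_i|^2 / (2 * width^2))."""
        g = np.zeros(2)
        for w in wells:
            d = x - w
            g += d / width**2 * np.exp(-d @ d / (2 * width**2))
        return g

    def roll(start, lr=0.1, steps=500):
        """The 'ball': plain gradient descent from a chosen start point."""
        x = np.array(start, dtype=float)
        for _ in range(steps):
            x -= lr * grad_U(x)
        return x

    # Two slightly different "prompts" end up captured by different masses
    print(roll([1.4, 0.4]))   # settles near (0, 0)
    print(roll([1.8, 0.7]))   # settles near (3, 1)
    ```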

    The masses are attractors that have been studied at different levels of abstraction. And the metaphor/framework seems to work at different levels as well, as if the deformed rubber sheet is a fractal with self-similarity across scale.

    One level up: the sheet becomes the trained alignment, the masses become potential roles the LLM can play, and the rolled ball is the RLHF or fine-tuning. So we see the same kind of phase transition in prompting (from useful to hallucinatory), in pre-training (poisoned training data), and in post-training (switching roles/alignments).

    Two levels down: the sheet becomes the network architecture itself, the masses become potential next words, and the rolled ball is the transformer’s forward pass.
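
    A quick sketch of what the wells look like at that level, with made-up numbers: a softmax over hypothetical next-word logits. Lowering the sampling temperature deepens the wells around the likeliest words; raising it flattens the sheet so unlikely paths open up.

    ```python
    # The "masses as potential next words" level: a temperature-scaled softmax over
    # invented logits for five candidate tokens. Low temperature concentrates the
    # probability mass (deep wells); high temperature flattens the distribution.
    import numpy as np

    logits = np.array([4.0, 3.5, 1.0, 0.2, -1.0])   # hypothetical scores for 5 candidate words

    def next_word_distribution(logits, temperature=1.0):
        z = logits / temperature
        z -= z.max()                 # subtract max for numerical stability
        p = np.exp(z)
        return p / p.sum()

    print(next_word_distribution(logits, temperature=1.0))   # top words dominate, but the tail keeps some mass
    print(next_word_distribution(logits, temperature=0.3))   # nearly all mass collapses onto the top two
    ```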

    In reality, the rubber sheet has like 40,000 dimensions, and I’m sure a ton is lost in the reduction.