AI Image Generation: How a Text Prompt Inverted Photography Itself in the 2020s

May 28, 2026

In July 2022, a small San Francisco research lab called Midjourney opened its image-generation platform to public beta access through the Discord chat application. The interface was simple: users joined a Midjourney Discord server, typed the slash command /imagine followed by a text description of an image, and waited approximately 60 seconds for the platform’s machine-learning infrastructure to generate four candidate images responding to the prompt. The candidates appeared in the Discord channel as four small thumbnails arranged in a two-by-two grid. The user could then request an upscaled version of any single image, or request additional variations on a selected candidate. The platform’s underlying technology was a diffusion-model architecture conditioned on text embeddings from the prompt, generating final images through iterative denoising of random noise across approximately 30 inference steps. The public beta was run by a small infrastructure team led by David Holz, a co-founder of the Leap Motion gesture-recognition company, who had built Midjourney as an independent research laboratory over the prior eighteen months.

By December 2022, approximately five months after the public beta opened, Midjourney’s Discord-based user base had crossed 4 million users. By the close of 2023, the user base ran approximately 16 million. The Stable Diffusion model, released as open-source software by Stability AI in August 2022, generated thousands of derivative model variants across the same window, with the open-source distribution producing image generation accessible across local-machine deployment rather than requiring centralized server infrastructure. OpenAI’s DALL-E 2 had released in April 2022 as the prior consumer-accessible diffusion-model product, with DALL-E 3 following in October 2023 with integration into the ChatGPT conversational interface. Google’s Imagen 3 released across 2024. Flux from Black Forest Labs released in August 2024 as a competing open-source model. The structural release sequence across the 2022-to-2024 window converted research-laboratory technology into consumer-accessible infrastructure across approximately twenty-four months of compressed deployment.

The 2020s AI image generation operation inverted the 2010s Petra Collins operation across every load-bearing variable. Collins had operated as a single 21-year-old photographer working with a 1976-vintage Canon AE-1 and an iPhone, executing campaign imagery for American Apparel out of suburban bedrooms with available light and a pink theater-gel-covered Vivitar 285 flash, distributing through Tumblr and Instagram outside the conventional magazine-portfolio routing infrastructure. The 2010s Collins operation had required a human photographer, a physical camera, a physical subject, and a physical location to produce a photograph. The 2020s AI image operation requires no human photographer beyond the prompt-entry operator at a keyboard. No physical camera. No physical subject. No physical location. The “photograph” is generated through diffusion-model inference operating on text input, producing pixel arrays that visually resemble photography but originate from no photographic exposure event. Where Collins had photographed her actual friend Alice Lancaster in an actual bedroom under actual gel-filtered flash light, the 2020s operation generates photographs of subjects who have never existed, in spaces that have never existed, under lighting conditions that have never occurred.

AI image generation was not just a tool. AI image generation was the structural inversion of photography itself.

The Model Routing

The underlying technical infrastructure routed through approximately a decade of machine-learning research before the 2022 consumer-accessibility release window. Ian Goodfellow, then a doctoral student at the University of Montreal under the supervision of Yoshua Bengio, introduced Generative Adversarial Networks (GANs) in a paper published at the December 2014 NeurIPS conference. The GAN architecture operated through a paired-network structure: a generator network attempting to produce images that would fool a discriminator network, with the two networks trained against each other across iterative optimization cycles until the generator produced outputs that the discriminator could not distinguish from real images. The GAN architecture generated significant research-community attention across the subsequent five years, with applications including StyleGAN (NVIDIA, 2018), CycleGAN, and the broader family of generative image models that the research literature developed across the 2014-to-2020 window.

The diffusion-model architecture that ultimately drove the 2022 consumer release came from a separate technical lineage. Jonathan Ho, Ajay Jain, and Pieter Abbeel at UC Berkeley posted the Denoising Diffusion Probabilistic Models paper as a preprint in June 2020 and presented it at the NeurIPS conference that December, establishing the technical foundation for the subsequent generation of text-to-image models. The diffusion approach operated through a different mechanism than GANs: rather than training a generator-discriminator pair, the diffusion model learned to gradually denoise a sequence of progressively corrupted versions of training images, with the inference process running in reverse to generate new images from random noise through iterative denoising steps. The diffusion architecture proved more stable in training, more responsive to conditioning inputs, and more reliable at producing high-resolution output than the GAN alternatives, laying the technical foundation for the 2022 consumer-deployment window.
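The corruption half of that mechanism can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper’s code: the linear noise schedule, the four-value toy “image,” and every function name here are hypothetical choices made for the example. It shows a clean signal being mixed with Gaussian noise at a chosen step, and that the corruption can be undone exactly when the noise is known. A trained diffusion model’s job is to estimate that noise.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule: one small variance per step."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bar(betas, t):
    """Cumulative product of (1 - beta) up to and including step t."""
    product = 1.0
    for beta in betas[: t + 1]:
        product *= 1.0 - beta
    return product

def add_noise(x0, betas, t, rng):
    """Jump straight to step t: x_t = sqrt(a) * x0 + sqrt(1 - a) * noise."""
    a = alpha_bar(betas, t)
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]
    return xt, noise

def remove_noise(xt, noise, betas, t):
    """Invert add_noise given the true noise (a real model predicts it instead)."""
    a = alpha_bar(betas, t)
    return [(x - math.sqrt(1.0 - a) * n) / math.sqrt(a) for x, n in zip(xt, noise)]

betas = linear_betas(1000)
rng = random.Random(0)
x0 = [0.2, -0.5, 0.9, 0.0]  # toy four-pixel "image"
xt, noise = add_noise(x0, betas, 500, rng)
recovered = remove_noise(xt, noise, betas, 500)

# The share of the original signal that survives shrinks as steps accumulate.
print(alpha_bar(betas, 0), alpha_bar(betas, 999))
```

The printed pair shows almost all of the signal surviving at the first step and almost none at the last, which is why generation can start from pure noise.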

OpenAI announced DALL-E in January 2021 as the first widely publicized text-to-image model of this generation, with the system generating 256-by-256 pixel images from text prompts and presented as a research preview rather than a public product. The original DALL-E was an autoregressive transformer rather than a diffusion model. Its output quality ran below subsequent thresholds but established the proof-of-concept that a neural model could generate visually coherent imagery from natural-language input. DALL-E 2 followed in April 2022, this time built on diffusion, with significantly improved output quality at 1024-by-1024 pixel resolution, distributed through a closed beta access program that required users to register on a waitlist before receiving generation access.

Midjourney’s July 2022 public beta opened the consumer-accessibility window. Holz had founded the company across 2021 with a research staff that ran below ten people, operating from a small San Francisco office without venture-capital financing in the conventional pattern. The decision to launch through Discord rather than through a dedicated web application routed the platform’s user-acquisition through the existing Discord community infrastructure, with users joining the Midjourney Discord server and operating the image generation through chat commands rather than through a conventional product interface. The Discord deployment generated structural network effects: users posting their generations in the public server channels created continuous social proof and prompt-engineering examples that subsequent users learned from across the chat history.

Stable Diffusion released in August 2022 as the first open-source consumer-accessible diffusion model. Stability AI, the British company that had funded the model’s training across collaboration with the Ludwig Maximilian University of Munich and the RunwayML startup, released the model weights publicly under a permissive license that allowed download, modification, and commercial application without licensing fees. The open-source release generated immediate downstream effects: thousands of derivative model variants emerged across the subsequent eighteen months as developers fine-tuned Stable Diffusion on specific domains (photorealistic portraits, anime illustration, architectural visualization, and specialized aesthetic registers), with the Civitai community platform aggregating approximately 100,000 user-trained model variants by the close of 2023.

The 2022 release window compressed research-laboratory technology into consumer-accessible infrastructure in under five months. The April-through-August 2022 sequence (DALL-E 2 closed beta, Midjourney public beta, Stable Diffusion open-source release) pulled AI image generation into the mainstream press, with The New York Times, The Atlantic, The Verge, Wired, and the broader technology press running continuous coverage from August 2022 forward. The cultural arrival of AI image generation as an accessible consumer technology took place within a single spring and summer.

The Image Specification

The text-to-image prompt operated as the operation’s primary creative input. The prompt structure evolved across the 2022-to-2024 window from simple descriptive sentences toward an increasingly specific vocabulary that prompt-engineering communities developed through trial-and-error experimentation. The basic prompt structure ran: [subject], [composition], [lighting], [style/medium], [camera/lens specification], [aspect ratio], [additional modifiers]. A representative prompt example: “a woman in her thirties wearing a yellow raincoat, walking through a rain-soaked Tokyo street at night, shot on Kodak Portra 400 film, 85mm lens, shallow depth of field, neon reflections in puddles, cinematic composition, photorealistic, aspect ratio 16:9.”
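The bracketed template above can be treated as a small data structure. The sketch below is a hypothetical helper, not any platform’s API: the `PromptSpec` class and its field names are invented for illustration, and the comma-joined output simply follows the slot order the template lists.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PromptSpec:
    """Hypothetical container for the template slots described in the text."""
    subject: str
    composition: str = ""
    lighting: str = ""
    style: str = ""
    camera: str = ""
    aspect_ratio: str = ""
    modifiers: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Join the non-empty slots in template order, separated by commas."""
        parts = [self.subject, self.composition, self.lighting, self.style, self.camera]
        if self.aspect_ratio:
            parts.append(f"aspect ratio {self.aspect_ratio}")
        parts.extend(self.modifiers)
        return ", ".join(p for p in parts if p)

spec = PromptSpec(
    subject="a woman in her thirties wearing a yellow raincoat",
    composition="cinematic composition",
    lighting="neon reflections in puddles",
    style="shot on Kodak Portra 400 film",
    camera="85mm lens, shallow depth of field",
    aspect_ratio="16:9",
    modifiers=["photorealistic"],
)
print(spec.render())
```

Keeping the slots separate makes it easy to vary one element, such as the film stock or the lens, while holding the rest of the prompt fixed.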

The diffusion inference process executed across approximately 20-to-50 steps depending on the model configuration. Random Gaussian noise generated the initial state. The text encoder (CLIP in earlier models, T5-XXL in newer Flux-class models) converted the prompt into a numerical embedding that conditioned the denoising process. Each inference step ran approximately 100-to-200 milliseconds on consumer GPU hardware, with the end-to-end generation time running from approximately 3 seconds on premium cloud infrastructure to approximately 30 seconds on consumer hardware. The output ran at native resolutions of 1024-by-1024 pixels for the first-pass output, with subsequent upscaling operations bringing the final output to 2048-by-2048 or higher resolutions through dedicated upscaling models that the platforms integrated into the generation pipeline.
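The step count and per-step figures can be multiplied out as a quick sanity check. The function name below is a hypothetical helper and the inputs are only the ranges quoted above. The denoising loop alone comes to roughly 2 to 10 seconds on consumer hardware. That suggests the 30-second end of the range includes work outside the loop, such as model loading, image decoding, or upscaling. This is an assumption, since no breakdown is given here.

```python
def denoising_seconds(steps, ms_per_step):
    """Time spent in the denoising loop alone, in seconds."""
    return steps * ms_per_step / 1000.0

fastest = denoising_seconds(steps=20, ms_per_step=100)  # 2.0
slowest = denoising_seconds(steps=50, ms_per_step=200)  # 10.0
print(f"denoising loop: {fastest:.1f}s to {slowest:.1f}s")
```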

The prompt-engineering vocabulary that developed across the 2022-to-2024 window ran through several specific patterns. The descriptive language for subject ran straightforward. The compositional vocabulary borrowed from established cinematography and photography terminology: rule of thirds, leading lines, frame within frame, low angle, high angle, dutch tilt. The lighting vocabulary ran specifically through photographic lighting terminology: golden hour, blue hour, soft natural light, hard directional flash, rim lighting, chiaroscuro, three-point lighting. The lens specifications operated through real-world camera-equipment terminology: 35mm, 50mm, 85mm, 105mm, with depth-of-field specifications (shallow, deep, f/1.4, f/2.8) calibrating the rendered defocus characteristics.

The film-stock references emerged as a structural prompt-engineering pattern by 2023. “Kodak Portra 400” prompts generated outputs with the warm-skin-tone and soft-grain characteristics of the actual Portra 400 film stock. “Fujifilm Velvia” prompts generated saturated-color characteristics of the actual Velvia transparency film. “Cinestill 800T” prompts generated the tungsten-balanced cinematic film stock’s characteristic halation around bright light sources. The prompt-engineering community developed extensive reference databases documenting which real-world film stocks and camera systems generated which specific output characteristics across the major model variants.

The “in the style of [photographer name]” prompt construction generated the most significant copyright-and-attribution controversy across the deployment window. Prompts including specific photographer names (Annie Leibovitz, Steve McCurry, Petra Collins, Wolfgang Tillmans, Cindy Sherman, Nan Goldin) generated outputs that visually replicated the named photographer’s aesthetic vocabulary. The platforms had trained their underlying models on internet-scraped image datasets that included extensive examples of each photographer’s published work, with the photographer-name conditioning routing the diffusion process toward outputs in the corresponding visual register. The structural mechanism operated as a form of style-replication that the conventional photography-industry licensing infrastructure had not been positioned to address through its existing rights-management frameworks.

The negative-prompt mechanism allowed users to specify elements that the generation should exclude. A negative prompt of “deformed hands, blurry, low quality, watermark, signature” became a standard inclusion across most prompt structures by 2023, with the negative-prompt instructions filtering out common diffusion-model failure modes that the early-generation outputs had been prone to producing.

The seed value, an integer between 0 and approximately 4 billion that initialized the random-noise starting state, allowed reproducibility of specific generations. Users who generated a successful output could share the prompt and the seed value together, with subsequent users able to regenerate the same image on the same model version. The seed mechanism generated the structural reproducibility infrastructure that the prompt-engineering community required to share and iterate on specific image generations across platforms.
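The reproducibility property follows directly from how pseudo-random generators work, and it can be shown without any image model at all. In this sketch the 32-bit seed range is an assumption chosen to match “approximately 4 billion,” and `initial_noise` is a hypothetical stand-in for the latent-noise initialization a real pipeline performs.

```python
import random

MAX_SEED = 2**32 - 1  # assumed 32-bit seed range: 4294967295

def initial_noise(seed, num_values):
    """Deterministically derive a starting noise vector from an integer seed."""
    if not 0 <= seed <= MAX_SEED:
        raise ValueError("seed out of range")
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_values)]

first = initial_noise(1234, 8)
again = initial_noise(1234, 8)   # same seed: identical starting noise
other = initial_noise(1235, 8)   # neighbouring seed: unrelated starting noise

print(first == again, first == other)
```

Identical noise is only the first requirement. The prompt, the sampler settings, and the model version must also match, which is why shared generations only reproduce on the same model version.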

The Copyright and Training Data Question

The structural legal and ethical question that the AI image generation industry had not resolved across the 2022-to-2025 window routed through the training datasets that the major models had been built on. LAION-5B, the dataset assembled by the LAION (Large-scale Artificial Intelligence Open Network) German nonprofit research organization, contained approximately 5.85 billion image-text pairs scraped from public web sources without permission from the underlying copyright holders. The dataset functioned as the structural training input for Stable Diffusion and multiple other diffusion-model variants across the deployment window. The structural mechanism of training a generative model on the dataset routed through standard machine-learning gradient-descent operations across approximately 150,000 GPU-hours of compute work, with the model weights subsequently encoding statistical patterns extracted from the training images.

The legal question across the post-deployment window was whether training a generative model on copyrighted images constituted fair use under existing copyright law or copyright infringement at training scale. The fair-use analysis operated through the four-factor test codified in the 1976 Copyright Act and developed through subsequent case law: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect on the potential market for the copyrighted work. The application of fair-use analysis to AI training had not been definitively resolved at any appellate level at the time of writing, with the federal district courts working through multiple parallel cases on extended litigation timelines.

Getty Images filed suit against Stability AI in the U.S. District Court for the District of Delaware in February 2023, alleging that Stability AI had used approximately 12 million Getty Images photographs in the Stable Diffusion training dataset without licensing authorization. The complaint included evidence of Stable Diffusion outputs containing partially-rendered Getty Images watermarks, suggesting that the model had learned to replicate the watermark pattern alongside the underlying visual content. Sarah Silverman, alongside the authors Christopher Golden and Richard Kadrey, filed suit against OpenAI and Meta in July 2023 alleging unauthorized use of their copyrighted works in training datasets. The Authors Guild filed a class-action suit against OpenAI in September 2023. The Andersen, McKernan, and Ortiz suit against Stability AI, Midjourney, and DeviantArt, filed in January 2023, represented a class of visual artists alleging similar training-data copyright infringement.

The structural reality across the post-litigation window ran an unresolved tension. The models had been trained. The model weights existed across millions of downloaded copies in the open-source case and across continuous deployment in the closed-source commercial-platform case. The inference operations could not be reversed. The training-data extraction that generated the model weights could not be undone without retraining the models from scratch on different datasets, which the commercial platforms had not committed to doing across the litigation timeline. The courts continued working through the legal framework on extended timelines while the commercial deployment proceeded at scale.

The position against the prior decade’s photographer-as-rights-holder infrastructure ran structural. The 2010s commercial photography market had operated on a clear chain of rights from the photographer who exposed each image, through the licensing arrangements that distributed the image to commercial clients, through the credit-and-attribution infrastructure that the publication industry had standardized across the prior century. Each photograph carried a known author, a known exposure event, a known location, a known subject, and a known rights-management framework. The 2020s AI image generation infrastructure routed through none of those structures. The generated images carried no human author at the conventional copyright-law definition, no exposure event, no physical location, no documented subject in the conventional photographic-subject sense, and no rights-management framework that the prior industry had been positioned to operate against.

The Commercial Deployment

The rollout of AI image generation across commercial photography applications ran on multiple parallel fronts. Early advertising adoption emerged across 2022 and 2023 in several high-profile brand campaigns that explicitly identified AI imagery as the production methodology. Heinz Ketchup ran a 2022 campaign built on DALL-E 2 generations with the tagline “Even AI knows that ketchup is Heinz,” demonstrating that prompts including the word “ketchup” generated images that visually resembled Heinz Ketchup bottles regardless of brand specification. Vodafone ran brand work across 2023 using AI imagery for campaign visualization. Coca-Cola launched the “Create Real Magic” platform in March 2023, integrating user-generated AI imagery into the brand’s continuing marketing infrastructure.

The fashion-industry deployment generated more significant industry controversy. Levi’s announced in March 2023 that the company would deploy AI-generated model imagery for portions of its e-commerce product photography across categories where photographer-and-model production had previously generated the imagery. The announcement generated immediate industry-press coverage discussing the displacement of model and photographer labor across the production-economics chain. The structural debate routed through questions about whether AI-generated models displaced employment opportunities, whether the resulting product imagery accurately represented the products to consumers, and whether the deployment generated structural diversity-and-representation issues that the prior physical-photography model had been navigating through actual model bookings.

The stock-photography deployment ran across the major platforms. Shutterstock announced a partnership with OpenAI in October 2022 to integrate DALL-E generation into the platform’s content library, allowing Shutterstock subscribers to generate custom imagery rather than license existing stock photographs. The partnership included a revenue-sharing arrangement designed to compensate stock-photography contributors whose images had been included in the DALL-E training dataset, an industry-first attempt at retroactive contributor compensation for training-data inclusion. Adobe launched Firefly in March 2023 as an AI-generation feature integrated into Creative Cloud, with the model trained on Adobe Stock content, openly licensed imagery, and public-domain imagery rather than internet-scraped data, offering a different rights-management proposition than the major competing models.

The structural displacement question across the commercial deployment ran into documented industry-press coverage of position eliminations and revenue declines across the affected photography sub-industries. The American Society of Media Photographers reported approximately a 30 percent decline in commercial-photography assignment volumes across its surveyed member base between 2022 and 2024. The Professional Photographers of America reported similar contraction patterns. The Model Alliance, the U.S. professional-model advocacy organization, reported documented displacement of model bookings across the e-commerce product-photography sub-segment specifically. The stylist-and-location-scout-and-post-production sub-sectors that supported the prior decade’s commercial-photography production infrastructure ran corresponding contractions, with industry-survey data documenting position eliminations across the production-services chain.

Quantifying the displacement proved difficult through the deployment window. The AI-image volume scaled rapidly across the period: approximately 15 billion AI-generated images were created across 2022 and 2023 according to the Stanford AI Index Report. The decline in commercial-photography assignment volume tracked the AI-image-volume growth most closely in the segments where AI-generated imagery directly substituted for commissioned-photography production. The full displacement count across photographers, models, stylists, location scouts, retouchers, and post-production staff continued accumulating across the period, without a definitive comprehensive count at the time of writing.

The Equipment Cancellation

The Canon AE-1 sits in the closets of millions of professional and amateur photographers who learned the craft on the camera. The professional camera bag with the bodies and lenses and lighting equipment sits on the shelf at the studio that has lost half its commercial-photography revenue across the prior two years. The Petra Collins Canon AE-1 still works, sitting in its case in her Toronto and New York apartments. The Vivitar 285 flash with the pink theater gel still operates. The bedrooms and bathrooms and parking lots that the prior decade’s photography had inhabited still exist as physical spaces that physical photographers can still photograph through physical exposure events.

The AI image generation infrastructure continues growing at a rate that physical photography never matched. The approximately 15 billion AI-generated images created across 2022 and 2023 compare with approximately 1.8 trillion photographs taken globally in 2023, putting the AI total at approximately 0.8 percent of the photographic volume but growing orders of magnitude faster. The 2024 generation volume ran approximately 34 billion AI images according to subsequent industry estimates, suggesting the AI proportion was approaching 2 percent of total annual image production by the close of 2024. The trajectory pointed toward continued rapid scaling across the back half of the decade, with the marginal cost of an additional AI-generated image running approximately $0.0001 on consumer hardware against a conventional commercial-photography per-image cost of approximately $50 to $5,000 depending on production tier.
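The proportions and cost ratios in this paragraph are easy to recompute. The sketch below uses only the figures quoted above, and `share_percent` is a hypothetical helper. One assumption is labeled in the code: the 2024 share is computed against the 2023 photograph count, since no 2024 total is given.

```python
def share_percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

AI_IMAGES_2022_2023 = 15e9   # quoted estimate
AI_IMAGES_2024 = 34e9        # quoted estimate
PHOTOS_2023 = 1.8e12         # quoted estimate

share_early = share_percent(AI_IMAGES_2022_2023, PHOTOS_2023)
# Assumption: 2024 photographic volume is taken to be roughly the 2023 figure.
share_2024 = share_percent(AI_IMAGES_2024, PHOTOS_2023)

AI_COST = 0.0001                      # dollars per generated image, as quoted
PHOTO_COST_LOW, PHOTO_COST_HIGH = 50.0, 5000.0
ratio_low = round(PHOTO_COST_LOW / AI_COST)
ratio_high = round(PHOTO_COST_HIGH / AI_COST)

print(f"{share_early:.2f}% and {share_2024:.2f}% of annual photo volume")
print(f"cost ratio between {ratio_low:,}x and {ratio_high:,}x")
```

On these figures the per-image cost gap spans five to seven orders of magnitude, which is the economic pressure behind the volume growth.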

The structural question across the back half of the decade runs whether physical photography retains its market position as the “authentic” or “documentary” mode that AI generation cannot replicate, or whether the visual-distinction infrastructure that the prior 150 years of photography had constructed dissolves entirely under the AI generation infrastructure’s continuing deployment. The conventional photographic-evidence infrastructure that the legal system, the news-press infrastructure, the family-archive tradition, and the broader cultural-memory infrastructure had operated through across the prior century routes through an open question about whether photographs continue to function as reliable indicators of physical-event occurrence as AI-generated photorealistic imagery scales toward general indistinguishability from physical-exposure imagery.

The text-prompt operating model that the 2020s installed runs in parallel rather than as full replacement, with the long-term outcome of the parallel operation continuing to resolve across the platform-distribution infrastructure that the decade’s broader visual-culture transformations route through. The Collins Canon AE-1 continues operating across whatever specific physical-photography contexts the photographer continues to pursue. The Midjourney Discord server continues operating across the continuous infinite-scale image generation that the platform’s infrastructure delivers. Both operations continue. Neither has resolved into structural dominance at the time of writing. The decade continues working through the structural question that the 2022 release window had opened, with the resolution running across the back half of the decade and into the next.
