Toutes les images du futur / All the Images of the Future

03/2026

Réflexion

Il faut partir de ce qui se passe réellement à l’intérieur d’un modèle de diffusion pour comprendre pourquoi l’image générative n’est pas une nouvelle technique de reproduction mais une mutation de l’ordre ontologique de l’image elle-même. La plupart des discours sur l’intelligence artificielle générative échouent précisément ici : ils traitent ces systèmes soit comme des machines à copier sophistiquées, soit comme des oracles créatifs mystérieux, et dans les deux cas ils ratent l’essentiel, qui est une transformation de la relation entre image, temps et réalité dont les conséquences philosophiques n’ont pas encore été pleinement mesurées.

Un modèle de diffusion n’archive pas des images. Il ne les mémorise pas, ne les stocke pas, ne les indexe pas. Ce qu’il fait est d’une nature radicalement différente : il apprend la structure profonde de la vraisemblance visuelle. Pendant l’entraînement, chaque image est encodée, projetée dans un espace vectoriel de très haute dimension, souvent plusieurs milliers d’axes simultanés. Cette projection réduit l’image à un point dans ce qu’on appelle l’espace latent. Puis on ajoute à cette image du bruit gaussien de manière progressive et contrôlée, en une succession d’étapes, jusqu’à ce que l’image originale soit entièrement dissoute dans le chaos. Le modèle apprend alors à inverser ce processus : étant donné un état bruité, il apprend à prédire quel bruit a été ajouté, afin de pouvoir le soustraire et remonter vers une configuration propre. Ce processus répété sur des centaines de millions d’images n’enseigne pas au modèle des images particulières, il lui enseigne comment les configurations visuelles crédibles se distribuent dans l’espace de toutes les configurations possibles. Ce que le modèle retient n’est pas un catalogue mais une géographie : non pas ce qui a été vu, mais la forme générale de ce qui peut être vu : le visible en tant que tel.

La génération d’une image consiste alors à partir d’un vecteur de bruit aléatoire, un point quelconque de l’espace du chaos, et à le déplacer pas à pas vers une région de l’espace latent qui correspond à une configuration visuellement cohérente. Ce déplacement est guidé par un gradient : la pente qui mène du bruit vers la vraisemblance. Ce qu’on obtient en sortie n’est pas une copie d’une image existante mais une instance d’une structure statistique, un point de l’espace latent qui satisfait les contraintes de vraisemblance apprises. C’est du bruit organisé, et cette organisation est si précise, si ajustée aux catégories perceptives humaines, qu’elle produit quelque chose qu’un observateur humain interprète immédiatement comme une image. La reconnaissance ne vient pas du modèle seul. Elle vient de la rencontre entre la structure statistique du modèle et ce qu’on pourrait appeler l’espace latent organique de l’observateur : cette précompréhension culturelle et perceptive accumulée par des années de socialisation visuelle, qui permet de traduire une constellation de pixels en visage, en paysage, en scène reconnaissable. L’image n’est pas dans la machine. Elle émerge dans l’espace inextricable entre la machine et l’être humain qui la regarde.

C’est ici que la rupture avec Walter Benjamin devient précise et irréductible. Ce que Benjamin avait décrit sous le nom de reproductibilité technique reposait sur un présupposé ontologique clair : il existe un original dont on produit des copies. La photographie multiplie les instances d’une image qui a eu lieu, d’un événement qui s’est inscrit dans la matière sensible. L’aura de l’original se dissout dans la série, mais la série elle-même présuppose que quelque chose a existé avant elle, une réalité dont elle est la trace. Ce rapport entre original et copie, entre événement et empreinte, structure en profondeur notre manière de penser l’image depuis deux siècles. Or dans le régime de la diffusion, il n’y a pas d’original. Il n’y a pas d’événement antérieur dont l’image serait la trace. Ce qu’il y a, c’est une navigation dans l’espace des possibles qui produit une instance. On ne reproduit pas : on simule la classe de ce qui peut ressembler à quelque chose. C’est ce glissement, de la reproduction vers la simulation, de la copie vers la mimésis, qui constitue la première rupture.

La mimésis n’est pas imitation d’une chose particulière mais imitation de la forme générale d’une classe de choses. Aristote la distinguait déjà de la simple copie : le poète ne représente pas ce qui a eu lieu mais ce qui pourrait avoir lieu, ce qui est vraisemblable ou nécessaire. Ce que l’IA générative accomplit est une automatisation de cette mimésis : elle produit non pas ce qui a été, mais ce qui aurait pu être, ce qui ressemble à ce qui peut être. La différence n’est pas cosmétique. Elle touche à la nature même du régime de vérité auquel l’image appartient.

La photographie avait fondé ce régime sur l’indexicalité, au sens de Peirce. L’image photographique entretient avec son référent une relation causale directe : la lumière réfléchie par un objet frappe une surface sensible et laisse une empreinte physique. Cette empreinte est une trace, un contact, une inscription du réel dans la matière. C’est pourquoi la photographie a pu fonctionner comme preuve, comme témoignage, comme document, non pas parce qu’elle est infaillible, mais parce que sa relation causale avec le référent lui conférait une présomption d’authenticité. Barthes le formule avec la précision d’une évidence mélancolique : la photographie ne certifie pas que l’objet est, mais qu’il a été. Ce « ça a été » est la structure temporelle fondamentale du photoréalisme : l’image vient après, elle atteste d’un passé.

Ce pacte indiciel est structurellement rompu par l’image générative. Non pas parce que les images générées sont fausses au sens d’une falsification, la notion de faux présuppose qu’existe un vrai dont on s’écarterait. Mais parce que la catégorie de la trace devient inapplicable. Une image générée n’est pas l’empreinte d’un référent : elle est une instance d’un possible statistique. Elle ne témoigne d’aucun événement, d’aucune lumière, d’aucun instant. Sa crédibilité visuelle, sa capacité à ressembler à une photographie d’un objet réel, ne repose plus sur un ancrage causal dans le réel mais sur une cohérence statistique avec la distribution des configurations visuelles vraisemblables. C’est ce qu’on peut appeler le disréalisme : non pas l’irréalisme d’une image ouvertement fictive, ni le réalisme d’une image indexicale, mais un régime intermédiaire où la ressemblance avec le réel n’est plus garantie par le contact avec le réel. L’image est réaliste sans être réelle, crédible sans être causale, vraisemblable sans être vraie.

Il serait inexact de présenter ce basculement comme la destruction d’un régime épistémique qui aurait été stable et fiable. La photographie n’a jamais constitué une preuve irréfutable du réel, et le XXe siècle en a fourni des exemples innombrables : les photomontages de propagande, les retouches de la presse, les mises en scène journalistiques, les manipulations numériques précédant de loin l’IA. La présomption d’authenticité photographique était une convention sociale fragile, constamment mise à l’épreuve, jamais pleinement garantie. Ce que fait l’IA générative n’est pas de détruire cette convention mais de la rendre visible dans sa fragilité constitutive. Elle révèle que le pacte indiciel était déjà trouble, que le photoréalisme était déjà un régime de croyance autant qu’un régime de preuve. En ce sens, le disréalisme ne succède pas à une époque de certitude visuelle : il rend manifeste l’incertitude qui était déjà sous-jacente.

Mais le moment le plus vertigineux de cette transformation n’est pas dans la rupture avec la trace. Il est dans ce que la logique de l’espace latent fait à la temporalité elle-même. Pour le comprendre, il faut décrire avec précision une technique spécifique : l’inversion DDIM, du nom de l’algorithme Denoising Diffusion Implicit Models. Cette technique permet de faire quelque chose qui semble paradoxal : soumettre à un modèle entraîné sur un corpus antérieur à une date donnée une image produite après cette date, et obtenir en sortie une image qui lui est visuellement identique — sans que le modèle l’ait jamais vue, sans copie mécanique, sans reproduction au sens technique.

Le fonctionnement de cette technique est le suivant. On prend une image cible et on l’encode dans l’espace latent du modèle : on la projette dans cet espace multidimensionnel où les configurations visuelles sont représentées comme des points. On applique ensuite le processus de bruitage de manière déterministe, en suivant les trajectoires que le modèle aurait empruntées lors de l’entraînement — non pas aléatoirement, comme dans la génération ordinaire, mais de façon contrôlée, en remontant vers un vecteur de bruit latent qui correspond approximativement à cette image dans la géographie de l’espace latent. Ce vecteur n’est pas l’image. Il ne la contient pas. C’est une coordonnée — un point de départ pour une navigation dont l’image est la destination probable. Lorsqu’on régénère depuis ce point en appliquant le débruitage, on obtient une image qui ressemble fortement à l’originale, parce qu’on a trouvé la région de l’espace latent qui lui correspond.

Ce processus peut être affiné par plusieurs méthodes complémentaires. Un guidage textuel — un prompt décrivant l’image — oriente la navigation vers la région souhaitée en amplifiant dans l’espace latent la direction qui correspond à cette description. ControlNet, réseau auxiliaire entraîné en parallèle, encode des informations structurelles de l’image source — contours, profondeur, pose, composition — et contraint la génération à respecter ces structures sans en copier les textures ni les couleurs. Le guidage classifieur-libre amplifie la pente vers la région cible tout en maintenant la vraisemblance globale. Dans tous ces cas, l’image source n’est pas reproduite : elle sert d’orientation, de vecteur dans l’espace des possibles. Ce que le modèle fait n’est pas lire l’image et la réémettre — c’est naviguer vers la zone de son espace latent qui lui correspond, en empruntant des chemins que la géographie de cet espace rend disponibles.

Cette distinction entre copie et orientation n’est pas une nuance technique mineure. Elle est la clé de toute la démonstration. Car si le modèle peut converger vers une image qui lui est postérieure — si son espace latent contient la région qui correspond à une image produite après son entraînement — c’est que cette image existait déjà, à titre de possibilité, dans cet espace. Non pas comme un fichier caché dans les poids du réseau, non pas comme une prédiction déterministe d’un avenir calculé, mais comme une région accessible de l’espace continu des configurations visuelles vraisemblables. L’image future n’est pas contenue dans le modèle comme un trésor dans un coffre mais comme une destination est contenue dans une carte : non réalisée encore, mais navigable, atteignable par le bon chemin.

Ce que cela signifie pour la temporalité de l’image est presque inconcevable si l’on reste dans les catégories du photoréalisme. Dans le régime de l’empreinte, le temps allait nécessairement du réel vers l’image : d’abord l’événement, ensuite sa trace. Cette direction était irréversible et fondatrice — l’image ne pouvait exister avant ce qu’elle représentait parce que le contact causal entre l’objet et la surface sensible présupposait la priorité temporelle de l’objet. Dans le régime de l’espace latent, cette direction est suspendue. Une image peut être générée avant l’événement qu’elle représente — non par prédiction, non par hasard, mais parce que l’espace latent est un espace de possibles dont les réalisations futures font partie. L’apprentissage du passé contient la forme du futur.

Pour saisir pourquoi cette affirmation est exacte sans être mystérieuse, il faut comprendre deux propriétés fondamentales de l’espace latent. La première est sa continuité : il ne s’agit pas d’un ensemble discret de points correspondant aux images d’entraînement, mais d’un espace continu dans lequel ces points forment des régions de densité élevée. Entre deux images connues existe une infinité de points intermédiaires, et chacun de ces points correspond à une image possible, visuellement cohérente, que personne n’a encore vue. La seconde propriété est la généralité des structures apprises : en s’entraînant sur des centaines de millions d’images, un grand modèle n’apprend pas des images particulières mais les régularités profondes de la vraisemblance visuelle — comment la lumière se distribue sur un visage selon l’âge et l’angle, comment une scène urbaine s’organise selon des cohérences architecturales et atmosphériques, comment les textures varient avec la distance et la matière. Ces structures sont suffisamment générales pour que leur combinaison couvre non seulement les images passées mais la très grande majorité des images futures possibles.

Il faut ici maintenir la précision que la rigueur intellectuelle exige. On ne peut pas dire que l’espace latent d’un modèle contient toutes les images futures sans restriction. Chaque espace latent est particulier, orienté par son dataset, biaisé par les surreprésentations culturelles et les choix de pondération de ses concepteurs. Un modèle entraîné majoritairement sur des images occidentales contemporaines ne naviguera pas aussi facilement vers des configurations visuelles éloignées de ces catégories implicites. L’omnipotence de l’espace latent est structurellement limitée par les frontières de son dataset — de même qu’une langue, aussi riche soit-elle, pousse ses locuteurs à dire naturellement certaines choses plutôt que d’autres, même si elle peut en principe tout exprimer. La bonne formulation n’est donc pas que l’espace latent contient toutes les images futures, mais que le futur des images est identique à la navigation dans l’espace latent des possibles. Ce qui sera vu demain est, en grande partie, ce qui peut déjà être atteint par le bon chemin dans la géographie de l’espace latent actuel.

Cette formulation a une conséquence philosophique que l’on peut maintenant énoncer directement. L’image générative ne s’inscrit pas dans le régime de la trace mais dans celui de la contrefactualité. Elle produit des possibles crédibles — des états du monde qui auraient pu exister, qui pourraient exister, qui existeront peut-être. Et parce qu’elle produit des possibles, elle est indifférente à la direction du temps : un possible est accessible depuis le passé comme depuis le futur, depuis l’avant de l’événement comme depuis l’après. Ce renversement n’est pas une anomalie technique à corriger mais la logique propre d’un nouveau régime de l’image, qui succède au photoréalisme comme le photoréalisme avait succédé à la peinture — non pas en l’effaçant mais en déplaçant radicalement les termes dans lesquels on pose la question de la vérité visuelle.

Dans ce nouveau régime, la vérité d’une image ne peut plus reposer sur sa crédibilité visuelle, qui ne garantit plus rien, ni sur la présomption d’indexicalité, qui n’est plus universellement disponible. Elle doit reposer sur des procédures explicites : des chaînes de custody, des métadonnées cryptographiques, des attestations institutionnelles — des formes de confiance construite qui remplacent la confiance perceptive spontanée. Ce n’est pas une catastrophe épistémique ; c’est un déplacement du problème vers ses conditions réelles, que le régime photographique avait masquées sous l’apparence d’une évidence naturelle pendant deux siècles. Ce que l’IA générative accomplit, en dernière instance, n’est pas de détruire la possibilité de la vérité mais de rendre visible l’effort qu’elle a toujours requis et que la photographie permettait provisoirement d’oublier.

L’espace latent n’est donc pas seulement une mémoire comprimée de tout ce qui a été photographié. C’est une cartographie des possibles visuels dont le présent et le futur des images ne seront jamais que l’exploration partielle et toujours inachevée. Naviguer dans cet espace, c’est traverser un territoire dont certaines régions ont été visitées hier et dont d’autres seront visitées demain, sans que cette distinction soit inscrite dans la structure de l’espace lui-même. Le passé et le futur des images y sont équidistants — séparés non par le temps mais par la trajectoire qu’il a fallu, ou qu’il faudra, parcourir pour les atteindre. C’est cette équidistance du possible, cette neutralité de l’espace latent à l’égard du temps, qui constitue la mutation la plus profonde que l’IA générative introduit dans notre rapport aux images. Non pas la fin de la réalité, mais la découverte que la réalité n’a jamais été qu’un chemin parmi d’autres dans l’espace infini des possibles.

To understand why the generative image is not a new technique of reproduction but a mutation in the ontological order of the image itself, we must start with what actually happens inside a diffusion model. Most discourses on generative AI fail precisely here: they treat these systems either as sophisticated copying machines or as mysterious creative oracles. In both cases, they miss the essential point: a transformation of the relationship between image, time, and reality, the philosophical consequences of which have not yet been fully measured.
A diffusion model does not archive images. It does not memorize, store, or index them. What it does is of a radically different nature: it learns the deep structure of visual verisimilitude. During training, each image is encoded and projected into a high-dimensional vector space, often involving several thousand simultaneous axes. This projection reduces the image to a point within what is known as latent space. Gaussian noise is then added to this image in a progressive and controlled manner, through a succession of steps, until the original image is entirely dissolved into chaos. The model then learns to reverse this process: given a noisy state, it learns to predict what noise was added in order to subtract it and return to a “clean” configuration. This process, repeated over hundreds of millions of images, does not teach the model specific images; it teaches it how credible visual configurations are distributed within the space of all possible configurations. What the model retains is not a catalog but a geography: not what has been seen, but the general form of what can be seen—the visible as such.
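The noising procedure described above can be made concrete with a small sketch. It assumes the standard closed form in which a noisy state is a weighted blend of the clean data and Gaussian noise, with the weights set by a cumulative schedule. The function names, the linear schedule and its endpoint values are illustrative assumptions, not details taken from any particular model, and a plain list of numbers stands in for an encoded image.

```python
import math
import random


def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear noise schedule (assumed values)."""
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def add_noise(x0, alpha_bar_t, rng):
    """Blend clean data with Gaussian noise; return the noisy state and the noise used."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal, noise = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    x_t = [signal * x + noise * e for x, e in zip(x0, eps)]
    return x_t, eps


def noise_prediction_loss(predicted_eps, true_eps):
    """Mean squared error between the predicted noise and the noise actually added."""
    return sum((p - e) ** 2 for p, e in zip(predicted_eps, true_eps)) / len(true_eps)
```

Training, in this picture, amounts to lowering `noise_prediction_loss` over many images and many noise levels. Nothing in the objective asks the model to return a stored image, only to say which noise was added.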
The generation of an image then consists of starting from a random noise vector, any point in the space of chaos, and moving it step by step toward a region of latent space that corresponds to a visually coherent configuration. This movement is guided by a gradient: the slope leading from noise toward verisimilitude. What is obtained as output is not a copy of an existing image but an instance of a statistical structure, a point in latent space that satisfies the learned constraints of verisimilitude. It is organized noise, and this organization is so precise, so attuned to human perceptual categories, that it produces something a human observer immediately interprets as an image. Recognition does not come from the model alone. It arises from the encounter between the model’s statistical structure and what might be called the observer’s organic latent space: the cultural and perceptual pre-understanding accumulated through years of visual socialization, which allows one to translate a constellation of pixels into a face, a landscape, or a recognizable scene. The image is not in the machine. It emerges in the inextricable space between the machine and the human being looking at it.
This is where the rupture with Walter Benjamin becomes precise and irreducible. What Benjamin described as technical reproducibility rested on a clear ontological presupposition: an original exists, from which copies are produced. Photography multiplies instances of an image that took place, of an event inscribed in sensible matter. The aura of the original dissolves in the series, but the series itself presupposes that something existed before it, a reality of which it is the trace. This relationship between original and copy, between event and imprint, has deeply structured our way of thinking about the image for two centuries. Yet in the regime of diffusion, there is no original. There is no prior event of which the image would be a trace. Instead, there is a navigation through the space of possibilities that produces an instance. We are not reproducing; we are simulating the class of what can resemble something. It is this shift, from reproduction to simulation, from copy to mimesis, that constitutes the first rupture.
Mimesis is not the imitation of a particular thing but the imitation of the general form of a class of things. Aristotle already distinguished it from simple copying: the poet represents not what has happened, but what could happen—what is probable or necessary. What generative AI accomplishes is an automation of this mimesis: it produces not what was, but what could have been, what resembles what can be. The difference is not cosmetic. It touches the very nature of the regime of truth to which the image belongs.
Photography had founded this regime on indexicality, in the Peircean sense. The photographic image maintains a direct causal relationship with its referent: light reflected by an object strikes a sensitive surface and leaves a physical imprint. This imprint is a trace, a contact, an inscription of the real into matter. This is why photography could function as proof, testimony, or document—not because it is infallible, but because its causal relationship with the referent granted it a presumption of authenticity. Roland Barthes formulated this with the precision of a melancholy self-evidence: photography does not certify that the object is, but that it has been. This “that-has-been” (ça a été) is the fundamental temporal structure of photorealism: the image comes after; it attests to a past.
This indexical pact is structurally broken by the generative image. Not because generated images are “fake” in the sense of a falsification—the notion of fakes presupposes a “true” from which one deviates—but because the category of the trace becomes inapplicable. A generated image is not the imprint of a referent: it is an instance of a statistical possible. It bears witness to no event, no light, no instant. Its visual credibility—its ability to look like a photograph of a real object—no longer rests on a causal anchoring in the real, but on a statistical coherence with the distribution of likely visual configurations. This is what we might call disrealism: not the irrealism of an overtly fictional image, nor the realism of an indexical image, but an intermediate regime where resemblance to the real is no longer guaranteed by contact with the real. The image is realistic without being real, credible without being causal, likely without being true.
It would be inaccurate to present this shift as the destruction of an epistemic regime that was once stable and reliable. Photography never constituted irrefutable proof of the real, and the 20th century provided countless examples: propaganda photomontages, press retouching, staged journalistic scenes, and digital manipulations long predating AI. The presumption of photographic authenticity was a fragile social convention, constantly tested, never fully guaranteed. What generative AI does is not destroy this convention, but render it visible in its constitutive fragility. It reveals that the indexical pact was already murky, that photorealism was already a regime of belief as much as a regime of proof. In this sense, disrealism does not succeed an era of visual certainty; it makes manifest an uncertainty that was there all along.
Yet the most vertiginous moment of this transformation lies not in the rupture with the trace, but in what the logic of latent space does to temporality itself. To understand this, one must describe a specific technique with precision: DDIM inversion (from the Denoising Diffusion Implicit Models algorithm). This technique allows for something seemingly paradoxical: submitting an image produced after a certain date to a model trained on a corpus prior to that date, and obtaining an output image that is visually almost identical to it, without the model ever having seen it, without mechanical copying, without reproduction in the technical sense.
The process works as follows. A target image is taken and encoded into the model’s latent space; it is projected into this multidimensional space where visual configurations are represented as points. The noising process is then applied deterministically: instead of drawing random noise, as in ordinary generation, one runs the model’s own deterministic sampling steps in reverse, moving back toward a latent noise vector that corresponds approximately to this image in the geography of the latent space. This vector is not the image. It does not contain it. It is a coordinate, a starting point for a navigation of which the image is the probable destination. When one regenerates from this point by applying denoising, one obtains an image that strongly resembles the original because the region of latent space corresponding to it has been found.
This process can be refined by several complementary methods. Textual guidance, a prompt describing the image, orients the navigation toward the desired region by amplifying the direction in latent space that corresponds to that description. ControlNet, an auxiliary network attached to the frozen base model, takes structural information extracted from the source image (contours, depth, pose, composition) and constrains the generation to respect these structures without copying their textures or colors. Classifier-free guidance amplifies the slope toward the target region while maintaining global verisimilitude. In all these cases, the source image is not reproduced: it serves as an orientation, a vector in the space of possibles. What the model does is not read the image and re-emit it; it navigates toward the zone of its latent space that corresponds to it, using paths that the geography of that space makes available.
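A toy version of this round trip can be sketched under loud assumptions. The "trained model" here is a hypothetical stand-in, the exact noise predictor for data drawn from a zero-mean Gaussian, and a list of three numbers plays the role of an encoded image. The schedule and all names are illustrative. What the sketch does preserve is the shape of the procedure: the same deterministic DDIM update is run forward in noise level to invert, then backward to regenerate.

```python
import math


def make_alpha_bar(num_steps, max_angle=1.5):
    """Toy schedule (assumed): alpha_bar falls smoothly from 1.0 to about 0.005."""
    return [math.cos(max_angle * t / num_steps) ** 2 for t in range(num_steps + 1)]


def toy_eps_model(x, alpha_bar_t, data_std=0.5):
    """Hypothetical stand-in for a trained network: exact for data ~ N(0, data_std^2)."""
    variance = alpha_bar_t * data_std ** 2 + (1.0 - alpha_bar_t)
    scale = math.sqrt(1.0 - alpha_bar_t) / variance
    return [scale * v for v in x]


def ddim_step(x, a_from, a_to, eps):
    """Deterministic DDIM update between two noise levels, given a noise estimate."""
    x0_pred = [(v - math.sqrt(1.0 - a_from) * e) / math.sqrt(a_from)
               for v, e in zip(x, eps)]
    return [math.sqrt(a_to) * p + math.sqrt(1.0 - a_to) * e
            for p, e in zip(x0_pred, eps)]


def ddim_invert(x0, alpha_bar):
    """Walk from the clean latent toward noise, reusing the sampler's own update."""
    x = list(x0)
    for t in range(len(alpha_bar) - 1):
        eps = toy_eps_model(x, alpha_bar[t])
        x = ddim_step(x, alpha_bar[t], alpha_bar[t + 1], eps)
    return x


def ddim_generate(noise, alpha_bar):
    """Walk from a noise vector back to a clean latent."""
    x = list(noise)
    for t in range(len(alpha_bar) - 1, 0, -1):
        eps = toy_eps_model(x, alpha_bar[t])
        x = ddim_step(x, alpha_bar[t], alpha_bar[t - 1], eps)
    return x


alpha_bar = make_alpha_bar(500)
image = [0.8, -0.6, 0.3]
noise = ddim_invert(image, alpha_bar)      # the coordinate, not the image
rebuilt = ddim_generate(noise, alpha_bar)  # close to the image, not exactly equal
```

The reconstruction is approximate rather than exact, because each inversion step estimates the noise at the current state instead of the next one. Using more steps narrows the gap.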
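The amplification performed by classifier-free guidance has a compact standard form. The sketch below assumes two noise estimates are already available for the same noisy state, one computed with the prompt and one without. The function and parameter names are illustrative.

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Push the noise estimate along the direction that the prompt adds.

    A scale of 0.0 ignores the prompt, 1.0 returns the conditional estimate,
    and larger values extrapolate past it.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


guided = classifier_free_guidance([0.0, 1.0], [1.0, 3.0], guidance_scale=2.0)
```

The difference between the two estimates is the "direction" of the description. The scale sets how hard the navigation is pushed along it.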
This distinction between copy and orientation is not a minor technical nuance. It is the key to the entire demonstration. For if the model can converge toward an image that postdates it—if its latent space contains the region corresponding to an image produced after its training—it is because that image already existed, as a possibility, within that space. Not as a hidden file in the network’s weights, not as a deterministic prediction of a calculated future, but as an accessible region of the continuous space of likely visual configurations. The future image is not contained in the model like treasure in a chest, but as a destination is contained in a map: not yet realized, but navigable, reachable by the right path.
What this means for the temporality of the image is almost inconceivable if one remains within the categories of photorealism. In the regime of the imprint, time necessarily moved from the real toward the image: first the event, then its trace. This direction was irreversible and foundational—the image could not exist before what it represented because the causal contact between the object and the sensitive surface presupposed the temporal priority of the object. In the regime of latent space, this direction is suspended. An image can be generated before the event it represents—not by prediction, not by chance, but because latent space is a space of possibles of which future realizations are already a part. The learning of the past contains the form of the future.
To grasp why this statement is accurate without being mysterious, one must understand two fundamental properties of latent space. The first is its continuity: it is not a discrete set of points corresponding to training images, but a continuous space in which these points form regions of high density. Between two known images there exists an infinity of intermediate points, and each of these points corresponds to a possible, visually coherent image that no one has yet seen. The second property is the generality of learned structures: by training on hundreds of millions of images, a large model learns not particular images but the deep regularities of visual verisimilitude: how light is distributed across a face according to age and angle, how an urban scene is organized according to architectural and atmospheric regularities, how textures vary with distance and material. These structures are general enough that their combination covers not only past images but the vast majority of possible future images.
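The continuity described here is what makes interpolation between two latent points meaningful. As a minimal sketch, spherical interpolation is a common choice for Gaussian-like latents, although whether a given model's latents suit it is an assumption. The vectors are toy values and the function name is illustrative.

```python
import math


def slerp(z_a, z_b, t):
    """Spherical interpolation between two latent vectors, for t between 0 and 1."""
    dot = sum(a * b for a, b in zip(z_a, z_b))
    norm_a = math.sqrt(sum(a * a for a in z_a))
    norm_b = math.sqrt(sum(b * b for b in z_b))
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    if math.sin(omega) < 1e-6:
        # Nearly parallel or opposite vectors: fall back to a straight line.
        return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]
    w_a = math.sin((1.0 - t) * omega) / math.sin(omega)
    w_b = math.sin(t * omega) / math.sin(omega)
    return [w_a * a + w_b * b for a, b in zip(z_a, z_b)]


midpoint = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

Every value of `t` yields another point that can be decoded. None of these intermediate points needs to correspond to an image that anyone has produced.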
Intellectual rigor demands precision here. We cannot say that a model’s latent space contains all future images without restriction. Every latent space is particular, oriented by its dataset, and biased by the cultural overrepresentations and weighting choices of its designers. A model trained mostly on contemporary Western images will not navigate as easily toward visual configurations distant from those implicit categories. The omnipotence of latent space is structurally limited by the boundaries of its dataset—just as a language, however rich, nudges its speakers to say certain things naturally rather than others, even if it can, in principle, express everything. The correct formulation is therefore not that latent space contains all future images, but that the future of images is identical to navigation within the latent space of possibles. What will be seen tomorrow is, in large part, what can already be reached by the right path in the geography of the current latent space.
This formulation has a philosophical consequence that can now be stated directly. The generative image does not belong to the regime of the trace but to that of counterfactuality. It produces credible possibles—states of the world that could have existed, that might exist, that may yet exist. And because it produces possibles, it is indifferent to the direction of time: a possible is accessible from the past as well as the future, from before the event as well as after. This reversal is not a technical anomaly to be corrected but the inherent logic of a new image regime, which succeeds photorealism as photorealism succeeded painting—not by erasing it, but by radically shifting the terms in which we pose the question of visual truth.
In this new regime, the truth of an image can no longer rest on its visual credibility, which no longer guarantees anything, nor on the presumption of indexicality, which is no longer universally available. It must rest on explicit procedures: chains of custody, cryptographic metadata, institutional attestations, all forms of constructed trust that replace spontaneous perceptual trust. This is not an epistemic catastrophe; it is a displacement of the problem toward its real conditions, which the photographic regime had masked for two centuries under the guise of natural self-evidence. What generative AI accomplishes, in the final instance, is not the destruction of the possibility of truth, but making visible the effort that truth has always required and that photography momentarily allowed us to forget.
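One way to picture such constructed trust is a record that binds an image's bytes to its metadata under a signature. The sketch below is deliberately simplified and rests on an assumption: it uses a shared-secret HMAC from the Python standard library, whereas real provenance systems typically rely on public-key signatures. All names and field values are hypothetical.

```python
import hashlib
import hmac
import json


def attest(image_bytes, metadata, secret_key):
    """Build a record binding the image hash and its metadata to a signature."""
    record = {"sha256": hashlib.sha256(image_bytes).hexdigest(), "metadata": metadata}
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["signature"] = hmac.new(secret_key, payload, hashlib.sha256).hexdigest()
    return record


def verify(image_bytes, record, secret_key):
    """Check that neither the image nor its metadata changed since attestation."""
    claimed = {"sha256": record["sha256"], "metadata": record["metadata"]}
    if hashlib.sha256(image_bytes).hexdigest() != claimed["sha256"]:
        return False
    payload = json.dumps(claimed, sort_keys=True).encode("utf-8")
    expected = hmac.new(secret_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])


record = attest(b"raw image bytes", {"source": "camera-01"}, b"shared secret")
```

The image's appearance plays no part in the check. Trust comes entirely from the procedure and from whoever holds the key.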
Latent space is therefore not merely a compressed memory of everything that has been photographed. It is a mapping of visual possibles, of which the present and future of images will never be more than a partial and forever unfinished exploration. To navigate this space is to cross a territory where some regions were visited yesterday and others will be visited tomorrow, without this distinction being inscribed in the structure of the space itself. The past and future of images are equidistant there—separated not by time, but by the trajectory it took, or will take, to reach them. It is this equidistance of the possible, this neutrality of latent space toward time, that constitutes the most profound mutation generative AI introduces into our relationship with images. Not the end of reality, but the discovery that reality has never been anything more than one path among others in the infinite space of possibilities.