Introducing Typologies of Delusion

A historical woman mirrored by her fictional self. Generated with SD 1.5 fine-tuned with a hypernetwork trained with images from a 1970s Polygoon Journaal.

The swift advancement of AI-driven image-generation technologies is a topic of great importance to artists and the creative industries. Tools such as Midjourney, Adobe Firefly, and Stable Diffusion are rapidly transitioning from experimental to essential assets for digital artists and designers. However, we are still trying to grasp their practical potential, their meaningful application and legal implications. “Typologies of Delusion” is a project that seeks to contribute to this conversation via a research-through-practice approach.

Goals

While keeping an eye on emerging AI tools made for creatives, Typologies of Delusion focuses on the use of Stable Diffusion, an open-source image generation programme, to

  1. train neural networks informed by audiovisual archives,
  2. use these networks to render images that can provide new insights about the archive’s narrative and aesthetics, and
  3. unfold the storytelling potential of the tools in use for exploring alternative workflows in the ideation and production of audiovisual art.

The Polygoonjournaal collection

As a starting point, we are working with the emblematic “Polygoonjournaal”, a collection of more than 2000 newsreels produced between the early 1920s and the late 1980s by the Dutch production company Polygoon-Profilti. We are training neural networks with a variety of datasets made from these films, and we will use them to interrogate the archives.

From questions such as “How does beauty look like?” to “Imagine the solar system”, these experiments aim to build a speculative approach towards archive imagery and biases, using them as a source to help creative professionals in their process to either analyse, criticise or re-imagine media archives.

Ogen experiment 01
Early experiment: Image generated with the prompt “man staring at something”, using Stable Difussion 1.5 in combination with a hypernetwork trained with images from a Polygoon reportage about a Dutch company that produces prosthetic eyes.

Exploring the Latent Space

Gravital to this research is the concept of latent space, which refers to the abstract data space that results from the training of a set of images.

A latent space can be compared to an ocean filled with different species of fish, each representing a unique image. The depth, temperature, and salinity of the water (analogous to the dimensions in the latent space) determine the types of fish one can find. Similar species tend to swim in the same areas, so the closer two points are in this ocean, the more similar the fish (images) they represent.

Latent Garden
Image from Matúš Solčány’s “Latent Garden”. The project delves into the notion of latent spaces through a series of data visualisation experiments.

By exploring different regions of this ocean of semantic data, the machine learning model can combine image recipes and generate novel pictures that respond to the prompts or input provided by the user. An explicit example of this could be “a dog with the head of a cat”; a more abstract example would be combining terms such as love and violence in a vaguely descriptive prompt, such as “a couple, love and violence”Based on our experiments so far, purely abstract prompts tend to pull Stable Diffusion to iconic cliches that are not interesting for storytelling. Love, for instance, almost always generates a heart..

SM application 01
Image generation tools respond well to explicit prompts, such as “a dog with the head of a cat”. In this project, we aim to explore beyond conventions by interrogating archives with less descriptive prompts combined with custom-trained networks. On the right: an early experiment using the prompt “a couple, love and violence”.

Tool of choice: Stable Diffusion

Most image generation tools available in the market use standardized datasets to build their latent spaces. Proprietary tools like Midjourney, Dall-E or Adobe Firefly protect their latent spaces as black boxes and bet for a one-size-fits-all approach. This can be useful for most commercial work, as images in advertisements and entertainment follow similar trends when it comes to aesthetics and ideals of beauty. This project, however, wants to touch deeper into the ontological surface of digital images. For that reason, our tool of choice is Stable Diffusion, an open-source programme that allows many forms of customization, both in the training of datasets and the diffusion stage, where images are generated based on prompts and parameters.

SM application 02
Different stages of image generation. Most programmes allow input in the diffusion stage only. Stable Diffusion allows it in all stages.

Furthermore, the open-source nature of Stable Diffusion fosters a dynamic community of users and startups who are customizing the tool to build their own products. As a consequence, the outcomes of this research will support their effort, ultimately enriching the entire community.

Latent Captures
On the top row, we can see images from Polygoon used for training the neural network. The following rows show the gradual steps of the formation of an image informed by the network’s latent space. Row 1 has been prompted with the word “man”, row 2 with “woman”, and row 3 with “car”.

Visual typologies as a method

Another key element of this research is the use of visual typologies as artistic means to keep a record of interactions with Stable Diffusion and explore the aesthetics of serial production, which, as most forms of generative media, comes naturally to its use.

In a general context, typological analysis is defined as the study of categorising things based on shared characteristics. It has been used since ancient times to establish classes and hierarchies to help societies grasp the natural world that surrounds them (like types of animals and plants), and arguably its use expanded in the 17th and 18th centuries with the rise of rationalism and the scientific revolution.

In the arts, visual typologies –especially from the 20th century onwards– have also played a significant role as a method that supports the visualization of concepts. If only focus on photographic images (the kind of image we are interested in producing), important examples of this are:

  • Eadweard Muybridge’s motion studies (1870s), a series of visual typologies that used high-speed photography to capture animal movement in time. This is considered a fundamental piece of research to pave the way for cinema.
  • Karl Blossfeldt’s botanical photography collection, “Urformen der Kunst” (1928), which creatively combines images of plants and flowers portrayed in a scientific manner, showcasing their inherent visual harmony and supporting theories of composition such as the golden ratio.
  • Joseph Kosuth’s artwork “One and Three Chairs” (1965), which challenged object perception by positioning three expressions of the same concept, chair, next to each other.
  • Hilda and Bernd Becher’s industrial photography series, known as “Typologies,” started in the 1960s and continued until the 2000s. In these works, the couple assembled series of photographs showcasing industrial architecture, emphasizing their uniformity and similarity, while highlighting the architectural concepts underlying the individual pieces.
  • Ed Ruscha’s “Twenty Six Gasoline Stations” (1963). Together with Kosuth and the Becher’s works, this is another foundational artwork in conceptual photography.
  • Andy Warhol’s celebrity screenprints, which utilize visual typologies to raise awareness about the mechanical reproduction of images. The serial nature of the prints interacted with their Zeitgeist, reflecting on the emerging trends of mass media, celebrity culture, and industrial design from the 1960s and 70s.
Bernd and Hilla Becher 2 Sculptures
Bernd and Hilla Becher’s use of visual typologies takes us outside the self-contained narrative of one photograph by showing many of them. By placing the same object one next to the other, we start not only to find common patterns, but most interestingly we start to visualize the concept behind the individual objects.

One element that is common to all these examples is the way the use of visual typologies supports the visualisation of concepts. By looking at the group instead of the unique piece, it is possible to provide an optical mapping of the world depicted by the artist. Regardless of the artwork’s objective or subjective ambitions, visual typologies are invitations to think associatively and analytically without the need to provide explanations. We believe this can be a helpful method to showcase the images we will collect from the synthetic oceans of archival imagery we are creating. In doing this we will also be leaving space for the viewer to help us discover patterns and draw further conclusions.

Delusions as a creative north star

NYT article
Kevin Roose’s interview with Bing’s new search assistant (NYT, 16 Feb. 2023) is a good example of generative AI going delusional. In a creative set-up, this unwanted tendency could be a desirable feature.

Delusion refers to a belief or idea that lacks a foundation in reality and persists despite contradictory evidence. It differs from a hallucination in that it involves an unwavering resistance to alternative viewpoints during the process of negotiating reality. For example, if I were to hear a voice that nobody else can hear, that could be a hallucination. With external support and guidance, I could acknowledge that it is not real and attribute it to a sensory misperception. However, if I were to persistently believe that the voice is genuine, disregarding all evidence to the contrary, that would be delusion.

Generative Artificial Intelligence, as it currently exists, often exhibits a tendency to generate text or images that may be based on inaccurate information. Good examples are posing a complex mathematical problem or requesting citations of criminal cases for a court proceeding to Chat GPT. Depending on the input, there is a significant likelihood that the AI may respond with bold confidence, only for us to later discover that the information is indeed false. This can be categorized as a form of hallucination.

Visual delusion of an Eye
Visual delusions of an eye generated from a neural network trained with a Polygoon reportage about prosthetic eyes.

In this research, we believe that the tendency of AI systems to generate hallucinatory content can be utilized as a creative tool to explore new workflows that can support creative professionals in storytelling. This involves venturing into uncharted regions of a neural network’s latent space, not to deceive or expose limitations, but to discover unique results that go beyond stereotypes, clichés, and imitating famous artists. In other words, if we want to use AI as a creative partner for generating fresh ideas, it is preferable for AI to have a delusional approach rather than being occasionally hallucinatory.

Experiments

With these ideas in the background, the main purpose of this project is to carry out experimentation with historical archives to gain new insights into the archives being used and explore no-code creative workflows that include generative AI and that can be inspiring for artists.

For that, we will be running two kinds of experiments,

1) Visual explorations that can allow us to depict concepts and their interaction with an archive.

For instance, consider the exploration of the concept of “woman” across distinct decades. We want to see how far we can get in the attempt to give the neural network space for interpretation.

Photo of a woman and a man
Experiment: representations of a man and a woman in the 1940s and 1980s. We used the same prompt (image of a woman/man) for the 1940s and 1980s different models.

2) Short audiovisual stories that result from the collaboration between artistic craftsmanship aided by generative tools, thinking of them as a means to inspire and facilitate the creative processes carried by an artist.

For example, the appropriation of certain features of a Polygoon film to build or sketch stories as means to communicate ideas and evaluate their potential.

This audio narrative experiment was created with the support of text and audio generation tools.

Collaboration with creative industries’ leading experts

We are developing this project with the help of The Netherlands Institute for Sound and Vision, The Netherlands’ largest media archive, and the assessment of Kaspar.ai, a Danish startup focused on the implementation of artificial intelligence for the creative industries.

If you are interested in this subject, stay tuned to this blog. We will be publishing more articles with reflections and results that are appearing along the way.

If you are interested in partnerships, or collaborations or provide valuable feedback, do not hesitate to get in touch.

Leave a Reply

Your email address will not be published. Required fields are marked *

Share this post