Creative Technologist
I make fun stuff with and about computers.
Current: Building Queer Map Taiwan with SpOnAcT.xyz
Previous: Senior Creative Technologist at OUTFRONT Media
What I did in the first half of my thesis
Introduction
In 1968, 240,000 miles from home, the crew of Apollo 8 arranged the first meeting between humans and the blue marble. Shortly afterwards, their picture "Earthrise" appeared on the cover of Stewart Brand's "Whole Earth Catalog." Amid the Cold War's terror of doomsday warfare, the picture showed all of humanity aboard one giant blue spaceship, and it served as a perfect way to bring people together.
Fifty years later, Artificial Intelligence has grown to wield the same potential.
Just as the astronauts turned a camera back on Earth from outer space, the explosive development of modern machine learning systems has opened up brand new perspectives for introspection.
On contemporary machine learning and personal motivation
To elaborate, the "robots" as we call them today are dominated by neural network-based machine learning models that excel at jobs like prediction and evaluation. While still categorized as tools, these machine learning systems have taken on an unprecedented amount of agency and responsibility. Through the abundance of decisions they make, humans and machines have formed a feedback loop, reinforced by invisible, mundane interfaces, that affects how modern people navigate their thoughts, form ideas, and communicate them. This implied cultural shift led to the intention of this project: to explore a deeper, ontological implication of machine learning by examining whether contemporary machine learning systems can be seen as an external recognition of human existential conditions.
On the medium
The project's domains are separated into medium and message. The medium takes its inspiration from Wendy Hui Kyong Chun's Programmed Visions, in which Chun describes computers as metaphor machines that both depend on and perpetuate metaphors. She goes further to identify a recursive relationship within layers of metaphors, where computers have become metaphors for metaphor itself.
Text is a property shared by machine and human: in one setting it is code, in the other it is literature. In both settings, text is granted the power to carry out procedures and to convey and store information. Accompanied by the contemporary development of parallel online social networks, cyber-accelerated text has become the storage unit for modern identities.
Considering this potential of text as a medium, OpenAI's GPT-3 was selected as the main subject for exploration and examination.
On GPT-3
GPT-3, short for Generative Pre-trained Transformer 3, is a Natural Language Processing (NLP) model developed by OpenAI that belongs to a specific subsection of machine learning: neural networks. At a high level, it is a language prediction system: it uses statistical calculations over a large amount of training data to predict which text is likely to come next.
On the message
The message is the contextual setting for the material GPT-3 will engage with. The logic stems from a strategic dive into human moods, intended to create an environment that fosters nuanced interaction between GPT-3 and human works. It aggregates personal experiences with a certain aesthetic found in various entertainment narratives: sadness and existential dread. Though often superficially put together, these narratives collectively bridge a vague link between dejected moods and the feeling of deep, humane meaning. This aesthetic serves as a common ground where the general public comes together under the same underlying social and cultural context. Together, the narratives present a taxonomy of fragments on modernity and post-modern human conditions. A few popular voices on these conditions are Fredric Jameson, Mark Fisher, and Jean Baudrillard. Most of their works share a fatalistic view of our present world as a phantom-like continuation of the end of civilization, in particular Fisher's sentiment, from his theory of hauntology, that the future has long been canceled.
Compiling these taxonomic vignettes, the setting concludes with a feeling of failed promise, where melancholia has become a collective, nuanced mood of contemporary life. While often treated as synonymous with sadness and dread, the history of the word "melancholia" shows its close proximity to existential philosophy, and this pedigree suggests its importance in defining an exclusively "human" mood.
Early Prototype
The prototyping phase started with a series of explorations in fine-tuning GPT-3: providing additional training on human-curated materials to control the output style, so that the model writes in the manner of a particular type of writing.
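As a rough illustration of what "human-curated materials" can look like in practice, the sketch below groups passages into training records and serializes them as JSON Lines. The prompt/completion layout, the function names, and the placeholder passages are all assumptions made for illustration; the actual data format and curation steps used in this project are not described here.

```python
import json

def build_finetune_records(source_text, chunk_size=3):
    """Group paragraphs into prompt/completion records (illustrative only)."""
    paragraphs = [p.strip() for p in source_text.split("\n\n") if p.strip()]
    records = []
    for start in range(0, len(paragraphs), chunk_size):
        chunk = " ".join(paragraphs[start:start + chunk_size])
        # Assumption: an empty prompt, so the model learns the style
        # without being conditioned on any particular opening line.
        records.append({"prompt": "", "completion": " " + chunk})
    return records

def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

# Placeholder passages stand in for the curated writing.
sample = "First curated passage.\n\nSecond curated passage.\n\nThird.\n\nFourth."
records = build_finetune_records(sample, chunk_size=2)
jsonl = to_jsonl(records)
```

The chunk size is the main curatorial lever in this sketch: larger chunks preserve more of a writer's long-range rhythm, while smaller chunks produce more training examples.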
During this phase, two writers were selected as they are both prominent figures in literature/performance working with existential human conditions: Samuel Beckett and Charlie Kaufman.
"I am not a person. This is the person I am.
My body is my person."
- GPT-3 model_ID "ft-172o2WnRIJTQwJHum8EID79Q", 2021
The text above is an output excerpt from one of the fine-tuned Kaufman models; the content generated is excitingly effective in preserving the training materials' characteristics in theme and writing style.
The initial reaction to the generated text prompted questions regarding authorship. At a high level, at least three parties can be said to be involved in creating the texts: 1. The author of the original training material. 2. The GPT-3 model. 3. The agents in charge of curating the training materials. However, upon further examination of the model itself, things became interesting. Suppose the weight of the training material warrants a spot on the creator list (1. The author of the original materials). In that case, the same must hold for the core of GPT-3 and all other contemporary NLP models, which contain a massive amount of training material scraped from thousands of online forums. While tracking down and compiling a list of all the authors of those materials seems wildly unrealistic, the abstraction can be applied at a macro level: with a slight shift in perspective, one can assert that the text was written by all of humanity, in unity.
On Computational Solace
The introduction of modern technology came from the need to enhance and augment human abilities. The most common and straightforward approach often involves the numeric and symbolic abstraction of data and procedures. In the process of abstraction, metaphors and signs began to shift humanity's cognitive framework. While we try to make machines more human-centered by introducing friendlier and more invisible interfaces, the line within those interfaces that separates human and machine has also become less distinct. The wielder and their tools have become more alike.
Upon dissecting the collective melancholia of modernity into elements like the loss of common traditional values, the alienation created by the tension between secularism and egalitarianism, and the dilution of cultural contexts, the aforementioned process of abstraction can be traced as one of its core components.
Through years of practice and iteration, thousands of abstractions have led modern societies to a place of ubiquitous, universal machines. Tools able to talk to each other and understand all kinds of users across the globe, tools that look at their users like numbers on a spreadsheet.
It is a visceral, modern disenchantment.
The project sets out to situate GPT-3, the newest member amplifying this disenchantment, in a place where it recognizes the process and prompts reactions. The idea is to seek solutions within the very perpetrator of this phenomenon: to identify and leverage new potentials for restoring depth in the new layers of metaphorical abstraction introduced by modern metaphysical technologies like machine learning.
The aim is to provide a new form of comfort through machine learning systems' recognition of the modern melancholia that results from the traditional human values these abstractions have flattened.
Identifying Melancholia
The word melancholia is often associated with a calculated response to humanity's existential queries throughout history. Put in a contemporary context, and accompanied by the arguments in the previous section, melancholia can be summed up as the feeling of a lack of witnesses: humanity's confrontation with the void of reconciliation. This void becomes a significant potential leverage point for modern machine learning systems like GPT-3, as an entity that not only recognizes it but also provides a further response as an external agent.
Prototype Iteration: Setting
The next prototyping phase began to explore the settings where machine-generated texts would reside. Aggregating the collective zeitgeist discussed above, this stage strategized motifs to amplify the feeling of modern melancholia, starting with a translation into spatial representations. The inspiration was the view from an airplane in the air, in between places: a space in the middle of nowhere. This harkens back to the non-places, the places without a coherent cultural context, present in modern disenchantment. The idea later took the form of a physical model of an airplane's window frame, with a digital screen depicting the view outside. The screen displayed two versions of machine-generated content:
1. Texts generated by a GPT-3 model fine-tuned on Kaufman's screenplays and novel.
2. A short looping clip of machine-generated clouds.
This approach to recreating and capturing modern melancholia was initially met with positive feedback from fellow students and class faculty. Still, gaps and weaknesses in the contextual logic were identified upon further discussion. In conclusion, this iteration was a step in the right direction, but caution has to be taken when diving deeper into the realms of machine-generated signs and literal real-world representations. Assigning an airplane window as both the representation of a non-place and the host for machine-generated content fell short because it was too literal. The compressed metaphors failed to reenact and highlight the full range of the machine's recognition.
Strategies and deliveries in five in five
Five in five is a long-running practice in the program: a design sprint that lasts five days, where on each day participants are expected to come up with and fully execute a complete design. The scopes and themes vary depending on the domain each individual is working within.
The strategy going into five in five was to identify a series of metrics for evaluating the creative collaboration with various machine learning models and the audience reactions this process elicited. The process concluded with the following four metrics:
1. On the characteristics of machine-generated text: Human-like (generated from GPT-3) <-> Procedurally generated (generated from a Markov chain).
2. On manually translating the machine-generated text into visuals: Objective <-> Subjective.
3. On the degrees of human intervention in the translation process: Maximum <-> Minimal.
4. On the resolution/style of the final translated piece: Literal <-> Visual.
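For readers unfamiliar with the procedural end of the first metric, a word-level Markov chain can be sketched as follows. This is a minimal first-order version on a toy corpus; the corpus, function names, and parameters are assumptions for illustration, not the generator used in the sprint.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed directly after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)

def generate(chain, start, max_words=12, seed=None):
    """Walk the chain from `start`, sampling each next word at random."""
    rng = random.Random(seed)
    output = [start]
    # Stop at the length limit or when the last word has no known successor.
    while len(output) < max_words and output[-1] in chain:
        output.append(rng.choice(chain[output[-1]]))
    return " ".join(output)

# Toy corpus, chosen only to keep the example small.
corpus = "the walls were covered with shelves and the shelves were covered with books"
chain = build_chain(corpus)
line = generate(chain, "the", max_words=10, seed=7)
```

Because each word depends only on the one before it, the output stays locally plausible but drifts without any larger intent, which is the quality that places it opposite GPT-3 on this axis.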
Each metric comes with two prototypes as the extreme cases of its spectrum, serving as fleshed-out representations that let audiences gauge the axis, with the hope and assumption that a potential sweet spot lies somewhere in between.
Reactions to the first metric indicated an inclination towards the human-like characteristics of GPT-3, while the latter three metrics all pointed to a place where a certain degree of human creative intervention is appreciated.
The audiences' collective response in this phase validated an earlier hypothesis on the interaction between machine learning and creative writing: a parallel between the uncanny characteristics of certain machine-generated texts and a specific writing style known as magical realism. The link between the two would later become the focal point of further exploration.
Text, metaphors, and space
Moving into the next phase of iteration, the scope narrowed to the act of un-flattening the GPT-generated text. The motif remained closely tied to space, as it is regarded as one of the fundamental properties of both physical and metaphysical settings. A new GPT-3 model was fine-tuned to accommodate this specific theme, using works by the Argentine short-story writer Jorge Luis Borges (The Library of Babel and The Garden of Forking Paths), in the hope of generating outputs with dense and clear spatial descriptions. These outputs are treated as prompts: as procedures for manually rendering a space in the modern game engine Unity.
The focus is on the simulation characteristics of a game engine, which can replicate real-world properties with robust interactions. The idea is to harness the engine's power to manipulate the properties of physics while staying relatively grounded in objects, languages, and gestures from the real world. This narrow tension is a perfect fit for the expected output of a GPT-3 model fine-tuned on Borges's magical realist writings.
The method of collaboration with this GPT-3 model was a series of recursive, iterative inquiries. First, the model was prompted with the following human-written line:
"Walking towards the middle "
To extrapolate the intentions behind the prompt: "the middle" would likely situate the model in a spatial context, while the verb "walking" provides the additional framing of a human perspective. The two components were combined in the hope that the model would render the space in a sequence from the middle outward. The following is the model's output:
"I passed a labyrinth of corridors and stairways. I finally arrived at a vast circular chamber with a diameter of some hundred yards. In the center was a large spherical clockwork. I sat down in a chair of crystal. The walls were covered with shelves which held books of all sizes. I opened one of these volumes. It was in French. I understood nothing. I examined another."
From the first interaction with the model, a relatively coherent and clear image could be drawn from the text alone. The second step involved identifying gaps in the objects' descriptions and raising the resolution of those objects. In this example, the "large spherical clockwork" lacks any description of its appearance besides its size. The model was therefore prompted with a second human input:
"The spherical clockwork is "
This second-layer prompt then produced a new machine-generated paragraph that further describes the object of interest:
"perhaps the most admirable of all possible styles For it combines in a unique manner the virtues of the abstract and the concrete. The spherical form of the gears implies their roundness; the absence of any straight line in the spherical form implies their circularity; the uniform color of the gears implies their unity; the uniform movement of the hands implies their rationality; the impossibility of finding a point."
The process goes on iteratively until the picture is detailed enough to move into the next stage of 3D modeling and game scene building. The final piece is an immersive game scene with a controllable first-person point of view.
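The prompt, read, and re-prompt cycle described above can be sketched as a small loop. Everything named here is hypothetical: `generate` stands in for a call to the fine-tuned model, and `choose_follow_up` stands in for the human step of reading an output and deciding which object needs more detail. In the project that second step was done by hand, so the stubs below only replay the two rounds quoted earlier.

```python
def refine_scene(generate, choose_follow_up, seed_prompt, max_rounds=5):
    """Alternate between model output and a follow-up prompt chosen by a human."""
    layers = []
    prompt = seed_prompt
    for _ in range(max_rounds):
        output = generate(prompt)
        layers.append((prompt, output))
        # The human reads the output and names the next gap, or stops.
        prompt = choose_follow_up(output)
        if prompt is None:
            break
    return layers

# Stubs replaying shortened versions of the exchange quoted above.
canned = {
    "Walking towards the middle ": "In the center was a large spherical clockwork.",
    "The spherical clockwork is ": "perhaps the most admirable of all possible styles",
}

def stub_generate(prompt):
    return canned[prompt]

def stub_choose_follow_up(output):
    # The clockwork is mentioned without detail, so ask about it once.
    return "The spherical clockwork is " if "clockwork." in output else None

layers = refine_scene(stub_generate, stub_choose_follow_up, "Walking towards the middle ")
```

Each entry in `layers` pairs a prompt with its output, so the accumulated list can serve as the written brief for the 3D modeling stage.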
Re-orientation and Post Mortem
The explorations so far can be split into two intentions:
- To present machines' recognition of modern melancholia:
  - To provide comfort through recursive acknowledgments. (Whether it is done by something human-created or just an aggregation of the human image.)
  - To examine modern Natural Language Processing systems' capability of navigating human nuances.
  - To invite appreciation of and discourse on the progress made in machine learning and the existential implications of NLP.
- To un-flatten GPT-3's text outputs:
  - To harness the fluidity of text through low-level translation.
  - To create tangible representations of the text outputs while maintaining the depth and flexibility of metaphors.
  - To turn the "uncanny valley" characteristics of machine-generated texts into literature's equivalent: magical realism.
In an attempt to zoom out from these two previously separate intentions, a higher level of abstraction can sum up and orient all the explorations so far: looking at machine learning as an external entity. This is inspired by Mark Fisher's book The Weird and the Eerie, in which he describes the fascinating potential of a genuinely external perspective that, juxtaposed with an emphasis on the physical, the topographical, and the historical, creates an additional dimension of "the beyond." In this project's setting, one can argue that the core is to replace traditional values with the modern machine learning system's implication of an existence that is "out of this world" and "beyond," one that reacts to human existential conditions with the production of the new.
Aggregating feedback from the final crit session, the following is a list of things that require further research:
- Contemporary machine learning's capability to provide a common voice, allowing the extra dimension to step outside the human condition and look into it.
- How to get the experience of collaborating with GPT-3 on text generation across to the intended audience, which would be the focal point of this piece's setting.
- A clearer link between melancholia, machine learning, and introspection.
- The inherent tension between low-level text-based metaphors and literal representations, which calls for caution.
At its core, Are you there Computer, it's us, Humans is working towards witnesses, recognition, and acknowledgment. It presents an appreciation of the potential of machines as an entity bound by lineage, yet truly external, posing the following question to humans and their tools:
What do you see?