What, really, is the intersection of AI and Generative Art?

What if we could harness the power of generative art to celebrate difference rather than suppress it?

By Gretchen Andrew

Published January 20, 2025


As editorial director of NFT Magazine, I’ve been grappling with the simultaneous rise of AI and generative art. While AI’s potential is undeniable, I’m concerned that the explosion of creativity gives a slightly misleading picture of its impact on our daily lives. Outside the world of art, I generally see AI making our lives more controlled, more normalized, less interesting, and less diverse. It feels less like a creative revolution than a straitjacket, as AI is used to predict life and make it predictable.

My recent painting series, "Facetune Portraits," exhibited at fxhash during Berlin Art Week, was born from my fascination with generative art. As this powerful new medium entered the art world consciousness, I began to question its potential impact. If generative art thrives on randomness and algorithmic processes, what happens when we consider its opposite? What does it mean to be 'destroyed' by these very forces that are shaping our creative landscape?

I started to see a disturbing parallel between the unpredictable nature of generative art and the increasing homogeneity we witness in our daily lives. From the filtered faces we encounter on social media to the ubiquitous, mass-produced furniture that fills our homes, a sense of sameness seems to be creeping into every aspect of our existence.

This observation led me to investigate the phenomenon of "facetuning" – the use of AI-driven filters on platforms like TikTok and Zoom. These filters, while seemingly designed for enhancement, ultimately serve to compress human diversity into a single, idealized aesthetic. My "Facetune Portraits" series explores this homogenizing process, examining how these seemingly innocuous tools subtly erode our individuality.

But what if we could subvert this trend? What if we could harness the power of generative art to celebrate difference rather than suppress it? "fxFacetune" is our attempt to do just that. By combining the constraints of these AI filters with the unpredictable nature of generative algorithms, we create a unique and dynamic portrait experience. You, the collector, become an active participant, uploading your own face and witnessing how it is transformed in unexpected and often beautiful ways.

"fxFacetune" is not about conformity; it's about embracing the infinite variations of human expression. It's about reclaiming our individuality in the face of increasing homogenization. We invite you to join me on this journey of artistic exploration and discover the exhilarating contrast between the expectations of AI and the boundless possibilities of human creativity.

This project is a collaboration with {protocell:labs}, a testament to the power of technology when used to liberate rather than constrain. {protocell:labs} created custom image compression, developed a new offline machine learning setup, and pushed Tezos params beyond their previous uses so that you can create unique artwork based on user input and randomness.

Read on for detailed information about how {protocell:labs} made this possible. 

Gretchen Andrew, Artist and Editorial Director of NFT Magazine. 

A RATHER TECHNICAL DESCRIPTION OF fxFacetune by {protocell:labs}

Minting process

During minting, collectors upload an input image of their face, which is stored on-chain and serves as the artwork’s base. Face and background are detected in real time from the input image. The algorithm then uses the detected facial features to generate three faces (original, distorted and tuned) and cycles between them in a perpetual loop. You can click on the artwork to disrupt this process in real time.

The minting interface guides collectors through this process. The input image is first compressed using a custom JPEG algorithm, encoded as Unicode ideographs, and finally stored on the Tezos blockchain in a single transaction. This 31.5 kB of on-chain data is used to reconstruct the face image every time the artwork is viewed in a browser.
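The "encoded as Unicode ideographs" step can be illustrated with a minimal sketch. It assumes, purely for illustration, that each value to be stored is a small non-negative integer mapped one-to-one onto a code point in the CJK Unified Ideographs block (U+4E00 to U+9FFF). The function names and the choice of a single block are hypothetical and are not taken from the fxFacetune code, which defines its own mapping across several scripts.

```python
# Hypothetical mapping: one integer <-> one Han ideograph.
# The block boundaries are standard Unicode; everything else is an assumption.
HAN_START = 0x4E00  # first code point of the CJK Unified Ideographs block
HAN_END = 0x9FFF    # last code point of that block
HAN_COUNT = HAN_END - HAN_START + 1  # number of code points in the block


def encode_values(values):
    """Turn a list of small non-negative integers into a string of ideographs."""
    for value in values:
        if not 0 <= value < HAN_COUNT:
            raise ValueError(f"value {value} does not fit in the Han block")
    return "".join(chr(HAN_START + value) for value in values)


def decode_values(text):
    """Recover the original integers from a string produced by encode_values."""
    return [ord(char) - HAN_START for char in text]


if __name__ == "__main__":
    encoded = encode_values([0, 1, 300, 12345])
    print(encoded)                 # looks like random ideographs
    print(decode_values(encoded))  # the original integers come back
```

Because one character can carry any of thousands of distinct values, a block this large packs far more information per character than an ASCII-based encoding would.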

Image input interface (left) and a final artwork (right)

Open-source

The fxFacetune generator is built with p5.js, a very popular JavaScript library for creative coding. p5.js is the library of choice on popular generative art platforms like fxhash and Art Blocks, used most often for 2D canvas drawing. fxFacetune additionally uses p5.brush.js for realistic “analogue-looking” drawing brushes, a library built by Alejandro Campos, himself an architect and generative artist. Finally, ml5.js is used for the machine vision component of the artwork. All three libraries are free and open-source. In line with their artistic ethos, the fxFacetune code itself will be made available on the {protocell:labs} GitHub page under the MIT license, although it is already possible to extract the unminified version of the code using the Sources tab of your browser’s developer tools.

Machine vision

For security reasons, fxhash generators are not allowed to access the “outside” world or make API calls to external websites, which ml5.js does by default. This makes generative artworks safe, since they cannot hack your computer or send your personal data to a third party, but it also makes them more restricted than standard web apps. To make the fxFacetune code compatible with the fxhash platform, the ml5 machine vision modules had to run locally, or “offline”. To achieve this, the artists coded and compiled a custom build of the ml5.js library.

Two machine vision modules run in the background. The segmentation module separates the person in the image from the background, which is then replaced with a solid color from the chosen color palette. The second module, FaceMesh, detects the 3D coordinates of 468 keypoints representing facial landmarks. These keypoints are used to draw the sketches of different faces based on the face detected in the input image. Both of these machine vision modules are standard in photo editing apps on social media.

Code outputs showing generative effects

Artwork uniqueness

Each artwork uses a collector-provided input image during the minting phase as the base for applying generative effects. The generative effects code, stored on IPFS (decentralized storage), remains the same for all artworks. Variations arise from different PRNG (pseudo-random number generator) seeds used in each iteration. For fxFacetune, the seed is derived from the minter’s wallet address and a collector-selected seed number at mint time. Each wallet is assigned 100 unique seeds.
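As a rough illustration of how a seed could be derived from a wallet address and a collector-selected seed number, here is a minimal sketch. The hashing scheme (SHA-256 over the two values), the 0 to 99 numbering, the function names and the placeholder wallet string are all assumptions made for this example. The text above only states that the two inputs are combined and that each wallet has 100 seeds, so the actual fxFacetune derivation may differ.

```python
import hashlib
import random

SEEDS_PER_WALLET = 100  # from the article: each wallet is assigned 100 seeds


def derive_seed(wallet_address: str, seed_number: int) -> int:
    """Combine wallet address and seed number into one integer seed (assumed scheme)."""
    if not 0 <= seed_number < SEEDS_PER_WALLET:
        raise ValueError("seed_number must be between 0 and 99 in this sketch")
    payload = f"{wallet_address}:{seed_number}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big")  # use the first 64 bits as the seed


def make_prng(wallet_address: str, seed_number: int) -> random.Random:
    """Return a PRNG that always produces the same sequence for the same inputs."""
    return random.Random(derive_seed(wallet_address, seed_number))


if __name__ == "__main__":
    wallet = "tz1-placeholder-wallet-address"  # hypothetical, not a real address
    prng = make_prng(wallet, 7)
    print([round(prng.random(), 3) for _ in range(3)])
```

The property that matters for a generative artwork is determinism: the same wallet and seed number always reproduce the same sequence of random values, so the piece renders identically every time it is viewed.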

Image encoding using Unicode mapping of DCT coefficients to Asian ideographs

On-chain image storage

The input image provided by the minter is stored directly on the Tezos blockchain through an on-chain transaction, which includes a params object in its payload. This process incurs an additional minting fee of approximately 8 XTZ, reflecting the nearly full use of Tezos's transaction data limit: 31.5 kB of data is stored on-chain. This capacity is sufficient to store the encoded input image along with parameters like detected face keypoints and the segmentation map separating the face from the background.

fxFacetune #23 with an artwork info card, minted by Olga Fradina

Image encoding

The input image is encoded as a 256x400 pixel image compressed with custom JPEG compression. Two main steps are used: color space conversion (RGB to YCbCr) and the discrete cosine transform (DCT). The YCbCr color space encodes the image in three channels. The luminance (luma) channel represents grayscale information and is stored using up to 12k Unicode characters. The chrominance (chroma) channels (Cb and Cr) store color details and use around 1.6k characters each. This approach is more efficient than standard RGB encoding, as the luma channel, containing the most visual information, undergoes less compression, while the chroma channels, containing less critical information, are compressed more. This design takes advantage of the human eye being more sensitive to tonal variations than to color changes, a principle exploited by the JPEG algorithm for better compression.
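The color space conversion can be sketched as follows. This uses the standard full-range RGB to YCbCr formulas from the JPEG/JFIF convention. Whether the custom fxFacetune compressor uses exactly these coefficients is an assumption, and the function name is chosen for this example only.

```python
def rgb_to_ycbcr(r: int, g: int, b: int) -> tuple:
    """Convert one 8-bit RGB pixel to 8-bit YCbCr (JFIF full-range formulas)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b                # luma: weighted brightness
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b     # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b     # red-difference chroma
    # Round and clamp each channel to the valid 0..255 range.
    return tuple(min(255, max(0, round(channel))) for channel in (y, cb, cr))


if __name__ == "__main__":
    for pixel in [(255, 255, 255), (0, 0, 0), (255, 0, 0)]:
        print(pixel, "->", rgb_to_ycbcr(*pixel))
```

Note that any gray pixel maps to a chroma pair of (128, 128): all of its information sits in the luma channel, which is why the chroma channels tolerate much heavier compression.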

Image channels are further compressed using the DCT algorithm. Each 8x8 pixel block is transformed into the frequency domain, where it is represented by 64 coefficients. Most of these coefficients are near zero and can be discarded. Through quantization, coefficients are grouped into bands, with only the 10 largest retained, ensuring high image fidelity while introducing familiar "JPEG artifacts." The coefficients and their positions in the frequency domain are encoded using a Unicode mapping defined algorithmically in the code. Different Asian ideographs and syllables are used for encoding: Chinese Han and Japanese Katakana for luma and chroma channels, Chinese Yi for background segmentation, and Korean Hangul for face keypoints. These Unicode ranges are chosen for their large character sets, as it is necessary to map thousands of elements. The resulting text appears random and is not intended for reading.
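The block transform and the "keep the 10 largest" step can be sketched in plain Python. This is a simplified illustration under stated assumptions: it uses the textbook orthonormal 2D DCT-II, keeps the 10 coefficients with the largest magnitude, and skips the quantization into bands described above. All function names are hypothetical and the real fxFacetune code may organize this differently.

```python
import math

N = 8  # block size used by JPEG-style compression


def alpha(k: int) -> float:
    """Normalization factor of the orthonormal DCT-II."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def basis(x: int, u: int) -> float:
    """Cosine basis function shared by the forward and inverse transforms."""
    return math.cos((2 * x + 1) * u * math.pi / (2 * N))


def dct_2d(block):
    """Transform an 8x8 block of pixel values into 64 frequency coefficients."""
    return [[alpha(u) * alpha(v) * sum(block[x][y] * basis(x, u) * basis(y, v)
                                       for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct_2d(coeffs):
    """Rebuild the 8x8 pixel block from its frequency coefficients."""
    return [[sum(alpha(u) * alpha(v) * coeffs[u][v] * basis(x, u) * basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def keep_largest(coeffs, count=10):
    """Zero every coefficient except the `count` with the largest magnitude."""
    ranked = sorted(((abs(coeffs[u][v]), u, v)
                     for u in range(N) for v in range(N)), reverse=True)
    kept = {(u, v) for _, u, v in ranked[:count]}
    return [[coeffs[u][v] if (u, v) in kept else 0.0 for v in range(N)]
            for u in range(N)]


if __name__ == "__main__":
    # A smooth horizontal gradient: most of its energy sits in a few coefficients.
    block = [[10 * y for y in range(N)] for _ in range(N)]
    sparse = keep_largest(dct_2d(block))
    rebuilt = idct_2d(sparse)
    worst = max(abs(rebuilt[x][y] - block[x][y]) for x in range(N) for y in range(N))
    print("largest reconstruction error:", worst)
```

Smooth blocks like this gradient survive the truncation almost unchanged, while blocks with sharp edges or fine texture lose detail, which is where the familiar blocky artifacts come from.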
