Whether it’s automatically tagging objects in pictures or tweaking lighting and separating subjects from their backgrounds in the iPhone’s “portrait mode,” there’s no doubt that artificial intelligence is a powerful force in modern photo-editing tools.
But what if it were possible to go one step further, and use the latest cutting-edge technologies to develop what may just be the world’s most ambitious (and, in its own way, imaginative) paint program — one that goes far beyond simply touching up or coldly analyzing your existing pictures?
With such a program, all a person would need to do to remove an unsightly line of cars sullying a picture of their family home would be to pass over it with a brush. As if by magic, the vehicles would be replaced by a photorealistic grassy bank. Want to eliminate that photobomber from one of your vacation snaps? No problem: Just click to select them and they’ll vanish, replaced by a utility pole that looks like it’s always been there. How about adding an authentically ancient door to a photo of an old church? Click and it’s done. You get the idea.
This is what researchers at Massachusetts Institute of Technology and IBM are working toward with an amazing new tech demonstration they call the “GAN Paint Studio.” Described by its creators as providing the ability to “paint with neurons” — referring to the artificial neurons of a machine learning neural network — it’s one of the most potentially transformative photo-editing tools yet created.
It allows users to upload an image of their choosing and then modify any aspect of it they want, whether that’s changing the size of objects or adding completely new ones. Think of it as Photoshop for the “deepfake” generation, albeit one that’s currently more of a proof-of-concept than a finished product.
The future of creative tools
“What we created with this work is a starting point to show how creative tools in the future could work,” Hendrik Strobelt, a research scientist at the MIT-IBM Watson A.I. Lab, told Digital Trends. “We started from a neural network [called a] GAN that can produce its own images of a certain category — for example, kitchen images — and analyzed which internal parts of the network are responsible for producing which feature. This allowed us to modify the images that the network produced. We ‘drew’ on them. The novelty we added is that you can upload your own image of this category and modify it with brushes that do not just draw strokes, but actually draw semantically meaningful units — such as trees, brick-texture, or domes.”
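To make the “semantic brush” idea concrete, here is a minimal toy sketch of what editing internal units could look like. It is not the researchers’ code: the layer layout, the function name `paint_units`, and the choice of unit 1 as a “tree” unit are all assumptions made for illustration. The sketch treats one layer of a generator as a set of small activation grids, one per unit, and “paints” by forcing chosen units to a high value inside a brushed region (or to zero to erase).

```python
from typing import Dict, List, Set, Tuple

Grid = List[List[float]]  # one unit's activations over a coarse spatial grid


def paint_units(
    features: Dict[int, Grid],
    units: Set[int],
    mask: Set[Tuple[int, int]],
    value: float,
) -> Dict[int, Grid]:
    """Return a copy of `features` with the chosen units set to `value`
    at every (row, col) cell in the brushed `mask`."""
    edited = {u: [row[:] for row in grid] for u, grid in features.items()}
    for u in units:
        for (r, c) in mask:
            edited[u][r][c] = value
    return edited


# Toy layer: 3 units, each a 4x4 grid (hypothetical values).
features = {u: [[0.1 * u for _ in range(4)] for _ in range(4)] for u in range(3)}

tree_units = {1}                           # assumption: unit 1 correlates with "tree"
brush = {(0, 0), (0, 1), (1, 0), (1, 1)}   # cells the user painted over

with_tree = paint_units(features, tree_units, brush, value=5.0)     # "draw" a tree
without_tree = paint_units(features, tree_units, brush, value=0.0)  # "erase" trees
```

In a real system the edited activations would then be passed through the remaining layers of the generator to render the new image; that step is omitted here.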
A GAN, or generative adversarial network, is one of the most powerful tools used in generative artificial intelligence. It pits two artificial neural networks against one another. One network, the “generator,” creates new images, while the other, the “discriminator,” attempts to work out which images are computer-generated and which are real. Over time, this adversarial process pushes the generator to produce images convincing enough that the discriminator can no longer reliably tell them apart from the real thing. A GAN was the technology behind the A.I. artwork that famously sold for $432,500 at a Christie’s auction in 2018.
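The adversarial back-and-forth can be sketched in a few lines of plain Python. This is a deliberately tiny toy, not how image GANs are built: the “images” are single numbers, the generator is a line `g(z) = a*z + b`, the discriminator is a one-input logistic classifier, and every name and hyperparameter below is an assumption chosen for illustration.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 2000, lr: float = 0.05, seed: int = 0) -> dict:
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator parameters: g(z) = a*z + b
    w, c = 0.1, 0.0   # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        real = rng.gauss(4.0, 0.5)             # a "real" sample
        fake = a * rng.gauss(0.0, 1.0) + b     # a generated sample

        # Discriminator step: gradient ascent on log D(real) + log(1 - D(fake)).
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        w += lr * ((1.0 - d_real) * real - d_fake * fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generator step: gradient ascent on log D(fake), i.e. try to fool D.
        z = rng.gauss(0.0, 1.0)
        d_fake = sigmoid(w * (a * z + b) + c)
        a += lr * (1.0 - d_fake) * w * z
        b += lr * (1.0 - d_fake) * w
    return {"a": a, "b": b, "w": w, "c": c}


result = train_toy_gan()
print(result)
```

Each loop iteration mirrors the description above: the discriminator is nudged to score real samples higher than generated ones, then the generator is nudged so that its output scores higher. Even in this toy, training tends to be a tug-of-war rather than a smooth descent, which hints at why full-scale GANs are known for being tricky to train.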
The system developed by the MIT and IBM researchers showcases some neat abilities. A bit like Deep Dream, the trippy image-generating tool developed by Google researchers several years back, it shows an impressive understanding of which visual elements fit together. As a result of being trained on a vast archive of images, it picks up an understanding of the basic rules governing relationships between objects. For instance, ask it to add an object in the sky and it won’t draw a window, since it knows that windows aren’t usually (or ever) found there.
As Strobelt notes, GAN Paint Studio is not quite ready for prime time just yet. Although members of the public can have a go at using it, there’s still more work to be done. Notably, the demonstration version is currently low-resolution. However, it does showcase the immense promise of the technology.
Challenging imagination
“The most fun parts [of the technology] are actually when your imagination is challenged,” Strobelt said. “Try adding a door to the Palazzo Vecchio image; it’s kind of mind-blowing if you know the place. The system is far from perfect, and not every image can be modified equally well. There is still research needed on how to optimize all the parts. For example, when the GAN model tries to represent the input model, it might very well use the wrong semantical units to reproduce features — it [may] just generate a door out of tree units. Figuring out when and how it does do right or wrong is actually very interesting future work.”
Just as GANs get better over time, Strobelt thinks the range of applications for GAN Paint Studio will widen. “The obvious first idea would be a photo editor with these semantic brushes and erasers,” he said. “This could help you edit vacation photos, for example. It could also allow architects to quickly create variations on the embedding of their building renderings. Game designers could [also use it to] modify level maps quicker.”
If such technology could be added to video effects, it would also prove immensely powerful. This would allow objects to be placed into shots with just the touch of a button. Should a director realize they’ve forgotten to include a background item that’s crucial to the plot in a completed scene, it could be quickly added in — without the need for the current expensive and time-consuming visual effects processes.
Strobelt is clear that he doesn’t think GAN Paint Studio is truly, autonomously creative. “No,” he said. “I see this as an advanced tool to help humans who think they are not creative to challenge this thought.”
Then again, what is creativity? As with many other aspects of our lives, such as the jobs we believe only humans can do, it seems that A.I. is ready to ask the big questions.