
Universal Neural Style Transfer | Two Minute Papers #213

Two Minute Papers - 2017-12-09

The paper "Universal Style Transfer via Feature Transforms" and its source code is available here:
https://arxiv.org/abs/1705.08086 
https://github.com/Yijunmaverick/UniversalStyleTransfer

Recommended for you:
https://www.youtube.com/watch?v=Rdpbnd0pCiI - What is an Autoencoder?

We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Andrew Melnychuk, Brian Gilman, Christian Ahlin, Christoph Jadanowski, Dave Rushton-Smith, Dennis Abts, Eric Haddad, Esa Turkulainen, Evan Breznyik, Kaben Gabriel Nanlohy, Malek Cellier, Marten Rauschenberg, Michael Albrecht, Michael Jensen, Michael Orenstein, Raul Araújo da Silva, Robin Graham, Steef, Steve Messina, Sunil Kim, Torsten Reil.
https://www.patreon.com/TwoMinutePapers

One-time payments:
PayPal: https://www.paypal.me/TwoMinutePapers
Bitcoin: 13hhmJnLEzwXgmgJN7RB6bWVdT7WkrFAHh

Music: Antarctica by Audionautix is licensed under a Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/)
Artist: http://audionautix.com/ 

Thumbnail background image credit: https://pixabay.com/photo-1978682/
Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Facebook: https://www.facebook.com/TwoMinutePapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

anEnigma - 2017-12-09

this + augmented reality = mind blowing

Josh S - 2017-12-10

anEnigma shaders in real life. Awesome

Irun S - 2017-12-10

+ mixed reality + deep dream + a bit of randomization = obsolescence of hallucinogens

Peter Lindsey - 2017-12-10

mixed reality + deep dream + randomization + hallucinogens = ? divide by zero!?

anEnigma - 2017-12-10

lol

Reavenk - 2017-12-09

Waiting for the day style transfer is used as a post effect for a game. Maybe a game where you can jump around different artworks. While there's a lot of stuff you can do with post shaders, there's got to be a bunch of amazing effects and animations that can only be done with style transfers.

Erich Reitz - 2017-12-09

Check this out: https://www.youtube.com/watch?v=9aKL5eq3tSs The render time was insane though, about 4 minutes per frame!

Karl Kastor - 2017-12-09

How are the encodings of the two source images combined? Simply adding or averaging them wouldn't distinguish between the content image and the style image.

Surya Kant - 2018-06-17

Check out the paper (link in the description). The method is not learning-based; it uses mathematical transforms to get the target feature representation.

Akash Goel - 2018-08-20

Yeah, we could always read the paper, but the point of this channel is to let us avoid exactly that.

Manuel Cuevas - 2019-01-18

I'm with you, somewhat disappointed in the channel here, but the network they use is already pretrained.

Nicolas Gunawan - 2020-01-26

any answer yet?

Will Tesler - 2017-12-09

The animation at 3:52 is insane.

Evan Davis - 2017-12-12

A couple of commenters have asked what's so special about this method as compared to what's been around for the past two years:

The original algo from Gatys in 2015 still produces the most visually pleasing results as far as I know. It poses style transfer as an optimization objective where a noise image is fed forward through a pre-trained image classification net like VGG16, and the loss balances between content/style losses that encourage it to match the VGG features of given content/style images at multiple layers. The error is backpropagated and the noise image is updated with a gradient descent step. The forward/backward/update process is repeated up to hundreds of times, and so this is typically slow and can't stylize in real-time.
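
As a rough illustration of that forward/backward/update loop, here is a minimal PyTorch sketch (the layer indices, weights, and step count are illustrative rather than Gatys' exact settings, and `content`/`style` are assumed to be ImageNet-normalized (1, 3, H, W) tensors):

```python
# Sketch of Gatys-style optimization: repeatedly push an image through a
# frozen VGG19 and nudge its pixels to match content features and style
# Gram matrices. Layer choices and weights are illustrative only.
import torch
import torch.nn.functional as F
from torchvision import models

vgg = models.vgg19(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

CONTENT_LAYERS = {21}                 # roughly conv4_2
STYLE_LAYERS = {0, 5, 10, 19, 28}     # roughly conv1_1 ... conv5_1

def extract(x):
    """Collect feature maps at the chosen layers in one forward pass."""
    feats = {}
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in CONTENT_LAYERS or i in STYLE_LAYERS:
            feats[i] = x
    return feats

def gram(f):
    """Gram matrix of a (1, C, H, W) feature map -> (C, C) style statistics."""
    _, c, h, w = f.shape
    f = f.reshape(c, h * w)
    return f @ f.t() / (c * h * w)

def stylize(content, style, steps=300, style_weight=1e6):
    c_feats, s_feats = extract(content), extract(style)
    img = content.clone().requires_grad_(True)   # start from content (or noise)
    opt = torch.optim.Adam([img], lr=0.02)
    for _ in range(steps):
        feats = extract(img)
        c_loss = sum(F.mse_loss(feats[i], c_feats[i]) for i in CONTENT_LAYERS)
        s_loss = sum(F.mse_loss(gram(feats[i]), gram(s_feats[i])) for i in STYLE_LAYERS)
        loss = c_loss + style_weight * s_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return img.detach()
```

The hundreds of forward/backward passes through VGG are exactly why this approach is slow.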

jcjohnson in 2016 introduced a 'fast' style transfer approach that addresses the speed issue with somewhat lower quality results. It uses a separate 'image transformation network' that's trained to apply a single style to input content images with only a forward pass. It also uses VGG to calculate a similar loss during training, but at test time only the transformation net is needed. Most style transfer mobile apps likely use a variation of this.

There are a couple of extensions to the 'fast' architecture that support multiple styles without needing to train a new net for each. One of these is Conditional Instance Normalization from Google Magenta https://magenta.tensorflow.org/2016/11/01/multistyle-pastiche-generator. However, this is limited to a pre-set # of styles that have to be supplied during training. Adding an unseen style to the net requires training with a new set of params.
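
For intuition, conditional instance normalization boils down to roughly the following (a simplified NumPy sketch, not Magenta's actual code; `gammas`/`betas` stand in for the learned per-style parameters):

```python
# Sketch of Conditional Instance Normalization: all styles share the same
# convolutional weights, and only a per-style (gamma, beta) pair per channel
# selects which style gets applied.
import numpy as np

def conditional_instance_norm(x, gammas, betas, style_id, eps=1e-5):
    """
    x        : (C, H, W) feature map from the transformation network
    gammas   : (N_styles, C) learned per-style scales
    betas    : (N_styles, C) learned per-style shifts
    style_id : which of the N pre-trained styles to apply
    """
    mean = x.mean(axis=(1, 2), keepdims=True)     # per-channel statistics
    std = x.std(axis=(1, 2), keepdims=True)
    normalized = (x - mean) / (std + eps)          # strip the input's own statistics
    g = gammas[style_id][:, None, None]
    b = betas[style_id][:, None, None]
    return g * normalized + b                      # re-impose the chosen style

# A brand-new style needs a new (gamma, beta) row, which is why unseen
# styles still require (cheap) retraining.
```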

This paper is 'universal' in the sense of having the flexibility to generalize to arbitrary styles while also being fast. In fact, there is no explicit style transfer objective and style images aren't even needed for training. The architecture is simply an autoencoder trained to decode VGG features to reconstruct the image that's fed in.
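
In other words, training looks roughly like this (a hand-wavy PyTorch sketch; the paper actually trains one decoder per VGG level, and the layer choice, decoder architecture, and loss weighting here are only illustrative):

```python
# Sketch of the reconstruction-only training: a frozen VGG encoder, a learned
# mirror decoder, and a loss asking the decoder to invert the features.
import torch
import torch.nn as nn
from torchvision import models

encoder = models.vgg19(pretrained=True).features[:12].eval()   # up to ~relu3_1
for p in encoder.parameters():
    p.requires_grad_(False)

decoder = nn.Sequential(                        # crude mirror of the encoder
    nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(64, 3, 3, padding=1),
)

opt = torch.optim.Adam(decoder.parameters(), lr=1e-4)
mse = nn.MSELoss()

def train_step(images):  # images: (B, 3, H, W) natural photos, no style images needed
    feats = encoder(images)
    recon = decoder(feats)
    # pixel reconstruction + feature reconstruction keep the decoder faithful
    loss = mse(recon, images) + mse(encoder(recon), feats)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```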

The style transfer magic is applied at test time by combining the content/style image features using a Whitening-Coloring Transform https://goo.gl/9eUnUW. This relies on the insight that style information is represented by the feature statistics, and so transferring the mean/covariance of the style features to the content with WCT is sufficient to preserve the structure of the content while applying style. The stylized result is then obtained by forwarding the transformed features through the decoder. The results still aren't as good as Gatys, but in the realm of fast approaches it feels like the conceptually cleanest.
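
In NumPy, the transform itself is only a few lines (a rough sketch of the math rather than the authors' code, which also truncates small eigenvalues and applies the transform at several encoder levels):

```python
# Sketch of the Whitening-Coloring Transform (WCT): whiten the content
# features, then re-color them with the style features' covariance and mean.
import numpy as np

def wct(content_feat, style_feat, alpha=0.6, eps=1e-5):
    """
    content_feat, style_feat : (C, H*W) encoder features flattened over space
    alpha                    : blend between stylized and original content features
    """
    c_mean = content_feat.mean(axis=1, keepdims=True)
    s_mean = style_feat.mean(axis=1, keepdims=True)
    fc = content_feat - c_mean
    fs = style_feat - s_mean

    # Whitening: remove the content features' own correlations.
    c_cov = fc @ fc.T / (fc.shape[1] - 1) + eps * np.eye(fc.shape[0])
    c_vals, c_vecs = np.linalg.eigh(c_cov)
    whitened = c_vecs @ np.diag(c_vals ** -0.5) @ c_vecs.T @ fc

    # Coloring: impose the style features' covariance and mean.
    s_cov = fs @ fs.T / (fs.shape[1] - 1) + eps * np.eye(fs.shape[0])
    s_vals, s_vecs = np.linalg.eigh(s_cov)
    colored = s_vecs @ np.diag(s_vals ** 0.5) @ s_vecs.T @ whitened + s_mean

    # Blend with the original content features, then feed through the decoder.
    return alpha * colored + (1 - alpha) * content_feat
```

The `alpha` blend is what gives the user a knob for stylization strength at test time, with no retraining.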

And to plug my own work: I've implemented this paper in TensorFlow https://github.com/eridgd/WCT-TF

FX - 2017-12-10

2:28 It is not clear at all how the two inputs are combined. How does this "Feature Transforms" step work?

Piripiri - 2017-12-10

I gave it some thought, and my conclusion is that the first input recognizes the 'lineart', that is, sudden gradients in color. The lineart then defines regions of empty or almost empty space, which are filled with the colorful style taken from the second input. I have yet to see the code, but that's my two cents on this.

Ezeste - 2017-12-09

Can this model be applied to audio?

Brian Muhia - 2017-12-09

Two ways:
1. Figure out how to do style transfer using recurrent neural networks (audio is a sequential stream of data, which RNNs are very good at learning). Paper from November: https://export.arxiv.org/pdf/1711.04731
2. Use an image representation of the waveform (a spectrogram) and figure out how to do it using a convolutional network (a rough sketch is at the end of this comment).

I think 1 would work, I don't know if anyone is doing 2.

Here's another link: https://dmitryulyanov.github.io/audio-texture-synthesis-and-style-transfer/
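
If anyone wants to poke at option 2, the spectrogram step is just a few lines (a rough sketch; the file name and STFT settings are placeholders):

```python
# Turn a waveform into a spectrogram "image" that a convolutional
# style-transfer pipeline could consume.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

rate, audio = wavfile.read("content.wav")   # hypothetical input file
if audio.ndim > 1:
    audio = audio.mean(axis=1)              # mix down to mono

# Short-time Fourier transform -> (freq bins, time frames) complex matrix.
_, _, spec = stft(audio, fs=rate, nperseg=1024)
magnitude = np.log1p(np.abs(spec))          # log-magnitude, image-like 2D array

# `magnitude` can now be fed to a conv net like a grayscale image; after
# stylizing, the phase still has to be recovered (e.g. Griffin-Lim) to get audio back.
```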

Stephen Nielsen - 2017-12-10

that would be awesome!

logan milliken - 2017-12-13

Exactly what I was thinking... maybe usable for music-making breakthroughs. Might also be good for mid-level voice/accent shifting (giving your Siri many more choices with little additional ROM requirement).

Trion - 2017-12-10

"Hell yeah" was best 😂

Tori Ko - 2017-12-09

Wow, I really liked this episode, good job!

Nova Verse - 2017-12-10

This is amazing!!..

Jonas Rejman - 2017-12-09

You rock, man! Thank you!

E S - 2020-01-22

This is amazing. I'm new to this and slightly confused about how to get started using this on a Mac. How do I get started?

Brian Muhia - 2017-12-09

Oh man, I just tried this on some photographs. Stunning!

ProGamerGov - 2017-12-09

The results don't seem to look as good as jcjohnson's Neural-Style. This seems to be more towards "Fast" style transfer, which produces lower quality, but faster outputs. Reminds me a lot of AdaIN, Style-swap, and Fast-Neural-Style. These sorts of style transfer networks seem best suited for devices like phones, and those without access to high end GPUs, but they still can't compete with the original Neural-Style.

There are already a large number of ways to control the outputs produced by Neural-Style, and I've tried to list them all here: https://github.com/jcjohnson/neural-style/wiki/Scripts. You can transfer styles between different regions, using the style feature mean in addition to the gram matrix, gram matrix delta manipulation, layer channel manipulation, luminance transfer, histogram matching, photorealism, simultaneous DeepDream and style transfer, endless zoom, multiscale resolution, tiling, etc...

So I don't think it's accurate to say that you couldn't tune the output artistically to your liking with previous style transfer algorithms. It's more accurate to say that you couldn't tune "fast" style transfer outputs to your liking as easily with previous "fast" style transfer algorithms.

Kram1032 - 2017-12-09

That's pretty close to my thoughts as well.
Though I think these really fast and flexible results could potentially be refined using some constraints.
Especially the bridge example results didn't particularly blow me away. This technique can't handle the long straight lines of the bridge well, and visually considers the negative space between the top bridge wires a separate region from the surrounding sky, giving weird distortions there.
Still, it's great for its speed.

fastlater - 2018-01-24

ProGamerGov, any good style transfer project in TensorFlow? I found many fast style transfer ones, and the one posted by you is in Torch. Any recommended project?

Tsz Fung Li - 2017-12-11

It's really delightful hearing you say "See you next time"

Omar Cusma Fait - 2018-04-05

Really awesome and enlightening! :D

Damian Reloaded - 2017-12-09

Amazing!

VFX Kitchen. - 2017-12-09

this is awesome!

Donovan Keating - 2017-12-09

Hi Karoly. I love your content. Can you do some stuff on NLP? :)

Two Minute Papers - 2017-12-09

Hey there Donovan, kind thanks for watching and for the kind words! Please send me some papers in that area that you find interesting through twitter or e-mail and I'll toss them onto the reading list. Thank you! :)

Zhaowen Wang - 2017-12-10

Great work!

Joaquin Rocco - 2017-12-09

There are apps that are performing real-time style transfer on phone already. Check out envision and dreamsnap! Both run on the GPU of iPhones using a framework called Bender

Hari Krishnan - 2017-12-11

Indeed! What a time to be alive!

centar15 - 2017-12-09

Amazing episode!!! Thank you so much, and I have no idea whether you read my comment on the last episode, but this one really had the perfect amount of explanation! Thank you Karoly from a Viennese Fellow Scholar/Data Scientist ;) I only miss how the two bottleneck representations are combined, but -> paper reading it is :)

Two Minute Papers - 2017-12-09

I have seen it (I think)! Thank you. :)

chkone007 - 2017-12-09

Even if I see the merit of this work, depending on the style I still prefer (qualitatively) the other methods :)

Two Minute Papers - 2017-12-09

Sure, that's fine - after all, we don't have an objective function that we are trying to maximize (or if we do, there are many), so there is clearly a great deal of subjectivity to this. Maybe there will be a method that can "do it all" by providing a great deal of control over the output with different parameter choices (even more than this one).

Shubham Agrawal - 2019-11-30

What a time it is to be alive! Indeed!

feihcsim - 2017-12-09

breathtaking

Super Cables - 2017-12-19

Downloaded the code; how do I use it?

anEnigma - 2017-12-09

Try it with Ilene Meyer's artworks

anEnigma - 2017-12-10

or maybe Alexander Lymkin

Max Gisborne - 2017-12-10

What happens if you connect it up the other way around, so the style transfers in the other direction?

Tori Ko - 2017-12-09

Hi

Stephen Nielsen - 2017-12-10

bye bye expensive video special effects suites

Piripiri - 2017-12-10

Snapchat is buying this in 3, 2, 1

Surya Kant - 2018-06-15

Implementing this in an Android app for a research internship. Let's hope the results are decent :D

Daroga Jee - 2017-12-10

Video about... AlphaZero...

frank x - 2017-12-10

What about the hypothesis that the essence of idealisation in the brain is merely (usually) the result of this "bottlenecking" (or just heavy data compression achieved by some means)?

Félix L - 2017-12-13

Is this how the app "Prisma" works?

Evan Davis - 2017-12-13

Prisma likely uses a variation of so-called 'fast' neural style like https://github.com/jcjohnson/fast-neural-style, which requires training a separate transformation net for each style. This paper's architecture is not quite as fast because it relies on a VGG19 net as a feature encoder, but it's more flexible in being able to generalize to any style without retraining.

Félix L - 2017-12-13

Evan Davis Alright. Thanks

Shreeyak Sajjan - 2017-12-11

I didn't really understand the difference between the old method and the new one. Could you elaborate a bit on this in the comments?

japrogramer - 2018-07-21

The code says that it only works with an NVIDIA GPU with cuDNN... can someone please port this to work with AMD as well as NVIDIA?

Raphael Barros - 2019-08-01

I wonder if most of the artists whose works have been used for training in style transfer GANs would be happy with it (if they were alive; I suppose/expect people are using public domain art in this case). I sure wouldn't.

AI devs won't be happy until they suck the joy and soul out of every feasible human activity. Converting years of career development and the very fundamentals of art into the push of a button is basically going to kill any desire to pursue art as a career, or even a hobby.

columbus8myhw - 2017-12-10

Am I the only one who thinks that these are worse than the previous versions it was compared against? Or is the main improvement in speed rather than quality?

Facebotter - 2017-12-10

Versatility across different styles, I believe.

ILikeWeatherGuy - 2017-12-09

Why not use this to interview an AI, just like you would a person? We can't tell what an AI is thinking because the tensors involved are too complex, but we can interview an AI and then start to map out the factors it uses to arrive at a given conclusion.

Limitless 1 - 2017-12-09

Should I be worried?

schuelermine - 2017-12-09

No, why?

quebono100 - 2017-12-09

wtf, who dislikes this?

Happy Joy Bravo - 2017-12-10

"Universal Neural Style Transfer " Oh cmon. How does that now mean transferring your consciousness into another mind?!?!

Ginsu131 - 2017-12-09

How are these style things being considered "research" these days? These need to stop being mixed in with actual research and have their own fun blog or something.

Chris Gresock - 2017-12-09

I think these techniques are important for the future of computer vision. Understanding abstract patterns in images enhances a lot of other areas. This research has wider implications than just cool new Snapchat filters.

peyj n. - 2017-12-09

Out of curiosity: why do you not consider this valid research? To my knowledge, anything that produces meaningful new knowledge about our universe is considered research, independent of its potential for serious applications.

Dr. Dietrich Davidstein - 2017-12-09

Just look at the algorithms that are needed to compute these style transfer methods – it's not trivial to write these! The first project I can think of came from Justin Johnson ( http://cs.stanford.edu/people/jcjohns/ ), who implemented this ( https://arxiv.org/abs/1508.06576 ) here: https://github.com/jcjohnson/neural-style. If you think this is not contributing to the general research of image recognition and AI – then what is it? The mathematics behind these algorithms are surely moving other areas forward, like real-time object detection (just look at Johnson's other projects) and image super resolution, which can ultimately lead to better compression techniques and maybe new image formats that could speed up the whole web! Imagine you are surfing on mobile and you only need to download 4x downscaled images that can be reconstructed into high-quality versions via a non-compute-heavy algorithm. Since images make up most of the data of a webpage, this gives you a significant speed boost. I know, style transfer (ST) is quite different from super resolution (SR), but a lot of people interested in ST also work on SR projects and incorporate ST solutions and models into them.

Style transfer in real time could also lead to super realistic video game graphics! Someday GPUs will be fast enough to deliver 30-60 fps of "styled" images converted from a model of real-world images. There are so many fields where these style transfer techniques are relevant – just because they are fun to work with doesn't mean that they don't contribute to the whole area of machine learning/AI!

Colopty - 2017-12-09

What would you consider to be research?