StyleCLIPDraw: Text-to-Drawing Synthesis with Artistic Control


Louis Bouchard

I explain Artificial Intelligence terms and news to non-experts.

Have you ever dreamed of taking the style of a picture, like this cool TikTok drawing style on the left, and applying it to a new picture of your choice? Well, I did, and it has never been easier to do. In fact, you can even achieve that from text alone, and you can try it right now with this new method and its Google Colab notebook, which is available to everyone (see references).

Simply take a picture of the style you want to copy, enter the text you want to generate, and this algorithm will generate a new picture out of it! Just look back at the results above, such a big step forward! The results are extremely impressive, especially if you consider that they were made from a single line of text! If that sounds interesting, watch the video and learn more!

Watch the video

References

►Read the full article: https://www.louisbouchard.ai/clipdraw/
►CLIPDraw: Frans, K., Soros, L.B. and Witkowski, O., 2021. CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image Encoders. https://arxiv.org/abs/2106.14843
►StyleCLIPDraw: Schaldenbrand, P., Liu, Z. and Oh, J., 2021. StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis. https://arxiv.org/abs/2111.03133
►CLIPDraw Colab notebook: https://colab.research.google.com/github/kvfrans/clipdraw/blob/main/clipdraw.ipynb
►StyleCLIPDraw code: https://github.com/pschaldenbrand/StyleCLIPDraw
►StyleCLIPDraw Colab notebook: https://colab.research.google.com/github/pschaldenbrand/StyleCLIPDraw/blob/master/Style_ClipDraw_1_0_Refactored.ipynb
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/

Video Transcript

00:00

have you ever dreamed of taking a

00:01

picture like this cool tiktok drawing

00:03

style and applying it to a new picture

00:06

of your choice well i did and it has

00:08

never been easier to do in fact you can

00:10

even achieve that from only text and you

00:13

can try it right now with this new

00:15

method and their google colab notebook

00:17

available for everyone simply take a

00:19

picture of the style you want to copy

00:21

enter the text you want to generate and

00:23

this algorithm will generate a new

00:25

picture out of it look at that such a

00:28

big step forward the results are

00:30

extremely impressive especially if you

00:31

consider that they were made from a

00:33

single line of text here i tried

00:35

imitating the same style with another

00:37

text input to be honest sometimes it may

00:40

look a bit all over the place especially

00:42

if you select a more complicated or

00:44

messy drawing style like this one

00:46

speaking of something messy if you are

00:47

like me and your model versioning and

00:49

resource tracking looks like this you

00:51

may be the perfect candidate to try the

00:53

sponsor of today's video which is none

00:55

other than weights and biases i always

00:57

assumed i could stack folders like this

00:59

and simply add old v1 v2 v3 and so on to

01:03

my file names without any problem until

01:06

i had to work with someone while it may

01:07

be easy for me to find my old tests it

01:10

was impossible to explain my thought

01:12

process behind this mess and was my

01:14

teammate’s nightmare if you care about

01:15

your teammates and reproducibility don’t

01:18

do like i did and give weights and

01:20

biases a shot no more notebooks or

01:22

results saved everywhere as it creates a

01:24

super friendly user dashboard for you

01:26

and your team to track your experiments

01:28

and it’s super easy to set up and use

01:30

it’s the first link in the description

01:32

and i promise within a month you will be

01:34

completely dependent

01:37

as we said this new model by peter

01:39

schaldenbrand et al called style clip

01:42

draw which is an improvement upon clip

01:44

draw by kevin frans et al takes an

01:46

image and text as inputs and can

01:48

generate a new image based on your text

01:50

and following the style in the image so

01:52

the model has to both understand what’s

01:54

in the text and the image to correctly

01:56

copy its style as you may suspect this

01:59

is incredibly challenging but we are

02:01

fortunate enough to have a lot of

02:02

researchers working on so many different

02:04

challenges like trying to link text with

02:07

images which is what clip can do quickly

02:10

clip is a model developed by openai that

02:12

can basically associate a line of text

02:14

with an image both the text and images

02:17

will be encoded similarly so that they

02:19

will be very close to each other in the

02:21

new space they are encoded in if they

02:23

both mean the same thing using clip the

02:25

researchers could understand the text

02:27

from the user input and generate an

02:29

image out of it if you are not familiar

02:31

with clip yet i would recommend watching

02:33

a video i made about it together with

02:35

dall-e earlier this year but then how did

02:38

they apply a new style to it clip is

02:40

just linking existing images to texts it

02:43

cannot create a new image indeed we also

02:46

need something else to capture the style

02:48

of the image sent in both the textures

02:50

and shapes well the image generation

02:52

process is quite unique it won’t simply

02:55

generate an image right away rather it

02:57

will draw on a canvas and get better and

02:59

better over time it will just draw

03:01

random lines at first and create an

03:03

initial image this new image is then

03:06

sent back to the algorithm and compared

03:08

with both the style image and the text

03:10

which will generate another version this

03:12

is one iteration at each iteration we

03:15

draw random curves again oriented by the

03:17

two losses we’ll see in a second this

03:19

random process is quite cool since it

03:22

will allow each new test to look

03:24

different so using the same image and

03:26

same text as inputs you will end up with

03:29

different results that may look even

03:31

better here you can see a very important

03:33

step called image augmentation it will

03:35

basically create multiple variations of

03:38

the image and allow the model to

03:39

converge on results that look right to

03:42

humans and not simply on the right

03:44

numerical values for the machine this

03:46

simple process is repeated until we are

03:49

satisfied with the results so this whole

03:51

model learns on the fly over many

03:54

iterations optimizing two losses we see

03:56

here one for aligning the content of the

03:59

image with the text sent and the other

04:01

for the style here you can see the first

04:03

loss is based on how close the clip

04:06

encodings are as we said earlier where

04:08

clip is basically judging the results

04:11

and its decision will orient the next

04:12

generation the second one is also very

04:15

simple we send both images into a

04:18

pre-trained convolutional neural network

04:20

like vgg which will encode the images

04:22

similarly to clip we then compare these

04:24

encodings to measure how close they are

04:26

to each other this will be our second

04:29

judge that will orient the next

04:30

generation as well this way using both

04:33

judges we can get closer to the text and

04:35

the wanted style at the same time in the

04:37

next generation if you are not familiar

04:39

with convolutional neural networks and

04:41

encodings i will strongly recommend

04:43

watching the video i made explaining

04:45

them in simple terms this iterative

04:47

process makes the model a bit slow to

04:49

generate a beautiful image but after a

04:51

few hundred iterations or in other words

04:53

after a few minutes you have your new

04:55

image and i promise it’s worth the wait

04:58

it also means that it doesn’t require

05:00

any other training which is pretty cool
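
To make the two judges and the iterative loop described above more concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the real method renders curves and runs the result through CLIP and a pre-trained network like VGG, and it also uses the image augmentation step, which is left out here. In this sketch the drawing is just a list of eight numbers, the two encoders (toy_content_encoder, toy_style_encoder) are hypothetical placeholders, and a simple random search that keeps only the changes that lower the loss stands in for the real optimization. All function names and values are assumptions made for illustration.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def content_loss(drawing_encoding, text_encoding):
    """First judge: small when the two encodings point the same way."""
    return 1.0 - cosine_similarity(drawing_encoding, text_encoding)


def style_loss(drawing_features, style_features):
    """Second judge: mean squared distance between two feature vectors."""
    diffs = [(x - y) ** 2 for x, y in zip(drawing_features, style_features)]
    return sum(diffs) / len(diffs)


# Hypothetical toy encoders. A real setup would render the strokes and pass
# the image through CLIP (content) and a pre-trained CNN such as VGG (style).
def toy_content_encoder(drawing):
    return drawing[:4]


def toy_style_encoder(drawing):
    return drawing[4:]


def total_loss(drawing, text_encoding, style_features, style_weight=1.0):
    """Sum of both judges; style_weight is an assumed trade-off knob."""
    content = content_loss(toy_content_encoder(drawing), text_encoding)
    style = style_loss(toy_style_encoder(drawing), style_features)
    return content + style_weight * style


def optimize(text_encoding, style_features, iterations=300, step=0.1, seed=0):
    """Random-search stand-in for the iterative drawing process."""
    rng = random.Random(seed)
    # Start from random stroke parameters, like the random lines at first.
    drawing = [rng.uniform(-1.0, 1.0) for _ in range(8)]
    history = [total_loss(drawing, text_encoding, style_features)]
    for _ in range(iterations):
        # Propose a slightly changed drawing and let both judges score it.
        candidate = [p + rng.gauss(0.0, step) for p in drawing]
        candidate_loss = total_loss(candidate, text_encoding, style_features)
        if candidate_loss < history[-1]:
            drawing = candidate
            history.append(candidate_loss)
        else:
            history.append(history[-1])
    return drawing, history


if __name__ == "__main__":
    text_encoding = [1.0, 0.0, 0.5, -0.5]    # made-up "text" encoding
    style_features = [0.2, 0.8, -0.3, 0.1]   # made-up "style image" features
    _, history = optimize(text_encoding, style_features)
    print(f"loss before: {history[0]:.3f}, after: {history[-1]:.3f}")
```

Because a change is only kept when the combined loss goes down, the loss never increases from one iteration to the next, which mirrors the idea of the drawing getting better and better over time. Changing the seed gives a different starting drawing and therefore a different result for the same inputs.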

05:02

now the interesting part you’ve been

05:04

waiting for indeed you can use it right

05:06

now for free or at least pretty cheaply

05:08

using the colab notebook linked in the

05:10

description below i had some problems

05:12

running it and i would recommend buying

05:14

the pro version of colab if you’d like

05:16

to play with it without any issues

05:19

otherwise feel free to ask me any

05:21

questions in the comments if you

05:22

encounter any problems i pretty much

05:24

went through all of them myself to use

05:27

it you simply run all cells like that

05:29

and that’s it you can now enter a new

05:31

text for the generation or send a new

05:33

image for the style from a link and

05:35

voila now tweak the parameters and see

05:38

what you can do if you play with it

05:40

please send me the results on twitter

05:42

and tag me i’d love to see them as they

05:44

state in the paper the results will have

05:46

the same biases as the models they use

05:49

such as clip which you should consider

05:51

if you play with it of course this was a

05:53

simple overview of the paper and i

05:55

strongly invite you to read both clip

05:57

draw and style clip draw for more

05:58

technical details and try their colab

06:01

notebook both are linked in the

06:02

description below thank you once again

06:05

weights and biases for sponsoring this

06:07

video and huge thanks to you for

06:09

watching until the end i hope you

06:11

enjoyed this week’s video let me know

06:13

what you think and how you will use this

06:15

new model

06:17

[Music]

Source: Pinay Tube PH
