diff --git a/notebook.ipynb b/notebook.ipynb index a9309b4..80ad555 100644 --- a/notebook.ipynb +++ b/notebook.ipynb @@ -1 +1 @@ -{"cells":[{"cell_type":"markdown","metadata":{"id":"arqpDh95Zt6W"},"source":["# Wasserstein GAN\n","\n","The goal of this project was to study GANs trained with the Wasserstein distance.\n","\n","The members of our group, in alphabetical order by last name:\n","- Paul Corbalan\n","- Nicolas Gonel\n","- Oihan Joyot\n","- Tristan Portugues\n","- Florian Zorzynski\n","\n","Our project draws heavily on the following resources: the original article and its accompanying code.\n","- Article: [[1701.07875] Wasserstein GAN (arxiv.org)](https://arxiv.org/abs/1701.07875)\n","- Code: [martinarjovsky/WassersteinGAN (github.com)](https://github.com/martinarjovsky/WassersteinGAN)\n","\n","A repository for the project as a whole is available at:\n","https://github.com/paul-corbalan/wasserstein-gan\n","\n","---"]},{"cell_type":"markdown","metadata":{"id":"Hke_hyXT3bSP"},"source":["## Introduction\n","\n","Generative Adversarial Networks (GANs) are a major advance in deep learning, changing how machines model and generate data, images in particular. The technique loosely imitates how humans learn and create, and it opens the way to applications ranging from digital art to advanced medical solutions. The \"Wasserstein GAN\" project follows this line of work: it explores a specific GAN variant that uses the Wasserstein distance to improve training stability and the quality of the results.\n","\n","Using the Wasserstein distance as the key metric of our project offers a distinct advantage over traditional methods. 
It helps overcome some of the difficulties inherent to classical GANs, such as mode collapse and convergence problems. By focusing on this approach, our project aims to show how a solid grasp of the mathematical theory can be applied to improve the performance and reliability of generative models.\n","\n","This notebook is meant as a learning and exploration tool for GANs, with a particular focus on Wasserstein GANs. It walks the reader through the fundamental principles, the challenges, and the solutions specific to this technique, mixing theoretical explanations with practical applications. The goal is to provide a solid basis for understanding and using Wasserstein GANs."]},{"cell_type":"markdown","metadata":{"id":"P_O5MYE_xMU5"},"source":["## Generative Adversarial Network (GAN)\n","\n","GANs are deep learning models made of two neural networks, the generator $G$ and the discriminator $D$.\n","\n"," The generator creates data, while the discriminator evaluates it. The generator aims to produce a distribution $\\mathbb{P}_g$ that approximates the unknown distribution $\\mathbb{P}_r$ of the real data, so that generated samples $G(z)$ are indistinguishable from real data $x$, where $z$ is a vector of our latent space. 
The discriminator is trained to distinguish generated samples $G(z)$ from real data $x$.\n","\n"," The two models compete: $G$ tries to minimize the probability that $D$ tells $G(z)$ apart from $x$, while $D$ tries to maximize it.\n","\n","Formally, this amounts to solving a min-max problem for the value function:\n","$$V(D, G) = \\mathbb{E}_{x \\sim \\mathbb{P}_{r}}[\\log D(x)] + \\mathbb{E}_{z \\sim \\mathbb{P}_z}[\\log(1 - D(G(z)))]$$\n","where the problem is:\n","$$\n","\\min _G \\max _D V(D, G)\n","$$\n","\n"," This objective is based on the negative log-likelihood and is the original formulation for reaching an equilibrium between the generator and the discriminator. However, it can suffer from several problems:\n"," - mode collapse, where training converges to a solution that misses some features of the target distribution.\n"," - vanishing gradients: when the discriminator becomes too good, its output no longer yields a usable gradient.\n","\n"," We therefore look for a way to address these problems by changing the loss function, and we try the Wasserstein distance."]},{"cell_type":"markdown","metadata":{"id":"yIx0TILzG6-l"},"source":["## Wasserstein distance\n","\n","### Definition\n","\n","In mathematics, the Wasserstein distance is a function defined between probability distributions on a given metric space $(M,d)$. 
The Wasserstein distance of order $p \\in \\left[1,+\\infty \\right)$ between two probability measures $\\mu$ and $\\nu$ on $M$ (with finite moments of order $p$) is defined by:\n","\n","\\begin{equation}\n","\\begin{split}\n","{\\displaystyle W_{p}(\\mu ,\\nu )=\\left(\\inf _{\\gamma \\in \\Gamma (\\mu ,\\nu )}\\mathbf {E} _{(x,y)\\sim \\gamma }d(x,y)^{p}\\right)^{1/p}.}\n","\\end{split}\n","\\end{equation}\n","\n","The infimum is taken over $\\Gamma (\\mu,\\nu )$, the set of all couplings whose marginal distributions are $\\mu$ and $\\nu$ respectively.\n","\n","### Physical interpretation\n","\n","This metric is also known as the earth mover's distance. Intuitively, if each distribution is pictured as a unit amount of earth piled on the metric space $M$, the metric is the minimal cost of reshaping one pile into the other. This cost is the amount of earth to move multiplied by the average distance it has to travel.\n","\n","In other words, the Wasserstein distance measures the minimal total cost of transforming one probability distribution into another. Its strength lies in how it builds on the notions of optimal transport and coupling, both relevant and practical for this study.\n","\n","### Use with digital images\n","\n","To summarize, the Wasserstein distance is a natural way to compare the probability distributions of two variables when one is derived from the other by small, non-uniform perturbations (random or deterministic). 
This is why, in computer science, this metric is widely used to compare discrete distributions, such as the color histograms of two digital images.\n","\n","\n","### Numerical computation\n","\n","The main difficulty with this distance is its computation, since the infimum is very hard to evaluate. Fortunately, the Kantorovich-Rubinstein duality gives:\n","\\begin{equation}\n","W(\\mathbb{P}_r, \\mathbb{P}_{\\theta})=\\frac{1}{K}\\sup_{\\|f\\|_L \\leq K} \\mathbb{E}_{x \\sim \\mathbb{P}_r}[f(x)] - \\mathbb{E}_{x \\sim \\mathbb{P}_{\\theta}}[f(x)]\n","\\end{equation}\n","[…] 0.7:\n"," print(f\"Image {filename} is predicted as a human face and will be moved.\")\n"," shutil.copy(os.path.join(folder_path, filename), 'predicted_humans')\n","\n","\n","predict_and_move(\"..\")"]},{"cell_type":"markdown","metadata":{"id":"Pl2UlIxJMb-s"},"source":["![confusion.svg]()"]},{"cell_type":"markdown","metadata":{"id":"vEH1G2zSIZ3p"},"source":["The dataset was then upscaled so that the convolution filters could be applied, using the command:\n","```shell\n","convert *.png resized 400% *upscaled*/*.png\n","```\n","\n","The final data can be downloaded from:\n","https://github.com/paul-corbalan/wasserstein-gan/blob/develop/data/predicted_humans.zip"]},{"cell_type":"markdown","metadata":{"id":"xA3azaz1US4k"},"source":["## GAN model architecture\n","\n","This section describes the architectures of the discriminator and the generator of our model."]},{"cell_type":"markdown","metadata":{"id":"dxaMaimlY21r"},"source":["### Discriminator"]},{"cell_type":"markdown","metadata":{"id":"dFj_wV-i4dIG"},"source":["![netD_architecture.svg]()"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"5DgOVOMHSMqA"},"outputs":[],"source":["class DCGAN_D(nn.Module):\n"," def __init__(self, isize, nz, nc, ndf, ngpu, n_extra_layers=0):\n"," super(DCGAN_D, self).__init__()\n"," self.ngpu = ngpu\n"," assert isize % 16 == 0, \"isize has to be a multiple of 16\"\n","\n"," main = nn.Sequential()\n"," # inputs is nc x isize x isize\n"," 
main.add_module('initial:{0}-{1}:conv'.format(nc, ndf),\n"," nn.Conv2d(nc, ndf, 4, 2, 1, bias=False))\n"," main.add_module('initial:{0}:relu'.format(ndf),\n"," nn.LeakyReLU(0.2, inplace=True))\n"," csize, cndf = isize / 2, ndf\n","\n"," # Extra layers\n"," for t in range(n_extra_layers):\n"," main.add_module('extra-layers-{0}:{1}:conv'.format(t, cndf),\n"," nn.Conv2d(cndf, cndf, 3, 1, 1, bias=False))\n"," main.add_module('extra-layers-{0}:{1}:batchnorm'.format(t, cndf),\n"," nn.BatchNorm2d(cndf))\n"," main.add_module('extra-layers-{0}:{1}:relu'.format(t, cndf),\n"," nn.LeakyReLU(0.2, inplace=True))\n","\n"," while csize > 4:\n"," in_feat = cndf\n"," out_feat = cndf * 2\n"," main.add_module('pyramid:{0}-{1}:conv'.format(in_feat, out_feat),\n"," nn.Conv2d(in_feat, out_feat, 4, 2, 1, bias=False))\n"," main.add_module('pyramid:{0}:batchnorm'.format(out_feat),\n"," nn.BatchNorm2d(out_feat))\n"," main.add_module('pyramid:{0}:relu'.format(out_feat),\n"," nn.LeakyReLU(0.2, inplace=True))\n"," cndf = cndf * 2\n"," csize = csize / 2\n","\n"," # state size. 
K x 4 x 4\n"," main.add_module('final:{0}-{1}:conv'.format(cndf, 1),\n"," nn.Conv2d(cndf, 1, 4, 1, 0, bias=False))\n"," self.main = main\n","\n","\n"," def forward(self, inputs):\n"," if isinstance(inputs.data, torch.cuda.FloatTensor) and self.ngpu > 1:\n"," output = nn.parallel.data_parallel(self.main, inputs, range(self.ngpu))\n"," else:\n"," output = self.main(inputs)\n","\n"," output = output.mean(0)\n"," return output.view(1)"]},{"cell_type":"markdown","metadata":{"id":"2X34F3vIY-6c"},"source":["### Generateur"]},{"cell_type":"markdown","metadata":{"id":"Kg60CsB74fHR"},"source":["![netG_architecture.svg]()"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"CiqFCjoAYtBq"},"outputs":[],"source":["class DCGAN_G(nn.Module):\n"," def __init__(self, isize, nz, nc, ngf, ngpu, n_extra_layers=0):\n"," super(DCGAN_G, self).__init__()\n"," self.ngpu = ngpu\n"," assert isize % 16 == 0, \"isize has to be a multiple of 16\"\n","\n"," cngf, tisize = ngf//2, 4\n"," while tisize != isize:\n"," cngf = cngf * 2\n"," tisize = tisize * 2\n","\n"," main = nn.Sequential()\n"," # inputs is Z, going into a convolution\n"," main.add_module('initial:{0}-{1}:convt'.format(nz, cngf),\n"," nn.ConvTranspose2d(nz, cngf, 4, 1, 0, bias=False))\n"," main.add_module('initial:{0}:batchnorm'.format(cngf),\n"," nn.BatchNorm2d(cngf))\n"," main.add_module('initial:{0}:relu'.format(cngf),\n"," nn.ReLU(True))\n","\n"," csize, cndf = 4, cngf\n"," while csize < isize//2:\n"," main.add_module('pyramid:{0}-{1}:convt'.format(cngf, cngf//2),\n"," nn.ConvTranspose2d(cngf, cngf//2, 4, 2, 1, bias=False))\n"," main.add_module('pyramid:{0}:batchnorm'.format(cngf//2),\n"," nn.BatchNorm2d(cngf//2))\n"," main.add_module('pyramid:{0}:relu'.format(cngf//2),\n"," nn.ReLU(True))\n"," cngf = cngf // 2\n"," csize = csize * 2\n","\n"," # Extra layers\n"," for t in range(n_extra_layers):\n"," main.add_module('extra-layers-{0}:{1}:conv'.format(t, cngf),\n"," nn.Conv2d(cngf, cngf, 3, 1, 1, bias=False))\n"," 
main.add_module('extra-layers-{0}:{1}:batchnorm'.format(t, cngf),\n"," nn.BatchNorm2d(cngf))\n"," main.add_module('extra-layers-{0}:{1}:relu'.format(t, cngf),\n"," nn.ReLU(True))\n","\n"," main.add_module('final:{0}-{1}:convt'.format(cngf, nc),\n"," nn.ConvTranspose2d(cngf, nc, 4, 2, 1, bias=False))\n"," main.add_module('final:{0}:tanh'.format(nc),\n"," nn.Tanh())\n"," self.main = main\n","\n"," def forward(self, inputs):\n"," if isinstance(inputs.data, torch.cuda.FloatTensor) and self.ngpu > 1:\n"," output = nn.parallel.data_parallel(self.main, inputs, range(self.ngpu))\n"," else:\n"," output = self.main(inputs)\n"," return output"]},{"cell_type":"markdown","metadata":{"id":"zyRrm3ClUMX1"},"source":["## Training\n","\n","Here is example code for configuring and training our model."]},{"cell_type":"markdown","metadata":{"id":"PhynHzMXXEyP"},"source":["### Run options"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"zkk11X5zSQLK"},"outputs":[],"source":["# Path to dataset\n","opt_dataroot = 'data/faces'\n","# Number of data loading workers\n","opt_workers = 2\n","# Input batch size\n","opt_batchSize = 64\n","# Height / width of the input image fed to the network\n","opt_imageSize = 32\n","# Number of input image channels\n","nc = 3\n","# Size of the latent z vector\n","nz = 100\n","# Size of feature maps in generator\n","ngf = 32\n","# Size of feature maps in discriminator\n","ndf = 32\n","# Number of epochs to train for\n","opt_niter = 25\n","# Learning rate for Discriminator\n","opt_lrD = 0.00005\n","# Learning rate for Generator\n","opt_lrG = 0.00005\n","# beta1 for adam.\n","opt_beta1 = 0.5\n","# Lower value clamp for Discriminator weights\n","opt_clamp_lower = -0.01\n","# Upper value clamp for Discriminator weights\n","opt_clamp_upper = 0.01\n","# Number of D iters per each G iter\n","opt_Diters = 5\n","# Where to store samples and models\n","opt_experiment = 
'samples'"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"LVYpQv-_tFsK"},"outputs":[],"source":["# Use CUDA if a GPU is available\n","opt_cuda = False\n","\n","opt_cuda_resp = input(\"Use cuda? (y/n)\\n{} by default\\n\".format(opt_cuda))\n","\n","# Use CUDA if a GPU is available\n","opt_cuda = True if opt_cuda_resp == 'y' else opt_cuda\n","# Number of GPUs to use for running the model if CUDA is enabled\n","ngpu = 1"]},{"cell_type":"markdown","metadata":{"id":"Ggb5i2jsX1jp"},"source":["### Configuration du modèle"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"ENkqSYs2Tbj9"},"outputs":[],"source":["os.system('mkdir {0}'.format(opt_experiment))\n","\n","opt_manualSeed = random.randint(1, 10000) # fix seed\n","print(\"Random Seed: \", opt_manualSeed)\n","random.seed(opt_manualSeed)\n","torch.manual_seed(opt_manualSeed)\n","\n","cudnn.benchmark = True\n","\n","# folder dataset\n","dataset = dset.ImageFolder(root=opt_dataroot,\n"," transform=transforms.Compose([\n"," transforms.Resize(opt_imageSize),\n"," transforms.CenterCrop(opt_imageSize),\n"," transforms.ToTensor(),\n"," transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),\n"," ]))\n","assert dataset\n","dataloader = torch.utils.data.DataLoader(dataset, batch_size=opt_batchSize,\n"," shuffle=True, num_workers=int(opt_workers))\n","\n","# custom weights initialization called on netG and netD\n","def weights_init(m):\n"," classname = m.__class__.__name__\n"," if classname.find('Conv') != -1:\n"," m.weight.data.normal_(0.0, 0.02)\n"," elif classname.find('BatchNorm') != -1:\n"," m.weight.data.normal_(1.0, 0.02)\n"," m.bias.data.fill_(0)\n","\n","netG = DCGAN_G(opt_imageSize, nz, nc, ngf, ngpu)\n","\n","netG.apply(weights_init)\n","print(netG)\n","\n","netD = DCGAN_D(opt_imageSize, nz, nc, ndf, ngpu)\n","netD.apply(weights_init)\n","\n","print(netD)\n","\n","inputs = torch.FloatTensor(opt_batchSize, 3, opt_imageSize, opt_imageSize)\n","noise = torch.FloatTensor(opt_batchSize, nz, 1, 
1)\n","fixed_noise = torch.FloatTensor(opt_batchSize, nz, 1, 1).normal_(0, 1)\n","one = torch.FloatTensor([1])\n","mone = one * -1\n","\n","if opt_cuda:\n"," netD.cuda()\n"," netG.cuda()\n"," inputs = inputs.cuda()\n"," one, mone = one.cuda(), mone.cuda()\n"," noise, fixed_noise = noise.cuda(), fixed_noise.cuda()\n","\n","# setup optimizer\n","optimizerD = optim.RMSprop(netD.parameters(), lr = opt_lrD)\n","optimizerG = optim.RMSprop(netG.parameters(), lr = opt_lrG)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"W96oVYCkjzPR"},"outputs":[],"source":["netD_graph = make_dot(netD(Variable(torch.randn(1, 3, 32, 32))))\n","netD_graph.render(\"netD_architecture\", format=\"png\")\n","\n","netG_graph = make_dot(netG(Variable(torch.randn(1, 100, 1, 1))))\n","netG_graph.render(\"netG_architecture\", format=\"png\")"]},{"cell_type":"markdown","metadata":{"id":"YkMM0kSeX6cH"},"source":["### Entrainement du modèle"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Pf06223hTrpJ"},"outputs":[],"source":["gen_iterations = 0\n","for epoch in range(opt_niter):\n"," data_iter = iter(dataloader)\n"," i = 0\n"," while i < len(dataloader):\n"," ############################\n"," # (1) Update D network\n"," ###########################\n"," for p in netD.parameters(): # reset requires_grad\n"," p.requires_grad = True # they are set to False below in netG update\n","\n"," # train the discriminator Diters times\n"," if gen_iterations < 25 or gen_iterations % 500 == 0:\n"," Diters = 100\n"," else:\n"," Diters = opt_Diters\n"," j = 0\n"," while j < Diters and i < len(dataloader):\n"," j += 1\n","\n"," # clamp parameters to a cube\n"," for p in netD.parameters():\n"," p.data.clamp_(opt_clamp_lower, opt_clamp_upper)\n","\n"," data = next(data_iter)\n"," i += 1\n","\n"," # train with real\n"," real_cpu, _ = data\n"," netD.zero_grad()\n"," batch_size = real_cpu.size(0)\n","\n"," if opt_cuda:\n"," real_cpu = real_cpu.cuda()\n"," 
inputs.resize_as_(real_cpu).copy_(real_cpu)\n"," inputsv = Variable(inputs)\n","\n"," errD_real = netD(inputsv)\n"," errD_real.backward(one)\n","\n"," # train with fake\n"," noise.resize_(opt_batchSize, nz, 1, 1).normal_(0, 1)\n"," noisev = Variable(noise, volatile = True) # totally freeze netG\n"," fake = Variable(netG(noisev).data)\n"," inputsv = fake\n"," errD_fake = netD(inputsv)\n"," errD_fake.backward(mone)\n"," errD = errD_real - errD_fake\n"," optimizerD.step()\n","\n"," ############################\n"," # (2) Update G network\n"," ###########################\n"," for p in netD.parameters():\n"," p.requires_grad = False # to avoid computation\n"," netG.zero_grad()\n"," # in case our last batch was the tail batch of the dataloader,\n"," # make sure we feed a full batch of noise\n"," noise.resize_(opt_batchSize, nz, 1, 1).normal_(0, 1)\n"," noisev = Variable(noise)\n"," fake = netG(noisev)\n"," errG = netD(fake)\n"," errG.backward(one)\n"," optimizerG.step()\n"," gen_iterations += 1\n","\n"," print('[%d/%d][%d/%d][%d] Loss_D: %f Loss_G: %f Loss_D_real: %f Loss_D_fake %f'\n"," % (epoch, opt_niter, i, len(dataloader), gen_iterations,\n"," errD.data[0], errG.data[0], errD_real.data[0], errD_fake.data[0]))\n"," if gen_iterations % 500 == 0:\n"," real_cpu = real_cpu.mul(0.5).add(0.5)\n"," vutils.save_image(real_cpu, '{0}/real_samples.png'.format(opt_experiment))\n"," fake = netG(Variable(fixed_noise, volatile=True))\n"," fake.data = fake.data.mul(0.5).add(0.5)\n"," vutils.save_image(fake.data, '{0}/fake_samples_{1}.png'.format(opt_experiment, gen_iterations))\n","\n"," # do checkpointing\n"," torch.save(netG.state_dict(), '{0}/netG_epoch_{1}.pth'.format(opt_experiment, epoch))\n"," torch.save(netD.state_dict(), '{0}/netD_epoch_{1}.pth'.format(opt_experiment, epoch))"]},{"cell_type":"markdown","metadata":{"id":"NfPfYpS-T5DN"},"source":["## Generation\n","\n","Voici la section de code correspondant à la génération de nos images, plus tard 
exploitées."]},{"cell_type":"markdown","metadata":{"id":"-B6fUl2zW_Ak"},"source":["### Options pour l'exécution"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"3r_w9O_zUERu"},"outputs":[],"source":["# Number of images to generate\n","opt_nimages = 100\n","# Path to output directory\n","opt_output_dir = 'data/generated'\n","# Path to generator weights .pth file\n","opt_weights = 'samples/netG_epoch_2384.pth'"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"1smXNRDUw-5k"},"outputs":[],"source":["opt_nimages_resp = input(\"How many images to generate?\\n{} by default\\n\".format(opt_nimages))\n","opt_output_dir_resp = input(\"Where to store generated images?\\n{} by default\\n\".format(opt_output_dir))\n","\n","# Number of images to generate\n","opt_nimages = int(opt_nimages_resp) if opt_nimages_resp != '' else opt_nimages\n","# Path to output directory\n","opt_output_dir = opt_output_dir_resp if opt_output_dir_resp != '' else opt_output_dir"]},{"cell_type":"markdown","metadata":{"id":"XWGZgsuyYDet"},"source":["### Configuration du modèle"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"KLNgMfV_T7Ou"},"outputs":[],"source":["netG = DCGAN_G(opt_imageSize, nz, nc, ngf, ngpu)\n","\n","# load weights\n","if opt_cuda:\n"," netG.load_state_dict(torch.load(opt_weights, map_location=torch.device('cuda')))\n","else:\n"," netG.load_state_dict(torch.load(opt_weights, map_location=torch.device('cpu')))\n","\n","# initialize noise\n","fixed_noise = torch.FloatTensor(opt_nimages, nz, 1, 1).normal_(0, 1)\n","\n","if opt_cuda:\n"," netG.cuda()\n"," fixed_noise = fixed_noise.cuda()\n","\n","fake = netG(fixed_noise)\n","fake.data = fake.data.mul(0.5).add(0.5)\n","\n","if not os.path.exists(opt_output_dir):\n"," os.makedirs(opt_output_dir)"]},{"cell_type":"markdown","metadata":{"id":"rPU04cGsYFVz"},"source":["### Génération des images"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Jg82PIsjUH55"},"outputs":[],"source":["for i 
in range(opt_nimages):\n"," vutils.save_image(fake.data[i, ...].reshape((1, nc, opt_imageSize, opt_imageSize)), os.path.join(opt_output_dir, \"generated_%02d.png\"%i))"]},{"cell_type":"markdown","metadata":{"id":"ePeCi4PZJ5Ar"},"source":["## Results\n","Training was run on a GPU-equipped machine accessed remotely over SSH through a configured VPN.\n","\n","The command used for training:\n","```shell\n","python main.py --dataset folder --dataroot data/faces --batchSize 2048 --niter 5000 --ngf 32 --ndf 32 --imageSize 32 --cuda\n","```\n","\n","and for generation:\n","```shell\n","python generate.py --config samples/generator_config.json --weight samples/netG_epoch_2384.pth --output_dir data/generated --nimages 100 --cuda\n","```\n","\n","The Python files are those accompanying the original article. They are the files reused and greatly simplified in this Jupyter Notebook.\n","\n","The weights and configuration files can be downloaded from:\n","https://github.com/paul-corbalan/wasserstein-gan/tree/develop/samples\n","\n","The log output of the run, `out`, is also available."]},{"cell_type":"markdown","metadata":{"id":"xme7UNdR678K"},"source":[""]},{"cell_type":"markdown","metadata":{"id":"3PQ8dgt1OLyT"},"source":["Because we forgot to store the losses during training, this script was written to recover them from the log file."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"uLZ1Egi8OEAo"},"outputs":[],"source":["data = open('out', 'r').read()\n","\n","pattern = re.compile(r\"\\[(\\d+)/(\\d+)\\]\\[(\\d+)/(\\d+)\\]\\[(\\d+)\\] Loss_D: ([-+]?\\d*\\.\\d+|\\d+) Loss_G: ([-+]?\\d*\\.\\d+|\\d+) Loss_D_real: ([-+]?\\d*\\.\\d+|\\d+) Loss_D_fake ([-+]?\\d*\\.\\d+|\\d+)\")\n","\n","matches = pattern.findall(data)\n","\n","df = pd.DataFrame(matches, columns=['epoch', 'niter', 'i', 'dataloader_size', 'gen_iterations', 'Loss_D', 'Loss_G', 'Loss_D_real', 
'Loss_D_fake'])\n","\n","df = df.apply(pd.to_numeric)\n","\n","plt.plot(df['gen_iterations'][::100], -df['Loss_D'][::100])\n","plt.xlabel('Generator Iterations')\n","plt.ylabel('Loss_D ~ Wasserstein Distance')\n","plt.title('Loss_D vs. Generator Iterations')\n","plt.show()"]},{"cell_type":"markdown","metadata":{"id":"en27GDwsO4J_"},"source":["We plot the evolution of the discriminator loss because, unlike in a classical GAN, it is interpretable: it is an estimate of the Wasserstein distance between the generated distribution and the target distribution. The decreasing values indicate that the model keeps learning and is not affected by the problems of classical GANs."]},{"cell_type":"markdown","metadata":{"id":"fdKb4NcvNzTz"},"source":["![evolution.png]()"]},{"cell_type":"markdown","metadata":{"id":"qLhnYB0uPpjt"},"source":["## Application\n","### Inverse problem: recovering the latent vector of an image\n","\n","Here is the optimization problem we want to solve, where $\\mathcal{Z}$ is the latent space:\n","\n","$$\\underset{z \\in \\mathcal{Z}}{\\text{argmin}}\\lVert G(z)-x_0 \\rVert_2^2$$\n","\n","Here is the code that solves it using PyTorch's automatic differentiation:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"sdc3spKfRDO3"},"outputs":[],"source":["x0 = inputsv[0][None, :,:,:]\n","\n","noise = torch.FloatTensor(1, nz, 1, 1).normal_(0,1)\n","noise.requires_grad = True\n","\n","\n","# Choose an optimizer, for example Adam\n","optimizer = optim.Adam([noise], lr=0.001)\n","\n","\n","for p in netD.parameters():\n"," p.requires_grad = False\n","for p in netG.parameters():\n"," p.requires_grad = False\n","\n","# Optimization loop\n","for iteration in range(100000):\n"," optimizer.zero_grad()\n","\n"," # Generate a sample from z\n"," generated_data = netG(noise)\n","\n"," # Compute the loss (squared L2 norm)\n"," loss = torch.norm(generated_data - x0)**2\n","\n"," # Backpropagation and optimization step\n"," loss.backward()\n"," optimizer.step()\n","\n"," print(f\"Iteration: {iteration}, Loss: {loss.item()}\")"]},{"cell_type":"markdown","metadata":{"id":"pUw0PYPNR5wo"},"source":["Here is an example:\n","\n","- Target image:"]},{"cell_type":"markdown","metadata":{"id":"Ic24cRnN7aus"},"source":[""]},{"cell_type":"markdown","metadata":{"id":"I-DndMBm7YJa"},"source":["- Recovered image:"]},{"cell_type":"markdown","metadata":{"id":"HXoaWAoV7hoj"},"source":[""]},{"cell_type":"markdown","metadata":{"id":"_tBHFalDSRcI"},"source":["Starting from a latent vector, we can explore the space spanned by the generator.\n","\n","---\n","\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"pS4lMtoBSgnZ"},"outputs":[],"source":["fixed_noise = noise.repeat(25, 1, 1, 1)\n","\n","n = int(fixed_noise.shape[0]**.5)\n","for i in range(n):\n"," for j in range(n):\n"," fixed_noise[i*n+j] = fixed_noise[0] + i*torch.eye(nz)[0][:, None, None] + j*torch.eye(nz)[1][:, None, None]\n","\n"," for z in fixed_noise:\n"," im = netG(z[None, :,: ,:])[0]\n"," plt.imshow(torch.permute(im.detach(), (1,2,0)))\n"," plt.show()"]},{"cell_type":"markdown","metadata":{"id":"nDMasgAOS8aa"},"source":[""]},{"cell_type":"markdown","metadata":{"id":"cgRBSaWRTAtT"},"source":["From the properties stated above, we know that the Wasserstein distance is Lipschitz on the space spanned by the generator. Consequently, we observe that all images are Wasserstein barycenters of the others."]},{"cell_type":"markdown","metadata":{"id":"cyp-fhG03bS3"},"source":["## Conclusion\n","\n","Our exploration of Wasserstein GANs, as detailed in this project, familiarized us with key concepts such as the Generative Adversarial Network (GAN), the Wasserstein distance, and advanced data-processing techniques. 
This gave us a deeper understanding of the mechanisms underlying GANs and of their potential in a variety of applications.\n","\n","Looking ahead, we see several possible directions for improvement and further work. One promising direction is to compare Wasserstein GANs with other GAN variants that use different loss functions. In addition, tuning the model's hyperparameters, in particular the learning rate, the clipping range, and the number of discriminator iterations, could lead to significant gains in the performance and efficiency of GANs."]}],"metadata":{"colab":{"provenance":[],"toc_visible":true},"kernelspec":{"display_name":"Python 3 (ipykernel)","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.11.7"}},"nbformat":4,"nbformat_minor":0} \ No newline at end of file +{"cells":[{"cell_type":"markdown","metadata":{"id":"arqpDh95Zt6W"},"source":["# Wasserstein GAN\n","\n","The goal of this project was to study GANs trained with the Wasserstein distance.\n","\n","The members of our group, in alphabetical order by last name:\n","- Paul Corbalan\n","- Nicolas Gonel\n","- Oihan Joyot\n","- Tristan Portugues\n","- Florian Zorzynski\n","\n","Our project draws heavily on the following resources: the original article and its accompanying code.\n","- Article: [[1701.07875] Wasserstein GAN (arxiv.org)](https://arxiv.org/abs/1701.07875)\n","- Code: [martinarjovsky/WassersteinGAN (github.com)](https://github.com/martinarjovsky/WassersteinGAN)\n","\n","A repository for the project as a whole is available at:\n","https://code.paul-corbalan.com/paul-corbalan/wasserstein-gan\n","\n","---"]},{"cell_type":"markdown","metadata":{"id":"Hke_hyXT3bSP"},"source":["## Introduction\n","\n","Generative Adversarial Networks (GANs) are a major advance in deep learning, changing how machines model and generate data, images in particular. The technique loosely imitates how humans learn and create, and it opens the way to applications ranging from digital art to advanced medical solutions. The \"Wasserstein GAN\" project follows this line of work: it explores a specific GAN variant that uses the Wasserstein distance to improve training stability and the quality of the results.\n","\n","Using the Wasserstein distance as the key metric of our project offers a distinct advantage over traditional methods. It helps overcome some of the difficulties inherent to classical GANs, such as mode collapse and convergence problems. By focusing on this approach, our project aims to show how a solid grasp of the mathematical theory can be applied to improve the performance and reliability of generative models.\n","\n","This notebook is meant as a learning and exploration tool for GANs, with a particular focus on Wasserstein GANs. It walks the reader through the fundamental principles, the challenges, and the solutions specific to this technique, mixing theoretical explanations with practical applications. 
The goal is to provide a solid basis for understanding and using Wasserstein GANs."]},{"cell_type":"markdown","metadata":{"id":"P_O5MYE_xMU5"},"source":["## Generative Adversarial Network (GAN)\n","\n","GANs are deep learning models made of two neural networks, the generator $G$ and the discriminator $D$.\n","\n"," The generator creates data, while the discriminator evaluates it. The generator aims to produce a distribution $\\mathbb{P}_g$ that approximates the unknown distribution $\\mathbb{P}_r$ of the real data, so that generated samples $G(z)$ are indistinguishable from real data $x$, where $z$ is a vector of our latent space. The discriminator is trained to distinguish generated samples $G(z)$ from real data $x$.\n","\n"," The two models compete: $G$ tries to minimize the probability that $D$ tells $G(z)$ apart from $x$, while $D$ tries to maximize it.\n","\n","Formally, this amounts to solving a min-max problem for the value function:\n","$$V(D, G) = \\mathbb{E}_{x \\sim \\mathbb{P}_{r}}[\\log D(x)] + \\mathbb{E}_{z \\sim \\mathbb{P}_z}[\\log(1 - D(G(z)))]$$\n","where the problem is:\n","$$\n","\\min _G \\max _D V(D, G)\n","$$\n","\n"," This objective is based on the negative log-likelihood and is the original formulation for reaching an equilibrium between the generator and the discriminator. 
However, this method can suffer from several problems:\n","- mode collapse, where training converges to a solution that misses some features of the target distribution.\n","- vanishing gradients: when the discriminator becomes too good, its output no longer provides a usable gradient.\n","\n","We therefore look for a way to address these problems by changing the loss function, and we turn to the Wasserstein distance."]},{"cell_type":"markdown","metadata":{"id":"yIx0TILzG6-l"},"source":["## Wasserstein distance\n","\n","### Definition\n","\n","In mathematics, the Wasserstein distance is a function defined between probability distributions on a given metric space $(M,d)$. The Wasserstein distance of order $p \\in \\left[1,+\\infty \\right)$ between two probability measures $\\mu$ and $\\nu$ defined on $M$ (with finite moments of order $p$) is defined by:\n","\n","\\begin{equation}\n","\\begin{split}\n","{\\displaystyle W_{p}(\\mu ,\\nu )=\\left(\\inf _{\\gamma \\in \\Gamma (\\mu ,\\nu )}\\mathbf {E} _{(x,y)\\sim \\gamma }d(x,y)^{p}\\right)^{1/p}.}\n","\\end{split}\n","\\end{equation}\n","\n","The infimum is taken over $\\Gamma (\\mu,\\nu )$, the set of all couplings whose marginal distributions are $\\mu$ and $\\nu$ respectively.\n","\n","### Physical interpretation\n","\n","This metric is also known as the earth mover's distance. Intuitively, if each distribution is pictured as a unit amount of earth piled on the metric space $M$, the metric is the minimal cost of reshaping one pile into the other. 
This cost is the amount of earth to move multiplied by the average distance it has to travel.\n","\n","In other words, the Wasserstein distance measures the minimal total cost of transforming one probability distribution into another. Its advantage is that it builds on the notions of optimal transport and coupling, both of which are relevant and practical for our study.\n","\n","### Use for digital images\n","\n","To summarize, the Wasserstein distance is a natural way to compare the probability distributions of two variables when one is derived from the other by small, non-uniform perturbations (random or deterministic). This is why, in computer science, the metric is widely used to compare discrete distributions, notably the color histograms of two digital images.\n","\n","\n","### Numerical computation\n","\n","The main difficulty with this distance is computing it, since the infimum is very hard to evaluate. 
Fortunately, the Kantorovich-Rubinstein duality gives us:\n","\\begin{equation}\n","W(\\mathbb{P}_r, \\mathbb{P}_{\\theta})=\\frac{1}{K}\\sup_{||f|| 0.7:\n"," print(f\"Image {filename} is predicted as a human face and will be moved.\")\n"," shutil.copy(os.path.join(folder_path, filename), 'predicted_humans')\n","\n","\n","predict_and_move(\"..\")"]},{"cell_type":"markdown","metadata":{"id":"Pl2UlIxJMb-s"},"source":["![confusion.svg]()"]},{"cell_type":"markdown","metadata":{"id":"vEH1G2zSIZ3p"},"source":["The dataset was then upscaled so that the convolution filters could be applied, using the command:\n","```shell\n","convert *.png resized 400% *upscaled*/*.png\n","```\n","\n","The final data can be downloaded from:\n","https://code.paul-corbalan.com/paul-corbalan/wasserstein-gan/media/branch/master/data/predicted_humans.zip"]},{"cell_type":"markdown","metadata":{"id":"xA3azaz1US4k"},"source":["## GAN model architecture\n","\n","This section describes the architectures of our model's discriminator and generator."]},{"cell_type":"markdown","metadata":{"id":"dxaMaimlY21r"},"source":["### Discriminator"]},{"cell_type":"markdown","metadata":{"id":"dFj_wV-i4dIG"},"source":["![netD_architecture.svg]()"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"5DgOVOMHSMqA"},"outputs":[],"source":["class DCGAN_D(nn.Module):\n","    def __init__(self, isize, nz, nc, ndf, ngpu, n_extra_layers=0):\n","        super(DCGAN_D, self).__init__()\n","        self.ngpu = ngpu\n","        assert isize % 16 == 0, \"isize has to be a multiple of 16\"\n","\n","        main = nn.Sequential()\n","        # input is nc x isize x isize\n","        main.add_module('initial:{0}-{1}:conv'.format(nc, ndf),\n","                        nn.Conv2d(nc, ndf, 4, 2, 1, bias=False))\n","        main.add_module('initial:{0}:relu'.format(ndf),\n","                        nn.LeakyReLU(0.2, inplace=True))\n","        csize, cndf = isize / 2, ndf\n","\n","        # Extra layers\n","        for t in range(n_extra_layers):\n","            
main.add_module('extra-layers-{0}:{1}:conv'.format(t, cndf),\n","                            nn.Conv2d(cndf, cndf, 3, 1, 1, bias=False))\n","            main.add_module('extra-layers-{0}:{1}:batchnorm'.format(t, cndf),\n","                            nn.BatchNorm2d(cndf))\n","            main.add_module('extra-layers-{0}:{1}:relu'.format(t, cndf),\n","                            nn.LeakyReLU(0.2, inplace=True))\n","\n","        while csize > 4:\n","            in_feat = cndf\n","            out_feat = cndf * 2\n","            main.add_module('pyramid:{0}-{1}:conv'.format(in_feat, out_feat),\n","                            nn.Conv2d(in_feat, out_feat, 4, 2, 1, bias=False))\n","            main.add_module('pyramid:{0}:batchnorm'.format(out_feat),\n","                            nn.BatchNorm2d(out_feat))\n","            main.add_module('pyramid:{0}:relu'.format(out_feat),\n","                            nn.LeakyReLU(0.2, inplace=True))\n","            cndf = cndf * 2\n","            csize = csize / 2\n","\n","        # state size. K x 4 x 4\n","        main.add_module('final:{0}-{1}:conv'.format(cndf, 1),\n","                        nn.Conv2d(cndf, 1, 4, 1, 0, bias=False))\n","        self.main = main\n","\n","\n","    def forward(self, inputs):\n","        if isinstance(inputs.data, torch.cuda.FloatTensor) and self.ngpu > 1:\n","            output = nn.parallel.data_parallel(self.main, inputs, range(self.ngpu))\n","        else:\n","            output = self.main(inputs)\n","\n","        output = output.mean(0)\n","        return output.view(1)"]},{"cell_type":"markdown","metadata":{"id":"2X34F3vIY-6c"},"source":["### Generator"]},{"cell_type":"markdown","metadata":{"id":"Kg60CsB74fHR"},"source":["![netG_architecture.svg]()"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"CiqFCjoAYtBq"},"outputs":[],"source":["class DCGAN_G(nn.Module):\n","    def __init__(self, isize, nz, nc, ngf, ngpu, n_extra_layers=0):\n","        super(DCGAN_G, self).__init__()\n","        self.ngpu = ngpu\n","        assert isize % 16 == 0, \"isize has to be a multiple of 16\"\n","\n","        cngf, tisize = ngf//2, 4\n","        while tisize != isize:\n","            cngf = cngf * 2\n","            tisize = tisize * 2\n","\n","        main = nn.Sequential()\n","        # input is Z, going into a convolution\n","        main.add_module('initial:{0}-{1}:convt'.format(nz, cngf),\n","                        nn.ConvTranspose2d(nz, cngf, 4, 
1, 0, bias=False))\n","        main.add_module('initial:{0}:batchnorm'.format(cngf),\n","                        nn.BatchNorm2d(cngf))\n","        main.add_module('initial:{0}:relu'.format(cngf),\n","                        nn.ReLU(True))\n","\n","        csize, cndf = 4, cngf\n","        while csize < isize//2:\n","            main.add_module('pyramid:{0}-{1}:convt'.format(cngf, cngf//2),\n","                            nn.ConvTranspose2d(cngf, cngf//2, 4, 2, 1, bias=False))\n","            main.add_module('pyramid:{0}:batchnorm'.format(cngf//2),\n","                            nn.BatchNorm2d(cngf//2))\n","            main.add_module('pyramid:{0}:relu'.format(cngf//2),\n","                            nn.ReLU(True))\n","            cngf = cngf // 2\n","            csize = csize * 2\n","\n","        # Extra layers\n","        for t in range(n_extra_layers):\n","            main.add_module('extra-layers-{0}:{1}:conv'.format(t, cngf),\n","                            nn.Conv2d(cngf, cngf, 3, 1, 1, bias=False))\n","            main.add_module('extra-layers-{0}:{1}:batchnorm'.format(t, cngf),\n","                            nn.BatchNorm2d(cngf))\n","            main.add_module('extra-layers-{0}:{1}:relu'.format(t, cngf),\n","                            nn.ReLU(True))\n","\n","        main.add_module('final:{0}-{1}:convt'.format(cngf, nc),\n","                        nn.ConvTranspose2d(cngf, nc, 4, 2, 1, bias=False))\n","        main.add_module('final:{0}:tanh'.format(nc),\n","                        nn.Tanh())\n","        self.main = main\n","\n","    def forward(self, inputs):\n","        if isinstance(inputs.data, torch.cuda.FloatTensor) and self.ngpu > 1:\n","            output = nn.parallel.data_parallel(self.main, inputs, range(self.ngpu))\n","        else:\n","            output = self.main(inputs)\n","        return output"]},{"cell_type":"markdown","metadata":{"id":"zyRrm3ClUMX1"},"source":["## Training\n","\n","Here is example code for configuring and training our model."]},{"cell_type":"markdown","metadata":{"id":"PhynHzMXXEyP"},"source":["### Run options"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"zkk11X5zSQLK"},"outputs":[],"source":["# Path to dataset\n","opt_dataroot = 'data/faces'\n","# Number of data loading workers\n","opt_workers = 2\n","# Input batch size\n","opt_batchSize = 64\n","# The height / width of the 
input image to network\n","opt_imageSize = 32\n","# Input image channels\n","nc = 3\n","# Size of the latent z vector\n","nz = 100\n","# Size of feature maps in generator\n","ngf = 32\n","# Size of feature maps in discriminator\n","ndf = 32\n","# Number of epochs to train for\n","opt_niter = 25\n","# Learning rate for Discriminator\n","opt_lrD = 0.00005\n","# Learning rate for Generator\n","opt_lrG = 0.00005\n","# beta1 for adam.\n","opt_beta1 = 0.5\n","# Lower value clamp for Discriminator weights\n","opt_clamp_lower = -0.01\n","# Upper value clamp for Discriminator weights\n","opt_clamp_upper = 0.01\n","# Number of D iters per each G iter\n","opt_Diters = 5\n","# Where to store samples and models\n","opt_experiment = 'samples'"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"LVYpQv-_tFsK"},"outputs":[],"source":["# Default: run on CPU\n","opt_cuda = False\n","\n","opt_cuda_resp = input(\"Use cuda? (y/n)\\n{} by default\\n\".format(opt_cuda))\n","\n","# Enable CUDA only if the user answers 'y'\n","opt_cuda = True if opt_cuda_resp == 'y' else opt_cuda\n","# Number of GPUs to use for running the model if CUDA is enabled\n","ngpu = 1"]},{"cell_type":"markdown","metadata":{"id":"Ggb5i2jsX1jp"},"source":["### Model configuration"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"ENkqSYs2Tbj9"},"outputs":[],"source":["os.system('mkdir {0}'.format(opt_experiment))\n","\n","opt_manualSeed = random.randint(1, 10000) # fix seed\n","print(\"Random Seed: \", opt_manualSeed)\n","random.seed(opt_manualSeed)\n","torch.manual_seed(opt_manualSeed)\n","\n","cudnn.benchmark = True\n","\n","# folder dataset\n","dataset = dset.ImageFolder(root=opt_dataroot,\n","                           transform=transforms.Compose([\n","                               transforms.Resize(opt_imageSize),\n","                               transforms.CenterCrop(opt_imageSize),\n","                               transforms.ToTensor(),\n","                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),\n","                           ]))\n","assert dataset\n","dataloader = torch.utils.data.DataLoader(dataset, 
batch_size=opt_batchSize,\n","                                         shuffle=True, num_workers=int(opt_workers))\n","\n","# custom weights initialization called on netG and netD\n","def weights_init(m):\n","    classname = m.__class__.__name__\n","    if classname.find('Conv') != -1:\n","        m.weight.data.normal_(0.0, 0.02)\n","    elif classname.find('BatchNorm') != -1:\n","        m.weight.data.normal_(1.0, 0.02)\n","        m.bias.data.fill_(0)\n","\n","netG = DCGAN_G(opt_imageSize, nz, nc, ngf, ngpu)\n","\n","netG.apply(weights_init)\n","print(netG)\n","\n","netD = DCGAN_D(opt_imageSize, nz, nc, ndf, ngpu)\n","netD.apply(weights_init)\n","\n","print(netD)\n","\n","inputs = torch.FloatTensor(opt_batchSize, 3, opt_imageSize, opt_imageSize)\n","noise = torch.FloatTensor(opt_batchSize, nz, 1, 1)\n","fixed_noise = torch.FloatTensor(opt_batchSize, nz, 1, 1).normal_(0, 1)\n","one = torch.FloatTensor([1])\n","mone = one * -1\n","\n","if opt_cuda:\n","    netD.cuda()\n","    netG.cuda()\n","    inputs = inputs.cuda()\n","    one, mone = one.cuda(), mone.cuda()\n","    noise, fixed_noise = noise.cuda(), fixed_noise.cuda()\n","\n","# setup optimizer\n","optimizerD = optim.RMSprop(netD.parameters(), lr = opt_lrD)\n","optimizerG = optim.RMSprop(netG.parameters(), lr = opt_lrG)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"W96oVYCkjzPR"},"outputs":[],"source":["netD_graph = make_dot(netD(Variable(torch.randn(1, 3, 32, 32))))\n","netD_graph.render(\"netD_architecture\", format=\"png\")\n","\n","netG_graph = make_dot(netG(Variable(torch.randn(1, 100, 1, 1))))\n","netG_graph.render(\"netG_architecture\", format=\"png\")"]},{"cell_type":"markdown","metadata":{"id":"YkMM0kSeX6cH"},"source":["### Training the model"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Pf06223hTrpJ"},"outputs":[],"source":["gen_iterations = 0\n","for epoch in range(opt_niter):\n","    data_iter = iter(dataloader)\n","    i = 0\n","    while i < len(dataloader):\n","        ############################\n","        # (1) Update D network\n","        
###########################\n","        for p in netD.parameters(): # reset requires_grad\n","            p.requires_grad = True # they are set to False below in netG update\n","\n","        # train the discriminator Diters times\n","        if gen_iterations < 25 or gen_iterations % 500 == 0:\n","            Diters = 100\n","        else:\n","            Diters = opt_Diters\n","        j = 0\n","        while j < Diters and i < len(dataloader):\n","            j += 1\n","\n","            # clamp parameters to a cube\n","            for p in netD.parameters():\n","                p.data.clamp_(opt_clamp_lower, opt_clamp_upper)\n","\n","            data = next(data_iter)\n","            i += 1\n","\n","            # train with real\n","            real_cpu, _ = data\n","            netD.zero_grad()\n","            batch_size = real_cpu.size(0)\n","\n","            if opt_cuda:\n","                real_cpu = real_cpu.cuda()\n","            inputs.resize_as_(real_cpu).copy_(real_cpu)\n","            inputsv = Variable(inputs)\n","\n","            errD_real = netD(inputsv)\n","            errD_real.backward(one)\n","\n","            # train with fake\n","            noise.resize_(opt_batchSize, nz, 1, 1).normal_(0, 1)\n","            noisev = Variable(noise, volatile = True) # totally freeze netG\n","            fake = Variable(netG(noisev).data)\n","            inputsv = fake\n","            errD_fake = netD(inputsv)\n","            errD_fake.backward(mone)\n","            errD = errD_real - errD_fake\n","            optimizerD.step()\n","\n","        ############################\n","        # (2) Update G network\n","        ###########################\n","        for p in netD.parameters():\n","            p.requires_grad = False # to avoid computation\n","        netG.zero_grad()\n","        # in case our last batch was the tail batch of the dataloader,\n","        # make sure we feed a full batch of noise\n","        noise.resize_(opt_batchSize, nz, 1, 1).normal_(0, 1)\n","        noisev = Variable(noise)\n","        fake = netG(noisev)\n","        errG = netD(fake)\n","        errG.backward(one)\n","        optimizerG.step()\n","        gen_iterations += 1\n","\n","        print('[%d/%d][%d/%d][%d] Loss_D: %f Loss_G: %f Loss_D_real: %f Loss_D_fake %f'\n","            % (epoch, opt_niter, i, len(dataloader), gen_iterations,\n","            errD.data[0], errG.data[0], errD_real.data[0], errD_fake.data[0]))\n","        if gen_iterations % 500 == 0:\n","            
real_cpu = real_cpu.mul(0.5).add(0.5)\n","            vutils.save_image(real_cpu, '{0}/real_samples.png'.format(opt_experiment))\n","            fake = netG(Variable(fixed_noise, volatile=True))\n","            fake.data = fake.data.mul(0.5).add(0.5)\n","            vutils.save_image(fake.data, '{0}/fake_samples_{1}.png'.format(opt_experiment, gen_iterations))\n","\n","    # do checkpointing\n","    torch.save(netG.state_dict(), '{0}/netG_epoch_{1}.pth'.format(opt_experiment, epoch))\n","    torch.save(netD.state_dict(), '{0}/netD_epoch_{1}.pth'.format(opt_experiment, epoch))"]},{"cell_type":"markdown","metadata":{"id":"NfPfYpS-T5DN"},"source":["## Generation\n","\n","This section contains the code that generates our images, which are used later on."]},{"cell_type":"markdown","metadata":{"id":"-B6fUl2zW_Ak"},"source":["### Run options"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"3r_w9O_zUERu"},"outputs":[],"source":["# Number of images to generate\n","opt_nimages = 100\n","# Path to output directory\n","opt_output_dir = 'data/generated'\n","# Path to generator weights .pth file\n","opt_weights = 'samples/netG_epoch_2384.pth'"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"1smXNRDUw-5k"},"outputs":[],"source":["opt_nimages_resp = input(\"How many images to generate?\\n{} by default\\n\".format(opt_nimages))\n","opt_output_dir_resp = input(\"Where to store generated images?\\n{} by default\\n\".format(opt_output_dir))\n","\n","# Number of images to generate\n","opt_nimages = int(opt_nimages_resp) if opt_nimages_resp != '' else opt_nimages\n","# Path to output directory\n","opt_output_dir = opt_output_dir_resp if opt_output_dir_resp != '' else opt_output_dir"]},{"cell_type":"markdown","metadata":{"id":"XWGZgsuyYDet"},"source":["### Model configuration"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"KLNgMfV_T7Ou"},"outputs":[],"source":["netG = DCGAN_G(opt_imageSize, nz, nc, ngf, ngpu)\n","\n","# load weights\n","if opt_cuda:\n","    
netG.load_state_dict(torch.load(opt_weights, map_location=torch.device('cuda')))\n","else:\n","    netG.load_state_dict(torch.load(opt_weights, map_location=torch.device('cpu')))\n","\n","# initialize noise\n","fixed_noise = torch.FloatTensor(opt_nimages, nz, 1, 1).normal_(0, 1)\n","\n","if opt_cuda:\n","    netG.cuda()\n","    fixed_noise = fixed_noise.cuda()\n","\n","fake = netG(fixed_noise)\n","fake.data = fake.data.mul(0.5).add(0.5)\n","\n","if not os.path.exists(opt_output_dir):\n","    os.makedirs(opt_output_dir)"]},{"cell_type":"markdown","metadata":{"id":"rPU04cGsYFVz"},"source":["### Generating the images"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Jg82PIsjUH55"},"outputs":[],"source":["for i in range(opt_nimages):\n","    vutils.save_image(fake.data[i, ...].reshape((1, nc, opt_imageSize, opt_imageSize)), os.path.join(opt_output_dir, \"generated_%02d.png\"%i))"]},{"cell_type":"markdown","metadata":{"id":"ePeCi4PZJ5Ar"},"source":["## Results\n","Training was run on a GPU-equipped machine accessed remotely over SSH through a VPN.\n","\n","Here is the command used for training:\n","```shell\n","python main.py --dataset folder --dataroot data/faces --batchSize 2048 --niter 5000 --ngf 32 --ndf 32 --imageSize 32 --cuda\n","```\n","\n","and for generation:\n","```shell\n","python generate.py --config samples/generator_config.json --weight samples/netG_epoch_2384.pth --output_dir data/generated --nimages 100 --cuda\n","```\n","\n","The Python files are those from the original article. 
These are the files that are reused, in greatly simplified form, in this Jupyter Notebook.\n","\n","The weights and configuration files can be downloaded from:\n","https://code.paul-corbalan.com/paul-corbalan/wasserstein-gan/src/branch/master/samples\n","\n","The verbose output of the run, `out`, is also available."]},{"cell_type":"markdown","metadata":{"id":"xme7UNdR678K"},"source":[""]},{"cell_type":"markdown","metadata":{"id":"3PQ8dgt1OLyT"},"source":["Since we forgot to store the losses during training, this script was written to recover them from the log file."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"uLZ1Egi8OEAo"},"outputs":[],"source":["data = open('out', 'r').read()\n","\n","pattern = re.compile(r\"\\[(\\d+)/(\\d+)\\]\\[(\\d+)/(\\d+)\\]\\[(\\d+)\\] Loss_D: ([-+]?\\d*\\.\\d+|\\d+) Loss_G: ([-+]?\\d*\\.\\d+|\\d+) Loss_D_real: ([-+]?\\d*\\.\\d+|\\d+) Loss_D_fake ([-+]?\\d*\\.\\d+|\\d+)\")\n","\n","matches = pattern.findall(data)\n","\n","df = pd.DataFrame(matches, columns=['epoch', 'niter', 'i', 'dataloader_size', 'gen_iterations', 'Loss_D', 'Loss_G', 'Loss_D_real', 'Loss_D_fake'])\n","\n","df = df.apply(pd.to_numeric)\n","\n","plt.plot(df['gen_iterations'][::100], -df['Loss_D'][::100])\n","plt.xlabel('Generator Iterations')\n","plt.ylabel('Loss_D ~ Wasserstein Distance')\n","plt.title('Loss_D vs. Generator Iterations')\n","plt.show()"]},{"cell_type":"markdown","metadata":{"id":"en27GDwsO4J_"},"source":["We plot the evolution of the discriminator loss because, unlike in a classical GAN, it is interpretable. It is in fact an estimate of the Wasserstein distance between the generated distribution and the target distribution. 
The decreasing values are evidence that the model keeps learning and is not affected by the problems of classical GANs."]},{"cell_type":"markdown","metadata":{"id":"fdKb4NcvNzTz"},"source":["![evolution.png]()"]},{"cell_type":"markdown","metadata":{"id":"qLhnYB0uPpjt"},"source":["## Application\n","### Inverse problem: recovering the latent vector of an image\n","\n","Here is the optimization problem we want to solve:\n","\n","$$\\underset{z \\in \\mathbb{Z}}{\\text{argmin}}\\lVert G(z)-x_0 \\rVert_2^2$$\n","\n","Here is the code that solves this problem using PyTorch's automatic differentiation:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"sdc3spKfRDO3"},"outputs":[],"source":["x0 = inputsv[0][None, :,:,:]\n","\n","noise = torch.FloatTensor(1, nz, 1, 1).normal_(0,1)\n","noise.requires_grad = True\n","\n","\n","# Choose an optimizer, for example Adam\n","optimizer = optim.Adam([noise], lr=0.001)\n","\n","\n","for p in netD.parameters():\n","    p.requires_grad = False\n","for p in netG.parameters():\n","    p.requires_grad = False\n","\n","# Optimization loop\n","for iteration in range(100000):\n","    optimizer.zero_grad()\n","\n","    # Generate a sample from z\n","    generated_data = netG(noise)\n","\n","    # Compute the loss (squared L2 norm)\n","    loss = torch.norm(generated_data - x0)**2\n","\n","    # Backpropagation and optimization step\n","    loss.backward()\n","    optimizer.step()\n","\n","    print(f\"Iteration: {iteration}, Loss: {loss.item()}\")"]},{"cell_type":"markdown","metadata":{"id":"pUw0PYPNR5wo"},"source":["Here is an example:\n","\n","- Target image:"]},{"cell_type":"markdown","metadata":{"id":"Ic24cRnN7aus"},"source":[""]},{"cell_type":"markdown","metadata":{"id":"I-DndMBm7YJa"},"source":["- Recovered image:"]},{"cell_type":"markdown","metadata":{"id":"HXoaWAoV7hoj"},"source":[""]},{"cell_type":"markdown","metadata":{"id":"_tBHFalDSRcI"},"source":["Starting from a latent vector, we can 
explore the space spanned by the generator.\n","\n","---\n","\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"pS4lMtoBSgnZ"},"outputs":[],"source":["fixed_noise = noise.repeat(25, 1, 1, 1)\n","\n","n = int(fixed_noise.shape[0]**.5)\n","for i in range(n):\n","    for j in range(n):\n","        fixed_noise[i*n+j] = fixed_noise[0] + i*torch.eye(nz)[0][:, None, None] + j*torch.eye(nz)[1][:, None, None]\n","\n","# loop variable named z so that it does not shadow n above\n","for z in fixed_noise:\n","    im = netG(z[None, :,: ,:])[0]\n","    # map the tanh output from [-1, 1] to [0, 1] for display\n","    plt.imshow(torch.permute(im.detach().mul(0.5).add(0.5), (1,2,0)))\n","    plt.show()"]},{"cell_type":"markdown","metadata":{"id":"nDMasgAOS8aa"},"source":[""]},{"cell_type":"markdown","metadata":{"id":"cgRBSaWRTAtT"},"source":["From the properties stated above, we know that the Wasserstein distance is Lipschitz on the space spanned by the generator. Consequently, we observe that every image is a Wasserstein barycenter of the others."]},{"cell_type":"markdown","metadata":{"id":"cyp-fhG03bS3"},"source":["## Conclusion\n","\n","Our exploration of Wasserstein GANs in this project familiarized us with key concepts such as the Generative Adversarial Network (GAN), the Wasserstein distance, and advanced data-processing techniques. It gave us a deeper understanding of the mechanisms underlying GANs and of their potential in a variety of applications.\n","\n","Looking ahead, we identify several directions for improvement and further research. Among them, comparing Wasserstein GANs with other GAN variants that use different cost functions is a promising avenue. 
In addition, tuning the model's hyperparameters, such as the learning rate, the clipping range, and the number of discriminator iterations, could lead to significant gains in the performance and efficiency of GANs."]}],"metadata":{"colab":{"provenance":[],"toc_visible":true},"kernelspec":{"display_name":"Python 3 (ipykernel)","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.11.7"}},"nbformat":4,"nbformat_minor":0}