| <link rel="modulepreload" href="/docs/diffusers/pr_11452/ko/_app/immutable/chunks/getInferenceSnippets.58cd4b84.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{"title":"Diffusion 모델 평가하기","local":"evaluating-diffusion-models","sections":[{"title":"시나리오","local":"scenarios","sections":[],"depth":2},{"title":"정성적 평가","local":"qualitative-evaluation","sections":[],"depth":2},{"title":"정량적 평가","local":"quantitative-evaluation","sections":[{"title":"텍스트 안내 이미지 생성","local":"text-guided-image-generation","sections":[],"depth":3},{"title":"이미지 조건화된 텍스트-이미지 생성","local":"image-conditioned-text-to-image-generation","sections":[],"depth":3},{"title":"클래스 조건화 이미지 생성","local":"class-conditioned-image-generation","sections":[],"depth":3}],"depth":2}],"depth":1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="evaluating-diffusion-models" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#evaluating-diffusion-models"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Diffusion 모델 평가하기</span></h1> <a target="_blank" 
href="https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/evaluation.ipynb" data-svelte-h="svelte-1fn2wis"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> <p data-svelte-h="svelte-6xq5a1"><a href="https://huggingface.co/docs/diffusers/stable_diffusion" rel="nofollow">Stable Diffusion</a>와 같은 생성 모델의 평가는 주관적인 성격을 가지고 있습니다. 그러나 실무자와 연구자로서 우리는 종종 다양한 가능성 중에서 신중한 선택을 해야 합니다. 그래서 다양한 생성 모델 (GAN, Diffusion 등)을 사용할 때 어떻게 선택해야 할까요?</p> <p data-svelte-h="svelte-1xlftt0">정성적인 평가는 모델의 이미지 품질에 대한 주관적인 평가이므로 오류가 발생할 수 있고 결정에 잘못된 영향을 미칠 수 있습니다. 반면, 정량적인 평가는 이미지 품질과 직접적인 상관관계를 갖지 않을 수 있습니다. 따라서 일반적으로 정성적 평가와 정량적 평가를 모두 고려하는 것이 더 강력한 신호를 제공하여 모델 선택에 도움이 됩니다.</p> <p data-svelte-h="svelte-yspndj">이 문서에서는 Diffusion 모델을 평가하기 위한 정성적 및 정량적 방법에 대해 상세히 설명합니다. 정량적 방법에 대해서는 특히 <code>diffusers</code>와 함께 구현하는 방법에 초점을 맞추었습니다.</p> <p data-svelte-h="svelte-un0io1">이 문서에서 보여진 방법들은 기반 생성 모델을 고정시키고 다양한 <a href="https://huggingface.co/docs/diffusers/main/en/api/schedulers/overview" rel="nofollow">노이즈 스케줄러</a>를 평가하는 데에도 사용할 수 있습니다.</p> <h2 class="relative group"><a id="scenarios" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#scenarios"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 
11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>시나리오</span></h2> <p data-svelte-h="svelte-1p7jqma">다음과 같은 파이프라인을 사용하여 Diffusion 모델을 다룹니다:</p> <ul data-svelte-h="svelte-1l171m9"><li>텍스트로 안내된 이미지 생성 (예: <a href="https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/text2img" rel="nofollow"><code>StableDiffusionPipeline</code></a>).</li> <li>입력 이미지에 추가로 조건을 건 텍스트로 안내된 이미지 생성 (예: <a href="https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/img2img" rel="nofollow"><code>StableDiffusionImg2ImgPipeline</code></a> 및 <a href="https://huggingface.co/docs/diffusers/main/en/api/pipelines/pix2pix" rel="nofollow"><code>StableDiffusionInstructPix2PixPipeline</code></a>).</li> <li>클래스 조건화된 이미지 생성 모델 (예: <a href="https://huggingface.co/docs/diffusers/main/en/api/pipelines/dit" rel="nofollow"><code>DiTPipeline</code></a>).</li></ul> <h2 class="relative group"><a id="qualitative-evaluation" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#qualitative-evaluation"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>정성적 평가</span></h2> <p 
data-svelte-h="svelte-1w1uh1l">정성적 평가는 일반적으로 생성된 이미지의 인간 평가를 포함합니다. 품질은 구성성, 이미지-텍스트 일치, 공간 관계 등과 같은 측면에서 측정됩니다. 일반적인 프롬프트는 주관적인 지표에 대한 일정한 기준을 제공합니다.
DrawBench와 PartiPrompts는 정성적인 벤치마킹에 사용되는 프롬프트 데이터셋입니다. DrawBench와 PartiPrompts는 각각 <a href="https://imagen.research.google/" rel="nofollow">Imagen</a>과 <a href="https://parti.research.google/" rel="nofollow">Parti</a>에서 소개되었습니다.</p> <p data-svelte-h="svelte-1v3brnz"><a href="https://parti.research.google/" rel="nofollow">Parti 공식 웹사이트</a>에서 다음과 같이 설명하고 있습니다:</p> <blockquote data-svelte-h="svelte-1udk7r1"><p>PartiPrompts (P2)는 이 작업의 일부로 공개되는 영어로 된 1600개 이상의 다양한 프롬프트 세트입니다. P2는 다양한 범주와 도전 측면에서 모델의 능력을 측정하는 데 사용할 수 있습니다.</p></blockquote> <p data-svelte-h="svelte-19xz367"><img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/parti-prompts.png" alt="parti-prompts"></p> <p data-svelte-h="svelte-4fj1bh">PartiPrompts는 다음과 같은 열을 가지고 있습니다:</p> <ul data-svelte-h="svelte-6wioij"><li>프롬프트 (Prompt)</li> <li>프롬프트의 카테고리 (예: “Abstract”, “World Knowledge” 등)</li> <li>난이도를 반영한 챌린지 (예: “Basic”, “Complex”, “Writing & Symbols” 등)</li></ul> <p data-svelte-h="svelte-3d1wrj">이러한 벤치마크는 서로 다른 이미지 생성 모델을 인간 평가로 비교할 수 있도록 합니다.</p> <p data-svelte-h="svelte-vnamve">이를 위해 🧨 Diffusers 팀은 <strong>Open Parti Prompts</strong>를 구축했습니다. 이는 Parti Prompts를 기반으로 한 커뮤니티 기반의 정성적 벤치마크로, 최첨단 오픈 소스 확산 모델을 비교하는 데 사용됩니다:</p> <ul data-svelte-h="svelte-tidi6g"><li><a href="https://huggingface.co/spaces/OpenGenAI/open-parti-prompts" rel="nofollow">Open Parti Prompts 게임</a>: 10개의 parti prompt에 대해 4개의 생성된 이미지가 제시되며, 사용자는 프롬프트에 가장 적합한 이미지를 선택합니다.</li> <li><a href="https://huggingface.co/spaces/OpenGenAI/parti-prompts-leaderboard" rel="nofollow">Open Parti Prompts 리더보드</a>: 현재 최고의 오픈 소스 diffusion 모델들을 서로 비교하는 리더보드입니다.</li></ul> <p data-svelte-h="svelte-1tvbdp7">이미지를 수동으로 비교하려면, <code>diffusers</code>를 사용하여 몇 가지 PartiPrompts를 어떻게 활용할 수 있는지 알아봅시다.</p> <p data-svelte-h="svelte-sgc14z">다음은 몇 가지 다른 도전에서 샘플링한 프롬프트를 보여줍니다: Basic, Complex, Linguistic Structures, Imagination, Writing & Symbols. 
여기서는 PartiPrompts를 <a href="https://huggingface.co/datasets/nateraw/parti-prompts" rel="nofollow">데이터셋</a>으로 사용합니다.</p> <div class="code-block relative "> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> load_dataset
<span class="hljs-comment"># prompts = load_dataset("nateraw/parti-prompts", split="train")</span>
<span class="hljs-comment"># prompts = prompts.shuffle()</span>
<span class="hljs-comment"># sample_prompts = [prompts[i]["Prompt"] for i in range(5)]</span>

<span class="hljs-comment"># Fixing these sample prompts in the interest of reproducibility.</span>
sample_prompts = [
    <span class="hljs-string">"a corgi"</span>,
    <span class="hljs-string">"a hot air balloon with a yin-yang symbol, with the moon visible in the daytime sky"</span>,
    <span class="hljs-string">"a car with no windows"</span>,
    <span class="hljs-string">"a cube made of porcupine"</span>,
    <span class="hljs-string">'The saying "BE EXCELLENT TO EACH OTHER" written on a red brick wall with a graffiti image of a green alien wearing a tuxedo. A yellow fire hydrant is on a sidewalk in the foreground.'</span>,
]<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-ucg1d9">이제 이 프롬프트들로 Stable Diffusion (<a href="https://huggingface.co/CompVis/stable-diffusion-v1-4" rel="nofollow">v1-4 checkpoint</a>)을 사용해 이미지를 생성할 수 있습니다:</p> <div class="code-block relative "> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">import</span> torch
seed = <span class="hljs-number">0</span>
generator = torch.manual_seed(seed)
images = sd_pipeline(sample_prompts, num_images_per_prompt=<span class="hljs-number">1</span>, generator=generator).images<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-4i7yd5"><img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/parti-prompts-14.png" alt="parti-prompts-14"></p> <p data-svelte-h="svelte-px9uzg"><code>num_images_per_prompt</code>를 설정하여 동일한 프롬프트에 대해 다른 이미지를 비교할 수도 있습니다. 다른 체크포인트(<a href="https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5" rel="nofollow">v1-5</a>)로 동일한 파이프라인을 실행하면 다음과 같은 결과가 나옵니다:</p> <p data-svelte-h="svelte-gipltn"><img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/parti-prompts-15.png" alt="parti-prompts-15"></p> <p data-svelte-h="svelte-j6hquq">다양한 모델로 모든 프롬프트에 대해 여러 이미지를 생성하고 나면, 이 결과물들은 (평가 과정에서) 사람 평가자들에게 제시되어 점수가 매겨집니다. DrawBench와 PartiPrompts 벤치마크에 대한 자세한 내용은 각각의 논문을 참조하십시오.</p> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-1fh9g2r">모델이 훈련 중일 때 추론 샘플을 살펴보는 것은 훈련 진행 상황을 측정하는 데 유용합니다. 
<a href="https://github.com/huggingface/diffusers/tree/main/examples/" rel="nofollow">훈련 스크립트</a>에서는 TensorBoard와 Weights & Biases에 대한 추가 지원과 함께 이 유틸리티를 지원합니다.</p></div> <h2 class="relative group"><a id="quantitative-evaluation" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#quantitative-evaluation"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>정량적 평가</span></h2> <p data-svelte-h="svelte-fcrw06">이 섹션에서는 세 가지 다른 확산 파이프라인을 평가하는 방법을 안내합니다:</p> <ul data-svelte-h="svelte-148168f"><li>CLIP 점수</li> <li>CLIP 방향성 유사도</li> <li>FID</li></ul> <h3 class="relative group"><a id="text-guided-image-generation" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#text-guided-image-generation"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 
0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>텍스트 안내 이미지 생성</span></h3> <p data-svelte-h="svelte-1p20uhi"><a href="https://huggingface.co/papers/2104.08718" rel="nofollow">CLIP 점수</a>는 이미지-캡션 쌍의 호환성을 측정합니다. 높은 CLIP 점수는 높은 호환성🔼을 나타냅니다. CLIP 점수는 이미지와 캡션 사이의 의미적 유사성으로 생각할 수도 있습니다. CLIP 점수는 인간 판단과 높은 상관관계를 가지고 있습니다.</p> <p data-svelte-h="svelte-xk9h88"><code>StableDiffusionPipeline</code>을 일단 로드해봅시다:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- 
HTML_TAG_START --><span class="hljs-keyword">from</span> diffusers <span class="hljs-keyword">import</span> StableDiffusionPipeline
<span class="hljs-keyword">import</span> torch

model_ckpt = <span class="hljs-string">"CompVis/stable-diffusion-v1-4"</span>
sd_pipeline = StableDiffusionPipeline.from_pretrained(model_ckpt, torch_dtype=torch.float16).to(<span class="hljs-string">"cuda"</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1sbnm0m">여러 개의 프롬프트를 사용하여 이미지를 생성합니다:</p> <div class="code-block relative "> <pre class=""><!-- HTML_TAG_START -->prompts = [
    <span class="hljs-string">"a photo of an astronaut riding a horse on mars"</span>,
    <span class="hljs-string">"A high tech solarpunk utopia in the Amazon rainforest"</span>,
    <span class="hljs-string">"A pikachu fine dining with a view to the Eiffel Tower"</span>,
    <span class="hljs-string">"A mecha robot in a favela in expressionist style"</span>,
    <span class="hljs-string">"an insect robot preparing a delicious meal"</span>,
    <span class="hljs-string">"A small cabin on top of a snowy mountain in the style of Disney, artstation"</span>,
]

images = sd_pipeline(prompts, num_images_per_prompt=<span class="hljs-number">1</span>, output_type=<span class="hljs-string">"np"</span>).images
<span class="hljs-built_in">print</span>(images.shape)
<span class="hljs-comment"># (6, 512, 512, 3)</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1guznfb">그러고 나서 CLIP 점수를 계산합니다.</p> <div class="code-block relative "> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> torchmetrics.functional.multimodal <span class="hljs-keyword">import</span> clip_score
<span class="hljs-keyword">from</span> functools <span class="hljs-keyword">import</span> partial

clip_score_fn = partial(clip_score, model_name_or_path=<span class="hljs-string">"openai/clip-vit-base-patch16"</span>)

<span class="hljs-keyword">def</span> <span class="hljs-title function_">calculate_clip_score</span>(<span class="hljs-params">images, prompts</span>):
    images_int = (images * <span class="hljs-number">255</span>).astype(<span class="hljs-string">"uint8"</span>)
    clip_score = clip_score_fn(torch.from_numpy(images_int).permute(<span class="hljs-number">0</span>, <span class="hljs-number">3</span>, <span class="hljs-number">1</span>, <span class="hljs-number">2</span>), prompts).detach()
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">round</span>(<span class="hljs-built_in">float</span>(clip_score), <span class="hljs-number">4</span>)
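프롬프트 당 여러 이미지를 생성한 경우(<code>num_images_per_prompt &gt; 1</code>)에는 프롬프트별 평균 점수를 사용해야 하는데, 그 집계 부분만 따로 떼어 보면 예를 들어 다음과 같이 스케치할 수 있습니다 (`mean_clip_score_per_prompt`는 이 문서에 없는, 설명을 위한 가상의 헬퍼입니다):

```python
import numpy as np

def mean_clip_score_per_prompt(per_image_scores, num_images_per_prompt):
    # per_image_scores: 프롬프트 순서대로 평탄화된 이미지별 CLIP 점수
    # (길이 = len(prompts) * num_images_per_prompt 라고 가정합니다)
    scores = np.asarray(per_image_scores, dtype=np.float64)
    # (프롬프트 수, 프롬프트 당 이미지 수)로 재배열한 뒤 프롬프트별로 평균
    return scores.reshape(-1, num_images_per_prompt).mean(axis=1)
```

예를 들어 프롬프트 2개 × 이미지 2장의 점수가 `[30, 40, 10, 20]`이라면 프롬프트별 평균은 `[35.0, 15.0]`이 됩니다.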
sd_clip_score = calculate_clip_score(images, prompts)
<span class="hljs-built_in">print</span>(<span class="hljs-string">f"CLIP score: <span class="hljs-subst">{sd_clip_score}</span>"</span>)
<span class="hljs-comment"># CLIP score: 35.7038</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-yopa5z">위의 예제에서는 각 프롬프트 당 하나의 이미지를 생성했습니다. 만약 프롬프트 당 여러 이미지를 생성한다면, 프롬프트 당 생성된 이미지의 평균 점수를 사용해야 합니다.</p> <p data-svelte-h="svelte-1ytdcd">이제 <code>StableDiffusionPipeline</code>과 호환되는 두 개의 체크포인트를 비교하려면, 파이프라인을 호출할 때 generator를 전달해야 합니다. 먼저, 고정된 시드로 <a href="https://huggingface.co/CompVis/stable-diffusion-v1-4" rel="nofollow">v1-4 Stable Diffusion 체크포인트</a>를 사용하여 이미지를 생성합니다:</p> <div class="code-block relative "> <pre class=""><!-- HTML_TAG_START -->seed = <span class="hljs-number">0</span>
generator = torch.manual_seed(seed)
images = sd_pipeline(prompts, num_images_per_prompt=<span class="hljs-number">1</span>, generator=generator, output_type=<span class="hljs-string">"np"</span>).images<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-gge1c8">그런 다음 <a href="https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5" rel="nofollow">v1-5 checkpoint</a>를 로드하여 이미지를 생성합니다:</p> <div class="code-block relative "> <pre class=""><!-- HTML_TAG_START -->model_ckpt_1_5 = <span class="hljs-string">"stable-diffusion-v1-5/stable-diffusion-v1-5"</span>
sd_pipeline_1_5 = StableDiffusionPipeline.from_pretrained(model_ckpt_1_5, torch_dtype=torch.float16).to(<span class="hljs-string">"cuda"</span>)
generator = torch.manual_seed(seed)  <span class="hljs-comment"># 공정한 비교를 위해 동일한 시드로 generator를 다시 초기화합니다</span>
images_1_5 = sd_pipeline_1_5(prompts, num_images_per_prompt=<span class="hljs-number">1</span>, generator=generator, output_type=<span class="hljs-string">"np"</span>).images<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-umer2m">그리고 마지막으로 CLIP 점수를 비교합니다:</p> <div class="code-block relative "> <pre class=""><!-- HTML_TAG_START -->sd_clip_score_1_4 = calculate_clip_score(images, prompts)
<span class="hljs-built_in">print</span>(<span class="hljs-string">f"CLIP Score with v-1-4: <span class="hljs-subst">{sd_clip_score_1_4}</span>"</span>)
<span class="hljs-comment"># CLIP Score with v-1-4: 34.9102</span>

sd_clip_score_1_5 = calculate_clip_score(images_1_5, prompts)
<span class="hljs-built_in">print</span>(<span class="hljs-string">f"CLIP Score with v-1-5: <span class="hljs-subst">{sd_clip_score_1_5}</span>"</span>)
| <span class="hljs-comment"># CLIP Score with v-1-5: 36.2137</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-skbucg"><a href="https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5" rel="nofollow">v1-5</a> 체크포인트가 이전 버전보다 더 나은 성능을 보이는 것 같습니다. 그러나 CLIP 점수를 계산하기 위해 사용한 프롬프트의 수가 상당히 적습니다. 보다 실용적인 평가를 위해서는 이 수를 훨씬 높게 설정하고, 프롬프트를 다양하게 사용해야 합니다.</p> <div class="course-tip course-tip-orange bg-gradient-to-br dark:bg-gradient-to-r before:border-orange-500 dark:before:border-orange-800 from-orange-50 dark:from-gray-900 to-white dark:to-gray-950 border border-orange-50 text-orange-700 dark:text-gray-400"><p data-svelte-h="svelte-1dj5ge3">이 점수에는 몇 가지 제한 사항이 있습니다. 훈련 데이터셋의 캡션은 웹에서 크롤링되어 이미지와 관련된 <code>alt</code> 및 유사한 태그에서 추출되었습니다. 이들은 인간이 이미지를 설명하는 데 사용할 수 있는 것과 일치하지 않을 수 있습니다. 따라서 여기서는 몇 가지 프롬프트를 “엔지니어링”해야 했습니다.</p></div> <h3 class="relative group"><a id="image-conditioned-text-to-image-generation" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#image-conditioned-text-to-image-generation"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>이미지 조건화된 텍스트-이미지 생성</span></h3> <p data-svelte-h="svelte-1rd9xlo">이 경우, 
생성 파이프라인을 입력 이미지와 텍스트 프롬프트로 조건화합니다. <code>StableDiffusionInstructPix2PixPipeline</code>을 예로 들어보겠습니다. 이 파이프라인은 편집 지시문을 입력 프롬프트로, 편집할 이미지를 입력 이미지로 받습니다.</p> <p data-svelte-h="svelte-17dg0d1">다음은 하나의 예시입니다:</p> <p data-svelte-h="svelte-tnn31f"><img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/edit-instruction.png" alt="edit-instruction"></p> <p data-svelte-h="svelte-53z69p">모델을 평가하는 한 가지 전략은 (<a href="https://huggingface.co/docs/transformers/model_doc/clip" rel="nofollow">CLIP</a> 공간에서) 두 이미지 사이의 변화가 두 이미지 캡션 간의 변화와 얼마나 일관되는지를 측정하는 것입니다 (<a href="https://huggingface.co/papers/2108.00946" rel="nofollow">CLIP-Guided Domain Adaptation of Image Generators</a>에서 제시된 방식입니다). 이를 “<strong>CLIP 방향성 유사성</strong>”이라고 합니다.</p> <ul data-svelte-h="svelte-1pnh71f"><li>캡션 1은 편집할 이미지 (이미지 1)에 해당합니다.</li> <li>캡션 2는 편집된 이미지 (이미지 2)에 해당합니다. 편집 지시를 반영해야 합니다.</li></ul> <p data-svelte-h="svelte-1ubfnyv">다음은 그림으로 된 개요입니다:</p> <p data-svelte-h="svelte-fs1abj"><img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/edit-consistency.png" alt="edit-consistency"></p> <p data-svelte-h="svelte-ixcjgc">우리는 이 측정 항목을 구현하기 위해 미니 데이터 세트를 준비했습니다.
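이 아이디어를 코드 수준에서 먼저 보면, 방향성 유사성은 이미지 임베딩의 변화 벡터와 캡션 임베딩의 변화 벡터 사이의 코사인 유사도입니다. 아래는 실제 CLIP 임베딩 대신 임의의 벡터를 사용한 간단한 스케치이며, 함수 이름은 설명을 위해 임의로 정한 것입니다:

```python
import numpy as np

def directional_similarity(img_feat_one, img_feat_two, text_feat_one, text_feat_two):
    # 이미지 임베딩의 변화 방향과 캡션 임베딩의 변화 방향을 구합니다.
    img_direction = img_feat_two - img_feat_one
    text_direction = text_feat_two - text_feat_one
    # 두 변화 방향 사이의 코사인 유사도가 CLIP 방향성 유사성입니다.
    return float(
        np.dot(img_direction, text_direction)
        / (np.linalg.norm(img_direction) * np.linalg.norm(text_direction))
    )

# 실제로는 CLIP 이미지/텍스트 인코더가 출력한 정규화된 임베딩을 사용합니다.
rng = np.random.default_rng(0)
feats = [rng.standard_normal(768) for _ in range(4)]
feats = [f / np.linalg.norm(f) for f in feats]
score = directional_similarity(*feats)
print(score)  # -1과 1 사이의 값
```

뒤에서는 실제 CLIP 인코더를 로드하여 같은 계산을 수행하는 전체 구현을 살펴봅니다.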
먼저 데이터 세트를 로드해 보겠습니다.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> load_dataset | |
| dataset = load_dataset(<span class="hljs-string">"sayakpaul/instructpix2pix-demo"</span>, split=<span class="hljs-string">"train"</span>) | |
| dataset.features<!-- HTML_TAG_END --></pre></div> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->{<span class="hljs-string">'input'</span>: Value(dtype=<span class="hljs-string">'string'</span>, <span class="hljs-built_in">id</span>=None), | |
| <span class="hljs-string">'edit'</span>: Value(dtype=<span class="hljs-string">'string'</span>, <span class="hljs-built_in">id</span>=None), | |
| <span class="hljs-string">'output'</span>: Value(dtype=<span class="hljs-string">'string'</span>, <span class="hljs-built_in">id</span>=None), | |
| <span class="hljs-string">'image'</span>: Image(decode=True, <span class="hljs-built_in">id</span>=None)}<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-x2kg70">여기에는 다음과 같은 항목이 있습니다:</p> <ul data-svelte-h="svelte-1uwuaaj"><li><code>input</code>은 <code>image</code>에 해당하는 캡션입니다.</li> <li><code>edit</code>은 편집 지시사항을 나타냅니다.</li> <li><code>output</code>은 <code>edit</code> 지시사항을 반영한 수정된 캡션입니다.</li></ul> <p data-svelte-h="svelte-1idfvnt">샘플을 살펴보겠습니다.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->idx = <span class="hljs-number">0</span> | |
| <span class="hljs-built_in">print</span>(<span class="hljs-string">f"Original caption: <span class="hljs-subst">{dataset[idx][<span class="hljs-string">'input'</span>]}</span>"</span>) | |
| <span class="hljs-built_in">print</span>(<span class="hljs-string">f"Edit instruction: <span class="hljs-subst">{dataset[idx][<span class="hljs-string">'edit'</span>]}</span>"</span>) | |
| <span class="hljs-built_in">print</span>(<span class="hljs-string">f"Modified caption: <span class="hljs-subst">{dataset[idx][<span class="hljs-string">'output'</span>]}</span>"</span>)<!-- HTML_TAG_END --></pre></div> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->Original caption: 2. FAROE ISLANDS: An archipelago of 18 mountainous isles <span class="hljs-keyword">in</span> the North Atlantic Ocean between Norway and Iceland, the Faroe Islands has <span class="hljs-string">'everything you could hope for'</span>, according to Big 7 Travel. It boasts <span class="hljs-string">'crystal clear waterfalls, rocky cliffs that seem to jut out of nowhere and velvety green hills'</span> | |
| Edit instruction: make the isles all white marble | |
| Modified caption: 2. WHITE MARBLE ISLANDS: An archipelago of 18 mountainous white marble isles <span class="hljs-keyword">in</span> the North Atlantic Ocean between Norway and Iceland, the White Marble Islands has <span class="hljs-string">'everything you could hope for'</span>, according to Big 7 Travel. It boasts <span class="hljs-string">'crystal clear waterfalls, rocky cliffs that seem to jut out of nowhere and velvety green hills'</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-skm5wr">다음은 이미지입니다:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->dataset[idx][<span class="hljs-string">"image"</span>]<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1r4pb3b"><img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/edit-dataset.png" 
alt="edit-dataset"></p> <p data-svelte-h="svelte-1udfdoz">먼저 편집 지시사항을 사용하여 데이터 세트의 이미지를 편집하고 방향 유사도를 계산합니다.</p> <p data-svelte-h="svelte-1inu8ac"><code>StableDiffusionInstructPix2PixPipeline</code>를 먼저 로드합니다:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> diffusers <span class="hljs-keyword">import</span> StableDiffusionInstructPix2PixPipeline | |
import torch

# 파이프라인을 올릴 device를 정의합니다 (앞 섹션에서 이미 정의했다면 생략 가능).
device = "cuda" if torch.cuda.is_available() else "cpu"
| instruct_pix2pix_pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained( | |
| <span class="hljs-string">"timbrooks/instruct-pix2pix"</span>, torch_dtype=torch.float16 | |
| ).to(device)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-b448d3">이제 편집을 수행합니다:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np | |
import torch

# 재현 가능한 결과를 위한 고정 시드 generator입니다 (앞 섹션의 generator에 해당).
generator = torch.manual_seed(0)
| <span class="hljs-keyword">def</span> <span class="hljs-title function_">edit_image</span>(<span class="hljs-params">input_image, instruction</span>): | |
| image = instruct_pix2pix_pipeline( | |
| instruction, | |
| image=input_image, | |
| output_type=<span class="hljs-string">"np"</span>, | |
| generator=generator, | |
| ).images[<span class="hljs-number">0</span>] | |
| <span class="hljs-keyword">return</span> image | |
| input_images = [] | |
| original_captions = [] | |
| modified_captions = [] | |
| edited_images = [] | |
| <span class="hljs-keyword">for</span> idx <span class="hljs-keyword">in</span> <span class="hljs-built_in">range</span>(<span class="hljs-built_in">len</span>(dataset)): | |
| input_image = dataset[idx][<span class="hljs-string">"image"</span>] | |
| edit_instruction = dataset[idx][<span class="hljs-string">"edit"</span>] | |
| edited_image = edit_image(input_image, edit_instruction) | |
| input_images.append(np.array(input_image)) | |
| original_captions.append(dataset[idx][<span class="hljs-string">"input"</span>]) | |
| modified_captions.append(dataset[idx][<span class="hljs-string">"output"</span>]) | |
| edited_images.append(edited_image)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-13ebdic">방향 유사도를 계산하기 위해서는 먼저 CLIP의 이미지와 텍스트 인코더를 로드합니다:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> ( | |
| CLIPTokenizer, | |
| CLIPTextModelWithProjection, | |
| CLIPVisionModelWithProjection, | |
| CLIPImageProcessor, | |
| ) | |
| clip_id = <span class="hljs-string">"openai/clip-vit-large-patch14"</span> | |
| tokenizer = CLIPTokenizer.from_pretrained(clip_id) | |
| text_encoder = CLIPTextModelWithProjection.from_pretrained(clip_id).to(device) | |
| image_processor = CLIPImageProcessor.from_pretrained(clip_id) | |
| image_encoder = CLIPVisionModelWithProjection.from_pretrained(clip_id).to(device)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-j4jmg5">주목할 점은 특정한 CLIP 체크포인트인 <code>openai/clip-vit-large-patch14</code>를 사용하고 있다는 것입니다. 이는 Stable Diffusion 사전 훈련이 이 CLIP 변형체와 함께 수행되었기 때문입니다. 자세한 내용은 <a href="https://huggingface.co/docs/transformers/model_doc/clip" rel="nofollow">문서</a>를 참조하세요.</p> <p data-svelte-h="svelte-i6vhnq">다음으로, 방향성 유사도를 계산하기 위해 PyTorch의 <code>nn.Module</code>을 준비합니다:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn | |
| <span class="hljs-keyword">import</span> torch.nn.functional <span class="hljs-keyword">as</span> F | |
| <span class="hljs-keyword">class</span> <span class="hljs-title class_">DirectionalSimilarity</span>(nn.Module): | |
| <span class="hljs-keyword">def</span> <span class="hljs-title function_">__init__</span>(<span class="hljs-params">self, tokenizer, text_encoder, image_processor, image_encoder</span>): | |
| <span class="hljs-built_in">super</span>().__init__() | |
| self.tokenizer = tokenizer | |
| self.text_encoder = text_encoder | |
| self.image_processor = image_processor | |
| self.image_encoder = image_encoder | |
| <span class="hljs-keyword">def</span> <span class="hljs-title function_">preprocess_image</span>(<span class="hljs-params">self, image</span>): | |
| image = self.image_processor(image, return_tensors=<span class="hljs-string">"pt"</span>)[<span class="hljs-string">"pixel_values"</span>] | |
| <span class="hljs-keyword">return</span> {<span class="hljs-string">"pixel_values"</span>: image.to(device)} | |
| <span class="hljs-keyword">def</span> <span class="hljs-title function_">tokenize_text</span>(<span class="hljs-params">self, text</span>): | |
| inputs = self.tokenizer( | |
| text, | |
| max_length=self.tokenizer.model_max_length, | |
| padding=<span class="hljs-string">"max_length"</span>, | |
| truncation=<span class="hljs-literal">True</span>, | |
| return_tensors=<span class="hljs-string">"pt"</span>, | |
| ) | |
| <span class="hljs-keyword">return</span> {<span class="hljs-string">"input_ids"</span>: inputs.input_ids.to(device)} | |
| <span class="hljs-keyword">def</span> <span class="hljs-title function_">encode_image</span>(<span class="hljs-params">self, image</span>): | |
| preprocessed_image = self.preprocess_image(image) | |
| image_features = self.image_encoder(**preprocessed_image).image_embeds | |
| image_features = image_features / image_features.norm(dim=<span class="hljs-number">1</span>, keepdim=<span class="hljs-literal">True</span>) | |
| <span class="hljs-keyword">return</span> image_features | |
| <span class="hljs-keyword">def</span> <span class="hljs-title function_">encode_text</span>(<span class="hljs-params">self, text</span>): | |
| tokenized_text = self.tokenize_text(text) | |
| text_features = self.text_encoder(**tokenized_text).text_embeds | |
| text_features = text_features / text_features.norm(dim=<span class="hljs-number">1</span>, keepdim=<span class="hljs-literal">True</span>) | |
| <span class="hljs-keyword">return</span> text_features | |
| <span class="hljs-keyword">def</span> <span class="hljs-title function_">compute_directional_similarity</span>(<span class="hljs-params">self, img_feat_one, img_feat_two, text_feat_one, text_feat_two</span>): | |
| sim_direction = F.cosine_similarity(img_feat_two - img_feat_one, text_feat_two - text_feat_one) | |
| <span class="hljs-keyword">return</span> sim_direction | |
| <span class="hljs-keyword">def</span> <span class="hljs-title function_">forward</span>(<span class="hljs-params">self, image_one, image_two, caption_one, caption_two</span>): | |
| img_feat_one = self.encode_image(image_one) | |
| img_feat_two = self.encode_image(image_two) | |
| text_feat_one = self.encode_text(caption_one) | |
| text_feat_two = self.encode_text(caption_two) | |
| directional_similarity = self.compute_directional_similarity( | |
| img_feat_one, img_feat_two, text_feat_one, text_feat_two | |
| ) | |
| <span class="hljs-keyword">return</span> directional_similarity<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-19tiepu">이제 <code>DirectionalSimilarity</code>를 사용해 보겠습니다.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->dir_similarity = DirectionalSimilarity(tokenizer, text_encoder, image_processor, image_encoder) | |
| scores = [] | |
| <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> <span class="hljs-built_in">range</span>(<span class="hljs-built_in">len</span>(input_images)): | |
| original_image = input_images[i] | |
| original_caption = original_captions[i] | |
| edited_image = edited_images[i] | |
| modified_caption = modified_captions[i] | |
| similarity_score = dir_similarity(original_image, edited_image, original_caption, modified_caption) | |
| scores.append(<span class="hljs-built_in">float</span>(similarity_score.detach().cpu())) | |
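참고로, 같은 임베딩을 사용해 원본 이미지와 편집된 이미지 자체의 유사성(의미가 얼마나 보존되었는지)도 확인할 수 있습니다. 아래는 이 아이디어를 임의의 가상 임베딩으로 옮긴 독립적인 스케치이며, 함수 이름은 설명을 위한 것입니다:

```python
import numpy as np

def semantic_preservation(img_feat_one, img_feat_two):
    # 편집 전후 이미지 임베딩 사이의 코사인 유사도:
    # 값이 높을수록 이미지의 주요 의미가 잘 보존된 것입니다.
    return float(
        np.dot(img_feat_one, img_feat_two)
        / (np.linalg.norm(img_feat_one) * np.linalg.norm(img_feat_two))
    )

# 가상의 임베딩 예시: 편집본이 원본과 거의 같은 방향이면 유사도가 1에 가깝습니다.
rng = np.random.default_rng(0)
original = rng.standard_normal(768)
edited = original + 0.1 * rng.standard_normal(768)
print(semantic_preservation(original, edited))
```

실제로는 위에서 정의한 <code>encode_image</code>가 반환하는 임베딩 쌍에 같은 계산을 적용하면 됩니다.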
| <span class="hljs-built_in">print</span>(<span class="hljs-string">f"CLIP directional similarity: <span class="hljs-subst">{np.mean(scores)}</span>"</span>) | |
| <span class="hljs-comment"># CLIP directional similarity: 0.0797976553440094</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1ff4mmy">CLIP 점수와 마찬가지로, CLIP 방향 유사성은 높을수록 좋습니다.</p> <p data-svelte-h="svelte-1hslery"><code>StableDiffusionInstructPix2PixPipeline</code>은 <code>image_guidance_scale</code>과 <code>guidance_scale</code>이라는 두 가지 인자를 제공합니다. 이 두 인자를 조정하여 최종 편집된 이미지의 품질을 제어할 수 있습니다. 이 두 인자를 바꿔 가며 실험해 보고, 방향 유사성에 미치는 영향을 확인해 보기를 권장합니다.</p> <p data-svelte-h="svelte-1nlq5nk">이 메트릭의 개념을 확장하면 원본 이미지와 편집된 버전이 얼마나 유사한지도 측정할 수 있습니다. 이를 위해 <code>F.cosine_similarity(img_feat_two, img_feat_one)</code>을 사용할 수 있습니다. 이러한 종류의 편집에서는 이미지의 주요 의미가 최대한 보존되어야 하므로, 높은 유사성 점수를 얻어야 합니다.</p> <p data-svelte-h="svelte-1gpzo3s"><a href="https://huggingface.co/docs/diffusers/main/en/api/pipelines/pix2pix_zero#diffusers.StableDiffusionPix2PixZeroPipeline" rel="nofollow"><code>StableDiffusionPix2PixZeroPipeline</code></a>와 같은 유사한 파이프라인에도 이러한 메트릭을 사용할 수 있습니다.</p> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-sk48eu">CLIP 점수와 CLIP 방향 유사성 모두 CLIP 모델에 의존하기 때문에 평가가 편향될 수 있습니다.</p></div> <p data-svelte-h="svelte-dg71ox"><strong><em>IS, FID (나중에 설명할 예정), 또는 KID와 같은 메트릭을 확장하는 것은 어려울 수 있습니다</em></strong>. 평가 중인 모델이 대규모 이미지 캡셔닝 데이터셋 (예: <a href="https://laion.ai/blog/laion-5b/" rel="nofollow">LAION-5B 데이터셋</a>)에서 사전 훈련되었을 때 이는 문제가 될 수 있습니다. 이러한 메트릭의 기반에는 중간 이미지 특징을 추출하기 위해 ImageNet-1k 데이터셋에서 사전 훈련된 InceptionNet이 사용되기 때문입니다. Stable Diffusion의 사전 훈련 데이터셋은 InceptionNet의 사전 훈련 데이터셋과 겹치는 부분이 제한적일 수 있으므로, 여기에는 좋은 후보가 아닙니다.</p> <p data-svelte-h="svelte-179vfay"><strong><em>위의 메트릭은 클래스 조건화 모델을 평가하는 데에도 사용할 수 있습니다. 예를 들어, <a href="https://huggingface.co/docs/diffusers/main/en/api/pipelines/dit" rel="nofollow">DiT</a>. 
이는 ImageNet-1k 클래스에 조건을 걸고 사전 훈련되었습니다.</em></strong></p> <h3 class="relative group"><a id="class-conditioned-image-generation" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#class-conditioned-image-generation"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>클래스 조건화 이미지 생성</span></h3> <p data-svelte-h="svelte-1qwesop">클래스 조건화 생성 모델은 일반적으로 <a href="https://huggingface.co/datasets/imagenet-1k" rel="nofollow">ImageNet-1k</a>와 같은 클래스 레이블이 지정된 데이터셋에서 사전 훈련됩니다. 이러한 모델을 평가하는 인기있는 지표에는 Fréchet Inception Distance (FID), Kernel Inception Distance (KID) 및 Inception Score (IS)가 있습니다. 이 문서에서는 FID (<a href="https://huggingface.co/papers/1706.08500" rel="nofollow">Heusel et al.</a>)에 초점을 맞추고 있습니다. <a href="https://huggingface.co/docs/diffusers/api/pipelines/dit" rel="nofollow"><code>DiTPipeline</code></a>을 사용하여 FID를 계산하는 방법을 보여줍니다. 이는 내부적으로 <a href="https://huggingface.co/papers/2212.09748" rel="nofollow">DiT 모델</a>을 사용합니다.</p> <p data-svelte-h="svelte-179n0sy">FID는 두 개의 이미지 데이터셋이 얼마나 유사한지를 측정하는 것을 목표로 합니다. 
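곧 인용할 정의처럼, FID의 핵심은 특징 분포에 적합시킨 두 가우시안 N(μ1, Σ1), N(μ2, Σ2) 사이의 Fréchet 거리 d² = ||μ1 − μ2||² + Tr(Σ1 + Σ2 − 2(Σ1Σ2)^(1/2))입니다. 아래는 임의의 가상 특징 배열 두 개로 이 계산을 스케치한 것입니다 (실제 FID는 InceptionNet 특징과 훨씬 많은 샘플을 사용합니다):

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_one, feats_two):
    # 각 특징 집합을 가우시안으로 근사: 평균과 공분산을 추정합니다.
    mu1, sigma1 = feats_one.mean(axis=0), np.cov(feats_one, rowvar=False)
    mu2, sigma2 = feats_two.mean(axis=0), np.cov(feats_two, rowvar=False)
    # 두 공분산 곱의 행렬 제곱근을 계산합니다 (수치 오차로 생긴 허수부는 버립니다).
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# 가상의 특징: "가짜" 특징의 평균을 이동시키면 거리가 커집니다.
rng = np.random.default_rng(0)
real_feats = rng.standard_normal((64, 8))
fake_feats = rng.standard_normal((64, 8)) + 1.0
print(frechet_distance(real_feats, fake_feats))  # 낮을수록 두 분포가 유사
```

동일한 특징 집합끼리 비교하면 거리는 0에 가깝습니다. 실제 평가에서는 이 계산을 직접 구현하기보다 검증된 구현(예: torchmetrics)을 사용하는 편이 안전합니다.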
<a href="https://mmgeneration.readthedocs.io/en/latest/quick_run.html#fid" rel="nofollow">이 자료</a>에 따르면:</p> <blockquote data-svelte-h="svelte-8chdsm"><p>Fréchet Inception Distance는 두 개의 이미지 데이터셋 간의 유사성을 측정하는 지표입니다. 시각적 품질에 대한 인간 판단과 잘 상관되는 것으로 나타났으며, 주로 생성적 적대 신경망의 샘플 품질을 평가하는 데 사용됩니다. FID는 Inception 네트워크의 특징 표현에 맞게 적합한 두 개의 가우시안 사이의 Fréchet 거리를 계산하여 구합니다.</p></blockquote> <p data-svelte-h="svelte-83slvb">이 두 개의 데이터셋은 실제 이미지 데이터셋과 가짜 이미지 데이터셋(우리의 경우 생성된 이미지)입니다. FID는 일반적으로 두 개의 큰 데이터셋으로 계산됩니다. 그러나 이 문서에서는 두 개의 미니 데이터셋으로 작업할 것입니다.</p> <p data-svelte-h="svelte-16ocjht">먼저 ImageNet-1k 훈련 세트에서 몇 개의 이미지를 다운로드해 봅시다:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> zipfile <span class="hljs-keyword">import</span> ZipFile | |
| <span class="hljs-keyword">import</span> requests | |
| <span class="hljs-keyword">def</span> <span class="hljs-title function_">download</span>(<span class="hljs-params">url, local_filepath</span>): | |
| r = requests.get(url) | |
| <span class="hljs-keyword">with</span> <span class="hljs-built_in">open</span>(local_filepath, <span class="hljs-string">"wb"</span>) <span class="hljs-keyword">as</span> f: | |
| f.write(r.content) | |
| <span class="hljs-keyword">return</span> local_filepath | |
| dummy_dataset_url = <span class="hljs-string">"https://hf.co/datasets/sayakpaul/sample-datasets/resolve/main/sample-imagenet-images.zip"</span> | |
| local_filepath = download(dummy_dataset_url, dummy_dataset_url.split(<span class="hljs-string">"/"</span>)[-<span class="hljs-number">1</span>]) | |
| <span class="hljs-keyword">with</span> ZipFile(local_filepath, <span class="hljs-string">"r"</span>) <span class="hljs-keyword">as</span> zipper: | |
| zipper.extractall(<span class="hljs-string">"."</span>)<!-- HTML_TAG_END --></pre></div> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image | |
| <span class="hljs-keyword">import</span> os | |
| dataset_path = <span class="hljs-string">"sample-imagenet-images"</span> | |
| image_paths = <span class="hljs-built_in">sorted</span>([os.path.join(dataset_path, x) <span class="hljs-keyword">for</span> x <span class="hljs-keyword">in</span> os.listdir(dataset_path)]) | |
| real_images = [np.array(Image.<span class="hljs-built_in">open</span>(path).convert(<span class="hljs-string">"RGB"</span>)) <span class="hljs-keyword">for</span> path <span class="hljs-keyword">in</span> image_paths]<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1fss7yu">Here are 10 images from the following ImageNet-1k classes: “cassette_player”, “chain_saw” (x2), “church”, “gas_pump” (x3), “parachute” (x2), and “tench”.</p> <p align="center" data-svelte-h="svelte-94lw7t"><img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/real-images.png" alt="real-images"><br> <em>Real images.</em></p> <p data-svelte-h="svelte-mqjsy3">Now that the images are loaded, let's apply some light preprocessing to them so we can use them for FID computation.</p> <div class="code-block relative "> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> torchvision.transforms <span class="hljs-keyword">import</span> functional <span class="hljs-keyword">as</span> F | |
| <span class="hljs-keyword">def</span> <span class="hljs-title function_">preprocess_image</span>(<span class="hljs-params">image</span>): | |
| image = torch.tensor(image).unsqueeze(<span class="hljs-number">0</span>) | |
| image = image.permute(<span class="hljs-number">0</span>, <span class="hljs-number">3</span>, <span class="hljs-number">1</span>, <span class="hljs-number">2</span>) / <span class="hljs-number">255.0</span> | |
| <span class="hljs-keyword">return</span> F.center_crop(image, (<span class="hljs-number">256</span>, <span class="hljs-number">256</span>)) | |
| real_images = torch.cat([preprocess_image(image) <span class="hljs-keyword">for</span> image <span class="hljs-keyword">in</span> real_images]) | |
| <span class="hljs-built_in">print</span>(real_images.shape) | |
| <span class="hljs-comment"># torch.Size([10, 3, 256, 256])</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1osf93j">Now let's load the <a href="https://huggingface.co/docs/diffusers/api/pipelines/dit" rel="nofollow"><code>DiTPipeline</code></a> to generate images conditioned on the classes mentioned above.</p> <div class="code-block relative "> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> diffusers <span class="hljs-keyword">import</span> DiTPipeline, DPMSolverMultistepScheduler | |
| dit_pipeline = DiTPipeline.from_pretrained(<span class="hljs-string">"facebook/DiT-XL-2-256"</span>, torch_dtype=torch.float16) | |
| dit_pipeline.scheduler = DPMSolverMultistepScheduler.from_config(dit_pipeline.scheduler.config) | |
| dit_pipeline = dit_pipeline.to(<span class="hljs-string">"cuda"</span>) | |
| words = [ | |
| <span class="hljs-string">"cassette player"</span>, | |
| <span class="hljs-string">"chainsaw"</span>, | |
| <span class="hljs-string">"chainsaw"</span>, | |
| <span class="hljs-string">"church"</span>, | |
| <span class="hljs-string">"gas pump"</span>, | |
| <span class="hljs-string">"gas pump"</span>, | |
| <span class="hljs-string">"gas pump"</span>, | |
| <span class="hljs-string">"parachute"</span>, | |
| <span class="hljs-string">"parachute"</span>, | |
| <span class="hljs-string">"tench"</span>, | |
| ] | |
| class_ids = dit_pipeline.get_label_ids(words) | |
| output = dit_pipeline(class_labels=class_ids, generator=generator, output_type=<span class="hljs-string">"np"</span>) | |
| fake_images = output.images | |
| fake_images = torch.tensor(fake_images) | |
| fake_images = fake_images.permute(<span class="hljs-number">0</span>, <span class="hljs-number">3</span>, <span class="hljs-number">1</span>, <span class="hljs-number">2</span>) | |
| <span class="hljs-built_in">print</span>(fake_images.shape) | |
| <span class="hljs-comment"># torch.Size([10, 3, 256, 256])</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1xh8e6v">Now we can compute the FID using <a href="https://torchmetrics.readthedocs.io/" rel="nofollow"><code>torchmetrics</code></a>.</p> <div class="code-block relative "> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> torchmetrics.image.fid <span class="hljs-keyword">import</span> FrechetInceptionDistance | |
| fid = FrechetInceptionDistance(normalize=<span class="hljs-literal">True</span>) | |
| fid.update(real_images, real=<span class="hljs-literal">True</span>) | |
| fid.update(fake_images, real=<span class="hljs-literal">False</span>) | |
| <span class="hljs-built_in">print</span>(<span class="hljs-string">f"FID: <span class="hljs-subst">{<span class="hljs-built_in">float</span>(fid.compute())}</span>"</span>) | |
| <span class="hljs-comment"># FID: 177.7147216796875</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-jfhad4">The lower the FID, the better. Several factors can influence the FID:</p> <ul data-svelte-h="svelte-1mjj4yp"><li>The number of images (both real and fake)</li> <li>Randomness in the diffusion process</li> <li>The number of inference steps in the diffusion process</li> <li>The scheduler used in the diffusion process</li></ul> <p data-svelte-h="svelte-1bjjf4r">For the last two, it is therefore good practice to run the evaluation across different seeds and numbers of inference steps, and then report an averaged result.</p> <div class="course-tip course-tip-orange bg-gradient-to-br dark:bg-gradient-to-r before:border-orange-500 dark:before:border-orange-800 from-orange-50 dark:from-gray-900 to-white dark:to-gray-950 border border-orange-50 text-orange-700 dark:text-gray-400"><p data-svelte-h="svelte-ed472z">FID results tend to be fragile, as they depend on many factors:</p> <ul data-svelte-h="svelte-5s9l4c"><li>The specific Inception model used during computation.</li> <li>The implementation accuracy of the computation.</li> <li>The image format (it is not the same if we start from PNGs vs JPGs).</li></ul> <p data-svelte-h="svelte-1fhj7dv">Keeping this in mind, FID is most useful when comparing similar runs, but it is hard to reproduce paper results unless the authors carefully disclose the FID measurement code.</p> <p data-svelte-h="svelte-16sun9v">These points also apply to other related metrics, such as KID and IS.</p></div> <p data-svelte-h="svelte-147ab8l">As a final step, let's visually inspect the <code>fake_images</code>.</p> <p align="center" data-svelte-h="svelte-16e5oh4"><img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/fake-images.png" alt="fake-images"><br> <em>Fake images.</em></p> | |
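As a reference for what the metric computes under the hood, here is a minimal NumPy-only sketch of the Fréchet distance between two Gaussians fitted to feature sets. It operates on pre-extracted features (in the real FID computation these would be Inception-v3 activations), and `frechet_distance` is a hypothetical helper name for illustration, not a diffusers or torchmetrics API:

```python
import numpy as np

def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between Gaussians fitted to two feature sets.

    Implements ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2*(C_r C_f)^(1/2)).
    `feats_*` are (num_samples, feature_dim) arrays; in the real FID
    computation they would be Inception-v3 activations.
    """
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    c_r = np.cov(feats_real, rowvar=False)
    c_f = np.cov(feats_fake, rowvar=False)
    diff = mu_r - mu_f
    # Tr((C_r C_f)^(1/2)) equals the sum of square roots of the eigenvalues
    # of C_r @ C_f, which are real and non-negative when both covariance
    # matrices are positive semi-definite.
    eigvals = np.linalg.eigvals(c_r @ c_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(c_r) + np.trace(c_f) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
feats = rng.normal(size=(512, 8))
print(frechet_distance(feats, feats))        # ~0.0 (identical feature sets)
print(frechet_distance(feats, feats + 1.0))  # ~8.0 (mean shifted by 1 in 8 dims)
```

This also makes the tip above concrete: small implementation choices (how the matrix square root is taken, numerical clipping, the feature extractor) shift the final number, which is why FID values are only comparable across runs that share the same measurement code.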
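To make the advice about averaging over seeds concrete, here is a small sketch of such a loop. The helper name `averaged_metric` and its callable argument are illustrative, not part of diffusers; the callable would internally seed generation (e.g. via `torch.manual_seed`), run the pipeline, and return a scalar metric such as FID:

```python
import statistics

def averaged_metric(evaluate_fn, seeds=(0, 1, 2)):
    """Run an evaluation callable once per seed; report mean and stdev.

    `evaluate_fn(seed)` is assumed to generate images deterministically
    from `seed` and return a scalar metric such as FID.
    """
    scores = [evaluate_fn(seed) for seed in seeds]
    return statistics.mean(scores), statistics.stdev(scores)

# Stand-in metric function for illustration only:
mean_fid, std_fid = averaged_metric(lambda seed: 170.0 + seed)
print(mean_fid, std_fid)  # 171.0 1.0
```

Reporting the standard deviation alongside the mean makes it visible how much of a difference between two runs is explained by sampling randomness rather than model quality.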