Shap-E 3D Generation

The week before last, OpenAI open-sourced its Shap-E 3D generator. The paper is available here: https://arxiv.org/pdf/2305.02463.pdf

Today, let's try it out in Colab:

Installation

!git clone https://github.com/openai/shap-e.git
%cd shap-e
!pip install -e .
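
Before moving on, it can help to confirm that the editable install actually registered the package. The following is a minimal sketch and not part of the official instructions: the helper name `is_installed` is hypothetical, and it relies only on `importlib.util.find_spec` from the standard library.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the import system can locate `package_name`."""
    return importlib.util.find_spec(package_name) is not None


# `shap_e` is the module name that the scripts in this post import from.
if is_installed("shap_e"):
    print("shap_e is importable")
else:
    print("shap_e not found; try re-running the install cell")
```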

Text-3D

First, let's generate a small car and see how it turns out.

import torch
from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import create_pan_cameras, decode_latent_images, gif_widget

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
xm = load_model('transmitter', device=device)
model = load_model('text300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

batch_size = 4
guidance_scale = 15.0
prompt = "a car"

latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=guidance_scale,
    model_kwargs=dict(texts=[prompt] * batch_size),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

render_mode = 'nerf'  # you can change this to 'stf'
size = 64  # this is the size of the renders; higher values take longer to render.

cameras = create_pan_cameras(size, device)
for i, latent in enumerate(latents):
    images = decode_latent_images(xm, latent, cameras, rendering_mode=render_mode)
    display(gif_widget(images))

Of course, you can generate other objects too. See the officially provided samples for ideas:

https://github.com/openai/shap-e/blob/main/samples.md
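
To try several prompts in one run, one option is to build the `texts` list so that each batch element gets its own prompt. This is only a sketch: `build_text_batch` is a hypothetical helper, "a chair" is just an example prompt, and it assumes the text-conditional model accepts different strings within one batch (the script above only ever repeats a single prompt).

```python
def build_text_batch(prompts, samples_per_prompt=1):
    """Repeat each prompt so every batch element has its own text.

    Returns the model_kwargs dict and the matching batch size.
    """
    if samples_per_prompt < 1:
        raise ValueError("samples_per_prompt must be at least 1")
    texts = [p for p in prompts for _ in range(samples_per_prompt)]
    return dict(texts=texts), len(texts)


model_kwargs, batch_size = build_text_batch(["a car", "a chair"], samples_per_prompt=2)
print(batch_size, model_kwargs["texts"])
```

The returned values would then replace `batch_size` and `model_kwargs` in the `sample_latents` call, keeping the length of `texts` equal to the batch size.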

The generated 3D models can be saved in PLY format and downloaded. Run this script:

from shap_e.util.notebooks import decode_latent_mesh

for i, latent in enumerate(latents):
    with open(f'example_mesh_{i}.ply', 'wb') as f:
        decode_latent_mesh(xm, latent).tri_mesh().write_ply(f)

The PLY files are then saved in the shap-e folder. Right-click a file and choose download to save it locally.
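
Before downloading, a quick sanity check is to read each file's PLY header and look at the vertex and face counts. The sketch below uses only the standard library; `read_ply_header` is a hypothetical helper, and it assumes the exported files follow the standard PLY header layout (a `ply` magic line, `format` and `element` lines, then `end_header`).

```python
import glob


def read_ply_header(path):
    """Return (format string, {element name: count}) from a PLY header."""
    fmt = None
    counts = {}
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError(f"{path} does not start with the PLY magic line")
        for raw in f:
            line = raw.strip().decode("ascii", errors="replace")
            if line == "end_header":
                break  # stop before the (possibly binary) body
            parts = line.split()
            if not parts:
                continue
            if parts[0] == "format":
                fmt = " ".join(parts[1:])
            elif parts[0] == "element" and len(parts) == 3:
                counts[parts[1]] = int(parts[2])
        else:
            raise ValueError(f"{path} has no end_header line")
    return fmt, counts


# Inspect whatever meshes the export loop wrote to the current folder.
for path in sorted(glob.glob("example_mesh_*.ply")):
    print(path, *read_ply_header(path))
```

A mesh with zero vertices or faces would suggest that the export step did not work as intended.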

Image-3D

If you have an image, you can use Shap-E to convert it into a 3D model.

Upload the image to the shap-e folder.
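
Since the image is loaded by a relative path ("car.png" in this post), it is worth confirming that the upload landed in the current working directory. A minimal sketch using only the standard library follows; `find_uploaded_images` and its extension list are hypothetical choices for illustration.

```python
from pathlib import Path


def find_uploaded_images(folder=".", extensions=(".png", ".jpg", ".jpeg")):
    """List image file names in `folder`, sorted by name."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )


# Lists images in the current directory; "car.png" should appear here.
print(find_uploaded_images())
```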

Run the script:

import torch

from shap_e.diffusion.sample import sample_latents
from shap_e.diffusion.gaussian_diffusion import diffusion_from_config
from shap_e.models.download import load_model, load_config
from shap_e.util.notebooks import create_pan_cameras, decode_latent_images, gif_widget
from shap_e.util.image_util import load_image

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

xm = load_model('transmitter', device=device)
model = load_model('image300M', device=device)
diffusion = diffusion_from_config(load_config('diffusion'))

batch_size = 4
guidance_scale = 3.0

image = load_image("car.png")

latents = sample_latents(
    batch_size=batch_size,
    model=model,
    diffusion=diffusion,
    guidance_scale=guidance_scale,
    model_kwargs=dict(images=[image] * batch_size),
    progress=True,
    clip_denoised=True,
    use_fp16=True,
    use_karras=True,
    karras_steps=64,
    sigma_min=1e-3,
    sigma_max=160,
    s_churn=0,
)

render_mode = 'nerf' # you can change this to 'stf' for mesh rendering
size = 64 # this is the size of the renders; higher values take longer to render.

cameras = create_pan_cameras(size, device)
for i, latent in enumerate(latents):
    images = decode_latent_images(xm, latent, cameras, rendering_mode=render_mode)
    display(gif_widget(images))

This generates a 3D model based on the image.

The results still look a bit rough, but with the collective effort of the open-source community, this AI tool should gradually improve.

It feels like an era in which AI takes our jobs is on its way. So what should we humans be doing?