Text-to-3D with OpenAI Shap-E
Shap-E is OpenAI’s open-source model that generates 3D objects conditioned on text or images. It outputs implicit representations that you can export to standard mesh formats like OBJ and PLY. The fastest way to use it is through the Hugging Face diffusers library, which wraps the model into a familiar pipeline interface.
This produces a PLY file with per-vertex colors. Converting it to OBJ won't make the mesh watertight by itself, but trimesh handles the format conversion and can report watertightness so you know before sending it to a slicer:
Shap-E needs around 7-8 GB of VRAM. If you hit torch.cuda.OutOfMemoryError: CUDA out of memory, drop frame_size to 128 or move to CPU with pipe.to("cpu"); generation will take minutes instead of seconds, but it works. Another common issue is a progress bar that sticks at 0% and never advances, which usually means your PyTorch build doesn't match your CUDA driver version. Run python -c "import torch; print(torch.cuda.is_available())" to confirm GPU access before you blame the model.
Image-to-3D with TripoSR
TripoSR, built by Stability AI and Tripo AI, reconstructs a textured 3D mesh from a single photograph. Where Shap-E generates from a text description, TripoSR excels at turning a real photo into geometry, producing results in under a second on an A100.
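Installation follows the project README (the repository URL below is the official VAST-AI-Research repo at the time of writing):

```shell
git clone https://github.com/VAST-AI-Research/TripoSR.git
cd TripoSR
pip install -r requirements.txt
```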
Run it from the command line:
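A sketch of the invocation, assuming the repo's run.py entry point; the input image path is a placeholder:

```shell
python run.py photos/chair.png --output-dir output/
```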
This writes an OBJ file with vertex colors to output/. You can also specify the marching cubes resolution to trade mesh density against speed:
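For example, bumping the marching cubes grid above the default (256) for a denser mesh at the cost of slower extraction; 320 here is an arbitrary choice:

```shell
python run.py photos/chair.png --output-dir output/ --mc-resolution 320
```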
The --no-remove-bg flag skips background removal when your input already has a clean white or transparent background. Passing a cluttered photo without background removal produces garbage geometry, so only use this flag when you know the subject is already isolated.
Fixing torchmcubes Build Failures
The most common TripoSR installation problem is torchmcubes failing to compile. You’ll see something like:
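The exact text depends on your toolchain, but a typical build-time mismatch failure looks roughly like (version numbers are illustrative):

```
RuntimeError: The detected CUDA version (12.1) mismatches the version that was
used to compile PyTorch (11.8). Please make sure to use the same CUDA versions.
```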
or at runtime:
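an import failure for the compiled extension, for example (the exact module name varies with the torchmcubes version):

```
ModuleNotFoundError: No module named 'torchmcubes_module'
```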
Fix it by making sure your local CUDA toolkit version matches the CUDA version that PyTorch was compiled with. Check both:
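Two quick checks; the second assumes PyTorch is importable in the active environment:

```shell
nvcc --version                                        # local CUDA toolkit version
python -c "import torch; print(torch.version.cuda)"   # CUDA version PyTorch was built against
```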
If they differ, either install the matching CUDA toolkit or reinstall PyTorch for your local version. Then rebuild torchmcubes:
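A sketch of the rebuild, assuming the upstream tatsy/torchmcubes repository:

```shell
pip install --force-reinstall git+https://github.com/tatsy/torchmcubes.git
```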
At default settings, TripoSR uses around 6 GB of VRAM. It can run on CPU too, though single-image inference jumps from under a second to roughly 30 seconds.
Using the Meshy API for Production Workflows
Open-source models are great for experimentation, but if you need reliable 3D generation at scale without managing GPU infrastructure, the Meshy API is a solid option. It supports both text-to-3D and image-to-3D, exports to GLB, OBJ, FBX, USDZ, and STL, and handles the compute for you.
The workflow is asynchronous: submit a task, poll for completion, then download the result.
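A sketch of that flow in Python. The endpoint path, response fields ("result", "status", "model_urls"), and status values follow Meshy's v2 API at the time of writing; verify them against the current API reference before shipping:

```python
# pip install requests
import os
import time
import requests

API_KEY = os.environ["MESHY_API_KEY"]          # set your key in the environment
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
BASE = "https://api.meshy.ai/openapi/v2/text-to-3d"

# 1. Submit a preview task; the prompt is an example.
resp = requests.post(BASE, headers=HEADERS, json={
    "mode": "preview",
    "prompt": "a weathered bronze dragon statue",
})
resp.raise_for_status()
task_id = resp.json()["result"]

# 2. Poll until the task reaches a terminal state.
while True:
    task = requests.get(f"{BASE}/{task_id}", headers=HEADERS).json()
    if task["status"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(5)

# 3. Download the GLB on success.
if task["status"] == "SUCCEEDED":
    model = requests.get(task["model_urls"]["glb"])
    with open("dragon.glb", "wb") as f:
        f.write(model.content)
```

Polling every few seconds is fine here since tasks take minutes; for higher volume, prefer a backoff or Meshy's webhook support if your plan includes it.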
The preview stage generates a low-detail mesh quickly. For production assets, follow it with a refine step by sending another request with "mode": "refine" and "preview_task_id": task_id. The refined model includes proper textures and cleaner geometry.
Meshy also supports image-to-3D through a separate endpoint at /openapi/v2/image-to-3d. Pass an image URL or base64-encoded image and you get the same async task workflow back.
Picking the Right Tool
Each approach fills a different niche:
| Tool | Input | Output | Speed | GPU Required |
|---|---|---|---|---|
| Shap-E | Text or image | PLY, OBJ | ~30s on GPU | Yes (7-8 GB) |
| TripoSR | Single image | OBJ with vertex colors | <1s on A100 | Recommended (6 GB) |
| Meshy API | Text or image | GLB, OBJ, FBX, USDZ, STL | 1-3 min | No (cloud) |
If you want quick experiments from text descriptions, start with Shap-E. If you have product photos or concept art and need geometry fast, TripoSR is hard to beat. If you’re building a pipeline where reliability and format flexibility matter more than cost, Meshy handles the infrastructure so you don’t have to.
Output Formats and Viewing Your Models
All three tools can produce OBJ files, which nearly every 3D application reads. GLB (binary glTF) is the better choice for web viewers and game engines since it bundles geometry, textures, and materials into a single file.
To quickly inspect a generated mesh without opening Blender:
For web embedding, three.js loads GLB files natively with its GLTFLoader. Drop the file on gltf-viewer.donmccurdy.com for a fast sanity check before integrating into your pipeline.
Related Guides
- How to Generate 3D Scenes from Text with Gaussian Splatting
- How to Generate Images with FLUX.2 in Python
- How to Generate Images with Stable Diffusion in Python
- How to Generate AI Product Photography with Diffusion Models
- How to Edit Images with AI Inpainting Using Stable Diffusion
- How to Generate Videos with Stable Video Diffusion
- How to Build AI Clothing Try-On with Virtual Diffusion Models
- How to Generate Music with Meta AudioCraft
- How to Generate and Edit Audio with Stable Audio and AudioLDM
- How to Edit Images with Natural Language Using InstructPix2Pix