Docker Spaces

Spaces accommodate custom Docker containers for apps outside the scope of Streamlit and Gradio. Docker Spaces let users go beyond the limits of what was previously possible with the standard SDKs: from FastAPI and Go endpoints to Phoenix apps and ML Ops tools, Docker Spaces can host them all.

Setting up Docker Spaces

Selecting Docker as the SDK when creating a new Space initializes it by setting the sdk property to docker in the YAML block of its README.md. Alternatively, for an existing Space repository, set sdk: docker inside the YAML block at the top of the README.md. You can also change the default exposed port 7860 by setting app_port: 7860. Afterwards, you can write a regular Dockerfile.

---
title: Basic Docker SDK Space
emoji: 🐳
colorFrom: purple
colorTo: gray
sdk: docker
app_port: 7860
---

Inside the container, you can open as many ports as you need. For instance, you can install Elasticsearch in your Space and call it internally on its default port 9200.
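As a minimal sketch, this is how your app code could call such an internal service (assuming an Elasticsearch instance is already running inside the container and the requests package is installed):

import requests

# Query the Elasticsearch instance running inside the same container on its
# default port 9200; this port is not exposed to the outside world.
response = requests.get("http://localhost:9200")
print(response.json())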

If you want to expose apps that run on multiple ports to the outside world, a common workaround is to use a reverse proxy such as Nginx inside the container: expose a single port to the outside, and let the reverse proxy dispatch requests from the internet to the different internal ports.

Secrets and Variables Management

You can manage a Space's environment variables in the Space Settings. Read more here.

Variables

Buildtime

When building a Docker Space, variables are passed as build-args. See Docker's dedicated documentation for how to use these arguments in a Dockerfile.

# Declare your environment variables with the ARG directive
ARG MODEL_REPO_NAME

FROM python:latest
# Re-declare the ARG after FROM so it is available inside this build stage
ARG MODEL_REPO_NAME
# [...]
# You can use them like environment variables
RUN python predict.py $MODEL_REPO_NAME

Runtime

At runtime, these variables are injected into the container's environment variables.
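For example, a minimal sketch in Python, assuming a variable named MODEL_REPO_NAME has been defined in the Space settings:

import os

# Public variables are injected into the container environment at runtime;
# fall back to a default if the variable is not set.
model_repo_name = os.environ.get("MODEL_REPO_NAME", "default-model")
print(f"Using model repository: {model_repo_name}")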

Secrets

Buildtime

For security reasons, secrets in Docker Spaces are managed differently from regular variables. You can create a secret in the Settings tab and then use it at build time by mounting it in your Dockerfile.

For example, if SECRET_EXAMPLE is the name of a secret you created in the Settings, you can mount it as a file at build time and read its value with $(cat /run/secrets/SECRET_EXAMPLE).

For example:

# Expose the secret SECRET_EXAMPLE at buildtime and use its value as git remote URL
RUN --mount=type=secret,id=SECRET_EXAMPLE,mode=0444,required=true \
    git init && \
    git remote add origin $(cat /run/secrets/SECRET_EXAMPLE)

# Expose the secret SECRET_EXAMPLE at buildtime and use its value as a Bearer token for a curl request
RUN --mount=type=secret,id=SECRET_EXAMPLE,mode=0444,required=true \
    curl test -H "Authorization: Bearer $(cat /run/secrets/SECRET_EXAMPLE)"

Runtime

As with public Variables, secrets are exposed as environment variables at runtime. For example, in Python you can access them with os.environ.get("SECRET_EXAMPLE"). See this example of a Docker Space that uses secrets: secret-example.
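A minimal sketch of reading a secret at runtime and failing early if it is missing (SECRET_EXAMPLE is the secret name used above):

import os

# Secrets are exposed to the running container as environment variables.
secret = os.environ.get("SECRET_EXAMPLE")
if secret is None:
    raise RuntimeError("SECRET_EXAMPLE is not set; add it in the Space Settings tab.")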

Permissions

The container runs with user ID 1000. To avoid permission issues, you should create a user and set its WORKDIR before any COPY or download steps.

# Set up a new user named "user" with user ID 1000
RUN useradd -m -u 1000 user

# Switch to the "user" user
USER user

# Set home to the user's home directory
ENV HOME=/home/user \
    PATH=/home/user/.local/bin:$PATH

# Set the working directory to the user's home directory
WORKDIR $HOME/app

# Try and run pip command after setting the user with `USER user` to avoid permission issues with Python
RUN pip install --no-cache-dir --upgrade pip

# Copy the current directory contents into the container at $HOME/app setting the owner to the user
COPY --chown=user . $HOME/app

# Download a checkpoint
RUN mkdir content
ADD --chown=user https://<SOME_ASSET_URL> content/<SOME_ASSET_NAME>
Note

Always specify the --chown=user with ADD and COPY to ensure the new files are owned by your user.

If you still face permission issues, you might need to use chmod or chown in your Dockerfile to grant the right permissions. For example, if you want to use the directory /data, you can do:

RUN mkdir -p /data
RUN chmod 777 /data

You should always avoid superfluous chowns.

[!WARNING] Updating metadata for a file creates a new copy stored in the new layer. Therefore, a recursive chown can result in a very large image due to the duplication of all affected files.

Rather than fixing permission by running chown:

COPY checkpoint .
RUN chown -R user checkpoint

you should always do:

COPY --chown=user checkpoint .

(the same goes for the ADD command)

Data Persistence

The data written on disk is lost whenever your Docker Space restarts, unless you opt-in for a persistent storage upgrade.

If you opt-in for a persistent storage upgrade, you can use the /data directory to store data. This directory is mounted on a persistent volume, which means that the data written in this directory will be persisted across restarts.
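For example, a minimal sketch of caching a downloaded file under /data so it survives restarts (assuming persistent storage is enabled; the asset URL is hypothetical):

import os
import urllib.request

# /data is mounted on the persistent volume, so files written here survive
# Space restarts (only when the persistent storage upgrade is enabled).
checkpoint_path = "/data/checkpoint.bin"
if not os.path.exists(checkpoint_path):
    # Hypothetical URL; replace with your own asset
    urllib.request.urlretrieve("https://example.com/checkpoint.bin", checkpoint_path)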

Note

At the moment, the /data volume is only available at runtime, i.e. you cannot use /data during the build step of your Dockerfile.

You can also use our Datasets Hub for specific cases, where you can store state and data in a git LFS repository. You can find an example of persistence here, which uses the huggingface_hub library for programmatically uploading files to a dataset repository. This Space example along with this guide will help you decide which solution best fits your data type.
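As a rough sketch of that approach (the dataset repo id and the HF_TOKEN secret below are assumptions; see the linked example for a complete implementation):

import os
from huggingface_hub import HfApi

# Upload a state file to a dataset repository so it persists across restarts.
api = HfApi(token=os.environ.get("HF_TOKEN"))  # HF_TOKEN added as a Space secret
api.upload_file(
    path_or_fileobj="state.json",           # local file written by your app
    path_in_repo="state.json",              # destination path in the dataset repo
    repo_id="your-username/space-state",    # hypothetical dataset repo
    repo_type="dataset",
)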

Finally, in some cases, you might want to use an external storage solution from your Space's code like an external hosted DB, S3, etc.

Docker container with GPU

You can run Docker containers with GPU support by using one of our GPU-flavored Spaces Hardware.

We recommend using the nvidia/cuda image from Docker Hub as a base image, which comes with CUDA and cuDNN pre-installed.

Note

During Docker buildtime, you don't have access to GPU hardware. Therefore, you should not try to run any GPU-related commands during the build step of your Dockerfile. For example, you can't run nvidia-smi or torch.cuda.is_available() while building an image. Read more here.
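Instead, defer any GPU check to runtime, for example at application startup; a minimal sketch assuming PyTorch is installed:

import torch

# Check for a GPU when the container starts, not during `docker build`,
# since no GPU is attached at build time.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on {device}")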

Read More