TransWikia.com

DIGITS Docker container not picking up GPU

Data Science Asked by Bojan Komazec on May 4, 2021

I am running the DIGITS Docker container, but it fails to recognize the host's GPU. It reports no GPUs (I expect one to be reported), so the upper right corner of the DIGITS home page shows no GPU indicator, and during training DIGITS uses only the CPU.

I have a GeForce GT 640 graphics card:

$ nvidia-smi -L
GPU 0: GeForce GT 640 (UUID: GPU-f2583df9-404d-2564-d332-e7878a94d087)

$ lspci
...
VGA compatible controller: NVIDIA Corporation GK107 [GeForce GT 640 OEM] (rev a1)
...

GK107 is the code name for the GeForce GT 640 (GDDR5) (source: https://en.wikipedia.org/wiki/GeForce_600_series), which, according to https://developer.nvidia.com/cuda-gpus, has compute capability 3.5. I assumed this is supported, since the NVIDIA Container Toolkit requires compute capability > 2.1 according to https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installing-on-ubuntu-and-debian.

This is my docker run command:

$ docker run --gpus all -d --name digits --rm -p 8888:5000 -v /home/userx/data:/data -v /home/userx/jobs:/workspace/jobs nvcr.io/nvidia/digits:20.12-tensorflow-py3

When I run nvidia-smi inside the Docker container, it does see the graphics card:

$ docker exec -it digits bash
root@e58b860504a9:/workspace# nvidia-smi
Fri Feb 12 23:33:17 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GT 640      Off  | 00000000:01:00.0 N/A |                  N/A |
| 40%   32C    P8    N/A /  N/A |    260MiB /  1992MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I am using the latest versions of Docker and NVIDIA Docker:

$ docker --version
Docker version 20.10.3, build 48d30b5

$ nvidia-docker version 
NVIDIA Docker: 2.5.0
Client: Docker Engine - Community
 Version:           20.10.3
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        48d30b5
 Built:             Fri Jan 29 14:33:21 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.3
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       46229ca
  Built:            Fri Jan 29 14:31:32 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

I am running Ubuntu 20.04:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:    20.04
Codename:   focal

I installed the most recent version of the NVIDIA driver for Ubuntu:

$ modinfo nvidia
filename:       /lib/modules/5.4.0-65-generic/updates/dkms/nvidia.ko
alias:          char-major-195-*
version:        460.32.03
supported:      external
license:        NVIDIA
srcversion:     9BFA7969070552C6938D8A8
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        
retpoline:      Y
name:           nvidia
vermagic:       5.4.0-65-generic SMP mod_unload 
...

Would anyone be kind enough to give me a hint as to why DIGITS running in Docker does not recognize my graphics card?

One Answer

I found the answer. https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#platform-requirements specifies the compute capability requirements for the NVIDIA Container Toolkit, but the compute capability requirements for the DIGITS Docker image are specified separately for each image release. For digits:20.12, https://docs.nvidia.com/deeplearning/digits/digits-release-notes/rel_20-12.html#rel_20-12 states the following:

Release 20.12 supports CUDA compute capability 6.0 and higher.

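My GPU, with compute capability 3.5, does not meet that requirement. That is why nvidia-smi can see the card inside the container while DIGITS does not use it.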
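
The comparison behind this conclusion can be sketched as a small shell helper. This is only an illustration: `meets_min_cc` is a hypothetical function name, not part of any NVIDIA tool, and the two values passed to it (3.5 for the GT 640, 6.0 for DIGITS 20.12) are simply the figures quoted in this thread. Both arguments are assumed to be in `major.minor` form.

```shell
#!/bin/sh
# Hypothetical helper: report whether a GPU's compute capability meets a
# required minimum. Both arguments must be "major.minor" strings, e.g. 3.5.
meets_min_cc() {
    have_major="${1%%.*}"; have_minor="${1#*.}"
    need_major="${2%%.*}"; need_minor="${2#*.}"
    # Compare the major version first, then the minor version on a tie.
    if [ "$have_major" -gt "$need_major" ] ||
       { [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]; }; then
        echo "supported"
    else
        echo "unsupported"
    fi
}

# Values taken from this thread: GT 640 is listed at 3.5, DIGITS 20.12 needs 6.0.
meets_min_cc 3.5 6.0
```

With the values from this thread the helper prints `unsupported`. When picking a different image release, the minimum would have to be taken from that release's notes, since the requirement is stated per release.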

Correct answer by Bojan Komazec on May 4, 2021
