Channel: Debian User Forums

Beginners Questions • [Software] Nvidia gpu stops being useable after a time

Hello. I am new to Debian and am working with a new desktop with an RTX 3090, which I use for generative AI workflows with Stable Diffusion through tools like ComfyUI. Here is my system information:

Operating System: Debian GNU/Linux 12
KDE Plasma Version: 5.27.5
KDE Frameworks Version: 5.103.0
Qt Version: 5.15.8
Kernel Version: 6.1.0-17-amd64 (64-bit)
Graphics Platform: Wayland
Processors: 24 × AMD Ryzen 9 7900X 12-Core Processor
Memory: 30.5 GiB of RAM
Graphics Processor: AMD Radeon Graphics
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: B650 AORUS ELITE AX

The issue I am having is that the GPU often becomes unavailable after a period of time. Restarting the computer seems to get it working again.

When it becomes unavailable, `nvidia-smi` returns `Unable to determine the device handle for GPU0000:01:00.0: Unknown Error`

`nvidia-debugdump --dumpall`

returns
```
ERROR: internal_dumpNvLogComponent() failed, return code: 0x3e7
ERROR: internal_dumpGpuComponent() failed, return code: 0x3e7
ERROR: internal_dumpGpuComponent() failed, return code: 0x3e7
ERROR: internal_dumpNvLogComponent() failed, return code: 0x3e7
```
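As a side note on the return code above: `0x3e7` is hexadecimal for 999. NVML defines `NVML_ERROR_UNKNOWN` as 999, so this may simply be another form of the "Unknown Error" that `nvidia-smi` reports. Whether `nvidia-debugdump` is reporting an NVML code here is an assumption; the output does not confirm it. A minimal sketch of the conversion (`decode_return_code` is a hypothetical helper name):

```python
# Decode the hexadecimal return code printed by nvidia-debugdump.
def decode_return_code(text: str) -> int:
    """Convert a hex string such as '0x3e7' to an integer."""
    return int(text, 16)

# Value of NVML_ERROR_UNKNOWN in NVML's nvmlReturn_t enum.
# Assumption: nvidia-debugdump is reporting an NVML-style code.
NVML_ERROR_UNKNOWN = 999

code = decode_return_code("0x3e7")
print(code, code == NVML_ERROR_UNKNOWN)
```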

Possibly related: https://wiki.debian.org/NvidiaGraphicsDrivers mentions that if `lspci | grep -E "VGA|3D"` returns two lines, you have an Optimus card. For me the command returns:

```
01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
10:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raphael (rev c2)
```
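The two-line check can be sketched in Python against the output above. This only counts controllers the way the wiki's `grep` does; it cannot tell an Optimus laptop from a desktop that has a discrete card plus a CPU with integrated graphics, and the `Raphael` line here appears to be the Ryzen's integrated GPU. `graphics_controllers` is a hypothetical helper name.

```python
import re

# Same pattern the wiki's grep command uses.
GPU_PATTERN = re.compile(r"VGA|3D")

def graphics_controllers(lspci_output: str) -> list[str]:
    """Return the lspci lines that describe VGA or 3D controllers."""
    return [line for line in lspci_output.splitlines() if GPU_PATTERN.search(line)]

# The output quoted in this post.
SAMPLE = (
    "01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)\n"
    "10:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raphael (rev c2)\n"
)

controllers = graphics_controllers(SAMPLE)
print(len(controllers))           # number of graphics controllers found
for line in controllers:
    print(line.split(" ", 1)[0])  # PCI address of each controller
```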

So I take that to mean it is an Optimus card, but when I look at the steps for that I don't really understand what I am supposed to do. It looks like the best solution is NVIDIA PRIME render offload. But that doesn't look like a configuration; it reads to me as something I prepend to commands. Does that mean I should prepend it to any command that will use the GPU? In the case of ComfyUI, would I run `__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia python main.py` after activating the conda environment that I set up for ComfyUI? When I do that, I still get the same error:

```
torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
```
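For reference, a `VAR=value command` prefix sets those variables only for that one process, so it is per-command rather than a persistent configuration. A minimal Python sketch of the same mechanism follows; `run_with_offload` is a hypothetical helper, and the child command is a stand-in for `python main.py`. These two variables are documented for OpenGL and Vulkan rendering, and CUDA is not expected to depend on them, which would be consistent with the error staying the same.

```python
import os
import subprocess
import sys

# The variables from the command in this post.
OFFLOAD_ENV = {
    "__NV_PRIME_RENDER_OFFLOAD": "1",
    "__GLX_VENDOR_LIBRARY_NAME": "nvidia",
}

def run_with_offload(command):
    """Run `command` with the offload variables set for that child only.

    Hypothetical helper: equivalent to prefixing the command with
    VAR=value pairs in a shell. The parent's environment is not modified.
    """
    env = {**os.environ, **OFFLOAD_ENV}
    return subprocess.run(command, env=env, capture_output=True, text=True, check=False)

# Stand-in child process that reports what it sees in its environment.
child = [sys.executable, "-c",
         "import os; print(os.environ.get('__GLX_VENDOR_LIBRARY_NAME'))"]
result = run_with_offload(child)
print(result.stdout.strip())
```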

How can I solve this so the gpu is ready whenever I need it?

Thanks in advance.

Statistics: Posted by quddus — 2024-02-19 02:33 — Replies 0 — Views 44


