nvidia repositories are hard coded

Bug #1915361 reported by Narinder Gupta
This bug affects 2 people
Affects: Containerd Subordinate Charm
Status: Fix Released
Importance: High
Assigned to: George Kraft
Milestone: 1.22+ck1

Bug Description

We are trying an air-gapped deployment of Kubernetes in which we handcrafted the NVIDIA repositories in a local repo. But containerd fails to install the NVIDIA packages because the charm hard-codes the NVIDIA repositories and keys. These should be configurable through config options.
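As a rough illustration of what "configurable through config options" could look like, here is a minimal sketch that reads apt source lines from a config mapping and falls back to built-in defaults. The option name `nvidia_apt_sources`, the function name, and the default URL are all hypothetical placeholders, not the charm's actual names or values.

```python
# Hypothetical stand-in for the repository lines the charm hard-codes today.
DEFAULT_SOURCES = ["deb https://example.invalid/nvidia/ubuntu focal main"]


def resolve_apt_sources(config, defaults=DEFAULT_SOURCES):
    """Return apt source lines from config, falling back to the defaults.

    `config` is any dict-like object; `nvidia_apt_sources` is an assumed
    option name holding one apt source line per text line.
    """
    raw = config.get("nvidia_apt_sources") or ""
    # Drop blank lines and surrounding whitespace so a multi-line YAML
    # value can be pasted in without extra cleanup.
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    return lines or list(defaults)
```

In an air-gapped deployment the operator would set the option to the local mirror's source line; leaving it unset keeps the current behaviour.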

Tags: seg
George Kraft (cynerva) wrote :
Changed in charm-containerd:
importance: Undecided → Medium
status: New → Triaged
Ante Karamatić (ivoks) wrote :

Attached is the solution Narinder used to solve this problem.

Vern Hart (vern) wrote :

Is there a plan to incorporate this patch into the charm?

I'm doing another offline deployment for a customer that requires drivers from nvidia for gpu integration.

Felipe Reyes (freyes)
tags: added: seg
Vern Hart (vern) wrote :

Subscribing field-medium as this impacts customer deployments.

The patch attached above is not quite enough. It also needs a couple of variables added to config.yaml. In addition, I had to update the hard-coded packages because the versions in the package names have advanced. Specifically, the packages in the patch are:

NVIDIA_PACKAGES = [
    'cuda-drivers',
    'nvidia-container-runtime',
    'nvidia-fabricmanager-460',
    'nvidia-utils-460',
    'linux-modules-nvidia-460-generic',
]

The problem is cuda-drivers, which appears to be a pseudo package that points to cuda-drivers-470, while the latest linux-modules-nvidia package is linux-modules-nvidia-465-generic, so you get dependency conflicts. I had to update the list to:

NVIDIA_PACKAGES = [
    'cuda-drivers-465',
    'nvidia-container-runtime',
    'nvidia-fabricmanager-465',
    'nvidia-utils-465',
    'linux-modules-nvidia-465-generic',
]

Because nvidia includes the software version in the package name, keeping this list up to date is cumbersome. There's got to be an automated way to determine the latest packages to install.
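One possible way to automate this, sketched under assumptions: take the list of package names available in the configured repositories, find the highest driver series that has a matching linux-modules-nvidia-N-generic package, and build the rest of the list from that number. The function names are hypothetical, and the list of available names is passed in as plain strings; how it is obtained (for example from the apt cache) is left out.

```python
import re

# Matches names such as 'linux-modules-nvidia-465-generic'.
_MODULES_RE = re.compile(r"^linux-modules-nvidia-(\d+)-generic$")


def latest_driver_series(available_names):
    """Return the highest driver series that has a kernel modules package."""
    series = []
    for name in available_names:
        match = _MODULES_RE.match(name)
        if match:
            series.append(int(match.group(1)))
    if not series:
        raise ValueError("no linux-modules-nvidia-N-generic package found")
    return max(series)


def nvidia_packages(series):
    """Build a package list pinned to one series, mirroring the list above."""
    return [
        f"cuda-drivers-{series}",
        "nvidia-container-runtime",
        f"nvidia-fabricmanager-{series}",
        f"nvidia-utils-{series}",
        f"linux-modules-nvidia-{series}-generic",
    ]
```

Keying the choice on the kernel modules package rather than on cuda-drivers avoids the mismatch described above, where cuda-drivers resolves to a newer series than the modules package provides.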

Narinder Gupta (narindergupta) wrote :

One option would be to make the package list a config option; that might help.
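A minimal sketch of that idea, assuming a single string option that lists package names separated by whitespace or commas. The option name `nvidia_apt_packages` and the function name are hypothetical; the defaults are the updated list from the comment above.

```python
DEFAULT_PACKAGES = [
    "cuda-drivers-465",
    "nvidia-container-runtime",
    "nvidia-fabricmanager-465",
    "nvidia-utils-465",
    "linux-modules-nvidia-465-generic",
]


def packages_from_config(config, defaults=DEFAULT_PACKAGES):
    """Parse a package list from config, falling back to the defaults."""
    raw = config.get("nvidia_apt_packages") or ""  # assumed option name
    names = raw.replace(",", " ").split()
    # Remove duplicates while keeping the order the operator wrote.
    unique = []
    for name in names:
        if name not in unique:
            unique.append(name)
    return unique or list(defaults)
```

With this shape, an operator could pin a different driver series without patching the charm, and an empty option keeps the built-in list.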

George Kraft (cynerva)
Changed in charm-containerd:
importance: Medium → High
Vern Hart (vern) wrote :

For clarity, I started with the patch but had to add some config.yaml changes and update the nvidia packages. Here's the patch I'm using: https://api.jujucharms.com/charmstore/v5/~fe-staging/containerd-5/archive/bug-1915361.patch

Vern Hart (vern) wrote :

Subscribing field-high as this is blocking support handover for a customer deployment.

George Kraft (cynerva)
Changed in charm-containerd:
assignee: nobody → George Kraft (cynerva)
status: Triaged → In Progress
George Kraft (cynerva) wrote :
tags: added: backport-needed review-needed
Changed in charm-containerd:
milestone: none → 1.22+ck1
Cory Johns (johnsca)
Changed in charm-containerd:
status: In Progress → Fix Committed
tags: removed: review-needed
Cory Johns (johnsca) wrote :
tags: removed: backport-needed
George Kraft (cynerva)
Changed in charm-containerd:
status: Fix Committed → Fix Released