VOGONS


First post, by smesgr


Hi,

Even though the RX5500 to RX7000 series are still quite new, I would like to post some information that is currently hard to come by and scattered across a few Chinese and Russian pages.

Is there a way to check your RX[5-7]X00 cards?
Yes. There is a software package that AMD provides to board partners and maybe repair shops. It is called tserver, which according to the manual is the "AMD Board Production Diagnostic Test Suite". This suite contains several test setups: stress tests, memory tests, and tests of specific components like the video decoder. Each package is ~1-3 GB in size and covers a specific range of cards, e.g. RX5500, RX6800, RX6900, RX7000.

Cool, where can I download this?
That's the tricky part. I did not find any specific information, but it seems the suite was provided to board partners and likely leaked. So you need to do some Google magic here: searching for "amd tserver" plus your card name without a suffix like XT or XTX should bring it up. I think it is a good idea to get those packages now; you never know.

How to run the stuff?
The suite is built to run on a Linux system. There are ISOs floating around, but at least for my card none of them worked. So the following description may depend on the specific tserver package; for the RX6000 generation this worked for me. What's the big deal? The package consists of several Perl scripts and driver packages. Those driver packages are built at runtime against the Linux kernel. But the in-kernel driver interface is not stable, so you need a specific Linux kernel for your package. You can't modify or fix the driver package for your kernel, because the Perl script checks parts of the suite and throws an integrity error. The manifest contains the checksums, but I couldn't figure out which algorithm was actually used.
So after some trial and error, Ubuntu 18.04 LTS with the v5.4 kernel worked.
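
Since the driver build depends on the kernel, it may help to check which kernel series is actually running before starting the suite. A minimal sketch; kernel_series and kernel_matches are helper names made up for this post, and the 5.4 default is only what worked for the RX6000 package, so other packages may need something else:

```shell
# Print the "major.minor" part of a kernel release string,
# e.g. "5.4.0-150-generic" becomes "5.4".
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeed only if the given release belongs to the wanted series.
# The default of 5.4 is an assumption taken from the RX6000 case above.
kernel_matches() {
    [ "$(kernel_series "$1")" = "${2:-5.4}" ]
}

# Typical use on the test rig:
#   kernel_matches "$(uname -r)" || echo "unexpected kernel: $(uname -r)"
```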
Then the suite requires some kernel parameters to be set:

mem=8G consoleblank=0 iommu=off nomodeset

These have to be added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub (or the equivalent location for your distribution).
- The specific values for consoleblank and iommu are requested by the suite itself.
- The mem kernel parameter limits the amount of memory usable by the system. The suite also requests that this value be set. I don't know why, but 8G (8 gigabytes) worked for me. Values that are too low may cause hiccups. I went up to the maximum system memory and it still worked.
- The nomodeset kernel parameter stops Linux from setting the graphics mode using the device-specific kernel driver, which for the RX6000 series is amdgpu. With the parameter set, most distributions also stop using a graphical login and fall back to a console login.
Don't forget to run update-grub to apply your modification. Alternatively, you can set those values in the GRUB editor at boot time, but be aware that you then have to do that on every boot.
After you have set up the Linux distribution, you need to have "make" and "gcc" available. With those build tools the suite can build the kernel driver modules at runtime. On Ubuntu and most Debian derivatives they can be installed via
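
After the reboot, the running kernel's command line can be read from /proc/cmdline, so it is easy to check that the parameters really arrived. A mistyped parameter name is easy to miss otherwise. A rough sketch; missing_params is a hypothetical helper, not part of the suite, and the parameter list is simply the one from this post:

```shell
# Print every wanted parameter that is absent from the given kernel
# command line, one per line. Prints nothing if all are present.
# $1 is the command line, the remaining arguments are the wanted parameters.
missing_params() {
    cmdline=" $1 "
    shift
    for want in "$@"; do
        case "$cmdline" in
            *" $want "*) ;;                   # found as a whole word
            *) printf '%s\n' "$want" ;;       # not found, report it
        esac
    done
}

# Typical use on the test rig:
#   missing_params "$(cat /proc/cmdline)" mem=8G consoleblank=0 iommu=off nomodeset
```

If this prints nothing, all four parameters are active. If you chose a different mem value, pass that value instead of mem=8G.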

apt install make gcc

or your distribution's package manager.
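
A quick pre-flight check for the build tools could look like the sketch below. missing_tools and headers_present are made-up helper names. The headers check is an assumption on my side: out-of-tree kernel modules are normally built against /lib/modules/<release>/build, but I have not verified that the suite looks there.

```shell
# Print each named tool that cannot be found in PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Succeed if a kernel build directory exists for the given release.
# Assumption: the suite builds its modules against this directory,
# as out-of-tree module builds usually do.
headers_present() {
    [ -d "/lib/modules/$1/build" ]
}

# Typical use on the test rig:
#   missing_tools make gcc perl
#   headers_present "$(uname -r)" || echo "no kernel headers for $(uname -r)"
```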

Now, how to actually test your GPU?
Once you have prepared your test rig, you can start with your GPU. I think most posts that I came across use an onboard graphics card to actually run the console interface, but I prefer to log in via SSH to the machine from a different system. Another option would be to configure a serial connection for the console output. No matter which option you choose, as root you should go to the suite directory and run

tswrapper

or

tserver

directly, for example

./tswrapper.pl -d=1 -boardtest=quickmfg

If you have several GPUs installed, you have to point to the right card with the -d option. If you get an error saying the operation is not possible for your SKU, then your card is not supported by the suite; double-check that you have got the right package. Be aware that you may have to give execution permissions via

chmod 755 <file>

for several executables.
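
If you don't want to chmod file by file, a small helper can do it for a list of files and tell you what it changed. This is only a sketch: ensure_executable is a name I made up, and which files in your package actually need the execute bit is something you have to find out from the errors you get.

```shell
# For each given path: if it is a regular file without the execute bit,
# set mode 755 and report it. Missing paths and files that are already
# executable are silently skipped.
ensure_executable() {
    for f in "$@"; do
        if [ -f "$f" ] && [ ! -x "$f" ]; then
            chmod 755 "$f" && printf 'made executable: %s\n' "$f"
        fi
    done
}

# Typical use in the suite directory (file names depend on your package):
#   ensure_executable ./tswrapper.pl
```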
I think the big test is extmfg. On the first execution you will be asked which test groups you would like to run. There is also an option to log everything into log.txt and log.yml. This is useful, as a full-blown test may take more than 1 hour.
From here the Board Reference Guide PDF should give you information on which test does what.
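
Besides the suite's own log files, it can be handy to keep a copy of everything printed on the console, especially over SSH. A minimal sketch using tee; run_logged is a hypothetical helper. Note that piping the output may change how interactive prompts show up, so treat it as a starting point.

```shell
# Run a command, show its output on the console and save a copy
# (stdout and stderr together) to the given file.
# $1 is the log file, the remaining arguments are the command to run.
# Note: in plain sh the return status is that of tee, not of the command.
run_logged() {
    logfile=$1
    shift
    "$@" 2>&1 | tee "$logfile"
}

# Typical use, with the quick test from above:
#   run_logged "console-$(date +%Y%m%d-%H%M%S).txt" ./tswrapper.pl -d=1 -boardtest=quickmfg
```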

I hope this information helps you jump-start your diagnostic process.