When I try to run it on Android, it's very slow: about 16000 cycles/second, when it's supposed to be ~4770000/second (14318180/3 per second, to be exact). Is this simply because of its 0 ns delays every few cycles? (Source code in signature.) Is there a good way to profile on a Samsung Galaxy S7 with the Android NDK (on Windows)?
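For reference, the full-speed target comes from the IBM PC's 14.31818 MHz master crystal divided by 3. Dividing by 4 gives 3.58 MHz, which is the NTSC color subcarrier, not the CPU clock. Below is a small sketch with hypothetical helper names. It computes the target rate and expresses a measured rate as a percentage of real hardware:

```c
#include <stdint.h>

/* IBM PC/XT master crystal and the divider that yields the 8088 clock. */
#define PC_CRYSTAL_HZ 14318180ULL
#define CPU_DIVIDER   3ULL

/* Emulated 8088 cycles per real second needed for full speed (~4.77 MHz). */
static uint64_t target_cycles_per_second(void)
{
    return PC_CRYSTAL_HZ / CPU_DIVIDER; /* integer division: 4772726 */
}

/* Measured emulation speed as a percentage of a real 4.77 MHz 8088. */
static double percent_of_real_speed(double measured_cycles_per_second)
{
    return 100.0 * measured_cycles_per_second * (double)CPU_DIVIDER
           / (double)PC_CRYSTAL_HZ;
}
```

At 16000 cycles/second, this comes out at roughly 0.34% of real speed.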
As far as I know, it uses a Samsung Exynos 8890 Octa CPU (Android), clocked at 2.4 GHz. I'm trying to get the android-ndk-profiler to compile, but ndk-build.cmd just won't accept the NDK_MODULE_PATH I pass to it. I run "androidprompt.bat profile" in the Windows command prompt (or through a shortcut), which produces the following output:
```
Profile build selected.
Android NDK: jni/src/Android.mk: Cannot find module with tag 'android-ndk-profiler' in import path
Android NDK: Are you sure your NDK_MODULE_PATH variable is properly defined ?
Android NDK: The following directories were searched:
Android NDK:
jni/src/Android.mk:46: *** Android NDK: Aborting. . Stop.
```
I downloaded the contents of the android-ndk-profiler repository (https://github.com/richq/android-ndk-profiler ) into the "C:\androiddevdir\android-ndk-r12b\sources\android-ndk-profiler" directory. I compiled it by changing into that directory from within androidprompt.bat and calling ndk-build. This produces two .a libraries in its obj subfolders. After that, I simply call androidprompt.bat as described above, from the UniPCemu repository directory.
Ah, I see why you're worried about getting only 16k cycles/sec at 2.4 GHz. Keep in mind, though, that desktop CPU frequencies can be trusted but mobile ARM frequencies can't: you will be duty-cycled, and you probably won't be running at 2.4 GHz all the time. That said, you should still be getting far more than that. On my Raspberry Pi 3 (an ARMv8 CPU at 1.2 GHz), I get about 2.9 million cycles/second in CAPE.
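A phone's clock can change while the emulator runs, so it's more reliable to measure emulated cycles against a monotonic wall clock than to reason from the nominal 2.4 GHz. Below is a minimal sketch of such a meter. It assumes the emulator loop can report how many cycles each step executed. The cycle_meter type and its functions are hypothetical, not UniPCemu's:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Monotonic wall-clock time in seconds (POSIX clock_gettime). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

typedef struct {
    double   window_start;  /* time the current measurement window began */
    uint64_t window_cycles; /* emulated cycles counted in this window */
    double   last_rate;     /* most recent cycles/second result */
} cycle_meter;

static void meter_init(cycle_meter *m, double t)
{
    m->window_start = t;
    m->window_cycles = 0;
    m->last_rate = 0.0;
}

/* Adds cycles executed by time t. Once at least one second has elapsed,
   stores the rate for that window, starts a new window and returns 1. */
static int meter_add(cycle_meter *m, uint64_t cycles, double t)
{
    double elapsed;
    m->window_cycles += cycles;
    elapsed = t - m->window_start;
    if (elapsed >= 1.0) {
        m->last_rate = (double)m->window_cycles / elapsed;
        m->window_start = t;
        m->window_cycles = 0;
        return 1;
    }
    return 0;
}
```

In the main loop, call meter_add(&m, cycles_this_step, now_seconds()) and display m.last_rate whenever it returns 1.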
I don't know anything about the NDK profiler, but it seems you are missing some paths that let the projects find each other.
Well, that's the strange thing: I specify the repository path (C:/androiddevdir/android-ndk-r12b/sources or its subdirectory; it makes no difference), but it refuses to find it.
Specifying an invalid path makes it print a warning that it's ignoring the invalid import path. As a test, I appended xxx to the end of the NDK_MODULE_PATH environment variable. The path should then appear on the line above the final line of the output in my last post (the list of searched paths), but it isn't even listed?
The batch file being executed (the if "%1"=="profile" line is what runs ndk-build):
```bat
@echo off
set ANDROIDNDK=C:/androiddevdir/android-ndk-r12b
set path=%path%;%ANDROIDNDK%;C:\androiddevdir\apache-ant-1.9.7\bin;C:\androiddevdir\android-sdk-windows\platform-tools
rem Add Java path as well to support keytool.
set path=%path%;c:\Program Files\Java\jre1.8.0_91\bin
set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_102
set ANDROID_HOME=C:/androiddevdir/android-sdk-windows
set NDK_MODULE_PATH=%ANDROIDNDK%/sources/android-ndk-profiler
rem Set us up in the used project folder and start a command line session to work in!
cd android-project
if "%1"=="profile" ndk-build profile=%ANDROIDNDK% NDK_MODULE_PATH=%NDK_MODULE_PATH%
cmd
pause
```
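As I understand ndk-build's import mechanism, $(call import-module,<tag>) looks for <dir>/<tag>/Android.mk in each directory listed in NDK_MODULE_PATH. NDK_MODULE_PATH therefore has to name the parent of the module folder (here ...\sources), not the module folder itself. The module's Android.mk also has to sit directly in that folder rather than in a jni subfolder. The hypothetical helper below builds the candidate path so the two settings can be compared:

```c
#include <stdio.h>
#include <string.h>

/* Builds the file ndk-build would check for `$(call import-module,<tag>)`
   in one import directory: <dir>/<tag>/Android.mk.
   Returns 1 on success, 0 if the result does not fit in out. */
static int import_candidate(char *out, size_t outsz,
                            const char *dir, const char *tag)
{
    int n = snprintf(out, outsz, "%s/%s/Android.mk", dir, tag);
    return n > 0 && (size_t)n < outsz;
}
```

With NDK_MODULE_PATH set to ...\sources\android-ndk-profiler, as in the batch file, the candidate becomes .../android-ndk-profiler/android-ndk-profiler/Android.mk, which doesn't exist.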
I've managed to get it compiling up to the point where it compiles gnu_mcount.S, which fails with a lot of assembler errors:
```
[arm64-v8a] Compile        : android-ndk-profiler <= gnu_mcount.S
C:/androiddevdir/android-ndk-r12b/sources/android-ndk-profiler/gnu_mcount.S: Assembler messages:
C:/androiddevdir/android-ndk-r12b/sources/android-ndk-profiler/gnu_mcount.S:11: Error: unknown pseudo-op: `.thumb_func'
C:/androiddevdir/android-ndk-r12b/sources/android-ndk-profiler/gnu_mcount.S:16: Error: unknown mnemonic `push' -- `push {r0-r3}'
C:/androiddevdir/android-ndk-r12b/sources/android-ndk-profiler/gnu_mcount.S:17: Error: unknown mnemonic `push' -- `push {lr}'
C:/androiddevdir/android-ndk-r12b/sources/android-ndk-profiler/gnu_mcount.S:18: Error: operand 1 should be an integer register -- `ldr r0,[sp,#20]@r0=lr pushed by calling routine'
C:/androiddevdir/android-ndk-r12b/sources/android-ndk-profiler/gnu_mcount.S:19: Error: operand 1 should be an integer register -- `mov r1,lr@address of calling routine'
C:/androiddevdir/android-ndk-r12b/sources/android-ndk-profiler/gnu_mcount.S:21: Error: unknown mnemonic `pop' -- `pop {r2}@this routine's return address'
C:/androiddevdir/android-ndk-r12b/sources/android-ndk-profiler/gnu_mcount.S:22: Error: unknown mnemonic `pop' -- `pop {r0,r1}'
C:/androiddevdir/android-ndk-r12b/sources/android-ndk-profiler/gnu_mcount.S:23: Error: junk at end of line, first unrecognized character is `@'
C:/androiddevdir/android-ndk-r12b/sources/android-ndk-profiler/gnu_mcount.S:24: Error: operand 1 should be an integer register -- `ldr r3,[sp,#8]@r3=lr pushed by calling routine'
C:/androiddevdir/android-ndk-r12b/sources/android-ndk-profiler/gnu_mcount.S:25: Error: operand 1 should be an integer register -- `str r2,[sp,#8]@return address now last on the stack'
C:/androiddevdir/android-ndk-r12b/sources/android-ndk-profiler/gnu_mcount.S:26: Error: operand 1 should be an integer register -- `mov lr,r3@lr=caller's expected lr'
C:/androiddevdir/android-ndk-r12b/sources/android-ndk-profiler/gnu_mcount.S:27: Error: unknown mnemonic `pop' -- `pop {r2,r3}'
C:/androiddevdir/android-ndk-r12b/sources/android-ndk-profiler/gnu_mcount.S:28: Error: unknown mnemonic `pop' -- `pop {pc}@pop caller's expected r2,r3 and return'
make: *** [obj/local/arm64-v8a/objs/android-ndk-profiler/gnu_mcount.o] Error 1
```
I'm running "ndk-build profile=%ANDROIDNDK%/sources" from the androidprompt.bat file. I've moved the contents of the android-ndk-profiler jni folder to its parent folder (C:\androiddevdir\android-ndk-r12b\sources\android-ndk-profiler). Without that, it wouldn't even try to compile.
Could it be because it's also trying to compile it for the arm64-v8a architecture, instead of only for armeabi and armeabi-v7a? Those are the ABIs its own Application.mk specifies (the ones it's built for).
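Edit: Changing UniPCemu's Application.mk to target only those two platforms instead of "all" lets it continue compiling and finish without errors. So the profiler is indeed only supported when building for those two platforms. That matches the errors above: gnu_mcount.S is 32-bit ARM (Thumb) assembly, and the AArch64 assembler has no push/pop mnemonics, no .thumb_func directive and no r0-r15 register names.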
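A compile-time guard can make a profiling build fail early with a clear message, instead of a page of assembler errors. The sketch below assumes NDK_PROFILE is the define that enables profiling, as in UniPCemu's code. It relies on GCC/Clang predefined macros: __arm__ is only defined for 32-bit ARM, and arm64-v8a defines __aarch64__ instead. The function names and the ABI-name mapping are illustrative, not part of the NDK:

```c
/* android-ndk-profiler's gnu_mcount.S is 32-bit ARM/Thumb assembly, so a
   profiling build for any other ABI (e.g. arm64-v8a) stops here with a
   readable message instead of assembler errors. */
#if defined(NDK_PROFILE) && !defined(__arm__)
#error "NDK_PROFILE builds need APP_ABI limited to armeabi/armeabi-v7a"
#endif

/* Best-effort name of the Android ABI this file is being compiled for. */
static const char *current_abi(void)
{
#if defined(__aarch64__)
    return "arm64-v8a";
#elif defined(__arm__) && defined(__ARM_ARCH_7A__)
    return "armeabi-v7a";
#elif defined(__arm__)
    return "armeabi";
#elif defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "x86";
#else
    return "unknown";
#endif
}

/* 1 if the profiler's 32-bit ARM assembly can be built for this target. */
static int ndk_profiler_available(void)
{
#if defined(__arm__)
    return 1;
#else
    return 0;
#endif
}
```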
```c
#ifdef NDK_PROFILE
	atexit(&monpendingcleanup); // Cleanup function! We have lower priority than the callbacks (which include SDL_Quit to terminate the application), which would otherwise prevent us from cleaning up properly.
#endif
```
The gmon.out file should end up in the UniPCemu root directory currently in use. That directory is chosen as follows: the first writable drive listed in the SECONDARY_STORAGE environment variable (highest priority), then SDCARD (Pelya's SDL and normal SDL(2)), and finally, for normal SDL(2) only, SDL_AndroidGetExternalStoragePath and SDL_AndroidGetInternalStoragePath. It's correctly using the internal memory folder, where all the files it needs to run (including SETTINGS.DAT) are located. So I conclude it's using the Samsung's internal storage location correctly. But no gmon.out appears there, so could something be wrong with either UniPCemu's code or the android-ndk-profiler itself?
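The lookup order described above can be sketched as a priority list. Each entry may itself be a ':'-separated list of directories, which is my assumption for SECONDARY_STORAGE. The function names are hypothetical. On Android, the array would be filled from getenv("SECONDARY_STORAGE"), the SDCARD location, SDL_AndroidGetExternalStoragePath() and SDL_AndroidGetInternalStoragePath(), in that order:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Nonzero if the directory exists and this process may write to it. */
static int dir_is_writable(const char *path)
{
    return access(path, W_OK) == 0;
}

/* Copies the first writable entry of a ':'-separated list into out.
   Empty or oversized entries are skipped. Returns 1 on success. */
static int first_writable_in_list(const char *list, char *out, size_t outsz)
{
    const char *p = list;
    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, ':');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len > 0 && len < outsz) {
            memcpy(out, p, len);
            out[len] = '\0';
            if (dir_is_writable(out))
                return 1;
        }
        p = end ? end + 1 : NULL;
    }
    return 0;
}

/* Walks the candidates in priority order (NULL entries are skipped) and
   stores the first writable directory in out. Returns 1 if one was found. */
static int pick_storage_root(const char *const candidates[], size_t n,
                             char *out, size_t outsz)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (candidates[i] != NULL &&
            first_writable_in_list(candidates[i], out, outsz))
            return 1;
    }
    return 0;
}
```

Logging the chosen directory at startup is a quick way to confirm where gmon.out is expected to appear.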
Edit: Does the SDL_QUIT event fire when the app is terminated through Android? That is: opening the task switcher overview with the phone button (on the Galaxy S7, the button left of the home button; on some phones, like my LG-P710, by holding the home button), then terminating the app by swiping it off the screen or pressing X.
I just found out something: triggering UniPCemu's RALT-F4 termination shortcut makes it terminate the emulator's way. This fires the SDL_QUIT event and shuts the emulator down properly, which writes the gmon.out file to the project's directory.
So for some reason, the SDL_QUIT event (or SDL's terminating event) isn't delivered on Android. The app is then terminated by Android itself, instead of through the SDL_QUIT handler in UniPCemu's input.c. It therefore exits without cleaning up, and without generating gmon.out.
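One way to get a profile even when Android kills the app is to write it from more than one place. The first is the existing atexit() registration. The second is the handler for SDL2's SDL_APP_WILLENTERBACKGROUND (and SDL_APP_TERMINATING) events, preferably in a filter installed with SDL_SetEventFilter so it runs as soon as the event is raised. My understanding is that a process swiped away from the task switcher is often killed without further callbacks, which makes the background event the more dependable hook. The trade-off is that anything run after resuming won't be in that gmon.out. Below is a minimal sketch of a once-only guard. write_profile_data() is a hypothetical placeholder for the profiler's cleanup call (moncleanup() in android-ndk-profiler, as far as I know):

```c
#include <stdlib.h>

/* Hypothetical placeholder: a real NDK_PROFILE build would call the
   profiler's cleanup routine here to write gmon.out. This one only counts. */
static int profile_writes = 0;
static void write_profile_data(void) { profile_writes++; }

static int profile_flushed = 0;

/* Safe to call from several shutdown paths; only the first call writes. */
static void flush_profile_once(void)
{
    if (profile_flushed)
        return;
    profile_flushed = 1;
    write_profile_data();
}

/* For atexit(): covers normal exits, such as the RALT-F4 shutdown path. */
static void flush_profile_at_exit(void)
{
    flush_profile_once();
}
```

The background-event handler would call flush_profile_once() directly, and atexit(flush_profile_at_exit) covers the normal shutdown path. A second call becomes a no-op.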
The good news is that it now properly generates a gmon.out file to inspect :D
I added a separate batch file for analyzing the results of the profiling run. Looking at the resulting data, the main problem seems to be the text surface renderer itself, which takes up most of the CPU time. CPU_initLookupTables can be ignored: it's only called once, when emulation of a given CPU starts, to precalculate all the required lookup tables.
I've optimized the text surface rendering by precalculating all horizontal and vertical screen locations, with the two axes kept in separate precalculated buffers. These map screen coordinates to text surface coordinates. Now the profile doesn't even show the GPU_textrenderer routine anymore. So is it very light code now (the mapping is only calculated once, when starting the emulator), or is it just negligible compared to the other routines that run? CPU_initLookupTables now jumps to 100% CPU usage, even though it should only be called once during the entire runtime of an emulated CPU.
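The two precalculated axis tables can be sketched as follows, assuming simple nearest-neighbour scaling from a source surface to the screen. The names are hypothetical, and UniPCemu's GPU_textrenderer will differ in detail. The tables are rebuilt only when a size changes, so the per-pixel work is a table lookup:

```c
#include <stddef.h>
#include <stdint.h>

/* Maps each destination coordinate on one axis to a source coordinate
   (nearest-neighbour). Call once per size change, not per frame. */
static void build_axis_map(unsigned *map, unsigned dst_len, unsigned src_len)
{
    unsigned i;
    for (i = 0; i < dst_len; i++)
        map[i] = (unsigned)(((unsigned long long)i * src_len) / dst_len);
}

/* Scaled copy using the precomputed tables: only one offset calculation
   per row, then table lookups for every pixel. */
static void scale_blit(uint32_t *dst, unsigned dst_w, unsigned dst_h,
                       const uint32_t *src, unsigned src_w,
                       const unsigned *xmap, const unsigned *ymap)
{
    unsigned x, y;
    for (y = 0; y < dst_h; y++) {
        const uint32_t *srow = src + (size_t)ymap[y] * src_w;
        uint32_t *drow = dst + (size_t)y * dst_w;
        for (x = 0; x < dst_w; x++)
            drow[x] = srow[xmap[x]];
    }
}
```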
Edit: With the latest optimizations, it's now running at ~26500 cycles/second. Settings: 8088 at 4.77 MHz, all sound hardware enabled (with the AweROMGM soundfont), PSP-mode rendering (4:3 VGA, color monitor, CGA emulation, NTSC rendered in old-style mode, framerate disabled, full CPU synchronization).
The profiling itself reported fine, showing almost no time used after 30 seconds of runtime:
I might have found a problem in the text surface rendering code that may have caused the excessive screen updates. It checks the border of each character cell in order to update clickable text (e.g. the Set button at the bottom right of the screen, which opens the Settings menu). However, it was comparing the old font with the newly set border. As a result, the text surface was updated on every single frame (60 times per second), instead of only when its content actually changes. That should happen only once per second (the framerate display) or when the user does something, like opening and using the Settings menu, clicking the on-screen buttons, or using keyboard/mouse input on PCs.
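Below is a minimal sketch of per-cell change detection, assuming each cell stores a font colour and a border colour (the struct and function names are hypothetical). The key point is that each new value is compared with its own previous value:

```c
#include <stdint.h>

/* Hypothetical per-cell state of the text surface. */
typedef struct {
    uint32_t font;   /* glyph colour */
    uint32_t border; /* border colour, used for clickable text */
} text_cell;

/* Stores the new values and returns 1 if the cell must be redrawn.
   Comparing the old font against the new border instead (the bug described
   above) makes cells look dirty on almost every frame. */
static int cell_set(text_cell *cell, uint32_t font, uint32_t border)
{
    int dirty = (cell->font != font) || (cell->border != border);
    cell->font = font;
    cell->border = border;
    return dirty;
}
```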
Edit: The profile build is running like crazy now, reaching up to 260000 cycles/second, though it eventually drops to 5000-6000 cycles/second. The gain comes just from the text surface no longer updating so often (the update-detection bug caused the excess updates).
Edit: After I optimized put_pixel and related subfunctions with the (un)likely macros, it starts at 270000 cycles/second. It then takes about 24 seconds to drop to 5000-6000 cycles/second. That's about 6480000 cycles emulated before it drops to almost nothing, or about 1.36 seconds of emulated time at 4.77 MHz. After that, it slows to a crawl (maybe the VGA starting up, or something related)?
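Converting a cycle count to emulated time means dividing by the clock rate: 270000 cycles/s × 24 s = 6480000 cycles, which is about 1.36 s at 4.77 MHz. The reciprocal, 4772727 / 6480000, is about 0.74. Below is a one-line helper, a sketch that assumes the standard 14.31818 MHz / 3 clock:

```c
/* Seconds of emulated time covered by a number of 8088 cycles,
   assuming the standard 14.31818 MHz / 3 (~4.77 MHz) clock. */
static double emulated_seconds(double cycles)
{
    const double cpu_hz = 14318180.0 / 3.0;
    return cycles / cpu_hz;
}
```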
The total profile time is about 52.7 seconds (calculated by dividing the GPU_textrenderer call count by 60 FPS, the rate at which the screen is drawn).
I've modified the rendering code to precalculate the start of the row being rendered on the rendering surface. This avoids a multiplication for every pixel and reduces it to one per scanline. This applies to the main rendering surface that is rendered to SDL and then flipped into view. Strictly speaking, it's a forced update, using the custom 'flip' function in UniPCemu's GPU unit, done SDL-to-SDL2-style as described in the SDL migration guide on libsdl.org. So instead of the pixel row being calculated for every pixel, it's now done once per row. The call to put_pixel has also been removed; its functionality is copied inline and optimized.
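The row-pointer optimization can be illustrated with a plain rectangle fill. This sketch assumes 32-bit pixels and a pitch given in pixels. SDL's surface->pitch is in bytes, so it would need dividing by 4 for 32-bit surfaces. fill_rect is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Fills a rectangle, computing each row's start address once per scanline
   instead of evaluating y * pitch + x for every pixel the way a generic
   put_pixel would. */
static void fill_rect(uint32_t *pixels, size_t pitch_pixels,
                      unsigned x0, unsigned y0, unsigned w, unsigned h,
                      uint32_t colour)
{
    unsigned x, y;
    for (y = y0; y < y0 + h; y++) {
        uint32_t *row = pixels + (size_t)y * pitch_pixels; /* once per row */
        for (x = x0; x < x0 + w; x++)
            row[x] = colour;
    }
}
```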
I've finally got it running at roughly the same speed on a 2.2 GHz laptop (Intel dual-core CPU): it runs at ~6% there as well. I've managed to build it with gcov profiling and start a profiling run. But converting the profiler dump to a text file seems to fail: it asks for a nonexistent data file?
I call "make win analyze lineprofile gcov_files=emu/gpu/gpu_text.c" to get the results I need.
This literally executes "gcov emu/gpu/gpu_text.c". Can anyone see what's wrong with this?
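gcov needs two files per source file. The .gcno notes file is written at compile time (with --coverage). The .gcda counts file is written when the instrumented program exits normally, much like gmon.out. If the object files are built into a different directory than the sources, plain gcov may not find them. The gcov manual's -o (--object-directory) option tells it where to look. The hypothetical helper below prints the file names gcov would then expect. It assumes the objects land in a separate directory such as obj/emu/gpu, which is an illustrative name, not taken from UniPCemu's makefile:

```c
#include <stdio.h>
#include <string.h>

/* Builds <objdir>/<source basename without extension><ext>, e.g. the
   .gcno/.gcda names looked up by `gcov -o <objdir> <source>`.
   Returns 1 on success, 0 if the result does not fit in out. */
static int gcov_data_path(char *out, size_t outsz, const char *objdir,
                          const char *source, const char *ext)
{
    const char *base = strrchr(source, '/');
    const char *dot;
    size_t stem;
    int n;

    base = base ? base + 1 : source;
    dot = strrchr(base, '.');
    stem = dot ? (size_t)(dot - base) : strlen(base);
    n = snprintf(out, outsz, "%s/%.*s%s", objdir, (int)stem, base, ext);
    return n > 0 && (size_t)n < outsz;
}
```

Checking whether both files exist at those paths shows whether the build or the gcov invocation is at fault.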