It currently updates its state whenever a timeout occurs: when the emulated time (kept as a simple double floating point number of nanoseconds) exceeds a pixel clock's period (e.g. every 1/28Mth of a second at a 28MHz pixel clock, so roughly every 35.7ns of emulated time). It takes the time the CPU emulated (the total CPU clocks executed for the current instruction, converted into nanoseconds) and adds it to its own accumulated remaining time up to that point. Once it detects that the accumulated time equals or exceeds one pixel clock's period (~35.7ns when running at 28MHz), it divides the accumulated time by that period to get the amount of pixel clocks it needs to update, storing the remainder of the division back into the time accumulator for future timing. It then executes that amount of pixel clocks; every pixel clock draws a pixel on the screen when applicable, which depends on the current state of the CRT (active display/overscan/retrace state etc.).
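In pseudo-C, that timing loop boils down to something like this (a minimal sketch; all names and the exact rounding are made up for illustration, not UniPCemu's actual code):

```c
#include <stdint.h>

extern void VGA_clock(void); /* one full pixel clock of work */

static double vga_timepassed = 0.0; /* accumulated, not-yet-rendered time (ns) */
static const double PIXELCLOCK_NS = 1000000000.0 / 28000000.0; /* ~35.7ns at 28MHz */

/* Called after each CPU instruction with the time it took, in nanoseconds. */
void VGA_addtime(double cpu_ns)
{
    vga_timepassed += cpu_ns; /* add the CPU's emulated time to the accumulator */
    if (vga_timepassed >= PIXELCLOCK_NS) /* enough for at least one pixel clock? */
    {
        uint64_t clocks = (uint64_t)(vga_timepassed / PIXELCLOCK_NS); /* whole clocks, dividing once */
        vga_timepassed -= (double)clocks * PIXELCLOCK_NS; /* remainder stays for future timing */
        while (clocks--)
            VGA_clock(); /* each clock still does all of its own work */
    }
}
```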
Although this is relatively CPU-heavy to execute, it isn't the part that takes the most time (according to the Visual Studio profiler): most time is spent updating RAM locations etc., i.e. loading the next data from VRAM (4 bytes of data loaded every character clock, depending on the (S)VGA settings).
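For reference, the per-character-clock fetch that the profiler points at amounts to something like this (a minimal sketch assuming four 64KB planes; the real addressing depends on the CRTC/sequencer state and the (S)VGA settings):

```c
#include <stdint.h>

extern uint8_t VRAM[4][65536]; /* four 64KB planes, as on a 256KB VGA */

/* One character clock loads one byte from each of the four planes into the latches. */
void VGA_characterclock_fetch(uint32_t address, uint8_t latches[4])
{
    int plane;
    for (plane = 0; plane < 4; ++plane)
        latches[plane] = VRAM[plane][address & 0xFFFF]; /* 4 bytes per character clock */
}
```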
Is there any way to speed it up? Or is that practically impossible without breaking accuracy?
Btw, I've noticed there's a discussion out here that says there are no CRT emulators out there, but isn't a CRT only a simple beam tracing from left to right, top to bottom on the screen, drawing pixels at a specified speed? So essentially any emulator creating its display this way is a 'CRT emulator'? Although the way separate pixels are handled is kind of simplified: on a real monitor, sets of 3 red/green/blue 'pixels' (at different angles, depending on the monitor) light up at different strengths to form a pixel, while most emulators (including mine) only show the result of those 3 R/G/B pixels becoming one pixel. I've yet to see any emulator going as far as mine, though, trying to accurately plot pixels by the clock & pixel (most emulators only draw entire screens or lines directly from VRAM, which may or may not be cycle-accurately timed).
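By 'beam tracing' I just mean something like this minimal sketch (real timing also spends clocks in blanking and retrace, as noted above):

```c
/* The beam walks left to right, top to bottom, one position per pixel clock. */
typedef struct { int x, y; } Beam;

void beam_advance(Beam *b, int htotal, int vtotal)
{
    if (++b->x >= htotal) /* end of scanline reached: horizontal retrace */
    {
        b->x = 0;
        if (++b->y >= vtotal) /* bottom reached: vertical retrace, next frame */
            b->y = 0;
    }
    /* whether a pixel actually gets drawn at (x,y) depends on the CRT state:
       active display vs. overscan/blanking/retrace, as described above */
}
```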
Btw, the only parts where my emulator does whole screens and lines at a time are converting entire data lines to RGB display (only with CGA/MDA) and scaling screens to the display every frame. The scaling is handled by UniPCemu's GPU core, which converts the rendered screens, entire screens at a time, into the correct display resolution (what normally happens on real hardware: the monitor itself stretches the rendered display across the screen, depending on the retrace signals). So essentially the VGA (and its CRTC emulation) contains the RAM logic and rendering logic (rendering the (S)VGA/CGA/MDA RAM into pixels at a specified dot clock rate), while the GPU core processes those frames at every vertical retrace and converts them into the display the user sees, at the proper aspect ratio and resolution. That resolution depends on the emulated monitor and settings: in this case a CGA display (a special custom resolution found in one of Reenigne's blog articles), 800x600 (VGA), 1024x768 (SVGA) or 1920x1080 (SVGA). The virtual display (the one the VGA is rendering to) is currently limited to 2048x2048 pixels, which is more than enough to contain frames up to full HD, as rendered by the current (S)VGA rendering.
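So the hand-off between the two cores at every vertical retrace is essentially just a stretch of the rendered area to the output resolution. A nearest-neighbour sketch of that scaling step (buffer name and signature are just for illustration, not the actual GPU core code):

```c
#include <stdint.h>

#define VIRTUAL_W 2048 /* the VGA-side render target is capped at 2048x2048 */
#define VIRTUAL_H 2048
extern uint32_t renderbuffer[VIRTUAL_H][VIRTUAL_W]; /* filled at the dot clock rate */

/* At every vertical retrace: stretch the src_w x src_h rendered area to the
   output resolution, like a monitor stretching the image across the screen. */
void GPU_present(uint32_t *out, int out_w, int out_h, int src_w, int src_h)
{
    int x, y;
    for (y = 0; y < out_h; ++y)
        for (x = 0; x < out_w; ++x) /* nearest-neighbour sampling of the source */
            out[(y * out_w) + x] = renderbuffer[(y * src_h) / out_h][(x * src_w) / out_w];
}
```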
The rendering process runs at a constant rate (up to 28M pixels each second when using the VGA; higher rates can be selected on the Tseng video cards). It essentially draws blocks of pixels (or moves the beam) at that rate, although it will move small blocks when the CPU is running slower than the (S)VGA (e.g. if the CPU executes an instruction whose time is a multiple of the VGA clock, it will spend that many VGA clocks in a loop before executing the next instruction, only performing the division once). Each clock is still handled separately, though: each clock does all the work the VGA needs to do that clock, updating states etc., before moving on to the next clock.
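So a single clock, handled separately, looks something like this (all helper names are hypothetical stand-ins for the state updates described above):

```c
/* One full pixel clock: all of this clock's work completes before the next. */
extern void CRTC_tick(void);           /* advance counters, retrace/blank flags */
extern int  CRTC_characterclock(void); /* time to reload the VRAM latches? */
extern int  CRT_activedisplay(void);   /* is the beam inside the active area? */
extern void VGA_fetch_characterclock(void);
extern void VGA_plot_pixel(void);

void VGA_clock(void)
{
    CRTC_tick();                       /* update timing state for this clock */
    if (CRTC_characterclock())
        VGA_fetch_characterclock();    /* the VRAM load the profiler points at */
    if (CRT_activedisplay())
        VGA_plot_pixel();              /* otherwise: overscan/blanking/retrace */
}
```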