To understand what's going on here, you really need to understand how the CGA card tells the monitor the size of the active image. The answer is: it doesn't. The only information the CGA card sends the monitor is the picture signal itself (either composite or RGBI) and the sync pulses. So as long as the sync pulses are in the right places, the monitor doesn't care whether the CRTC's idea of the image size is 2 scanlines or 262. On receiving a vertical sync pulse, a real CRT won't necessarily move the beam to the top of the screen immediately - the pulse only has an effect if the beam is close to the place where the vertical retrace would be happening anyway. So the monitor has its own idea about the image size. In the NTSC standard this is defined by the sync frequencies - 15.734kHz horizontal, 59.94Hz vertical (with a certain tolerance) - which works out at 910 hdots by 262.5 scanlines.
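As a quick back-of-envelope check of that arithmetic (assuming the usual 14.318MHz CGA hdot clock, i.e. four times the 3.579545MHz NTSC colour carrier):

    #include <stdio.h>

    int main(void)
    {
        double hdot_clock = 4.0 * 3579545.0;  /* CGA hdot clock, Hz */
        double hsync_freq = 15734.0;          /* NTSC horizontal sync, Hz */
        double vsync_freq = 59.94;            /* NTSC vertical sync, Hz */
        printf("hdots per scanline:  %.1f\n", hdot_clock / hsync_freq); /* ~910 */
        printf("scanlines per field: %.1f\n", hsync_freq / vsync_freq); /* ~262.5 */
        return 0;
    }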
The CGA card doesn't generate a signal with precisely these timings, though - it generates one that is 912 hdots by 262 scanlines. However, all real NTSC CRTs have enough tolerance in their sync frequencies to display such an image correctly. So, to emulate the machine properly you need another set of horizontal and vertical counters, separate from the ones in the CRTC; these counters form part of the emulated CRT. If you get a horizontal sync pulse between hdots 902 and 918 then act on it, and likewise for a vertical sync pulse between scanlines 248 and 276. If the beam gets to the end of either window without seeing a pulse, do the flyback anyway (resetting the counter to zero). These tolerances were measured from an IBM 5153.
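A minimal sketch (in C) of what that emulated-monitor logic could look like - the structure and names are mine, only the window constants come from the measurements above, and a real emulator would of course also track the beam position for rendering:

    /* Emulated-monitor beam counters, separate from the CRTC's own counters. */
    typedef struct {
        int hdot;      /* position within the current scanline */
        int scanline;  /* position within the current field */
    } Monitor;

    enum {
        HSYNC_MIN = 902, HSYNC_MAX = 918,  /* hdots where hsync is honoured */
        VSYNC_MIN = 248, VSYNC_MAX = 276   /* scanlines where vsync is honoured */
    };

    /* Call once per hdot with the sync levels currently on the CGA's output.
       (For simplicity this only samples vsync at the end of each scanline,
       which is close enough for a sketch.) */
    void monitor_tick(Monitor *m, int hsync_pulse, int vsync_pulse)
    {
        ++m->hdot;
        /* Honour hsync inside the tolerance window; past the end of the
           window, do the horizontal flyback anyway. */
        if ((hsync_pulse && m->hdot >= HSYNC_MIN) || m->hdot > HSYNC_MAX) {
            m->hdot = 0;
            ++m->scanline;
            /* Same idea vertically: act on vsync inside the window, or give
               up and do the vertical flyback at the end of the window. */
            if ((vsync_pulse && m->scanline >= VSYNC_MIN)
                || m->scanline > VSYNC_MAX) {
                m->scanline = 0;
            }
        }
    }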
For the 1K colour mode specifically, the 2-scanlines-per-CRTC-frame trick works the way superfury said: the end-of-frame happens before the vertical sync position, so the CRTC restarts its "frame" (including resetting the memory counter to the start address) but the monitor doesn't. On the last row the CRTC timings are changed so that a real sync pulse happens in the right place.
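Sketched in the same style, here's how the CRTC side of that interacts with the emulated monitor above. This is only my guess at how you'd model it - R4=0 (one character row per frame) and R9=1 (two scanlines per row) would give the 2-scanline frame, and the field names are mine:

    typedef struct {
        int row_scan;        /* scanline within the current character row */
        int row;             /* character row within the CRTC "frame" */
        int ma;              /* memory address counter */
        int max_scanline;    /* R9: 1 -> 2 scanlines per row */
        int vertical_total;  /* R4: 0 -> 1 row per "frame" */
        int start_address;   /* R12/R13 */
    } Crtc;

    /* Called at the end of each emulated scanline; with the values above,
       the CRTC frame wraps every 2 scanlines. */
    void crtc_end_of_scanline(Crtc *c)
    {
        if (++c->row_scan > c->max_scanline) {
            c->row_scan = 0;
            if (++c->row > c->vertical_total) {
                c->row = 0;
                c->ma = c->start_address;  /* frame restart reloads the
                                              memory counter... */
            }
        }
        /* ...but the vertical sync position is never reached, so no vsync
           pulse goes out and the emulated monitor just keeps counting
           scanlines. On the last row the CRTC is reprogrammed so a real
           vsync pulse lands inside the monitor's 248-276 window. */
    }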