Put simply (copy pasta of some oldies elsewhere)
1280x1024 is based on nice power-of-two values (like 1024x768), which makes it easy to handle and maximizes the use of RAM: it exactly fills 1.25 MiB in 8-bit mode and 2.5 MiB in 16-bit mode (*1). Both sizes could conveniently be fitted to the card as a series of 5 RAM banks. In addition, it's worth considering that later single-chip VGA designs were usually fully programmable, and thus able to offer almost any resolution (within the limits of their pixel clock, that is).
And yes, video cards with exactly 2.25 MiB (1024x768, which needs that much at 24 bits per pixel) and 2.5 MiB did exist.
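As a quick check of the memory figures above, here is a minimal sketch of the arithmetic. It assumes packed pixels with no padding between lines. The bank sizes of 256 KiB and 512 KiB are my own illustrative assumption for how "5 RAM banks" could add up, and reading the 2.25 MiB figure as 1024x768 at 24 bits per pixel is likewise an interpretation, not something stated by the card makers.

```python
MIB = 1024 * 1024  # bytes per MiB
KIB = 1024         # bytes per KiB

def framebuffer_mib(width, height, bits_per_pixel):
    """Size of a packed-pixel framebuffer in MiB (no line padding assumed)."""
    return width * height * bits_per_pixel / 8 / MIB

print(framebuffer_mib(1280, 1024, 8))    # 1.25
print(framebuffer_mib(1280, 1024, 16))   # 2.5
print(framebuffer_mib(1024, 768, 24))    # 2.25

# Hypothetical bank sizes: five equal banks covering each 1280x1024 mode.
print(5 * 256 * KIB / MIB)               # 1.25
print(5 * 512 * KIB / MIB)               # 2.5
```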
Some years later 1280x1024 got a revival when the upcoming (relatively) low-cost LCD manufacturing processes moved past 1024x768 for 15" panels.
The concept of a 5:4 aspect ratio and 1280x1024 graphics coordinates, however, is actually much older than you might think.
Many non-IBM workstations with fixed-frequency screens used 1280x1024 (back in the 1980s) for the reasons given above.
In the consumer space…
It dates back at least to the BBC Micro introduced at the end of 1981; it had graphics modes designed around the PAL TVs used in the UK, with 160x256, 320x256, and 640x256 modes, all with a standard coordinate system of 1280x1024 for easy graphics programming. At 8x8 character size, this allowed an 80x32 text display on affordable hardware at home, better than many of the cheaper dumb terminals. The Acorn Archimedes, which succeeded the BBC Micro in the late 1980s, extended this capability to 640x512 with PAL TV timings, as well as supporting VGA/SVGA resolutions when connected to a PC-type monitor.
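To illustrate why a fixed 1280x1024 coordinate system is convenient across those modes, here is a small sketch. The function name `to_pixel` is hypothetical, and the code only shows the integer scaling; it does not model where the origin sits or how the real firmware plotted points.

```python
LOGICAL_W, LOGICAL_H = 1280, 1024  # the shared logical coordinate space

def to_pixel(x, y, mode_width, mode_height=256):
    """Scale a logical (x, y) coordinate to a (column, row) pixel index."""
    x_step = LOGICAL_W // mode_width    # 8, 4 or 2 logical units per pixel
    y_step = LOGICAL_H // mode_height   # 4 logical units per pixel row
    return x // x_step, y // y_step

# The same logical point lands mid-screen in every mode.
for width in (160, 320, 640):
    print(width, to_pixel(640, 512, width))

# Text grid for 8x8 characters in the 640x256 mode.
print(640 // 8, 256 // 8)  # 80 32
```

Because every mode width divides 1280 exactly, a program can draw in logical units and get the same picture at whatever resolution is selected.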
These resolutions were very easy to implement on PAL, using a 16MHz master dot clock, since the time between horizontal sync pulses is exactly 64µs, and there are slightly more than 512 lines (divided between the two interlaced fields) in the display-safe area. This relatively high level of capability was used by the BBC to generate broadcast TV graphics during the early to mid 1980s.
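The timing claim can be sanity-checked with a little arithmetic. This sketch uses only the 16 MHz dot clock and the 64µs line period given above. The assumption that the 320- and 160-pixel modes divide the master clock by 2 and 4 is mine, added for illustration.

```python
DOT_CLOCK_MHZ = 16   # master dot clock, from the text
LINE_PERIOD_US = 64  # time between horizontal sync pulses on PAL

# Dot clocks available per scan line, including blanking and sync.
clocks_per_line = DOT_CLOCK_MHZ * LINE_PERIOD_US
print(clocks_per_line)  # 1024

def active_time_us(pixels, clock_divider=1):
    """Time taken to scan out one line of pixels at the divided dot clock."""
    return pixels * clock_divider / DOT_CLOCK_MHZ

# Assumed dividers: each mode occupies the same portion of the line.
for pixels, divider in ((640, 1), (320, 2), (160, 4)):
    print(pixels, active_time_us(pixels, divider))  # 40.0 each
```

So a whole line is exactly 1024 dot clocks, and under the assumed dividers all three modes use the same 40µs of it, leaving the rest for borders, blanking and sync.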
By 1984, early SGI IRIS workstations supported high-resolution graphics, with - in particular - 1024 rows of pixels:
"The IRIS 1400 comes standard with 1.5 MB of CPU memory, 8 bit-planes of 1024x1024 image memory."
A section of the framebuffer could be selected for display output, the size of that section depending on the output device's capabilities.
Apple introduced what was then considered a very high-resolution monochrome display in 1989, supporting 1152x870 resolution (in a roughly 4:3 aspect ratio), a size most likely chosen to just fit in a megabit of RAM. A special modification to the Acorn Archimedes series allowed it to support 1152x896 (close to a 5:4 aspect ratio) on a particular monitor, probably very similar to the one made by Apple; the Archimedes allocated display memory in system DRAM, so it didn't have a hard megabit limit as a Mac's graphics card did.
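A short sketch of the numbers behind those two monochrome modes, assuming one bit per pixel and a binary megabit (1024 x 1024 bits), which is my assumption about what "a megabit" means here:

```python
from fractions import Fraction

MEGABIT = 1024 * 1024  # bits; assumes the binary megabit used for RAM sizes

def mono_bits(width, height):
    """Bits needed for a 1-bit-per-pixel framebuffer."""
    return width * height

for w, h in ((1152, 870), (1152, 896)):
    bits = mono_bits(w, h)
    # Show size, whether it fits in one megabit, and the reduced aspect ratio.
    print(w, h, bits, bits <= MEGABIT, Fraction(w, h), round(w / h, 3))

print(round(4 / 3, 3), round(5 / 4, 3))  # reference ratios: 1.333 1.25
```

1152x870 needs 1,002,240 bits, leaving only about 46,000 bits spare in a megabit, and its ratio of about 1.324 sits just under 4:3. 1152x896 reduces to exactly 9:7 (about 1.286), which lies between 4:3 and 5:4.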
As the availability of fast and affordable memory became less of a restriction on graphics capabilities in the 1990s, it is notable that 1280x1024 with a 5:4 aspect ratio was specifically catered for by high-end monitor vendors. Moreover, if three bytes per pixel are used to support 24-bit truecolour, this is a resolution that fits comfortably in a comparatively affordable 4 MB of VRAM (it needs 3.75 MiB). CRTs could easily be built this way, as the natural shape of a glass tube is circular, so the squarer the aspect ratio, the easier the CRT was to make. This also did not restrict the display from handling 4:3 aspect ratios cleanly, just leaving a slightly different pattern of blank borders at the edges. Once 1280x1024 was established, LCD monitors for computer use were made to support it (in contrast to those for televisions).
The slightly taller aspect ratio is useful in text modes, where programmers appreciate having more lines of code on screen more than they do having more columns, and also in desktop environments where menus and toolbars have a habit of consuming vertical space more often than horizontal. The present trend towards wider aspect ratios, by contrast, is driven by the movie and gaming industries which want to cater for human peripheral vision, and thus immersion in the scene, rather than for display of information.