I would have preferred 75% allocated for regular usage and 25% set aside for other things.
The problem here is that your analysis is flawed, not the design, no offence. The "available" memory is simply a (largely) read-only copy of what is on disk, so it can be safely overwritten without extra cost. I.e. it's the difference between overwriting 0xabcdef and changing the page index from cached>application, or overwriting 0x0000 and changing the index from free>application. Given the choice between your RAM being filled with zeros or filled with something I might
need later, I know which I would choose. OK, some RAM (only new allocations, not page-ins or code/data loads) needs to be zeroed (for security) before handing it to an application, but most other OSes do this on demand. Windows NT does in fact use idle CPU to pre-zero free memory, but unless all CPU cores are pegged at 100% you don't need more than a few MB buffered for this. Also, since you have a large cache, the NT kernel can delay memory writes until the disks are idle (lazy writes), or even opportunistically pre-write in-memory data that's still in use rather than get in the way of reads.
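To make the bookkeeping argument concrete, here's a toy Python model of the page lists. The class and list names are mine for illustration, not NT's actual internals; the point is that repurposing a cached ("standby") page for an application costs the same single list move as handing out a free page, so a full cache is never a penalty:

```python
# Toy model of OS physical-page lists (names are illustrative, not NT's
# real data structures). Repurposing a cached page and allocating a free
# page are both just a list move, so cached data costs nothing to keep.

class PageLists:
    def __init__(self, n_pages):
        # Every physical page starts on the free list.
        self.free = list(range(n_pages))   # zero-filled, ready to hand out
        self.standby = []                  # read-only cache of disk contents
        self.active = []                   # owned by some application

    def cache_from_disk(self):
        """Idle-time prefetch: fill a free page with data read from disk."""
        self._move(self.free, self.standby)

    def allocate(self):
        """Give a page to an application; prefer the free list, otherwise
        repurpose the oldest standby page. Same cost either way: the
        cached contents are a read-only copy of disk, so they can
        simply be overwritten."""
        src = self.free if self.free else self.standby
        return self._move(src, self.active)

    def _move(self, src, dst):
        page = src.pop(0)
        dst.append(page)
        return page

pages = PageLists(4)
for _ in range(4):
    pages.cache_from_disk()          # superfetch-style: cache is now full
page = pages.allocate()              # app asks for RAM: standby page reused
print(page, len(pages.standby))      # → 0 3
```

A real MM also tracks dirty pages and per-page reference data, but the free-vs-cached symmetry above is the core of the argument.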
The other point people are unaware of is that this is EXACTLY the same as XP (+prefetcher), Win2000, NT4.0, NT3.51 and of course VMS (thank you, Mr. Cutler). You could achieve the same thing on prior versions by simply loading and then closing all your most frequently accessed applications in the morning. The problem is that on modern PCs (with bags of RAM) it takes a long time to warm up (i.e. fill) the cache, and an empty cache is a useless cache, so why not fill it in idle time? In fact the only significant change in Vista's memory manager is the addition of memory priority, so it can give preference in the order OS > applications > actively cached stuff > superfetched stuff, which is a big improvement.
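The memory-priority idea reduces to a simple reclaim policy, sketched below in Python. The priority values and names are invented for illustration (they are not Vista's actual numbers); the point is only the ordering: when a page must be reclaimed, the lowest-priority one goes first, so speculative superfetched data is sacrificed before anything an application actually owns:

```python
# Sketch of priority-ordered page reclaim. Priority values are invented
# for illustration; higher number = more important, so superfetched
# pages are the first to be evicted under memory pressure.
OS, APPLICATION, CACHED, SUPERFETCHED = 4, 3, 2, 1

def evict(pages):
    """Remove and return the lowest-priority page from `pages`,
    a list of (priority, name) tuples."""
    victim = min(pages, key=lambda p: p[0])
    pages.remove(victim)
    return victim

ram = [(OS, "kernel"), (APPLICATION, "firefox"),
       (CACHED, "recently_read_file"), (SUPERFETCHED, "guessed_app")]
print(evict(ram))   # → (1, 'guessed_app'): speculative data goes first
print(evict(ram))   # → (2, 'recently_read_file')
```

This is why superfetch can fill all otherwise-idle RAM without hurting anything: its pages lose every contest for memory.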
OK, back on topic: the other big change in 64-bit is the change in OS calling conventions, since the extra registers allow more function parameters to be passed in registers rather than on the stack. Also (with reference to Windows) a totally different exception-handling mechanism is used, i.e. read-only compiler-generated stack unwind data plus fixed prolog/epilog styles rather than (easily exploited) exception handler chains. This then makes writing ASM for inner-loop optimization in applications very different. However, while the extra memory space and the extra registers are very useful (8 is not really enough), the increase from 32 to 64 bits really doesn't help 90%+ of algorithms, as evidenced by most other architectures (e.g. PPC) usually getting slightly slower (5-10%) when moving from 32- to 64-bit code due to extra pressure on the code/data cache. x86 is unusual in this regard as it usually gets about 5-10% faster in 64-bit mode, largely due to the extra GPRs, even though most 64-bit code actually uses only 32-bit operands. That is not to say that the odd bit of code cannot really benefit from 64-bit. For example, SHA512 is much (~x2) faster.
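The SHA512 point comes down to operand width: its compression function is built from 64-bit rotates, shifts and adds, each of which is a single instruction on a 64-bit CPU but must be synthesised from pairs of 32-bit operations on a 32-bit one. A minimal Python sketch of the core operation (the masking emulates wraparound in a 64-bit register; the Σ0 constants are from FIPS 180-4):

```python
# The workhorse of SHA-512: a 64-bit rotate-right. One instruction on a
# 64-bit CPU, but several 32-bit ops when emulated on a 32-bit machine,
# which is roughly where the ~2x speedup comes from.
MASK64 = (1 << 64) - 1

def rotr64(x, n):
    """Rotate a 64-bit word right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64

def big_sigma0(x):
    """SHA-512's Sigma-0 mixing function (rotation amounts per FIPS 180-4)."""
    return rotr64(x, 28) ^ rotr64(x, 34) ^ rotr64(x, 39)

print(hex(rotr64(0x1, 1)))   # → 0x8000000000000000
```

SHA-256, by contrast, uses 32-bit words throughout, which is why it doesn't see the same jump when moving to 64-bit mode.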