Ned's BigFaT Blog!

November 26, 2009

Suh-weet

Filed under: Uncategorized — makfu @ 9:30 pm

I have had an on again, off again relationship with Firefox for a long time. I love the extensions and the superfast JavaScript engine, but I wish Firefox would implement some Windows Vista/7 specific security features, like sandboxing via Mandatory Integrity Control (especially given how potentially dangerous the extensions architecture is).

All that aside, one recent development that blew my socks off was the announcement that Mozilla’s folks have been working on Direct2D based rendering. This is something the IE team announced at PDC, but one day later the dev heading up the D2D effort for Mozilla actually posted a working build. And does it work! Here is a browser that finally makes Facebook’s rendering time not suck and scrolling smooth as silk, not to mention that the text rendering, especially when zooming, is gorgeous (vertically and horizontally sub-pixel interpolated using D2D-implemented ClearType). Impressive to say the least.

What is interesting though is that between IE and Firefox, the two most popular browsers on the planet are moving to D2D based rendering which, frankly, is surprising to me. Getting people to adopt more flexible 3D-derived APIs, such as WPF, that can take advantage of modern GPU hardware has proven to be difficult; love it or hate it, GDI has had a lock on most general purpose applications for a long time (mainly because, even without true GPU acceleration, it is fast). Yet, with D2D, we are finally seeing some concrete uptake of a GPU accelerated API for something other than games. This is exciting stuff and it will be interesting to see what other GDI applications make the move to D2D in the near future.

To check out more on the D2D enabled Firefox alpha and to download working bits, visit Bas Schouten’s blog by clicking here. For more information on D2D, visit Tom Olsen’s blog here.

Oh thank God…

Filed under: Uncategorized — makfu @ 1:49 am

I thought I was the only person on earth calling all this use of the term “cloud” BS. Well Larry is here to save my sanity:

November 23, 2009

I have seen the Future and it is full of FAIL.

Filed under: Uncategorized — makfu @ 12:18 pm

There has been a lot written about Google’s ChromeOS over the last 4 days or so. Every paid and unpaid blogosphere new/old media hack has some opinion about Google’s second major foray into the Operating System space. Some have hailed it as the “future” and described it as “a strike at the heart of Microsoft”. Others have greeted it with considerable suspicion, seeing it as an attempt by Google to gain control over a wide range of the computing experience. I, and a few others, take a slightly different view: ChromeOS is junk. Yeah, I said it. Garbage.

Let’s just call out ChromeOS for what it is – a locked down appliance implementation of Debian (with a limited subset of the usual Linux underpinnings including Xorg), running an SELinux enabled kernel, a least privileged user session and Linux Chrome. There really isn’t anything interesting or novel about this beyond the fact that Google has convinced so many gullible people that this is something new and better.

Their plan: you get to run their browser and your apps are essentially JavaScript based and rendered using HTML. You get a profile that is part of the local system (located, predictably, under /home) but most of your data is supposed to live up on Google’s servers (or someone else’s web app server – I am done calling this crap “The Cloud”). Want to run Firefox? Not gonna happen. Want to install a native rich client application? Denied. It’s Chrome or nothing.

If you like the sound of this rather Orwellian computing model then all of this might be fine if Google didn’t try to ram the full screen tabbed browser interface model down the throats of users. Don’t get me wrong, Chrome is an excellent browser (among the best out there on Windows), but to think that it alone can replace the conventional desktop, window and taskbar/dock metaphor is beyond ridiculous. Google: there is a reason that Windows, MacOS and Linux continue to use the same fundamental UI building blocks: THEY WORK!

I spent a number of hours looking at the source and playing with ChromeOS in a VM this weekend, so let’s take a look at a few screenshots:

login

Above is the ChromeOS logon screen; you use your Google ID to login and as of this build there is no offline credential caching, though I would think at some point this functionality will be added.

startpage

Once logged on, this is what you see. A Chrome browser instance – it can’t be resized, minimized and there is absolutely no support for overlapping or tiled browser windows (at least not in this build). If you want to simulate this experience, take any current browser and run it in full screen mode as that is essentially the UI of ChromeOS.

googledocs

The above screenshot illustrates what a typical workload in ChromeOS might look like – Google Docs,  YouTube and various websites, all organized and presented in tabbed form (again, just like a fullscreen browser session on any other OS).

filesystem

Here is where things get a little more interesting; as of this build, you can still browse the file system from within Chrome (actually, due to the obvious haste in which this was thrown together, conventional open/file dialog boxes are still present also). From here you can observe that ChromeOS is a pretty conventional Linux distro, albeit a very locked down and rudimentary one.

controlpanel

Above is the only OS control panel item currently implemented. Granted, Google’s stated goal of the OS only running on locked-down, unmodifiable configurations limits the number of options they need to provide access to, but I suspect a lot more will still need to be done on this front.

taskmanager

The Chrome Task Manager is essentially the same as that found on other Chrome implementations.

aboutmemory

Above we see the About:Memory information provided by Chrome. It’s interesting to note the reference to other browsers that is still present, despite the impossibility of running anything other than Chrome (I’d assume Google will remove this reference in future builds).

aboutos

In the About: info are some details regarding the underlying software stack. Nothing particularly interesting here.

DeveloperTools

The screenshot above really illustrates the failings of the Chrome UI model. Opening the excellent built-in Chrome Developer tools, you are forced into this horrendous tiled mode. By forcing you to live without conventional overlapping windows it’s as if we have stepped back 25 years to Windows 1.01:

win101

It was a joke in 1985 and it’s borderline insanity two and a half decades later.

DevToolsChromeOS-Windows

In contrast, a conventional modern OS’s windowing model simply makes more sense, which is why nobody (who isn’t pushing a web or nothing approach) has changed the model. Could the conventional UI be replaced with something better? Certainly, but it will have to be something that is actually more usable, powerful and intuitive – not the overloading of a single application metaphor – the web browser – for the purposes of forwarding a single company’s business agenda.

When all is said and done, I suspect that ChromeOS will fail spectacularly, despite the cheerleading from the Tech Media. I don’t say this just because I am philosophically (and vehemently) opposed to the web or nothing “in the cloud” model that some champion, I say this because I doubt anyone wants a netbook (or notebook, desktop, etc.) that is so completely limited. The proof is in the pudding – people don’t want less for less. The netbook market’s biggest issues have stemmed from people expecting more for less; consumers mistakenly thinking they were getting more computer than they were with their netbook purchase. When we look at the dissatisfaction stories about Win7 on netbooks, it stems from wanting a MORE full featured version of Windows than Starter, not something more restrictive, smaller or featureless. The end result is that the stripped down Linux netbook implementations have fallen by the wayside and netbooks are now being scaled up to look more like the ultraportable, high-feature notebooks people thought they were all along.

Most telling of all is that Google’s own Android is a far more capable platform and will likely remain that way for some time. I don’t care how much polish Google puts on this one, once the media Googlegazm and hype subsides, nobody will buy this thing. Period.

October 20, 2009

Netbooks are cool

Filed under: Uncategorized — makfu @ 7:55 am

I am not a big fan of Notebook computers, especially the latest and greatest over-engineered monster desktop replacements. There are several big problems with these monster mobile machines:

  • They cost considerably more than a faster desktop equivalent
  • They can’t be upgraded in the same fashion as a desktop
  • They tend to be remarkably overloaded with craplets required to make all the value-add functionality work (shortcut buttons and other doohickeys)
  • The most powerful machines run hotter than the surface of the sun making them unpleasant to use on your lap

As a result, I much prefer to work with my desktop for heavy lifting. The Nvidia GTX285 video subsystem is massively more powerful than anything available in a notebook, it has RAID 0, 2.7TB of SATA storage and 8GB of RAM and quad cores, not to mention a 24inch 1920×1600 display that not even the most ridiculously oversized “laptop” can come close to matching. It also happens to be (literally) 100% reliable, never having once bluescreened or suffered a forced reboot since I built the machine.

Which brings me to my current mobile solution: a Samsung NC10 with 1GB of RAM, a 2GB SD card for ReadyBoost, 160GB 4200RPM HDD, 1.6GHz N270 Intel Atom single core CPU (HT enabled). To me, this netbook is the perfect desktop accessory and does everything a mobile computer should do:

  • It’s totally reliable (having never crashed or hung running both Win7 RTM and RC)
  • Has only minimal craplet overhead
  • Is only 2lbs or so, has a great 93% keyboard and a brilliant 1024×600 LED backlit display
  • Runs Win7 RTM great, including Microsoft Security Essentials, Aero, 480p (720×480) video, real office apps concurrently and a lot more
  • Gets SIX hours + of battery life (more if I run the machine in the Win7 power-saver power profile)
  • Can sit on your lap without potentially damaging your procreative prospects (runs cool)

So, if you have a big honking desktop and use it to do most of your heavy work (virtualization, development, media, content creation and gaming), then I’d say the netbook, not some overcooked notebook, is the right machine for your mobile needs.

 

Windows 7 Ultimate on a Samsung NC10 running Zune 4.0 (playing 720×480p), along with multiple Office apps, browsers, Skype, Steam and Microsoft Security Essentials + Bitlocker.

netbook

August 10, 2009

I’m back and feeling fluffy

Filed under: Uncategorized — makfu @ 9:52 pm

Cloud computing. One hears this term everywhere. Every application will soon be a cloud-based web application. No more desktop applications. No more Office or Photoshop for Windows or MacOS, they will just all be web apps that live in the magical cloud. This will be a world of Google ChromeOS, Azure, App Engine and Google Docs. I call BS on this nonsense.

The first, and most obvious, question is why? Why web-apps and cloud computing at all? There are a couple of different answers to this, but great user experience is not at the top of the list. The web-app world is one of lowest common denominators; when you need to build an application quickly with minimal development overhead, web-apps and cloud computing infrastructure work great. Web apps are simple, cheap and provide for ubiquity of access. A great example is FaceBook; you can use it from any OS running a reasonably capable browser, and building a similar service using rich-client technology made no sense when the service was in its incubation period.

FaceBook is also a great example of a service that, from a purely user-experience standpoint, would probably be substantially better with a rich-client frontend (imagine having true drag and drop, being able to reorganize lists as you see fit and not being subject to random and inexplicable UI changes made by FaceBook’s devs). So inadequate is the current web-based user experience with FaceBook that at least 4 projects I know of are working on providing a rich-client frontend to FaceBook.

In cases where an application or service is built only as a web-app or cloud service, it is usually serving, first and foremost, the interests of someone other than the user. When we hear Google, IBM or Oracle (or just about anyone else) talk about applications in the cloud, it is to serve their interests of promoting anything that deemphasizes a competitor or an entrenched technology and provides them a method to lock customers to their platform. When we look at services like FaceBook, it’s to simplify development costs and provide global ubiquitous access. Even in the case of Microsoft’s SharePoint, it was developed (originally) as a web application due to the comparatively low-cost of development.

That doesn’t make any of this great or even good, from a user experience standpoint, and the proof is in the pudding: most web apps are a usability nightmare and the general interweb experience today is one of intrusive and resource sucking Flash ads combined with Flash and Silverlight media players (coded as native plug-ins, of course) and totally inconsistent user interfaces.

So why would I, as a user, want to give up my rich client applications in favor of a web/cloud replacement? Price. Advertising supported services don’t cost a dime out of pocket. Of course, we are back to annoying advertising, the outages, the lousy user experience and, generally, a discussion of lowest common denominator. This is why rich-client applications aren’t going away: if you need a really outstanding email and calendaring solution, there is no web-app today (and likely in the future) that can touch Outlook. You want the best media players, the best games, the best spreadsheet, the best CAD package, the best photo editing solution, then native apps are where you end up. Even if you could implement these apps well in a browser, you would simply end up with the browser becoming just another container acting as a virtualization layer.

If you want proof of the importance or continued success of rich-client applications, look no further than the iPhone. The iPhone should be a poster-boy for the always on, web connected, post-desktop cloud computing world. But the iPhone SDK looks shockingly like a conventional desktop OS SDK. This is because it is, and for good reason: a true (non active-x, flash, Silverlight) browser/web app can’t and won’t ever provide the same level of performance or access to OS provided features. That’s why even Google Earth is a native desktop rich-client application!

Cloud platforms, like Azure and AppEngine, are important as enablers of new back-end scenarios, just as the web browser enabled a whole class of application to exist that might otherwise never have come about. However, those that claim the cloudification or browserfication of everything are selling a world vision to suit their needs or desires, not necessarily those of users. Web and cloud computing are additions to the computing landscape, not replacements for existing technology. The PC didn’t replace the mainframe, SOA didn’t replace n-tier architecture and Cloud Computing is not going to replace the need for conventional infrastructure or desktop applications.

Certainly, classes of desktop applications may go away (quick’n dirty VB6 apps hopefully), but ultimately, computing is an additive world. New ways of doing things are intermingled with old methods, those old methods are renamed with shiny new names and, of course, there is always some displacement. But, as one of my mainframe centric customers loves to say, “we have been doing this cloud crap for 40 years”.

September 25, 2008

The Little CPU That Can

Filed under: Uncategorized — makfu @ 12:41 am

There is just something cool about the little Atom CPU from Intel. Indeed, it’s so neat that I just had to have one. Now, the good news is that building an Atom based box will cost you well under 300 bucks. The other bit of good news is that these Lilliputian CPUs actually make very versatile little computers. After reading Tom’s Hardware review of an Atom 230 based “nettop” machine, I wasn’t very confident in the machine’s ability to run Vista.

Now, granted, Tom’s Hardware sucks (sorry, but it is true) and they tested with Vista RTM rather than with SP1, which is a fairly substantial difference, but clearly there is a disparity between my experience and theirs. In short, Vista SP1 runs surprisingly well on this little machine. This includes driving a 1600×1200 display with Aero Glass enabled and running some reasonably demanding apps, like Outlook 2007. It does flash animations and media playback just fine and is good for DVD playback also. Overall system responsiveness is surprisingly quick (much faster than I expected).

I plan on making a video so people can gauge performance of this machine for themselves, though I think people will be generally pretty impressed.

If you want to build a machine like mine, you will need the following parts from NewEgg:

  • 1 x MB INTEL BOXD945GCLF 945GC RT (Retail): $69.99
  • 1 x HD 160G|WD 7K 8M SATA2 WD1600AAJS (OEM): $41.99
  • 1 x MEM 2G|KST DII667 KVR667D2N5/2G R (Retail): $32.99
  • 1 x CASE INWIN|BM639 BK/SIL RT (Retail): $94.99
  • 1 x DVD BURN LITE-ON|DH-20A4P-08 20X R (Retail): $24.99

Total: $264.95

Now, at the time I ordered mine, NewEgg was running a combo discount of $24.99 on the InWin case with the Lite-On DVD drive, essentially giving me the DVD drive for free, so my total was 25 bucks cheaper than perhaps what you might get today (though NewEgg is always running combo deals, so you might actually get the machine cheaper).
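For anyone who wants to check the math on the build, here is a quick sketch in Python. The prices are copied from the parts list above; the short part labels are my own shorthand, not NewEgg's listing names.

```python
from decimal import Decimal

# Prices copied from the parts list (US dollars). Labels are shorthand.
PARTS = {
    "Intel BOXD945GCLF motherboard": Decimal("69.99"),
    "WD1600AAJS 160GB SATA hard drive": Decimal("41.99"),
    "KVR667D2N5/2G 2GB memory": Decimal("32.99"),
    "InWin BM639 case": Decimal("94.99"),
    "Lite-On DH-20A4P-08 DVD burner": Decimal("24.99"),
}

# The combo discount happened to equal the price of the DVD drive.
COMBO_DISCOUNT = Decimal("24.99")

list_total = sum(PARTS.values(), Decimal("0"))
discounted_total = list_total - COMBO_DISCOUNT

print(f"List total:      ${list_total}")
print(f"With combo deal: ${discounted_total}")
```

Decimal is used instead of float so the cents add up exactly.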

Before you buy one of these machines, there are a few caveats:

1. The Intel GMA 950 will do Aero, but because the GMA’s architecture is such that T&L work is actually done on the host CPU, the 950 performs fairly poorly. At high resolutions, such as 1600×1200, the GMA struggles with Aero, and window drags are sluggish and choppy. However, this doesn’t really impact usability, as apps still launch snappily and the GDI client area (the objects inside an app’s window, such as menus, scrolling, etc.) all works smoothly. Intel (or another vendor) would do well to couple the Atom with a proper GPU with full hardware T&L as this would greatly improve the scalability of the platform.

2. The newer Dual Core Atom 330 will be available on the D945GCLF2 motherboard shortly and, frankly, I would recommend waiting for this chip/mobo combo as it should cost about the same and will be quite a bit faster (executing 4 threads on two hyperthreaded cores). However, the cool thing about this class of hardware is that it’s so cheap that buying the new board and swapping out the old one is a sub 100 dollar proposition.

 

So what can you do with this hardware? Well, you could make yourself a little Linux router or web server. It also makes a fine low-end Windows Server 2008 machine (though, obviously the cost there is the server license, unless you are using it for lab purposes). I also happen to think that for a large segment of the population, this machine is a fine solution for basic web browsing and e-mail. I have been using it almost exclusively for this purpose and it has been a real eye-opener to see just how well one can get along with what is, in conventional wisdom, such an underpowered machine.

 

Here is a screenshot of my Atom box in action running Vista x64:

atomscreenshot

April 11, 2008

Vista Performance Recommendations (no funny title for this post)

Filed under: Uncategorized — makfu @ 9:19 pm

So, there is no getting around the fact that Windows Vista is, well, a rather large Operating System and has a bit of a perception problem regarding performance. The real problem is not that Vista is, at heart, terribly bloated, it’s that the system’s default configuration makes assumptions about the awesomeness of your computer that are, in many cases, perhaps a little over optimistic. This is especially true with lower-end mobile computers and compact sub-notebooks.

Specifically, Vista, out of the box, is tuned for a system with a fairly fast disk subsystem and lots of physical memory. This is especially important to note because the disk subsystem actually has more of an impact on the total perceived performance of Vista than potentially any other system component, based on my own informal testing (please note that none of these statements are official canon from, or sanctioned by, Microsoft and are based on my own testing and assertions).

SuperFetch

The first major contributor to potential performance issues on low-end systems is SuperFetch. SuperFetch is a very neat technology that essentially implements a page-prefetching scheme. Put simply, SuperFetch maintains a system-wide profile of the most commonly used groups of pages (memory mapped files such as DLLs and application files, etc.) and then speculatively loads said pages into the system’s standby page list, more commonly called the System Cache (technically, in Windows 6 there are 8 standby lists, structured by priority). This prefetching is ongoing, so as pages of non-allocated (totally unused) memory become available, SuperFetch loads these pages with code/data and places them on the standby list. When those pages are actually needed by a process, such as loading a commonly used data file, or launching an often used application, the pages are already in memory and are simply moved from the standby list to the application’s working set, thus negating the need for disk IO. Furthermore, if you leave applications running, the memory manager may move pages from a process’s working set to the standby list to increase the pool of “available” memory. When you launch a new application, those pages will be dropped from the standby list in order to satisfy the new application’s need for process working set allocation. However, upon exit of the memory hungry application, SuperFetch will begin repopulating the standby list with those commonly used pages so as to improve responsiveness as soon as you switch back to using that commonly used application.

 On systems with large amounts of memory (2GB or more) and fast, low-latency disk systems (5.0 or better disk score), SuperFetch is a real boon to those who often leave large numbers of applications running concurrently and it really does radically improve general system responsiveness for commonly used applications, OS components and access to data files. I can’t stress this enough, if you are running Vista with 2GB or more of memory (especially 4GB or larger 64bit configurations) with a fast disk subsystem (16MB of buffer/cache, 7200RPM, etc.) turning off SuperFetch is actually a bad idea and will lower system responsiveness.

However, if the system in question is a low-memory configuration (1GB or less) and, most importantly, has a slow disk, SuperFetch can have a very negative impact on performance because it does increase background disk IO activity substantially, most notably during startup and especially on computers where the system is under memory pressure, causing pages to be evicted from the standby list but then in short order potentially reloaded, from disk, to the standby list. Vista goes to great lengths to minimize this situation by implementing a set of page priorities, but this solution becomes less effective as the system becomes more constrained from a physical memory standpoint.
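To make the standby list idea a bit more concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not how the Windows memory manager is actually implemented: the class and method names are made up, and the eviction order (oldest page on the lowest priority list goes first) is my assumption.

```python
from collections import OrderedDict

NUM_PRIORITIES = 8  # the 8 prioritized standby lists mentioned above


class StandbyCache:
    """Toy model of prioritized standby lists (not the real memory manager)."""

    def __init__(self, capacity_pages: int) -> None:
        if capacity_pages < 1:
            raise ValueError("capacity_pages must be at least 1")
        self.capacity = capacity_pages
        # One insertion-ordered dict per priority level; index 0 is lowest.
        self.lists = [OrderedDict() for _ in range(NUM_PRIORITIES)]

    def size(self) -> int:
        return sum(len(lst) for lst in self.lists)

    def prefetch(self, page: str, priority: int) -> None:
        """Speculatively cache a page, evicting first if the cache is full."""
        while self.size() >= self.capacity:
            self._evict_one()
        self.lists[priority][page] = True

    def _evict_one(self) -> None:
        # Assumption: drop the oldest page on the lowest non-empty list.
        for lst in self.lists:
            if lst:
                lst.popitem(last=False)
                return

    def fault(self, page: str) -> str:
        """Resolve a page request from standby (no disk IO) or from disk."""
        for lst in self.lists:
            if page in lst:
                del lst[page]  # the page moves to the process working set
                return "standby hit"
        return "disk read"


cache = StandbyCache(capacity_pages=2)
cache.prefetch("outlook.exe", priority=5)
cache.prefetch("rarely_used.dll", priority=1)
cache.prefetch("winword.exe", priority=5)  # cache is full, so something goes
print(cache.fault("outlook.exe"))
print(cache.fault("rarely_used.dll"))
```

With a tiny capacity the low priority page gets pushed out and has to come back from disk, which is the churn that hurts on memory constrained machines.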

Additionally, machines with relatively slow spindle rotation speeds, low areal density and small disk buffer/cache will be impacted by the IO overhead of SuperFetch far more than a system with a fast disk. Essentially, below a 4.9 rated disk on a system with 1GB of memory, SuperFetch’s benefits are (in my testing) usually outweighed by its overhead. Even on systems with 2GB of memory, but a very slow primary disk (4.7 or lower Primary Disk WinSat score), SuperFetch’s IO may negatively impact performance, rather than improve it. Conversely, a 1GB configuration on a system with a fast primary disk may still perform better with SuperFetch enabled. My general rule of thumb though is that if the system has 1GB or less, or if the system has a disk score below 4.8, I disable SuperFetch.
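The rule of thumb boils down to a one-liner. Here it is as a small Python function; the function name and parameters are just for illustration, and the thresholds are the ones from my own informal testing, nothing official.

```python
def should_disable_superfetch(memory_gb: float, disk_score: float) -> bool:
    """Apply the rule of thumb: 1GB or less, or a disk score below 4.8."""
    return memory_gb <= 1.0 or disk_score < 4.8


# A few sample configurations (memory in GB, Primary Disk WinSat score).
for memory_gb, disk_score in [(1, 5.5), (2, 4.7), (4, 5.9)]:
    verdict = "disable" if should_disable_superfetch(memory_gb, disk_score) else "keep"
    print(f"{memory_gb}GB, disk score {disk_score}: {verdict} SuperFetch")
```

Note that the simple rule disables SuperFetch on any 1GB machine, even though a 1GB system with a fast disk may still benefit, so treat it as a starting point.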

Search Indexer, NTFS Disk Defragmenter

Vista implements some impressive new low-level features aimed at making disk IO less impactful with the most important core feature being prioritized IO (you can read more about how IO priority is implemented here: http://technet.microsoft.com/en-us/magazine/cc162494.aspx). With the advent of prioritized IO, Windows now attempts to run a number of background tasks using IO priority below that of standard interactive processes. This is probably one of the most important changes in the Windows 6 codebase and has some really impressive performance benefits.

When Vista runs an NTFS Disk Defrag job, performs indexing for the Windows Search service or runs a Defender scan (or any Vista optimized AV product), the IO for those tasks is done with Background IO priority. This essentially eliminates the foreground impact of low-CPU, but high IO utilizing applications. There is, however, one caveat, and that is if your system has a relatively small disk buffer/cache (8MB or less), this background IO will noticeably increase IO latency for foreground processes by overrunning the disk subsystem’s hardware cache/buffer. The more cache present, the less of an impact background IO will have.
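If the idea of prioritized IO is new to you, here is a toy illustration in Python. It is only a sketch: the real implementation lives in the Windows storage stack and has more priority levels, and the class name and request descriptions are invented for the example.

```python
import heapq
from itertools import count

# Lower number is served first. Only two levels are modelled here.
NORMAL, BACKGROUND = 0, 1


class IoQueue:
    """Toy IO queue that always serves normal requests before background ones."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker keeps FIFO order within a priority

    def submit(self, priority, description):
        heapq.heappush(self._heap, (priority, next(self._seq), description))

    def drain(self):
        """Return request descriptions in the order they would be serviced."""
        order = []
        while self._heap:
            _, _, description = heapq.heappop(self._heap)
            order.append(description)
        return order


queue = IoQueue()
queue.submit(BACKGROUND, "defrag: move cluster")
queue.submit(NORMAL, "outlook: read mailbox")
queue.submit(BACKGROUND, "indexer: scan file")
queue.submit(NORMAL, "browser: load cache")
for request in queue.drain():
    print(request)
```

The foreground requests jump ahead of the defrag and indexer work even though they were submitted later. What the toy does not model is the hardware cache caveat: once requests reach a small disk buffer, the priority ordering can no longer help.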

The most significant case where this becomes a problem is when background IO tasks (such as Indexing or Defrag) occur in tandem with SuperFetch IO. This, on a slow system, leads to pretty significant issues. As stated above, the first recommendation is to turn off SuperFetch on slower, low memory machines. However, if performance during background IO tasks, such as defrag or indexing, is still very poor, then modify the system’s advanced power settings to balanced, which will set the indexer to be less aggressive, or disable Windows Search altogether. Also set the scheduled disk defrag job to only run manually and modify other background applications (such as Defender, AV, etc.) to run on a less aggressive schedule. Note, however, that in general, only the SLOWEST systems (or Virtual Machines) should require this step. I highly recommend turning off SuperFetch as the first step and avoiding changes to the defrag and Windows Search indexer settings, if at all possible, as the loss of functionality and potential long-term performance impact is far greater than the benefit in all but the most extreme cases.

ReadyBoost

ReadyBoost is one of the most poorly understood Windows Vista features. Unfortunately, due to poor initial messaging around what ReadyBoost is and does, it is often misunderstood as a way to increase the available memory on a system. ReadyBoost is, quite simply, a write-through, page-file cache used for random 4k in-page operations by the memory manager.

When you insert a flash device that meets ReadyBoost specifications, you have the option of using the device to cache a portion of the system pagefile. The advantage of doing this is that for scattered IO operations against the pagefile, such as pulling in a random page of memory, a flash device has nearly 0 latency compared to a conventional, spindle-based, hard disk drive. This means that a page can be read back into memory much faster, thus improving system responsiveness, versus getting that page from the primary disk. However, for large sequential (contiguous or large page) operations, a traditional hard drive offers far greater sustained throughput versus a flash device, so those operations will occur against the primary page file. In addition to speeding up random in-page operations, there is a small degree of concurrency that occurs for paging operations (wherein both devices can be servicing IO’s in tandem), which also can positively impact performance.
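The split between what the flash device services and what stays on the hard disk can be sketched like this in Python. This is a simplification of the behavior described above, not the actual ReadyBoost logic; the function name, parameters and the exact size cutoff are assumptions for illustration.

```python
RANDOM_READ_MAX_BYTES = 4 * 1024  # the random 4k in-page operations noted above


def choose_device(size_bytes: int, sequential: bool, in_flash_cache: bool) -> str:
    """Pick which device services a pagefile read in this toy model."""
    # Small random reads that are present in the cache go to flash,
    # where seek latency is close to zero.
    if in_flash_cache and not sequential and size_bytes <= RANDOM_READ_MAX_BYTES:
        return "flash"
    # Large or sequential reads, and cache misses, go to the primary disk,
    # which has the better sustained throughput.
    return "disk"


print(choose_device(4096, sequential=False, in_flash_cache=True))
print(choose_device(1024 * 1024, sequential=True, in_flash_cache=True))
print(choose_device(4096, sequential=False, in_flash_cache=False))
```

Because the cache is write-through, the primary pagefile always has a valid copy, so falling back to the disk is always safe.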

ReadyBoost is best used in scenarios where a system has a very slow primary disk as it will alleviate a certain amount of disk IO. In systems with relatively small memory configurations, such as 1GB or less and slow disks, ReadyBoost can have a dramatic impact on overall system responsiveness (especially if no other system tweaks have been performed). A system with lots of physical memory and a fast disk may see no perceivable improvement from using ReadyBoost and, in some cases, ReadyBoost can actually be detrimental.

The two biggest performance caveats with ReadyBoost are sharing the ReadyBoost device’s bus (such as USB) with other bandwidth intensive devices, and the fact that ReadyBoost is not entirely free of CPU overhead; said overhead can be dramatic if the bus hosting the ReadyBoost device is serviced by poorly optimized drivers (I have personally seen this with one vendor’s SD-Reader drivers).

A good example of a non-standard scenario for which ReadyBoost can be quite beneficial is when running a system that is memory constrained because of workload, such as running VirtualPC or VMWare virtual machines (such as for demo purposes on a notebook computer). In this scenario, however, if the ReadyBoost device and the external hard disk being used to run IO intensive apps (such as Virtual Machines) share the same bus, performance of both can be negatively impacted. In this scenario, ideally the ReadyBoost device would be hosted on a separate bus, such as an SD card bus, while the external hard disk would be located on a USB or IEEE 1394 bus.

The last thing to note about ReadyBoost is that, despite its name, it is not instantaneous in improving performance. On insertion and activation of a ReadyBoost device, or on system reboot, ReadyBoost cache creation takes a number of minutes (2-5 in my experience). During this time, performance may actually be substantially degraded while the cache is being populated.

Additional notes about performance

A couple of quick pointers about what to and what not to do regarding performance tweaking in Vista:

1. Don’t just turn off built in services – most services are necessary and should be left running (though you should try to remove unnecessary 3rd party services, a good example being the iTunes and iPod services Apple installs)

2. DO NOT turn off Aero Glass (or disable the Desktop Window Manager service) – this forces all window management work onto the host CPU (off the GPU) and actually degrades performance dramatically. Only in sub 1GB RAM scenarios should you disable Aero Glass.

3. Do try to leave the pre-configured index and disk defrag settings in place, if at all possible

4. Do make sure you have the latest drivers installed, as Vista’s biggest weak spot at RTM was that drivers were very immature (especially Nvidia’s and Creative’s drivers)

5. Do download AutoRuns from http://technet.microsoft.com/en-us/sysinternals/bb963902.aspx and use it to remove background craplets such as Apple QuickTime and Adobe agents – these agents eat memory and CPU cycles and deliver little value

6. Do use the Windows Resource Monitor to get a quick view of CPU, Disk, Network and Memory activity as this handy utility can be very useful for getting a quick but comprehensive understanding of what applications are consuming resources

7. Provided your system is connected to a UPS, and you have significant physical memory, do “Enable advanced performance” as this increases the system’s memory allocated for disk write caching (though do this with caution as it increases the chance of file system corruption and data loss in the event you suddenly lose power)

8. If your system is low-end, 1GB or less with a single core CPU and a slow drive, do disable the Sidebar (yes gadgets are cool, but for low-end systems, they are a waste of cpu cycles)

9. If you are planning to upgrade your system because of poor performance, start by increasing memory to 2GB and, just as importantly, upgrading to a 7200RPM SATA disk with 16MB of cache, as this will improve performance dramatically (cache being the most important factor)

My final comment for this post is that Windows Vista will never have the same system footprint as Windows XP. Vista, despite all the negative press, and initial launch missteps, is a better Operating System that scales far better than Windows XP thanks to the myriad of core technology improvements. However, with the advent of better scalability (especially on 64bit systems), new functionality (instant search) and better system self maintenance (such as low-impact background filesystem defrag) the OS, as with every new OS iteration, is simply bigger. Windows 2000 was much bigger than NT 4, XP was bigger than both and derided as bloated when it shipped. When OS X shipped, it too was lambasted by more than a few critics as being slow, as it was far larger than its obsolete predecessor. Even Linux is not immune from growth, as a current common Linux distribution is massive compared to a comparable distribution from 5-6 years ago.

So, in short, if you really want a faster version of Windows, perhaps you would like to try Windows 95, as on current hardware, it makes XP look like it’s standing in cement (yeah, I didn’t think so).

March 28, 2008

Would you like some cheese with that whine?

Filed under: Uncategorized — makfu @ 2:02 am

So today’s topic is: can Direct X 10 be ported to XP?

This has been reported, incorrectly, an incredible number of times. What started as a technical misunderstanding regarding Vista’s core graphics stack has led to a plethora of conspiracy theories and the notion that Microsoft could “easily” implement DX 10 on top of XP. That the technically inept “tech media” actually propagated this nonsense gave this theory a sense of legitimacy.

I think the most fundamental problem with this discussion is most people have little to no understanding of how radically different the WDDM driver model and the new Direct X Graphics Kernel are compared to their predecessors. In the old model, the kernel mode “miniport” driver was responsible for implementing all GPU management, including scheduling and memory management. In Vista the DXGK is responsible for this work and is the arbitrator for all pipelines rendering to the display (DX9, DX10, OGL ICD, GDI). This major overhaul was necessary to support a number of key features in D3D10 and DX9ex.

Here is one good example of why Microsoft needed to reengineer the stack for D3D10; D3D10 supports geometry shaders that can procedurally generate new primitives based on properties of existing ones. This means that, with limitations, you can create procedurally generated geometry on the GPU. The best analogy I can come up with is to think of it like an origami crane, you start with a basic primitive and by applying geometry shader instructions you can generate a more complex shape, just like following the instructions of folding paper until you get a crane (though it’s essentially additive, not subtractive).
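To make the “one primitive in, several primitives out” idea concrete, here is a minimal CPU-side sketch in Python. It is not shader code and not how D3D10 does it; the function names (amplify, run_stage) are made up for illustration, and midpoint subdivision is just one convenient example of generating new geometry from the properties of an existing triangle.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def midpoint(a: Point, b: Point) -> Point:
    """Point halfway between a and b."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def amplify(tri: Triangle) -> List[Triangle]:
    """Hypothetical 'geometry shader' step: one triangle in, four out."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    # Three corner triangles plus the centre triangle.
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def area(tri: Triangle) -> float:
    """Unsigned area of a 2D triangle (cross product / 2)."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0


def run_stage(tris: List[Triangle], passes: int) -> List[Triangle]:
    """Apply the amplification step 'passes' times to every triangle."""
    for _ in range(passes):
        tris = [out for t in tris for out in amplify(t)]
    return tris


if __name__ == "__main__":
    base = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
    result = run_stage([base], 3)
    print(len(result), "triangles covering area", sum(area(t) for t in result))
```

The thing to notice is how quickly the output grows: every pass multiplies the primitive count by four while the covered area stays the same, which is exactly the “more stuff, period” problem for memory and scheduling.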

With this capability comes the possibility of much larger and longer running shader programs on the GPU that can generate much more content in a far more efficient and parallelized pipeline (since it’s no longer, for example, one triangle in, one triangle out). This means D3D10 has the ability to generate really complex stuff on screen in real time, but that also means more stuff, period. This leads to greater usage of framebuffer memory and the need to manage execution on the GPU since these shader programs are potentially much more complex (if the GPU is stuck in one thread’s shader code, it could prevent another thread from running, which could be bad, though today’s DXGK/WDDM doesn’t do command stream preemption; that’s a WDDM 2 feature). Thus, the need for a new underlying framework in the form of framebuffer virtualization and GPU scheduling, implemented in the new, aptly named, DirectX Graphics Kernel, and the new driver model that “plugs” into it, the WDDM. WDDM and DXGK make up the core of Direct X 10.

Now, everything written above is extrapolated from discussing the requirements of just one, albeit major, feature of the D3D10 API (a feature that, by the way, isn’t used in a single shipping program because today’s DX10 hardware isn’t optimized for the new geometry shader functionality in DX10). There are, however, more than a few other major changes and features (like advanced instancing, actually used in certain games) in D3D10 that also leverage new core functionality that, when combined, further drove the decisions regarding what that underlying core graphics architecture had to be and what features it had to support.

Even more interesting is that there are some other important, non D3D10 specific reasons why we want framebuffer virtualization and GPU scheduling and all that good stuff. The most obvious is multiple discrete on-screen apps (not just threads within an app’s process) using the GPU. This is becoming a common scenario; a good example is running Aero while also running Microsoft Virtual Earth 3D and maybe Chess in Vista =). In the future, probably everything will be rendered to the 3d pipeline via application frameworks like WPF. This, combined with desktop composition’s (DWM/Aero) current need to allocate shared D3D surfaces, points to a future where the DXGK/WDDM features aren’t just nice to have, but really are necessary.

One final particularly important point: D3D9 isn’t emulated, it’s a reimplementation (forward port, if you like) of the API (libraries) on top of the new DXGK/WDDM infrastructure. Let me repeat that, DX9 on Vista was essentially a bottom up rewrite of the DX9 libraries for WDDM. In the process of re-implementing DX9, the DX folks also added the features above that could only easily be implemented on top of DXGK/WDDM, like cross-process shared surfaces (used by DWM/Aero, for example). With Vista, DX9 is quite a bit more advanced than XP’s version.

So DX10’s advanced features (specifically the D3D API functionality) depend on a lot of functionality implemented via some very advanced core DX10 OS components that simply don’t exist in the old Windows XP Direct X and graphics driver model. Regardless of whether DX10 is back portable, it is NOT a trivial matter, as even forward porting DX9 to the WDDM model was a huge undertaking. Quite simply, it wouldn’t be a port of DX10; it would be a whole new implementation of the libraries on the old display driver model OR you would end up back porting the entire WDDM/DXGK infrastructure. Not trivial and, practically speaking, not feasible.

March 27, 2008

I reject your reality and will substitute my own

Filed under: Uncategorized — makfu @ 2:23 am

Man, the anti-Vista brigade just can’t stop themselves. Every time I turn around I hear some other bit of nonsense about the product. So I am going to voice my highly opinionated views over the next few days on several arguments I have seen lately.

Today’s topic: is Vista SP1 slower than Server 2008 in benchmarks?

Like Duh! The server product is geared to be stripped down and is optimized out of the box for server workloads. This, by the way, makes it well suited for running timed benchmarks. This does not mean it is specifically well suited to interactive, single user workloads.

One of the observations made by one widely quoted blog was that when configured in an equivalent fashion, Vista is about 17% slower than 2008. The problem with this statement, besides a lack of detailed configuration information cited in the blog, is that despite being binary identical, the out of box configuration is quite different between the systems (and yes, all common components between Vista SP1 and Server 2008 are binary identical and the “slipstream” ISO that is available was actually built as a complete build at the same time as Server 2008’s ISO’s).

So, for example, Server 2008 doesn’t have superfetch or the Windows Search indexer enabled EVEN if you do install the “desktop experience” feature. To enable Windows Search requires installing additional role services and superfetch requires delving into the registry (and is strictly not supported on any server configuration because of the impact to server applications and multi-user configurations). But even assuming that these services are disabled in the OS instances used for testing, there are additional lower-level differences in the run-time optimizations between the two OS’s.

For example, the server default configuration for performance settings is to favor background services, versus foreground applications, which has a profound impact on processor scheduling. Specifically, the Vista default of Optimize Performance for Applications enables short, variable quantum lengths (time slices) and a high foreground boost, implemented by giving threads of the foreground interactive process longer quanta. In contrast, the server default of Optimize Performance for Background Services provides a long quantum that is fixed for all threads (i.e. no foreground boost).
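For the curious, both of those settings end up in a single registry value, Win32PrioritySeparation, under HKLM\SYSTEM\CurrentControlSet\Control\PriorityControl. The sketch below decodes that value using the bit layout as it is commonly documented. Treat the layout, the helper name and the sample values 0x26 and 0x18 as my assumptions rather than something verified against these exact builds.

```python
def decode_priority_separation(value: int, is_server: bool = False) -> dict:
    """Decode a Win32PrioritySeparation-style bitmask (assumed layout).

    bits 4-5: quantum length   (1 = long, 2 = short, 0/3 = OS default)
    bits 2-3: quantum type     (1 = variable, 2 = fixed, 0/3 = OS default)
    bits 0-1: foreground boost (0, 1 or 2; 3 is treated as 2)
    """
    length_bits = (value >> 4) & 0x3
    type_bits = (value >> 2) & 0x3
    boost_index = min(value & 0x3, 2)

    # OS defaults: client = short + variable, server = long + fixed.
    if length_bits == 1:
        long_quantum = True
    elif length_bits == 2:
        long_quantum = False
    else:
        long_quantum = is_server

    if type_bits == 1:
        variable = True
    elif type_bits == 2:
        variable = False
    else:
        variable = not is_server

    # With fixed quanta every thread gets the same slice: no boost.
    multiplier = boost_index + 1 if variable else 1

    return {
        "quantum": "long" if long_quantum else "short",
        "type": "variable" if variable else "fixed",
        "foreground_multiplier": multiplier,
    }


if __name__ == "__main__":
    print("applications (0x26):", decode_priority_separation(0x26))
    print("background   (0x18):", decode_priority_separation(0x18))
```

If two “identical” installs decode to different settings here, you are not benchmarking the same scheduler behavior, whatever the binaries say.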

If all these options aren’t configured identically, the service count and configuration isn’t identical and if all drivers aren’t 100% identical, then performance could differ greatly between two systems, even though they are based on the same base binaries.

Most importantly, it’s irrelevant to compare server versus client OS’s because benchmarks do not tell a complete story. For example, during multiple benchmark runs, the basic demand-page system caching model that is used in server (or Vista if superfetch is disabled), versus the proactive paging enabled by superfetch, acts as a major potential differentiator because in subsequent runs, a large quantity of code and data information becomes cached in the system cache (standby page list). Performance in scenarios where users load common large files or applications in a variable order will highlight the advantages of a system like superfetch (especially in very large memory configurations) because the system will proactively begin loading commonly used pages (code and data) once the system boots or once memory becomes free, based on a usage profile that the superfetch system develops over time.
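Here is a toy model of that difference, assuming an unbounded cache and a made-up usage history. It has nothing to do with how superfetch is actually implemented; the function names and the “preload the N most used items” policy are purely illustrative.

```python
from collections import Counter
from typing import Iterable, List


def misses_demand_paged(workload: Iterable[str]) -> int:
    """Cache starts empty after boot; every first touch is a miss."""
    cache = set()
    misses = 0
    for item in workload:
        if item not in cache:
            misses += 1
            cache.add(item)
    return misses


def misses_proactive(workload: Iterable[str], history: List[str],
                     preload_count: int) -> int:
    """Preload the most frequently used items from a usage profile."""
    profile = Counter(history)
    cache = {item for item, _ in profile.most_common(preload_count)}
    misses = 0
    for item in workload:
        if item not in cache:
            misses += 1
            cache.add(item)
    return misses


if __name__ == "__main__":
    # Hypothetical usage history gathered over previous sessions.
    history = ["outlook"] * 5 + ["word"] * 4 + ["excel"] * 3 + ["visio"]
    # A user session: common apps opened in no particular order.
    session = ["excel", "outlook", "word", "excel", "visio"]
    # A benchmark: the same sequence repeated several times.
    benchmark = ["a", "b", "c"] * 3

    print("user, demand paged:", misses_demand_paged(session))
    print("user, proactive   :", misses_proactive(session, history, 3))
    print("benchmark, demand :", misses_demand_paged(benchmark))
```

The repeated benchmark only pays for its first pass under plain demand paging, so later runs look great without any help. The user session, on the other hand, is where having the common stuff already in memory pays off.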

Put simply, Vista is tuned to try and scale performance the way users work. Users don’t work like benchmarks; most open and close files and applications randomly and have usage patterns that are not strictly linear. Server workloads, however, look very much like benchmark runs: a series of actions (launching processes and loading files in a repeatable sequence). So a server OS default install (with superfetch disabled) will most likely run benchmarks faster than a default Vista install, and most benchmarks would not benefit at all from proactive paging (caching).

With that said, I will be running some benchmarks this weekend to illustrate the above topics, using a common (real hardware) platform to evaluate performance between identical Vista and Server 2008 installs.

The topic for tomorrow? Can DX10 be ported to XP…

March 7, 2008

Prepare to be Rocked…

Filed under: Uncategorized — makfu @ 3:36 am

One of the recurring bits of misinformation that I see floating about message forums is how OS X has supposedly better “64bit support” with comments stating that OS X Leopard is “more 64bit” than Vista. I find this assertion amusing because it is, contrary to Apple marketing, completely wrong.

First, let me state that, their Windows software notwithstanding, Apple makes terrific products and OS X is an excellent operating system. However, the question at hand is whether OS X Leopard is a 64bit OS, and the answer to that is an unequivocal no.

First, to make myself clear, I define the OS as the core kernel, drivers, system services, shell and primary UI libraries. An OS can support application code of different word lengths, via subsystems, without actually being natively said word length. For example, DOS, via the 4GW runtime environment, could run 32bit protected mode code, however this did not make DOS itself a 32bit OS (though some would claim that DOS4GW provided so many functions that it was itself an OS).

Some operating systems are legitimately hybrid systems. Windows 9x is one, as its core kernel was derived from a v86 VMM (virtual machine monitor that, yes, you could almost call a hypervisor). The Windows V86 VMM was a true 32bit preemptive multitasking, virtual memory, Ring 0 kernel that managed 1 “system VM”, where your 16bit windows apps and 16bit Windows OS code lived, and other DOS v86 vm’s. In fact, even as far back as Windows 2.11 /386, the system VM (where your 16bit windows apps ran using cooperative multitasking) was preemptively multitasked alongside all DOS applications running in their separate dos “box” v86 virtual machines.

The VMM was extended in Windows95 (not by much) to support preemptive thread scheduling and memory management for 32bit protected-mode Windows processes. This meant that Windows 9x was not a pure 32bit system, since much functionality was derived from the 16bit components in the system VM and, even in a few occurrences, DOS int21h and BIOS int13h 16bit real mode functions were invoked. Because of its 32 bit kernel, however, it was also not a 16bit OS in the strict sense and is correctly described as a hybrid (or, as I prefer, a big fat kludge).

Now what about 64bit Windows NT based operating systems, such as Vista and Server 2008 x64? Are they a similar kludge as Win9x, given they run both 32bit and 64bit code? The short answer is no. The long answer requires that we delve into how X86-64, or as it’s more commonly called, x64, works.

The x64 CPU supports 3 modes of operation; Real Mode – the legacy x86 segmented memory model used by DOS, Protected Mode – the 32bit linear address space mode with hardware memory management introduced with the 80386 (also includes 16bit 286 protected mode and v86 mode), and Long Mode – the 64bit linear address space mode, also with hardware memory management, introduced by AMD with the Hammer architecture based CPU’s.

Long Mode is interesting because, when active, it actually encapsulates 32bit Protected Mode and 64bit Native Long Mode. When a 64bit OS switches the CPU to Long Mode, the first stop is an intermediate “compatibility sub-mode”. This sub-mode is essentially identical to “legacy” 32bit Protected Mode, but without the virtual-8086 (v86) sub-mode support used to run DOS/BIOS 16bit Real Mode code in a Protected Mode, 32bit OS. It is a further step to actually switch the CPU to full 64bit Long Mode, but this step is actually a critical part of AMD’s well thought out compatibility strategy. By allowing a nearly unmodified 32bit code-base to run in a default 32-bit “sub-mode”, AMD solved many problems for OS developers and, as we shall see in a few paragraphs, actually made it possible for Apple, and others, to get to a 64bit world via an elegant shortcut.

But first, how does Windows support x64? 64bit (x64) variants of XP, Server 2003, Server 2008 and Vista run in full 64bit Long Mode, meaning that the system boots all the way to full Long Mode, supports and uses pointers 64bits in length and subsequently supports 64bit virtual addressing along with 64bit datatypes (for example, in Windows data model, LLP64, longlong is natively a 64bit data type). Furthermore all data registers used are 64bits in length and 8 additional general purpose and XMM registers are available for use via the full Long Mode instruction set architecture. This is a point that needs repeating, with 64bit Windows, the OS, kernel, drivers, shell and all major libraries (Win32, COM, GDI, Direct X, .Net, etc.) are all true, native 64bit code all running in 64bit full Long Mode with access to the 8 extra GP and XMM registers and gobs of address space. Top to bottom, 64bit Windows (NT) is a 64bit OS.
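Since LLP64 came up, here is a quick reference sketch of the three common C data models and where they differ. The sizes are the standard textbook ones; the dictionary and helper names are just for illustration.

```python
# Sizes in bytes of the basic C types under each data model.
DATA_MODELS = {
    "ILP32": {"int": 4, "long": 4, "long long": 8, "pointer": 4},  # 32bit x86
    "LP64":  {"int": 4, "long": 8, "long long": 8, "pointer": 8},  # 64bit Unix-likes
    "LLP64": {"int": 4, "long": 4, "long long": 8, "pointer": 8},  # 64bit Windows
}


def differing_types(model_a: str, model_b: str) -> list:
    """Names of the types whose size differs between two data models."""
    a, b = DATA_MODELS[model_a], DATA_MODELS[model_b]
    return [name for name in a if a[name] != b[name]]


if __name__ == "__main__":
    print("ILP32 -> LLP64 changes:", differing_types("ILP32", "LLP64"))
    print("LP64 vs LLP64 differ in:", differing_types("LP64", "LLP64"))
```

Going from 32bit Windows to 64bit Windows only the pointer width changes, which is a big part of why so much existing Win32 code recompiles for x64 with relatively little drama.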

So, if the OS is 64bit, how does it run 32bit applications? Well, first of all, x64 versions of Windows do not use an “emulation” environment, such as the NTVDM used in 32bit Windows NT based operating systems for running 16bit code. Instead, 64bit NT uses a translation layer called WOW64 which leverages a very cool feature of the x64 architecture that AMD had the foresight to add when developing x64, namely the ability, once the CPU is in Long Mode, to dynamically switch the CPU’s sub mode between 32bit compatibility (i.e. protected) mode and full Long Mode based on the code segment (CS) value loaded in the CS register.

In 64bit Windows, this works as follows: when loading a 32bit process, 64bit DLL’s named NTDLL.DLL, WOW64.DLL and WOW64WIN.DLL are loaded into the address space. WOW64.DLL then proceeds to load a 32bit version of ntdll.dll and calls its initialization routine, which loads all the required DLL’s for the application, including 32bit system DLL’s that, if they make system calls, are modified to call into WOW64.DLL (or WOW64WIN.DLL), rather than the standard call path. For all 32bit code loaded in the address space, the memory manager sets the L&D bits of the Code Segment to its corresponding 32bit mode indicator (L0, D1). When (and anytime) the program begins executing, e.g. there is a context switch to a thread executing 32bit code in the program’s process, that code segment value is loaded into the CS register, per the transfer of control, and the CPU switches on the fly to 32bit Compatibility Mode.

When a 32bit program needs to make a system call or, as is more often the case, a function in a system DLL needs to make a core OS function call, the modified DLL calls the WOW64 stub libraries, which, based on loading the CS value for WOW64, causes the CS register values used for CPU mode selection to be set to L&D values L1, D0. From this point on, until returning execution back to 32bit code, the processor is in 64bit long mode. Before passing (thunking) the system call, WOW64 also performs stack translation for 32bit values/arguments.
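To make the L&D business a little less abstract, here is a small sketch that picks the L and D bits out of an 8-byte code segment descriptor and reports which Long Mode sub-mode the CPU would run that code in. The two sample descriptors are generic textbook flat code segments that I am assuming for illustration; they are not values read out of a running copy of Windows.

```python
# Bit positions within the 64bit code segment descriptor.
L_BIT = 1 << 53   # "long" flag: 64bit code segment
D_BIT = 1 << 54   # default operand size flag


def long_mode_submode(descriptor: int) -> str:
    """Sub-mode selected by a code segment once the CPU is in Long Mode."""
    l_set = bool(descriptor & L_BIT)
    d_set = bool(descriptor & D_BIT)
    if l_set and not d_set:
        return "64-bit mode"                  # L1, D0
    if not l_set and d_set:
        return "32-bit compatibility mode"    # L0, D1
    if not l_set and not d_set:
        return "16-bit compatibility mode"    # L0, D0
    return "reserved"                         # L1, D1 is not a valid combination


# Assumed example descriptors (flat, ring 0, execute/read code segments).
FLAT_CODE32 = 0x00CF9A000000FFFF
FLAT_CODE64 = 0x00AF9A000000FFFF

if __name__ == "__main__":
    print(hex(FLAT_CODE32), "->", long_mode_submode(FLAT_CODE32))
    print(hex(FLAT_CODE64), "->", long_mode_submode(FLAT_CODE64))
```

The point is that nothing heavier than loading a different code segment is needed to flip between the two sub-modes, which is what makes the WOW64 transition so cheap compared to real emulation.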

So, from the above description, you can now understand how a 64bit Long Mode OS, on x64, executes 32bit protected mode code and it should be fairly clear how x64 CPU’s make the transition between 32 and 64bit modes. I will add that the same WOW64 subsystem is used on the other major 64bit platform supported by Windows, IA64 (Itanium), with an additional DLL that provides x86 to IA64 instruction translation, making the Itanium version of WOW64 an actual emulation environment, versus a translation (thunking) layer as it is in the x64 version of Windows.

Okay, so how do other OS’s, such as OS X Leopard, run 64bit code on a 32bit kernel? The answer is the exact inverse of what is described above. Using OS X as an example, the “XNU” kernel is not 64bit code; it is a 32bit PAE enabled kernel and, as discussed a few paragraphs back, a 32bit system can bootstrap into Long Mode’s 32bit compatibility mode. Furthermore, none of the XNU/Darwin kernel mode drivers or components are 64bit, as one can’t safely mix x86 and x64 code in a common address space due to differences in pointer lengths (32bit vs. 64bit), argument/variable management (stack vs. register) and register counts (8 vs. 16) except when it’s very tightly controlled, such as the WOW64 DLL’s mapped into a 32bit process’s user-mode address space. Doing so in kernel mode would be very dangerous and would make kernel mode code extremely difficult to debug.

However, the OS does support 64bit processes and has certain libraries that are coded as 64bit native for supporting 64bit programs (processes). Just as Windows switches the CPU back to 64bit long mode via WOW64 when an application makes a system call, a system call from a 64bit long mode process in OS X will cause whatever library invokes the system call to follow a call path that results in some code specifying the CS for a 32bit code segment descriptor, thus setting the L&D bits of the Code Segment register to its corresponding 32bit mode indicator (L0,D1). Subsequently, a function, prior to passing arguments to the native kernel mode system call, will truncate/reformat any 64bit values that are being passed to 32bit x86 compatible values. Returning from the system call back to the 64bit process, causes CS register values to be set to L1,D0 and the processor is magically back in full 64bit Long Mode.
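Both thunking directions described in this post boil down to width conversions on the arguments. Here is a minimal sketch of what that means, assuming plain integer and pointer arguments; the helper names are mine and this is not actual WOW64 or XNU code.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def zero_extend_32_to_64(raw32: int) -> int:
    """Widen an unsigned 32bit value or pointer: upper bits become zero."""
    return raw32 & MASK32


def sign_extend_32_to_64(raw32: int) -> int:
    """Widen a signed 32bit value: copy bit 31 into the upper 32 bits."""
    raw32 &= MASK32
    if raw32 & 0x80000000:
        return (raw32 | ~MASK32) & MASK64
    return raw32


def narrow_64_to_32(value64: int, strict: bool = True) -> int:
    """Narrow a 64bit value for a 32bit callee.

    In strict mode, refuse values that do not fit instead of silently
    dropping the upper 32 bits.
    """
    value64 &= MASK64
    if strict and value64 > MASK32:
        raise OverflowError("value does not fit in 32 bits")
    return value64 & MASK32


if __name__ == "__main__":
    print(hex(zero_extend_32_to_64(0xFFFFFFFF)))   # a 32bit pointer
    print(hex(sign_extend_32_to_64(0xFFFFFFFF)))   # the integer -1
    print(hex(narrow_64_to_32(0x100000001, strict=False)))
```

Widening is always safe as long as you know whether the value is signed. Narrowing is the risky direction, since a 64bit pointer that doesn’t fit simply can’t be expressed to 32bit code, which is why the layer doing it has to be so tightly controlled.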

Now, if that sounded disparaging of OS X, then you have bought into the “more bits” is better “measuring” contest. Reality is that the benefits of a fully 64bit OS are dependent on a lot of factors. Apple using the x64 architecture in the manner they did is a valid way to support 64bit applications. It is interesting to note, that once upon a time, this was the direction NT was headed in with NT 5.0 (Windows 2000) on the Alpha architecture (not to mention this is how several other OS’s on other architectures made the move to 64bit support). The likely reason that Windows (NT) ended up becoming a fully 64bit platform all at once is because, with the end of Alpha, development of a 64bit system was focused entirely on Itanium which had no real 32bit mode. As a result, the x64 version of Windows is actually a port of the Itanium version and even carries over the 44bit address map from that platform. Had 64bit efforts started later, with more focus initially on x86, it is very likely that Windows would have travelled the more relaxed route to a 64bit world.

That said, those OS vendors using the 64bit process on a 32bit kernel model, afforded to them via x64’s clever compatibility, will eventually have to port their core codebase to native 64bit since, in the near future, constraints in the amount of addressing allowed in 32bit compatibility mode will limit the total physically addressable system memory (even for 64bit processes) and the memory accessible to the kernel itself.
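For a sense of scale, here is the arithmetic behind those address widths: a 32bit virtual address space versus the 44bit address map mentioned earlier, with a full 64 bits thrown in for comparison. The helper names are just for illustration.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


def human(n: int) -> str:
    """Format an exact power-of-1024 multiple using binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    i = 0
    while n >= 1024 and n % 1024 == 0 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


if __name__ == "__main__":
    for bits in (32, 44, 64):
        print(f"{bits}bit addressing: {human(addressable_bytes(bits))}")
```

A kernel that can only see 4 GiB of virtual address space at a time runs out of room to map things long before a 44bit one does, which is the squeeze described above.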
