Tuesday, June 24, 2008

Apple's other open secret: the LLVM Compiler

Published: 07:10 AM EST

SproutCore, profiled earlier this week, isn't the only big news spill out from the top secret WWDC conference due to Apple's embrace of open source sharing. Another future technology featured by the Mac maker last week was LLVM, the Low Level Virtual Machine compiler infrastructure project.

Like SproutCore, LLVM is neither new nor secret, but both have been hiding from attention due to a thick layer of complexity that has obscured their future potential.

Looking for LLVM at WWDC

Again, the trail of breadcrumbs for LLVM starts in the public WWDC schedule. On Tuesday, the session "New Compiler Technology and Future Directions" detailed the following synopsis:

"Xcode 3.1 introduces two new compilers for Mac OS X: GCC 4.2 and LLVM-GCC. Learn how the new security and performance improvements in GCC 4.2 can help you produce better applications. Understand the innovations in LLVM-GCC, and find out how you can use it in your own testing and development. Finally, get a preview of future compiler developments."

There's a lot of unpronounceable words in all capital letters in that paragraph, LOLBBQ. Let's pull a few out and define them until the synopsis starts making sense.

Introducing GCC

The first acronym in our alphabet soup is GCC, originally the GNU C Compiler. This project began in the mid 80s by Richard Stallman of the Free Software Foundation. Stallman's radical idea was to develop software that would be shared rather than sold, with the intent of delivering code that anyone could use provided that anything they contribute to it would be passed along in a form others could also use.

Stallman was working to develop a free version of AT&T's Unix, which had already become the standard operating system in academia. He started at the core: in order to develop anything in the C language, one would need a C compiler to convert that high level, portable C source code into machine language object code suited to run on a particular processor architecture.

GCC has progressed through a series of advancements over the years to become the standard compiler for GNU Linux, BSD Unix, Mac OS X, and a variety of embedded operating systems. GCC supports a wide variety of processor architecture targets and high level language sources.

Apple uses a specialized version of GCC 4.0 and 4.2 in Leopard's Xcode 3.1 that supports compiling Objective-C/C/C++ code to both PowerPC and Intel targets on the desktop and uses GCC 4.0 to target ARM development on the iPhone.

The Compiler

A compiler refers to the portion of the development toolchain between source code building and debugging and deployment. The first phase of compiling is the Front End Parser, which performs initial language-specific syntax and semantic analysis on source code to create an internal representation of the program.

Code is then passed through an Optimizer phase which improves it by doing things like deleting any code redundancies or dead code that doesn't need to exist in the final version.

The Code Generator phase then takes the optimized code and maps it to the output processor, resulting in assembly language code which is no longer human readable.

The Assembler phase converts assembly language code into object code that can be interpreted by a hardware processor or a software virtual machine.

The final phase is the Linker, which combines object code with any necessary library code to create the final executable.

Introducing LLVM

GCC currently handles all those phases for compiling code within Xcode, Apple's Mac OS X IDE (Integrated Development Environment). However, there are some drawbacks to using GCC.

One is that it is delivered under the GPL, which means Apple can't integrate it directly into Xcode without making its IDE GPL as well. Apple prefers BSD/MIT style open source licensees, where there is no limitation upon extending open projects as part of larger proprietary products.

Another is that portions of GCC are getting long in the tooth. LLVM is a modern project that has aspired to rethink how compiler parts should work, with emphasis on Just In Time compilation, cross-file optimization (which can link together code from different languages and optimize across file boundaries), and a modular compiler architecture for creating components that have few dependencies on each other while integrating well with existing compiler tools.

LLVM only just got started at the University of Illinois in 2000 as a research project of Chris Lattner. It was released as version 1.0 in 2003. Lattner caught the attention of Apple after posting questions about Objective-C to the company's objc-language mailing list. Apple in turn began contributing to the LLVM project in 2005 and later hired Lattner to fund his work.

Clang and LLVM-GCC

Last year the project released Clang as an Apple led, standalone implementation of the LLVM compiler tools aimed to provide fast compiling with low memory use, expressive diagnostics, a modular library-based architecture, and tight integration within an IDE such as Xcode, all offered under the BSD open source license.

In addition to the pure LLVM Clang project, which uses an early, developmental front end code parser for Objective C/C/C++, Apple also started work on integrating components of LLVM into the existing GCC based on Lattner's LLVM/GCC Integration Proposal. That has resulted in a hybrid system that leverages the mature components of GCC, such as its front end parser, while adding the most valuable components of LLVM, including its modern code optimizers.

That project, known as LLVM-GCC, inserts the optimizer and code generator from LLVM into GCC, providing modern methods for "aggressive loop, standard scalar, and interprocedural optimizations and interprocedural analyses" missing in the standard GCC components.

LLVM-GCC is designed to be highly compatible with GCC so that developers can move to the new compiler and benefit from its code optimizations without making substantial changes to their workflow. Sources report that LLVM-GCC "compiles code that consistently runs 33% faster" than code output from GCC.

Apple also uses LLVM in the OpenGL stack in Leopard, leveraging its virtual machine concept of common IR to emulate OpenGL hardware features on Macs that lack the actual silicon to interpret that code. Code is instead interpreted or JIT on the CPU.

Apple is also using LLVM in iPhone development, as the project's modular architecture makes it easier to add support for other architectures such as ARM, now supported in LLVM 2.0 thanks to work done by Nokia's INdT.

On

No comments: