Tuesday, August 23, 2016

Firing up the Ignition Interpreter

V8 and other modern JavaScript engines get their speed via just-in-time (JIT) compilation of script to native machine code immediately prior to execution. Code is initially compiled by a baseline compiler, which can generate non-optimized machine code quickly. The compiled code is analyzed during runtime and optionally re-compiled dynamically with a more advanced optimizing compiler for peak performance. In V8, this script execution pipeline has a variety of special cases and conditions which require complex machinery to switch between the baseline compiler and two optimizing compilers, Crankshaft and TurboFan.

One of the issues with this approach (in addition to architectural complexity) is that the JITed machine code can consume a significant amount of memory, even if the code is only executed once. In order to mitigate this overhead, the V8 team has built a new JavaScript interpreter, called Ignition, which can replace V8’s baseline compiler, executing code with less memory overhead and paving the way for a simpler script execution pipeline.

With Ignition, V8 compiles JavaScript functions to a concise bytecode, which is between 50% to 25% the size of the equivalent baseline machine code. This bytecode is then executed by a high-performance interpreter which yields execution speeds on real-world websites close to those of code generated by V8’s existing baseline compiler.

In Chrome 53, Ignition will be enabled for Android devices which have limited RAM (512 MB or less), where memory savings are most needed. Results from early experiments in the field show that Ignition reduces the memory of each Chrome tab by around 5%.

V8’s compilation pipeline with Ignition enabled.

Details


In building Ignition’s bytecode interpreter, the team considered a number of potential implementation approaches. A traditional interpreter, written in C++ would not be able to interact efficiently with the rest of V8’s generated code. An alternative would have been to hand-code the interpreter in assembly code, however given V8 supports nine architecture ports, this would have entailed substantial engineering overhead.

Instead, we opted for an approach which leveraged the strength of TurboFan, our new optimizing compiler, which is already tuned for optimal interaction with the V8 runtime and other generated code. The Ignition interpreter uses TurboFan’s low-level, architecture-independent macro-assembly instructions to generate bytecode handlers for each opcode. TurboFan compiles these instructions to the target architecture, performing low-level instruction selection and machine register allocation in the process. This results in highly optimized interpreter code which can execute the bytecode instructions and interact with the rest of the V8 virtual machine in a low-overhead manner, with a minimal amount of new machinery added to the codebase.

Ignition is a register machine, with each bytecode specifying its inputs and outputs as explicit register operands, as opposed to a stack machine where each bytecode would consume inputs and push outputs on an implicit stack. A special accumulator register is an implicit input and output register for many bytecodes. This reduces the size of bytecodes by avoiding the need to specify specific register operands. Since many JavaScript expressions involve chains of operations which are evaluated from left to right, the temporary results of these operations can often remain in the accumulator throughout the expression’s evaluation, minimizing the need for operations which load and store to explicit registers.

As the bytecode is generated, it passes through a series of inline-optimization stages. These stages perform simple analysis on the bytecode stream, replacing common patterns with faster sequences, remove some redundant operations, and minimize the number of unnecessary register loads and transfers. Together, the optimizations further reduce the size of the bytecode and improve performance.

For further details on the implementation of Ignition, see our BlinkOn talk:


Future


Our focus for Ignition up until now has been to reduce V8’s memory overhead. However, adding Ignition to our script execution pipeline opens up a number of future possibilities. The Ignition pipeline has been designed to enable us to make smarter decisions about when to execute and optimize code to speed up loading web pages and reduce jank and to make the interchange between V8’s various components more efficient.

Stay tuned for future developments in Ignition and V8.

by Ross McIlroy, V8 Ignition Jump Starter

Thursday, July 21, 2016

V8 at the BlinkOn 6 conference

BlinkOn is a biannual meeting of Blink, V8, and Chromium contributors. BlinkOn 6 was held in Munich on June 16 and June 17. The V8 team gave a number of presentations on architecture, design, performance initiatives, and language implementation.

The V8 BlinkOn talks are embedded below.

Real-world JavaScript Performance


Length: 31:41



Outlines the history of how V8 measures JavaScript performance, the different eras of benchmarking, and a new technique to measure page loads across real-world, popular websites with detailed breakdowns of time per V8 component.

Ignition: an interpreter for V8


Length: 36:39



Introduces V8’s new Ignition Interpreter, explaining the architecture of the engine as a whole, and how Ignition affects memory usage and startup performance.

How we measure and optimize for RAIL in V8’s GC


Length: 27:11



Explains how V8 uses the Response, Animation, Idle, Loading (RAIL) metrics to target low-latency garbage collection and the recent optimizations we’ve made to reduce jank on mobile.

ECMAScript 2015 and Beyond


Length: 28:52



Provides an update on the implementation of new language features in V8, how those features integrate with the web platform, and the standards process which continues to evolve the ECMAScript language.

Tracing Wrappers from V8 to Blink (Lightning Talk)


Length: 2:31


Highlights tracing wrappers between V8 and Blink objects and how they help prevent memory leaks and reduce latency.

Monday, July 18, 2016

V8 Release 5.3

Roughly every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before Chrome branches for a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 5.3, which will be in beta until it is released in coordination with Chrome 53 Stable. V8 5.3 is filled will all sorts of developer-facing goodies, so we’d like to give you a preview of some of the highlights in anticipation of the release in several weeks.

Memory


New Ignition Interpreter


Ignition, V8's new interpreter, is feature complete and will be enabled in Chrome 53 for low-memory Android devices. The interpreter brings immediate memory savings for JIT'ed code and will allow V8 to make future optimizations for faster startup during code execution. Ignition works in tandem with V8's existing optimizing compilers (TurboFan and Crankshaft) to ensure that “hot” code is still optimized for peak performance. We are continuing to improve interpreter performance and hope to enable Ignition soon on all platforms, mobile and desktop. Look for an upcoming blog post for more information about Ignition’s design, architecture, and performance gains. Embedded versions of V8 can turn on the Ignition interpreter with the flag --ignition.

Reduced jank


V8 version 5.3 includes various changes to reduce application jank and garbage collection times. These changes include:
  • Optimizing weak global handles to reduce the time spent handling external memory
  • Unifying the heap for full garbage collections to reduce evacuation jank
  • Optimizing V8’s black allocation additions to the garbage collection marking phase
Together, these improvements reduce full garbage collection pause times by about 25%, measured while browsing a corpus of popular webpages. For more detail on recent garbage collection optimizations to reduce jank, see the “Jank Busters” blog posts Part 1 & Part 2.

Performance


Improving page startup time


The V8 team recently began tracking performance improvements against a corpus of 25 real-world website page loads (including popular sites such as Facebook, Reddit, Wikipedia, and Instagram). Between V8 5.1 (measured in Chrome 51 from April) and V8 5.3 (measured in a recent Chrome Canary 53) we improved startup time in aggregate across the measured websites by ~7%. These improvements loading real websites mirrored similar gains on the Speedometer benchmark, which ran 14% faster in V8 5.3. For more details about our new testing harness, runtime improvements, and breakdown analysis of where V8 spends time during page loads, see our upcoming blog post on startup performance.

ES6 Promise performance


V8's performance on the Bluebird ES6 Promise benchmark suite improved by 20-40% in V8 version 5.3, varying by architecture and benchmark.

V8 Promise performance over time on a Nexus 5x


V8 API


Please check out our summary of API changes. This document gets regularly updated a few weeks after each major release.

Developers with an active V8 checkout can use 'git checkout -b 5.3 -t branch-heads/5.3' to experiment with the new features in V8 5.3. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

Posted by the V8 team

Saturday, June 4, 2016

V8 Release 5.2

Roughly every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before Chrome branches for a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 5.2, which will be in beta until it is released in coordination with Chrome 52 Stable. V8 5.2 is filled will all sorts of developer-facing goodies, so we’d like to give you a preview of some of the highlights in anticipation of the release in several weeks.

ES6 & ES7 support

V8 5.2 contains support for ECMAScript 6 (aka ES2015) and ECMAScript 7 (aka ES2016).

Exponentiation operator

This release contains support for the ES7 exponentiation operator, an infix notation to replace Math.pow.
let n = 3**3; // n == 27
n **= 2; // n == 729

Evolving spec

For more information on the complexities behind support for evolving specifications and continued standards discussion around web compatibility bugs and tail calls, see the V8 blog post ES6, ES7, and beyond.

Performance

V8 5.2 contains further optimizations to improve the performance of JavaScript built-ins, including improvements for Array operations like the isArray method, the in operator, and Function.prototype.bind. This is part of ongoing work to speed up built-ins based on new analysis of runtime call statistics on popular web pages. For more information, see the V8 Google I/O 2016 talk and look for an upcoming blog post on performance optimizations gleaned from real-world websites.

V8 API

Please check out our summary of API changes. This document gets regularly updated a few weeks after each major release.

Developers with an active V8 checkout can use 'git checkout -b 5.2 -t branch-heads/5.2' to experiment with the new features in V8 5.2. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

Posted by the V8 team

Friday, April 29, 2016

ES6, ES7, and beyond

The V8 team places great importance on the evolution of JavaScript into an increasingly expressive and well-defined language that makes writing fast, safe, and correct web applications easy. In June 2015, the ES6 specification was ratified by the TC39 standards committee, making it the largest single update to the JavaScript language. New features include classes, arrow functions, promises, iterators / generators, proxies, well-known symbols, and additional syntactic sugar. TC39 has also increased the cadence of new specifications and released the candidate draft for ES7 in February 2016, to be ratified this summer. While not as expansive as the ES6 update due to the shorter release cycle, ES7 notably introduces the exponentiation operator and Array.prototype.includes().

Today we’ve reached an important milestone: V8 supports ES6 and ES7. You can use the new language features today in Chrome Canary, and they will ship by default in the M52 release of Chromium.

Given the nature of an evolving spec, the differences between various types of conformance tests, and the complexity of maintaining web compatibility, it can be difficult to determine when a certain version of ECMAScript is considered fully supported by a JavaScript engine. Read on for why spec support is more nuanced than version numbers, why proper tail calls are still under discussion, and what caveats remain at play.

An evolving spec

When TC39 decided to publish more frequent updates to the JavaScript specification, the most up-to-date version of the language became the master, draft version. Although versions of the ECMAScript spec are still produced yearly and ratified, V8 implements a combination of the most recently ratified version (e.g. ES6), certain features which are close enough to standardization that they are safe to implement (e.g. the exponentiation operator and Array.prototype.includes() from the ES7 candidate draft), and a collection of bug fixes and web compatibility amendments from more recent drafts. Part of the rationale for such an approach is that language implementations in browsers should match the specification, even if the it’s the specification that needs to be updated. In fact, the process of implementing a ratified version of the spec often uncovers many of the fixes and clarifications that comprise the next version of the spec.

Currently shipping parts of the evolving ECMAScript specification

For example, when implementing the ES6 RegExp sticky flag, the V8 team discovered that the semantics of the ES6 spec broke many existing sites (including all sites using versions 2.x.x of the the popular XRegExp library on npm). Since compatibility is a cornerstone of the web, engineers from the V8 and Safari JavaScriptCore teams proposed an amendment to the RegExp specification to fix the breakage, which was agreed upon by TC39. The amendment won't appear in a ratified version until ES8, but it's still a part of the ECMAScript language and we've implemented it in order to ship the RegExp sticky flag.

The continual refinement of the language specification and the fact that each version (including the yet-to-be-ratified draft) replaces, amends, and clarifies previous versions makes it tricky to understand the complexities behind ES6 and ES7 support. While it's impossible to state succinctly, it's perhaps most accurate to say that V8 supports compliance with the “continually maintained draft future ECMAScript standard”!

Measuring conformance

In an attempt to make sense of this specification complexity, there are a variety of ways to measure JavaScript engine compatibility with the ECMAScript standard. The V8 team, as well as other browser vendors, use the test262 test suite as the gold standard of conformance to the continually maintained draft future ECMAScript standard. This test suite is continually updated to match the spec and it provides 16,000 discrete functional tests for all the features and edge cases which make up a compatible, compliant implementation of JavaScript. Currently V8 passes approximately 98% of test262, and the remaining 2% are a handful of edge cases and future ES features not yet ready to be shipped.

Since it’s difficult to skim the enormous number of test262 tests, other conformance tests exist, such as the Kangax compatibility table. Kangax makes it easy to skim to see whether a particular feature (like arrow functions) has been implemented in a given engine, but doesn’t test all the conformance edge cases that test262 does. Currently, Chrome Canary scores a 98% on the Kangax table for ES6 and 100% on the sections of Kangax corresponding to ES7 (e.g. the sections labelled “2016 features” and “2016 misc” under the ESnext tab).

The remaining 2% of the Kangax ES6 table tests proper tail calls, a feature which has been implemented in V8, but deliberately turned off in Chrome Canary due to outstanding developer experience concerns detailed below. With the “Experimental JavaScript features” flag enabled, which forces this feature on, Canary scores 100% on the entirety of the Kangax table for ES6.

Proper Tail Calls

Proper tail calls have been implemented but not yet shipped given that a change to the feature is currently under discussion at TC39. ES6 specifies that strict mode function calls in tail position should never cause a stack overflow. While this is a useful guarantee for certain programming patterns, the current semantics have two problems. First, since the tail call elimination is implicit, it can be difficult for programmers to identify which functions are actually in tail call position. This means that developers may not discover misplaced attempted tail calls in their programs until they overflow the stack. Second, implementing proper tail calls requires eliding tail call stack frames from the stack, which loses information about execution flow. This in turn has two consequences:
  1. It makes it more difficult to understand during debugging how execution arrived at a certain point since the stack contains discontinuities and
  2. Error.prototype.stack contains less information about execution flow which may break telemetry software that collects and analyzes client-side errors.
Implementing a shadow stack can improve the readability of call stacks, but the V8 and DevTools teams believe that debugging is easiest, most reliable, and most accurate when the stack displayed during debugging is completely deterministic and always matches the true state of the actual virtual machine stack. Moreover, a shadow stack is too expensive performance-wise to turn on all the time.

For these reasons, the V8 team strongly support denoting proper tail calls by special syntax. There is a pending TC39 proposal called syntactic tail calls to specify this behavior, co-championed by committee members from Mozilla and Microsoft. We have implemented and staged proper tail calls as specified in ES6 and started implementing syntactic tail calls as specified in the new proposal. The V8 team plans to resolve the issue at the next TC39 meeting before shipping implicit proper tail calls or syntactic tail calls by default. You can test out each version in the meantime by using the V8 flags --harmony-tailcalls and --harmony-explicit-tailcalls.

Modules

One of the most exciting promises of ES6 is support for JavaScript modules to organize and separate different parts of an application into namespaces. ES6 specifies import and export declarations for modules, but not how modules are loaded into a JavaScript program. In the browser, loading behavior was recently specified by the new <script type="module"> tag. Although additional standardization work is needed to specify advanced dynamic module-loading APIs, Chromium support for module script tags is already in development. You can track implementation work on the launch bug and read more about experimental loader API ideas in the whatwg/loader repository.


ESnext and beyond

In the future, developers can expect ECMAScript updates to come in smaller, more frequent updates with shorter implementation cycles. The V8 team is already working to bring upcoming features such as async / await keywords, Object.values() / Object.entries(), String.prototype.padStart() / String.prototype.padEnd() and RegExp lookbehind to the runtime. Check back for more updates on our ESnext implementation progress and performance optimizations for existing ES6 and ES7 features.

We strive to continue evolving JavaScript and strike the right balance of implementing new features early, ensuring compatibility and stability of the existing web, and providing TC39 implementation feedback around design concerns. We look forward to seeing the incredible experiences developers will build with these new features.

-- Posted by the V8 team, ECMAScript Enthusiasts

Saturday, April 23, 2016

V8 Release 5.1

The first step in the V8 release process is a new branch from the git master immediately before Chromium branches for a Chrome Beta milestone (roughly every six weeks). Our newest release branch is V8 5.1, which will remain in beta until we release a stable build in conjunction with Chrome 51 Stable. Here’s a highlight of the new developer-facing features in this version of V8.

Improved ECMAScript support

V8 5.1 contains a number of changes towards compliance with the ES2017 draft spec.

Symbol.species

Array methods like Array.prototype.map construct instances of the subclass as its output, with the option to customize this by changing Symbol.species. Analogous changes are made to other built-in classes.

instanceof customization

Constructors can implement their own Symbol.hasInstance method, which overrides the default behavior.

Iterator closing

Iterators created as part of a for-of loop (or other built-in iteration, such as the spread operator) are now checked for a close method which is called if the loop terminates early. This can be used for clean-up duty after the iteration has finished.

RegExp subclassing exec method

RegExp subclasses can overwrite the exec method to change just the core matching algorithm, with the guarantee that this is called by higher level functions like String.prototype.replace.

Function name inference

Function names inferred for function expressions are now typically made available in the name property of functions, following the ES2015 formalization of these rules. This may change existing stack traces and provide different names from previous V8 versions. It also gives useful names to properties and methods with computed property names:
class Container {
   ...
   [Symbol.iterator]() { ... }
   ...
}
let c = new Container;
// Logs "[Symbol.iterator]".
console.log(c[Symbol.iterator].name);

Array.prototype.values

Analogous to other collection types, the values method on Array returns an iterator over the contents of the Array.

Performance improvements

Release 5.1 also brings a few notable performance improvements to the following JavaScript features:
  • Executing loops like for-in
  • Object.assign
  • Promise and RegExp instantiation
  • Calling Object.prototype.hasOwnProperty
  • Math.floor, Math.round and Math.ceil
  • Array.prototype.push
  • Object.keys
  • Array.prototype.join & Array.prototype.toString
  • Flattening repeat strings e.g. '.'.repeat(1000)

WASM

5.1 has a preliminary support for WASM. You can enable it via the flag --expose_wasm in d8. Alternatively you can try out the WASM demos with Chrome 51 (Beta Channel).

Memory

V8 implemented more slices of Orinoco:
  • Parallel young generation evacuation 
  • Scalable remembered sets 
  • Black allocation 
The impact is reduced jank and memory consumption in times of need.

V8 API

Please check out our summary of API changes. This document gets regularly updated a few weeks after each major release.

Developers with an active V8 checkout can use 'git checkout -b 5.1 -t branch-heads/5.1' to experiment with the new features in V8 5.1. Alternatively you can subscribe to Chrome's Beta channel and try the new features out yourself soon.

Posted by the V8 team

Tuesday, April 12, 2016

Jank Busters Part Two: Orinoco

In a previous blog post, we introduced the problem of jank caused by garbage collection interrupting a smooth browsing experience. In this blog post we introduce three optimizations that lay the groundwork for a new garbage collector in V8, codenamed Orinoco. Orinoco is based on the idea that implementing a mostly parallel and concurrent garbage collector without strict generational boundaries will reduce garbage collection jank and memory consumption while providing high throughput. Instead of implementing Orinoco behind a flag as a separate garbage collector, we decided to ship features of Orinoco incrementally on V8 tip of tree to benefit users immediately. The three features discussed in this post are parallel compaction, parallel remembered set processing, and black allocation.

V8 implements a generational garbage collector where objects may move within the young generation, from the young to the old generation, and within the old generation. Moving objects is expensive since the underlying memory of objects needs to be copied to new locations and the pointers to those objects are also subject to updating. Figure 1 shows the phases and how they were executed before Orinoco. Essentially, objects were moved first and then pointers between those objects were updated afterwards, all in sequential order, resulting in observable jank.

Figure 1: Sequential moving of objects and updating pointers

V8 partitions its heap memory into fixed-size chunks, called pages, that are assigned to either young or old generation space. Objects are initially allocated in the young generation. Upon garbage collection, live objects are moved within the young generation once. Objects that survive another garbage collection are promoted to the old generation. For both phases, which we call collectively young generation evacuation, we parallelize the copying of memory based on pages. Within the young generation, moving objects always involves allocating memory on fresh pages (and releasing the old pages), leaving behind a compact memory layout. In the old generation this process happens in a slightly different manner, since dead memory leaves behind unusable holes (or fragmentation). Some of these holes can be reused via free lists, but others are left behind, requiring compaction to move live objects to a better packed (potentially new) page. Similar to the young generation this process is parallelized on page-level.

Since there are no dependencies between young generation evacuation and old generation compaction, Orinoco now performs these phases in parallel, as shown in Figure 2. The result of these improvements is a reduction of compaction time of 75% from ~7ms to under 2ms on average.

Figure 2: Parallel moving of objects and updating pointers


The second optimization introduced by Orinoco improves how garbage collection tracks pointers. When an object moves location on the heap, the garbage collector has to find all pointers that contain the old location of the moved object and update them with the new location. Since iterating through the heap to find the pointers would be very slow, V8 uses a data structure called a remembered set to keep track of all the interesting pointers on the heap. A pointer is interesting if it points to an object that may move during garbage collection. For example, all pointers from the old generation to the new generation are interesting because new generation objects move on every garbage collection. Pointers to objects in heavily fragmented pages are also interesting because these objects will move to other pages during compaction.

Previously, V8 implemented remembered sets as arrays of pointer addresses, or store buffers. There was one store buffer for the young generation and one for each of the fragmented old generation pages. The store buffer of a page contains addresses of all incoming pointers as shown in Figure 3. Entries are appended to a store buffer in a write barrier, which guards write operations in JavaScript code. This may result in duplicate entries since a store buffer may include a pointer multiple times and two different store buffers may include the same pointer. Duplicate entries make parallelization of the pointer update phase difficult because of the data race caused by two threads trying to update the same pointer.

Figure 3: Old remembered set

Orinoco removes this complexity by reorganizing the remembered set to simplify parallelization and make sure that threads get disjoint sets of pointers to update. Instead of storing incoming interesting pointers in an array, each page now stores the offsets of interesting pointers originating from that page in buckets of bitmaps as shown in Figure 4. Each bucket is either empty or points to a bitmap of a fixed length. A bit in the bitmap corresponds to a pointer offset in the page. If a bit is set then the pointer is interesting and is in the remembered set. Using this data-structure we can parallelize pointer updates based on pages. The absence of duplicate entries and the dense representation of pointers also allowed us to remove complex code for handling remembered set overflow. In our long running Gmail benchmark, this change reduced the maximum pause time of compacting garbage collection by 45% from 42ms to 23 ms.

Figure 4: New remembered set



The third optimization that Orinoco introduces is black allocation, an improvement to the marking phase of the garbage collector. Black allocation (shipped in V8 5.1) is a garbage collection technique in which all objects allocated in the old generation (e.g. pre-tenured allocations or promoted objects by the garbage collector) are marked black immediately in order to designate them as "live". The intuition behind black allocation is that objects allocated in the old generation are likely long living. Therefore, objects that were recently allocated in the old generation should at least survive the next old generation garbage collection, otherwise they were falsely promoted. After coloring newly allocated objects black the garbage collector will not visit them. We speed up coloring of black objects by allocating them on black pages where all all objects are black by default. Another benefit of black pages is that they do not have to be swept, since all objects allocated on them are (by definition) live. Black allocation speeds up incremental marking progress since marking work does not increase with new allocations. The impact of black allocation is clearly visible on the Octane Splay benchmark where the throughput and latency score improved by about 30% while using about 20% less memory due to faster marking progress and less garbage collection work overall.

We plan to roll out more Orinoco features soon. Stay tuned, we are still tinkering!

Posted by the jank busters: Ulan Degenbaev, Michael Lippautz, and Hannes Payer