This document describes the new HTTP cache, version 2.

 

The code resides in /netwerk/cache2.

 

API

Here is a detailed description of the HTTP cache v2 API, examples included.  This document only covers what cannot be found in, or may not be clear from, the comments in the IDL files.

You are strongly encouraged NOT to use the OLD cache API (nsICacheService et al.) any more.  It will soon be completely obsoleted and removed (bug 913828).

nsICacheStorageService

nsILoadContextInfo

nsICacheStorage

nsICacheEntryOpenCallback

nsICacheEntry

Lifetime of a new entry

Concurrent read and write

An important difference in behavior from the old cache: the cache now supports reading a cache entry's data while it is still being written by the first consumer, the writer.

This can only be engaged for resumable responses that don't need revalidation (bug 960902). The reason is that when the writer is interrupted (e.g. by external canceling of the loading channel), concurrent readers would otherwise not be able to reach the remaining unread content.

This could be improved by keeping the network load running and storing its data to the cache entry even after the writing channel has been canceled.

When the writer is interrupted, the first concurrent reader in line makes a range request for the rest of the data and thereby becomes the new writer. The remaining readers continue to read the content concurrently, since the output stream for the cache entry is open again and held by the current writer.
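The handoff described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the actual cache code: the class SharedEntryModel, its fields and the consumer names are hypothetical, and the only thing taken from the text is that the first reader in line continues the load with a range request starting at the first byte not yet stored.

```python
from collections import deque


class SharedEntryModel:
    """Toy model of one cache entry shared by a writer and concurrent readers.

    All names here are hypothetical; this is not the Gecko implementation.
    """

    def __init__(self, total_size):
        self.total_size = total_size  # full size of the response in bytes
        self.data = bytearray()       # bytes stored in the entry so far
        self.writer = None            # consumer currently holding the output stream
        self.readers = deque()        # concurrent readers, in arrival order

    def write(self, consumer, chunk):
        """Append data; only the current writer is allowed to do so."""
        if consumer != self.writer:
            raise PermissionError("only the current writer may append data")
        self.data += chunk

    def interrupt_writer(self):
        """The writer was canceled: promote the first reader in line.

        Returns (new_writer, range_header), or (None, None) when there is
        no reader waiting or the entry is already complete.
        """
        self.writer = None
        if not self.readers or len(self.data) >= self.total_size:
            return None, None
        self.writer = self.readers.popleft()
        # Ask the server only for the part that is not stored yet.
        return self.writer, f"bytes={len(self.data)}-"


entry = SharedEntryModel(total_size=10)
entry.writer = "channel-A"
entry.readers.extend(["channel-B", "channel-C"])
entry.write("channel-A", b"abcd")
new_writer, range_header = entry.interrupt_writer()
```

In this model channel-B becomes the writer and would send a Range header for bytes 4 onward, while channel-C stays a plain concurrent reader.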

Lifetime of an existing entry with only a partial content

Lifetime of an existing entry that doesn't pass server revalidation

Adding a new storage

Should there be a need to add a new distinct storage for which the current scoping model is not sufficient, use one of the following two ways:

  1. [preferred] Add a new <Your>Storage method on nsICacheStorageService and, if needed, give it arguments that specify the storage scope further.  The implementation should only need to enhance the context key generation and parsing code, and to enhance the current nsICacheStorage implementations (or create new ones when needed) so that they carry any additional information down to the cache service.
  2. [not preferred] Add a new argument to nsILoadContextInfo. Be careful here: some arguments on the context may not be known at load time, which may lead to inter-context data leaking or implementation problems. Adding more distinction to nsILoadContextInfo also affects all existing storages, which may not always be desirable.

See context keying details for more information.

Code examples

TBD

Opening an entry

Creating a new entry

Recreating an already open entry

Implementation

Threading

The cache API is fully thread-safe.

The cache uses a single background thread on which all IO operations such as opening, reading, writing and erasing happen.  Memory pool management, eviction and visiting loops also happen on this thread.

The thread supports several priority levels. An event dispatched to a level with a lower number is executed sooner than one dispatched to a level with a higher number. Also, any loop running on a lower-priority level yields to higher-priority levels, so that a scheduled deletion of 1000 files will not block opening cache entries.

  1. OPEN_PRIORITY: besides opening priority cache files, file dooming also happens here, to prevent races
  2. READ_PRIORITY: cache files of top level documents and head blocking scripts are opened and read first
  3. OPEN
  4. READ: any normal priority content, such as images, is opened and read here
  5. WRITE: writes are processed last; in the meantime we cache the data in memory
  6. MANAGEMENT: level for the memory pool and CacheEntry background operations
  7. CLOSE: file closing level
  8. INDEX: the index is rebuilt here
  9. EVICT: files exceeding the disk space consumption limit are evicted here

NOTE: Special case for eviction - when an eviction is scheduled on the IO thread, all operations pending on the OPEN level are first merged to the OPEN_PRIORITY level. The eviction preparation operation - i.e. clearing of the internal IO state - is then put to the end of the OPEN_PRIORITY level.  All this happens atomically. This functionality is currently pending in bug 976866.
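To make the ordering rules concrete, here is a minimal sketch of a leveled event queue. It is an illustration only and rests on assumptions: the numeric values simply follow the list order above, LeveledQueue and its methods are hypothetical names, and merged OPEN events are interleaved with existing OPEN_PRIORITY events by dispatch order, which the text does not specify.

```python
import heapq
from enum import IntEnum
from itertools import count


class Level(IntEnum):
    # Values follow the order of the list above (an assumption of this sketch).
    OPEN_PRIORITY = 1
    READ_PRIORITY = 2
    OPEN = 3
    READ = 4
    WRITE = 5
    MANAGEMENT = 6
    CLOSE = 7
    INDEX = 8
    EVICT = 9


class LeveledQueue:
    """Hypothetical model: lower level number runs first, FIFO within a level."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # dispatch order, breaks ties inside one level

    def dispatch(self, level, name):
        heapq.heappush(self._heap, (int(level), next(self._seq), name))

    def schedule_eviction(self, name="evict-prepare"):
        """Move pending OPEN events to OPEN_PRIORITY, then append the
        eviction preparation step to the end of OPEN_PRIORITY."""
        merged = []
        for level, seq, event in self._heap:
            if level == Level.OPEN:
                level = int(Level.OPEN_PRIORITY)
            merged.append((level, seq, event))
        heapq.heapify(merged)
        self._heap = merged
        self.dispatch(Level.OPEN_PRIORITY, name)

    def run_next(self):
        """Pop the next event name, or None when the queue is empty."""
        return heapq.heappop(self._heap)[2] if self._heap else None


queue = LeveledQueue()
queue.dispatch(Level.WRITE, "write-1")
queue.dispatch(Level.OPEN, "open-1")
queue.dispatch(Level.OPEN_PRIORITY, "open-prio-1")
queue.dispatch(Level.READ, "read-1")
queue.schedule_eviction()
order = [queue.run_next() for _ in range(5)]
```

A real implementation would perform the merge atomically under the IO thread's lock; this single-threaded sketch leaves locking out.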

Storage and entries scopes

The scope key string used to map the storage scope is based on the arguments of nsILoadContextInfo. The form is as follows (currently pending in bug 968593):

a,b,i1009,p,

CacheStorageService keeps a global hashtable mapped by the scope key. Elements in this global hashtable are hashtables of cache entries. The cache entries are mapped by the concatenation of the Enhance ID and the URI passed to nsICacheStorage.asyncOpenURI.  When an entry is being looked up, the global hashtable is first searched using the scope key to find an entries hashtable. This entries hashtable is then searched using the <enhance-id:><uri> string. The elements in this hashtable are CacheEntry objects, see below.
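The two-step lookup can be sketched as nested dictionaries. This is a simplified model, not the real service: StorageServiceModel and its methods are hypothetical, entries are represented by plain dicts, and the exact spelling of the entry key (here the enhance ID prefix is simply omitted when empty) is an assumption.

```python
class StorageServiceModel:
    """Hypothetical model of the scope-key -> entries -> entry lookup."""

    def __init__(self):
        # scope key -> {entry key -> entry}
        self._scopes = {}

    @staticmethod
    def entry_key(enhance_id, uri):
        # Models "<enhance-id:><uri>"; the prefix is dropped when there is
        # no enhance ID (an assumption of this sketch).
        return f"{enhance_id}:{uri}" if enhance_id else uri

    def lookup(self, scope_key, enhance_id, uri, create=False):
        """Return the entry, creating it when requested, else None."""
        entries = self._scopes.get(scope_key)      # step 1: global hashtable
        if entries is None:
            if not create:
                return None
            entries = self._scopes[scope_key] = {}
        key = self.entry_key(enhance_id, uri)
        entry = entries.get(key)                   # step 2: entries hashtable
        if entry is None and create:
            entry = entries[key] = {"key": key}
        return entry


service = StorageServiceModel()
scope = "a,b,i1009,p,"
first = service.lookup(scope, "", "http://example.com/", create=True)
again = service.lookup(scope, "", "http://example.com/")
other_scope = service.lookup("hypothetical-other-scope", "", "http://example.com/")
enhanced = service.lookup(scope, "alt", "http://example.com/", create=True)
```

The same URI under a different scope key or a different enhance ID resolves to a different entry (or to none), which is the isolation the scoping model provides.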

The hash tables keep a strong reference to CacheEntry objects. One way to remove CacheEntry objects from memory is by exhausting the memory limit for intermediate memory caching, which triggers a background process of purging expired and then least used entries from memory. Another way is to directly call the nsICacheStorageService.purge method. That method is also called automatically on the "memory-pressure" indication.

Access to the hashtables is protected by a global lock. We also count, in a thread-safe manner, the number of consumers keeping a reference on each entry. The open callback doesn't actually give the consumer the CacheEntry object directly, but a small wrapper class that manages the 'consumer reference counter' on its cache entry. Together, these two mechanisms ensure thread-safe access and make it impossible to have more than a single instance of a CacheEntry for a single <scope+enhanceID+URL> key.

CacheStorage, implementing the nsICacheStorage interface, forwards all calls to internal methods of CacheStorageService, passing itself as an argument.  CacheStorageService then generates the scope key using the nsILoadContextInfo of the storage. Note: CacheStorage keeps a thread-safe copy of the nsILoadContextInfo passed to a *Storage method on nsICacheStorageService.
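The wrapper idea can be illustrated as follows. This is a hedged sketch only: CountedEntry and EntryHandle are hypothetical names, and a per-entry lock stands in for whatever synchronization the real code uses.

```python
import threading


class CountedEntry:
    """Hypothetical entry that counts its consumers in a thread-safe way."""

    def __init__(self, key):
        self.key = key
        self._lock = threading.Lock()
        self._consumers = 0

    def add_consumer(self):
        with self._lock:
            self._consumers += 1

    def release_consumer(self):
        with self._lock:
            self._consumers -= 1

    @property
    def consumers(self):
        with self._lock:
            return self._consumers


class EntryHandle:
    """Small wrapper given to a consumer instead of the entry itself."""

    def __init__(self, entry):
        self._entry = entry
        self._released = False
        entry.add_consumer()

    def release(self):
        # Releasing twice must not decrement the counter twice.
        if not self._released:
            self._released = True
            self._entry.release_consumer()


shared = CountedEntry("http://example.com/")
handle_a = EntryHandle(shared)
handle_b = EntryHandle(shared)
count_with_two = shared.consumers
handle_a.release()
handle_a.release()  # no effect, already released
count_with_one = shared.consumers
handle_b.release()
```

Once the counter drops back to zero, the entry has no consumers left and becomes a candidate for purging from memory.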

Invoking open callbacks

CacheEntry, implementing the nsICacheEntry interface, is responsible for managing the internal state of the cache entry and for properly invoking the onCacheEntryCheck and onCacheEntryAvailable callbacks for all callers of nsICacheStorage.asyncOpenURI.

The openers FIFO is an array of CacheEntry::Callback objects. CacheEntry::Callback keeps a strong reference to the opener plus the opening flags.  nsICacheStorage.asyncOpenURI forwards to CacheEntry::AsyncOpen and triggers the following pseudo-code:

CacheStorage::AsyncOpenURI - the API entry point:

CacheEntry::AsyncOpen (entry atomic):

CacheEntry::InvokeCallbacks (entry atomic):

CacheEntry::OnFileReady (entry atomic):

CacheEntry::OnHandleClosed (entry atomic):

All consumers release the reference:

Intermediate memory caching of frequently used metadata (a.k.a. disk cache memory pool)

This section describes the feature in a state that currently exists only as a patch in bug 986179. The current behavior is simpler and causes a serious memory consumption regression (bug 975367).

For disk cache entries we keep the metadata of some of the most recent and most used cache entries in memory for immediate zero-thread-loop opening. The default size of this metadata memory pool is only 250 kB and is controlled by the new browser.cache.disk.metadata_memory_limit preference. When the limit is exceeded, we purge (throw away) first expired and then least used entries to free up memory again.

Only CacheEntry objects that are already loaded and filled with data and that have 'consumer reference == 0' (bug 942835) can be purged.

The 'least used' entries are recognized by the lowest value of frecency, which we re-compute for each entry on every access. The decay time is controlled by the browser.cache.frecency_half_life_hours preference and defaults to 6 hours. The best decay time will be based on results of an experiment.
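The text names a half-life but does not give the frecency formula, so the following is only an illustrative model of half-life decay, not the formula used by the cache. The functions decayed and on_access, and the choice to add 1 per access, are assumptions of this sketch.

```python
HALF_LIFE_HOURS = 6.0  # default of browser.cache.frecency_half_life_hours per the text


def decayed(frecency, elapsed_hours, half_life_hours=HALF_LIFE_HOURS):
    """Value of a score after elapsed_hours, halving every half_life_hours."""
    return frecency * 0.5 ** (elapsed_hours / half_life_hours)


def on_access(frecency, hours_since_last_access, half_life_hours=HALF_LIFE_HOURS):
    """Re-compute the score on access: decay the old value, then count this hit.

    Adding 1.0 per access is a hypothetical choice made for illustration.
    """
    return decayed(frecency, hours_since_last_access, half_life_hours) + 1.0


first_hit = on_access(0.0, 0.0)            # new entry, first access
second_hit = on_access(first_hit, 6.0)     # accessed again one half-life later
stale = decayed(8.0, 12.0)                 # two half-lives without any access
```

With such a model, an entry that is accessed often and recently keeps a high score, while an entry left untouched loses half of its score every six hours and drifts toward the 'least used' end of the list.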

The memory pool is represented by two lists (strongly referencing ordered arrays) of CacheEntry objects:

  1. Sorted by expiration time (defaults to 0xFFFFFFFF)
  2. Sorted by frecency (defaults to 0)

We have two such pools: one for memory-only entries, which actually represents the memory-only cache, and one for disk cache entries, for which we only keep the metadata.  Each pool has its own limit check: the memory cache pool is controlled by browser.cache.memory.capacity, while the limit for the disk entries pool is described above. The pools can be accessed and modified only on the cache background thread.
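The purge order (expired entries first, then least used ones, skipping entries that still have consumers) can be sketched like this. It is a simplified model under assumptions: PoolEntry and purge are hypothetical names, sizes and times are arbitrary integers, and the "loaded and filled with data" condition is not modeled.

```python
from dataclasses import dataclass

NEVER_EXPIRES = 0xFFFFFFFF  # default expiration time per the text


@dataclass
class PoolEntry:
    key: str
    size: int
    expiration: int = NEVER_EXPIRES
    frecency: float = 0.0
    consumers: int = 0  # only entries with zero consumers may be purged


def purge(entries, limit, now):
    """Return (kept, purged) after bringing the pool back under the limit."""
    kept = list(entries)
    purged = []

    def total():
        return sum(e.size for e in kept)

    # Pass 1: walk the list sorted by expiration time, drop expired entries.
    for e in sorted(kept, key=lambda e: e.expiration):
        if total() <= limit:
            break
        if e.expiration <= now and e.consumers == 0:
            kept.remove(e)
            purged.append(e)

    # Pass 2: walk the list sorted by frecency, drop the least used entries.
    for e in sorted(kept, key=lambda e: e.frecency):
        if total() <= limit:
            break
        if e.consumers == 0:
            kept.remove(e)
            purged.append(e)

    return kept, purged


pool = [
    PoolEntry("fresh-hot", 100, frecency=5.0),
    PoolEntry("expired", 100, expiration=10),
    PoolEntry("fresh-cold", 100, frecency=1.0),
    PoolEntry("in-use-cold", 100, frecency=0.5, consumers=1),
]
kept, purged = purge(pool, limit=200, now=50)
kept_keys = [e.key for e in kept]
purged_keys = [e.key for e in purged]
```

In this run the expired entry goes first, then the coldest entry without consumers; the in-use entry survives even though its frecency is the lowest.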

"@mozilla.org/netwerk/cache-storage-service;1"