Overview

The CPAN Metabase is a framework for storing metadata about CPAN distributions. It can be used to store, retrieve, and search this information, which can be of arbitrary and mixed types. The metabase is intended as the report repository for version 2.0 of CPAN Testers (see Roadmap).

When CPAN::Metabase was initially developed, CPAN Testers reports were sent by individual testers to a single email server, which then forwarded them to a USENET group, which was considered the authoritative store. This presented problems: some testers couldn’t send email, the system wasn’t very searchable or mirrorable, and the data inside the system was entirely unstructured.

CPAN::Metabase aims to avoid all of those problems by being transport-neutral, searchable and mirrorable by design, and geared toward storing structured data. Simplicity is another design goal: while it has several moving parts, they’re all simple and designed to be replaceable and extensible, rather than aiming for a perfect design up front.

Status

Still in development. Code is available from repositories (see below) but has not yet been released to CPAN.

A simple test client was able to store and retrieve a simple "fact" (text string) from a metabase server.

Components

There are currently four components:

  • CPAN-Metabase -- the database backend and low-level API
  • CPAN-Metabase-Fact -- base class for objects or object collections to be stored in the database
  • CPAN-Metabase-Web -- web server providing an API to query, submit or retrieve from the database
  • CPAN-Metabase-Client -- client library to query, store or retrieve from the web API

CPAN-Metabase-Fact

This component provides base classes for items to be stored in the metabase.

CPAN::Metabase::Fact

Core attributes:

  • cpan_id -- identifier for a CPAN object (e.g. distribution name, module name, author id) that the fact relates to
  • refers_to -- type associated with the cpan_id; initially only 'distribution' is supported
  • type -- class name with colons converted to dashes (automatically populated)
  • content -- user-supplied data about the CPAN object
  • version -- schema version of the fact content

Extended attributes:

  • guid -- unique identifier of a particular fact in a particular metabase
  • index_meta -- standard key/value pairs for indexing any fact
  • content_meta -- key/value pairs specific to a fact subclass
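
To make the two attribute groups concrete, here is a minimal sketch of a fact as a data structure. It is written in Python purely for illustration and is not the project's Perl API. The `class_name` field, the `is_bare` helper, and the default values are hypothetical, and the sketch assumes that each "::" in a class name becomes a single dash.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


def type_from_class_name(class_name: str) -> str:
    """Derive a fact type from a Perl-style class name.

    Assumption: each '::' separator becomes a single '-'.
    """
    return class_name.replace("::", "-")


@dataclass
class Fact:
    # Core attributes: supplied by whoever creates the fact.
    cpan_id: str
    content: Any
    refers_to: str = "distribution"
    version: int = 1
    # Hypothetical field standing in for the Perl class of the fact.
    class_name: str = "CPAN::Metabase::Fact"

    # Extended attributes: empty on a new fact, filled in by the metabase.
    guid: Optional[str] = None
    index_meta: Dict[str, Any] = field(default_factory=dict)
    content_meta: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Only 'distribution' is supported initially.
        if self.refers_to != "distribution":
            raise ValueError(f"unsupported refers_to: {self.refers_to!r}")

    @property
    def type(self) -> str:
        # Populated automatically from the class name, never user-supplied.
        return type_from_class_name(self.class_name)

    def is_bare(self) -> bool:
        """True when no extended attribute has been populated yet."""
        return self.guid is None and not self.index_meta and not self.content_meta
```

Keeping `type` as a derived property rather than a stored field mirrors the "automatically populated" note in the attribute list.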

Functionality:

  • auto-population of some core attributes
  • serialization/deserialization

Subclass behaviors:

  • automatic content generation (either replacing or augmenting user-supplied content)
  • validation of user-supplied content
  • content-type specific serialization/deserialization

Design notes:

New facts require only the core attributes to be populated (such a fact is referred to hereafter as a "bare fact"). When a bare fact is submitted to a metabase, the server (or metabase) must populate the extended attributes. If extended attributes already exist when the fact is submitted, they should be discarded. When a fact is retrieved from a metabase, all attributes are provided.

(N.B. metabase queries may have an option to retrieve a data structure with some index data; as there is no behavior to associate with the information, this should just be returned as data, not as a lobotomized object.)
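
The submission rule in the design notes above can be sketched as follows. This is an illustrative Python sketch, not the project's Perl code: the `accept_submission` name, the dict-based store, the use of a random UUID as the guid, and the keys placed in `index_meta` are all assumptions.

```python
import uuid
from typing import Any, Dict

# The core attributes a bare fact must carry.
CORE_ATTRIBUTES = ("cpan_id", "refers_to", "type", "content", "version")


def accept_submission(
    submitted: Dict[str, Any], store: Dict[str, Dict[str, Any]]
) -> Dict[str, Any]:
    """Store a submitted fact, letting the server own all extended attributes."""
    missing = [name for name in CORE_ATTRIBUTES if name not in submitted]
    if missing:
        raise ValueError(f"missing core attributes: {missing}")

    # Copy only the core attributes. Any guid, index_meta, or content_meta
    # sent by the client is silently discarded at this point.
    fact = {name: submitted[name] for name in CORE_ATTRIBUTES}

    # The server populates the extended attributes itself.
    fact["guid"] = str(uuid.uuid4())  # assumption: a random UUID
    fact["index_meta"] = {  # assumption: which keys get indexed
        "cpan_id": fact["cpan_id"],
        "type": fact["type"],
    }
    fact["content_meta"] = {}  # a fact subclass would fill this in

    store[fact["guid"]] = fact
    return fact
```

Copying from a fixed list of core attributes, rather than deleting known extended ones, means unexpected client-supplied keys are dropped as well.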

CPAN::Metabase::Report

A Report is a subtype of Fact that represents an unordered collection of facts.

Functionality:

  • serialization/deserialization of collections of facts
  • shorthand construction of facts with identical core attributes

Design notes:

Reports require special handling during submission as bundled facts need to be stored first so their unique identifiers are known before the report itself can be stored.

During retrieval, the process runs in reverse: the fact identifiers are retrieved first, then the associated facts, and finally the report is serialized for transport.
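
The ordering described above can be sketched as follows, again in Python for illustration only. The function names, the in-memory dict store, and the choice to hold the bundled facts' guids as the report's content are assumptions, not the project's actual storage format.

```python
import json
import uuid
from typing import Any, Dict, List

Store = Dict[str, Dict[str, Any]]


def store_fact(store: Store, fact: Dict[str, Any]) -> str:
    """Store one fact and return the guid assigned to it."""
    guid = str(uuid.uuid4())
    store[guid] = dict(fact, guid=guid)
    return guid


def store_report(
    store: Store, report_core: Dict[str, Any], facts: List[Dict[str, Any]]
) -> str:
    """Store the bundled facts first, then the report that refers to them."""
    fact_guids = [store_fact(store, fact) for fact in facts]
    # Only now are the identifiers known, so the report can be stored.
    return store_fact(store, dict(report_core, content=fact_guids))


def retrieve_report(store: Store, report_guid: str) -> str:
    """Reverse of store_report: identifiers, then facts, then serialization."""
    report = store[report_guid]
    facts = [store[guid] for guid in report["content"]]
    header = {key: value for key, value in report.items() if key != "content"}
    return json.dumps({"report": header, "facts": facts}, sort_keys=True)
```

The list of guids is only a convenient container here; a Report is an unordered collection, so callers should not rely on the order of the retrieved facts.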

Design discussion and open issues

Serialization

We've gone back and forth over how fact objects should serialize -- either as "text" or as frozen objects. Consensus seems to be that we should use plain text -- no objects -- for security and portability.
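
As a rough illustration of the plain-text option, the Python sketch below serializes a fact as JSON text. JSON is an assumption made for this example (the discussion above only settles on "plain text"), and the function names are hypothetical. Parsing text into plain data never instantiates classes or runs code chosen by the sender, which is the security concern with thawing frozen objects.

```python
import json
from typing import Any, Dict


def serialize_fact(fact: Dict[str, Any]) -> str:
    """Turn a fact's attributes into plain text (JSON is an assumed format)."""
    return json.dumps(fact, sort_keys=True)


def deserialize_fact(text: str) -> Dict[str, Any]:
    """Parse plain text back into data; no objects are revived from the input."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("a serialized fact must be a key/value structure")
    return data
```

Any language with a JSON parser could read such a fact, which speaks to the portability half of the argument.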

Repositories

  • DAGOLDEN: http://echo.dagolden.com/git/
    • CPAN-Metabase git://echo.dagolden.com/git/CPAN-Metabase
    • CPAN-Metabase-Client git://echo.dagolden.com/git/CPAN-Metabase-Client
    • CPAN-Metabase-Fact git://echo.dagolden.com/git/CPAN-Metabase-Fact
    • CPAN-Metabase-Web git://echo.dagolden.com/git/CPAN-Metabase-Web
  • RJBS: http://git.codesimply.com
    • CPAN-Metabase git://git.codesimply.com/CPAN-Metabase
    • CPAN-Metabase-Client git://git.codesimply.com/CPAN-Metabase-Client
    • CPAN-Metabase-Fact git://git.codesimply.com/CPAN-Metabase-Fact
    • CPAN-Metabase-Web git://git.codesimply.com/CPAN-Metabase-Web