Commons:Structured data/Stable Interface Policy

Stable public interfaces for data access are a crucial component of any public knowledge repository. This Stable Interface Policy defines which guarantees are and are not given by the Structured Data engineering team regarding the stability of data formats provided by WikibaseMediaInfo as deployed on commons.wikimedia.org.

Definitions

This section defines some crucial terms used in this document.

  • Consumer: software that reads and interprets data received from Wikimedia Commons.
  • Client: software that calls public Wikimedia Commons APIs. Clients are typically also consumers of data.
  • Compliant client/consumer: A client or consumer that complies with the specification of the underlying formats and protocols it uses. For instance, a compliant consumer that reads JSON data complies with the JSON specification, and will accept any encoding allowed by the JSON specification (RFC 7159). A compliant client using a web API will comply with the HTTP spec, etc.
  • Well-behaved client/consumer: A (compliant) client or consumer which is implemented in a robust and forward-compatible way, specifically taking into account the guarantees and limitations stated in this document. For instance, a well-behaved client will not break when encountering a new data type.
  • Breaking change: a change to a data format that violates guarantees given or widely assumed before. Breaking changes include removal of data fields and changes to the interpretation or format of data fields.
  • Significant change: a change to a data format that would be beneficial for clients or consumers to adapt to, but which will not break a well-behaved client or consumer. Significant changes particularly include additions, such as the introduction of new data types or entity types, or the inclusion of additional information in the data output. See Extensibility below.
  • Insignificant change: a change to a data format that is not expected to have any impact in a well-behaved client. Insignificant changes include changes to whitespace outside literals as well as the order of fields in a JSON object.
  • Stable Interface: a data format for which breaking and significant changes will be announced as per the notification policy below. Which interfaces are considered stable is defined in the Stable Data Formats and Stable Public APIs sections later in this document.
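
As an illustration of the "insignificant change" definition above, the following minimal sketch shows why a consumer that parses JSON, rather than comparing raw text, is unaffected by whitespace or field order. The two documents and the `same_data` helper are hypothetical and exist only for this example.

```python
import json

# Two hypothetical serializations of the same object: the second differs
# only in whitespace outside literals and in the order of its fields.
ORIGINAL = '{"id":"M123","type":"mediainfo"}'
REFORMATTED = '{\n  "type": "mediainfo",\n  "id": "M123"\n}'


def same_data(a: str, b: str) -> bool:
    """Compare two JSON documents by parsed content, not by raw text."""
    return json.loads(a) == json.loads(b)


if __name__ == "__main__":
    print(ORIGINAL == REFORMATTED)           # raw text differs
    print(same_data(ORIGINAL, REFORMATTED))  # parsed content is equal
```

A consumer that hashes or string-compares raw responses would treat the second document as a change, which is the kind of fragility a well-behaved consumer avoids.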

Notification Policy

This section defines where and when the operators of clients and consumers will be notified of changes to a stable interface. No guarantees are made regarding unstable interfaces.

Extensibility

This section explains in which way our data model and data formats are extensible. Consumers should consider this information in order to accommodate unknown structures they may encounter in the data.

The Wikibase Data Model is designed to be extensible. In particular, it is possible to introduce new data types and new entity types. Well-behaved clients and consumers should thus be prepared to encounter unknown data types and entity types, and handle them gracefully, in a way appropriate for the use at hand. In many cases, it is appropriate to simply ignore such structures of unknown type.
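
One possible way for a consumer to handle unknown data types gracefully is sketched below. It is a minimal sketch, not a reference implementation: the sample snaks are simplified and only loosely modelled on the Wikibase JSON format, the `datatype` key and the handler table are assumptions made for illustration, and `hypothetical-new-type` stands in for any data type introduced after the consumer was written.

```python
# Handlers for the data types this (hypothetical) consumer understands.
KNOWN_HANDLERS = {
    "string": lambda value: value,
    "wikibase-item": lambda value: value["id"],
}

# Simplified, made-up sample input; real data carries more fields.
SAMPLE_SNAKS = [
    {"snaktype": "value", "datatype": "wikibase-item",
     "datavalue": {"value": {"id": "Q146"}}},
    {"snaktype": "value", "datatype": "hypothetical-new-type",
     "datavalue": {"value": {"anything": 1}}},
    {"snaktype": "value", "datatype": "string",
     "datavalue": {"value": "hello"}},
    {"snaktype": "somevalue", "datatype": "string"},
]


def extract_values(snaks):
    """Return (values, skipped), where skipped lists unrecognized data types."""
    values, skipped = [], []
    for snak in snaks:
        if snak.get("snaktype") != "value":
            continue  # nothing to extract when no concrete value is given
        datatype = snak.get("datatype")
        handler = KNOWN_HANDLERS.get(datatype)
        if handler is None:
            # Unknown data type: record it and move on instead of failing.
            skipped.append(datatype)
            continue
        values.append(handler(snak["datavalue"]["value"]))
    return values, skipped


if __name__ == "__main__":
    print(extract_values(SAMPLE_SNAKS))
```

Returning the skipped types, rather than silently dropping them, lets the operator notice a significant change and decide whether to add support for it.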

Similarly, bindings such as the JSON representation of the Wikibase data model are designed to be extensible. Data structures may be added in any syntactically appropriate place as long as they do not modify the meaning of pre-existing fields or data structures, and as long as their addition does not break any guarantees regarding the containing data structures. This follows the idea of the Liskov substitution principle: what was guaranteed about a data structure before the addition should still be guaranteed after the addition.

If no explicit guarantees are given regarding the structure and contents of a data structure, the following principles should give guidance regarding whether a change should be considered a breaking change:

  • In structures based on lists (aka arrays) and maps (aka hashes or objects), such as JSON, adding a key to a map is not considered a breaking change, as long as the new field does not change the interpretation of any other fields in the structure (nor in any surrounding structure). Adding a structure to a list or set, however, is considered a breaking change if it would break assumptions about the type of structure to expect in the list, or under what conditions a structure would be included in the list.
  • By convention, lists are considered homogeneous, and should only contain one kind of element, unless otherwise specified. So adding a data structure to a list is a breaking change if that data structure is not compatible with the type of structure that the list was previously defined or implied to contain.
  • In a tabular data representation, such as a relational database schema, the addition of fields is not considered a breaking change. Any change to the interpretation of a field, as well as the removal of a field, is considered breaking. Changes to existing unique indexes or primary keys are breaking changes; changes to other indexes as well as the addition of new indexes are not breaking changes.
  • In DOM-like structures based on nested typed elements with attributes, such as XML, adding an attribute is not considered a breaking change, as long as the new attribute does not change the interpretation of any other fields in the structure (nor in any surrounding structure). Adding a new type of element to a parent element is also not considered breaking, if that parent element is heterogeneous and essentially acts like a map. However, if the parent element is defined or implied to be a homogeneous list of a specific kind of child element, adding another kind of element is considered a breaking change.
  • For data formats that allow namespacing, such as XML, names (attribute names, element names) that belong to a namespace not explicitly mentioned by the specification of the data format can be ignored by consumers. Additions and changes to data structures from other namespaces are not considered breaking changes.
  • In contrast, the following modifications are examples of breaking changes, and thus cannot be used to extend a format: removal of fields, changes to the type or format of a primitive value, changes to the interpretation or role of a data field, as well as changes to the element type of a collection as described above.
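
The structural part of these principles can be approximated mechanically. The sketch below is a rough, hypothetical checker for JSON-like data: it reports removed map keys, changed primitive types, and list elements of a kind not previously seen, while treating added map keys as non-breaking. It is an illustration under simplifying assumptions only. "Kind of element" is approximated by the Python type, elements inside lists are not compared recursively, and changes to the interpretation of a field cannot be detected this way at all.

```python
def find_breaking_changes(old, new, path="$"):
    """List structural differences that the principles above treat as breaking."""
    problems = []
    if isinstance(old, dict) and isinstance(new, dict):
        for key, old_value in old.items():
            if key not in new:
                problems.append(f"{path}.{key}: field removed")
            else:
                problems.extend(
                    find_breaking_changes(old_value, new[key], f"{path}.{key}")
                )
        # Keys that exist only in `new` are additions and are not reported.
    elif isinstance(old, list) and isinstance(new, list):
        # Lists are assumed homogeneous; compare element kinds by Python type.
        old_kinds = {type(item) for item in old}
        for index, item in enumerate(new):
            if old_kinds and type(item) not in old_kinds:
                problems.append(
                    f"{path}[{index}]: element kind {type(item).__name__} "
                    "not previously in list"
                )
    elif type(old) is not type(new):
        problems.append(
            f"{path}: type changed from {type(old).__name__} "
            f"to {type(new).__name__}"
        )
    return problems


if __name__ == "__main__":
    before = {"id": "M1", "labels": {"en": "cat"}, "tags": ["a", "b"]}
    after = {"id": 1, "tags": ["a", {"x": 1}], "extra": True}
    for problem in find_breaking_changes(before, after):
        print(problem)
```

An empty result does not prove that a change is safe; it only means that none of the structural rules sketched here were violated.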

Stable Data Formats

This section lists the data formats we consider stable. These data formats are subject to the above notification policy.

The RDF mapping of the WikibaseMediaInfo Data Model, as used in RDF dumps as well as in the Linked Data Interface, is considered a stable data format. Any changes to the structure or interpretation of the mapping are subject to the above notification policy. As per the general principles of RDF, additional information introduced at any time, in any location, about any subject, is not considered a breaking change.

The JSON binding of the WikibaseMediaInfo Data Model as used in JSON dumps, with the web API, and with the Linked Data Interface, is considered a stable data format. Any changes to the structure or interpretation of the mapping are subject to the above notification policy. Following the flexible nature of JSON, the addition of fields to JSON objects is not considered a breaking change. Well-behaved consumers should be prepared to ignore such additional fields.
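
A consumer that ignores additional fields might look like the following minimal sketch. It reads only the keys it needs and never rejects unrecognized ones. The `MediaInfoSummary` class, the sample document, and the `hypotheticalNewField` key are invented for illustration, and the assumed shape of `labels` (a map from language code to an object with a `value` key) is a simplification that should be checked against the actual JSON binding.

```python
import json
from dataclasses import dataclass


@dataclass
class MediaInfoSummary:
    entity_id: str
    captions: dict  # language code -> caption text


def parse_entity(raw: str) -> MediaInfoSummary:
    """Pick out the fields this consumer needs and ignore everything else."""
    data = json.loads(raw)
    # Tolerate a missing or empty "labels" value instead of failing.
    labels = data.get("labels") or {}
    captions = {lang: entry["value"] for lang, entry in labels.items()}
    return MediaInfoSummary(entity_id=data["id"], captions=captions)


# Made-up sample; "hypotheticalNewField" stands in for a later addition.
SAMPLE = json.dumps({
    "type": "mediainfo",
    "id": "M123",
    "labels": {"en": {"language": "en", "value": "A cat"}},
    "hypotheticalNewField": {"anything": [1, 2, 3]},
})

if __name__ == "__main__":
    print(parse_entity(SAMPLE))
```

The design choice is to look fields up by name rather than validate the whole object against a closed schema, so that added fields pass through unnoticed.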

Stable Public APIs

This section lists the interfaces we consider stable. These interfaces are subject to the above notification policy.

The Wikibase Web API accessible via https://commons.wikimedia.org/w/api.php is considered a stable interface. Changes to the parameters, operation, or returned data structure are subject to the notification policy.
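
As a sketch of how a client might call this interface, the example below builds a request for the Wikibase `wbgetentities` module using only the Python standard library. The entity ID `M123`, the User-Agent string, and the helper names are placeholders chosen for this example; the set of parameters a given deployment accepts should be checked against the API's own documentation.

```python
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def build_entity_url(entity_ids):
    """Build a wbgetentities request URL for the given entity IDs."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(entity_ids),  # multiple IDs are pipe-separated
        "format": "json",
    }
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_entities(entity_ids, user_agent="ExampleClient/0.1 (placeholder contact)"):
    """Fetch and decode the response; performs a network request."""
    request = urllib.request.Request(
        build_entity_url(entity_ids),
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only the URL is printed here; call fetch_entities() to hit the network.
    print(build_entity_url(["M123"]))
```

Keeping URL construction separate from the network call makes the request easy to test offline and to adjust if parameters change following an announcement.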

The Linked Data Interface accessible via https://commons.wikimedia.org/wiki/Special:EntityData and https://commons.wikimedia.org/entity/... is considered a stable interface. Changes to the parameters, operation, or returned data structure are subject to the above notification policy.

The Wikimedia Commons Query Service accessible via https://wcqs-beta.wmflabs.org/ is in beta and should not be considered a stable interface. It provides a full SPARQL endpoint. Whilst in beta, it is not subject to the above notification policy, although notifications may be provided as a courtesy.

To allow better gadget integration, the JavaScript hooks documented in the hooks-js.md file delivered together with the Wikibase source code are considered stable.

We acknowledge that third-party tools on Cloud VPS and Toolforge may rely on the Wikibase database schema. Changes to WikibaseMediaInfo that impact available tables and fields are subject to the above notification policy, whilst changes to Wikibase itself are subject to the Wikidata stable interface policy. However, note that the database schema is not designed to be a public API, and less consideration is given to backwards compatibility.

Unstable Interfaces

This section lists some interfaces that we do not currently consider stable, and thus may change in incompatible ways without notice.

MediaWiki XML Dumps are not considered a stable interface. MediaWiki XML dumps contain the raw data of page revisions in their internal representation. The internal representation of WikibaseMediaInfo entities is not a stable interface. It has changed significantly in the past, and it may change again in the future. Several different representations of WikibaseMediaInfo content may be present in the same XML dump.

Wikibase + WikibaseMediaInfo PHP code is not considered a stable interface. Although the Wikibase project now provides official releases, commons.wikimedia.org still receives rolling deployment of Wikibase & WikibaseMediaInfo code. Therefore there is no point in time at which any given PHP class or interface can be assumed to remain stable.

Wikibase + WikibaseMediaInfo JavaScript code is not considered a stable interface. Although the Wikibase project now provides official releases, commons.wikimedia.org still receives rolling deployment of Wikibase & WikibaseMediaInfo code. Therefore there is no point in time at which JavaScript code can be assumed to remain stable. This means that Gadgets cannot rely on the JavaScript code to remain stable.

The HTML DOM structure generated by WikibaseMediaInfo is not considered a stable interface. This means that Gadgets cannot rely on the DOM structure to remain stable.

Outlook

This section provides information about improvements that are planned or considered for the future.

History

This section lists past and scheduled breaking changes. The list of past changes before the implementation of this policy may be incomplete. Each change should be listed with the date of announcement and the date of deployment, ideally accompanied with a link to the announcement and any relevant tickets.