Edit the metadata

See metadata cheat sheet as a quick reference.

Now edit metadata.yaml. The initial metadata schema is an empty object; feel free to add your own properties:

title:
  type: fulltext
  • Key in the object is the name of the property
  • Value is always an object. It contains a required property "type" that gives the type of the object.

For simpler cases you can use a shorthand notation:

title: fulltext
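
The shorthand can be read as a simple expansion step. The following is a minimal Python sketch of that idea, not the builder's actual code: `expand_shorthand` is a hypothetical helper name, and the schema is assumed to be already parsed into a dictionary.

```python
def expand_shorthand(schema):
    """Turn `name: type` entries into `name: {type: type}` (hypothetical helper)."""
    expanded = {}
    for name, value in schema.items():
        if isinstance(value, str):
            # Shorthand: the value is just the type name.
            expanded[name] = {"type": value}
        else:
            # Already in the full form; copy it unchanged.
            expanded[name] = dict(value)
    return expanded


full = expand_shorthand({"title": "fulltext", "year": {"type": "integer"}})
print(full)
```

Both notations end up as the same structure, so they can be mixed freely within one file.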

The file can be in YAML (extension .yaml) or JSON5 format; YAML is recommended. If you change the extension, be sure to update the reference in the model.yaml file. To edit the file, we recommend using VS Code with Red Hat's YAML extension.

Feel free to document the file/properties with comments starting with '#', just mind the nesting:

myprop:
  # this is a comment
  type: keyword

Simple types

  • integer
  • float
  • date
  • time
  • datetime (date, time, datetime)
  • edtf (edtf, edtf-interval)
  • boolean

String types

  • fulltext - will be indexed as type="text" in OpenSearch
  • keyword - will be indexed as type="keyword" in OpenSearch
  • fulltext+keyword - will become type="text" with subfield keyword=(type:keyword)
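
To make the three string types concrete, here is a small Python sketch of the index mapping each one is described as producing. The function name `opensearch_mapping` is an assumption made for illustration; the builder's real output may contain additional settings.

```python
def opensearch_mapping(type_name):
    """Return an index mapping for one of the string types (illustrative only)."""
    if type_name == "fulltext":
        return {"type": "text"}
    if type_name == "keyword":
        return {"type": "keyword"}
    if type_name == "fulltext+keyword":
        # Full-text field with an exact-match "keyword" subfield.
        return {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
    raise ValueError(f"not a string type: {type_name}")


print(opensearch_mapping("fulltext+keyword"))
```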

Arrays

Arrays can be written in JSON Schema form or in a shortcut notation:

JSONSchema-like:

tags:
  type: array
  label.en: Tags # UI extension defining label
  items:
    type: keyword
    minLength: 3

Shortcut notation:

tags[]:
  ^label.en: Tags
  type: keyword
  minLength[]: 3

In this notation, properties with the ^ prefix apply to the array itself, while properties without the prefix define the array item.
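
The relationship between the two notations can be sketched in Python. This is only an illustration under the assumption that keys starting with ^ move to the array level and all other keys describe the item; `expand_array_shortcut` is a hypothetical name, not a builder function.

```python
def expand_array_shortcut(name, definition):
    """Expand a `name[]` shortcut into the JSON Schema-like form (hypothetical helper)."""
    if not name.endswith("[]"):
        raise ValueError("expected a key ending with []")
    array = {"type": "array"}
    item = {}
    for key, value in definition.items():
        if key.startswith("^"):
            array[key[1:]] = value  # applies to the array itself
        else:
            item[key] = value       # applies to each array item
    array["items"] = item
    return name[:-2], array


key, expanded = expand_array_shortcut(
    "tags[]", {"^label.en": "Tags", "type": "keyword", "minLength": 3}
)
print(key, expanded)
```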

Complex values

Because we need to distinguish between the nested and object data types, complex values (object values) are written with the type 'object' or 'nested'. See the OpenSearch documentation for the differences between the two.

Example:

Use Case: I want to filter all articles by John Smith with MIT affiliation. This means that cross-field search is required and nested data type must be used.

authors[]:
  type: nested
  properties:
    name:
      type: keyword
    affiliation:
      type: keyword

If cross-field search will never be required, use "object" data type:

authors[]:
  type: object
  properties:
    name: keyword
    affiliation: keyword
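
Why the choice matters for cross-field search can be simulated in plain Python. The sketch below does not call OpenSearch; it only mimics the documented behaviour that arrays of "object" values are flattened per field, while "nested" items are matched one by one. The sample authors are made up for illustration.

```python
authors = [
    {"name": "John Smith", "affiliation": "Stanford"},
    {"name": "Jane Doe", "affiliation": "MIT"},
]


def match_object(items, name, affiliation):
    # "object": values are flattened per field, so the pairing between fields is lost.
    names = {a["name"] for a in items}
    affiliations = {a["affiliation"] for a in items}
    return name in names and affiliation in affiliations


def match_nested(items, name, affiliation):
    # "nested": each array item is matched as a unit.
    return any(a["name"] == name and a["affiliation"] == affiliation for a in items)


print(match_object(authors, "John Smith", "MIT"))  # True (a false positive)
print(match_nested(authors, "John Smith", "MIT"))  # False
```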

As a shortcut, you can omit the type: object - it will be added automatically if there is "properties" element inside "authors":

authors[]:
  properties:
    name: keyword
    affiliation: keyword

Another shortcut is to use {} suffix with ^ behaving the same way as in arrays:

author{}:
  ^label.en: Label of the "author" element
  name: keyword
  affiliation: keyword

Custom types

Custom data types might be added via plugin to oarepo-model-builder. See i18n plugin for an example of extending the builder.

Structuring metadata file

You can put repeated blocks to a separate file and include them:

# metadata.yaml
authors[]:
  use: person.yaml
contributors[]:
  use: person.yaml

# person.yaml
type: object
properties:
  name:
    type: keyword

JSON pointer can be used to take just a part of file:

# metadata.yaml
authors[]:
  use: "defs.yaml#/Person"
contributors[]:
  use: "defs.yaml#/Person"

# defs.yaml
Person:
  type: object
  properties:
    name:
      type: keyword

For readability we highly recommend splitting the metadata into multiple files and linking them via use. You can even put these into a python library - see oarepo-model-builder for details.
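
How a use reference with a JSON pointer selects part of a file can be sketched as follows. This is a simplified illustration, not the builder's loader: `resolve_use` is a hypothetical helper, the files are assumed to be already parsed into dictionaries, and only dictionary keys (no array indices) are followed.

```python
def resolve_use(reference, files):
    """Resolve 'file' or 'file#/json/pointer' against parsed files (hypothetical)."""
    path, _, pointer = reference.partition("#")
    node = files[path]
    if pointer:
        # A JSON pointer starts with "/"; each following token is a dictionary key.
        for token in pointer[1:].split("/"):
            # Unescape per RFC 6901: "~1" is "/", "~0" is "~".
            token = token.replace("~1", "/").replace("~0", "~")
            node = node[token]
    return node


files = {
    "defs.yaml": {
        "Person": {"type": "object", "properties": {"name": {"type": "keyword"}}}
    },
}
person = resolve_use("defs.yaml#/Person", files)
print(person)
```

Without the `#/...` part the whole file content is returned, which corresponds to the plain `use: person.yaml` form.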

I18n annotations

Properties in the metadata file can have proper labels, tooltips and editor hints:

title:
  type: fulltext
  label.en: Article title
  tooltip.en: |
    any tooltip here, even
    on multiple lines
  hint.en: Copy/paste this from the journal site
  help.en: A longer help text shown, for example, after a click on '?'

Note: these need the oarepo-model-builder-ui plugin to work.

Languages

The labels/tooltip/... MUST have a language associated. If you want to translate them to other languages:

  1. Duplicate the property with another language (this does not scale for more than 2-3 languages):
     label.en: Article title
     label.cs: Název článku
  2. Use a special ".key" language. The value becomes the exact key in the translation files (for example, msgid in GNU gettext), which you then translate in your translation software:
     label.key: model.article.title

Marshmallow (validation) annotations

Marshmallow is a library that is used inside Invenio to validate json content of API uploads/UI deposits. Marshmallow schema files are generated for you automatically from the model's metadata.

To customize the generated classes/fields, you can add marshmallow annotations to your model definition.

Adding extra arguments on marshmallow fields

You can add a list of arguments to the generated marshmallow field:

amount:
  type: integer
  marshmallow:
    arguments: ["strict=True", "just_to_illustrate=1"]

will generate:

amount = ma_fields.IntegerField(strict=True, just_to_illustrate=1)

Validators

A list of validators can be specified. Be sure to import them with the imports property.

amount:
  type: integer
  marshmallow:
    validators: ["greater_than_zero"]
    imports:
      - import: mypkg.utils.greater_than_zero

will generate:

amount = ma_fields.IntegerField(validators=[greater_than_zero])
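
The arguments and validators annotations are pasted into the generated field call as text. The Python sketch below imitates that step to show how the two annotations combine; `render_field` is a hypothetical function written for this illustration and is not part of the builder.

```python
def render_field(name, field_class, arguments=(), validators=()):
    """Render one generated field line (illustrative, not the builder's code)."""
    parts = list(arguments)
    if validators:
        # Validator names are emitted unquoted, so they must be imported.
        parts.append("validators=[" + ", ".join(validators) + "]")
    return f"{name} = ma_fields.{field_class}({', '.join(parts)})"


with_arguments = render_field(
    "amount", "IntegerField", arguments=["strict=True", "just_to_illustrate=1"]
)
with_validators = render_field(
    "amount", "IntegerField", validators=["greater_than_zero"]
)
print(with_arguments)
print(with_validators)
```

Because the strings are copied as-is, any name used inside them has to be available in the generated file, which is what the imports property is for.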

Using custom marshmallow fields

To use a custom field class (instead of, for example, ma_fields.IntegerField), pass the field-class property:

amount:
  type: integer
  marshmallow:
    field-class: MyIntegerField
    imports:
      - import: mypkg.fields.MyIntegerField

will generate:

amount = MyIntegerField(...)

Generating schema class for objects

For object/nested types, a marshmallow schema class is automatically generated. The class name is created by capitalizing the name of the property and adding Schema to the end of it.

structure:
  type: object

will generate:

class StructureSchema(ma.Schema):
    ...

class MetadataSchema(ma.Schema):
    structure = ma_fields.Nested(lambda: StructureSchema())

The lambda is added so that we can have circular usages and do not have to rely on the order of classes in the generated schema file.
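
The effect of the lambda can be demonstrated without marshmallow. In the sketch below, `Nested` is a hypothetical stand-in for ma_fields.Nested that merely stores a factory, and `schema_class_name` is an assumed helper that capitalizes the first letter of the property name; how the builder treats names with underscores or dashes is not covered here.

```python
def schema_class_name(property_name):
    # Capitalize the first letter and append "Schema".
    return property_name[:1].upper() + property_name[1:] + "Schema"


class Nested:
    """Hypothetical stand-in for ma_fields.Nested that stores a schema factory."""

    def __init__(self, factory):
        self.factory = factory

    def schema(self):
        # The factory is only called here, after all classes have been defined.
        return self.factory()


class MetadataSchema:
    # StructureSchema is not defined yet; the lambda defers the name lookup.
    structure = Nested(lambda: StructureSchema())


class StructureSchema:
    pass


print(schema_class_name("structure"))
print(type(MetadataSchema.structure.schema()).__name__)
```

Writing `Nested(StructureSchema())` directly in `MetadataSchema` would fail with a NameError here, because the class body runs before `StructureSchema` exists.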

Custom class name

A custom name can be supplied:

structure:
  type: object
  marshmallow:
    schema-class: MyClassNameForStructureSchema

will generate:

class MyClassNameForStructureSchema(ma.Schema):
    ...

class MetadataSchema(ma.Schema):
    structure = ma_fields.Nested(lambda: MyClassNameForStructureSchema())

Package name

If you do not specify package in class name, it will be generated to the same file. If '.' is present in the class name, the class will be generated to the given package. For example, schema-class: a.b.CSchema will be generated to a/b.py.
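
The rule for where a schema class ends up can be written down as a small Python sketch. `schema_location` and its `default_file` parameter are hypothetical names used only for illustration; the actual default file name depends on your model configuration.

```python
def schema_location(schema_class, default_file):
    """Return (file path, class name) for a schema-class value (illustrative)."""
    module, _, class_name = schema_class.rpartition(".")
    if not module:
        # No package given: the class goes to the same file as the other schemas.
        return default_file, class_name
    return module.replace(".", "/") + ".py", class_name


print(schema_location("a.b.CSchema", "schema.py"))
print(schema_location("CSchema", "schema.py"))
```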

Using already written class

Sometimes you want to use your own class instead of having one generated. To do so, set generate: false together with the class name:

structure:
  type: object
  marshmallow:
    generate: false
    schema-class: pkg.NotGeneratedSchema

Custom base classes

If you would like to have the schema generated but want to use your own base class instead of ma.Schema, set the base-classes marshmallow property. This way you can also include any number of mixins:

structure:
  type: object
  marshmallow:
    base-classes: [ "mypkg.MyBaseClass" ]

Imported packages/classes

You can add imports/import aliases with:

marshmallow:
  imports:
    - import: mypkg.MyClass
      alias: MyClassAlias

will generate:

import mypkg.MyClass as MyClassAlias

If the alias is omitted, import mypkg.MyClass is generated.

Complete control on fields

In rare cases you might want to skip the generator and just use your own instantiated field. To do so, specify the field property:

fld:
  marshmallow:
    field: "mypkg.MyField(required=False, other_prop='Hello world')"

This will copy the definition, without any processing:

fld = mypkg.MyField(required=False, other_prop='Hello world')

Dump-only and load-only properties

In rare cases you might need a property that is only dumped or only loaded.

To have a dump-only property (such as one that is generated by a service and cannot be edited directly), set write: false on the marshmallow definition. To have a write-only field, set read: false.

auto_field:
  marshmallow:
    read: true # default value
    write: false
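
Marshmallow fields accept dump_only and load_only arguments. Assuming the read/write flags are translated to those arguments (an assumption of this sketch, not a statement about the generated code), the mapping could look like this; `marshmallow_flags` is a hypothetical helper.

```python
def marshmallow_flags(read=True, write=True):
    """Map read/write annotations to marshmallow field arguments (assumed mapping)."""
    flags = {}
    if read and not write:
        flags["dump_only"] = True  # serialized on output, rejected on input
    if write and not read:
        flags["load_only"] = True  # accepted on input, never serialized
    # read=False together with write=False is not covered by this sketch.
    return flags


print(marshmallow_flags(read=True, write=False))
print(marshmallow_flags(read=False, write=True))
```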

Search options annotations

Options for creating facets and sorting rules can be specified within the model.

Facets creation

You can specify a Boolean searchable field at the top level of the model. If it is set to false, no facets are created for any field unless a field specifies otherwise. The default is true. This setting does not apply to the Invenio default fields (id, $schema, created, updated). In the example below, only facets for the Invenio fields are generated, because searchable is set to false.

"model": {
  "use": "invenio",
  "searchable": false,
  "properties": {
    "a": "fulltext+keyword",
    "b": "keyword"
  }
}

Will generate:

_id = TermsFacet(field="id")
created = TermsFacet(field="created")
updated = TermsFacet(field="updated")
_schema = TermsFacet(field="$schema")

If the searchable option is not set or is set to true:

"model": {
  "use": "invenio",
  "properties": {
    "a": "fulltext+keyword",
    "b": "keyword"
  }
}

Will generate:

a_keyword = TermsFacet(field="a.keyword")
b = TermsFacet(field="b")
_id = TermsFacet(field="id")
created = TermsFacet(field="created")
updated = TermsFacet(field="updated")
_schema = TermsFacet(field="$schema")
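
The top-level rule can be summarized in a short Python sketch. It is only an approximation of the behaviour described in this section: `facet_fields` is a hypothetical function, it assumes a model that uses invenio, and it only handles the keyword and fulltext+keyword shorthand types seen in the examples.

```python
# Facets for the Invenio default fields, generated regardless of "searchable".
INVENIO_FACETS = {
    "_id": "id",
    "created": "created",
    "updated": "updated",
    "_schema": "$schema",
}


def facet_fields(model):
    """Map facet name -> index field for a model with shorthand string types."""
    facets = {}
    if model.get("searchable", True):
        for name, type_name in model.get("properties", {}).items():
            if type_name == "keyword":
                facets[name] = name
            elif type_name == "fulltext+keyword":
                # The facet is built on the exact-match subfield.
                facets[f"{name}_keyword"] = f"{name}.keyword"
    facets.update(INVENIO_FACETS)
    return facets


properties = {"a": "fulltext+keyword", "b": "keyword"}
off = facet_fields({"use": "invenio", "searchable": False, "properties": properties})
on = facet_fields({"use": "invenio", "properties": properties})
print(sorted(off))
print(sorted(on))
```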

Facets additional definition

You can specify the exact facet value for each field and change the facet key name. For this purpose, use field and key in the facets object. You can also control whether a facet is created for a given field with the Boolean searchable value. If the facets object is used and searchable is not defined, it is automatically set to true.

"model": {
  "properties": {
    "a": {
      "type": "keyword",
      "facets": {"searchable": false}
    },
    "b": {
      "type": "keyword",
      "facets": {"key": "name"}
    },
    "c": {
      "type": "keyword",
      "facets": {"field": "TermsFacet(field=\"name\")"}
    }
  }
}

Will generate:

name = TermsFacet(field="b")
c = TermsFacet(field="name")
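
The per-field options can be combined into one decision, sketched here in Python. `field_facet` is a hypothetical helper that mirrors the three cases of the example above; it assumes a keyword field and ignores the top-level searchable setting.

```python
def field_facet(name, definition):
    """Return (facet_key, facet_expression) for one field, or None if skipped."""
    options = definition.get("facets", {})
    # When "searchable" is not given in the facets object, it defaults to true.
    if not options.get("searchable", True):
        return None
    key = options.get("key", name)
    expression = options.get("field", f'TermsFacet(field="{name}")')
    return key, expression


a = field_facet("a", {"type": "keyword", "facets": {"searchable": False}})
b = field_facet("b", {"type": "keyword", "facets": {"key": "name"}})
c = field_facet("c", {"type": "keyword", "facets": {"field": 'TermsFacet(field="name")'}})
print(a, b, c)
```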