Lexeme

Validate against: http://json-schema.org/draft-07/schema#

Schema ID: http://schemas.digitallinguistics.io/Lexeme-9.0.0.json

Type: object

Description

A lexeme is an abstract entity that represents all the various forms of a word. In DLx, a lexeme refers broadly to any bundle of related senses and forms, whether the item is an individual word, morpheme, idiom, etc. (anything that constitutes a semantic unit). Examples of lexemes in English might include be, run up (a phrasal verb), and ‑ing.

A lexeme will typically have multiple senses or meanings, and those are listed in the senses property. It is up to the linguist to decide when two meanings are related, and therefore belong to the same lexeme, or when they belong to different lexemes.

A lexeme often also has multiple base forms, such as suppletive forms, irregular forms, or morphologically-conditioned forms. For example, the lexeme be has the base forms be, am, is, etc. These are listed in the forms field. The forms field should not be used to list all the regularly-inflected forms of a word. In addition, individual base forms may have phonologically-conditioned allomorphs, and these are listed in the allomorphs field of the form.

The lexeme and its forms and senses may also have variations, such as dialectal and idiolectal variants, rapid speech variants, register-based variants, variations in meaning, or even spelling variants. These are listed in the variants fields.

By convention, one of the forms of a lexeme is typically chosen as a representative headword or lemma, and this is indicated by the lemma field. For example, the form man in English is typically used as the lemma/headword for the lexeme that includes the forms man and men.

Note that the DLx Lexeme does not represent a lexical entry in a dictionary. Dictionaries typically list each base form of a lexeme as a separate lexical entry. The DLx Lexeme instead groups these lexical entries together in the forms field.
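As a rough illustration of how these fields fit together, the following Python sketch builds a minimal Lexeme for the English verb be. The orthography key "eng", the gloss text, and the use of a transcription property inside each form are assumptions modeled on the example data at the end of this document; they are not requirements stated in this description.

```python
import json

# Hypothetical orthography key; the schema does not prescribe one.
ORTHOGRAPHY = "eng"


def make_form(text):
    """Wrap a base form in a LexemeForm-like object (shape assumed from the example data)."""
    return {"transcription": {ORTHOGRAPHY: text}}


lexeme = {
    "type": "Lexeme",
    # The lemma is the representative headword for the whole bundle of forms.
    "lemma": {ORTHOGRAPHY: "be"},
    # Suppletive base forms belong here; regular inflections such as "being" do not.
    "forms": [make_form(text) for text in ("be", "am", "is", "are")],
    # The senses array needs at least one item.
    "senses": [{"gloss": "be"}],
}

print(json.dumps(lexeme, indent=2))
```

Dialectal or spelling variants would go in a variants field rather than being added to forms.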

Developer Notes

This is a top-level database object.

Required Properties

  • forms
  • lemma
  • senses
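A minimal sketch of a presence check for the required properties listed above. The function name is hypothetical, and the check only looks for the keys; it does not validate their contents against the referenced schemas.

```python
REQUIRED_PROPERTIES = ("forms", "lemma", "senses")


def missing_required(lexeme):
    """Return the required property names that are absent from a Lexeme-like dict."""
    return [name for name in REQUIRED_PROPERTIES if name not in lexeme]


# A draft record that has no senses yet.
draft = {"lemma": {"eng": "man"}, "forms": []}
print(missing_required(draft))  # ['senses']
```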

Dependencies

This object has the following dependencies between properties:

  • If this object has the variantType property, it must have the following properties as well:

    • variantOf
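The dependency above could be checked along these lines. This is a sketch with a hypothetical function name, not the validator used by DLx. Note that the rule runs in one direction only: variantOf without variantType is allowed.

```python
def dependency_errors(lexeme):
    """Report violations of the variantType -> variantOf property dependency."""
    errors = []
    if "variantType" in lexeme and "variantOf" not in lexeme:
        errors.append("variantType is present, so variantOf is required")
    return errors


# variantType on its own violates the dependency.
print(dependency_errors({"variantType": "dialectal"}))
# variantOf on its own is fine.
print(dependency_errors({"variantOf": {"id": "some-lexeme-id"}}))
```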

Properties

The following properties are defined for this object:

  • Type: type

    Type: string

    Read-only: true

    Description

    The type of object. Must be set to Lexeme.

    This item must have the following value:

    "Lexeme"

  • ID: id

    Description

    A unique database identifier for this Lexeme

  • Lexeme Key: key

    Type: string

    Description

    A human-readable key that uniquely identifies this lexeme or variant within the language. Best practice is for the key to consist of a representation of the lemma form of the word without diacritics, and, if the word is a homonym, the homonym number. However, any value is acceptable as long as it is unique for the language. (Keys do not need to be unique across languages.)

    Regular expression to match: ^[^\s]+$
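The pattern above rejects empty keys and keys containing whitespace. A small Python sketch of the same check follows; the function name is hypothetical. JSON Schema patterns follow ECMAScript regular expression syntax, and Python's handling of `$` and `\s` differs slightly, so this approximates the rule with `re.fullmatch`.

```python
import re

# Same character class as the schema pattern ^[^\s]+$.
# fullmatch is used because Python's `$` would also match before a trailing newline.
KEY_BODY = re.compile(r"[^\s]+")


def is_valid_key(key):
    """Return True if key is a non-empty string with no whitespace."""
    return isinstance(key, str) and KEY_BODY.fullmatch(key) is not None


print(is_valid_key("guxt-(1)"))  # True
print(is_valid_key("run up"))    # False: contains a space
```

Uniqueness within a language is not checked here; that would need access to the other lexemes in the database.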

  • Alternative Analyses: alternativeAnalyses

    Type: array

    Description

    An array of alternative Lexeme objects for this lexeme. This can be useful when working with historical sources or research from other linguists. It allows users to decide on their own analysis, while still maintaining a faithful record of the analyses of the original documentation.

    Items must be unique: true

    Items

    Each item in this array must adhere to the following schema:

    Alternative Analysis: alternativeAnalyses

    Type: object

    Description

    A lexeme object representing an alternative analysis of this lexeme.

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/Lexeme.json

  • Bibliography: bibliography

    Type: array

    Description

    A list of citations to attested bibliographic sources where this lexeme appears or is discussed. For precision’s sake, it is recommended that sources be listed for specific senses or forms of a lexeme whenever possible.

    Items must be unique: true

    Items

    Each item in this array must adhere to the following schema:

    Citation: bibliography

    Type: object

    Description

    A citation to an attested source for this lexeme.

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/Citation.json

  • Citation Form: citationForm

    Type: object

    Description

    The citation form of a lexeme is the form given when spoken in isolation, which may be different from its lemma form. For example, in Swahili the citation form of a verb is typically the infinitive, e.g. kuandika ‘to write’, even though ‑andika is typically used as its lemma form. The citation form may be represented in multiple orthographies. Do not include leading or trailing tokens (e.g. hyphens, equal signs) in this field.

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/Transcription.json

  • Date Created: dateCreated

    Type: string

    Description

    The date and optionally time that this lexeme was created

    This item must also validate against exactly one of the following schemas:

    • Format: date

    • Format: date-time

  • Date Modified: dateModified

    Type: string

    Description

    The date and optionally time that this lexeme was last modified

    This item must also validate against exactly one of the following schemas:

    • Format: date

    • Format: date-time
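Both dateCreated and dateModified accept either a plain date or a full date-time. The sketch below classifies a value as one or the other. It assumes the JSON Schema date and date-time formats follow the RFC 3339 full-date and date-time productions, and the regular expressions are a simplified approximation of that grammar. The function names are hypothetical.

```python
import re
from datetime import date

_DATE = r"[0-9]{4}-[0-9]{2}-[0-9]{2}"
_TIME = r"(?:[01][0-9]|2[0-3]):[0-5][0-9]:(?:[0-5][0-9]|60)(?:\.[0-9]+)?"
_OFFSET = r"(?:[Zz]|[+-](?:[01][0-9]|2[0-3]):[0-5][0-9])"

DATE_RE = re.compile(_DATE)
DATE_TIME_RE = re.compile(f"({_DATE})[Tt]{_TIME}{_OFFSET}")


def _is_calendar_date(text):
    """Reject well-shaped but impossible dates such as 2018-13-03."""
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


def timestamp_format(value):
    """Return "date", "date-time", or None for a dateCreated/dateModified value."""
    if not isinstance(value, str):
        return None
    if DATE_RE.fullmatch(value) and _is_calendar_date(value):
        return "date"
    match = DATE_TIME_RE.fullmatch(value)
    if match and _is_calendar_date(match.group(1)):
        return "date-time"
    return None


print(timestamp_format("2018-11-03"))                # date
print(timestamp_format("2018-11-03T00:23:55.842Z"))  # date-time
```

Because a string cannot match both shapes, any value that matches at all matches exactly one format.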

  • Examples: examples

    Type: array

    Description

    A collection of examples illustrating this lexeme in use. Each example is an Utterance from a Text. The Utterance number should be indicated in the index field of the Database Reference object. If using a full Utterance object rather than a Database Reference object, the key field should be included. For precision’s sake, it is recommended that examples be given for individual senses and forms rather than the entire lexeme when possible.

    Items must be unique: true

    Items

    Each item in this array must adhere to the following schema:

    Example Utterance (Database Reference): examples

    Type: object

    Description

    A database reference to an Utterance object

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/DatabaseReference.json

  • Features: features

    Type: object

    Description

    A set of inflectional features for this lexeme (used primarily with grammatical morphemes). Each property should be the name of a feature type (e.g. case, person, number, gender, nounClass, etc.), and its value should be the value for that feature, as a string (e.g. nominative, 1, singular, masculine, etc.). Features may be written more than once, in different languages. For example, a morpheme may have the feature case: accusative (English) as well as caso: acusativo (Spanish).

    This item must also validate against all of the following schemas:

    • Tags

      Type: object

      Description

      The Features object must be a Tags object

      Referenced Schema

      This item must validate against the following schema:

      http://schemas.digitallinguistics.io/Tags.json

    • Additional Properties

      Any additional properties must adhere to the following schema:

      Feature: 1

      Type: string

      Description

      Individual features must be represented as Strings

      Minimum length: 1
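A minimal sketch of the rule that every feature value is a non-empty string. The function name is hypothetical, and any further constraints imposed by the Tags schema are not checked here.

```python
def invalid_features(features):
    """Return the names of features whose values are not non-empty strings."""
    return [
        name
        for name, value in features.items()
        if not isinstance(value, str) or len(value) < 1
    ]


features = {
    "case": "accusative",  # English feature name
    "caso": "acusativo",   # the same feature written in Spanish
    "person": 1,           # invalid: must be the string "1"
}
print(invalid_features(features))  # ['person']
```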

  • Lexeme Base Forms: forms

    Type: array

    Description

    A collection of base forms for this lexeme, i.e. the different forms that this lexeme or morpheme may take, exclusive of its regular inflectional variants. Each base form typically corresponds to a lexical entry in a dictionary. For example: the lexeme man would include the forms man and men; the lexeme run would include the forms run and ran, but not runs or running, because these are regularly-inflected and therefore predictable; the lexeme be would include am, are, is, etc., because these are irregular / suppletive forms, but would not include being.

    Items must be unique: true

    Items

    Each item in this array must adhere to the following schema:

    Lexeme Base Form: forms

    Type: object

    Description

    One of the base forms of this lexeme

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/LexemeForm.json

  • Language (DatabaseReference): language

    Type: object

    Description

    The language of this Lexeme. This property is most useful when working with lexical data from multiple languages.

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/DatabaseReference.json

  • Lemma: lemma

    Type: object

    Description

    A lemma is the form of a lexeme conventionally used to represent that lexeme. It is typically used as the headword in dictionary entries. It may differ drastically from the citation form. For example, the form be is typically used as the lemma form of the English verb forms am, are, is, etc. Lemmas may be represented in multiple orthographies. Do not include any leading or trailing tokens (e.g. hyphens, equal signs).

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/Transcription.json

  • Lexical Relations: lexicalRelations

    Type: array

    Description

    A list of the lexical relations that this lexeme has to other lexemes. Each item is a Database Reference, and must also have a property called relation, indicating the type of lexical relation. For precision’s sake, lexical relations should be specified for individual senses rather than the entire lexeme whenever possible.

    Items must be unique: true

    Items

    Each item in this array must adhere to the following schema:

    lexicalRelations

    This item must also validate against all of the following schemas:

    • Lexeme (Database Reference)

      Type: object

      Description

      A database reference representing a lexical relation between two lexemes or senses. Note: The database reference must also have a relation property specified, indicating the type of lexical relation.

      Referenced Schema

      This item must validate against the following schema:

      http://schemas.digitallinguistics.io/DatabaseReference.json

    • Required Properties

      • relation

      Properties

      The following properties are defined for this object:

      • Relation Type: relation

        Type: string

        Description

        The type of lexical relation that holds between the current item and the referenced Lexeme. Can also be used for general cross-references (a compare relation) or historical relationships (a derivedFrom or originOf relation). Examples: antonym, synonym, cognate, derivedFrom, originOf, compare, partOf, hypernymOf, hyponymOf.

        Referenced Schema

        This item must validate against the following schema:

        http://schemas.digitallinguistics.io/Abbreviation.json
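Each lexical relation is a Database Reference that also carries a relation property. The sketch below finds items that lack one. The id field and its placeholder values are assumptions based on the example data, the function name is hypothetical, and the constraints of the Abbreviation schema are not checked.

```python
def items_missing_relation(lexical_relations):
    """Return the indexes of items that lack a string-valued relation property."""
    return [
        index
        for index, item in enumerate(lexical_relations)
        if not isinstance(item.get("relation"), str)
    ]


lexical_relations = [
    {"id": "placeholder-id-1", "relation": "synonym"},
    {"id": "placeholder-id-2", "relation": "derivedFrom"},
    {"id": "placeholder-id-3"},  # invalid: no relation given
]
print(items_missing_relation(lexical_relations))  # [2]
```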

  • Lexeme Type: lexemeType

    Type: string

    Description

    The type of lexeme this is (either lexical or grammatical). The primary purpose of this field is to make it easy to style interlinear glosses in small caps.

    Allowed Values

    • "lexical"
    • "grammatical"

  • Link: link

    Type: string

    Description

    A URL where a presentational format for this resource may be viewed

    Format: uri

  • Literal Meaning: literalMeaning

    Description

    The literal meaning of the lexeme, optionally in multiple languages

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/MultiLangString.json

  • Media: media

    Type: array

    Description

    Media items associated with this lexeme, such as recordings of the citation form of the word, pictures of the item this word refers to, or videos of the action being performed. If a media item pertains to a specific sense or form, it should be placed in that sense or form’s media field instead.

    Items must be unique: true

    Items

    Each item in this array must adhere to the following schema:

    Media Item (Database Reference): media

    Type: object

    Description

    A database reference to a media item associated with this lexeme

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/DatabaseReference.json

  • Morpheme Type: morphemeType

    Description

    The morphological type for this lexeme. Example values might include "root", "stem", "prefix", "suffix", "compound", "phrasal verb", etc.

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/MultiLangString.json

  • Notes: notes

    Type: array

    Description

    A collection of notes about this lexeme. Each Note object must have its noteType property specified. Notes with a note type of private are not intended for publication in dictionaries, while other types of notes are. For precision’s sake, it is recommended that notes be attached to specific forms or senses whenever possible.

    Items must be unique: true

    Items

    Each item in this array must adhere to the following schema:

    notes

    This item must also validate against all of the following schemas:

    • Note

      Type: object

      Description

      A note about this lexeme

      Referenced Schema

      This item must validate against the following schema:

      http://schemas.digitallinguistics.io/Note.json

    • Required Properties

      • noteType

      Properties

      The following properties are defined for this object:

      • Note Type: noteType

        Type: string

        Description

        The type of note about this lexeme

        Allowed Values

        • "private"
        • "general"
        • "anthropology"
        • "discourse"
        • "encyclopedic"
        • "grammar"
        • "phonology"
        • "semantics"
        • "sociocultural"
        • "pragmatic"
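Since private notes are not intended for publication, a dictionary export might filter them out along these lines. This is a sketch: the function name is hypothetical, and the text property on each note is an assumption about the Note schema, which is not reproduced in this document.

```python
NOTE_TYPES = {
    "private", "general", "anthropology", "discourse", "encyclopedic",
    "grammar", "phonology", "semantics", "sociocultural", "pragmatic",
}


def publishable_notes(notes):
    """Return the notes that may be published, rejecting unknown note types."""
    result = []
    for note in notes:
        note_type = note.get("noteType")
        if note_type not in NOTE_TYPES:
            raise ValueError(f"unknown or missing noteType: {note_type!r}")
        if note_type != "private":
            result.append(note)
    return result


notes = [
    {"noteType": "grammar", "text": "Takes the pluractional suffix."},
    {"noteType": "private", "text": "Recheck this with the speaker."},
]
print([note["noteType"] for note in publishable_notes(notes)])  # ['grammar']
```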
  • Senses: senses

    Type: array

    Description

    A collection of meanings or senses for this lexeme. It is up to the linguist to decide whether two uses of a lexeme are distinct enough to be considered separate senses.

    Minimum number of items: 1

    Items must be unique: true

    Items

    Each item in this array must adhere to the following schema:

    Sense: senses

    Type: object

    Description

    A sense or meaning for this lexeme.

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/Sense.json

  • Sources: sources

    Type: array

    Description

    A list of the initials of the speaker or speakers who are the attested sources for this lexeme. For precision’s sake, sources should be listed for specific senses or forms of a lexeme whenever possible. These sources should be DatabaseReferences to a Person object.

    Items must be unique: true

    Items

    Each item in this array must adhere to the following schema:

    Source: sources

    Type: object

    Description

    An attested source for this lexeme. This will often be the initials of a speaker.

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/DatabaseReference.json

  • Tags: tags

    Type: object

    Description

    A set of tags for this lexeme

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/Tags.json

  • URL: url

    Type: string

    Description

    A URL where a JSON representation of this lexeme may be retrieved

    Format: uri

  • Variant Of: variantOf

    Type: object

    Description

    If this lexeme is a variant of another lexeme, this field should contain a reference to the other Lexeme. Lexemes may only be variants of one other Lexeme.

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/DatabaseReference.json

  • Variants: variants

    Type: array

    Description

    A list of variants of this lexeme. This field should be used for dialectal and idiolectal variants, rapid and careful speech variants, register-based variants, variations in meaning, spelling variants, etc. It should not be used for phonologically-conditioned variants (use the allomorphs field of a specific form instead) or morphologically-conditioned variants (use the forms field instead). Each variant should have its variantType property specified.

    Items must be unique: true

    Items

    Each item in this array must adhere to the following schema:

    variants

    This item must also validate against all of the following schemas:

    • Variant (Database Reference)

      Type: object

      Description

      A database reference to a variant of this lexeme. Note: The Database Reference object must have a variantType property, indicating the type of variant.

      Referenced Schema

      This item must validate against the following schema:

      http://schemas.digitallinguistics.io/DatabaseReference.json

    • Required Properties

      • variantType

      Properties

      The following properties are defined for this object:

      • Variant Type: variantType

        Description

        This field is used to specify the type of variant. Possible values might be a person’s name (representing an idiolectal variant), or simply idiolectal, or dialectal (or the name of the dialect), or rapid speech, etc. May be in multiple languages.

        Referenced Schema

        This item must validate against the following schema:

        http://schemas.digitallinguistics.io/MultiLangString.json

  • Variant Type: variantType

    Description

    If this lexeme is a variant of another lexeme or sense, this field can be used to specify the type of variant. Possible values might be a person’s name (representing an idiolectal variant), or simply idiolectal, or dialectal (or the name of the dialect), or rapid speech, etc. Optionally in multiple languages.

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/MultiLangString.json

Additional Properties

Any additional properties must adhere to the following schema:

This schema imposes no restrictions. All values are valid.

Examples

The following are example values for this schema:

  • {
      "alternativeAnalyses": [
        {
          "forms": [
            {
              "transcription": {
                "APA": "kʼuš"
              }
            }
          ],
          "lemma": {
            "APA": "kʼuš",
            "IPA": "kˀuš"
          },
          "senses": [
            {
              "argumentStructure": "eat(agent, patient)",
              "gloss": "eat"
            }
          ]
        }
      ],
      "bibliography": [
        {
          "citationKey": "Duralde1802"
        }
      ],
      "citationForm": {
        "APA": "kʼušti",
        "IPA": "kˀušti",
        "Mod": "guxti",
        "Swad": "gušti"
      },
      "dateCreated": "2018-11-03T00:23:55.842Z",
      "dateModified": "2018-11-03T00:24:04.730Z",
      "id": "783cbaa8-befe-4049-bfe4-bb5688173780",
      "key": "guxt-(1)",
      "language": {
        "abbreviation": "chiti",
        "id": "3d91a22d-005b-4ec5-8151-09e44629f58f"
      },
      "lemma": {
        "APA": "kʼušt",
        "IPA": "kˀušt",
        "Mod": "guxt",
        "Swad": "gušt"
      },
      "link": "https://explorer.digitallinguistics.io/languages/Chitimacha/lexemes/guxt",
      "type": "Lexeme",
      "url": "https://data.digitallinguistics.io/languages/Chitimacha/lexemes/guxt",
      "forms": [
        {
          "allomorphs": [
            {
              "environments": [
                "_m"
              ],
              "syllableStructure": "CVC",
              "transcription": {
                "APA": "kʼuš",
                "IPA": "kˀuš",
                "Mod": "gux",
                "Swad": "guš"
              }
            }
          ],
          "components": [
            {
              "id": "e0e2dbdb-f89b-4002-bd46-4f6803bb4391",
              "key": "gux"
            },
            {
              "id": "797f0f6b-3024-4d0c-bbfc-a1bc7cc48b81",
              "key": "t1"
            }
          ],
          "inflectionClass": "main verb",
          "link": "https://data.digitallinguistics.io/languages/Chitimacha/lexemes/guxt/forms/guxt",
          "media": [
            {
              "id": "24a14428-f2e7-47a8-8d4f-00c4437f6c3a",
              "filename": "guxt.wav"
            }
          ],
          "morphemeType": "stem",
          "sources": [
            {
              "source": {
                "abbreviation": "DWH"
              }
            }
          ],
          "syllableStructure": "CVCC",
          "transcription": {
            "APA": "kʼušt",
            "IPA": "kˀušt",
            "Mod": "guxt",
            "Swad": "gušt"
          },
          "variants": [
            {
              "id": "0f765e8d-1401-4c01-a88f-3092d077813b",
              "key": "guxma",
              "variantType": "pluractional"
            }
          ]
        }
      ],
      "senses": [
        {
          "argumentStructure": "eat(agent, patient)",
          "category": "transitive verb",
          "definition": "to eat",
          "gloss": "eat",
          "link": "https://data.digitallinguistics.io/languages/Chitimacha/lexemes/guxt/senses/2"
        }
      ]
    }