Lexeme
Validate against: http://json-schema.org/draft-07/schema#
Schema ID: http://schemas.digitallinguistics.io/Lexeme-9.0.0.json
Type: object
Description
A lexeme is an abstract entity that represents all the various forms of a word. In DLx, a lexeme refers broadly to any bundle of related senses and forms, whether the item is an individual word, morpheme, idiom, etc.: anything that constitutes a semantic unit. Examples of lexemes in English might include be, run up (a phrasal verb), and ‑ing.
A lexeme will typically have multiple senses or meanings, and those are listed in the senses property. It is up to the linguist to decide when two meanings are related, and therefore belong to the same lexeme, or when they belong to different lexemes.
A lexeme often also has multiple base forms, such as suppletive forms, irregular forms, or morphologically-conditioned forms. For example, the lexeme be has the base forms be, am, is, etc. These are listed in the forms field. The forms field should not be used to list all the regularly-inflected forms of a word. In addition, individual base forms may have phonologically-conditioned allomorphs, and these are listed in the allomorphs field of the form.
The lexeme and its forms and senses may also have variations, such as dialectal and idiolectal variants, rapid speech variants, register-based variants, variations in meaning, or even spelling variants. These are listed in the variants fields.
By convention, one of the forms of a lexeme is typically chosen as a representative headword or lemma, and this is indicated by the lemma field. For example, the form man in English is typically used as the lemma/headword for the lexeme that includes the forms man and men.
Note that the DLx Lexeme does not represent a lexical entry in a dictionary. Dictionaries typically list each base form of a lexeme as a separate lexical entry. The DLx Lexeme instead puts each of these lexical entries together in the forms field.
Required Properties
forms
lemma
senses
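The required-property rule can be illustrated with a minimal Python sketch. The helper name missing_required and the sample values are assumptions made for illustration only; a complete check would run a JSON Schema validator against the schema identified above.

```python
# Property names come from the "Required Properties" list above.
REQUIRED_PROPERTIES = ("forms", "lemma", "senses")


def missing_required(lexeme: dict) -> list:
    """Return the required property names that are absent from `lexeme`.

    Hypothetical helper: it only checks presence, not the shape of each value.
    """
    return [name for name in REQUIRED_PROPERTIES if name not in lexeme]


# Illustrative sample data (not a complete Lexeme).
complete = {
    "forms": [{"transcription": {"Mod": "guxt"}}],
    "lemma": {"Mod": "guxt"},
    "senses": [{"gloss": "eat"}],
}
incomplete = {"lemma": {"Mod": "guxt"}}

print(missing_required(complete))
print(missing_required(incomplete))
```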
Dependencies
This object has the following dependencies between properties:
- If this object has the variantType property, it must have the following properties as well: variantOf
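The dependency between variantType and variantOf amounts to a simple implication, sketched below in Python. The function name and the sample reference values are hypothetical; the sketch checks only the presence of the two properties.

```python
def satisfies_variant_dependency(lexeme: dict) -> bool:
    """True unless `variantType` is present without `variantOf`.

    Hypothetical helper mirroring the dependency described above.
    """
    return "variantType" not in lexeme or "variantOf" in lexeme


# Illustrative values only; the contents of the reference are assumptions.
plain = {"key": "guxt-(1)"}
variant_ok = {"variantType": "dialectal", "variantOf": {"key": "guxt-(1)"}}
variant_bad = {"variantType": "dialectal"}

for sample in (plain, variant_ok, variant_bad):
    print(satisfies_variant_dependency(sample))
```

Note that the dependency runs in one direction only: a lexeme may have variantOf without variantType.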
Properties
The following properties are defined for this object:
Type:
type
Type:
string
Read-only:
true
Description
The type of object. Must be set to Lexeme.
This item must have the following value:
"Lexeme"
ID:
id
Description
A unique database identifier for this Lexeme
Lexeme Key:
key
Type:
string
Description
A human-readable key that uniquely identifies this lexeme or variant within the language. Best practice is for the key to consist of a representation of the lemma form of the word without diacritics, and, if the word is a homonym, the homonym number. However, any value is acceptable as long as it is unique for the language. (Keys do not need to be unique across languages.)
Regular expression to match:
^[^\s]+$
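In practice the pattern means a key must be non-empty and contain no whitespace. A minimal Python sketch follows; the function name is an assumption. It uses re.fullmatch rather than the anchored pattern because Python's $ also matches just before a trailing newline, which would be more permissive than the pattern intends.

```python
import re

# Same character class as the schema pattern ^[^\s]+$, applied to the whole string.
KEY_PATTERN = re.compile(r"[^\s]+")


def is_valid_key(key: str) -> bool:
    """Return True if `key` is non-empty and contains no whitespace (hypothetical helper)."""
    return KEY_PATTERN.fullmatch(key) is not None


print(is_valid_key("guxt-(1)"))  # key taken from the example value in this document
print(is_valid_key("run up"))    # contains a space
```

This check does not cover uniqueness within the language, which has to be enforced against the rest of the data.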
Alternative Analyses:
alternativeAnalyses
Type:
array
Description
An array of alternative Lexeme objects for this lexeme. This can be useful when working with historical sources or research from other linguists. It allows users to decide on their own analysis, while still maintaining a faithful record of the analyses of the original documentation.
Items must be unique:
true
Items
Each item in this array must adhere to the following schema:
Alternative Analysis:
alternativeAnalyses
Type:
object
Description
A lexeme object representing an alternative analysis of this lexeme.
Referenced Schema
This item must validate against the following schema:
Bibliography:
bibliography
Type:
array
Description
A list of citations to attested bibliographic sources where this lexeme appears or is discussed. For precision’s sake, it is recommended that sources be listed for specific senses or forms of a lexeme whenever possible.
Items must be unique:
true
Items
Each item in this array must adhere to the following schema:
Citation:
bibliography
Type:
object
Description
A citation to an attested source for this lexeme.
Referenced Schema
This item must validate against the following schema:
Citation Form:
citationForm
Type:
object
Description
The citation form of a lexeme is the form given when spoken in isolation, which may be different from its lemma form. For example, in Swahili the citation form of a verb is typically the infinitive, e.g. kuandika ‘to write’, even though ‑andika is typically used as its lemma form. It may be represented in multiple orthographies. Do not include leading or trailing tokens (e.g. hyphens, equal signs) in this field.
Referenced Schema
This item must validate against the following schema:
Date Created:
dateCreated
Type:
string
Description
The date and optionally time that this lexeme was created
This item must also validate against exactly one of the following schemas:
Format:
date
Format:
date-time
Date Modified:
dateModified
Type:
string
Description
The date and optionally time that this lexeme was last modified
This item must also validate against exactly one of the following schemas:
Format:
date
Format:
date-time
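Both dateCreated and dateModified accept either a plain date or a date-time, but not both at once. The Python sketch below classifies a value as one or the other. It is an approximation under stated assumptions: the regular expressions are a simplified reading of the date and date-time formats, the function name is hypothetical, and time-of-day ranges are not checked.

```python
import re
from datetime import date

# Simplified patterns (assumption): YYYY-MM-DD, optionally followed by
# T, a time, an optional fraction, and either Z or a numeric offset.
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
DATE_TIME_RE = re.compile(
    r"([0-9]{4}-[0-9]{2}-[0-9]{2})[Tt]"
    r"[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?"
    r"([Zz]|[+-][0-9]{2}:[0-9]{2})"
)


def classify_timestamp(value: str):
    """Return "date", "date-time", or None if the value fits neither format."""
    if DATE_RE.fullmatch(value):
        kind, day = "date", value
    else:
        match = DATE_TIME_RE.fullmatch(value)
        if match is None:
            return None
        kind, day = "date-time", match.group(1)
    try:
        date.fromisoformat(day)  # rejects impossible dates such as month 13
    except ValueError:
        return None
    return kind


print(classify_timestamp("2018-11-03T00:23:55.842Z"))
print(classify_timestamp("2018-11-03"))
```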
Examples:
examples
Type:
array
Description
A collection of examples illustrating this lexeme in use. Each example is an Utterance from a Text. The Utterance number should be indicated in the index field of the Database Reference object. If using a full Utterance object rather than a Database Reference object, the key field should be included. For precision’s sake, it is recommended that examples be given for individual senses and forms rather than the entire lexeme when possible.
Items must be unique:
true
Items
Each item in this array must adhere to the following schema:
Example Utterance (Database Reference):
examples
Type:
object
Description
A database reference to an Utterance object
Referenced Schema
This item must validate against the following schema:
Features:
features
Type:
object
Description
A set of inflectional features for this lexeme (used primarily with grammatical morphemes). Each property should be the name of a feature type (e.g. case, person, number, gender, nounClass, etc.), and its value should be the value for that feature, as a string (e.g. nominative, 1, singular, masculine, etc.). Features may be written more than once, in different languages. For example, a morpheme may have the feature case: accusative (English) as well as caso: acusativo (Spanish).
This item must also validate against all of the following schemas:
Tags
Type:
object
Description
The Features object must be a Tags object
Referenced Schema
This item must validate against the following schema:
Additional Properties
Any additional properties must adhere to the following schema:
Feature:
1
Type:
string
Description
Individual features must be represented as Strings
Minimum length:
1
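The constraint on individual features (each value a string of at least one character) can be sketched in Python as follows. The function name is hypothetical, and the sketch does not cover the separate requirement that the Features object validate as a Tags object.

```python
def invalid_features(features: dict) -> list:
    """Return the names of features whose values are not non-empty strings.

    Hypothetical helper; checks only the per-feature constraint described above.
    """
    return [
        name
        for name, value in features.items()
        if not (isinstance(value, str) and len(value) >= 1)
    ]


# The same feature given in two languages, as in the description above.
bilingual = {"case": "accusative", "caso": "acusativo"}
# A number instead of the string "1", and an empty string: both invalid.
faulty = {"person": 1, "number": ""}

print(invalid_features(bilingual))
print(invalid_features(faulty))
```

A person feature therefore needs to be written as the string "1" rather than the number 1.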
Lexeme Base Forms:
forms
Type:
array
Description
A collection of base forms for this lexeme, i.e. the different forms that this lexeme or morpheme may take, exclusive of its regular inflectional variants. Each base form typically corresponds to a lexical entry in a dictionary. For example: the lexeme man would include the forms man and men; the lexeme run would include the forms run and ran, but not runs or running, because these are regularly-inflected and therefore predictable; the lexeme be would include am, are, is, etc., because these are irregular / suppletive forms, but would not include being.
Items must be unique:
true
Items
Each item in this array must adhere to the following schema:
Lexeme Base Form:
forms
Type:
object
Description
One of the base forms of this lexeme
Referenced Schema
This item must validate against the following schema:
Language (DatabaseReference):
language
Type:
object
Description
The language of this Lexeme. This property is most useful when working with lexical data from multiple languages.
Referenced Schema
This item must validate against the following schema:
Lemma:
lemma
Type:
object
Description
A lemma is the form of a lexeme conventionally used to represent that lexeme. It is typically used as the headword in dictionary entries. It may differ drastically from the citation form. For example, the form be is typically used as the lemma form of the English verb forms am, are, is, etc. Lemmas may be represented in multiple orthographies. Do not include any leading or trailing tokens (e.g. hyphens, equal signs).
Referenced Schema
This item must validate against the following schema:
Lexical Relations:
lexicalRelations
Type:
array
Description
A list of the lexical relations that this lexeme has to other lexemes. Each item is a Database Reference, and must also have a property called relation, indicating the type of lexical relation. For precision’s sake, lexical relations should be specified for individual senses rather than the entire lexeme whenever possible.
Items must be unique:
true
Items
Each item in this array must adhere to the following schema:
lexicalRelations
This item must also validate against all of the following schemas:
Lexeme (Database Reference)
Type:
object
Description
A database reference representing a lexical relation between two lexemes or senses. Note: The database reference must also have a relation property specified, indicating the type of lexical relation.
Referenced Schema
This item must validate against the following schema:
Required Properties
relation
Properties
The following properties are defined for this object:
Relation Type:
relation
Type:
string
Description
The type of lexical relation that holds between the current item and the referenced Lexeme. Can also be used for general cross-references (a compare relation) or historical relationships (a derivedFrom or originOf relation). Examples: antonym, synonym, cognate, derivedFrom, originOf, compare, partOf, hypernymOf, hyponymOf.
Referenced Schema
This item must validate against the following schema:
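As an illustration of the lexicalRelations rule that every item carries a relation property, the Python sketch below reports items that lack one. The function name and the key property used to identify the referenced lexeme are assumptions; the actual Database Reference properties are defined in the referenced schema.

```python
def relations_missing_type(lexical_relations: list) -> list:
    """Return the indexes of items that lack a string-valued `relation`.

    Hypothetical helper; it does not check the Database Reference part of each item.
    """
    return [
        index
        for index, item in enumerate(lexical_relations)
        if not isinstance(item.get("relation"), str)
    ]


# Illustrative data: the `key` values are placeholders.
relations = [
    {"key": "lexeme-a", "relation": "synonym"},
    {"key": "lexeme-b"},  # no relation given
    {"key": "lexeme-c", "relation": "derivedFrom"},
]

print(relations_missing_type(relations))
```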
Lexeme Type:
lexemeType
Type:
string
Description
The type of lexeme this is (either lexical or grammatical). The primary purpose of this field is to make it easy to style interlinear glosses in small caps.
Allowed Values
"lexical"
"grammatical"
Link:
link
Type:
string
Description
A URL where a presentational format for this resource may be viewed
Format:
uri
Literal Meaning:
literalMeaning
Description
The literal meaning of the lexeme, optionally in multiple languages
Referenced Schema
This item must validate against the following schema:
Media:
media
Type:
array
Description
Media items associated with this lexeme, such as recordings of the citation form of the word, pictures of the item this word refers to, or videos of the action being performed. If a media item pertains to a specific sense or form, it should be placed in that sense or form’s media field instead.
Items must be unique:
true
Items
Each item in this array must adhere to the following schema:
Media Item (Database Reference):
media
Type:
object
Description
A database reference to a media item associated with this lexeme
Referenced Schema
This item must validate against the following schema:
Morpheme Type:
morphemeType
Description
The morphological type for this lexeme. Example values might include "root", "stem", "prefix", "suffix", "compound", "phrasal verb", etc.
Referenced Schema
This item must validate against the following schema:
Notes:
notes
Type:
array
Description
A collection of notes about this lexeme. Each Note object must have its noteType property specified. Notes with a note type of private are not intended for publication in dictionaries, while other types of notes are. For precision’s sake, it is recommended that notes be attached to specific forms or senses whenever possible.
Items must be unique:
true
Items
Each item in this array must adhere to the following schema:
notes
This item must also validate against all of the following schemas:
Note
Type:
object
Description
A note about this lexeme
Referenced Schema
This item must validate against the following schema:
Required Properties
noteType
Properties
The following properties are defined for this object:
Note Type:
noteType
Type:
string
Description
The type of note about this lexeme
Allowed Values
"private"
"general"
"anthropology"
"discourse"
"encyclopedic"
"grammar"
"phonology"
"semantics"
"sociocultural"
"pragmatic"
Senses:
senses
Type:
array
Description
A collection of meanings or senses for this lexeme. It is up to the linguist to decide whether two uses of a lexeme are distinct enough to be considered separate senses.
Minimum number of items:
1
Items must be unique:
true
Items
Each item in this array must adhere to the following schema:
Sense:
senses
Type:
object
Description
A sense or meaning for this lexeme.
Referenced Schema
This item must validate against the following schema:
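The two array constraints on senses (at least one item, no duplicates) can be sketched in Python as below. The function name is hypothetical, and comparing items by their sorted JSON serialization is an approximation of JSON Schema uniqueness (for instance, it would treat 1 and 1.0 as different).

```python
import json


def senses_array_ok(senses: list) -> bool:
    """True if `senses` has at least one item and no two items are identical.

    Hypothetical helper; it does not validate each item against the Sense schema.
    """
    if len(senses) < 1:
        return False
    serialized = [json.dumps(sense, sort_keys=True) for sense in senses]
    return len(set(serialized)) == len(serialized)


eat = {"gloss": "eat", "definition": "to eat"}
# Same content as `eat`, with the properties in a different order.
eat_reordered = {"definition": "to eat", "gloss": "eat"}

print(senses_array_ok([eat]))
print(senses_array_ok([]))
print(senses_array_ok([eat, eat_reordered]))
```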
Sources:
sources
Type:
array
Description
A list of the initials of the speaker or speakers who are the attested sources for this lexeme. For precision’s sake, sources should be listed for specific senses or forms of a lexeme whenever possible. These sources should be DatabaseReferences to a Person object.
Items must be unique:
true
Items
Each item in this array must adhere to the following schema:
Source:
sources
Type:
object
Description
An attested source for this lexeme. This will often be the initials of a speaker.
Referenced Schema
This item must validate against the following schema:
Tags:
tags
Type:
object
Description
A set of tags for this lexeme
Referenced Schema
This item must validate against the following schema:
URL:
url
Type:
string
Description
A URL where a JSON representation of this lexeme may be retrieved
Format:
uri
Variant Of:
variantOf
Type:
object
Description
If this lexeme is a variant of another lexeme, this field should contain a reference to the other Lexeme. Lexemes may only be variants of one other Lexeme.
Referenced Schema
This item must validate against the following schema:
Variants:
variants
Type:
array
Description
A list of variants of this lexeme. This field should be used for dialectal and idiolectal variants, rapid and careful speech variants, register-based variants, variations in meaning, spelling variants, etc. It should not be used for phonologically-conditioned variants (use the allomorphs field of a specific form instead) or morphologically-conditioned variants (use the forms field instead). Each variant should have its variantType property specified.
Items must be unique:
true
Items
Each item in this array must adhere to the following schema:
variants
This item must also validate against all of the following schemas:
Variant (Database Reference)
Type:
object
Description
A database reference to a variant of this lexeme. Note: The Database Reference object must have a variantType property, indicating the type of variant.
Referenced Schema
This item must validate against the following schema:
Required Properties
variantType
Properties
The following properties are defined for this object:
Variant Type:
variantType
Description
This field is used to specify the type of variant. Possible values might be a person’s name (representing an idiolectal variant), or simply idiolectal, or dialectal (or the name of the dialect), or rapid speech, etc. May be in multiple languages.
Referenced Schema
This item must validate against the following schema:
Variant Type:
variantType
Description
If this lexeme is a variant of another lexeme or sense, this field can be used to specify the type of variant. Possible values might be a person’s name (representing an idiolectal variant), or simply idiolectal, or dialectal (or the name of the dialect), or rapid speech, etc. Optionally in multiple languages.
Referenced Schema
This item must validate against the following schema:
Additional Properties
Any additional properties must adhere to the following schema:
This schema imposes no restrictions. All values are valid.
Examples
The following are example values for this schema:
-
{
  "alternativeAnalyses": [
    {
      "forms": [
        { "transcription": { "APA": "kʼuš" } }
      ],
      "lemma": { "APA": "kʼuš", "IPA": "kˀuš" },
      "senses": [
        { "argumentStructure": "eat(agent, patient)", "gloss": "eat" }
      ]
    }
  ],
  "bibliography": [
    { "citationKey": "Duralde1802" }
  ],
  "citationForm": { "APA": "kʼušti", "IPA": "kˀušti", "Mod": "guxti", "Swad": "gušti" },
  "dateCreated": "2018-11-03T00:23:55.842Z",
  "dateModified": "2018-11-03T00:24:04.730Z",
  "id": "783cbaa8-befe-4049-bfe4-bb5688173780",
  "key": "guxt-(1)",
  "language": { "abbreviation": "chiti", "id": "3d91a22d-005b-4ec5-8151-09e44629f58f" },
  "lemma": { "APA": "kʼušt", "IPA": "kˀušt", "Mod": "guxt", "Swad": "gušt" },
  "link": "https://explorer.digitallinguistics.io/languages/Chitimacha/lexemes/guxt",
  "type": "Lexeme",
  "url": "https://data.digitallinguistics.io/languages/Chitimacha/lexemes/guxt",
  "forms": [
    {
      "allomorphs": [
        {
          "environments": [ "_m" ],
          "syllableStructure": "CVC",
          "transcription": { "APA": "kʼuš", "IPA": "kˀuš", "Mod": "gux", "Swad": "guš" }
        }
      ],
      "components": [
        { "id": "e0e2dbdb-f89b-4002-bd46-4f6803bb4391", "key": "gux" },
        { "id": "797f0f6b-3024-4d0c-bbfc-a1bc7cc48b81", "key": "t1" }
      ],
      "inflectionClass": "main verb",
      "link": "https://data.digitallinguistics.io/languages/Chitimacha/lexemes/guxt/forms/guxt",
      "media": [
        { "id": "24a14428-f2e7-47a8-8d4f-00c4437f6c3a", "filename": "guxt.wav" }
      ],
      "morphemeType": "stem",
      "sources": [
        { "source": { "abbreviation": "DWH" } }
      ],
      "syllableStructure": "CVCC",
      "transcription": { "APA": "kʼušt", "IPA": "kˀušt", "Mod": "guxt", "Swad": "gušt" },
      "variants": [
        { "id": "0f765e8d-1401-4c01-a88f-3092d077813b", "key": "guxma", "variantType": "pluractional" }
      ]
    }
  ],
  "senses": [
    {
      "argumentStructure": "eat(agent, patient)",
      "category": "transitive verb",
      "definition": "to eat",
      "gloss": "eat",
      "link": "https://data.digitallinguistics.io/languages/Chitimacha/lexemes/guxt/senses/2"
    }
  ]
}
Developer Notes
This is a top-level database object.