Utterance

Validate against: http://json-schema.org/draft-07/schema#

Schema ID: http://schemas.digitallinguistics.io/Utterance-6.0.0.json

Type: object

Description

The term utterance is intentionally ambiguous, and refers to any unit of a text above the word level. The DLx framework imposes no requirements regarding the size of this unit or how segmentation of the text into units should be accomplished. The user may choose to segment a text based on prosodic units, turns, sentences, or any other appropriate subdivision.

Required Properties

  • transcription
  • translation

Dependencies

This object has the following dependencies between properties:

  • If this object has the startTime property, it must have the following properties as well:

    • endTime
  • If this object has the endTime property, it must have the following properties as well:

    • startTime
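
The mutual dependency between startTime and endTime can be checked without a full JSON Schema validator. The following is a minimal sketch in Python; the function name missing_time_dependencies is hypothetical and is not part of any DLx tooling.

```python
def missing_time_dependencies(utterance: dict) -> list:
    """Return the time properties that the dependency rule requires but that are absent.

    Hypothetical helper: it checks only the startTime/endTime dependency,
    not the rest of the Utterance schema.
    """
    missing = []
    # startTime present without endTime violates the first dependency
    if "startTime" in utterance and "endTime" not in utterance:
        missing.append("endTime")
    # endTime present without startTime violates the second dependency
    if "endTime" in utterance and "startTime" not in utterance:
        missing.append("startTime")
    return missing
```

An empty list means the dependency is satisfied, which includes the case where neither property is present.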

Properties

The following properties are defined for this object:

  • Type: type

    Type: string

    Read-only: true

    Description

    The type of object. Must be set to Utterance.

    This item must have the following value:

    "Utterance"
  • Key: key

    Type: string

    Description

    A key which uniquely identifies this Utterance within the Text. The key for an Utterance consists of the abbreviation of the Text, a period, dash, or underscore, and then the number of this Utterance within the Text (index starts at 1). For example, the third Utterance of a Text with the abbreviation A would be A.3. Keys should be unique within a corpus.

    Regular expression to match: ^[(a-z)|(A-Z)|(0-9)]+[-_\.][0-9]{1,3}$

  • End Time: endTime

    Type: number

    Description

    The time that the speaker finishes producing this Utterance within the media file(s) associated with this Text. The timestamp should be formatted in SS.MMM (seconds and milliseconds).

    Minimum: 0.001

  • Grammaticality & Acceptability judgments: judgments

    Type: array

    Description

    An array of grammaticality judgments or acceptability judgments for this utterance.

    Items must be unique: true

    Items

    Each item in this array must adhere to the following schema:

    Grammaticality / Acceptability judgment: judgments

    Type: object

    Description

    A judgment of the grammaticality or acceptability of the utterance. Some linguists distinguish between grammaticality and acceptability, such that some utterances may be considered grammatical but not acceptable. Unacceptable utterances are typically those which are semantically or pragmatically odd in context. It is strongly recommended that a note be included with each judgment, and the source of the judgment indicated in the note.

    Required Properties

    • judgment
    • judgmentType

    Properties

    The following properties are defined for this object:

    • Grammaticality / Acceptability judgment: judgment

      Type: number

      Description

      The grammaticality / acceptability judgment for this utterance, represented as a value between 0 (completely ungrammatical / unacceptable) and 1 (completely grammatical / acceptable). Simple binary judgments (“good” vs. “bad”, “grammatical” vs. “ungrammatical”) can simply use the values 0 and 1. Scalar judgments should be normalized to a value between 0 and 1. For example, a scale of 1-3 asterisks for grammaticality could be represented as follows: 0.00 = ***, 0.33 = **, 0.66 = *, 1.00 = completely grammatical.

    • Judgment Type: judgmentType

      Type: string

      Description

      Indicates whether the judgment is an acceptability judgment or a grammaticality judgment.

      Allowed Values

      • "acceptability"
      • "grammaticality"
    • Judgment Note: note

      Type: object

      Description

      A note about this judgment. It is strongly recommended that every judgment be accompanied by a note, indicating the speaker / source of the judgment, and if possible an explanation for unacceptable or ungrammatical judgments.

      Referenced Schema

      This item must validate against the following schema:

      http://schemas.digitallinguistics.io/Note.json

    Additional Properties

    Any additional properties must adhere to the following schema:

    No values are valid for this schema.

  • Language: language

    Type: string

    Description

    The key for the Language used in this Utterance, e.g. spa or eng. If the Text is labeled with a Language, all its Utterances are assumed to be in the same Language unless labeled otherwise. Likewise, if an Utterance is given a Language, all its words are assumed to be in the same Language unless the word is labeled otherwise.

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/Abbreviation.json

  • Link: link

    Type: string

    Description

    A URL where a presentational format for this resource may be viewed

    Format: uri

  • Literal Translation: literal

    Type: object

    Description

    The literal translations for this Utterance, optionally in multiple languages.

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/Translation.json

  • Phonetic: phonetic

    Type: string

    Description

    The phonetic transcription for this Utterance in IPA. Only valid IPA characters are allowed. The transcription should not include phonetic brackets.

  • Notes: notes

    Type: array

    Description

    A collection of notes about this Utterance

    Items must be unique: true

    Items

    Each item in this array must adhere to the following schema:

    Note: notes

    Type: object

    Description

    A note about this Utterance

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/Note.json

  • Speaker: speaker

    Type: object

    Description

    The Person who produced (uttered, signed, spoke, sang) this Utterance. The value of this field must match one of the people listed in the contributors array of the Text. If the Text has a single contributor with the role of speaker, that speaker is assumed to be the speaker for all Utterances in the Text. If multiple contributors with a speaker role are included in a Text, each Utterance must have its speaker attribute specified.

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/Person.json

  • Source: source

    Type: object

    Description

    A citation for the publication from which this utterance was taken. When the utterance is not part of a text, or when the text consists of unrelated utterances taken from different places, this field is strongly recommended.

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/Citation.json

  • Start Time: startTime

    Type: number

    Description

    The time that the speaker begins producing this Utterance within the media file(s) associated with this Text. The timestamp should be formatted in SS.MMM (seconds and milliseconds).

    Minimum: 0

  • Tags: tags

    Type: object

    Description

    A set of tags for this Utterance

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/Tags.json

  • Transcript: transcript

    Type: object

    Description

    A transcript of this Utterance, including things like prosodic markup, overlap, pauses, and various other discourse features. This field is intended for use by those doing discourse or conversation analysis, who need to mark up their text without affecting the phonemic transcription (in the transcription property). The transcript may be in multiple orthographies, or representational systems (e.g. you might have a CA transcript and a DT transcript, for discourse transcripts using Conversation Analysis and Discourse Transcription conventions respectively).

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/Transcription.json

    Minimum number of properties: 1

  • Transcription: transcription

    Type: object

    Description

    The transcriptions for this Utterance, optionally in multiple orthographies. This field is intended for use with purely phonemic / morphophonemic transcriptions. Punctuation should generally be avoided. To add punctuation and other discourse-level transcriptional features, use the transcript property. The transcription must be provided in at least one orthography.

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/Transcription.json

    Minimum number of properties: 1

  • Translation: translation

    Type: object

    Description

    The free translations for this Utterance, optionally in multiple languages. The translation must be provided in at least one language.

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/Translation.json

  • URL: url

    Type: string

    Description

    The URL where this Utterance can be retrieved in JSON format

    Format: uri

  • Words: words

    Type: array

    Description

    A collection of the word tokens contained in this Utterance. Tokens do not need to be unique.

    Items must be unique: false

    Items

    Each item in this array must adhere to the following schema:

    Word: words

    Type: object

    Description

    A Word object

    Referenced Schema

    This item must validate against the following schema:

    http://schemas.digitallinguistics.io/Word.json

Additional Properties

Any additional properties must adhere to the following schema:

This schema imposes no restrictions. All values are valid.
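
The regular expression listed for the key property can be tried out directly. The following is a minimal sketch in Python using the standard re module; KEY_PATTERN and is_valid_key are hypothetical names. The pattern is copied verbatim from the schema. As written, its bracket expression also admits the literal characters (, ), and |, and the {1,3} quantifier limits the Utterance number to three digits.

```python
import re

# Pattern copied verbatim from the key property of the Utterance schema.
KEY_PATTERN = re.compile(r"^[(a-z)|(A-Z)|(0-9)]+[-_\.][0-9]{1,3}$")

def is_valid_key(key: str) -> bool:
    """Return True if the key matches the schema's pattern (hypothetical helper)."""
    return KEY_PATTERN.match(key) is not None
```

Under this pattern, A.3, A-12, and Txt_7 are accepted, while A3 (no separator) and A.1234 (four digits) are rejected.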

Examples

The following are example values for this schema:

  • {
      "judgments": [
        {
          "judgment": 0.66,
          "judgmentType": "acceptability",
          "note": {
            "source": {
              "abbreviation": "BP"
            },
            "text": "Speaker B found this utterance odd because the first two words were contracted."
          }
        }
      ],
      "literal": {
        "eng": "one day a man"
      },
      "phonetic": "waʃtˀunkˀu ʔasi",
      "speaker": {
        "familyName": "Paul",
        "givenName": "Benjamin"
      },
      "source": {
        "citationKey": "Swadesh1946"
      },
      "transcript": {
        "Mod": "Waxdungu qasi,"
      },
      "transcription": {
        "Mod": "waxdungu qasi"
      },
      "translation": {
        "eng": "One day a man,"
      },
      "words": []
    }
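
The description of the judgment property asks for scalar judgments to be normalized to a value between 0 and 1. The following is a minimal sketch of that step for the asterisk scale, in Python. The function normalize_asterisks is hypothetical, and the linear mapping is an assumption chosen to agree with the values listed in the description, which appear to be truncated to two decimal places (0.33, 0.66).

```python
def normalize_asterisks(asterisks: int, max_asterisks: int = 3) -> float:
    """Map a count of asterisks to a judgment value between 0 and 1.

    Assumption: the scale is linear, with 0 asterisks meaning completely
    grammatical (1.0) and max_asterisks meaning completely ungrammatical (0.0).
    """
    if not 0 <= asterisks <= max_asterisks:
        raise ValueError("asterisks must be between 0 and max_asterisks")
    return 1 - asterisks / max_asterisks
```

With the default three-point scale, one asterisk yields two thirds and two asterisks yield one third; how many decimal places to store is left to the user.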