The Auslan Corpus annotation files
At present, 201 movies in the Auslan Corpus have annotation files containing annotations at various levels of detail, and annotations are continually being added to the corpus. The current annotation files contain one or more of the following types of annotation:
- identification and glossing of all signs
- identification and glossing of nouns and verbs
- grammatical class ("part of speech") tagging
- identification of clause boundaries
- identification of verb arguments
- tagging of verb arguments for macro-roles and semantic roles
- tagging for the presence or absence of spatial modification
- identification of periods of role shift
- free translation
- literal translation.
Annotating signed language texts takes an enormous amount of time, and it is anticipated that it will take many years before the Auslan archive is annotated richly enough (and is hence machine-readable) to qualify as a true linguistic corpus.
Adding value to the movies in the archive through annotation is time-consuming and expensive. These annotation files are not publicly available but will be made available to fellow researchers on request on a data-sharing and data-enrichment basis (i.e., access to existing annotation files will be granted on condition that enriched annotation files are returned to the corpus). Research collaboration is also encouraged.
Click here for a copy of the guidelines that apply to the annotation of the Auslan Corpus generally.
Click here for a copy of the guidelines used to annotate the data in the research project titled "The linguistic use of space in Auslan: semantic roles and grammatical relations in three dimensions", Australian Research Council discovery project #DP0665254 (Louise de Beuzeville & Trevor Johnston).