Jose Benavides

Ph.D. candidate in Linguistics (minor: Computational Linguistics)
Department of Linguistics — Indiana University
Bloomington, IN • josbenav@iu.edu • github.com/josbenav
Available Summer 2026, willing to relocate
Research: development • annotation • measurement • LLM evaluation
About

I am a Ph.D. candidate in Linguistics with a minor in Computational Linguistics (CL) at Indiana University, specializing in syntax and semantics, with focused training in pragmatics, corpus annotation, and computational methods. My work investigates how speakers convey meaning beyond literal content (implicature, politeness, information structure) and how those phenomena can be annotated and measured reliably for computational evaluation.

I am a member of the NLP Lab (Prof. Damir Cavar) and the Syntax–Semantics Reading Group (Profs. Emily Hanink & Tom Grano) at Indiana University.

Research interests & methods

Selected topics

  • Pragmatics & evaluation: speech-act distinctions, implicature, politeness, stance, sociopragmatics; operationalization for annotation and model evaluation.
  • Measurement & psychometrics: annotation design, inter-annotator reliability, validity, basic psychometric analyses (item response, factor analysis, reliability metrics).
  • Computational methods: dataset preparation, reproducible pipelines (Python/R), LLM evaluation and calibration (LLM-as-judge experiments), corpus extraction and preprocessing.
  • Linguistic fieldwork & formal analysis: question formation, clitic doubling, information structure (field data from Kamënts̈á, Lutuv, and Andean Spanish).

Methodological toolkit

  • Python (pandas, NumPy, Jupyter)
  • R (tidyverse, psych)
  • Annotation: ELAN, FLEx, SayMore
  • Speech tools: Praat
  • NLP basics: tokenization, corpus extraction, evaluation metrics
  • LaTeX, Git, Linux
Publications & selected work
Benavides, J. & Jurado Eraso, J. (in press). Clefting in wh-es que-questions in Nariñense Andean Spanish. Issues in Hispanic and Lusophone Linguistics (John Benjamins).
Work in progress: Pragmatic inference and information structure; meaning, focus, and presupposition in wh-pseudo/clefts; clitic doubling (Nariñense Andean Spanish); question formation in Lutuv Chin.
See selected projects, code, and writing samples
Selected projects
Annotation protocols for sociopragmatics

Pilot dataset: annotated indirect requests, politeness markers, and speaker stance for pragmatic reasoning.

LLM-as-judge calibration notebook

Notebook that compares human judgments to automated LLM scoring across pragmatic test items and explores simple calibration strategies.

Corpus extraction & preprocessing pipelines

Scripts for tokenization, balancing, and extraction.

Linguistics research

Kamënts̈á (Isolate): phonetics and phonology
Lutuv (Maraic, Kuki-Chin): fieldwork and elicitation on question formation, relative clauses, negation, and information-structuring diagnostics
Nariñense Andean Spanish: clefting, wh-questions, clitic doubling, causatives, mood
Romance: clitic behavior