Rhythm as Authorship
Temporal Agency and Neurocomputational Mechanisms of Emotional Engagement in Interactive Music
Author: Ian Xed
Affiliation: Eastern Mediterranean University / Neurobinary
Abstract
Interactive Music redistributes authorship: participants control note timing (and thereby character motion and world interactions) via taps, while a lightweight conditional autoregressive transformer (symbolic next-token prediction in under 50 ms on-device, with cross-attention over a cached chord/harmony stream) supplies pitch, duration, and velocity. We argue that temporal control confers substantial emotional authorship in this hybrid system. Prior work shows that rhythm engages motor-subcortical circuits (Grahn & Brett, 2007; Chen et al., 2008) and that musical anticipation engages dopaminergic reward systems (Salimpoor et al., 2011); we take this to suggest that rhythm engages these systems more directly than pitch structure alone. Predictive-processing accounts (Friston, 2010; Vuust & Witek, 2014) indicate that temporal priors structure when prediction errors occur, modulating affective intensity via precision weighting. User-generated timing creates immediate action–outcome contingencies that satisfy core criteria for sense of agency (Haggard, 2017). The Limbic Prior Hypothesis (speculative) proposes that self-timed rhythmic patterns interface directly with evolutionarily conserved subcortical affective systems (Panksepp, 1998) via low-level priors relatively independent of cortical tonal interpretation. We provide operational definitions of authorship and medium-specific testable predictions.
Keywords: rhythm, active inference, predictive processing, sense of agency, dopamine, motor entrainment, interactive music
Introduction
Interactive Music operationalises a precise division of labour: human motor timing (inter-onset intervals chosen by tap) controls both auditory events and visuomotor outcomes, while the AI satisfies harmonic and rhythmic constraints given base-track conditioning. This raises a testable question: does temporal control alone suffice for emotional authorship in a system where pitch content is algorithmically coherent?
We define emotional authorship mechanistically as: (1) reliable causal influence over sensory outcomes (sense of agency), (2) modulation of affective prediction-error dynamics (arousal/valence trajectory), and (3) differentiable personal outcomes (distinct rhythmic choices yield distinct experiences). The architecture guarantees low-latency feedback (<50 ms), preserving the sensorimotor contingency window critical for agency. Assumptions: AI outputs remain musically coherent (no large unexpected harmonic violations); cultural priors allow entrainment to the base meter.
Predictive Processing and Temporal Expectation (established + plausible inference)
Under the free-energy principle, perception minimises variational free energy, an upper bound on surprise (Friston, 2010; Clark, 2013). In music, hierarchical generative models predict both what (pitch/harmony) and when (meter/timing). Temporal predictions engage motor circuits even during passive listening: Grahn & Brett (2007) showed that beat perception activates the basal ganglia and supplementary motor area, and Chen et al. (2008) reported recruitment of motor regions while participants listened to rhythms.
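The bound referred to above can be written out explicitly. The notation is an assumption, since the text does not fix symbols: o denotes sensory observations, s their hidden causes, p the generative model, and q an approximate posterior over s.

```latex
\begin{aligned}
F(q, o) &= \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] \\
        &= D_{\mathrm{KL}}\big[\,q(s)\,\|\,p(s \mid o)\,\big] - \ln p(o) \\
        &\ge -\ln p(o)
\end{aligned}
```

Because the Kullback-Leibler term is non-negative, F can never fall below the surprise -ln p(o); minimising F with respect to q tightens the bound.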
Dynamic attending theory (Large & Jones, 1999) models attentional oscillations entrained by meter; violations produce measurable arousal. In predictive-coding terms (Vuust & Witek, 2014), meter acts as a strong prior and syncopation generates precision-weighted prediction errors; rated pleasure follows an inverted-U function of syncopation degree (Witek et al., 2014). In Interactive Music, the user actively selects the timing of these errors via taps, shifting from passive minimisation to active inference: taps realise policies that sample the world (auditory + graphic) on a self-chosen schedule. We expect this active component to amplify agency relative to passive listening.
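To make "precision-weighted prediction error" concrete for tap timing, here is a minimal sketch. It assumes a strictly isochronous metric grid and a Gaussian prior whose standard deviation stands in for the listener's temporal uncertainty; the function names, the `phase_s` parameter, and the example numbers are hypothetical and do not come from the system described in this paper.

```python
def timing_prediction_error(onset_s, beat_period_s, phase_s=0.0):
    """Signed distance in seconds from a tap onset to the nearest predicted beat.

    Positive values mean the tap is late, negative values mean it is early.
    Assumes an isochronous grid starting at phase_s (an illustrative simplification).
    """
    beats_elapsed = (onset_s - phase_s) / beat_period_s
    nearest_beat_s = phase_s + round(beats_elapsed) * beat_period_s
    return onset_s - nearest_beat_s


def precision_weighted_error(onset_s, beat_period_s, prior_sd_s, phase_s=0.0):
    """Scale the timing error by the precision (inverse variance) of the metric prior."""
    if prior_sd_s <= 0:
        raise ValueError("prior_sd_s must be positive")
    precision = 1.0 / (prior_sd_s ** 2)
    return precision * timing_prediction_error(onset_s, beat_period_s, phase_s)


# Example: a 120 BPM grid (0.5 s period) and a tap that lands 60 ms late.
late_tap_s = 1.06
strong_prior = precision_weighted_error(late_tap_s, 0.5, prior_sd_s=0.02)
weak_prior = precision_weighted_error(late_tap_s, 0.5, prior_sd_s=0.08)
```

The same 60 ms deviation carries a much larger weighted error under the tight prior than under the loose one, which is the sense in which a strong meter makes a syncopated tap more salient.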
Dopaminergic Reward and Temporal Anticipation (established)
Salimpoor et al. (2011) demonstrated, via [11C]raclopride PET, endogenous dopamine release in the caudate during anticipation of musical peaks and in the nucleus accumbens during the peak experience itself; a complementary fMRI analysis showed the same temporal dissociation. Anticipatory dopamine is commonly interpreted as tracking reward prediction error, and temporal control lets the participant set the schedule of these errors. Moderate syncopation, here set by the user through note density and placement, maximises groove and wanting-to-move (Witek et al., 2014; Janata et al., 2012). On this account, pitch variation alone is less likely to elicit comparable striatal responses unless it is embedded in a temporal framework.
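One common formalisation of reward prediction error is the temporal-difference (TD) error from reinforcement learning. The sketch below is an illustrative assumption and not a model used in the cited studies: the state values, the reward placement, and the discount factor `gamma` are all hypothetical.

```python
def td_error(reward, value_next, value_current, gamma=0.95):
    """Temporal-difference error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_current


def td_errors(rewards, values, gamma=0.95):
    """TD error at each step of an episode; the value after the final step is taken as 0."""
    padded = list(values) + [0.0]
    return [
        td_error(r, padded[t + 1], padded[t], gamma)
        for t, r in enumerate(rewards)
    ]


# Hypothetical three-step build-up to a musical peak, rewarded only at the last step.
rewards = [0.0, 0.0, 1.0]
naive_values = [0.0, 0.0, 0.0]          # nothing is expected: the error lands on the peak
learned_values = [0.9025, 0.95, 1.0]    # the peak is fully anticipated: errors are near zero

naive_deltas = td_errors(rewards, naive_values)
learned_deltas = td_errors(rewards, learned_values)
```

In this reading, a player who advances or delays an onset relative to what has been learned shifts the step at which a nonzero error occurs, which is one way to interpret "setting the schedule" of prediction errors.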
Motor Coupling and Emergence of Agency (established + plausible inference)
Agency arises from reliable action–outcome contingencies and efference-copy–reafference matching (Haggard, 2017). Rhythm is inherently motoric: passive listening recruits motor areas (Chen et al., 2008), and active synchronisation increases affiliation (Hove & Risen, 2009) and plausibly strengthens ownership of the outcome. In Interactive Music each tap produces (i) an immediate AI-generated note (auditory reafference) and (ii) character movement and world interaction (visuomotor reafference). We infer that this multimodal contingency loop satisfies agency criteria more robustly than auditory-only or pre-scripted rhythm games.
Rhythm versus Pitch: Complementary Rather than Competitive Contributions
Rhythm and pitch processing are dissociable (Dowling, 1978; neuropsychological dissociations), yet they interact. Rhythm provides the temporal skeleton that structures when harmonic expectations are tested; pitch supplies tonal valence and narrative contour (Huron, 2006). In Interactive Music the AI handles pitch selection, freeing the user for rhythmic authorship without skill barriers or dissonance risk. This complementarity may let non-musicians experience the motor-reward dynamics that musicians achieve through full instrumental control.
Temporal Control as Emotional Authorship (plausible inference for this medium)
By controlling note onsets, density, syncopation, and silence, the participant modulates (i) the precision weighting of prediction errors, (ii) the schedule of dopaminergic anticipation peaks, and (iii) sensorimotor binding windows.
Thus temporal agency translates directly into control over affective trajectory. Operational measures: (a) subjective agency scales (adapted from Haggard paradigms), (b) physiological coupling (skin-conductance or heart-rate variability locked to self-generated vs. yoked taps), (c) behavioural differentiation (exploration of timing space correlated with reported authorship).
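As one possible way to quantify measure (c), exploration of timing space, the sketch below computes the Shannon entropy of binned inter-onset intervals (IOIs). The choice of entropy, the 50 ms bin width, and the function names are assumptions made for illustration; the text does not specify an estimator.

```python
import math
from collections import Counter


def inter_onset_intervals(tap_times_s):
    """Differences between successive tap times, in seconds."""
    ordered = sorted(tap_times_s)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def ioi_entropy_bits(tap_times_s, bin_width_s=0.05):
    """Shannon entropy (bits) of the IOI distribution after binning.

    Higher values indicate that the player used a wider variety of intervals.
    Returns 0.0 when there are fewer than two taps.
    """
    iois = inter_onset_intervals(tap_times_s)
    if not iois:
        return 0.0
    counts = Counter(round(ioi / bin_width_s) for ioi in iois)
    total = len(iois)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


steady = ioi_entropy_bits([0.0, 0.5, 1.0, 1.5])   # one repeated interval
varied = ioi_entropy_bits([0.0, 0.25, 0.75])      # two different intervals
```

A per-session entropy of this kind could then be correlated with reported authorship; a player who only ever taps on the beat scores zero regardless of tempo.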
Limbic Prior Hypothesis (speculation)
Subcortical affective systems (SEEKING, PLAY; Panksepp, 1998) operate via rhythmic motor sequences and dopaminergic modulation. Within predictive processing, low-level priors may be evolutionarily conserved and less dependent on cortical context. Self-timed rhythmic patterns could directly modulate these priors via precision-weighted entrainment, producing rapid, embodied affect relatively independent of harmonic interpretation. This remains speculative; it predicts weaker cultural-harmonic modulation of emotional response in timing-control conditions versus pitch-control conditions.
Interactive Music as Neurocomputational System
The on-device architecture (causal transformer + cross-attention to harmony stream) keeps feedback latency low (<50 ms), enabling tight action–outcome loops. The creator-supplied base track and default path provide the scaffold for the generative model; player taps perform active inference within it. Multimodal outcomes (sound + navigation + interaction) further reinforce agency.
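For readers unfamiliar with cross-attention, the following is a minimal single-head sketch in plain Python of how a note-side query could be conditioned on cached chord representations. It is not the system's implementation: the two-dimensional embeddings, the absence of learned projection matrices, and all names are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, keys, values):
    """Single-head scaled dot-product attention of one query over key/value pairs.

    Returns the attention-weighted mix of `values` and the attention weights.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
        for key in keys
    ]
    weights = softmax(scores)
    value_dim = len(values[0])
    context = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(value_dim)
    ]
    return context, weights


# Hypothetical cached harmony stream of three chords with toy 2-d embeddings.
chord_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
chord_values = [[0.2, 0.1], [0.5, 0.9], [0.7, 0.3]]
note_query = [1.0, 0.0]  # stands in for the decoder state at the moment of a tap

context, weights = cross_attend(note_query, chord_keys, chord_values)
```

Because the chord keys and values do not depend on the tap, they can be computed once and cached, leaving only the query-side work to be done at tap time; this is one plausible reason such a design suits a tight latency budget.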
Counterarguments and Limitations
Pitch contour and harmony contribute specific valence (major/minor, tension/resolution); without them, rhythm alone may lack narrative depth. In this medium, however, the AI supplies harmonic coherence, and timing structures the delivery of that content. Cultural and training differences modulate priors; we assume base tracks whose meter is accessible to the target audience. If AI latency exceeds ~200 ms or the outputs are incoherent, contingency breaks and agency should collapse, which is a testable failure mode. Individual differences in beat perception or groove sensitivity will moderate effects.
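The latency failure mode can be turned into a simple logging check. The sketch below uses the two figures given in this paper (the <50 ms budget and the ~200 ms breakdown point); the intermediate "at_risk" band, the label names, and the functions themselves are assumptions introduced for illustration.

```python
TARGET_MS = 50.0       # on-device feedback budget stated in the text
BREAKDOWN_MS = 200.0   # approximate point at which contingency is expected to break


def contingency_status(latency_ms):
    """Classify one tap-to-feedback latency against the two thresholds."""
    if latency_ms < 0:
        raise ValueError("latency_ms must be non-negative")
    if latency_ms < TARGET_MS:
        return "within_budget"
    if latency_ms <= BREAKDOWN_MS:
        return "at_risk"  # hypothetical middle band, not defined in the text
    return "broken"


def breakdown_rate(latencies_ms):
    """Fraction of taps whose feedback arrived after the breakdown threshold."""
    if not latencies_ms:
        return 0.0
    broken = sum(1 for ms in latencies_ms if contingency_status(ms) == "broken")
    return broken / len(latencies_ms)


session_rate = breakdown_rate([30.0, 120.0, 250.0, 300.0])
```

In an experiment, latency could be manipulated deliberately and the resulting breakdown rate related to subjective agency ratings, which would test the predicted collapse.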
Conclusion
In Interactive Music, rhythm is the computational vector through which non-musicians exert temporal agency. By granting control over prediction-error timing and sensorimotor contingencies within a musically coherent scaffold, the medium offers a mechanistically specified way to close the participation gap. The framework is falsifiable, directly tied to the architecture, and generates concrete experiments. Rhythm is not ancillary; in this interactive context, it is the substrate of authorship.
References
Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008). Listening to musical rhythms recruits motor regions of the brain. Cerebral Cortex, 18(12), 2844–2854.
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204.
Dowling, W. J. (1978). Scale and contour: Two components of a theory of memory for melodies. Psychological Review, 85(4), 341–354.
Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience, 19(5), 893–906.
Haggard, P. (2017). Sense of agency in the human brain. Nature Reviews Neuroscience, 18(4), 196–207.
Hove, M. J., & Risen, J. L. (2009). It’s all in the timing: Interpersonal synchrony increases affiliation. Social Cognition, 27(6), 949–960.
Huron, D. (2006). Sweet anticipation: Music and the psychology of expectation. MIT Press.
Janata, P., Tomic, S. T., & Haberman, J. M. (2012). Sensorimotor coupling in music and the psychology of groove. Journal of Experimental Psychology: General, 141(1), 54–75.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending. Psychological Review, 106(1), 119–159.
Panksepp, J. (1998). Affective neuroscience: The foundations of human and animal emotions. Oxford University Press.
Phillips-Silver, J., & Trainor, L. J. (2005). Feeling the beat: Movement influences infant rhythm perception. Science, 308(5727), 1430.
Salimpoor, V. N., Benovoy, M., Larcher, K., Dagher, A., & Zatorre, R. J. (2011). Anatomically distinct dopamine release during anticipation and experience of peak emotion to music. Nature Neuroscience, 14(2), 257–262.
Vuust, P., & Witek, M. A. G. (2014). Rhythmic complexity and predictive coding. Frontiers in Psychology, 5, 1111.
Witek, M. A. G., Clarke, E. F., Wallentin, M., Kringelbach, M., & Vuust, P. (2014). Syncopation, body-movement and pleasure in groove music. PLoS ONE, 9(4), e94446.