This repository accompanies the paper "Interruption Handling for Conversational Robots". It contains the source code for an example of our interruption handling system integrated into an LLM-powered social robot built on the Platform for Situated Intelligence (\psi). We prompt-engineered a large language model (GPT-4o-2024-05-13) to generate contextualized robot speech and to select fitting facial expressions, head movements, and task actions for three tasks: a timed decision-making task (dessert survival simulation), a contentious discussion (whether the federal government should abolish capital punishment), and a question-and-answer task (space exploration themed). For more details on the interruption handling system, please refer to our manuscript. For a more detailed description of the robot implementation, see the supplemental materials.
Environment:
- Windows 10 x64
Prerequisites:
- Python3
- Microsoft Platform for Situated Intelligence (\psi)
- node.js
- React
In overview, the SocialRobot folder contains the files to run the main \psi program. The expressive-face-server folder contains the web app that displays the robot's face. The google-speech-to-text folder contains the Python script that runs the speech recognition and wake word detection service. The head-server folder contains the head code for the Arduino as well as the Python head server that communicates between \psi and the Arduino. The web-app folder contains the files to run the React web app containing the task information.
There are three folders, one per study task (practice, discussion, survival), each containing the program to run for that task. Each folder contains the following files: a Google speech-to-text component (GoogleSpeechToTextComponent), a Google text-to-speech component (GoogleTextToSpeechComponent), a component that classifies the type of interruption (LLMInterruptionHandlingComponent), a component that generates robot behavior (LLMResponseComponent), a timer helper component (TimerSecondsComponent), and the main program.
Open the SocialRobot solution in Visual Studio to select the program to run. prompts.json contains all of the prompts used in the study by the social robot programs.
face.css contains the source code for the hand-crafted robot facial expressions. New facial expressions can be crafted by adjusting the positions and timing of the elements. face.html displays the robot face and is called by the API. face-testing.html displays buttons for selecting which facial expression to display, to help with testing. server.js contains an Express server that handles HTTP requests to change the robot's facial expression displayed on face.html.
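As an illustration, a client can change the displayed expression with a plain HTTP request. The endpoint path and parameter name below are hypothetical sketches, not the actual API; see server.js for the routes the Express server actually exposes:

```python
from urllib.parse import urlencode

# Hypothetical address and route for illustration only; the real routes
# are defined in server.js.
FACE_SERVER = "http://localhost:3000"

def build_face_request(expression: str) -> str:
    """Build the URL for an HTTP request that switches the displayed expression."""
    return f"{FACE_SERVER}/set-expression?{urlencode({'name': expression})}"
```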
google-speech-to-text-luna-experimental.py contains a program that runs the Google speech-to-text service and detects the wake words ("Luna" and "stop") in the transcribed interim speech.
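Scanning interim (partial) transcripts lets the robot react before the utterance is final. A minimal sketch of such a check, assuming simple whole-word matching (the actual script may use a different matching strategy):

```python
from typing import Optional

# Wake words from the study; matching strategy here is an assumption.
WAKE_WORDS = ("luna", "stop")

def detect_wake_word(interim_transcript: str) -> Optional[str]:
    """Return the first wake word found in an interim transcript, else None."""
    words = interim_transcript.lower().split()
    for wake_word in WAKE_WORDS:
        if wake_word in words:
            return wake_word
    return None
```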
The head_movement_arduino folder contains head_movement_arduino.ino, a program that controls the three servo motors in the robot. test.py contains a program to help test the connection to the Arduino. testServer.py contains a program that relays head movements between the main \psi program and the Arduino.
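Such a relay must serialize target angles into a byte message the Arduino sketch can parse. The comma-separated wire format below is a hypothetical sketch; the actual protocol is defined by head_movement_arduino.ino and testServer.py:

```python
def encode_head_command(pan: int, tilt: int, roll: int) -> bytes:
    """Encode target angles for the three servos as a newline-terminated message.

    The comma-separated format is an illustrative assumption, not the
    repository's actual protocol.
    """
    for angle in (pan, tilt, roll):
        if not 0 <= angle <= 180:  # typical hobby-servo range
            raise ValueError(f"servo angle out of range: {angle}")
    return f"{pan},{tilt},{roll}\n".encode("ascii")
```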
The task-interface folder contains the code for the React app that displays the task interface for the study. An initial screen lets the experimenter choose the order in which the tasks appear; the practice task always appears first. The Node.js backend (server.js) communicates with the \psi program to signal the robot when the task begins and when the user makes changes to the task interface during the survival task.
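Such a signal could be a small JSON message. The field names below are hypothetical, as the actual message format is defined by server.js and the \psi program that consumes it:

```python
import json

# Hypothetical field names for illustration; the real message format is
# defined by server.js and the psi program that consumes it.
def build_task_signal(event: str, task: str) -> str:
    """Serialize a task-interface event as a JSON message."""
    return json.dumps({"event": event, "task": task})
```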
To launch the expressive face component:
- Start the expressive face server:
cd expressive-face-server
node server.js
- Open face.html in your browser to view the animated robot face.
To launch the task interface:
- Start the task server:
cd web-app
node server.js
- Run the task interface web application:
cd task-interface
npm start
For questions that are not covered in this README, we welcome developers to open an issue or contact the author at [email protected].
If you use our system in a scientific publication, please cite our work:
@inproceedings{cao2025Interruption,
  title={Interruption Handling for Conversational Robots},
  author={Cao, Shiye and Moon, Jiwon and Mahmood, Amama and Antony, Victor Nikhil and Xiao, Ziang and Liu, Anqi and Huang, Chien-Ming},
  year={2025}
}