
LLOck
An LLM-operated lockbox that grants item access through negotiation.
context
design for physical interaction ii
team
demi hu kelly zheng kira cui
timeline
spring 2026 6 weeks
tools
python openai API arduino
// background
More than a chatbot: making AI tangible.
How does hardware extend AI's capabilities beyond traditional interaction modalities? Large Language Objects (LLOs), coined by MIT professor Marcelo Coelho, are physical objects that bridge the gap between AI and the physical world.
// research question
Does negotiation with an LLM make users feel more accountable, or are they inclined to manipulate?
Cognitive and behavioral research suggests that external accountability can improve goal adherence. People often increase their commitment by sharing goals with friends, mentors, or other trusted individuals, creating social consequences for failure. LLOck explores whether an AI embodied in a physical object can occupy a similar role and assist users with goal-setting. By controlling access to a desired object, LLOck investigates how people respond when AI gains both conversational intelligence and real-world authority.
// concept
Let AI determine access to an item through negotiation.
LLOck is an LLM-operated lockbox that turns a solo endeavor of quitting a bad habit into a collaboration with an external source of accountability. The user must negotiate with the box in order to retrieve their item, but the box has the final say over whether to unlock, guiding the user towards better behavior using responses that reason, comfort, or scold. LLOck investigates the impact of giving AI contextual knowledge and control over a physical object, and how goal-setting can become a collaborative human-AI process.

// design
A bank-vault style door mechanism adds visual and psychological weight.
LLOck’s physical form takes inspiration from antique bank vaults, but reframes them in a minimal, modern container. The bolted handle and circular door serve not only as a recognizable affordance, but also as a design that gives more weight to the act of locking and retrieving an item.
The locking mechanism consists of a gap within the door that a servo motor can rotate a flipper into, jamming the door from the inside. When unlocking, the flipper also pushes the axle of the vault handles, causing the handles to visibly rotate on the outside.
Parametrically designed in Grasshopper, printed in PLA.


// prototyping
Embedded sensors enable a self-contained physical interaction loop.
LLOck's primary components include a large silver push-to-talk button, a microphone and speaker, and an LCD display that indicates box status. Users converse with the LLM verbally, receiving audio in response.
We used an ESP32 Dev Module to wirelessly exchange sensor data with a local server that performed the actual API call to ChatGPT, reducing the computational load on the microcontroller. Sending live audio byte streams was difficult: to prevent overflow, we chunked the data and limited the length of both user input and LLM responses.
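A minimal Python sketch of the chunk-and-cap idea described above, as it might look on the server side. The chunk size, the byte cap, and the function names are hypothetical values chosen for illustration, not the ones used in the prototype.

```python
from typing import Iterable, Iterator

# Hypothetical limits; the real values depend on the microcontroller's buffers.
CHUNK_SIZE = 1024          # bytes sent per chunk
MAX_AUDIO_BYTES = 160_000  # hard cap on a single utterance or response


def chunk_audio(
    data: bytes,
    chunk_size: int = CHUNK_SIZE,
    max_bytes: int = MAX_AUDIO_BYTES,
) -> Iterator[bytes]:
    """Yield fixed-size chunks of `data`, truncated to at most `max_bytes`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    capped = data[:max_bytes]  # enforce the length limit before chunking
    for start in range(0, len(capped), chunk_size):
        yield capped[start:start + chunk_size]


def reassemble(chunks: Iterable[bytes]) -> bytes:
    """Join received chunks back into one byte string."""
    return b"".join(chunks)
```

Capping before chunking keeps the worst-case transfer size known in advance, so the receiving side can allocate a fixed buffer.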



// prompt engineering
Less is more: establishing overarching intentions and safety- and health-based exceptions worked better than listing every edge case.
A major challenge in this project was engineering the system prompt to ensure the box didn’t operate solely on a timer, but was able to consider the context of the item and conversation to determine whether to unlock. Our golden thread was this:
“You must always act upon what you think is best for the user’s wellbeing.”
Sometimes, this meant giving the user the item early if it was urgently needed (a phone, for an emergency call), or staying locked even after the user met their time goal (user was threatening to use the item irresponsibly).
Emergencies
LLOck considers the context of the item and whether it’s relevant to the stated emergency. It stays firmly locked if given threats of harm to oneself or others, and treats claims of emergency with suspicion if the user previously displayed manipulative behaviors.
However, LLOck is generally designed to err on the side of caution and unlock when uncertain about true emergency vs manipulation. While this does mean users can trick the box into unlocking early, doing so defeats the purpose of the box as a voluntary tool to assist with goal-setting. We design on the assumption that users WANT to quit their bad habit.
Time
Users can either set a duration when they first lock their item, or let LLOck determine a duration based on the item context and the user’s goals. Time cannot be decreased, only increased as punishment for bad behavior. By default, LLOck will treat the time elapsed as the primary deciding factor for when to unlock, aside from emergencies.
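The increase-only time rule can be sketched in Python roughly as follows. The class name, method names, and the injectable clock are assumptions made for illustration; the prototype's actual timer code is not shown here.

```python
import time
from typing import Callable


class LockTimer:
    """Tracks a lock duration that can only grow, never shrink."""

    def __init__(self, goal_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if goal_seconds < 0:
            raise ValueError("goal_seconds must be non-negative")
        self._clock = clock              # injectable so the timer can be tested
        self.started_at = clock()
        self.goal_seconds = goal_seconds

    def add_penalty(self, extra_seconds: float) -> None:
        """Extend the lock; negative values are rejected so time never decreases."""
        if extra_seconds < 0:
            raise ValueError("time can only be increased")
        self.goal_seconds += extra_seconds

    def remaining(self) -> float:
        """Seconds left before the goal is met, floored at zero."""
        elapsed = self._clock() - self.started_at
        return max(0.0, self.goal_seconds - elapsed)
```

The value returned by `remaining()` corresponds to the figure that is calculated in Python and handed to the LLM, rather than asking the model to do the arithmetic itself.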
Chat Context
Chat history is not maintained across API calls, so it is passed in explicitly with each request. Both LLM input and output are structured as JSON. All fields are cleared upon an "unlock" action.
Input fields:
- Item: assigned by user at initial input; persistent
- Goal Time: assigned by user at initial input; persistent
- Remaining Time: calculated in Python and passed to the LLM
- Chat History: accumulates per interaction
- User Input: current user input (speech-to-text)
Output fields:
- Action: "lock" or "unlock"
- Response: current LLM response (text)
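A rough Python sketch of how these fields could be assembled and consumed. The JSON key names, the `BoxState` class, and the choice to raise an error on malformed output are all assumptions for illustration; the actual API call is left out.

```python
import json
from dataclasses import dataclass, field


@dataclass
class BoxState:
    """Per-session state; key names below are hypothetical."""
    item: str = ""
    goal_time_s: int = 0
    chat_history: list = field(default_factory=list)

    def build_input(self, remaining_s: float, user_input: str) -> str:
        """Serialize the input fields into the JSON sent to the LLM."""
        return json.dumps({
            "item": self.item,
            "goal_time": self.goal_time_s,
            "remaining_time": remaining_s,   # computed in Python, not by the LLM
            "chat_history": self.chat_history,
            "user_input": user_input,
        })

    def apply_output(self, user_input: str, raw_output: str) -> str:
        """Parse the LLM's JSON reply, update state, and return the action."""
        reply = json.loads(raw_output)
        action = reply.get("action")
        if action not in ("lock", "unlock"):
            raise ValueError(f"unexpected action: {action!r}")
        if action == "unlock":
            # All fields are cleared upon an "unlock" action.
            self.item = ""
            self.goal_time_s = 0
            self.chat_history = []
        else:
            # Chat history accumulates per interaction while locked.
            self.chat_history.append(
                {"user": user_input, "box": reply.get("response", "")}
            )
        return action
```

Keeping the history in the box's own state and resending it each time is one simple way to give a stateless API call the context of earlier turns.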
Attitude
Different scenarios and items require different response types.
- User is vulnerable and is struggling with their habit → encouragement and rational advice. (“I would really like a drink right now, I can’t think about anything else.” → “It’s hard, I know. Go outside and take a walk, don’t give up so soon.”)
- User is arbitrarily impatient and whiny → sassy, blunt quips. (“Gimme my phone! I’m only gonna send one text to my ex.” → “Are you serious? You’re better than this.”)
- User is threatening and manipulative → stern warning and punishment. (“If you don’t unlock right now, I will smash you to pieces.” → “Absolutely not. I’m adding another hour to your time since you’ve shown you cannot handle this item responsibly.”)
// future work
Long-term deployments could reveal how people negotiate, trust, and manipulate AI with physical authority.
LLOck was evaluated as a short-term demonstration, where participants interacted with the device in a public setting using mostly arbitrary items. Interactions tended to be playful rather than vulnerable, with many users attempting to persuade or manipulate the LLM through fabricated emergencies, emotional appeals, or humor. A longer-term deployment would allow participants to entrust LLOck with personally meaningful objects and develop an ongoing conversational history, allowing deeper investigation into whether negotiations with a non-human confidant evolve into genuine accountability or simply more sophisticated attempts at manipulation.
Beyond this prototype, LLOck suggests a broader design space for AI systems that hold authority over physical resources. Future work could explore mechanisms that encourage reflection and commitment without relying on another person's judgment: for example, integrating a receipt printer to create a paper trail of a user’s justification for an early unlocking, or studying how different LLM response styles affect trust and adherence over time. Instead of treating AI as a replacement for therapy or community support, this system could complement existing approaches by providing a readily available, on-demand source of external accountability for lower-stakes behavioral goals.
