Turning Human Intent into Machine Action
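Our robot uses a model called SmolVLA, a vision-language-action model. It combines camera images, the robot's current state, and your language prompt to predict actions.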
Your prompt provides the critical context for which action to choose.
Source: huggingface.co/blog/smolvla
Name the target object precisely.
Instead of "the block", use "the small red lego block".
Use simple, direct language. Avoid ambiguity.
Instead of "deal with it", use "pick it up".
Use verbs that describe physical movements.
Good verbs: "push", "pick up", "place", "move to", "slide".
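The three guidelines above can be turned into a quick sanity check to run before a prompt reaches the robot. The sketch below is only an illustration: the word lists and the `check_prompt` helper are hypothetical, not part of SmolVLA, and it uses plain substring matching, so treat its warnings as hints rather than rules.

```python
import re

# Hypothetical word lists for illustration; tune them for your own robot and tasks.
ACTION_VERBS = ("push", "pick up", "place", "move to", "slide")
VAGUE_PHRASES = ("deal with", "handle", "do something", "take care of")
# Matches an object named with no describing attributes, e.g. "the block".
BARE_NOUN = re.compile(r"\bthe (block|object|thing|item)\b")


def check_prompt(prompt: str) -> list[str]:
    """Return a list of warnings for a prompt; an empty list means no issues found."""
    text = prompt.lower()
    warnings = []
    if not any(verb in text for verb in ACTION_VERBS):
        warnings.append("no clear action verb")
    if any(phrase in text for phrase in VAGUE_PHRASES):
        warnings.append("vague phrasing")
    if BARE_NOUN.search(text):
        warnings.append("object named without attributes")
    return warnings


print(check_prompt("deal with the block"))
print(check_prompt("pick up the small red lego block"))
```

The first call reports all three warnings; the second returns an empty list because it names a physical action and describes the object.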
Task: Move a red cube into a blue bowl. A yellow ring is also on the table.
Bad Prompt:
"put the block in the bowl"
Which block? Which bowl? The robot might get confused.
Good Prompt:
"pick up the red cube and place it inside the blue bowl"
Unambiguous, specific, and uses clear action verbs.
Task: Move a can to the right side of the workspace.
Bad Prompt:
"move the can"
Move it where? How? This lacks a clear goal state.
Good Prompt:
"push the soda can to the right side of the table"
Specifies the action ("push") and a clear destination.
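One way to make sure a prompt always carries an action, a described object, and a goal is to assemble it from those parts. This is a minimal sketch; `build_prompt` and its parameter names are hypothetical helpers for illustration, not an API of the robot or of SmolVLA.

```python
def build_prompt(action: str, target: str, destination: str = "") -> str:
    """Compose an instruction from an action verb, a described object, and an optional goal."""
    if not action.strip() or not target.strip():
        raise ValueError("action and target are both required")
    parts = [action.strip(), "the", target.strip()]
    if destination.strip():
        parts.append(destination.strip())
    return " ".join(parts)


# Rebuilds the can example from its three parts.
print(build_prompt("push", "soda can", "to the right side of the table"))
```

Leaving out the action or the target raises an error, which mirrors the failure of "move the can": a missing piece is caught before the robot has to guess.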
For complex, multi-step tasks, giving a sequence of simpler instructions can improve performance.
Task: Pick up a block, and then wave.
Chained Prompts:
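As a rough sketch of chaining, the task can be split into two simpler prompts that run one after the other. Everything here is an assumption for illustration: the wording of the two steps, the `run_chain` helper, and `fake_execute`, which stands in for whatever call actually runs one instruction on the robot and reports whether it succeeded.

```python
from typing import Callable, Iterable


def run_chain(prompts: Iterable[str], execute: Callable[[str], bool]) -> list[str]:
    """Run prompts in order and stop at the first failure.

    Returns the prompts that completed successfully.
    """
    completed = []
    for prompt in prompts:
        if not execute(prompt):
            break  # do not attempt later steps once one has failed
        completed.append(prompt)
    return completed


def fake_execute(prompt: str) -> bool:
    # Stand-in for a real policy rollout; it pretends every step succeeds.
    print(f"executing: {prompt}")
    return True


# Illustrative sub-instructions for "pick up a block, and then wave".
steps = ["pick up the block", "wave the arm from side to side"]
run_chain(steps, fake_execute)
```

Stopping at the first failed step keeps the robot from waving when it never picked up the block.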
We will now control a robot in real time to see how different prompts affect its behavior.
Task Description: "A toy car is on its side. Put it back on its wheels."
Discuss: What would be a good prompt? A bad one?
Example Good Prompt:
"flip the blue toy car upright so it rests on its wheels"
Any questions about prompt engineering?