Do you work with the raw interaction data, meaning the voice and video, or are you working only from the written part?