Where Are You? Dataset

The Where Are You? (WAY) dataset consists of 6,134 human embodied localization dialogs across 87 unique indoor environments. The dataset is built on the Matterport3D environments using the Matterport3D Simulator and was collected through crowd-sourcing on Amazon Mechanical Turk.


v1.0 contains the processed data needed for the Localization from Embodied Dialog (LED) task. Additional data is needed to work on the tasks of Embodied Visual Dialog (modeling the Observer) and Cooperative Localization (modeling both agents). Please contact meerahahn@gatech.edu to get the data and starter code for these tasks.


  • train: 4,050 episodes, 58 scenes
  • valSeen: 305 episodes, 58 scenes
  • valUnseen: 579 episodes, 11 scenes
  • test: 1,200 episodes, 18 scenes

word_embeddings.zip (13 MB)
  • Download and place into 'data/language/'
  • glove_weights_matrix.npy is extracted from a 300d GloVe file
  • w2v_weights_matrix.npy is extracted from a 300d Word2Vec file

floorplans.zip (103 MB)
  • Download and place into 'data/floorplans/'
  • Contains top-down views of each floor of the house, as well as files that map pixels on the top-down maps to Matterport3D panoramic nodes
  • allScans_Node2pix.json is a dictionary keyed by scan, then by panorama id. Each panorama id maps to a list whose first element is the pixel coordinates and whose third element is the floor of the house the pano is on.
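
As a minimal sketch of how this lookup might be used, the Python below reads one entry from a dictionary shaped like allScans_Node2pix.json. The helper name pano_pixel_and_floor and the in-memory sample are hypothetical; only the layout (first element is the pixel coordinates, third element is the floor) comes from the description above. The second element is not documented here, so the sample leaves it as null and the helper ignores it.

```python
import json


def pano_pixel_and_floor(node2pix, scan, pano_id):
    """Return (pixel_coord, floor) for one panorama of one scan."""
    entry = node2pix[scan][pano_id]
    pixel_coord = entry[0]  # first element: pixel coordinates on the top-down map
    floor = entry[2]        # third element: floor the panorama is on
    return pixel_coord, floor


# Hypothetical stand-in for the real file; the second element is a placeholder.
sample = json.loads(
    '{"5q7pvUzZiYa": {"efc16a390eb54273be07a53c9ac005b3": [[516, 376], null, 0]}}'
)

pixel, floor = pano_pixel_and_floor(
    sample, "5q7pvUzZiYa", "efc16a390eb54273be07a53c9ac005b3"
)
print(pixel, floor)
```

With the real data, the dictionary would presumably be read with json.load from wherever allScans_Node2pix.json ends up under 'data/floorplans/'; the exact path inside the archive is not stated here.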

connectivity.zip (103 MB)
  • Download and place into 'data/connectivity/'
  • Contains the connectivity of the Matterport3D panoramic nodes
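
The sketch below shows one way such connectivity data could be turned into an adjacency map. It assumes the files follow the Matterport3D Simulator convention, where each scan has a JSON list of nodes with "image_id", "included" and "unobstructed" fields; this README does not confirm that layout, so treat the field names as assumptions. The function name build_adjacency and the toy three-node graph are hypothetical.

```python
def build_adjacency(connectivity):
    """Map each included panorama id to the included ids it can reach directly.

    Assumes `connectivity` is a list of node dicts where node["unobstructed"][j]
    is True when node j is directly reachable (an assumption, see lead-in).
    """
    adjacency = {}
    for i, node in enumerate(connectivity):
        if not node["included"]:
            continue  # skip panoramas that are excluded from the graph
        neighbors = []
        for j, linked in enumerate(node["unobstructed"]):
            if linked and i != j and connectivity[j]["included"]:
                neighbors.append(connectivity[j]["image_id"])
        adjacency[node["image_id"]] = neighbors
    return adjacency


# Toy data, not taken from the dataset.
toy = [
    {"image_id": "a", "included": True, "unobstructed": [False, True, False]},
    {"image_id": "b", "included": True, "unobstructed": [True, False, True]},
    {"image_id": "c", "included": False, "unobstructed": [False, True, False]},
]
graph = build_adjacency(toy)
print(graph)
```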

way_splits.zip (2 MB)
  • Download and place into 'data/way_splits/'
  • Contains the annotations for the train, val and test splits. The test split file does not contain the pathId or the navigation paths of the agent. It only contains the level of the final location, so you can test across multiple levels or on single levels only. When submitting to the evaluation server, you will need to label your submission as multi-level or single-level accordingly.
  • socketId and annotationId are unique to each annotation. pathId is not unique to the annotation but is unique to the starting location of the agent.
  • dialogArray is an array of the messages in chronological order, alternating between the Locator and the Observer and starting with the Locator.
  • navPath is an array of viewpoint ids in chronological order that the Observer visits during the episode.
  • detailedNavPath contains the paths taken by the Observer between each round in the dialog. Each array in the list represents a turn of the Observer. Each navigation move is represented by a tuple of [viewpoint, pixel location, floor].
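
To make the field descriptions above concrete, here is a small sketch that attaches speaker roles to dialogArray and pulls the viewpoint ids out of detailedNavPath. The helper names label_speakers and viewpoints_per_turn are hypothetical, and the sample episode is truncated to the values shown in this README rather than being a complete annotation.

```python
def label_speakers(dialog_array):
    """Pair each message with its speaker; the Locator speaks first."""
    roles = ("Locator", "Observer")
    return [(roles[i % 2], message) for i, message in enumerate(dialog_array)]


def viewpoints_per_turn(detailed_nav_path):
    """For each Observer turn, list the viewpoint ids visited in order."""
    return [[move[0] for move in turn] for turn in detailed_nav_path]


# Truncated sample; real episodes contain more messages and moves.
episode = {
    "dialogArray": [
        "what do you see?",
        "look for a white couch next to a rug with square patterns that are blue black and white.",
        "yup. where are you standing in that room?",
    ],
    "detailedNavPath": [
        [["efc16a390eb54273be07a53c9ac005b3", [516, 376], 0]],
        [["32073c62923f40c590dcac826c72e2a7", [388, 353], 0]],
    ],
}

for speaker, message in label_speakers(episode["dialogArray"]):
    print(f"{speaker}: {message}")
print(viewpoints_per_turn(episode["detailedNavPath"]))
```

Note that the test split omits the navigation paths, so the second helper only applies to the train and val splits.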

base.pt (2 MB)
  • Download and place into 'data/base.pt'
  • Contains the trained lingUnet-skip model described in the paper for the LED task.

Format of {split}_data.json

    {
      "socketId": "xVc8PpN1yMkdtRafAATMxYipgp5ZDsWP8McmAATL",
      "annotationId": "3041",
      "pathId": "184",
      "scanName": "5q7pvUzZiYa",
      "dialogArray": [
        "what do you see?",
        "look for a white couch next to a rug with square patterns that are blue black and white.",
        "yup. where are you standing in that room?",
        ...
      ],
      "startLocation": {
        "pathId": "184",
        "viewPoint": "efc16a390eb54273be07a53c9ac005b3",
        "floor": 0,
        "pixel_coord": [516, 376],
        "mesh_coord": [6.4885, -0.0872714]
      },
      "finalLocation": {
        "viewPoint": "32073c62923f40c590dcac826c72e2a7",
        "floor": 0,
        "pixel_coord": [388, 353],
        "mesh_coord": [1.08773, 0.892166]
      },
      "navPath": ["efc16a390eb54273be07a53c9ac005b3", "8c29de2e66404a1faf0d953ae8bb67cf", ...],
      "detailedNavPath": [
        [["efc16a390eb54273be07a53c9ac005b3", [516, 376], 0], ...],
        [["32073c62923f40c590dcac826c72e2a7", [388, 353], 0], ...]
      ]
    }
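
As a rough illustration of working with records in this format, the sketch below groups episodes by scanName and measures how far apart the start and final locations are on the top-down map. It assumes each {split}_data.json file holds a list of such records, which this README does not state explicitly. The helper names episodes_by_scan and pixel_displacement are hypothetical, and the sample record keeps only the fields the helpers read.

```python
import math
from collections import defaultdict


def episodes_by_scan(episodes):
    """Group episode records by the Matterport3D scan they were collected in."""
    grouped = defaultdict(list)
    for episode in episodes:
        grouped[episode["scanName"]].append(episode)
    return dict(grouped)


def pixel_displacement(episode):
    """Euclidean pixel distance from start to final location.

    Returns None when the two locations are on different floors, since their
    pixel coordinates then refer to different top-down maps.
    """
    start, final = episode["startLocation"], episode["finalLocation"]
    if start["floor"] != final["floor"]:
        return None
    return math.dist(start["pixel_coord"], final["pixel_coord"])


# Sample record reduced to the fields used above.
sample_episode = {
    "scanName": "5q7pvUzZiYa",
    "startLocation": {"floor": 0, "pixel_coord": [516, 376]},
    "finalLocation": {"floor": 0, "pixel_coord": [388, 353]},
}

grouped = episodes_by_scan([sample_episode])
displacement = pixel_displacement(sample_episode)
print(sorted(grouped), displacement)
```

The distance here is in map pixels, not meters; converting it would need the map scale, which is not described in this section. math.dist requires Python 3.8 or later.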