How do I get the benchmark data?


The benchmark data can be accessed through the Transfer Learning in Dialogue Benchmarking Toolkit (TLiDB, https://github.com/alon-albalak/TLiDB) with just a few lines of code.

We highly recommend installing and using the benchmark through the TLiDB package, but the raw datasets can also be downloaded directly.


To Download Data Through TLiDB [Recommended]:


Note: TLiDB does not require an explicit download step. When you use a TLiDB dataloader, the dataset is downloaded automatically.
Quickstart (2 options):
  1. Install TLiDB with pip (recommended for recreating baselines):
    pip install tlidb
    Models can then be trained directly from the command line, for example:
    tlidb --source_datasets Friends --source_tasks emory_emotion_recognition --target_datasets Friends --target_tasks reading_comprehension --do_train --do_finetune --do_eval --eval_best --model_config bert --few_shot_percent 0.1
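    Roughly, this trains on the Friends emory_emotion_recognition source task, then fine-tunes and evaluates on the Friends reading_comprehension target task, using the bert model config and a 10% few-shot split; consult the TLiDB repository for the full flag reference.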
  2. Use the TLiDB dataloaders (recommended for use with your own models and training scripts):
    Install TLiDB with pip as above:
    pip install tlidb
    OR
    Install TLiDB from source:
    git clone git@github.com:alon-albalak/TLiDB.git
    cd TLiDB
    pip install -e .
    Then, follow the data-loading instructions in the TLiDB repository to incorporate TLiDB dataloaders into your own script; an illustrative sketch of what that can look like follows below.
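    For orientation, the sketch below shows one way the integration might look. The module paths and argument names are assumptions made for the sake of example, not the confirmed TLiDB API; defer to the repository's data-loading instructions for the real entry points.

    # Illustrative sketch only: module paths and argument names below are
    # assumed, not confirmed -- see the TLiDB data-loading instructions.
    from tlidb.TLiDB.datasets.get_dataset import get_dataset            # assumed path
    from tlidb.TLiDB.data_loaders.data_loaders import get_train_loader  # assumed path

    # No explicit download step is needed; the dataset is fetched
    # automatically on first use.
    dataset = get_dataset(
        dataset="Friends",
        task="emory_emotion_recognition",
        dataset_folder="./data",  # assumed argument
        split="train",            # assumed argument
    )
    train_loader = get_train_loader(dataset, batch_size=16)

    for batch in train_loader:
        # Feed each batch into your own model and training step here.
        ...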

To Download Raw Data:


Note: These links download a zipped version of each dataset. Each archive contains 6 files: the raw dataset (e.g., TLiDB_Friends.json), 3 files identifying the train/dev/test dialogue IDs for the full-data splits (e.g., TTiDB_test_ids.txt), and 2 files identifying the train/dev dialogue IDs for the few-shot splits (e.g., TTiDB_0.1_percent_few_shot_train_ids.txt).
Warning: The raw data does not contain prompts, and working with it means writing your own dataloaders; equivalents already exist in TLiDB. A sketch of reading the raw files follows the links below.
FETA-DailyDialog:
FETA-Friends:
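If you do work with the raw files directly, here is a small sketch of consuming them. It assumes the split files list one dialogue ID per line (verify against the downloaded files) and uses the file names from the examples above:

    import json

    # Assumption: split files list one dialogue ID per line -- verify
    # against the downloaded files.
    with open("TTiDB_test_ids.txt") as f:
        test_ids = {line.strip() for line in f if line.strip()}

    # Raw dataset file, named as in the example above.
    with open("TLiDB_Friends.json") as f:
        dataset = json.load(f)

    # Keep only the dialogues in the test split
    # (dialogue_id per the schema below).
    test_dialogues = [d for d in dataset["data"] if d["dialogue_id"] in test_ids]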


Dataset Schema



{
    "metadata": {
        "dataset_name": "Dataset Name",
        "tasks": [ # list of task names
            "task1",
            "task2",
        ],
        "task_metadata": { # metadata about tasks, for example: labels, metrics, or metric keyword arguments 
            "task_1": {
                "labels": [
                    "label1",
                    "label2"
                ],
                "metrics": [
                    "f1"
                ]
            },
            "task_2":{
                "labels": [
                    "label1",
                    "label2",
                    "label3"
                ]
            }
        }
    },
    "data": [ # list of dicts
        {
            "dialogue_id": "dialogue-1",
            "dialogue_metadata":{ # can be used to determine which tasks exist in this dialogue
                "dialogue-level-classification-task1": null,
                "dialogue-level-classification-task2": null,
                "turn-level-classification-task1": null,
                "turn-level-classification-task2": null,
            }
            "dialogue-level-classification-task1": {
                "label": "ground truth label",
                "instance_id": instance_id
            ,
            "dialogue-level-classification-task2": {
                "label": "ground truth label",
                "instance_id": instance_id
            },
            "dialogue": [ # list of dicts
                {
                    "turn_id": "1",
                    "speakers": ["speaker1"],
                    "utterance": "Example utterance",
                    "turn-level-classification-task1": {
                        "label": "ground truth label",
                        "instance_id": instance_id
                    },
                    "turn-level-classification-task2": {
                        "label": "ground truth label",
                        "instance_id": instance_id
                    }
                },
                {
                    "turn_id": "2",
                    "speakers": ["speaker2"],
                    "utterance": "Second example utterance",
                    "turn-level-classification-task1": {
                        "label": "ground truth label",
                        "instance_id": instance_id
                    },
                    "turn-level-classification-task2": {
                        "label": "ground truth label",
                        "instance_id": instance_id
                    }
                }
            ]
        }
    ]
}
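
To make the schema concrete, here is a short sketch that walks a raw dataset file along the fields above. The file and task names are the placeholders from the examples above, and TLiDB's dataloaders already do all of this for you:

    import json

    with open("TLiDB_Friends.json") as f:  # file name as in the example above
        dataset = json.load(f)

    print(dataset["metadata"]["dataset_name"])  # "Dataset Name"
    print(dataset["metadata"]["tasks"])         # ["task1", "task2"]

    for dialogue in dataset["data"]:
        # dialogue_metadata lists the tasks annotated in this dialogue.
        annotated = dialogue["dialogue_metadata"]

        # Dialogue-level tasks attach a label dict directly to the dialogue.
        for task in annotated:
            if task in dialogue:
                print(dialogue["dialogue_id"], task, dialogue[task]["label"])

        # Turn-level tasks attach a label dict to individual turns.
        for turn in dialogue["dialogue"]:
            for task in annotated:
                if task in turn:
                    print(turn["turn_id"], turn["speakers"],
                          turn["utterance"], "->", turn[task]["label"])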