The course is 5,000 m (5 km) long; we measured it accurately with a professional measuring wheel. It is in Birkenhead Park, Claughton, and is run entirely on tarmac paths.
We offer two splits for each dataset: Dev and Test. The multi-turn interaction requires an LLM to generate around 4k and 13k times for Dev and Test, respectively. Here are the results on the test set (standard) of ...