Data collection

If you look at the concordances, you will realise that there is a LOT of input data. This is probably what I've spent the most time on, yet I feel I could spend a lot more time on it.

I manage the input data with a separate system called the Transport Data System (TDS). The TDS has four main tasks:

  • cleaning and categorising data gathered from various sources, so that every dataset is categorised consistently with the model.
  • concatenating all the data and making sure it is all in the same format.
  • identifying what data we have multiple sources for and then deciding which source to use.
  • identifying what data we don't have and then deciding how to fill in the gaps.
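To make these four tasks concrete, here is a minimal sketch of such a pipeline in plain Python. It is not the TDS code. The concordance entries, source names, category labels and the functions `categorise`, `choose_sources` and `fill_gaps` are all hypothetical. Two techniques are also assumptions: choosing between sources with a fixed priority list, and filling gaps by linear interpolation between known years. The real system may resolve conflicts and fill gaps quite differently.

```python
# Hypothetical concordance: maps each source's own labels onto model categories.
CONCORDANCE = {
    "source_a": {"Passenger cars": "road_passenger_car", "Buses": "road_passenger_bus"},
    "source_b": {"LPV": "road_passenger_car", "Bus": "road_passenger_bus"},
}

# Hypothetical priority order: earlier sources win when several cover the same value.
SOURCE_PRIORITY = ["source_a", "source_b"]


def categorise(records):
    """Tasks 1 and 2: map raw labels to model categories and return rows in one common format."""
    rows = []
    for source, raw_category, year, value in records:
        model_category = CONCORDANCE[source].get(raw_category)
        if model_category is None:
            continue  # label missing from the concordance; a real system should flag this
        rows.append({"source": source, "category": model_category, "year": year, "value": value})
    return rows


def choose_sources(rows):
    """Task 3: where several sources cover the same (category, year), keep the highest-priority one."""
    rank = {source: i for i, source in enumerate(SOURCE_PRIORITY)}
    best = {}
    for row in rows:
        key = (row["category"], row["year"])
        if key not in best or rank[row["source"]] < rank[best[key]["source"]]:
            best[key] = row
    return {key: row["value"] for key, row in best.items()}


def fill_gaps(values, category, years):
    """Task 4: linearly interpolate missing years between known values (no extrapolation)."""
    known = sorted((y, values[(category, y)]) for y in years if (category, y) in values)
    filled = {}
    for year in years:
        if (category, year) in values:
            filled[year] = values[(category, year)]
            continue
        before = [(y, v) for y, v in known if y < year]
        after = [(y, v) for y, v in known if y > year]
        if before and after:  # only fill years bracketed by known data
            (y0, v0), (y1, v1) = before[-1], after[0]
            filled[year] = v0 + (v1 - v0) * (year - y0) / (y1 - y0)
    return filled


# Usage with made-up numbers: two sources overlap in 2017, 2018 is missing,
# and "Trucks" has no concordance entry, so it is dropped.
raw = [
    ("source_a", "Passenger cars", 2017, 100.0),
    ("source_b", "LPV", 2017, 90.0),
    ("source_b", "LPV", 2019, 120.0),
    ("source_b", "Trucks", 2019, 5.0),
]
values = choose_sources(categorise(raw))
print(fill_gaps(values, "road_passenger_car", range(2017, 2020)))
# prints {2017: 100.0, 2018: 110.0, 2019: 120.0}
```

Keeping the concordance as data rather than code means adding a new source only requires a new mapping, which fits the point above that every dataset has to line up with the model's categories.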

You can find the code here, although I wouldn't recommend using it as-is; you are probably better off extracting just the parts you need.

You can also access the input data used in the model here.