The power of VBA collections

VBA collections are very powerful for doing lookups. (Even if you know most of this already, I suggest you still look at Summing by Client ID below, in case you haven't come across a very powerful technique for summarising data with collections).

Introduction

A collection is like somewhere where you leave your luggage and get a ticket. When you present your ticket, you get your luggage back. Similarly, you use a collection to store an item, and provide a key which can be used to look up the item again.

Dim coll As New Collection
coll.Add "What I want to store", "TheKey" 'the key must be a string
MsgBox coll("TheKey") 'will return the string "What I want to store"

Note that if you try add two items with the same key, you will get an error. Each key must be unique.

Unique list

Suppose you have thousands of transaction records for clients, and each client may have many records. How do you get a unique list of the clients? The code below stores the client IDs in a collection, and also uses the ID as the key. Because keys have to be unique, we can only store each ID once, giving us a unique list - note we have to use "On Error.." to trap the errors that occur when we try to add the same IDs over and over again.

Dim coll As New Collection
On Error Resume Next 'to trap the error when we add the same Client ID to the collection
Do While Not rs.EOF 'assume data is coming from a database, for our example
     coll.Add rs("ClientID"), rs("ClientID") 'we'll store the client ID and also use it as the key
    rs.MoveNext 'go to next data record
Loop
On Error GoTo 0
For i = 1 to coll.Count
    MsgBox coll(i) 'this will give us a list of the unique client IDs
Next i

Summing by client ID

Suppose we again have thousands of transation records, and we want to add up all the transaction amounts for three different types of account, for each client ID (and assume it can't be done directly, by database queries!).

Suppose we set up an array to hold the results, and we'll assign the first client ID that we read in, to position 1 in the array, the next client ID to position 2, etc. The problem is that each time we read in a record, we have to search all the existing assignments to see if we already have this client ID, before assigning a new position. This where collections come to the rescue.

The trick is to use the collection to store the array position, so that if you look up a client ID and you get back 3, you know that this client ID has been assigned position 3 in the array.

Before you invest the time in examining it, why not see it in action with some real code, here. I tested it with 100,000 transactions for 20,000 unique client IDs, and it took just 1 second to add up 3 separate data items for each client ID across all the transaction records, which involved not just 300,000 additions, but sorting 100,000 transactions between 20,000 unique client IDs.

Dim C As New Collection
'set up array to store 3 items+ client ID = 4 items, & allow for 100 unique client IDs to start with
'we can expand this to more items if we need to, but we can only expand the last dimension of the array,
'which is why the number of records is in the 2nd column of the array

Dim nItems As Long
nItems =100 'start with 100 items
ReDim Data(4,nItems) As Single

Dim Index As Long 'this will tell us the position of the data in our array
Do While Not rs.EOF 'assume data is coming from a database, for our example
    Index = GetIndex(C,CStr(rs("ClientID"))) 'this function takes the client ID and returns the position in the array
    If Index <0 Then
         'trap error here - client ID couldn't be added to collection
    Else
        If Index > nItems Then 'increase the number of client IDs in the data array if we need to
           nItems = nItems+50 'add 50 items
           Redim Preserve Data(3,nItems) 'keep the existing data
        End If
        'now add the transactions to the existing totals
        Data(1,Index) = Data(1,Index) + rs("Item1")
        Data(2,Index) = Data(2,Index) + rs("Item2")
        Data(3,Index) = Data(3,Index) + rs("Item3")
        'store client ID if not already stored
        If Data(4,Index)=0 Then Data(4,Index) = rs("ClientID")
        rs.MoveNext 'go to next data record
    End If
Loop
On Error GoTo 0

'This function tries to add an item to the collection, and returns the position of the item in the collection
Public Function GetIndex(C As Collection, Key As String) As Long
    On Error Resume Next 'Adding an existing item causes an error, trap it
    C.Add C.Count + 1, Key 'try to add item, store next index

    On Error GoTo 0
    GetIndex=-1 'set default value as "not found" - shouldn't be necessary as we should always find the item
    '(unless, of course, the key we are trying to add is invalid, eg not a string)

    GetIndex = C(Key) 'look up and return the index number of our item
End Function

 

 

 

Note: Where possible, I've attributed ideas to the people who thought of them. If I have missed someone or made a mistake, please let me know.