Say you have the string
s = 'AAAABBBCCDAABBB'
and you would like to split it so all the 'A's are in one list and so with all the 'B's and 'C', etc. You could do something like this
s = 'AAAABBBCCDAABBB'
s_dict = {}
for i in s:
if i not in s_dict.keys():
s_dict[i] = [i]
else:
s_dict[i].append(i)
s_dict
Results in
{'A': ['A', 'A', 'A', 'A', 'A', 'A'],
'B': ['B', 'B', 'B', 'B', 'B', 'B'],
'C': ['C', 'C'],
'D': ['D']}
But for large data set you would be building up these items in memory. This is where groupby() comes in
We could get the same result in a more efficient manner by doing the following
# note that we get a {key : value} pair for iterating over the items just like in python dictionary
from itertools import groupby
s = 'AAAABBBCCDAABBB'
c = groupby(s)
dic = {}
for k, v in c:
dic[k] = list(v)
dic
Results in
{'A': ['A', 'A'], 'B': ['B', 'B', 'B'], 'C': ['C', 'C'], 'D': ['D']}
Notice that the number of 'A's in the result when we used group by is less than the actual number of 'A's in the original string. We can avoid that loss of information by sorting the items in s before passing it to c as shown below
c = groupby(sorted(s))
dic = {}
for k, v in c:
dic[k] = list(v)
dic
Results in
{'A': ['A', 'A', 'A', 'A', 'A', 'A'], 'B': ['B', 'B', 'B', 'B', 'B', 'B'], 'C': ['C', 'C'], 'D': ['D']}
Now we have all our 'A's.