Identifying logical contexts in logs
Published:
Some months ago, I had to track a bug for which appeared on one SIP call, in a pre-production environment. So I had to dig a very lengthy log file, with lots of information about many mixed calls.
This situation is not really a problem if you know exactly the call you are tracking, thanks to its Call-ID or indirectly with a From or To tag for example. Hopefully, your program already offers interesting logs to address the problem, or you can provide a modified program to gather more information. If not, you know the rest of the week will be a pain.
In this case, I was lucky enough to have the opportunity to provide a new binary with lots of logs in the area which caused the bug, but I still had to search through thousand of lines to identify the path which lead one call to trigger the error.
So I came up with a simple python script which parsed the file for me and built a map from the logs I was interested in. Each entry of the map was identified by an ID which was unique to a call, and contained all the logs for this call in order of appearance in the file. To achieve this, the script relied on a regular expression which identified the logs I had added. At last, the script dumped those contexts, which were far easier to read than the initial file.
Then I got a new job, and I forgot this…
Until this week, when I had to identify in a very long file some
sixteen calls out of more than one thousand which were behaving
differently. So I wrote the following script in the vein of the
previous one. It can be quite easily adapted to other problems by
modifying the PATTERN
constant. It can read multiple log files, and
it can output the number of contexts found, the number of lines read
for each one of these contexts, and dump them. I think it is self
explanatory, and it comes with inline help.
#!/usr/bin/env python
# argparse means Python 2.7 is required.
import argparse
import re
# The pattern can be customized to adapt the program behaviour to the
# kind of logs to be analysed. It must provide a group named `key'
# which will be used to identify the logs belonging to a same context.
PATTERN = '.+ \[INFO\] .* with Call-ID = (?P<key>[a-zA-Z0-9@\.-]+)'
def build_contexts(filenames):
contexts = {}
for n in filenames:
with open(n) as f:
e = re.compile(PATTERN)
while True:
l = f.readline()
if '' == l:
break
m = e.match(l)
if not m is None:
k = m.group('key')
if not k in contexts:
contexts[k] = []
contexts[k].append(l.strip())
return contexts
def display_context(contexts, key, cmd):
print '%s: %s' % (key, str(len(contexts[key])) + ' lines.' if cmd.summary else '')
if cmd.list:
for l in contexts[key]:
print '\t', l
def handle_contexts(contexts, cmd):
if not cmd.key is None:
if cmd.key in contexts:
display_context(contexts, cmd.key, cmd)
elif cmd.list or cmd.summary:
for key in contexts:
display_context(contexts, key, cmd)
print 'Total:', len(contexts), 'contexts'
# Main
parser = argparse.ArgumentParser(
description='Correlates log lines.',
epilog='See the PATTERN variable in the code to suit your needs')
parser.add_argument('files', metavar='file', type=str, nargs='+',
help='the file to inspect')
parser.add_argument('--get', action='store', dest='key',
help='search only the context with this key')
parser.add_argument('--list', action='store_true',
help='dump the contexts content')
parser.add_argument('--summary', action='store_true',
help='print a summary for each context found')
args = parser.parse_args()
handle_contexts(build_contexts(args.files), args)