In my last post, I talked about the Comparing Test Results project that I am currently working on. In that post, I also mentioned parsing logs to extract useful information. This has now been implemented and tested with data from Fennec builds. The intention of this post is to explain how the parsing is done, so that it can serve as a reference for any improvements needed to extend this project to other builds.
There are two parts to the parsing. The first part scans the entire log for metadata that characterizes and identifies a set of data. Here the code looks for the build ID, the product, the operating system, and the type of tests that were run.
- To get the build ID, the code looks for the line that contains the string “BuildID=”, such as “BuildID=20090715053040”. The value extracted is whatever follows the ‘=’ character.
- Similarly, for the product, the code looks for the string “Name=”.
- For the OS and test type, the code looks for the line that contains the string “tinderbox: build: ”, such as “tinderbox: build: Maemo mozilla-central crashtest”. The OS is taken from the first word after “tinderbox: build:” and the test type from the last word (in this example, “Maemo” and “crashtest”).
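The project's actual code isn't shown here, so what follows is only a minimal Python sketch of the metadata pass described above. The function name `parse_metadata`, the dictionary keys, and the choice to keep the first match for each field are my own assumptions for illustration. Note that a plain substring test for “Name=” will also match any other key that ends in “Name=”.

```python
def parse_metadata(log_text):
    """Scan the whole log once and return the metadata fields (first match wins).

    Hypothetical sketch: names and first-match behaviour are assumptions.
    """
    meta = {"build_id": None, "product": None, "os": None, "test_type": None}
    build_marker = "tinderbox: build: "
    for line in log_text.splitlines():
        if meta["build_id"] is None and "BuildID=" in line:
            # Everything after "BuildID=" is the build ID, e.g. 20090715053040.
            meta["build_id"] = line.split("BuildID=", 1)[1].strip()
        if meta["product"] is None and "Name=" in line:
            # Everything after "Name=" is the product name.
            meta["product"] = line.split("Name=", 1)[1].strip()
        if meta["os"] is None and build_marker in line:
            words = line.split(build_marker, 1)[1].split()
            if words:
                # First word after the marker is the OS, last word is the test type.
                meta["os"] = words[0]
                meta["test_type"] = words[-1]
    return meta
```

For the example line above, this yields `"Maemo"` as the OS and `"crashtest"` as the test type.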
The second part parses the log line by line to find lines that contain the output of a test run. For each such line, the code looks for the result of the test (PASS/FAIL/TODO), the location of the test code (normally the path to the test source file), and, if present, a message that may describe the intent of the test.
- To determine whether a line is the output of a test, the code looks for any of these strings: “TEST-PASS”, “TEST-FAIL”, “TEST-UNEXPECTED-FAIL”, “TEST-TIMEOUT”, “TEST-KNOWN-FAIL”.
- The line is then split into three sections on the divider ‘|’. The first section determines which of three branches the code takes: the test passed, the test is marked as TODO, or the test failed.
- The second section is taken as the location of the test code. Since only the relative path is needed, the leading part of the path is stripped for ‘reftest’ and ‘xpcshell’ logs.
- The third section is taken as the test’s message, if any. It is only recorded when the test did not pass.
- The code also increments the count of passes, failures, or todos as appropriate.
- In each of the three branches, the code then checks whether this is a new test. If it is, the test is added to the data structure with a count of 1, along with its message if the test did not pass. Otherwise, the count of the existing test is incremented and, if the test did not pass, the message is appended, separated by a comma.
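Similarly, here is a minimal Python sketch of the per-line pass. It is not the project's code. The names `parse_results`, `OUTCOME_BY_TAG`, and `strip_prefix` are hypothetical, and so is the mapping of tags to branches: I assume “TEST-KNOWN-FAIL” means TODO, and that “TEST-FAIL”, “TEST-UNEXPECTED-FAIL”, and “TEST-TIMEOUT” all count as failures. The path prefixes to strip are passed in by the caller, because the exact reftest and xpcshell prefixes aren't given here.

```python
# Hypothetical mapping from result tag to branch; adjust to the real harness semantics.
OUTCOME_BY_TAG = {
    "TEST-PASS": "pass",
    "TEST-KNOWN-FAIL": "todo",
    "TEST-FAIL": "fail",
    "TEST-UNEXPECTED-FAIL": "fail",
    "TEST-TIMEOUT": "fail",
}


def strip_prefix(path, prefixes):
    """Drop the first matching leading prefix (e.g. for reftest/xpcshell paths)."""
    for prefix in prefixes:
        if path.startswith(prefix):
            return path[len(prefix):]
    return path


def parse_results(log_text, prefixes=()):
    """Count results and collect per-test entries, one dict per branch."""
    counts = {"pass": 0, "todo": 0, "fail": 0}
    tests = {"pass": {}, "todo": {}, "fail": {}}
    for line in log_text.splitlines():
        # Split into at most three sections: result | location | message.
        sections = [s.strip() for s in line.split("|", 2)]
        if len(sections) < 2:
            continue  # not a test-result line
        # The first section decides the branch via the tag it contains.
        outcome = next(
            (o for tag, o in OUTCOME_BY_TAG.items() if tag in sections[0]), None
        )
        if outcome is None:
            continue
        path = strip_prefix(sections[1], prefixes)
        # The message is only kept for tests that did not pass.
        message = sections[2] if len(sections) > 2 and outcome != "pass" else ""
        counts[outcome] += 1
        entry = tests[outcome].get(path)
        if entry is None:
            # New test: start its count at 1.
            tests[outcome][path] = {"count": 1, "messages": message}
        else:
            # Existing test: bump the count and append the message with a comma.
            entry["count"] += 1
            if outcome != "pass":
                entry["messages"] += "," + message
    return counts, tests
```

Keeping one dictionary per branch mirrors the three conditions above. Storing messages in a list instead of a comma-joined string would avoid ambiguity when a message itself contains a comma.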
None of this is final. It is still being improved and is open to suggestions and extensions. The post will be updated as the code is updated.