Reparsing old log files with Logstash 5

In my current research, a set of log files needs to be processed by Logstash before being sent to Elasticsearch. After the first run, I realized that the log format had changed at one point, which made my Logstash configuration fail on half of the log lines. I updated the filter and ran Logstash again, hoping that it would reparse the log files with the new configuration. Unfortunately, not a single line was parsed! This blog post describes how I made Logstash reparse old files without changing their names.

The way Logstash works

The file input plugin of Logstash remembers the last position it has read in each file by using .sincedb* files. The location of these files can be set explicitly with the sincedb_path option of the plugin. I had not provided a value for this option, so I needed to know the default location. A Google search told me that Logstash normally saves the files in the $HOME directory. I searched my home folder (the machine runs Ubuntu) and found no such files. After that, I ran a command to find them:

find / -type f -name '.sincedb*'

The results showed that my Logstash instance saved its sincedb files inside the data/plugins/inputs/file subfolder of the installation directory. Each file had an MD5 hash suffix.
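To see which log files a given sincedb file tracks, its records can be matched against inode numbers. The sketch below assumes that each sincedb line starts with the inode of the tracked file (followed by device numbers and the byte offset), which is the layout I would expect from the file input plugin of that era; check one of your own files before relying on it. The helper name tracks_file is made up for illustration, and it relies on GNU stat as shipped with Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: succeed if sincedb file $1 has a record for log file $2.
# Assumption: the first field of each sincedb line is the inode of a tracked file.
tracks_file() {
    sincedb=$1
    logfile=$2
    # GNU stat: %i prints the inode number of the log file
    inode=$(stat -c %i "$logfile") || return 2
    # Exit status 0 only if some record's first field equals that inode
    awk -v inode="$inode" '$1 == inode { found = 1 } END { exit !found }' "$sincedb"
}
```

Called as tracks_file <sincedb file> <log file>, it returns success when the sincedb file holds a record for that log file, so it can be used in a loop over all .sincedb* files to find the one worth deleting.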

Clear the sincedb

In my case, the virtual machine was used for this research only, so I simply deleted all .sincedb* files inside the folder. If your Logstash instance is used for many tasks, you should not do that. Instead, delete only the sincedb file that stores the metadata of the files that need reparsing. To identify it, you can look at the content of each sincedb file or use the MD5 suffix in its name. The source of the file input plugin shows that Logstash calculates the MD5 hash of the concatenation of all paths in the input configuration. After the sincedb files were cleared, I reran Logstash and my log files were reparsed with the new filters.
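The MD5 suffix can also be computed by hand to pick out the right file. The following is only a sketch: it assumes the file name is ".sincedb_" followed by the MD5 hex digest of the configured paths joined with commas, which is how I read the plugin source, so verify it against the version you run. The function name sincedb_name is hypothetical, and md5sum is assumed to be available (it is on Ubuntu).

```shell
#!/bin/sh
# Hypothetical helper: print the sincedb file name expected for the given input paths.
# Assumption: the name is ".sincedb_" + MD5 hex digest of the paths joined with ",".
sincedb_name() {
    # "$*" joins the arguments using the first character of IFS (a comma here)
    joined=$(IFS=,; printf '%s' "$*")
    # md5sum prints "<digest>  -"; keep only the digest
    hash=$(printf '%s' "$joined" | md5sum | cut -d ' ' -f 1)
    printf '.sincedb_%s\n' "$hash"
}
```

Pass the paths exactly as they are written in the path option of the file input, quoted so that the shell does not expand any wildcards, for example sincedb_name '/var/log/myapp/*.log'. If the printed name matches one of the files found earlier, that is the one to delete.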
