Tag Archives: Operations

Ignoring Warnings in Log Files

I know that many people think along the lines of ignoring warnings in log files. But I am responsible for an environment where we  run applications that are sort-of business critical. So I need to take warnings seriously, especially if their wording is not really clear. The consequence is that any unnecessary warning causes operational problems. Because who can tell me, kind-of “written in blood”, that I can absolutely always and forever ignore this warning? Only in that case could I consider adding an exception to the log monitoring system and of course to the system documentation, the operations manual, etc. So for me, and from my consulting past I know many customer think the same, this is not just a small nuisance but a real issue.

On the other hand I have had many discussions with people who told me that I could just ignore this or that entry. In many cases it turned out after some discussion, that the log level was actually chosen badly and INFO would have been more appropriate. In that respect the semantics of the commonly used log levels deserve a closer look. Here are two good links (link 1, link2) for definitions. When I first read them, my initial thought was that I might have overreached with the first paragraph of this post. But looking at the example from link 1 about WARN a bit closer, I think my concerns are still valid.

So what can be done? Reality is that you rarely have the ability to get a log statement changed. So you do need a scalable approach to deal with log messages that you consciously choose to ignore. It involves primarily two things: Firstly, you need to have documentation why the decision was made that a given log message is not critical. Secondly, there should be an automated link with your log file monitoring system, that configures an exception in it. Depending on your business this whole area might also be regulated, so the legal side may very well play a role as well. But that is outside the scope of this post.

I know this post is not really actionable, but still wanted to share my thoughts.

 

Michael T. Nygard: Release It! Design and Deploy Production-Ready Software

I bought this book about a year ago and -shame on me- only just read it. It’s really great for everyone that is interested in designing and developing robust software. So in that sense a must-read for all of us.

The book is organized in four general sections: Stability, capacity, general design issues and operations. For all of them a number of typical scenarios are described and general approaches discussed. The author seems to have a pretty strong Java and web application background (at least those are the areas of most of his examples), but the patterns and solutions are great for all systems, languages and use-cases.

So overall what we have here is a book that is fun to read and at the same time offers great insight into large-scale software systems. In my view everyone who works in this field can benefit from this book.

The author is also blogging on Amazon.com and seems to cover quite a few interesting topics.