The pygrok library ships with patterns for many kinds of information extraction, from names to IP addresses. We can use any of them for our needs. The following are a few of the patterns that are most commonly used for extraction.
PATTERNS
The following patterns can be used in Python programs for matching text.
name - pattern
USER - %{USERNAME}
EMAILADDRESS - %{EMAILLOCALPART}@%{HOSTNAME}
NUMBER - (?:%{BASE10NUM})
WORD - \b\w+\b
SPACE - \s*
WINDOWSMAC - (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
IPV4 - (?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])
WINPATH - (?>[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
MONTH - (?:Jan(?:uary|)?|Feb(?:ruary)?|M(?:a)?r(?:ch)?|Apr(?:il)?|Ma(?:y)?|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|O(?:c)?t(?:ober)?|Nov(?:ember)?|De(?:c)(?:ember)?)
DAY - (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)
YEAR - (?>\d\d){1,2}
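Since each entry in the list is an ordinary regular expression, a quick way to get a feel for them is to try a few directly with Python's built-in re module. The sketch below is only an illustration: the sample line is made up, and the IPV4 expression is assembled from a repeated octet fragment rather than typed out in full. It sticks to IPV4, WINDOWSMAC and DAY because entries such as YEAR and WINPATH use atomic groups, which the standard re module accepts only from Python 3.11 onward.

```python
import re

# One octet of an IPv4 address, as it appears four times in the IPV4 entry.
OCTET = r"(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])"

# Same expression as the IPV4 entry, built by joining four octets with [.]
IPV4 = r"(?<![0-9])(?:" + r"[.]".join([OCTET] * 4) + r")(?![0-9])"

# WINDOWSMAC and DAY copied from the list.
WINDOWSMAC = r"(?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})"
DAY = (
    r"(?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?"
    r"|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)"
)

# Made-up sample line, used only for this illustration.
line = "Tue 192.168.1.10 00-1A-2B-3C-4D-5E"

day = re.search(DAY, line).group(0)
ip = re.search(IPV4, line).group(0)
mac = re.search(WINDOWSMAC, line).group(0)

print(day)  # Tue
print(ip)   # 192.168.1.10
print(mac)  # 00-1A-2B-3C-4D-5E
```

In pygrok itself these expressions are referenced by name, for example %{IPV4:ip}, instead of being written out.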
CUSTOM PATTERNS
We can also create custom patterns for use in our programs. To create a custom pattern we need to know regular expressions, because pygrok uses them to match strings and extract information. For example, to match a word made up only of lower-case letters, we can use the pattern [a-z]*.
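Before handing a custom expression to pygrok, it can help to check it on its own. The short sketch below uses only the standard re module and a few made-up test words. It also shows one detail worth knowing: because * means "zero or more", [a-z]* accepts the empty string as well, whereas [a-z]+ requires at least one letter.

```python
import re

lower_word = r"[a-z]*"   # the pattern from the text
strict_word = r"[a-z]+"  # variant that requires at least one letter

# fullmatch succeeds only if the whole string fits the pattern.
all_lower = re.fullmatch(lower_word, "grok") is not None
has_upper = re.fullmatch(lower_word, "Grok") is not None
empty_star = re.fullmatch(lower_word, "") is not None
empty_plus = re.fullmatch(strict_word, "") is not None

print(all_lower)   # True
print(has_upper)   # False
print(empty_star)  # True
print(empty_plus)  # False
```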
USING CUSTOM PATTERNS IN PYTHON
We generally create a Grok object by passing the pattern as a parameter. Besides the pattern, the constructor accepts two further parameters, custom_patterns_dir and custom_patterns. custom_patterns_dir is the path of a directory containing the custom pattern files we want to match against the data. custom_patterns is a dictionary that maps the name of each custom pattern we created to its regular expression.
When there are only a few custom patterns, it is better to pass them in custom_patterns than to point custom_patterns_dir at a directory.
PROGRAM TO IMPLEMENT CUSTOM PATTERNS WITH PYGROK
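What follows is a minimal sketch of such a program, not the original listing. It assumes pygrok is installed and that the Grok constructor takes the keyword arguments custom_patterns_dir and custom_patterns, so check these names against your installed version. The pattern name LOWERWORD, the sample sentence and the variable custom_dir are made up for illustration.

```python
try:
    from pygrok import Grok  # third-party package: pip install pygrok
except ImportError:
    Grok = None  # lets the script report the problem instead of crashing

# Custom pattern: a word made only of lower-case letters.
pat = "[a-z]+"

# Hypothetical: set this to a directory of pattern files on your system,
# or leave it as None to use only the dictionary passed below.
custom_dir = None

# LOWERWORD is a made-up name for the custom pattern; NUMBER is built in.
expression = "%{LOWERWORD:name} is %{NUMBER:age} years old"


def extract(text):
    """Return a dict of captured fields, or None if the text does not match."""
    grok = Grok(
        expression,
        custom_patterns_dir=custom_dir,
        custom_patterns={"LOWERWORD": pat},
    )
    return grok.match(text)


if Grok is None:
    print("pygrok is not installed")
else:
    print(extract("gary is 25 years old"))
```

With pygrok available, the call should return a dictionary holding the name and age fields as strings, and None for text that does not fit the expression, such as a sentence starting with a capitalised name.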
In the above program, pat (the custom pattern) can also be read at run time with the input function. The custom directory path is the path of the directory on our system that holds the custom pattern files.