Introduction to Shlex:
Hello, this is Rohit Kumar. In this article I will introduce the shlex module.
The shlex module implements a class for parsing simple shell-like syntaxes. It can be used for writing your own domain specific language, or for parsing quoted strings (a task that is more complex than it seems, at first).
The shlex class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell. This will often be useful for writing minilanguages (for example, in run control files for Python applications) or for parsing quoted strings.
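As a minimal sketch of the quoted-string case, the snippet below compares plain str.split with shlex.split, the module's convenience function. The sample command line is made up for illustration.

```python
import shlex

# Made-up command line; the file name contains a space, so it is quoted.
line = 'copy "my file.txt" backup/'

# Naive whitespace splitting breaks the quoted file name in two.
naive_tokens = line.split()

# shlex.split honours the quotes and strips them (it parses in POSIX mode by default).
shell_tokens = shlex.split(line)

print(naive_tokens)  # ['copy', '"my', 'file.txt"', 'backup/']
print(shell_tokens)  # ['copy', 'my file.txt', 'backup/']
```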
Parsing Rules In Shlex:
When operating in non-POSIX mode, shlex will try to obey the following rules.
1. Quote characters are not recognized within words (Do"Not"Separate is parsed as the single word Do"Not"Separate);
2. Escape characters are not recognized;
3. Enclosing characters in quotes preserve the literal value of all characters within the quotes;
4. Closing quotes separate words ("Do"Separate is parsed as "Do" and Separate);
5. If whitespace_split is False, any character not declared to be a word character, whitespace, or a quote will be returned as a single-character token. If it is True, shlex will only split words on whitespace;
6. EOF is signaled with an empty string ('');
7. It's not possible to parse empty strings, even if quoted.
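Some of the non-POSIX rules above can be checked directly. The following is a small sketch, not an exhaustive test; the helper name non_posix_tokens and the sample inputs are made up for illustration.

```python
import shlex

def non_posix_tokens(text, whitespace_split=False):
    """Tokenize text with a non-POSIX lexer (illustrative helper, not part of shlex)."""
    lexer = shlex.shlex(text, posix=False)
    lexer.whitespace_split = whitespace_split
    return list(lexer)

# Rule 1: a quote inside a word does not split the word.
rule_1 = non_posix_tokens('Do"Not"Separate')    # ['Do"Not"Separate']

# Rule 3: quoted text stays together; the quotes remain part of the token.
rule_3 = non_posix_tokens('"hello world"')      # ['"hello world"']

# Rule 4: a closing quote ends the current token.
rule_4 = non_posix_tokens('"Do"Separate')       # ['"Do"', 'Separate']

# Rule 5: '+' is not a word character, so it becomes its own token
# unless whitespace_split is enabled.
rule_5_off = non_posix_tokens('a+b')                        # ['a', '+', 'b']
rule_5_on = non_posix_tokens('a+b', whitespace_split=True)  # ['a+b']

# Rule 6: once the input is exhausted, get_token() returns ''.
lexer = shlex.shlex('word', posix=False)
first = lexer.get_token()        # 'word'
eof_marker = lexer.get_token()   # ''
```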
When operating in POSIX mode, shlex will try to obey the following parsing rules.
1. Quotes are stripped out, and do not separate words ("Do"Not"Separate" is parsed as the single word DoNotSeparate);
2. Non-quoted escape characters (e.g. '\') preserve the literal value of the next character that follows;
3. Enclosing characters in quotes which are not part of escapedquotes (e.g. "'") preserve the literal value of all characters within the quotes;
4. Enclosing characters in quotes which are part of escapedquotes (e.g. '"') preserve the literal value of all characters within the quotes, with the exception of the characters mentioned in escape. The escape characters retain their special meaning only when followed by the quote in use, or the escape character itself. Otherwise the escape character will be considered a normal character.
5. EOF is signaled with a None value;
6. Quoted empty strings ('') are allowed.
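The POSIX rules above can be exercised in the same way. This is a hedged sketch: the helper name posix_tokens and the inputs are assumptions chosen for illustration, and whitespace_split is turned on so that only whitespace separates words.

```python
import shlex

def posix_tokens(text):
    """Tokenize text with a POSIX-mode lexer (illustrative helper, not part of shlex)."""
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace_split = True
    return list(lexer)

# Rule 1: quotes are stripped and do not separate words.
rule_1 = posix_tokens('"Do"Not"Separate"')   # ['DoNotSeparate']

# Rule 2: an unquoted backslash keeps the next character literal (here, a space).
rule_2 = posix_tokens(r'a\ b')               # ['a b']

# Rule 3: inside single quotes the backslash is an ordinary character.
rule_3 = posix_tokens(r"'a\b'")              # ['a\\b']

# Rule 4: inside double quotes the backslash escapes only the quote or itself.
rule_4_escaped = posix_tokens(r'"a\"b"')     # ['a"b']
rule_4_literal = posix_tokens(r'"a\nb"')     # ['a\\nb']

# Rule 6: a quoted empty string is a valid (empty) token.
rule_6 = posix_tokens("a '' b")              # ['a', '', 'b']

# Rule 5: once the input is exhausted, get_token() returns None.
lexer = shlex.shlex('word', posix=True)
first = lexer.get_token()        # 'word'
eof_marker = lexer.get_token()   # None
```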
Improved Compatibility with shells by Shlex:
The shlex class provides compatibility with the parsing performed by common Unix shells like bash, dash, and sh. To take advantage of this compatibility, specify the punctuation_chars argument in the constructor. This defaults to False, which preserves pre-3.6 behaviour. However, if it is set to True, then parsing of the characters ();<>|& is changed: any run of these characters is returned as a single token. While this is short of a full parser for shells (which would be out of scope for the standard library, given the multiplicity of shells out there), it does allow you to perform processing of command lines more easily than you could otherwise. To illustrate, you can see the difference in the following snippet:
>>> import shlex
>>> text = "a && b; c && d || e; f >'abc'; (def \"ghi\")"
>>> s = shlex.shlex(text, posix=True)
>>> s.whitespace_split = True
>>> list(s)
['a', '&&', 'b;', 'c', '&&', 'd', '||', 'e;', 'f', '>abc;', '(def', 'ghi)']
>>> s = shlex.shlex(text, posix=True, punctuation_chars=True)
>>> s.whitespace_split = True
>>> list(s)
['a', '&&', 'b', ';', 'c', '&&', 'd', '||', 'e', ';', 'f', '>', 'abc', ';',
'(', 'def', 'ghi', ')']
Result:
Of course, tokens will be returned which are not valid for shells, and you'll need to implement your own error checks on the returned tokens.
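One possible shape for such an error check is sketched below. The function find_bad_operators and the KNOWN_OPERATORS whitelist are hypothetical and deliberately incomplete; they are not part of shlex, and a real shell accepts more operators than are listed here.

```python
import shlex

# The characters that punctuation_chars=True treats as punctuation.
PUNCTUATION = set("();<>|&")

# Hypothetical whitelist of operator tokens this sketch accepts.
KNOWN_OPERATORS = {"&&", "||", ";", "|", "&", "(", ")", "<", ">", ">>"}

def find_bad_operators(command):
    """Return punctuation-only tokens that are not in KNOWN_OPERATORS."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    bad = []
    for token in lexer:
        # Limitation: quotes are stripped in POSIX mode, so a quoted
        # argument such as "&&&" would also be flagged by this check.
        if token and set(token) <= PUNCTUATION and token not in KNOWN_OPERATORS:
            bad.append(token)
    return bad

valid_result = find_bad_operators("a && b | c")   # []
invalid_result = find_bad_operators("a &&& b")    # ['&&&']
```

The run '&&&' comes back as a single token because any run of punctuation characters is grouped together, which is why the check has to be done on the returned tokens.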
Instead of passing True as the value for the punctuation_chars parameter, you can pass a string with specific characters, which will be used to determine which characters constitute punctuation. For example:
>>> import shlex
>>> s = shlex.shlex("a && b || c", punctuation_chars="|")
>>> list(s)
['a', '&', '&', 'b', '||', 'c']