String Manipulation

In Python, strings are immutable, which means that individual characters cannot be altered in place. However, a variable holding a string can be reassigned to an entirely new string. For example, the following is not allowed:
string = "HELLO"
string[0] = "Y"    # TypeError: 'str' object does not support item assignment
However, the whole string can be replaced like so:
string = "HELLO"
string = "YELLO"
Also, since a string cannot be changed in place, either directly or through a built-in method, every method mentioned in this post returns a new copy of the string and leaves the original unchanged. To keep the result, we have to assign it to another variable or back to the same one.
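
A minimal sketch of the points above, using only built-in behaviour: item assignment on a str raises a TypeError, and a method such as upper() returns a new string while the original stays the same. The variable names are only illustrative.

```python
string = "HELLO"

# Item assignment on a str is rejected at runtime.
try:
    string[0] = "Y"
except TypeError as err:
    print("TypeError:", err)

# Methods return a new string; the original is left untouched.
shout = "hello"
louder = shout.upper()
print(shout, louder)   # hello HELLO

# To "change" the string, rebind the name to the returned copy.
shout = shout.upper()
print(shout)           # HELLO
```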

1. Slicing

Each character of a string can be accessed by its index, just like a list element. For example, with string = "HELLO", string[0] returns "H". To get a substring, i.e. a range of characters, we use a concept called slicing. For example, string[0:3] returns the characters at indices 0 to 2 (the end index is excluded), i.e. "HEL". A third value, the step, lets a slice skip a fixed number of characters. For example,

stringtwo = "hellomynameisxxx"
print(stringtwo[0:10:2]) 
prints "hloya". It takes every second character of "hellomynam" (the first ten characters, indices 0 to 9), skipping one character each time. Similarly, stringtwo[0:10:3] takes every third character of "hellomynam" and returns "hlym".
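
To make the step concrete, here is a short sketch that builds the same result as stringtwo[0:10:2] by hand with range(start, stop, step). It only illustrates which indices a slice visits; it is not how Python implements slicing.

```python
stringtwo = "hellomynameisxxx"

# range(0, 10, 2) yields the indices 0, 2, 4, 6, 8, i.e. the same
# positions the slice stringtwo[0:10:2] visits.
manual = "".join(stringtwo[i] for i in range(0, 10, 2))
print(manual)               # hloya
print(stringtwo[0:10:2])    # hloya

# A step of 3 visits indices 0, 3, 6, 9.
print(stringtwo[0:10:3])    # hlym
```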

2. Changing the case

Python has several methods to change the case of a string; four of them are covered here. capitalize() converts the first character of the string to uppercase and the rest to lowercase. Of course, throughout this article, when I say a method 'changes' a string, I mean that it returns a copy of the string with the specified changes made.

string = "good morning!"
string.capitalize()
# Output - "Good morning!"

lower() returns a copy of the string converted to lowercase and, similarly, upper() returns a copy of the string converted to uppercase.

hello = "Hello"
hello.lower()
"hello" 
hello.upper()
"HELLO"

swapcase() returns a copy of the string with all uppercase letters converted to lowercase and vice versa.

hello.swapcase()
"hELLO"

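A small sketch running all four case methods on one mixed-case sample string. One detail worth seeing is that capitalize() also lowercases every character after the first. The sample text is arbitrary.

```python
text = "gOOD mORNING!"

print(text.capitalize())  # Good morning!
print(text.lower())       # good morning!
print(text.upper())       # GOOD MORNING!
print(text.swapcase())    # Good Morning!
print(text)               # gOOD mORNING!  (the original is unchanged)
```
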
3. Justification

You may not use these often, but they can come in handy, for example when aligning text in columns. The justification methods embed the given string in a new string of a larger width, padding the extra space with spaces (the default) or with a character of your choice. If the specified width is less than or equal to the length of the string, the original string is returned unchanged. center(width, fillchar) places the original string in the center of the new string. fillchar is optional and specifies the character to pad the extra space with.

string = "hello"
string.center(10)
'  hello   '

string.center(10, '#')
'##hello###'

The methods ljust() and rjust() pad the string in a similar way, but they justify it to the left or to the right respectively. That is, ljust() adds the padding on the right and rjust() adds it on the left.

string = 'hello'
string.ljust(10)
'hello     '

string.rjust(10, 'o')
'ooooohello'
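
As a sketch of where justification helps, the snippet below prints a small two-column table, left-justifying the names with ljust() and right-justifying the quantities with rjust(). The item names, quantities and column widths are made up for the example. The last line shows that a width smaller than the string returns the string unchanged.

```python
# Hypothetical data: (item, quantity) pairs.
rows = [("apples", "12"), ("kiwis", "7"), ("watermelons", "130")]

lines = []
for item, qty in rows:
    # Names padded on the right to width 12, quantities padded on the left to width 5.
    lines.append(item.ljust(12) + qty.rjust(5))

print("\n".join(lines))

# A width smaller than the string leaves it as it is.
print("hello".center(3))  # hello
```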

There is another method called zfill(width), which is meant for numeric strings. It adds zeros to the left of the string to make its total length equal to width. It adds the zeros regardless of whether the string is numeric or not, but it's typically used for numeric data since the padding holds little meaning otherwise.

num = '100'
num.zfill(5)
'00100'
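
A few more zfill() calls, shown as a quick sketch. Besides the padding described above, zfill() treats a leading '+' or '-' sign specially and inserts the zeros after it, which is what makes it handy for signed numbers.

```python
print("100".zfill(5))    # 00100
print("-42".zfill(5))    # -0042  (zeros go after the sign)
print("+7".zfill(4))     # +007
print("abc".zfill(5))    # 00abc  (non-numeric strings are padded too)
print("12345".zfill(3))  # 12345  (already wide enough, returned unchanged)
```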

4. Strip white spaces


The method strip() by default strips whitespace from both ends of the string. It does not touch any spaces in the middle of the string: as soon as it encounters a character that is not meant to be stripped, it stops and starts scanning from the other end. The method lstrip() behaves similarly but strips whitespace (or, optionally, other characters) only from the beginning of the string. rstrip() is also similar but strips only from the end of the string. These methods have an optional argument, chars, where you can specify the characters to strip instead of whitespace. Please note that when you pass this argument, whitespace is no longer stripped unless it is included in chars. chars is treated as a set of characters, not as a substring: any combination of those characters is stripped until a character that is not in chars is encountered.

Given below are some examples of strip() and strip(chars):

'    hello   '.strip()
'hello'

'   goodmorning'.strip('gnm')
'   goodmorni'

'helloworld  '.strip('hdo ')
'elloworl'

'goodmorning'.strip('gnm')
'oodmorni' 

lstrip() and rstrip() work the same way, but only on one end.
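
To make the scanning behaviour described above concrete, here is a hypothetical helper, strip_chars(), that mimics strip(chars) with two index pointers: it advances from the left while the current character is in chars, then moves back from the right in the same way. It is only an illustration of the rule (chars is a set of characters, not a substring), not how CPython implements strip(), and it does not cover the default whitespace case.

```python
def strip_chars(s: str, chars: str) -> str:
    """Hypothetical helper that mimics str.strip(chars) for an explicit chars."""
    start, end = 0, len(s)
    # Scan from the left until a character outside chars is found.
    while start < end and s[start] in chars:
        start += 1
    # Then scan from the right, never crossing the left pointer.
    while end > start and s[end - 1] in chars:
        end -= 1
    return s[start:end]


examples = [
    ("   goodmorning", "gnm"),
    ("helloworld  ", "hdo "),
    ("goodmorning", "gnm"),
    ("www.example.com", "cmowz."),
]
for text, chars in examples:
    # Both columns should match.
    print(repr(strip_chars(text, chars)), repr(text.strip(chars)))

# lstrip() and rstrip() work on one end only.
print(repr("  hi  ".lstrip()))  # 'hi  '
print(repr("  hi  ".rstrip()))  # '  hi'
```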

5. Split into lines or words


The method split(sep, maxsplit) returns a LIST of the words in the string, using sep as the delimiter. Both sep and maxsplit are optional. If sep is not given, the string is split on runs of whitespace and leading and trailing whitespace is ignored. If maxsplit is given, at most maxsplit splits are done, so the list has at most maxsplit + 1 elements. Some examples are given below.

'1, 2, 3'.split(', ')
['1', '2', '3']

'1, 2, 3'.split(', ', 1)
['1', '2, 3']

'   1, 2, 3'.split(', ')
['   1', '2', '3']

The method rsplit(sep, maxsplit) returns a list of the words in the string, using sep as the delimiter. If maxsplit is given, at most maxsplit splits are done, starting from the right. Other than this, it behaves like split().

'1, 2, 3, 4, 5'.rsplit(', ', 2)
['1, 2, 3', '4', '5']
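
A short sketch contrasting the cases above: the default separator versus an explicit single space, maxsplit acting as an upper bound, and split() versus rsplit() with the same maxsplit. The input strings are arbitrary.

```python
# With no sep, runs of whitespace count as one separator and
# leading/trailing whitespace is dropped.
print("  1  2   3 ".split())       # ['1', '2', '3']

# With an explicit sep, every single space is a separator,
# so consecutive spaces produce empty strings.
print("  1  2".split(" "))         # ['', '', '1', '', '2']

# maxsplit is an upper bound: only two splits are possible here.
print("a,b,c".split(",", 5))       # ['a', 'b', 'c']

# With maxsplit=1, split() cuts at the first '.', rsplit() at the last.
print("a.b.c".split(".", 1))       # ['a', 'b.c']
print("a.b.c".rsplit(".", 1))      # ['a.b', 'c']
```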

The method splitlines(keepends) returns a list of the lines in a string, breaking it at line boundaries. Line endings are not included in the resulting list unless keepends is given and true. Line boundaries include \n (line feed), \r (carriage return), \r\n (carriage return followed by line feed) and several others.

string = "Hello\nMy name is abc\rWhat's your name?"
string.splitlines()
['Hello', 'My name is abc', "What's your name?"]
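
A sketch of the keepends argument on the same string, plus one practical difference from split('\n'): a trailing newline does not produce an empty last element with splitlines().

```python
string = "Hello\nMy name is abc\rWhat's your name?"

# keepends=True keeps each line's terminator.
print(string.splitlines(keepends=True))
# ['Hello\n', 'My name is abc\r', "What's your name?"]

# A trailing newline does not create an empty last line with splitlines(),
# unlike split('\n').
print("a\nb\n".splitlines())   # ['a', 'b']
print("a\nb\n".split("\n"))    # ['a', 'b', '']
```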

6. Replace a substring

The method replace(old, new, count) returns a copy of the string with all occurrences of the substring old replaced by new. count is optional; if it is given, only the first count occurrences of old are replaced.

string = 'every road i walk along i walk along with you'
string.replace('walk', 'tread')
'every road i tread along i tread along with you'

string.replace('walk', 'tread', 1)
'every road i tread along i walk along with you'

Here, only one instance (the first one) of 'walk' is changed because we specify count as 1.
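
One way to see what count does is to rebuild replace() from split() and join() (join is covered in the next section): splitting on old at most count times and joining the pieces back with new gives the same result when old is non-empty. This is a sketch for intuition only; the helper name replace_via_split is hypothetical, and this is not how str.replace() is implemented.

```python
def replace_via_split(s: str, old: str, new: str, count: int = -1) -> str:
    """Hypothetical helper: same result as s.replace(old, new, count) for a non-empty old."""
    # split() with maxsplit=count cuts at the first `count` occurrences of old
    # (all of them when count is -1); join() glues the pieces back with new.
    return new.join(s.split(old, count))


string = 'every road i walk along i walk along with you'
print(replace_via_split(string, 'walk', 'tread'))
print(replace_via_split(string, 'walk', 'tread', 1))
```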

7. Join 

The method string.join(seq) returns a string which is the concatenation of the strings in the iterable seq, with string acting as the separator between the elements. Every element of seq must be a string. The examples below use a list, a dictionary and a tuple as the iterable. Note that iterating over a dictionary yields its keys, so in that case the keys are joined and not the values.

words = ['hello','yellow','banana']    # avoid naming a variable 'list', which hides the built-in list()
" ".join(words)
'hello yellow banana'

dic = {'name':'abc','age':15}
" ".join(dic)
'name age'

tup = ('apple', 'banana', 'orange')
" ".join(tup)
'apple banana orange'
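
Because join() only accepts strings, joining numbers (or a dictionary's values, which may not be strings) raises a TypeError. A common pattern, sketched below, is to convert each element with str() first, for instance via map(). The sample data mirrors the dictionary above.

```python
numbers = [1, 2, 3]
dic = {'name': 'abc', 'age': 15}

try:
    ", ".join(numbers)           # ints are not strings
except TypeError as err:
    print("TypeError:", err)

print(", ".join(map(str, numbers)))             # 1, 2, 3
print(" ".join(str(v) for v in dic.values()))   # abc 15
print("-".join("abc"))                          # a-b-c (a string is an iterable of characters)
```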

The methods mentioned above all returned a modified copy of the string. The methods below are different: they inspect the string instead, for example to check whether it consists of digits, letters, etc., or to find where a substring is located in it.

1. Finding starting index of a substring

There are two methods to accomplish this. The method find(sub, start, end) returns the lowest index in the string where the substring sub is found within the slice string[start:end]. The optional arguments start and end are interpreted as in slice notation, i.e. find() tries to locate sub within string[start:end]. If sub is not found, it returns -1. Another method, index(sub, start, end), does the same as find(), but if sub is not found, index() raises a ValueError instead of returning -1.

The method rfind() returns the starting index of the rightmost occurrence of the substring sub, optionally within string[start:end]. Similarly, rindex() also returns the starting index of the rightmost occurrence of sub, but raises a ValueError if it is not found.

'lion tiger elephant lion'.find('lion')
0

'lion tiger elephant lion'.find('lion', 15, 24)
20 

In the above example, find() searches for 'lion' only between indices 15 and 24, so the index of the second 'lion' is returned, because the first one does not lie in this range.

'lion tiger elephant lion'.rfind('lion')
20 

'lion tiger elephant lion'.find('mouse')
-1
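
Because find() accepts a start position, it can be called in a loop to collect every occurrence. The helper find_all() below is a hypothetical sketch, not a built-in method; resuming the search at pos + 1 means overlapping matches are found too. The snippet also shows rindex() and the ValueError raised by index().

```python
def find_all(text: str, sub: str) -> list:
    """Hypothetical helper: start indices of every occurrence of a non-empty sub."""
    positions = []
    pos = text.find(sub)
    while pos != -1:
        positions.append(pos)
        # Resume the search one character after the last match.
        pos = text.find(sub, pos + 1)
    return positions


text = 'lion tiger elephant lion'
print(find_all(text, 'lion'))     # [0, 20]
print(text.rindex('lion'))        # 20

try:
    text.index('mouse')           # index() raises where find() returns -1
except ValueError as err:
    print("ValueError:", err)
```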

2. Count the number of occurrences

To count the number of non-overlapping occurrences of a substring sub within a string, optionally within string[start:end], we use the method count(sub, start, end).

'lion tiger elephant lion'.count('lion')
2
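
A short sketch of the optional start and end arguments and of the non-overlapping rule mentioned above. Note the contrast with a find() loop that resumes one character after each match, which would also report overlapping matches.

```python
text = 'lion tiger elephant lion'

print(text.count('lion', 5))     # 1  (only text[5:] is searched)
print(text.count('lion', 0, 4))  # 1  (only text[0:4], i.e. 'lion')

# Occurrences are counted without overlap:
print('banana'.count('ana'))     # 1, although 'ana' starts at index 1 and at index 3
print('aaaa'.count('aa'))        # 2
```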

3. Starts with and ends with

The method endswith(suffix, start, end) returns True if the string ends with the specified suffix, and False otherwise. suffix can also be a tuple of suffixes to look for. With the optional start, the test begins at that position; with the optional end, the comparison stops at that position.

The method startswith(prefix, start, end) behaves similarly. It returns True if the string starts with prefix, and False otherwise.

tup = ('lion', 'tiger', 'hippo')
string = 'lion elephant giant gorilla'
print(string.startswith(tup))
print(string.endswith(tup))
True
False
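
A sketch of the start argument and of a typical use for a tuple of suffixes, checking file extensions. The file names are made up for the example.

```python
string = 'lion elephant giant gorilla'

# start=5 makes the test begin at index 5, where 'elephant' starts.
print(string.startswith('elephant', 5))   # True
print(string.startswith('elephant'))      # False

# A tuple of suffixes matches if any one of them matches.
files = ['photo.png', 'notes.txt', 'scan.jpg']
images = [f for f in files if f.endswith(('.png', '.jpg'))]
print(images)                              # ['photo.png', 'scan.jpg']
```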

4. Check nature of string


- isalpha() returns True if all characters in the string are alphabetic and there is at least one character, and False otherwise.

- isalnum() returns True if all characters in the string are alphanumeric and there is at least one character, and False otherwise.

- isdigit() returns True if all characters in the string are digits and there is at least one character, and False otherwise.

- islower() returns True if all cased characters in the string are lowercase and there is at least one cased character, and False otherwise.

- isspace() returns True if there are only whitespace characters in the string and there is at least one character, and False otherwise.

- isupper() returns True if all cased characters in the string are uppercase and there is at least one cased character, and False otherwise.
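
A few edge cases, shown as a quick sketch, that follow from the rules above: an empty string is never alphabetic, a minus sign is not a digit, and islower()/isupper() ignore characters that have no case but still need at least one cased character.

```python
print(''.isalpha())             # False: there is no character at all
print('abc123'.isalnum())       # True
print('abc 123'.isalnum())      # False: the space is not alphanumeric
print('-123'.isdigit())         # False: '-' is not a digit
print('hello world'.islower())  # True: the space is uncased and ignored
print('123'.islower())          # False: there is no cased character
print(' \t\n'.isspace())        # True
print('HELLO 1'.isupper())      # True: only cased characters are checked
```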


Generally, to alter a string character by character, we convert it into a list, since it cannot be altered in string form. After editing, the list is converted back into a string using the join() method we saw above. In the meantime, we can use list operations to remove, replace or add characters. Some examples are given below.

- To convert a string into a list of characters:

string = 'fish attack'
string = list(string)
print(string)
['f', 'i', 's', 'h', ' ', 'a', 't', 't', 'a', 'c', 'k']

Now we can easily change any characters we want, for example:

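string[4] = ""              # replace the space with an empty string
string[0:4] = 'panda'       # replace 'f', 'i', 's', 'h' with the characters of 'panda'
"".join(string)
'pandaattack'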

To append a new character, use the append() method:

string.append('s')
"".join(string)
'pandaattacks'

To append a whole string, use the extend() method, which adds each character of the string to the list:

string.extend(' frenzy')
"".join(string)
'pandaattacks frenzy'

Note that the list operations we have used modify the original list in place instead of returning a copy.
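
Putting the list steps together, a hypothetical helper replace_char() can swap the character at one position: convert to a list, assign, and join back. For comparison, the same result can be built with slicing and concatenation, since slices of a string are new strings. Both approaches leave the original string unchanged.

```python
def replace_char(s: str, index: int, new_char: str) -> str:
    """Hypothetical helper: return a copy of s with s[index] replaced by new_char."""
    chars = list(s)          # strings are immutable, lists are not
    chars[index] = new_char  # edit the list in place
    return "".join(chars)    # build a new string from the list


word = "HELLO"
i = 0
print(replace_char(word, i, "Y"))      # YELLO
print(word[:i] + "Y" + word[i + 1:])   # YELLO, using slicing instead
print(word)                            # HELLO, the original is untouched
```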

These are some ways to manipulate strings in Python. The source for this article is the Python documentation. Visit python.org
