Modifying Strings in Python

These are my notes on modifying strings in Python.

Modifying Strings
The Python str object has many useful methods that are called with dot
notation, for example name.upper(), to modify a string or examine its
contents. Strings are immutable, so the modifying methods return a new
string and leave the original unchanged. The most commonly used string
methods are listed below.

capitalize()        uppercase the first character, lowercase the rest
title()             uppercase the first letter of each word
upper()             change all letters to uppercase
lower()             change all letters to lowercase
swapcase()          invert the case of every letter
removeprefix(sub)   remove sub from the start of the string, if present
                    (Python 3.9+)
removesuffix(sub)   remove sub from the end of the string, if present
                    (Python 3.9+)
join(seq)           join the strings in seq, using this string as the
                    separator
lstrip()            remove leading whitespace
rstrip()            remove trailing whitespace
strip()             remove leading and trailing whitespace
replace(old,new)    replace all occurrences of old with new
ljust(w,c)          left-justify in width w, padding on the right with c
rjust(w,c)          right-justify in width w, padding on the left with c
center(w,c)         center in width w, padding both sides with c
count(sub)          return the number of non-overlapping occurrences of sub
find(sub)           return the index of the first occurrence of sub,
                    or -1 if sub is not found
startswith(sub)     return True if the string starts with sub
endswith(sub)       return True if the string ends with sub
isalpha()           return True if all characters are letters
isnumeric()         return True if all characters are numeric (this also
                    includes characters such as fractions)
isalnum()           return True if all characters are letters or numbers
islower()           return True if all cased characters are lowercase
isupper()           return True if all cased characters are uppercase
istitle()           return True if every word starts with an uppercase
                    letter followed by lowercase letters
isspace()           return True if the string contains only whitespace
isdigit()           return True if all characters are digits (this also
                    includes characters such as superscript digits)
isdecimal()         return True if all characters are decimal digits
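
A short sketch exercising several of the cleanup and search methods from the table. The file name is a made-up sample, and removeprefix()/removesuffix() assume Python 3.9 or later.

```python
# Sample text is hypothetical; removeprefix()/removesuffix() need Python 3.9+.
raw = "   report_final.txt  "

cleaned = raw.strip()                     # both ends trimmed
left_only = raw.lstrip()                  # trailing spaces are kept
stem = cleaned.removesuffix(".txt")       # 'report_final'
name = stem.removeprefix("report_")       # 'final'
unchanged = stem.removeprefix("draft_")   # prefix absent, string unchanged

print(repr(cleaned), repr(left_only))
print(stem, name, unchanged)
print(cleaned.count("r"))                          # 2
print(cleaned.find("final"), cleaned.find("zz"))   # 7 -1
print(cleaned.startswith("report"), cleaned.endswith(".csv"))  # True False

# With join(), the string before the dot is the separator
print(", ".join(["red", "green", "blue"]))         # red, green, blue
```

Note that none of these calls change raw itself; each returns a new string.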

A space character is not alphanumeric, so isalnum() returns False for
strings that contain spaces. All of the is...() methods also return False
for an empty string.
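
To see how the is...() tests differ, here is a small sketch that runs them over a few sample strings. The helper name classify() is hypothetical, introduced only for this illustration; the superscript and fraction samples show where isdecimal(), isdigit() and isnumeric() disagree.

```python
def classify(s):
    """Hypothetical helper: return the results of several is*() tests for s."""
    return {
        "isalpha": s.isalpha(),
        "isalnum": s.isalnum(),
        "isdecimal": s.isdecimal(),
        "isdigit": s.isdigit(),
        "isnumeric": s.isnumeric(),
        "isspace": s.isspace(),
        "istitle": s.istitle(),
    }

# "\u00b2" is SUPERSCRIPT TWO and "\u00bd" is VULGAR FRACTION ONE HALF
samples = ["Python3", "Python 3", "42", "\u00b2", "\u00bd", "   ",
           "Age Of Empires", ""]

for s in samples:
    # List only the tests that returned True for this sample
    true_tests = [test for test, result in classify(s).items() if result]
    print(repr(s), true_tests)
```

The superscript two counts as a digit but not as a decimal, and the fraction one half counts only as numeric.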

string = "age of mythology is a great game"   # 32 characters long
print("\nCapitalized:\t", string.capitalize())
print("\nTitled:\t\t", string.title())
# The width must exceed len(string), otherwise no padding is added
print("\nCentered:\t", string.center(40, '*'))
print("\nUppercase:\t", string.upper())
# The string before .join() is the separator placed between the words
print("\nJoined:\t\t", '*'.join(string.split()))
print("\nJustified:\t", string.rjust(40, '*'))
print("\nReplaced:\t", string.replace('s', '*'))

With the rjust() method a right justified string gets padding added to
its left, and with the ljust() method a left justified string gets
padding added to its right.
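
A minimal sketch of the three padding methods on a short word, plus a common use of ljust() for lining up labels in a column. The sample strings are illustrative only.

```python
word = "game"

left = word.ljust(10, "*")       # text on the left, padding on the right
right = word.rjust(10, "*")      # padding on the left, text on the right
middle = word.center(10, "*")    # padding split between both sides
too_narrow = word.rjust(3, "*")  # width below len(word): no padding added

print(left)        # game******
print(right)       # ******game
print(middle)      # ***game***
print(too_narrow)  # game

# ljust() with the default fill character (a space) lines up a column
for label, value in [("Titled", "Age Of Mythology"),
                     ("Uppercase", "AGE OF MYTHOLOGY")]:
    print(label.ljust(12) + value)
```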

Converting Strings
Before Python 3.0, the default str type held raw bytes (Unicode text
needed a separate unicode type), and text was commonly treated as ASCII,
where characters are stored by their numeric code values in the range
0-127, representing only unaccented Latin characters. For example, the
lowercase letter 'a' is assigned 97 as its ASCII code value. Each byte
of computer memory can store values in the range 0-255, but this is
still too limited to represent all accented and non-Latin characters.
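
The built-in ord() and chr() functions convert between a character and its numeric code, which makes the ASCII values above easy to check. This sketch assumes Python 3.7 or later for str.isascii().

```python
# ord() gives a character's numeric code and chr() converts back
codes = [ord(c) for c in "abc"]
print(codes)       # [97, 98, 99]
print(chr(97))     # a

# str.isascii() (Python 3.7+) checks whether every character is in 0-127
print("mythology".isascii())  # True
print("café".isascii())       # False
print(ord("é"))               # 233: fits in one byte, but is outside ASCII
```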

For example, the accented characters used in Western Europe and the
Cyrillic alphabet used for Russian cannot all be represented in the
range 128-255, because together there are more than 128 such
characters. Python 3 overcomes this limitation by storing string
characters as Unicode code points, which represent all characters and
alphabets in the numeric range 0-1,114,111 (hexadecimal 0x10FFFF).
Characters above the ASCII range need more than one byte when encoded;
in the common UTF-8 encoding they take two to four bytes.
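
A small sketch that checks the code point ceiling and counts how many bytes a few characters need in UTF-8. UTF-8 is just one encoding chosen for the illustration; other encodings use different byte counts.

```python
import sys

# The largest code point Python accepts
print(sys.maxunicode, hex(sys.maxunicode))   # 1114111 0x10ffff

# Bytes needed per character once encoded as UTF-8
samples = ["a", "é", "Ж", "€", "\U0001F600"]   # the last one is an emoji
sizes = {ch: len(ch.encode("utf-8")) for ch in samples}
for ch, size in sizes.items():
    print(ch, "U+%04X" % ord(ch), size, "byte(s)")
```

ASCII letters still take one byte, Latin accents and Cyrillic take two, the euro sign takes three, and characters beyond U+FFFF take four.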

The str object's encode() method converts a string into a bytes object
using a given encoding (UTF-8 by default), and the bytes object's
decode() method converts it back into a str. Python's unicodedata
module provides a name() function that reveals the Unicode name of each
character. Accented and non-Latin characters can be referenced by
their Unicode name or by their hexadecimal code point value, written
with a \u escape such as \u00e9.
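
A minimal round trip through encode() and decode(), followed by unicodedata.name() and a \u escape. The word "café" is just a sample containing one non-ASCII character.

```python
import unicodedata

text = "café"
data = text.encode("utf-8")   # str -> bytes ("utf-8" is also the default)
back = data.decode("utf-8")   # bytes -> str
print(data)                   # b'caf\xc3\xa9'
print(back == text)           # True

# unicodedata.name() returns the Unicode name of each character
for ch in text:
    print(ch, unicodedata.name(ch))

# The same accented character written as a \u escape of its code point
e_acute = "\u00e9"
print(e_acute == "é", chr(0xE9) == "é")   # True True
```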

The term ASCII is an acronym for American Standard Code for Information
Interchange. You can use the Character Map app in Windows Accessories
to select non-ASCII characters. A bytes literal is denoted by a 'b'
placed immediately before the opening quote, as in b'caf\xc3\xa9'; it
may contain only ASCII characters, with other byte values written as \x
escapes. Unicode names are uppercase, and a character can be referenced
by placing its name between {} braces prefixed by \N, in the format
\N{NAME}.
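
A short sketch of both notations: a bytes literal built from \x escapes, and \N{...} lookups by Unicode name. The byte values are the UTF-8 encoding of "é".

```python
# Bytes literals may only contain ASCII characters; other byte values
# are written as \x escapes.
raw = b"caf\xc3\xa9"
print(raw[3], raw[4])        # 195 169: indexing bytes gives integers
print(raw.decode("utf-8"))   # café

# \N{...} inserts a character by its Unicode name
e_acute = "\N{LATIN SMALL LETTER E WITH ACUTE}"
zhe = "\N{CYRILLIC CAPITAL LETTER ZHE}"
print(e_acute, zhe)                 # é Ж
print("caf" + e_acute == "café")    # True
```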