- Strings syntax
- Operations with strings
- Unicode strings
- String formatting
Strings syntax
Strings are ordered collections of characters, and are used mostly to store text.
Strings are typically delimited by single or double quotes, as in many other programming languages:
'this is a string'
"this is also a string"
If a single or double quote appears as part of the string itself, it might conflict with the surrounding quotes. There are a few ways to avoid this.
Whenever possible, delimit the string with the quote character that does not appear in its content:
"don't worry be happy"
'national "airquoting" competition'
Escaping quote characters
It is also possible to escape the quote character, so that it is used literally (as a quote character, not as a string delimiter). In Python strings, to escape a character, simply add a backslash before it. To get a literal backslash, escape the backslash itself:
"this is a \"string\" too"
"\\ is an escape character inside strings"
The backslash character can also be used to produce some special characters, such as the newline character \n and the tabulator \t:
print('hello\nworld')
>>> hello
>>> world
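To make the escape rules above concrete, here is a minimal sketch (the variable names are illustrative, not from the original text). It suggests that an escape sequence takes two characters to write in the source but is stored as a single character in the string.

```python
# Escaped quotes: the backslash itself is not stored in the string.
quoted = "this is a \"string\" too"
print(quoted)                 # this is a "string" too

# A literal backslash is written as two backslashes.
backslash = "\\"
print(len(backslash))         # 1

# \n starts a new line, \t inserts a tab.
table = "name\tvalue\nspam\t5"
print(table)
rows = table.split("\n")      # one list entry per line
print(rows)                   # ['name\tvalue', 'spam\t5']
```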
When working with longer multi-line strings, it can be cumbersome to break lines with \n. After a few line breaks, the legibility of the text suffers.
In such cases, we can use triple-quoted strings: a special notation which makes it possible to use linebreak and tabulator characters literally:
print('''hello
python
world''')
>>> hello
>>> python
>>> world
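As a quick check of the claim above, the following sketch (with illustrative variable names) compares a triple-quoted string with the same text written using \n escapes. The two spellings should produce an identical string object value.

```python
# The same three lines of text, written in two different ways.
triple = '''hello
python
world'''
escaped = 'hello\npython\nworld'

print(triple == escaped)      # True
print(triple.count('\n'))     # 2 (the literal line breaks became newline characters)
```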
Operations with strings
Strings support a couple of operations that use the same notation as operations with numbers (this is called operator overloading).
Strings can be added to (concatenated with) other strings:
print('a' + 'b')
a = "a string"
b = "another string"
print(a + " " + b)
>>> a string another string
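Concatenation only works between strings. The sketch below (the names count and label are hypothetical) shows one way to handle a number: adding it directly raises a TypeError, so it is converted with str() first.

```python
count = 3
try:
    label = "eggs: " + count       # str + int is not defined
except TypeError:
    label = "eggs: " + str(count)  # convert explicitly, then concatenate

print(label)                       # eggs: 3
```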
Strings can be multiplied by integers:
print("spam " * 5)
>>> spam spam spam spam spam
…but not by floats (even if they look like an integer):
print("spam " * 5.0)
>>> Traceback (most recent call last):
>>>   File "<untitled>", line 1, in <module>
>>> TypeError: can't multiply sequence by non-int of type 'float'
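If the repeat count arrives as a float, one possible workaround is to convert it with int() before multiplying. This is only a sketch with illustrative names; note that int() drops any fractional part, so it is only appropriate when the float is known to hold a whole number.

```python
count = 5.0

try:
    result = "spam " * count       # raises TypeError: count is a float
except TypeError:
    result = None

repeated = "spam " * int(count)    # convert to an integer first
print(repeated)                    # spam spam spam spam spam
```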
Unicode strings
In Python 2.x there are two different kinds of string objects: the default ASCII-based str object, and the more recent Unicode-based unicode object.
There is only one syntactic difference between normal strings and Unicode strings: Unicode strings are written with a preceding u:
print(type('this is a normal string'))
>>> <type 'str'>
print(type(u'this is a unicode string'))
>>> <type 'unicode'>
Often, using Unicode characters in normal strings will just work:
print("Åbenrå © Ђ ק")
One of the biggest changes in Python 3 is Unicode support. In Python 3, all strings are Unicode strings: the str type itself holds Unicode text, and the u prefix is no longer needed.
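The following sketch assumes Python 3 and uses illustrative variable names. It shows that a plain string literal is already a Unicode str, and that encoding it (here as UTF-8, an arbitrary choice for the example) produces a separate bytes object that can be decoded back.

```python
text = "Åbenrå © Ђ ק"
print(type(text))                  # <class 'str'>

# Encoding turns the text into raw bytes; non-ASCII characters need
# more than one byte each in UTF-8, so the byte count is larger.
encoded = text.encode("utf-8")
print(type(encoded))               # <class 'bytes'>
print(len(text) < len(encoded))    # True

# Decoding with the same encoding restores the original text.
decoded = encoded.decode("utf-8")
print(decoded == text)             # True
```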
String formatting
Python offers a string formatting syntax which makes it easier to create strings out of smaller parts.
This syntax uses the character % as a placeholder, and specifies the corresponding parts to be inserted after the string:
print("spam spam %s spam" % 'eggs')
>>> spam spam eggs spam
In the above example, only one part is used when formatting the string. In case there are two or more parts, they need to be wrapped in a tuple:
print("spam %s spam %s spam spam" % ('eggs', 'bacon'))
>>> spam eggs spam bacon spam spam
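The tuple rule can be sketched as follows (variable names are illustrative). A one-element tuple also works for a single placeholder, and supplying fewer values than placeholders raises a TypeError rather than leaving a gap.

```python
one = "spam spam %s spam" % 'eggs'                     # single value, no tuple needed
two = "spam %s spam %s spam spam" % ('eggs', 'bacon')  # two values, wrapped in a tuple
single = "spam %s" % ('eggs',)                         # a one-element tuple is fine too

print(one)      # spam spam eggs spam
print(two)      # spam eggs spam bacon spam spam
print(single)   # spam eggs

# Two placeholders but only one value: the formatting fails.
mismatch = None
try:
    "spam %s spam %s" % ('eggs',)
except TypeError as error:
    mismatch = str(error)
print(mismatch)
```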
Note the s after the % in the examples above. It means that the formatting expression treats the values as strings. This is the most used option, but other ones are also available.
%f gives a few options for formatting decimal numbers:
print("hello %f" % 50.4625)
>>> hello 50.462500
As we can see from the output, %f uses 6 decimal places by default.
We can limit the number of digits after the decimal point (the value is rounded, not truncated). Here we are limiting it to two digits:
print("hello %.2f" % 50.4625)
>>> hello 50.46
By placing a + sign right after the % (before the decimal dot), we can request that the sign of the number is always shown, so that positive numbers get a leading + and negative numbers a leading -:
print("hello %+.2f" % 1.5)
>>> hello +1.50
print("hello %+.2f" % -3.0)
>>> hello -3.00
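The %f options above can be combined in a small helper. The function below is a hypothetical sketch, not part of Python itself: format_amount and its parameters decimals and show_sign are names made up for this example. It builds a format such as %+.2f from its arguments and applies it.

```python
def format_amount(value, decimals=2, show_sign=False):
    """Format a number with a fixed number of decimal places using %f."""
    sign = "+" if show_sign else ""
    # Build a template such as "%.2f" or "%+.2f".
    template = "%" + sign + "." + str(decimals) + "f"
    return template % value

print(format_amount(50.4625))                # 50.46
print(format_amount(1.5, show_sign=True))    # +1.50
print(format_amount(-3.0, show_sign=True))   # -3.00
```

Building the template as a string keeps the example close to the syntax shown above; the format itself stays an ordinary % expression.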