More on strings

Published

2023-08-01

More on string literals

Strings as ordered collections of characters

As we’ve seen, strings are ordered collections of characters, delimited by quotation marks. But what kind of characters can be included in a string?

Since Python 3.0, strings are composed of Unicode characters.1

Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world’s writing systems. The standard, which is maintained by the Unicode Consortium, defines as of the current version (15.0) 149,186 characters covering 161 modern and historic scripts, as well as symbols, thousands of emoji (including in colors), and non-visual control and formatting codes.2

That’s a lot of characters!

We won’t dive deep into Unicode, but you should be aware that Python uses it, and that "hello", "Γειά σου", and "привіт" are all valid strings in Python. Strings can contain emojis too!
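You can check this yourself with a few non-English strings. The sketch below assumes only built-ins: `len()` counts characters (Unicode code points), not bytes. `ord()` gives the code point of a single character, and `chr()` goes the other way. The variable name `greetings` is just for illustration.

```python
# Strings written in different scripts are all ordinary Python strings.
greetings = ["hello", "Γειά σου", "привіт", "☕"]

for g in greetings:
    # len() counts characters (Unicode code points), not bytes.
    print(g, len(g))

# ord() returns a character's Unicode code point; chr() converts back.
print(ord("Γ"))   # 915
print(chr(915))   # Γ
```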

Strings containing quotation marks or apostrophes

You’ve learned that in Python, we can use either single or double quotation marks to delimit strings.

>>> 'Hello World!'
'Hello World!'
>>> "Hello World!"
'Hello World!'

Both are syntactically valid, and Python does not differentiate between the two.
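One way to confirm that the choice of delimiter makes no difference is to compare the two literals with `==`. Here is a small sketch. The variable names are only illustrative.

```python
single = 'Hello World!'
double = "Hello World!"

# The delimiters are not part of the string; both literals produce the same value.
print(single == double)  # True
print(len(single))       # 12
```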

It’s not unusual to have a string that contains quotation marks or apostrophes, and this can motivate our choice of delimiters.

For example, given the name of a local coffee shop, Speeder and Earl’s, there are two ways we could write this in Python. One approach would be to escape the apostrophe within a string delimited by single quotes:

>>> 'Speeder and Earl\'s'
"Speeder and Earl's"

Notice what’s going on here. Since we want an apostrophe within a string delimited by single quotes, we precede the apostrophe with a backslash, \. This is called escaping, and it tells Python that what follows should be interpreted as a literal apostrophe and not as a closing delimiter. We refer to the two-character sequence \' as an escape sequence.3
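The backslash only changes how Python reads the apostrophe, so it is not stored in the string. The sketch below shows this by comparing against the same text written between double quotes. The variable names are hypothetical.

```python
escaped = 'Speeder and Earl\'s'
unescaped = "Speeder and Earl's"

# The backslash is not stored in the string: \' becomes a single apostrophe.
print(escaped == unescaped)  # True
print(len(escaped))          # 18
print(escaped.count("'"))    # 1
```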

What would happen if we left that out?

>>> 'Speeder and Earl's'
Traceback (most recent call last):
  ...
  File "<input>", line 1
    'Speeder and Earl's'
                       ^
SyntaxError: unterminated string literal (detected at line 1)

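What’s going on here? Python reads the second single quote (the apostrophe) as the closing delimiter, so the string ends at 'Speeder and Earl'. What remains, s', is not valid syntax. The final ' opens a new string that is never closed, which is why Python reports an unterminated string literal.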
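We can reproduce this error without stopping a program. The idea is to hand the faulty source text to the built-in `compile()` and catch the `SyntaxError`. This is a sketch, and `compiles` is a hypothetical helper name. The wording of the error message differs between Python versions, so the helper only relies on the exception type.

```python
def compiles(source):
    """Return True if `source` is a valid Python expression, else False."""
    try:
        compile(source, "<input>", "eval")
    except SyntaxError as err:
        # err.msg holds Python's explanation; its wording varies by version.
        print("SyntaxError:", err.msg)
        return False
    return True


# Each argument is a piece of Python source code, stored in a string.
print(compiles("'Speeder and Earl's'"))   # False: the stray s' is left over
print(compiles('"Speeder and Earl\'s"'))  # True
```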

Another approach is to use double quotation marks as delimiters.

>>> "Speeder and Earl's"
"Speeder and Earl's"

The same applies to double quotes within a string. Let’s say we wanted to print

“Medium coffee, please”, she said.

We could escape the double quotes within a string delimited by double quotes:

>>> "\"Medium coffee, please\", she said."
'"Medium coffee, please", she said.'

However, it’s a little tidier in this case to use single quote delimiters.

>>> '"Medium coffee, please", she said.'
'"Medium coffee, please", she said.'
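Whichever delimiters we pick, the result is the same string. The sketch below compares the two versions above. The variable names are only illustrative.

```python
escaped = "\"Medium coffee, please\", she said."
unescaped = '"Medium coffee, please", she said.'

# Both literals describe exactly the same sequence of characters.
print(escaped == unescaped)  # True
print(escaped.count('"'))    # 2
```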

What happens if we have a string with both apostrophes and double quotes?

Say we want the string

“I’ll have a Speeder’s Blend to go”, she said.

What now? If we stick to single or double quotes as delimiters, we must escape one or the other. Either of the following works:

>>> '"I\'ll have a Speeder\'s Blend to go", she said.'
'"I\'ll have a Speeder\'s Blend to go", she said.'
>>> print('"I\'ll have a Speeder\'s Blend to go", she said.')
"I'll have a Speeder's Blend to go", she said.

or

>>> "\"I'll have a Speeder's Blend to go\", she said."
'"I\'ll have a Speeder\'s Blend to go", she said.'
>>> print("\"I'll have a Speeder's Blend to go\", she said.")
"I'll have a Speeder's Blend to go", she said.
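The two spellings above are different ways of writing the same value. The short check below also shows a difference in display. `print()` shows the characters themselves. The interactive prompt shows the `repr()` of a value, which may add escapes back in. The variable names `a` and `b` are only for illustration.

```python
a = '"I\'ll have a Speeder\'s Blend to go", she said.'
b = "\"I'll have a Speeder's Blend to go\", she said."

print(a == b)   # True
print(a)        # "I'll have a Speeder's Blend to go", she said.
print(repr(a))  # '"I\'ll have a Speeder\'s Blend to go", she said.'
```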

Not especially pretty, but there you have it.

More on escape sequences

We’ve seen how we can use the escape sequences \' and \" to avoid having the apostrophe and quotation mark treated as string delimiters, thereby allowing us to use these symbols within a string literal.

Other escape sequences insert characters that would be awkward or impossible to type directly between the delimiters. The escape sequences \n and \t insert a newline and a tab character into a string, respectively, and the escape sequence \\ inserts a single backslash.
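Here is a small example of these three sequences in action. The coffee-order text and the variable name `word` are just illustrative values.

```python
# \n starts a new line; \t inserts a tab character.
print("Medium coffee\tto go\nLarge tea\tfor here")

# \\ puts a single backslash into the string.
word = "back\\slash"
print(word)       # back\slash
print(len(word))  # 10
```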

Escape sequence    Meaning
\n                 newline
\t                 tab
\\                 backslash
\'                 single quote / apostrophe
\"                 double quote
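We can check the table by showing that each escape sequence produces exactly one character. The sketch below uses the built-in `ord()` to report that character's Unicode code point. The dictionary name `escapes` is hypothetical. Its keys are written with doubled backslashes so that they hold the two characters we type rather than the character produced.

```python
# Keys: the two characters we type. Values: the one character we get.
escapes = {
    "\\n": "\n",
    "\\t": "\t",
    "\\\\": "\\",
    "\\'": "\'",
    '\\"': "\"",
}

for written, char in escapes.items():
    # Prints the escape sequence, its typed length, the produced length, the code point.
    print(written, len(written), len(char), ord(char))
```

Each row should show a typed length of 2 and a produced length of 1. The code points are 10 (newline), 9 (tab), 92 (backslash), 39 (apostrophe), and 34 (double quote).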

Python documentation for strings

For more, see the Python documentation for strings, including An Informal Introduction to Python4 and Lexical Analysis.5

Original author: Clayton Cafiero < [given name] DOT [surname] AT uvm DOT edu >

No generative AI was used in producing this material. This was written the old-fashioned way.

This material is for free use under either the GNU Free Documentation License or the Creative Commons Attribution-ShareAlike 3.0 United States License (take your pick).

Footnotes

  1. You may have heard of Unicode, or perhaps ASCII (American Standard Code for Information Interchange). ASCII was an early standard and in Python was superseded in 2008 with the introduction of Python 3.

  2. https://en.wikipedia.org/wiki/Unicode

  3. Escape sequence is a term whose precise origins are unknown. It’s generally understood to mean that we use these sequences to “escape” from the usual meaning of the symbols used. In this particular context, it means we don’t treat the apostrophe following the backslash as a string delimiter (as it would otherwise be treated), but rather as a literal apostrophe.

  4. https://docs.python.org/3/tutorial/introduction.html#strings

  5. https://docs.python.org/3/reference/lexical_analysis.html#literals