
IMAPClient and flattening the BODYSTRUCTURE

Flattening the BODYSTRUCTURE requires that you start reading RFC3501.

27 September 2021 Updated 27 September 2021
In Email
Image credit: https://unsplash.com/@2hmedia

Application developers want to use proven solutions to create an application. Many times this works, but the IMAPClient package is missing a number of things.

The whole idea of IMAP is to get only what you request. Suppose you have an email with many attachments but you want to view or download only one of them. To do this you need the 'body_number' of that attachment, which you then use to FETCH just that part.

On the internet you see people downloading the whole message but that is not the way to do this! Here I present a solution to get the body_numbers of all parts of an email message.
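As a rough sketch of what fetching a single part could look like: the helper below asks for one body section with BODY.PEEK (which does not mark the message as read) and then undoes the transfer encoding. The names fetch_part and decode_payload are made up for this illustration, client is assumed to be a logged-in IMAPClient with a folder selected, and the response key b'BODY[<number>]' is what I would expect back, so check it against your own server.

```python
import base64
import quopri


def decode_payload(raw, encoding):
    """Undo the content transfer encoding of a fetched body part."""
    encoding = (encoding or b'7bit').lower()
    if encoding == b'base64':
        return base64.b64decode(raw)
    if encoding == b'quoted-printable':
        return quopri.decodestring(raw)
    # 7bit, 8bit and binary need no decoding
    return raw


def fetch_part(client, uid, body_number, encoding=None):
    """Fetch one body part by its body_number (hypothetical helper).

    `client` is assumed to be a logged-in IMAPClient with a folder selected.
    """
    item = 'BODY.PEEK[{}]'.format(body_number)
    response = client.fetch([uid], [item])
    # assumption: the answer is keyed by BODY[<number>], without .PEEK
    key = 'BODY[{}]'.format(body_number).encode()
    return decode_payload(response[uid][key], encoding)
```

The encoding argument is the 'body encoding' field of the part (for example b'base64' for a PDF), which we also get from the BODYSTRUCTURE.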

Flattening the BODYSTRUCTURE

I struggled with this myself, and to get going I used some code I found on the internet, but it was of limited use. Time to start reading RFC3501, see links below. An IMAP email message not only consists of attachments like images and PDF files; the attachments can also be messages themselves, meaning our code must use recursion.

What I want is a list of body_numbers that can be used to FETCH the part(s) we want. We can call this operation 'flatten the BODYSTRUCTURE' because we go from a nested (recursive) structure to a flat list. During this process we generate the body_numbers.

MULTIPART/ALTERNATIVE, MULTIPART/MIXED, MULTIPART/RELATED, ...

Email messages are most of the time constructed from items that are related to each other. There are different types of relationships, for example:

  • ALTERNATIVE: the parts carry the same content in different formats, so the mail client can choose which one to show
  • RELATED: the parts are to be presented together, not alternatively. As a result, they are combined, e.g. an HTML text with an inline image
  • MIXED: the parts contain different information and are not supposed to be shown together, e.g. a text with a PDF attachment

The document 'IMAP BODYSTRUCTURE: formatted examples' gives a nice introduction, see links below.

Here is an example of an ALTERNATIVE relationship. The BODYSTRUCTURE returned by IMAPClient is:

(
    [
        (b'text', b'plain', (b'charset', b'iso-8859-1'), None, None, b'quoted-printable', 426, 15, None, None, None, None), 
        (b'text', b'html', (b'charset', b'iso-8859-1'), None, None, b'quoted-printable', 1085, 36, None, None, None, None)
    ], 
    b'alternative', 
    (b'boundary', b'_000_CWXP265MB4244C3FA1F3563A988AAE2CABBDF9CWXP265MB4244GBRP_'), 
    None, 
    (b'en-US',), 
    None
)

IMAPClient gives us a list of ALTERNATIVE items. We can choose to show either the plain text or the HTML part. The flattened body_parts are (the first number is the body_number):

1        - ALTERNATIVE : TEXT, text/plain, iso-8859-1
2        - ALTERNATIVE : TEXT, text/html, iso-8859-1

Here is the same example with a PDF attachment added:

(
    [
        (
            [
                (b'text', b'plain', (b'charset', b'UTF-8'), None, None, b'8bit', 1268, 25, None, (b'inline', None), None, None), 
                (b'text', b'html', (b'charset', b'UTF-8'), None, None, b'8bit', 10887, 115, None, (b'inline', None), None, None)
            ], 
            b'alternative', 
            (b'boundary', b'alt-60e413d9174229.65052373'), 
            None,
            None,
            None
        ), 
        (b'application', b'pdf', (b'name', b'summary.pdf'), None, None, b'base64', 187354, None, 
            (b'attachment', (b'filename', b'summary.pdf')), None, None)
    ], 
    b'mixed', 
    (b'boundary', b'multipart-60e413d9174190.84932644'), 
    None, 
    None, 
    None
)

The flattened body_parts are (the first number is the body_number):

1.1      - ALTERNATIVE : TEXT, text/plain, utf-8
1.2      - ALTERNATIVE : TEXT, text/html, utf-8
2        - MIXED       : NON_MULTIPART-ATTACHMENT, application/pdf, summary.pdf

Where do the body_numbers come from?

We do not get the body_numbers from the IMAP server. Instead we get the BODYSTRUCTURE from the IMAP server and must generate the body_numbers from it. In the examples above this is not difficult. But it gets more complicated with attached messages.
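For the simple cases, generating the numbers can be sketched in a few lines. This is a reduced illustration only, not the full parser: the name number_parts is made up, the example structure is stripped down to type and subtype, and nested message/rfc822 parts are deliberately not descended into.

```python
def number_parts(part, prefix=''):
    """Yield (body_number, content_type) for every non-multipart part.

    Assumes the IMAPClient convention that a multipart is a tuple whose
    first item is a list of sub-parts. Nested message/rfc822 parts are
    not descended into here.
    """
    if isinstance(part[0], list):
        for i, sub_part in enumerate(part[0], 1):
            number = '{}.{}'.format(prefix, i) if prefix else str(i)
            yield from number_parts(sub_part, number)
    else:
        content_type = (part[0] + b'/' + part[1]).decode().lower()
        # a message that consists of a single non-multipart part is part 1
        yield (prefix or '1', content_type)


structure = (
    [
        ([(b'text', b'plain'), (b'text', b'html')], b'alternative'),
        (b'application', b'pdf'),
    ],
    b'mixed',
)
print(list(number_parts(structure)))
```

The position of a part inside its multipart gives the last number, and the numbers of the enclosing multiparts form the prefix.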

The structure of the BODYSTRUCTURE

To understand the BODYSTRUCTURE I copied some text from RFC3501:

The basic fields of a non-multipart body part are in the following order:

  • body type
    A string giving the content media type name as defined in [MIME-IMB].
  • body subtype
    A string giving the content subtype name as defined in [MIME-IMB].

  • body parameter parenthesized list
    A parenthesized list of attribute/value pairs [e.g., ("foo" "bar" "baz" "rag") where "bar" is the value of "foo" and "rag" is the value of "baz"] as defined in [MIME-IMB].

  • body id
    A string giving the content id as defined in [MIME-IMB].

  • body description
    A string giving the content description as defined in [MIME-IMB].

  • body encoding
    A string giving the content transfer encoding as defined in [MIME-IMB].

  • body size
    A number giving the size of the body in octets. Note that this size is the size in its transfer encoding and not the resulting size after any decoding.

A body type of type MESSAGE and subtype RFC822 contains, immediately after the basic fields, the envelope structure, body structure, and size in text lines of the encapsulated message.

A body type of type TEXT contains, immediately after the basic fields, the size of the body in text lines. Note that this size is the size in its content transfer encoding and not the resulting size after any decoding.

Extension data follows the basic fields and the type-specific fields listed above. Extension data is never returned with the BODY fetch, but can be returned with a BODYSTRUCTURE fetch. Extension data, if present, MUST be in the defined order.
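To make the positional fields easier to work with, they can be given names. A minimal sketch, assuming a non-multipart part that is not a MESSAGE/RFC822; the BasicFields tuple and the basic_fields function are names I made up for this illustration:

```python
from typing import NamedTuple, Optional


class BasicFields(NamedTuple):
    body_type: bytes
    body_subtype: bytes
    parameters: dict
    body_id: Optional[bytes]
    description: Optional[bytes]
    encoding: bytes
    size: int
    lines: Optional[int]  # only present for body type TEXT


def basic_fields(part):
    """Name the positional fields of a non-multipart, non-message part."""
    raw_params = part[2] or ()
    # attribute/value pairs arrive as one flat sequence: (k1, v1, k2, v2, ...)
    parameters = dict(zip(raw_params[0::2], raw_params[1::2]))
    lines = part[7] if part[0].lower() == b'text' else None
    return BasicFields(part[0], part[1], parameters, part[3], part[4],
                       part[5], part[6], lines)


part = (b'text', b'plain', (b'charset', b'iso-8859-1'), None, None,
        b'quoted-printable', 426, 15, None, None, None, None)
fields = basic_fields(part)
print(fields.encoding, fields.size, fields.lines, fields.parameters)
```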

Note that the body type of type MESSAGE and subtype RFC822 introduces recursion. When we encounter this, we process this message as a new message, extract body_numbers, etc. And this message may again contain one or more other messages.

According to RFC3501, every message/rfc822 part contains an envelope structure and a body structure. The body structure of the nested message is the new body structure that we must process.
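In terms of tuple positions this means that the envelope, the body structure and the line count sit at indexes 7, 8 and 9, right after the seven basic fields. A small sketch; split_message_rfc822 is a hypothetical helper name and the example data is shortened and made up:

```python
def split_message_rfc822(part):
    """Return (envelope, body_structure, lines) of a message/rfc822 part.

    Positions 0-6 are the basic fields; the type-specific fields follow.
    """
    if (part[0].lower(), part[1].lower()) != (b'message', b'rfc822'):
        raise ValueError('not a message/rfc822 part')
    envelope, body_structure, lines = part[7], part[8], part[9]
    return envelope, body_structure, lines


# shortened, made-up example: the envelope holds only date and subject here
nested = (
    b'message', b'rfc822', (b'name', b'Re: My message.eml'), None, None,
    b'7bit', 10034116,
    (b'Sun, 19 Sep 2021 10:04:43 +0200', b'Re: My message'),
    (b'text', b'plain', (b'charset', b'utf-8'), None, None, b'7bit', 46, 7),
    128759,
)
envelope, body_structure, lines = split_message_rfc822(nested)
print(envelope[1], body_structure[:2], lines)
```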

Nested messages example

Here is an example of a message that contains another message that contains another message with an attachment. The BODYSTRUCTURE returned by IMAPClient is:

(
    [
        (b'text', b'plain', (b'charset', b'utf-8'), None, None, b'7bit', 35, 7, None, None, None, None), 
        (b'message', b'rfc822', (b'name', b'Re: My message.eml'), None, None, b'7bit', 10034116, 
            (
                b'Sun, 19 Sep 2021 10:04:43 +0200', 
                b'Re: My message', 
                ((b'Bob Smith', None, b'bobsmith', b'example.com'),), 
                ((b'Bob Smith', None, b'bobsmith', b'example.com'),), 
                ((b'Bob Smith', None, b'bobsmith', b'example.com'),), 
                ((b'richardroe@example.org', None, b'richardroe', b'example.org'),), 
                None, 
                None, 
                None, 
                b'<8b678e28-d03a-2bdd-2930-12470235ef9a@example.com>'
            ), 
            (
                (b'text', b'plain', (b'charset', b'utf-8'), None, None, b'7bit', 46, 7, None, None, None, None), 
                (b'message', b'rfc822', (b'name', b'Fw: Some email two.eml'), None, None, b'7bit', 10029135, 
                    (
                        b'Sat, 18 Sep 2021 18:35:47 +0200', 
                        b'Fw: Some email two', 
                        ((b'John Doe', None, b'johndoe', b'example.org'),), 
                        ((b'John Doe', None, b'johndoe', b'example.org'),), 
                        ((b'John Doe', None, b'johndoe', b'example.org'),), 
                        ((None, None, b'richardroe', b'example.org'),), 
                        None,
                        None,
                        None,
                        b'<a7dfbf41-1a26-4316-b8b5-1753fc17cd54-1631982946791@3c-example.org>'), 
                    (
                        (b'text', b'html', (b'charset', b'UTF-8'), None, None, b'7bit', 5053, 89, None, None, None, None),
                        (b'application', b'pdf', (b'name', b'YZ345.pdf'), None, None, b'base64', 187354, None, 
                            (b'attachment', (b'filename', b'YZ345.pdf')), None, None),
                        b'mixed', 
                        (b'boundary', b'abmob-49888c7b-3a09-4d10-b119-df9366de9f4c'), 
                        None, 
                        None, 
                        None
                    ), 
                    128656,
                    None,
                    (b'attachment', (b'filename', b'Fw: Some email two.eml')), 
                    None, 
                    None
                ), 
                b'mixed', 
                (b'boundary', b'------------3F95A42110AABF9E24EC86EB'), 
                None, 
                (b'en-US',), 
                None
            ), 
            128759, 
            None, 
            (b'attachment', (b'filename', b'Re: My message.eml')), None, None
        )
    ], 
    b'mixed', 
    (b'boundary', b'------------7323DBBF0E22BDA4B95E42D1'), 
    None, 
    (b'en-US',), 
    None
)

The flattened body_parts are (the first number is the body_number):

1        - MIXED       : TEXT, text/plain, utf-8
2        - MIXED       : MESSAGE_RFC822, message/rfc822, Re: My message.eml
2.1      - MIXED       : TEXT, text/plain, utf-8
2.2      - MIXED       : MESSAGE_RFC822, message/rfc822, Fw: Some email two.eml
2.2.1    - MIXED       : TEXT, text/html, utf-8
2.2.2    - MIXED       : NON_MULTIPART-ATTACHMENT, application/pdf, YZ345.pdf
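Because body_numbers are strings, two small helpers are handy when working with such a list: one to sort numerically (as plain strings, '10' would sort before '2') and one to find the enclosing part. Both function names are my own for this sketch:

```python
def body_number_key(body_number):
    """Sort key that compares body_numbers numerically, level by level."""
    return tuple(int(n) for n in body_number.split('.'))


def parent_of(body_number):
    """Return the body_number of the enclosing part, or None at top level."""
    head, _, _ = body_number.rpartition('.')
    return head or None


numbers = ['2.2.1', '10', '2', '1', '2.10', '2.2']
print(sorted(numbers, key=body_number_key))
print(parent_of('2.2.1'), parent_of('2'))
```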

Fixing the BODYSTRUCTURE of nested messages

When you look at the nested messages example above you will notice that nested messages do not contain lists, unlike the top-level message. I consider this a bug, but it might be a design decision.

Either way, IMAPClient contains a class, BodyData, that we can use to convert the nested message BODYSTRUCTURE to a top-level BODYSTRUCTURE.

    # create() is a classmethod, so no BodyData instance is needed first
    body_structure = BodyData.create(nested_message_body_structure_part)

Now the nested message BODYSTRUCTURE contains lists and can be processed the same way as the top level BODYSTRUCTURE.
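To show what this conversion amounts to, here is a rough pure-Python re-implementation of the idea. It is for illustration only and is not IMAPClient's actual code; nest_multipart is a made-up name. The leading part tuples of a multipart are wrapped in a list, everything from the multipart subtype onwards is kept as is.

```python
def nest_multipart(response):
    """Wrap the leading part tuples of a multipart in a list.

    Illustration only, not IMAPClient's actual code. A non-multipart
    part is returned unchanged.
    """
    if not isinstance(response[0], tuple):
        return tuple(response)
    # the sub-parts stop at the first bytes item: the multipart subtype
    stop = next(
        (i for i, item in enumerate(response) if isinstance(item, bytes)),
        len(response),
    )
    return ([nest_multipart(p) for p in response[:stop]],) + tuple(response[stop:])


raw = (
    (b'text', b'plain', (b'charset', b'utf-8'), None, None, b'7bit', 46, 7),
    (b'text', b'html', (b'charset', b'utf-8'), None, None, b'7bit', 5053, 89),
    b'alternative',
    (b'boundary', b'xyz'),
)
converted = nest_multipart(raw)
print(isinstance(converted[0], list), len(converted[0]), converted[1])
```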

The BodyStructurePart class and part types

Flattening means we create a list of BodyStructurePart objects. A BodyStructurePart object contains all the information needed for further processing. It must at least contain the following:

  • body_number: the body_number
  • body_type: ALTERNATIVE, MIXED, etc.
  • body_part: the actual part of the BODYSTRUCTURE

In addition I added attributes:

  • part_type
  • part_subtype
  • content_type

A part_type can be one of the following (see also the RFC3501 text above):

  • MESSAGE_RFC822
  • TEXT
  • NON_MULTIPART

A part_subtype, used with part_type = NON_MULTIPART, can be:

  • ATTACHMENT
  • INLINE
  • OTHER
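The part_subtype follows from the Content-Disposition of the part. A minimal sketch, assuming the disposition sits at index 8, as it does in the examples in this post for parts that are neither TEXT nor MESSAGE/RFC822. The function name disposition_subtype is made up, and unlike the parser code in this post it reports a missing disposition as OTHER:

```python
def disposition_subtype(part):
    """Classify a non-multipart part by its Content-Disposition.

    Assumes the disposition sits at index 8 (parts that are neither
    TEXT nor MESSAGE/RFC822). A missing disposition counts as OTHER.
    """
    disposition = part[8] if len(part) > 8 else None
    if not disposition:
        return 'OTHER'
    kind = disposition[0].lower()
    if kind == b'attachment':
        return 'ATTACHMENT'
    if kind == b'inline':
        return 'INLINE'
    return 'OTHER'


pdf = (b'application', b'pdf', (b'name', b'summary.pdf'), None, None,
       b'base64', 187354, None,
       (b'attachment', (b'filename', b'summary.pdf')), None, None)
png = (b'image', b'png', (b'name', b'image.png'), b'<17afcc25c9bcb971f161>',
       None, b'base64', 1491868, None,
       (b'inline', (b'filename', b'image.png')), None, None)
bare = (b'application', b'octet-stream', None, None, None, b'base64', 10)
print(disposition_subtype(pdf), disposition_subtype(png), disposition_subtype(bare))
```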

The BODYSTRUCTURE parser code

And finally here is the parser code. It contains three classes:

  • IMAPBodyStructurePartUtils
  • IMAPBodyStructurePart
  • IMAPBodyStructureParser

The IMAPBodyStructureParser class has a parse method that is called with the BODYSTRUCTURE returned by IMAPClient. This method returns a list of IMAPBodyStructurePart objects that can be used in our code.

I put the code and examples in a single file in case you want to try it:

# BodyData is the only IMAPClient name the parser needs
from imapclient.response_types import BodyData


class IMAPBodyStructurePartUtils:

    @classmethod
    def __decode(cls, s):
        try:
            s = s.decode()
        except Exception as e:
            pass
        return s

    @classmethod
    def get_part_type_and_part_subtype(cls, body_part):
        part_type = None
        part_subtype = None
        try:
            if body_part[0] == b'message' and body_part[1] == b'rfc822':
                part_type = 'MESSAGE_RFC822'
            elif body_part[0] == b'text':
                part_type = 'TEXT'
            else:
                part_type = 'NON_MULTIPART'
        except Exception:
            # malformed part: leave part_type as None
            pass
        if part_type == 'NON_MULTIPART':
            try:
                if body_part[8][0] == b'attachment':
                    part_subtype = 'ATTACHMENT'
                elif body_part[8][0] == b'inline':
                    part_subtype = 'INLINE'
                else:
                    part_subtype = 'OTHER'
            except Exception as e:
                pass
        return part_type, part_subtype

    @classmethod
    def get_content_type(cls, body_part):
        try:
            ctype = cls.__decode(body_part[0])
            csubtype = cls.__decode(body_part[1])
            return ctype.lower() + '/' + csubtype.lower()
        except Exception as e:
            pass
        return None  

    @classmethod
    def __get_charset_or_name(cls, charset_or_name, body_part):
        try:
            for a in range(0, len(body_part[2]), 2):
                key = body_part[2][a].lower()
                val = body_part[2][a + 1]
                if key == charset_or_name:
                    if charset_or_name == b'charset':
                        return cls.__decode(val).lower()
                    return cls.__decode(val)
        except Exception as e:
            pass
        return None  

    @classmethod
    def get_charset(cls, body_part):
        return cls.__get_charset_or_name(b'charset', body_part)

    @classmethod
    def get_name(cls, body_part):
        return cls.__get_charset_or_name(b'name', body_part)

    @classmethod
    def get_filename(cls, body_part):
        try:
            sub_body_part = body_part[8][1]
            for i in range(0, len(sub_body_part), 2):
                key = sub_body_part[i]
                val = sub_body_part[i + 1]
                if key == b'filename':
                    return cls.__decode(val)
        except Exception:
            # no disposition or no filename parameter
            pass
        return None


class IMAPBodyStructurePart:

    def __init__(
        self,
        body_number=None,
        body_type=None,
        body_part=None,
    ):
        self.body_number = body_number
        self.body_type = body_type
        self.body_part = body_part

        self.part_type, self.part_subtype = IMAPBodyStructurePartUtils.get_part_type_and_part_subtype(self.body_part)
        self.content_type = IMAPBodyStructurePartUtils.get_content_type(self.body_part)
        self.name = IMAPBodyStructurePartUtils.get_name(self.body_part)
        self.charset = IMAPBodyStructurePartUtils.get_charset(self.body_part)
        self.filename = IMAPBodyStructurePartUtils.get_filename(self.body_part)

    def __str__(self):
        if self.body_type is None:
            self.body_type = ''
        if self.part_type == 'MESSAGE_RFC822':
            return '{:8} - {:12}: {}, {}, {}'.format(self.body_number, self.body_type, self.part_type, self.content_type, self.name)
        elif self.part_type == 'TEXT':
            return '{:8} - {:12}: {}, {}, {}'.format(self.body_number, self.body_type, self.part_type, self.content_type, self.charset)
        return '{:8} - {:12}: {}-{}, {}, {}'.format(self.body_number, self.body_type, self.part_type, self.part_subtype, self.content_type, self.filename)


class IMAPBodyStructureParser:

    def __init__(
        self,
        dbg=False,
    ):
        pass

    @classmethod
    def __is_multipart(cls, part):
        return isinstance(part[0], list)

    @classmethod
    def __get_body_type(cls, part):
        # body_type (ALTERNATIVE, MIXED, ...) is first item after list
        body_type = None
        if len(part) > 1:
            body_type = part[1]
            if body_type is not None:
                try:
                    body_type = body_type.decode()
                except Exception as e:
                    pass
        if body_type is not None:
            body_type = body_type.upper()
        return body_type

    @classmethod
    def __add_body_part(cls, body_parts, body_number, body_type, part):
        body_parts.append(IMAPBodyStructurePart(
            body_number=body_number,
            body_type=body_type,
            body_part=part,
        ))

    @classmethod
    def parse(cls, part, body_number='', body_type=None):
        return cls.__recursive_parse(body_parts=[], part=part, body_number=body_number, body_type=body_type)

    @classmethod
    def __recursive_parse(cls, body_parts, part, body_number='', body_type=None):
        if part is None:
            # nothing to parse: return the list collected so far instead of None
            return body_parts

        part_type, part_sub_type = IMAPBodyStructurePartUtils.get_part_type_and_part_subtype(part)
        if part_type == 'MESSAGE_RFC822':
            cls.__add_body_part(body_parts, body_number, body_type, part)
            # convert the nested message body_structure at part[8] using BodyData
            # (create() is a classmethod, so no instance is needed)
            part = BodyData.create(part[8])
            if cls.__is_multipart(part):
                body_type = cls.__get_body_type(part)
                for i, p in enumerate(part[0], 1):
                    if body_number == '':
                        next_body_number = str(i)
                    else:
                        next_body_number = body_number + '.' + str(i)
                    cls.__recursive_parse(body_parts, p, body_number=next_body_number, body_type=body_type)
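            else:
                # a non-multipart nested message has a single part numbered
                # <body_number>.1 (RFC3501, section 6.4.5); it has no enclosing multipart
                cls.__recursive_parse(body_parts, part, body_number=(body_number or '1') + '.1', body_type=None)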

        elif cls.__is_multipart(part):
            body_type = cls.__get_body_type(part)
            for i, p in enumerate(part[0], 1):
                if body_number == '':
                    next_body_number = str(i)
                else:
                    next_body_number = body_number + '.' + str(i)
                cls.__recursive_parse(body_parts, p, body_number=next_body_number, body_type=body_type)
        else:
            if body_number == '':
                body_number = '1'
            cls.__add_body_part(body_parts, body_number, body_type, part)

        return body_parts


# BODYSTRUCTURE examples
body_structures = [
    
    {
        'name': 'single text/plain part',
        'body_structure': 
        (b'text', b'plain', (b'charset', b'utf-8'), None, None, b'7bit', 1148, 59, None, None, None, None),
    },
    {
        'name': 'text/plain and text/html',
        'body_structure': 
        (
            [
                (b'text', b'plain', (b'charset', b'iso-8859-1'), None, None, b'quoted-printable', 426, 15, None, None, None, None), 
                (b'text', b'html', (b'charset', b'iso-8859-1'), None, None, b'quoted-printable', 1085, 36, None, None, None, None)
            ], 
            b'alternative', 
            (b'boundary', b'_000_CWXP265MB4244C3FA1F3563A988AAE2CABBDF9CWXP265MB4244GBRP_'), 
            None, 
            (b'en-US',), 
            None
        ),
    },
    {
        'name': 'text/plain and pdf attachment',
        'body_structure': 
        (
            [
                (b'text', b'plain', (b'charset', b'ISO-8859-15'), None, None, b'quoted-printable', 394, 13, None, (b'inline', None), None, None), 
                (b'application', b'pdf', (b'name', b'manual.pdf'), None, None, b'base64', 175098, None, (b'attachment', (b'filename', b'manual.pdf')), None, None)
            ], 
            b'mixed', 
            (b'boundary', b'_----------=_1631938636414264302'), 
            None, 
            None, 
            None
        ),
    },
    {
        'name': 'text/plain and text/html and pdf attachment',
        'body_structure': 
        (
            [
                (
                    [
                        (b'text', b'plain', (b'charset', b'UTF-8'), None, None, b'8bit', 1268, 25, None, (b'inline', None), None, None), 
                        (b'text', b'html', (b'charset', b'UTF-8'), None, None, b'8bit', 10887, 115, None, (b'inline', None), None, None)
                    ], 
                    b'alternative', 
                    (b'boundary', b'alt-60e413d9174229.65052373'), 
                    None, 
                    None, 
                    None
                ), 
                (b'application', b'pdf', (b'name', b'summary.pdf'), None, None, b'base64', 187354, None, 
                    (b'attachment', (b'filename', b'summary.pdf')), None, None)
            ], 
            b'mixed', 
            (b'boundary', b'multipart-60e413d9174190.84932644'), 
            None, 
            None, 
            None
        ),
    },
    {
        'name': 'text/plain, text/html and inline image',
        'body_structure': 
        (
            [
                (
                    [
                        (b'text', b'plain', (b'charset', b'UTF-8'), None, None, b'quoted-printable', 3909, 138, None, None, None, None), 
                        (b'text', b'html', (b'charset', b'UTF-8'), None, None, b'quoted-printable', 21375, 397, None, None, None, None)
                    ], 
                    b'alternative', 
                    (b'boundary', b'000000000000239fe505c86b594b'), 
                    None, 
                    None, 
                    None
                ), 
                (b'image', b'png', (b'name', b'image.png'), b'<17afcc25c9bcb971f161>', None, b'base64', 1491868, None, 
                    (b'inline', (b'filename', b'image.png')), None, None)
            ], 
            b'related', 
            (b'boundary', 
            b'000000000000239fe605c86b594c'), 
            None, 
            None, 
            None
        ),
    },
    {
        'name': 'text/plain, text/html and inline images and pdf attachment',
        'body_structure': 
        (
            [
                (
                    [
                        (
                            [
                                (b'text', b'plain', (b'charset', b'UTF-8'), None, None, b'quoted-printable', 4393, 90, None, None, None, None), 
                                (b'text', b'html', (b'charset', b'UTF-8'), None, None, b'quoted-printable', 12720, 264, None, None, None, None)
                            ], 
                            b'alternative', 
                            (b'boundary', b'0000000000007dda0f05c7b3e2d2'), 
                            None, 
                            None, 
                            None
                        ),
                        (b'image', b'png', (b'name', b'image.png'), b'<17acdbf96cccb971f161>', None, b'base64', 120514, None, 
                            (b'inline', (b'filename', b'image.png')), None, None), 
                        (b'image', b'png', (b'name', b'image.png'), b'<17acdbf96cccb971f162>', None, b'base64', 78208, None, 
                            (b'inline', (b'filename', b'image.png')), None, None)
                    ], 
                    b'related', 
                    (b'boundary', b'0000000000007dda1005c7b3e2d3'), 
                    None, 
                    None, 
                    None
                ), 
                (b'application', b'pdf', (b'name', b'Love letter.pdf'), b'<17acdbf96cc7f74e7e76>', None, b'base64', 591456, None, 
                    (b'attachment', (b'filename', b'Love letter.pdf')), None, None)
            ], 
            b'mixed', 
            (b'boundary', b'0000000000007dda1105c7b3e2d4'), 
            None, 
            None, 
            None
        ),
    },
    {
        'name': 'text/plain with message/rfc822 attachment',
        'body_structure': 
        (
            [
                (b'text', b'plain', (b'charset', b'utf-8'), None, None, b'7bit', 35, 7, None, None, None, None), 
                (b'message', b'rfc822', (b'name', b'Please respond.eml'), None, None, b'7bit', 10034116, 
                    (
                        b'Sun, 19 Sep 2021 10:04:43 +0200', 
                        b'Please respond', 
                        ((b'Peter Mooring', None, b'petermooring', b'gmail.com'),), 
                        ((b'Peter Mooring', None, b'petermooring', b'gmail.com'),), 
                        ((b'Peter Mooring', None, b'petermooring', b'gmail.com'),), 
                        ((b'peterpm@xs4all.nl', None, b'peterpm', b'xs4all.nl'),), 
                        None, 
                        None, 
                        None, 
                        b'<8b678e28-d03a-2bdd-2930-12470235ef9a@gmail.com>'), 
                    (
                        (b'text', b'plain', (b'charset', b'utf-8'), None, None, b'7bit', 46, 7, None, None, None, None), 
                        (b'text', b'html', (b'charset', b'UTF-8'), None, None, b'quoted-printable', 21375, 397, None, None, None, None),
                        b'alternative', 
                        (b'boundary', b'------------3F95A42110AABF9E24EC86EB'), 
                        None, 
                        (b'en-US',), 
                        None
                    ), 
                    128759, 
                    None, 
                    (b'attachment', (b'filename', b'Please respond.eml')), None, None
                )
            ], 
            b'mixed', 
            (b'boundary', b'------------7323DBBF0E22BDA4B95E42D1'), 
            None, 
            (b'en-US',), 
            None
        ),
    },
    {
        'name': 'text/plain with message/rfc822 attachment including another message/rfc822',
        'body_structure': 
        (
            [
                (b'text', b'plain', (b'charset', b'utf-8'), None, None, b'7bit', 35, 7, None, None, None, None), 
                (b'message', b'rfc822', (b'name', b'Re: My message.eml'), None, None, b'7bit', 10034116, 
                    (
                        b'Sun, 19 Sep 2021 10:04:43 +0200', 
                        b'Re: My message', 
                        ((b'Bob Smith', None, b'bobsmith', b'example.com'),), 
                        ((b'Bob Smith', None, b'bobsmith', b'example.com'),), 
                        ((b'Bob Smith', None, b'bobsmith', b'example.com'),), 
                        ((b'richardroe@example.org', None, b'richardroe', b'example.org'),), 
                        None, 
                        None, 
                        None, 
                        b'<8b678e28-d03a-2bdd-2930-12470235ef9a@example.com>'
                    ), 
                    (
                        (b'text', b'plain', (b'charset', b'utf-8'), None, None, b'7bit', 46, 7, None, None, None, None), 
                        (b'message', b'rfc822', (b'name', b'Fw: Some email two.eml'), None, None, b'7bit', 10029135, 
                            (
                                b'Sat, 18 Sep 2021 18:35:47 +0200', 
                                b'Fw: Some email two', 
                                ((b'John Doe', None, b'johndoe', b'example.org'),), 
                                ((b'John Doe', None, b'johndoe', b'example.org'),), 
                                ((b'John Doe', None, b'johndoe', b'example.org'),), 
                                ((None, None, b'richardroe', b'example.org'),), 
                                None,
                                None,
                                None,
                                b'<a7dfbf41-1a26-4316-b8b5-1753fc17cd54-1631982946791@3c-example.org>'), 
                            (
                                (b'text', b'html', (b'charset', b'UTF-8'), None, None, b'7bit', 5053, 89, None, None, None, None),
                                (b'application', b'pdf', (b'name', b'YZ345.pdf'), None, None, b'base64', 187354, None, 
                                    (b'attachment', (b'filename', b'YZ345.pdf')), None, None),
                                b'mixed', 
                                (b'boundary', b'abmob-49888c7b-3a09-4d10-b119-df9366de9f4c'), 
                                None, 
                                None, 
                                None
                            ), 
                            128656,
                            None,
                            (b'attachment', (b'filename', b'Fw: Some email two.eml')), 
                            None, 
                            None
                        ), 
                        b'mixed', 
                        (b'boundary', b'------------3F95A42110AABF9E24EC86EB'), 
                        None, 
                        (b'en-US',), 
                        None
                    ), 
                    128759, 
                    None, 
                    (b'attachment', (b'filename', b'Re: My message.eml')), None, None
                )
            ],
            b'mixed', 
            (b'boundary', b'------------7323DBBF0E22BDA4B95E42D1'), 
            None, 
            (b'en-US',), 
            None
        ),
    },
]


# show examples
for b in body_structures:
    print('\nBody structure: {}\n{}'.format(b['name'], '-'*60))
    for body_structure_part in IMAPBodyStructureParser.parse(b['body_structure']):
        print('{}'.format(body_structure_part))

A word about IMAP and privacy

IMAP was designed to leave your messages on an IMAP server so that they can be accessed from multiple devices. With IMAP you initially request only minimal data. If you want more, only the selected parts are downloaded. Search requests are also sent to the IMAP server. Personally I do not like IMAP because it leaves much more of your data on the IMAP server, and both viewing specific attachments and search requests can be used for fingerprinting.

Summary

Flattening the IMAP BODYSTRUCTURE took me time because there were no Python recipes on the internet. After reading RFC3501 it turned out to be not that difficult ... but ... Because we decode the BODYSTRUCTURE ourselves it is easy to make mistakes. And can we handle all types of (malformed) BODYSTRUCTUREs?
On the internet you can find reports of decoders that sometimes fail, e.g. the one from RoundCube. They fall back to fetching the whole message.
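
Such a fallback can be sketched with the email package from the Python standard library. This assumes the complete message has already been downloaded as bytes (for example with BODY.PEEK[]); the function name list_parts_from_full_message and the tiny test message are made up for this illustration:

```python
from email import message_from_bytes, policy


def list_parts_from_full_message(raw_message):
    """Parse a complete RFC822 message and list its leaf parts.

    Fallback sketch for when the BODYSTRUCTURE cannot be decoded: the
    whole message must be downloaded first.
    """
    message = message_from_bytes(raw_message, policy=policy.default)
    parts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        parts.append((part.get_content_type(), part.get_filename()))
    return parts


raw = (
    b'From: a@example.com\r\n'
    b'Subject: test\r\n'
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/mixed; boundary="b1"\r\n'
    b'\r\n'
    b'--b1\r\n'
    b'Content-Type: text/plain; charset=utf-8\r\n'
    b'\r\n'
    b'Hello\r\n'
    b'--b1\r\n'
    b'Content-Type: application/pdf; name="summary.pdf"\r\n'
    b'Content-Disposition: attachment; filename="summary.pdf"\r\n'
    b'Content-Transfer-Encoding: base64\r\n'
    b'\r\n'
    b'JVBERi0xLjQ=\r\n'
    b'--b1--\r\n'
)
print(list_parts_from_full_message(raw))
```

The price of this approach is exactly what we wanted to avoid: every attachment is downloaded even if only one part is needed.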

Links / credits

IMAP BODYSTRUCTURE: formatted examples
http://sgerwk.altervista.org/imapbodystructure.html

IMAPClient
https://imapclient.readthedocs.io/en/2.2.0/index.html

INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4rev1
https://datatracker.ietf.org/doc/html/rfc3501
