Extracting Metadata

Uses Calibre’s `ebook-meta`_ tool to extract metadata and cover images from ebook files.

.. _ebook-meta: https://manual.calibre-ebook.com/generated/en/ebook-meta.html

class capybre.metadata.Metadata(author=None, author_sort=None, description=None, ebook_format=<EbookFormat.UNKNOWN: 0>, identifiers=None, isbn=None, language=None, last_edited=None, publication_date=None, publisher=None, rating=None, series=None, tags=None, title=None)

Standardized representation of the metadata output by ebook-meta, with light parsing and cleaning applied to the data.

The title field is guaranteed to exist, and ebook_format defaults to EbookFormat.UNKNOWN if not specified, but everything else may be None.

Parameters:
  • author (str) – Author (this appears to be an &-separated list of authors for multi-author works)
  • author_sort (str) – String by which the author should be sorted
  • description (str) – Paragraph length text
  • ebook_format (EbookFormat) – Enum of ebook formats used by Calibre
  • identifiers (Dict[str,str]) – Dict of identifiers, like {'isbn': 'xxx'}
  • isbn (str) – ISBN
  • language (str) – Language used, seemingly in 3-letter language codes
  • last_edited (date) – Timestamp on the file
  • publication_date (date) – Publication date of this edition
  • publisher (str) – Publisher’s name
  • rating (int) – Rating (apparently out of 5)
  • series (str) – Series that this belongs to (possibly including a number indicating rank in the series)
  • tags (List[str]) – List of tags describing the book
  • title (str) – Title
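
The author field above is described as possibly holding an &-separated list for multi-author works. A minimal sketch of splitting such a value into individual names follows; `split_authors` is a hypothetical helper for illustration, not part of capybre, and it assumes `&` never appears inside a single author's name.

```python
from typing import List, Optional


def split_authors(author: Optional[str]) -> List[str]:
    """Split an '&'-separated author string into individual names.

    Hypothetical helper: assumes '&' is only ever used as a separator.
    """
    if not author:
        # The author field may be None, so return an empty list in that case.
        return []
    # Split on '&', trim whitespace, and drop any empty fragments.
    return [name.strip() for name in author.split("&") if name.strip()]


authors = split_authors("Jane Doe & John Roe")
print(authors)
```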
capybre.metadata.clean_metadata_map(metadata_map)

Cleans and standardizes the metadata map returned by extract_raw_metadata_map()

Parameters:metadata_map (dict) – Dict result of extract_raw_metadata_map()
Returns:Metadata object
capybre.metadata.extract_cover(input_file: str, output_file: str = 'cover.jpg', suppress_output=True)

Extracts the cover image from the given ebook, and saves it in the output file

Parameters:
  • input_file (str) – path to the input file
  • output_file (str, optional) – path to the output cover image file, defaults to ‘cover.jpg’
  • suppress_output (bool, optional) – Suppresses stdout from ebook-convert call (typically dozens of lines). Defaults to True
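
For orientation, cover extraction with Calibre's command-line tools can be sketched as below. This is an assumption about one way to do it, using the `--get-cover` option of `ebook-meta`; capybre itself may invoke a different Calibre program (the parameter description mentions ebook-convert). The names `build_cover_command` and `run_cover_extraction` are hypothetical.

```python
import subprocess
from typing import List


def build_cover_command(input_file: str, output_file: str = "cover.jpg") -> List[str]:
    """Build an ebook-meta command line that writes the cover to output_file."""
    return ["ebook-meta", input_file, f"--get-cover={output_file}"]


def run_cover_extraction(
    input_file: str,
    output_file: str = "cover.jpg",
    suppress_output: bool = True,
) -> None:
    """Run the command; requires Calibre's ebook-meta on the PATH."""
    # Discard the tool's stdout when suppress_output is set, otherwise inherit it.
    stdout = subprocess.DEVNULL if suppress_output else None
    subprocess.run(
        build_cover_command(input_file, output_file),
        stdout=stdout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )


print(build_cover_command("original.epub"))
```

Building the argument list separately from running it keeps the command easy to inspect without Calibre installed.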
capybre.metadata.extract_metadata(input_file) → capybre.metadata.Metadata

Extracts metadata from an ebook into the standardized Metadata format

Parameters:input_file (str) – path to the input file
Returns:Metadata object
capybre.metadata.extract_metadata_map(input_file: str)

Extracts metadata from an ebook via an ebook-meta call, returning a dict

Parameters:input_file (str) – path to the input file
Returns:
Dict mapping between metadata keys and values as directly output from
the ebook-meta call
capybre.metadata.extract_raw_metadata_map(raw_metadata: List[str])

Given the output of the ebook-meta program, extracts the metadata map

Parameters:raw_metadata (List[str]) – String representation of the lines produced by Calibre (also produced by a call to fetch-ebook-metadata)
Returns:
Dict mapping between metadata keys and values as directly output from
the ebook-meta call
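
A rough sketch of how such a map might be built from the raw lines is shown below. It assumes each line has the form `Key : value`, which is how ebook-meta typically prints its fields; `parse_meta_lines` and the sample lines are hypothetical and are not taken from capybre's source.

```python
from typing import Dict, List


def parse_meta_lines(lines: List[str]) -> Dict[str, str]:
    """Turn 'Key : value' lines into a dict of stripped keys and values."""
    result: Dict[str, str] = {}
    for line in lines:
        # Split on the first colon only, so values such as timestamps
        # that contain further colons stay intact.
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or not key:
            # Skip lines without a key/value separator.
            continue
        result[key] = value.strip()
    return result


# Made-up sample in the assumed output format.
sample = [
    "Title               : Example Book",
    "Author(s)           : Jane Doe & John Roe",
    "Languages           : eng",
    "Published           : 2001-02-03T00:00:00+00:00",
    "a line with no separator",
]
parsed = parse_meta_lines(sample)
print(parsed["Title"], parsed["Published"])
```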
class capybre.metadata.extracted_cover_fileobj(input_file, suppress_output=True)

Extracts the cover image and temporarily presents it as a fileobj context

For use like

with extracted_cover_fileobj('original.epub') as f:
    # f is an open file object for the extracted cover image
    upload(f)
Parameters:
  • input_file (str) – path to the input file
  • suppress_output (bool, optional) – Suppresses stdout from ebook-convert call (typically dozens of lines). Defaults to True
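
The temporary-file pattern described here can be sketched with the standard library alone. The version below is a generator-based context manager built on `contextlib`, not the class capybre provides; `temporary_cover`, the `extract` callback, and `fake_extract` are hypothetical names, and the stand-in extractor simply writes placeholder bytes so the example runs without Calibre.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import BinaryIO, Callable, Iterator


@contextmanager
def temporary_cover(
    input_file: str,
    extract: Callable[[str, str], None],
) -> Iterator[BinaryIO]:
    """Extract a cover into a temporary directory and yield it as a file object."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        cover_path = os.path.join(tmp_dir, "cover.jpg")
        # Delegate the actual extraction to the supplied callback.
        extract(input_file, cover_path)
        with open(cover_path, "rb") as cover_file:
            yield cover_file
    # On exit the file is closed and the temporary directory is deleted.


def fake_extract(input_file: str, output_file: str) -> None:
    """Stand-in extractor that writes placeholder bytes instead of a real image."""
    with open(output_file, "wb") as out:
        out.write(b"placeholder cover bytes")


with temporary_cover("original.epub", fake_extract) as f:
    data = f.read()

print(len(data), f.closed)
```

Passing the extractor in as a callback keeps the cleanup logic independent of how the cover is actually produced.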