Extracting Metadata
Uses Calibre’s `ebook-meta`_ tool to extract metadata and cover images from ebook files.
.. _ebook-meta: https://manual.calibre-ebook.com/generated/en/ebook-meta.html
-
class
capybre.metadata.Metadata(author=None, author_sort=None, description=None, ebook_format=<EbookFormat.UNKNOWN: 0>, identifiers=None, isbn=None, language=None, last_edited=None, publication_date=None, publisher=None, rating=None, series=None, tags=None, title=None)
Standardizes the metadata output of ebook-meta, with light parsing and cleanup of the data.
The title field is guaranteed to exist, and ebook_format defaults to EbookFormat.UNKNOWN if not specified, but every other field may be None.
Parameters: - author (str) – Author (this appears to be an &-separated list of authors for multi-author works)
- author_sort (str) – String by which the author should be sorted
- description (str) – Paragraph length text
- ebook_format (EbookFormat) – Enum of ebook formats used by Calibre
- identifiers (Dict[str, str]) – Dict of identifiers, like {'isbn': 'xxx'}
- isbn (str) – ISBN
- language (str) – Language used, seemingly in 3-letter language codes
- last_edited (date) – Timestamp on the file
- publication_date (date) – Publication date of this edition
- publisher (str) – Publisher’s name
- rating (int) – Rating, out of 5?
- series (str) – Series that this belongs to (possibly including a number indicating rank in the series)
- tags (List[str]) – List of tags describing the book
- title (str) – Title
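Because every field other than title may be None, and author may hold several names joined by "&", calling code usually needs a guard before splitting the author string. The following is a minimal sketch of that guard. `split_authors` is a hypothetical helper, not part of capybre, and it assumes the "&" separator described above.

```python
from typing import List, Optional


def split_authors(author: Optional[str]) -> List[str]:
    """Split an '&'-separated author string into individual names.

    Hypothetical helper (not part of capybre). Returns an empty list
    when the author field is missing or empty.
    """
    if not author:
        return []
    # Drop surrounding whitespace and skip empty fragments such as "A & ".
    return [name.strip() for name in author.split("&") if name.strip()]
```

The same pattern (check for None first, then parse) applies to the other optional fields, such as series or tags.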
-
capybre.metadata.clean_metadata_map(metadata_map)
Cleans and standardizes the metadata map returned by extract_raw_metadata_map().
Parameters: metadata_map (dict) – Dict result of extract_raw_metadata_map()
Returns: Metadata object
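To illustrate the kind of cleanup such a step might perform, here is a rough sketch of two normalizations. It is not capybre's actual implementation. Both helper names are hypothetical, and the input shapes ("Name [Sort Name]" for authors, comma-separated "scheme:value" pairs for identifiers) are assumptions about what ebook-meta prints.

```python
import re
from typing import Dict, Optional, Tuple


def split_author_field(raw: str) -> Tuple[str, Optional[str]]:
    """Split 'Name [Sort, Name]' into (author, author_sort).

    Assumed input shape; returns (raw, None) when no bracketed part exists.
    """
    match = re.fullmatch(r"(.*?)\s*\[(.*)\]\s*", raw)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return raw.strip(), None


def parse_identifiers(raw: str) -> Dict[str, str]:
    """Turn 'isbn:123, goodreads:456' into {'isbn': '123', 'goodreads': '456'}."""
    identifiers: Dict[str, str] = {}
    for part in raw.split(","):
        # Split on the first colon only, so values containing ':' survive.
        scheme, sep, value = part.strip().partition(":")
        if sep:
            identifiers[scheme] = value
    return identifiers
```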
-
capybre.metadata.extract_cover(input_file: str, output_file: str = 'cover.jpg', suppress_output=True)
Extracts the cover image from the given ebook and saves it to the output file.
Parameters: - input_file (str) – path to the input file
- output_file (str, optional) – path to the output cover image file, defaults to ‘cover.jpg’
- suppress_output (bool, optional) – Suppresses stdout from the ebook-convert call (typically dozens of lines). Defaults to True
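As a rough illustration of how a cover can be pulled out with Calibre's command-line tools, the sketch below shells out to `ebook-meta` with its `--get-cover` option. This is an assumption about one workable approach, not a description of capybre's internals. The function names are hypothetical, and running it requires Calibre's tools on the PATH.

```python
import subprocess
from typing import List


def build_cover_command(input_file: str, output_file: str = "cover.jpg") -> List[str]:
    """Build the ebook-meta invocation that writes the cover to output_file."""
    return ["ebook-meta", input_file, f"--get-cover={output_file}"]


def extract_cover_sketch(input_file: str, output_file: str = "cover.jpg",
                         suppress_output: bool = True) -> None:
    """Run the command; raises CalledProcessError if the tool exits non-zero."""
    # Discard the tool's stdout when suppress_output is set, otherwise inherit it.
    stdout = subprocess.DEVNULL if suppress_output else None
    subprocess.run(build_cover_command(input_file, output_file),
                   stdout=stdout, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.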
-
capybre.metadata.extract_metadata(input_file) → capybre.metadata.Metadata
Extracts metadata from an ebook into the standardized Metadata format.
Parameters: input_file (str) – path to the input file
Returns: Metadata object
-
capybre.metadata.extract_metadata_map(input_file: str)
Extracts metadata from an ebook via an ebook-meta call, returning a dict.
Parameters: input_file (str) – path to the input file
Returns: Dict mapping metadata keys to values, exactly as output by the ebook-meta call
-
capybre.metadata.extract_raw_metadata_map(raw_metadata: List[str])
Given the output of the ebook-meta program, extracts the metadata map.
Parameters: raw_metadata (List[str]) – String representation of the lines produced by Calibre (also produced by a call to fetch-ebook-metadata)
Returns: Dict mapping metadata keys to values, exactly as output by the ebook-meta call
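A minimal sketch of this kind of line parsing follows, assuming each line looks like "Key   : value" with padding before the colon. The function name is hypothetical, and the real parser may differ, for example in how it handles values that span several lines.

```python
from typing import Dict, List


def parse_ebook_meta_lines(raw_metadata: List[str]) -> Dict[str, str]:
    """Parse 'Key   : value' lines into a dict of stripped keys and values.

    Lines without a ':' separator (or with an empty key) are skipped.
    """
    metadata: Dict[str, str] = {}
    for line in raw_metadata:
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue
        metadata[key.strip()] = value.strip()
    return metadata
```

Keys and values stay as plain strings here. Converting dates or splitting tags would belong to a later cleaning step.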
-
class
capybre.metadata.extracted_cover_fileobj(input_file, suppress_output=True)
Extracts the cover image and temporarily presents it as a file object within a context manager.
For use like:
    with extracted_cover_fileobj('original.epub') as f:
        # f is a file object for the extracted cover
        upload(f)
Parameters: - input_file (str) – path to the input file
- output_file (str, optional) – path to the output cover image file, defaults to ‘cover.jpg’
- suppress_output (bool, optional) – Suppresses stdout from the ebook-convert call (typically dozens of lines). Defaults to True
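The "temporary file object" pattern can be sketched with the standard library alone. The version below is an illustration under stated assumptions, not capybre's code. `temporary_cover_fileobj` is a hypothetical name, and the `extract` argument stands in for whatever function actually writes the cover image to disk.

```python
import contextlib
import os
import tempfile
from typing import BinaryIO, Callable, Iterator


@contextlib.contextmanager
def temporary_cover_fileobj(input_file: str,
                            extract: Callable[[str, str], None]) -> Iterator[BinaryIO]:
    """Yield a binary file object for the cover, then delete the temp file.

    `extract(input_file, output_file)` is a caller-supplied function that
    writes the cover image to output_file.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        cover_path = os.path.join(tmp_dir, "cover.jpg")
        extract(input_file, cover_path)
        # The file is closed before the temporary directory is removed.
        with open(cover_path, "rb") as fileobj:
            yield fileobj
```

Because the cover lives in a temporary directory, nothing is left on disk once the `with` block exits, even if the body raises an exception.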