How to Make and Format E-books

Image Credit: Flickr Creative Commons

Required Programs: Microsoft Word (recommended), Calibre (recommended), Sigil (optional)

1. Open the document you want to convert into an e-book (see Note 1).

2. If the text in your document is encoded in a code page other than Unicode, copying and pasting the text into a new MS Word document and saving it will automatically re-encode the text as Unicode. Skip this step if your document is already in Unicode.

Steps: Select all the text (CTRL+A), copy it (CTRL+C), and Paste as HTML Format. I wouldn’t recommend Paste Special > Unformatted Unicode Text or Unformatted Text because they can create really messy html code (see Note 2).
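If your source is a plain text file rather than a Word document, the same re-encoding can be sketched in Python. This is only a rough illustration: the function name is made up for this example, and the default code page (cp949, a common Korean code page) is an assumption you would replace with whatever your file actually uses.

```python
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, source_encoding: str = "cp949") -> None:
    """Decode src with source_encoding and write the same text to dst as UTF-8.

    source_encoding is an assumption: cp949 is only a guess for Korean text.
    A wrong guess raises UnicodeDecodeError or produces garbled text.
    """
    text = src.read_bytes().decode(source_encoding)
    dst.write_text(text, encoding="utf-8")


# Hypothetical usage (file names are placeholders):
# reencode_to_utf8(Path("script_cp949.txt"), Path("script_utf8.txt"))
```

The original file is left untouched, so you can retry with a different code page if the output looks garbled.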

3. Find (CTRL+F) every chapter title that you want to include in the table of contents and apply the style “Heading 1” to it. You can find the style “Heading 1” in the Home tab, in the Styles section. (You can also tag titles as headings in Sigil.)

Example: Let’s say I want to convert a Korean drama script into an epub file and that I want the script for each episode to be a separate chapter in my e-book. I’ll have to search for the phrase “1회” (Episode 1), select the text, and mark it as “Heading 1.” Then I’ll have to do the same for “Episode 2,” “Episode 3,” and so forth. If the document you wish to convert doesn’t have “chapter titles,” then you’ll have to add them in yourself if you want a table of contents.
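If you have many episodes, tagging the titles by hand gets tedious. As a rough sketch, the same idea can be applied to the saved html with a regular expression in Python. It assumes that every episode title sits alone in its own paragraph (for example <p>1회</p>), which may not match what Word actually produces for your file; the function name is made up for this example.

```python
import re

# Assumption: each episode title is the only content of a <p> element,
# e.g. <p>1회</p> or <p class=MsoNormal>12회</p>.
EPISODE_TITLE = re.compile(r"<p(?:\s[^>]*)?>\s*(\d+회)\s*</p>")


def mark_episode_headings(html: str) -> str:
    """Turn paragraphs that contain only an episode title into <h1> headings."""
    return EPISODE_TITLE.sub(r"<h1>\1</h1>", html)


if __name__ == "__main__":
    sample = "<p>1회</p><p>첫 번째 장면</p><p class=MsoNormal>2회</p>"
    print(mark_episode_headings(sample))
```

Paragraphs that merely mention an episode number inside a longer sentence are left alone, because the pattern only matches when the title is the whole paragraph.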

4. Fix any extra hard line breaks with the Find and Replace tool in MS Word (CTRL+H). See the instructions near the end of this page. Skip this step if you don’t have any irregular line breaks.

5. Save the document as a Web Page, Filtered (*.htm; *.html) file.

6. Add/import the html file into Calibre.

7. Using Calibre, convert the html file into an epub file (or any e-book format you wish). (You can also convert html files into epub files with Sigil).

To convert the html file into an epub file, right-click the html file and select “Convert books > Convert individually.” The conversion window will open to the Metadata tab. Edit the metadata (title, author, etc.) and add a cover image (Change cover image > Browse > Select image). Sigil and Calibre recommend a cover image that is 590 pixels wide and 750 pixels high. Then click on “Look & Feel.” If you wish, you can “Remove spacing between paragraphs” or “Insert blank line between paragraphs.” You can also adjust the line height (line spacing). Next, click on “Page Setup.” Make sure the appropriate input profile (e.g. “Default input profile” for a zip input) and output profile (e.g. “iPad” for an epub output) are selected.

Next, click on “Structure Detection.” If you’d like a page break before each chapter, click on the magic wand beside “Insert page breaks before (XPath expression):”. Then select “h1” from the “Match HTML tags with tag name” drop-down list. To generate a table of contents, click on “Table of Contents.” Click on the magic wand beside “Level 1 TOC (XPath expression):”. Then select “h1” from the “Match HTML tags with tag name” drop-down list. (You can also generate a table of contents in Sigil.) Now that you have adjusted all the settings, click “OK.” Calibre will convert your file.
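Calibre also ships a command-line tool, ebook-convert, that accepts the same settings as the conversion window. The sketch below only builds the command in Python; it does not run it. The function name and file names are made up for this example, and I’m assuming that the XPath expression the magic wand generates for “h1” is //h:h1 and that the iPad output profile is named ipad on the command line.

```python
import shlex
from typing import List, Optional


def build_convert_command(html_path: str, epub_path: str, title: str,
                          author: str, cover_path: Optional[str] = None) -> List[str]:
    """Build an ebook-convert command that mirrors the GUI settings above."""
    cmd = [
        "ebook-convert", html_path, epub_path,
        "--title", title,
        "--authors", author,
        "--output-profile", "ipad",         # assumed name of the iPad profile
        "--page-breaks-before", "//h:h1",   # page break before every Heading 1
        "--level1-toc", "//h:h1",           # every Heading 1 becomes a TOC entry
    ]
    if cover_path:
        cmd += ["--cover", cover_path]
    return cmd


if __name__ == "__main__":
    # File names, title, and author here are placeholders.
    command = build_convert_command("script.html", "script.epub",
                                    "Drama Script", "Unknown", "cover.jpg")
    print(shlex.join(command))
```

To actually run the conversion you could pass the list to subprocess.run(command, check=True), provided Calibre is installed and ebook-convert is on your PATH.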

If you don’t have Calibre, you can convert documents into other e-book formats by uploading the files to ebook.online-convert.com. After the site converts your file, you just have to download the converted file.

8. Tweak the formatting of your e-book (optional). With Sigil, you can tweak the CSS style sheet of your epub.

Some basic CSS tweaks:

  • h1 {font-size: 175%}
  • font-family: "맑은 고딕", "Malgun Gothic";

You can also correct any errors in the text or in the html code. All you have to do is open your epub file with Sigil and open the htm pages. To edit the html code, switch to “code view.” Click save when finished.


How to Fix Irregular Line Breaks with Microsoft Word

Irregular line breaks are usually the result of copying and pasting OCR’ed text or text from pdfs. In MS Word, the code for a paragraph mark (the hard line break discussed here) is ^p. Read this helpful tutorial: How Do I Fix Hard Line Breaks.

Examples:

[Image: spacing, header]

This Winter Sonata script had tons of extra hard line breaks. I fixed it by entering:
Find what: ^p^p^p^p^p
Replace with: ^p^p
and clicking Replace All. Once I had replaced all the unnecessary runs of five hard line breaks, I continued in the same fashion with the runs of four (^p^p^p^p) and the runs of three (^p^p^p). That leaves only single paragraph marks (^p) and double paragraph marks (^p^p).
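If you are working with a plain text export instead of a Word document, the repeated Replace All passes can be collapsed into a single regular expression. This is a minimal sketch that assumes each ^p corresponds to a newline character in the text file; the function name is made up for this example.

```python
import re


def collapse_paragraph_breaks(text: str, max_breaks: int = 2) -> str:
    """Shrink every run of more than max_breaks newlines down to max_breaks.

    Assumption: one Word paragraph mark (^p) equals one "\n" in the text.
    """
    too_many = re.compile(r"\n{%d,}" % (max_breaks + 1))
    return too_many.sub("\n" * max_breaks, text)


if __name__ == "__main__":
    messy = "scene one\n\n\n\n\nscene two\n\n\nscene three\n\nscene four"
    print(collapse_paragraph_breaks(messy))
```

Single and double breaks are left as they are, which matches the end state described above.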

[Image: hard line breaks]

If you’d like to get rid of all the line breaks but preserve the double paragraphs, you can do so by first replacing the double paragraphs (^p^p) with a unique string of characters, for example, “xxxxxx”. Next, remove all the remaining line breaks by replacing “^p” with nothing (or with a space). After all the line breaks have been removed, you can re-insert the double paragraphs you preserved by replacing “xxxxxx” with “^p^p”.

The Green Rose script shown above had double paragraphs before every scene (S#1 프롤로그) which I wished to preserve. However, I wanted to get rid of all the extra line breaks which were disrupting the flow of the script’s sentences. So I replaced “^pS#” with “xxxxxx”. Then I got rid of the extra line breaks by replacing “^p^p” with “^p”. Afterwards, I re-inserted the double paragraphs I preserved by replacing “xxxxxx” with “^p^p”.
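The placeholder technique described above can be sketched in Python for plain text files. It assumes that each ^p corresponds to a newline, that runs of three or more newlines have already been collapsed, and that joining broken lines with a space is what you want (pass joiner="" if it isn’t). The function name is made up for this example.

```python
def unwrap_lines(text: str, placeholder: str = "xxxxxx", joiner: str = " ") -> str:
    """Remove single line breaks while keeping double line breaks.

    Assumptions: "\n" stands for ^p, and the text contains no runs of
    three or more newlines.
    """
    if placeholder in text:
        # The placeholder must be unique, or ordinary text would be
        # turned into paragraph breaks in the last step.
        raise ValueError("placeholder already occurs in the text")
    protected = text.replace("\n\n", placeholder)   # 1. protect double breaks
    joined = protected.replace("\n", joiner)        # 2. drop the remaining breaks
    return joined.replace(placeholder, "\n\n")      # 3. restore double breaks


if __name__ == "__main__":
    script = "S#1 프롤로그\n\n첫 번째 줄이\n여기서 끊겼다\n\nS#2"
    print(unwrap_lines(script))
```

The check at the top is there because the whole trick depends on the placeholder not appearing anywhere else in the document.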

How to Fix Irregular Line Breaks with Regex and Sigil

You can also fix irregular hard line breaks with Regex and Sigil. Upon inspecting an epub with irregular line breaks in Sigil’s code view, I found that it had multiple “empty” paragraphs with varying numbers of non-breaking spaces (&nbsp;).

For example:
<p class="MsoNoSpacing">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</p>
<p class="MsoNoSpacing">&nbsp;&nbsp;&nbsp;&nbsp;</p>
<p class="MsoNoSpacing">&nbsp;</p>

To find and replace or delete all of these empty paragraphs, you can enter the regular expression <p class="MsoNoSpacing"\>(&nbsp;)+\</p> in the Find field (with the search mode set to Regex). Sigil will search for paragraphs whose opening and closing tags match the literal text of the expression and that contain one or more repetitions of the group “&nbsp;” between those tags.

Example Regular Expression Break Down

  • (...) = Group. I’m searching for the group “&nbsp;”.
  • + = 1 or more. I’m searching for one or more repetitions of the group “(&nbsp;)”.
  • \ = Escape character. The escape character “\” signifies that the character after it should be treated as a literal character, not as a Regex metacharacter. < and > only have a special meaning inside certain constructs, so escaping them here is optional but harmless.
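The same search can be tried outside Sigil. Here is a small Python sketch that deletes the empty paragraphs from a string of html. Python’s re module is not the regex engine Sigil uses, but this particular pattern should behave the same way in both; the function name is made up for this example, and the optional trailing newline in the pattern is my addition so that no blank lines are left behind.

```python
import re

# One or more &nbsp; entities and nothing else inside a MsoNoSpacing paragraph.
# (?:...) is a non-capturing group; \n? also swallows the line break after it.
EMPTY_PARAGRAPH = re.compile(r'<p class="MsoNoSpacing">(?:&nbsp;)+</p>\n?')


def remove_empty_paragraphs(html: str) -> str:
    """Delete paragraphs that contain only non-breaking space entities."""
    return EMPTY_PARAGRAPH.sub("", html)


if __name__ == "__main__":
    page = (
        '<p class="MsoNoSpacing">&nbsp;&nbsp;&nbsp;&nbsp;</p>\n'
        '<p class="MsoNoSpacing">오터번 부인은 말했다.</p>\n'
        '<p class="MsoNoSpacing">&nbsp;</p>\n'
    )
    print(remove_empty_paragraphs(page))
```

Paragraphs with real text are untouched, because the pattern requires the &nbsp; entities to be the only content between the tags.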

Additional Comments

Once you’ve converted your documents into html files with MS Word or LibreOffice, you can also create epubs with Sigil. Everything in this guide that you can do with Calibre, you can do with Sigil: change or add covers and images, generate tables of contents, convert html files into epubs, etc. With Sigil, you can also directly inspect and edit html code, edit the CSS style sheet, use Regex to fix formatting errors, and validate your epub. Sigil probably creates cleaner code than Calibre as well. The learning curve for Sigil is a little steeper, but once you get more familiar with creating e-books, it’s well worth trying Sigil out.

Notes

1 If you open a file and it has garbled text, it’s probably because Word decoded it incorrectly. Read this article to fix it.

2 Clean code produced by Paste as HTML Format:
<p><span style="font-family:굴림체"><span xml:lang="KO">오터번 부인은 머리에 두른 터번을 풀면서 신경질적으로 말했다</span>.</span></p>

Messy code produced by Paste as Unformatted Unicode Text:
<p><span xml:lang="KO">오터번</span> <span xml:lang="KO">부인은</span> <span xml:lang="KO">머리에</span> <span xml:lang="KO">두른</span> <span xml:lang="KO">터번을</span> <span xml:lang="KO">풀면서</span> <span xml:lang="KO">신경질적으로</span> <span xml:lang="KO">말했다</span>.</p>
