As a developer, you will see that working with images can be quite a complex topic in PDF format. A wide variety of manipulations with images awaits the user when he is faced with the work of PDF documents. PDF files can contain the following types of images.
PdfPig provides InlineImage
and XObjectImage
to handle both types, and both of the classes implements IPdfImage
.
You can access all the images on a page by calling the Page.GetImages()
method that returns the set of images on the page.
public static void Example1()
{
using (PdfDocument document = PdfDocument.Open(@"D:\my_pdf_file1.pdf"))
{
foreach (Page page in document.GetPages())
{
IEnumerable<IPdfImage> images = page.GetImages();
Console.WriteLine("Total Images: {0}", images.Count());
}
}
}
The actual content of the image bytes is either:
ColorSpace
.Let's consider another example that extracts the images from a PDF file.
public static void Example2()
{
using (PdfDocument document = PdfDocument.Open(@"D:\my_pdf_file1.pdf"))
{
foreach (Page page in document.GetPages())
{
foreach (var image in page.GetImages())
{
if (!image.TryGetBytes(out var b))
{
b = image.RawBytes;
}
var type = string.Empty;
switch (image)
{
case XObjectImage ximg:
type = "XObject";
break;
case InlineImage inline:
type = "Inline";
break;
}
Console.WriteLine("Image with {0} bytes of type '{1}' on page {2}. Location: {3}.", b.Count, type, page.Number, image.Bounds);
}
}
}
}
Where the image is a JPEG decoding the bytes are not supported directly and IPdfImage.TryGetBytes(out var bytes)
will return false.
IPdfImage.RawBytes
is a valid JPEG file.RawBytes
are usually the bitmap with one or more PDF filters applied.IPdfImage.TryGetBytes(out var bytes)
will return the bytes after reversing these filters in PDF format.ColorSpace
.