Tuesday, 13 May 2014

Read PDF file content into string

Note: This article shows how to read the content of a PDF file and convert it into a string using C#.

The following steps will guide you through reading content from a PDF file:
  1. To start, download itextsharp-all-5.2.1, which can be downloaded from here.
  2. Extract the whole archive (including the archives inside the itextsharp-all-5.2.1 folder) to a local directory.
  3. Create a new Console project.
  4. Add itextsharp-all-5.2.1.dll as a reference.
  5. Use the following code:

    using System;
    using System.Text;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;

    namespace pdf2Text2
    {
        public class Program
        {
            // Path to the PDF file to read.
            static string pdfFile = @"D:\oshAden.pdf";

            static void Main(string[] args)
            {
                string text = ExtractTextFromPdf(pdfFile);
                Console.WriteLine(text);
                Console.ReadLine(); // keep the console window open
            }

            // Reads every page of the PDF and returns the combined text as one string.
            public static string ExtractTextFromPdf(string pdfFile)
            {
                PdfReader reader = new PdfReader(pdfFile);
                try
                {
                    StringBuilder sb = new StringBuilder();
                    // iTextSharp page numbers start at 1.
                    for (int i = 1; i <= reader.NumberOfPages; i++)
                    {
                        sb.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i));
                    }
                    return sb.ToString();
                }
                finally
                {
                    reader.Close(); // release the file even if extraction fails
                }
            }
        }
    }
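
The program above reads every page of the document. When only one page is needed, a minimal sketch could look like the following. The class name SinglePageReader and the helpers IsValidPage and ExtractTextFromPage are hypothetical names chosen for illustration. The only iTextSharp calls used are the ones already shown above: PdfReader, NumberOfPages, PdfTextExtractor.GetTextFromPage and Close.

```csharp
using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

namespace pdf2Text2
{
    // Hypothetical helper class, not part of iTextSharp.
    public static class SinglePageReader
    {
        // True when pageNumber lies in the 1-based range 1..pageCount.
        public static bool IsValidPage(int pageNumber, int pageCount)
        {
            return pageNumber >= 1 && pageNumber <= pageCount;
        }

        // Returns the text of a single page as a string.
        public static string ExtractTextFromPage(string pdfFile, int pageNumber)
        {
            PdfReader reader = new PdfReader(pdfFile);
            try
            {
                if (!IsValidPage(pageNumber, reader.NumberOfPages))
                {
                    throw new ArgumentOutOfRangeException(
                        "pageNumber", "The page number is outside the document.");
                }
                return PdfTextExtractor.GetTextFromPage(reader, pageNumber);
            }
            finally
            {
                reader.Close(); // release the file in every case
            }
        }
    }
}
```

Assuming the same sample file as above, a call such as SinglePageReader.ExtractTextFromPage(@"D:\oshAden.pdf", 1) would return the text of the first page. The range check runs before extraction, so an invalid page number raises an exception that names the offending argument.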

    Download this project: Click @ me
    Enjoy.
