html - How to extract particular text from pdf using php

Question


asked Jan 31, 2022 in Technique by 深蓝 (71.8m points)

I need to store a candidate's name and ID in a MySQL table. I have extracted the text using PdfParser:

<?php

// Include Composer autoloader if not already done.
include 'vendor/autoload.php';

// Parse pdf file and build necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf    = $parser->parseFile('C:\Desktop\Data\ApplicationForm.pdf');

$text = $pdf->getText();
echo $text;

?>

Right now it just displays the extracted text. Next, I need to extract the name and ID from the page that appears when I run the program above (the page filled with the extracted text). Using View Page Source, I found where the values I need are.

The ID appears on:

tr 1115×15, td.line-number 31×15 and td.line-content 1084×15, line number value = 12

The name appears on:

tr 1115×15, td.line-number 31×15 and td.line-content 1084×15, line number value = 13
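The "line number value" shown in View Page Source is simply the line number of the text the script echoed, so one possible shortcut is to skip the browser and split `$text` into lines directly in PHP. A minimal sketch, assuming the ID and name are always on lines 12 and 13 of the `getText()` output (as observed above) and nothing else is echoed before them. `lineAt()` is a hypothetical helper name, and the sample string only stands in for real PDF text:

```php
<?php
// Hypothetical helper: return line $lineNumber (1-based, like the numbers in
// View Page Source) of $text, or null if the text has fewer lines.
function lineAt(string $text, int $lineNumber): ?string
{
    // Split on the common line endings (\r\n, \r or \n).
    $lines = preg_split('/\r\n|\r|\n/', $text);
    $index = $lineNumber - 1; // PHP arrays are 0-based
    if ($index < 0 || !array_key_exists($index, $lines)) {
        return null;
    }
    return trim($lines[$index]);
}

// In the real script $text would come from $pdf->getText(); this stand-in
// string has 11 filler lines followed by an ID line and a name line.
$text = implode("\n", array_merge(array_fill(0, 11, 'header'), ['ID-0042', 'Jane Doe']));

$id   = lineAt($text, 12);
$name = lineAt($text, 13);
echo $id, PHP_EOL, $name, PHP_EOL;
```

With real PDFs it is worth checking the line numbers against one known file first, since the exact line layout depends on how PdfParser orders the extracted text.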

I'm lost at this point because I don't know how to get this information. Please help.

I have multiple PDFs, and the information I need is always in the same place (by "same place" I mean the same line in the page source, e.g. line number value = 13, with tr 1115×15, td.line-number 31×15 and td.line-content 1084×15). I just want to find a way to solve this problem.

If you have any doubts I will clarify, and if the question seems unclear I will improve it.
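Because every PDF puts the two values on the same lines, the same extraction can be applied in a loop. A sketch under that assumption: `extractCandidates()` is a hypothetical function that takes already-extracted texts keyed by file name. In the real script each text would come from `$parser->parseFile($path)->getText()` for every path returned by something like `glob('C:\...\*.pdf')` (the directory is a placeholder). The sample texts below are stand-ins:

```php
<?php
/**
 * Hypothetical helper: pull the ID (line 12) and name (line 13) out of each
 * extracted text. $texts maps a file name to the output of getText().
 * Files that are too short are reported instead of producing blank rows.
 */
function extractCandidates(array $texts, int $idLine = 12, int $nameLine = 13): array
{
    $records = [];
    $skipped = [];
    foreach ($texts as $file => $text) {
        $lines = preg_split('/\r\n|\r|\n/', $text);
        // Line numbers are 1-based; PHP array indexes are 0-based.
        $id   = $lines[$idLine - 1] ?? null;
        $name = $lines[$nameLine - 1] ?? null;
        if ($id === null || $name === null) {
            $skipped[] = $file;
            continue;
        }
        $records[] = ['file' => $file, 'id' => trim($id), 'name' => trim($name)];
    }
    return ['records' => $records, 'skipped' => $skipped];
}

// Stand-in data; in practice build this array from the parsed PDFs.
$padding = str_repeat("x\n", 11); // 11 filler lines
$texts = [
    'form1.pdf'  => $padding . "101\nAlice Example",
    'form2.pdf'  => $padding . "102\nBob Example\n",
    'broken.pdf' => "too short",
];
$result = extractCandidates($texts);
print_r($result);
```

Collecting skipped files separately makes it easier to spot a PDF whose layout differs from the others.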


1 Answer

深蓝 · Answer 1 · 2022-01-31T07:23:25+0000

I needed to extract the candidate's name and ID from a PDF, so after extracting the text with PdfParser I had PHP send the output as a downloadable text file instead of an HTML page:

<?php
// Send the output as a downloadable plain-text file instead of rendering it as HTML.
// Headers must be sent before any output.
$filename = 'filename.txt';
header('Content-Disposition: attachment; filename=' . $filename);
header('Content-Type: text/plain');

// Include Composer autoloader if not already done.
include 'C:\Users\Downloads\pdfparser-master (1)\pdfparser-master\vendor\autoload.php';

// Parse pdf file and build necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf    = $parser->parseFile('C:\Users\Desktop\Data\ApplicationForm (3).pdf');

$text = $pdf->getText();
echo $text;

?>
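Downloading through the browser works, but the script can also write the extracted text straight to disk with `file_put_contents()`, which removes the manual download step. A minimal sketch: the short `$text` string stands in for `$pdf->getText()` so the example runs without PdfParser, and writing to the system temp directory is an assumption (any writable path would do):

```php
<?php
// Stand-in for $pdf->getText(); in the real script use the parsed PDF's text.
$text = "line 1\nline 2\nline 3\n";

// Write the text to a file in the system temp directory (location is an assumption).
$filename = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'filename.txt';
if (file_put_contents($filename, $text) === false) {
    die("Could not write $filename");
}

// Read it back line by line, the same way the next script uses file().
$source = file($filename, FILE_IGNORE_NEW_LINES);
echo $source[1], PHP_EOL; // prints "line 2"
```

Reading the file back with `FILE_IGNORE_NEW_LINES` also keeps the trailing newline out of each value.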

I did this because the information I needed was on lines 12 and 13 of the page source, and that was true for every PDF I had. After downloading the output as a text file, I used the code below to read the lines I needed from the downloaded file and store them in the database:

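<?php

// Read the downloaded file into an array of lines, without trailing newlines.
// file() returns a 0-based array, so $source[12] is line 13 of the file.
$source = file("filename.txt", FILE_IGNORE_NEW_LINES);

$number = trim($source[12]);
$name   = trim($source[13]);
// urlencode() makes names with spaces or special characters safe in the link.
$gslink   = "https://www.google.co.in/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=google+scholar+" . urlencode($name);
$dblplink = "https://www.google.co.in/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=dblp+" . urlencode($name);

$servername = "127.0.0.1";
$username = "root";
$password = "";
$dbname = "mydb";
// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}

// Use a prepared statement so quotes in the extracted text cannot break (or inject into) the SQL.
$stmt = $conn->prepare("INSERT INTO faculty (candidate_no, candidate_name, gs_link, dblp_link) VALUES (?, ?, ?, ?)");
$stmt->bind_param("ssss", $number, $name, $gslink, $dblplink);
if ($stmt->execute()) {
    echo "New record created successfully";
} else {
    echo "Error: " . $stmt->error;
}

$stmt->close();
$conn->close();
?>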
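One detail worth checking: `file()` returns a 0-based array, so `$source[12]` and `$source[13]` are lines 13 and 14 of the file. Depending on the file, that may be one line later than the numbers seen in View Page Source. A sketch of a hypothetical helper, `buildFacultyRow()`, that takes 1-based line numbers (as View Page Source shows them), fails loudly when a line is missing, and URL-encodes the name before appending it to the search links; the sample lines are stand-ins:

```php
<?php
// Hypothetical helper: build the row for the faculty table from the lines of
// the downloaded file. $idLine and $nameLine are 1-based line numbers.
function buildFacultyRow(array $source, int $idLine, int $nameLine): array
{
    $number = trim($source[$idLine - 1] ?? '');
    $name   = trim($source[$nameLine - 1] ?? '');
    if ($number === '' || $name === '') {
        throw new RuntimeException("Line $idLine or $nameLine is missing or empty");
    }
    $base = "https://www.google.co.in/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=";
    return [
        'candidate_no'   => $number,
        'candidate_name' => $name,
        // urlencode() turns spaces into "+" and escapes characters such as "&".
        'gs_link'        => $base . 'google+scholar+' . urlencode($name),
        'dblp_link'      => $base . 'dblp+' . urlencode($name),
    ];
}

// Stand-in for file("filename.txt", FILE_IGNORE_NEW_LINES): 11 empty lines,
// then the ID on line 12 and the name on line 13.
$source = array_merge(array_fill(0, 11, ''), ['12345', 'Jane Doe']);
$row = buildFacultyRow($source, 12, 13);
print_r($row);
```

The four values in `$row` can then be passed to the prepared statement's `bind_param()` call in the same order as the columns.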
