Skip to content

Instantly share code, notes, and snippets.

@Rishabh04-021
Last active October 25, 2019 12:33
Show Gist options
  • Select an option

  • Save Rishabh04-021/8d814ac7706b35c395559178c21e2d0b to your computer and use it in GitHub Desktop.

Select an option

Save Rishabh04-021/8d814ac7706b35c395559178c21e2d0b to your computer and use it in GitHub Desktop.

Design Document for PDF Generator

Use Cases

  • Generate pdf for form responses - This is the only one we are catering right now.

Design Specifications :

Google variable mapping Spreadsheet

  • It should contain the mapping details in sheet named mappingDetails.

  • It should have the options sheet named optionSheet.

  • The mapping sheet (mappingDetails) will have variable mapping in the following format:

variableInTemplate variableType variableInRawData
0 text studentFirstName
1 text studentLastName
2 imageLink studentImage
3 text studentRollNo
4 imageLink studentSignature
5 options myVariable1
6 options myVariable2
7 options myVariable3
  • The options sheet (optionSheet) will have the mapping details in the following format:
myVariable1 a::first option b::this might c::value d::value
myVariable2 a::this is an option b::that value c::value d::value e::value f::values here g::value available
myVariable3 a::that option b::this is variable c::value d::value
myVariable4 a::this is answer b::it is not true c::value d::value e::values here f::not available g::orange

Template Document

  • The template document should have the placeholders in the following format:
Student Detail Document.

The name of student is <<0>> <<1>>
The roll number of student is <<3>>
The picture of student is <<2>>
The signature of student is <<4>>
.
.
The students response to Question 1 is <<5>>

Input parameters for API:

  1. Variable Mapping Spreadsheet id (variable name - varMappingID)
  2. Docs template id (variable name - docTemplateID)
  3. data in json format (variable name - dataAll). This should also contain uuid, this will be used to check the status of the ongoing requests.

Note - The data should be a dictionary like {"data Header": "data Value",}

Output of API:

  1. I'll return the current status of the process or the url of the pdf if it is generated.
  • Output If pdf is generated:
{
	"status":"completed",
	"url":"url_here"
}
  • Output if the process is in queue:
{
	"status":"queue"
}

Requirements and the Point of failures for PDF Generation:

  • The API will not work if any of the input parameters are entered incorrectly.
    • E.g. if the template id is entered wrong.
    • The correction of input parameters will be checked by trying to get data from the urls. and an exception will be returned if some error is found error in input parameters.
  • The API will only accept post requests.
  • It will not work if the authenticated account did not have the readonly permission for the Spreadsheet and the read write permission to the google drive of the authenticated account (The template document will be present in this drive).
    • User authentication is not being done at this point. So the user has to make and account (authenticated account) the owner of the template id.
  • The API will not work if the mapping is incorrect, means a variable mapped in the mapping sheet to a value which is not present in the variable dictionary dataAll. It will return an error wrong mapping.
  • It will not work if the app script is installed on some other drive than that of authenticated account.
  • It will work only if the app script is published as a web app and has the following scopes enabled
https://www.googleapis.com/auth/documents
https://www.googleapis.com/auth/drive
https://www.googleapis.com/auth/script.external_request
  • If the url (of image) present in the excel sheet is wrong, then the pdf will be generated with a sample image in its place.
  • If the data is not found for a template it'll write No data found in the template.
  • The temporary document in the drive will be deleted once its pdf is uploaded to the gcs.
  • The API will queue the requests and will work for single pdf generation.
  • The filename on the drive and the gcs will be generated automatically. The user can't supply a custom file name.

Note - API and user authentication is not being implemented at this point of time.

Database format

  • Why use database?
    • We need to store the details for all the pdf's we are generating. So, that we can process queries related to them even after 2 months of generation of pdf. Also, we are using queuing so need a database to search from.
  • Database to be used - plsql
  • Database schema -
uuid, google doc url (returned by the app script) , pdf name, current status (waiting, processing, complete), GCS URL

Note - This design doc is designed keeping in mind the single use case. But can adapt to all the similar use cases. Database is designed keeping in mind the same.

FAQ's

  • What will the API return?
    • It will return the final url of the pdf uploaded to GCS or the current status.
  • How will you get the pdf url if it is 300th request in 1 minute?
    • The API will implement queuing and in case if it has a lot of requests then it'll return response "request in queue"
  • How will you check if the pdf is generated for a particular request?
    • It will have a module that will check the status of your request by providing the uuid. If the pdf is generated and uploaded to GCS then it'll return the url of uploaded PDF.
  • When the template placeholder (<<19>>) for an image/text is not found, what will it do.
    • If the placeholder for an image/text is not found then no error or exception will be thrown, a sample image will be placed and No data found will be written in case of text. The PDF will still be generated using the provided data and arguments.

Things to keep in mind for me

  • Write content of body of one document to other document and then saving it.
  • Modify urls to actual ones. Send authentication tokens along with them to fetch the image.
  • Fetch the pdf url from instance id.
  • Store things in database.
  • Implement queue model.

Modules and Components of the project

  1. Getting all the data and checking if it is in proper format (It should include uuid as that will be used for checking the status of the project).

  2. Fetching data from google spreadsheets on the basis of the data received on the API.

  3. Variable mapping, including mapping the options.

  4. Correcting and Checking the urls.

    4.1. Modify urls to actual ones.

  5. Generating the App Script url for the received data.

  6. Queuing the data to be sent to the App script.

  7. Appscript processing the data and generating a new file.

    7.1. Send authentication tokens along with them to fetch the image. 7.2. Write content of body of one document to other document and then saving it.

  8. Downloading the pdf when generated successfully.

  9. Uploading the pdf to GCS and getting a return url

  10. Adding the contents to the database.

  11. Creating a module to check if the pdf is generated for a particular uuid. Returning the status of request or the final url of the uploaded pdf. It will check the status of process in queue and then in the database and will return the appropriate response.

Development Timeline

Module Time to develop Comments
1 ~40 mins New module. Doesn't exist already.
2 ~10 mins Modifying the existing module.
3 ~70 mins 20 mins for mapping the variables. 50 mins (hopefully) for mapping the options.
4 ~10 mins 10 mins for modifying the urls.
5 ~20 mins It'll be generated on the basis of the collected data.
6 ~120 mins I've never implemented anything like this before. So, considering the time on the basis of that fact.
7.1 ~60 mins I have to fetch the images after doing authentication. Not sure how to do it.
7.2 ~40 mins We tried various things, nothing worked. Have to find a new efficient method to do it. If nothing worked then we can implement the currently implemented method.
8 ~20 mins will implement threading here.
9 ~20 mins will do it using threading.
10 ~60 mins Create all the required modules related to databases.
11 ~60 mins Have to create all related modules.

Limitations

  • We are not authenticating the API at this point.
  • No user authentication is available.
  • The user have to grant write access to admin@samagragovernance (or any other authentication account) the document template.

Future Scope

  • The project can work for various similar use cases.
  • API and user authentication.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment