We want to benchmark our LinkedIn Company Page against a series of our competitors. As part of this, we want to collect data on their posts, and the engagement on their posts.
From a list of LinkedIn Company Pages -- e.g. (link removed):
Collect posts and their metadata (date published, social actions on the post, the user IDs of people who have engaged with the post, etc. -- a full list is published below).
We expect to collect at least 100 posts per page. Note that LinkedIn pages have an “infinite scroll” feature.
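Because of the infinite scroll, a scraper has to keep loading batches until it has enough posts or the feed stops producing new ones. The loop below is a minimal sketch of that idea only. `load_batch` is a hypothetical callable (it would wrap whatever scrolling or fetching mechanism the freelancer chooses), and `post_url` as the de-duplication key and `max_idle_rounds` are our assumptions.

```python
from typing import Callable, Dict, List


def collect_posts(
    load_batch: Callable[[], List[dict]],
    minimum: int = 100,
    max_idle_rounds: int = 3,
) -> List[dict]:
    """Call load_batch() until `minimum` unique posts are seen or the feed dries up.

    load_batch is a hypothetical callable that triggers one more "scroll"
    and returns the posts currently visible (possibly including repeats).
    """
    seen: Dict[str, dict] = {}
    idle_rounds = 0
    while len(seen) < minimum and idle_rounds < max_idle_rounds:
        before = len(seen)
        for post in load_batch():
            # De-duplicate on the post URL; keep the first copy seen.
            seen.setdefault(post["post_url"], post)
        # Count consecutive rounds that produced nothing new.
        idle_rounds = idle_rounds + 1 if len(seen) == before else 0
    return list(seen.values())
```

Stopping after a few idle rounds means a page with fewer than 100 posts ends the loop cleanly instead of scrolling forever.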
Please note: If successful, this may be the first of many similar jobs - we may hire more than one freelancer to work on the project simultaneously.
Please see attached JSON “LinkedIn post model” or this schema for example models of the data.
PAGE OBJECT:
- Page name
- Page URL
- Followers
- Employees
POST OBJECT:
- Post URL
- Author (page URL)
- Date (elapsed time since post. LinkedIn provides "fuzzy" dates -- e.g. "1d", "3d", "1w", "2m")
- Content (complete message text)
- Type of post (one of “post”, “shared article”, “shared post”, “linkedin video”, “external video”, “native document”)
- Number of engagements
- Number of comments
ENGAGER OBJECT:
- Post URL where the engagement occurs
- Name of engager
- Bio text of engager
- Profile URL of engager
- Type of engagement (one of “like”, “celebrate”, “love”, “insightful”, “curious”)
COMMENT OBJECT:
- Post URL where the comment occurs
- Name of commenter
- Bio text of commenter
- Profile URL of commenter
- Comment (complete message text)
- Date (elapsed time since comment. LinkedIn provides "fuzzy" dates -- e.g. "1d", "3d", "1w", "2m")
- Comment URL
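Since the post and comment dates are only available as "fuzzy" labels, it may help to store the raw label together with the collection time and derive an approximate timestamp from the two. The following is a rough sketch, not a required implementation. It assumes "m" means months (as the "2m" example above suggests) and treats a month as 30 days; the extra units "h", "mo", "y" and "yr" are our assumption and are not confirmed by the brief.

```python
import re
from datetime import datetime, timedelta

# Approximate unit lengths in days. "d", "w" and "m" come from the brief;
# "h", "mo", "y" and "yr" are assumed extras. "m" is read as months here.
_UNIT_DAYS = {"h": 1 / 24, "d": 1, "w": 7, "m": 30, "mo": 30, "y": 365, "yr": 365}
_FUZZY = re.compile(r"^\s*(\d+)\s*(h|d|w|mo|m|yr|y)\s*$", re.IGNORECASE)


def fuzzy_to_datetime(label: str, collected_at: datetime) -> datetime:
    """Estimate a publication time from a fuzzy label such as "3d" or "1w"."""
    match = _FUZZY.match(label)
    if match is None:
        raise ValueError(f"Unrecognised fuzzy date: {label!r}")
    count = int(match.group(1))
    unit = match.group(2).lower()
    return collected_at - timedelta(days=count * _UNIT_DAYS[unit])
```

For example, `fuzzy_to_datetime("3d", datetime(2024, 1, 10))` gives `datetime(2024, 1, 7)`. The result is only as precise as the label, so a "2m" post could be off by several weeks.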
Depending on the post type (e.g. “shared article”, “linkedin video”) we’ll want to collect a little additional metadata. In each case, we’d collect all the data listed above, plus the specific data listed below. We’ve listed what we believe to be the “special post types” below.
SHARED ARTICLE: Very common -- a clickable rich media sharing card with share image, headline and subhead. (Shared article example)
- post URL
- shared article URL (the URL behind the sharing card)
- title (bold text)
- subtitle (grey text - usually the root URL of the site where the shared article lives)
LINKEDIN VIDEO: Fairly rare -- a video has been uploaded to LinkedIn and appended to the post. It appears in an embedded viewer. (LinkedIn video example)
- post URL
- LinkedIn video URL
- views (integer)
EXTERNAL VIDEO: Fairly rare -- a video on an external platform (e.g. YouTube or Vimeo) has been linked from the post. It has much in common with the Shared Article type above. (External video example)
- post URL
- external video URL
- title (bold text)
- subtitle (grey text - usually the root URL of the site where the video lives)
NATIVE DOCUMENT: Relatively rare -- a document (e.g. PDF) has been uploaded to LinkedIn and appended to the post. It appears in an embedded viewer. (Native document example)
- post URL
- native document URL
SHARED POST: Rare -- the page has shared a post by another LinkedIn user as an embedded attachment to their own post. (Shared post example)
- post URL
- shared post URL
- shared post author name
- shared post author bio
- shared post author profile URL
- shared post content
- shared post date (elapsed time since post. LinkedIn provides "fuzzy" dates -- e.g. "1d", "3d", "1w", "2m")
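The special post types above can be expressed as a lookup from post type to the extra fields we expect, which makes it easy to check each collected record for gaps. This is an illustrative sketch only. The field names (`shared_article_url`, `views`, and so on) are hypothetical spellings of the bullet points above and need not match the attached JSON model.

```python
# Extra fields expected per post type, on top of the common POST OBJECT fields.
# Key names are hypothetical; they mirror the bullet lists in this brief.
EXTRA_FIELDS = {
    "post": (),
    "shared article": ("shared_article_url", "title", "subtitle"),
    "linkedin video": ("linkedin_video_url", "views"),
    "external video": ("external_video_url", "title", "subtitle"),
    "native document": ("native_document_url",),
    "shared post": (
        "shared_post_url",
        "shared_post_author_name",
        "shared_post_author_bio",
        "shared_post_author_profile_url",
        "shared_post_content",
        "shared_post_date",
    ),
}


def missing_fields(post: dict) -> list:
    """Return the type-specific fields that are absent or empty in a post record."""
    post_type = post.get("type")
    if post_type not in EXTRA_FIELDS:
        raise ValueError(f"Unknown post type: {post_type!r}")
    required = ("post_url",) + EXTRA_FIELDS[post_type]
    # A value of 0 (e.g. zero views) counts as present; None and "" do not.
    return [name for name in required if post.get(name) in (None, "")]
```

Running `missing_fields` over every record before delivery would flag, for instance, a "shared article" post that was scraped without its sharing-card URL.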
Please supply data as SQLite files (or MySQL exports). JSON is an acceptable alternative. We can discuss other alternatives, but note that we think flat file options are unlikely to suit the nature of the project.
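To illustrate why a relational format suits this data, here is one possible SQLite layout for the four objects described above, built with Python's standard `sqlite3` module. It is a sketch under our own assumptions, not a required schema: the table and column names are hypothetical, `employees` is stored as text in case LinkedIn shows a range, and dates are kept as the raw fuzzy label.

```python
import sqlite3

# Hypothetical schema mirroring the PAGE, POST, ENGAGER and COMMENT objects.
SCHEMA = """
CREATE TABLE IF NOT EXISTS page (
    page_url   TEXT PRIMARY KEY,
    name       TEXT NOT NULL,
    followers  INTEGER,
    employees  TEXT
);
CREATE TABLE IF NOT EXISTS post (
    post_url        TEXT PRIMARY KEY,
    author_page_url TEXT NOT NULL REFERENCES page(page_url),
    date_label      TEXT,
    content         TEXT,
    type            TEXT NOT NULL CHECK (type IN (
        'post', 'shared article', 'shared post',
        'linkedin video', 'external video', 'native document')),
    engagements     INTEGER,
    comments        INTEGER
);
CREATE TABLE IF NOT EXISTS engagement (
    post_url            TEXT NOT NULL REFERENCES post(post_url),
    engager_name        TEXT,
    engager_bio         TEXT,
    engager_profile_url TEXT NOT NULL,
    type                TEXT NOT NULL CHECK (type IN (
        'like', 'celebrate', 'love', 'insightful', 'curious')),
    PRIMARY KEY (post_url, engager_profile_url)
);
CREATE TABLE IF NOT EXISTS comment (
    comment_url           TEXT PRIMARY KEY,
    post_url              TEXT NOT NULL REFERENCES post(post_url),
    commenter_name        TEXT,
    commenter_bio         TEXT,
    commenter_profile_url TEXT,
    content               TEXT,
    date_label            TEXT
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure all four tables exist."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    conn.executescript(SCHEMA)
    return conn
```

One post has many engagers and many comments, which is the main reason a single flat file fits poorly. The type-specific fields for the special post types could live in extra nullable columns or in one table per type; we leave that choice open.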
Please supply the scripts used (preferably Python 3).