How to crawl LinkedIn companies in a single line of Python code
Proxycurl API is a tool for developers. If you can code, chances are you will have no problem scraping a website, unless that website is LinkedIn. Keeping up with LinkedIn's bot detection and layout changes is a full-time job.
Proxycurl has just gone live with an update that introduces a new feature: the ability to turn LinkedIn Company Profile URLs into structured profile data.
In this article, I will share how you can fetch structured data from LinkedIn company profiles with no more than one long line of code.
The beauty of Proxycurl API is that you do not have to concern yourself with the tedious task of building a LinkedIn scraper. Be it one profile or a million profiles, today or three months from now, your code to get structured data of LinkedIn companies will remain the same one-liner of Python code.
What kind of data can you get by scraping LinkedIn company profiles?
With Proxycurl Company Profile Endpoint, you can get everything in the "About" page of a LinkedIn Company profile. This includes phone numbers, funding data, and office locations.
The following table lists everything that the Proxycurl Company Profile Endpoint returns:
Key | Description |
---|---|
description | The overview section in the About page |
website | The listed website on the profile |
industry | Industry of the company |
company_size | Listed range of company headcount |
company_size_on_linkedin | Total number of LinkedIn members who declared themselves to be staff of said company |
hq | Address of company's headquarters |
company_type | Enumerator of company type. Could be PUBLICLY_HELD or PRIVATELY_HELD |
founded_year | The year in which the company was founded |
specialities | List of specialities |
locations | List of locations |
name | Name of the company |
tagline | Tagline of the company |
universal_name_id | Unique name identifier of the company, as it appears in the company's LinkedIn URL (`gojektech` in the example below) |
profile_pic_url | URL of the company's profile picture |
background_cover_image_url | URL of the company's background cover image |
funding_data | Crunchbase funding data for said company |
phone | Phone number |
html_src | (Optional) HTML source of this LinkedIn profile. To have this value shown, please include src=include in the Proxycurl request. |
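The `html_src` row says to include `src=include` in the request. As a minimal sketch, and assuming that this means a query-string parameter sent alongside `url` (the article does not spell out where it goes), the parameters could be built like this. The helper name `build_query` and its `include_html` flag are made up for illustration; the endpoint URL is the one used in the example later in this article.

```python
from urllib.parse import urlencode

# Endpoint taken from the request example in this article.
API_ENDPOINT = 'https://nubela.co/proxycurl/api/linkedin/company'


def build_query(company_url, include_html=False):
    """Return the query parameters for a company profile request.

    Assumption: src=include travels in the query string next to url.
    """
    params = {'url': company_url}
    if include_html:
        params['src'] = 'include'
    return params


params = build_query('https://www.linkedin.com/company/gojektech/',
                     include_html=True)
# Show the full request URL that these parameters would produce.
print(API_ENDPOINT + '?' + urlencode(params))
```

The resulting dictionary can be passed as `params=` to `requests.get`, which takes care of the URL encoding itself.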
Example with code
Proxycurl Company Profile Endpoint is like any other REST API. Make a request, and get something back.
Here, I will make a request in Python with the requests library.
from pprint import pprint
import requests
api_endpoint = 'https://nubela.co/proxycurl/api/linkedin/company'
linkedin_profile_url = 'https://www.linkedin.com/company/gojektech/'
api_key = 'YOUR_API_KEY'
header_dic = {'Authorization': 'Bearer ' + api_key}
response = requests.get(api_endpoint,
                        params={'url': linkedin_profile_url},
                        headers=header_dic)
pprint(response.json())
In no more than 4 seconds, I get the following result back in JSON format:
{'background_cover_image_url': 'https://media-exp1.licdn.com/dms/image/C511BAQGaR4Ivd9F9-A/company-background_200/0?e=1596247200&v=beta&t=nik1PHzJd7m7EhRIFcqVMESIxzKc9idEvH14b-lUjqw',
'company_size': [201, 500],
'company_size_on_linkedin': 569,
'company_type': 'PRIVATELY_HELD',
'description': 'Gojek is a Super App. It’s one app for ordering food, '
'commuting, digital payments, shopping, hyper-local delivery, '
'and two dozen services. It is Indonesia’s first and '
'fastest-growing decacorn building an on-demand empire.\n'
'\n'
'In the last 36 months, the startup’s total order volumes have '
'grown to 1100x and diversified into 20+ verticals. The '
'company runs the equivalent of three Indian unicorns rolled '
'into one.\n'
'\n'
'A total of 2,000,000 drivers collectively cover an average '
'distance of 16.5 million kilometers each day, making Gojek '
'Indonesia’s de facto transportation partner. Gojek is a verb. '
'Gojek is a way of life. It is quite simply the operating '
'system of Indonesia. 400+ engineers spread across Jakarta, '
'Singapore and India make software decisions that impact '
'entire Southeast Asia.\n'
'\n'
'Gojek Tech is the product development and training center of '
'Gojek. The tech team comprises of developers, data '
'scientists, designers, and product managers who work on '
'product innovation, mining data, and crafting consumer '
'experiences. The average age of the team is 29 and it runs '
'one of the largest JRuby, Java and Clojure & Go clusters in '
'Asia.',
'founded_year': 2015,
'funding_data': {'$type': 'com.linkedin.voyager.organization.FundingData',
'companyCrunchbaseUrl': 'https://www.crunchbase.com/organization/go-jek?utm_source=linkedin&utm_medium=referral&utm_campaign=linkedin_companies&utm_content=profile_cta',
'fundingRoundListCrunchbaseUrl': 'https://www.crunchbase.com/organization/go-jek/funding_rounds/funding_rounds_list?utm_source=linkedin&utm_medium=referral&utm_campaign=linkedin_companies&utm_content=all_fundings',
'lastFundingRound': {'$type': 'com.linkedin.voyager.organization.FundingRound',
'announcedOn': {'$type': 'com.linkedin.common.Date',
'day': 3,
'month': 6,
'year': 2020},
'fundingRoundCrunchbaseUrl': 'https://www.crunchbase.com/funding_round/go-jek-series-f--983516e8?utm_source=linkedin&utm_medium=referral&utm_campaign=linkedin_companies&utm_content=last_funding',
'fundingType': 'SERIES_F',
'investorsCrunchbaseUrl': 'https://www.crunchbase.com/funding_round/go-jek-series-f--983516e8?utm_source=linkedin&utm_medium=referral&utm_campaign=linkedin_companies&utm_content=all_investors',
'leadInvestors': [{'$type': 'com.linkedin.voyager.organization.Investor',
'image': {'$type': 'com.linkedin.voyager.common.ImageViewModel',
'attributes': [{'$type': 'com.linkedin.voyager.common.ImageAttribute',
'imageUrl': 'https://media-exp1.licdn.com/media-proxy/ext?w=800&h=800&f=n&hash=YsQ89YghUULWcvihMyY2mOmaGp0%3D&ora=1%2CaFBCTXdkRmpGL2lvQUFBPQ%2CxAVta5g-0R6pgQ4UwRQj4b2E4F-i60NSRpbVDW68GXDp5IbcPzK9IJmOO_u_9wJLZ3VcwVNnYKroHG-wRozvRNavLIht0_jkJpD4cThXO0x6g1ZF_NZ0L0hw4cPyUr2oOXgB3u9KaT2xO-jhYVNvBCc7pPWKNNSWOVMW',
'sourceType': 'URL'}]},
'investorCrunchbaseUrl': 'https://www.crunchbase.com/organization/paypal?utm_source=linkedin&utm_medium=referral&utm_campaign=linkedin_companies&utm_content=investor',
'name': {'$type': 'com.linkedin.voyager.common.TextViewModel',
'text': 'PayPal'}},
{'$type': 'com.linkedin.voyager.organization.Investor',
'image': {'$type': 'com.linkedin.voyager.common.ImageViewModel',
'attributes': [{'$type': 'com.linkedin.voyager.common.ImageAttribute',
'imageUrl': 'https://media-exp1.licdn.com/media-proxy/ext?w=800&h=800&f=n&hash=GmeEU%2BXGM652rteZWAUKEz4nabs%3D&ora=1%2CaFBCTXdkRmpGL2lvQUFBPQ%2CxAVta5g-0R6pgQ4UwRQj4b2E4F-i60NSRpbVDW68GXDp5IbcPzK9IJmOO_u_9wJLZ3VcwVNnYKroHG-wRozvRNavLIht0_jkJpD4cThXO0x6g1ZF_NZ0Kx0h4I2uAezxay9fwPVSOzWtJu8',
'sourceType': 'URL'}]},
'investorCrunchbaseUrl': 'https://www.crunchbase.com/organization/facebook?utm_source=linkedin&utm_medium=referral&utm_campaign=linkedin_companies&utm_content=investor',
'name': {'$type': 'com.linkedin.voyager.common.TextViewModel',
'text': 'Facebook'}}],
'moneyRaised': {'$type': 'com.linkedin.common.MoneyAmount',
'amount': '375000000',
'currencyCode': 'USD'},
'numOtherInvestors': 0},
'numFundingRounds': 10,
'updatedAt': 1592331991},
'hq': {'city': 'Bengaluru',
'country': 'IN',
'geographic_area': 'Karnataka',
'is_hq': True,
'line_1': 'Diamond District, Tower-B, 4th Floor',
'postal_code': '560008'},
'industry': 'Internet',
'locations': [{'city': 'Bengaluru',
'country': 'IN',
'geographic_area': 'Karnataka',
'is_hq': True,
'line_1': 'Diamond District, Tower-B, 4th Floor',
'postal_code': '560008'},
{'city': 'Jakarta Selatan',
'country': 'ID',
'geographic_area': 'Jakarta',
'is_hq': False,
'line_1': 'Pasaraya Blok M, Jalan Sultan Iskandarsyah II No.1, '
'RT.3/RW.1, ',
'postal_code': '12160'},
{'city': 'Sinagpore',
'country': 'SG',
'geographic_area': 'Singapore',
'is_hq': False,
'line_1': '8 Shenton Way, AXA Tower',
'postal_code': '068811'},
{'city': 'Gurgaon',
'country': 'IN',
'geographic_area': 'Gurgaon',
'is_hq': False,
'line_1': '1st Floor, Tower A, Building 8A, DLF Cyber Hub',
'postal_code': '122002'}],
'name': 'Gojek Tech',
'phone': None,
'profile_pic_url': 'https://media-exp1.licdn.com/dms/image/C510BAQFYcIg1UROecg/company-logo_400_400/0?e=1604534400&v=beta&t=6xZrLFeAJUT5gMRHViZ4G3l4zAUH7_g6jsW0jTHuUNg',
'specialities': [],
'tagline': '',
'universal_name_id': 'gojektech',
'website': 'https://www.gojek.io'}
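Once the response is parsed, it is an ordinary Python dictionary. The following is a rough sketch of how a few headline fields could be pulled out of it. The helper names `format_address` and `summarize_company` are made up for illustration, and the nested `funding_data` keys are simply copied from the sample response shown above, so they may be absent or shaped differently for other companies.

```python
def format_address(location):
    """Join the non-empty parts of a location dict into one line."""
    keys = ('line_1', 'city', 'geographic_area', 'postal_code', 'country')
    # Some address lines in the sample end with ', ' so trim that off.
    parts = [str(location[k]).strip(' ,') for k in keys if location.get(k)]
    return ', '.join(parts)


def summarize_company(profile):
    """Pick a few headline fields out of a company profile dict."""
    size = profile.get('company_size') or [None, None]
    funding = profile.get('funding_data') or {}
    last_round = funding.get('lastFundingRound') or {}
    raised = last_round.get('moneyRaised') or {}
    return {
        'name': profile.get('name'),
        'headcount_range': tuple(size),
        'hq': format_address(profile.get('hq') or {}),
        'last_round_type': last_round.get('fundingType'),
        # The amount arrives as a string in the sample, so convert it.
        'last_round_amount': int(raised['amount']) if raised.get('amount') else None,
        'last_round_currency': raised.get('currencyCode'),
    }


# A trimmed copy of the sample response shown above.
sample = {
    'name': 'Gojek Tech',
    'company_size': [201, 500],
    'hq': {'city': 'Bengaluru', 'country': 'IN',
           'geographic_area': 'Karnataka', 'is_hq': True,
           'line_1': 'Diamond District, Tower-B, 4th Floor',
           'postal_code': '560008'},
    'funding_data': {
        'lastFundingRound': {
            'fundingType': 'SERIES_F',
            'moneyRaised': {'amount': '375000000', 'currencyCode': 'USD'},
        },
        'numFundingRounds': 10,
    },
}

summary = summarize_company(sample)
print(summary)
```

Every lookup goes through `.get()` with a fallback because fields such as `phone` and `tagline` come back as `None` or empty in the sample, and the same may be true of any other key.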
Managed service to scrape public LinkedIn profiles
I wish there were more, but this is it. Proxycurl manages the changes in LinkedIn layout and bot detection, so all you have to do is make the above request a million times with different companies and concern yourself with product design.
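Repeating the request for many companies is just a loop around the same call. Here is a minimal sketch, not an official client: the function name `fetch_companies`, the `http_get` hook and the `pause_seconds` option are made up for illustration, and treating any non-200 status as a failure is an assumption about how the API reports errors.

```python
import time

# Endpoint taken from the request example in this article.
API_ENDPOINT = 'https://nubela.co/proxycurl/api/linkedin/company'


def fetch_companies(company_urls, api_key, http_get=None, pause_seconds=0.0):
    """Fetch each company profile in turn; return (results, errors)."""
    if http_get is None:
        # Imported lazily so that a stand-in can be passed in for testing.
        import requests
        http_get = requests.get

    headers = {'Authorization': 'Bearer ' + api_key}
    results, errors = {}, {}
    for url in company_urls:
        response = http_get(API_ENDPOINT, params={'url': url}, headers=headers)
        if response.status_code == 200:
            results[url] = response.json()
        else:
            # Keep the status code so that failed URLs can be retried later.
            errors[url] = response.status_code
        if pause_seconds:
            time.sleep(pause_seconds)
    return results, errors
```

Calling `fetch_companies(['https://www.linkedin.com/company/gojektech/'], 'YOUR_API_KEY')` would return one dictionary of profiles keyed by URL and one dictionary of status codes for the URLs that failed.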
Leave the hard stuff to us.
API documentation can be found at https://nubela.co/proxycurl/docs#linkedin-api and you can try Proxycurl out immediately with 10 credits by entering your email at https://nubela.co/proxycurl/auth/login.