How to crawl LinkedIn companies in a single line of Python code
Scrape LinkedIn company profiles in a single line of code

Proxycurl API is a tool for developers. If you can code, chances are you will have no problem scraping most websites, unless the website is LinkedIn. Keeping up with LinkedIn's bot detection and layout changes is a full-time job.

Proxycurl has just gone live with an update that introduces a new feature: the ability to turn LinkedIn Company Profile URLs into structured profile data.
In this article, I will show how you can fetch structured data from LinkedIn company profiles with no more than one (long) line of code.

The beauty of Proxycurl API is that you do not have to concern yourself with the tedious task of building a LinkedIn scraper. Be it one profile or a million profiles, today or three months from now, your code to get structured data about LinkedIn companies will remain the same one-line Python call.

What kind of data can you get by scraping LinkedIn company profiles?

With the Proxycurl Company Profile Endpoint, you can get everything on the "About" page of a LinkedIn company profile, including phone numbers, funding data, and office locations.

The following table lists every field that the Proxycurl Company Profile Endpoint returns:

| Key | Description |
|---|---|
| description | The overview section on the About page |
| website | The website listed on the profile |
| industry | Industry of the company |
| company_size | Listed headcount range, as a [min, max] pair |
| company_size_on_linkedin | Number of LinkedIn members who list themselves as staff of the company |
| hq | Address of the company's headquarters |
| company_type | Enumerator of company type, for example PUBLICLY_HELD or PRIVATELY_HELD |
| founded_year | The year the company was founded |
| specialities | List of specialities |
| locations | List of office locations |
| name | Name of the company |
| tagline | Tagline of the company |
| universal_name_id | The company's universal name, i.e. the identifier in its LinkedIn URL (e.g. gojektech) |
| profile_pic_url | URL of the company's profile picture |
| background_cover_image_url | URL of the profile's background cover image |
| funding_data | The company's funding data, sourced from Crunchbase |
| phone | Phone number |
| html_src | (Optional) HTML source of this LinkedIn profile. To have this value included, add src=include to the Proxycurl request. |
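The sketch below shows how these fields fit together by condensing a response dictionary into a one-line summary. It is only an illustration. It assumes the field names listed above and the `[min, max]` shape of `company_size` seen in the example response later in this article. `summarize_company` is a hypothetical helper, not part of any Proxycurl library. Treating a `None` upper bound as an open-ended range is also an assumption.

```python
from typing import Any, Dict


def summarize_company(company: Dict[str, Any]) -> str:
    """Condense a Company Profile Endpoint response into one line (hypothetical helper)."""
    # Prefer the display name; fall back to the universal name ID from the URL
    parts = [company.get('name') or company.get('universal_name_id') or 'Unknown company']

    if company.get('industry'):
        parts.append(company['industry'])
    if company.get('founded_year'):
        parts.append(f"founded {company['founded_year']}")

    # Assumption: company_size is a [min, max] pair as in the example, e.g. [201, 500];
    # a missing upper bound (None) is rendered as an open-ended range
    size = company.get('company_size')
    if size and len(size) == 2 and size[0] is not None:
        low, high = size
        parts.append(f'{low}-{high} employees' if high is not None else f'{low}+ employees')

    # Use hq if present, otherwise the entry in locations flagged with is_hq
    hq = company.get('hq') or next(
        (loc for loc in company.get('locations') or [] if loc.get('is_hq')), None)
    if hq:
        parts.append(f"HQ: {hq.get('city')}, {hq.get('country')}")

    return ' | '.join(parts)
```

For the Gojek Tech response shown later, `summarize_company(response.json())` would produce `Gojek Tech | Internet | founded 2015 | 201-500 employees | HQ: Bengaluru, IN`.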

Example with code

The Proxycurl Company Profile Endpoint works like any other REST API: make a request, and get JSON back.
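Concretely, each call is a GET request to the endpoint. The LinkedIn company URL goes in the `url` query parameter, and your API key goes in an `Authorization: Bearer` header. The stdlib-only sketch below builds that request with `urllib.request` without sending it, which can help when debugging exactly what goes over the wire. The endpoint and header format are taken from the requests example in this section. The optional `src=include` parameter comes from the `html_src` row of the field table. `build_company_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_ENDPOINT = 'https://nubela.co/proxycurl/api/linkedin/company'


def build_company_request(linkedin_url: str, api_key: str,
                          include_html: bool = False) -> Request:
    """Build (but do not send) a GET request for the Company Profile Endpoint."""
    params = {'url': linkedin_url}
    if include_html:
        # Per the field table, src=include asks for the html_src field
        params['src'] = 'include'
    # urlencode percent-escapes the LinkedIn URL so it is safe inside a query string
    full_url = f'{API_ENDPOINT}?{urlencode(params)}'
    return Request(full_url, headers={'Authorization': 'Bearer ' + api_key}, method='GET')


req = build_company_request('https://www.linkedin.com/company/gojektech/', 'YOUR_API_KEY')
print(req.full_url)
# https://nubela.co/proxycurl/api/linkedin/company?url=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2Fgojektech%2F
```

Passing `req` to `urllib.request.urlopen` would send it. The `requests` library builds an equivalent query string for you from its `params` argument.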

Here, I will make a request in Python with the requests library.


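from pprint import pprint

import requests

api_endpoint = 'https://nubela.co/proxycurl/api/linkedin/company'
linkedin_profile_url = 'https://www.linkedin.com/company/gojektech/'
api_key = 'YOUR_API_KEY'  # replace with your own Proxycurl API key

# The API key is sent as a Bearer token in the Authorization header
header_dic = {'Authorization': 'Bearer ' + api_key}

# The LinkedIn company URL goes in the `url` query parameter
response = requests.get(api_endpoint,
                        params={'url': linkedin_profile_url},
                        headers=header_dic)
response.raise_for_status()  # raise an exception on HTTP errors such as a bad API key

pprint(response.json())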

In no more than 4 seconds, I get this result back as JSON (shown here pretty-printed as a Python dictionary):

{'background_cover_image_url': 'https://media-exp1.licdn.com/dms/image/C511BAQGaR4Ivd9F9-A/company-background_200/0?e=1596247200&v=beta&t=nik1PHzJd7m7EhRIFcqVMESIxzKc9idEvH14b-lUjqw',
 'company_size': [201, 500],
 'company_size_on_linkedin': 569,
 'company_type': 'PRIVATELY_HELD',
 'description': 'Gojek is a Super App. It’s one app for ordering food, '
                'commuting, digital payments, shopping, hyper-local delivery, '
                'and two dozen services. It is Indonesia’s first and '
                'fastest-growing decacorn building an on-demand empire.\n'
                '\n'
                'In the last 36 months, the startup’s total order volumes have '
                'grown to 1100x and diversified into 20+ verticals. The '
                'company runs the equivalent of three Indian unicorns rolled '
                'into one.\n'
                '\n'
                'A total of 2,000,000 drivers collectively cover an average '
                'distance of 16.5 million kilometers each day, making Gojek '
                'Indonesia’s de facto transportation partner. Gojek is a verb. '
                'Gojek is a way of life. It is quite simply the operating '
                'system of Indonesia. 400+ engineers spread across Jakarta, '
                'Singapore and India make software decisions that impact '
                'entire Southeast Asia.\n'
                '\n'
                'Gojek Tech is the product development and training center of '
                'Gojek. The tech team comprises of developers, data '
                'scientists, designers, and product managers who work on '
                'product innovation, mining data, and crafting consumer '
                'experiences. The average age of the team is 29 and it runs '
                'one of the largest JRuby, Java and Clojure & Go clusters in '
                'Asia.',
 'founded_year': 2015,
 'funding_data': {'$type': 'com.linkedin.voyager.organization.FundingData',
                  'companyCrunchbaseUrl': 'https://www.crunchbase.com/organization/go-jek?utm_source=linkedin&utm_medium=referral&utm_campaign=linkedin_companies&utm_content=profile_cta',
                  'fundingRoundListCrunchbaseUrl': 'https://www.crunchbase.com/organization/go-jek/funding_rounds/funding_rounds_list?utm_source=linkedin&utm_medium=referral&utm_campaign=linkedin_companies&utm_content=all_fundings',
                  'lastFundingRound': {'$type': 'com.linkedin.voyager.organization.FundingRound',
                                       'announcedOn': {'$type': 'com.linkedin.common.Date',
                                                       'day': 3,
                                                       'month': 6,
                                                       'year': 2020},
                                       'fundingRoundCrunchbaseUrl': 'https://www.crunchbase.com/funding_round/go-jek-series-f--983516e8?utm_source=linkedin&utm_medium=referral&utm_campaign=linkedin_companies&utm_content=last_funding',
                                       'fundingType': 'SERIES_F',
                                       'investorsCrunchbaseUrl': 'https://www.crunchbase.com/funding_round/go-jek-series-f--983516e8?utm_source=linkedin&utm_medium=referral&utm_campaign=linkedin_companies&utm_content=all_investors',
                                       'leadInvestors': [{'$type': 'com.linkedin.voyager.organization.Investor',
                                                          'image': {'$type': 'com.linkedin.voyager.common.ImageViewModel',
                                                                    'attributes': [{'$type': 'com.linkedin.voyager.common.ImageAttribute',
                                                                                    'imageUrl': 'https://media-exp1.licdn.com/media-proxy/ext?w=800&h=800&f=n&hash=YsQ89YghUULWcvihMyY2mOmaGp0%3D&ora=1%2CaFBCTXdkRmpGL2lvQUFBPQ%2CxAVta5g-0R6pgQ4UwRQj4b2E4F-i60NSRpbVDW68GXDp5IbcPzK9IJmOO_u_9wJLZ3VcwVNnYKroHG-wRozvRNavLIht0_jkJpD4cThXO0x6g1ZF_NZ0L0hw4cPyUr2oOXgB3u9KaT2xO-jhYVNvBCc7pPWKNNSWOVMW',
                                                                                    'sourceType': 'URL'}]},
                                                          'investorCrunchbaseUrl': 'https://www.crunchbase.com/organization/paypal?utm_source=linkedin&utm_medium=referral&utm_campaign=linkedin_companies&utm_content=investor',
                                                          'name': {'$type': 'com.linkedin.voyager.common.TextViewModel',
                                                                   'text': 'PayPal'}},
                                                         {'$type': 'com.linkedin.voyager.organization.Investor',
                                                          'image': {'$type': 'com.linkedin.voyager.common.ImageViewModel',
                                                                    'attributes': [{'$type': 'com.linkedin.voyager.common.ImageAttribute',
                                                                                    'imageUrl': 'https://media-exp1.licdn.com/media-proxy/ext?w=800&h=800&f=n&hash=GmeEU%2BXGM652rteZWAUKEz4nabs%3D&ora=1%2CaFBCTXdkRmpGL2lvQUFBPQ%2CxAVta5g-0R6pgQ4UwRQj4b2E4F-i60NSRpbVDW68GXDp5IbcPzK9IJmOO_u_9wJLZ3VcwVNnYKroHG-wRozvRNavLIht0_jkJpD4cThXO0x6g1ZF_NZ0Kx0h4I2uAezxay9fwPVSOzWtJu8',
                                                                                    'sourceType': 'URL'}]},
                                                          'investorCrunchbaseUrl': 'https://www.crunchbase.com/organization/facebook?utm_source=linkedin&utm_medium=referral&utm_campaign=linkedin_companies&utm_content=investor',
                                                          'name': {'$type': 'com.linkedin.voyager.common.TextViewModel',
                                                                   'text': 'Facebook'}}],
                                       'moneyRaised': {'$type': 'com.linkedin.common.MoneyAmount',
                                                       'amount': '375000000',
                                                       'currencyCode': 'USD'},
                                       'numOtherInvestors': 0},
                  'numFundingRounds': 10,
                  'updatedAt': 1592331991},
 'hq': {'city': 'Bengaluru',
        'country': 'IN',
        'geographic_area': 'Karnataka',
        'is_hq': True,
        'line_1': 'Diamond District, Tower-B, 4th Floor',
        'postal_code': '560008'},
 'industry': 'Internet',
 'locations': [{'city': 'Bengaluru',
                'country': 'IN',
                'geographic_area': 'Karnataka',
                'is_hq': True,
                'line_1': 'Diamond District, Tower-B, 4th Floor',
                'postal_code': '560008'},
               {'city': 'Jakarta Selatan',
                'country': 'ID',
                'geographic_area': 'Jakarta',
                'is_hq': False,
                'line_1': 'Pasaraya Blok M, Jalan Sultan Iskandarsyah II No.1, '
                          'RT.3/RW.1, ',
                'postal_code': '12160'},
               {'city': 'Sinagpore',
                'country': 'SG',
                'geographic_area': 'Singapore',
                'is_hq': False,
                'line_1': '8 Shenton Way, AXA Tower',
                'postal_code': '068811'},
               {'city': 'Gurgaon',
                'country': 'IN',
                'geographic_area': 'Gurgaon',
                'is_hq': False,
                'line_1': '1st Floor, Tower A, Building 8A, DLF Cyber Hub',
                'postal_code': '122002'}],
 'name': 'Gojek Tech',
 'phone': None,
 'profile_pic_url': 'https://media-exp1.licdn.com/dms/image/C510BAQFYcIg1UROecg/company-logo_400_400/0?e=1604534400&v=beta&t=6xZrLFeAJUT5gMRHViZ4G3l4zAUH7_g6jsW0jTHuUNg',
 'specialities': [],
 'tagline': '',
 'universal_name_id': 'gojektech',
 'website': 'https://www.gojek.io'}
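The `funding_data` field is the most deeply nested part of the response. Below is a minimal sketch for flattening the latest funding round. It assumes the key names shown in this example response (`lastFundingRound`, `announcedOn`, `moneyRaised`, `leadInvestors`) stay stable. `latest_funding_round` is a hypothetical helper.

```python
from datetime import date
from typing import Any, Dict, Optional


def latest_funding_round(company: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Flatten funding_data.lastFundingRound into a small dict, or return None."""
    funding = company.get('funding_data') or {}
    last = funding.get('lastFundingRound')
    if not last:
        return None

    # announcedOn is a {'day', 'month', 'year'} object in the example response
    announced = last.get('announcedOn') or {}
    announced_on = None
    if all(key in announced for key in ('year', 'month', 'day')):
        announced_on = date(announced['year'], announced['month'], announced['day'])

    # moneyRaised.amount is a string in the example ('375000000'), so convert it
    money = last.get('moneyRaised') or {}
    amount = int(money['amount']) if money.get('amount') else None

    # Each lead investor's display name sits under name.text
    investors = [(inv.get('name') or {}).get('text')
                 for inv in last.get('leadInvestors') or []]

    return {
        'type': last.get('fundingType'),
        'announced_on': announced_on,
        'amount': amount,
        'currency': money.get('currencyCode'),
        'lead_investors': investors,
    }
```

For the Gojek Tech response above, this would describe a SERIES_F round of 375000000 USD announced on 2020-06-03, led by PayPal and Facebook.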

Managed service to scrape public LinkedIn profiles

I wish there were more to it, but that is all. Proxycurl handles LinkedIn's layout changes and bot detection, so all you have to do is make the above request once for each company you care about, whether that is ten companies or a million, and focus on product design.
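Making the request once per company is a plain loop over URLs. The sketch below uses only the standard library (`urllib`) so it has no third-party dependency. It maps each URL to its profile, or to `None` if every attempt failed. `fetch_company` and `crawl_companies` are hypothetical helpers. The retry count and exponential backoff are illustrative assumptions, not documented Proxycurl behaviour. In practice you would likely skip retries for errors a retry cannot fix, such as an invalid API key.

```python
import json
import time
from typing import Any, Callable, Dict, Iterable, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = 'https://nubela.co/proxycurl/api/linkedin/company'

Profile = Dict[str, Any]


def fetch_company(linkedin_url: str, api_key: str) -> Profile:
    """Fetch one company profile with the stdlib (mirrors the requests example)."""
    query = urlencode({'url': linkedin_url})
    req = Request(f'{API_ENDPOINT}?{query}',
                  headers={'Authorization': 'Bearer ' + api_key})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


def crawl_companies(urls: Iterable[str], api_key: str,
                    fetch: Callable[[str, str], Profile] = fetch_company,
                    retries: int = 2,
                    backoff_s: float = 1.0) -> Dict[str, Optional[Profile]]:
    """Map each LinkedIn company URL to its profile dict, or None if every attempt failed."""
    results: Dict[str, Optional[Profile]] = {}
    for url in urls:
        for attempt in range(retries + 1):
            try:
                results[url] = fetch(url, api_key)
                break
            except OSError:
                # urllib's HTTPError and URLError are both OSError subclasses.
                # Simplification: every failure is retried, even ones a retry cannot fix.
                if attempt == retries:
                    results[url] = None
                else:
                    # Exponential backoff: backoff_s, 2 * backoff_s, 4 * backoff_s, ...
                    time.sleep(backoff_s * 2 ** attempt)
    return results
```

Calling `crawl_companies(list_of_urls, 'YOUR_API_KEY')` returns a dict keyed by URL. The `fetch` parameter lets you plug in your own function instead, for example one wrapping the `requests` call shown earlier.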

Leave the hard stuff to us.

API documentation can be found at https://nubela.co/proxycurl/docs#linkedin-api and you can try Proxycurl out immediately with 10 credits by entering your email at https://nubela.co/proxycurl/auth/login.

Steven Goh | CEO