
Why is the API returning S3 links for profile pictures scraped from LinkedIn profiles?

When we scrape a public LinkedIn profile, for example with our Person Profile API, LinkedIn serves us a temporary CDN URL, which usually begins with https://media-exp.... Some of our customers store that temporary CDN URL directly, and then get burned when it expires unpredictably. To prevent this, we temporarily cache the image and return an S3 link instead, so the picture URL persists for a predictable period: 30 minutes.

This is by design, so developers can handle profile images properly by downloading them, instead of relying on LinkedIn's unreliable temporary CDN URLs.

The recommended approach with these temporary links is simple: download the image from the URL returned by our API as soon as you receive the response, then host the image on your end.

Why we do this

LinkedIn does not give us a stable, permanent profile picture URL. What we get upstream is usually a short-lived CDN link. If we were to pass that URL through directly, you would end up storing something that might break minutes later, or later the same day, with no predictable expiry behavior.

So instead, we cache the image briefly on S3 and return that cached link to you. This gives you a small but reliable window to fetch the image and store it yourself.

That 30-minute window is intentional. It is long enough for your application or worker queue to download the image. It is not intended to be permanent asset hosting.
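
If your downloads run from a worker queue, it can help to check whether a queued URL is probably still inside that window before fetching it. Here is a minimal sketch, assuming the 30 minutes start roughly when you receive the API response (this article does not state the exact starting point). The helper name is_still_fetchable and the 5-minute safety margin are hypothetical choices for illustration, not part of the API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# 30 minutes is the cache window stated in this article.
CACHE_WINDOW = timedelta(minutes=30)
# Hypothetical safety margin so a download does not start right at the edge.
SAFETY_MARGIN = timedelta(minutes=5)


def is_still_fetchable(received_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if a URL received at `received_at` is likely still valid.

    Assumption: the window is counted from the moment the API response arrived.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - received_at < CACHE_WINDOW - SAFETY_MARGIN
```

If the check fails, the safer move is probably to call the API again for a fresh URL rather than attempt the download.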

What you should do instead

If you need the profile picture for long-term use, do this:

  1. Call the API.
  2. Read the profile_pic_url returned in the response.
  3. Download the image immediately.
  4. Store it in your own object storage, CDN, or database-backed file store.
  5. Serve that stored image from your own infrastructure.

That is the correct integration pattern.
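
The steps above can be sketched in Python as follows. This is a rough illustration, not an official client: save_profile_picture and fetch_bytes are hypothetical helper names, the local directory stands in for whatever object storage you actually use, and the ".img" extension is a placeholder because the image format is not specified here.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable


def fetch_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw image bytes from the temporary URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def save_profile_picture(
    profile_pic_url: str,
    storage_dir: Path,
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> Path:
    """Download the image right away and keep a copy you control."""
    image_bytes = fetch(profile_pic_url)
    if not image_bytes:
        raise ValueError("empty image response")
    # Name the file by content hash so the stored copy
    # does not depend on the temporary URL in any way.
    digest = hashlib.sha256(image_bytes).hexdigest()
    storage_dir.mkdir(parents=True, exist_ok=True)
    path = storage_dir / f"{digest}.img"
    path.write_bytes(image_bytes)
    return path
```

In use, you would pass in the profile_pic_url value from the API response as soon as it arrives, then store the returned path (or your own storage key) in your database instead of the S3 URL.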

If you store our returned S3 URL directly in your database and expect it to remain valid forever, it will break. That is not a bug. That is the expected behavior.

About Proxycurl

Note: The Proxycurl API has been sunset. I am the founder behind Proxycurl, and I now work on NinjaPear. If you landed on this post from older Proxycurl documentation or integrations, that is why you are seeing the old product name here.

The core behavior described in this article remains the same: when dealing with profile pictures scraped from LinkedIn profiles, you should treat returned image URLs as temporary fetch URLs, not permanent storage.

The short version

The API is returning S3 links for profile pictures scraped from LinkedIn profiles because LinkedIn's own image URLs are temporary, and we need to give you a predictable window to download the image safely.

Download first. Host it yourself after that.

If you are building against NinjaPear now and are unsure how to handle image fields in your pipeline, reach out to us before you bake the wrong storage assumption into production. It is a small implementation detail, but it is exactly the kind of detail that causes annoying failures later.

Steven Goh | CEO
World's laziest CEO. CEO of NinjaPear. Ex-Founder of Proxycurl (10+M), Steven founded 5 other startups: Gom VPN, Kloudsec, SilvrBullet, NuMoney, and SharedHere.
