The 10 year old indie mobile app
What worked for the long haul, what we cut out over time, and the boring tech that keeps the app alive
Hi, I’m Taylor Hughes. I’m a software engineer. I have shipped apps and built teams at Facebook, Google, Clubhouse and a bunch of start-ups in between.
We built a photo sharing app called Cluster in 2013. It launched in the App Store 10 years ago next month. At the time, there were a lot of competitors, with more launching seemingly every day.
By 2015, we figured out that Cluster wasn’t going to grow to be a billion-user product. We tried a million different strategies, and we could never get growth to the place it needed to be for a VC-sized return. But the app slowly kept finding new users, and lots of people connected with their loved ones inside the app — including our own families. So we worked hard to keep it going.
Cluster is still alive today, and most of its original competitors don’t exist anymore (including heavy hitters like Facebook Moments and Dropbox Carousel). More than 4 million people have shared almost 150 million photos and videos over the last decade on the platform. We’ve seen long term family and friend groups stay on the app for the full 10 years, and thousands of long term users still use the app almost every day.
Over the years, we’ve had to maintain Cluster’s original software, slowly upgrading and changing core parts of the app to fix bugs, drive down costs and try to make enough money to keep the business alive. We added a way to print photo books (2016) and an App Store subscription model (2022). We migrated between AWS accounts and moved some services to GCP. We’ve fixed bugs from new third party SDKs and performed major platform upgrades across all six languages: Python, Go, JavaScript, Swift, Objective-C, and Java.
And a lot of the original code remains! But we’ve learned a lot of valuable lessons in building software for the long term.
Where we’ve been, and where we are now
The original backend in 2013 was a Python 2.7 monolith, running Django 1.4 with a Postgres database. At the time, I used fabric to deploy with git to custom AMIs on EC2. Last summer I upgraded to Python 3, and we’re running on Elastic Beanstalk on brand new EC2 Graviton instances. We now deploy with CodePipeline.
As for the mobile apps, the original iOS app was built against the iOS 6.0 SDK. I think Android targeted 3.x, Honeycomb. I distinctly remember having to deal with the iOS 7 release — remember how everything became flat all of a sudden? (Maybe you don’t.) Our current minimum SDK target is iOS 14, and we’re using a bespoke mix of Swift (original release date: June 2014) and Objective-C. The Android app is behind, but we’ll circle back to spruce that up as soon as we can.
So, what stood the test of time?
Optimizations often aren’t worth maintaining
One of the biggest lessons speaks to my own growth as an engineer over the last decade: Novelty and optimization are the enemy of durable, long term code.
In 2013, I heard about how Instagram managed to share your photos extra-super-duper fast. They have this neat trick: When you choose a photo and start playing with filters, they are already uploading the photo in the background. So when you finally hit “Post”, boom! The photo seems to post instantly. How awesome.
So, of course I built this, too. I built the system to upload the photos after choosing them, but before hitting “Post”, and made it extra complex by layering in a way to upload downscaled images first, to make the initial posts even faster. I made the notifications system super complex to create perfect aggregated notifications — much better than just “Taylor uploaded a new photo”. I made video uploads encode on the fly, because it made video uploads complete faster on mobile connections. (At the cost of a whole new language to support forever.)
All of this stuff became really burdensome to maintain, especially when it wasn’t me who was maintaining the app. For a number of years, I couldn’t be involved due to conflicts of interest with my employers, and Cluster was maintained by some amazing engineers who loved it like I did.
The complexity of this code didn’t meaningfully increase the value of Cluster to its users, so when it eventually broke or became confusing, it wasn’t worth fixing.
The overcomplex stuff, the stuff that was nice to have, that was a little fancy or bespoke — that all got ripped out.
The uploader is basic now: it just starts uploading photos when you hit “Post” and shows a progress bar, like it probably should’ve originally. Videos just encode after we have received the full file. The notifications are simpler and less prone to race conditions.
Every major dependency hastens your code’s demise
What’s changed since 2013 in the world of development? Nothing much, right?!
Python and the mobile platforms themselves are probably the most obvious large scale shifts — we lived to see the sunset of Python 2 and moved through 8+ major iOS releases. But beyond those, we also have some code in Go, and some code in Node.js, both of which have seen many significant changes since 2014/2015. (Some of Cluster’s web app runs on a custom Node.js frontend — again, what was I thinking?)
Over the years we have had to rebuild the runtime and deployment process for each of these major languages — Python (Django and Celery apps), Golang and Node.js — and each of these adds another dimension to the time and complexity of ongoing maintenance.
But third party dependencies are much worse than core languages and frameworks.
Third party SDKs, especially for authentication, have probably been the single worst moving target. The Google SDK became the Google+ SDK and is now the Google Sign-In SDK; the Facebook SDK has been completely gutted. (Re-inflating Facebook sessions? I ripped that out.) Crashlytics became Twitter Fabric and is now part of Google Firebase. Apple sunset their awful binary push notifications service and replaced it with the current weird HTTP/2 solution. We even had to implement Apple Sign-In, which has a stunning lack of documentation.
Third party integrations are the worst. Choose your third party partners with great care.
What’s changed the least? The “boring” technology. Postgres and Redis required zero hijinks to keep running. Even Django itself caused very little trouble when I upgraded from Django 1.5 to Django 4.0 in June 2022 — I barely had to touch my Django views! Most of the Django changes were in configuration and routing, and the most difficult single thing was migrating ancient Django 1.5 session data to the modern Django 4.0 format so users wouldn’t get logged out during the migration.
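The post doesn't say how the session migration was done. One common way to avoid logging anyone out is to read both formats for a while and rewrite old payloads the first time they are seen. The sketch below shows that general pattern only: the two payload layouts, the `SECRET` key and every function name are hypothetical stand-ins, not Django's real session encoding or API.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"example-secret"  # hypothetical key for this sketch only


def _mac(body: bytes, digest) -> bytes:
    return hmac.new(SECRET, body, digest).hexdigest().encode("ascii")


def encode_legacy(data: dict) -> str:
    # Hypothetical old layout: base64(b"<sha1 mac>:<json body>")
    body = json.dumps(data, sort_keys=True).encode("utf-8")
    return base64.b64encode(_mac(body, hashlib.sha1) + b":" + body).decode("ascii")


def encode_modern(data: dict) -> str:
    # Hypothetical new layout: "v2.<urlsafe base64 json>.<sha256 mac>"
    body = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode("utf-8"))
    return "v2.%s.%s" % (body.decode("ascii"), _mac(body, hashlib.sha256).decode("ascii"))


def decode_any(raw: str):
    """Return (data, needs_resave); data is None when the payload is invalid."""
    try:
        if raw.startswith("v2."):
            _, body_text, mac_text = raw.split(".")
            body = body_text.encode("ascii")
            if not hmac.compare_digest(mac_text.encode("ascii"), _mac(body, hashlib.sha256)):
                return None, False
            return json.loads(base64.urlsafe_b64decode(body)), False
        # Anything without the "v2." prefix is treated as the legacy layout.
        mac, body = base64.b64decode(raw).split(b":", 1)
        if not hmac.compare_digest(mac, _mac(body, hashlib.sha1)):
            return None, False
        return json.loads(body), True
    except ValueError:
        # Bad base64, bad JSON, or a payload with the wrong shape.
        return None, False


def load_session(store: dict, key: str):
    """Read a session, upgrading a legacy payload in place on first access."""
    data, needs_resave = decode_any(store[key])
    if data is not None and needs_resave:
        store[key] = encode_modern(data)
    return data
```

The appeal of this shape is that it is boring: the old decoder stays around as a fallback until legacy sessions have aged out, and then it can be deleted.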
In the client, the core ViewController logic is essentially unchanged, but I had to rewrite the entire login flow due to all the new third party SDKs.
Integration tests are a lifesaver
Cluster has a set of really expensive integration tests, which attempt to run the full stack — including calls to a local instance of Google App Engine (!). The test harness literally fires up local Google App Engine, and shares the URL with the test suite, which issues calls against it. It’s bananas.
These tests are slow and awful by modern standards — the full suite takes a few minutes to run. But they have been an absolute godsend as I continue to maintain this codebase, because they exercise most of the core code that makes the app work.
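For readers who haven't seen this style of test, here is a minimal sketch of the pattern: start a real local server once, hand its URL to the tests, and make real HTTP calls against it. It is not Cluster's harness. A tiny standard-library HTTP server stands in for the local App Engine instance, and `FakeServiceHandler`, `FullStackTestCase` and the `/health` path are all hypothetical names.

```python
import json
import threading
import unittest
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeServiceHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the locally launched backend service."""

    def do_GET(self):
        body = json.dumps({"status": "ok", "path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


class FullStackTestCase(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Port 0 asks the OS for any free port; the tests then share the URL.
        cls.server = HTTPServer(("127.0.0.1", 0), FakeServiceHandler)
        cls.base_url = "http://127.0.0.1:%d" % cls.server.server_address[1]
        cls.thread = threading.Thread(target=cls.server.serve_forever, daemon=True)
        cls.thread.start()

    @classmethod
    def tearDownClass(cls):
        cls.server.shutdown()
        cls.server.server_close()
        cls.thread.join()

    def test_health_endpoint(self):
        # A real HTTP round trip, not a mocked call.
        with urllib.request.urlopen(self.base_url + "/health") as resp:
            payload = json.loads(resp.read().decode("utf-8"))
        self.assertEqual(payload["status"], "ok")
        self.assertEqual(payload["path"], "/health")
```

Starting the server once per test class rather than once per test keeps the cost down, which matters when the real thing takes far longer to boot than this stand-in.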
The Python 3 migration, notably the new split between text (str) and bytes, was a big lift. But it was also fairly straightforward: run 2to3, run this giant suite of tests, and fix the failures.
In fact, the places that broke the most after the migration were the places we didn’t have any tests. Of course! If I didn’t have tests, I would’ve been fully screwed.
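As a small illustration of the kind of failure a test run surfaces after 2to3 (a made-up example, not code from Cluster): in Python 2 a plain string was a byte string, so passing one straight to a hash function worked. In Python 3 the same call raises `TypeError` until the text is encoded explicitly. Both function names here are hypothetical.

```python
import hashlib


def signature_py2_style(value):
    # Fine on Python 2, where str was a byte string.
    # On Python 3, passing a str here raises TypeError.
    return hashlib.sha256(value).hexdigest()


def signature(value: str) -> str:
    # Python 3 fix: encode text to bytes before hashing.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()
```

Nothing about the first function looks wrong when reading the code, which is why untested paths like this tend to break only once they are running on Python 3.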
Aside: Building for the long run vs. launching an MVP
In one sense, building for the long run would seem to run counter to the “lean startup”-type goals of launching and iterating on an MVP as quickly as you can. MVP means launching something hacky and simple, but building for the long term would seem to be the opposite of launching something hacky.
But I think the two goals are actually very well aligned:
- Focus on building simple solutions that solve real, urgent user problems, in well-worn ways.
- Use battle-tested software and platforms that are stable and don’t change too much over time.
- Only integrate with third parties when it’s absolutely necessary for solving those real, urgent problems.
This is what you need in order to build long term value, but it’s also what you should do when you build an MVP. It’s likely that we overbuilt Cluster to begin with, but I also think that if you embrace YAGNI and KISS, you too can build a sustainable offering, even if it doesn’t end up working out as a rocketship.
(One thing that certainly differs is optimizing runtime performance and costs: tuning the app to run reasonably well for the main endpoints, configuring autoscaling, setting up Glacier file storage, etc. is not something to waste time on as you validate a concept.)
Wrapping up
When I built Cluster, I never imagined it’d still be running in 2023. Or if it was still running, I thought it’d be some massive success, and we would’ve rewritten the whole thing in C++ or whatever anyway.
But the app survives, and I think it’s partially because we have a solid core of simple code that keeps the app running. Now that we’ve shed some of the complexity of the original implementation, I am confident we’ll be in good shape come 2033!
Like the post? Find me on Twitter, @taylorhughes!