Functional Requirements
Core Requirements
- Users should be able to view a list of coding problems.
- Users should be able to view a given problem and code a solution in one of multiple supported languages.
- Users should be able to submit their solution and get instant feedback.
- Users should be able to view a live leaderboard for competitions.
Below the line (out of scope):
- User authentication
- User profiles
- Payment processing
- User analytics
- Social features
Non-Functional Requirements
Core Requirements
- The system should prioritize availability over consistency.
- The system should support isolation and security when running user code.
- The system should return submission results within 5 seconds.
- The system should scale to support competitions with 100,000 users.
Below the line (out of scope):
- The system should be fault-tolerant.
- The system should provide secure transactions for purchases.
- The system should be well-tested and easy to deploy (CI/CD pipelines).
- The system should have regular backups.
- The system should rate limit user submission APIs.
Core Entities:
- Problem: This entity will store the problem statement, test cases, and the expected output.
{
  id: string,
  title: string,
  question: string,
  level: string,
  tags: string[],
  codeStubs: {
    python: string,
    javascript: string,
    typescript: string,
    ...
  },
  testCases: {
    type: string,
    input: string,
    output: string
  }[]
}
- Submission: This entity will store the user’s code submission and the result of running the code against the test cases.
{
  id: string,
  competitionId: string,
  submittedAt: long,
  testCaseResult: string,
  error: string
}
- Leaderboard: This entity will store the leaderboard for competitions (an illustrative entry shape is sketched below).
APIs:
Note: user data should always be passed via the session or JWT, and timestamps should be generated by the server.
Endpoint to view the list of problems
GET /problems?page=1&limit=100
header: user jwt token
-> response
Partial<Problem>[]
Endpoint to view a specific problem
GET /problems/:id?language={language}
header: user jwt token
-> Problem
We've added a query parameter for language, which defaults to a sensible choice (say, Python) if not provided. This allows us to return the code stub in the user's preferred language.
Endpoint to submit a solution to a problem
POST /problems/:id/submit
header: user jwt token
body:
{
  code: string,
  language: string
}
-> Submission
Endpoint to view a live leaderboard for competitions
GET /leaderboard/:competitionId?page=1&limit=100
header: user jwt token
-> Leaderboard
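As a rough sketch of how the submit endpoint could be wired up on the server (Express and the two helper functions are assumptions for illustration, not part of the API contract above):

```typescript
import express from "express";

const app = express();
app.use(express.json());

// Stub helpers for the sketch: a real implementation would verify the JWT and
// hand the submission off for asynchronous grading (see the queue discussion later).
const requireAuth = (req: express.Request) => ({ userId: "user-123" });
const enqueueSubmission = async (job: object) => `sub_${Date.now()}`;

app.post("/problems/:id/submit", async (req, res) => {
  const { userId } = requireAuth(req);        // identity comes from the JWT, never the request body
  const { code, language } = req.body;
  const submissionId = await enqueueSubmission({
    userId,
    problemId: req.params.id,
    code,
    language,
    submittedAt: Date.now(),                  // timestamp generated by the server
  });
  res.status(202).json({ id: submissionId }); // client polls for the graded Submission
});

app.listen(3000);
```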
3) Users should be able to submit their solution and get instant feedback
4) Users should be able to view a live leaderboard for competitions
First, we should define a competition. We will define them as:
- 90 minutes long
- 10 problems
- Up to 100k users
- Scoring is the number of problems solved in the 90 minutes. In case of a tie, we'll rank by the time it took to complete all 10 problems (starting from competition start time). A short sketch of this ranking rule follows below.
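A small sketch of the ranking rule, assuming we track each user's solved count and the elapsed time of their last accepted submission:

```typescript
// Ranking rule: more problems solved ranks higher; ties are broken by the
// elapsed time (relative to competition start) of the last accepted submission.
interface CompetitionResult {
  userId: string;
  problemsSolved: number;
  finishTimeMs: number; // time of last accepted submission since competition start
}

function rank(results: CompetitionResult[]): CompetitionResult[] {
  return [...results].sort(
    (a, b) => b.problemsSolved - a.problemsSolved || a.finishTimeMs - b.finishTimeMs
  );
}
```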
In a SQL database, this would be a query like:
SELECT userId, COUNT(*) AS numSuccessfulSubmissions
FROM submissions
WHERE competitionId = :competitionId AND passed = true
GROUP BY userId
ORDER BY numSuccessfulSubmissions DESC
In a NoSQL DB like DynamoDB, you'd need to make competitionId the partition key, pull all of the competition's submissions into memory, and then group and sort them in application code.
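As a sketch of what that looks like in application code (the table name, key names, and use of the AWS SDK v3 document client are assumptions):

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Pull every passing submission for the competition (paginating as needed),
// then group and sort in memory -- the work the SQL query above does for us.
async function leaderboardFromDynamo(competitionId: string) {
  const counts = new Map<string, number>();
  let lastKey: Record<string, any> | undefined;
  do {
    const page = await ddb.send(new QueryCommand({
      TableName: "submissions",                       // assumed table name
      KeyConditionExpression: "competitionId = :cid", // assumed partition key
      FilterExpression: "passed = :passed",
      ExpressionAttributeValues: { ":cid": competitionId, ":passed": true },
      ExclusiveStartKey: lastKey,
    }));
    for (const item of page.Items ?? []) {
      counts.set(item.userId, (counts.get(item.userId) ?? 0) + 1);
    }
    lastKey = page.LastEvaluatedKey;
  } while (lastKey);

  return [...counts.entries()]
    .map(([userId, numSuccessfulSubmissions]) => ({ userId, numSuccessfulSubmissions }))
    .sort((a, b) => b.numSuccessfulSubmissions - a.numSuccessfulSubmissions);
}
```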
How would you make fetching the leaderboard more efficient?
Approach 1: Clients poll the server every few seconds, and on each poll the server queries the database for the top N users, sorts them, and returns the result, keeping the data roughly 5 seconds fresh. While this works better if we use a relational database, it still has significant shortcomings.
Challenges
The main issues with this approach are the high database load from frequent queries and increasing latency as the number of users grows. It doesn't scale well for real-time updates, especially during competitions with many participants: with each user's poll triggering an aggregation over potentially millions of submission rows every 5 seconds, the database would struggle to keep up, particularly at peak times.
Approach 2: To reduce the load on the database, we can introduce a cache (e.g., Redis) that stores the current leaderboard. The cache is updated periodically, say every 30 seconds, by querying the database from a cron job. When clients poll the server, we return the cached leaderboard instead of querying the database each time.
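A minimal sketch of this cache-refresh approach, assuming ioredis for the cache and a stubbed-out database aggregation:

```typescript
import Redis from "ioredis";

const redis = new Redis();

// Stub for the sketch: in practice this runs the SQL aggregation shown above
// against the primary database.
async function computeLeaderboardFromDb(competitionId: string): Promise<object[]> {
  return [];
}

// Cron-style refresh: recompute every 30 seconds and overwrite the cached copy.
function startLeaderboardRefresh(competitionId: string) {
  setInterval(async () => {
    const leaderboard = await computeLeaderboardFromDb(competitionId);
    await redis.set(`leaderboard:${competitionId}`, JSON.stringify(leaderboard));
  }, 30_000);
}

// Read path: polling clients get the cached copy, not a fresh database query.
async function getCachedLeaderboard(competitionId: string) {
  const cached = await redis.get(`leaderboard:${competitionId}`);
  return cached ? JSON.parse(cached) : [];
}
```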
Challenges
Updates aren't truly real-time, and there's still polling involved, which isn't ideal for live updates. There's also the risk of serving noticeably stale data if the cache refresh interval is too long. However, this method is a significant improvement over the previous approach and could work well for smaller-scale competitions.
Approach 3: The **Redis sorted set** (essentially a priority queue). Redis sorted sets keep their members ordered by a score (here, the user's competition score) and can be queried in logarithmic time, which makes them a natural fit for leaderboards. Their high write throughput and low read latency make this especially useful at scale, where something like a SQL database would start to struggle.
We use Redis sorted sets to maintain a real-time leaderboard while storing submission results in the main database. When a submission is processed, both the database and the Redis sorted set are updated. Clients poll the server every 5 seconds for leaderboard updates, and the server returns the top N users from the Redis sorted set, which is extremely fast and requires no expensive database queries.
Score: based on the number of successful submissions, with the time taken used as a tiebreaker.
This sorted set with periodic polling strikes a good balance between real-time updates and system simplicity. It's more than adequate for our needs and can easily be scaled up if required in the future.
The Redis sorted set uses a key format like competition:leaderboard:{competitionId}, with the score being the user’s total score or solve time, and the value being the userId. We’d implement an API endpoint like GET /competitions/:competitionId/leaderboard?top=100. When processing a submission, we’d update Redis with a command like ZADD competition:leaderboard:{competitionId} {score} {userId}. To retrieve the top N users, we’d use ZREVRANGE competition:leaderboard:{competitionId} 0 N-1 WITHSCORES
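A minimal sketch of those two operations using the ioredis client (the client library and helper names are assumptions; the key format and commands are the ones above):

```typescript
import Redis from "ioredis";

const redis = new Redis();

const leaderboardKey = (competitionId: string) =>
  `competition:leaderboard:${competitionId}`;

// Called after a submission is graded: write the user's new score.
async function updateScore(competitionId: string, userId: string, score: number) {
  await redis.zadd(leaderboardKey(competitionId), score, userId);
}

// Called by GET /competitions/:competitionId/leaderboard?top=N
async function topN(competitionId: string, n: number) {
  // Highest scores first; WITHSCORES interleaves members and scores in the reply.
  const flat = await redis.zrevrange(leaderboardKey(competitionId), 0, n - 1, "WITHSCORES");
  const entries: { userId: string; score: number }[] = [];
  for (let i = 0; i < flat.length; i += 2) {
    entries.push({ userId: flat[i], score: Number(flat[i + 1]) });
  }
  return entries;
}
```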
Other improvements: if we find that the 5-second interval is too frequent, we can easily adjust it. We could even implement a progressive polling strategy where we poll more frequently (e.g., every 2 seconds) during the last few minutes of a competition and less frequently (e.g., every 10 seconds) at other times.
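A sketch of what progressive polling could look like on the client; the thresholds and intervals are just the illustrative values from above:

```typescript
// Poll the leaderboard endpoint, tightening the interval near the end of the competition.
async function pollLeaderboard(competitionId: string, competitionEndsAt: number) {
  const poll = async () => {
    const res = await fetch(`/leaderboard/${competitionId}?page=1&limit=100`);
    console.log(await res.json()); // a real client would render the leaderboard here

    const msLeft = competitionEndsAt - Date.now();
    if (msLeft <= 0) return; // competition over, stop polling

    // Every 2 seconds during the last 5 minutes, every 10 seconds otherwise.
    const intervalMs = msLeft < 5 * 60 * 1000 ? 2_000 : 10_000;
    setTimeout(poll, intervalMs);
  };
  await poll();
}
```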
How will the system support isolation and security when running user code?
By running user-submitted code in an isolated container, we've already taken a big step towards ensuring security and isolation.
How do we make it even better?
- Read Only Filesystem: To prevent users from writing to the filesystem, we can mount the code directory as read-only and write any output to a temporary directory that is deleted a short time after completion.
- CPU and Memory Bounds: To prevent users from consuming excessive resources, we can set CPU and memory limits on the container. If these limits are exceeded, the container will be killed, preventing resource exhaustion.
- Explicit Timeout: To prevent users from running infinite loops, we can wrap the user’s code in a timeout that kills the process if it runs for longer than a predefined time limit, say 5 seconds. This will also help us meet our requirement of returning submission results within 5 seconds.
- Limit Network Access: To prevent users from making network requests, we can disable network access in the container, ensuring that users can’t make any external calls. If working within the AWS ecosystem, we can use Virtual Private Cloud (VPC) Security Groups and NACLs to restrict all outbound and inbound traffic except for predefined, essential connections.
- No System Calls (seccomp): We don't want users to be able to make system calls that could compromise the host system. We can use seccomp to restrict the system calls the container is allowed to make.
To be concrete, in an interview it is likely sufficient to say, "We'd use Docker containers while limiting network access, setting CPU and memory bounds, and enforcing a timeout on the code execution." The sketch below shows what those controls look like in practice.
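A sketch of launching one sandboxed run with the Docker CLI; the image name, mount paths, seccomp profile path, and resource values are illustrative assumptions:

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

async function runSubmission(workDir: string, language: string) {
  try {
    const { stdout } = await run(
      "docker",
      [
        "run", "--rm",
        "--network", "none",                  // no inbound or outbound network access
        "--read-only",                        // read-only root filesystem
        "--tmpfs", "/tmp",                    // writable scratch space, discarded afterwards
        "-v", `${workDir}:/code:ro`,          // user code mounted read-only
        "--memory", "256m", "--cpus", "0.5",  // CPU and memory bounds
        "--pids-limit", "64",                 // guard against fork bombs
        "--security-opt", "seccomp=/etc/judge/seccomp.json", // restrict syscalls (hypothetical profile)
        `judge-runner-${language}`,           // hypothetical per-language image
      ],
      { timeout: 5_000 }                      // hard 5-second wall-clock limit
    );
    return { passed: true, output: stdout };
  } catch (err) {
    return { passed: false, error: String(err) };
  }
}
```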
How would the system scale to support competitions with 100,000 users?
We can place a queue between the API server and the code-execution containers. This will allow us to buffer submissions during peak times and ensure that we don't overwhelm the containers. We can use a managed queue service like SQS to handle this for us.
When a user submits their code, the API server will add the submission to the queue and the containers will pull submissions off the queue as they become available. This will help us manage the load on the containers and ensure that we don’t lose any submissions during peak times.
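A sketch of both sides of that queue using the AWS SDK v3 SQS client; the queue URL and message shape are assumptions for illustration:

```typescript
import {
  SQSClient,
  SendMessageCommand,
  ReceiveMessageCommand,
  DeleteMessageCommand,
} from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});
const QUEUE_URL = process.env.SUBMISSION_QUEUE_URL!; // hypothetical configuration

// API server side: enqueue the submission instead of running it inline.
async function enqueueSubmission(submissionId: string, problemId: string, code: string, language: string) {
  await sqs.send(new SendMessageCommand({
    QueueUrl: QUEUE_URL,
    MessageBody: JSON.stringify({ submissionId, problemId, code, language }),
  }));
}

// Worker side: pull submissions off the queue as capacity allows.
async function pollQueueOnce() {
  const { Messages = [] } = await sqs.send(new ReceiveMessageCommand({
    QueueUrl: QUEUE_URL,
    MaxNumberOfMessages: 10,
    WaitTimeSeconds: 20, // long polling
  }));
  for (const message of Messages) {
    const job = JSON.parse(message.Body!);
    // ... run the submission in a sandboxed container and persist the results ...
    await sqs.send(new DeleteMessageCommand({
      QueueUrl: QUEUE_URL,
      ReceiptHandle: message.ReceiptHandle!,
    }));
  }
}
```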
How would the system handle running test cases?
We need a standard way to serialize the input and output of each test case, plus a test harness for each language that can deserialize these inputs, pass them to the user's code, and compare the result to the deserialized expected output.
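For example, if inputs and expected outputs are serialized as JSON strings (matching the testCases shape in the Problem entity), a JavaScript/TypeScript harness could look roughly like this; the solve entry point is a hypothetical convention for user code:

```typescript
interface TestCase { type: string; input: string; output: string; }
interface TestCaseResult { testCase: number; passed: boolean; }

// Deserialize each input, invoke the user's entry point, and compare the result
// against the deserialized expected output.
function runTestCases(
  solve: (...args: unknown[]) => unknown, // the user's solution function
  testCases: TestCase[]
): TestCaseResult[] {
  return testCases.map((tc, i) => {
    const args = JSON.parse(tc.input);      // e.g. "[[2,7,11,15], 9]"
    const expected = JSON.parse(tc.output); // e.g. "[0,1]"
    const actual = solve(...args);
    // Structural comparison via canonical JSON; a real harness would use a deep-equal.
    const passed = JSON.stringify(actual) === JSON.stringify(expected);
    return { testCase: i, passed };
  });
}
```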