A written-test question that really tests data-structure and algorithm fundamentals: design a cache structure in Java

I ran into this question in a written test last week. I found it hard, and it really tests data-structure and algorithm fundamentals, so I am writing it down here.

Requirements

Design and implement a cache data structure.

The data structure must support the following operations:

get(key)
If the given key exists in the cache, return the value associated with it; otherwise return -1.

put(key, val, weight)

Associate the value with the key in the cache so that it can later be retrieved via get(key).

The cache has a fixed capacity. When that capacity is reached, the key with the lowest score must be evicted until the number of keys falls within the capacity.
The score is computed as follows:

weight / [ln(current_time - last_accessed_time + 1) + 1]

The implementation must make get(key) run in O(1) time. To build the cache you may assume common data structures are available, such as arrays, various kinds of lists, and hash tables. At the end of your answer, state and explain the computational complexity of get(key) and put(key, val, weight).
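The score formula above can be sketched directly in Java (the class and method names `ScoreDemo` and `score` are mine, for illustration only):

```java
public class ScoreDemo {
    // score = weight / (ln(current_time - last_accessed_time + 1) + 1)
    static double score(int weight, long lastAccessedTime, long currentTime) {
        return weight / (Math.log(currentTime - lastAccessedTime + 1) + 1);
    }

    public static void main(String[] args) {
        // An entry accessed just now scores exactly its weight, since ln(1) = 0
        System.out.println(score(10, 5, 5)); // 10.0
        // The score decays as the time since the last access grows
        System.out.println(score(10, 0, 100));
    }
}
```

Note that a just-accessed entry scores exactly its weight, and the score only decays (logarithmically) as the entry goes unused.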

My approach

First, get and put immediately suggest a hash map, whose get is also O(1), meeting the requirement. The tricky part is the score mechanism: when the cache is full, the node with the lowest score must be dropped. After some thought I landed on the priority queue. It usually feels like a niche data structure, but here is a genuine use case: a priority queue dequeues elements in order of priority, so I just need to use the question's score as the priority.
Java lets me define such a priority policy with a Comparator, since a PriorityQueue's ordering can be overridden by one. So I need two data structures in total:
a HashMap for the one-to-one key mapping behind get and put, plus a priority queue holding the same nodes; when the capacity is full, the node with the smallest score is polled. (Note that Java's PriorityQueue is a min-heap: the element with the smallest priority dequeues first.)
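The min-heap behavior mentioned above can be sketched in a few lines (`MinHeapDemo` and `lowestScoreKey` are illustrative names of mine): Java's PriorityQueue always polls the element that compares smallest under the supplied Comparator.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class MinHeapDemo {
    // Each entry is {key, score}; the Comparator orders by score,
    // so the lowest-score entry sits at the head of the queue.
    static int lowestScoreKey() {
        PriorityQueue<int[]> pq =
                new PriorityQueue<>(Comparator.comparingInt(a -> a[1]));
        pq.add(new int[]{1, 30});
        pq.add(new int[]{2, 10});
        pq.add(new int[]{3, 20});
        return pq.poll()[0]; // poll() removes the smallest element first
    }

    public static void main(String[] args) {
        System.out.println(lowestScoreKey()); // 2, the key with the lowest score
    }
}
```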

My answer

import java.util.*;

class Node {
    int key;
    int val;
    int weight;
    int timeStamp; // logical clock value of this node's last access

    public Node(int key, int val, int weight, int timeStamp) {
        this.key = key;
        this.val = val;
        this.weight = weight;
        this.timeStamp = timeStamp;
    }
}

public class Cache {
    int capacity;
    int timeStamp; // global logical clock, incremented on every access
    Map<Integer, Node> nodeMap;  // key -> node mapping
    PriorityQueue<Node> prque;   // min-heap of nodes, ordered by score

    public Cache(int capacity) {
        this.capacity = capacity;
        this.timeStamp = 0;
        nodeMap = new HashMap<>();
        // score = weight / (ln(current_time - last_accessed_time + 1) + 1);
        // the comparator reads the enclosing cache's clock, so the node with
        // the lowest score sits at the head of the min-heap
        Comparator<Node> timeWeightComparator = new Comparator<Node>() {
            @Override
            public int compare(Node o1, Node o2) {
                double s1 = o1.weight / (Math.log(timeStamp - o1.timeStamp + 1) + 1);
                double s2 = o2.weight / (Math.log(timeStamp - o2.timeStamp + 1) + 1);
                return Double.compare(s1, s2); // an int cast would lose fractional differences
            }
        };
        prque = new PriorityQueue<>(timeWeightComparator);
    }

    public int get(int key) { // O(1): hashmap.get is O(1)
        if (!nodeMap.containsKey(key)) {
            return -1;
        }
        Node getNode = nodeMap.get(key);
        getNode.timeStamp = ++timeStamp; // refresh the last-accessed time
        return getNode.val;
    }

    void put(int key, int val, int weight) { // best case: the key already exists, O(1)
        if (this.capacity <= 0) {
            return;
        }
        if (nodeMap.containsKey(key)) {
            Node newNode = nodeMap.get(key);
            newNode.val = val;
            newNode.weight = weight;
            newNode.timeStamp = ++timeStamp;
        } else {
            if (nodeMap.size() == capacity) {
                Node leastNode = prque.poll(); // evict the lowest-score node, O(logN)
                assert leastNode != null;
                nodeMap.remove(leastNode.key);
            }
            Node newNode = new Node(key, val, weight, ++timeStamp);
            prque.add(newNode); // O(logN)
            nodeMap.put(key, newNode);
        }
    }

    public static void main(String[] args) { // test case
        Cache cache = new Cache(5);
        cache.put(0, 15, 3);
        cache.put(1, 28, 10);
        cache.put(2, 16, 4);
        cache.put(3, 4, 6);
        cache.put(4, 75, 5);
        cache.put(4, 100, 100);
        System.out.println(cache.get(1));
        System.out.println(cache.get(2));
        System.out.println(cache.get(3));
        System.out.println(cache.get(4));
        System.out.println(cache.get(0));
    }
}

Big-O notation analysis

get

The get operation is based on hashmap.get(key), so the time complexity is O(1).

put

The put operation can be separated into the following two cases:

1. No new node needs to be inserted (the key already exists)

In this case, we only need to fetch the node from the hash map and update it in place. The time complexity is O(1).

2. A new node is inserted

If the capacity has not been reached, we can insert a new node directly. The complexity is O(logN) + O(1) = O(logN) (O(logN) for the priority queue insert, O(1) for the hash map put).

If the capacity has been reached, we first poll the node with the lowest score, which is O(logN), then insert the new node. The total is O(logN) + O(logN) + O(1) = O(logN).
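One caveat worth noting: Java's PriorityQueue does not re-heapify when an element's fields are mutated in place, so after get or put updates a node's timestamp or weight, that node's heap position is stale. Restoring the order takes remove(node) followed by add(node), and PriorityQueue.remove(Object) is O(N) because it linearly scans the backing array; that cost is why the code above updates nodes in place and accepts a slightly stale eviction order. A minimal sketch of the trade-off (`ReorderDemo` and `keyPolledAfterUpdate` are hypothetical names of mine):

```java
import java.util.PriorityQueue;

public class ReorderDemo {
    static class Node {
        int key, score;
        Node(int key, int score) { this.key = key; this.score = score; }
    }

    static int keyPolledAfterUpdate() {
        PriorityQueue<Node> pq = new PriorityQueue<>((a, b) -> a.score - b.score);
        Node n1 = new Node(1, 5), n2 = new Node(2, 10);
        pq.add(n1);
        pq.add(n2);
        n1.score = 100; // mutated in place: the heap order is now stale
        pq.remove(n1);  // O(N): linear scan to locate the node
        pq.add(n1);     // O(logN): re-insert restores the ordering
        return pq.poll().key;
    }

    public static void main(String[] args) {
        System.out.println(keyPolledAfterUpdate()); // 2: n2 now has the lowest score
    }
}
```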