CMU-15445-Project#2

发表于 2024/07/29 更新于 2024/08/18

作者 Silly Cheese 6 分钟阅读

Task #1 - Read/Write Page Guards

我们将要实现 BasicPageGuard，它存储着 BufferPoolManager 与 Page 的指针，同时可以确保 UnpinPage 当 Page out of scope 的时候可以被调用。当然，它也可以公开一个方法以便于我们手动 unpin。

在 Page 中，有相关的 latching methods。和 unpin 差不多，我们可能忘记 unlatch 在使用后。所以我们要实现 ReadPageGuard 和 WritePageGuard 来自动 unlatch。

BasicPageGuard, ReadPageGuard, WritePageGuard

PageGuard(PageGuard &&that): Move constructor.(记得清除参数成员)
operator=(PageGuard &&that): Move operator.（先把 this 给 Drop，然后赋值）
Drop(): Unpin and/or unlatch.记得要检查 page 的 pin 状态，再调用 UnpinPage 或者 Latch（注意顺序）
~PageGuard(): Destructor.直接调用 Drop 就行了

You will also need to implement the following upgrade functions for BasicPageGuard. These functions need to guarantee that the protected page is not evicted from the buffer pool during the upgrade.

UpgradeRead(): Upgrade to a ReadPageGuard
UpgradeWrite(): Upgrade to a WritePageGuard

Task1 暂时貌似用不到，我目前的做法是给里面加相关的 Latch 然后赋值。

With the new page guards, implement the following wrappers in BufferPoolManager.

FetchPageBasic(page_id_t page_id)
FetchPageRead(page_id_t page_id)
FetchPageWrite(page_id_t page_id)
NewPageGuarded(page_id_t *page_id)

以上基本是对 FetchPage 与 NewPage 的封装，记得加相关 Latch 即可。

以上更多的信息与注释都在 buffer_pool_manager.h and page_guard.h.

个人批注

Task 1 花了一天的时间，大部分时间用在理解与调试上了（看懂信息很关键，另外 Discord 也很有用）

Task #2 - Extendible Hash Table Pages

时隔两个月，让我们再次回归到Task 2。

注意hash如何转化成directoryIndex的方法,其余函数按照注释写便是。

        
      
auto ExtendibleHTableHeaderPage::HashToDirectoryIndex(uint32_t hash) const -> uint32_t {  
  uint32_t mask = (1 << max_depth_) - 1;  
  return hash & mask;  
}

Hash Table Directory Page

我们将要用到的信息与知识：

The DBMS maintains a global and local depth bit counts that determine the number bits needed to find buckets in the slot array.
When a bucket is full, the DBMS splits the bucket and reshuffle its elements. If the local depth of the split bucket is less than the global depth, then the new bucket is just added to the existing slot array. Otherwise, the DBMS doubles the size of the slot array to accommodate the new bucket and increments the global depth counter.

几个值得关注的函数

auto ExtendibleHTableDirectoryPage::GetSplitImageIndex(uint32_t bucket_idx) const -> uint32_t

这个函数主要是求新分裂的Bucket的位置，运算方法如下

        
      
auto ExtendibleHTableDirectoryPage::GetSplitImageIndex(uint32_t bucket_idx) const -> uint32_t {  
  uint32_t local_depth = local_depths_[bucket_idx];  
  uint32_t mask = 1 << (local_depth - 1);  
  return bucket_idx ^ mask;  
}

Hash Table Bucket Page

这一部分的函数其实还算好写，直接跳过。

Task #3 - Extendible Hashing Implementation

这一部分相当麻烦，一定要理顺其中的层层关系。直接搬运十分重要的bucket splitting/merging and directory growing/shrinking信息。

Empty Table

When you first create an empty hash table, it should only have the (one and only) header page. Directory pages and bucket pages should be created on demand.

Header Indexing

You will want to use the most-significant bits for indexing into the header page’s directory_page_ids_ array. This involves taking the hash of your key and perform bit operations with the depth of the header page. The header page depth will not change.

Directory Indexing

You will want to use the least-significant bits for indexing into the directory page’s bucket_page_ids_ array. This involves taking the hash of your key and perform bit operations with the current depth of the directory page.

Bucket Splitting

You must split a bucket if there is no room for insertion. You can ostensibly split as soon as the bucket becomes full, if you find that easier. However, the reference solution splits only when an insertion would overflow a page. Hence, you may find that the provided API is more amenable to this approach. As always, you are welcome to factor your own internal API.

Bucket Merging

Merging must be attempted when a bucket becomes empty. There are ways to merge more aggressively by checking the occupancy of buckets and their split images, but these expensive checks and extra merges can increase thrashing.

To keep things relatively simple, we provide the following rules for merging:

Only empty buckets can be merged.
Buckets can only be merged with their split image if their split image has the same local depth.
You should keep merging recursively if the new split image of the merged bucket is empty.

If you are confused about a “split image,” please review the algorithm and code documentation. The concept falls out quite naturally.

Directory Growing

There are no fancy rules for part of the hash table. You either have to grow the directory, or you do not.

Directory Shrinking

Only shrink the directory if the local depth of every bucket is strictly less than the global depth of the directory.

Task #4 - Concurrency Control

不多说，直接搬运重要信息。

We recommend that you complete this task by using the FetchPageWrite or FetchPageRead buffer pool API, depending on whether you want to access a page with read or write privileges. Then modify your implementation to grab and release read and write latches as necessary to implement the latch crabbing algorithm.

Note: You should never acquire the same read lock twice in a single thread. It might lead to deadlock.

Note: You should make careful design decisions on latching. Always holding a global latch the entire hash table is probably not a good idea. TAs will manual review your implementation and bad latching design would result in points deduction.

Finally

终于完成了，前前后后拖了三个多月才完成。下面会贴上给到我帮助的友链

Project#2: Extendible Hash Index

CMU 15-445(Fall 2023) Project2 Extendible Hash Index个人笔记

CMU 15445 Project 2

CMU15445, Project

Database

本文由作者按照 CC BY 4.0 进行授权