Tuple insertion
Tuple insertion is handled mainly by the function `heap_insert`, which works in the following steps:
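1. Initialize the tuple header (`HeapTupleHeader`).
2. Find a usable page.
3. Check tuple visibility and transaction conflicts.
4. Write the tuple into the page and mark the page dirty.
5. Write WAL.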
### Initializing the tuple header

This is done in `heap_prepare_insert`. The function is short, so we can read the source directly.
```c
static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
                    CommandId cid, int options)
{
    /* Clear any transaction-status bits left over in the tuple header */
    tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
    tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
    /* A freshly inserted tuple has no deleting transaction */
    tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
    HeapTupleHeaderSetXmin(tup->t_data, xid);
    if (options & HEAP_INSERT_FROZEN)
        HeapTupleHeaderSetXminFrozen(tup->t_data);

    HeapTupleHeaderSetCmin(tup->t_data, cid);
    HeapTupleHeaderSetXmax(tup->t_data, 0);
    tup->t_tableOid = RelationGetRelid(relation);

    if (relation->rd_rel->relkind != RELKIND_RELATION &&
        relation->rd_rel->relkind != RELKIND_MATVIEW)
    {
        /* Only plain tables and materialized views may hold toasted values */
        Assert(!HeapTupleHasExternal(tup));
        return tup;
    }
    else if (HeapTupleHasExternal(tup) || tup->t_len > TOAST_TUPLE_THRESHOLD)
        return heap_toast_insert_or_update(relation, tup, NULL, options);
    else
        return tup;
}
```
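The bit manipulation at the top of `heap_prepare_insert` can be tried out in isolation. The following is a minimal sketch, not PostgreSQL code: the `TOY_` constants and `toy_reset_infomask` are hypothetical names, and the flag values are assumptions modeled on the definitions in `htup_details.h`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag values, modeled on htup_details.h; names are hypothetical. */
#define TOY_HEAP_HASNULL        0x0001  /* attribute flag, not transaction state */
#define TOY_HEAP_XMIN_COMMITTED 0x0100  /* hint bit left by an earlier use */
#define TOY_HEAP_XMAX_INVALID   0x0800  /* no deleting transaction */
#define TOY_HEAP_XACT_MASK      0xFFF0  /* all transaction-status bits */

/* Reset the transaction state of an infomask word for a new insert. */
static uint16_t
toy_reset_infomask(uint16_t infomask)
{
    /* Drop every transaction-status bit but keep attribute flags */
    infomask &= (uint16_t) ~TOY_HEAP_XACT_MASK;
    /* Mark xmax as invalid: nobody has deleted this tuple */
    infomask |= TOY_HEAP_XMAX_INVALID;

    /* Postcondition: the only transaction-status bit set is XMAX_INVALID */
    assert((infomask & TOY_HEAP_XACT_MASK) == TOY_HEAP_XMAX_INVALID);
    return infomask;
}
```

For instance, passing `TOY_HEAP_HASNULL | TOY_HEAP_XMIN_COMMITTED` keeps the null flag, drops the stale hint bit, and sets `TOY_HEAP_XMAX_INVALID`.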
### Finding a usable page

This is done in `RelationGetBufferForTuple`.
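`relation` is the target table and `len` is the space the tuple needs. `otherBuffer` is the buffer that holds the old tuple version when the call comes from an update; for a plain insert it is `InvalidBuffer`. `options` carries the insert options, `bistate` is the bulk-insert state, and `vmbuffer` and `vmbuffer_other` are used for the visibility map pages of the target buffer and of `otherBuffer`.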
```c
Buffer
RelationGetBufferForTuple(Relation relation, Size len,
                          Buffer otherBuffer, int options,
                          BulkInsertState bistate,
                          Buffer *vmbuffer, Buffer *vmbuffer_other)
```
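First, the free space to aim for is computed from the fill factor. The fill factor (FillFactor) is a percentage that limits how full inserts are allowed to make a page; the rest of the page is kept in reserve.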
```c
/* Space to leave unused on each page, derived from the fill factor */
saveFreeSpace = RelationGetTargetPageFreeSpace(relation,
                                               HEAP_DEFAULT_FILLFACTOR);

/* Free space of a page that holds only a few line pointers */
nearlyEmptyFreeSpace = MaxHeapTupleSize -
    (MaxHeapTuplesPerPage / 8 * sizeof(ItemIdData));

if (len + saveFreeSpace > nearlyEmptyFreeSpace)
    targetFreeSpace = Max(len, nearlyEmptyFreeSpace);
else
    targetFreeSpace = len + saveFreeSpace;
```
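To make the arithmetic concrete, here is a small sketch of the same calculation with plain integers. It is an illustration under stated assumptions, not the PostgreSQL implementation: the `toy_` functions are hypothetical, the reserved space is assumed to be `BLCKSZ * (100 - fillfactor) / 100`, and the constants are the values one would expect on a default build with 8 kB pages.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for 8 kB pages; all names here are hypothetical. */
#define TOY_BLCKSZ              8192
#define TOY_MAX_HEAP_TUPLE_SIZE 8160   /* page minus header and one line pointer */
#define TOY_MAX_TUPLES_PER_PAGE 291
#define TOY_ITEMID_SIZE         4      /* size of one line pointer */

/* Bytes to keep unused on each page for a given fill factor (10..100). */
static size_t
toy_save_free_space(int fillfactor)
{
    assert(fillfactor >= 10 && fillfactor <= 100);
    return (size_t) TOY_BLCKSZ * (size_t) (100 - fillfactor) / 100;
}

/* Free space a page must have before we accept it for a tuple of `len` bytes. */
static size_t
toy_target_free_space(size_t len, int fillfactor)
{
    size_t save = toy_save_free_space(fillfactor);
    size_t nearly_empty = TOY_MAX_HEAP_TUPLE_SIZE -
        (TOY_MAX_TUPLES_PER_PAGE / 8 * TOY_ITEMID_SIZE);

    /* Never demand more than a nearly empty page can offer */
    if (len + save > nearly_empty)
        return len > nearly_empty ? len : nearly_empty;
    return len + save;
}
```

With a fill factor of 70, a 100-byte tuple asks for 2557 bytes of free space, while a 6000-byte tuple is capped at 8016 bytes, the free space of a nearly empty page under these assumptions.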
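Next, try the block the relation used most recently, which is cached. If there is none, ask the FSM (free space map) for a page with enough room for the insert.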
```c
if (bistate && bistate->current_buf != InvalidBuffer)
    targetBlock = BufferGetBlockNumber(bistate->current_buf);
else
    targetBlock = RelationGetTargetBlock(relation);

if (targetBlock == InvalidBlockNumber && use_fsm)
{
    targetBlock = GetPageWithFreeSpace(relation, targetFreeSpace);
}
```
With a candidate block in hand, we fetch the corresponding buffer.
```c
if (otherBuffer == InvalidBuffer)
{
    /* Plain insert: only the target page is involved */
    buffer = ReadBufferBI(relation, targetBlock, RBM_NORMAL, bistate);
    if (PageIsAllVisible(BufferGetPage(buffer)))
        visibilitymap_pin(relation, targetBlock, vmbuffer);

    /* Frozen insert into an empty page: pin the visibility map page too */
    if ((options & HEAP_INSERT_FROZEN) &&
        (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0))
        visibilitymap_pin(relation, targetBlock, vmbuffer);
    LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
}

page = BufferGetPage(buffer);
```
```c
/* Enough room: remember this block as the insertion target and return it */
pageFreeSpace = PageGetHeapFreeSpace(page);
if (targetFreeSpace <= pageFreeSpace)
{
    RelationSetTargetBlock(relation, targetBlock);
    return buffer;
}
```
```c
/* Not enough room: choose the next candidate block, or leave the loop */
if (bistate && bistate->next_free != InvalidBlockNumber)
{
    targetBlock = bistate->next_free;
    if (bistate->next_free >= bistate->last_free)
    {
        bistate->next_free = InvalidBlockNumber;
        bistate->last_free = InvalidBlockNumber;
    }
    else
        bistate->next_free++;
}
else if (!use_fsm)
{
    break;
}
else
{
    /* Report the real free space to the FSM and ask for another page */
    targetBlock = RecordAndGetPageWithFreeSpace(relation,
                                                targetBlock,
                                                pageFreeSpace,
                                                targetFreeSpace);
}
```
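The search pattern in the fragments above (try a candidate, check its real free space, correct the map, ask again) can be modeled with two arrays. This is a simplified sketch under assumptions: `toy_fsm_search` and `toy_find_block` are hypothetical helpers, the FSM is reduced to a flat array that may be stale, and buffer locking, bulk-insert state and the visibility map are ignored.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_INVALID_BLOCK (-1)

/* Hypothetical FSM lookup: first page whose recorded free space is >= need. */
static int
toy_fsm_search(const size_t *fsm, int npages, size_t need)
{
    for (int blk = 0; blk < npages; blk++)
        if (fsm[blk] >= need)
            return blk;
    return TOY_INVALID_BLOCK;
}

/*
 * fsm[] is the possibly stale map, actual[] is the real free space per page.
 * Returns a block with enough room, or TOY_INVALID_BLOCK when the caller
 * would have to extend the relation.
 */
static int
toy_find_block(size_t *fsm, const size_t *actual, int npages,
               int cached_block, size_t need)
{
    int target = cached_block;

    assert(target == TOY_INVALID_BLOCK || (target >= 0 && target < npages));

    /* No cached target: start from the map */
    if (target == TOY_INVALID_BLOCK)
        target = toy_fsm_search(fsm, npages, need);

    while (target != TOY_INVALID_BLOCK)
    {
        if (actual[target] >= need)
            return target;

        /* The map was too optimistic: record the truth and ask again */
        fsm[target] = actual[target];
        target = toy_fsm_search(fsm, npages, need);
    }
    return TOY_INVALID_BLOCK;
}
```

Each failed candidate has its map entry corrected to a value below `need`, so it cannot be proposed again and the loop terminates after at most one visit per page.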
If the loop above finds no buffer with enough free space, the only option left is to extend the relation and add a new page for the insert.
```c
buffer = RelationAddBlocks(relation, bistate, num_pages, use_fsm,
                           &unlockedTargetBuffer);

targetBlock = BufferGetBlockNumber(buffer);
page = BufferGetPage(buffer);
```
We then check that the new page really has enough free space. If it does, it can be returned and used.
```c
pageFreeSpace = PageGetHeapFreeSpace(page);
if (len > pageFreeSpace)
{
    if (unlockedTargetBuffer)
    {
        /* Release our locks and go back to the search loop */
        if (otherBuffer != InvalidBuffer)
            LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
        UnlockReleaseBuffer(buffer);

        goto loop;
    }
    elog(PANIC, "tuple is too big: size %zu", len);
}

RelationSetTargetBlock(relation, targetBlock);
return buffer;
```
### Writing the tuple

The write itself is done by `RelationPutHeapTuple`.
```c
void
RelationPutHeapTuple(Relation relation,
                     Buffer buffer,
                     HeapTuple tuple,
                     bool token)
{
    Page         pageHeader;
    OffsetNumber offnum;
    ...
    pageHeader = BufferGetPage(buffer);

    /* Copy the tuple into the page; the page picks the line pointer slot */
    offnum = PageAddItem(pageHeader, (Item) tuple->t_data,
                         tuple->t_len, InvalidOffsetNumber, false, true);

    /* Record where the tuple ended up (block number, offset number) */
    ItemPointerSet(&(tuple->t_self), BufferGetBlockNumber(buffer), offnum);

    if (!token)
    {
        ItemId          itemId = PageGetItemId(pageHeader, offnum);
        HeapTupleHeader item = (HeapTupleHeader) PageGetItem(pageHeader, itemId);

        /* The on-page copy's t_ctid points at itself */
        item->t_ctid = tuple->t_self;
    }
}
```
Most of the work here is done by `PageAddItem`, a macro that calls `PageAddItemExtended` internally. Let's look at that function in detail.
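`page` is the page to insert into, `item` is a pointer to the data to insert, and `size` is the size of that data. `offsetNumber` is the tuple's offset number within the page, that is, its slot in the line pointer array; the offset number actually used is returned when the insert succeeds. `flags` carries the insert options.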
```c
OffsetNumber
PageAddItemExtended(Page page,
                    Item item,
                    Size size,
                    OffsetNumber offsetNumber,
                    int flags)
```
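First we compute `limit`, the position of the next slot just past the last line pointer in the page. If no free slot turns up among the existing ones, the tuple is inserted at this position.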
```c
limit = OffsetNumberNext(PageGetMaxOffsetNumber(page));

if (PageHasFreeLinePointers(page))
{
    /* Scan the line pointer array for an unused slot */
    for (offsetNumber = FirstOffsetNumber;
         offsetNumber < limit;
         offsetNumber++)
    {
        itemId = PageGetItemId(page, offsetNumber);

        if (!ItemIdIsUsed(itemId) && !ItemIdHasStorage(itemId))
            break;
    }
}
else
{
    /* The hint says there is no free slot, so append at the end */
    offsetNumber = limit;
}
```
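The slot selection can be reproduced on a plain boolean array. This is only a sketch: `toy_choose_offset` is a hypothetical function, and `used[i]` stands in for the state of line pointer `i + 1`.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_FIRST_OFFSET 1   /* offset numbers are 1-based */

/*
 * Pick the offset number for a new item. used[i] describes line pointer
 * i + 1. Returns the first unused slot, or max_offset + 1 ("limit") when
 * the line pointer array has to grow by one entry.
 */
static int
toy_choose_offset(const bool *used, int max_offset, bool has_free_hint)
{
    int limit = max_offset + 1;

    assert(max_offset >= 0);

    /* Trust the hint: without free slots there is nothing to scan */
    if (!has_free_hint)
        return limit;

    for (int off = TOY_FIRST_OFFSET; off < limit; off++)
        if (!used[off - 1])
            return off;

    /* The hint was stale: fall back to appending */
    return limit;
}
```

Scanning from the first offset means earlier free slots are reused before later ones, which keeps the array from growing while holes exist.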
Next we compute the offsets that the page's `pd_lower` and `pd_upper` pointers will point to after the insert.
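```c
/* A new line pointer is needed only when appending or shuffling */
if (offsetNumber == limit || needshuffle)
    lower = phdr->pd_lower + sizeof(ItemIdData);
else
    lower = phdr->pd_lower;

/* Tuple data grows downward from pd_upper, in aligned chunks */
alignedSize = MAXALIGN(size);
upper = (int) phdr->pd_upper - (int) alignedSize;
```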
Finally, the tuple can be written into the page.
```c
itemId = PageGetItemId(page, offsetNumber);

/* Shift later line pointers up by one to make room */
if (needshuffle)
    memmove(itemId + 1, itemId,
            (limit - offsetNumber) * sizeof(ItemIdData));

/* Point the line pointer at the data, then copy the data in */
ItemIdSetNormal(itemId, upper, size);
memcpy((char *) page + upper, item, size);

phdr->pd_lower = (LocationIndex) lower;
phdr->pd_upper = (LocationIndex) upper;
```
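The way `pd_lower` and `pd_upper` move toward each other is easier to see on a tiny page. The following is a toy model, not the PostgreSQL page format: `ToyPage`, `ToyItemId` and both functions are hypothetical, the page is 256 bytes, the two pointers are kept outside the byte array for brevity, alignment is assumed to be 8 bytes, and only the append case is handled (no slot reuse, no shuffling).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 256
#define TOY_MAXALIGN(n) (((size_t) (n) + 7) & ~(size_t) 7)  /* assumed 8-byte alignment */

typedef struct { uint16_t off; uint16_t len; } ToyItemId;    /* one line pointer */

typedef struct
{
    uint16_t      lower;                 /* end of the line pointer array */
    uint16_t      upper;                 /* start of the tuple data */
    unsigned char bytes[TOY_PAGE_SIZE];  /* pointers grow up, data grows down */
} ToyPage;

static void
toy_page_init(ToyPage *p)
{
    assert(sizeof(ToyItemId) == 4);
    memset(p->bytes, 0, sizeof p->bytes);
    p->lower = 0;
    p->upper = TOY_PAGE_SIZE;
}

/* Append an item. Returns its 1-based offset number, or 0 if it does not fit. */
static int
toy_page_add_item(ToyPage *p, const void *item, uint16_t size)
{
    size_t    new_lower = p->lower + sizeof(ToyItemId);
    size_t    aligned = TOY_MAXALIGN(size);
    ToyItemId id;

    /* The two regions must not cross after the insert */
    if (aligned > p->upper || new_lower > p->upper - aligned)
        return 0;

    id.off = (uint16_t) (p->upper - aligned);
    id.len = size;

    memcpy(p->bytes + p->lower, &id, sizeof id);  /* new line pointer */
    memcpy(p->bytes + id.off, item, size);        /* tuple data */

    p->lower = (uint16_t) new_lower;
    p->upper = id.off;
    return (int) (p->lower / sizeof(ToyItemId));
}
```

Inserting a 5-byte item into a fresh toy page moves `lower` from 0 to 4 and `upper` from 256 to 248: the item consumes one line pointer plus its size rounded up to the alignment boundary.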
### WAL logging and statistics

Once the tuple is actually in the page, we write a WAL record and update the related statistics. Neither is covered in detail here.