public class DocLengthTable4B extends Object implements DocLengthTable
Object that keeps track of the length of each document in the collection as four-byte integers (ints). Document lengths are measured in number of terms.
Document length data is stored in a serialized data file, in the following format:
Since the documents are numbered sequentially starting at d + 1, each int corresponds unambiguously to a particular document.
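The byte layout itself did not survive on this page. To illustrate how a fixed-width table maps docnos to lengths, the hedged sketch below *assumes* a layout of one int holding the offset d, one int holding the document count n, then n int lengths, all big-endian as written by `java.io.DataOutputStream`. The class `DocLengthFormatSketch` and its methods are hypothetical, and the sketch reads from an in-memory byte array rather than through the `Path` and `FileSystem` arguments the real constructors take.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch. ASSUMED layout: [int d][int n][n ints, one length per document].
public class DocLengthFormatSketch {
    final int docnoOffset; // d: the first document has docno d + 1
    final int[] lengths;   // lengths[i] is the length of docno d + 1 + i

    DocLengthFormatSketch(DataInputStream in) throws IOException {
        docnoOffset = in.readInt();
        int n = in.readInt();
        lengths = new int[n];
        for (int i = 0; i < n; i++) {
            lengths[i] = in.readInt(); // four bytes per document
        }
    }

    // Parses the assumed layout; truncated input surfaces as UncheckedIOException.
    static DocLengthFormatSketch fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new DocLengthFormatSketch(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Serializes lengths in the assumed layout, for demonstration only.
    static byte[] write(int offset, int[] lengths) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(offset);
            out.writeInt(lengths.length);
            for (int len : lengths) {
                out.writeInt(len);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    int getDocLength(int docno) {
        // Docnos start at d + 1, so docno d + 1 maps to index 0.
        return lengths[docno - docnoOffset - 1];
    }

    public static void main(String[] args) {
        byte[] data = write(100, new int[] {12, 7, 30});
        DocLengthFormatSketch table = fromBytes(data);
        System.out.println(data.length);             // 20: two header ints plus three lengths
        System.out.println(table.getDocLength(101)); // 12
        System.out.println(table.getDocLength(103)); // 30
    }
}
```

Under this assumed layout every entry is exactly four bytes, so a table for n documents occupies 8 + 4n bytes and a lookup is a single array index.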
Constructor and Description |
---|
DocLengthTable4B(Path file): Creates a new DocLengthTable4B. |
DocLengthTable4B(Path file, FileSystem fs): Creates a new DocLengthTable4B. |
Modifier and Type | Method and Description |
---|---|
float | getAvgDocLength(): Returns the average document length. |
int | getDocCount(): Returns the number of documents in the collection. |
int | getDocLength(int docno): Returns the length of a document. |
int | getDocnoOffset(): Returns the first docno in this collection. |
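To make the four methods in the table concrete, here is a minimal in-memory sketch. `DocLengths` and `InMemoryDocLengths` are hypothetical names; the interface only mirrors the signatures listed above so the example compiles without the real `DocLengthTable` on the classpath. Following the class description, it assumes the offset is d and the first docno is d + 1, and it computes the average as total length divided by document count. The real class may compute or store these values differently.

```java
// Hypothetical mirror of the four methods listed in the table above,
// so this sketch compiles without the real DocLengthTable interface.
interface DocLengths {
    float getAvgDocLength();
    int getDocCount();
    int getDocLength(int docno);
    int getDocnoOffset();
}

// Hypothetical in-memory implementation, not the library's code.
public class InMemoryDocLengths implements DocLengths {
    private final int docnoOffset;    // d: the first docno is d + 1 (assumed)
    private final int[] lengths;      // lengths[i] belongs to docno d + 1 + i
    private final float avgDocLength;

    public InMemoryDocLengths(int docnoOffset, int[] lengths) {
        this.docnoOffset = docnoOffset;
        this.lengths = lengths.clone(); // defensive copy
        long total = 0; // long so the sum of many lengths cannot overflow
        for (int len : lengths) {
            total += len;
        }
        this.avgDocLength = lengths.length == 0 ? 0f : (float) total / lengths.length;
    }

    @Override
    public float getAvgDocLength() {
        return avgDocLength;
    }

    @Override
    public int getDocCount() {
        return lengths.length;
    }

    @Override
    public int getDocLength(int docno) {
        int index = docno - docnoOffset - 1;
        if (index < 0 || index >= lengths.length) {
            throw new IllegalArgumentException("docno out of range: " + docno);
        }
        return lengths[index];
    }

    @Override
    public int getDocnoOffset() {
        return docnoOffset;
    }

    public static void main(String[] args) {
        DocLengths table = new InMemoryDocLengths(100, new int[] {12, 7, 30});
        System.out.println(table.getDocCount());     // 3
        System.out.println(table.getDocLength(102)); // 7
        System.out.println(table.getAvgDocLength()); // average of 12, 7 and 30
        System.out.println(table.getDocnoOffset());  // 100
    }
}
```

Precomputing the average in the constructor keeps getAvgDocLength() constant-time no matter how often a caller asks for it.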
public DocLengthTable4B(Path file) throws IOException
Creates a new DocLengthTable4B.
Parameters: file - document length data file
Throws: IOException
public DocLengthTable4B(Path file, FileSystem fs) throws IOException
Creates a new DocLengthTable4B.
Parameters: file - document length data file; fs - FileSystem to read from
Throws: IOException
public float getAvgDocLength()
Returns the average document length.
Specified by: getAvgDocLength in interface DocLengthTable
public int getDocCount()
Returns the number of documents in the collection.
Specified by: getDocCount in interface DocLengthTable
public int getDocLength(int docno)
Returns the length of a document.
Specified by: getDocLength in interface DocLengthTable
public int getDocnoOffset()
Returns the first docno in this collection.
Specified by: getDocnoOffset in interface DocLengthTable