public class DocLengthTable2B extends Object implements DocLengthTable
Keeps track of the length of each document in the collection, storing each length as a two-byte integer (a short). Document lengths are measured in number of terms.
Document length data is stored in a serialized data file, in the following format:
Since the documents are numbered sequentially starting at d + 1, each short corresponds unambiguously to a particular document.
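As an illustration of how such a file could be read, here is a minimal sketch. It assumes a layout that the text above does not confirm: one int holding the docno offset d, one int holding the document count n, then exactly n shorts. The class `DocLengthFileSketch` and its methods are hypothetical, and the sketch uses plain `java.io` streams over a byte array rather than a Hadoop `FileSystem`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the documented API.
class DocLengthFileSketch {
  // Assumed layout: int d (docno offset), int n (document count), then n shorts.
  static byte[] write(int offset, short[] lengths) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(offset);
      out.writeInt(lengths.length);
      for (short len : lengths) {
        out.writeShort(len);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Looks up one document length; docnos are assumed to run from d + 1 to d + n.
  static int lengthOf(byte[] data, int docno) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      int offset = in.readInt();
      int count = in.readInt();
      int index = docno - offset - 1; // docno d + 1 is the first short
      if (index < 0 || index >= count) {
        throw new IllegalArgumentException("docno out of range: " + docno);
      }
      for (int i = 0; i < index; i++) {
        in.readShort(); // skip the shorts of earlier documents
      }
      return in.readShort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] data = write(100, new short[] {12, 7, 300});
    System.out.println(lengthOf(data, 102));
  }
}
```

Because every entry has a fixed width of two bytes, the position of a document's short follows directly from its docno, which is why no per-document key needs to be stored.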
Constructor and Description |
---|
DocLengthTable2B(Path file) Creates a new DocLengthTable2B. |
DocLengthTable2B(Path file, FileSystem fs) Creates a new DocLengthTable2B. |
Modifier and Type | Method and Description |
---|---|
float | getAvgDocLength() Returns the average document length. |
int | getDocCount() Returns the number of documents in the collection. |
int | getDocLength(int docno) Returns the length of a document. |
int | getDocnoOffset() Returns the first docno in this collection. |
static void | main(String[] args) |
public DocLengthTable2B(Path file) throws IOException
Creates a new DocLengthTable2B.
Parameters:
file - document length data file
Throws:
IOException
public DocLengthTable2B(Path file, FileSystem fs) throws IOException
Creates a new DocLengthTable2B.
Parameters:
file - document length data file
fs - FileSystem to read from
Throws:
IOException
public float getAvgDocLength()
Description copied from interface: DocLengthTable
Returns the average document length.
Specified by:
getAvgDocLength in interface DocLengthTable
public int getDocCount()
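Description copied from interface: DocLengthTable
Returns the number of documents in the collection.
Specified by:
getDocCount in interface DocLengthTable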
public int getDocLength(int docno)
Description copied from interface: DocLengthTable
Returns the length of a document.
Specified by:
getDocLength in interface DocLengthTable
public int getDocnoOffset()
Description copied from interface: DocLengthTable
Returns the first docno in this collection.
Specified by:
getDocnoOffset in interface DocLengthTable
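To make the accessor semantics concrete, the following is a rough in-memory sketch of how the lookup and average could be computed from an array of shorts. The class `InMemoryDocLengths` is hypothetical and is not the library's implementation. It assumes, following the class description, that docnos start at d + 1, so docno d + 1 maps to array index 0. Whether `getDocnoOffset()` returns d or d + 1 is not settled by this page, so the sketch leaves that method out.

```java
// Hypothetical in-memory table, shown only to illustrate the accessors.
class InMemoryDocLengths {
  private final int offset;      // d: docnos are assumed to start at d + 1
  private final short[] lengths; // one two-byte length per document

  InMemoryDocLengths(int offset, short[] lengths) {
    this.offset = offset;
    this.lengths = lengths.clone();
  }

  // Number of documents in the collection.
  int getDocCount() {
    return lengths.length;
  }

  // Length of one document, in terms.
  int getDocLength(int docno) {
    return lengths[docno - offset - 1];
  }

  // Average document length; a long accumulator avoids overflow on large collections.
  float getAvgDocLength() {
    if (lengths.length == 0) {
      return 0f;
    }
    long sum = 0;
    for (short len : lengths) {
      sum += len;
    }
    return (float) sum / lengths.length;
  }
}
```

A call such as `new InMemoryDocLengths(100, new short[] {10, 20, 30}).getDocLength(102)` would look up the second entry under these assumptions.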